CN116091871B

CN116091871B - Physical countermeasure sample generation method and device for target detection model

Info

Publication number: CN116091871B
Application number: CN202310208464.0A
Authority: CN
Inventors: 李渝; 杨力尘; 张冲; 何道敬; 谢恩泽
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2023-03-07
Filing date: 2023-03-07
Publication date: 2023-08-25
Anticipated expiration: 2043-03-07
Also published as: CN116091871A

Abstract

The invention relates to the technical field of machine vision, in particular to a physical countermeasure sample generation method and a device for a target detection model, wherein the physical countermeasure sample generation method for the target detection model comprises the following steps: acquiring object information to obtain an original data set, and obtaining an enhanced data set and a target mask according to the original data set; establishing a reverse graph model to be trained and a rendering model to be trained; based on the enhanced data set and the target mask, synchronously training the reverse graph model to be trained and the rendering model to be trained to obtain the reverse graph model and the rendering model; obtaining a generated data set according to the original data set, the inverse graph model and the rendering model; obtaining optimized 3D attributes according to the generated data set and the rendering model; and obtaining an countermeasure generation sample according to the optimized 3D attribute and the rendering model. The present invention can generate an countermeasure sample effective in spoofing the target detector by only 2D images in various complex environments.

Description

Physical countermeasure sample generation method and device for target detection model

Technical Field

The invention relates to the technical field of machine vision, in particular to a physical countermeasure sample generation method and device aiming at a target detection model.

Background

Physical challenge sample generation refers to an attack method that adds a carefully designed pattern to a real object through computer technology so that the object can spoof the attacked model. As deep neural networks have been studied more deeply, the drawbacks of neural networks have also been revealed, and by adding deliberate perturbations to the normal image, the attacked neural network can be misclassified, which is also known as a digital challenge sample, since it needs to be generated on the digital image.

In the real world, due to physical limitations, such as shooting angle, distance, light, and the like, a common digital countermeasure sample cannot adapt to a real environment, so that a physical countermeasure sample capable of adapting to a real complex environment is generated. Most of the existing physical countermeasure samples aim at classification problems, few attack methods aiming at target detection are adopted, and most of the existing physical countermeasure samples need to have enough 3D data sets in advance, so that the final effect is poor in performance in reality.

Physical challenge sample generation of a Full-coverage Camouflage Attack (FCA) method is extracted by using a vehicle Full 3D model, the method renders non-planar texture painting on the surface of the whole vehicle, the vehicle is converted into a real scene by using a conversion function, and finally the texture painting is optimized on the basis. This technique requires the pre-possession of a 3D model of the vehicle and is difficult to apply directly to other items.

In the prior art, there is a lack of an antagonistic sample generation method that can effectively fool an existing object detector in various complex environments by only 2D image generation.

Disclosure of Invention

The embodiment of the invention provides a physical countermeasure sample generation method and device aiming at a target detection model. The technical scheme is as follows:

in one aspect, a method for generating a physical challenge sample for a target detection model is provided, the method being implemented by an electronic device, the method comprising:

acquiring object information to obtain an original data set, and obtaining an enhanced data set and a target mask according to the original data set;

establishing a reverse graph model to be trained and a rendering model to be trained;

taking the enhancement data set as an input sample of the inverse graph model to be trained, taking the target 3D attribute and the target mask output by the inverse graph model to be trained as the input sample of the rendering model to be trained, and synchronously training the inverse graph model to be trained and the rendering model to be trained to obtain an inverse graph model and a rendering model;

obtaining a generated data set according to the original data set, the inverse graph model and the rendering model;

obtaining optimized 3D attributes according to the generated data set and the rendering model;

and obtaining an countermeasure generation sample according to the optimized 3D attribute and the rendering model.

Optionally, the collecting the target object information, obtaining an original data set, and obtaining an enhanced data set and a target mask according to the original data set, including:

shooting a target object to obtain an original data set;

inputting the original data set into a preset style to generate an countermeasure network for training, and obtaining an enhanced data set;

and inputting the enhanced data set into a preset unsupervised front background segmentation model for mask extraction to obtain a target mask.

Optionally, the obtaining a generated dataset according to the original dataset, the inverse graph model and the rendering model includes:

inputting the original data set into the inverse graph model to perform 3D attribute extraction to obtain a first 3D attribute;

adding a random disturbance texture attribute into the first 3D attribute to obtain a second 3D attribute;

and obtaining a generated data set according to the second 3D attribute and the rendering model.

Wherein the 3D attributes include 3D mesh attributes, illumination attributes, and texture attributes of the data.

Optionally, the obtaining the optimized 3D attribute according to the generated dataset and the rendering model includes:

inputting the generated data set into a preset target object detection model for calculation to obtain a target loss function;

performing back propagation through the differentiable rendering model according to the target loss function to obtain a first optimized texture attribute;

optimizing the first optimized texture attribute in a gradient optimization mode to obtain a second optimized texture attribute;

and adding the second optimized texture attribute to the first 3D attribute to obtain an optimized 3D attribute.

In another aspect, there is provided a physical challenge sample generating device for a target detection model, the device being applied to a physical challenge sample generating method for a target detection model, the device comprising:

the data acquisition module is used for acquiring information of a target object, obtaining an original data set, and obtaining an enhanced data set and a target mask according to the original data set;

the model building module is used for building a reverse graph model to be trained and a rendering model to be trained;

the model training module is used for taking the enhanced data set as an input sample of the inverse graph model to be trained, taking the target 3D attribute and the target mask output by the inverse graph model to be trained as the input sample of the rendering model to be trained, and synchronously training the inverse graph model to be trained and the rendering model to be trained to obtain the inverse graph model and the rendering model;

the generation data set acquisition module is used for acquiring a generation data set according to the original data set, the inverse graph model and the rendering model;

the optimized 3D attribute acquisition module is used for acquiring optimized 3D attributes according to the generated data set and the rendering model;

and the countermeasure sample generation module is used for obtaining a countermeasure generation sample according to the optimized 3D attribute and the rendering model.

Optionally, the data acquisition module is further configured to:

shooting a target object to obtain an original data set;

Optionally, the generating data set obtaining module is further configured to:

adding random disturbance texture attributes according to the first 3D attributes to obtain second 3D attributes;

Optionally, the optimizing 3D attribute obtaining module is further configured to:

In another aspect, an electronic device is provided that includes a processor and a memory having at least one instruction stored therein that is loaded and executed by the processor to implement the physical challenge sample generating method for a target detection model described above.

In another aspect, a computer readable storage medium having stored therein at least one instruction loaded and executed by a processor to implement the above-described physical challenge sample generating method for a target detection model is provided.

The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:

the invention provides a physical countermeasure sample generation method aiming at a target detection model, which is used for acquiring 3D attributes of a target object through a reverse graph model based on a 2D image of the target object; adding disturbance to textures in the 3D attributes to obtain optimized 3D attributes; and generating a challenge sample through the rendering model based on the optimized 3D attribute. The present invention does not require that a model of the target object be acquired in advance, and in various complex environments, a challenge sample that effectively spoofs the target detector can be generated from only the 2D image.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for generating a physical challenge sample for a target detection model according to an embodiment of the present invention;

FIG. 2 is a block diagram of a physical challenge sample generating device for a target detection model according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantages to be solved more apparent, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.

The embodiment of the invention provides a physical countermeasure sample generation method aiming at a target detection model, which can be realized by electronic equipment, wherein the electronic equipment can be a terminal or a server. A flow chart of a method for generating a physical challenge sample for a target detection model as shown in fig. 1, the process flow of the method may include the steps of:

s1, acquiring object information to obtain an original data set, and obtaining an enhanced data set and a target mask according to the original data set.

Optionally, collecting the object information to obtain an original data set, and obtaining the enhanced data set and the object mask according to the original data set, including:

shooting a target object to obtain an original data set;

inputting the original data set into a preset style generation countermeasure network for training to obtain an enhanced data set;

and inputting the enhanced data set into a preset unsupervised front background segmentation model to perform mask extraction, and obtaining a target mask.

In a possible implementation manner, in the invention, a shooting device is used for shooting a target object from 0 to 360 degrees, a plurality of object images from 12 different angles are shot at intervals of 30 degrees, and angle information is marked to form a smaller data set, and the data set is called an original data set.

Training is performed using an original dataset, i.e., photographed multi-angle images, input into a preset Style generation countermeasure network (Style-Based Generative Adversarial Network, style gan), which is a model with a 16-layer network that converts a code sampled from a normal distribution into 16 vectors. Of these 16 vectors, each vector controls an image feature. The first four vectors of the styleGAN control the angle of the photographing device, and a large number of multi-view pictures can be generated by fixing the vectors of other dimensions and changing the first four vectors. In particular, a large number of multi-view images are generated for each image, the original images and the generated images form a multi-view large dataset, and the dataset generated by the original dataset through the style against the network is called an enhanced dataset.

The image mask may be used to extract the target object mask using an unsupervised foreground segmentation method.

S2, establishing a reverse graph model to be trained and a rendering model to be trained.

In a possible implementation manner, a reverse graph model is constructed based on a convolutional neural network, the input of the reverse graph model is the enhancement data set in the step, the output of the reverse graph model is a 3D grid, illumination and texture corresponding to prediction respectively, and the 3D attribute of an image can be extracted after the reverse graph model is fully trained.

Constructing a rendering model using a differential renderer DIB-R, the inputs of whichAnd->And->Wherein->Is 3D grid attribute and texture attribute extracted by inverse graph model, < >>Is the shooting angle +.>Is an image mask, wherein the image mask can output a rendered image by an unsupervised front background segmentation method>And render mask->. The specific loss function is shown in the following formula (1):

wherein L is _col Is L after image reconstruction ₁ Loss, definition domain RGB color space; l (L) _percept Is a perceptual penalty for making the predicted texture more realistic; l (L) _IOU Calculating a cross union between the real mask and the rendering mask; l (L) _SM And L _lao Are regularization terms used for ensuring good image shape; l (L) _mov Make the image shape small and uniform, lambda _col Is the color loss coefficient lambda _percept Is the perceptual loss coefficient, lambda _IOU Mask loss coefficient lambda _sm Is a regularization coefficient lambda _lap Is the Laplace coefficient lambda _mov Is a shape constraint.

S3, taking the enhanced data set as an input sample of the inverse graph model to be trained, taking the target 3D attribute and the target mask output by the inverse graph model to be trained as the input sample of the rendering model to be trained, and synchronously training the inverse graph model to be trained and the rendering model to be trained to obtain the inverse graph model and the rendering model;

in a feasible implementation mode, inputting a reverse graph model to be trained according to an enhanced data set of a target object in each training round to obtain a target 3D attribute of the target object; inputting the target 3D attribute and the target mask into a rendering model to be trained to continue training, and obtaining a generated data set based on the target object; and finally obtaining the trained reverse graph model and rendering model through multiple rounds of training.

In the above steps, shooting is performed according to the target object, a large number of multi-angle two-dimensional pictures are generated, data enhancement is performed according to a large number of two-dimensional pictures of the target object, and a large number of two-dimensional pictures with the same style, namely an enhancement data set, are obtained. And carrying out data extraction on a large number of two-dimensional pictures with the same style generated based on the target object through a reverse graph model, carrying out continuous iterative optimization along with input data based on self learning capacity of the reverse graph model to obtain model internal parameters capable of extracting 3D attributes of the target object, and fixing the model internal parameters to obtain the trained reverse graph model.

The rendering model training method is the same as that of the inverse graph model, and will not be described here again.

And S4, obtaining a generated data set according to the original data set, the inverse graph model and the rendering model.

Optionally, obtaining the generated dataset from the original dataset, the inverse graph model and the rendering model comprises:

inputting the original data set into a reverse graph model to perform 3D attribute extraction to obtain a first 3D attribute;

In a possible implementation, 3D attributes in the original data are extracted according to a reverse graph model, wherein 3D grid attributes and illumination attributes in the 3D attributes based on the target object are kept fixed, the 3D grid attributes are denoted as mesh, the illumination attributes are denoted as light, and the texture attributes are denoted as t.

Processing the separated texture attributes, adding random disturbance to the texture attributes, and recording the new texture attributes as。

The rendering model is expressed as D, the 3D grid attribute mesh generated based on the target object, the illumination light and the new texture attribute t' are input into the rendering model, a new image is output, and a data set is generated and recorded as x.

In a possible embodiment, the objects in the coordinate system are sampled, and the sampling points are connected in a certain order into a series of facets (triangles or coplanar quadrilaterals, pentagons, etc.), each facet being an independent rendering unit. The object thus sampled is represented by a number of triangular, quadrangular or pentagonal components.

The roughness of the surface of the target object is simulated by using the illumination attribute and the texture attribute, the material attribute of the target object is further reflected, the model constructed by the 3D grid is filled, and the simulated target object is enabled to be more vivid and more fit with the real object.

And S5, obtaining optimized 3D attributes according to the generated data set and the rendering model.

Optionally, obtaining the optimized 3D attribute according to the generated dataset and the rendering model includes:

back propagation is carried out through a differentiable rendering model according to the target loss function, and a first optimized texture attribute is obtained;

In a possible embodiment, a predetermined detector target detector for the target object is used, denoted asDetecting the generated data set x generated in the above step, and outputting the detected target object bounding box set +.>And item confidence +/for each bounding box>And item probability distribution->. Wherein->Confidence indicating the presence of the object in the box, +.>A probability distribution indicating which item is within the box. Minimizing the target item loss function, the mathematical expression is as follows:

wherein, the liquid crystal display device comprises a liquid crystal display device,extracting the presence of the +.>Probability of class object->For a target item, S is the 3D mesh attribute of the target item extracted from the inverse map.

And reversely transmitting the texture attribute to the data set through the rendering model according to the loss function of the target object, wherein the texture attribute obtained at the moment is the first optimized texture attribute, optimizing the newly generated texture attribute through a gradient optimization method to obtain a second optimized texture attribute, and replacing the second optimized texture attribute with the original texture attribute in the 3D attribute generated based on the target object. The 3D mesh attribute and the illumination attribute obtained by the target object at this time and the newly generated second optimized texture attribute together form a new 3D attribute, namely the optimized 3D attribute.

The spray error and color jump problems are prevented in this step so that the challenge sample fails, adding constraints to it on the basis of the loss function.

Color abrupt change refers to a problem that it is difficult to capture color abrupt change that may exist in painting of a photographed image due to limitations of a camera pixel, and to control the error, a constraint term is added to a loss function loss, which calculates a difference between each pixel point and surrounding pixel points, and controls color abrupt change by excessively minimizing the value. The color loss function is calculated as shown in the following formula (3):

wherein r is _i,j Representing the pixel value of the ith row and jth column in the generated picture.

The spraying error means that due to the limitation of the spraying technology in the real world, the actual spraying color and the digital color have certain chromatic aberration, fine adjustment is carried out on the color on the image of the generated data set x to generate a new image x', the two images are sent to a target detector, loss functions are calculated respectively, and the average value is taken as the total loss function.

The final overall loss function calculation formula is shown in the following formula (4):

wherein L is ₁ 、L ₂ Is data set x and data set x', which minimizes the loss of probability of detection by the target detector after input to the target detector, and λ is the color loss factor in this process.

S6, obtaining an countermeasure generation sample according to the optimized 3D attribute and the rendering model.

In a possible embodiment, the method of the invention can be implemented on a variety of real-world items, enabling an omnidirectional, multi-angle spoof target detector, without requiring a 3D model of the item in advance.

The invention extracts 3D attributes from 2D pictures and generates countermeasure samples based on the 3D attributes. Meanwhile, the spraying error existing in the real world is considered, so that the spraying error can be effective within a certain error, and the robustness and the practicability of the generated countermeasure sample are improved.

FIG. 2 is a block diagram of a physical challenge sample generating device for a target detection model, according to an example embodiment. Referring to fig. 2, the apparatus includes:

the data acquisition module 210 is configured to acquire information of a target object, obtain an original data set, and obtain an enhanced data set and a target mask according to the original data set;

the model building module 220 is configured to build a reverse graph model to be trained and a rendering model to be trained;

the model training module 230 is configured to take the enhancement data set as an input sample of the inverse graph model to be trained, take the target 3D attribute and the target mask output by the inverse graph model to be trained as an input sample of the rendering model to be trained, and perform synchronous training on the inverse graph model to be trained and the rendering model to be trained to obtain an inverse graph model and a rendering model;

a generated data set obtaining module 240, configured to obtain a generated data set according to the original data set, the inverse graph model and the rendering model;

an optimized 3D attribute obtaining module 250, configured to obtain optimized 3D attributes according to the generated dataset and the rendering model;

an countermeasure sample generation module 260 for obtaining a countermeasure generation sample from the optimized 3D attributes and the rendering model.

Optionally, the data acquisition module 210 is further configured to:

shooting a target object to obtain an original data set;

Optionally, the generating data set obtaining module 240 is further configured to:

Optionally, the optimizing 3D attribute obtaining module 250 is further configured to:

Fig. 3 is a schematic structural diagram of an electronic device 300 according to an embodiment of the present invention, where the electronic device 300 may have a relatively large difference due to different configurations or performances, and may include one or more processors (central processing units, CPU) 301 and one or more memories 302, where at least one instruction is stored in the memories 302, and the at least one instruction is loaded and executed by the processors 301 to implement the steps of the above-described method for generating a physical challenge sample for a target detection model.

In an exemplary embodiment, a computer readable storage medium, such as a memory comprising instructions executable by a processor in a terminal to perform a physical challenge sample generating method for a target detection model as described above is also provided. For example, the computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims

1. A method of generating a physical challenge sample for a target detection model, the method comprising:

2. The method of claim 1, wherein the acquiring object information to obtain an original dataset, and obtaining an enhanced dataset and an object mask from the original dataset, comprises:

shooting a target object to obtain an original data set;

3. A physical challenge sample generation method for a target detection model according to claim 1, wherein the obtaining a generated dataset from the original dataset, the inverse graph model, and the rendering model comprises:

4. A physical challenge sample generation method for a target detection model according to claim 3, wherein the 3D attributes comprise 3D mesh attributes, illumination attributes and texture attributes of the data.

5. A physical challenge sample generation method for a target detection model according to claim 1, wherein said obtaining optimized 3D attributes from said generated dataset and said rendering model comprises:

6. A physical challenge sample generating device for a target detection model, the device comprising:

7. The apparatus for generating physical challenge samples for a target detection model of claim 6, wherein the data acquisition module is further configured to:

shooting a target object to obtain an original data set;

8. The physical challenge sample generating device for a target detection model of claim 6, wherein the generated data set acquisition module is further configured to:

9. The physical challenge sample generating device for the object detection model of claim 8, wherein the 3D attributes comprise 3D mesh attributes, illumination attributes, and texture attributes of the data.

10. The physical challenge sample generating device for a target detection model of claim 6, wherein the optimized 3D attribute acquisition module is further configured to: