CN113743231A - Video target detection evasion system and method - Google Patents

Video target detection evasion system and method

Info

Publication number
CN113743231A
CN113743231A (application CN202110909116.7A)
Authority
CN
China
Prior art keywords
patch
module
distance
picture
adaptive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110909116.7A
Other languages
Chinese (zh)
Other versions
CN113743231B (en)
Inventor
陈晶
汪欣欣
何琨
杜瑞颖
康鹏昊
吴宗儒
张润航
胡诗睿
佘计思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN202110909116.7A priority Critical patent/CN113743231B/en
Publication of CN113743231A publication Critical patent/CN113743231A/en
Application granted granted Critical
Publication of CN113743231B publication Critical patent/CN113743231B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video target detection evasion system and method. The model-adaptive training module detects targets wearing the evasion patch and extracts the number and confidence of human-body detections, removing any dependence on the type and parameters of the actual detection model. The patch distance-adaptive module updates the patch for different distances based on a threshold specified by the user or preset by the system, ensuring the patch remains protective at any range. The multi-term loss computation module and the digital-world patch-fitting module simulate clothing wrinkles, apply physical-world color transformation, and constrain the training loss of the patch in the digital world, ensuring the patch remains robust when transferred to the physical world. The method is not tied to a specific model, is effective across different detectors, is robust in the physical world, and meets users' privacy-protection needs.

Description

Video target detection evasion system and method
Technical Field
The invention belongs to the field of using adversarial examples to protect against privacy disclosure by target detection in computer vision, and relates to a video target detection evasion system and method, in particular to a general-purpose system and method for protecting privacy against human target detection.
Background
Behavior tracking, intelligent surveillance and related fields are developing rapidly. Target detection and recognition, as their core technology, brings great convenience but also poses serious security challenges and risks to personal privacy. Users often resort to disguises such as glasses, hats and masks to avoid disclosing personal privacy, but these measures are inconvenient and cannot technically defeat video target detection.
The main reason target detection and recognition pose such a risk to personal privacy is that such a platform is extremely cheap to build. Surveys show that the YOLOv3 human target detection model can run at 40 FPS on a Raspberry Pi 4B+ development board. This means an individual or enterprise can deploy a human target detection model and harvest massive amounts of pedestrian data at negligible cost (one camera, one Raspberry Pi board and open-source model code). As shown in figs. 1-2, such a human target detection and feature extraction system yields not only the captured pedestrian pictures but also the personal privacy information extracted by the detection model, covering detailed features such as pedestrian behavior, whereabouts, face and clothing.
In recent years, interfering with target detection via adversarial examples has become a hot research topic, but many problems remain. Adversarial video target detection interference generates specific adversarial examples that disturb the detection network so as to protect the user's trajectory privacy. However, most existing adversarial examples are generated against a single white-box model, while real target detection models are structurally complex; such examples therefore struggle to meet users' actual privacy-protection needs and suffer from weaknesses such as poor transferability, easy perceptibility, and failure at long range. How to strengthen an example's interference against diverse detection models, guarantee its effectiveness across the full distance range, and improve its naturalness and transferability is the pressing challenge for adversarial video target detection interference.
Disclosure of Invention
In view of the drawbacks of conventional privacy-protection schemes and the safety and performance requirements for protecting human-body privacy features in the real physical world, the invention provides a human target detection evasion system and method, built on a surrogate model generated from multiple models, that offers high universality, high distance adaptability and good semantics in the physical world.
The technical scheme adopted by the system of the invention is as follows: a video target detection evasion system comprising a multi-model-based model-adaptive gray-box training module, a threshold-based patch distance-adaptive module, a multi-term loss computation module and a digital-world patch-fitting module;
the multi-model-based (YOLO, SSD, Faster RCNN, etc.) model-adaptive gray-box training module detects target pictures onto which the evasion patch has been pasted and extracts the number and confidence of human-body detections;
the threshold-based patch distance-adaptive module sets the user or system threshold, makes the threshold-distance decision, and updates the patch adaptively with distance;
the multi-term loss computation module applies the training loss constraints, including smoothness loss, pixel-change loss and non-printable-color loss, where the smoothness loss and pixel-change loss smooth the image to preserve its semantic information;
the digital-world patch-fitting module performs clothing-wrinkle simulation and physical-world color transformation, simulating various physical-world environmental changes and transforming the evasion patch accordingly to improve its robustness in the physical world.
The method adopts the following technical scheme: a video target detection evasion method comprising the steps below.
Step 1: the multi-model-based model-adaptive gray-box training module detects the target picture onto which the evasion patch has been pasted, extracts the number and confidence of human-body detections, and distributes them to the patch distance-adaptive module.
Step 2: set the threshold and make the threshold-distance decision; update the distance-adaptive features of the relevant region of the target patch according to the result parameters distributed by the system; send the iteratively updated patch picture to the digital-world patch-fitting module to generate a new target detection data set with the patch pasted on.
Step 3: perform clothing-wrinkle simulation, physical-world color transformation and training-loss constraint; compute the loss indices of the patch picture during training, then store and distribute them to the patch distance-adaptive module.
The clothing-wrinkle simulation generates wrinkle-like distortions of the patch from a built-in two-dimensional wrinkle data set and stores the distorted patch for later use.
The physical-world color transformation builds a mapping between digital-world colors and the printable colors of the physical world using a multilayer perceptron, then fits the physical world's printable colors to the colors of the patch picture.
The training-loss constraint computes the loss indices of the patch picture during training from the output of the surrogate model generated in the model-adaptive gray-box training module and the loss functions (smoothness loss, pixel-change loss, non-printable-color loss, etc.), then stores and distributes them to the patch distance-adaptive module.
Compared with the prior art, the advantages and positive effects of the invention are mainly the following:
(1) The invention proposes a model-adaptive gray-box training method that improves the universality of the generated patch in the physical world and its ability to disturb multiple models; on this basis, an evasion patch effective against many detection models can be generated.
(2) A distance-adaptive patch generation algorithm is designed that improves the adversarial example's adaptability to attack distance, guarantees the adversarial patch's effectiveness over the full 2-10 m range, and strengthens its full-distance disturbance of the model, making it suitable for the real physical world.
(3) The invention proposes a semantics-preserving mechanism based on image smoothing that retains the semantics of the original image to a certain extent and improves the example's semantics and naturalness: when worn in the physical world, the clothing pattern's colors do not strike passers-by as strange.
Drawings
FIG. 1 is a system block diagram of an embodiment of the present invention.
FIG. 2 is a schematic diagram of a multi-model-based model-adaptive gray box training module according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a threshold-based patch distance adaptation module according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a computation module based on a polynomial loss function and a digital world patch pasting module according to an embodiment of the present invention.
FIG. 5 is a block diagram of Model1, the surrogate model used to simulate real-world detection, according to an embodiment of the present invention.
Fig. 6 is an application scenario diagram according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and examples so that those of ordinary skill in the art can understand and practice it. The examples described herein are for illustration and explanation only and are not to be construed as limiting the invention.
Referring to fig. 1, the video target detection evasion system provided by the invention comprises a multi-model-based model-adaptive gray-box training module, a threshold-based patch distance-adaptive module, a multi-term loss computation module and a digital-world patch-fitting module.
The multi-model-based (including YOLO, SSD, Faster RCNN, etc.) model-adaptive gray-box training module detects the target picture onto which the evasion patch (also called the "cloaking clothing" patch) has been pasted and extracts the number and confidence of human-body detections; the process needs neither the type nor the parameters of the actual detection model, only the surrogate model preset in the system.
The threshold-based patch distance-adaptive module of the embodiment sets the user or system threshold, makes the threshold-distance decision, and updates the patch adaptively with distance; a threshold set by the user or preset by the system determines the patch update region, and the system updates the patch image to ensure its protective performance at different distances.
The multi-term loss computation module of the embodiment applies the training loss constraints, including smoothness loss, pixel-change loss and non-printable-color loss, where the smoothness loss and pixel-change loss smooth the image to preserve its semantic information.
The digital-world patch-fitting module performs clothing-wrinkle simulation and physical-world color transformation, simulating various physical-world environmental changes and transforming the evasion patch accordingly to improve its robustness in the physical world.
The video target detection evasion method provided by the embodiment comprises the following steps:
Step 1: the multi-model-based model-adaptive gray-box training module detects the target picture onto which the "cloaking clothing" patch has been pasted, extracts the number and confidence of human-body detections, and distributes them to the patch distance-adaptive module.
referring to fig. 2, the specific process of step 1 in this embodiment includes the following steps:
Step A1: in the detection stage for the target wearing the "cloaking clothing" patch, the user or system determines the detection order and number of rounds for each model, generates the surrogate model from the system's built-in models, and detects the patched target picture with the sub-models in series.
A1.1: before the system runs, the detection order and number of epochs of each model are determined from user-specified or preset parameters; the surrogate Model1 that simulates real-world detection is generated and stored for the training iterations of adaptive patch training.
Referring to FIG. 5, the surrogate Model1 of this embodiment comprises, connected in sequence, a YOLOv2 layer, a Faster RCNN layer, a YOLOv2/SSD layer, a Faster RCNN layer and a Faster RCNN layer.
In this embodiment, one system iteration pastes the evasion patch onto a human target picture via the digital-world patch-fitting module, obtains confidences and prediction boxes via the multi-model-based model-adaptive gray-box training module, obtains the loss values via the loss computation module, and finally updates the patch via the patch distance-adaptive module. The user may specify the number of iterations or use the system default, and the trained evasion patch is output at the end.
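The iteration just described (paste the patch, detect with the surrogate, compute a loss, update the patch) can be sketched as a toy end-to-end loop. Everything below is illustrative: `paste_patch`, `toy_detector` and `train_patch` are hypothetical names, `toy_detector` stands in for the serial YOLO/SSD/Faster RCNN surrogate, and a crude finite-difference step stands in for the real gradient-based update.

```python
import numpy as np

rng = np.random.default_rng(0)

def paste_patch(image, patch):
    """Toy stand-in for the digital-world patch-fitting module:
    overwrite the centre of the image with the patch."""
    out = image.copy()
    h, w = patch.shape
    y0, x0 = (out.shape[0] - h) // 2, (out.shape[1] - w) // 2
    out[y0:y0 + h, x0:x0 + w] = patch
    return out

def toy_detector(image):
    """Toy stand-in for one surrogate sub-model: 'confidence' is simply
    the mean brightness; a real surrogate chains actual detectors."""
    return float(image.mean())

def train_patch(patch, images, detectors, epochs=20, lr=0.5):
    """One full iteration = paste -> detect -> 'loss' -> update, repeated.
    A finite-difference probe replaces the real gradient-based update."""
    for _ in range(epochs):
        eps = 1e-2
        base = np.mean([d(paste_patch(im, patch))
                        for im in images for d in detectors])
        pert = np.mean([d(paste_patch(im, patch + eps))
                        for im in images for d in detectors])
        grad = (pert - base) / eps          # sensitivity to a uniform shift
        patch = np.clip(patch - lr * grad, 0.0, 1.0)
    return patch

images = [rng.random((32, 32)) for _ in range(4)]
trained = train_patch(np.full((8, 8), 0.9), images, [toy_detector])
```

Driving the patch to minimise the surrogate's "confidence" is the essence of the loop; the real system swaps in detector confidences and the multi-term losses of step 3.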
A1.2: the system distributes the target pictures (one batch) with the "cloaking clothing" patch pasted on, generated in the patch-fitting module, to the patch distance-adaptive module, and performs serial recognition and detection with the surrogate Model1 generated in A1.1.
Step A2: after detection, the system obtains the number and confidence of human-body detections in the target picture output by the surrogate model, and distributes the extracted results to the optimizer built into the module.
A2.1: through the generated surrogate model, the system extracts the prediction confidences and the number of prediction boxes for the detected target picture wearing the "cloaking clothing" patch.
A2.2: the system collects and formats the surrogate model's results and distributes them to the patch distance-adaptive module, which updates the patch picture according to these result parameters.
After each batch of pictures is processed, the formatted result consists of the weighted confidence sum and the number of prediction boxes pred_num: for each picture, the former is the weighted sum of the confidences of all bounding boxes whose confidence exceeds the set threshold conf_thresh and whose predicted class is human, and the latter is the number of bounding boxes whose predicted class is human and whose prediction is correct.
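The batch-formatting step can be sketched as follows. This is a hedged simplification: `format_batch`, `PERSON` and `CONF_THRESH` are illustrative names/values, detections are reduced to (confidence, class) pairs, and the "prediction is correct" check is omitted for brevity.

```python
PERSON = 0          # assumed class index for "person" (illustrative)
CONF_THRESH = 0.4   # stand-in for the patent's conf_thresh

def format_batch(detections):
    """Format one batch of surrogate-model outputs.
    `detections` holds, per picture, a list of (confidence, class_id)
    pairs, one per predicted bounding box. Returns two per-picture lists:
      obj_prob -- sum of confidences of person boxes above the threshold
      pred_num -- number of person boxes."""
    obj_prob, pred_num = [], []
    for boxes in detections:
        person = [c for c, k in boxes if k == PERSON]
        obj_prob.append(sum(c for c in person if c > CONF_THRESH))
        pred_num.append(len(person))
    return obj_prob, pred_num

dets = [[(0.9, 0), (0.3, 0), (0.8, 1)],   # picture 1: two person boxes
        [(0.6, 0)]]                        # picture 2: one person box
probs, nums = format_batch(dets)
```

Both quantities then drive the update: the optimizer pushes the weighted confidence sum down while the box count tracks whether the person is still detected.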
Step 2: set the threshold and make the threshold-distance decision; update the distance-adaptive features of the relevant region of the target patch according to the result parameters distributed by the system in A2.2; send the iteratively updated patch picture to the digital-world patch-fitting module to generate a new target detection data set with the patch pasted on.
referring to fig. 3, the specific process of step 2 in this embodiment includes the following steps:
Step B1: threshold setting. The user either sets the distance-threshold parameter or adopts the built-in preset threshold S_thres, which the system submits to the distance-adaptive module; when the module receives the results generated by the surrogate model and distributed by the system, it operates on the previously submitted threshold S_thres.
Step B1.1: before the system runs, the user decides whether to set the distance threshold autonomously or to adopt the built-in preset threshold that determines the update range of the target patch image.
Step B1.2: the system checks the legality of the distance-threshold parameter submitted by the user; once it passes, the parameter is distributed to the patch distance-adaptive module, which stores the distance threshold.
Step B1.3: after receiving a surrogate-model result distributed by the system, the patch distance-adaptive module fetches the stored distance threshold and the target patch distributed by the system and executes the corresponding module functions.
Step B2: threshold-distance decision. The patch distance-adaptive module normalizes the length and width of the incoming patch picture to be updated and determines the patch's update range from the preset threshold. Note that since the patch picture is square, the update range is also a scaled square region.
Step B2.1: after the patch distance-adaptive module receives the distributed target patch to be updated, it normalizes the length and width of the patch picture, which simplifies determining the update range later.
If the size ratio of the recognition box of the patched picture to the patch is below the threshold, the system decides the patch is in a far scene and sets the update range to the full picture.
If the size ratio of the recognition box of the patched picture to the patch exceeds the threshold, the system decides the patch is in a near scene, sets the update region to the picture center, and decides the proportion of the patch covered by the update range.
Step B2.2: after the threshold-based decision, the system determines the anchor point and anchors the patch update region.
Step B2.3: the system passes the anchored patch update region to the patch distance-adaptive module, which performs the distance-adaptive feature update of the patch.
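The far/near decision above can be sketched as a small function over the normalized patch. This is a hedged sketch: `decide_update_region` is a hypothetical name, and `thresh` and `near_ratio` are illustrative values, not parameters taken from the patent.

```python
def decide_update_region(box_size, patch_size, thresh=4.0, near_ratio=0.5):
    """Threshold-distance decision on the normalised square patch.
    Returns (lo, hi) bounds in [0, 1] applied to both axes.
    ratio < thresh  -> far scene: update the full patch;
    ratio >= thresh -> near scene: update a centred square whose side
    covers `near_ratio` of the patch."""
    ratio = box_size / patch_size
    if ratio < thresh:                    # far scene: full-picture update
        return (0.0, 1.0)
    margin = (1.0 - near_ratio) / 2.0     # near scene: centred square
    return (margin, margin + near_ratio)

region_far = decide_update_region(box_size=2.0, patch_size=1.0)
region_near = decide_update_region(box_size=8.0, patch_size=1.0)
```

A small recognition-box/patch ratio means the target is far away and the whole patch matters; a large ratio means the camera is close and only the central region need be refined.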
Step B3: distance-adaptive patch update. Having obtained the patch update region from the distance threshold, the patch distance-adaptive module computes the corresponding mask M and updates the distance-adaptive features of the relevant region of the target patch according to the result parameters distributed by the system in A2.2.
Step B3.1: once the patch distance-adaptive module has the patch update region, it requests from the system the result parameters of A2.2 and the indices from the loss computation module for the patch update.
Step B3.2: once the result parameters and computed indices have been pushed to the patch distance-adaptive module, it updates the features of the corresponding patch region, including pattern, texture and color.
Step B3.3: the iteratively updated patch picture is sent to the digital-world patch-fitting module to generate a new target detection data set with the patch pasted on.
For the update optimization, an L2-regularized weight-decay method is preferred, which reduces model overfitting to a certain extent.
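A masked update step with L2 weight decay can be sketched as below. The function name `masked_update` and the values of `lr` and `weight_decay` are illustrative assumptions; the mask M and the weight-decay term follow the description above.

```python
import numpy as np

def masked_update(patch, grad, region, lr=0.05, weight_decay=1e-4):
    """Distance-adaptive update sketch: build the mask M over the anchored
    square region (normalised (lo, hi) bounds from the threshold decision)
    and take a gradient step with L2 weight decay, as the text prefers."""
    n = patch.shape[0]                     # the patch is square
    lo, hi = int(region[0] * n), int(region[1] * n)
    mask = np.zeros_like(patch)
    mask[lo:hi, lo:hi] = 1.0               # mask M: 1 inside the update region
    step = lr * (grad + weight_decay * patch)
    return np.clip(patch - mask * step, 0.0, 1.0)

patch = np.full((8, 8), 0.5)
updated = masked_update(patch, np.ones_like(patch), region=(0.25, 0.75))
```

Only pixels inside M change, so a near-scene decision refines the patch centre while the border (which a distant camera resolves poorly anyway) is left intact.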
Step 3: perform clothing-wrinkle simulation, physical-world color transformation and training-loss constraint; compute the loss indices of the patch picture during training, then store and distribute them to the patch distance-adaptive module.
Clothing-wrinkle simulation: the system generates wrinkle-like distortions of the patch from a built-in two-dimensional wrinkle data set and stores the distorted patch for later use.
Physical-world color transformation: build a mapping between digital-world colors and the printable colors of the physical world using a multilayer perceptron, then fit the physical world's printable colors to the colors of the patch picture.
Training-loss constraint: compute the loss indices of the patch picture during training from the generated surrogate model's output and the loss functions (including smoothness loss, pixel-change loss and non-printable-color loss), then store and distribute them to the adaptive patch-update module.
Referring to fig. 4, the specific process of step 3 in this embodiment includes the following steps:
Step C1: clothing-wrinkle simulation. The system generates wrinkle-like distortions of the patch from a built-in two-dimensional wrinkle data set and stores the distorted patch for later use.
Step C1.1: the clothing-wrinkle simulation function loads the system's built-in data set data_tps of two-dimensional anchor-point distortions under different human-body poses, together with the color-converted patch picture patch_cnv, providing the data for the distortion function f_tps that warps the patch picture.
Step C1.2: the clothing-wrinkle simulation function loads the designed distortion function f_tps and applies a two-dimensional warp to the target patch picture to simulate how the patch looks when worn on the body.
Step C1.3: the clothing-wrinkle simulation function stores and distributes the distorted patch to the paster in the digital-world patch-fitting module, which pastes the patch picture onto a digital-world human body to form the data set of the model-adaptive gray-box training module.
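A wrinkle warp can be sketched with a simple sinusoidal row displacement. This is a deliberately simplified substitute for the thin-plate-spline distortion f_tps driven by the anchor-point data set data_tps described above: `simulate_wrinkles`, `amp` and `freq` are illustrative names and values.

```python
import numpy as np

def simulate_wrinkles(patch, amp=1.5, freq=2.0):
    """Toy stand-in for the f_tps warp: shift each row horizontally by a
    sinusoidal offset to mimic fabric folds. A real system would fit a
    thin-plate spline to the 2-D anchor-point wrinkle data set."""
    h, w = patch.shape[:2]
    out = np.empty_like(patch)
    for y in range(h):
        shift = int(round(amp * np.sin(2 * np.pi * freq * y / h)))
        out[y] = np.roll(patch[y], shift, axis=0)
    return out

patch = np.arange(64, dtype=float).reshape(8, 8)
warped = simulate_wrinkles(patch)
```

Because the warp only rearranges pixels, the patch's content (and hence its printed colors) is preserved while its apparent geometry changes, which is exactly what training against wrinkled appearances requires.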
Step C2: physical-world color transformation. Build the mapping f between digital-world colors and the printable colors of the physical world using a multilayer perceptron, then fit the physical world's printable colors to the colors of the patch picture.
Step C2.1: before the system runs, the color-transformation function in the digital-world patch-fitting module loads the digital-world colors color_digital and the built-in physical-world colors color_physical into a three-layer fully connected BP network to generate the color fitting between the physical and digital worlds.
Step C2.2: the color transformation reads the patch picture patch_origin to be color-converted, converts its colors through the generated color fitting to produce patch_cnv, and pushes it to the clothing-wrinkle simulation part of the digital-world patch-fitting module.
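A simpler substitute for the learned three-layer BP color mapping is to snap each pixel to its nearest printable color; the sketch below does exactly that and is plainly not the patent's MLP, which would interpolate more smoothly. The palette `PRINTABLE` and the name `fit_printable` are illustrative assumptions.

```python
import numpy as np

# Illustrative printable palette (RGB in [0, 1]); a real system would
# measure the colours its printer can actually reproduce.
PRINTABLE = np.array([[0.0, 0.0, 0.0],
                      [1.0, 1.0, 1.0],
                      [0.8, 0.1, 0.1],
                      [0.1, 0.1, 0.7],
                      [0.1, 0.6, 0.1]])

def fit_printable(patch_rgb):
    """Nearest-printable-colour projection (Euclidean distance in RGB) as
    a stand-in for the learned digital-to-printable colour mapping."""
    flat = patch_rgb.reshape(-1, 3)
    d = np.linalg.norm(flat[:, None, :] - PRINTABLE[None, :, :], axis=2)
    return PRINTABLE[d.argmin(axis=1)].reshape(patch_rgb.shape)

img = np.array([[[0.9, 0.95, 1.0], [0.75, 0.05, 0.15]]])
converted = fit_printable(img)
```

Either way, the goal is the same: colors in the digital patch that no printer can reproduce are replaced before the patch is trained further, so the trained patch survives printing.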
Step C3: training-loss constraint. Compute each loss index loss of the patch picture during training from the generated surrogate model's output and the loss functions f(), then store and distribute them to the adaptive patch-update module.
Step C3.1: the system passes the result parameters of the model-adaptive gray-box training module to the loss computation module.
Step C3.2: the loss computation module loads the passed-in result parameters and calls the loss functions f() to compute each loss index loss of the adaptive patch training.
Step C3.3: the loss computation module processes the computed loss indices and passes them to the patch distance-adaptive module for updating the patch features.
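Plausible forms of the three named losses can be sketched as below. These are one reasonable reading, not the patent's exact formulas: the smoothness loss is written as total variation, the pixel-change loss as mean squared drift from the initial image, and the non-printable-color loss as mean distance to the nearest printable color.

```python
import numpy as np

def smoothness_loss(patch):
    """Total-variation-style smoothness: penalise differences between
    neighbouring pixels so the patch keeps a smooth, semantic look."""
    dx = np.abs(np.diff(patch, axis=1)).sum()
    dy = np.abs(np.diff(patch, axis=0)).sum()
    return float(dx + dy)

def pixel_change_loss(patch, original):
    """Penalise drift from the initial image (semantics preservation)."""
    return float(np.mean((patch - original) ** 2))

def non_printable_loss(patch_rgb, printable):
    """Mean distance of each pixel to its nearest printable colour."""
    flat = patch_rgb.reshape(-1, 3)
    d = np.linalg.norm(flat[:, None, :] - printable[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

flat_patch = np.zeros((4, 4))
noisy_patch = np.arange(16, dtype=float).reshape(4, 4) % 2
```

In training these terms are weighted and added to the detection-confidence objective, trading attack strength against naturalness and printability.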
Fig. 6 shows an application scenario of the cloaking clothing according to an embodiment of the present invention. Facing an illicit human-detection camera, a user not wearing the cloaking clothing printed with the evasion patch is detected (boxed in the figure), while a user wearing it is not (unboxed); in the simulated camera view, the user wearing the cloaking clothing is on the left and the user without it is on the right.
The advantages of the invention are:
1. The scheme adopts a model-adaptive gray-box training method, improving the universality of the adversarial patch in the physical world: its physical-world interference success rate exceeds 50% against multiple target detection models, its transferability is improved, and its privacy protection against mainstream recognition models in practical applications is markedly better. It is applicable to real physical-world uses; see fig. 5.
2. Threshold-based distance-adaptive patch update mechanism: the user, as decision-maker, sets the patch update threshold, and the distance-adaptive module, as the update operator, performs a targeted feature update of the central region or the whole patch according to that threshold.
3. A semantics-preserving mechanism based on image smoothing improves the semantics and naturalness of the adversarial example. By designing a semantic loss function on the change from the initial image, the training process effectively retains the initial image's semantics in the adversarial example and improves its naturalness in the physical world.
The invention can effectively protect users from automatic recognition by target detection models and prevent the illicit collection, storage and use of users' sensitive information by target detection and extraction technology. Militarily, as unmanned warfare continues to develop, intelligent cloaking clothing can help human targets evade detection and lock-on by unmanned weapons and seize the initiative. Intelligent cloaking clothing based on adversarial patches has broad application prospects and will bring great benefits to civil, commercial and military scenarios.
The invention can provide a reliable and convenient method of sensitive-information protection for users in civil, military and other fields.
It should be understood that parts of the specification not set forth in detail belong to the prior art.
It should be understood that the above description of preferred embodiments is given for clarity of understanding and implies no unnecessary limitation; those skilled in the art may make modifications and variations without departing from the scope of the invention as defined by the appended claims.

Claims (10)

1. A video target detection evasion system, characterized in that: the system comprises a multi-model-based model-adaptive gray-box training module, a threshold-based patch distance-adaptive module, a multi-term loss computation module and a digital-world patch-fitting module;
the multi-model-based model-adaptive gray-box training module generates and trains a surrogate model used to detect human target pictures onto which the evasion patch has been pasted and to extract the number and confidence of human-body detections; the multiple models include YOLO, SSD and Faster RCNN;
the threshold-based patch distance-adaptive module sets the user or system threshold, makes the threshold-distance decision, and updates the patch adaptively with distance;
the multi-term loss computation module computes the losses, including smoothness loss, pixel-change loss and non-printable-color loss, wherein the smoothness loss and pixel-change loss smooth the image to preserve its semantic information;
the digital-world patch-fitting module performs clothing-wrinkle simulation and physical-world color fitting, simulating various physical-world environmental changes and transforming the evasion patch accordingly to improve its robustness in the physical world.
2. A video target detection evasion method, characterized by comprising the following steps:
Step 1: a multi-model-based model adaptive gray-box training module detects a target picture with the evasion patch attached and extracts the number of human detections and their confidences, then distributes the results to the patch distance adaptive module;
Step 2: set a threshold and make the threshold distance decision; perform a distance-adaptive feature update of the relevant region of the target patch to be updated according to the result parameters distributed by the system; send the iteratively updated patch picture to the digital-world patch attaching module to generate a new target detection data set with the patch picture attached;
Step 3: perform clothing wrinkle simulation, physical-world color transformation, and training loss constraint; compute the loss indexes of the patch picture during training, then store them and distribute them to the patch distance adaptive module;
the clothing wrinkle simulation generates wrinkle-simulating distortion for the patch based on a built-in two-dimensional wrinkle data set and stores the distorted patch for later use;
the physical-world color transformation comprises establishing a mapping between digital-world colors and printable physical-world colors with a three-layer fully-connected BP network, and then fitting the printable physical-world colors to the colors of the patch picture;
the training loss constraint is based on the output of the surrogate model generated by the model adaptive gray-box training module and on multiple loss functions, including the smoothness loss, the pixel change loss, and the non-printable color loss; the loss indexes of the patch picture during training are computed, stored, and distributed to the patch adaptive update module, wherein the smoothness loss and the pixel change loss act as a semantic preservation mechanism that smooths the image;
the iterative process of the system is as follows: the evasion patch is pasted onto a human target picture by the digital-world patch attaching module; confidences and prediction boxes are then obtained through the multi-model-based model adaptive gray-box training module; the multiple loss values are obtained through the loss function calculation module; finally, the patch is updated by the patch distance adaptive module; the user may specify the number of iterations or use the system's built-in value, and the trained evasion patch is finally output.
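As an illustration (not part of the claim language), the iteration described in claim 2 can be sketched in Python. Everything below is a stand-in: `surrogate_detect` plays the role of the multi-model surrogate (YOLO/SSD/Faster RCNN) by returning a dummy "person" confidence, and a crude finite-difference step plays the role of the patch-distance-adaptive update.

```python
import numpy as np

def paste_patch(image, patch, top_left):
    """Digital-world paster: overwrite a region of the image with the patch."""
    h, w = patch.shape[:2]
    y, x = top_left
    out = image.copy()
    out[y:y + h, x:x + w] = patch
    return out

def surrogate_detect(image):
    """Stand-in for the multi-model surrogate: a dummy 'person' confidence
    that simply decreases as the pasted image gets brighter."""
    return float(np.clip(1.0 - image.mean(), 0.0, 1.0))

def train_patch(image, patch, top_left, iters=50, lr=0.05):
    """Toy training loop: paste -> detect -> estimate a gradient by finite
    differences -> update the patch (placeholder for the adaptive update)."""
    for _ in range(iters):
        base = surrogate_detect(paste_patch(image, patch, top_left))
        probe = np.clip(patch + 1e-3, 0.0, 1.0)
        delta = surrogate_detect(paste_patch(image, probe, top_left)) - base
        grad = np.zeros_like(patch) + delta / 1e-3   # one shared estimate
        patch = np.clip(patch - lr * grad, 0.0, 1.0)
    return patch
```

With a uniform zero image and an all-zero 8×8 patch, the dummy confidence drops over the iterations, mimicking the paste/detect/loss/update cycle of the claim.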
3. The video target detection evasion method according to claim 2, wherein the specific implementation of step 1 comprises the following sub-steps:
Step 1.1: before the system runs, the detection order and the detection rounds of the models are determined according to user-specified or preset parameters; a surrogate Model1 for training is generated based on the multiple models and stored for iteration during patch adaptive training;
the surrogate Model1 comprises a Yolov2 layer, a Faster RCNN layer, a Yolov2/SSD layer, a Faster RCNN layer, and a Faster RCNN layer connected in sequence;
Step 1.2: the system distributes the target pictures with the evasion patch attached, generated by the patch attaching module, to the adaptive training module, performs serial identification and detection with the surrogate model generated in step 1.1, and obtains the human detection count and the prediction confidence that the surrogate model outputs for the target pictures;
if there are m target pictures with the evasion patch attached in each batch, the prediction confidence is the weighted sum, over the pictures, of the confidences of the bounding boxes whose confidence exceeds a set threshold and whose predicted class is human; the human detection count is the number of correctly predicted bounding boxes of class human in each picture;
Step 1.3: the system collects and merges the results produced by the surrogate model, distributes them to the patch distance adaptive module, and updates the patch picture according to the result parameters.
4. The video target detection evasion method according to claim 2, wherein the specific implementation of step 2 comprises the following sub-steps:
Step 2.1: set a threshold;
the user decides either to set a distance threshold parameter autonomously or to adopt the built-in preset threshold; the parameter is submitted to the patch distance adaptive module through the system, and when the surrogate model results distributed by the system are received, the patch distance adaptive module runs its functions based on the previously submitted threshold parameter;
Step 2.2: make the threshold distance decision;
the patch distance adaptive module normalizes the length and width of the incoming patch picture to be updated and determines the patch picture update range based on the preset threshold;
Step 2.3: perform the patch distance adaptive update;
once the patch distance adaptive module has obtained the range of the patch region to be updated based on the distance threshold, it performs a distance-adaptive feature update of the relevant region of the target patch according to the result parameters distributed by the system.
5. The video target detection evasion method according to claim 4, wherein: in step 2.1, before the system runs, the user decides either to set a distance threshold autonomously or to adopt the built-in preset threshold, which determines the update range of the target patch picture; the system performs a validity check on the distance threshold parameter submitted by the user, distributes the parameter to the patch distance adaptive module after the check passes, and pre-stores the distance threshold; after receiving the surrogate model results distributed by the system, the patch distance adaptive module retrieves the pre-stored distance threshold and the target patch distributed by the system and executes the corresponding module functions.
6. The video target detection evasion method according to claim 4, wherein: in step 2.2, after the distance adaptive module receives the distributed target patch to be updated, it normalizes the length and width of the patch picture to facilitate the subsequent determination of the patch update range; if the size ratio of the recognition box of the patched picture to the patch is smaller than the threshold, the system decides that the patch is in a far scene and sets the patch update range to the full picture; if the size ratio exceeds the threshold, the system decides that the patch is in a near scene, sets the update region to the center of the picture, and determines the proportion of the patch update range; after making the decision based on the threshold, the system anchors the patch update region; the system passes the anchored patch update region information to the patch distance adaptive module, which then performs the distance-adaptive feature update of the patch.
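As an illustration of the far/near decision in this claim (the concrete ratio definition, the threshold value `4.0`, and the centered half-size near-scene region are assumptions, not values fixed by the claim):

```python
def decide_update_region(box_hw, patch_hw, ratio_threshold=4.0, near_fraction=0.5):
    """Far scene (area ratio below threshold): update the whole patch.
    Near scene (ratio above threshold): update a centered sub-region.
    Returns (top, left, height, width) of the update region."""
    ratio = (box_hw[0] * box_hw[1]) / float(patch_hw[0] * patch_hw[1])
    if ratio < ratio_threshold:
        return (0, 0, patch_hw[0], patch_hw[1])                  # far: full range
    h = int(patch_hw[0] * near_fraction)
    w = int(patch_hw[1] * near_fraction)
    return ((patch_hw[0] - h) // 2, (patch_hw[1] - w) // 2, h, w)  # near: center
```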
7. The video target detection evasion method according to claim 4, wherein: in step 2.3, after the patch distance adaptive module obtains the range of the patch region to be updated, it requests from the system the result parameters and the indexes of the loss function calculation module for the patch update; once the result parameters and the computed indexes have been pushed to it, the patch distance adaptive module updates the features of the corresponding patch region, including patterns, textures, and colors; the iteratively updated patch picture is then sent to the digital-world patch attaching module to generate a new target detection data set with the patch picture attached.
8. The video target detection evasion method according to claim 2, wherein the clothing wrinkle simulation function in step 3 is implemented by the following sub-steps:
Step 3.1.1: the clothing wrinkle simulation function loads the system's built-in data set of two-dimensional anchor point distortions under different human body states together with the color-transformed patch picture, providing the data needed for calling the TPS distortion function that warps the patch picture;
Step 3.1.2: the clothing wrinkle simulation function loads the designed TPS distortion function and applies a two-dimensional image distortion to the target patch picture, simulating the appearance of the patch when worn on the human body;
Step 3.1.3: the clothing wrinkle simulation function stores the distorted patch and distributes it to the paster in the digital-world patch attaching module, which attaches the patch picture to the digital-world human body to form the data set of the model adaptive gray-box training module.
9. The video target detection evasion method according to claim 2, wherein the physical-world color transformation function in step 3 is implemented by the following sub-steps:
Step 3.2.1: before the system runs, the color transformation function in the digital-world patch attaching module feeds the digital-world colors and the built-in physical-world colors into a three-layer fully-connected BP network to generate a color fit between the physical world and the digital world;
Step 3.2.2: the color transformation function reads the patch picture whose colors are to be transformed, transforms its colors through the generated color fit, and pushes the result to the clothing wrinkle simulation function in the digital-world patch attaching module.
10. The video target detection evasion method according to any one of claims 2 to 9, wherein the training loss constraint in step 3 is implemented by the following sub-steps:
Step 3.3.1: the system passes the result parameters of the model adaptive gray-box training module to the loss function calculation module;
Step 3.3.2: the loss function calculation module loads the incoming result parameters and calls the multiple loss functions to compute the loss indexes used in the patch adaptive training;
Step 3.3.3: the loss function calculation module processes the computed loss indexes and passes them to the patch distance adaptive module for updating the patch features.
CN202110909116.7A 2021-08-09 2021-08-09 Video target detection avoidance system and method Active CN113743231B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110909116.7A CN113743231B (en) 2021-08-09 2021-08-09 Video target detection avoidance system and method


Publications (2)

Publication Number Publication Date
CN113743231A true CN113743231A (en) 2021-12-03
CN113743231B CN113743231B (en) 2024-02-20

Family

ID=78730401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110909116.7A Active CN113743231B (en) 2021-08-09 2021-08-09 Video target detection avoidance system and method

Country Status (1)

Country Link
CN (1) CN113743231B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883520A (en) * 2023-09-05 2023-10-13 武汉大学 Color quantization-based multi-detector physical domain anti-patch generation method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200160113A1 (en) * 2018-11-19 2020-05-21 Google Llc Training image-to-image translation neural networks
CN111340008A (en) * 2020-05-15 2020-06-26 支付宝(杭州)信息技术有限公司 Method and system for generation of counterpatch, training of detection model and defense of counterpatch
CN111739016A (en) * 2020-07-20 2020-10-02 平安国际智慧城市科技股份有限公司 Target detection model training method and device, electronic equipment and storage medium
US20210004946A1 (en) * 2019-07-02 2021-01-07 MakinaRocks Co., Ltd. Systems and methods for detecting flaws on panels using images of the panels
CN112241790A (en) * 2020-12-16 2021-01-19 北京智源人工智能研究院 Small countermeasure patch generation method and device
CN112597993A (en) * 2020-11-24 2021-04-02 中国空间技术研究院 Confrontation defense model training method based on patch detection
CN113111731A (en) * 2021-03-24 2021-07-13 浙江工业大学 Deep neural network black box countermeasure sample generation method and system based on channel measurement information


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Yong; CHEN Yao: "Research on a model of vehicles ahead on the road based on vision sensors", Transducer and Microsystem Technologies, vol. 33, no. 9, 31 December 2014 (2014-12-31) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883520A (en) * 2023-09-05 2023-10-13 武汉大学 Color quantization-based multi-detector physical domain anti-patch generation method
CN116883520B (en) * 2023-09-05 2023-11-28 武汉大学 Color quantization-based multi-detector physical domain anti-patch generation method

Also Published As

Publication number Publication date
CN113743231B (en) 2024-02-20

Similar Documents

Publication Publication Date Title
Ren et al. Adversarial examples: attacks and defenses in the physical world
CN111723654B (en) High-altitude parabolic detection method and device based on background modeling, YOLOv3 and self-optimization
CN111274916B (en) Face recognition method and face recognition device
CN113361604A (en) Target detection-oriented physical attack counterattack patch generation method and system
CN110555390A (en) pedestrian re-identification method, device and medium based on semi-supervised training mode
CN109598268A (en) A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN113420731B (en) Model training method, electronic device and computer-readable storage medium
CN111611874A (en) Face mask wearing detection method based on ResNet and Canny
CN116363748A (en) Power grid field operation integrated management and control method based on infrared-visible light image fusion
CN112597993A (en) Confrontation defense model training method based on patch detection
CN106951826A (en) Method for detecting human face and device
CN110287798A (en) Vector network pedestrian detection method based on characteristic module and context fusion
CN115984439A (en) Three-dimensional countertexture generation method and device for disguised target
CN109637123A (en) A kind of complexity traffic environment downlink people living things feature recognition and traffic control system
CN115640609A (en) Feature privacy protection method and device
CN113743231B (en) Video target detection avoidance system and method
CN116091462A (en) Physical countermeasure attack method for unmanned aerial vehicle target detection system
CN112435257A (en) Smoke detection method and system based on multispectral imaging
CN115481716A (en) Physical world counter attack method based on deep network foreground activation feature transfer
CN115937409A (en) Anti-visual intelligent anti-attack texture generation method
CN117854039A (en) Challenge sample generation method and end-to-end challenge training system for vehicle target detection model
JP2021093144A (en) Sensor-specific image recognition device and method
CN110084142B (en) Age privacy protection method and system for face recognition
CN114882328B (en) Target detection method combining visible light image and infrared image
CN115205786A (en) On-line automatic identification and alarm method for mobile phone pirate behavior

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant