CN115511910A

CN115511910A - Anti-attack method, system, medium, equipment and terminal for video tracking

Info

Publication number: CN115511910A
Application number: CN202211010630.8A
Authority: CN
Inventors: 李福生; 鲁欣
Original assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Current assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Priority date: 2022-08-22
Filing date: 2022-08-22
Publication date: 2022-12-23
Anticipated expiration: 2042-08-22
Also published as: CN115511910B

Abstract

The invention belongs to the technical field of computer vision, and discloses an anti-attack method, a system, a medium, equipment and a terminal for video tracking, wherein a space-time transformation attack algorithm is constructed, and the number of iteration rounds, the disturbance magnitude and the balance coefficient of a joint loss function of the space-time anti-attack algorithm are determined; inputting the tracking result of the previous frame into a space-time countermeasure attack algorithm, performing N-round gradient descent iteration every time the algorithm is operated, generating disturbance, and adding the disturbance into the current frame; inputting the attacked video frame serving as a tracking image into a tracking algorithm, acquiring a tracking result of the current frame, and then iteratively operating the process until all video frames of all video sequences of the test data set are traversed; and recording and storing the tracking result of each frame, and quantitatively analyzing the tracking accuracy and the tracking success rate. The attack resisting method provided by the invention greatly reduces the disturbance intensity of the resisting sample, and greatly reduces the possibility of being perceived while achieving an obvious attack effect.

Description

Anti-attack method, system, medium, equipment and terminal for video tracking

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a video tracking-oriented attack resisting method, a video tracking-oriented attack resisting system, a video tracking-oriented attack resisting medium, a video tracking-oriented attack resisting device and a video tracking-oriented attack resisting terminal.

Background

As an important component of computer vision, visual target tracking has been widely applied to scenes such as unmanned driving and security. With the rapid development of the deep learning technology, the target tracking technology based on the deep learning makes a significant breakthrough. However, in recent years, researchers in the fields of target tracking, image segmentation, and natural language processing have reported counterattack, and the security of deep learning is worried about. In the field of image recognition, the presence of challenge samples has attracted the researchers' attention, but in the field of single-target tracking, the study of challenge samples is still relatively rare due to the complexity of this problem.

Early gradient iteration-based anti-attack algorithms such as FGSM, PAD, BIM misled the deep learning model by optimizing the loss function, but such algorithms required knowledge of the full extent of the tracking algorithm and the attack effect was poor. The method is characterized in that a single-pixel anti-attack algorithm is provided by Jianwei Su et al, a counter sample which only modifies one pixel is generated through the algorithm, namely, a deep learning model misleads to wrong classification with high confidence, but the method cannot adapt to multi-frame tasks of videos. In the same year, xugang Wu et al propose STA algorithms, which analyze the common vulnerability of twin network-based trackers and prove the ubiquitous nature of generating countermeasure samples in the field of target tracking; yan et al propose a cooling shrinkage counterattack method, which cools the thermodynamic diagram of the location of the target and shrinks the predicted bounding box to make the tracked object unable to be tracked.

Through the above analysis, the problems and defects of the prior art are as follows: in the field of single target tracking, because the task of target tracking includes not only classification of foreground and background but also regression of frame, research on countercheck samples is still relatively few due to complexity of countercheck problem of target tracking, on the other hand, research on countercheck mainly stays on a single image at present, and research on video countercheck is still to be explored.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a video tracking-oriented attack resisting method, a system, a medium, equipment and a terminal.

The invention is realized in such a way that a video tracking-oriented anti-attack method comprises the following steps:

constructing a space-time transformation attack algorithm, determining the iterative round number, the range of the added disturbance size, and the balance coefficient and the iterative round number of a combined loss function of the space-time anti-attack algorithm; inputting the tracking result of the previous frame into a space-time countermeasure attack algorithm, performing N-round gradient descent iteration every time the algorithm is operated, generating disturbance, and adding the disturbance into the current frame; inputting the attacked video frame serving as a tracking image into a tracking algorithm, acquiring a tracking result of a current frame, adding disturbance generated by the current frame to a next frame serving as an initial frame of iteration, and then iterating to run a process to traverse all video frames of all video sequences of a test data set; and recording and storing the tracking result of each frame, and quantitatively analyzing the tracking accuracy and the tracking success rate.

Further, the video tracking-oriented attack resisting method comprises the following steps:

acquiring tracking results of a target tracking video and a previous frame of video image;

step two, inputting the current frame into a tracker to obtain a suggested candidate frame, calculating an intersection comparison result of the suggested candidate frame of the current frame and a tracking result of a previous frame, and determining a real classification confidence label;

step three, calculating the real regression offset between the tracking suggestion frame of the current frame and the tracking result of the previous frame;

step four, obtaining a tracker loss function for the current frame, wherein the tracker loss function comprises a binary classification loss function and a frame regression loss function; designing a classification deception loss function according to the binary classification loss function, and designing a regression deception loss function according to the frame regression loss function;

designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;

step six, solving a partial derivative related to the input frame for the comprehensive loss function, and calculating a gradient;

and step seven, transferring the gradient to a sign function as disturbance generated by the current iteration. And adding the disturbance generated by the mth round to the countermeasure input frame of the mth round of iteration to obtain the countermeasure input frame of the (M + 1) th iteration, and after the iteration is performed for M times, obtaining the countermeasure video image of the current frame with the disturbance finally added.

And step eight, selecting disturbance finally generated by the t-1 th frame of the video image frame to initialize the video image of the first iteration of the t-th frame for the t-th frame video image.

Further, in the first step, a target tracking video is obtained, a previous frame of video image is input into the tracker, and a previous frame of tracking result (x) is obtained _c ,y _c ,w _r ,h _r ) Wherein x is _c ,x _y Respectively the horizontal and vertical coordinates, w, of the center point of the tracking result _r ,h _r Width and height of the tracking result, respectively.

In the second step, the current frame I is input into the tracker to obtain N suggested candidate frames, and the N suggested candidate frames of the current frame and the tracking result (x) of the previous frame are calculated _c ,y _c ,w _r ,h _r ) Cross-over and cross-over ratio result IOU of ₁ Then true classification confidence label p ^c Comprises the following steps:

in the third step, for the nth tracking suggestion frame of the current frame I

Wherein 0<n≤N，

Respectively the horizontal and vertical coordinates of the central point of the tracking result,

width and height of the tracking result respectively; and the last frame tracking result (x) _c ,y _c ,w _r ,h _r ) Has a true regression offset of

Calculating N tracking suggestion frames of the current frame and the tracking result (x) of the previous frame _c ,y _c ,w _r ,h _r ) True regression offset p ^r 。

Further, in the fourth step, for the current frame I, the obtained tracker loss function is L (I, N, θ);

wherein N represents the total number of proposed candidate frames in the input frame I; l is a radical of an alcohol _c Representing a binary classification loss function, and calculating by adopting a cross entropy loss function; l is _r Representing a frame regression loss function, and calculating by adopting a smoothL1 loss function;

a prediction classification confidence score representing the nth suggested candidate box in the current frame I;

a predicted regression offset for the nth proposed candidate in the input frame I;

representing a true classification confidence score for the nth suggested candidate box in the current frame I;

representing the true regression offset for the nth proposed candidate box in the current frame I; α is a fixed weight parameter; theta represents the network parameter employed by the tracker.

Designing a classification spoofing loss function L _{cheat_class} ；

Wherein,

for the nth candidate suggestion box I _n A corresponding misclassification tag. For

Is generated due to

There are only two possible results 1 or 0 indicating the nth candidate frame I of the current frame I _n Belong to the object or the background, so pass through

Binary negation operations to generate error classification labels

Designing a regression deception loss function L _{cheat_regression} ；

Wherein for

Is generated by

Adding random distance offsets delta respectively _offset And random proportional variation delta _scale Generating error regression labels

0.3<δ _offset <0.5，0.7<δ _scale <0.9；

Further, the perceptual loss function L in the fifth step _quality Comprises the following steps:

wherein, I _m ^adv And T represents the input image frame after disturbance is added in the mth round, and the number of pixels of the input image frame is represented.

The comprehensive classification spoofing loss function L _{cheat_class} Regression deception loss function L _{cheat_regression} And a perception loss function to obtain a comprehensive loss function L _adv (K, N, θ) is:

where λ represents a hyperparameter for balancing the ratio between the plurality of loss functions.

In the sixth step, for the synthetic loss function L _adv Calculating a partial derivative of the synthetic loss function about the input frame I, and calculating a gradient and recording the gradient as r;

further, in the seventh step, the gradient is transmitted into the sign function, and for the m +1 th iteration, the input frame is resisted

Wherein,

epsilon is the maximum perturbation, epsilon =0.15, M is the maximum number of iterations, M =10; m represents the turn of the current iteration,

a countermeasure input frame representing the mth iteration, sign () being a sign function, r _m The gradient obtained for the mth iteration.

Obtaining a confrontation video image of the final added disturbance of the current frame after iteration for M times

In the eighth step, aiming at the characteristic that the image frames in the video have space-time continuity, the video image I of the t-th frame is subjected to ^t Selecting the previous frame of the video image frameCountering video images

The obtained video image of the first iteration of the t frame of disturbance initialization

Wherein,

the video image of the first iteration for the t-1 th frame.

Another object of the present invention is to provide a video tracking-oriented counterattack system applying the video tracking-oriented counterattack method, the video tracking-oriented counterattack system including:

the tracking result acquisition module is used for acquiring tracking results of the target tracking video and the previous frame video image;

the real classification confidence label determining module is used for inputting the current frame into the tracker to obtain a suggested candidate frame, calculating the intersection comparison result of the suggested candidate frame of the current frame and the tracking result of the previous frame, and determining a real classification confidence label;

the real regression offset calculation module is used for calculating the real regression offset of the tracking suggestion frame of the current frame and the tracking result of the previous frame;

the loss function involves the module, is used for confirming the tracker loss function of the current frame, and design and classify and deceive the loss function and design and regress and deceive the loss function;

the comprehensive loss function determining module is used for designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;

the gradient calculation module is used for solving the partial derivative of the synthetic loss function about the input frame and calculating the gradient;

the countermeasure video image determination module is used for transmitting the gradient into a sign function to reduce the influence of an abnormal value, adding disturbance generated by the mth iteration into the input countermeasure frame of the mth iteration and obtaining an input countermeasure frame of the m +1 th iteration; after M times of iteration, obtaining a confrontation video image of the current frame with disturbance added finally;

and the video image initialization module is used for selecting the t-1 frame of the video image frame to resist the disturbance obtained by the video image and initializing the video image of the first iteration of the t frame for the t frame video image.

It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the video tracking oriented counter attack method.

It is a further object of the present invention to provide a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, causes the processor to carry out the steps of the video tracking-oriented anti-attack method.

Another object of the present invention is to provide an information data processing terminal, which is used for implementing the video tracking-oriented anti-attack system.

In combination with the technical solutions and the technical problems to be solved, please analyze the advantages and positive effects of the technical solutions to be protected in the present invention from the following aspects:

first, aiming at the technical problems and difficulties in solving the problems in the prior art, the technical problems to be solved by the technical scheme of the present invention are closely combined with results, data and the like in the research and development process, and some creative technical effects are brought after the problems are solved. The specific description is as follows:

with the rapid development of computer vision, visual target tracking has been widely applied to scenes such as unmanned driving, security and the like. However, researches show that the tracking algorithm based on the neural network has potential safety hazards due to artificially created confrontation samples, so that the precision of the tracking algorithm is obviously reduced. Aiming at a target tracking task based on deep learning, the invention provides a novel lightweight and efficient space-time transformation attack algorithm based on gradient iteration. The algorithm generates confrontation samples frame by frame, emphasizes the characteristic of space-time motion consistency in the target tracking field, and adds dense confrontation samples in a space-time domain to mislead the most advanced tracker. In addition, the algorithm adopts a joint loss function which consists of classification loss, regression loss and perception loss functions; the classification loss function enables the tracker to have errors when classifying the target and the background, the regression loss function enables the tracking frame to drift, and the perception loss function maintains the image quality to reduce distortion, so that the provided algorithm can realize high-efficiency attack on the tracking algorithm while the generated disturbance is not easy to be perceived by human eyes. For the countermeasure attack of video target tracking, the existing depth tracker is not modified, disturbance is learned and injected into an input frame, an indistinguishable countermeasure sample is generated, and the disturbance learned in the current frame is utilized to initialize the disturbance learning of the next frame in consideration of the space-time consistency between video frames, so that the performance of the depth tracker is further reduced.

In the invention, the iteration round number, the disturbance range and the balance coefficient of the combined loss function of the space-time countermeasure attack algorithm are determined firstly. And inputting the tracking result of the previous frame into a space-time countermeasure attack algorithm, performing N-round gradient descent iteration every time the algorithm is operated, generating disturbance, and adding the disturbance into the current frame. And inputting the attacked video frame serving as a tracking image into a tracking algorithm, acquiring a tracking result of the current frame, adding the disturbance generated by the current frame to the next frame serving as an initial frame of iteration, and then iteratively operating the process until all video frames of all video sequences of the test data set are traversed. And recording and storing the tracking result of each frame, and quantitatively analyzing the tracking accuracy and the tracking success rate.

Secondly, considering the technical scheme as a whole or from the perspective of products, the technical effect and advantages of the technical scheme to be protected by the invention are specifically described as follows:

aiming at the safety problem of a tracking algorithm based on deep learning, the invention provides a novel light-weight, efficient and intensive-attack space-time transformation attack algorithm aiming at target tracking by focusing on a target tracking network based on deep learning. On one hand, a joint loss function is designed, wherein the classification deception loss function misleading tracking algorithm classifies the target foreground and the background, the frame of the regression deception loss function misleading tracking algorithm regresses, and the perception loss function ensures that the anti-attack algorithm can achieve obvious attack effect and greatly reduces the possibility of being perceived. On the other hand, the attack algorithm provided by the invention has universality by learning a disturbance generation countersample to finish the aggression to the tracking algorithm under the condition that the parameters of the tracking algorithm are unknown. Finally, the attack algorithm provided by the invention considers the space-time consistency among video frames, and utilizes the disturbance learned in the current frame to initialize the disturbance learning of the next frame, thereby further reducing the performance of the depth tracker.

Third, as an inventive supplementary proof of the claims of the present invention, there are also presented several important aspects:

the technical scheme of the invention fills the technical blank in the industry at home and abroad:

in the field of single target tracking, because the task of target tracking includes not only classification of foreground and background but also regression of frame, research on confrontation samples is still less due to complexity of the confrontation attack problem of target tracking. According to the method, a loss function is respectively designed to mislead classification and regression tasks of a tracking algorithm according to output responses, namely a real classification confidence coefficient and a real regression offset, based on a deep learning tracker. In addition, a perception loss function is designed, so that the disturbance added by the proposed algorithm is not easily perceived by human eyes. On the other hand, the current researches on antagonistic attack and defense mainly stay on a single image, and the researches on the video antagonistic attack are still to be explored. The technical scheme of the invention utilizes the space-time consistency between video frames to initialize the disturbance learning of the next frame by the disturbance learned in the current frame, thereby further reducing the performance of the depth tracker.

The technical scheme of the invention solves the technical problem that people are eagerly to solve but can not be successfully solved all the time:

the technical scheme of the invention solves the problem that the attack on the video target tracking can be realized under the condition that the model parameters of the tracking algorithm are unknown. Secondly, the technical scheme of the invention does not modify the existing depth tracker, but learns the disturbances and injects the disturbances into the input frame to generate the indistinguishable countersamples, thereby greatly reducing the difficulty of technical realization. Finally, relatively few studies have been conducted to generate countersamples on video sequences to achieve attack depth tracking, where inter-frame motion consistency involves more challenges, and the technical solution of the present invention utilizes the temporal and spatial consistency between video frames to achieve video-based counterattacks.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic diagram of an attack-fighting method for video tracking according to an embodiment of the present invention;

fig. 2 is a success rate diagram and an accuracy diagram of the DaSiamRPN tracking algorithm before and after being attacked, which are provided by the embodiment of the present invention;

fig. 3 is a schematic diagram of a visualization result of the DaSiamRPN tracking algorithm before and after being attacked according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

In view of the problems in the prior art, the present invention provides a method, a system, a medium, a device and a terminal for resisting attack facing video tracking, and the present invention is described in detail below with reference to the accompanying drawings.

The video tracking-oriented attack resisting method provided by the embodiment of the invention comprises the following steps:

s101, acquiring tracking results of a target tracking video and a previous frame video image;

s102, inputting the current frame into a tracker to obtain a suggested candidate frame, calculating an intersection comparison result of the suggested candidate frame of the current frame and a tracking result of a previous frame, and determining a real classification confidence label;

s103, calculating the real regression offset between the tracking suggestion frame of the current frame and the tracking result of the previous frame;

s104, determining a tracker loss function of the current frame, and designing a classification deception loss function and a regression deception loss function;

s105, designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;

s106, solving a partial derivative of the comprehensive loss function about the input frame, and calculating a gradient;

s107, correcting the gradient, and inputting the frame after the disturbance is added and obtained by the (m + 1) th iteration; after M times of iteration, obtaining a confrontation video image of the current frame with disturbance added finally;

s108, for the t frame video image, selecting a t-1 frame of the video image frame to resist disturbance obtained by the video image and initializing a video image of the first iteration of the t frame.

As a preferred embodiment, as shown in fig. 1, the video tracking-oriented attack-fighting method provided by the embodiment of the present invention specifically includes the following steps:

step 1: acquiring a target tracking video, inputting a previous frame video image into a tracker, and acquiring a previous frame tracking result (x) _c ,y _c ,w _r ,h _r ) Wherein x is _c ,x _y Respectively the horizontal and vertical coordinates, w, of the center point of the tracking result _r ,h _r Width and height of the tracking result, respectively.

Step 2: inputting the current frame I into a tracker to obtain N suggested candidate frames, and calculating the N suggested candidate frames of the current frame and the tracking result (x) of the previous frame _c ,y _c ,w _r ,h _r ) Cross-over and cross-over ratio result IOU of ₁ Then true classification confidence label p ^c Comprises the following steps:

and step 3: for the nth (0) of the current frame I<N is less than or equal to N) tracking suggestion frame

Wherein

width and height of the tracking result, respectively. It tracks the result (x) with the last frame _c ,y _c ,w _r ,h _r ) Is the true regression offset of

Namely:

And 4, step 4: for the current frame I, the obtained tracker loss function is L (I, N, theta), namely;

where N denotes the total number of proposed candidate frames in the input frame I, L _c Representing a binary classification loss function, and calculating by adopting a cross entropy loss function; l is _r Representing a frame regression loss function, adopting smoothL1 loss function to calculate,

a prediction classification confidence score representing the nth suggested candidate box in the current frame I,

the predicted regression offset for the nth proposed candidate in the input frame I,

represents the true classification confidence score for the nth proposed candidate box in the current frame I,

represents the true regression offset for the nth proposed candidate box in the current frame I, α is a fixed weight parameter, and θ represents the network parameter employed by the tracker.

And 5: in order to cheat the classification branch of the tracker, a classification cheating loss function L is designed _{cheat_class} Namely:

wherein,

for the nth candidate suggestion box I _n A corresponding misclassification tag. For the

Due to generation of

There are only two possible results (1 or 0) to indicate the nth candidate frame I of the current frame I _n Belongs to the target or the background, so pass through

Binary negation operations to generate error classification labels

Step 6: in order to cheat the classification branch of the tracker, a regression cheating loss function L is designed _{cheat_regression} Namely:

wherein, for

Is generated by

Adding random distance offsets delta respectively _offset (0.3<δ _offset <0.5 ) and random proportional change delta _scale (0.7<δ _scale <0.9 ) generate false regression tags

And 7: in order to restrict the disturbance intensity added to a single video image and make the video image after disturbance addition closer to the original video image so as to reduce the possibility of being perceived, the invention designs a perception loss function L _quality Namely:

wherein, I _m ^adv Representing the input image frame after the mth round of adding the disturbance, I ₁ And T represents the number of pixels of the input image frame for the 1 st iteration of the input image frame.

And 8: comprehensive classification spoofing loss function L _{cheat_class} Regression deception loss function L _{cheat_regression} And a perception loss function to obtain a comprehensive loss function L _adv (K, N, θ), i.e.:

And step 9: for the synthetic loss function L _adv To bias it with respect to the input frame IDerivative, calculate gradient as r, i.e.:

step 10: introducing the gradient into a sign function, and for the perturbed input frame obtained by the (m + 1) th iteration

Namely:

wherein,

ε is the perturbation maximum, ε =0.15, M is the maximum number of iterations, M =10; m represents the turn of the current iteration,

representing the perturbed input frame obtained by the mth iteration, sign () is a sign function, r _m The gradient obtained for the mth iteration.

Step 11: obtaining a confrontation video image of the final added disturbance of the current frame after iteration for M times

Step 12: aiming at the characteristic that image frames in the video have space-time continuity, for the t frame video image I ^t Selecting a previous frame (t-1 th frame) of the video image frame as a countermeasure video image

Initializing the video image of the first iteration of the t frame by the obtained disturbance

Namely:

wherein,

the video image of the first iteration for the t-1 th frame.

The video tracking-oriented anti-attack system provided by the embodiment of the invention comprises:

the loss function relating module is used for determining the tracker loss function of the current frame, designing a classification deception loss function and designing a regression deception loss function;

the gradient calculation module is used for solving the partial derivative of the comprehensive loss function about the input frame and calculating the gradient;

the confrontation video image determining module is used for correcting the gradient and inputting the disturbed input frame obtained by the (m + 1) th iteration; after M times of iteration, obtaining a confrontation video image of the current frame with disturbance added finally;

In the tracking framework, a depth tracker usually adopts a convolutional neural network architecture comprising two branches, wherein the first branch is responsible for a classification task and identifies whether all the proposed candidate frames belong to a foreground frame or a background frame; the second branch is responsible for regression tasks and accurately positions the position of the target. The adversarial attack algorithm provided by the invention is deployed into the advanced DaSiamRPN tracking algorithm. Given an input video sequence and an initial frame of labeled bounding boxes, in processing each video frame, the countering perturbations are generated in combination with the perceptual loss function according to the DaSiamRPN tracking algorithm's output response, i.e., the true classification confidence and the true regression offset. These perturbations are added to the input frame to generate an antagonistic sample. And then inputting the generated confrontation sample into a DaSiamRPN tracking algorithm to obtain a precision graph and a success rate graph of the result.

Based on the above thought, the embodiment of the present invention provides an attack resisting method for a DaSiamRPN tracking network, which specifically includes the following steps:

step 1: and inputting the video data set into a DaSiamRPN tracker to obtain an output response, namely acquiring a tracking result given by the tracker aiming at the previous frame. Calculating a real classification confidence label p according to the steps 1 to 3 _c And the true regression offset p _r ；

Step 2: setting the size of disturbance generated by an anti-attack algorithm to be limited between (0) and (255) in each pixel point, and setting the balance coefficient lambda of a joint loss function to be 5;

and step 3: inputting the tracking result of the previous frame into an anti-attack algorithm, and obtaining an anti-video image of the current frame with disturbance added finally according to the steps 4-11;

and 4, step 4: inputting the attacked video frame serving as a tracking image into a DaSiamRPN algorithm to obtain a tracking result of the current frame;

and 5: the video image of the first iteration of the next frame is initialized according to step 12 and then the process is iteratively run through all video frames of the complete video sequence of the test data set.

As shown in fig. 2, the legend "DaSiamRPN" represents the DaSiamRPN tracking performance without using the anti-attack algorithm, and the legend "DaSiamRPN _ attack" represents the DaSiamRPN tracking performance using the anti-attack algorithm, and the tracking accuracy and the tracking success rate of the DaSiamRPN algorithm decreased by 96.57% and 97.37%, respectively, after using the anti-attack algorithm.

As shown in fig. 3, three frames of pictures of the Fish sequence in the OTB2015 dataset are selected, the tracking result is visualized, the first line is three frames of video images and the real position of the target, and the rectangular frame labeling is realized by black. The second line of white solid line rectangular box represents the tracking frame given by DaSiamRPN algorithm before using the anti-attack algorithm, so that the tracking is more accurate and the effect is more ideal; the white dotted rectangle represents the tracking frame after the anti-attack algorithm is used, and it can be obviously seen that the tracking frame drifts obviously, and serious errors occur in the central position and the size of the tracking frame.

It should be noted that embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A video tracking-oriented anti-attack method is characterized by comprising the following steps:

2. The video tracking-oriented countering attack method according to claim 1, characterized in that it comprises the following steps:

acquiring tracking results of a target tracking video and a previous frame video image;

inputting the current frame into a tracker to obtain a suggested candidate frame, calculating the intersection and comparison result of the suggested candidate frame of the current frame and the tracking result of the previous frame, and determining a real classification confidence label;

step three, calculating the real regression offset of the tracking suggestion frame of the current frame and the tracking result of the previous frame;

and step seven, transferring the gradient to a sign function as the disturbance generated by the current iteration. And adding the disturbance generated by the mth round to the confrontation input frame of the mth round of iteration to obtain the confrontation input frame of the (M + 1) th iteration, and after iterating for M times, obtaining the confrontation video image of the current frame with the disturbance finally added.

3. The video tracking-oriented attack-fighting method according to claim 2, wherein in the first step, the target tracking video is obtained, the previous frame video image is input into the tracker, and the previous frame tracking result (x) is obtained _c ,y _c ,w _r ,h _r ) Wherein x is _c ,x _y Respectively the horizontal and vertical coordinates, w, of the center point of the tracking result _r ,h _r Width and height of the tracking result respectively;

in the third step, the nth tracking suggestion frame for the current frame I

Wherein 0<n≤N，

Respectively a horizontal coordinate and a vertical coordinate of the central point of the tracking result,

4. The video tracking-oriented attack-resisting method according to claim 2, wherein in the fourth step, for the current frame I, the obtained tracker loss function is L (I, N, θ);

wherein N represents the total number of proposed candidate frames in the input frame I; l is _c Representing a binary classification loss function, and calculating by adopting a cross entropy loss function; l is _r Representing a frame regression loss function, and calculating by adopting a smoothL1 loss function;

a predictive regression offset for the nth proposed candidate box in the input frame I;

representing the true regression offset for the nth proposed candidate box in the current frame I; α is a fixed weight parameter; θ represents a network parameter employed by the tracker;

designing a classification spoofing loss function L _{cheat_class} ；

Wherein,

for the nth candidate suggestion box I _n A corresponding misclassification tag; for the

Due to generation of

There are only two possible results 1 or 0 for indicating the nth candidate box I of the current frame I _n Belongs to the target or the background, so pass through

Binary negation operation generation error classification label

Designing a regression deception loss function L _{cheat_regression} ；

Wherein, for

Is generated by

0.3<δ _offset <0.5，0.7<δ _scale <0.9；

5. The video tracking-oriented attack-countering method according to claim 2, characterized in that the perceptual loss function L in the fifth step _quality Comprises the following steps:

wherein, I _m ^adv Representing the input image frame after disturbance is added in the mth round, wherein T represents the number of pixels of the input image frame;

wherein λ represents a hyper-parameter for balancing the ratio between the plurality of loss functions;

in the sixth step, for the synthetic loss function L _adv Solving a partial derivative of the comprehensive loss function about the input frame I, and calculating a gradient and recording the gradient as r;

6. the video tracking-oriented method of countering attacks as recited in claim 2,in the seventh step, the gradient is corrected, and the disturbed input frame obtained by the (m + 1) th iteration is subjected to

Wherein,

representing the perturbed input frame obtained by the mth iteration, sign () is a sign function, r _m The gradient obtained for the mth iteration;

In the eighth step, aiming at the characteristic that the image frames in the video have space-time continuity, for the t frame video image I ^t Selecting a preceding one of the video image frames as a confrontation video image

Wherein，

The video image of the first iteration for the t-1 th frame.

7. A video tracking-oriented counter attack system applying the video tracking-oriented counter attack method according to any one of claims 1 to 6, the video tracking-oriented counter attack system comprising:

8. A computer arrangement, characterized in that the computer arrangement comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the video tracking oriented counter-attack method according to any one of claims 1 to 6.

9. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the video tracking oriented counter attack method according to any one of claims 1 to 6.

10. An information data processing terminal characterized by being used for implementing the video tracking-oriented anti-attack system according to claim 7.