CN115511910A - Anti-attack method, system, medium, equipment and terminal for video tracking - Google Patents

Anti-attack method, system, medium, equipment and terminal for video tracking Download PDF

Info

Publication number
CN115511910A
CN115511910A CN202211010630.8A CN202211010630A CN115511910A CN 115511910 A CN115511910 A CN 115511910A CN 202211010630 A CN202211010630 A CN 202211010630A CN 115511910 A CN115511910 A CN 115511910A
Authority
CN
China
Prior art keywords
frame
tracking
loss function
video
regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211010630.8A
Other languages
Chinese (zh)
Other versions
CN115511910B (en
Inventor
李福生
鲁欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou filed Critical Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202211010630.8A priority Critical patent/CN115511910B/en
Publication of CN115511910A publication Critical patent/CN115511910A/en
Application granted granted Critical
Publication of CN115511910B publication Critical patent/CN115511910B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computer vision, and discloses an anti-attack method, a system, a medium, equipment and a terminal for video tracking, wherein a space-time transformation attack algorithm is constructed, and the number of iteration rounds, the disturbance magnitude and the balance coefficient of a joint loss function of the space-time anti-attack algorithm are determined; inputting the tracking result of the previous frame into a space-time countermeasure attack algorithm, performing N-round gradient descent iteration every time the algorithm is operated, generating disturbance, and adding the disturbance into the current frame; inputting the attacked video frame serving as a tracking image into a tracking algorithm, acquiring a tracking result of the current frame, and then iteratively operating the process until all video frames of all video sequences of the test data set are traversed; and recording and storing the tracking result of each frame, and quantitatively analyzing the tracking accuracy and the tracking success rate. The attack resisting method provided by the invention greatly reduces the disturbance intensity of the resisting sample, and greatly reduces the possibility of being perceived while achieving an obvious attack effect.

Description

Anti-attack method, system, medium, equipment and terminal for video tracking
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a video tracking-oriented attack resisting method, a video tracking-oriented attack resisting system, a video tracking-oriented attack resisting medium, a video tracking-oriented attack resisting device and a video tracking-oriented attack resisting terminal.
Background
As an important component of computer vision, visual target tracking has been widely applied to scenes such as unmanned driving and security. With the rapid development of the deep learning technology, the target tracking technology based on the deep learning makes a significant breakthrough. However, in recent years, researchers in the fields of target tracking, image segmentation, and natural language processing have reported counterattack, and the security of deep learning is worried about. In the field of image recognition, the presence of challenge samples has attracted the researchers' attention, but in the field of single-target tracking, the study of challenge samples is still relatively rare due to the complexity of this problem.
Early gradient iteration-based anti-attack algorithms such as FGSM, PAD, BIM misled the deep learning model by optimizing the loss function, but such algorithms required knowledge of the full extent of the tracking algorithm and the attack effect was poor. The method is characterized in that a single-pixel anti-attack algorithm is provided by Jianwei Su et al, a counter sample which only modifies one pixel is generated through the algorithm, namely, a deep learning model misleads to wrong classification with high confidence, but the method cannot adapt to multi-frame tasks of videos. In the same year, xugang Wu et al propose STA algorithms, which analyze the common vulnerability of twin network-based trackers and prove the ubiquitous nature of generating countermeasure samples in the field of target tracking; yan et al propose a cooling shrinkage counterattack method, which cools the thermodynamic diagram of the location of the target and shrinks the predicted bounding box to make the tracked object unable to be tracked.
Through the above analysis, the problems and defects of the prior art are as follows: in the field of single target tracking, because the task of target tracking includes not only classification of foreground and background but also regression of frame, research on countercheck samples is still relatively few due to complexity of countercheck problem of target tracking, on the other hand, research on countercheck mainly stays on a single image at present, and research on video countercheck is still to be explored.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a video tracking-oriented attack resisting method, a system, a medium, equipment and a terminal.
The invention is realized in such a way that a video tracking-oriented anti-attack method comprises the following steps:
constructing a space-time transformation attack algorithm, determining the iterative round number, the range of the added disturbance size, and the balance coefficient and the iterative round number of a combined loss function of the space-time anti-attack algorithm; inputting the tracking result of the previous frame into a space-time countermeasure attack algorithm, performing N-round gradient descent iteration every time the algorithm is operated, generating disturbance, and adding the disturbance into the current frame; inputting the attacked video frame serving as a tracking image into a tracking algorithm, acquiring a tracking result of a current frame, adding disturbance generated by the current frame to a next frame serving as an initial frame of iteration, and then iterating to run a process to traverse all video frames of all video sequences of a test data set; and recording and storing the tracking result of each frame, and quantitatively analyzing the tracking accuracy and the tracking success rate.
Further, the video tracking-oriented attack resisting method comprises the following steps:
acquiring tracking results of a target tracking video and a previous frame of video image;
step two, inputting the current frame into a tracker to obtain a suggested candidate frame, calculating an intersection comparison result of the suggested candidate frame of the current frame and a tracking result of a previous frame, and determining a real classification confidence label;
step three, calculating the real regression offset between the tracking suggestion frame of the current frame and the tracking result of the previous frame;
step four, obtaining a tracker loss function for the current frame, wherein the tracker loss function comprises a binary classification loss function and a frame regression loss function; designing a classification deception loss function according to the binary classification loss function, and designing a regression deception loss function according to the frame regression loss function;
designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;
step six, solving a partial derivative related to the input frame for the comprehensive loss function, and calculating a gradient;
and step seven, transferring the gradient to a sign function as disturbance generated by the current iteration. And adding the disturbance generated by the mth round to the countermeasure input frame of the mth round of iteration to obtain the countermeasure input frame of the (M + 1) th iteration, and after the iteration is performed for M times, obtaining the countermeasure video image of the current frame with the disturbance finally added.
And step eight, selecting disturbance finally generated by the t-1 th frame of the video image frame to initialize the video image of the first iteration of the t-th frame for the t-th frame video image.
Further, in the first step, a target tracking video is obtained, a previous frame of video image is input into the tracker, and a previous frame of tracking result (x) is obtained c ,y c ,w r ,h r ) Wherein x is c ,x y Respectively the horizontal and vertical coordinates, w, of the center point of the tracking result r ,h r Width and height of the tracking result, respectively.
In the second step, the current frame I is input into the tracker to obtain N suggested candidate frames, and the N suggested candidate frames of the current frame and the tracking result (x) of the previous frame are calculated c ,y c ,w r ,h r ) Cross-over and cross-over ratio result IOU of 1 Then true classification confidence label p c Comprises the following steps:
Figure BDA0003809637750000031
in the third step, for the nth tracking suggestion frame of the current frame I
Figure BDA0003809637750000032
Wherein 0<n≤N,
Figure BDA0003809637750000033
Respectively the horizontal and vertical coordinates of the central point of the tracking result,
Figure BDA0003809637750000034
width and height of the tracking result respectively; and the last frame tracking result (x) c ,y c ,w r ,h r ) Has a true regression offset of
Figure BDA0003809637750000035
Figure BDA0003809637750000036
Figure BDA0003809637750000037
Figure BDA0003809637750000038
Figure BDA0003809637750000039
Calculating N tracking suggestion frames of the current frame and the tracking result (x) of the previous frame c ,y c ,w r ,h r ) True regression offset p r
Further, in the fourth step, for the current frame I, the obtained tracker loss function is L (I, N, θ);
Figure BDA0003809637750000041
wherein N represents the total number of proposed candidate frames in the input frame I; l is a radical of an alcohol c Representing a binary classification loss function, and calculating by adopting a cross entropy loss function; l is r Representing a frame regression loss function, and calculating by adopting a smoothL1 loss function;
Figure BDA0003809637750000042
a prediction classification confidence score representing the nth suggested candidate box in the current frame I;
Figure BDA0003809637750000043
a predicted regression offset for the nth proposed candidate in the input frame I;
Figure BDA0003809637750000044
representing a true classification confidence score for the nth suggested candidate box in the current frame I;
Figure BDA0003809637750000045
representing the true regression offset for the nth proposed candidate box in the current frame I; α is a fixed weight parameter; theta represents the network parameter employed by the tracker.
Designing a classification spoofing loss function L cheat_class
Figure BDA0003809637750000046
Wherein,
Figure BDA0003809637750000047
for the nth candidate suggestion box I n A corresponding misclassification tag. For
Figure BDA0003809637750000048
Is generated due to
Figure BDA0003809637750000049
There are only two possible results 1 or 0 indicating the nth candidate frame I of the current frame I n Belong to the object or the background, so pass through
Figure BDA00038096377500000410
Binary negation operations to generate error classification labels
Figure BDA00038096377500000411
Designing a regression deception loss function L cheat_regression
Figure BDA00038096377500000412
Wherein for
Figure BDA00038096377500000413
Is generated by
Figure BDA00038096377500000414
Adding random distance offsets delta respectively offset And random proportional variation delta scale Generating error regression labels
Figure BDA00038096377500000415
0.3<δ offset <0.5,0.7<δ scale <0.9;
Figure BDA00038096377500000416
Figure BDA00038096377500000417
Figure BDA00038096377500000418
Figure BDA00038096377500000419
Further, the perceptual loss function L in the fifth step quality Comprises the following steps:
Figure BDA0003809637750000051
wherein, I m adv And T represents the input image frame after disturbance is added in the mth round, and the number of pixels of the input image frame is represented.
The comprehensive classification spoofing loss function L cheat_class Regression deception loss function L cheat_regression And a perception loss function to obtain a comprehensive loss function L adv (K, N, θ) is:
Figure BDA0003809637750000052
where λ represents a hyperparameter for balancing the ratio between the plurality of loss functions.
In the sixth step, for the synthetic loss function L adv Calculating a partial derivative of the synthetic loss function about the input frame I, and calculating a gradient and recording the gradient as r;
Figure BDA0003809637750000053
further, in the seventh step, the gradient is transmitted into the sign function, and for the m +1 th iteration, the input frame is resisted
Figure BDA0003809637750000054
Figure BDA0003809637750000055
Wherein,
Figure BDA0003809637750000056
epsilon is the maximum perturbation, epsilon =0.15, M is the maximum number of iterations, M =10; m represents the turn of the current iteration,
Figure BDA0003809637750000057
a countermeasure input frame representing the mth iteration, sign () being a sign function, r m The gradient obtained for the mth iteration.
Obtaining a confrontation video image of the final added disturbance of the current frame after iteration for M times
Figure BDA0003809637750000058
In the eighth step, aiming at the characteristic that the image frames in the video have space-time continuity, the video image I of the t-th frame is subjected to t Selecting the previous frame of the video image frameCountering video images
Figure BDA0003809637750000059
The obtained video image of the first iteration of the t frame of disturbance initialization
Figure BDA00038096377500000510
Figure BDA00038096377500000511
Wherein,
Figure BDA0003809637750000061
the video image of the first iteration for the t-1 th frame.
Another object of the present invention is to provide a video tracking-oriented counterattack system applying the video tracking-oriented counterattack method, the video tracking-oriented counterattack system including:
the tracking result acquisition module is used for acquiring tracking results of the target tracking video and the previous frame video image;
the real classification confidence label determining module is used for inputting the current frame into the tracker to obtain a suggested candidate frame, calculating the intersection comparison result of the suggested candidate frame of the current frame and the tracking result of the previous frame, and determining a real classification confidence label;
the real regression offset calculation module is used for calculating the real regression offset of the tracking suggestion frame of the current frame and the tracking result of the previous frame;
the loss function involves the module, is used for confirming the tracker loss function of the current frame, and design and classify and deceive the loss function and design and regress and deceive the loss function;
the comprehensive loss function determining module is used for designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;
the gradient calculation module is used for solving the partial derivative of the synthetic loss function about the input frame and calculating the gradient;
the countermeasure video image determination module is used for transmitting the gradient into a sign function to reduce the influence of an abnormal value, adding disturbance generated by the mth iteration into the input countermeasure frame of the mth iteration and obtaining an input countermeasure frame of the m +1 th iteration; after M times of iteration, obtaining a confrontation video image of the current frame with disturbance added finally;
and the video image initialization module is used for selecting the t-1 frame of the video image frame to resist the disturbance obtained by the video image and initializing the video image of the first iteration of the t frame for the t frame video image.
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the video tracking oriented counter attack method.
It is a further object of the present invention to provide a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, causes the processor to carry out the steps of the video tracking-oriented anti-attack method.
Another object of the present invention is to provide an information data processing terminal, which is used for implementing the video tracking-oriented anti-attack system.
In combination with the technical solutions and the technical problems to be solved, please analyze the advantages and positive effects of the technical solutions to be protected in the present invention from the following aspects:
first, aiming at the technical problems and difficulties in solving the problems in the prior art, the technical problems to be solved by the technical scheme of the present invention are closely combined with results, data and the like in the research and development process, and some creative technical effects are brought after the problems are solved. The specific description is as follows:
with the rapid development of computer vision, visual target tracking has been widely applied to scenes such as unmanned driving, security and the like. However, researches show that the tracking algorithm based on the neural network has potential safety hazards due to artificially created confrontation samples, so that the precision of the tracking algorithm is obviously reduced. Aiming at a target tracking task based on deep learning, the invention provides a novel lightweight and efficient space-time transformation attack algorithm based on gradient iteration. The algorithm generates confrontation samples frame by frame, emphasizes the characteristic of space-time motion consistency in the target tracking field, and adds dense confrontation samples in a space-time domain to mislead the most advanced tracker. In addition, the algorithm adopts a joint loss function which consists of classification loss, regression loss and perception loss functions; the classification loss function enables the tracker to have errors when classifying the target and the background, the regression loss function enables the tracking frame to drift, and the perception loss function maintains the image quality to reduce distortion, so that the provided algorithm can realize high-efficiency attack on the tracking algorithm while the generated disturbance is not easy to be perceived by human eyes. For the countermeasure attack of video target tracking, the existing depth tracker is not modified, disturbance is learned and injected into an input frame, an indistinguishable countermeasure sample is generated, and the disturbance learned in the current frame is utilized to initialize the disturbance learning of the next frame in consideration of the space-time consistency between video frames, so that the performance of the depth tracker is further reduced.
In the invention, the iteration round number, the disturbance range and the balance coefficient of the combined loss function of the space-time countermeasure attack algorithm are determined firstly. And inputting the tracking result of the previous frame into a space-time countermeasure attack algorithm, performing N-round gradient descent iteration every time the algorithm is operated, generating disturbance, and adding the disturbance into the current frame. And inputting the attacked video frame serving as a tracking image into a tracking algorithm, acquiring a tracking result of the current frame, adding the disturbance generated by the current frame to the next frame serving as an initial frame of iteration, and then iteratively operating the process until all video frames of all video sequences of the test data set are traversed. And recording and storing the tracking result of each frame, and quantitatively analyzing the tracking accuracy and the tracking success rate.
Secondly, considering the technical scheme as a whole or from the perspective of products, the technical effect and advantages of the technical scheme to be protected by the invention are specifically described as follows:
aiming at the safety problem of a tracking algorithm based on deep learning, the invention provides a novel light-weight, efficient and intensive-attack space-time transformation attack algorithm aiming at target tracking by focusing on a target tracking network based on deep learning. On one hand, a joint loss function is designed, wherein the classification deception loss function misleading tracking algorithm classifies the target foreground and the background, the frame of the regression deception loss function misleading tracking algorithm regresses, and the perception loss function ensures that the anti-attack algorithm can achieve obvious attack effect and greatly reduces the possibility of being perceived. On the other hand, the attack algorithm provided by the invention has universality by learning a disturbance generation countersample to finish the aggression to the tracking algorithm under the condition that the parameters of the tracking algorithm are unknown. Finally, the attack algorithm provided by the invention considers the space-time consistency among video frames, and utilizes the disturbance learned in the current frame to initialize the disturbance learning of the next frame, thereby further reducing the performance of the depth tracker.
Third, as an inventive supplementary proof of the claims of the present invention, there are also presented several important aspects:
the technical scheme of the invention fills the technical blank in the industry at home and abroad:
in the field of single target tracking, because the task of target tracking includes not only classification of foreground and background but also regression of frame, research on confrontation samples is still less due to complexity of the confrontation attack problem of target tracking. According to the method, a loss function is respectively designed to mislead classification and regression tasks of a tracking algorithm according to output responses, namely a real classification confidence coefficient and a real regression offset, based on a deep learning tracker. In addition, a perception loss function is designed, so that the disturbance added by the proposed algorithm is not easily perceived by human eyes. On the other hand, the current researches on antagonistic attack and defense mainly stay on a single image, and the researches on the video antagonistic attack are still to be explored. The technical scheme of the invention utilizes the space-time consistency between video frames to initialize the disturbance learning of the next frame by the disturbance learned in the current frame, thereby further reducing the performance of the depth tracker.
The technical scheme of the invention solves the technical problem that people are eagerly to solve but can not be successfully solved all the time:
the technical scheme of the invention solves the problem that the attack on the video target tracking can be realized under the condition that the model parameters of the tracking algorithm are unknown. Secondly, the technical scheme of the invention does not modify the existing depth tracker, but learns the disturbances and injects the disturbances into the input frame to generate the indistinguishable countersamples, thereby greatly reducing the difficulty of technical realization. Finally, relatively few studies have been conducted to generate countersamples on video sequences to achieve attack depth tracking, where inter-frame motion consistency involves more challenges, and the technical solution of the present invention utilizes the temporal and spatial consistency between video frames to achieve video-based counterattacks.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of an attack-fighting method for video tracking according to an embodiment of the present invention;
fig. 2 is a success rate diagram and an accuracy diagram of the DaSiamRPN tracking algorithm before and after being attacked, which are provided by the embodiment of the present invention;
fig. 3 is a schematic diagram of a visualization result of the DaSiamRPN tracking algorithm before and after being attacked according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
In view of the problems in the prior art, the present invention provides a method, a system, a medium, a device and a terminal for resisting attack facing video tracking, and the present invention is described in detail below with reference to the accompanying drawings.
The video tracking-oriented attack resisting method provided by the embodiment of the invention comprises the following steps:
s101, acquiring tracking results of a target tracking video and a previous frame video image;
s102, inputting the current frame into a tracker to obtain a suggested candidate frame, calculating an intersection comparison result of the suggested candidate frame of the current frame and a tracking result of a previous frame, and determining a real classification confidence label;
s103, calculating the real regression offset between the tracking suggestion frame of the current frame and the tracking result of the previous frame;
s104, determining a tracker loss function of the current frame, and designing a classification deception loss function and a regression deception loss function;
s105, designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;
s106, solving a partial derivative of the comprehensive loss function about the input frame, and calculating a gradient;
s107, correcting the gradient, and inputting the frame after the disturbance is added and obtained by the (m + 1) th iteration; after M times of iteration, obtaining a confrontation video image of the current frame with disturbance added finally;
s108, for the t frame video image, selecting a t-1 frame of the video image frame to resist disturbance obtained by the video image and initializing a video image of the first iteration of the t frame.
As a preferred embodiment, as shown in fig. 1, the video tracking-oriented attack-fighting method provided by the embodiment of the present invention specifically includes the following steps:
step 1: acquiring a target tracking video, inputting a previous frame video image into a tracker, and acquiring a previous frame tracking result (x) c ,y c ,w r ,h r ) Wherein x is c ,x y Respectively the horizontal and vertical coordinates, w, of the center point of the tracking result r ,h r Width and height of the tracking result, respectively.
Step 2: inputting the current frame I into a tracker to obtain N suggested candidate frames, and calculating the N suggested candidate frames of the current frame and the tracking result (x) of the previous frame c ,y c ,w r ,h r ) Cross-over and cross-over ratio result IOU of 1 Then true classification confidence label p c Comprises the following steps:
Figure BDA0003809637750000101
and step 3: for the nth (0) of the current frame I<N is less than or equal to N) tracking suggestion frame
Figure BDA0003809637750000102
Wherein
Figure BDA0003809637750000103
Respectively the horizontal and vertical coordinates of the central point of the tracking result,
Figure BDA0003809637750000104
width and height of the tracking result, respectively. It tracks the result (x) with the last frame c ,y c ,w r ,h r ) Is the true regression offset of
Figure BDA0003809637750000105
Namely:
Figure BDA0003809637750000111
Figure BDA0003809637750000112
Figure BDA0003809637750000113
Figure BDA0003809637750000114
calculating N tracking suggestion frames of the current frame and the tracking result (x) of the previous frame c ,y c ,w r ,h r ) True regression offset p r
And 4, step 4: for the current frame I, the obtained tracker loss function is L (I, N, theta), namely;
Figure BDA0003809637750000115
where N denotes the total number of proposed candidate frames in the input frame I, L c Representing a binary classification loss function, and calculating by adopting a cross entropy loss function; l is r Representing a frame regression loss function, adopting smoothL1 loss function to calculate,
Figure BDA0003809637750000116
a prediction classification confidence score representing the nth suggested candidate box in the current frame I,
Figure BDA0003809637750000117
the predicted regression offset for the nth proposed candidate in the input frame I,
Figure BDA0003809637750000118
represents the true classification confidence score for the nth proposed candidate box in the current frame I,
Figure BDA0003809637750000119
represents the true regression offset for the nth proposed candidate box in the current frame I, α is a fixed weight parameter, and θ represents the network parameter employed by the tracker.
And 5: in order to cheat the classification branch of the tracker, a classification cheating loss function L is designed cheat_class Namely:
Figure BDA00038096377500001110
wherein,
Figure BDA00038096377500001111
for the nth candidate suggestion box I n A corresponding misclassification tag. For the
Figure BDA00038096377500001112
Due to generation of
Figure BDA00038096377500001113
There are only two possible results (1 or 0) to indicate the nth candidate frame I of the current frame I n Belongs to the target or the background, so pass through
Figure BDA00038096377500001114
Binary negation operations to generate error classification labels
Figure BDA00038096377500001115
Step 6: in order to cheat the classification branch of the tracker, a regression cheating loss function L is designed cheat_regression Namely:
Figure BDA00038096377500001116
wherein, for
Figure BDA0003809637750000121
Is generated by
Figure BDA0003809637750000122
Adding random distance offsets delta respectively offset (0.3<δ offset <0.5 ) and random proportional change delta scale (0.7<δ scale <0.9 ) generate false regression tags
Figure BDA0003809637750000123
Figure BDA0003809637750000124
Figure BDA0003809637750000125
Figure BDA0003809637750000126
Figure BDA0003809637750000127
And 7: in order to restrict the disturbance intensity added to a single video image and make the video image after disturbance addition closer to the original video image so as to reduce the possibility of being perceived, the invention designs a perception loss function L quality Namely:
Figure BDA0003809637750000128
wherein, I m adv Representing the input image frame after the mth round of adding the disturbance, I 1 And T represents the number of pixels of the input image frame for the 1 st iteration of the input image frame.
And 8: comprehensive classification spoofing loss function L cheat_class Regression deception loss function L cheat_regression And a perception loss function to obtain a comprehensive loss function L adv (K, N, θ), i.e.:
Figure BDA0003809637750000129
where λ represents a hyperparameter for balancing the ratio between the plurality of loss functions.
And step 9: for the synthetic loss function L adv To bias it with respect to the input frame IDerivative, calculate gradient as r, i.e.:
Figure BDA00038096377500001210
step 10: introducing the gradient into a sign function, and for the perturbed input frame obtained by the (m + 1) th iteration
Figure BDA00038096377500001211
Namely:
Figure BDA00038096377500001212
wherein,
Figure BDA0003809637750000131
ε is the perturbation maximum, ε =0.15, M is the maximum number of iterations, M =10; m represents the turn of the current iteration,
Figure BDA0003809637750000132
representing the perturbed input frame obtained by the mth iteration, sign () is a sign function, r m The gradient obtained for the mth iteration.
Step 11: obtaining a confrontation video image of the final added disturbance of the current frame after iteration for M times
Figure BDA0003809637750000133
Step 12: aiming at the characteristic that image frames in the video have space-time continuity, for the t frame video image I t Selecting a previous frame (t-1 th frame) of the video image frame as a countermeasure video image
Figure BDA0003809637750000134
Initializing the video image of the first iteration of the t frame by the obtained disturbance
Figure BDA0003809637750000135
Namely:
Figure BDA0003809637750000136
wherein,
Figure BDA0003809637750000137
the video image of the first iteration for the t-1 th frame.
The video tracking-oriented anti-attack system provided by the embodiment of the invention comprises:
the tracking result acquisition module is used for acquiring tracking results of the target tracking video and the previous frame video image;
the real classification confidence label determining module is used for inputting the current frame into the tracker to obtain a suggested candidate frame, calculating the intersection comparison result of the suggested candidate frame of the current frame and the tracking result of the previous frame, and determining a real classification confidence label;
the real regression offset calculation module is used for calculating the real regression offset of the tracking suggestion frame of the current frame and the tracking result of the previous frame;
the loss function relating module is used for determining the tracker loss function of the current frame, designing a classification deception loss function and designing a regression deception loss function;
the comprehensive loss function determining module is used for designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;
the gradient calculation module is used for solving the partial derivative of the comprehensive loss function about the input frame and calculating the gradient;
the confrontation video image determining module is used for correcting the gradient and inputting the disturbed input frame obtained by the (m + 1) th iteration; after M times of iteration, obtaining a confrontation video image of the current frame with disturbance added finally;
and the video image initialization module is used for selecting the t-1 frame of the video image frame to resist the disturbance obtained by the video image and initializing the video image of the first iteration of the t frame for the t frame video image.
In the tracking framework, a depth tracker usually adopts a convolutional neural network architecture comprising two branches, wherein the first branch is responsible for a classification task and identifies whether all the proposed candidate frames belong to a foreground frame or a background frame; the second branch is responsible for regression tasks and accurately positions the position of the target. The adversarial attack algorithm provided by the invention is deployed into the advanced DaSiamRPN tracking algorithm. Given an input video sequence and an initial frame of labeled bounding boxes, in processing each video frame, the countering perturbations are generated in combination with the perceptual loss function according to the DaSiamRPN tracking algorithm's output response, i.e., the true classification confidence and the true regression offset. These perturbations are added to the input frame to generate an antagonistic sample. And then inputting the generated confrontation sample into a DaSiamRPN tracking algorithm to obtain a precision graph and a success rate graph of the result.
Based on the above thought, the embodiment of the present invention provides an attack resisting method for a DaSiamRPN tracking network, which specifically includes the following steps:
step 1: and inputting the video data set into a DaSiamRPN tracker to obtain an output response, namely acquiring a tracking result given by the tracker aiming at the previous frame. Calculating a real classification confidence label p according to the steps 1 to 3 c And the true regression offset p r
Step 2: setting the size of disturbance generated by an anti-attack algorithm to be limited between (0) and (255) in each pixel point, and setting the balance coefficient lambda of a joint loss function to be 5;
and step 3: inputting the tracking result of the previous frame into an anti-attack algorithm, and obtaining an anti-video image of the current frame with disturbance added finally according to the steps 4-11;
and 4, step 4: inputting the attacked video frame serving as a tracking image into a DaSiamRPN algorithm to obtain a tracking result of the current frame;
and 5: the video image of the first iteration of the next frame is initialized according to step 12 and then the process is iteratively run through all video frames of the complete video sequence of the test data set.
As shown in fig. 2, the legend "DaSiamRPN" represents the DaSiamRPN tracking performance without using the anti-attack algorithm, and the legend "DaSiamRPN _ attack" represents the DaSiamRPN tracking performance using the anti-attack algorithm, and the tracking accuracy and the tracking success rate of the DaSiamRPN algorithm decreased by 96.57% and 97.37%, respectively, after using the anti-attack algorithm.
As shown in fig. 3, three frames of pictures of the Fish sequence in the OTB2015 dataset are selected, the tracking result is visualized, the first line is three frames of video images and the real position of the target, and the rectangular frame labeling is realized by black. The second line of white solid line rectangular box represents the tracking frame given by DaSiamRPN algorithm before using the anti-attack algorithm, so that the tracking is more accurate and the effect is more ideal; the white dotted rectangle represents the tracking frame after the anti-attack algorithm is used, and it can be obviously seen that the tracking frame drifts obviously, and serious errors occur in the central position and the size of the tracking frame.
It should be noted that embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A video tracking-oriented anti-attack method is characterized by comprising the following steps:
constructing a space-time transformation attack algorithm, determining the iterative round number, the range of the added disturbance size, and the balance coefficient and the iterative round number of a combined loss function of the space-time anti-attack algorithm; inputting the tracking result of the previous frame into a space-time countermeasure attack algorithm, performing N-round gradient descent iteration every time the algorithm is operated, generating disturbance, and adding the disturbance into the current frame; inputting the attacked video frame serving as a tracking image into a tracking algorithm, acquiring a tracking result of a current frame, adding disturbance generated by the current frame to a next frame serving as an initial frame of iteration, and then iterating to run a process to traverse all video frames of all video sequences of a test data set; and recording and storing the tracking result of each frame, and quantitatively analyzing the tracking accuracy and the tracking success rate.
2. The video tracking-oriented countering attack method according to claim 1, characterized in that it comprises the following steps:
acquiring tracking results of a target tracking video and a previous frame video image;
inputting the current frame into a tracker to obtain a suggested candidate frame, calculating the intersection and comparison result of the suggested candidate frame of the current frame and the tracking result of the previous frame, and determining a real classification confidence label;
step three, calculating the real regression offset of the tracking suggestion frame of the current frame and the tracking result of the previous frame;
step four, obtaining a tracker loss function for the current frame, wherein the tracker loss function comprises a binary classification loss function and a frame regression loss function; designing a classification deception loss function according to the binary classification loss function, and designing a regression deception loss function according to the frame regression loss function;
designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;
step six, solving a partial derivative related to the input frame for the comprehensive loss function, and calculating a gradient;
and step seven, transferring the gradient to a sign function as the disturbance generated by the current iteration. And adding the disturbance generated by the mth round to the confrontation input frame of the mth round of iteration to obtain the confrontation input frame of the (M + 1) th iteration, and after iterating for M times, obtaining the confrontation video image of the current frame with the disturbance finally added.
And step eight, selecting disturbance finally generated by the t-1 th frame of the video image frame to initialize the video image of the first iteration of the t-th frame for the t-th frame video image.
3. The video tracking-oriented attack-fighting method according to claim 2, wherein in the first step, the target tracking video is obtained, the previous frame video image is input into the tracker, and the previous frame tracking result (x) is obtained c ,y c ,w r ,h r ) Wherein x is c ,x y Respectively the horizontal and vertical coordinates, w, of the center point of the tracking result r ,h r Width and height of the tracking result respectively;
in the second step, the current frame I is input into the tracker to obtain N suggested candidate frames, and the N suggested candidate frames of the current frame and the tracking result (x) of the previous frame are calculated c ,y c ,w r ,h r ) Cross-over and cross-over ratio result IOU of 1 Then true classification confidence label p c Comprises the following steps:
Figure FDA0003809637740000021
in the third step, the nth tracking suggestion frame for the current frame I
Figure FDA0003809637740000022
Wherein 0<n≤N,
Figure FDA0003809637740000023
Respectively a horizontal coordinate and a vertical coordinate of the central point of the tracking result,
Figure FDA0003809637740000024
width and height of the tracking result respectively; and the last frame tracking result (x) c ,y c ,w r ,h r ) Has a true regression offset of
Figure FDA0003809637740000025
Figure FDA0003809637740000026
Figure FDA0003809637740000027
Figure FDA0003809637740000028
Figure FDA0003809637740000029
Calculating N tracking suggestion frames of the current frame and the tracking result (x) of the previous frame c ,y c ,w r ,h r ) True regression offset p r
4. The video tracking-oriented attack-resisting method according to claim 2, wherein in the fourth step, for the current frame I, the obtained tracker loss function is L (I, N, θ);
Figure FDA00038096377400000210
wherein N represents the total number of proposed candidate frames in the input frame I; l is c Representing a binary classification loss function, and calculating by adopting a cross entropy loss function; l is r Representing a frame regression loss function, and calculating by adopting a smoothL1 loss function;
Figure FDA0003809637740000031
a prediction classification confidence score representing the nth suggested candidate box in the current frame I;
Figure FDA0003809637740000032
a predictive regression offset for the nth proposed candidate box in the input frame I;
Figure FDA0003809637740000033
representing a true classification confidence score for the nth suggested candidate box in the current frame I;
Figure FDA0003809637740000034
representing the true regression offset for the nth proposed candidate box in the current frame I; α is a fixed weight parameter; θ represents a network parameter employed by the tracker;
designing a classification spoofing loss function L cheat_class
Figure FDA0003809637740000035
Wherein,
Figure FDA0003809637740000036
for the nth candidate suggestion box I n A corresponding misclassification tag; for the
Figure FDA0003809637740000037
Due to generation of
Figure FDA0003809637740000038
There are only two possible results 1 or 0 for indicating the nth candidate box I of the current frame I n Belongs to the target or the background, so pass through
Figure FDA0003809637740000039
Binary negation operation generation error classification label
Figure FDA00038096377400000310
Designing a regression deception loss function L cheat_regression
Figure FDA00038096377400000311
Wherein, for
Figure FDA00038096377400000312
Is generated by
Figure FDA00038096377400000313
Adding random distance offsets delta respectively offset And random proportional variation delta scale Generating error regression labels
Figure FDA00038096377400000314
0.3<δ offset <0.5,0.7<δ scale <0.9;
Figure FDA00038096377400000315
Figure FDA00038096377400000316
Figure FDA00038096377400000317
Figure FDA00038096377400000318
5. The video tracking-oriented attack-countering method according to claim 2, characterized in that the perceptual loss function L in the fifth step quality Comprises the following steps:
Figure FDA0003809637740000041
wherein, I m adv Representing the input image frame after disturbance is added in the mth round, wherein T represents the number of pixels of the input image frame;
the comprehensive classification spoofing loss function L cheat_class Regression deception loss function L cheat_regression And a perception loss function to obtain a comprehensive loss function L adv (K, N, θ) is:
Figure FDA0003809637740000042
wherein λ represents a hyper-parameter for balancing the ratio between the plurality of loss functions;
in the sixth step, for the synthetic loss function L adv Solving a partial derivative of the comprehensive loss function about the input frame I, and calculating a gradient and recording the gradient as r;
Figure FDA0003809637740000043
6. the video tracking-oriented method of countering attacks as recited in claim 2,in the seventh step, the gradient is corrected, and the disturbed input frame obtained by the (m + 1) th iteration is subjected to
Figure FDA0003809637740000044
Figure FDA0003809637740000045
Wherein,
Figure FDA0003809637740000046
ε is the perturbation maximum, ε =0.15, M is the maximum number of iterations, M =10; m represents the turn of the current iteration,
Figure FDA0003809637740000047
representing the perturbed input frame obtained by the mth iteration, sign () is a sign function, r m The gradient obtained for the mth iteration;
obtaining a confrontation video image of the final added disturbance of the current frame after iteration for M times
Figure FDA0003809637740000048
In the eighth step, aiming at the characteristic that the image frames in the video have space-time continuity, for the t frame video image I t Selecting a preceding one of the video image frames as a confrontation video image
Figure FDA0003809637740000049
The obtained video image of the first iteration of the t frame of disturbance initialization
Figure FDA00038096377400000410
Figure FDA00038096377400000411
Wherein,
Figure FDA00038096377400000412
The video image of the first iteration for the t-1 th frame.
7. A video tracking-oriented counter attack system applying the video tracking-oriented counter attack method according to any one of claims 1 to 6, the video tracking-oriented counter attack system comprising:
the tracking result acquisition module is used for acquiring tracking results of the target tracking video and the previous frame video image;
the real classification confidence label determining module is used for inputting the current frame into the tracker to obtain a suggested candidate frame, calculating the intersection comparison result of the suggested candidate frame of the current frame and the tracking result of the previous frame, and determining a real classification confidence label;
the real regression offset calculation module is used for calculating the real regression offset of the tracking suggestion frame of the current frame and the tracking result of the previous frame;
the loss function relating module is used for determining the tracker loss function of the current frame, designing a classification deception loss function and designing a regression deception loss function;
the comprehensive loss function determining module is used for designing a perception loss function; comprehensively classifying the deception loss function, the regression deception loss function and the perception loss function to obtain a comprehensive loss function;
the gradient calculation module is used for solving the partial derivative of the comprehensive loss function about the input frame and calculating the gradient;
the confrontation video image determining module is used for correcting the gradient and inputting the disturbed input frame obtained by the (m + 1) th iteration; after M times of iteration, obtaining a confrontation video image of the current frame with disturbance added finally;
and the video image initialization module is used for selecting the t-1 frame of the video image frame to resist the disturbance obtained by the video image and initializing the video image of the first iteration of the t frame for the t frame video image.
8. A computer arrangement, characterized in that the computer arrangement comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the video tracking oriented counter-attack method according to any one of claims 1 to 6.
9. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the video tracking oriented counter attack method according to any one of claims 1 to 6.
10. An information data processing terminal characterized by being used for implementing the video tracking-oriented anti-attack system according to claim 7.
CN202211010630.8A 2022-08-22 2022-08-22 Video tracking-oriented attack countermeasure method, system, medium, equipment and terminal Active CN115511910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211010630.8A CN115511910B (en) 2022-08-22 2022-08-22 Video tracking-oriented attack countermeasure method, system, medium, equipment and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211010630.8A CN115511910B (en) 2022-08-22 2022-08-22 Video tracking-oriented attack countermeasure method, system, medium, equipment and terminal

Publications (2)

Publication Number Publication Date
CN115511910A true CN115511910A (en) 2022-12-23
CN115511910B CN115511910B (en) 2024-01-12

Family

ID=84502394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211010630.8A Active CN115511910B (en) 2022-08-22 2022-08-22 Video tracking-oriented attack countermeasure method, system, medium, equipment and terminal

Country Status (1)

Country Link
CN (1) CN115511910B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558902A (en) * 2018-11-20 2019-04-02 成都通甲优博科技有限责任公司 A kind of fast target detection method
CN111627044A (en) * 2020-04-26 2020-09-04 上海交通大学 Target tracking attack and defense method based on deep network
CN112966553A (en) * 2021-02-02 2021-06-15 同济大学 Strong coupling target tracking method, device, medium and equipment based on twin network
CN113628244A (en) * 2021-07-05 2021-11-09 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN113808165A (en) * 2021-09-14 2021-12-17 电子科技大学 Point disturbance attack resisting method facing three-dimensional target tracking model
CN114463113A (en) * 2022-01-27 2022-05-10 度小满科技(北京)有限公司 Method and device for supplementing positive samples in credit investigation wind control modeling
CN114511593A (en) * 2022-01-25 2022-05-17 中国矿业大学 Visual target tracking transferable black box attack method based on important features

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558902A (en) * 2018-11-20 2019-04-02 成都通甲优博科技有限责任公司 A kind of fast target detection method
CN111627044A (en) * 2020-04-26 2020-09-04 上海交通大学 Target tracking attack and defense method based on deep network
CN112966553A (en) * 2021-02-02 2021-06-15 同济大学 Strong coupling target tracking method, device, medium and equipment based on twin network
CN113628244A (en) * 2021-07-05 2021-11-09 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN113808165A (en) * 2021-09-14 2021-12-17 电子科技大学 Point disturbance attack resisting method facing three-dimensional target tracking model
CN114511593A (en) * 2022-01-25 2022-05-17 中国矿业大学 Visual target tracking transferable black box attack method based on important features
CN114463113A (en) * 2022-01-27 2022-05-10 度小满科技(北京)有限公司 Method and device for supplementing positive samples in credit investigation wind control modeling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程旭 等: ""基于空间感知的多级损失目标跟踪对抗攻击方法"", 《通信学报》, vol. 42, no. 11, pages 242 - 254 *

Also Published As

Publication number Publication date
CN115511910B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN110097568B (en) Video object detection and segmentation method based on space-time dual-branch network
Liu et al. Fuzzy-aided solution for out-of-view challenge in visual tracking under IoT-assisted complex environment
CN110910391B (en) Video object segmentation method for dual-module neural network structure
Tsagkatakis et al. Online distance metric learning for object tracking
CN112419327B (en) Image segmentation method, system and device based on generation countermeasure network
Long et al. Object detection in aerial images using feature fusion deep networks
US20220156944A1 (en) Apparatus and method with video processing
CN105844665A (en) Method and device for tracking video object
CN113111947A (en) Image processing method, apparatus and computer-readable storage medium
CN113313166A (en) Ship target automatic labeling method based on feature consistency learning
Li et al. Real-time detection tracking and recognition algorithm based on multi-target faces
Phon-Amnuaisuk et al. Exploring the applications of faster R-CNN and single-shot multi-box detection in a smart nursery domain
CN112084887A (en) Attention mechanism-based self-adaptive video classification method and system
Huang et al. TATrack: Target-aware transformer for object tracking
Liu et al. FishTrack: Multi-object tracking method for fish using spatiotemporal information fusion
Jiang et al. Dynamic temporal–spatial regularization-based channel weight correlation filter for aerial object tracking
CN117115555A (en) Semi-supervised three-dimensional target detection method based on noise data
Yan et al. Real-time unmanned aerial vehicle tracking of fast moving small target on ground
CN115511910A (en) Anti-attack method, system, medium, equipment and terminal for video tracking
Sanches et al. Recommendations for evaluating the performance of background subtraction algorithms for surveillance systems
Maharani et al. Deep features fusion for KCF-based moving object tracking
Raif et al. Metamorphic testing for edge real-time face recognition and intrusion detection solution
Sun et al. Flying Bird Object Detection Algorithm in Surveillance Video Based on Motion Information
Paramanandam et al. A review on deep learning techniques for saliency detection
Nakka et al. Universal, transferable adversarial perturbations for visual object trackers

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant