CN112561965A - Real-time long-term tracking method based on correlation filtering


Publication number
CN112561965A
Authority
CN
China
Prior art keywords
target
frame
tracking
occ
search
Prior art date
Legal status
Pending
Application number
CN202011519370.8A
Other languages
Chinese (zh)
Inventor
刘骏
齐春生
Current Assignee
Fuyang Qiangsong Aviation Technology Co ltd
Original Assignee
Fuyang Qiangsong Aviation Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Fuyang Qiangsong Aviation Technology Co ltd filed Critical Fuyang Qiangsong Aviation Technology Co ltd
Priority to CN202011519370.8A
Publication of CN112561965A

Classifications

    • G06T7/246 Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/10016 Image acquisition modality: Video; Image sequence
    • G06T2207/20024 Special algorithmic details: Filtering details
    • G06T2207/20081 Special algorithmic details: Training; Learning


Abstract

The invention relates to the technical field of target tracking, in particular to a real-time long-term tracking method based on correlation filtering. The method comprises: training a motion correlation filter w on a manually selected target; finding the optimal scale β_i of the target and entering the next frame; recording the second-frame response value τ of the target; establishing a new-frame target search area whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame; extracting the feature vector X of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale simultaneously; taking the position of the maximum response as the translation estimate of the target and selecting the β_i corresponding to the maximum response as the optimal scale of the target; and determining whether target tracking has failed, stopping tracking after a failure and starting a re-detection mechanism. The method provided by the invention meets the requirement of real-time tracking and handles problems such as target occlusion, drift and rotation well.

Description

Real-time long-term tracking method based on correlation filtering
Technical Field
The invention relates to the technical field of target tracking, in particular to a real-time long-term tracking method based on correlation filtering.
Background
Target tracking is one of the important research directions in the field of computer vision and has matured with the rapid development of sensor and artificial-intelligence technology. In industries such as human-computer interaction, video monitoring, intelligent navigation and unmanned driving, vision-based target tracking occupies an increasingly important position. Although target tracking has advanced greatly in recent years, robust, continuous and accurate tracking is still regarded as a difficult task owing to interference factors such as occlusion, deformation and illumination change.
Correlation filters with only short-term memory are greatly limited when dealing with problems such as occlusion or disappearance of the target, so many methods add long-term memory to the correlation filter to realize higher-level tracking tasks. The combination of a short-term tracker and a detector is a commonly used long-term tracker structure, first used in the TLD (Tracking-Learning-Detection) tracker. TLD pioneered running a memoryless optical-flow short-term tracker and a template-based detector in parallel. Another example originated in the ALIEN tracker, whose authors treated localization as matching local keypoint descriptors under a weak geometric model. Ma et al. proposed constructing a long-term tracker with KCF (Kernelized Correlation Filter) as the short-term tracker and a random-fern classifier as the detector. Likewise, Hong et al. combined a KCF tracker with a SIFT-based detector that is also used to detect occlusion. Dai et al. established a simplified long-term tracker using assisted re-detection, combining short-term components with a support-vector-machine classifier to construct the long-term component. Suha et al. decompose the object model into a grid of cells and learn an occlusion classifier for each cell. For multi-target tracking, Beyer et al. proposed a Bayes filter for target-loss detection and re-detection. Alan et al. designed a detector capable of effectively re-detecting an object in the whole image using a new DCF (Discriminative Correlation Filter) constrained filter-learning method; furthermore, they suggested using the correlation response value and the PSR (Peak-to-Sidelobe Ratio) to evaluate tracking failure. Dai et al. proposed an adaptive spatially regularized correlation filtering algorithm to optimize the filter weights and the spatial constraint matrix. Goutam et al. exploit the complementarity of deep and shallow features to improve robustness and accuracy, and their work plays an important role in exploiting the true potential of deep features. Recently, the inventors of the present application have noted that some approaches attempt to combine deep-learning-based proposal networks with correlation-filter tracking. Yang et al. proposed a new long-term correlation-filter tracking method that applies a DCF-based tracking model in cooperation with a new target-aware detector; they evaluate the reliability of the tracking result with an enhanced PSR and, in case of tracking failure, perform target detection with a detector based on a Region Proposal Network (RPN). A robust real-time long-term tracking framework built from coarse-to-fine modules has also been proposed: the fine module pinpoints the tracked object in a local search region using an offline-trained regression and verification network, while the coarse module focuses on efficiently selecting the most likely region from a densely sampled sliding window.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a real-time long-term tracking method based on correlation filtering, which can solve problems such as occlusion, drift and rotation of the target.
In order to achieve the purpose, the invention adopts the following technical scheme:
a real-time long-term tracking method based on correlation filtering, the method comprising:
manually selecting a tracking target, and training a motion correlation filter w on the target by using a formula (4);
w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ - y(m,n)|^2 + λ‖w‖^2   (4)
where φ denotes the mapping to kernel space; a Gaussian label is given to the training image according to the shift amount (the smaller the shift amount, the closer the label value is to 1, otherwise the closer to 0); λ is a regularization parameter;
finding the optimal scale β_i of the target using equation (8) and entering the next frame, where λ = 10^-4 and the Gaussian kernel width σ = 0.1;
β* = argmax_{β_i} max F^{-1}( k̂^{x z_{β_i}} ⊙ α̂ )   (8)
recording the second-frame response value τ of the target;
establishing a new-frame target search area, whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame;
extracting the feature vector X of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale simultaneously; the position of the maximum of the response map f(z) is taken as the translation estimate of the target, and the β_i corresponding to the maximum response value is selected as the optimal scale of the target;
judging whether the latest 5 consecutive frames meet the occlusion criterion:
1)Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ
2)sum(Y<θ·d·τ)≥2,θ<1
where y(i), an element of Y, is the response value of the i-th frame and θ is a coefficient; the operator sum(·) counts the number of elements of Y with y(i) < θ·d·τ; when the response values of five consecutive frames meet both conditions, target tracking is judged to have failed, tracking is stopped and the re-detection mechanism is started;
the re-detection mechanism specifically records the position at which the target was lost and the average motion displacement of the target over the latest 5 frames, and correspondingly uses a detection method based on a local area or a detection method based on the global area according to the size of the average motion displacement.
In a further technical scheme, when the average motion displacement is less than or equal to 15 pixels, local search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) with the width W_occ and height H_occ of the target frame as reference, construct a search area centered on those coordinates, S_search = W_search × H_search = A·W_occ × B·H_occ, where W_search and H_search are the width and height of the entire search area S_search, and A and B are coefficients for the width and height respectively; the larger the values of A and B, the larger the search area;
3) creating a sliding window with the same size as the target bounding box, circularly shifting along the x direction and the y direction, and extracting image characteristics in the window;
4) the step sizes of the sliding in the x and y directions are as follows:
Δx_step = (W_search - W_occ)/M   (11)
Δy_step = (H_search - H_occ)/N   (12)
wherein M and N are positive integers;
when the average motion displacement is larger than 15 pixels, global search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) perform target detection on each subsequently input image using the EdgeBox algorithm to form N′ target proposals, and record the width W′_i and height H′_i of each proposal;
3) screen and filter the N′ target proposals, keeping the proposals that satisfy the condition
1/1.2 × W_occ × H_occ < W′_i × H′_i < 1.2 × W_occ × H_occ   (13)
4) extract image features within the screened proposals to form a candidate sample set;
for each candidate sample, perform correlation filtering using equations (6) and (8) in sequence and take the maximum response value; if the maximum response value is greater than ε·τ, the corresponding target frame is output as a new initial condition to restart the tracker; otherwise the next frame is entered for detection until the target is detected.
In a further technical solution, the correlation filtering operation is implemented on each sliding window, and the constraint condition set for the detection result is:
Max(R)>ντ (14)
the maximum response value is then compared with the set threshold; if it is greater than the threshold, it is adopted as the final detection result and the tracker is restarted to continue working; otherwise the next frame is entered for detection until the correct target is detected.
In a further aspect, the method further comprises: during tracking, for each frame, using conditional expression (15) to determine whether the template needs to be updated; the update scheme is as follows:
x̂_t = (1 - η) x̂_{t-1} + η x_t
α̂_t = (1 - η) α̂_{t-1} + η α_t   (15)
where η is the learning rate; the above steps are repeated until the image sequence ends.
Compared with the prior art, the invention has the following technical effects:
the real-time long-term tracking method based on the correlation filtering provided by the invention has the advantages that experimental results on the publicly available OTB reference data set, TC-128 data set and UAV20L data set show that the method has good performance on two indexes of distance precision and overlapping success rate. In addition, the invention can meet the requirement of real-time tracking and well solve the problems of target shielding, drifting, rotation and the like.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
Fig. 1 is a theoretical framework of the real-time long-term tracking method based on correlation filtering constructed by the invention.
Fig. 2 is a relationship between a response value and a reliability of a tracking result according to the present invention.
Fig. 3 is a typical example of a correlation filter response curve and the reliability of the result according to the present invention.
Fig. 4 is a schematic diagram of a local search strategy for re-detection according to the present invention.
Fig. 5 is a schematic diagram of a global search strategy for re-detection according to the present invention.
FIG. 6 is a comparison of the results of the present invention on the OTB100 data set and the tracking accuracy index of other 9 robust tracking methods.
FIG. 7 is a comparison of the results of the present invention on the OTB100 data set with the results of the other 9 robust tracking methods on the overlap success rate index.
Fig. 8 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the scale variation property.
FIG. 9 is a comparison of the tracking accuracy results of the present invention on OTB100 with other 9 robust tracking methods under out-of-view properties.
FIG. 10 is a comparison of the tracking accuracy results of the out-of-plane rotation of the present invention on OTB100 with other 9 robust tracking methods.
FIG. 11 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the occlusion property.
Fig. 12 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the motion blur property.
Fig. 13 is a comparison of the tracking accuracy of the present invention on OTB100 with the tracking accuracy results of other 9 robust tracking methods under the low resolution property.
Fig. 14 is a comparison of the tracking accuracy results of the invention on OTB100 with other 9 robust tracking methods under the illumination variation property.
Fig. 15 is a comparison of the tracking accuracy results of the present invention on OTB100 under in-plane rotation property with other 9 robust tracking methods.
Fig. 16 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the fast motion property.
FIG. 17 is a comparison of the tracking accuracy results of the invention on OTB100 with other 9 robust tracking methods under the distortion property.
Fig. 18 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the background clutter property.
FIG. 19 is a comparison of tracking overlap ratio results of the present invention over OTB100 with other 9 robust tracking methods under out-of-view properties.
FIG. 20 is a comparison of tracking overlap ratio results of the present invention on OTB100 under out-of-plane rotation property with other 9 robust tracking methods.
FIG. 21 is a comparison of the tracking overlap ratio results of the invention on OTB100 under occlusion property with other 9 robust tracking methods.
FIG. 22 is a comparison of tracking overlap ratio results of the present invention on OTB100 under motion blur property with other 9 robust tracking methods.
FIG. 23 is a comparison of the tracking overlap ratio results of the present invention on OTB100 with other 9 robust tracking methods under the low resolution property.
FIG. 24 is a comparison of tracking overlap ratio results of the present invention on OTB100 under illumination variation property with other 9 robust tracking methods.
FIG. 25 is a comparison of the tracking overlap ratio results of the present invention on OTB100 under in-plane rotation property with other 9 robust tracking methods.
FIG. 26 is a comparison of the tracking overlap ratio results of the invention on OTB100 under the fast motion property with other 9 robust tracking methods.
FIG. 27 is a comparison of tracking overlap ratio results of the present invention on OTB100 under the distortion property with other 9 robust tracking methods.
FIG. 28 is a comparison of the tracking overlap ratio results of the present invention on OTB100 under the background clutter property with other 9 robust tracking methods.
FIG. 29 is a comparison of the tracking overlap ratio results of the invention on OTB100 and other 9 robust tracking methods under the scale variation property.
FIG. 30 is a comparison of the results of the present invention on the TC-128 data set with the results of the other 11 robust tracking methods on the tracking accuracy index.
FIG. 31 is a comparison of the results of the present invention on the TC-128 data set with the results of the other 11 robust tracking methods on the overlap success rate index.
Fig. 32 is a comparison of the results of the present invention on the UAV20L data set with the results on the tracking accuracy index of the other 14 robust tracking methods.
Fig. 33 is a comparison of the results of the present invention on the UAV20L data set with other 14 robust tracking methods on the overlap success rate index.
FIG. 34 is a comparison of the results of the present invention on the TC-128 data set with the results of other 9 robust tracking methods under the visualization of partial sequence tracking results.
Fig. 35 is a comparison of the results of the present invention on the UAV20L dataset with other 9 robust tracking methods under partial sequence tracking results visualization.
Detailed Description
In order to make the technical means, the creation features, the achievement purposes and the effects of the invention easy to understand, the invention is further clarified with the specific embodiments.
The invention provides a real-time long-term tracking method based on correlation filtering, which comprises the following steps:
manually selecting a tracking target, and training a motion correlation filter w on the target by using a formula (4);
w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ - y(m,n)|^2 + λ‖w‖^2   (4)
where φ denotes the mapping to kernel space; a Gaussian label is given to the training image according to the shift amount (the smaller the shift amount, the closer the label value is to 1, otherwise the closer to 0); λ is a regularization parameter;
finding the optimal scale β_i of the target using equation (8) and entering the next frame, where λ = 10^-4 and the Gaussian kernel width σ = 0.1;
β* = argmax_{β_i} max F^{-1}( k̂^{x z_{β_i}} ⊙ α̂ )   (8)
recording the second-frame response value τ of the target;
establishing a new-frame target search area, whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame;
extracting the feature vector X of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale simultaneously; the position of the maximum of the response map f(z) is taken as the translation estimate of the target, and the β_i corresponding to the maximum response value is selected as the optimal scale of the target;
after the target is manually selected in the first frame, the correlation tracker regresses the circularly shifted versions of the input features to a Gaussian label function and locates the target by searching for the maximum on the response map. The basic assumption of these methods is that the circularly shifted versions of the input features approximate dense samples of the target object at different locations.
The defining characteristic of correlation filtering is its high speed. Given the initial target position, for each subsequent frame the filter is convolved with the image block near the position found in the previous frame, and the output corresponds to a gray-scale response map. The gray scale reflects the degree of correlation: the higher the gray scale, the greater the correlation, and the position with the greatest gray scale in the response map is the new position of the target. The input image and the correlation filter are transferred into the Fourier domain by the Fast Fourier Transform (FFT), where the correlation operation becomes a dot product, so computational efficiency is greatly improved, as shown in equation (1).
G=F⊙H* (1)
where F = F(f_im) and H = F(h) are the image f_im and the filter h transferred to the Fourier domain, and ⊙ and * are the dot-product operation and the complex conjugate, respectively.
G is then transformed back to the spatial domain with the inverse FFT, F^{-1}, yielding the response map. The computational complexity of the whole process is only O(P log P), where P is the number of pixels in the tracking window.
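As a concrete illustration of equation (1), the following minimal Python sketch (the names patch and filt are illustrative, not taken from the patent) performs the correlation as a Fourier-domain dot product and reads the new target position off the peak of the response map:

import numpy as np

def fft_correlate(patch, filt):
    # equation (1): G = F ⊙ H*, a dot product in the Fourier domain
    F = np.fft.fft2(patch)
    H = np.fft.fft2(filt)
    G = F * np.conj(H)
    # the inverse FFT brings G back to the spatial domain as the response map
    return np.real(np.fft.ifft2(G))

patch = np.random.rand(64, 64)   # image block near the previous position
filt = np.random.rand(64, 64)    # correlation filter
response = fft_correlate(patch, filt)
row, col = np.unravel_index(np.argmax(response), response.shape)  # peak = new position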
The method uses a motion correlation filter within the KCF tracker framework and, for scale, estimates the target scale with the scale-pool method of the SAMF (Scale Adaptive with Multiple Features) algorithm.
(1) Motion correlation filter
Assuming the size of the selected target frame is M × N, the number of samples is first increased by cyclic shift to obtain samples x_{m,n}, where (m, n) ∈ {0, 1, …, M-1} × {0, 1, …, N-1}.
The specific principle of the cyclic shift is as follows: let x = [x_1, x_2, …, x_n]^T be an n-dimensional column vector and P a permutation matrix that cyclically shifts x; the shifted sample is P^l x, so the sample set for training the classifier is {P^l x | l = 0, 1, …, n-1}, where the permutation matrix is:
P = [ 0 0 0 … 0 1
      1 0 0 … 0 0
      0 1 0 … 0 0
      ⋮ ⋮ ⋮    ⋱ ⋮
      0 0 0 … 1 0 ]   (2)
All shifts of the n × 1 vector x are combined into a circulant matrix X:
X = C(x) = [ x_1  x_2  x_3  …  x_n
             x_n  x_1  x_2  …  x_{n-1}
             ⋮                  ⋮
             x_2  x_3  x_4  …  x_1 ]   (3)
where the first row is the original vector x, the second row shifts the elements of x one position to the right, and so on. The purpose of the cyclic shift is to encode convolution through cyclic shifts of the vector. Owing to the cyclic property, the reference sample reappears every n shifts; equivalently, relative to the elements of x, the first half of the circulant matrix X can be seen as shifts in the positive direction and the second half as shifts in the negative direction.
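The cyclic-shift construction of equation (3) can be reproduced in a few lines of Python; the sketch below is for illustration only and uses numpy's roll to build the circulant matrix:

import numpy as np

def circulant(x):
    # row l is x cyclically shifted right by l positions; row 0 is x itself
    return np.stack([np.roll(x, l) for l in range(len(x))])

X = circulant(np.array([1, 2, 3, 4]))
# X[1] is [4, 1, 2, 3]: every element of x moved one position to the right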
After obtaining x_{m,n}, the mean square error between the training image and the regression target is minimized by ridge regression to obtain the motion correlation filter w ∈ R^{M×N}, as shown in equation (4):
w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ - y(m,n)|^2 + λ‖w‖^2   (4)
where φ denotes the mapping to kernel space; a Gaussian label is given to the training image according to the shift amount (the smaller the shift amount, the closer the label value is to 1, otherwise the closer to 0), and λ is the regularization parameter.
After kernel mapping and the discrete Fourier transform, the solution for w can be represented as a linear combination of the samples:
w = Σ_{m,n} α(m,n) φ(x_{m,n})
where the coefficient α can be determined by equation (5), using a Gaussian kernel:
α̂ = ŷ / (k̂^{xx} + λ)   (5)
The kernel mapping φ is defined implicitly through the kernel function κ(x, x′) = ⟨φ(x), φ(x′)⟩; in equation (5), k̂^{xx} denotes the discrete Fourier transform of the kernel correlation of x with itself.
When processing the next frame, the filter w performs correlation operation on the image blocks with the size of M × N and near the target position of the previous frame, and obtains a response map after performing inverse discrete fourier transform to the spatial domain, which is calculated by formula (6):
f(z) = F^{-1}( k̂^{xz} ⊙ α̂ )   (6)
where k^{xz} has elements k_i = κ(x, z_i), z_i is the training sample obtained in the new frame, and x is the target model obtained from the previous frame. The position at which f(z) attains its maximum is the target position in the new frame.
(2) Scale estimation
Since the kernel correlation function only needs dot products and vector norms, multiple channels can be applied to the image features. For the Gaussian kernel used in equation (5) under multi-feature fusion, the solution can be further derived as:
k^{xx'} = exp( -(1/δ^2) ( ‖x‖^2 + ‖x'‖^2 - 2 F^{-1}( Σ_c x̂_c ⊙ (x̂'_c)* ) ) )   (7)
where ‖x‖ is the norm of the vector x, x′ is the second feature vector, (x̂′_c)* is the complex conjugate of the Fourier transform of channel c, and δ is the Gaussian kernel bandwidth. The KCF algorithm determines the target center position of each frame by iterating (5)-(7).
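The training and detection steps of equations (5)-(7) can be sketched as follows. This is a minimal single-channel Python illustration, with delta playing the role of the kernel bandwidth; the function names and the division of labour are the editor's shorthand, not the patent's:

import numpy as np

def gaussian_correlation(x, z, delta=0.1):
    # kernel correlation k^{xz} of equation (7) for one feature channel
    xz = np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(z))))
    d2 = (np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * xz) / x.size
    return np.exp(-np.maximum(d2, 0) / delta ** 2)

def train(x, y, lam=1e-4):
    # equation (5): alpha_hat = y_hat / (k_hat^{xx} + lambda)
    # y is a Gaussian label image peaked at the target centre
    k = gaussian_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def detect(alpha_hat, x, z):
    # equation (6): response map f(z) = F^-1(k_hat^{xz} ⊙ alpha_hat)
    k = gaussian_correlation(x, z)
    return np.real(np.fft.ifft2(np.fft.fft2(k) * alpha_hat))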
Considering that KCF cannot adapt to scale changes of the target, the invention adopts the scale-pool method to further improve performance, with a scale pool [β_1 … β_i]. Several scale candidate regions are set and the response value of the target is obtained in each; the scale corresponding to the maximum response value, relative to the target in the previous frame, is taken as the target scale in the current frame.
β* = argmax_{β_i} max F^{-1}( k̂^{x z_{β_i}} ⊙ α̂ )   (8)
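A sketch of how the scale pool of equation (8) can be evaluated is given below; extract_patch and resize_nn are simplified stand-ins written for this illustration, detect is the helper sketched above, and the seven candidate scales are an assumption in the spirit of SAMF, not values fixed by the patent:

import numpy as np

def extract_patch(frame, center, size):
    # crop a size[1]-by-size[0] patch around center, clamped to the frame
    w, h = size
    x0 = int(np.clip(center[0] - w // 2, 0, frame.shape[1] - w))
    y0 = int(np.clip(center[1] - h // 2, 0, frame.shape[0] - h))
    return frame[y0:y0 + h, x0:x0 + w]

def resize_nn(img, shape):
    # nearest-neighbour resize so every scaled patch matches the template size
    ys = np.arange(shape[0]) * img.shape[0] // shape[0]
    xs = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(ys, xs)]

def best_scale(frame, center, base_size, alpha_hat, model_x,
               betas=(0.985, 0.99, 0.995, 1.0, 1.005, 1.01, 1.015)):
    peaks = []
    for b in betas:
        size = (max(1, int(base_size[0] * b)), max(1, int(base_size[1] * b)))
        z = resize_nn(extract_patch(frame, center, size), model_x.shape)
        peaks.append(detect(alpha_hat, model_x, z).max())  # response at scale b
    i = int(np.argmax(peaks))
    return betas[i], peaks[i]  # optimal scale and its maximum response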
Judging whether the latest continuous 5 frames meet the occlusion criterion condition:
1)Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ
2)sum(Y<θ·d·τ)≥2,θ<1
where y(i), an element of Y, is the response value of the i-th frame and θ is a coefficient; the operator sum(·) counts the number of elements of Y with y(i) < θ·d·τ; when the response values of five consecutive frames meet both conditions, target tracking is judged to have failed, tracking is stopped and the re-detection mechanism is started;
the target tracking result of the correlation filter depends on the position of the maximum response value. If the target is intact and unaffected by the environment, the response map is clear with a highlighted white peak; otherwise it is dim and fuzzy, for example when the target is occluded. When occlusion begins and the target is not yet completely occluded, the filter may still locate the target according to the previous training results; however, as time passes, the occluded area gradually grows, the contamination of the filter deepens, and finally the contaminated filter can no longer re-acquire the target when it exits the occlusion, so tracking fails.
The response value is closely related to target tracking: its fluctuation reflects the quality of the tracking process. If the response value drops sharply over a period of time compared with the second-frame response value, target tracking may have failed, as shown in fig. 1 and 2. Since the tracker realizes the positioning and scale estimation of the target through the response value, the reliability of the tracking result can be characterized by the response value. To improve the effectiveness of the criterion, the criterion of the invention is divided into two stages. The first stage detects when the response value of the target drops sharply over a period of time. However, reaching the first-stage condition does not necessarily mean target tracking has failed; it merely means tracking may fail. To be more rigorous, the task of the second stage is to find, among the first-stage values, response values that have dropped even more severely. Here, the invention uses the response values of 5 consecutive frames to determine whether the target is occluded or tracking has failed. The criterion is summarized as follows:
Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ (9)
sum(Y<θ·d·τ)≥2,θ<1 (10)
where y(i), an element of Y, is the response value of the i-th frame and θ is a coefficient; the operator sum(·) counts the number of elements of Y with y(i) < θ·d·τ. When the response values of five consecutive frames meet the above conditions, the target is considered occluded; tracking is then stopped and the re-detection mechanism is started. Besides occlusion, other factors such as illumination change, scale change and in-plane rotation can also cause a sharp drop in the response values, so the proposed criterion can still identify tracking failures caused by these other attributes.
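Criteria (9) and (10) translate directly into code; the minimal sketch below uses the coefficient values d = 0.3 and θ = 0.7 that the detailed embodiment gives later:

import numpy as np

def tracking_failed(last5, tau, d=0.3, theta=0.7):
    Y = np.asarray(last5)                      # y(1)..y(5), the latest responses
    stage1 = np.all(Y < d * tau)               # eq. (9): all five drop sharply
    stage2 = np.sum(Y < theta * d * tau) >= 2  # eq. (10): at least two drop severely
    return stage1 and stage2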
In order to improve the accuracy and efficiency of detection, the invention provides a search strategy based on local and global regions. When the criterion is triggered, the tracker records the position at which the target was lost, calculates the average motion displacement of the target over the latest 5 frames, and correspondingly uses a detection method based on a local area or a detection method based on the global area according to the size of the average motion displacement.
Further, in the present invention, when the average motion displacement is less than or equal to 15 pixels, local search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) with the width W_occ and height H_occ of the target frame as reference, construct a search area centered on those coordinates, S_search = W_search × H_search = A·W_occ × B·H_occ, where W_search and H_search are the width and height of the entire search area S_search, and A and B are coefficients for the width and height respectively; the larger the values of A and B, the larger the search area;
3) creating a sliding window with the same size as the target bounding box, circularly shifting along the x direction and the y direction, and extracting image characteristics in the window;
4) the step sizes of the sliding in the x and y directions are as follows:
Δx_step = (W_search - W_occ)/M   (11)
Δy_step = (H_search - H_occ)/N   (12)
where M and N are positive integers; this also means that the entire search area S_search contains a total of (M+1) × (N+1) sliding windows. Considering that the location of a small target is more random than that of a general target, the invention moderately increases the values of A, B, M and N when searching for a small target.
In the re-detector, the correlation filtering operation is implemented on each sliding window. If the response value of the target corresponding to a bounding box reaches the threshold ν·τ, the tracker is reinitialized with the detection result. However, environmental disturbances and changes of the target itself greatly influence the re-detector: factors such as illumination variation, background noise and fast motion make the response value corresponding to a sliding window relatively small. In order to effectively detect the potential target, the invention sets the following constraint condition on the detection result:
Max(R)>ντ (14)
the maximum response value is then compared with the set threshold; if it is greater than the threshold, it is adopted as the final detection result and the tracker is restarted to continue working; otherwise the next frame is entered for detection until the correct target is detected.
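The local search can be sketched as follows, reusing the detect and extract_patch helpers from the earlier sketches; the defaults A = B = 4, M = N = 10 and ν = 0.3 follow the parameter settings given in the embodiment, and the acceptance test is constraint (14):

import numpy as np

def local_redetect(frame, lost_xy, w_occ, h_occ, alpha_hat, model_x, tau,
                   A=4, B=4, M=10, N=10, nu=0.3):
    w_s, h_s = A * w_occ, B * h_occ              # search area S_search
    dx = (w_s - w_occ) / M                       # eq. (11)
    dy = (h_s - h_occ) / N                       # eq. (12)
    x0, y0 = lost_xy[0] - w_s / 2, lost_xy[1] - h_s / 2
    best, best_center = -np.inf, None
    for i in range(M + 1):                       # (M+1) x (N+1) windows in total
        for j in range(N + 1):
            cx = x0 + i * dx + w_occ / 2
            cy = y0 + j * dy + h_occ / 2
            z = extract_patch(frame, (cx, cy), (w_occ, h_occ))
            r = detect(alpha_hat, model_x, z).max()
            if r > best:
                best, best_center = r, (cx, cy)
    # constraint (14): accept only if the best response clears nu * tau
    return best_center if best > nu * tau else None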
When the average motion displacement is larger than 15 pixels, global search is adopted, and the specific implementation steps are as follows:
1) first, the coordinates (x, y) of the object when it is lost and the width W of the object frame are determinedoccAnd height Hocc
2) Performing target detection on subsequently input images by using an edgeBox algorithm to form N' target proposals, and recording the width W of each proposali'and Width H'i′;
3) The N' target proposals are screened and filtered, and the proposals meeting the requirements are screened according to the following conditions
1/1.2×Wocc×Hocc<Wi′×H′i<1.2×Wocc×Hocc (13)
4) extract image features within the screened proposals to form a candidate sample set;
for each candidate sample, perform correlation filtering using equations (6) and (8) in sequence and take the maximum response value; if the maximum response value is greater than ε·τ, the corresponding target frame is output as a new initial condition to restart the tracker; otherwise the next frame is entered for detection until the target is detected.
In the present invention, the appearance of the target changes during tracking due to rotation, deformation and the like. Therefore, the target template should be updated during tracking to obtain strong performance. If the target template is updated too frequently, it is easily corrupted by noise; conversely, if it is updated too slowly, it cannot capture the normal appearance changes of the target. A suitable update scheme is therefore crucial for the tracker. To this end, the method of the invention further comprises, during tracking, using conditional expression (15) for each frame to determine whether the template needs to be updated;
the update scheme is as follows:
x̂_t = (1 - η) x̂_{t-1} + η x_t
α̂_t = (1 - η) α̂_{t-1} + η α_t   (15)
where η is the learning rate; the above steps are repeated until the image sequence ends.
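The linear-interpolation update can be written in two lines; η = 0.02 below is an illustrative value, since the patent states only that η is the learning rate:

def update_model(model_x, alpha_hat, new_x, new_alpha_hat, eta=0.02):
    # blend the stored target model and dual coefficients with the new frame's
    model_x = (1 - eta) * model_x + eta * new_x
    alpha_hat = (1 - eta) * alpha_hat + eta * new_alpha_hat
    return model_x, alpha_hat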
For a better understanding of the technical solution of the present invention, it is further described below with reference to the accompanying drawings.
(1) Constructing a theoretical frame of the anti-occlusion target tracking method based on the relevant filtering as shown in FIG. 1;
(2) manually selecting a tracking target, training a motion correlation filter w on the target using formula (4), and finding the optimal scale β_i of the target using formula (8), then entering the next frame; in the invention, λ = 10^-4 and the Gaussian kernel width σ = 0.1;
(3) recording the second-frame response value τ of the target;
(4) establishing a new-frame target search area, whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame.
(5) extracting the feature vector x of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale. The position of the maximum of the response map is chosen as the translation estimate of the target; at the same time, the β_i corresponding to the maximum response value is selected as the optimal scale of the target.
(6) judging whether the latest 5 consecutive frames meet the occlusion criterion of the invention, namely:
1)Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ
2)sum(Y<θ·d·τ)≥2,θ<1
in the present invention, condition 1) is the basic condition; if, on top of it, condition 2) shows that the response value continues to decrease severely, it is determined that tracking has failed. Here d = 0.3 and θ = 0.7.
(7) if the target is judged lost, recording the position at which the target was lost and the average motion displacement of the target over the nearest 5 frames, and detecting each subsequently input image frame with the corresponding re-detection mode. Here A = B = 4, M = N = 10, and the number of proposals N′ = 200.
(8) the detection threshold coefficient is set to ν = 0.3.
(9) for each candidate sample, performing correlation filtering using equations (6) and (8) in sequence and taking the maximum response value; if the maximum response value is greater than ε·τ, the corresponding target frame is output as a new initial condition to restart the tracker; otherwise the next frame is entered for detection until the target is detected. The final detection result is shown in fig. 5. After the target frame of the current frame is obtained, the next frame is entered.
(10) during tracking, the appearance of the target may change due to rotation, deformation, etc. For each frame, the invention uses conditional expression (15) to determine whether the template needs to be updated. The update scheme is as follows:
x̂_t = (1 - η) x̂_{t-1} + η x_t
α̂_t = (1 - η) α̂_{t-1} + η α_t   (15)
where η is the learning rate. The above steps are repeated until the image sequence ends.
(11) To verify the effectiveness of the invention, the proposed algorithm (Ours) is compared with other advanced trackers on the OTB dataset, the TC-128 dataset and the UAV20L dataset. The comparison objects on OTB are 9 advanced trackers: KCF, DSST, LCT, MEEM, SAMF, DLSSVM, Stacke, LMCF and ACFN. The comparison objects on TC-128 are 11 advanced trackers: MEEM, SAMF, Struck, ASLA, KCF, MIL, DCF-CA, OCT-KCF, FCT, L1APG and LC-CF. The comparison objects on UAV20L are 14 advanced trackers: DSLS, MEEM, SRDCF, MUSTER, Struck, SAMF, DSST, TLD, DCF, KCF, CSK, MOSSE, ASLA and IVT. The experimental environment is an Intel Core i5 2.3 GHz CPU with 8 GB RAM, MATLAB 2017b.
(12) To evaluate the overall performance of the tracker, the algorithm of the invention is evaluated on the public Object Tracking Benchmark (OTB) dataset. The OTB dataset contains two groups: (1) OTB-50 with 50 sequences and (2) OTB-100 with 100 sequences. All sequences are annotated with 11 attributes, covering challenging factors including scale change, occlusion, illumination change, motion blur, deformation, fast motion, out-of-plane rotation, background clutter, out-of-view, in-plane rotation and low resolution. Two indexes of the benchmark are used to evaluate tracking performance: the overlap success rate and the distance precision.
(13) In fig. 6 and 7, it can be seen that the distance accuracy rate and the overlapping success rate of the tracker of the present invention (Ours) rank first on the OTB data set, which fully demonstrates the effectiveness of the algorithm proposed by the present invention.
(14) For the 11 challenge attributes, it can be seen from fig. 8 to fig. 18 that, on the distance precision index, the algorithm of the invention (Ours) ranks first among the five challenge attributes of scale change, out-of-view, occlusion, low resolution and deformation, and ranks second under the challenge attributes of out-of-plane rotation, in-plane rotation and background clutter.
(15) For the 11 challenge attributes, as can be seen from fig. 19 to fig. 29, on the overlap success rate index the algorithm of the invention (Ours) ranks first in the seven challenge attributes of out-of-plane rotation, occlusion, motion blur, low resolution, in-plane rotation, deformation and background clutter, and ranks second in the three challenge attributes of out-of-view, fast motion and scale change. Therefore, the proposed algorithm not only handles target occlusion well, but also effectively alleviates the tracking drift caused by other factors.
(16) From fig. 30 to fig. 31, it can be seen that the tracker of the present invention (Ours) ranks first in distance accuracy and overlap success on the TC-128 dataset, which fully demonstrates the effectiveness of the algorithm proposed by the present invention.
(17) As can be seen from fig. 32 to 33, the tracker of the present invention (Ours) ranked first in distance accuracy and overlap success on the UAV20L data set, which fully demonstrates the effectiveness of the algorithm proposed by the present invention.
(18) From the comparison of the results in fig. 34, only the tracker of the invention (Ours) works well in the sequences Busstation-ce1, Carchase1 and Face-ce, while the other trackers lose the target. It is worth noting that there are many similar faces in the Face-ce sequence, but the algorithm of the invention does not re-detect onto a wrong target, which suggests that the threshold coefficient plays a significant role in resisting interference from similar objects. As for the sequences Motorbike-ce, Panda and Toyplane, the proposed tracker stably tracks the target throughout, while most of the other trackers cannot continue tracking after the target is occluded.
(19) As can be seen from the comparison of the long-term tracking results in fig. 35, the proposed tracker has a significant advantage over the other trackers. In the Bike, Car16 and Person7 sequences, the appearance and trajectory of the target change dramatically, and the tracker of the invention (Ours) adapts well to these changes thanks to its efficient model-update strategy and target re-detection mechanism, whereas the other trackers fail to update their models in time, resulting in tracking drift or tracking failure. In the sequences Group1, Group2 and Group3 the proposed tracker performs very well, while the other trackers perform less well; in particular, in Group2 and Group3, owing to interference factors such as occlusion and deformation of the target, only the tracker of the invention tracks the target robustly, and the other trackers fail.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (4)

1. A real-time long-term tracking method based on correlation filtering, the method comprising:
manually selecting a tracking target, and training a motion correlation filter w on the target by using a formula (4);
w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ - y(m,n)|^2 + λ‖w‖^2   (4)
where φ denotes the mapping to kernel space; a Gaussian label is given to the training image according to the shift amount (the smaller the shift amount, the closer the label value is to 1, otherwise the closer to 0); λ is a regularization parameter;
finding the optimal scale β_i of the target using equation (8) and entering the next frame, where λ = 10^-4 and the Gaussian kernel width σ = 0.1;
β* = argmax_{β_i} max F^{-1}( k̂^{x z_{β_i}} ⊙ α̂ )   (8)
recording the second-frame response value τ of the target;
establishing a new-frame target search area, whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame;
extracting the feature vector X of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale simultaneously; the position of the maximum of the response map f(z) is taken as the translation estimate of the target, and the β_i corresponding to the maximum response value is selected as the optimal scale of the target;
judging whether the latest 5 consecutive frames meet the occlusion criterion:
1)Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ
2)sum(Y<θ·d·τ)≥2,θ<1
where y(i), an element of Y, is the response value of the i-th frame and θ is a coefficient; the operator sum(·) counts the number of elements of Y with y(i) < θ·d·τ; when the response values of five consecutive frames meet both conditions, target tracking is judged to have failed, tracking is stopped and the re-detection mechanism is started;
the re-detection mechanism specifically records the position at which the target was lost and the average motion displacement of the target over the latest 5 frames, and correspondingly uses a detection method based on a local area or a detection method based on the global area according to the size of the average motion displacement.
2. The real-time long-term tracking method based on correlation filtering according to claim 1, wherein when the average motion displacement is less than or equal to 15 pixels, local search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) with the width W_occ and height H_occ of the target frame as reference, construct a search area centered on those coordinates, S_search = W_search × H_search = A·W_occ × B·H_occ, where W_search and H_search are the width and height of the entire search area S_search, and A and B are coefficients for the width and height respectively; the larger the values of A and B, the larger the search area;
3) creating a sliding window with the same size as the target bounding box, circularly shifting along the x direction and the y direction, and extracting image characteristics in the window;
4) the step sizes of the sliding in the x and y directions are as follows:
Δx_step = (W_search - W_occ)/M   (11)
Δy_step = (H_search - H_occ)/N   (12)
wherein M and N are positive integers;
when the average motion displacement is larger than 15 pixels, global search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) perform target detection on each subsequently input image using the EdgeBox algorithm to form N′ target proposals, and record the width W′_i and height H′_i of each proposal;
3) screen and filter the N′ target proposals, keeping the proposals that satisfy the condition
1/1.2 × W_occ × H_occ < W′_i × H′_i < 1.2 × W_occ × H_occ   (13)
4) extract image features within the screened proposals to form a candidate sample set;
for each candidate sample, perform correlation filtering using equations (6) and (8) in sequence and take the maximum response value; if the maximum response value is greater than ε·τ, the corresponding target frame is output as a new initial condition to restart the tracker; otherwise the next frame is entered for detection until the target is detected.
3. The correlation filtering based real-time long-term tracking method according to claim 2, wherein the correlation filtering operation is implemented on each sliding window, and the constraint condition set for the detection result is:
Max(R)>ντ (14)
the maximum response value is then compared with the set threshold; if it is greater than the threshold, it is adopted as the final detection result and the tracker is restarted to continue working; otherwise the next frame is entered for detection until the correct target is detected.
4. The correlation filtering based real-time long-term tracking method according to claim 1, further comprising: during tracking, for each frame, using conditional expression (15) to determine whether the template needs to be updated; the update scheme is as follows:
x̂_t = (1 - η) x̂_{t-1} + η x_t
α̂_t = (1 - η) α̂_{t-1} + η α_t   (15)
where η is the learning rate; the above steps are repeated until the image sequence ends.
CN202011519370.8A 2020-12-21 2020-12-21 Real-time long-term tracking method based on correlation filtering Pending CN112561965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011519370.8A CN112561965A (en) 2020-12-21 2020-12-21 Real-time long-term tracking method based on correlation filtering


Publications (1)

Publication Number Publication Date
CN112561965A true CN112561965A (en) 2021-03-26

Family

ID=75030619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011519370.8A Pending CN112561965A (en) 2020-12-21 2020-12-21 Real-time long-term tracking method based on correlation filtering

Country Status (1)

Country Link
CN (1) CN112561965A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341819A (en) * 2017-05-09 2017-11-10 深圳市速腾聚创科技有限公司 Method for tracking target and storage medium
CN109299735A (en) * 2018-09-14 2019-02-01 上海交通大学 Anti-shelter target tracking based on correlation filtering
CN110599519A (en) * 2019-08-27 2019-12-20 上海交通大学 Anti-occlusion related filtering tracking method based on domain search strategy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Junhao: "Research on stable tracking methods for airborne ground moving targets under occlusion conditions", China Master's Theses Full-text Database (Engineering Science and Technology II), 15 June 2020 (2020-06-15), pages 45-47 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926693A (en) * 2021-04-12 2021-06-08 辽宁工程技术大学 Kernel correlation filtering algorithm for fast motion and motion blur
CN112926693B (en) * 2021-04-12 2024-05-24 辽宁工程技术大学 Nuclear related filtering method for fast motion and motion blur
CN113723190A (en) * 2021-07-29 2021-11-30 北京工业大学 Multi-target tracking method for synchronous moving target


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210326)