CN112561965A - Real-time long-term tracking method based on correlation filtering


Publication number
CN112561965A
Authority
CN
China
Prior art keywords
target
frame
tracking
occ
search
Prior art date
Legal status
Pending
Application number
CN202011519370.8A
Other languages
Chinese (zh)
Inventor
刘骏
齐春生
Current Assignee
Fuyang Qiangsong Aviation Technology Co ltd
Original Assignee
Fuyang Qiangsong Aviation Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Fuyang Qiangsong Aviation Technology Co ltd filed Critical Fuyang Qiangsong Aviation Technology Co ltd
Priority to CN202011519370.8A
Publication of CN112561965A

Classifications

    • G06T7/246 Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/10016 Image acquisition modality: Video; Image sequence
    • G06T2207/20024 Special algorithmic details: Filtering details
    • G06T2207/20081 Special algorithmic details: Training; Learning


Abstract

The invention relates to the technical field of target tracking, in particular to a real-time long-term tracking method based on correlation filtering. The method comprises: training a motion correlation filter w on a manually selected target; finding the optimal scale β_i of the target and entering the next frame; recording the second-frame response value τ of the target; establishing a new-frame target search area whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame; extracting the feature vector X of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale simultaneously; taking the position of the maximum response as the translation estimate of the target and selecting the β_i corresponding to the maximum response as the optimal scale of the target; and determining whether target tracking has failed, stopping tracking after a failure and starting a re-detection mechanism. The method provided by the invention meets the requirement of real-time tracking and handles problems such as target occlusion, drift and rotation well.

Description

Real-time long-term tracking method based on correlation filtering
Technical Field
The invention relates to the technical field of target tracking, in particular to a real-time long-term tracking method based on correlation filtering.
Background
Target tracking is one of the important research directions in the field of computer vision and has matured with the rapid development of sensor and artificial-intelligence technology. In industries such as human-computer interaction, video monitoring, intelligent navigation and unmanned driving, vision-based target tracking occupies an increasingly important position. Although target tracking has advanced greatly in recent years, robust, continuous and accurate tracking is still regarded as a difficult task owing to interference factors such as occlusion, deformation and illumination change.
Correlation filters with only short-term memory are greatly limited when dealing with problems such as occlusion or disappearance of the target, so many methods add long-term memory to the correlation filter to realize higher-level tracking tasks. The combination of a short-term tracker and a detector is a commonly used long-term tracker structure, first used in the TLD (Tracking-Learning-Detection) tracker. TLD pioneered running a memoryless optical-flow short-term tracker and a template-based detector in parallel. Another example originated in the ALIEN tracker, whose authors treated localization as matching local keypoint descriptors under a weak geometric model. Ma et al. proposed constructing a long-term tracker with KCF (Kernelized Correlation Filter) as the short-term tracker and a random-fern classifier as the detector. Likewise, Hong et al. combined a KCF tracker with a SIFT-based detector that is also used to detect occlusion. Dai et al. established a simplified long-term tracker using assisted re-detection, combining short-term components with a support-vector-machine classifier to construct the long-term component. Suha et al. decompose the object model into a grid of cells and learn an occlusion classifier for each cell. For multi-target tracking, Beyer et al. proposed a Bayes filter for target-loss detection and re-detection. Alan et al. designed a detector capable of effectively re-detecting an object in the whole image using a new DCF (Discriminative Correlation Filter) constrained filter-learning method; furthermore, they suggested using the correlation response value and the PSR (Peak-to-Sidelobe Ratio) to evaluate tracking failure. Dai et al. proposed an adaptive spatially regularized correlation filtering algorithm to optimize the filter weights and the spatial constraint matrix. Goutam et al. exploit the complementarity of deep and shallow features to improve robustness and accuracy, and their work plays an important role in exploiting the true potential of deep features. Recently, the inventors of the present application have noted that some approaches attempt to combine deep-learning-based proposal networks with correlation-filter tracking. Yang et al. proposed a new long-term correlation-filter tracking method that applies a DCF-based tracking model in cooperation with a new target-aware detector; they evaluate the reliability of the tracking result with an enhanced PSR and, in case of tracking failure, perform target detection with a detector based on a Region Proposal Network (RPN). A robust real-time long-term tracking framework built from coarse-to-fine modules has also been proposed: the fine module pinpoints the tracked object in a local search region using an offline-trained regression and verification network, while the coarse module focuses on efficiently selecting the most likely region from a densely sampled sliding window.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a real-time long-term tracking method based on correlation filtering, which can solve problems such as occlusion, drift and rotation of the target.
In order to achieve the purpose, the invention adopts the following technical scheme:
a real-time long-term tracking method based on correlation filtering, the method comprising:
manually selecting a tracking target, and training a motion correlation filter w on the target by using a formula (4);
w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ - y(m,n)|^2 + λ‖w‖^2   (4)
where φ denotes the mapping to kernel space; a Gaussian label is given to the training image according to the shift amount (the smaller the shift amount, the closer the label value is to 1, otherwise the closer to 0); λ is a regularization parameter;
finding the optimal scale β_i of the target using equation (8) and entering the next frame, where λ = 10^-4 and the Gaussian kernel width σ = 0.1;
β* = argmax_{β_i} max F^{-1}( k̂^{x z_{β_i}} ⊙ α̂ )   (8)
recording the second-frame response value τ of the target;
establishing a new-frame target search area, whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame;
extracting the feature vector X of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale simultaneously; the position of the maximum of the response map f(z) is taken as the translation estimate of the target, and the β_i corresponding to the maximum response value is selected as the optimal scale of the target;
judging whether the latest 5 consecutive frames meet the occlusion criterion:
1)Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ
2)sum(Y<θ·d·τ)≥2,θ<1
where y(i), an element of Y, is the response value of the i-th frame and θ is a coefficient; the operator sum(·) counts the number of elements of Y with y(i) < θ·d·τ; when the response values of five consecutive frames meet both conditions, target tracking is judged to have failed, tracking is stopped and the re-detection mechanism is started;
the re-detection mechanism specifically records the position at which the target was lost and the average motion displacement of the target over the latest 5 frames, and correspondingly uses a detection method based on a local area or a detection method based on the global area according to the size of the average motion displacement.
In a further technical scheme, when the average motion displacement is less than or equal to 15 pixels, local search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) with the width W_occ and height H_occ of the target frame as reference, construct a search area centered on those coordinates, S_search = W_search × H_search = A·W_occ × B·H_occ, where W_search and H_search are the width and height of the entire search area S_search, and A and B are coefficients for the width and height respectively; the larger the values of A and B, the larger the search area;
3) creating a sliding window with the same size as the target bounding box, circularly shifting along the x direction and the y direction, and extracting image characteristics in the window;
4) the step sizes of the sliding in the x and y directions are as follows:
Δx_step = (W_search - W_occ)/M   (11)
Δy_step = (H_search - H_occ)/N   (12)
wherein M and N are positive integers;
when the average motion displacement is larger than 15 pixels, global search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) perform target detection on each subsequently input image using the EdgeBox algorithm to form N′ target proposals, and record the width W′_i and height H′_i of each proposal;
3) screen and filter the N′ target proposals, keeping the proposals that satisfy the condition
1/1.2 × W_occ × H_occ < W′_i × H′_i < 1.2 × W_occ × H_occ   (13)
4) extract image features within the screened proposals to form a candidate sample set;
for each candidate sample, perform correlation filtering using equations (6) and (8) in sequence and take the maximum response value; if the maximum response value is greater than ε·τ, the corresponding target frame is output as a new initial condition to restart the tracker; otherwise the next frame is entered for detection until the target is detected.
In a further technical solution, the correlation filtering operation is implemented on each sliding window, and the constraint condition set for the detection result is:
Max(R)>ντ (14)
the maximum response value is then compared with the set threshold; if it is greater than the threshold, it is adopted as the final detection result and the tracker is restarted to continue working; otherwise the next frame is entered for detection until the correct target is detected.
In a further aspect, the method further comprises: during tracking, for each frame, using conditional expression (15) to determine whether the template needs to be updated; the update scheme is as follows:
x̂_t = (1 - η) x̂_{t-1} + η x_t
α̂_t = (1 - η) α̂_{t-1} + η α_t   (15)
where η is the learning rate; the above steps are repeated until the image sequence ends.
Compared with the prior art, the invention has the following technical effects:
the real-time long-term tracking method based on the correlation filtering provided by the invention has the advantages that experimental results on the publicly available OTB reference data set, TC-128 data set and UAV20L data set show that the method has good performance on two indexes of distance precision and overlapping success rate. In addition, the invention can meet the requirement of real-time tracking and well solve the problems of target shielding, drifting, rotation and the like.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
Fig. 1 is a theoretical framework of the real-time long-term tracking method based on correlation filtering constructed by the invention.
Fig. 2 is a relationship between a response value and a reliability of a tracking result according to the present invention.
Fig. 3 is a typical example of a correlation filter response curve and the reliability of the result according to the present invention.
Fig. 4 is a schematic diagram of a local search strategy for re-detection according to the present invention.
Fig. 5 is a schematic diagram of a global search strategy for re-detection according to the present invention.
FIG. 6 is a comparison of the results of the present invention on the OTB100 data set and the tracking accuracy index of other 9 robust tracking methods.
FIG. 7 is a comparison of the results of the present invention on the OTB100 data set with the results of the other 9 robust tracking methods on the overlap success rate index.
Fig. 8 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the scale variation property.
FIG. 9 is a comparison of the tracking accuracy results of the present invention on OTB100 with other 9 robust tracking methods under out-of-view properties.
FIG. 10 is a comparison of the tracking accuracy results of the out-of-plane rotation of the present invention on OTB100 with other 9 robust tracking methods.
FIG. 11 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the occlusion property.
Fig. 12 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the motion blur property.
Fig. 13 is a comparison of the tracking accuracy of the present invention on OTB100 with the tracking accuracy results of other 9 robust tracking methods under the low resolution property.
Fig. 14 is a comparison of the tracking accuracy results of the invention on OTB100 with other 9 robust tracking methods under the illumination variation property.
Fig. 15 is a comparison of the tracking accuracy results of the present invention on OTB100 under in-plane rotation property with other 9 robust tracking methods.
Fig. 16 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the fast motion property.
FIG. 17 is a comparison of the tracking accuracy results of the invention on OTB100 with other 9 robust tracking methods under the distortion property.
Fig. 18 is a comparison of the tracking accuracy results of the invention on OTB100 and other 9 robust tracking methods under the background clutter property.
FIG. 19 is a comparison of tracking overlap ratio results of the present invention over OTB100 with other 9 robust tracking methods under out-of-view properties.
FIG. 20 is a comparison of tracking overlap ratio results of the present invention on OTB100 under out-of-plane rotation property with other 9 robust tracking methods.
FIG. 21 is a comparison of the tracking overlap ratio results of the invention on OTB100 under occlusion property with other 9 robust tracking methods.
FIG. 22 is a comparison of tracking overlap ratio results of the present invention on OTB100 under motion blur property with other 9 robust tracking methods.
FIG. 23 is a comparison of the tracking overlap ratio results of the present invention on OTB100 with other 9 robust tracking methods under the low resolution property.
FIG. 24 is a comparison of tracking overlap ratio results of the present invention on OTB100 under illumination variation property with other 9 robust tracking methods.
FIG. 25 is a comparison of the tracking overlap ratio results of the present invention on OTB100 under in-plane rotation property with other 9 robust tracking methods.
FIG. 26 is a comparison of the tracking overlap ratio results of the invention on OTB100 under the fast motion property with other 9 robust tracking methods.
FIG. 27 is a comparison of tracking overlap ratio results of the present invention on OTB100 under the distortion property with other 9 robust tracking methods.
FIG. 28 is a comparison of the tracking overlap ratio results of the present invention on OTB100 under the background clutter property with other 9 robust tracking methods.
FIG. 29 is a comparison of the tracking overlap ratio results of the invention on OTB100 and other 9 robust tracking methods under the scale variation property.
FIG. 30 is a comparison of the results of the present invention on the TC-128 data set with the results of the other 11 robust tracking methods on the tracking accuracy index.
FIG. 31 is a comparison of the results of the present invention on the TC-128 data set with the results of the other 11 robust tracking methods on the overlap success rate index.
Fig. 32 is a comparison of the results of the present invention on the UAV20L data set with the results on the tracking accuracy index of the other 14 robust tracking methods.
Fig. 33 is a comparison of the results of the present invention on the UAV20L data set with other 14 robust tracking methods on the overlap success rate index.
FIG. 34 is a comparison of the results of the present invention on the TC-128 data set with the results of other 9 robust tracking methods under the visualization of partial sequence tracking results.
Fig. 35 is a comparison of the results of the present invention on the UAV20L dataset with other 9 robust tracking methods under partial sequence tracking results visualization.
Detailed Description
In order to make the technical means, the creation features, the achievement purposes and the effects of the invention easy to understand, the invention is further clarified with the specific embodiments.
The invention provides a real-time long-term tracking method based on correlation filtering, which comprises the following steps:
manually selecting a tracking target, and training a motion correlation filter w on the target by using a formula (4);
w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ - y(m,n)|^2 + λ‖w‖^2   (4)
where φ denotes the mapping to kernel space; a Gaussian label is given to the training image according to the shift amount (the smaller the shift amount, the closer the label value is to 1, otherwise the closer to 0); λ is a regularization parameter;
finding the optimal scale β_i of the target using equation (8) and entering the next frame, where λ = 10^-4 and the Gaussian kernel width σ = 0.1;
β* = argmax_{β_i} max F^{-1}( k̂^{x z_{β_i}} ⊙ α̂ )   (8)
recording the second-frame response value τ of the target;
establishing a new-frame target search area, whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame;
extracting the feature vector X of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale simultaneously; the position of the maximum of the response map f(z) is taken as the translation estimate of the target, and the β_i corresponding to the maximum response value is selected as the optimal scale of the target;
after the target is manually selected in the first frame, the correlation tracker regresses the circularly shifted versions of the input features to a Gaussian label function and locates the target by searching for the maximum on the response map. The basic assumption of these methods is that the circularly shifted versions of the input features approximate dense samples of the target object at different locations.
The defining characteristic of correlation filtering is its high speed. Given the initial target position, for each subsequent frame the filter is convolved with the image block near the position found in the previous frame, and the output corresponds to a gray-scale response map. The gray scale reflects the degree of correlation: the higher the gray scale, the greater the correlation, and the position with the greatest gray scale in the response map is the new position of the target. The input image and the correlation filter are transferred into the Fourier domain by the Fast Fourier Transform (FFT), where the correlation operation becomes a dot product, so computational efficiency is greatly improved, as shown in equation (1).
G=F⊙H* (1)
where F = F(f_im) and H = F(h) are the image f_im and the filter h transferred to the Fourier domain, and ⊙ and * are the dot-product operation and the complex conjugate, respectively.
G is then transformed back to the spatial domain with the inverse FFT, F^{-1}, yielding the response map. The computational complexity of the whole process is only O(P log P), where P is the number of pixels in the tracking window.
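As a concrete illustration of equation (1), the following minimal Python sketch (the names patch and filt are illustrative, not taken from the patent) performs the correlation as a Fourier-domain dot product and reads the new target position off the peak of the response map:

import numpy as np

def fft_correlate(patch, filt):
    # equation (1): G = F ⊙ H*, a dot product in the Fourier domain
    F = np.fft.fft2(patch)
    H = np.fft.fft2(filt)
    G = F * np.conj(H)
    # the inverse FFT brings G back to the spatial domain as the response map
    return np.real(np.fft.ifft2(G))

patch = np.random.rand(64, 64)   # image block near the previous position
filt = np.random.rand(64, 64)    # correlation filter
response = fft_correlate(patch, filt)
row, col = np.unravel_index(np.argmax(response), response.shape)  # peak = new position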
The method uses a motion correlation filter within the KCF tracker framework and, for scale, estimates the target scale with the scale-pool method of the SAMF (Scale Adaptive with Multiple Features) algorithm.
(1) Motion correlation filter
Assuming the size of the selected target frame is M × N, the number of samples is first increased by cyclic shift to obtain samples x_{m,n}, where (m, n) ∈ {0, 1, …, M-1} × {0, 1, …, N-1}.
The specific principle of the cyclic shift is as follows: let x = [x_1, x_2, …, x_n]^T be an n-dimensional column vector and P a permutation matrix that cyclically shifts x; the shifted sample is P^l x, so the sample set for training the classifier is {P^l x | l = 0, 1, …, n-1}, where the permutation matrix is:
P = [ 0 0 0 … 0 1
      1 0 0 … 0 0
      0 1 0 … 0 0
      ⋮ ⋮ ⋮    ⋱ ⋮
      0 0 0 … 1 0 ]   (2)
All shifts of the n × 1 vector x are combined into a circulant matrix X:
X = C(x) = [ x_1  x_2  x_3  …  x_n
             x_n  x_1  x_2  …  x_{n-1}
             ⋮                  ⋮
             x_2  x_3  x_4  …  x_1 ]   (3)
where the first row is the original vector x, the second row shifts the elements of x one position to the right, and so on. The purpose of the cyclic shift is to encode convolution through cyclic shifts of the vector. Owing to the cyclic property, the reference sample reappears every n shifts; equivalently, relative to the elements of x, the first half of the circulant matrix X can be seen as shifts in the positive direction and the second half as shifts in the negative direction.
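The cyclic-shift construction of equation (3) can be reproduced in a few lines of Python; the sketch below is for illustration only and uses numpy's roll to build the circulant matrix:

import numpy as np

def circulant(x):
    # row l is x cyclically shifted right by l positions; row 0 is x itself
    return np.stack([np.roll(x, l) for l in range(len(x))])

X = circulant(np.array([1, 2, 3, 4]))
# X[1] is [4, 1, 2, 3]: every element of x moved one position to the right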
After obtaining x_{m,n}, the mean square error between the training image and the regression target is minimized by ridge regression to obtain the motion correlation filter w ∈ R^{M×N}, as shown in equation (4):
w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ - y(m,n)|^2 + λ‖w‖^2   (4)
where φ denotes the mapping to kernel space; a Gaussian label is given to the training image according to the shift amount (the smaller the shift amount, the closer the label value is to 1, otherwise the closer to 0), and λ is the regularization parameter.
After kernel mapping and the discrete Fourier transform, the solution for w can be represented as a linear combination of the samples:
w = Σ_{m,n} α(m,n) φ(x_{m,n})
where the coefficient α can be determined by equation (5), using a Gaussian kernel:
α̂ = ŷ / (k̂^{xx} + λ)   (5)
The kernel mapping φ is defined implicitly through the kernel function κ(x, x′) = ⟨φ(x), φ(x′)⟩; in equation (5), k̂^{xx} denotes the discrete Fourier transform of the kernel correlation of x with itself.
When processing the next frame, the filter w performs correlation operation on the image blocks with the size of M × N and near the target position of the previous frame, and obtains a response map after performing inverse discrete fourier transform to the spatial domain, which is calculated by formula (6):
f(z) = F^{-1}( k̂^{xz} ⊙ α̂ )   (6)
where k^{xz} has elements k_i = κ(x, z_i), z_i is the training sample obtained in the new frame, and x is the target model obtained from the previous frame. The position at which f(z) attains its maximum is the target position in the new frame.
(2) Scale estimation
Since the kernel correlation function only needs dot products and vector norms, multiple channels can be applied to the image features. For the Gaussian kernel used in equation (5) under multi-feature fusion, the solution can be further derived as:
k^{xx'} = exp( -(1/δ^2) ( ‖x‖^2 + ‖x'‖^2 - 2 F^{-1}( Σ_c x̂_c ⊙ (x̂'_c)* ) ) )   (7)
where ‖x‖ is the norm of the vector x, x′ is the second feature vector, (x̂′_c)* is the complex conjugate of the Fourier transform of channel c, and δ is the Gaussian kernel bandwidth. The KCF algorithm determines the target center position of each frame by iterating (5)-(7).
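The training and detection steps of equations (5)-(7) can be sketched as follows. This is a minimal single-channel Python illustration, with delta playing the role of the kernel bandwidth; the function names and the division of labour are the editor's shorthand, not the patent's:

import numpy as np

def gaussian_correlation(x, z, delta=0.1):
    # kernel correlation k^{xz} of equation (7) for one feature channel
    xz = np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(z))))
    d2 = (np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * xz) / x.size
    return np.exp(-np.maximum(d2, 0) / delta ** 2)

def train(x, y, lam=1e-4):
    # equation (5): alpha_hat = y_hat / (k_hat^{xx} + lambda)
    # y is a Gaussian label image peaked at the target centre
    k = gaussian_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def detect(alpha_hat, x, z):
    # equation (6): response map f(z) = F^-1(k_hat^{xz} ⊙ alpha_hat)
    k = gaussian_correlation(x, z)
    return np.real(np.fft.ifft2(np.fft.fft2(k) * alpha_hat))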
Considering that KCF cannot adapt to scale changes of the target, the invention adopts the scale-pool method to further improve performance, with a scale pool [β_1 … β_i]. Several scale candidate regions are set and the response value of the target is obtained in each; the scale corresponding to the maximum response value, relative to the target in the previous frame, is taken as the target scale in the current frame.
β* = argmax_{β_i} max F^{-1}( k̂^{x z_{β_i}} ⊙ α̂ )   (8)
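A sketch of how the scale pool of equation (8) can be evaluated is given below; extract_patch and resize_nn are simplified stand-ins written for this illustration, detect is the helper sketched above, and the seven candidate scales are an assumption in the spirit of SAMF, not values fixed by the patent:

import numpy as np

def extract_patch(frame, center, size):
    # crop a size[1]-by-size[0] patch around center, clamped to the frame
    w, h = size
    x0 = int(np.clip(center[0] - w // 2, 0, frame.shape[1] - w))
    y0 = int(np.clip(center[1] - h // 2, 0, frame.shape[0] - h))
    return frame[y0:y0 + h, x0:x0 + w]

def resize_nn(img, shape):
    # nearest-neighbour resize so every scaled patch matches the template size
    ys = np.arange(shape[0]) * img.shape[0] // shape[0]
    xs = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(ys, xs)]

def best_scale(frame, center, base_size, alpha_hat, model_x,
               betas=(0.985, 0.99, 0.995, 1.0, 1.005, 1.01, 1.015)):
    peaks = []
    for b in betas:
        size = (max(1, int(base_size[0] * b)), max(1, int(base_size[1] * b)))
        z = resize_nn(extract_patch(frame, center, size), model_x.shape)
        peaks.append(detect(alpha_hat, model_x, z).max())  # response at scale b
    i = int(np.argmax(peaks))
    return betas[i], peaks[i]  # optimal scale and its maximum response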
Judging whether the latest continuous 5 frames meet the occlusion criterion condition:
1)Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ
2)sum(Y<θ·d·τ)≥2,θ<1
where y(i), an element of Y, is the response value of the i-th frame and θ is a coefficient; the operator sum(·) counts the number of elements of Y with y(i) < θ·d·τ; when the response values of five consecutive frames meet both conditions, target tracking is judged to have failed, tracking is stopped and the re-detection mechanism is started;
the target tracking result of the correlation filter depends on the position of the maximum response value. If the target is intact and unaffected by the environment, the response map is clear with a highlighted white peak; otherwise it is dim and fuzzy, for example when the target is occluded. When occlusion begins and the target is not yet completely occluded, the filter may still locate the target according to the previous training results; however, as time passes, the occluded area gradually grows, the contamination of the filter deepens, and finally the contaminated filter can no longer re-acquire the target when it exits the occlusion, so tracking fails.
The response value is closely related to target tracking: its fluctuation reflects the quality of the tracking process. If the response value drops sharply over a period of time compared with the second-frame response value, target tracking may have failed, as shown in fig. 1 and 2. Since the tracker realizes the positioning and scale estimation of the target through the response value, the reliability of the tracking result can be characterized by the response value. To improve the effectiveness of the criterion, the criterion of the invention is divided into two stages. The first stage detects when the response value of the target drops sharply over a period of time. However, reaching the first-stage condition does not necessarily mean target tracking has failed; it merely means tracking may fail. To be more rigorous, the task of the second stage is to find, among the first-stage values, response values that have dropped even more severely. Here, the invention uses the response values of 5 consecutive frames to determine whether the target is occluded or tracking has failed. The criterion is summarized as follows:
Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ (9)
sum(Y<θ·d·τ)≥2,θ<1 (10)
where y(i), an element of Y, is the response value of the i-th frame and θ is a coefficient; the operator sum(·) counts the number of elements of Y with y(i) < θ·d·τ. When the response values of five consecutive frames meet the above conditions, the target is considered occluded; tracking is then stopped and the re-detection mechanism is started. Besides occlusion, other factors such as illumination change, scale change and in-plane rotation can also cause a sharp drop in the response values, so the proposed criterion can still identify tracking failures caused by these other attributes.
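Criteria (9) and (10) translate directly into code; the minimal sketch below uses the coefficient values d = 0.3 and θ = 0.7 that the detailed embodiment gives later:

import numpy as np

def tracking_failed(last5, tau, d=0.3, theta=0.7):
    Y = np.asarray(last5)                      # y(1)..y(5), the latest responses
    stage1 = np.all(Y < d * tau)               # eq. (9): all five drop sharply
    stage2 = np.sum(Y < theta * d * tau) >= 2  # eq. (10): at least two drop severely
    return stage1 and stage2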
In order to improve the accuracy and efficiency of detection, the invention provides a search strategy based on local and global regions. When the criterion is triggered, the tracker records the position at which the target was lost, calculates the average motion displacement of the target over the latest 5 frames, and correspondingly uses a detection method based on a local area or a detection method based on the global area according to the size of the average motion displacement.
Further, in the present invention, when the average motion displacement is less than or equal to 15 pixels, local search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) with the width W_occ and height H_occ of the target frame as reference, construct a search area centered on those coordinates, S_search = W_search × H_search = A·W_occ × B·H_occ, where W_search and H_search are the width and height of the entire search area S_search, and A and B are coefficients for the width and height respectively; the larger the values of A and B, the larger the search area;
3) creating a sliding window with the same size as the target bounding box, circularly shifting along the x direction and the y direction, and extracting image characteristics in the window;
4) the step sizes of the sliding in the x and y directions are as follows:
Δx_step = (W_search - W_occ)/M   (11)
Δy_step = (H_search - H_occ)/N   (12)
where M and N are positive integers; this also means that the entire search area S_search contains a total of (M+1) × (N+1) sliding windows. Considering that the location of a small target is more random than that of a general target, the invention moderately increases the values of A, B, M and N when searching for a small target.
In the re-detector, the correlation filtering operation is implemented on each sliding window. If the response value of the target corresponding to a bounding box reaches the threshold ν·τ, the tracker is reinitialized with the detection result. However, environmental disturbances and changes of the target itself greatly influence the re-detector: factors such as illumination variation, background noise and fast motion make the response value corresponding to a sliding window relatively small. In order to effectively detect the potential target, the invention sets the following constraint condition on the detection result:
Max(R)>ντ (14)
the maximum response value is then compared with the set threshold; if it is greater than the threshold, it is adopted as the final detection result and the tracker is restarted to continue working; otherwise the next frame is entered for detection until the correct target is detected.
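The local search can be sketched as follows, reusing the detect and extract_patch helpers from the earlier sketches; the defaults A = B = 4, M = N = 10 and ν = 0.3 follow the parameter settings given in the embodiment, and the acceptance test is constraint (14):

import numpy as np

def local_redetect(frame, lost_xy, w_occ, h_occ, alpha_hat, model_x, tau,
                   A=4, B=4, M=10, N=10, nu=0.3):
    w_s, h_s = A * w_occ, B * h_occ              # search area S_search
    dx = (w_s - w_occ) / M                       # eq. (11)
    dy = (h_s - h_occ) / N                       # eq. (12)
    x0, y0 = lost_xy[0] - w_s / 2, lost_xy[1] - h_s / 2
    best, best_center = -np.inf, None
    for i in range(M + 1):                       # (M+1) x (N+1) windows in total
        for j in range(N + 1):
            cx = x0 + i * dx + w_occ / 2
            cy = y0 + j * dy + h_occ / 2
            z = extract_patch(frame, (cx, cy), (w_occ, h_occ))
            r = detect(alpha_hat, model_x, z).max()
            if r > best:
                best, best_center = r, (cx, cy)
    # constraint (14): accept only if the best response clears nu * tau
    return best_center if best > nu * tau else None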
When the average motion displacement is larger than 15 pixels, global search is adopted, and the specific implementation steps are as follows:
1) first, the coordinates (x, y) of the object when it is lost and the width W of the object frame are determinedoccAnd height Hocc
2) Performing target detection on subsequently input images by using an edgeBox algorithm to form N' target proposals, and recording the width W of each proposali'and Width H'i′;
3) The N' target proposals are screened and filtered, and the proposals meeting the requirements are screened according to the following conditions
1/1.2×Wocc×Hocc<Wi′×H′i<1.2×Wocc×Hocc (13)
4) extract image features within the screened proposals to form a candidate sample set;
for each candidate sample, perform correlation filtering using equations (6) and (8) in sequence and take the maximum response value; if the maximum response value is greater than ε·τ, the corresponding target frame is output as a new initial condition to restart the tracker; otherwise the next frame is entered for detection until the target is detected.
In the present invention, the appearance of the target changes during tracking due to rotation, deformation and the like. Therefore, the target template should be updated during tracking to obtain strong performance. If the target template is updated too frequently, it is easily corrupted by noise; conversely, if it is updated too slowly, it cannot capture the normal appearance changes of the target. A suitable update scheme is therefore crucial for the tracker. To this end, the method of the invention further comprises, during tracking, using conditional expression (15) for each frame to determine whether the template needs to be updated;
the update scheme is as follows:
x̂_t = (1 - η) x̂_{t-1} + η x_t
α̂_t = (1 - η) α̂_{t-1} + η α_t   (15)
where η is the learning rate; the above steps are repeated until the image sequence ends.
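The linear-interpolation update can be written in two lines; η = 0.02 below is an illustrative value, since the patent states only that η is the learning rate:

def update_model(model_x, alpha_hat, new_x, new_alpha_hat, eta=0.02):
    # blend the stored target model and dual coefficients with the new frame's
    model_x = (1 - eta) * model_x + eta * new_x
    alpha_hat = (1 - eta) * alpha_hat + eta * new_alpha_hat
    return model_x, alpha_hat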
For a better understanding of the technical solution of the present invention, it is further described below with reference to the accompanying drawings.
(1) Constructing a theoretical frame of the anti-occlusion target tracking method based on the relevant filtering as shown in FIG. 1;
(2) manually selecting a tracking target, training a motion correlation filter w on the target using formula (4), and finding the optimal scale β_i of the target using formula (8), then entering the next frame; in the invention, λ = 10^-4 and the Gaussian kernel width σ = 0.1;
(3) recording the second-frame response value τ of the target;
(4) establishing a new-frame target search area, whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame.
(5) extracting the feature vector x of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale. The position of the maximum of the response map is chosen as the translation estimate of the target; at the same time, the β_i corresponding to the maximum response value is selected as the optimal scale of the target.
(6) judging whether the latest 5 consecutive frames meet the occlusion criterion of the invention, namely:
1)Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ
2)sum(Y<θ·d·τ)≥2,θ<1
in the present invention, condition 1) is the basic condition; if, on top of it, condition 2) shows that the response value continues to decrease severely, it is determined that tracking has failed. Here d = 0.3 and θ = 0.7.
(7) if the target is judged lost, recording the position at which the target was lost and the average motion displacement of the target over the nearest 5 frames, and detecting each subsequently input image frame with the corresponding re-detection mode. Here A = B = 4, M = N = 10, and the number of proposals N′ = 200.
(8) the detection threshold coefficient is set to ν = 0.3.
(9) for each candidate sample, performing correlation filtering using equations (6) and (8) in sequence and taking the maximum response value; if the maximum response value is greater than ε·τ, the corresponding target frame is output as a new initial condition to restart the tracker; otherwise the next frame is entered for detection until the target is detected. The final detection result is shown in fig. 5. After the target frame of the current frame is obtained, the next frame is entered.
(10) during tracking, the appearance of the target may change due to rotation, deformation, etc. For each frame, the invention uses conditional expression (15) to determine whether the template needs to be updated. The update scheme is as follows:
x̂_t = (1 - η) x̂_{t-1} + η x_t
α̂_t = (1 - η) α̂_{t-1} + η α_t   (15)
where η is the learning rate. The above steps are repeated until the image sequence ends.
(11) To verify the effectiveness of the invention, the proposed algorithm (Ours) is compared with other advanced trackers on the OTB dataset, the TC-128 dataset and the UAV20L dataset. The comparison objects on OTB are 9 advanced trackers: KCF, DSST, LCT, MEEM, SAMF, DLSSVM, Stacke, LMCF and ACFN. The comparison objects on TC-128 are 11 advanced trackers: MEEM, SAMF, Struck, ASLA, KCF, MIL, DCF-CA, OCT-KCF, FCT, L1APG and LC-CF. The comparison objects on UAV20L are 14 advanced trackers: DSLS, MEEM, SRDCF, MUSTER, Struck, SAMF, DSST, TLD, DCF, KCF, CSK, MOSSE, ASLA and IVT. The experimental environment is an Intel Core i5 2.3 GHz CPU with 8 GB RAM, MATLAB 2017b.
(12) To evaluate the overall performance of the tracker, the algorithm of the invention is evaluated on the public Object Tracking Benchmark (OTB) dataset. The OTB dataset contains two groups: (1) OTB-50 with 50 sequences and (2) OTB-100 with 100 sequences. All sequences are annotated with 11 attributes, covering challenging factors including scale change, occlusion, illumination change, motion blur, deformation, fast motion, out-of-plane rotation, background clutter, out-of-view, in-plane rotation and low resolution. Two indexes of the benchmark are used to evaluate tracking performance: the overlap success rate and the distance precision.
(13) In fig. 6 and 7, it can be seen that the distance accuracy rate and the overlapping success rate of the tracker of the present invention (Ours) rank first on the OTB data set, which fully demonstrates the effectiveness of the algorithm proposed by the present invention.
(14) For the 11 challenge attributes, it can be seen from fig. 8 to fig. 18 that, on the distance precision index, the algorithm of the invention (Ours) ranks first among the five challenge attributes of scale change, out-of-view, occlusion, low resolution and deformation, and ranks second under the challenge attributes of out-of-plane rotation, in-plane rotation and background clutter.
(15) For the 11 challenge attributes, as can be seen from fig. 19 to fig. 29, on the overlap success rate index the algorithm of the invention (Ours) ranks first in the seven challenge attributes of out-of-plane rotation, occlusion, motion blur, low resolution, in-plane rotation, deformation and background clutter, and ranks second in the three challenge attributes of out-of-view, fast motion and scale change. Therefore, the proposed algorithm not only handles target occlusion well, but also effectively alleviates the tracking drift caused by other factors.
(16) From fig. 30 to fig. 31, it can be seen that the tracker of the present invention (Ours) ranks first in distance accuracy and overlap success on the TC-128 dataset, which fully demonstrates the effectiveness of the algorithm proposed by the present invention.
(17) As can be seen from fig. 32 to 33, the tracker of the present invention (Ours) ranked first in distance accuracy and overlap success on the UAV20L data set, which fully demonstrates the effectiveness of the algorithm proposed by the present invention.
(18) From the comparison of the results in fig. 34, only the tracker of the invention (Ours) works well in the sequences Busstation-ce1, Carchase1 and Face-ce, while the other trackers lose the target. It is worth noting that there are many similar faces in the Face-ce sequence, but the algorithm of the invention does not re-detect onto a wrong target, which suggests that the threshold coefficient plays a significant role in resisting interference from similar objects. As for the sequences Motorbike-ce, Panda and Toyplane, the proposed tracker stably tracks the target throughout, while most of the other trackers cannot continue tracking after the target is occluded.
(19) As can be seen from the comparison of the long-term tracking results in fig. 35, the proposed tracker has a significant advantage over the other trackers. In the Bike, Car16 and Person7 sequences, the appearance and trajectory of the target change dramatically, and the tracker of the invention (Ours) adapts well to these changes thanks to its efficient model-update strategy and target re-detection mechanism, whereas the other trackers fail to update their models in time, resulting in tracking drift or tracking failure. In the sequences Group1, Group2 and Group3 the proposed tracker performs very well, while the other trackers perform less well; in particular, in Group2 and Group3, owing to interference factors such as occlusion and deformation of the target, only the tracker of the invention tracks the target robustly, and the other trackers fail.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (4)

1. A real-time long-term tracking method based on correlation filtering, the method comprising:
manually selecting a tracking target, and training a motion correlation filter w on the target by using a formula (4);
w = argmin_w Σ_{m,n} |⟨φ(x_{m,n}), w⟩ - y(m,n)|^2 + λ‖w‖^2   (4)
where φ denotes the mapping to kernel space; a Gaussian label is given to the training image according to the shift amount (the smaller the shift amount, the closer the label value is to 1, otherwise the closer to 0); λ is a regularization parameter;
finding the optimal scale β_i of the target using equation (8) and entering the next frame, where λ = 10^-4 and the Gaussian kernel width σ = 0.1;
β* = argmax_{β_i} max F^{-1}( k̂^{x z_{β_i}} ⊙ α̂ )   (8)
recording the second-frame response value τ of the target;
establishing a new-frame target search area, whose position is the same as that of the previous frame's target frame and whose area is 1.5 times that of the target frame;
extracting the feature vector X of the target, weighting it with a cosine window, and jointly using equations (6) and (8) to obtain the maximum response value under motion and scale simultaneously; the position of the maximum of the response map f(z) is taken as the translation estimate of the target, and the β_i corresponding to the maximum response value is selected as the optimal scale of the target;
judging whether the latest 5 consecutive frames meet the occlusion criterion:
1)Y=[y(1),y(2),y(3),y(4),y(5)]<d·τ
2)sum(Y<θ·d·τ)≥2,θ<1
where y(i), an element of Y, is the response value of the i-th frame and θ is a coefficient; the operator sum(·) counts the number of elements of Y with y(i) < θ·d·τ; when the response values of five consecutive frames meet both conditions, target tracking is judged to have failed, tracking is stopped and the re-detection mechanism is started;
the re-detection mechanism specifically records the position at which the target was lost and the average motion displacement of the target over the latest 5 frames, and correspondingly uses a detection method based on a local area or a detection method based on the global area according to the size of the average motion displacement.
2. The real-time long-term tracking method based on correlation filtering according to claim 1, wherein when the average motion displacement is less than or equal to 15 pixels, local search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) with the width W_occ and height H_occ of the target frame as reference, construct a search area centered on those coordinates, S_search = W_search × H_search = A·W_occ × B·H_occ, where W_search and H_search are the width and height of the entire search area S_search, and A and B are coefficients for the width and height respectively; the larger the values of A and B, the larger the search area;
3) creating a sliding window with the same size as the target bounding box, circularly shifting along the x direction and the y direction, and extracting image characteristics in the window;
4) the step sizes of the sliding in the x and y directions are as follows:
Δx_step = (W_search - W_occ)/M   (11)
Δy_step = (H_search - H_occ)/N   (12)
wherein M and N are positive integers;
when the average motion displacement is larger than 15 pixels, global search is adopted, with the following specific steps:
1) first, determine the coordinates (x, y) of the target when it was lost and the width W_occ and height H_occ of the target frame;
2) perform target detection on each subsequently input image using the EdgeBox algorithm to form N′ target proposals, and record the width W′_i and height H′_i of each proposal;
3) screen and filter the N′ target proposals, keeping the proposals that satisfy the condition
1/1.2 × W_occ × H_occ < W′_i × H′_i < 1.2 × W_occ × H_occ   (13)
4) extract image features within the screened proposals to form a candidate sample set;
for each candidate sample, perform correlation filtering using equations (6) and (8) in sequence and take the maximum response value; if the maximum response value is greater than ε·τ, the corresponding target frame is output as a new initial condition to restart the tracker; otherwise the next frame is entered for detection until the target is detected.
3. The correlation filtering based real-time long-term tracking method according to claim 2, wherein the correlation filtering operation is implemented on each sliding window, and the constraint condition set for the detection result is:
Max(R)>ντ (14)
the maximum response value is then compared with the set threshold; if it is greater than the threshold, it is adopted as the final detection result and the tracker is restarted to continue working; otherwise the next frame is entered for detection until the correct target is detected.
4. The correlation filtering based real-time long-term tracking method according to claim 1, further comprising: during tracking, for each frame, using conditional expression (15) to determine whether the template needs to be updated; the update scheme is as follows:
x̂_t = (1 - η) x̂_{t-1} + η x_t
α̂_t = (1 - η) α̂_{t-1} + η α_t   (15)
where η is the learning rate; the above steps are repeated until the image sequence ends.
CN202011519370.8A 2020-12-21 2020-12-21 Real-time long-term tracking method based on correlation filtering Pending CN112561965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011519370.8A CN112561965A (en) 2020-12-21 2020-12-21 Real-time long-term tracking method based on correlation filtering


Publications (1)

Publication Number Publication Date
CN112561965A true CN112561965A (en) 2021-03-26

Family

ID=75030619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011519370.8A Pending CN112561965A (en) 2020-12-21 2020-12-21 Real-time long-term tracking method based on correlation filtering

Country Status (1)

Country Link
CN (1) CN112561965A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341819A (en) * 2017-05-09 2017-11-10 深圳市速腾聚创科技有限公司 Method for tracking target and storage medium
CN109299735A (en) * 2018-09-14 2019-02-01 上海交通大学 Anti-shelter target tracking based on correlation filtering
CN110599519A (en) * 2019-08-27 2019-12-20 上海交通大学 Anti-occlusion related filtering tracking method based on domain search strategy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Junhao: "Research on stable tracking methods for airborne ground moving targets under occlusion conditions", China Master's Theses Full-text Database (Engineering Science and Technology II), 15 June 2020 (2020-06-15), pages 45-47 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926693A (en) * 2021-04-12 2021-06-08 辽宁工程技术大学 Kernel correlation filtering algorithm for fast motion and motion blur
CN112926693B (en) * 2021-04-12 2024-05-24 辽宁工程技术大学 Nuclear related filtering method for fast motion and motion blur
CN113723190A (en) * 2021-07-29 2021-11-30 北京工业大学 Multi-target tracking method for synchronous moving target


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210326)