CN109064492B - Context-dependent filtering video tracking method based on manifold regularization


Info

Publication number
CN109064492B
Authority
CN
China
Prior art date
Legal status
Active
Application number
CN201810826449.1A
Other languages
Chinese (zh)
Other versions
CN109064492A (en)
Inventor
宋慧慧
樊佳庆
张开华
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN201810826449.1A
Publication of CN109064492A
Application granted
Publication of CN109064492B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G06T 7/246 — Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/262 — Image analysis; analysis of motion using transform domain methods, e.g. Fourier domain methods
    • G06T 2207/10016 — Image acquisition modality: video; image sequence
    • G06T 2207/20056 — Transform domain processing: discrete and fast Fourier transform [DFT, FFT]


Abstract

The invention discloses a context-dependent filtering video tracking method based on manifold regularization. The base samples of the correlation filtering tracker are taken from a central region and its surrounding regions, which serve as the positive and negative samples respectively, and a manifold structure is imposed on the resulting set of cyclic samples while fast computation with the discrete Fourier transform is retained. By effectively exploiting the structural information among the target context regions and constraining the objective function with a manifold regularization term, the method suppresses noise in cluttered backgrounds and significantly improves the accuracy and robustness of the tracking algorithm.

Description

Context-dependent filtering video tracking method based on manifold regularization
Technical Field
The invention relates to a context-dependent filtering video tracking method based on manifold regularization. It belongs to the field of image processing and in particular relates to video tracking methods.
Background
Visual target tracking is a classic computer vision problem. It has been studied for many years but remains a challenging task, mainly because the target appearance is affected by many disturbing factors such as fast motion, cluttered backgrounds, and arbitrary appearance changes and deformations of the object. A robust appearance model plays an important role in practical tracking and has therefore become an important research topic in recent years.
Recently, discriminative correlation filter methods have demonstrated great success in visual target tracking. These methods learn a correlation filter from a series of cyclic samples whose special circulant structure enables fast computation with the discrete Fourier transform. However, the circulant structure also introduces unwanted boundary effects that result in inaccurate target representations and thus limit the discriminative power of the learned correlation filter. Recent work has partially mitigated the boundary effect. Another disadvantage of correlation filtering is that the object is described by a bounding box, which introduces cluttered background noise into the target region, especially when the object undergoes large non-rigid deformations, ultimately producing a series of sub-optimal samples. Over time this leads to drift.
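As background for how such trackers exploit the circulant structure, the following is a minimal single-channel sketch of correlation-filter training by ridge regression over all cyclic shifts, solved element-wise in the Fourier domain; the function name, the feature representation and the regularization weight lam are illustrative assumptions, not details of the patented method.

```python
import numpy as np

def train_correlation_filter(x, y, lam=1e-2):
    """Ridge regression over all cyclic shifts of the feature patch x,
    solved in the Fourier domain (MOSSE-style single-channel formulation).

    x : 2-D feature patch; y : desired response of the same size (e.g. a centred Gaussian).
    Returns the learned filter in the Fourier domain.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # All cyclic shifts are handled implicitly: the circulant data matrix is
    # diagonalised by the DFT, so the solution reduces to element-wise operations.
    return np.conj(X) * Y / (np.conj(X) * X + lam)
```

Because the circulant data matrix is diagonalised by the DFT, the whole ridge-regression solution collapses to element-wise operations on the spectra, which is what makes this family of trackers fast.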
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and to provide a video tracking method that effectively suppresses noise in cluttered backgrounds and improves the accuracy and robustness of the tracking algorithm.
In order to achieve the above object, the present invention provides a context-dependent filtering video tracking method based on manifold regularization, in which the base samples of the correlation filtering tracker are taken from a central region and its surrounding regions, serving as the positive and negative samples respectively, and a manifold structure is imposed on the series of cyclic samples while fast computation with the discrete Fourier transform is retained.
Further, the manifold structure is imposed by introducing a linear graph Laplacian regularization term into the correlation filtering objective function.
The graph Laplacian regularization term, written in matrix form, is f^T L f, where f is the vector of projections of the samples and L is the graph Laplacian matrix.
The context-dependent filtering video tracking method based on manifold regularization specifically comprises the following steps:
(1) Step one: first, the central region x_0 and the regions x_i around it are cropped out as the positive and negative samples, respectively. Both the positive and negative samples are cyclically shifted to form a series of virtual samples, denoted X_0 and X_i respectively. The cropped surrounding regions are the four regions x_i above, below, to the left and to the right of the central region x_0; the central region x_0 is determined by the tracking result of the previous frame.
(2) Step two: a graph Laplacian regularization term is then constructed to take the similarity between these samples into account: the closer two samples x_i and x_j are on the geometric structure of the high-dimensional feature space, the closer their corresponding projections f(x_i) and f(x_j) should be.
(3) Step three: the graph Laplacian regularization term constructed in the previous step is added to the objective function of the original context-aware correlation filtering tracking, yielding an improved objective function.
(4) Step four: a closed-form solution is obtained with the regularized least-squares regression solution method and is computed rapidly with the discrete Fourier transform, which completes the fast training.
(5) Step five: finally, a response map is rapidly calculated with the discrete Fourier transform, and the position of its maximum value is the tracking result. After the tracking of each frame is finished, the model is updated online.
The invention has the following beneficial effects: the context-dependent filtering video tracking method based on manifold regularization effectively exploits the structural information among the target context regions and constructs a manifold regularization term to constrain the objective function, which suppresses noise in cluttered backgrounds and significantly improves the accuracy and robustness of the tracking algorithm.
Drawings
FIG. 1 is a schematic flow chart of a context-dependent filtering video tracking method based on manifold regularization according to the present invention;
FIG. 2 is a schematic diagram showing the comparison of the video tracking effect of the video tracking method of the present invention with the four methods of RPT, DSST, KCF and Struck in the prior art.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further explained with reference to the accompanying drawings and detailed description.
The invention provides a context-dependent filtering video tracking method based on manifold regularization, which specifically comprises the following operation steps as shown in fig. 1:
(1) Step one: first, the central region x_0 and the regions x_i around it are cropped out as the positive and negative samples, respectively. Both the positive and negative samples are cyclically shifted to form a series of virtual samples, denoted X_0 and X_i respectively. The cropped surrounding regions are the four regions x_i above, below, to the left and to the right of the central region x_0; the central region x_0 is determined by the tracking result of the previous frame.
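As an illustration of step one, the sketch below crops the central patch x0 and its four context patches xi (above, below, left and right) from the current frame, centred on the bounding box estimated in the previous frame; the helper names and the border-clipping behaviour are assumptions made for illustration, not details taken from the patent.

```python
def crop_patch(frame, cx, cy, w, h):
    """Crop a w-by-h patch centred at (cx, cy), clipped to the image border.
    frame is a NumPy image array of shape (H, W) or (H, W, C)."""
    y1 = int(max(cy - h // 2, 0))
    y2 = int(min(cy + h // 2, frame.shape[0]))
    x1 = int(max(cx - w // 2, 0))
    x2 = int(min(cx + w // 2, frame.shape[1]))
    return frame[y1:y2, x1:x2]

def sample_context(frame, cx, cy, w, h):
    """Return the positive patch x0 and the four context patches xi cropped
    above, below, left and right of the target located in the previous frame."""
    x0 = crop_patch(frame, cx, cy, w, h)                    # positive sample
    offsets = [(0, -h), (0, h), (-w, 0), (w, 0)]            # up, down, left, right
    xi = [crop_patch(frame, cx + dx, cy + dy, w, h) for dx, dy in offsets]  # negative samples
    return x0, xi
```

The patches x0 and xi would then be converted to features, and their cyclic shifts (handled implicitly through the DFT) form the virtual sample sets X_0 and X_i.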
(2) Step two: a graph Laplacian regularization term is then constructed to take the similarity between these samples into account: the closer two samples x_i and x_j are on the geometric structure of the high-dimensional feature space, the closer their corresponding projections f(x_i) and f(x_j) should be. Concretely, the similarity W_ij between two samples x_i and x_j is computed with a Gaussian kernel applied to the distance between the samples, where σ is the decay parameter of the Gaussian kernel function.
(3) Step three: the graph Laplacian regularization term constructed in the previous step is written in matrix form as
(1/2) Σ_{i,j} W_ij (f(x_i) − f(x_j))² = f^T L f,
where f(x_i) and f(x_j) respectively denote the projections corresponding to the samples x_i and x_j, f = [f_1, f_2, ..., f_m]^T is the vector of projections, and L is the graph Laplacian matrix, calculated as L = D − W, where W is the similarity matrix whose entry W_ij is the similarity between the two samples x_i and x_j and D is the diagonal matrix with diagonal elements D_ii = Σ_j W_ij; (·)^T denotes the transpose.
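A minimal sketch of how the similarity matrix W and the graph Laplacian L = D − W of steps two and three can be assembled from the vectorised samples; the exact kernel expression exp(−‖x_i − x_j‖²/(2σ²)) and the bandwidth sigma are assumptions, since the patent states only that a Gaussian kernel with decay parameter σ is used.

```python
import numpy as np

def graph_laplacian(samples, sigma=1.0):
    """Build the Gaussian similarity matrix W and the graph Laplacian L = D - W.

    samples : list of feature patches [x0, x1, ..., xk] of identical size.
    """
    feats = [np.asarray(s, dtype=np.float64).ravel() for s in samples]  # vectorise
    m = len(feats)
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            d2 = np.sum((feats[i] - feats[j]) ** 2)
            W[i, j] = np.exp(-d2 / (2.0 * sigma ** 2))   # assumed Gaussian kernel
    D = np.diag(W.sum(axis=1))                           # D_ii = sum_j W_ij
    L = D - W
    return W, L
```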
The graph Laplacian regularization term is then added to the objective function of the original context-aware correlation filtering tracking, yielding an improved objective function in which the scalar y_i denotes the label corresponding to each sample x_i, W_ij denotes the similarity between two samples x_i and x_j, and λ_1, λ_2 denote regularization-term coefficients; the three regularization terms are each controlled by their respective coefficient.
The objective function is then written in matrix form, with the graph Laplacian matrix L = D − W assembled from the similarity matrix W and the diagonal degree matrix D defined above.
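For orientation, the improved objective can be written out under the assumption that the starting point is the prior-art context-aware correlation filter with coefficients λ_1 and λ_2, and that the graph Laplacian term enters with its own weight λ_3; the exact form and coefficients of the patented objective may differ from this reconstruction.

```latex
% Assumed reconstruction of the improved objective (not the patented formula):
% ridge regression on the cyclic target samples X_0, suppression of the k cyclic
% context samples X_i, and the graph Laplacian term with its own weight lambda_3.
\min_{\mathbf{w}}\;
    \bigl\lVert X_0\mathbf{w}-\mathbf{y}_0 \bigr\rVert_2^2
  + \lambda_1 \lVert \mathbf{w} \rVert_2^2
  + \lambda_2 \sum_{i=1}^{k} \bigl\lVert X_i\mathbf{w} \bigr\rVert_2^2
  + \lambda_3\, \mathbf{f}^{\mathsf{T}} L\,\mathbf{f},
\qquad
  \mathbf{f} = [\,f_1,\dots,f_m\,]^{\mathsf{T}},\quad f_i = \mathbf{w}^{\mathsf{T}} x_i .
```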
(4) Step four: a closed-form solution is obtained with the regularized least-squares regression solution method and is computed rapidly with the discrete Fourier transform. In the resulting Fourier-domain expression, ^ denotes the fast Fourier transform, ⊙ denotes the Hadamard (element-wise) product, (·)* denotes the complex conjugate, a_0 denotes the image block of the central region, a_i denotes the respective image blocks of the surrounding regions, y_0 denotes the labels of all cyclically shifted samples of the image block a_0, W_ij denotes the similarity between two samples x_i and x_j, and λ_1, λ_2 denote the regularization-term coefficients controlling the corresponding terms. Solving for w completes the fast training.
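Because the patented closed form is not reproduced here, the sketch below implements only the Fourier-domain closed form of the prior-art context-aware correlation filter (i.e. without the manifold/Laplacian contribution), using the same symbols a0, ai and y0, so that the structure of the fast training step is visible; it is a sketch under that assumption, not the patented formula.

```python
import numpy as np

def train_cacf(a0, ais, y0, lam1=1e-4, lam2=0.5):
    """Fourier-domain closed form of the prior-art context-aware correlation filter.

    a0  : feature patch of the central (target) region
    ais : list of feature patches of the surrounding context regions
    y0  : desired response for all cyclic shifts of a0 (e.g. a centred Gaussian)
    The manifold/graph-Laplacian contribution of the patented method is omitted.
    """
    A0 = np.fft.fft2(a0)
    Y0 = np.fft.fft2(y0)
    denom = np.conj(A0) * A0 + lam1
    for ai in ais:
        Ai = np.fft.fft2(ai)
        denom = denom + lam2 * np.conj(Ai) * Ai   # context suppression term
    return np.conj(A0) * Y0 / denom               # learned filter \hat{w}
```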
(5) Step five: finally, since the original detection function is f*(z) = w^T z, the response is rapidly calculated with the discrete Fourier transform, where ^ denotes the fast Fourier transform and ⊙ denotes the Hadamard product.
The position of the maximum in the response r is the tracking result. In order to use long-term information in the video sequence, a simple and effective updating strategy is adopted: after the tracking of each frame is finished, the model is updated online, where β is the online update parameter and the updated quantity is the filter at the current frame t.
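A sketch of step five under the same assumptions: the response map is computed with the FFT and its maximum gives the tracking result, and the model is refreshed by linear interpolation with rate β; the interpolation form of the update and the name update_model are assumptions, not details taken from the patent.

```python
import numpy as np

def detect(w_hat, z):
    """Response map r = real(IFFT(FFT(z) * w_hat)); the position of its maximum is the tracking result."""
    r = np.real(np.fft.ifft2(np.fft.fft2(z) * w_hat))
    row, col = np.unravel_index(np.argmax(r), r.shape)
    # The peak is read relative to the peak of the training label y0 to obtain the target translation.
    return r, (row, col)

def update_model(w_hat_prev, w_hat_curr, beta=0.01):
    """Assumed linear online update after each frame: keep (1 - beta) of the previous model."""
    return (1.0 - beta) * w_hat_prev + beta * w_hat_curr
```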
The tracking performance of the proposed method is compared with prior-art video tracking methods. The success rate curves on the OTB50 data set are shown in FIG. 2: among the five methods compared, the proposed method achieves the highest success rate score of 0.615, which demonstrates its effectiveness.
The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims (1)

1. A context-dependent filtering video tracking method based on manifold regularization, characterized in that the context-dependent filtering video tracking method comprises the following specific steps:
(1) Step one: first, the central region x_0 is cropped out as the positive sample and the regions x_i around it are cropped out as the negative samples; the positive sample is cyclically shifted to obtain a series of virtual samples denoted X_0, and the negative samples are cyclically shifted to obtain a series of virtual samples denoted X_i;
(2) Step two: constructing a graph Laplacian regularization term to take the similarity among the samples into account;
(3) Step three: adding the graph Laplacian regularization term constructed in the previous step to the context-aware correlation filtering tracking objective to obtain an improved objective function, wherein f(x_i) and f(x_j) respectively represent the projections corresponding to the samples x_i and x_j, f denotes the projection function, L is the graph Laplacian matrix calculated as L = D − W, W is the similarity matrix, and D is a diagonal matrix whose diagonal elements are D_ii = Σ_j W_ij; (·)^T represents the transpose; W_ij represents the similarity between two samples x_i and x_j; the scalar y_i represents the label corresponding to each sample x_i, and λ_1, λ_2 respectively represent regularization-term coefficients of the three regularization terms;
(4) Step four: obtaining a closed-form solution by the regularized least-squares regression solution method and computing it rapidly by the discrete Fourier transform, thereby completing the fast training; wherein ^ represents the fast Fourier transform, ⊙ indicates the Hadamard product, (·)* denotes the complex conjugate, a_0 represents the image block of the central region, a_i represents the respective image blocks of the surrounding regions, and y_0 denotes the labels of all cyclically shifted samples of the image block a_0;
(5) Step five: finally, a response map is calculated rapidly by the discrete Fourier transform, and the position of its maximum value is the tracking result; after the tracking of each frame is finished, the model is updated online.
CN201810826449.1A 2018-07-25 2018-07-25 Context-dependent filtering video tracking method based on manifold regularization Active CN109064492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810826449.1A CN109064492B (en) 2018-07-25 2018-07-25 Context-dependent filtering video tracking method based on manifold regularization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810826449.1A CN109064492B (en) 2018-07-25 2018-07-25 Context-dependent filtering video tracking method based on manifold regularization

Publications (2)

Publication Number Publication Date
CN109064492A CN109064492A (en) 2018-12-21
CN109064492B (en) 2022-04-01

Family

ID=64835474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810826449.1A Active CN109064492B (en) 2018-07-25 2018-07-25 Context-dependent filtering video tracking method based on manifold regularization

Country Status (1)

Country Link
CN (1) CN109064492B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106097381A (en) * 2016-05-27 2016-11-09 Beijing Institute of Technology Target tracking method based on manifold discriminant non-negative matrix factorization
CN107067410A (en) * 2016-05-27 2017-08-18 Beijing Institute of Technology Manifold-regularized correlation filtering target tracking method based on augmented samples
CN107610049A (en) * 2017-08-21 2018-01-19 Huaqiao University Image super-resolution method based on sparse regularization technique and weighted guided filtering

Also Published As

Publication number Publication date
CN109064492A (en) 2018-12-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant