CN107424175B - Target tracking method combined with space-time context information - Google Patents


Info

Publication number
CN107424175B
CN107424175B (granted publication); application CN201710596203.5A; application publication CN107424175A
Authority
CN
China
Prior art keywords
frame image
current frame
value
confidence
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710596203.5A
Other languages
Chinese (zh)
Other versions
CN107424175A (en)
Inventor
朱红 (Zhu Hong)
王道江 (Wang Daojiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201710596203.5A priority Critical patent/CN107424175B/en
Publication of CN107424175A publication Critical patent/CN107424175A/en
Application granted granted Critical
Publication of CN107424175B publication Critical patent/CN107424175B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of pattern recognition and computer vision and discloses a target tracking method that combines spatio-temporal context information, comprising the following steps: training an initial strong classifier on the first frame and learning the spatio-temporal context model needed for tracking the next frame; when a new frame arrives, evaluating a plurality of blocks of the search area with the trained strong classifier to obtain a first confidence matrix; then obtaining a confidence map function by integrating the spatio-temporal context information and using it to compute the confidence values of all blocks in the search area, giving a second confidence matrix; finally, linearly combining the two matrices with their corresponding weights to obtain a final confidence matrix and taking the block with the maximum confidence value in the final confidence matrix as the tracked target. By combining the spatio-temporal context information of the target with the online Boosting algorithm, the method achieves fast and robust tracking.

Description

Target tracking method combined with space-time context information
Technical Field
The invention belongs to the technical field of pattern recognition and computer vision, and particularly relates to a target tracking method combining spatiotemporal context information.
Background
Moving target tracking is one of the important research directions in the field of computer vision, with important applications in human-computer interaction, intelligent surveillance, medical imaging and other fields. Tracking algorithms have made great progress in recent years, but effectively handling the tracking drift caused by occlusion, rapid motion, illumination change, background clutter and similar factors remains very challenging.
In the online Boosting algorithm, when a new frame arrives, a strong classifier is used to separate the target from the background in the image and obtain the target area. When the target is occluded, however, the feature pool is updated with occluded features, so the feature pool becomes polluted and tracking drift eventually occurs.
Several improved online Boosting algorithms have been proposed to address these problems. Yan et al. propose an online Boosting algorithm based on sub-region classifiers, which divides the target region into several sub-regions, each corresponding to one strong classifier. During tracking, the feature pool of the strong classifier with the minimum confidence value is not updated, so that occluded features do not pollute the feature pool; however, the tracking effect is poor when the target scale changes.
Sun et al. propose an online Boosting algorithm with motion blob detection: when the confidence value of the tracking result falls below a lower threshold, moving objects in the search area are detected by motion blob detection and their confidence values are evaluated with the strong classifier until the confidence value exceeds the upper threshold. However, motion blob detection usually cannot detect distant moving objects, so the improvement is limited.
Wang et al. propose an online Boosting algorithm fused with occlusion perception: a background-feature classifier and a target-feature classifier are trained from a certain number of frames and used to perceive whether the target is occluded; if it is, the polluted positive samples are not collected to update the classifier. This, however, increases the complexity of the classifier, degrades the real-time performance of the online Boosting algorithm, and makes it easy to lose fast-moving targets.
Disclosure of Invention
In view of the above defects in the prior art, the invention aims to provide a target tracking method combining spatio-temporal context information, which can solve the tracking drift that occurs in the prior art when the target area is partially occluded or the target scale changes greatly.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme.
A method of target tracking in conjunction with spatiotemporal context information, the method comprising the steps of:
step 1, acquiring the first frame image of the video, calibrating the target area of the first frame image, and expanding the target area about its center to obtain a search area four times the size of the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples;
step 2, learning a spatial context model according to the first frame image, and taking the spatial context model as a learned space-time context model for tracking the next frame image;
step 3, obtaining the current frame image to be tracked and determining the initial search area of the current frame image, wherein the initial search area of the current frame image is centered on the target area of the previous frame image and is four times the size of the target area of the previous frame image; partitioning the initial search area of the current frame image into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size;
step 4, evaluating each subblock to be searched according to the strong classifier to obtain a first confidence value of each subblock to be searched, and forming a first confidence matrix;
step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; determining the central point of each subblock to be searched, and respectively obtaining a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix;
step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is a target area of the tracked current frame image;
step 7, determining a search area of the current frame image, wherein the search area of the current frame image takes a target area of the current frame image as a center, and the search area of the current frame image is four times of the target area of the current frame image; taking a target area of the current frame image as a positive sample, taking four corner areas of a search area of the current frame image as four negative samples respectively, and updating the strong classifier;
step 8, learning a space context model according to the current frame image, and determining a space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image;
step 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image;
step 10, repeatedly executing steps 3 to 9 until all video frames that need to be tracked have been processed.
The invention integrates spatio-temporal context information into the online Boosting target tracking algorithm, effectively alleviating the tracking drift and even tracking loss that the online Boosting algorithm suffers when the tracked target is partially or completely occluded, and achieves fast and robust tracking.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a target tracking method in combination with spatiotemporal context information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram showing the comparison between the tracking effect of the method of the present invention and the tracking effect of the two conventional methods.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical scheme of the invention exploits the fact that, between two adjacent frames of a video, the target does not change much and its position does not change abruptly, and that a specific relation exists between the target and its surrounding background; this relation helps distinguish the target from the background when the appearance of the target changes significantly. The invention introduces this spatio-temporal context information into the online Boosting algorithm.
Spatio-temporal context information: the temporal information is that the appearance and position of the target do not change abruptly between adjacent frames; the spatial information is that a specific relationship exists between the target and its surrounding background, and this relationship helps distinguish the target from the background. The combination of these two kinds of information about the target is the spatio-temporal context information.
The embodiment of the invention provides a target tracking method combined with space-time context information, as shown in figure 1, the method comprises the following steps:
step 1, acquiring the first frame image of the video, calibrating the target area of the first frame image, and expanding the target area about its center to obtain a search area four times the size of the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples.
In step 1, the positive sample and the four negative samples are used as training samples and a strong classifier is obtained from the training samples; this specifically comprises the following sub-steps:
(1a) letting the training sample set be S = {(x_i, y_i) | x_i ∈ X, y_i ∈ Y, i = 1, 2, ..., 5}, where X denotes the training sample space consisting of one positive sample and four negative samples, x_i denotes the i-th training sample in the training sample space, Y denotes the set of sample class labels with Y = {-1, 1}, and y_i denotes the class label of the i-th training sample; a label of 1 means the training sample is a positive sample, and a label of -1 means it is a negative sample;
setting M weak classifiers, the m-th weak classifier being h_m^weak, m = 1, ..., M, where M denotes the total number of weak classifiers;
the initial value of i is 1 and the initial value of m is 1; the sample importance weight λ is initialized to 1;
(1b) obtaining the i-th training sample and updating the parameters λ_m^corr and λ_m^wrong of the m-th weak classifier h_m^weak:
when the m-th weak classifier h_m^weak classifies the i-th training sample correctly, adding the sample importance weight λ to λ_m^corr to obtain the new value of λ_m^corr; otherwise, adding λ to λ_m^wrong to obtain the new value of λ_m^wrong; wherein λ_m^corr denotes the cumulative weight of samples classified correctly by the m-th weak classifier and λ_m^wrong denotes the cumulative weight of samples misclassified by the m-th weak classifier;
(1c) adding 1 to the value of i and repeating sub-step (1b) until i is greater than 5, obtaining the final parameters λ_m^corr and λ_m^wrong of the m-th weak classifier;
(1d) setting i to 1, adding 1 to the value of m, and repeating sub-steps (1b) to (1c) until m is greater than M, obtaining the final parameters of all M weak classifiers;
(1e) calculating the cumulative error rate of the m-th weak classifier, e_m = λ_m^wrong / (λ_m^corr + λ_m^wrong); letting m take the values 1, ..., M gives the cumulative error rates of the M weak classifiers;
(1f) taking the weak classifier with the minimum cumulative error rate as the n-th selector h_n^sel; the initial value of n is 1, n = 1, ..., N, where N denotes the total number of selectors; setting i to 1;
(1g) obtaining the i-th training sample and updating the sample importance weight λ with the n-th selector h_n^sel:
when the n-th selector h_n^sel classifies the i-th training sample correctly, multiplying λ by 1/(2 × (1 - e_n)) to obtain the new sample importance weight λ; otherwise, multiplying λ by 1/(2 × e_n) to obtain the new sample importance weight λ; wherein e_n denotes the cumulative error rate of the weak classifier corresponding to the n-th selector h_n^sel;
(1h) adding 1 to the value of i and repeating sub-step (1g) until i is greater than 5, obtaining the final new sample importance weight λ;
(1i) setting i to 1 and m to 1, adding 1 to the value of n, and, using the final new sample importance weight λ, repeating sub-steps (1b) to (1h) until n is greater than N, obtaining N selectors;
(1j) calculating the voting weight of the n-th selector, α_n = (1/2) × ln((1 - e_n) / e_n), where ln(·) denotes the natural logarithm; letting n take the values 1, ..., N gives the voting weights of the N selectors;
(1k) linearly combining the N selectors with their voting weights to obtain the strong classifier H_strong(x) = sign(Σ_{n=1}^{N} α_n × h_n^sel(x)), where sign(·) denotes the sign function.
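To make the procedure above concrete, the following Python/NumPy sketch mirrors sub-steps (1a) to (1k) under explicit assumptions: the training samples are plain feature vectors, the weak classifiers are randomly generated decision stumps, and every function and variable name (train_strong_classifier, weak_predict, lam_corr, and so on) is illustrative rather than taken from the patent, which does not specify its feature pool or weak learners in this text.

```python
import numpy as np

def train_strong_classifier(X, y, M=250, N=50, seed=0):
    """X: (5, d) array of features for 1 positive + 4 negative samples; y in {-1, +1}.

    Returns (strong, confidence): the signed strong classifier of sub-step (1k)
    and its real-valued response, later used as the first confidence value.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # illustrative weak classifiers: decision stumps on randomly chosen features
    feat = rng.integers(0, d, size=M)
    thr = np.array([rng.uniform(X[:, f].min(), X[:, f].max()) for f in feat])
    pol = rng.choice([-1, 1], size=M)

    def weak_predict(m, x):
        s = pol[m] * np.sign(x[feat[m]] - thr[m])
        return 1 if s == 0 else int(s)

    lam_corr = np.full(M, 1e-3)   # cumulative weight of correctly classified samples
    lam_wrong = np.full(M, 1e-3)  # cumulative weight of misclassified samples
    lam = 1.0                     # sample importance weight, sub-step (1a)
    selectors, alphas = [], []
    for _ in range(N):
        # sub-steps (1b)-(1d): update every weak classifier on every sample
        for i in range(X.shape[0]):
            for m in range(M):
                if weak_predict(m, X[i]) == y[i]:
                    lam_corr[m] += lam
                else:
                    lam_wrong[m] += lam
        # sub-steps (1e)-(1f): cumulative error rates, pick the best weak classifier
        e = lam_wrong / (lam_corr + lam_wrong)
        best = int(np.argmin(e))
        selectors.append(best)
        # sub-steps (1g)-(1h): re-weight the samples with the chosen selector
        for i in range(X.shape[0]):
            if weak_predict(best, X[i]) == y[i]:
                lam *= 1.0 / (2.0 * (1.0 - e[best]))
            else:
                lam *= 1.0 / (2.0 * e[best])
        # sub-step (1j): voting weight of the selector
        alphas.append(0.5 * np.log((1.0 - e[best]) / e[best]))

    def confidence(x):
        return sum(a * weak_predict(m, x) for a, m in zip(alphas, selectors))

    def strong(x):  # sub-step (1k): sign of the weighted vote
        return np.sign(confidence(x))

    return strong, confidence
```

Online Boosting trackers commonly draw their weak classifiers from a pool of Haar-like features computed on the image patch; the random stumps above only stand in for such a pool.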
Step 2, learning a spatial context model from the first frame image and taking it as the learned spatio-temporal context model for tracking the next frame image.
The technical scheme of the invention exploits the strength of the spatio-temporal-context-based tracking algorithm in handling occlusion and thus compensates for the weakness of the online Boosting algorithm under occlusion.
Step 3, obtaining the current frame image to be tracked and determining the initial search area of the current frame image, wherein the initial search area of the current frame image is centered on the target area of the previous frame image and is four times the size of the target area of the previous frame image; and partitioning the initial search area of the current frame image into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size.
In step 3, the initial search area of the current frame image is partitioned into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size, where the block step comprises a row step and a column step: the row step is floor((1-T)×W+0.5) and the column step is floor((1-T)×H+0.5); floor(·) denotes rounding down, T denotes the overlap factor between two adjacent sub-blocks to be searched, W denotes the width of the target area of the first frame image, and H denotes the height of the target area of the first frame image.
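As a worked illustration of the blocking rule, the sketch below computes the two step sizes and enumerates sub-block positions; only the step formulas come from the text above, while the helper name partition_search_area and the grid enumeration are assumptions.

```python
import numpy as np

def partition_search_area(x0, y0, search_w, search_h, W, H, T=0.99):
    """Enumerate sub-blocks of size (W, H) inside the search area whose top-left
    corner is (x0, y0); only the step formulas come from the text above, the
    grid enumeration itself is an illustrative assumption."""
    row_step = int(np.floor((1 - T) * W + 0.5))   # horizontal shift between adjacent sub-blocks
    col_step = int(np.floor((1 - T) * H + 0.5))   # vertical shift between adjacent sub-blocks
    row_step, col_step = max(row_step, 1), max(col_step, 1)  # guard against a zero step
    blocks = []
    for y in range(y0, y0 + search_h - H + 1, col_step):
        for x in range(x0, x0 + search_w - W + 1, row_step):
            blocks.append((x, y, W, H))           # every sub-block has the target size
    return blocks

# e.g. W = 100, H = 80, T = 0.99 gives row_step = floor(1.5) = 1 and col_step = floor(1.3) = 1,
# i.e. adjacent sub-blocks overlap by roughly 99 percent.
```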
Step 4, evaluating each sub-block to be searched with the strong classifier to obtain the first confidence value of each sub-block to be searched, and forming the first confidence matrix.
Step 4 specifically comprises: evaluating each sub-block to be searched with the strong classifier to obtain the first confidence value of each sub-block to be searched, conf1(x) = Σ_{n=1}^{N} α_n × h_n^sel(x) (the real-valued weighted-vote response of the strong classifier before the sign function), and forming the first confidence matrix, where x denotes any sub-block to be searched.
Step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; and determining the central point of each subblock to be searched, and respectively solving a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix.
The step 5 specifically comprises the following substeps:
(5a) obtaining the confidence map function c(h) = IFFT(FFT(Hstc(h)) ⊙ FFT(R(h)ωσ(h-h*))) according to the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image;
wherein Hstc(h) denotes the spatio-temporal context model learned from the previous frame image for tracking the current frame image, h denotes any position in the search area of the current frame image, and R(h) denotes the gray value of the pixel at position h in the search area of the current frame image; ωσ(h-h*) denotes a weight function defined as ωσ(h-h*) = ζ·exp(-|h-h*|²/σ²), where ζ is a regularization constant, σ is a scale parameter, and h* denotes the position of the center point of the target area in the previous frame image; FFT(·) denotes the Fourier transform, IFFT(·) the inverse Fourier transform, and ⊙ point-wise multiplication;
(5b) taking the variable h in the confidence map function as the center point of each sub-block to be searched of the current frame image, calculating the second confidence value of each sub-block and forming the second confidence matrix.
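A minimal NumPy sketch of sub-steps (5a) and (5b), assuming the search area is handled as one 2-D array and that the regularization constant ζ simply normalizes the weights to sum to one; the function name stc_confidence_map and these choices are illustrative, not the patent's implementation.

```python
import numpy as np

def stc_confidence_map(H_stc, R, h_star, sigma):
    """Sketch of sub-step (5a): confidence map c(h) computed in the Fourier domain.

    H_stc and R are 2-D arrays over the search area (context model and pixel gray
    values); h_star = (row, col) is the centre of the previous target area."""
    rows, cols = R.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    dist2 = (yy - h_star[0]) ** 2 + (xx - h_star[1]) ** 2
    w = np.exp(-dist2 / sigma ** 2)
    w /= w.sum()                                  # regularisation constant zeta (assumption)
    # c = IFFT( FFT(H_stc) .* FFT(R .* w) ), point-wise products in the frequency domain
    c = np.real(np.fft.ifft2(np.fft.fft2(H_stc) * np.fft.fft2(R * w)))
    return c

# Sub-step (5b): the second confidence value of a sub-block is the map sampled
# at that sub-block's centre, e.g. conf2 = c[cy, cx].
```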
Step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; and determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is the target area of the tracked current frame image.
Step 7, determining a search area of the current frame image, wherein the search area of the current frame image takes a target area of the current frame image as a center, and the search area of the current frame image is four times of the target area of the current frame image; and taking the target area of the current frame image as a positive sample, taking four corner areas of the current frame image search area as four negative samples respectively, and updating the strong classifier.
And 8, learning a space context model according to the current frame image, and determining the space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image.
The step 8 specifically comprises the following substeps:
(8a) determining the context prior probability model P(c(z)|o) of the current frame image:
P(c(z)|o) = R(z)ωσ(z-h*)
wherein P(c(z)|o) denotes the prior probability that the context feature appears at each pixel point of the background region of the current frame image given that the target appears in the current frame search area, o denotes the event that the target appears in the current frame search area, the context feature at z is c(z) = R(z), z ∈ Ω, z is any position in the background region of the current frame image, Ω is the background region of the current frame image, i.e. the image region of the current frame search area excluding the target area, R(z) denotes the gray value of the pixel at position z of the background region of the current frame image, and ωσ(z-h*) denotes a weight function defined as ωσ(z-h*) = ζ·exp(-|z-h*|²/σ²), where ζ is a regularization constant, σ is a scale parameter, and h* denotes the position of the center point of the target area in the previous frame image;
(8b) determining the spatial context model P(h|c(z),o) of the current frame image:
P(h|c(z),o) = fsc(h-z)
wherein P(h|c(z),o) denotes the conditional probability that the target position is h given that the target appears in the current frame image search area and the context feature appears at z, h denotes any position in the current frame image search area, and fsc(h-z) is a function of positions h and z representing the spatial context model learned from the current frame;
(8c) according to the confidence function c(h) = Σ_{z∈Ω} P(h|c(z),o)·P(c(z)|o) = fsc(h) ⊗ (R(h)ωσ(h-h*)), obtaining the spatial context model fsc(h) learned from the current frame:
fsc(h) = IFFT(FFT(c(h)) / FFT(R(h)ωσ(h-h*)))
wherein c(h) is the confidence map function expressed as c(h) = b·exp(-|(h-h*)/α|^β), b is a constant, α is a scale parameter, β is a shape parameter, and ⊗ denotes convolution;
(8d) letting the current frame image be the t-th frame image and the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image be Hstc_t(h), the spatio-temporal context model learned from the current frame for tracking the next frame image, Hstc_{t+1}(h), is:
Hstc_{t+1}(h) = (1-ρ)·Hstc_t(h) + ρ·fsc_t(h)
where ρ is an update parameter with ρ ∈ (0,1); when t = 1, Hstc_1(h) = fsc_1(h); fsc_t(h) denotes the spatial context model learned from the t-th frame image.
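The following sketch puts sub-steps (8a) to (8d) together, assuming β = 1 (the value is not stated here), a small ε guarding the Fourier-domain division, and the same weight normalization as before; names and defaults are illustrative, with α = 2.25 and ρ = 0.075 taken from the parameter settings reported later in the text.

```python
import numpy as np

def learn_spatial_context(R, h_star, sigma, alpha=2.25, beta=1.0, b=1.0, eps=1e-8):
    """Sketch of sub-steps (8a)-(8c): spatial context model of the current frame."""
    rows, cols = R.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    dist = np.sqrt((yy - h_star[0]) ** 2 + (xx - h_star[1]) ** 2)
    w = np.exp(-dist ** 2 / sigma ** 2)
    w /= w.sum()                                       # regularisation constant zeta (assumption)
    conf = b * np.exp(-np.abs(dist / alpha) ** beta)   # confidence map c(h)
    # f_sc = IFFT( FFT(c) / FFT(R .* w) ): deconvolution in the Fourier domain
    f_sc = np.real(np.fft.ifft2(np.fft.fft2(conf) / (np.fft.fft2(R * w) + eps)))
    return f_sc

def update_stc_model(H_stc_prev, f_sc_t, rho=0.075):
    """Sub-step (8d): blend the newly learned spatial context into the
    spatio-temporal context model used for tracking the next frame."""
    return (1.0 - rho) * H_stc_prev + rho * f_sc_t
```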
Step 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image.
Both weights are initially 1/2. Updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix mainly depends on whether the target is occluded: when the target is partially or completely occluded, the weight corresponding to the second confidence matrix is increased and the weight corresponding to the first confidence matrix is decreased. To judge whether the target in the current frame is occluded, the embodiment of the invention introduces an occlusion factor, which is built on color histogram features.
The step 9 specifically comprises the following substeps:
(9a) calculating the occlusion factor occ of the current frame image search area;
(9b) setting an occlusion factor threshold θ, 0 < θ < 1, and updating the weight A1 corresponding to the first confidence matrix and the weight A2 corresponding to the second confidence matrix as functions of A, Y, occ and θ (the explicit update formula appears only as an image in the original publication);
wherein A denotes the weight corresponding to the first confidence matrix determined for the previous frame image, and Y denotes the maximum confidence value in the final confidence matrix.
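Because the explicit update formula survives only as an image in this text, the sketch below is an assumption consistent with the qualitative description (shift weight toward the context confidence when occ exceeds the threshold θ, otherwise restore the boosting-confidence weight scaled by the peak value Y); the step size and the complementary-weights constraint are illustrative choices, not the patented rule.

```python
def update_weights(occ, A_prev, Y_max, theta=0.5, step=0.1):
    """Illustrative rule only: the patent's exact update formula is not reproduced
    in the text, so this function merely follows the qualitative description."""
    if occ > theta:                          # target judged partially or fully occluded
        A1 = max(A_prev - step, 0.0)         # trust the strong classifier less
    else:
        A1 = min(A_prev + step * Y_max, 0.5) # restore trust, capped at the initial 1/2
    A2 = 1.0 - A1                            # keeping the two weights complementary is an assumption
    return A1, A2
```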
The substep 9(a) specifically includes the substeps of:
(9a1) acquiring the color histogram features of the current frame image search area; quantizing the color histogram features into J levels, the j-th level feature being denoted u_j, with u_j = j, j = 1, ..., J; the initial value of j is 1;
(9a2) letting the positions of the pixels of the target area of the first frame image be denoted {x_i*}, i = 1, ..., k, where k is the total number of pixels contained in the target area of the first frame image, the probability density of the j-th level feature u_j over the target area of the first frame image, q_uj, is defined as
q_uj = C·Σ_{i=1}^{k} K(|x_i*|²)·δ[b(x_i*) - u_j]
wherein C is a normalization constant, K(·) is a kernel function, |·|² denotes the squared modulus, δ(·) denotes the impulse response function, and b(x_i*) denotes the quantization level of the color histogram feature at position x_i*;
(9a3) letting the positions of the pixels of any sub-block to be searched in the current frame image be denoted {d_i}, i = 1, ..., k, where k is the total number of pixels contained in any sub-block to be searched of the current frame image and equals the total number of pixels contained in the target area of the first frame image, the probability density of the j-th level feature u_j over the sub-block to be searched, p_uj(s), is defined as
p_uj(s) = C·Σ_{i=1}^{k} K(|(s - d_i)/h_1|²)·δ[b(d_i) - u_j]
wherein s is the position of the center point of the candidate target area (the sub-block to be searched) in the current frame image, C is a normalization constant, K(·) is a kernel function, |·|² denotes the squared modulus, δ(·) denotes the impulse response function, b(d_i) denotes the quantization level of the color histogram feature at position d_i, and h_1 is the window radius of the kernel function;
(9a4) recording the position of the center point of the sub-block to be searched with the maximum confidence value in the current frame image search area as y_0; forming a first intermediate variable from q_uj and p_uj(y_0), and a second intermediate variable from the first intermediate variable and the occlusion degree parameter λ_1, with λ_1 ≥ 1 (the explicit expressions for the two intermediate variables appear only as images in the original publication);
(9a5) adding 1 to the value of j and repeating sub-steps (9a2) to (9a4) to obtain J second intermediate variables, from which the occlusion factor occ of the current frame image search area is calculated (the explicit expression appears only as an image in the original publication).
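The kernel-weighted histograms of sub-steps (9a2) and (9a3) can be sketched directly from their definitions; the final aggregation into occ, however, appears only as an image here, so the max(q - λ1·p, 0) rule summed over the J levels in the sketch below is an assumption, as are the single-channel simplification and the Epanechnikov-style kernel.

```python
import numpy as np

def quantised_histogram(patch, centre, J=16, h1=None):
    """Kernel-weighted histogram of a single-channel patch (sub-steps (9a1)-(9a3));
    the Epanechnikov-style kernel and the single-channel simplification are assumptions."""
    rows, cols = patch.shape
    if h1 is None:
        h1 = 0.5 * np.hypot(rows, cols)                 # kernel window radius
    yy, xx = np.mgrid[0:rows, 0:cols]
    d2 = ((yy - centre[0]) ** 2 + (xx - centre[1]) ** 2) / h1 ** 2
    K = np.maximum(1.0 - d2, 0.0)                       # kernel weights K(|.|^2)
    bins = np.clip((patch.astype(np.float64) / 256.0 * J).astype(int), 0, J - 1)
    hist = np.bincount(bins.ravel(), weights=K.ravel(), minlength=J)
    return hist / max(hist.sum(), 1e-12)                # normalisation constant C

def occlusion_factor(first_target, best_block, J=16, lambda1=1.0):
    """Sketch of the occlusion factor: q from the first-frame target, p from the
    best sub-block centred at y0; the aggregation rule is an assumption."""
    q = quantised_histogram(first_target, (first_target.shape[0] // 2, first_target.shape[1] // 2), J)
    p = quantised_histogram(best_block, (best_block.shape[0] // 2, best_block.shape[1] // 2), J)
    return float(np.sum(np.maximum(q - lambda1 * p, 0.0)))
```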
Step 10, repeatedly executing steps 3 to 9 until all video frames that need to be tracked have been processed.
The technical scheme of the invention was implemented in MATLAB 2014a with the following parameter settings: N = 50, M = 250, the overlap factor T between blocks is 0.99, the scale parameter α is 2.25, the update parameter ρ is 0.075, the occlusion degree parameter λ_1 is 1, and the occlusion factor threshold θ is 0.5. Three methods (the method of the invention, the online Boosting algorithm, and the spatio-temporal context algorithm) were initialized with the same target box in the first frame; in Fig. 2 the solid box is the tracking result of the method of the invention, the plain dashed box is the result of the spatio-temporal context algorithm, and the dashed box with black dots is the result of the online Boosting algorithm. The first video sequence (column a) tracks a toy dog (cluttered background, target occluded): the method of the invention tracks the target correctly, while after frame 120 the other two algorithms lose the target. The second video sequence (column b) tracks a pedestrian on a subway platform (the target is occluded by other moving pedestrians): the method of the invention performs clearly better than the other two methods, especially after frame 43. The third sequence (column c) tracks a fast-moving car (rapid scale change and partial occlusion): the method of the invention also tracks robustly, verifying the feasibility of the method.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (7)

1. A target tracking method combined with spatiotemporal context information is characterized by comprising the following steps:
step 1, acquiring the first frame image of the video, calibrating the target area of the first frame image, and expanding the target area about its center to obtain a search area four times the size of the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples;
step 2, learning a spatial context model according to the first frame image, and taking the spatial context model as a learned space-time context model for tracking the next frame image;
step 3, obtaining the current frame image to be tracked and determining the initial search area of the current frame image, wherein the initial search area of the current frame image is centered on the target area of the previous frame image and is four times the size of the target area of the previous frame image; partitioning the initial search area of the current frame image into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size;
step 4, evaluating each subblock to be searched according to the strong classifier to obtain a first confidence value of each subblock to be searched, and forming a first confidence matrix;
step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; determining the central point of each subblock to be searched, and respectively obtaining a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix;
step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is a target area of the tracked current frame image;
step 7, determining a search area of the current frame image, wherein the search area of the current frame image takes a target area of the current frame image as a center, and the search area of the current frame image is four times of the target area of the current frame image; taking a target area of the current frame image as a positive sample, taking four corner areas of a search area of the current frame image as four negative samples respectively, and updating the strong classifier;
step 8, learning a space context model according to the current frame image, and determining a space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image;
step 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image;
the method specifically comprises the following substeps:
(9a) calculating an occlusion factor occ for the current frame image search area;
(9b) setting an occlusion factor threshold θ, 0 < θ < 1, and updating the weight A1 corresponding to the first confidence matrix and the weight A2 corresponding to the second confidence matrix as functions of A, Y, occ and θ (the explicit update formula appears only as an image in the original publication);
wherein A denotes the weight corresponding to the first confidence matrix determined for the previous frame image, and Y denotes the maximum confidence value in the final confidence matrix;
step 10, repeatedly executing steps 3 to 9 until all video frames that need to be tracked have been processed.
2. The method for tracking a target in combination with spatio-temporal context information as claimed in claim 1, wherein in step 1 the positive sample and the four negative samples are used as training samples and a strong classifier is obtained from the training samples, specifically comprising the following sub-steps:
(1a) letting the training sample set be S = {(x_i, y_i) | x_i ∈ X, y_i ∈ Y, i = 1, 2, ..., 5}, where X denotes the training sample space consisting of one positive sample and four negative samples, x_i denotes the i-th training sample in the training sample space, Y denotes the set of sample class labels with Y = {-1, 1}, and y_i denotes the class label of the i-th training sample; a label of 1 means the training sample is a positive sample, and a label of -1 means it is a negative sample;
setting M weak classifiers, the m-th weak classifier being h_m^weak, m = 1, ..., M, where M denotes the total number of weak classifiers;
the initial value of i is 1 and the initial value of m is 1; the sample importance weight λ is initialized to 1;
(1b) obtaining the i-th training sample and updating the parameters λ_m^corr and λ_m^wrong of the m-th weak classifier h_m^weak:
when the m-th weak classifier h_m^weak classifies the i-th training sample correctly, adding the sample importance weight λ to λ_m^corr to obtain the new value of λ_m^corr; otherwise, adding λ to λ_m^wrong to obtain the new value of λ_m^wrong; wherein λ_m^corr denotes the cumulative weight of samples classified correctly by the m-th weak classifier and λ_m^wrong denotes the cumulative weight of samples misclassified by the m-th weak classifier;
(1c) adding 1 to the value of i and repeating sub-step (1b) until i is greater than 5, obtaining the final parameters λ_m^corr and λ_m^wrong of the m-th weak classifier;
(1d) setting i to 1, adding 1 to the value of m, and repeating sub-steps (1b) to (1c) until m is greater than M, obtaining the final parameters of all M weak classifiers;
(1e) calculating the cumulative error rate of the m-th weak classifier, e_m = λ_m^wrong / (λ_m^corr + λ_m^wrong); letting m take the values 1, ..., M gives the cumulative error rates of the M weak classifiers;
(1f) taking the weak classifier with the minimum cumulative error rate as the n-th selector h_n^sel; the initial value of n is 1, n = 1, ..., N, where N denotes the total number of selectors; setting i to 1;
(1g) obtaining the i-th training sample and updating the sample importance weight λ with the n-th selector h_n^sel:
when the n-th selector h_n^sel classifies the i-th training sample correctly, multiplying λ by 1/(2 × (1 - e_n)) to obtain the new sample importance weight λ; otherwise, multiplying λ by 1/(2 × e_n) to obtain the new sample importance weight λ; wherein e_n denotes the cumulative error rate of the weak classifier corresponding to the n-th selector h_n^sel;
(1h) adding 1 to the value of i and repeating sub-step (1g) until i is greater than 5, obtaining the final new sample importance weight λ;
(1i) setting i to 1 and m to 1, adding 1 to the value of n, and, using the final new sample importance weight λ, repeating sub-steps (1b) to (1h) until n is greater than N, obtaining N selectors;
(1j) calculating the voting weight of the n-th selector, α_n = (1/2) × ln((1 - e_n) / e_n), where ln(·) denotes the natural logarithm; letting n take the values 1, ..., N gives the voting weights of the N selectors;
(1k) linearly combining the N selectors with their voting weights to obtain the strong classifier H_strong(x) = sign(Σ_{n=1}^{N} α_n × h_n^sel(x)), where sign(·) denotes the sign function.
3. The method for tracking a target in combination with spatio-temporal context information as claimed in claim 1, wherein in step 3 the initial search area of the current frame image is partitioned into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched, the block step comprising a row step and a column step: the row step is floor((1-T)×W+0.5) and the column step is floor((1-T)×H+0.5), where floor(·) denotes rounding down, T denotes the overlap factor between two adjacent sub-blocks to be searched, W denotes the width of the target area of the first frame image, and H denotes the height of the target area of the first frame image.
4. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 2, wherein the step 4 specifically comprises:
evaluating each sub-block to be searched with the strong classifier to obtain the first confidence value of each sub-block to be searched, conf1(x) = Σ_{n=1}^{N} α_n × h_n^sel(x) (the real-valued weighted-vote response of the strong classifier before the sign function), and forming the first confidence matrix, wherein x denotes any sub-block to be searched.
5. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 1, wherein the step 5 comprises the following sub-steps:
(5a) obtaining the confidence map function c(h) = IFFT(FFT(Hstc(h)) ⊙ FFT(R(h)ωσ(h-h*))) according to the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image;
wherein Hstc(h) denotes the spatio-temporal context model learned from the previous frame image for tracking the current frame image, h denotes any position in the search area of the current frame image, and R(h) denotes the gray value of the pixel at position h in the search area of the current frame image; ωσ(h-h*) denotes a weight function defined as ωσ(h-h*) = ζ·exp(-|h-h*|²/σ²), where ζ is a regularization constant, σ is a scale parameter, and h* denotes the position of the center point of the target area in the previous frame image; FFT(·) denotes the Fourier transform, IFFT(·) the inverse Fourier transform, and ⊙ point-wise multiplication;
(5b) taking the variable h in the confidence map function as the center point of each sub-block to be searched of the current frame image, calculating the second confidence value of each sub-block and forming the second confidence matrix.
6. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 1, wherein the step 8 comprises the following sub-steps:
(8a) determining the context prior probability model P(c(z)|o) of the current frame image:
P(c(z)|o) = R(z)ωσ(z-h*)
wherein P(c(z)|o) denotes the prior probability that the context feature appears at each pixel point of the background region of the current frame image given that the target appears in the current frame search area, o denotes the event that the target appears in the current frame search area, the context feature at z is c(z) = R(z), z ∈ Ω, z is any position in the background region of the current frame image, Ω is the background region of the current frame image, i.e. the image region of the current frame search area excluding the target area, R(z) denotes the gray value of the pixel at position z of the background region of the current frame image, and ωσ(z-h*) denotes a weight function defined as ωσ(z-h*) = ζ·exp(-|z-h*|²/σ²), where ζ is a regularization constant, σ is a scale parameter, and h* denotes the position of the center point of the target area in the previous frame image;
(8b) determining the spatial context model P(h|c(z),o) of the current frame image:
P(h|c(z),o) = fsc(h-z)
wherein P(h|c(z),o) denotes the conditional probability that the target position is h given that the target appears in the current frame image search area and the context feature appears at z, h denotes any position in the current frame image search area, and fsc(h-z) is a function of positions h and z representing the spatial context model learned from the current frame;
(8c) according to the confidence function c(h) = Σ_{z∈Ω} P(h|c(z),o)·P(c(z)|o) = fsc(h) ⊗ (R(h)ωσ(h-h*)), obtaining the spatial context model fsc(h) learned from the current frame:
fsc(h) = IFFT(FFT(c(h)) / FFT(R(h)ωσ(h-h*)))
wherein c(h) is the confidence map function expressed as c(h) = b·exp(-|(h-h*)/α|^β), b is a constant, α is a scale parameter, β is a shape parameter, and ⊗ denotes convolution;
(8d) letting the current frame image be the t-th frame image and the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image be Hstc_t(h), the spatio-temporal context model learned from the current frame for tracking the next frame image, Hstc_{t+1}(h), is:
Hstc_{t+1}(h) = (1-ρ)·Hstc_t(h) + ρ·fsc_t(h)
where ρ is an update parameter with ρ ∈ (0,1); when t = 1, Hstc_1(h) = fsc_1(h); fsc_t(h) denotes the spatial context model learned from the t-th frame image.
7. The method for tracking a target in combination with spatiotemporal context information as claimed in claim 1, wherein the sub-step 9(a) comprises the following sub-steps:
(9a1) acquiring the color histogram features of the current frame image search area; quantizing the color histogram features into J levels, the j-th level feature being denoted u_j, with u_j = j, j = 1, ..., J; the initial value of j is 1;
(9a2) letting the positions of the pixels of the target area of the first frame image be denoted {x_i*}, i = 1, ..., k, where k is the total number of pixels contained in the target area of the first frame image, the probability density of the j-th level feature u_j over the target area of the first frame image, q_uj, is defined as
q_uj = C·Σ_{i=1}^{k} K(|x_i*|²)·δ[b(x_i*) - u_j]
wherein C is a normalization constant, K(·) is a kernel function, |·|² denotes the squared modulus, δ(·) denotes the impulse response function, and b(x_i*) denotes the quantization level of the color histogram feature at position x_i*;
(9a3) letting the positions of the pixels of any sub-block to be searched in the current frame image be denoted {d_i}, i = 1, ..., k, where k is the total number of pixels contained in any sub-block to be searched of the current frame image and equals the total number of pixels contained in the target area of the first frame image, the probability density of the j-th level feature u_j over the sub-block to be searched, p_uj(s), is defined as
p_uj(s) = C·Σ_{i=1}^{k} K(|(s - d_i)/h_1|²)·δ[b(d_i) - u_j]
wherein s is the position of the center point of the candidate target area (the sub-block to be searched) in the current frame image, C is a normalization constant, K(·) is a kernel function, |·|² denotes the squared modulus, δ(·) denotes the impulse response function, b(d_i) denotes the quantization level of the color histogram feature at position d_i, and h_1 is the window radius of the kernel function;
(9a4) recording the position of the center point of the sub-block to be searched with the maximum confidence value in the current frame image search area as y_0; forming a first intermediate variable from q_uj and p_uj(y_0), and a second intermediate variable from the first intermediate variable and the occlusion degree parameter λ_1, with λ_1 ≥ 1 (the explicit expressions for the two intermediate variables appear only as images in the original publication);
(9a5) adding 1 to the value of j and repeating sub-steps (9a2) to (9a4) to obtain J second intermediate variables, from which the occlusion factor occ of the current frame image search area is calculated (the explicit expression appears only as an image in the original publication).
CN201710596203.5A 2017-07-20 2017-07-20 Target tracking method combined with space-time context information Active CN107424175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710596203.5A CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710596203.5A CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Publications (2)

Publication Number Publication Date
CN107424175A CN107424175A (en) 2017-12-01
CN107424175B true CN107424175B (en) 2020-09-08

Family

ID=60430564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710596203.5A Active CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Country Status (1)

Country Link
CN (1) CN107424175B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416800A (en) * 2018-03-13 2018-08-17 青岛海信医疗设备股份有限公司 Method for tracking target and device, terminal, computer readable storage medium
CN110070562A (en) * 2019-04-02 2019-07-30 西北工业大学 A kind of context-sensitive depth targets tracking
CN110570451B (en) * 2019-08-05 2022-02-01 武汉大学 Multithreading visual target tracking method based on STC and block re-detection
CN110738685B (en) * 2019-09-09 2023-05-05 桂林理工大学 Space-time context tracking method integrating color histogram response
CN113743252B (en) * 2021-08-17 2024-05-31 北京佳服信息科技有限公司 Target tracking method, device, equipment and readable storage medium
CN114140501A (en) * 2022-01-30 2022-03-04 南昌工程学院 Target tracking method and device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335986A (en) * 2015-09-10 2016-02-17 西安电子科技大学 Characteristic matching and MeanShift algorithm-based target tracking method
CN106485732A (en) * 2016-09-09 2017-03-08 南京航空航天大学 A kind of method for tracking target of video sequence
WO2017044550A1 (en) * 2015-09-11 2017-03-16 Intel Corporation A real-time multiple vehicle detection and tracking

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335986A (en) * 2015-09-10 2016-02-17 西安电子科技大学 Characteristic matching and MeanShift algorithm-based target tracking method
WO2017044550A1 (en) * 2015-09-11 2017-03-16 Intel Corporation A real-time multiple vehicle detection and tracking
CN106485732A (en) * 2016-09-09 2017-03-08 南京航空航天大学 A kind of method for tracking target of video sequence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Helmut Grabner et al., "On-line Boosting and Vision", 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006-06-30, vol. 1, pp. 260-267 *
Zhang Lei, "Research on Real-Time Target Tracking Algorithms and Implementation Technologies in Complex Scenes", China Doctoral Dissertations Full-text Database, Information Science and Technology, 2016-08-15 (No. 8), I138-39: pp. 75-78 and 81-86 of the main text *
Zhang Lei, "Research on Real-Time Target Tracking Algorithms and Implementation Technologies in Complex Scenes", China Doctoral Dissertations Full-text Database, Information Science and Technology, 2016, No. 8, I138-39: pp. 74-78 and 80-86 of the main text. *

Also Published As

Publication number Publication date
CN107424175A (en) 2017-12-01

Similar Documents

Publication Publication Date Title
CN107424175B (en) Target tracking method combined with space-time context information
WO2020173226A1 (en) Spatial-temporal behavior detection method
KR102462572B1 (en) Systems and methods for training object classifiers by machine learning
Wang et al. Detection of abnormal visual events via global optical flow orientation histogram
CN104063883B (en) A kind of monitor video abstraction generating method being combined based on object and key frame
US10198657B2 (en) All-weather thermal-image pedestrian detection method
CN108564598B (en) Improved online Boosting target tracking method
CN111680655A (en) Video target detection method for aerial images of unmanned aerial vehicle
CN111932583A (en) Space-time information integrated intelligent tracking method based on complex background
CN108960047B (en) Face duplication removing method in video monitoring based on depth secondary tree
CN110765906A (en) Pedestrian detection algorithm based on key points
CN112597815A (en) Synthetic aperture radar image ship detection method based on Group-G0 model
CN112836640A (en) Single-camera multi-target pedestrian tracking method
CN109919223B (en) Target detection method and device based on deep neural network
CN110084201B (en) Human body action recognition method based on convolutional neural network of specific target tracking in monitoring scene
CN104978567A (en) Vehicle detection method based on scenario classification
CN112270381B (en) People flow detection method based on deep learning
CN113688761B (en) Pedestrian behavior category detection method based on image sequence
CN111724566A (en) Pedestrian falling detection method and device based on intelligent lamp pole video monitoring system
Cao et al. Learning spatial-temporal representation for smoke vehicle detection
CN113129336A (en) End-to-end multi-vehicle tracking method, system and computer readable medium
Teng et al. Robust multi-scale ship tracking via multiple compressed features fusion
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
Fan et al. Video anomaly detection using CycleGan based on skeleton features
CN111144220B (en) Personnel detection method, device, equipment and medium suitable for big data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant