CN107424175B - Target tracking method combined with space-time context information - Google Patents


Info

Publication number
CN107424175B
CN107424175B (granted publication); application CN201710596203.5A; application publication CN107424175A
Authority
CN
China
Prior art keywords
frame image
current frame
value
confidence
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710596203.5A
Other languages
Chinese (zh)
Other versions
CN107424175A (en)
Inventor
朱红 (Zhu Hong)
王道江 (Wang Daojiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201710596203.5A priority Critical patent/CN107424175B/en
Publication of CN107424175A publication Critical patent/CN107424175A/en
Application granted granted Critical
Publication of CN107424175B publication Critical patent/CN107424175B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of pattern recognition and computer vision and discloses a target tracking method that combines spatio-temporal context information, comprising the following steps: training an initial strong classifier on the first frame and learning the spatio-temporal context model needed for tracking the next frame; when a new frame arrives, evaluating a plurality of blocks of the search area with the trained strong classifier to obtain a first confidence matrix; then obtaining a confidence map function by integrating the spatio-temporal context information and using it to compute the confidence values of all blocks in the search area, giving a second confidence matrix; finally, linearly combining the two matrices with their corresponding weights to obtain a final confidence matrix and taking the block with the maximum confidence value in the final confidence matrix as the tracked target. By combining the spatio-temporal context information of the target with the online Boosting algorithm, the method achieves fast and robust tracking.

Description

Target tracking method combined with space-time context information
Technical Field
The invention belongs to the technical field of pattern recognition and computer vision, and particularly relates to a target tracking method combining spatiotemporal context information.
Background
Moving target tracking is one of the important research directions in the field of computer vision, with important applications in human-computer interaction, intelligent surveillance, medical imaging and other fields. Tracking algorithms have made great progress in recent years, but effectively handling the tracking drift caused by occlusion, rapid motion, illumination change, background clutter and similar factors remains very challenging.
In the online Boosting algorithm, when a new frame arrives, a strong classifier is used to separate the target from the background in the image and obtain the target area. When the target is occluded, however, the feature pool is updated with occluded features, so the feature pool becomes polluted and tracking drift eventually occurs.
Several improved online Boosting algorithms have been proposed to address these problems. Yan et al. propose an online Boosting algorithm based on sub-region classifiers, which divides the target region into several sub-regions, each corresponding to one strong classifier. During tracking, the feature pool of the strong classifier with the minimum confidence value is not updated, so that occluded features do not pollute the feature pool; however, the tracking effect is poor when the target scale changes.
Sun et al. propose an online Boosting algorithm with motion blob detection: when the confidence value of the tracking result falls below a lower threshold, moving objects in the search area are detected by motion blob detection and their confidence values are evaluated with the strong classifier until the confidence value exceeds the upper threshold. However, motion blob detection usually cannot detect distant moving objects, so the improvement is limited.
Wang et al. propose an online Boosting algorithm fused with occlusion perception: a background-feature classifier and a target-feature classifier are trained from a certain number of frames and used to perceive whether the target is occluded; if it is, the polluted positive samples are not collected to update the classifier. This, however, increases the complexity of the classifier, degrades the real-time performance of the online Boosting algorithm, and makes it easy to lose fast-moving targets.
Disclosure of Invention
In view of the above defects in the prior art, the invention aims to provide a target tracking method combining spatio-temporal context information, which can solve the tracking drift that occurs in the prior art when the target area is partially occluded or the target scale changes greatly.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme.
A method of target tracking in conjunction with spatiotemporal context information, the method comprising the steps of:
step 1, acquiring the first frame image of the video, calibrating the target area of the first frame image, and expanding the target area about its center to obtain a search area four times the size of the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples;
step 2, learning a spatial context model according to the first frame image, and taking the spatial context model as a learned space-time context model for tracking the next frame image;
step 3, obtaining the current frame image to be tracked and determining the initial search area of the current frame image, wherein the initial search area of the current frame image is centered on the target area of the previous frame image and is four times the size of the target area of the previous frame image; partitioning the initial search area of the current frame image into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size;
step 4, evaluating each subblock to be searched according to the strong classifier to obtain a first confidence value of each subblock to be searched, and forming a first confidence matrix;
step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; determining the central point of each subblock to be searched, and respectively obtaining a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix;
step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is a target area of the tracked current frame image;
step 7, determining a search area of the current frame image, wherein the search area of the current frame image takes a target area of the current frame image as a center, and the search area of the current frame image is four times of the target area of the current frame image; taking a target area of the current frame image as a positive sample, taking four corner areas of a search area of the current frame image as four negative samples respectively, and updating the strong classifier;
step 8, learning a space context model according to the current frame image, and determining a space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image;
step 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image;
step 10, repeatedly executing steps 3 to 9 until all video frames that need to be tracked have been processed.
The invention integrates spatio-temporal context information into the online Boosting target tracking algorithm, effectively alleviating the tracking drift and even tracking loss that the online Boosting algorithm suffers when the tracked target is partially or completely occluded, and achieves fast and robust tracking.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a target tracking method in combination with spatiotemporal context information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram showing the comparison between the tracking effect of the method of the present invention and the tracking effect of the two conventional methods.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical scheme of the invention exploits the fact that, between two adjacent frames of a video, the target does not change much and its position does not change abruptly, and that a specific relation exists between the target and its surrounding background; this relation helps distinguish the target from the background when the appearance of the target changes significantly. The invention introduces this spatio-temporal context information into the online Boosting algorithm.
Spatio-temporal context information: the temporal information is that the appearance and position of the target do not change abruptly between adjacent frames; the spatial information is that a specific relationship exists between the target and its surrounding background, and this relationship helps distinguish the target from the background. The combination of these two kinds of information about the target is the spatio-temporal context information.
The embodiment of the invention provides a target tracking method combined with space-time context information, as shown in figure 1, the method comprises the following steps:
step 1, acquiring the first frame image of the video, calibrating the target area of the first frame image, and expanding the target area about its center to obtain a search area four times the size of the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples.
In step 1, the positive sample and the four negative samples are used as training samples and a strong classifier is obtained from the training samples; this specifically comprises the following sub-steps:
(1a) letting the training sample set be S = {(x_i, y_i) | x_i ∈ X, y_i ∈ Y, i = 1, 2, ..., 5}, where X denotes the training sample space consisting of one positive sample and four negative samples, x_i denotes the i-th training sample in the training sample space, Y denotes the set of sample class labels with Y = {-1, 1}, and y_i denotes the class label of the i-th training sample; a label of 1 means the training sample is a positive sample, and a label of -1 means it is a negative sample;
setting M weak classifiers, the m-th weak classifier being h_m^weak, m = 1, ..., M, where M denotes the total number of weak classifiers;
the initial value of i is 1 and the initial value of m is 1; the sample importance weight λ is initialized to 1;
(1b) obtaining the i-th training sample and updating the parameters λ_m^corr and λ_m^wrong of the m-th weak classifier h_m^weak:
when the m-th weak classifier h_m^weak classifies the i-th training sample correctly, adding the sample importance weight λ to λ_m^corr to obtain the new value of λ_m^corr; otherwise, adding λ to λ_m^wrong to obtain the new value of λ_m^wrong; wherein λ_m^corr denotes the cumulative weight of samples classified correctly by the m-th weak classifier and λ_m^wrong denotes the cumulative weight of samples misclassified by the m-th weak classifier;
(1c) adding 1 to the value of i and repeating sub-step (1b) until i is greater than 5, obtaining the final parameters λ_m^corr and λ_m^wrong of the m-th weak classifier;
(1d) setting i to 1, adding 1 to the value of m, and repeating sub-steps (1b) to (1c) until m is greater than M, obtaining the final parameters of all M weak classifiers;
(1e) calculating the cumulative error rate of the m-th weak classifier, e_m = λ_m^wrong / (λ_m^corr + λ_m^wrong); letting m take the values 1, ..., M gives the cumulative error rates of the M weak classifiers;
(1f) taking the weak classifier with the minimum cumulative error rate as the n-th selector h_n^sel; the initial value of n is 1, n = 1, ..., N, where N denotes the total number of selectors; setting i to 1;
(1g) obtaining the i-th training sample and updating the sample importance weight λ with the n-th selector h_n^sel:
when the n-th selector h_n^sel classifies the i-th training sample correctly, multiplying λ by 1/(2 × (1 - e_n)) to obtain the new sample importance weight λ; otherwise, multiplying λ by 1/(2 × e_n) to obtain the new sample importance weight λ; wherein e_n denotes the cumulative error rate of the weak classifier corresponding to the n-th selector h_n^sel;
(1h) adding 1 to the value of i and repeating sub-step (1g) until i is greater than 5, obtaining the final new sample importance weight λ;
(1i) setting i to 1 and m to 1, adding 1 to the value of n, and, using the final new sample importance weight λ, repeating sub-steps (1b) to (1h) until n is greater than N, obtaining N selectors;
(1j) calculating the voting weight of the n-th selector, α_n = (1/2) × ln((1 - e_n) / e_n), where ln(·) denotes the natural logarithm; letting n take the values 1, ..., N gives the voting weights of the N selectors;
(1k) linearly combining the N selectors with their voting weights to obtain the strong classifier H_strong(x) = sign(Σ_{n=1}^{N} α_n × h_n^sel(x)), where sign(·) denotes the sign function.
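To make the procedure above concrete, the following Python/NumPy sketch mirrors sub-steps (1a) to (1k) under explicit assumptions: the training samples are plain feature vectors, the weak classifiers are randomly generated decision stumps, and every function and variable name (train_strong_classifier, weak_predict, lam_corr, and so on) is illustrative rather than taken from the patent, which does not specify its feature pool or weak learners in this text.

```python
import numpy as np

def train_strong_classifier(X, y, M=250, N=50, seed=0):
    """X: (5, d) array of features for 1 positive + 4 negative samples; y in {-1, +1}.

    Returns (strong, confidence): the signed strong classifier of sub-step (1k)
    and its real-valued response, later used as the first confidence value.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # illustrative weak classifiers: decision stumps on randomly chosen features
    feat = rng.integers(0, d, size=M)
    thr = np.array([rng.uniform(X[:, f].min(), X[:, f].max()) for f in feat])
    pol = rng.choice([-1, 1], size=M)

    def weak_predict(m, x):
        s = pol[m] * np.sign(x[feat[m]] - thr[m])
        return 1 if s == 0 else int(s)

    lam_corr = np.full(M, 1e-3)   # cumulative weight of correctly classified samples
    lam_wrong = np.full(M, 1e-3)  # cumulative weight of misclassified samples
    lam = 1.0                     # sample importance weight, sub-step (1a)
    selectors, alphas = [], []
    for _ in range(N):
        # sub-steps (1b)-(1d): update every weak classifier on every sample
        for i in range(X.shape[0]):
            for m in range(M):
                if weak_predict(m, X[i]) == y[i]:
                    lam_corr[m] += lam
                else:
                    lam_wrong[m] += lam
        # sub-steps (1e)-(1f): cumulative error rates, pick the best weak classifier
        e = lam_wrong / (lam_corr + lam_wrong)
        best = int(np.argmin(e))
        selectors.append(best)
        # sub-steps (1g)-(1h): re-weight the samples with the chosen selector
        for i in range(X.shape[0]):
            if weak_predict(best, X[i]) == y[i]:
                lam *= 1.0 / (2.0 * (1.0 - e[best]))
            else:
                lam *= 1.0 / (2.0 * e[best])
        # sub-step (1j): voting weight of the selector
        alphas.append(0.5 * np.log((1.0 - e[best]) / e[best]))

    def confidence(x):
        return sum(a * weak_predict(m, x) for a, m in zip(alphas, selectors))

    def strong(x):  # sub-step (1k): sign of the weighted vote
        return np.sign(confidence(x))

    return strong, confidence
```

Online Boosting trackers commonly draw their weak classifiers from a pool of Haar-like features computed on the image patch; the random stumps above only stand in for such a pool.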
Step 2, learning a spatial context model from the first frame image and taking it as the learned spatio-temporal context model for tracking the next frame image.
The technical scheme of the invention exploits the strength of the spatio-temporal-context-based tracking algorithm in handling occlusion and thus compensates for the weakness of the online Boosting algorithm under occlusion.
Step 3, obtaining the current frame image to be tracked and determining the initial search area of the current frame image, wherein the initial search area of the current frame image is centered on the target area of the previous frame image and is four times the size of the target area of the previous frame image; and partitioning the initial search area of the current frame image into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size.
In step 3, the initial search area of the current frame image is partitioned into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size, where the block step comprises a row step and a column step: the row step is floor((1-T)×W+0.5) and the column step is floor((1-T)×H+0.5); floor(·) denotes rounding down, T denotes the overlap factor between two adjacent sub-blocks to be searched, W denotes the width of the target area of the first frame image, and H denotes the height of the target area of the first frame image.
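As a worked illustration of the blocking rule, the sketch below computes the two step sizes and enumerates sub-block positions; only the step formulas come from the text above, while the helper name partition_search_area and the grid enumeration are assumptions.

```python
import numpy as np

def partition_search_area(x0, y0, search_w, search_h, W, H, T=0.99):
    """Enumerate sub-blocks of size (W, H) inside the search area whose top-left
    corner is (x0, y0); only the step formulas come from the text above, the
    grid enumeration itself is an illustrative assumption."""
    row_step = int(np.floor((1 - T) * W + 0.5))   # horizontal shift between adjacent sub-blocks
    col_step = int(np.floor((1 - T) * H + 0.5))   # vertical shift between adjacent sub-blocks
    row_step, col_step = max(row_step, 1), max(col_step, 1)  # guard against a zero step
    blocks = []
    for y in range(y0, y0 + search_h - H + 1, col_step):
        for x in range(x0, x0 + search_w - W + 1, row_step):
            blocks.append((x, y, W, H))           # every sub-block has the target size
    return blocks

# e.g. W = 100, H = 80, T = 0.99 gives row_step = floor(1.5) = 1 and col_step = floor(1.3) = 1,
# i.e. adjacent sub-blocks overlap by roughly 99 percent.
```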
Step 4, evaluating each sub-block to be searched with the strong classifier to obtain the first confidence value of each sub-block to be searched, and forming the first confidence matrix.
Step 4 specifically comprises: evaluating each sub-block to be searched with the strong classifier to obtain the first confidence value of each sub-block to be searched, conf1(x) = Σ_{n=1}^{N} α_n × h_n^sel(x) (the real-valued weighted-vote response of the strong classifier before the sign function), and forming the first confidence matrix, where x denotes any sub-block to be searched.
Step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; and determining the central point of each subblock to be searched, and respectively solving a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix.
The step 5 specifically comprises the following substeps:
(5a) obtaining the confidence map function c(h) = IFFT(FFT(Hstc(h)) ⊙ FFT(R(h)ωσ(h-h*))) according to the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image;
wherein Hstc(h) denotes the spatio-temporal context model learned from the previous frame image for tracking the current frame image, h denotes any position in the search area of the current frame image, and R(h) denotes the gray value of the pixel at position h in the search area of the current frame image; ωσ(h-h*) denotes a weight function defined as ωσ(h-h*) = ζ·exp(-|h-h*|²/σ²), where ζ is a regularization constant, σ is a scale parameter, and h* denotes the position of the center point of the target area in the previous frame image; FFT(·) denotes the Fourier transform, IFFT(·) the inverse Fourier transform, and ⊙ point-wise multiplication;
(5b) taking the variable h in the confidence map function as the center point of each sub-block to be searched of the current frame image, calculating the second confidence value of each sub-block and forming the second confidence matrix.
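A minimal NumPy sketch of sub-steps (5a) and (5b), assuming the search area is handled as one 2-D array and that the regularization constant ζ simply normalizes the weights to sum to one; the function name stc_confidence_map and these choices are illustrative, not the patent's implementation.

```python
import numpy as np

def stc_confidence_map(H_stc, R, h_star, sigma):
    """Sketch of sub-step (5a): confidence map c(h) computed in the Fourier domain.

    H_stc and R are 2-D arrays over the search area (context model and pixel gray
    values); h_star = (row, col) is the centre of the previous target area."""
    rows, cols = R.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    dist2 = (yy - h_star[0]) ** 2 + (xx - h_star[1]) ** 2
    w = np.exp(-dist2 / sigma ** 2)
    w /= w.sum()                                  # regularisation constant zeta (assumption)
    # c = IFFT( FFT(H_stc) .* FFT(R .* w) ), point-wise products in the frequency domain
    c = np.real(np.fft.ifft2(np.fft.fft2(H_stc) * np.fft.fft2(R * w)))
    return c

# Sub-step (5b): the second confidence value of a sub-block is the map sampled
# at that sub-block's centre, e.g. conf2 = c[cy, cx].
```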
Step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; and determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is the target area of the tracked current frame image.
Step 7, determining a search area of the current frame image, wherein the search area of the current frame image takes a target area of the current frame image as a center, and the search area of the current frame image is four times of the target area of the current frame image; and taking the target area of the current frame image as a positive sample, taking four corner areas of the current frame image search area as four negative samples respectively, and updating the strong classifier.
And 8, learning a space context model according to the current frame image, and determining the space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image.
The step 8 specifically comprises the following substeps:
(8a) determining the context prior probability model P(c(z)|o) of the current frame image:
P(c(z)|o) = R(z)ωσ(z-h*)
wherein P(c(z)|o) denotes the prior probability that the context feature appears at each pixel point of the background region of the current frame image given that the target appears in the current frame search area, o denotes the event that the target appears in the current frame search area, the context feature at z is c(z) = R(z), z ∈ Ω, z is any position in the background region of the current frame image, Ω is the background region of the current frame image, i.e. the image region of the current frame search area excluding the target area, R(z) denotes the gray value of the pixel at position z of the background region of the current frame image, and ωσ(z-h*) denotes a weight function defined as ωσ(z-h*) = ζ·exp(-|z-h*|²/σ²), where ζ is a regularization constant, σ is a scale parameter, and h* denotes the position of the center point of the target area in the previous frame image;
(8b) determining the spatial context model P(h|c(z),o) of the current frame image:
P(h|c(z),o) = fsc(h-z)
wherein P(h|c(z),o) denotes the conditional probability that the target position is h given that the target appears in the current frame image search area and the context feature appears at z, h denotes any position in the current frame image search area, and fsc(h-z) is a function of positions h and z representing the spatial context model learned from the current frame;
(8c) according to the confidence function c(h) = Σ_{z∈Ω} P(h|c(z),o)·P(c(z)|o) = fsc(h) ⊗ (R(h)ωσ(h-h*)), obtaining the spatial context model fsc(h) learned from the current frame:
fsc(h) = IFFT(FFT(c(h)) / FFT(R(h)ωσ(h-h*)))
wherein c(h) is the confidence map function expressed as c(h) = b·exp(-|(h-h*)/α|^β), b is a constant, α is a scale parameter, β is a shape parameter, and ⊗ denotes convolution;
(8d) letting the current frame image be the t-th frame image and the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image be Hstc_t(h), the spatio-temporal context model learned from the current frame for tracking the next frame image, Hstc_{t+1}(h), is:
Hstc_{t+1}(h) = (1-ρ)·Hstc_t(h) + ρ·fsc_t(h)
where ρ is an update parameter with ρ ∈ (0,1); when t = 1, Hstc_1(h) = fsc_1(h); fsc_t(h) denotes the spatial context model learned from the t-th frame image.
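The following sketch puts sub-steps (8a) to (8d) together, assuming β = 1 (the value is not stated here), a small ε guarding the Fourier-domain division, and the same weight normalization as before; names and defaults are illustrative, with α = 2.25 and ρ = 0.075 taken from the parameter settings reported later in the text.

```python
import numpy as np

def learn_spatial_context(R, h_star, sigma, alpha=2.25, beta=1.0, b=1.0, eps=1e-8):
    """Sketch of sub-steps (8a)-(8c): spatial context model of the current frame."""
    rows, cols = R.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    dist = np.sqrt((yy - h_star[0]) ** 2 + (xx - h_star[1]) ** 2)
    w = np.exp(-dist ** 2 / sigma ** 2)
    w /= w.sum()                                       # regularisation constant zeta (assumption)
    conf = b * np.exp(-np.abs(dist / alpha) ** beta)   # confidence map c(h)
    # f_sc = IFFT( FFT(c) / FFT(R .* w) ): deconvolution in the Fourier domain
    f_sc = np.real(np.fft.ifft2(np.fft.fft2(conf) / (np.fft.fft2(R * w) + eps)))
    return f_sc

def update_stc_model(H_stc_prev, f_sc_t, rho=0.075):
    """Sub-step (8d): blend the newly learned spatial context into the
    spatio-temporal context model used for tracking the next frame."""
    return (1.0 - rho) * H_stc_prev + rho * f_sc_t
```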
Step 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image.
Both weights are initially 1/2. Updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix mainly depends on whether the target is occluded: when the target is partially or completely occluded, the weight corresponding to the second confidence matrix is increased and the weight corresponding to the first confidence matrix is decreased. To judge whether the target in the current frame is occluded, the embodiment of the invention introduces an occlusion factor, which is built on color histogram features.
The step 9 specifically comprises the following substeps:
(9a) calculating the occlusion factor occ of the current frame image search area;
(9b) setting an occlusion factor threshold θ, 0 < θ < 1, and updating the weight A1 corresponding to the first confidence matrix and the weight A2 corresponding to the second confidence matrix as functions of A, Y, occ and θ (the explicit update formula appears only as an image in the original publication);
wherein A denotes the weight corresponding to the first confidence matrix determined for the previous frame image, and Y denotes the maximum confidence value in the final confidence matrix.
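Because the explicit update formula survives only as an image in this text, the sketch below is an assumption consistent with the qualitative description (shift weight toward the context confidence when occ exceeds the threshold θ, otherwise restore the boosting-confidence weight scaled by the peak value Y); the step size and the complementary-weights constraint are illustrative choices, not the patented rule.

```python
def update_weights(occ, A_prev, Y_max, theta=0.5, step=0.1):
    """Illustrative rule only: the patent's exact update formula is not reproduced
    in the text, so this function merely follows the qualitative description."""
    if occ > theta:                          # target judged partially or fully occluded
        A1 = max(A_prev - step, 0.0)         # trust the strong classifier less
    else:
        A1 = min(A_prev + step * Y_max, 0.5) # restore trust, capped at the initial 1/2
    A2 = 1.0 - A1                            # keeping the two weights complementary is an assumption
    return A1, A2
```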
The substep 9(a) specifically includes the substeps of:
(9a1) acquiring the color histogram features of the current frame image search area; quantizing the color histogram features into J levels, the j-th level feature being denoted u_j, with u_j = j, j = 1, ..., J; the initial value of j is 1;
(9a2) letting the positions of the pixels of the target area of the first frame image be denoted {x_i*}, i = 1, ..., k, where k is the total number of pixels contained in the target area of the first frame image, the probability density of the j-th level feature u_j over the target area of the first frame image, q_uj, is defined as
q_uj = C·Σ_{i=1}^{k} K(|x_i*|²)·δ[b(x_i*) - u_j]
wherein C is a normalization constant, K(·) is a kernel function, |·|² denotes the squared modulus, δ(·) denotes the impulse response function, and b(x_i*) denotes the quantization level of the color histogram feature at position x_i*;
(9a3) letting the positions of the pixels of any sub-block to be searched in the current frame image be denoted {d_i}, i = 1, ..., k, where k is the total number of pixels contained in any sub-block to be searched of the current frame image and equals the total number of pixels contained in the target area of the first frame image, the probability density of the j-th level feature u_j over the sub-block to be searched, p_uj(s), is defined as
p_uj(s) = C·Σ_{i=1}^{k} K(|(s - d_i)/h_1|²)·δ[b(d_i) - u_j]
wherein s is the position of the center point of the candidate target area (the sub-block to be searched) in the current frame image, C is a normalization constant, K(·) is a kernel function, |·|² denotes the squared modulus, δ(·) denotes the impulse response function, b(d_i) denotes the quantization level of the color histogram feature at position d_i, and h_1 is the window radius of the kernel function;
(9a4) recording the position of the center point of the sub-block to be searched with the maximum confidence value in the current frame image search area as y_0; forming a first intermediate variable from q_uj and p_uj(y_0), and a second intermediate variable from the first intermediate variable and the occlusion degree parameter λ_1, with λ_1 ≥ 1 (the explicit expressions for the two intermediate variables appear only as images in the original publication);
(9a5) adding 1 to the value of j and repeating sub-steps (9a2) to (9a4) to obtain J second intermediate variables, from which the occlusion factor occ of the current frame image search area is calculated (the explicit expression appears only as an image in the original publication).
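The kernel-weighted histograms of sub-steps (9a2) and (9a3) can be sketched directly from their definitions; the final aggregation into occ, however, appears only as an image here, so the max(q - λ1·p, 0) rule summed over the J levels in the sketch below is an assumption, as are the single-channel simplification and the Epanechnikov-style kernel.

```python
import numpy as np

def quantised_histogram(patch, centre, J=16, h1=None):
    """Kernel-weighted histogram of a single-channel patch (sub-steps (9a1)-(9a3));
    the Epanechnikov-style kernel and the single-channel simplification are assumptions."""
    rows, cols = patch.shape
    if h1 is None:
        h1 = 0.5 * np.hypot(rows, cols)                 # kernel window radius
    yy, xx = np.mgrid[0:rows, 0:cols]
    d2 = ((yy - centre[0]) ** 2 + (xx - centre[1]) ** 2) / h1 ** 2
    K = np.maximum(1.0 - d2, 0.0)                       # kernel weights K(|.|^2)
    bins = np.clip((patch.astype(np.float64) / 256.0 * J).astype(int), 0, J - 1)
    hist = np.bincount(bins.ravel(), weights=K.ravel(), minlength=J)
    return hist / max(hist.sum(), 1e-12)                # normalisation constant C

def occlusion_factor(first_target, best_block, J=16, lambda1=1.0):
    """Sketch of the occlusion factor: q from the first-frame target, p from the
    best sub-block centred at y0; the aggregation rule is an assumption."""
    q = quantised_histogram(first_target, (first_target.shape[0] // 2, first_target.shape[1] // 2), J)
    p = quantised_histogram(best_block, (best_block.shape[0] // 2, best_block.shape[1] // 2), J)
    return float(np.sum(np.maximum(q - lambda1 * p, 0.0)))
```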
Step 10, repeatedly executing steps 3 to 9 until all video frames that need to be tracked have been processed.
The technical scheme of the invention was implemented in MATLAB 2014a with the following parameter settings: N = 50, M = 250, the overlap factor T between blocks is 0.99, the scale parameter α is 2.25, the update parameter ρ is 0.075, the occlusion degree parameter λ_1 is 1, and the occlusion factor threshold θ is 0.5. Three methods (the method of the invention, the online Boosting algorithm, and the spatio-temporal context algorithm) were initialized with the same target box in the first frame; in Fig. 2 the solid box is the tracking result of the method of the invention, the plain dashed box is the result of the spatio-temporal context algorithm, and the dashed box with black dots is the result of the online Boosting algorithm. The first video sequence (column a) tracks a toy dog (cluttered background, target occluded): the method of the invention tracks the target correctly, while after frame 120 the other two algorithms lose the target. The second video sequence (column b) tracks a pedestrian on a subway platform (the target is occluded by other moving pedestrians): the method of the invention performs clearly better than the other two methods, especially after frame 43. The third sequence (column c) tracks a fast-moving car (rapid scale change and partial occlusion): the method of the invention also tracks robustly, verifying the feasibility of the method.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (7)

1. A target tracking method combined with spatiotemporal context information is characterized by comprising the following steps:
step 1, acquiring the first frame image of the video, calibrating the target area of the first frame image, and expanding the target area about its center to obtain a search area four times the size of the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples;
step 2, learning a spatial context model according to the first frame image, and taking the spatial context model as a learned space-time context model for tracking the next frame image;
step 3, obtaining the current frame image to be tracked and determining the initial search area of the current frame image, wherein the initial search area of the current frame image is centered on the target area of the previous frame image and is four times the size of the target area of the previous frame image; partitioning the initial search area of the current frame image into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size;
step 4, evaluating each subblock to be searched according to the strong classifier to obtain a first confidence value of each subblock to be searched, and forming a first confidence matrix;
step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; determining the central point of each subblock to be searched, and respectively obtaining a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix;
step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is a target area of the tracked current frame image;
step 7, determining a search area of the current frame image, wherein the search area of the current frame image takes a target area of the current frame image as a center, and the search area of the current frame image is four times of the target area of the current frame image; taking a target area of the current frame image as a positive sample, taking four corner areas of a search area of the current frame image as four negative samples respectively, and updating the strong classifier;
step 8, learning a space context model according to the current frame image, and determining a space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image;
step 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image;
the method specifically comprises the following substeps:
(9a) calculating an occlusion factor occ for the current frame image search area;
(9b) setting an occlusion factor threshold θ, 0 < θ < 1, and updating the weight A1 corresponding to the first confidence matrix and the weight A2 corresponding to the second confidence matrix as functions of A, Y, occ and θ (the explicit update formula appears only as an image in the original publication);
wherein A denotes the weight corresponding to the first confidence matrix determined for the previous frame image, and Y denotes the maximum confidence value in the final confidence matrix;
step 10, repeatedly executing steps 3 to 9 until all video frames that need to be tracked have been processed.
2. The method for tracking a target in combination with spatio-temporal context information as claimed in claim 1, wherein in step 1 the positive sample and the four negative samples are used as training samples and a strong classifier is obtained from the training samples, specifically comprising the following sub-steps:
(1a) letting the training sample set be S = {(x_i, y_i) | x_i ∈ X, y_i ∈ Y, i = 1, 2, ..., 5}, where X denotes the training sample space consisting of one positive sample and four negative samples, x_i denotes the i-th training sample in the training sample space, Y denotes the set of sample class labels with Y = {-1, 1}, and y_i denotes the class label of the i-th training sample; a label of 1 means the training sample is a positive sample, and a label of -1 means it is a negative sample;
setting M weak classifiers, the m-th weak classifier being h_m^weak, m = 1, ..., M, where M denotes the total number of weak classifiers;
the initial value of i is 1 and the initial value of m is 1; the sample importance weight λ is initialized to 1;
(1b) obtaining the i-th training sample and updating the parameters λ_m^corr and λ_m^wrong of the m-th weak classifier h_m^weak:
when the m-th weak classifier h_m^weak classifies the i-th training sample correctly, adding the sample importance weight λ to λ_m^corr to obtain the new value of λ_m^corr; otherwise, adding λ to λ_m^wrong to obtain the new value of λ_m^wrong; wherein λ_m^corr denotes the cumulative weight of samples classified correctly by the m-th weak classifier and λ_m^wrong denotes the cumulative weight of samples misclassified by the m-th weak classifier;
(1c) adding 1 to the value of i and repeating sub-step (1b) until i is greater than 5, obtaining the final parameters λ_m^corr and λ_m^wrong of the m-th weak classifier;
(1d) setting i to 1, adding 1 to the value of m, and repeating sub-steps (1b) to (1c) until m is greater than M, obtaining the final parameters of all M weak classifiers;
(1e) calculating the cumulative error rate of the m-th weak classifier, e_m = λ_m^wrong / (λ_m^corr + λ_m^wrong); letting m take the values 1, ..., M gives the cumulative error rates of the M weak classifiers;
(1f) taking the weak classifier with the minimum cumulative error rate as the n-th selector h_n^sel; the initial value of n is 1, n = 1, ..., N, where N denotes the total number of selectors; setting i to 1;
(1g) obtaining the i-th training sample and updating the sample importance weight λ with the n-th selector h_n^sel:
when the n-th selector h_n^sel classifies the i-th training sample correctly, multiplying λ by 1/(2 × (1 - e_n)) to obtain the new sample importance weight λ; otherwise, multiplying λ by 1/(2 × e_n) to obtain the new sample importance weight λ; wherein e_n denotes the cumulative error rate of the weak classifier corresponding to the n-th selector h_n^sel;
(1h) adding 1 to the value of i and repeating sub-step (1g) until i is greater than 5, obtaining the final new sample importance weight λ;
(1i) setting i to 1 and m to 1, adding 1 to the value of n, and, using the final new sample importance weight λ, repeating sub-steps (1b) to (1h) until n is greater than N, obtaining N selectors;
(1j) calculating the voting weight of the n-th selector, α_n = (1/2) × ln((1 - e_n) / e_n), where ln(·) denotes the natural logarithm; letting n take the values 1, ..., N gives the voting weights of the N selectors;
(1k) linearly combining the N selectors with their voting weights to obtain the strong classifier H_strong(x) = sign(Σ_{n=1}^{N} α_n × h_n^sel(x)), where sign(·) denotes the sign function.
3. The method for tracking a target in combination with spatio-temporal context information as claimed in claim 1, wherein in step 3 the initial search area of the current frame image is partitioned into blocks according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched, the block step comprising a row step and a column step: the row step is floor((1-T)×W+0.5) and the column step is floor((1-T)×H+0.5), where floor(·) denotes rounding down, T denotes the overlap factor between two adjacent sub-blocks to be searched, W denotes the width of the target area of the first frame image, and H denotes the height of the target area of the first frame image.
4. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 2, wherein the step 4 specifically comprises:
evaluating each sub-block to be searched with the strong classifier to obtain the first confidence value of each sub-block to be searched, conf1(x) = Σ_{n=1}^{N} α_n × h_n^sel(x) (the real-valued weighted-vote response of the strong classifier before the sign function), and forming the first confidence matrix, wherein x denotes any sub-block to be searched.
5. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 1, wherein the step 5 comprises the following sub-steps:
(5a) obtaining the confidence map function c(h) = IFFT(FFT(Hstc(h)) ⊙ FFT(R(h)ωσ(h-h*))) according to the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image;
wherein Hstc(h) denotes the spatio-temporal context model learned from the previous frame image for tracking the current frame image, h denotes any position in the search area of the current frame image, and R(h) denotes the gray value of the pixel at position h in the search area of the current frame image; ωσ(h-h*) denotes a weight function defined as ωσ(h-h*) = ζ·exp(-|h-h*|²/σ²), where ζ is a regularization constant, σ is a scale parameter, and h* denotes the position of the center point of the target area in the previous frame image; FFT(·) denotes the Fourier transform, IFFT(·) the inverse Fourier transform, and ⊙ point-wise multiplication;
(5b) taking the variable h in the confidence map function as the center point of each sub-block to be searched of the current frame image, calculating the second confidence value of each sub-block and forming the second confidence matrix.
6. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 1, wherein the step 8 comprises the following sub-steps:
(8a) determining the context prior probability model P(c(z)|o) of the current frame image:
P(c(z)|o) = R(z)ωσ(z-h*)
wherein P(c(z)|o) denotes the prior probability that the context feature appears at each pixel point of the background region of the current frame image given that the target appears in the current frame search area, o denotes the event that the target appears in the current frame search area, the context feature at z is c(z) = R(z), z ∈ Ω, z is any position in the background region of the current frame image, Ω is the background region of the current frame image, i.e. the image region of the current frame search area excluding the target area, R(z) denotes the gray value of the pixel at position z of the background region of the current frame image, and ωσ(z-h*) denotes a weight function defined as ωσ(z-h*) = ζ·exp(-|z-h*|²/σ²), where ζ is a regularization constant, σ is a scale parameter, and h* denotes the position of the center point of the target area in the previous frame image;
(8b) determining the spatial context model P(h|c(z),o) of the current frame image:
P(h|c(z),o) = fsc(h-z)
wherein P(h|c(z),o) denotes the conditional probability that the target position is h given that the target appears in the current frame image search area and the context feature appears at z, h denotes any position in the current frame image search area, and fsc(h-z) is a function of positions h and z representing the spatial context model learned from the current frame;
(8c) according to the confidence function c(h) = Σ_{z∈Ω} P(h|c(z),o)·P(c(z)|o) = fsc(h) ⊗ (R(h)ωσ(h-h*)), obtaining the spatial context model fsc(h) learned from the current frame:
fsc(h) = IFFT(FFT(c(h)) / FFT(R(h)ωσ(h-h*)))
wherein c(h) is the confidence map function expressed as c(h) = b·exp(-|(h-h*)/α|^β), b is a constant, α is a scale parameter, β is a shape parameter, and ⊗ denotes convolution;
(8d) letting the current frame image be the t-th frame image and the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image be Hstc_t(h), the spatio-temporal context model learned from the current frame for tracking the next frame image, Hstc_{t+1}(h), is:
Hstc_{t+1}(h) = (1-ρ)·Hstc_t(h) + ρ·fsc_t(h)
where ρ is an update parameter with ρ ∈ (0,1); when t = 1, Hstc_1(h) = fsc_1(h); fsc_t(h) denotes the spatial context model learned from the t-th frame image.
7. The method for tracking a target in combination with spatiotemporal context information as claimed in claim 1, wherein the sub-step 9(a) comprises the following sub-steps:
(9a1) acquiring the color histogram features of the current frame image search area; quantizing the color histogram features into J levels, the j-th level feature being denoted u_j, with u_j = j, j = 1, ..., J; the initial value of j is 1;
(9a2) letting the positions of the pixels of the target area of the first frame image be denoted {x_i*}, i = 1, ..., k, where k is the total number of pixels contained in the target area of the first frame image, the probability density of the j-th level feature u_j over the target area of the first frame image, q_uj, is defined as
q_uj = C·Σ_{i=1}^{k} K(|x_i*|²)·δ[b(x_i*) - u_j]
wherein C is a normalization constant, K(·) is a kernel function, |·|² denotes the squared modulus, δ(·) denotes the impulse response function, and b(x_i*) denotes the quantization level of the color histogram feature at position x_i*;
(9a3) letting the positions of the pixels of any sub-block to be searched in the current frame image be denoted {d_i}, i = 1, ..., k, where k is the total number of pixels contained in any sub-block to be searched of the current frame image and equals the total number of pixels contained in the target area of the first frame image, the probability density of the j-th level feature u_j over the sub-block to be searched, p_uj(s), is defined as
p_uj(s) = C·Σ_{i=1}^{k} K(|(s - d_i)/h_1|²)·δ[b(d_i) - u_j]
wherein s is the position of the center point of the candidate target area (the sub-block to be searched) in the current frame image, C is a normalization constant, K(·) is a kernel function, |·|² denotes the squared modulus, δ(·) denotes the impulse response function, b(d_i) denotes the quantization level of the color histogram feature at position d_i, and h_1 is the window radius of the kernel function;
(9a4) recording the position of the center point of the sub-block to be searched with the maximum confidence value in the current frame image search area as y_0; forming a first intermediate variable from q_uj and p_uj(y_0), and a second intermediate variable from the first intermediate variable and the occlusion degree parameter λ_1, with λ_1 ≥ 1 (the explicit expressions for the two intermediate variables appear only as images in the original publication);
(9a5) adding 1 to the value of j and repeating sub-steps (9a2) to (9a4) to obtain J second intermediate variables, from which the occlusion factor occ of the current frame image search area is calculated (the explicit expression appears only as an image in the original publication).
CN201710596203.5A 2017-07-20 2017-07-20 Target tracking method combined with space-time context information Active CN107424175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710596203.5A CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710596203.5A CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Publications (2)

Publication Number Publication Date
CN107424175A CN107424175A (en) 2017-12-01
CN107424175B true CN107424175B (en) 2020-09-08

Family

ID=60430564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710596203.5A Active CN107424175B (en) 2017-07-20 2017-07-20 Target tracking method combined with space-time context information

Country Status (1)

Country Link
CN (1) CN107424175B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416800A (en) * 2018-03-13 2018-08-17 青岛海信医疗设备股份有限公司 Method for tracking target and device, terminal, computer readable storage medium
CN110070562A (en) * 2019-04-02 2019-07-30 西北工业大学 A kind of context-sensitive depth targets tracking
CN110570451B (en) * 2019-08-05 2022-02-01 武汉大学 Multithreading visual target tracking method based on STC and block re-detection
CN110738685B (en) * 2019-09-09 2023-05-05 桂林理工大学 Space-time context tracking method integrating color histogram response
CN113743252B (en) * 2021-08-17 2024-05-31 北京佳服信息科技有限公司 Target tracking method, device, equipment and readable storage medium
CN114140501A (en) * 2022-01-30 2022-03-04 南昌工程学院 Target tracking method and device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335986A (en) * 2015-09-10 2016-02-17 西安电子科技大学 Characteristic matching and MeanShift algorithm-based target tracking method
CN106485732A (en) * 2016-09-09 2017-03-08 南京航空航天大学 A kind of method for tracking target of video sequence
WO2017044550A1 (en) * 2015-09-11 2017-03-16 Intel Corporation A real-time multiple vehicle detection and tracking

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335986A (en) * 2015-09-10 2016-02-17 西安电子科技大学 Characteristic matching and MeanShift algorithm-based target tracking method
WO2017044550A1 (en) * 2015-09-11 2017-03-16 Intel Corporation A real-time multiple vehicle detection and tracking
CN106485732A (en) * 2016-09-09 2017-03-08 南京航空航天大学 A kind of method for tracking target of video sequence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Helmut Grabner et al., "On-line Boosting and Vision", 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006-06-30, vol. 1, pp. 260-267 *
Zhang Lei, "Research on Real-Time Target Tracking Algorithms and Implementation Technologies in Complex Scenes", China Doctoral Dissertations Full-text Database, Information Science and Technology, 2016-08-15 (No. 8), I138-39: pp. 75-78 and 81-86 of the main text *
Zhang Lei, "Research on Real-Time Target Tracking Algorithms and Implementation Technologies in Complex Scenes", China Doctoral Dissertations Full-text Database, Information Science and Technology, 2016, No. 8, I138-39: pp. 74-78 and 80-86 of the main text. *

Also Published As

Publication number Publication date
CN107424175A (en) 2017-12-01

Similar Documents

Publication Publication Date Title
CN107424175B (en) Target tracking method combined with space-time context information
WO2020173226A1 (en) Spatial-temporal behavior detection method
KR102462572B1 (en) Systems and methods for training object classifiers by machine learning
Wang et al. Detection of abnormal visual events via global optical flow orientation histogram
CN104063883B (en) A kind of monitor video abstraction generating method being combined based on object and key frame
US10198657B2 (en) All-weather thermal-image pedestrian detection method
CN108564598B (en) Improved online Boosting target tracking method
CN111680655A (en) Video target detection method for aerial images of unmanned aerial vehicle
CN111932583A (en) Space-time information integrated intelligent tracking method based on complex background
CN108960047B (en) Face duplication removing method in video monitoring based on depth secondary tree
CN110765906A (en) Pedestrian detection algorithm based on key points
CN112597815A (en) Synthetic aperture radar image ship detection method based on Group-G0 model
CN112836640A (en) Single-camera multi-target pedestrian tracking method
CN109919223B (en) Target detection method and device based on deep neural network
CN110084201B (en) Human body action recognition method based on convolutional neural network of specific target tracking in monitoring scene
CN104978567A (en) Vehicle detection method based on scenario classification
CN112270381B (en) People flow detection method based on deep learning
CN113688761B (en) Pedestrian behavior category detection method based on image sequence
CN111724566A (en) Pedestrian falling detection method and device based on intelligent lamp pole video monitoring system
Cao et al. Learning spatial-temporal representation for smoke vehicle detection
CN113129336A (en) End-to-end multi-vehicle tracking method, system and computer readable medium
Teng et al. Robust multi-scale ship tracking via multiple compressed features fusion
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
Fan et al. Video anomaly detection using CycleGan based on skeleton features
CN111144220B (en) Personnel detection method, device, equipment and medium suitable for big data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant