CN107424175B - Target tracking method combined with space-time context information - Google Patents
Abstract
The invention belongs to the fields of pattern recognition and computer vision, and discloses a target tracking method combining spatio-temporal context information, comprising the following steps: train an initial strong classifier on the first frame image and learn the spatio-temporal context model needed for tracking the next frame; when a new frame arrives, evaluate the blocks of the search area with the trained strong classifier to obtain a first confidence matrix; then obtain a confidence map function by integrating the spatio-temporal context information, and use it to compute the confidence values of all blocks in the search area, yielding a second confidence matrix; finally, linearly combine the two matrices with their corresponding weights to obtain a final confidence matrix, and take the block with the maximum confidence value as the tracked target. By combining the target's spatio-temporal context information with the online Boosting algorithm, the method achieves fast and robust tracking.
Description
Technical Field
The invention belongs to the technical field of pattern recognition and computer vision, and particularly relates to a target tracking method combining spatiotemporal context information.
Background
Moving target tracking is one of the important research directions in the field of computer vision and has important applications in human-computer interaction, intelligent monitoring, medical imaging and other fields. Tracking algorithms have made great progress in recent years, but effectively handling the tracking drift caused by occlusion, rapid motion, illumination change, background clutter and similar factors remains a very challenging problem.
In the online Boosting algorithm, when a new frame arrives, a strong classifier is used to separate the target from the background and obtain the target area. However, when the target is occluded, the feature pool is updated with occluded features and thus becomes polluted, which eventually causes tracking drift.
Based on the above problems, several improved online Boosting algorithms have been proposed. Yan et al. proposed an online Boosting algorithm based on sub-region classifiers, which divides the target region into several sub-regions, each corresponding to a strong classifier. During tracking, the feature pool corresponding to the strong classifier with the minimum confidence value is not updated, so that occluded features do not pollute the feature pool; however, the tracking effect degrades when the target scale changes.
Sun et al. proposed an online Boosting algorithm with motion blob detection: when the confidence value of the tracking result falls below a lower threshold, moving objects in the search area are detected by motion blob detection, and the detected moving objects are evaluated with the strong classifier until the confidence value again exceeds the threshold. However, motion blob detection usually cannot detect distant moving objects, so the improvement is limited.
Wang et al. proposed an occlusion-aware fusion online Boosting algorithm, which trains a background feature classifier and a target feature classifier on a certain number of image frames and uses the two classifiers to sense whether the target is occluded; if the target is occluded, the polluted positive samples are not collected to update the classifier. This, however, increases the complexity of the classifier, reduces the real-time performance of the online Boosting algorithm, and makes fast-moving targets easy to lose.
Disclosure of Invention
Aiming at the above defects of the prior art, the invention provides a target tracking method combining spatio-temporal context information, which solves the tracking drift that occurs in the prior art when the target area is partially occluded or the target scale changes greatly.
To achieve this purpose, the invention adopts the following technical scheme.
A method of target tracking in conjunction with spatiotemporal context information, the method comprising the steps of:
step 1, acquiring a first frame image in a video image, calibrating a target area of the first frame image, and expanding the target area around its center to obtain a search area four times as large as the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples respectively, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples;
step 2, learning a spatial context model according to the first frame image, and taking the spatial context model as a learned space-time context model for tracking the next frame image;
step 3, acquiring a current frame image to be tracked and determining an initial search area of the current frame image, wherein the initial search area of the current frame image is centered at the target area of the previous frame image and is four times as large as that target area; partitioning the initial search area of the current frame image into blocks of the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size;
step 4, evaluating each subblock to be searched according to the strong classifier to obtain a first confidence value of each subblock to be searched, and forming a first confidence matrix;
step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; determining the central point of each subblock to be searched, and respectively obtaining a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix;
step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is a target area of the tracked current frame image;
step 7, determining a search area of the current frame image, wherein the search area of the current frame image takes a target area of the current frame image as a center, and the search area of the current frame image is four times of the target area of the current frame image; taking a target area of the current frame image as a positive sample, taking four corner areas of a search area of the current frame image as four negative samples respectively, and updating the strong classifier;
step 8, learning a space context model according to the current frame image, and determining a space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image;
step 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image;
and 10, repeatedly executing the steps 3 to 9 until all the video images needing to be tracked are completed.
The invention integrates spatio-temporal context information into the online Boosting target tracking algorithm, effectively alleviating the tracking drift or even tracking loss that the online Boosting algorithm suffers when the tracked target is partially or completely occluded, and achieves fast, robust tracking.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a target tracking method in combination with spatiotemporal context information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram showing the comparison between the tracking effect of the method of the present invention and the tracking effect of the two conventional methods.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical scheme of the invention exploits the fact that the target does not change much between two adjacent frames of a video image, that its position does not change abruptly, and that a specific relationship exists between the target and its surrounding background; this relationship helps distinguish the target from the background when the target's appearance changes significantly. The invention introduces this spatio-temporal context information into the online Boosting algorithm.
Spatio-temporal context information: the temporal information is that the appearance and position of the target do not change abruptly between adjacent frames; the spatial information is that a specific relationship exists between the target and its surrounding background, and this relationship helps distinguish the target from the background. Together, these two kinds of information constitute the target's spatio-temporal context information.
The embodiment of the invention provides a target tracking method combined with space-time context information, as shown in figure 1, the method comprises the following steps:
step 1, acquiring a first frame image in a video image, calibrating a target area of the first frame image, and expanding the target area around its center to obtain a search area four times as large as the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples respectively, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples.
In step 1, the positive samples and the four negative samples are used as training samples, and a strong classifier is obtained according to the training samples, and the method specifically comprises the following substeps:
(1a) let the training sample set be S = {(x_i, y_i) | x_i ∈ X, y_i ∈ Y, i = 1, 2, ..., 5}, where X denotes the training sample space consisting of the one positive sample and the four negative samples, x_i denotes the i-th training sample in the training sample space, Y denotes the set of sample class labels with Y = {−1, 1}, and y_i denotes the class label of the i-th training sample; a class label of 1 means the training sample is a positive sample, and a class label of −1 means it is a negative sample;

set M weak classifiers, the m-th weak classifier being denoted h^weak_m, m = 1, ..., M, where M represents the total number of weak classifiers;
the initial value of i is 1, and the initial value of m is 1; setting the sample importance weight lambda to be 1;
when m weak classifierWhen the classification result of the ith training sample is correct, the parameters are orderedIs added to the value of the sample importance weight lambda as the mth weak classifierNew parametersOtherwise, let the parameterIs added to the value of the sample importance weight lambda as the mth weak classifierNew parameters
Wherein the content of the first and second substances,represents the cumulative classified correct sample weight for the mth weak classifier,representing the cumulative classification error sample weight of the mth weak classifier;
(1c) add 1 to the value of i and repeatedly execute sub-step (1b) until the value of i is greater than 5, obtaining the final parameters λ^corr_m and λ^wrong_m of the m-th weak classifier h^weak_m;
(1d) Setting the value of i as 1, adding 1 to the value of M, and repeatedly executing the substeps (1b) to (1c) until the value of M is greater than M to obtain final parameters of M weak classifiers;
(1e) calculate the cumulative error rate of the m-th weak classifier, e_m = λ^wrong_m / (λ^corr_m + λ^wrong_m); letting m take 1, ..., M in turn gives the cumulative error rates of the M weak classifiers;
(1f) take the weak classifier with the minimum cumulative error rate as the n-th selector h^sel_n; the initial value of n is 1, n = 1, ..., N, where N represents the total number of selectors;

set the value of i to 1;
(1g) obtain the i-th training sample and update the value of the sample importance weight λ with the n-th selector h^sel_n:

when the n-th selector h^sel_n classifies the i-th training sample correctly, multiply the value of λ by 1/(2 × (1 − e_n)) to obtain the new sample importance weight λ; otherwise, multiply the value of λ by 1/(2 × e_n) to obtain the new λ; wherein e_n denotes the cumulative error rate of the weak classifier corresponding to the n-th selector h^sel_n;
(1h) adding 1 to the value of i, and repeatedly executing the substep (1g) until the value of i is greater than 5; obtaining a final new sample importance weight lambda value;
(1i) setting the value of i as 1, setting the value of m as 1, adding 1 to the value of N, adopting the final new sample importance weight lambda, and repeatedly executing the substeps (1b) to (1h) until the value of N is greater than N to obtain N selectors;
(1j) calculate the voting weight of the n-th selector, α_n = ln((1 − e_n)/e_n); letting n take 1, ..., N in turn gives the voting weights of the N selectors; ln(·) represents the natural logarithm;
(1k) linearly combine the N selectors according to their voting weights to obtain the strong classifier H(x) = sign(Σ_{n=1}^{N} α_n h^sel_n(x)), where sign(·) represents the sign function.
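As an illustration, sub-steps (1a)-(1k) can be sketched as follows. This is a minimal sketch, not the patented implementation: the function name `train_online_boosting`, the threshold-stump weak classifiers in the usage example, and the per-sample importance weights (standing in for the sequentially updated scalar λ of the original) are all simplifications for illustration.

```python
import math

def train_online_boosting(samples, labels, weak_classifiers, n_selectors):
    """Sketch of sub-steps (1a)-(1k): each selector keeps the weak
    classifier with the lowest cumulative (importance-weighted) error
    rate, and the strong classifier is the sign of the voting-weighted
    sum of the selectors."""
    lam = [1.0] * len(samples)          # per-sample importance weights
    selectors, alphas = [], []
    for _ in range(n_selectors):
        best, best_err = None, None
        for h in weak_classifiers:
            # cumulative correct / wrong sample weights, sub-step (1b)
            corr = sum(l for x, y, l in zip(samples, labels, lam) if h(x) == y)
            wrong = sum(l for x, y, l in zip(samples, labels, lam) if h(x) != y)
            err = wrong / (corr + wrong)            # sub-step (1e)
            if best_err is None or err < best_err:
                best, best_err = h, err
        eps = min(max(best_err, 1e-6), 1 - 1e-6)    # clamp so ln stays finite
        alphas.append(math.log((1 - eps) / eps))    # voting weight, (1j)
        selectors.append(best)                      # selector, (1f)
        # re-weight samples, sub-step (1g): shrink the weights the selector
        # classified correctly, grow the ones it got wrong
        lam = [l / (2 * (1 - eps)) if best(x) == y else l / (2 * eps)
               for x, y, l in zip(samples, labels, lam)]

    def strong(x):                                  # sub-step (1k)
        s = sum(a * h(x) for a, h in zip(alphas, selectors))
        return 1 if s >= 0 else -1
    return strong
```

With one-dimensional samples and three threshold stumps, the resulting strong classifier separates positives from negatives as expected; the same loop applies to the Haar-like features typically used in online Boosting.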
And 2, learning a spatial context model according to the first frame image, and taking the spatial context model as a learned space-time context model for tracking the next frame image.
The technical scheme of the invention exploits the strength of spatio-temporal-context-based target tracking in handling occlusion to compensate for the weakness of the online Boosting algorithm under occlusion.
Step 3, acquiring a current frame image to be tracked and determining an initial search area of the current frame image, wherein the initial search area of the current frame image is centered at the target area of the previous frame image and is four times as large as that target area; and partitioning the initial search area of the current frame image into blocks of the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size.
In step 3, the initial search area of the current frame image is partitioned according to the size of the target area of the previous frame image to obtain several sub-blocks to be searched of the same size; the block step comprises a row step and a column step: the row step is floor((1 − T) × W + 0.5) and the column step is floor((1 − T) × H + 0.5), where floor(·) denotes rounding down, T denotes the coincidence factor between two adjacent sub-blocks to be searched, W denotes the width of the target area of the first frame image, and H denotes the height of the target area of the first frame image.
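A quick numeric check of these step sizes (the helper name `block_steps` is illustrative, not from the patent):

```python
import math

def block_steps(T, W, H):
    """Row and column strides for tiling the search area with sub-blocks
    that overlap by coincidence factor T (0 <= T < 1); W, H are the
    width and height of the first-frame target area."""
    row_step = math.floor((1 - T) * W + 0.5)
    col_step = math.floor((1 - T) * H + 0.5)
    return row_step, col_step
```

With the experimental setting T = 0.99 and, say, a 100 × 60 target area, both strides come to 1 pixel, i.e. an almost exhaustive sliding-window search over the search area.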
And 4, evaluating each subblock to be searched according to the strong classifier to obtain a first confidence value of each subblock to be searched, and forming a first confidence matrix.
The step 4 specifically comprises: evaluating each sub-block to be searched with the strong classifier, the first confidence value of any sub-block x being the weighted selector response conf(x) = Σ_{n=1}^{N} α_n h^sel_n(x); the first confidence values of all sub-blocks form the first confidence matrix.
Step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; and determining the central point of each subblock to be searched, and respectively solving a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix.
The step 5 specifically comprises the following substeps:
(5a) obtaining the confidence map function from the spatio-temporal context model H^stc_t(h) learned from the previous frame image for tracking the current frame image:

c(h) = IFFT(FFT(H^stc_t(h)) ⊙ FFT(R(h)ω_σ(h − h*)))

wherein H^stc_t(h) represents the spatio-temporal context model learned from the previous frame image for tracking the current frame image, h represents any position in the search area of the current frame image, and R(h) represents the gray value of the pixel at position h in the search area of the current frame image; ω_σ(h − h*) is a weight function defined as ω_σ(h − h*) = ζ e^{−|h − h*|²/σ²}, where ζ is a regularization constant, σ is a scale parameter, and h* represents the position of the center point of the target area in the previous frame image; FFT(·) represents the Fourier transform, IFFT(·) represents the inverse Fourier transform, and ⊙ represents element-wise multiplication;

(5b) substituting the center point of each sub-block to be searched of the current frame image for the variable h in the confidence map function to calculate the second confidence value of each sub-block, forming the second confidence matrix.
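Sub-step (5a) is a convolution evaluated in the frequency domain. A minimal numpy sketch follows; the function names are illustrative, and a Gaussian-shaped weight ω_σ is assumed:

```python
import numpy as np

def confidence_map(H_stc, R, omega):
    """Evaluate the confidence map for every position at once: the
    convolution of H_stc with (R . omega), computed as an element-wise
    product of 2-D Fourier transforms. All arrays share the
    search-area shape."""
    return np.real(np.fft.ifft2(np.fft.fft2(H_stc) * np.fft.fft2(R * omega)))

def gaussian_weight(shape, center, sigma, zeta=1.0):
    """Weight function omega_sigma(h - h*): context pixels closer to the
    previous target center h* contribute more; zeta is the
    regularization constant."""
    ys, xs = np.indices(shape)
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return zeta * np.exp(-d2 / sigma ** 2)
```

As a sanity check, a model that is a unit impulse at the origin leaves the weighted image R·ω unchanged, which is exactly the identity property of convolution.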
Step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; and determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is the target area of the tracked current frame image.
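Step 6 is a plain weighted sum followed by an argmax; for instance (the function name is illustrative):

```python
import numpy as np

def fuse_confidences(conf1, conf2, w1=0.5, w2=0.5):
    """Linearly combine the classifier confidence matrix (conf1) and the
    spatio-temporal-context confidence matrix (conf2) with their weights,
    and return the fused matrix plus the (row, col) index of the winning
    sub-block, i.e. the tracked target area."""
    final = w1 * conf1 + w2 * conf2
    return final, np.unravel_index(np.argmax(final), final.shape)
```

The sub-block whose index is returned is taken as the target area of the current frame, per step 6.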
Step 7, determining a search area of the current frame image, wherein the search area of the current frame image takes a target area of the current frame image as a center, and the search area of the current frame image is four times of the target area of the current frame image; and taking the target area of the current frame image as a positive sample, taking four corner areas of the current frame image search area as four negative samples respectively, and updating the strong classifier.
And 8, learning a space context model according to the current frame image, and determining the space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image.
The step 8 specifically comprises the following substeps:
(8a) determining a context prior probability model P(c(z)|o) of the current frame image:

P(c(z)|o) = R(z)ω_σ(z − h*)

wherein P(c(z)|o) represents the prior probability of the context feature appearing at each pixel point of the background region of the current frame image under the condition that the target appears in the current frame search area, and o represents the event that the target appears in the current frame search area; the context feature at z is c(z) = R(z), z ∈ Ω, where z is any position in the background region Ω of the current frame image, the background region being the image region of the search area of the current frame image excluding the target region, and R(z) represents the gray value of the pixel at position z of the background region; ω_σ(z − h*) is a weight function defined as ω_σ(z − h*) = ζ e^{−|z − h*|²/σ²}, where ζ is a regularization constant, σ is a scale parameter, and h* represents the position of the center point of the target area in the previous frame image;

(8b) determining a spatial context model P(h|c(z), o) of the current frame image:

P(h|c(z), o) = f_sc(h − z)

wherein P(h|c(z), o) represents the conditional probability that the target position is h under the condition that the target appears in the current frame search area and the context feature appears at z; h represents any position in the current frame search area; f_sc(h − z), a function of the positions h and z, represents the spatial context model learned from the current frame;

(8c) according to the confidence function

c(h) = Σ_{z∈Ω} P(h|c(z), o) P(c(z)|o) = f_sc(h) ⊗ (R(h)ω_σ(h − h*))

obtaining the spatial context model f_sc(h) learned from the current frame by deconvolution in the frequency domain:

f_sc(h) = IFFT(FFT(c(h)) / FFT(R(h)ω_σ(h − h*)))

wherein c(h) is the confidence map function, expressed as c(h) = b e^{−|(h − h*)/α|^β}, b is a constant, α is a scale parameter, β is a shape parameter, and ⊗ represents the convolution operation;

(8d) denoting the current frame image as the t-th frame image and the spatio-temporal context model learned from the previous frame image for tracking the current frame image as H^stc_t(h), the spatio-temporal context model H^stc_{t+1}(h) learned by the current frame for tracking the next frame image is:

H^stc_{t+1}(h) = (1 − ρ)H^stc_t(h) + ρ f_sc(h)

wherein ρ is an update parameter, ρ ∈ (0, 1); when t = 1, H^stc_1(h) is the spatial context model learned from the first frame image.
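Assuming the standard running-average update H^stc_{t+1} = (1 − ρ)H^stc_t + ρ·f_sc used by spatio-temporal context tracking, the model update of sub-step (8d) is a single blend:

```python
def update_stc_model(H_stc_t, f_sc_t, rho=0.075):
    """Blend the accumulated spatio-temporal context model with the
    spatial context model learned from the current frame; rho = 0.075 is
    the value reported in the experiments. Works on scalars or numpy
    arrays (anything supporting * and +)."""
    return (1.0 - rho) * H_stc_t + rho * f_sc_t
```

Because ρ ∈ (0, 1), old context decays geometrically, so occasional occluded frames perturb the model only slightly.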
And 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image.
The update of the weights corresponding to the first and second confidence matrices mainly depends on whether the target is occluded. Both are initially 1/2; when the target is partially or completely occluded, the weight corresponding to the second confidence matrix is increased and the weight corresponding to the first confidence matrix is decreased. To judge whether the target in the current frame is occluded, the embodiment of the invention introduces the concept of an occlusion factor, built on color histogram features.
The step 9 specifically comprises the following substeps:
(9a) calculating an occlusion factor occ for the current frame image search area;
(9b) setting an occlusion factor threshold θ, 0 < θ < 1, and updating the weight A1 corresponding to the first confidence matrix and the weight A2 corresponding to the second confidence matrix accordingly; in the update, A denotes the weight corresponding to the first confidence matrix determined for the previous frame image, and Y denotes the maximum confidence value in the final confidence matrix.
Sub-step (9a) specifically comprises the following sub-steps:
(9a1) acquiring the color histogram features of the current frame image search area, and quantizing the color histogram features into J levels, the j-th level feature being denoted u_j with u_j = j, j = 1, ..., J;

the initial value of j is 1;
(9a2) let the positions of the pixels of the target area of the first frame image be denoted {z_i}, i = 1, 2, ..., k, where k is the total number of pixels contained in the target area of the first frame image; the probability density q_j of the j-th level feature u_j over the target area of the first frame image is defined as:

q_j = C Σ_{i=1}^{k} K(‖z_i‖²) δ(b(z_i) − u_j)

wherein C is a normalization constant, K(·) is a kernel function, ‖·‖² represents the squared modulus, δ(·) represents the impulse response function, and b(z_i) indicates the quantization level of the color histogram feature corresponding to position z_i;
(9a3) let the positions of the pixels of any sub-block to be searched in the current frame image be denoted {d_i}, i = 1, 2, ..., k, where k, the total number of pixels contained in any sub-block to be searched, equals the total number of pixels contained in the target area of the first frame image; the probability density p_j(s) of the j-th level feature u_j over any sub-block to be searched of the current frame image is defined as:

p_j(s) = C Σ_{i=1}^{k} K(‖(s − d_i)/h1‖²) δ(b(d_i) − u_j)

wherein s is the center point position of the current frame image target area, C is a normalization constant, K(·) is a kernel function, ‖·‖² represents the squared modulus, δ(·) represents the impulse response function, b(d_i) indicates the quantization level of the color histogram feature corresponding to position d_i, and h1 is the window radius of the kernel function;
(9a4) recording the center point position of the sub-block to be searched with the maximum confidence value in the current frame image search region as y0, and computing the j-th intermediate variable from the probability densities q_j and p_j(y0), where λ1 ≥ 1 is an occlusion degree parameter;

(9a5) adding 1 to the value of j and repeatedly executing sub-steps (9a2) to (9a4) until J intermediate variables are obtained, from which the occlusion factor occ of the current frame image search area is calculated.
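The exact expressions for the intermediate variables and the occlusion factor are not legible in this copy. As a hedged illustration only, a Bhattacharyya-style similarity between the two kernel-weighted histograms is a common choice for this kind of occlusion test; `occlusion_factor` below is an assumption, not the patented formula:

```python
import math

def occlusion_factor(q, p, lam1=1.0):
    """Hypothetical sketch (NOT the patent's exact expression): compare
    the histogram q learned on the first-frame target area with the
    histogram p of the current best sub-block via a Bhattacharyya-style
    similarity. Low similarity -> high occlusion factor. lam1 plays the
    role of the occlusion degree parameter lambda_1 >= 1."""
    similarity = sum(math.sqrt(qj * pj) for qj, pj in zip(q, p))
    return lam1 * (1.0 - similarity)
```

Under this assumption, identical normalized histograms give occ = 0 (no occlusion) and disjoint histograms give occ = λ1, which is then compared against the threshold θ of sub-step (9b).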
And 10, repeatedly executing the steps 3 to 9 until all the video images needing to be tracked are completed.
The technical scheme of the invention is implemented in MATLAB 2014a with the following parameter settings: N = 50, M = 250, coincidence factor between blocks T = 0.99, scale parameter α = 2.25, update parameter ρ = 0.075, occlusion degree parameter λ1 = 1, and occlusion factor threshold θ = 0.5. Three methods (the method of the present invention, the online Boosting algorithm, and the spatio-temporal context algorithm; the solid-line box shows the tracking result of the method of the present invention, the plain dashed box that of the spatio-temporal context algorithm, and the dashed box with black solid dots that of the online Boosting algorithm) are initialized with the same target box in the first frame, and the tracking results are shown in FIG. 2. The first video sequence (column a) tracks a toy dog (cluttered background, occluded target): the proposed method tracks the target correctly, while the other two algorithms lose it after frame 120. The second video sequence (column b) tracks a pedestrian on a subway platform (the target is occluded by moving pedestrians): the proposed method is clearly better than the other two, especially after frame 43. The third sequence (column c) tracks a fast-moving car (rapid scale change and partial occlusion): the proposed method still tracks robustly, verifying its feasibility.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (7)
1. A target tracking method combined with spatiotemporal context information is characterized by comprising the following steps:
step 1, acquiring a first frame image in a video image, calibrating a target area of the first frame image, and expanding the target area to obtain a search area centered on the target area, the search area being four times as large as the target area; taking the target area as a positive sample and the four corner areas of the search area as four negative samples, wherein the size of the target area is the same as the size of each corner area; taking the positive sample and the four negative samples as training samples, and obtaining a strong classifier from the training samples;
step 2, learning a spatial context model according to the first frame image, and taking the spatial context model as a learned space-time context model for tracking the next frame image;
step 3, obtaining a current frame image to be tracked, and determining an initial search area of the current frame image, wherein the initial search area of the current frame image is centered on the target area of the previous frame image and is four times the size of that target area; partitioning the initial search area of the current frame image according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched of the same size;
step 4, evaluating each subblock to be searched according to the strong classifier to obtain a first confidence value of each subblock to be searched, and forming a first confidence matrix;
step 5, obtaining a confidence map function according to a space-time context model which is learned by the previous frame of image and tracks the current frame of image; determining the central point of each subblock to be searched, and respectively obtaining a second confidence value of each subblock to be searched according to the confidence map function and the central point of each subblock to be searched to form a second confidence matrix;
step 6, determining that the initial value of the weight corresponding to the first confidence matrix is 1/2, the initial value of the weight corresponding to the second confidence matrix is 1/2, and linearly combining the first confidence matrix, the weight corresponding to the first confidence matrix, the second confidence matrix and the weight corresponding to the second confidence matrix to obtain a final confidence matrix; determining the maximum confidence value in the final confidence matrix, wherein the subblock to be searched corresponding to the maximum confidence value is a target area of the tracked current frame image;
step 7, determining a search area of the current frame image, wherein the search area of the current frame image is centered on the target area of the current frame image and is four times the size of that target area; taking the target area of the current frame image as a positive sample and the four corner areas of the search area of the current frame image as four negative samples, and updating the strong classifier;
step 8, learning a space context model according to the current frame image, and determining a space-time context model which is learned by the current frame and tracks the next frame image by combining the space-time context model which is learned by the previous frame image and tracks the current frame image;
step 9, updating the weight corresponding to the first confidence matrix and the weight corresponding to the second confidence matrix according to the current frame image;
the method specifically comprises the following substeps:
(9a) calculating an occlusion factor occ for the current frame image search area;
(9b) setting an occlusion factor threshold θ, wherein 0 < θ < 1; the weight A1 corresponding to the first confidence matrix and the weight A2 corresponding to the second confidence matrix are updated as follows:
wherein, A represents the weight corresponding to the first confidence matrix determined by the previous frame of image, and Y represents the maximum confidence value in the final confidence matrix;
step 10, repeatedly executing steps 3 to 9 until all the video images to be tracked have been processed.
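Steps 4 to 6 combine the two confidence matrices linearly and pick the sub-block with the maximum fused value. The following is a minimal sketch of that fusion; the function name, matrix representation (lists of lists), and scan-order tie-breaking are illustrative assumptions, not part of the claim.

```python
def fuse_confidence(C1, C2, a1=0.5, a2=0.5):
    """Linearly combine the classifier confidence matrix C1 (step 4) and the
    spatio-temporal context confidence matrix C2 (step 5) with weights a1, a2
    (both initialised to 1/2 per step 6), then locate the sub-block with the
    maximum fused confidence value."""
    rows, cols = len(C1), len(C1[0])
    fused = [[a1 * C1[r][c] + a2 * C2[r][c] for c in range(cols)]
             for r in range(rows)]
    # the sub-block holding the maximum fused value is the tracked target area
    best_val, best_pos = fused[0][0], (0, 0)
    for r in range(rows):
        for c in range(cols):
            if fused[r][c] > best_val:
                best_val, best_pos = fused[r][c], (r, c)
    return fused, best_pos
```

The returned position indexes the sub-block grid, so it maps directly back to a target rectangle in the current frame.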
2. The method for tracking a target in combination with spatio-temporal context information as claimed in claim 1, wherein in step 1, the positive samples and the four negative samples are used as training samples, and a strong classifier is obtained according to the training samples, specifically comprising the following sub-steps:
(1a) let the training sample set be S = {(x_i, y_i) | x_i ∈ X, y_i ∈ Y, i = 1, 2, …, 5}, wherein X denotes the training sample space consisting of one positive sample and four negative samples, x_i represents the i-th training sample in the training sample space, Y represents the set of sample class labels, Y = {−1, 1}, and y_i represents the sample class label of the i-th training sample in the training sample space; a sample class label of 1 indicates that the training sample is a positive sample, and a label of −1 indicates that the training sample is a negative sample;
setting M weak classifiers, wherein M represents the total number of weak classifiers;
the initial value of i is 1 and the initial value of m is 1; setting the sample importance weight λ to 1;
(1b) when the classification result of the m-th weak classifier on the i-th training sample is correct, the value of the sample importance weight λ is added to the cumulative correct-classification parameter of the m-th weak classifier as its new value; otherwise, the value of λ is added to the cumulative misclassification parameter of the m-th weak classifier as its new value;
wherein the cumulative correct-classification parameter represents the accumulated weight of samples correctly classified by the m-th weak classifier, and the cumulative misclassification parameter represents the accumulated weight of samples misclassified by the m-th weak classifier;
(1c) adding 1 to the value of i, and repeatedly executing substep (1b) until the value of i is greater than 5, obtaining the final parameters of the m-th weak classifier;
(1d) setting the value of i to 1, adding 1 to the value of m, and repeatedly executing substeps (1b) to (1c) until the value of m is greater than M, obtaining the final parameters of the M weak classifiers;
(1e) calculating the cumulative error rate of the m-th weak classifier from its cumulative correct-classification and misclassification parameters; letting m take the values 1, …, M yields the cumulative error rates of the M weak classifiers;
(1f) selecting the weak classifier with the minimum cumulative error rate as the n-th selector; the initial value of n is 1, n = 1, …, N, and N represents the total number of selectors;
setting the value of i to 1;
(1g) obtaining the i-th training sample and using the n-th selector to update the value of the sample importance weight λ:
when the classification result of the n-th selector on the i-th training sample is correct, the value of the sample importance weight λ is multiplied by 1/(2 × (1 − e_n)) as the new sample importance weight λ; otherwise, the value of λ is multiplied by 1/(2 × e_n) as the new sample importance weight λ; wherein e_n denotes the cumulative error rate of the weak classifier corresponding to the n-th selector;
(1h) adding 1 to the value of i, and repeatedly executing the substep (1g) until the value of i is greater than 5; obtaining a final new sample importance weight lambda value;
(1i) setting the value of i to 1 and the value of m to 1, adding 1 to the value of n, adopting the final new sample importance weight λ, and repeatedly executing substeps (1b) to (1h) until the value of n is greater than N, obtaining N selectors;
(1j) calculating the voting weight corresponding to the n-th selector; letting n take the values 1, …, N yields the voting weights corresponding to the N selectors; ln(·) represents the logarithmic function.
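Substeps (1b) through (1j) follow the usual online-boosting bookkeeping. The sketch below shows the update rules named in the claim; the error-rate formula e = wrong/(correct + wrong) and the 1/2 factor in the voting weight are the conventional online-boosting choices (assumptions here, since the claim's exact formulas are given only as images), and all function names are illustrative.

```python
import math

def update_weak_stats(stats, correct, lam):
    """Substeps (1b)-(1c): accumulate the sample importance weight lam into
    the weak classifier's correct or wrong counter (a dict here)."""
    stats['correct' if correct else 'wrong'] += lam
    return stats

def cumulative_error(stats):
    """Substep (1e): e_m = wrong / (correct + wrong) -- the standard
    online-boosting estimate (assumed form)."""
    return stats['wrong'] / (stats['correct'] + stats['wrong'])

def update_importance(lam, correct, e_n):
    """Substep (1g): scale lam by 1/(2(1 - e_n)) on a correct classification
    by the n-th selector, by 1/(2 e_n) otherwise."""
    return lam / (2 * (1 - e_n)) if correct else lam / (2 * e_n)

def voting_weight(e_n):
    """Substep (1j): voting weight via ln((1 - e_n) / e_n); the 1/2 factor
    follows the usual online-boosting convention (assumption)."""
    return 0.5 * math.log((1 - e_n) / e_n)
```

Note that the importance update keeps the total weight balanced: misclassified samples gain weight, so later selectors concentrate on them.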
3. The method for tracking a target in combination with spatio-temporal context information as claimed in claim 1, wherein in step 3, the initial search area of the current frame image is partitioned according to the size of the target area of the previous frame image to obtain a plurality of sub-blocks to be searched, and the block step size comprises a row step size and a column step size: the row step size is floor((1 − T) × W + 0.5) and the column step size is floor((1 − T) × H + 0.5), wherein floor(·) denotes rounding down, T denotes the coincidence factor between two adjacent sub-blocks to be searched, W denotes the width of the target area of the first frame image, and H denotes the height of the target area of the first frame image.
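The step-size formulas of claim 3 can be sketched directly; the function name is illustrative.

```python
import math

def block_steps(T, W, H):
    """Claim 3 step sizes: row step = floor((1 - T) * W + 0.5),
    column step = floor((1 - T) * H + 0.5), where T is the coincidence
    (overlap) factor between adjacent sub-blocks and W, H are the
    first-frame target width and height."""
    row_step = math.floor((1 - T) * W + 0.5)
    col_step = math.floor((1 - T) * H + 0.5)
    return row_step, col_step
```

With the description's T = 0.99, adjacent sub-blocks overlap almost completely, so the partition behaves like a dense sliding window with roughly one-pixel steps.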
4. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 2, wherein the step 4 specifically comprises:
5. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 1, wherein the step 5 comprises the following sub-steps:
(5a) obtaining a confidence map function c(h) = IFFT(FFT(H_t^stc(h)) ⊙ FFT(R(h) ω_σ(h − h*))) according to the spatio-temporal context model, learned from the previous frame image, for tracking the current frame image;
wherein H_t^stc(h) represents the spatio-temporal context model learned from the previous frame image for tracking the current frame image, h represents any position in the search area of the current frame image, and R(h) represents the gray value of the pixel at position h in the search area of the current frame image; ω_σ(h − h*) represents a weight function defined as ω_σ(h − h*) = ζ exp(−|h − h*|²/σ²), ζ is a regularization constant, σ is a scale parameter, h* represents the position of the center point of the target area in the previous frame image, FFT(·) represents the Fourier transform, IFFT(·) represents the inverse Fourier transform, and ⊙ represents element-wise multiplication;
(5b) taking the variable h in the confidence map function as the center point of each sub-block to be searched of the current frame image, respectively calculating a second confidence value for each sub-block to be searched, forming a second confidence matrix.
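Substep (5a) evaluates the confidence map for every position at once by multiplying in the frequency domain. A minimal numerical sketch follows; the Gaussian weight window and all names are illustrative, and ζ is left as a free constant as in the claim.

```python
import numpy as np

def weight_fn(shape, center, sigma, zeta=1.0):
    """Weight function omega_sigma(h - h*) = zeta * exp(-|h - h*|^2 / sigma^2);
    zeta plays the role of the regularization constant from the claim."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return zeta * np.exp(-d2 / sigma ** 2)

def stc_confidence(H_stc, R, omega):
    """Substep (5a): c(h) = IFFT(FFT(H_stc) ⊙ FFT(R * omega)); the
    element-wise product in the frequency domain realises the spatial
    convolution, so the confidence map covers all positions in one pass."""
    c = np.fft.ifft2(np.fft.fft2(H_stc) * np.fft.fft2(R * omega))
    return np.real(c)
```

For substep (5b), the second confidence value of each sub-block is simply the map value read off at that sub-block's center point.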
6. The method for tracking the target by combining the spatiotemporal context information as claimed in claim 1, wherein the step 8 comprises the following sub-steps:
(8a) determining a context prior probability model P(c(z) | o) of the current frame image:
P(c(z) | o) = R(z) ω_σ(z − h*)
wherein P(c(z) | o) represents the prior probability that the context feature appears at each pixel point of the background region of the current frame image, conditioned on the target appearing in the current frame search area; o denotes the event that the target appears in the current frame search area; the context feature at z is c(z) = R(z), z ∈ Ω, where z is any position in the background region of the current frame image and Ω is the background region of the current frame image, i.e. the image region of the current-frame search area excluding the target area; R(z) represents the gray value of the pixel at position z of the background region of the current frame image; ω_σ(z − h*) represents a weight function defined as ω_σ(z − h*) = ζ exp(−|z − h*|²/σ²), ζ is a regularization constant, σ is a scale parameter, and h* represents the position of the center point of the target area in the previous frame image;
(8b) determining a spatial context model P(h | c(z), o) of the current frame image:
P(h | c(z), o) = f^sc(h − z)
wherein P(h | c(z), o) represents the conditional probability that the target position is h, given that the target appears in the current frame image search area and the context feature appears at z; h represents any position in the current frame image search area; f^sc(h − z) is a function of positions h and z and represents the spatial context model learned by the current frame;
(8c) according to the confidence function c(h) = f^sc(h) ⊛ (R(h) ω_σ(h − h*)), obtaining the spatial context model f^sc(h) learned by the current frame:
wherein c(h) is the confidence map function, expressed as c(h) = b exp(−|(h − h*)/α|^β), b is a constant, α is a scale parameter, β is a shape parameter, and ⊛ represents the convolution operator;
(8d) let the current frame image be the t-th frame image, and let the spatio-temporal context model learned from the previous frame image for tracking the current frame image be H_t^stc(h); the spatio-temporal context model learned by the current frame for tracking the next frame image, H_{t+1}^stc(h), is then:
H_{t+1}^stc(h) = (1 − ρ) H_t^stc(h) + ρ f^sc(h)
wherein ρ is the update parameter.
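Substep (8d) is a temporal running-average update of the context model. The sketch below uses the standard spatio-temporal-context form H_{t+1} = (1 − ρ)H_t + ρf_t with the update parameter ρ = 0.075 quoted in the description; since the claim's exact formula appears only as an image, this form is an assumption, and the list-based representation is illustrative.

```python
def update_stc_model(H_stc_t, f_sc_t, rho=0.075):
    """Substep (8d), assumed standard form: blend the previous
    spatio-temporal context model H_t with the spatial context model f_t
    learned from the current frame, using learning rate rho."""
    return [(1 - rho) * a + rho * b for a, b in zip(H_stc_t, f_sc_t)]
```

A small ρ makes the model change slowly, which damps the effect of short occlusions on the learned context.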
7. The method for tracking a target in combination with spatiotemporal context information as claimed in claim 1, wherein the sub-step 9(a) comprises the following sub-steps:
(9a1) acquiring the color histogram features of the current frame image search area; quantizing the color histogram features into J levels, the j-th level feature being denoted u_j, with u_j = j, j = 1, …, J; the initial value of j is 1;
(9a2) let the positions of the pixels of the target area of the first frame image be denoted {c_i}, i = 1, …, k, wherein k is the total number of pixels contained in the target area of the first frame image; the probability density function q_j of the j-th level feature u_j over the target area of the first frame image is defined as:
q_j = C Σ_{i=1}^{k} K(|c_i|²) δ(b(c_i) − u_j)
wherein C is a normalization constant, K(·) is a kernel function, |·|² represents the square of the modulus value, δ(·) represents the impulse response function, and b(c_i) indicates the quantization level of the color histogram feature corresponding to position c_i;
(9a3) let the positions of the pixels of any sub-block to be searched in the current frame image be denoted {d_i}, i = 1, 2, …, k, wherein k is the total number of pixels contained in any sub-block to be searched in the current frame image and equals the total number of pixels contained in the target area of the first frame image; the probability density function p_j of the j-th level feature u_j over any sub-block to be searched of the current frame image is defined as:
p_j(s) = C Σ_{i=1}^{k} K(|(s − d_i)/h₁|²) δ(b(d_i) − u_j)
wherein s is the position of the center point of the current frame image target area, C is a normalization constant, K(·) is a kernel function, |·|² represents the square of the modulus value, δ(·) represents the impulse response function, b(d_i) indicates the quantization level of the color histogram feature at position d_i, and h₁ is the window radius of the kernel function;
(9a4) recording the position of the center point of the sub-block to be searched with the maximum confidence value in the current frame image search region as y₀, and letting the first intermediate variable be expressed as:
wherein λ₁ ≥ 1 is a shielding degree parameter.
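Substeps (9a2) and (9a3) build kernel-weighted color-histogram densities. A minimal sketch follows; the Epanechnikov kernel profile is an assumption (the claim only states that K(·) is a kernel function), and the function name, pixel representation, and normalisation are illustrative.

```python
def kernel_histogram(pixels, levels, center, h1, J):
    """Kernel-weighted colour-histogram density (substeps 9a2/9a3): each
    pixel votes for its quantization level with a weight that decays with
    distance from the block centre; h1 is the kernel window radius."""
    hist = [0.0] * J
    for (x, y), u in zip(pixels, levels):
        d2 = ((x - center[0]) ** 2 + (y - center[1]) ** 2) / (h1 * h1)
        w = max(0.0, 1.0 - d2)       # Epanechnikov kernel profile (assumed)
        hist[u] += w
    total = sum(hist) or 1.0         # plays the role of the constant C
    return [v / total for v in hist]
```

Comparing the first-frame target density with a candidate sub-block's density (e.g. via a histogram-similarity measure) then gives a scalar from which the occlusion factor occ of substep (9a) can be judged against the threshold θ.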
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710596203.5A CN107424175B (en) | 2017-07-20 | 2017-07-20 | Target tracking method combined with space-time context information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107424175A CN107424175A (en) | 2017-12-01 |
CN107424175B true CN107424175B (en) | 2020-09-08 |
Family
ID=60430564
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710596203.5A Active CN107424175B (en) | 2017-07-20 | 2017-07-20 | Target tracking method combined with space-time context information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107424175B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416800A (en) * | 2018-03-13 | 2018-08-17 | 青岛海信医疗设备股份有限公司 | Method for tracking target and device, terminal, computer readable storage medium |
CN110070562A (en) * | 2019-04-02 | 2019-07-30 | 西北工业大学 | A kind of context-sensitive depth targets tracking |
CN110570451B (en) * | 2019-08-05 | 2022-02-01 | 武汉大学 | Multithreading visual target tracking method based on STC and block re-detection |
CN110738685B (en) * | 2019-09-09 | 2023-05-05 | 桂林理工大学 | Space-time context tracking method integrating color histogram response |
CN113743252B (en) * | 2021-08-17 | 2024-05-31 | 北京佳服信息科技有限公司 | Target tracking method, device, equipment and readable storage medium |
CN114140501A (en) * | 2022-01-30 | 2022-03-04 | 南昌工程学院 | Target tracking method and device and readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105335986A (en) * | 2015-09-10 | 2016-02-17 | 西安电子科技大学 | Characteristic matching and MeanShift algorithm-based target tracking method |
CN106485732A (en) * | 2016-09-09 | 2017-03-08 | 南京航空航天大学 | A kind of method for tracking target of video sequence |
WO2017044550A1 (en) * | 2015-09-11 | 2017-03-16 | Intel Corporation | A real-time multiple vehicle detection and tracking |
Non-Patent Citations (3)
Title |
---|
《On-line Boosting and Vision》;Helmut Grabner等;《2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition》;20060630;第1卷;第260-267页 * |
《复杂场景下实时目标跟踪算法及实现技术研究》;张雷;《中国博士学位论文全文数据库 信息科技辑》;20160815(第8期);I138-39:正文75-78、81-86页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||