CN110660077A - Multi-scale target tracking method fusing multiple features - Google Patents
- Publication number: CN110660077A (application CN201910861204.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- scale
- filter
- tracking
- tracking method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
(all under G—PHYSICS; G06—COMPUTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T7/00—Image analysis; G06T2207/00—Indexing scheme for image analysis or image enhancement)
- G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T2207/10016—Video; Image sequence
- G06T2207/20024—Filtering details
- G06T2207/20056—Discrete and fast Fourier transform, [DFT, FFT]
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention relates to a multi-scale target tracking method fusing multiple features, and belongs to the technical field of computer vision tracking. First, HOG features and Color Name (CN) features are extracted from the target and fused by linear weighting to predict the target position. Next, the optimal scale of the target is determined using its HOG features over multiple scales. Finally, a re-detection module is designed using image blocking and sample-by-sample testing: when occlusion prevents stable tracking, the module is started, the target center point is relocated at the maximum response value after re-detection, and an automatic model-update strategy is introduced to update the target position, avoiding tracking drift. The method effectively handles the tracking drift caused by filter error accumulation under occlusion in traditional correlation-filter tracking, tracks the target reliably in complex scenes, and improves tracking robustness.
Description
Technical Field
The invention relates to the technical field of computer vision target tracking, in particular to a multi-scale target tracking method fusing multiple characteristics.
Background
Target tracking is one of the key research directions in computer vision, drawing on machine learning, event detection, signal processing, video surveillance, autonomous driving, statistics, and other related fields. Although target tracking has advanced greatly in recent years, correlation-filter tracking methods still face great challenges from occlusion, fast motion, blur, illumination change, and deformation of the tracked target.
Existing multi-feature target tracking methods cannot handle occlusion, deformation, and out-of-plane rotation well, and provide no target re-detection module for when the target is occluded and tracking fails. Building on the tracker of the existing methods, the present method optimizes the search for the optimal scale estimate (corresponding to the maximum output response) and the optimal target center position, and proposes image-block re-detection: the image is divided into blocks, correlation filtering is applied to the blocks one by one, the position of the maximum response is found, and the target center of the current frame is updated. In addition, a model-update strategy is introduced: the model is updated normally while the target is tracked normally; when the target starts to be occluded, updating is slowed or even stopped, preventing accumulation of model errors.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a multi-scale target tracking method fusing a plurality of characteristics from three aspects of classifier training, model updating and adaptive scale estimation.
To achieve the above purpose, four parts are designed: classifier training, position estimation, adaptive scale estimation and model updating.
The method of each part is as follows:
Classifier training: a position filter is trained by minimizing a cost function, yielding the optimal correlation filter:

$$\varepsilon=\Big\|\sum_{l=1}^{d}h^{l}\ast f^{l}-g\Big\|^{2}+\lambda\sum_{l=1}^{d}\big\|h^{l}\big\|^{2}\qquad(1)$$

In formula (1), $f^{l}$ is a d-dimensional feature vector, where d is the dimension of the selected feature and $l\in\{1,2,\dots,d\}$. Each channel has a corresponding filter $h^{l}$; $\ast$ denotes convolution; the superscript l denotes one dimension of the feature; g is the ideal Gaussian-shaped output; and $\lambda$ is the regularization coefficient, which both eliminates the influence of zero-frequency components of the spectrum and prevents the filter from overfitting. Formula (1) can be solved in the frequency domain, giving formula (2):

$$H^{l}=\frac{\bar{G}F^{l}}{\sum_{k=1}^{d}\bar{F}^{k}F^{k}+\lambda}\qquad(2)$$

where capital letters denote the frequency-domain form of the corresponding variables and an overbar denotes the complex conjugate.
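As a concrete illustration, the closed-form training of formula (2) and the corresponding detection step can be sketched in a few lines of NumPy. The function names, array sizes, and the default regularization value are illustrative assumptions, not part of the patent:

```python
import numpy as np

def train_filter(channels, g, lam=1e-2):
    """Frequency-domain solution of formula (2): per-channel numerator
    A^l = conj(G) * F^l and shared denominator B = sum_k conj(F^k) * F^k."""
    G = np.fft.fft2(g)
    F = [np.fft.fft2(f) for f in channels]
    A = [np.conj(G) * Fl for Fl in F]
    B = sum((np.conj(Fl) * Fl).real for Fl in F)
    return A, B, lam

def detect(A, B, lam, channels):
    """Spatial response for a candidate sample: inverse FFT of
    sum_l conj(A^l) * Z^l / (B + lam); its peak marks the target."""
    Z = [np.fft.fft2(f) for f in channels]
    num = sum(np.conj(Al) * Zl for Al, Zl in zip(A, Z))
    return np.real(np.fft.ifft2(num / (B + lam)))
```

Running `detect` on the very sample the filter was trained on approximately reproduces the ideal Gaussian output g, so the response peak sits where g peaks.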
Position estimation is used to predict the target position, as follows:

After filter training is complete, the target position in the next frame is estimated. The HOG and CN features of the test sample are extracted, and the spatial response at the target position is obtained for each feature:

$$y=\mathcal{F}^{-1}\Big\{\frac{\sum_{l=1}^{d}\bar{A}^{l}Z^{l}}{B+\lambda}\Big\}\qquad(3)$$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, Z is the frequency-domain description of the candidate sample input in the new frame, and $A^{l}=\bar{G}F^{l}$ and $B=\sum_{k=1}^{d}\bar{F}^{k}F^{k}$ are the numerator and denominator of formula (2). A fixed-weight fusion strategy combines the HOG and CN response maps; the position of the maximum fused response is the predicted target position. The fusion formula is:

$$y=(1-r)\,y_{hog}+r\,y_{color}\qquad(4)$$
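A minimal sketch of the fixed-weight fusion above; the weight r = 0.3 is an assumed example value, as the patent fixes the coefficient but does not state a number here:

```python
import numpy as np

def fuse_responses(y_hog, y_color, r=0.3):
    """Fixed-weight fusion y = (1-r)*y_hog + r*y_color.
    r = 0.3 is an assumed example weight."""
    y = (1.0 - r) * y_hog + r * y_color
    # the location of the maximum fused response is the predicted position
    peak = np.unravel_index(np.argmax(y), y.shape)
    return y, peak
```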
Scale estimation works as follows:

For scale estimation, a scale filter applied at the predicted target position, using the HOG features of the target image, finds the optimal scale of the target. The scale-estimation process is analogous to position prediction with the position filter. Specifically:

Assuming the input sample has size P × R, the optimal target scale is determined by the maximum of the scale response obtained by convolving the input sample with a one-dimensional scale filter. The scale samples are selected according to:

$$a^{n}P\times a^{n}R,\quad n\in\Big\{\Big\lfloor-\tfrac{S-1}{2}\Big\rfloor,\dots,\Big\lfloor\tfrac{S-1}{2}\Big\rfloor\Big\}\qquad(5)$$

In formula (5), a is the scale step factor; P and R are the width and height of the target in the previous frame; and S is the number of scales.
The filters are updated as follows:

In a tracking task the target changes continually, so the position and scale filters must be updated to adapt to changes in target appearance and scale. To avoid solving a d × d linear system at every pixel of the image, the numerator $A_{t}^{l}$ and denominator $B_{t}$ of formula (2) are updated separately:

$$A_{t}^{l}=(1-\eta)A_{t-1}^{l}+\eta\,\bar{G}_{t}F_{t}^{l}$$
$$B_{t}=(1-\eta)B_{t-1}+\eta\sum_{k=1}^{d}\bar{F}_{t}^{k}F_{t}^{k}\qquad(6)$$

Both the position filter and the scale filter are updated according to formula (6). To avoid tracking drift when occlusion, rotation, or illumination change corrupts the update training, the method introduces a model-update strategy that controls the update speed, with the aim of reducing the accumulated error of the target model. A threshold $T_{r}$ is set: when the maximum response $y>T_{r}$, the model of the current frame (frame t) is updated normally according to formula (6); otherwise the target region is considered uncertain and the model is not updated.
Compared with existing mainstream correlation-filter tracking methods, the method has the following advantages:
1. HOG features adapt well to small deformations and illumination changes of the target, but cannot follow large deformation or occlusion, so the target is lost; CN features are pixel-level global features that effectively handle target deformation and scale change, but cannot adapt to illumination change. Fusing HOG and CN features therefore compensates well for the weaknesses of each, allowing the tracker to adapt to varied conditions and improving the accuracy and robustness of the target tracking method.
2. An improved image-block re-detection method is proposed and the maximum-response search is optimized; together with an automatic model-update strategy, this effectively handles target occlusion and achieves robust target tracking in complex scenes with illumination change, occlusion, motion blur, and rotation.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
FIG. 1 is a flow chart of the method of the present invention:
the first step is as follows: and acquiring initial position information and scale information of the target.
The second step is that: and extracting the HOG characteristic and the CN characteristic of the target according to the initial information acquired in the first step.
A position filter is trained by minimizing a cost function, yielding the optimal correlation filter:

$$\varepsilon=\Big\|\sum_{l=1}^{d}h^{l}\ast f^{l}-g\Big\|^{2}+\lambda\sum_{l=1}^{d}\big\|h^{l}\big\|^{2}\qquad(1)$$

In formula (1), $f^{l}$ is a d-dimensional feature vector, where d is the dimension of the selected feature and $l\in\{1,2,\dots,d\}$. Each channel has a corresponding filter $h^{l}$; $\ast$ denotes convolution; the superscript l denotes one dimension of the feature; g is the ideal Gaussian-shaped output; and $\lambda$ is the regularization coefficient, which both eliminates the influence of zero-frequency components of the spectrum and prevents the filter from overfitting. Formula (1) can be solved in the frequency domain, giving formula (2):

$$H^{l}=\frac{\bar{G}F^{l}}{\sum_{k=1}^{d}\bar{F}^{k}F^{k}+\lambda}\qquad(2)$$

where capital letters denote the frequency-domain form of the corresponding variables and an overbar denotes the complex conjugate.
The third step: with the correlation-filter models established in the second step, extract the HOG and CN features of the test sample and obtain the spatial response at the target position for each feature:

$$y=\mathcal{F}^{-1}\Big\{\frac{\sum_{l=1}^{d}\bar{A}^{l}Z^{l}}{B+\lambda}\Big\}\qquad(3)$$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform and Z is the frequency-domain description of the candidate sample input in the new frame.

A fixed-weight fusion strategy combines the HOG and CN response maps; the maximum fused response gives the predicted target position. The fusion formula is:

$$y=(1-r)\,y_{hog}+r\,y_{color}\qquad(4)$$

The fourth step: predict the target position from the filter response of the third step; the position of the maximum response value y is the predicted target position.
The fifth step: at the target position predicted in the fourth step, extract HOG features of the target and find its optimal scale with a scale filter.

Given an input sample of size P × R, the optimal scale is determined by the maximum scale response obtained by convolving the input sample with a one-dimensional scale filter; the scale samples are selected according to:

$$a^{n}P\times a^{n}R,\quad n\in\Big\{\Big\lfloor-\tfrac{S-1}{2}\Big\rfloor,\dots,\Big\lfloor\tfrac{S-1}{2}\Big\rfloor\Big\}\qquad(5)$$

In formula (5), a is the scale step factor; P and R are the width and height of the target in the previous frame; and S is the number of scales.
The sixth step: based on the predicted target response, a threshold $T_{r}$ is set: when the maximum response $y>T_{r}$, the model of the current frame (frame t) is updated normally according to formula (6); otherwise the target region is considered uncertain and the model is not updated. Both the position filter and the scale filter are updated according to formula (6).
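To make the position-prediction loop concrete, the following self-contained sketch recovers a synthetic translation between two frames. It uses a single raw-intensity channel in place of the HOG/CN features, which is an illustrative simplification of the method, not the patent's feature set:

```python
import numpy as np

def gaussian_label(H, W, cy, cx, sigma=2.0):
    """Ideal Gaussian output g, peaked at (cy, cx)."""
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))

def predict_shift(prev_patch, next_patch, lam=1e-2):
    """Train a one-channel correlation filter on prev_patch, evaluate its
    response on next_patch, and read off the displacement of the peak."""
    H, W = prev_patch.shape
    G = np.fft.fft2(gaussian_label(H, W, H // 2, W // 2))
    F = np.fft.fft2(prev_patch)
    A = np.conj(G) * F            # filter numerator
    B = (np.conj(F) * F).real     # filter denominator
    Z = np.fft.fft2(next_patch)
    y = np.real(np.fft.ifft2(np.conj(A) * Z / (B + lam)))
    py, px = np.unravel_index(np.argmax(y), y.shape)
    return py - H // 2, px - W // 2   # displacement of the response peak
```

Because the desired output g peaks at the patch center, a pure translation of the patch content moves the response peak by the same displacement, which is exactly what the position filter exploits.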
The above are only preferred embodiments of the invention and, of course, are not intended to limit the scope of the invention. Accordingly, equivalent changes made in the claims of the invention are still within the scope of the invention.
Claims (4)
1. A multi-scale target tracking method fusing multiple features is characterized in that the multi-scale target tracking method fusing multiple features specifically comprises the following steps:
the first step is as follows: acquiring initial position information and scale information of a target;
the second step is that: extracting HOG characteristics and CN characteristics of a target according to the initial information acquired in the first step;
the third step: according to the HOG characteristics and CN characteristics extracted in the second step, a linear weighting method is utilized and a fixed coefficient fusion mode is adopted to obtain related filtering response;
the fourth step: predicting the target position according to the filter response of the third step;
the fifth step: according to the target position information predicted in the fourth step, on the basis, HOG characteristics are extracted from the target, and the optimal scale of the target is found by using a scale filter;
and a sixth step: finding the optimal predicted position of the target, retraining the model at the new position, introducing a model updating strategy, and controlling the model updating speed of the whole tracking process, and circulating the steps until the last frame of the image sequence.
2. The multi-scale target tracking method fusing multiple features according to claim 1, characterized in that: for the HOG and CN features extracted in the second step, a position filter is trained for each feature by constructing the optimal correlation filter, which is obtained by minimizing a cost function:

$$\varepsilon=\Big\|\sum_{l=1}^{d}h^{l}\ast f^{l}-g\Big\|^{2}+\lambda\sum_{l=1}^{d}\big\|h^{l}\big\|^{2}\qquad(1)$$

In formula (1), $f^{l}$ is a d-dimensional feature vector, d is the dimension of the selected feature, and $l\in\{1,2,\dots,d\}$; each channel has a corresponding filter $h^{l}$; $\ast$ denotes convolution; the superscript l denotes one dimension of the feature; g is the ideal Gaussian output; and $\lambda$ is the regularization coefficient;

formula (1) can be solved in the frequency domain to obtain formula (2):

$$H^{l}=\frac{\bar{G}F^{l}}{\sum_{k=1}^{d}\bar{F}^{k}F^{k}+\lambda}\qquad(2)$$

where capital letters denote the frequency-domain form of the corresponding variables and an overbar denotes the complex conjugate. After filter training is complete, the target position in the next frame is estimated: the HOG and CN features of the test sample are extracted and the spatial response at the target position is obtained for each feature:

$$y=\mathcal{F}^{-1}\Big\{\frac{\sum_{l=1}^{d}\bar{A}^{l}Z^{l}}{B+\lambda}\Big\}\qquad(3)$$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform and Z is the frequency-domain description of the candidate sample input in the new frame. The maximum of the filter response is the target position, and the scale-estimation process is analogous to position prediction with the position filter.
3. The multi-scale target tracking method fusing multiple features according to claim 1, characterized in that: in the sixth step, the model is retrained and a model-update strategy is introduced, updating the numerator $A_{t}^{l}$ and denominator $B_{t}$ of formula (2) separately:

$$A_{t}^{l}=(1-\eta)A_{t-1}^{l}+\eta\,\bar{G}_{t}F_{t}^{l}$$
$$B_{t}=(1-\eta)B_{t-1}+\eta\sum_{k=1}^{d}\bar{F}_{t}^{k}F_{t}^{k}$$

where $\eta$ is the learning rate, taking values in $0\le\eta<0.1$; the larger the value, the greater the weight given to the current frame.
4. The multi-scale target tracking method fusing multiple features according to claim 2, characterized in that: the maximum of the target response is the predicted target position, for which a threshold $T_{r}$ can be set: when the maximum response $y>T_{r}$, the model of the current frame (frame t) is updated normally according to formula (4); otherwise the target region is considered uncertain and the model is not updated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910861204.7A CN110660077A (en) | 2019-09-12 | 2019-09-12 | Multi-scale target tracking method fusing multiple features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910861204.7A CN110660077A (en) | 2019-09-12 | 2019-09-12 | Multi-scale target tracking method fusing multiple features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110660077A true CN110660077A (en) | 2020-01-07 |
Family
ID=69037302
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910861204.7A Pending CN110660077A (en) | 2019-09-12 | 2019-09-12 | Multi-scale target tracking method fusing multiple features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110660077A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111652903A (en) * | 2020-05-22 | 2020-09-11 | 重庆理工大学 | Pedestrian target tracking method based on convolution correlation network in automatic driving scene |
CN112767437A (en) * | 2020-12-30 | 2021-05-07 | 大连海事大学 | Water surface unmanned ship tracking method, system and storage medium based on KCF self-adaptive multi-feature fusion filtering |
CN112785622A (en) * | 2020-12-30 | 2021-05-11 | 大连海事大学 | Long-time tracking method and device for unmanned ship on water surface and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107644430A (en) * | 2017-07-27 | 2018-01-30 | 孙战里 | Target following based on self-adaptive features fusion |
CN109285179A (en) * | 2018-07-26 | 2019-01-29 | 昆明理工大学 | A kind of motion target tracking method based on multi-feature fusion |
- 2019-09-12: Application CN201910861204.7A filed (CN110660077A, status: Pending)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107644430A (en) * | 2017-07-27 | 2018-01-30 | 孙战里 | Target following based on self-adaptive features fusion |
CN109285179A (en) * | 2018-07-26 | 2019-01-29 | 昆明理工大学 | A kind of motion target tracking method based on multi-feature fusion |
Non-Patent Citations (2)
Title |
---|
ZENG Mengyuan et al., "Target tracking algorithm with adaptive update fusing multi-layer convolutional features", Laser & Optoelectronics Progress * |
XIE Liu et al., "Moving target tracking method fusing multiple features based on correlation filtering", Journal of Data Acquisition and Processing * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111652903A (en) * | 2020-05-22 | 2020-09-11 | 重庆理工大学 | Pedestrian target tracking method based on convolution correlation network in automatic driving scene |
CN111652903B (en) * | 2020-05-22 | 2023-09-08 | 重庆理工大学 | Pedestrian target tracking method based on convolution association network in automatic driving scene |
CN112767437A (en) * | 2020-12-30 | 2021-05-07 | 大连海事大学 | Water surface unmanned ship tracking method, system and storage medium based on KCF self-adaptive multi-feature fusion filtering |
CN112785622A (en) * | 2020-12-30 | 2021-05-11 | 大连海事大学 | Long-time tracking method and device for unmanned ship on water surface and storage medium |
CN112785622B (en) * | 2020-12-30 | 2024-04-05 | 大连海事大学 | Method and device for tracking unmanned captain on water surface and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109285179B (en) | Moving target tracking method based on multi-feature fusion | |
CN108549839B (en) | Adaptive feature fusion multi-scale correlation filtering visual tracking method | |
CN107689052B (en) | Visual target tracking method based on multi-model fusion and structured depth features | |
CN110175649B (en) | Rapid multi-scale estimation target tracking method for re-detection | |
CN107369166B (en) | Target tracking method and system based on multi-resolution neural network | |
CN109741366B (en) | Related filtering target tracking method fusing multilayer convolution characteristics | |
CN108198209B (en) | People tracking method under the condition of shielding and scale change | |
CN110533691B (en) | Target tracking method, device and storage medium based on multiple classifiers | |
CN108961308B (en) | Residual error depth characteristic target tracking method for drift detection | |
CN110660077A (en) | Multi-scale target tracking method fusing multiple features | |
CN111008996B (en) | Target tracking method through hierarchical feature response fusion | |
CN110796679B (en) | Target tracking method for aerial image | |
CN111260738A (en) | Multi-scale target tracking method based on relevant filtering and self-adaptive feature fusion | |
CN111582349B (en) | Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering | |
CN110660080A (en) | Multi-scale target tracking method based on learning rate adjustment and fusion of multilayer convolution features | |
CN110555870B (en) | DCF tracking confidence evaluation and classifier updating method based on neural network | |
CN110889863B (en) | Target tracking method based on target perception correlation filtering | |
CN106803265A (en) | Multi-object tracking method based on optical flow method and Kalman filtering | |
CN107590427B (en) | Method for detecting abnormal events of surveillance video based on space-time interest point noise reduction | |
CN111008991B (en) | Background-aware related filtering target tracking method | |
CN111340842B (en) | Correlation filtering target tracking method based on joint model | |
CN108364305B (en) | Vehicle-mounted camera video target tracking method based on improved DSST | |
CN112085765A (en) | Video target tracking method combining particle filtering and metric learning | |
CN115471525A (en) | Target tracking method and system based on fusion of twin network and Kalman filtering | |
CN110827327B (en) | Fusion-based long-term target tracking method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
20200107 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200107