CN106952288A - Robust tracking method against long-term occlusion based on convolutional features and global-search detection - Google Patents

Robust tracking method against long-term occlusion based on convolutional features and global-search detection

Info

Publication number
CN106952288A
Authority
CN
China
Prior art keywords
target
scale
init
convolution
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710204379.1A
Other languages
Chinese (zh)
Other versions
CN106952288B (en)
Inventor
李映 (Li Ying)
林彬 (Lin Bin)
杭涛 (Hang Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201710204379.1A
Publication of CN106952288A
Application granted
Publication of CN106952288B
Status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20048 Transform domain processing
    • G06T 2207/20056 Discrete and fast Fourier transform, [DFT, FFT]

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to a robust tracking method against long-term occlusion based on convolutional features and global-search detection. By using convolutional features and a multi-scale correlation filtering method in the tracking module, the feature representation ability of the tracked target's appearance model is enhanced, so that the tracking result is highly robust to factors such as illumination variation, target scale change and target rotation. Further, through the introduced global-search detection mechanism, when long-term occlusion of the target causes tracking failure, the detection module can re-detect the target and let the tracking module recover from the error, so that tracking can continue over a long period even when the target's appearance changes.

Description

Robust tracking method against long-term occlusion based on convolutional features and global-search detection
Technical field
The invention belongs to the field of computer vision and relates to a target tracking method, in particular to a robust tracking method against long-term occlusion based on convolutional features and global-search detection.
Background technology
The main task of target tracking is to obtain the position and motion information of a specific target in a video sequence; it is widely applied in fields such as video surveillance and human-computer interaction. During tracking, factors such as illumination variation, complex background, and target rotation or scaling all increase the complexity of the tracking problem; in particular, long-term occlusion of the target easily leads to tracking failure.
Document " Tracking-learning-detection, IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,34(7):The tracking that 1409-1422 " is proposed is (referred to as TLD) traditional track algorithm and detection algorithm are combined first, tracking result is improved using testing result, improves and is The reliability and robustness of system.Its track algorithm is based on optical flow method, and detection algorithm produces substantial amounts of detection window, for each inspection Survey window, it is necessary to received that last testing result could be turned into by three detectors.For occlusion issue, TLD provides one Individual effective resolving ideas, can carry out long time-tracking (Long-term Tracking) to target.But, TLD is used Be shallow-layer manual features, the sign to target is limited in one's ability, and the design of detection algorithm is also complex, there is certain change Enter space.
Content of the invention
The technical problem to be solved
To avoid the deficiencies of the prior art, the present invention proposes a robust tracking method against long-term occlusion based on convolutional features and global-search detection. It solves the problem that, during tracking, long-term occlusion of the moving target in a video or the target moving out of the field of view causes the appearance model to drift, which easily leads to tracking failure.
Technical scheme
A robust tracking method against long-term occlusion based on convolutional features and global-search detection, characterized in that the steps are as follows:
Step 1: read the first frame of image data of the video and the initial position information [x, y, w, h] of the target, where x and y are the abscissa and ordinate of the target center and w and h are the width and height of the target. The coordinate point corresponding to (x, y) is denoted P; the initial target region of size w × h centered on P is denoted R_init; the scale of the target is denoted scale and is initialized to 1.
Step 2: centered on P, determine a region R_bkg containing both target and background information; the size of R_bkg is M × N, with M = 2w and N = 2h. Using VGGNet-19 as the CNN model, extract the convolution feature map z_target_init from R_bkg at the fifth convolutional layer (conv5-4). Then build the target model W^t_target of the tracking module from z_target_init, t ∈ {1, 2, ..., T}, where T is the number of channels of the CNN model; the computation is as follows:
$$W^t_{target} = \frac{G \odot \overline{Z^t_{target\_init}}}{\sum_{t'=1}^{T} Z^{t'}_{target\_init} \odot \overline{Z^{t'}_{target\_init}} + \lambda_1}$$
where capitalized variables are the frequency-domain representations of the corresponding lowercase variables, the Gaussian filter template is $g(m,n)=e^{-((m-M/2)^2+(n-N/2)^2)/(2\sigma_{target}^2)}$, m and n are the Gaussian function independent variables, m ∈ {1, 2, ..., M}, n ∈ {1, 2, ..., N}, σ_target is the bandwidth of the Gaussian kernel, ⊙ denotes element-wise multiplication, an overline denotes the complex conjugate, and λ_1 is an adjustment parameter (introduced to avoid a zero denominator), set to 0.0001.
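For illustration only (not part of the original disclosure), a minimal NumPy sketch of this closed-form model construction; the feature-map shape, the parameter values and the function name are assumptions:

```python
import numpy as np

def build_target_model(z_init, sigma_target=2.0, lam1=1e-4):
    """Per-channel target model W^t from the initial convolution feature
    map z_init of shape (M, N, T), following the Step 2 closed form."""
    M, N, T = z_init.shape
    # Gaussian filter template g(m, n), peaked at the region centre.
    m = np.arange(1, M + 1).reshape(-1, 1)
    n = np.arange(1, N + 1).reshape(1, -1)
    g = np.exp(-((m - M / 2) ** 2 + (n - N / 2) ** 2) / (2 * sigma_target ** 2))
    G = np.fft.fft2(g)                    # frequency-domain Gaussian label
    Z = np.fft.fft2(z_init, axes=(0, 1))  # per-channel DFT of the features
    # Denominator shared over channels: sum_t Z^t ⊙ conj(Z^t) + lambda_1.
    denom = np.sum(Z * np.conj(Z), axis=2).real + lam1
    # Per-channel numerator G ⊙ conj(Z^t).
    return (G[:, :, None] * np.conj(Z)) / denom[:, :, None]
```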
Step 3: centered on P, extract S image sub-blocks of different scales, with S set to 33. Each sub-block has size w × h × s, where the variable s is the scale factor of the sub-block, s ∈ [0.7, 1.4]. Then extract the HOG features of each image sub-block separately; after merging they become an S-dimensional HOG feature vector, here named the scale feature vector and denoted z_scale_init. Then build the scale model W_scale of the tracking module from z_scale_init; the computation is similar to that of W^t_target in Step 2 (with the scale feature vector replacing the convolution feature map), specifically:
$$W_{scale} = \frac{G_{scale} \odot \overline{Z_{scale\_init}}}{Z_{scale\_init} \odot \overline{Z_{scale\_init}} + \lambda_2}$$
where $g_{scale}(s')=e^{-(s'-S/2)^2/(2\sigma_{scale}^2)}$, s' is the Gaussian function independent variable, s' ∈ {1, 2, ..., S}, σ_scale is the bandwidth of the Gaussian kernel, and λ_2 is an adjustment parameter, set to 0.0001.
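Again purely as an illustrative sketch (names and the one-dimensional layout are assumptions), the scale model is the 1-D analogue of the Step 2 construction:

```python
import numpy as np

def build_scale_model(z_scale_init, sigma_scale=1.0, lam2=1e-4):
    """Scale model W_scale from the S-dimensional scale feature vector
    of Step 3; a 1-D version of the Step 2 closed form."""
    S = z_scale_init.shape[0]
    s = np.arange(1, S + 1)
    g = np.exp(-((s - S / 2) ** 2) / (2 * sigma_scale ** 2))  # 1-D Gaussian label
    G = np.fft.fft(g)
    Z = np.fft.fft(z_scale_init)
    return (G * np.conj(Z)) / (np.abs(Z) ** 2 + lam2)
```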
Step 4: extract the gray feature of the initial target region R_init; the resulting gray-feature representation is a two-dimensional matrix, here named the target appearance representation matrix and denoted A_k, where the subscript k is the current frame index (k = 1 initially). Then initialize the filtering model D of the detection module to A_1, i.e. D = A_1, and initialize the history set of target representation matrices A_his, whose role is to store the target appearance representation matrices of the current frame and all previous frames, i.e. A_his = {A_1, A_2, ..., A_k}; initially A_his = {A_1}.
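One possible reading of the gray-feature extraction, sketched with OpenCV; resizing every A_k to the initial w × h (so that all matrices share one shape) is an assumption not stated in the text:

```python
import cv2
import numpy as np

def gray_feature(frame_bgr, P, w, h, scale=1.0):
    """Target appearance representation matrix A_k (Step 4): the gray-level
    patch of the (scaled) target region, resized to the initial w x h.
    Image-border handling is omitted for brevity."""
    x, y = P
    sw, sh = int(round(w * scale)), int(round(h * scale))
    x0, y0 = max(0, x - sw // 2), max(0, y - sh // 2)
    patch = frame_bgr[y0:y0 + sh, x0:x0 + sw]
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (w, h)).astype(np.float32)
```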
Step 5: read the next frame of the video. Still centered on P, extract the scaled target search region of size R_bkg × scale. Then extract the convolution features of the target search region with the CNN of Step 2, resample them to the size of R_bkg by bilinear interpolation to obtain the convolution feature map z_target_cur of the current frame, and use the target model W^t_target to compute the target confidence map f_target as follows:
$$f_{target} = \mathcal{F}^{-1}\left(\sum_{t=1}^{T} W^t_{target} \odot Z^t_{target\_cur}\right)$$
where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform. Finally update the coordinates of P, setting (x, y) to the coordinates of the maximum response value in f_target:
$$(x, y) = \arg\max_{x', y'} f_{target}(x', y')$$
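A sketch of the response computation and position update (function names are mine; z_cur is assumed to be the already resampled feature map of the search region):

```python
import numpy as np

def target_response(W, z_cur):
    """Target confidence map f_target = F^{-1}(sum_t W^t ⊙ Z^t_cur) of
    Step 5, together with the (x, y) position of its maximum response."""
    Z = np.fft.fft2(z_cur, axes=(0, 1))
    f = np.real(np.fft.ifft2(np.sum(W * Z, axis=2)))
    row, col = np.unravel_index(np.argmax(f), f.shape)
    return f, (col, row)  # (x, y): column is the abscissa, row the ordinate
```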
Step 6: centered on P, extract S image sub-blocks of different scales, then extract the HOG features of each image sub-block separately and merge them to obtain the scale feature vector z_scale_cur of the current frame (computed in the same way as z_scale_init in Step 3). Then use the scale model W_scale to compute the scale confidence map:
$$f_{scale} = \mathcal{F}^{-1}\left(W_{scale} \odot Z_{scale\_cur}\right)$$
Finally update the scale of the target as follows:
$$scale = \arg\max_{s''} f_{scale}(s'')$$
This yields the output of the tracking module in the current frame (frame k): the image sub-block TPatch_k of size R_init × scale centered on the point P with coordinates (x, y). In addition, the maximum response value of the already computed f_target is abbreviated TPeak_k, i.e. TPeak_k = f_target(x, y).
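The scale search admits the same one-dimensional sketch (assumed names; scale_factors is taken to hold the S sampled factors in [0.7, 1.4]):

```python
import numpy as np

def scale_response(W_scale, z_scale_cur, scale_factors):
    """Scale confidence map f_scale = F^{-1}(W_scale ⊙ Z_scale_cur) of
    Step 6; the updated scale is the arg-max over the S sampled factors."""
    f = np.real(np.fft.ifft(W_scale * np.fft.fft(z_scale_cur)))
    return f, scale_factors[np.argmax(f)]
```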
Step 7: the detection module convolves the filtering model D with the entire image of the current frame in a global-search manner, computing the degree of similarity between D and every position of the current frame. Take the j highest response values (j is set to 10) and, centered on the location point corresponding to each of these j values, extract j image sub-blocks of size R_init × scale. These j image sub-blocks are the elements of the image sub-block set DPatches_k, i.e. the output of the detection module in frame k.
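A hedged sketch of the global search with an FFT-accelerated correlation; zero-meaning the template and the simple top-j selection (without non-maximum suppression) are assumptions beyond the text:

```python
import numpy as np
from scipy.signal import fftconvolve

def global_search(frame_gray, D, j=10):
    """Step 7: correlate the gray-level filtering model D with the whole
    frame and return the j highest-response locations as (row, col) tuples.
    Note: without non-maximum suppression neighbouring peaks may overlap."""
    template = D - D.mean()  # zero-mean template
    # Flipping the template turns convolution into cross-correlation.
    response = fftconvolve(frame_gray, template[::-1, ::-1], mode='same')
    top = np.argsort(response, axis=None)[::-1][:j]
    return [np.unravel_index(i, response.shape) for i in top]
```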
Step 8: for each image sub-block in the set DPatches_k output by the detection module, compute the pixel overlap ratio with the TPatch_k output by the tracking module, giving j values, the highest of which is denoted o_max here. If o_max is less than the threshold θ_o (θ_o is set to 0.05), the target is judged to be completely occluded, the learning rate β of the tracking module in the model update must be suppressed, and the method goes to Step 9; otherwise the update uses the initial learning rate β_init (β_init is set to 0.02) and the method goes to Step 10. β is computed as follows:
$$\beta = \begin{cases} 0, & o_{max} < \theta_o \\ \beta_{init}, & \text{otherwise} \end{cases}$$
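The overlap test and learning-rate switch might look as follows; the (x, y, w, h) top-left box convention and β = 0 for "suppressed" are assumptions consistent with Step 8:

```python
def overlap_ratio(box_a, box_b):
    """Pixel overlap ratio (intersection over union) of two (x, y, w, h) boxes."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    iw = max(0, min(xa + wa, xb + wb) - max(xa, xb))
    ih = max(0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = iw * ih
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0

def learning_rate(o_max, beta_init=0.02, thresh=0.05):
    """Step 8: freeze the model update (beta = 0) when the best overlap
    between detection and tracking outputs falls below the threshold."""
    return 0.0 if o_max < thresh else beta_init
```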
Step 9: centered on each image sub-block in DPatches_k, extract j target search regions of size R_bkg × scale; for each of them, extract the convolution feature map and compute the target confidence map by the method of Step 5, obtaining the maximum response value on each of the j target search regions. Compare these j responses and denote the largest DPeak_k. If DPeak_k is greater than TPeak_k, update the coordinates of P again, setting (x, y) to the coordinates corresponding to DPeak_k, and recompute the target scale feature vector and the target scale scale (by the method of Step 6).
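A sketch of the re-detection logic, reusing target_response from the Step 5 sketch; feats_of(centre), assumed to return the convolution feature map of the search region centred at centre, is a placeholder:

```python
import numpy as np

def redetect(feats_of, W, peaks, TPeak_k):
    """Step 9: evaluate the target model on search regions centred on the j
    detection peaks; adopt the best peak only if it beats the tracker's peak."""
    best_val, best_centre = -np.inf, None
    for centre in peaks:
        f, _ = target_response(W, feats_of(centre))  # from the Step 5 sketch
        if f.max() > best_val:
            best_val, best_centre = f.max(), centre
    if best_val > TPeak_k:   # DPeak_k > TPeak_k: correct the tracker
        return best_centre   # new position centre P
    return None              # keep the tracking module's result
```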
Step 10: the optimal position center of the target in the current frame is determined to be P, and the optimal scale is determined to be scale. Mark the new target region R_new in the image, i.e. the rectangular box centered on P with width w × scale and height h × scale. In addition, the already computed convolution feature map from which the optimal target position center P is obtained is abbreviated z_target; likewise, the scale feature vector from which the optimal target scale scale is obtained is abbreviated z_scale.
Step 11: using z_target and z_scale together with the target model W^t_target and the scale model W_scale established in the previous frame, update the models by weighted summation, as follows:
$$W^t_{target} = (1-\beta)\,W^t_{target} + \beta\,W^t_{target\_new}, \qquad W_{scale} = (1-\beta)\,W_{scale} + \beta\,W_{scale\_new}$$
where W^t_target_new and W_scale_new are the models built from z_target and z_scale by the formulas of Steps 2 and 3.
Here β is the learning rate computed in Step 8.
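The weighted-sum update then reduces to linear interpolation; a sketch (with β = 0 both models stay frozen during full occlusion):

```python
def update_models(W, W_new, W_scale, W_scale_new, beta):
    """Step 11: weighted-sum update of the target and scale models with
    the learning rate beta computed in Step 8."""
    W = (1.0 - beta) * W + beta * W_new
    W_scale = (1.0 - beta) * W_scale + beta * W_scale_new
    return W, W_scale
```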
Step 12: extract the gray feature of the new target region R_new to obtain the target appearance representation matrix A_k of the current frame, and add A_k to the history set A_his. If the number of elements in A_his exceeds c (c is set to 20), randomly select c elements from A_his to generate a three-dimensional matrix C_k, where each slice C_k(:, i) corresponds to one element of A_his (i.e. a two-dimensional matrix A_k); otherwise generate C_k from all the elements of A_his. Then average C_k to obtain a two-dimensional matrix and take this two-dimensional matrix as the new filtering model D of the detection module, computed as follows:
$$D = \begin{cases} \dfrac{1}{c}\sum_{i=1}^{c} C_k(:,i), & \text{if } |A_{his}| > c \\ \dfrac{1}{|A_{his}|}\sum_{i=1}^{|A_{his}|} C_k(:,i), & \text{otherwise} \end{cases}$$
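A sketch of the detection-filter refresh; sampling without replacement is an assumption (the text only says "randomly select c elements"):

```python
import numpy as np

def update_filtering_model(A_his, c=20, rng=None):
    """Step 12: rebuild the filtering model D as the mean of at most c
    target appearance matrices drawn from the history set A_his."""
    rng = np.random.default_rng() if rng is None else rng
    if len(A_his) > c:
        idx = rng.choice(len(A_his), size=c, replace=False)
        C = np.stack([A_his[i] for i in idx], axis=2)  # M x N x c
    else:
        C = np.stack(A_his, axis=2)
    return C.mean(axis=2)
```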
Step 13: check whether all image frames in the video have been processed; if so, the algorithm ends, otherwise go to Step 5 and continue.
Beneficial effect
The robust tracking method against long-term occlusion based on convolutional features and global-search detection proposed by the present invention comprises a tracking module and a detection module, and the two modules cooperate during tracking: the tracking module mainly uses the convolutional features of the target extracted by a convolutional neural network (CNN) to build a robust target model, builds a scale model from histogram of oriented gradients (HOG) features, and determines the position center and the scale of the target by combining them with a correlation filtering method; the detection module extracts gray features to build a target filtering model and uses it in a global-search manner over the entire image to quickly detect the target and judge whether occlusion has occurred. Once the target is completely occluded (or other factors cause a drastic change in target appearance), the detection module corrects the position of the tracked target with the detection result and suppresses the model update of the tracking module, preventing the introduction of unnecessary noise that would cause model drift and tracking failure.
Advantages: using convolutional features and a multi-scale correlation filtering method in the tracking module enhances the feature representation ability of the tracked target's appearance model, so that the tracking result is highly robust to factors such as illumination variation, target scale change and target rotation; further, through the introduced global-search detection mechanism, when long-term occlusion causes tracking failure the detection module can re-detect the target and let the tracking module recover from the error, so that tracking can continue over a long period even when the target appearance changes.
Brief description of the drawings
Fig. 1: flow chart of the robust tracking method against long-term occlusion based on convolutional features and global-search detection
Embodiment
The invention is further described below in conjunction with an embodiment and the accompanying drawing:
The embodiment carries out Steps 1 to 13 exactly as described in the Technical scheme above, following the processing flow of Fig. 1.

Claims (6)

1. A robust tracking method against long-term occlusion based on convolutional features and global-search detection, characterized in that the steps are as follows:
Step 1: read the first frame of image data of the video and the initial position information [x, y, w, h] of the target, where x and y represent the abscissa and ordinate of the target center and w and h represent the width and height of the target; the coordinate point corresponding to (x, y) is denoted P; the initial target region of size w × h centered on P is denoted R_init; the scale of the target is denoted scale and is initialized to 1;
Step 2: centered on P, determine a region R_bkg containing both target and background information; the size of R_bkg is M × N, with M = 2w and N = 2h; using VGGNet-19 as the CNN model, extract the convolution feature map z_target_init from R_bkg at the fifth convolutional layer (conv5-4); then build the target model W^t_target of the tracking module from z_target_init, t ∈ {1, 2, ..., T}, T being the number of channels of the CNN model, as follows:
$$W^t_{target} = \frac{G \odot \overline{Z^t_{target\_init}}}{\sum_{t'=1}^{T} Z^{t'}_{target\_init} \odot \overline{Z^{t'}_{target\_init}} + \lambda_1}$$
where capitalized variables are the frequency-domain representations of the corresponding lowercase variables, the Gaussian filter template is $g(m,n)=e^{-((m-M/2)^2+(n-N/2)^2)/(2\sigma_{target}^2)}$, m and n are the Gaussian function independent variables, m ∈ {1, 2, ..., M}, n ∈ {1, 2, ..., N}, σ_target is the bandwidth of the Gaussian kernel, ⊙ denotes element-wise multiplication, an overline denotes the complex conjugate, and λ_1 is an adjustment parameter;
Step 3: centered on P, extract S image sub-blocks of different scales, S being set to 33; each sub-block has size w × h × s, where the variable s is the scale factor of the sub-block, s ∈ [0.7, 1.4]; then extract the HOG features of each image sub-block separately; after merging they become an S-dimensional HOG feature vector, named the scale feature vector and denoted z_scale_init; then build the scale model W_scale of the tracking module from z_scale_init as follows:
$$W_{scale} = \frac{G_{scale} \odot \overline{Z_{scale\_init}}}{Z_{scale\_init} \odot \overline{Z_{scale\_init}} + \lambda_2}$$
where $g_{scale}(s')=e^{-(s'-S/2)^2/(2\sigma_{scale}^2)}$, s' is the Gaussian function independent variable, s' ∈ {1, 2, ..., S}, σ_scale is the bandwidth of the Gaussian kernel, and λ_2 is an adjustment parameter;
Step 4: extract the gray feature of the initial target region R_init; the two-dimensional matrix given by the gray feature is named the target appearance representation matrix and denoted A_k, the subscript k being the current frame index, k = 1 initially; then initialize the filtering model D of the detection module to A_1, i.e. D = A_1, and initialize the history set of target representation matrices A_his, which stores the target appearance representation matrices of the current frame and all previous frames, i.e. A_his = {A_1, A_2, ..., A_k}; initially A_his = {A_1};
Step 5: read the next frame of image; still centered on P, extract the scaled target search region of size R_bkg × scale; then extract the convolution features of the target search region with the CNN network of Step 2, sample them to the size of R_bkg by bilinear interpolation to obtain the convolution feature map z_target_cur of the current frame, and use the target model W^t_target to compute the target confidence map f_target as follows:
$$f_{target} = \mathcal{F}^{-1}\left(\sum_{t=1}^{T} W^t_{target} \odot Z^t_{target\_cur}\right)$$
where $\mathcal{F}^{-1}$ is the inverse Fourier transform; finally update the coordinates of P, setting (x, y) to the coordinates of the maximum response value in f_target:
$$(x, y) = \arg\max_{x', y'} f_{target}(x', y');$$
Step 6: centered on P, extract S image sub-blocks of different scales, then extract the HOG features of each image sub-block separately and merge them to obtain the scale feature vector z_scale_cur of the current frame, computed in the same way as z_scale_init in Step 3; then use the scale model W_scale to compute the scale confidence map:
$$f_{scale} = \mathcal{F}^{-1}\left(W_{scale} \odot Z_{scale\_cur}\right)$$
finally update the scale of the target as follows:
$$scale = \arg\max_{s''} f_{scale}(s'')$$
this gives the output of the tracking module in the current frame (frame k): the image sub-block TPatch_k of size R_init × scale centered on the point P with coordinates (x, y); in addition, the maximum response value of the already computed f_target is abbreviated TPeak_k, i.e. TPeak_k = f_target(x, y);
Step 7: the detection module convolves the filtering model D with the entire image of the current frame in a global-search manner, computing the degree of similarity between D and every position of the current frame; take the j highest response values and, centered on the location point corresponding to each of the j values, extract j image sub-blocks of size R_init × scale; the j image sub-blocks are taken as elements to generate an image sub-block set DPatches_k, i.e. the output of the detection module in frame k;
Step 8: for each image sub-block in the set DPatches_k output by the detection module, compute the pixel overlap ratio with the TPatch_k output by the tracking module, obtaining j values, the highest of which is denoted o_max; if o_max is less than the threshold θ_o, the target is judged to be completely occluded, the learning rate β of the tracking module in the model update must be suppressed, and the method goes to Step 9; otherwise the update uses the initial learning rate β_init and the method goes to Step 10;
β is computed as follows:
$$\beta = \begin{cases} 0, & o_{max} < \theta_o \\ \beta_{init}, & \text{otherwise} \end{cases}$$
Step 9: centered on each image sub-block in DPatches_k, extract j target search regions of size R_bkg × scale; for each target search region, extract the convolution feature map and compute the target confidence map by the method of Step 5, obtaining the maximum response value on each of the j target search regions; the largest of the j responses is denoted DPeak_k; if DPeak_k is greater than TPeak_k, update the coordinates of P again, setting (x, y) to the coordinates corresponding to DPeak_k, and recompute the target scale feature vector and the target scale scale, using the computation of Step 6;
Step 10: the optimal position center of the target in the current frame is determined to be P, and the optimal scale is determined to be scale; mark the new target region R_new in the image, i.e. the rectangular box centered on P with width w × scale and height h × scale; in addition, the already computed convolution feature map from which the optimal target position center P is obtained is abbreviated z_target; likewise, the scale feature vector from which the optimal target scale scale is obtained is abbreviated z_scale;
Step 11: using z_target and z_scale together with the target model W^t_target and the scale model W_scale established in the previous frame, update the models by weighted summation, as follows:
$$W^t_{target} = (1-\beta)\,W^t_{target} + \beta\,W^t_{target\_new}$$
$$W_{scale} = (1-\beta)\,W_{scale} + \beta\,W_{scale\_new}$$
where W^t_target_new and W_scale_new are the models built from z_target and z_scale by the formulas of Step 2 and Step 3, and β is the learning rate obtained in Step 8;
Step 12: extract the gray feature of the new target region R_new to obtain the target appearance representation matrix A_k of the current frame, and add A_k to the history set of target representation matrices A_his; if the number of elements in A_his is greater than c, randomly select c elements from A_his to generate a three-dimensional matrix C_k, where C_k(:, i) corresponds to one element of A_his, i.e. a two-dimensional matrix A_k; otherwise generate C_k from all the elements of A_his; then average C_k to obtain a two-dimensional matrix and take this two-dimensional matrix as the new filtering model D of the detection module, computed as follows:
$$D = \begin{cases} \dfrac{1}{c}\sum_{i=1}^{c} C_k(:,i), & \text{if } |A_{his}| > c \\ \dfrac{1}{|A_{his}|}\sum_{i=1}^{|A_{his}|} C_k(:,i), & \text{otherwise} \end{cases}$$
Step 13: if all image frames in the video have been processed, the algorithm ends; otherwise go to Step 5 and continue.
2. The robust tracking method against long-term occlusion based on convolutional features and global-search detection according to claim 1, characterized in that: the adjustment parameters λ_1 and λ_2 are set to 0.0001.
3. The robust tracking method against long-term occlusion based on convolutional features and global-search detection according to claim 1, characterized in that: the value of j is set to 10.
4. The robust tracking method against long-term occlusion based on convolutional features and global-search detection according to claim 1, characterized in that: the threshold θ_o is set to 0.05.
5. The robust tracking method against long-term occlusion based on convolutional features and global-search detection according to claim 1, characterized in that: the initial learning rate β_init is set to 0.02.
6. The robust tracking method against long-term occlusion based on convolutional features and global-search detection according to claim 1, characterized in that: c is set to 20.
CN201710204379.1A 2017-03-31 2017-03-31 Robust tracking method against long-term occlusion based on convolutional features and global-search detection Expired - Fee Related CN106952288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710204379.1A CN106952288B (en) Robust tracking method against long-term occlusion based on convolutional features and global-search detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710204379.1A CN106952288B (en) Robust tracking method against long-term occlusion based on convolutional features and global-search detection

Publications (2)

Publication Number Publication Date
CN106952288A 2017-07-14
CN106952288B CN106952288B (en) 2019-09-24

Family

ID=59475259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710204379.1A Expired - Fee Related CN106952288B (en) Robust tracking method against long-term occlusion based on convolutional features and global-search detection

Country Status (1)

Country Link
CN (1) CN106952288B (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631895A (en) * 2015-12-18 2016-06-01 重庆大学 Temporal-spatial context video target tracking method combining particle filtering
CN105741316A (en) * 2016-01-20 2016-07-06 西北工业大学 Robust target tracking method based on deep learning and multi-scale correlation filtering
CN106326924A (en) * 2016-08-23 2017-01-11 武汉大学 Object tracking method and object tracking system based on local classification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHAO MA et al.: "Hierarchical convolutional features for visual tracking", 《2015 IEEE International Conference on Computer Vision (ICCV)》 *
YAN CHEN et al.: "CNNTracker: Online discriminative object tracking via deep convolutional neural network", 《Applied Soft Computing》 *
YOUNGBIN PARK et al.: "Tracking Human-like Natural Motion Using Deep Recurrent Neural Networks", 《arXiv: Computer Vision and Pattern Recognition》 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107452022A (en) * 2017-07-20 2017-12-08 西安电子科技大学 Video target tracking method
CN107644430A (en) * 2017-07-27 2018-01-30 孙战里 Target tracking based on adaptive feature fusion
CN107491742A (en) * 2017-07-28 2017-12-19 西安因诺航空科技有限公司 Long-term stable unmanned aerial vehicle target tracking method
CN108734151A (en) * 2018-06-14 2018-11-02 厦门大学 Robust long-term target tracking method based on correlation filtering and deep Siamese network
CN110276782A (en) * 2018-07-09 2019-09-24 西北工业大学 Hyperspectral target tracking method combining spatial-spectral features and correlation filtering
CN110276782B (en) * 2018-07-09 2022-03-11 西北工业大学 Hyperspectral target tracking method combining spatial spectral features and related filtering
CN109271865A (en) * 2018-08-17 2019-01-25 西安电子科技大学 Motion target tracking method based on scattering transformation multilayer correlation filtering
CN109271865B (en) * 2018-08-17 2021-11-09 西安电子科技大学 Moving target tracking method based on scattering transformation multilayer correlation filtering
WO2020056903A1 (en) * 2018-09-21 2020-03-26 北京字节跳动网络技术有限公司 Information generating method and device
CN109410249B (en) * 2018-11-13 2021-09-28 深圳龙岗智能视听研究院 Self-adaptive target tracking method combining depth characteristic and hand-drawn characteristic
CN109410249A (en) * 2018-11-13 2019-03-01 深圳龙岗智能视听研究院 Adaptive target tracking method combining deep features and hand-crafted features
CN109596649A (en) * 2018-11-29 2019-04-09 昆明理工大学 Method and device for modeling the influence of microalloying elements on host element concentration based on a convolutional network
CN109754424A (en) * 2018-12-17 2019-05-14 西北工业大学 Correlation filtering tracking algorithm based on fused features and adaptive update strategy
CN109740448A (en) * 2018-12-17 2019-05-10 西北工业大学 Robust aerial video target tracking method based on correlation filtering and image segmentation
CN109740448B (en) * 2018-12-17 2022-05-10 西北工业大学 Aerial video target robust tracking method based on relevant filtering and image segmentation
CN109754424B (en) * 2018-12-17 2022-11-04 西北工业大学 Correlation filtering tracking algorithm based on fusion characteristics and self-adaptive updating strategy
CN111260687A (en) * 2020-01-10 2020-06-09 西北工业大学 Aerial video target tracking method based on semantic perception network and related filtering
CN111260687B (en) * 2020-01-10 2022-09-27 西北工业大学 Aerial video target tracking method based on semantic perception network and related filtering
CN111652910A (en) * 2020-05-22 2020-09-11 重庆理工大学 Target tracking algorithm based on object space relationship
CN112762841A (en) * 2020-12-30 2021-05-07 天津大学 Bridge dynamic displacement monitoring system and method based on multi-resolution depth features

Also Published As

Publication number Publication date
CN106952288B (en) 2019-09-24

Similar Documents

Publication Publication Date Title
CN106952288B (en) Robust tracking method against long-term occlusion based on convolutional features and global-search detection
CN105741316B (en) Robust target tracking method based on deep learning and multi-scale correlation filtering
Liu et al. Detection of multiclass objects in optical remote sensing images
CN108665481A (en) Adaptive anti-occlusion infrared target tracking method with multi-layer deep feature fusion
CN106570893A (en) Rapid stable visual tracking method based on correlation filtering
CN111062973A (en) Vehicle tracking method based on target feature sensitivity and deep learning
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN107316316A (en) Target tracking method based on adaptive fusion of multiple features and kernel correlation filtering
CN107330357A (en) Vision SLAM closed loop detection methods based on deep neural network
CN107644430A (en) Target tracking based on adaptive feature fusion
CN103886325B (en) Video tracking method based on block circulant matrix
CN107424171A (en) Anti-occlusion target tracking method based on block partitioning
CN108447078A (en) Distractor-aware tracking algorithm based on visual saliency
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN110175649A (en) Fast multi-scale target tracking method with re-detection
CN107067410B (en) Manifold regularization related filtering target tracking method based on augmented samples
CN107452022A (en) Video target tracking method
Yang et al. Visual tracking with long-short term based correlation filter
CN105844665A (en) Method and device for tracking video object
CN111882546A (en) Weak supervised learning-based three-branch convolutional network fabric defect detection method
CN111027586A (en) Target tracking method based on novel response map fusion
CN111640138A (en) Target tracking method, device, equipment and storage medium
CN110111369A (en) Scale-adaptive sea-surface target tracking method based on edge detection
CN110135435B (en) Saliency detection method and device based on broad learning system
CN110309729A (en) Tracking and re-detection method based on anomaly peak detection and Siamese network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20190924