CN110276785B - Anti-shielding infrared target tracking method - Google Patents
- Publication number
- CN110276785B (application CN201910547576.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- frame image
- template
- current frame
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
Abstract
The invention discloses an anti-occlusion infrared target tracking method that addresses long-term target tracking in complex environments such as attitude change and occlusion, and belongs to the fields of target tracking and computer vision. An infrared image sequence is read and a target is selected in the initial frame image to obtain its center position and size; the target in the initial frame image serves as the template. The second frame image is obtained as the current frame image, the template of the initial frame image is taken as the template of the current frame image, and a two-dimensional cosine window is obtained from the template size and the cell size. Histogram-of-oriented-gradients and Haar features are extracted from the template and linearly fused to initialize or update a target model and a target regression coefficient, and a multi-layer kernel correlation filter response map set is obtained within a search box in each frame image to carry out subsequent target tracking. The invention is used for infrared image target tracking.
Description
Technical Field
An anti-occlusion infrared target tracking method is used for infrared image target tracking and belongs to the fields of target tracking and computer vision.
Background
In complex environments, severe deformation or occlusion of the target means that no existing algorithm can reliably track the object of interest automatically, and manual intervention is required.
The existing target tracking methods fall into several categories. Region-based methods, such as template matching, are simple, accurate and fast, but are unsuited to complex environments such as severe target deformation, where they easily lose the target. Model-based methods build a geometric model of the target and search for it; they handle occlusion poorly, and the lack of color information in infrared imagery further weakens their occlusion resistance. Bayesian-framework methods capture the initial state of the target, extract its features, and perform joint spatio-temporal state estimation; they can estimate the target position under occlusion, but their algorithmic complexity is high. Deep-learning methods are robust but prone to data loss, and their network training speed makes real-time operation difficult. Correlation-filtering methods are fast; among them, kernelized correlation filtering (KCF) is both fast and accurate, tracking nearly 10 times faster than algorithms such as Struck and TLD, and far more accurate than the MOSSE algorithm, which reaches 43.1% precision on OTB50, whereas KCF with HOG features reaches 73.2%.
Under complex conditions, infrared target tracking accuracy is difficult to guarantee because of target changes and the external environment, so finding an anti-occlusion long-term tracking algorithm is a problem that urgently needs to be solved. Existing improved algorithms based on kernel correlation filtering solve, to a certain extent, the tracking failures caused by the target being occluded during tracking. When the target scale does not change greatly, such algorithms can still, to a great extent, accurately match the search area to the target under occlusion and so maintain tracking; but when the target scale changes greatly, occlusion readily causes tracking failure.
Disclosure of Invention
In view of the above problems, the invention aims to provide an anti-occlusion infrared target tracking method, which solves the problem that target tracking methods in the prior art are affected by occlusion and prone to tracking failure.
In order to achieve the purpose, the invention adopts the following technical scheme:
an anti-occlusion infrared target tracking method comprises the following steps:
s1: reading an infrared image sequence, framing a target in an initial frame image to obtain the central position and the size of the target, taking the target in the initial frame image as a template, obtaining a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image;
s2: obtaining a two-dimensional cosine window according to the size of the template and the size of the cell unit;
s3: extracting histogram-of-oriented-gradients and Haar features from the template and linearly fusing them, applying the two-dimensional cosine window to the linearly fused features to obtain the fusion features, and calculating the target regression coefficient on the basis of the fusion features; if the target regression coefficient was calculated from the second frame image, initializing the target model and the target regression coefficient with it; if it was calculated from the last frame image, performing no processing; otherwise updating the target model and the target regression coefficient;
s4: determining a search frame of the current frame image by taking the template center of the current frame image as the center position of the search frame;
s5: traversing in a search frame in a current frame image based on the size of a template to obtain a set of regions to be matched, obtaining fusion characteristics corresponding to a plurality of regions to be matched based on the set of regions to be matched, and calculating a multi-kernel correlation filter response map corresponding to each region to be matched based on the fusion characteristics, a corresponding target model and a target regression coefficient to obtain a multi-kernel correlation filter response map set;
s6: judging whether the maximum response value in the multi-layer kernel correlation filtering response image set is larger than or equal to a given first threshold value, if so, turning to the step S7, otherwise, turning to the step S8;
s7: calculating the center position of the target of the current frame image from the horizontal and vertical coordinates of the maximum response value; if the current frame image is not the last frame, updating the template in step S3 to the weighting of the template of the current frame image and the region at the target center of the current frame image, and turning to step S3 to process the next frame after updating; otherwise ending tracking;
s8: not updating the template; predicting the target state in the current frame image by Kalman filtering to obtain a predicted coordinate; weighting the target area of template size centered on the predicted coordinate against the target area actually matched in the current frame image, and taking the result as the matching result of the current frame image, wherein the actually matched target area has the template size and is centered either on the center position of the target in the previous frame or on the horizontal and vertical coordinates of the maximum response value; taking the center position of the matching result as the center, traversing a search box of 3 times the size of the matching result to obtain a set of areas to be matched; extracting the histogram-of-oriented-gradients and Haar features of each area to be matched to obtain the corresponding fusion features; and then obtaining a multi-layer kernel correlation filter response map set based on each fusion feature;
if the maximum response value in the multi-layer kernel correlation filter response map set is greater than or equal to a given second threshold value, updating the center position of the target with the horizontal and vertical coordinates of the maximum value; if the current frame image is not the last frame, updating the template in step S3 to the weighting of the template of the current frame image and the region at the target center of the current frame, and turning to step S3 to process the next frame after updating; otherwise ending tracking;
if the maximum response value in the multi-layer kernel correlation filtering response image set is lower than a given second threshold value, ending tracking if the current frame image is the last frame, otherwise, reading the next frame image as the current frame, and turning to the step S8.
Further, the specific steps of step S1 are as follows: reading an infrared image sequence, framing a target in an initial frame image, recording the central position and the size of the target, and using the framed target in the initial frame image as a template, wherein the central position and the size of the template are the central position and the size of the target; and acquiring a second frame image as a current frame image, and taking a template of the initial frame image as a template of the current frame image.
Further, the specific steps of step S2 are as follows:
s2.1: determining a search box according to the template size target_sz, the search box size being window_sz = target_sz · (1 + padding), where padding determines the ratio of the search box size to the target size;
s2.2: determining a feature regression label yf according to the given cell size cell_size, the template size target_sz and the search box size window_sz, and then obtaining a two-dimensional cosine window cos_window based on the feature regression label yf;
the method comprises the following specific steps:
s2.2.1: calculating the bandwidth σ of the Gaussian regression label according to the template size target_sz and the cell size cell_size, with the formula:

σ = a · sqrt(w · h) / cell_size

where w and h are the width and height of the template and a is the spatial bandwidth factor, proportional to the target size;
s2.2.2: calculating the regression label yf according to the bandwidth σ of the Gaussian regression label and the search box size window_sz, with the calculation formula:

y′(r, s) = exp(−((r − m/2)² + (s − n/2)²) / (2σ²)),  1 ≤ r ≤ m, 1 ≤ s ≤ n

where m and n are the height and width of the search box measured in cell units; after y′ is obtained, a cyclic shift moves the peak of the regression label to the upper-left corner to give y, and a Fourier transform then yields the regression label yf;
s2.2.3: calculating by utilizing a hann function according to the size of the regression label yf to obtain a two-dimensional cosine window cos _ window;
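The computation in steps s2.2.1 to s2.2.3 can be sketched in NumPy as follows (a minimal sketch; the function name and the default spatial-bandwidth factor a are assumptions, and sizes are taken in (height, width) order):

```python
import numpy as np

def regression_label_and_window(window_sz, target_sz, cell_size, a=0.1):
    """Build the Gaussian regression label yf and the 2-D cosine window.
    window_sz and target_sz are (height, width) in pixels; `a` is the
    spatial-bandwidth factor (an assumed default, not fixed by the patent)."""
    h, w = target_sz
    sigma = a * np.sqrt(w * h) / cell_size          # bandwidth of the Gaussian label
    m, n = window_sz[0] // cell_size, window_sz[1] // cell_size
    r = np.arange(1, m + 1)[:, None]
    s = np.arange(1, n + 1)[None, :]
    y_prime = np.exp(-((r - m / 2) ** 2 + (s - n / 2) ** 2) / (2 * sigma ** 2))
    # cyclic shift moves the label peak to the upper-left corner, then FFT
    y = np.roll(y_prime, (-int(m / 2) + 1, -int(n / 2) + 1), axis=(0, 1))
    yf = np.fft.fft2(y)
    cos_window = np.outer(np.hanning(m), np.hanning(n))
    return yf, cos_window
```

The cyclic shift places the label peak at the upper-left corner so that a zero displacement of the target corresponds to the first element of the response map.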
further, the specific steps of step S3 are as follows:
s3.1: extracting HOG and Haar features from the template m_t, linearly fusing them, and applying the two-dimensional cosine window cos_window to obtain the fusion features, where Haar denotes Haar-like features and HOG denotes the histogram of oriented gradients;
s3.2: obtaining a target regression coefficient according to the fusion characteristics;
s3.3: if the current frame image is the second frame image, the step S3.4 is carried out, if the current frame image is the last frame image, no processing is carried out, otherwise, the step S3.6 is carried out;
s3.4: when the target tracking reaches the second frame image, initializing the target model mode_xf with the frequency-domain fusion feature xf of the current frame image, namely

mode_xf^t = xf^t

where t denotes the second frame image, xf^t is the frequency-domain fusion feature and mode_xf^t is the target model mode_xf;
s3.5: when the target tracking reaches the second frame image, initializing the target regression coefficient mode_α with the target regression coefficient α of the current frame image, namely

mode_α^t = α^t

where t denotes the second frame image, α^t is the target regression coefficient α and mode_α^t is the target regression coefficient mode_α;
s3.6: when the target tracking reaches the third frame image or a later frame, updating the target model mode_xf by linear interpolation, namely

mode_xf^t = (1 − η) · mode_xf^(t−1) + η · xf^t

where η is a given learning rate, xf^t is the fusion feature of the current frame image, mode_xf^(t−1) is the target model of the previous frame image, and mode_xf^t is the updated target model mode_xf;
s3.7: when the target tracking reaches the third frame image or a later frame, updating the target regression coefficient mode_α by linear interpolation, namely

mode_α^t = (1 − η) · mode_α^(t−1) + η · α^t

where η is a given learning rate, mode_α^(t−1) is the target regression coefficient of the previous frame image and α^t is the target regression coefficient of the current frame image.
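Steps s3.4 to s3.7 amount to a single linear-interpolation rule; a minimal sketch (the function name and the default learning rate eta = 0.02 are assumptions):

```python
import numpy as np

def update_model(mode_xf, mode_alpha, xf, alpha, eta=0.02, first_frame=False):
    """Initialize (second frame) or linearly interpolate (later frames) the
    target model and target regression coefficient; eta is the learning rate."""
    if first_frame:                        # second frame of the sequence: initialize
        return xf.copy(), alpha.copy()
    new_xf = (1 - eta) * mode_xf + eta * xf
    new_alpha = (1 - eta) * mode_alpha + eta * alpha
    return new_xf, new_alpha
```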
Further, the specific steps of step 3.1 are:
s3.1.1: based on the given cell size cell_size, extracting features from the template m_t using the piotr_toolbox toolkit for MATLAB to obtain the 31-dimensional FHOG feature g_0, i.e. the histogram-of-oriented-gradients feature;
s3.1.2: the current frame image being the t-th frame image, calculating the integral image SAT(x, y) of the template m_t of the t-th frame image, each pixel value of the integral image being:

SAT(x, y) = SAT(x, y−1) + SAT(x−1, y) − SAT(x−1, y−1) + m_t(x, y)

where SAT(x, y−1) is the integral-image pixel value above the current pixel position (x, y), SAT(x−1, y) the value to its left, and SAT(x−1, y−1) the value at its upper-left corner; the initial boundary of the integral image SAT is SAT(−1, y) = SAT(x, −1) = SAT(−1, −1) = 0, where SAT(−1, y) is the left boundary pixel value, SAT(x, −1) the upper boundary pixel value and SAT(−1, −1) the upper-left vertex pixel value; the initial boundaries SAT(−1, y), SAT(x, −1) and SAT(−1, −1) are used to calculate SAT(0, y) and SAT(x, 0);
s3.1.3: dividing the integral image SAT according to the cell size cell_size; for any cell, the sum of the pixels in its upper half is SAT_A, the sum in its lower half is SAT_B, the sum in its left half is SAT_C and the sum in its right half is SAT_D; the 1-dimensional vertical Haar feature g_1 of each cell is the difference of SAT_A and SAT_B, and the 1-dimensional horizontal Haar feature g_2 is the difference of SAT_C and SAT_D; the features g_1 and g_2 of all cells constitute the Haar features;
s3.1.4: linearly fusing the 31-dimensional FHOG feature g_0, all 1-dimensional vertical Haar features g_1 and all 1-dimensional horizontal Haar features g_2, then point-multiplying the result by the two-dimensional cosine window cos_window to obtain the 33-dimensional fusion feature g.
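A minimal NumPy sketch of steps s3.1.2 and s3.1.3 (integral image plus per-cell Haar features; the helper names are assumptions, and the Haar value is taken here as the difference of the two half-cell sums, consistent with standard Haar-like features):

```python
import numpy as np

def cell_haar_features(img, cell_size):
    """Per-cell 1-D vertical and horizontal Haar features via an integral image.
    Assumes the image dimensions are multiples of cell_size."""
    h, w = img.shape
    # integral image with a zero top/left border, so SAT lookups need no special cases
    sat = np.zeros((h + 1, w + 1))
    sat[1:, 1:] = img.cumsum(0).cumsum(1)

    def block_sum(r0, c0, r1, c1):         # sum over the inclusive block [r0:r1, c0:c1]
        return sat[r1 + 1, c1 + 1] - sat[r0, c1 + 1] - sat[r1 + 1, c0] + sat[r0, c0]

    rows, cols = h // cell_size, w // cell_size
    g1 = np.zeros((rows, cols))            # vertical Haar: upper half minus lower half
    g2 = np.zeros((rows, cols))            # horizontal Haar: left half minus right half
    half = cell_size // 2
    for i in range(rows):
        for j in range(cols):
            r, c = i * cell_size, j * cell_size
            top = block_sum(r, c, r + half - 1, c + cell_size - 1)
            bottom = block_sum(r + half, c, r + cell_size - 1, c + cell_size - 1)
            left = block_sum(r, c, r + cell_size - 1, c + half - 1)
            right = block_sum(r, c + half, r + cell_size - 1, c + cell_size - 1)
            g1[i, j] = top - bottom
            g2[i, j] = left - right
    return g1, g2
```

Each block sum costs four integral-image lookups, so the feature cost is independent of the cell size.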
Further, the specific steps of step 3.2 are:
s3.2.1: performing a fast Fourier transform on the fusion feature g to obtain the frequency-domain fusion feature xf of the template m_t, the formula being:

xf = F(g)

where g is the 33-dimensional fusion feature extracted from the template m_t and F denotes the Fourier transform; xf is the obtained frequency-domain fusion feature;
s3.2.2: calculating the Gaussian autocorrelation kernel matrix kf on the frequency domain from the frequency-domain fusion feature xf according to the Gaussian kernel correlation function, whose formula is:

K^{xx′} = exp(−(‖x‖² + ‖x′‖² − 2 · F⁻¹(x̂ ⊙ (x̂′)*)) / σ²)

where K^{xx′} is the Gaussian kernel correlation matrix of x and x′; x and x′ stand for the different features used to calculate the kernel correlation matrix and are replaced by the corresponding features in the actual calculation; ‖x‖² is the sum of the squared moduli of the elements of x divided by N, where N is the product of the two dimensions of the matrix x; x̂ is the matrix x in the Fourier domain, (x̂′)* is the complex conjugate of x̂′, ⊙ denotes the element-wise product, F⁻¹ denotes the inverse Fourier transform, and σ is the bandwidth of the Gaussian regression label;

substituting the frequency-domain fusion feature xf for both x and x′ in the Gaussian kernel correlation function gives the Gaussian autocorrelation kernel matrix kf on the frequency domain;
s3.2.3: calculating the target regression coefficient α from the Gaussian autocorrelation kernel matrix kf, with the calculation formula:

α = yf / (kf + λ)

where λ is the regularization parameter, K^{xx′} takes the value kf, yf is the regression label, and the calculated value is the target regression coefficient α.
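Steps s3.2.2 and s3.2.3 follow the standard KCF formulation; a minimal NumPy sketch (multi-channel features stacked along the third axis; the function names and the default λ are assumptions):

```python
import numpy as np

def gaussian_correlation(xf, zf, sigma):
    """Gaussian kernel correlation in the frequency domain, as in KCF.
    xf and zf are FFTs (over the first two axes) of (h, w, channels) features."""
    N = xf.shape[0] * xf.shape[1]
    xx = np.real(xf * np.conj(xf)).sum() / N              # spatial energy of x (Parseval)
    zz = np.real(zf * np.conj(zf)).sum() / N              # spatial energy of z
    xz = np.real(np.fft.ifft2((xf * np.conj(zf)).sum(axis=2)))  # circular cross-correlation
    k = np.exp(-np.maximum(xx + zz - 2 * xz, 0) / (sigma ** 2 * xf.size))
    return np.fft.fft2(k)                                 # kernel matrix on the frequency domain

def regression_coefficient(kf, yf, lam=1e-4):
    """Ridge-regression coefficient in the frequency domain: alpha = yf / (kf + lambda)."""
    return yf / (kf + lam)
```

For the autocorrelation case the exponent vanishes at zero shift, so the spatial kernel peaks at 1 in the upper-left corner.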
Further, the specific steps of step S4 are:
according to the template m_t of the current frame image, determining the search box s_t of the current frame image; the size of the search box s_t is 1.5 times the template size target_sz, and its center position is the center position of the template m_t.
Further, the specific steps of step S5 are as follows:
s5.1: traversing the search box s_t in the current frame image with the template size target_sz to obtain a plurality of regions to be matched p_i, forming the set of regions to be matched A_t;
S5.2: based on the given cell size cell_size, sequentially extracting FHOG and Haar features from each region to be matched p_i in the set A_t, linearly fusing them, point-multiplying by the two-dimensional cosine window cos_window, and performing a fast Fourier transform to obtain the frequency-domain fusion feature zf_i of each region p_i; the set of frequency-domain fusion features of all regions to be matched in A_t is zf;
s5.3: for each fusion feature zf_i, calculating the Gaussian cross-correlation kernel matrix on the frequency domain according to the Gaussian kernel correlation function of step s3.2.2: substituting the target model mode_xf for x and zf_i for x′ gives the Gaussian cross-correlation kernel matrix kzf_i of each region to be matched p_i; the set of Gaussian cross-correlation kernel matrices of all regions to be matched in A_t is kzf;
obtaining the fusion-feature kernel correlation filter response map set response according to the ridge-regression response score function and the Gaussian cross-correlation kernel matrix set kzf, specifically:
obtaining the single regression response of each Gaussian cross-correlation kernel matrix kzf_i according to the ridge-regression response score function, whose formula is:

f̂(z_i) = kzf_i ⊙ α̂

where kzf_i is a Gaussian cross-correlation kernel matrix from the set kzf, α̂ is the target regression coefficient mode_α, and f̂(z_i) is the single regression response obtained for the Gaussian cross-correlation kernel matrix kzf_i;

arranging the single regression response values corresponding to each kzf_i into a matrix in row-column order, performing an inverse Fourier transform to return to the time domain, and keeping the real part to obtain a multi-layer kernel correlation filter response map; the multi-layer kernel correlation filter response map set response is obtained from the Gaussian cross-correlation kernel matrix set kzf.
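The response-map computation of step s5.3 reduces to an element-wise product followed by an inverse FFT; a minimal sketch (names are assumptions):

```python
import numpy as np

def detection_response(alpha_f, kzf):
    """Ridge-regression response map: real(ifft2(kzf * alpha_f)).
    The displacement of the target is read off from the argmax of the response."""
    response = np.real(np.fft.ifft2(kzf * alpha_f))
    peak = np.unravel_index(response.argmax(), response.shape)
    return response, peak
```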
Further, the specific steps of step S7 are as follows:
s7.1: when the maximum response value max(response) in the multi-layer kernel correlation filter response map set is greater than or equal to T_1, taking the coordinates of the maximum response value max(response) in the current frame image as the center position pos of the target of the current frame image;
s7.2: if the current frame image is the last frame, ending the loop; otherwise updating the template in step S3 to the weighting of the template of the current frame image and the region at the center position of the target of the current frame image, and returning to step S3 after updating to process the next frame, the update formula being:

m_{t+1} = (1 − interp_factor) · m_t + interp_factor · p_max

where m_{t+1} is the updated template, m_t is the template of the current frame image, p_max is the region to be matched corresponding to the maximum response value max(response), and interp_factor is a weight adjustment factor.
Further, the specific steps of step S8 are as follows:
s8.1: when the maximum response value max(response) in the multi-layer kernel correlation filter response map set is less than T_1, the target is occluded and the template is not updated: m_{t+1} = m_t;
S8.2: according to the target state x in the previous frame image t-1 Predicting the target state of the current frame image, and taking out the coordinates of the center position of the target from the predicted target state, namely the predicted coordinates, wherein the target state comprises the center position and the speed of the target, and as the template is not updated, namely the template between two adjacent frames does not change greatly, the target is considered to do uniform motion, and the specific formula for predicting the target state of the current frame image is calculated as follows:
x t =A·x t-1 +B·u t-1 +w t-1
where A is the state transition matrix, B is the matrix relating to the external control parameters, x t-1 Is the target state in the t-1 frame image, u t-1 Is the acceleration of the target in the t-1 frame image, and makes uniform motion, namely 0 t-1 For describing process noise, subject to a Gaussian distribution w t-1 ~N(0,Q t-1 ),p x And p y As coordinates of the center position of the object in the t-th frame image, v x ,v y Setting a state transition matrix as the speed of the central position of the target in the t-th frame image on the x-axis and the y-axis according to a uniform motion modelTherefore, the formula for predicting the target state of the current frame image is as follows:
s8.3: taking the predicted center position of the target as the center, weighting the area of template size against the target area actually matched in the current frame image, and taking the result as the matching result of the current frame image, with the following specific steps:
s8.3.1: if the center position of the target in the previous frame image was obtained by prediction, the center position of the target area actually matched in the current frame image is the center position of the target in the previous frame image, and the size of the target area is consistent with the template; if the center position of the target in the previous frame image was not obtained by prediction, the center position of the target area actually matched in the current frame image is the position of the maximum response value in the current frame image, the size of the target area again being consistent with the template;
s8.3.2: calculating the covariance matrix of the prior estimate of the current frame image, namely the t-th frame image, with the specific formula:

P_t⁻ = A · P_{t−1} · A^T + Q

where P_{t−1} is the posterior error covariance of the (t−1)-th frame image, whose initial value is a given value, A^T is the transpose of A, and Q is the given process noise covariance;
s8.3.3: calculating the filter gain matrix K_t of the current frame image, with the calculation formula:

K_t = P_t⁻ · H^T · (H · P_t⁻ · H^T + R_t)⁻¹

where H is the observation matrix mapping the state to the measured center position, H^T is its transpose, R_t is the observation noise covariance, taken as a constant value R, and (X)⁻¹ denotes the inverse of X;
s8.3.4: according to the filter gain matrix K_t of the current frame image and the predicted target state x_t, calculating the best estimated position x̂_t, i.e. the matching result, with the calculation formula:

x̂_t = x_t + K_t · (z_t − H · x_t)

where z_t is the center position of the target area actually matched in the current frame image, i.e. the measured value; the difference between the measured value z_t and the predicted coordinate is represented by v_t, which satisfies the Gaussian distribution v_t ~ N(0, R_t);
S8.3.5: if the current frame is not the last frame, updating the posterior error covariance of the current frame image based on the filter gain matrix K_t, the observation matrix H and the prior-estimate covariance matrix P_t⁻, with the calculation formula:

P_t = (I − K_t · H) · P_t⁻
s8.4: updating the center position of the target of the current frame image based on the best estimated position x̂_t, the update formula being:

(pos_x, pos_y) = (p_x, p_y)

where pos_x and pos_y are the updated center position of the target, and p_x and p_y are the position coordinates of the best estimated position x̂_t;
s8.5: taking the center position of the matching result, i.e. the updated target center, as the center, taking a rectangular box of 3 times the size of the matching result in the current frame image as the search box, traversing the search box to obtain the set of regions to be matched, extracting the histogram-of-oriented-gradients and Haar features of each region to be matched, and then performing subsequent processing to obtain the correlation filter response map set, with the following specific steps:

s8.5.1: taking the center position (pos_x, pos_y) of the matching result as the center, setting the length and width of the search box to 3 times those of the template, and traversing the whole search box with the template size (w_m, l_m) to obtain the set of regions to be matched;
s8.5.2: then obtaining the fusion features corresponding to the regions to be matched based on the set of regions to be matched, and calculating the multi-layer kernel correlation filter response map set response_c based on the fusion features, the corresponding target model and the target regression coefficient;
s8.6: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is greater than or equal to T_2, updating the center position of the target of the current frame as:

pos_x = (pos_x − 1.5 · w_m) + vert_x,  pos_y = (pos_y − 1.5 · l_m) + vert_y

where pos_x, pos_y on the left are the updated center position of the target in the current frame image and on the right the center position of the target obtained in step S8.4, w_m, l_m are the template size, and vert_x, vert_y are the numbers of pixels by which the maximum response value is offset from the upper-left corner (pos_x − 1.5·w_m, pos_y − 1.5·l_m) of the search box;
s8.7: if the current frame is the last frame, ending the loop; otherwise updating the template in step S3 to the weighting of the template m_t of the current frame image and the region at the center position of the target in the current frame image, with the calculation formula:

m_{t+1} = (1 − interp_factor) · m_t + interp_factor · p_c_max

where m_{t+1} is the updated template, m_t is the template of the current frame image, p_c_max is the region to be matched corresponding to the maximum response value max(response_c), and interp_factor is a weight adjustment factor;
s8.8: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is less than T_2 and the current frame image is the last frame, tracking ends; otherwise the next frame image is read as the current frame and the method goes to step S8.
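The Kalman prediction and correction of steps s8.2 to s8.3.5 can be sketched with a constant-velocity model (a minimal sketch; dt = 1 and the Q and R values are illustrative assumptions, not the patent's parameters):

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over the state [p_x, p_y, v_x, v_y]."""
    def __init__(self, pos, q=1e-2, r=1.0):
        self.x = np.array([pos[0], pos[1], 0.0, 0.0])
        self.P = np.eye(4)                               # posterior error covariance
        self.A = np.array([[1, 0, 1, 0],                 # uniform motion, dt = 1
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                 # observe position only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)                           # process noise covariance
        self.R = r * np.eye(2)                           # observation noise covariance

    def predict(self):
        self.x = self.A @ self.x                         # x_t = A x_{t-1}, u = 0
        self.P = self.A @ self.P @ self.A.T + self.Q     # prior covariance P_t^-
        return self.x[:2]                                # predicted target center

    def correct(self, z):
        # gain K_t = P^- H^T (H P^- H^T + R)^-1, then fuse measurement z
        K = self.P @ self.H.T @ np.linalg.inv(self.H @ self.P @ self.H.T + self.R)
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P       # posterior covariance
        return self.x[:2]                                # best-estimate position
```

During occlusion only `predict` is used; once a match is found again, `correct` fuses the matched center with the prediction, which is the weighting described in step s8.3.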
Compared with the prior art, the invention has the beneficial effects that:
1. the invention adopts a kernel correlation filtering algorithm that has the same low complexity as a linear correlation filter, requires few lines of code, and is faster than other tracking algorithms, running at hundreds of frames per second; it outperforms trackers such as Struck and TLD and is fully capable of real-time tracking;
2. the invention adopts a fusion feature combining the FHOG feature and the Haar feature: the former describes the edges and gradient changes of the image with a small storage footprint and high operation speed, while the latter describes edges with a small amount of data; the fused feature effectively describes the edge and gradient changes of a local area, characterizes the infrared target more accurately, reduces the loss of original image information, and improves tracking precision;
3. by introducing Kalman filtering, the invention solves the problems of continued tracking under occlusion and of tracking failure caused by excessive drift, increases the occlusion resistance of tracking, can search for the target in a larger search box when tracking fails, realizes a target re-detection function, and greatly improves tracking accuracy.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a block diagram of a target image framed in an initial frame of an infrared image sequence in accordance with the present invention;
FIG. 3 is a response fusion graph of a frame during infrared image sequence tracking according to the present invention;
fig. 4 is an original image and a tracking effect image of a 3-frame image sequence in the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
Based on a KCF framework, after the image sequence is read in, the target is selected in the first frame, and Haar and Histogram of Oriented Gradient (HOG) features are extracted and linearly fused to obtain the first-frame template. The second frame image is preprocessed, Haar and HOG features are extracted within the search box, the position of the target is predicted by kernel correlation filtering, and the similarity between the target and the template is calculated. If the similarity is greater than a threshold, the template is updated and the next frame is read for prediction, until the last frame. If the confidence falls below the threshold, the position of the target is predicted by Kalman filtering instead, the template is no longer updated, and the predicted target trajectory is taken as the motion trajectory. The similarity between the predicted position and the template is calculated; if it is greater than a certain threshold, the target is considered to have reappeared and correlation filtering resumes, otherwise Kalman filtering continues until the last frame.
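The alternation between correlation filtering and Kalman prediction described above can be sketched as a small state machine. The following Python sketch is illustrative only: the function name, the confidence scores and the thresholds are assumptions, and in the real method the scores come from the kernel correlation response maps:

```python
def track_modes(scores, t1, t2):
    """Illustrative control flow: 'kcf' while confidence >= T1,
    'kalman' once it drops below T1, back to 'kcf' when it recovers past T2."""
    mode = "kcf"
    trace = []
    for s in scores:
        if mode == "kcf" and s < t1:
            mode = "kalman"      # occlusion suspected: stop updating the template
        elif mode == "kalman" and s >= t2:
            mode = "kcf"         # target reappeared: resume correlation filtering
        trace.append(mode)
    return trace
```

A dip in confidence thus produces a temporary switch to Kalman prediction, and recovery past the second threshold switches back to correlation filtering.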
An anti-shielding infrared target tracking method comprises the following steps:
s1: reading an infrared image sequence, framing a target in an initial frame image to obtain the central position and the size of the target, taking the target in the initial frame image as a template, obtaining a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image; the method comprises the following specific steps: reading an infrared image sequence, framing a target in an initial frame image, recording the central position and the size of the target, and using the framed target in the initial frame image as a template, wherein the central position and the size of the template are the central position and the size of the target; and acquiring a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image.
S2: obtaining a two-dimensional cosine window according to the size of the template and the size of the cell unit; the method comprises the following specific steps:
s2.1: determining a search box according to the size target_sz of the template, the size of the search box being window_sz = target_sz·(1 + padding), where padding determines the ratio of the search box size to the target size;
s2.2: determining a characteristic regression label yf according to the given cell size cell _ size, the size target _ sz of the template and the size window _ sz of the search frame, and then obtaining a two-dimensional cosine window cos _ window based on the characteristic regression label yf;
the method comprises the following specific steps:
s2.2.1: calculating the bandwidth σ of the Gaussian regression label according to the size target_sz of the template and the cell size cell_size, with:
σ = √(w·h)·a / cell_size
where w and h are the width and height of the template, and a is the spatial bandwidth, proportional to the target size;
s2.2.2: calculating the regression label yf according to the bandwidth σ of the Gaussian regression label and the size window_sz of the search box, with:
y′(r, s) = exp( −((r − m/2)² + (s − n/2)²) / (2σ²) ), 1 ≤ r ≤ m, 1 ≤ s ≤ n
where m and n are respectively the two dimensions of window_sz divided by cell_size; after y′ is obtained, a cyclic shift moves the peak of the regression label to the upper-left corner to obtain y, and a Fourier transform then yields the regression label yf;
s2.2.3: calculating by utilizing a hann function according to the size of the regression label yf to obtain a two-dimensional cosine window cos _ window;
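As a concrete illustration of steps s2.2.1–s2.2.3, the following numpy sketch builds the Gaussian regression label and the two-dimensional cosine window. The bandwidth formula and the label-plane dimensions (window_sz divided by cell_size) are reconstructions consistent with the variables named above, not the patent's exact code:

```python
import numpy as np

def regression_label_and_window(window_sz, target_sz, cell_size=4, a=0.1):
    # label plane dimensions: search box size in cell units (assumed)
    m, n = window_sz[0] // cell_size, window_sz[1] // cell_size
    w, h = target_sz
    sigma = np.sqrt(w * h) * a / cell_size   # bandwidth of the Gaussian label (s2.2.1)
    r = np.arange(1, m + 1)[:, None]
    s = np.arange(1, n + 1)[None, :]
    y_prime = np.exp(-((r - m / 2) ** 2 + (s - n / 2) ** 2) / (2 * sigma ** 2))
    # cyclic shift moves the label peak to the upper-left corner (s2.2.2)
    y = np.roll(y_prime, (-(m // 2) + 1, -(n // 2) + 1), axis=(0, 1))
    yf = np.fft.fft2(y)                      # Fourier transform gives the label yf
    cos_window = np.outer(np.hanning(m), np.hanning(n))  # hann-based window (s2.2.3)
    return yf, cos_window
```

After the cyclic shift, the label peak sits at the top-left element, which is why the later response maps locate the target by their maximum.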
s3: extracting the histogram of oriented gradients and Haar features of the template and linearly fusing them, applying the two-dimensional cosine window to the linearly fused features to obtain the fused features, and calculating the target regression coefficient from the fused features; if the target regression coefficient is calculated from the second frame image, the target model and the target regression coefficient are initialized with it; if it is calculated from the last frame image, no processing is performed; otherwise the target model and the target regression coefficient are updated; the specific steps are as follows:
s3.1: extracting the features of the template m_t based on HOG and Haar and linearly fusing them, then point-multiplying by the two-dimensional cosine window cos_window to obtain the fused feature, where Haar denotes the Haar feature and HOG the histogram of oriented gradients; the specific steps are as follows:
s3.1.1: based on a given cell size cell_size, the features of the template m_t are extracted using the piotr_toolbox toolkit for MATLAB to obtain the 31-dimensional FHOG feature g_0, i.e. the histogram of oriented gradients feature;
s3.1.2: the current frame image is the t-th frame image; the integral image SAT(x, y) of the template m_t of the t-th frame image is calculated, the value of each pixel of the integral image SAT being:
SAT(x, y) = SAT(x, y−1) + SAT(x−1, y) − SAT(x−1, y−1) + m_t(x, y)
where SAT(x, y−1) is the integral image value above the current pixel position (x, y), SAT(x−1, y) is the integral image value to the left of (x, y), and SAT(x−1, y−1) is the integral image value to the upper-left of (x, y); the initial boundary of the integral image SAT is SAT(−1, y) = SAT(x, −1) = SAT(−1, −1) = 0, where SAT(−1, y) is the left boundary value, SAT(x, −1) the upper boundary value, and SAT(−1, −1) the upper-left vertex value; the initial boundaries SAT(−1, y), SAT(x, −1) and SAT(−1, −1) are used to calculate SAT(0, y) and SAT(x, 0);
s3.1.3: the integral image SAT is divided according to the cell size cell_size; for any cell, the sum of the pixels in its upper half is SAT_A, in its lower half SAT_B, in its left half SAT_C and in its right half SAT_D; the 1-dimensional vertical Haar feature g_1 of each cell is the difference of SAT_A and SAT_B, and the 1-dimensional horizontal Haar feature g_2 is the difference of SAT_C and SAT_D; the 1-dimensional vertical Haar features g_1 and 1-dimensional horizontal Haar features g_2 of all cells are the Haar features;
s3.1.4: the 31-dimensional FHOG feature g_0, all 1-dimensional vertical Haar features g_1 and all 1-dimensional horizontal Haar features g_2 are linearly fused and then point-multiplied by the two-dimensional cosine window cos_window to obtain the 33-dimensional fused feature g.
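Steps s3.1.2–s3.1.3 can be illustrated with a small numpy sketch of the integral image and the per-cell Haar differences; the exact sign convention (upper minus lower, left minus right) is an assumption:

```python
import numpy as np

def haar_features(img, cell_size=4):
    # integral image SAT with zero initial boundary (s3.1.2)
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    sat[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)

    def box_sum(r0, c0, r1, c1):
        # sum of pixels in the inclusive rectangle using four SAT lookups
        return sat[r1 + 1, c1 + 1] - sat[r0, c1 + 1] - sat[r1 + 1, c0] + sat[r0, c0]

    rows, cols = img.shape[0] // cell_size, img.shape[1] // cell_size
    g1 = np.zeros((rows, cols))  # vertical Haar: upper half minus lower half (s3.1.3)
    g2 = np.zeros((rows, cols))  # horizontal Haar: left half minus right half
    half = cell_size // 2
    for i in range(rows):
        for j in range(cols):
            r, c = i * cell_size, j * cell_size
            sat_a = box_sum(r, c, r + half - 1, c + cell_size - 1)               # upper half
            sat_b = box_sum(r + half, c, r + cell_size - 1, c + cell_size - 1)   # lower half
            sat_c = box_sum(r, c, r + cell_size - 1, c + half - 1)               # left half
            sat_d = box_sum(r, c + half, r + cell_size - 1, c + cell_size - 1)   # right half
            g1[i, j] = sat_a - sat_b
            g2[i, j] = sat_c - sat_d
    return g1, g2
```

A uniform image yields all-zero Haar responses, while a vertical intensity gradient excites g_1 only, matching the edge-description role stated above.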
S3.2: obtaining a target regression coefficient according to the fusion characteristic g; the method comprises the following specific steps:
s3.2.1: performing fast Fourier transform on the fused feature g to obtain the fused feature xf of the template m_t in the frequency domain, with:
xf = F(g)
where g denotes the 33-dimensional fused feature extracted from the template m_t, and F denotes the Fourier transform; the result is the frequency-domain fused feature xf;
s3.2.2: calculating the Gaussian autocorrelation kernel matrix kf in the frequency domain from the frequency-domain fused feature xf according to the Gaussian kernel correlation function; the Gaussian kernel correlation function is:
K^{xx′} = exp( −(1/σ²)·( ‖x‖² + ‖x′‖² − 2·F⁻¹( x̂ ⊙ (x̂′)* ) ) / N )
where K^{xx′} is the Gaussian kernel correlation matrix of x and x′; x and x′ denote the two features between which the kernel correlation is computed and are replaced by the corresponding features in the actual calculation; ‖x‖² is the sum of the squared moduli of the elements of the feature x; N is the product of the two dimensions of the matrix x; x̂ denotes the form of x in the Fourier domain; (x̂′)* denotes the complex conjugate of x̂′; ⊙ denotes element-wise multiplication; F⁻¹ denotes the inverse Fourier transform; and σ is the bandwidth of the Gaussian regression label;
substituting the frequency-domain fused feature xf for both x and x′ in the Gaussian kernel correlation function yields the Gaussian autocorrelation kernel matrix kf in the frequency domain;
s3.2.3: calculating the target regression coefficient α from the Gaussian autocorrelation kernel matrix kf, with:
α̂ = ŷ / (K^{xx′} + λ)
where λ is the regularization parameter, K^{xx′} takes the value kf, and ŷ is the regression label yf; the computed α̂ is the target regression coefficient α.
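A minimal single-channel numpy sketch of the Gaussian autocorrelation kernel and the target regression coefficient (steps s3.2.2–s3.2.3). The normalization by N follows the description above; the single channel (instead of the 33-dimensional fused feature) and the specific σ and λ values are simplifying assumptions:

```python
import numpy as np

def gaussian_autocorrelation(xf, sigma):
    # Gaussian autocorrelation kernel kf in the frequency domain for a
    # single-channel feature xf = F(g) (s3.2.2)
    N = xf.size
    xx = np.real(np.vdot(xf, xf)) / N             # ||x||^2 via Parseval's theorem
    corr = np.real(np.fft.ifft2(xf * np.conj(xf)))  # circular autocorrelation of x
    k = np.exp(-np.maximum(2 * xx - 2 * corr, 0) / (N * sigma ** 2))
    return np.fft.fft2(k)

def train_alpha(xf, yf, sigma=0.5, lam=1e-4):
    # target regression coefficient: alpha = yf / (kf + lambda) (s3.2.3)
    kf = gaussian_autocorrelation(xf, sigma)
    return yf / (kf + lam)
```

Evaluating the trained coefficient on the training feature itself approximately reproduces the regression label, so the response peak sits at the label peak.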
S3.3: if the current frame image is the second frame image, the step S3.4 is carried out, if the current frame image is the last frame image, no processing is carried out, otherwise, the step S3.6 is carried out;
s3.4: when the target tracks the second frame image, the target model model_xf is initialized with the frequency-domain fused feature xf of the current frame image, i.e.
model_xf_t = xf_t
where t denotes the second frame image, xf_t is the fused feature xf in the frequency domain, and model_xf_t is the target model model_xf;
s3.5: when the target tracks the second frame image, the target regression coefficient model_alpha is initialized with the target regression coefficient α of the current frame image, i.e.
model_alpha_t = α_t
where t denotes the second frame image, α_t is the target regression coefficient α, and model_alpha_t is the target regression coefficient model_alpha;
s3.6: when the target tracks the third frame or a later image, the target model model_xf is updated by linear interpolation, i.e.
model_xf_t = (1 − η)·model_xf_{t−1} + η·xf_t
where η is a given learning rate with value 0.02, xf_t is the frequency-domain fused feature of the current frame image, and model_xf_{t−1} is the target model of the previous frame image; the updated model_xf_t is the new target model model_xf;
s3.7: when the target tracks the third frame or a later image, the target regression coefficient model_alpha is updated by linear interpolation, i.e.
model_alpha_t = (1 − η)·model_alpha_{t−1} + η·α_t
where η is a given learning rate with value 0.02, model_alpha_{t−1} is the target regression coefficient of the previous frame image, and α_t is the target regression coefficient of the current frame image; the updated model_alpha_t is the new target regression coefficient model_alpha.
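The linear-interpolation updates of s3.6 and s3.7 share one form, sketched below; the learning rate η = 0.02 is the value given above:

```python
def interpolate(previous, current, eta=0.02):
    # linear interpolation update used for both model_xf and model_alpha:
    # new = (1 - eta) * previous + eta * current
    return (1 - eta) * previous + eta * current
```

In practice this is applied element-wise to the complex frequency-domain model and coefficient arrays, so the model drifts slowly toward the latest observation.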
S4: determining a search frame of the current frame image by taking the template center of the current frame image as the center position of the search frame; the method comprises the following specific steps:
According to the template m_t of the current frame image, the search box s_t of the current frame image is determined; the size of the search box s_t is 1.5 times the template size target_sz, and the centre position of the search box s_t is the centre position of the template m_t.
S5: traversing in a search frame in a current frame image based on the size of a template to obtain a set of regions to be matched, obtaining fusion characteristics corresponding to a plurality of regions to be matched based on the set of regions to be matched, and calculating a multi-kernel correlation filter response map corresponding to each region to be matched based on the fusion characteristics, a corresponding target model and a target regression coefficient to obtain a multi-kernel correlation filter response map set; the method comprises the following specific steps:
s5.1: traversing the search box s_t in the current frame image with the template size target_sz to obtain a plurality of regions to be matched p_i, forming the set of regions to be matched A_t;
s5.2: based on the given cell size cell_size, FHOG and Haar features are extracted in turn from each region to be matched p_i in the set A_t, linearly fused, point-multiplied by the two-dimensional cosine window cos_window, and fast Fourier transformed to obtain the frequency-domain fused feature zf_i corresponding to each region p_i; the set of frequency-domain fused features of all regions to be matched in A_t is zf;
s5.3: according to the Gaussian kernel correlation function, the Gaussian cross-correlation kernel matrix in the frequency domain is calculated for each fused feature zf_i: the target model model_xf is substituted for x and zf_i for x′ in the Gaussian kernel correlation function, yielding the Gaussian cross-correlation kernel matrix kzf_i in the frequency domain corresponding to each region to be matched p_i; the set of Gaussian cross-correlation kernel matrices of all regions to be matched in A_t is kzf;
Obtaining the fused-feature kernel correlation filter response map set response according to the ridge regression response score function and the Gaussian cross-correlation kernel matrix set kzf, which specifically comprises:
obtaining the single regression response value of each Gaussian cross-correlation kernel matrix kzf_i according to the ridge regression response score function, the ridge regression response score function being:
f̂(z) = K^{xz} ⊙ α̂
where K^{xz} takes the value of the Gaussian cross-correlation kernel matrix kzf_i from the set kzf, α̂ is the target regression coefficient model_alpha, and f̂(z) is the single regression response value of the Gaussian cross-correlation kernel matrix kzf_i;
the single regression response values of each Gaussian cross-correlation kernel matrix kzf_i are arranged into a matrix in row-column order, returned to the time domain by inverse Fourier transform, and the real part is retained to obtain a multi-layer kernel correlation filter response map; the multi-layer kernel correlation filter response map set response is obtained from the Gaussian cross-correlation kernel matrix set kzf.
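The detection stage (steps s5.2–s5.3 plus the response computation) can be exercised end-to-end on synthetic data. The sketch below trains on one single-channel patch and evaluates a circularly shifted copy; the response peak recovers the shift, which is the key property the kernel correlation filter relies on. The single channel, the parameter values and the function names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def gaussian_kernel(xf, zf, sigma):
    # Gaussian kernel correlation of two single-channel frequency-domain features
    N = xf.size
    xx = np.real(np.vdot(xf, xf)) / N
    zz = np.real(np.vdot(zf, zf)) / N
    xz = np.real(np.fft.ifft2(np.conj(xf) * zf))  # circular cross-correlation
    return np.fft.fft2(np.exp(-np.maximum(xx + zz - 2 * xz, 0) / (N * sigma ** 2)))

def demo_shift_detection(shift=(3, 5), size=32, sigma=0.5, lam=1e-4):
    rng = np.random.default_rng(0)
    x = rng.standard_normal((size, size))        # "template" feature patch
    z = np.roll(x, shift, axis=(0, 1))           # region to match: shifted target
    g = np.minimum(np.arange(size), size - np.arange(size)).astype(float)
    yf = np.fft.fft2(np.exp(-(g[:, None] ** 2 + g[None, :] ** 2) / 2))  # label, peak at (0, 0)
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    alphaf = yf / (gaussian_kernel(xf, xf, sigma) + lam)   # target regression coefficient
    kzf = gaussian_kernel(xf, zf, sigma)                   # Gaussian cross-correlation kernel
    response = np.real(np.fft.ifft2(alphaf * kzf))         # kernel correlation response map
    return np.unravel_index(np.argmax(response), response.shape)
```

Because the cross-correlation kernel of a circularly shifted patch is the autocorrelation kernel shifted by the same amount, the response map is the regression label shifted by the target displacement.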
S6: judging whether the maximum response value in the multi-layer kernel correlation filtering response image set is larger than or equal to a given first threshold value, if so, turning to the step S7, otherwise, turning to the step S8;
s7: calculating the central position of the target of the current frame image by using the horizontal and vertical coordinates of the maximum response value, if the current frame image is not the last frame, updating the template in the step S3 to be the weighting of the template of the current frame image and the central position of the target of the current frame image, turning to the step S3 to process the next frame after updating, and otherwise, ending the tracking; the method comprises the following specific steps:
s7.1: when the maximum response value max(response) in the multi-layer kernel correlation filter response map set is greater than or equal to T_1, the coordinates of max(response) in the current frame image are taken as the centre position pos of the target of the current frame image;
s7.2: if the current frame image is the last frame, the loop ends; otherwise the template in step S3 is updated to the weighting of the template of the current frame image and the centre position of the target of the current frame image, and after updating the process returns to step S3 for processing of the next frame; the update formula is:
m_{t+1} = (1 − interp_factor)·m_t + interp_factor·p_max
where m_{t+1} is the updated template, m_t is the template of the current frame image, p_max is the region to be matched corresponding to the maximum response value max(response), and interp_factor is a weight adjustment factor.
S8: the template is not updated; the target state in the current frame image is predicted by Kalman filtering to obtain predicted coordinates; with the predicted coordinates as the centre, a target area of template size is weighted with the target area actually matched in the current frame image to serve as the matching result of the current frame image, where a target area is an area of template size centred on the centre position of the previous-frame target or on the coordinates of the maximum response value; with the centre position of the matching result as the centre, a search box of 3 times the size of the matching result is traversed to obtain the set of regions to be matched; the histogram of oriented gradients and Haar features of each region to be matched in the set are extracted to obtain the corresponding fused features, and a multi-layer kernel correlation filter response map set is then obtained from the fused features (the calculation method is the same as in step S3);
If the maximum response value in the multi-layer kernel correlation filter response map set is greater than or equal to a given second threshold, the centre position of the target is updated with the coordinates of the maximum value; if the current frame image is not the last frame, the template in step S3 is updated to the weighting of the template of the current frame image and the centre position of the current-frame target, and after updating the process goes to step S3 for the next frame; otherwise tracking ends;
If the maximum response value in the multi-layer kernel correlation filter response map set is below the given second threshold, tracking ends if the current frame image is the last frame; otherwise the next frame image is read as the current frame and the process goes to step S8.
The method comprises the following specific steps:
s8.1: when the maximum response value max(response) in the multi-layer kernel correlation filter response map set is less than T_1, the target is occluded and the template is not updated, i.e. m_{t+1} = m_t;
S8.2: according to the target state x in the previous frame image t-1 Predicting the target state of the current frame image, and taking out the coordinates of the center position of the target from the predicted target state, namely the predicted coordinates, wherein the target state comprises the center position and the speed of the target, and as the template is not updated, namely the template between two adjacent frames does not change greatly, the target is considered to do uniform motion, and the specific formula for predicting the target state of the current frame image is calculated as follows:
x t =A·x t-1 +B·u t-1 +w t-1
where A is the state transition matrix, B is the matrix relating to the external control parameters, x t-1 Is the target state in the t-1 frame image, u t-1 Is the acceleration of the target in the t-1 frame image, and makes uniform motion, namely 0 t-1 For describing process noise, subject to a Gaussian distribution w t-1 ~N(0,Q t-1 ),p x And p y As coordinates of the center position of the object in the t-th frame image, v x ,v y Setting a state transfer matrix to be ^ based on a uniform motion model for the speeds of the central position of the target in the t-th frame image on the x axis and the y axis>Therefore, the formula for predicting the target state of the current frame image is as follows:
s8.3: with the predicted centre position of the target as the centre, an area of template size is weighted with the target area actually matched in the current frame image to serve as the matching result of the current frame image; the specific steps are as follows:
s8.3.1: if the centre position of the target in the previous frame image was obtained by prediction, the centre position of the target area actually matched in the current frame image is the centre position of the target in the previous frame image, and the size of the target area is consistent with the template; if the centre position of the target in the previous frame image was not obtained by prediction, the centre position of the target area actually matched in the current frame image is the position of the maximum response value in the current frame image, taken as the centre of the actually matched target of the current frame, and the size of the target area is consistent with the template;
s8.3.2: the covariance matrix of the prior estimate of the current frame image, i.e. the t-th frame image, is calculated as:
P_t⁻ = A·P_{t−1}·A^T + Q
where P_{t−1} is the posterior error covariance of the (t−1)-th frame image, whose initial value is given, A^T is the transpose of A, and Q is the given per-frame process noise covariance;
s8.3.3: the filter gain matrix K_t of the current frame image is calculated as:
K_t = P_t⁻·H^T·(H·P_t⁻·H^T + R_t)⁻¹
where H is the observation matrix mapping the state to the measured centre position, H^T is its transpose, R_t is the observation noise covariance, taken as a constant value R, and (X)⁻¹ denotes the inverse of X;
s8.3.4: the best estimated position x̂_t, i.e. the matching result, is calculated from the filter gain matrix K_t of the current frame image and the predicted target state x_t:
x̂_t = x_t + K_t·(z_t − H·x_t)
where z_t denotes the centre position of the target area actually matched in the current frame image, i.e. the measured value, and z_t − H·x_t is the error v_t between the measured value z_t and the predicted coordinates, with v_t obeying the Gaussian distribution v_t ~ N(0, R_t);
s8.3.5: if the current frame is not the last frame, the posterior error covariance of the current frame image is updated from the filter gain matrix K_t, the observation matrix H and the prior estimate covariance P_t⁻:
P_t = (I − K_t·H)·P_t⁻
where I is the identity matrix;
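Steps s8.2–s8.3.5 amount to one predict/correct cycle of a constant-velocity Kalman filter over the state [p_x, p_y, v_x, v_y]^T. The sketch below is a generic textbook implementation under the assumptions stated in the comments (the Q and R values are placeholders, and H is the observation matrix selecting the measured centre position):

```python
import numpy as np

# Constant-velocity Kalman filter over the state [p_x, p_y, v_x, v_y]^T.
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])      # state transition matrix (uniform motion)
H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])      # observation: only the centre position is measured
Q = np.eye(4) * 1e-2                  # process noise covariance (assumed value)
R = np.eye(2) * 1.0                   # observation noise covariance (assumed value)

def kalman_step(x, P, z):
    # predict (s8.2): x_t = A x_{t-1};  P_t^- = A P A^T + Q (s8.3.2)
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # filter gain (s8.3.3): K = P^- H^T (H P^- H^T + R)^-1
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # best estimate (s8.3.4) and posterior covariance update (s8.3.5)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new
```

Fed a target moving at constant velocity, the filter's position and velocity estimates converge to the true trajectory, which is what allows coasting through occlusions.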
s8.4: the centre position of the target of the current frame image is updated from the best estimated position x̂_t:
(pos_x, pos_y) = (p_x, p_y)
where pos_x and pos_y are the updated centre position of the target, and p_x and p_y are the position coordinates of the best estimated position x̂_t;
s8.5: with the centre position of the matching result, i.e. the updated centre position of the target, as the centre, a rectangular box of 3 times the size of the matching result in the current frame image is taken as the search box; the set of regions to be matched is obtained by traversal within the search box, the histogram of oriented gradients and Haar features of each region to be matched in the set are extracted, and subsequent processing yields the correlation filter response map set (the calculation method is the same as in step S3); the specific steps are as follows:
s8.5.1: with the centre position (pos_x, pos_y) of the matching result as the centre, the length and width of the search box are determined to be 3 times the length and width of the template; the whole search box is traversed with the template size w_m, l_m to obtain the set of regions to be matched;
s8.5.2: the fused features corresponding to the regions to be matched are then obtained from the set of regions to be matched, and the multi-layer kernel correlation filter response map set response_c is calculated from the fused features, the corresponding target model and the target regression coefficient;
s8.6: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is greater than or equal to T_2, the centre position of the target of the current frame is updated as follows:
pos_x = p_x − 1.5·w_m + vert_x, pos_y = p_y − 1.5·l_m + vert_y
where pos_x, pos_y are the updated centre position of the target in the current frame image, p_x, p_y are the centre position of the target obtained in step S8.4, w_m, l_m are the size of the template, and vert_x, vert_y are respectively the numbers of pixels by which the maximum response value is offset from the upper-left corner (p_x − 1.5·w_m, p_y − 1.5·l_m) of the search box;
s8.7: if the current frame is the last frame, the loop ends; otherwise the template in step S3 is updated to the weighting of the template m_t of the current frame image and the centre position of the target in the current frame image, calculated as:
m_{t+1} = (1 − interp_factor)·m_t + interp_factor·p_c_max
where m_{t+1} is the updated template, m_t is the template of the current frame image, p_c_max is the region to be matched corresponding to the maximum response value max(response_c), and interp_factor is a weight adjustment factor;
s8.8: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is less than T_2, tracking ends if the current frame image is the last frame; otherwise the next frame image is read as the current frame and the process goes to step S8.
The above are merely representative examples of the many specific applications of the present invention and do not limit its protection scope in any way. All technical solutions formed by transformation or equivalent substitution fall within the protection scope of the present invention.
Claims (8)
1. An anti-shielding infrared target tracking method, characterized by comprising the following steps:
s1: reading an infrared image sequence, framing a target in an initial frame image to obtain the central position and the size of the target, taking the target in the initial frame image as a template, obtaining a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image;
s2: obtaining a two-dimensional cosine window according to the size of the template and the size of the cell unit;
s3: extracting the histogram of oriented gradients and Haar features of the template and linearly fusing them, applying the two-dimensional cosine window to the linearly fused features to obtain the fused features, and calculating the target regression coefficient from the fused features; if the target regression coefficient is calculated from the second frame image, the target model and the target regression coefficient are initialized with it; if it is calculated from the last frame image, no processing is performed; otherwise the target model and the target regression coefficient are updated;
s4: determining a search frame of the current frame image by taking the template center of the current frame image as the center position of the search frame;
s5: traversing within the search box in the current frame image based on the template size to obtain the set of regions to be matched, obtaining the fused features corresponding to the regions to be matched from the set, and calculating the multi-layer kernel correlation filter response map of each region to be matched from the fused features, the corresponding target model and the target regression coefficient, to obtain the multi-layer kernel correlation filter response map set;
s6: judging whether the maximum response value in the multi-layer kernel correlation filtering response image set is larger than or equal to a given first threshold value, if so, turning to the step S7, otherwise, turning to the step S8;
s7: calculating the center position of the target of the current frame image by using the horizontal and vertical coordinates of the maximum response value, if the current frame image is not the last frame, updating the template in the step S3 to be the weighting of the template of the current frame image and the center position of the target of the current frame image, turning to the step S3 to process the next frame after updating, and otherwise, ending the tracking;
s8: the template is not updated; the target state in the current frame image is predicted by Kalman filtering to obtain predicted coordinates; with the predicted coordinates as the centre, a target area of template size is weighted with the target area actually matched in the current frame image to serve as the matching result of the current frame image, where a target area is an area of template size centred on the centre position of the previous-frame target or on the coordinates of the maximum response value; with the centre position of the matching result as the centre, a search box of 3 times the size of the matching result is traversed to obtain the set of regions to be matched; the histogram of oriented gradients and Haar features of each region to be matched in the set are extracted to obtain the corresponding fused features, and a multi-layer kernel correlation filter response map set is then obtained from the fused features;
if the maximum response value in the multi-layer kernel correlation filtering response image set is larger than or equal to a given second threshold value, updating the center position of the target by the horizontal and vertical coordinates of the maximum value, if the current frame image is not the last frame, updating the template in the step S3 to be the weighting of the template of the current frame image and the center position of the current frame target, turning to the step S3 to process the next frame after updating, and otherwise, ending the tracking;
if the maximum response value in the multi-layer kernel correlation filtering response image set is lower than a given second threshold value, ending tracking if the current frame image is the last frame, otherwise, reading the next frame image as the current frame, and turning to the step S8;
the specific steps of the step S2 are as follows:
s2.1: determining a search box according to the size target_sz of the template, the size of the search box being window_sz = target_sz·(1 + padding), where padding determines the ratio of the search box size to the target size;
s2.2: determining a characteristic regression label yf according to the given cell size cell _ size, the size target _ sz of the template and the size window _ sz of the search frame, and then obtaining a two-dimensional cosine window cos _ window based on the characteristic regression label yf;
the method comprises the following specific steps:
s2.2.1: calculating the bandwidth σ of the Gaussian regression label according to the size target_sz of the template and the cell size cell_size, with:
σ = √(w·h)·a / cell_size
where w and h are the width and height of the template, and a is the spatial bandwidth, proportional to the target size;
s2.2.2: calculating a regression label yf according to the bandwidth sigma of the Gaussian regression label and the size window_sz of the search box, wherein the calculation formula is as follows:

y′(r, s) = exp(−((r − ⌈m/2⌉)² + (s − ⌈n/2⌉)²) / (2σ²)),  1 ≤ r ≤ m, 1 ≤ s ≤ n

wherein m and n are respectively the numbers of rows and columns of the label, i.e. the size window_sz of the search box divided by the cell size cell_size; after y′ is obtained through calculation, a cyclic shift moves the regression label peak to the upper left corner to obtain y, and a Fourier transform then yields the regression label yf;
s2.2.3: calculating a two-dimensional cosine window cos_window by means of the Hann function according to the size of the regression label yf;
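By way of illustration only, steps S2.2.1-S2.2.3 can be sketched in Python/NumPy as follows (the bandwidth factor a = 0.1, the default cell size and the function name are assumptions of the sketch, not taken from the claims):

```python
import numpy as np

def gaussian_label_and_window(target_sz, window_sz, cell_size=4, a=0.1):
    """Gaussian regression label yf (S2.2.1-S2.2.2) and the 2-D cosine
    window (S2.2.3); target_sz / window_sz are (height, width) in pixels."""
    h, w = target_sz
    sigma = a * np.sqrt(h * w) / cell_size        # label bandwidth (S2.2.1)
    m, n = window_sz[0] // cell_size, window_sz[1] // cell_size
    r, s = np.meshgrid(np.arange(1, m + 1) - m // 2,
                       np.arange(1, n + 1) - n // 2, indexing='ij')
    y = np.exp(-(r**2 + s**2) / (2 * sigma**2))   # Gaussian peak at the centre
    # cyclic shift moves the peak to the upper-left corner before the FFT
    y = np.roll(y, (-(m // 2) + 1, -(n // 2) + 1), axis=(0, 1))
    yf = np.fft.fft2(y)
    # outer product of two 1-D Hann windows gives the 2-D cosine window
    cos_window = np.outer(np.hanning(m), np.hanning(n))
    return yf, cos_window
```

Multiplying extracted features by cos_window suppresses the boundary discontinuities introduced by the cyclic-shift model.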
the specific steps of step S3 are as follows:
s3.1: extracting the features of the template m_t based on HOG and Haar, and point-multiplying the linearly fused features by the two-dimensional cosine window cos_window to obtain the fusion features, wherein Haar denotes the Haar-like features and HOG is the histogram of oriented gradients;
s3.2: obtaining a target regression coefficient according to the fusion characteristics;
s3.3: if the current frame image is the second frame image, go to step S3.4; if the current frame image is the last frame image, no processing is carried out; otherwise, go to step S3.6;
s3.4: when the target tracks the second frame image, the target model mode_xf is initialized with the fusion feature xf of the frequency domain of the current frame image, namely

mode_xf^t = xf^t

wherein t denotes the second frame image, xf^t is the fusion feature xf in the frequency domain, and mode_xf^t is the target model mode_xf;
s3.5: when the target tracks the second frame image, the target regression coefficient mode_alpha is initialized with the target regression coefficient alpha of the current frame image, namely

mode_α^t = α^t

wherein t denotes the second frame image, α^t is the target regression coefficient alpha, and mode_α^t is the target regression coefficient mode_alpha;
s3.6: when the target tracks the third frame or a later image, the target model mode_xf is updated by linear interpolation, that is

mode_xf^t = (1 − η)·mode_xf^{t−1} + η·xf^t

wherein η is a given learning rate, xf^t is the fusion feature of the current frame image, mode_xf^{t−1} is the target model of the previous frame image, and mode_xf^t is the updated target model mode_xf;
s3.7: target regression coefficient modela updated by linear interpolation, i.e. when the target tracks the third frame or an image following the third frame
2. The anti-occlusion infrared target tracking method according to claim 1, characterized in that the specific steps of the step S1 are as follows: reading an infrared image sequence, framing a target in an initial frame image, recording the central position and the size of the target, and using the framed target in the initial frame image as a template, wherein the central position and the size of the template are the central position and the size of the target; and acquiring a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image.
3. The anti-occlusion infrared target tracking method according to claim 1, characterized in that the specific steps of step S3.1 are as follows:
s3.1.1: based on a given cell size cell_size, extracting from the template m_t, using the piotr_toolbox toolkit for MATLAB, the 31-dimensional FHOG feature g_0, i.e. the histogram of oriented gradients feature;
s3.1.2: the current frame image is the t-th frame image; the integral image SAT of the template m_t is calculated, the calculation formula of each pixel value in the integral image SAT being:

SAT(x, y) = SAT(x, y−1) + SAT(x−1, y) − SAT(x−1, y−1) + m_t(x, y)

wherein SAT(x, y−1) is the integral image value directly above the current pixel position (x, y), SAT(x−1, y) the value directly to its left, and SAT(x−1, y−1) the value to its upper left; the initial boundary of the integral image SAT is SAT(−1, y) = SAT(x, −1) = SAT(−1, −1) = 0, where SAT(−1, y) is the left boundary value, SAT(x, −1) the upper boundary value and SAT(−1, −1) the upper-left vertex value; the initial boundaries SAT(−1, y), SAT(x, −1) and SAT(−1, −1) are used to calculate SAT(0, y) and SAT(x, 0);
s3.1.3: dividing the integral image SAT according to the cell size cell_size; for any cell unit, the sum of the upper-half pixels is SAT_A, the sum of the lower-half pixels is SAT_B, the sum of the left-half pixels is SAT_C and the sum of the right-half pixels is SAT_D; the 1-dimensional vertical Haar feature g_1 of each cell unit is the difference of SAT_A and SAT_B, and the 1-dimensional horizontal Haar feature g_2 is the difference of SAT_C and SAT_D; the features g_1 and g_2 of all cell units constitute the Haar features;
s3.1.4: the 31-dimensional FHOG feature g_0, all 1-dimensional vertical Haar features g_1 and all 1-dimensional horizontal Haar features g_2 are linearly fused and then point-multiplied by the two-dimensional cosine window cos_window to obtain the 33-dimensional fusion feature g.
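Steps S3.1.2-S3.1.3 can be sketched as follows (a sketch under the assumption of an even cell_size that divides the image size; the function and variable names are illustrative):

```python
import numpy as np

def cell_haar_features(img, cell_size=4):
    """Per-cell 1-D vertical/horizontal Haar features via an integral image.

    g1 is the difference between the upper- and lower-half pixel sums of
    each cell, g2 the difference between its left- and right-half sums."""
    h, w = img.shape
    # summed-area table with an extra zero row/column as the initial boundary
    sat = np.zeros((h + 1, w + 1))
    sat[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)

    def box_sum(r0, c0, r1, c1):      # inclusive pixel box [r0,r1] x [c0,c1]
        return (sat[r1 + 1, c1 + 1] - sat[r0, c1 + 1]
                - sat[r1 + 1, c0] + sat[r0, c0])

    rows, cols = h // cell_size, w // cell_size
    g1 = np.zeros((rows, cols))
    g2 = np.zeros((rows, cols))
    half = cell_size // 2
    for i in range(rows):
        for j in range(cols):
            r, c = i * cell_size, j * cell_size
            top = box_sum(r, c, r + half - 1, c + cell_size - 1)
            bottom = box_sum(r + half, c, r + cell_size - 1, c + cell_size - 1)
            left = box_sum(r, c, r + cell_size - 1, c + half - 1)
            right = box_sum(r, c + half, r + cell_size - 1, c + cell_size - 1)
            g1[i, j] = top - bottom
            g2[i, j] = left - right
    return g1, g2
```

On a uniform image both features vanish, which is the expected behaviour of difference-type Haar responses.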
4. The anti-occlusion infrared target tracking method according to claim 1, characterized in that the specific steps of step S3.2 are as follows:
s3.2.1: performing a fast Fourier transform on the fusion feature g to obtain the fusion feature xf of the template m_t in the frequency domain, with the formula:

xf = F(g)

wherein g is the 33-dimensional fusion feature extracted from the template m_t and F denotes the Fourier transform;
s3.2.2: calculating the Gaussian autocorrelation kernel matrix kf in the frequency domain from the fusion feature xf of the frequency domain, according to the Gaussian kernel correlation function:

K^{xx′} = exp(−(‖x‖² + ‖x′‖² − 2·F^{−1}((x̂)* ⊙ x̂′)) / (σ²·N))

wherein K^{xx′} is the Gaussian kernel correlation matrix of x and x′; x and x′ denote the features used to calculate the kernel correlation matrix and are replaced by the corresponding features in the actual calculation; ‖x‖² is the sum of the squared moduli of the elements of the feature x divided by N, where N is the product of the two dimensions of the matrix x; x̂ is the form of x in the Fourier domain, (x̂)* is its complex conjugate, ⊙ denotes the element-wise product, F^{−1} is the inverse Fourier transform, and σ is the bandwidth of the Gaussian kernel;
replacing both x and x′ in the Gaussian kernel correlation function with the frequency-domain fusion feature xf yields the Gaussian autocorrelation kernel matrix kf in the frequency domain;
s3.2.3: calculating the target regression coefficient alpha from the Gaussian autocorrelation kernel matrix kf, wherein the calculation formula, in the frequency domain, is:

α = yf / (kf + λ)

wherein yf is the regression label, λ is a given regularization parameter, and the division is element-wise.
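A sketch of S3.2.2-S3.2.3, the Gaussian kernel correlation in the frequency domain and the resulting regression coefficients (the kernel bandwidth 0.5 and the regularization λ = 1e-4 are assumed values, not taken from the claims):

```python
import numpy as np

def gaussian_correlation(xf, zf, sigma=0.5):
    """Gaussian kernel correlation of two multi-channel features given in
    the frequency domain (arrays of shape (h, w, c)), as in step S3.2.2."""
    h, w, c = xf.shape
    N = h * w
    xx = np.real(np.sum(xf * np.conj(xf))) / N   # ||x||^2 via Parseval
    zz = np.real(np.sum(zf * np.conj(zf))) / N   # ||z||^2
    # cross term: inverse FFT of the per-channel product, summed over channels
    xz = np.sum(np.real(np.fft.ifft2(xf * np.conj(zf), axes=(0, 1))), axis=2)
    k = np.exp(-np.maximum(xx + zz - 2 * xz, 0) / (sigma**2 * h * w * c))
    return np.fft.fft2(k)

def train_alpha(xf, yf, sigma=0.5, lam=1e-4):
    """Target regression coefficients of S3.2.3 in the frequency domain:
    alpha = yf / (kf + lambda), kf being the Gaussian autocorrelation."""
    kf = gaussian_correlation(xf, xf, sigma)
    return yf / (kf + lam)
```

Evaluating the trained coefficients on the training sample itself should approximately reproduce the regression label, peaked at the origin.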
5. The anti-occlusion infrared target tracking method according to any one of claims 1 to 4, characterized in that the specific steps of the step S4 are as follows:
according to the template m_t of the current frame image, a search box s_t of the current frame image is determined; the size of the search box s_t is 1.5 times the template size target_sz and its center position is the center position of the template m_t.
6. The anti-occlusion infrared target tracking method according to claim 5, wherein the specific steps of the step S5 are as follows:
S5.1: traversing the search box s_t in the current frame image with the template size target_sz to obtain a plurality of regions to be matched p_i, forming the set of regions to be matched A_t;
S5.2: based on a given cell size cell_size, sequentially extracting the FHOG and Haar features of each region to be matched p_i in the set A_t, linearly fusing them, point-multiplying by the two-dimensional cosine window cos_window and applying a fast Fourier transform to obtain the fusion feature zf_i of each region p_i in the frequency domain; the set of frequency-domain fusion features of all regions to be matched in A_t is zf;
s5.3: for each fusion feature zf_i, calculating according to the Gaussian kernel correlation function the Gaussian cross-correlation kernel matrix kzf_i in the frequency domain of each region to be matched p_i: in the formula of the Gaussian kernel correlation function, x is replaced by the target model mode_xf and x′ by zf_i; the set of Gaussian cross-correlation kernel matrices of all regions to be matched in A_t is kzf;
obtaining the fusion-feature kernel correlation filter response map set response according to the ridge regression response score function and the Gaussian cross-correlation kernel matrix set kzf, specifically:
obtaining, according to the ridge regression response score function, the regression response of each Gaussian cross-correlation kernel matrix kzf_i, wherein the ridge regression response score function in the frequency domain is:

f̂ = kzf_i ⊙ mode_α

wherein kzf_i is one Gaussian cross-correlation kernel matrix in the set kzf, mode_α is the target regression coefficient, and ⊙ denotes the element-wise product; each element of f̂ is a single regression response value;
the single regression response values of each Gaussian cross-correlation kernel matrix kzf_i are arranged into a matrix in row-column order, an inverse Fourier transform returns the matrix to the time domain and the real part is retained, giving a multi-layer kernel correlation filter response map; the whole Gaussian cross-correlation kernel matrix set kzf thus yields the multi-layer kernel correlation filter response map set response.
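The response computation of steps S5.3-S6 can be sketched as follows (kzf_list stands for the frequency-domain cross-correlation kernels of the candidate regions; the function names are illustrative):

```python
import numpy as np

def response_maps(kzf_list, mode_alpha_f):
    """Per-region response maps: element-wise product of each
    cross-correlation kernel kzf_i with the regression coefficients,
    inverse-transformed to the time domain, real part kept."""
    return [np.real(np.fft.ifft2(kzf * mode_alpha_f)) for kzf in kzf_list]

def best_match(responses):
    """Index of the region with the largest response, the (row, col)
    coordinates of that peak, and the peak value itself."""
    i = max(range(len(responses)), key=lambda j: responses[j].max())
    vert = np.unravel_index(np.argmax(responses[i]), responses[i].shape)
    return i, vert, responses[i].max()
```

The returned peak value is what the method compares against the thresholds T_1 and T_2 to decide whether the target is occluded.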
7. The anti-occlusion infrared target tracking method according to claim 6, wherein the specific steps of the step S7 are as follows:
s7.1: if the maximum response value max(response) in the multi-layer kernel correlation filter response map set is greater than or equal to T_1, the coordinates of max(response) in the current frame image are taken as the center position pos of the target of the current frame image;
s7.2: if the current frame image is the last frame, the loop ends; otherwise, the template in step S3 is updated as the weighting of the template of the current frame image and the area at the target center position of the current frame image, and the process returns to step S3 for the next frame; the update formula is as follows:
m_{t+1} = (1 − interp_factor)·m_t + interp_factor·p_max

wherein m_{t+1} is the updated template, m_t is the current frame image template, p_max is the region to be matched corresponding to the maximum response value max(response), and interp_factor is a weight adjustment factor.
8. The anti-occlusion infrared target tracking method according to claim 7, wherein the specific steps of the step S8 are as follows:
s8.1: if the maximum response value max(response) in the multi-layer kernel correlation filter response map set is below T_1, the target is occluded and the template is not updated: m_{t+1} = m_t;
S8.2: according to the target state x in the previous frame image t-1 Predicting the target state of the current frame image, and taking out the coordinates of the center position of the target from the predicted target state, namely predicted coordinates, wherein the target state comprises the center position and the speed of the target, and the target is considered to move at a constant speed because the template is not updated, namely the template between two adjacent frames does not change greatly, and the specific formula for predicting the target state of the current frame image is calculated as follows:
x t =A·x t-1 +B·u t-1 +w t-1
where A is the state transition matrix, B is the matrix relating to the external control parameters, x t-1 Is the target state in the t-1 frame image, u t-1 Is the acceleration of the target in the t-1 frame image, and makes uniform motion, namely 0 t-1 For describing process noise, subject to a Gaussian distribution w t-1 ~N(0,Q t-1 ),p x And p y As coordinates of the center position of the object in the t-th frame image, v x ,v y Setting a state transition matrix as the speed of the central position of the target in the t-th frame image on the x-axis and the y-axis according to a uniform motion modelTherefore, the formula for predicting the target state of the current frame image is as follows:
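A sketch of the constant-velocity prediction of S8.2 together with the prior covariance of S8.3.2 (the state ordering [p_x, p_y, v_x, v_y] and a unit frame interval are assumptions of the sketch):

```python
import numpy as np

def kalman_predict(x_prev, P_prev, Q, dt=1.0):
    """Predict state and prior covariance under the uniform-motion model:
    x_t = A x_{t-1} (B u = 0) and P_t^- = A P_{t-1} A^T + Q."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    x_pred = A @ x_prev
    P_pred = A @ P_prev @ A.T + Q
    return x_pred, P_pred
```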
s8.3: taking the predicted target center position as the center, the target area of template size is weighted with the target area actually matched in the current frame image to give the matching result of the current frame image, with the following specific steps:
s8.3.1: if the target center position in the previous frame image was obtained by prediction, the center position of the target area actually matched in the current frame image is the target center position of the previous frame image, the size of the target area being consistent with the template; if the target center position in the previous frame image was not obtained by prediction, the center position of the actually matched target area is the position of the maximum response value in the current frame image, the size of the target area again being consistent with the template;
s8.3.2: calculating the covariance matrix of the prior estimate of the current frame image, i.e. the t-th frame image, with the formula:

P_t⁻ = A·P_{t−1}·A^T + Q

wherein P_{t−1} is the posterior error covariance of the (t−1)-th frame image, whose initial value is a given value, A^T is the transpose of A, and Q is the given process noise covariance of a frame image;
s8.3.3: calculating the filter gain matrix K_t of the current frame image, with the formula:

K_t = P_t⁻·H^T·(H·P_t⁻·H^T + R_t)^{−1}

wherein H is the observation matrix extracting the measured center position from the state, H^T is its transpose, R_t is the observation noise covariance, taken as a constant value R, and (X)^{−1} denotes the inverse of X;
s8.3.4: according to the filter gain matrix K_t of the current frame image and the predicted target state x_t, calculating the best estimated position x̂_t, i.e. the matching result, with the formula:

x̂_t = x_t + K_t·(z_t − H·x_t)

wherein z_t is the center position of the target area actually matched in the current frame image, i.e. the measured value; the error between the measured value z_t and the predicted coordinates is denoted v_t, which satisfies the Gaussian distribution v_t ~ N(0, R_t);
S8.3.5: if the current frame is not the last frame, the posterior error covariance of the current frame image is updated based on the filter gain matrix K_t, the observation matrix H and the prior estimate covariance P_t⁻, with the formula:

P_t = (I − K_t·H)·P_t⁻
s8.4: updating the center position of the target of the current frame image based on the best estimated position x̂_t:

(pos_x, pos_y) = (p_x, p_y)

wherein pos_x and pos_y are the updated center position of the target, and p_x and p_y are the position coordinates of the best estimate x̂_t;
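The measurement update of S8.3.3-S8.3.5 and the position read-out of S8.4 can be sketched as follows (the observation matrix H, which extracts (p_x, p_y) from the state, is an assumption of the sketch):

```python
import numpy as np

def kalman_update(x_pred, P_pred, z, R):
    """Gain K_t (S8.3.3), best estimate (S8.3.4) and posterior error
    covariance (S8.3.5) for a position-only measurement z."""
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # filter gain matrix
    x_est = x_pred + K @ (z - H @ x_pred)    # best estimated state
    P_post = (np.eye(4) - K @ H) @ P_pred    # posterior error covariance
    return x_est, K, P_post

def read_out_position(x_est):
    """S8.4: the updated target centre (pos_x, pos_y) is the position
    part of the best estimate."""
    return x_est[0], x_est[1]
```

With a very small measurement noise R the estimate snaps to the measurement, while a large R keeps it near the prediction; this is the weighting between prediction and actual match described in S8.3.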
s8.5: taking the center position of the matching result, i.e. the updated target center position, as the center, a rectangular frame of 3 times the size of the matching result in the current frame image is used as the search box; the set of regions to be matched is obtained by traversing the search box, the histogram of oriented gradients and Haar features of each region to be matched are extracted, and the subsequent processing yields the correlation filter response map set, with the specific steps:
s8.5.1: with the center position (pos_x, pos_y) of the matching result as the center, the length and width of the search box are set to 3 times the length and width of the template; the whole search box is traversed with the template size w_m, l_m to obtain the set of regions to be matched;
s8.5.2: obtaining the fusion features corresponding to the regions to be matched from the set of regions to be matched, and calculating the multi-layer kernel correlation filter response map set response_c based on the fusion features, the corresponding target model and the target regression coefficient;
s8.6: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is greater than or equal to T_2, the center position of the target of the current frame is updated as follows:

pos_x = p_x − 1.5·w_m + vert_x,  pos_y = p_y − 1.5·l_m + vert_y

wherein pos_x, pos_y are the updated center position of the target in the current frame image, p_x, p_y are the center position of the target obtained in step S8.4, w_m, l_m are the size of the template, and vert_x, vert_y are respectively the numbers of pixels by which the maximum response value is offset from the top-left corner (p_x − 1.5·w_m, p_y − 1.5·l_m) of the search box;
s8.7: if the current frame is the last frame, the loop ends; otherwise, the template in step S3 is updated as the weighting of the template m_t of the current frame image and the area at the target center position in the current frame image, with the formula:

m_{t+1} = (1 − interp_factor)·m_t + interp_factor·p_c_max

wherein m_{t+1} is the updated template, m_t is the template of the current frame image, p_c_max is the region to be matched corresponding to the maximum response value max(response_c), and interp_factor is a weight adjustment factor;
s8.8: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is below T_2, tracking ends if the current frame image is the last frame; otherwise, the next frame image is read as the current frame and the process goes to step S8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910547576.2A CN110276785B (en) | 2019-06-24 | 2019-06-24 | Anti-shielding infrared target tracking method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110276785A CN110276785A (en) | 2019-09-24 |
CN110276785B true CN110276785B (en) | 2023-03-31 |
Family
ID=67961532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910547576.2A Active CN110276785B (en) | 2019-06-24 | 2019-06-24 | Anti-shielding infrared target tracking method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276785B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728697B (en) * | 2019-09-30 | 2023-06-13 | 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) | Infrared dim target detection tracking method based on convolutional neural network |
CN110796687B (en) * | 2019-10-30 | 2022-04-01 | 电子科技大学 | Sky background infrared imaging multi-target tracking method |
CN111563919B (en) * | 2020-04-03 | 2023-12-29 | 深圳市优必选科技股份有限公司 | Target tracking method, device, computer readable storage medium and robot |
CN111721420B (en) * | 2020-04-27 | 2021-06-29 | 浙江智物慧云技术有限公司 | Semi-supervised artificial intelligence human body detection embedded algorithm based on infrared array time sequence |
CN113076949B (en) * | 2021-03-31 | 2023-04-18 | 成都唐源电气股份有限公司 | Method and system for quickly positioning parts of contact net |
CN112991394B (en) * | 2021-04-16 | 2024-01-19 | 北京京航计算通讯研究所 | KCF target tracking method based on cubic spline interpolation and Markov chain |
CN114066934B (en) * | 2021-10-21 | 2024-03-22 | 华南理工大学 | Anti-occlusion cell tracking method for targeting micro-operation |
CN115631359B (en) * | 2022-11-17 | 2023-03-14 | 诡谷子人工智能科技(深圳)有限公司 | Image data processing method and device for machine vision recognition |
CN115631216B (en) * | 2022-12-21 | 2023-05-12 | 金城集团有限公司 | Multi-feature filter fusion-based holder target tracking system and method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103700112A (en) * | 2012-09-27 | 2014-04-02 | 中国航天科工集团第二研究院二O七所 | Sheltered target tracking method based on mixed predicting strategy |
CN105981075A (en) * | 2013-12-13 | 2016-09-28 | 英特尔公司 | Efficient facial landmark tracking using online shape regression method |
CN106887012A (en) * | 2017-04-11 | 2017-06-23 | 山东大学 | A kind of quick self-adapted multiscale target tracking based on circular matrix |
CN108550161A (en) * | 2018-03-20 | 2018-09-18 | 南京邮电大学 | A kind of dimension self-adaption core correlation filtering fast-moving target tracking method |
CN108665481A (en) * | 2018-03-27 | 2018-10-16 | 西安电子科技大学 | Multilayer depth characteristic fusion it is adaptive resist block infrared object tracking method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011013179A1 (en) * | 2009-07-31 | 2011-02-03 | 富士通株式会社 | Mobile object position detecting device and mobile object position detecting method |
Non-Patent Citations (6)
Title |
---|
A novel method for quantifying target tracking difficulty of the infrared image sequence; Zheng, Haichao; Infrared Physics & Technology; 2015-09-30; vol. 72; pp. 8-18 *
Target tracking based on biological-like vision identity via improved sparse representation and particle filtering; Li, Gun; Cognitive Computation; 2016-11-16; vol. 8, no. 5; pp. 910-923 *
Anti-occlusion target tracking algorithm based on KCF and SIFT features; Bao Xiao'an et al.; Computer Measurement & Control; 2018-05-25; no. 5; pp. 154-158 *
Research on target detection and tracking algorithms based on video surveillance; Jiang Dan; China Master's Theses Full-text Database, Information Science & Technology; 2018-12-15; no. 12; p. I136-602 *
Research on infrared human body target detection technology in complex backgrounds; Liu Zhaoxiong; China Master's Theses Full-text Database, Information Science & Technology; 2015-01-15; no. 1; p. I140-623 *
Detection and tracking of blurred scintillating imaging targets against a strong background; Xu Junping; High Power Laser and Particle Beams; 2008-04; no. 4; pp. 537-541 *
Also Published As
Publication number | Publication date |
---|---|
CN110276785A (en) | 2019-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276785B (en) | Anti-shielding infrared target tracking method | |
US11763485B1 (en) | Deep learning based robot target recognition and motion detection method, storage medium and apparatus | |
CN108615027B (en) | Method for counting video crowd based on long-term and short-term memory-weighted neural network | |
CN108665481B (en) | Self-adaptive anti-blocking infrared target tracking method based on multi-layer depth feature fusion | |
CN105335986B (en) | Method for tracking target based on characteristic matching and MeanShift algorithm | |
CN107424171B (en) | Block-based anti-occlusion target tracking method | |
CN109242884B (en) | Remote sensing video target tracking method based on JCFNet network | |
CN107424177B (en) | Positioning correction long-range tracking method based on continuous correlation filter | |
CN111311647B (en) | Global-local and Kalman filtering-based target tracking method and device | |
CN110956653B (en) | Satellite video dynamic target tracking method with fusion of correlation filter and motion estimation | |
CN104050488B (en) | A kind of gesture identification method of the Kalman filter model based on switching | |
CN109949340A (en) | Target scale adaptive tracking method based on OpenCV | |
CN106295564B (en) | A kind of action identification method of neighborhood Gaussian structures and video features fusion | |
CN104680559B (en) | The indoor pedestrian tracting method of various visual angles based on motor behavior pattern | |
CN107169994B (en) | Correlation filtering tracking method based on multi-feature fusion | |
CN111523447B (en) | Vehicle tracking method, device, electronic equipment and storage medium | |
CN110097575B (en) | Target tracking method based on local features and scale pool | |
CN102156995A (en) | Video movement foreground dividing method in moving camera | |
CN110827262B (en) | Weak and small target detection method based on continuous limited frame infrared image | |
CN112488057A (en) | Single-camera multi-target tracking method utilizing human head point positioning and joint point information | |
CN104050685A (en) | Moving target detection method based on particle filtering visual attention model | |
Li et al. | Object tracking in satellite videos: Correlation particle filter tracking method with motion estimation by Kalman filter | |
CN110569706A (en) | Deep integration target tracking algorithm based on time and space network | |
CN107609571A (en) | A kind of adaptive target tracking method based on LARK features | |
CN111027586A (en) | Target tracking method based on novel response map fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||