CN110276785B - Anti-shielding infrared target tracking method - Google Patents
- Publication number
- CN110276785B (application CN201910547576.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- frame image
- template
- current frame
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
Abstract
The invention discloses an anti-occlusion infrared target tracking method that addresses long-term target tracking in complex environments such as attitude change and occlusion, and belongs to the fields of target tracking and computer vision. An infrared image sequence is read and a target is selected in the initial frame image to obtain its center position and size; the target in the initial frame image serves as the template. The second frame image is obtained as the current frame image, the template of the initial frame image is taken as the template of the current frame image, and a two-dimensional cosine window is obtained from the template size and the cell size. Histogram-of-oriented-gradients and Haar features are extracted from the template and linearly fused to initialize or update a target model and a target regression coefficient, and a multi-layer kernel correlation filter response map set is obtained within a search box in each frame image to carry out subsequent target tracking. The invention is used for infrared image target tracking.
Description
Technical Field
An anti-occlusion infrared target tracking method is used for infrared image target tracking and belongs to the fields of target tracking and computer vision.
Background
In complex environments, severe deformation or occlusion of the target means that no existing algorithm can reliably track the object of interest automatically, and manual intervention is required.
The existing target tracking methods fall into several categories. Region-based methods, such as template matching, are simple, accurate and fast, but are unsuited to complex environments such as severe target deformation, where they easily lose the target. Model-based methods build a geometric model of the target and search for it; they handle occlusion poorly, and the lack of color information in infrared imagery further weakens their occlusion resistance. Bayesian-framework methods capture the initial state of the target, extract its features, and perform joint spatio-temporal state estimation; they can estimate the target position under occlusion, but their algorithmic complexity is high. Deep-learning methods are robust but prone to data loss, and their network training speed makes real-time operation difficult. Correlation-filtering methods are fast; among them, kernelized correlation filtering (KCF) is both fast and accurate, tracking nearly 10 times faster than algorithms such as Struck and TLD, and far more accurate than the MOSSE algorithm, which reaches 43.1% precision on OTB50, whereas KCF with HOG features reaches 73.2%.
Under complex conditions, infrared target tracking accuracy is difficult to guarantee because of target changes and the external environment, so finding an anti-occlusion long-term tracking algorithm is a problem that urgently needs to be solved. Existing improved algorithms based on kernel correlation filtering solve, to a certain extent, the tracking failures caused by the target being occluded during tracking. When the target scale does not change greatly, such algorithms can still, to a great extent, accurately match the search area to the target under occlusion and so maintain tracking; but when the target scale changes greatly, occlusion readily causes tracking failure.
Disclosure of Invention
In view of the above problems, the invention aims to provide an anti-occlusion infrared target tracking method, which solves the problem that target tracking methods in the prior art are affected by occlusion and prone to tracking failure.
In order to achieve the purpose, the invention adopts the following technical scheme:
an anti-occlusion infrared target tracking method comprises the following steps:
s1: reading an infrared image sequence, framing a target in an initial frame image to obtain the central position and the size of the target, taking the target in the initial frame image as a template, obtaining a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image;
s2: obtaining a two-dimensional cosine window according to the size of the template and the size of the cell unit;
s3: extracting histogram-of-oriented-gradients and Haar features from the template and linearly fusing them, applying the two-dimensional cosine window to the linearly fused features to obtain the fusion features, and calculating the target regression coefficient on the basis of the fusion features; if the target regression coefficient was calculated from the second frame image, initializing the target model and the target regression coefficient with it; if it was calculated from the last frame image, performing no processing; otherwise updating the target model and the target regression coefficient;
s4: determining a search frame of the current frame image by taking the template center of the current frame image as the center position of the search frame;
s5: traversing in a search frame in a current frame image based on the size of a template to obtain a set of regions to be matched, obtaining fusion characteristics corresponding to a plurality of regions to be matched based on the set of regions to be matched, and calculating a multi-kernel correlation filter response map corresponding to each region to be matched based on the fusion characteristics, a corresponding target model and a target regression coefficient to obtain a multi-kernel correlation filter response map set;
s6: judging whether the maximum response value in the multi-layer kernel correlation filtering response image set is larger than or equal to a given first threshold value, if so, turning to the step S7, otherwise, turning to the step S8;
s7: calculating the center position of the target of the current frame image from the horizontal and vertical coordinates of the maximum response value; if the current frame image is not the last frame, updating the template in step S3 to the weighting of the template of the current frame image and the region at the target center of the current frame image, and turning to step S3 to process the next frame after updating; otherwise ending tracking;
s8: not updating the template; predicting the target state in the current frame image by Kalman filtering to obtain a predicted coordinate; weighting the target area of template size centered on the predicted coordinate against the target area actually matched in the current frame image, and taking the result as the matching result of the current frame image, wherein the actually matched target area has the template size and is centered either on the center position of the target in the previous frame or on the horizontal and vertical coordinates of the maximum response value; taking the center position of the matching result as the center, traversing a search box of 3 times the size of the matching result to obtain a set of areas to be matched; extracting the histogram-of-oriented-gradients and Haar features of each area to be matched to obtain the corresponding fusion features; and then obtaining a multi-layer kernel correlation filter response map set based on each fusion feature;
if the maximum response value in the multi-layer kernel correlation filter response map set is greater than or equal to a given second threshold value, updating the center position of the target with the horizontal and vertical coordinates of the maximum value; if the current frame image is not the last frame, updating the template in step S3 to the weighting of the template of the current frame image and the region at the target center of the current frame, and turning to step S3 to process the next frame after updating; otherwise ending tracking;
if the maximum response value in the multi-layer kernel correlation filtering response image set is lower than a given second threshold value, ending tracking if the current frame image is the last frame, otherwise, reading the next frame image as the current frame, and turning to the step S8.
Further, the specific steps of step S1 are as follows: reading an infrared image sequence, framing a target in an initial frame image, recording the central position and the size of the target, and using the framed target in the initial frame image as a template, wherein the central position and the size of the template are the central position and the size of the target; and acquiring a second frame image as a current frame image, and taking a template of the initial frame image as a template of the current frame image.
Further, the specific steps of step S2 are as follows:
s2.1: determining a search box according to the template size target_sz, the search box size being window_sz = target_sz · (1 + padding), where padding determines the ratio of the search box size to the target size;
s2.2: determining a feature regression label yf according to the given cell size cell_size, the template size target_sz and the search box size window_sz, and then obtaining a two-dimensional cosine window cos_window based on the feature regression label yf;
the method comprises the following specific steps:
s2.2.1: calculating the bandwidth σ of the Gaussian regression label according to the template size target_sz and the cell size cell_size, with the formula:

σ = a · sqrt(w · h) / cell_size

where w and h are the width and height of the template and a is the spatial bandwidth factor, proportional to the target size;
s2.2.2: calculating the regression label yf according to the bandwidth σ of the Gaussian regression label and the search box size window_sz, with the calculation formula:

y′(r, s) = exp(−((r − m/2)² + (s − n/2)²) / (2σ²)),  1 ≤ r ≤ m, 1 ≤ s ≤ n

where m and n are the height and width of the search box measured in cell units; after y′ is obtained, a cyclic shift moves the peak of the regression label to the upper-left corner to give y, and a Fourier transform then yields the regression label yf;
s2.2.3: calculating by utilizing a hann function according to the size of the regression label yf to obtain a two-dimensional cosine window cos _ window;
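The computation in steps s2.2.1 to s2.2.3 can be sketched in NumPy as follows (a minimal sketch; the function name and the default spatial-bandwidth factor a are assumptions, and sizes are taken in (height, width) order):

```python
import numpy as np

def regression_label_and_window(window_sz, target_sz, cell_size, a=0.1):
    """Build the Gaussian regression label yf and the 2-D cosine window.
    window_sz and target_sz are (height, width) in pixels; `a` is the
    spatial-bandwidth factor (an assumed default, not fixed by the patent)."""
    h, w = target_sz
    sigma = a * np.sqrt(w * h) / cell_size          # bandwidth of the Gaussian label
    m, n = window_sz[0] // cell_size, window_sz[1] // cell_size
    r = np.arange(1, m + 1)[:, None]
    s = np.arange(1, n + 1)[None, :]
    y_prime = np.exp(-((r - m / 2) ** 2 + (s - n / 2) ** 2) / (2 * sigma ** 2))
    # cyclic shift moves the label peak to the upper-left corner, then FFT
    y = np.roll(y_prime, (-int(m / 2) + 1, -int(n / 2) + 1), axis=(0, 1))
    yf = np.fft.fft2(y)
    cos_window = np.outer(np.hanning(m), np.hanning(n))
    return yf, cos_window
```

The cyclic shift places the label peak at the upper-left corner so that a zero displacement of the target corresponds to the first element of the response map.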
further, the specific steps of step S3 are as follows:
s3.1: extracting HOG and Haar features from the template m_t, linearly fusing them, and applying the two-dimensional cosine window cos_window to obtain the fusion features, where Haar denotes Haar-like features and HOG denotes the histogram of oriented gradients;
s3.2: obtaining a target regression coefficient according to the fusion characteristics;
s3.3: if the current frame image is the second frame image, the step S3.4 is carried out, if the current frame image is the last frame image, no processing is carried out, otherwise, the step S3.6 is carried out;
s3.4: when the target tracking reaches the second frame image, initializing the target model mode_xf with the frequency-domain fusion feature xf of the current frame image, namely

mode_xf^t = xf^t

where t denotes the second frame image, xf^t is the frequency-domain fusion feature and mode_xf^t is the target model mode_xf;
s3.5: when the target tracking reaches the second frame image, initializing the target regression coefficient mode_α with the target regression coefficient α of the current frame image, namely

mode_α^t = α^t

where t denotes the second frame image, α^t is the target regression coefficient α and mode_α^t is the target regression coefficient mode_α;
s3.6: when the target tracking reaches the third frame image or a later frame, updating the target model mode_xf by linear interpolation, namely

mode_xf^t = (1 − η) · mode_xf^(t−1) + η · xf^t

where η is a given learning rate, xf^t is the fusion feature of the current frame image, mode_xf^(t−1) is the target model of the previous frame image, and mode_xf^t is the updated target model mode_xf;
s3.7: when the target tracking reaches the third frame image or a later frame, updating the target regression coefficient mode_α by linear interpolation, namely

mode_α^t = (1 − η) · mode_α^(t−1) + η · α^t

where η is a given learning rate, mode_α^(t−1) is the target regression coefficient of the previous frame image and α^t is the target regression coefficient of the current frame image.
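Steps s3.4 to s3.7 amount to a single linear-interpolation rule; a minimal sketch (the function name and the default learning rate eta = 0.02 are assumptions):

```python
import numpy as np

def update_model(mode_xf, mode_alpha, xf, alpha, eta=0.02, first_frame=False):
    """Initialize (second frame) or linearly interpolate (later frames) the
    target model and target regression coefficient; eta is the learning rate."""
    if first_frame:                        # second frame of the sequence: initialize
        return xf.copy(), alpha.copy()
    new_xf = (1 - eta) * mode_xf + eta * xf
    new_alpha = (1 - eta) * mode_alpha + eta * alpha
    return new_xf, new_alpha
```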
Further, the specific steps of step 3.1 are:
s3.1.1: based on the given cell size cell_size, extracting features from the template m_t using the piotr_toolbox toolkit for MATLAB to obtain the 31-dimensional FHOG feature g_0, i.e. the histogram-of-oriented-gradients feature;
s3.1.2: the current frame image being the t-th frame image, calculating the integral image SAT(x, y) of the template m_t of the t-th frame image, each pixel value of the integral image being:

SAT(x, y) = SAT(x, y−1) + SAT(x−1, y) − SAT(x−1, y−1) + m_t(x, y)

where SAT(x, y−1) is the integral-image pixel value above the current pixel position (x, y), SAT(x−1, y) the value to its left, and SAT(x−1, y−1) the value at its upper-left corner; the initial boundary of the integral image SAT is SAT(−1, y) = SAT(x, −1) = SAT(−1, −1) = 0, where SAT(−1, y) is the left boundary pixel value, SAT(x, −1) the upper boundary pixel value and SAT(−1, −1) the upper-left vertex pixel value; the initial boundaries SAT(−1, y), SAT(x, −1) and SAT(−1, −1) are used to calculate SAT(0, y) and SAT(x, 0);
s3.1.3: dividing the integral image SAT according to the cell size cell_size; for any cell, the sum of the pixels in its upper half is SAT_A, the sum in its lower half is SAT_B, the sum in its left half is SAT_C and the sum in its right half is SAT_D; the 1-dimensional vertical Haar feature g_1 of each cell is the difference of SAT_A and SAT_B, and the 1-dimensional horizontal Haar feature g_2 is the difference of SAT_C and SAT_D; the features g_1 and g_2 of all cells constitute the Haar features;
s3.1.4: linearly fusing the 31-dimensional FHOG feature g_0, all 1-dimensional vertical Haar features g_1 and all 1-dimensional horizontal Haar features g_2, then point-multiplying the result by the two-dimensional cosine window cos_window to obtain the 33-dimensional fusion feature g.
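A minimal NumPy sketch of steps s3.1.2 and s3.1.3 (integral image plus per-cell Haar features; the helper names are assumptions, and the Haar value is taken here as the difference of the two half-cell sums, consistent with standard Haar-like features):

```python
import numpy as np

def cell_haar_features(img, cell_size):
    """Per-cell 1-D vertical and horizontal Haar features via an integral image.
    Assumes the image dimensions are multiples of cell_size."""
    h, w = img.shape
    # integral image with a zero top/left border, so SAT lookups need no special cases
    sat = np.zeros((h + 1, w + 1))
    sat[1:, 1:] = img.cumsum(0).cumsum(1)

    def block_sum(r0, c0, r1, c1):         # sum over the inclusive block [r0:r1, c0:c1]
        return sat[r1 + 1, c1 + 1] - sat[r0, c1 + 1] - sat[r1 + 1, c0] + sat[r0, c0]

    rows, cols = h // cell_size, w // cell_size
    g1 = np.zeros((rows, cols))            # vertical Haar: upper half minus lower half
    g2 = np.zeros((rows, cols))            # horizontal Haar: left half minus right half
    half = cell_size // 2
    for i in range(rows):
        for j in range(cols):
            r, c = i * cell_size, j * cell_size
            top = block_sum(r, c, r + half - 1, c + cell_size - 1)
            bottom = block_sum(r + half, c, r + cell_size - 1, c + cell_size - 1)
            left = block_sum(r, c, r + cell_size - 1, c + half - 1)
            right = block_sum(r, c + half, r + cell_size - 1, c + cell_size - 1)
            g1[i, j] = top - bottom
            g2[i, j] = left - right
    return g1, g2
```

Each block sum costs four integral-image lookups, so the feature cost is independent of the cell size.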
Further, the specific steps of step 3.2 are:
s3.2.1: performing a fast Fourier transform on the fusion feature g to obtain the frequency-domain fusion feature xf of the template m_t, the formula being:

xf = F(g)

where g is the 33-dimensional fusion feature extracted from the template m_t and F denotes the Fourier transform; xf is the obtained frequency-domain fusion feature;
s3.2.2: calculating the Gaussian autocorrelation kernel matrix kf on the frequency domain from the frequency-domain fusion feature xf according to the Gaussian kernel correlation function, whose formula is:

K^{xx′} = exp(−(‖x‖² + ‖x′‖² − 2 · F⁻¹(x̂ ⊙ (x̂′)*)) / σ²)

where K^{xx′} is the Gaussian kernel correlation matrix of x and x′; x and x′ stand for the different features used to calculate the kernel correlation matrix and are replaced by the corresponding features in the actual calculation; ‖x‖² is the sum of the squared moduli of the elements of x divided by N, where N is the product of the two dimensions of the matrix x; x̂ is the matrix x in the Fourier domain, (x̂′)* is the complex conjugate of x̂′, ⊙ denotes the element-wise product, F⁻¹ denotes the inverse Fourier transform, and σ is the bandwidth of the Gaussian regression label;

substituting the frequency-domain fusion feature xf for both x and x′ in the Gaussian kernel correlation function gives the Gaussian autocorrelation kernel matrix kf on the frequency domain;
s3.2.3: calculating the target regression coefficient α from the Gaussian autocorrelation kernel matrix kf, with the calculation formula:

α = yf / (kf + λ)

where λ is the regularization parameter, K^{xx′} takes the value kf, yf is the regression label, and the calculated value is the target regression coefficient α.
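Steps s3.2.2 and s3.2.3 follow the standard KCF formulation; a minimal NumPy sketch (multi-channel features stacked along the third axis; the function names and the default λ are assumptions):

```python
import numpy as np

def gaussian_correlation(xf, zf, sigma):
    """Gaussian kernel correlation in the frequency domain, as in KCF.
    xf and zf are FFTs (over the first two axes) of (h, w, channels) features."""
    N = xf.shape[0] * xf.shape[1]
    xx = np.real(xf * np.conj(xf)).sum() / N              # spatial energy of x (Parseval)
    zz = np.real(zf * np.conj(zf)).sum() / N              # spatial energy of z
    xz = np.real(np.fft.ifft2((xf * np.conj(zf)).sum(axis=2)))  # circular cross-correlation
    k = np.exp(-np.maximum(xx + zz - 2 * xz, 0) / (sigma ** 2 * xf.size))
    return np.fft.fft2(k)                                 # kernel matrix on the frequency domain

def regression_coefficient(kf, yf, lam=1e-4):
    """Ridge-regression coefficient in the frequency domain: alpha = yf / (kf + lambda)."""
    return yf / (kf + lam)
```

For the autocorrelation case the exponent vanishes at zero shift, so the spatial kernel peaks at 1 in the upper-left corner.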
Further, the specific steps of step S4 are:
according to the template m_t of the current frame image, determining the search box s_t of the current frame image; the size of the search box s_t is 1.5 times the template size target_sz, and its center position is the center position of the template m_t.
Further, the specific steps of step S5 are as follows:
s5.1: traversing the search box s_t in the current frame image with the template size target_sz to obtain a plurality of regions to be matched p_i, forming the set of regions to be matched A_t;
S5.2: based on the given cell size cell_size, sequentially extracting FHOG and Haar features from each region to be matched p_i in the set A_t, linearly fusing them, point-multiplying by the two-dimensional cosine window cos_window, and performing a fast Fourier transform to obtain the frequency-domain fusion feature zf_i of each region p_i; the set of frequency-domain fusion features of all regions to be matched in A_t is zf;
s5.3: for each fusion feature zf_i, calculating the Gaussian cross-correlation kernel matrix on the frequency domain according to the Gaussian kernel correlation function of step s3.2.2: substituting the target model mode_xf for x and zf_i for x′ gives the Gaussian cross-correlation kernel matrix kzf_i of each region to be matched p_i; the set of Gaussian cross-correlation kernel matrices of all regions to be matched in A_t is kzf;
obtaining the fusion-feature kernel correlation filter response map set response according to the ridge-regression response score function and the Gaussian cross-correlation kernel matrix set kzf, specifically:
obtaining the single regression response of each Gaussian cross-correlation kernel matrix kzf_i according to the ridge-regression response score function, whose formula is:

f̂(z_i) = kzf_i ⊙ α̂

where kzf_i is a Gaussian cross-correlation kernel matrix from the set kzf, α̂ is the target regression coefficient mode_α, and f̂(z_i) is the single regression response obtained for the Gaussian cross-correlation kernel matrix kzf_i;

arranging the single regression response values corresponding to each kzf_i into a matrix in row-column order, performing an inverse Fourier transform to return to the time domain, and keeping the real part to obtain a multi-layer kernel correlation filter response map; the multi-layer kernel correlation filter response map set response is obtained from the Gaussian cross-correlation kernel matrix set kzf.
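The response-map computation of step s5.3 reduces to an element-wise product followed by an inverse FFT; a minimal sketch (names are assumptions):

```python
import numpy as np

def detection_response(alpha_f, kzf):
    """Ridge-regression response map: real(ifft2(kzf * alpha_f)).
    The displacement of the target is read off from the argmax of the response."""
    response = np.real(np.fft.ifft2(kzf * alpha_f))
    peak = np.unravel_index(response.argmax(), response.shape)
    return response, peak
```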
Further, the specific steps of step S7 are as follows:
s7.1: when the maximum response value max(response) in the multi-layer kernel correlation filter response map set is greater than or equal to T_1, taking the coordinates of the maximum response value max(response) in the current frame image as the center position pos of the target of the current frame image;
s7.2: if the current frame image is the last frame, ending the loop; otherwise updating the template in step S3 to the weighting of the template of the current frame image and the region at the center position of the target of the current frame image, and returning to step S3 after updating to process the next frame, the update formula being:

m_{t+1} = (1 − interp_factor) · m_t + interp_factor · p_max

where m_{t+1} is the updated template, m_t is the template of the current frame image, p_max is the region to be matched corresponding to the maximum response value max(response), and interp_factor is a weight adjustment factor.
Further, the specific steps of step S8 are as follows:
s8.1: when the maximum response value max(response) in the multi-layer kernel correlation filter response map set is less than T_1, the target is occluded and the template is not updated: m_{t+1} = m_t;
S8.2: according to the target state x in the previous frame image t-1 Predicting the target state of the current frame image, and taking out the coordinates of the center position of the target from the predicted target state, namely the predicted coordinates, wherein the target state comprises the center position and the speed of the target, and as the template is not updated, namely the template between two adjacent frames does not change greatly, the target is considered to do uniform motion, and the specific formula for predicting the target state of the current frame image is calculated as follows:
x t =A·x t-1 +B·u t-1 +w t-1
where A is the state transition matrix, B is the matrix relating to the external control parameters, x t-1 Is the target state in the t-1 frame image, u t-1 Is the acceleration of the target in the t-1 frame image, and makes uniform motion, namely 0 t-1 For describing process noise, subject to a Gaussian distribution w t-1 ~N(0,Q t-1 ),p x And p y As coordinates of the center position of the object in the t-th frame image, v x ,v y Setting a state transition matrix as the speed of the central position of the target in the t-th frame image on the x-axis and the y-axis according to a uniform motion modelTherefore, the formula for predicting the target state of the current frame image is as follows:
s8.3: taking the predicted center position of the target as the center, weighting the area of template size against the target area actually matched in the current frame image, and taking the result as the matching result of the current frame image, with the following specific steps:
s8.3.1: if the center position of the target in the previous frame image was obtained by prediction, the center position of the target area actually matched in the current frame image is the center position of the target in the previous frame image, and the size of the target area is consistent with the template; if the center position of the target in the previous frame image was not obtained by prediction, the center position of the target area actually matched in the current frame image is the position of the maximum response value in the current frame image, the size of the target area again being consistent with the template;
s8.3.2: calculating the covariance matrix of the prior estimate of the current frame image, namely the t-th frame image, with the specific formula:

P_t⁻ = A · P_{t−1} · A^T + Q

where P_{t−1} is the posterior error covariance of the (t−1)-th frame image, whose initial value is a given value, A^T is the transpose of A, and Q is the given process noise covariance;
s8.3.3: calculating the filter gain matrix K_t of the current frame image, with the calculation formula:

K_t = P_t⁻ · H^T · (H · P_t⁻ · H^T + R_t)⁻¹

where H is the observation matrix mapping the state to the measured center position, H^T is its transpose, R_t is the observation noise covariance, taken as a constant value R, and (X)⁻¹ denotes the inverse of X;
s8.3.4: according to the filter gain matrix K_t of the current frame image and the predicted target state x_t, calculating the best estimated position x̂_t, i.e. the matching result, with the calculation formula:

x̂_t = x_t + K_t · (z_t − H · x_t)

where z_t is the center position of the target area actually matched in the current frame image, i.e. the measured value; the difference between the measured value z_t and the predicted coordinate is represented by v_t, which satisfies the Gaussian distribution v_t ~ N(0, R_t);
S8.3.5: if the current frame is not the last frame, updating the posterior error covariance of the current frame image based on the filter gain matrix K_t, the observation matrix H and the prior-estimate covariance matrix P_t⁻, with the calculation formula:

P_t = (I − K_t · H) · P_t⁻
s8.4: updating the center position of the target of the current frame image based on the best estimated position x̂_t, the update formula being:

(pos_x, pos_y) = (p_x, p_y)

where pos_x and pos_y are the updated center position of the target, and p_x and p_y are the position coordinates of the best estimated position x̂_t;
s8.5: taking the center position of the matching result, i.e. the updated target center, as the center, taking a rectangular box of 3 times the size of the matching result in the current frame image as the search box, traversing the search box to obtain the set of regions to be matched, extracting the histogram-of-oriented-gradients and Haar features of each region to be matched, and then performing subsequent processing to obtain the correlation filter response map set, with the following specific steps:

s8.5.1: taking the center position (pos_x, pos_y) of the matching result as the center, setting the length and width of the search box to 3 times those of the template, and traversing the whole search box with the template size (w_m, l_m) to obtain the set of regions to be matched;
s8.5.2: then obtaining the fusion features corresponding to the regions to be matched based on the set of regions to be matched, and calculating the multi-layer kernel correlation filter response map set response_c based on the fusion features, the corresponding target model and the target regression coefficient;
s8.6: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is greater than or equal to T_2, updating the center position of the target of the current frame as:

pos_x = (pos_x − 1.5 · w_m) + vert_x,  pos_y = (pos_y − 1.5 · l_m) + vert_y

where pos_x, pos_y on the left are the updated center position of the target in the current frame image and on the right the center position of the target obtained in step S8.4, w_m, l_m are the template size, and vert_x, vert_y are the numbers of pixels by which the maximum response value is offset from the upper-left corner (pos_x − 1.5·w_m, pos_y − 1.5·l_m) of the search box;
s8.7: if the current frame is the last frame, ending the loop; otherwise updating the template in step S3 to the weighting of the template m_t of the current frame image and the region at the center position of the target in the current frame image, with the calculation formula:

m_{t+1} = (1 − interp_factor) · m_t + interp_factor · p_c_max

where m_{t+1} is the updated template, m_t is the template of the current frame image, p_c_max is the region to be matched corresponding to the maximum response value max(response_c), and interp_factor is a weight adjustment factor;
s8.8: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is less than T_2 and the current frame image is the last frame, tracking ends; otherwise the next frame image is read as the current frame and the method goes to step S8.
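The Kalman prediction and correction of steps s8.2 to s8.3.5 can be sketched with a constant-velocity model (a minimal sketch; dt = 1 and the Q and R values are illustrative assumptions, not the patent's parameters):

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter over the state [p_x, p_y, v_x, v_y]."""
    def __init__(self, pos, q=1e-2, r=1.0):
        self.x = np.array([pos[0], pos[1], 0.0, 0.0])
        self.P = np.eye(4)                               # posterior error covariance
        self.A = np.array([[1, 0, 1, 0],                 # uniform motion, dt = 1
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                 # observe position only
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)                           # process noise covariance
        self.R = r * np.eye(2)                           # observation noise covariance

    def predict(self):
        self.x = self.A @ self.x                         # x_t = A x_{t-1}, u = 0
        self.P = self.A @ self.P @ self.A.T + self.Q     # prior covariance P_t^-
        return self.x[:2]                                # predicted target center

    def correct(self, z):
        # gain K_t = P^- H^T (H P^- H^T + R)^-1, then fuse measurement z
        K = self.P @ self.H.T @ np.linalg.inv(self.H @ self.P @ self.H.T + self.R)
        self.x = self.x + K @ (np.asarray(z) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P       # posterior covariance
        return self.x[:2]                                # best-estimate position
```

During occlusion only `predict` is used; once a match is found again, `correct` fuses the matched center with the prediction, which is the weighting described in step s8.3.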
Compared with the prior art, the invention has the beneficial effects that:
1. the invention adopts a kernel correlation filtering algorithm that has the same low complexity as a linear correlation filter, requires few lines of code, and is faster than other tracking algorithms, running at hundreds of frames per second; it outperforms trackers such as Struck and TLD and is fully capable of real-time tracking;
2. the invention adopts a fusion feature combining the FHOG feature and the Haar feature: the former describes the edges and gradient changes of the image with a small storage footprint and high operation speed, while the latter describes edges with a small amount of data; the fused feature effectively describes the edge and gradient changes of a local area, characterizes the infrared target more accurately, reduces the loss of original image information, and improves tracking precision;
3. by introducing Kalman filtering, the invention solves the problems of continued tracking under occlusion and of tracking failure caused by excessive drift, increases the occlusion resistance of tracking, can search for the target in a larger search box when tracking fails, realizes a target re-detection function, and greatly improves tracking accuracy.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a block diagram of a target image framed in an initial frame of an infrared image sequence in accordance with the present invention;
FIG. 3 is a response fusion graph of a frame during infrared image sequence tracking according to the present invention;
fig. 4 is an original image and a tracking effect image of a 3-frame image sequence in the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
Based on a KCF framework, after the image sequence is read in, the target is selected in the first frame, and Haar and Histogram of Oriented Gradient (HOG) features are extracted and linearly fused to obtain the first-frame template. The second frame image is preprocessed, Haar and HOG features are extracted within the search box, the position of the target is predicted by kernel correlation filtering, and the similarity between the target and the template is calculated. If the similarity is greater than a threshold, the template is updated and the next frame is read for prediction, until the last frame. If the confidence falls below the threshold, the position of the target is predicted by Kalman filtering instead, the template is no longer updated, and the predicted target trajectory is taken as the motion trajectory. The similarity between the predicted position and the template is calculated; if it is greater than a certain threshold, the target is considered to have reappeared and correlation filtering resumes, otherwise Kalman filtering continues until the last frame.
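The alternation between correlation filtering and Kalman prediction described above can be sketched as a small state machine. The following Python sketch is illustrative only: the function name, the confidence scores and the thresholds are assumptions, and in the real method the scores come from the kernel correlation response maps:

```python
def track_modes(scores, t1, t2):
    """Illustrative control flow: 'kcf' while confidence >= T1,
    'kalman' once it drops below T1, back to 'kcf' when it recovers past T2."""
    mode = "kcf"
    trace = []
    for s in scores:
        if mode == "kcf" and s < t1:
            mode = "kalman"      # occlusion suspected: stop updating the template
        elif mode == "kalman" and s >= t2:
            mode = "kcf"         # target reappeared: resume correlation filtering
        trace.append(mode)
    return trace
```

A dip in confidence thus produces a temporary switch to Kalman prediction, and recovery past the second threshold switches back to correlation filtering.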
An anti-shielding infrared target tracking method comprises the following steps:
s1: reading an infrared image sequence, framing a target in an initial frame image to obtain the central position and the size of the target, taking the target in the initial frame image as a template, obtaining a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image; the method comprises the following specific steps: reading an infrared image sequence, framing a target in an initial frame image, recording the central position and the size of the target, and using the framed target in the initial frame image as a template, wherein the central position and the size of the template are the central position and the size of the target; and acquiring a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image.
S2: obtaining a two-dimensional cosine window according to the size of the template and the size of the cell unit; the method comprises the following specific steps:
s2.1: determining a search box according to the size target_sz of the template, the size of the search box being window_sz = target_sz·(1 + padding), where padding determines the ratio of the search box size to the target size;
s2.2: determining a characteristic regression label yf according to the given cell size cell _ size, the size target _ sz of the template and the size window _ sz of the search frame, and then obtaining a two-dimensional cosine window cos _ window based on the characteristic regression label yf;
the method comprises the following specific steps:
s2.2.1: calculating the bandwidth σ of the Gaussian regression label according to the size target_sz of the template and the cell size cell_size, with:
σ = √(w·h)·a / cell_size
where w and h are the width and height of the template, and a is the spatial bandwidth, proportional to the target size;
s2.2.2: calculating the regression label yf according to the bandwidth σ of the Gaussian regression label and the size window_sz of the search box, with:
y′(r, s) = exp( −((r − m/2)² + (s − n/2)²) / (2σ²) ), 1 ≤ r ≤ m, 1 ≤ s ≤ n
where m and n are respectively the two dimensions of window_sz divided by cell_size; after y′ is obtained, a cyclic shift moves the peak of the regression label to the upper-left corner to obtain y, and a Fourier transform then yields the regression label yf;
s2.2.3: calculating by utilizing a hann function according to the size of the regression label yf to obtain a two-dimensional cosine window cos _ window;
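As a concrete illustration of steps s2.2.1–s2.2.3, the following numpy sketch builds the Gaussian regression label and the two-dimensional cosine window. The bandwidth formula and the label-plane dimensions (window_sz divided by cell_size) are reconstructions consistent with the variables named above, not the patent's exact code:

```python
import numpy as np

def regression_label_and_window(window_sz, target_sz, cell_size=4, a=0.1):
    # label plane dimensions: search box size in cell units (assumed)
    m, n = window_sz[0] // cell_size, window_sz[1] // cell_size
    w, h = target_sz
    sigma = np.sqrt(w * h) * a / cell_size   # bandwidth of the Gaussian label (s2.2.1)
    r = np.arange(1, m + 1)[:, None]
    s = np.arange(1, n + 1)[None, :]
    y_prime = np.exp(-((r - m / 2) ** 2 + (s - n / 2) ** 2) / (2 * sigma ** 2))
    # cyclic shift moves the label peak to the upper-left corner (s2.2.2)
    y = np.roll(y_prime, (-(m // 2) + 1, -(n // 2) + 1), axis=(0, 1))
    yf = np.fft.fft2(y)                      # Fourier transform gives the label yf
    cos_window = np.outer(np.hanning(m), np.hanning(n))  # hann-based window (s2.2.3)
    return yf, cos_window
```

After the cyclic shift, the label peak sits at the top-left element, which is why the later response maps locate the target by their maximum.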
s3: extracting the histogram of oriented gradients and Haar features of the template and linearly fusing them, applying the two-dimensional cosine window to the linearly fused features to obtain the fused features, and calculating the target regression coefficient from the fused features; if the target regression coefficient is calculated from the second frame image, the target model and the target regression coefficient are initialized with it; if it is calculated from the last frame image, no processing is performed; otherwise the target model and the target regression coefficient are updated; the specific steps are as follows:
s3.1: extracting the features of the template m_t based on HOG and Haar and linearly fusing them, then point-multiplying by the two-dimensional cosine window cos_window to obtain the fused feature, where Haar denotes the Haar feature and HOG the histogram of oriented gradients; the specific steps are as follows:
s3.1.1: based on a given cell size cell_size, the features of the template m_t are extracted using the piotr_toolbox toolkit for MATLAB to obtain the 31-dimensional FHOG feature g_0, i.e. the histogram of oriented gradients feature;
s3.1.2: the current frame image is the t-th frame image; the integral image SAT(x, y) of the template m_t of the t-th frame image is calculated, the value of each pixel of the integral image SAT being:
SAT(x, y) = SAT(x, y−1) + SAT(x−1, y) − SAT(x−1, y−1) + m_t(x, y)
where SAT(x, y−1) is the integral image value above the current pixel position (x, y), SAT(x−1, y) is the integral image value to the left of (x, y), and SAT(x−1, y−1) is the integral image value to the upper-left of (x, y); the initial boundary of the integral image SAT is SAT(−1, y) = SAT(x, −1) = SAT(−1, −1) = 0, where SAT(−1, y) is the left boundary value, SAT(x, −1) the upper boundary value, and SAT(−1, −1) the upper-left vertex value; the initial boundaries SAT(−1, y), SAT(x, −1) and SAT(−1, −1) are used to calculate SAT(0, y) and SAT(x, 0);
s3.1.3: the integral image SAT is divided according to the cell size cell_size; for any cell, the sum of the pixels in its upper half is SAT_A, in its lower half SAT_B, in its left half SAT_C and in its right half SAT_D; the 1-dimensional vertical Haar feature g_1 of each cell is the difference of SAT_A and SAT_B, and the 1-dimensional horizontal Haar feature g_2 is the difference of SAT_C and SAT_D; the 1-dimensional vertical Haar features g_1 and 1-dimensional horizontal Haar features g_2 of all cells are the Haar features;
s3.1.4: the 31-dimensional FHOG feature g_0, all 1-dimensional vertical Haar features g_1 and all 1-dimensional horizontal Haar features g_2 are linearly fused and then point-multiplied by the two-dimensional cosine window cos_window to obtain the 33-dimensional fused feature g.
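Steps s3.1.2–s3.1.3 can be illustrated with a small numpy sketch of the integral image and the per-cell Haar differences; the exact sign convention (upper minus lower, left minus right) is an assumption:

```python
import numpy as np

def haar_features(img, cell_size=4):
    # integral image SAT with zero initial boundary (s3.1.2)
    sat = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    sat[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)

    def box_sum(r0, c0, r1, c1):
        # sum of pixels in the inclusive rectangle using four SAT lookups
        return sat[r1 + 1, c1 + 1] - sat[r0, c1 + 1] - sat[r1 + 1, c0] + sat[r0, c0]

    rows, cols = img.shape[0] // cell_size, img.shape[1] // cell_size
    g1 = np.zeros((rows, cols))  # vertical Haar: upper half minus lower half (s3.1.3)
    g2 = np.zeros((rows, cols))  # horizontal Haar: left half minus right half
    half = cell_size // 2
    for i in range(rows):
        for j in range(cols):
            r, c = i * cell_size, j * cell_size
            sat_a = box_sum(r, c, r + half - 1, c + cell_size - 1)               # upper half
            sat_b = box_sum(r + half, c, r + cell_size - 1, c + cell_size - 1)   # lower half
            sat_c = box_sum(r, c, r + cell_size - 1, c + half - 1)               # left half
            sat_d = box_sum(r, c + half, r + cell_size - 1, c + cell_size - 1)   # right half
            g1[i, j] = sat_a - sat_b
            g2[i, j] = sat_c - sat_d
    return g1, g2
```

A uniform image yields all-zero Haar responses, while a vertical intensity gradient excites g_1 only, matching the edge-description role stated above.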
S3.2: obtaining a target regression coefficient according to the fusion characteristic g; the method comprises the following specific steps:
s3.2.1: performing fast Fourier transform on the fused feature g to obtain the fused feature xf of the template m_t in the frequency domain, with:
xf = F(g)
where g denotes the 33-dimensional fused feature extracted from the template m_t, and F denotes the Fourier transform; the result is the frequency-domain fused feature xf;
s3.2.2: calculating the Gaussian autocorrelation kernel matrix kf in the frequency domain from the frequency-domain fused feature xf according to the Gaussian kernel correlation function; the Gaussian kernel correlation function is:
K^{xx′} = exp( −(1/σ²)·( ‖x‖² + ‖x′‖² − 2·F⁻¹( x̂ ⊙ (x̂′)* ) ) / N )
where K^{xx′} is the Gaussian kernel correlation matrix of x and x′; x and x′ denote the two features between which the kernel correlation is computed and are replaced by the corresponding features in the actual calculation; ‖x‖² is the sum of the squared moduli of the elements of the feature x; N is the product of the two dimensions of the matrix x; x̂ denotes the form of x in the Fourier domain; (x̂′)* denotes the complex conjugate of x̂′; ⊙ denotes element-wise multiplication; F⁻¹ denotes the inverse Fourier transform; and σ is the bandwidth of the Gaussian regression label;
substituting the frequency-domain fused feature xf for both x and x′ in the Gaussian kernel correlation function yields the Gaussian autocorrelation kernel matrix kf in the frequency domain;
s3.2.3: calculating the target regression coefficient α from the Gaussian autocorrelation kernel matrix kf, with:
α̂ = ŷ / (K^{xx′} + λ)
where λ is the regularization parameter, K^{xx′} takes the value kf, and ŷ is the regression label yf; the computed α̂ is the target regression coefficient α.
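A minimal single-channel numpy sketch of the Gaussian autocorrelation kernel and the target regression coefficient (steps s3.2.2–s3.2.3). The normalization by N follows the description above; the single channel (instead of the 33-dimensional fused feature) and the specific σ and λ values are simplifying assumptions:

```python
import numpy as np

def gaussian_autocorrelation(xf, sigma):
    # Gaussian autocorrelation kernel kf in the frequency domain for a
    # single-channel feature xf = F(g) (s3.2.2)
    N = xf.size
    xx = np.real(np.vdot(xf, xf)) / N             # ||x||^2 via Parseval's theorem
    corr = np.real(np.fft.ifft2(xf * np.conj(xf)))  # circular autocorrelation of x
    k = np.exp(-np.maximum(2 * xx - 2 * corr, 0) / (N * sigma ** 2))
    return np.fft.fft2(k)

def train_alpha(xf, yf, sigma=0.5, lam=1e-4):
    # target regression coefficient: alpha = yf / (kf + lambda) (s3.2.3)
    kf = gaussian_autocorrelation(xf, sigma)
    return yf / (kf + lam)
```

Evaluating the trained coefficient on the training feature itself approximately reproduces the regression label, so the response peak sits at the label peak.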
S3.3: if the current frame image is the second frame image, the step S3.4 is carried out, if the current frame image is the last frame image, no processing is carried out, otherwise, the step S3.6 is carried out;
s3.4: when the target tracks the second frame image, the target model model_xf is initialized with the frequency-domain fused feature xf of the current frame image, i.e.
model_xf_t = xf_t
where t denotes the second frame image, xf_t is the fused feature xf in the frequency domain, and model_xf_t is the target model model_xf;
s3.5: when the target tracks the second frame image, the target regression coefficient model_alpha is initialized with the target regression coefficient α of the current frame image, i.e.
model_alpha_t = α_t
where t denotes the second frame image, α_t is the target regression coefficient α, and model_alpha_t is the target regression coefficient model_alpha;
s3.6: when the target tracks the third frame or a later image, the target model model_xf is updated by linear interpolation, i.e.
model_xf_t = (1 − η)·model_xf_{t−1} + η·xf_t
where η is a given learning rate with value 0.02, xf_t is the frequency-domain fused feature of the current frame image, and model_xf_{t−1} is the target model of the previous frame image; the updated model_xf_t is the new target model model_xf;
s3.7: when the target tracks the third frame or a later image, the target regression coefficient model_alpha is updated by linear interpolation, i.e.
model_alpha_t = (1 − η)·model_alpha_{t−1} + η·α_t
where η is a given learning rate with value 0.02, model_alpha_{t−1} is the target regression coefficient of the previous frame image, and α_t is the target regression coefficient of the current frame image; the updated model_alpha_t is the new target regression coefficient model_alpha.
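The linear-interpolation updates of s3.6 and s3.7 share one form, sketched below; the learning rate η = 0.02 is the value given above:

```python
def interpolate(previous, current, eta=0.02):
    # linear interpolation update used for both model_xf and model_alpha:
    # new = (1 - eta) * previous + eta * current
    return (1 - eta) * previous + eta * current
```

In practice this is applied element-wise to the complex frequency-domain model and coefficient arrays, so the model drifts slowly toward the latest observation.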
S4: determining a search frame of the current frame image by taking the template center of the current frame image as the center position of the search frame; the method comprises the following specific steps:
According to the template m_t of the current frame image, the search box s_t of the current frame image is determined; the size of the search box s_t is 1.5 times the template size target_sz, and the centre position of the search box s_t is the centre position of the template m_t.
S5: traversing in a search frame in a current frame image based on the size of a template to obtain a set of regions to be matched, obtaining fusion characteristics corresponding to a plurality of regions to be matched based on the set of regions to be matched, and calculating a multi-kernel correlation filter response map corresponding to each region to be matched based on the fusion characteristics, a corresponding target model and a target regression coefficient to obtain a multi-kernel correlation filter response map set; the method comprises the following specific steps:
s5.1: traversing the search box s_t in the current frame image with the template size target_sz to obtain a plurality of regions to be matched p_i, forming the set of regions to be matched A_t;
s5.2: based on the given cell size cell_size, FHOG and Haar features are extracted in turn from each region to be matched p_i in the set A_t, linearly fused, point-multiplied by the two-dimensional cosine window cos_window, and fast Fourier transformed to obtain the frequency-domain fused feature zf_i corresponding to each region p_i; the set of frequency-domain fused features of all regions to be matched in A_t is zf;
s5.3: according to the Gaussian kernel correlation function, the Gaussian cross-correlation kernel matrix in the frequency domain is calculated for each fused feature zf_i: the target model model_xf is substituted for x and zf_i for x′ in the Gaussian kernel correlation function, yielding the Gaussian cross-correlation kernel matrix kzf_i in the frequency domain corresponding to each region to be matched p_i; the set of Gaussian cross-correlation kernel matrices of all regions to be matched in A_t is kzf;
Obtaining the fused-feature kernel correlation filter response map set response according to the ridge regression response score function and the Gaussian cross-correlation kernel matrix set kzf, which specifically comprises:
obtaining the single regression response value of each Gaussian cross-correlation kernel matrix kzf_i according to the ridge regression response score function, the ridge regression response score function being:
f̂(z) = K^{xz} ⊙ α̂
where K^{xz} takes the value of the Gaussian cross-correlation kernel matrix kzf_i from the set kzf, α̂ is the target regression coefficient model_alpha, and f̂(z) is the single regression response value of the Gaussian cross-correlation kernel matrix kzf_i;
the single regression response values of each Gaussian cross-correlation kernel matrix kzf_i are arranged into a matrix in row-column order, returned to the time domain by inverse Fourier transform, and the real part is retained to obtain a multi-layer kernel correlation filter response map; the multi-layer kernel correlation filter response map set response is obtained from the Gaussian cross-correlation kernel matrix set kzf.
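The detection stage (steps s5.2–s5.3 plus the response computation) can be exercised end-to-end on synthetic data. The sketch below trains on one single-channel patch and evaluates a circularly shifted copy; the response peak recovers the shift, which is the key property the kernel correlation filter relies on. The single channel, the parameter values and the function names are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def gaussian_kernel(xf, zf, sigma):
    # Gaussian kernel correlation of two single-channel frequency-domain features
    N = xf.size
    xx = np.real(np.vdot(xf, xf)) / N
    zz = np.real(np.vdot(zf, zf)) / N
    xz = np.real(np.fft.ifft2(np.conj(xf) * zf))  # circular cross-correlation
    return np.fft.fft2(np.exp(-np.maximum(xx + zz - 2 * xz, 0) / (N * sigma ** 2)))

def demo_shift_detection(shift=(3, 5), size=32, sigma=0.5, lam=1e-4):
    rng = np.random.default_rng(0)
    x = rng.standard_normal((size, size))        # "template" feature patch
    z = np.roll(x, shift, axis=(0, 1))           # region to match: shifted target
    g = np.minimum(np.arange(size), size - np.arange(size)).astype(float)
    yf = np.fft.fft2(np.exp(-(g[:, None] ** 2 + g[None, :] ** 2) / 2))  # label, peak at (0, 0)
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    alphaf = yf / (gaussian_kernel(xf, xf, sigma) + lam)   # target regression coefficient
    kzf = gaussian_kernel(xf, zf, sigma)                   # Gaussian cross-correlation kernel
    response = np.real(np.fft.ifft2(alphaf * kzf))         # kernel correlation response map
    return np.unravel_index(np.argmax(response), response.shape)
```

Because the cross-correlation kernel of a circularly shifted patch is the autocorrelation kernel shifted by the same amount, the response map is the regression label shifted by the target displacement.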
S6: judging whether the maximum response value in the multi-layer kernel correlation filtering response image set is larger than or equal to a given first threshold value, if so, turning to the step S7, otherwise, turning to the step S8;
s7: calculating the central position of the target of the current frame image by using the horizontal and vertical coordinates of the maximum response value, if the current frame image is not the last frame, updating the template in the step S3 to be the weighting of the template of the current frame image and the central position of the target of the current frame image, turning to the step S3 to process the next frame after updating, and otherwise, ending the tracking; the method comprises the following specific steps:
s7.1: when the maximum response value max(response) in the multi-layer kernel correlation filter response map set is greater than or equal to T_1, the coordinates of max(response) in the current frame image are taken as the centre position pos of the target of the current frame image;
s7.2: if the current frame image is the last frame, the loop ends; otherwise the template in step S3 is updated to the weighting of the template of the current frame image and the centre position of the target of the current frame image, and after updating the process returns to step S3 for processing of the next frame; the update formula is:
m_{t+1} = (1 − interp_factor)·m_t + interp_factor·p_max
where m_{t+1} is the updated template, m_t is the template of the current frame image, p_max is the region to be matched corresponding to the maximum response value max(response), and interp_factor is a weight adjustment factor.
S8: the template is not updated; the target state in the current frame image is predicted by Kalman filtering to obtain predicted coordinates; with the predicted coordinates as the centre, a target area of template size is weighted with the target area actually matched in the current frame image to serve as the matching result of the current frame image, where a target area is an area of template size centred on the centre position of the previous-frame target or on the coordinates of the maximum response value; with the centre position of the matching result as the centre, a search box of 3 times the size of the matching result is traversed to obtain the set of regions to be matched; the histogram of oriented gradients and Haar features of each region to be matched in the set are extracted to obtain the corresponding fused features, and a multi-layer kernel correlation filter response map set is then obtained from the fused features (the calculation method is the same as in step S3);
If the maximum response value in the multi-layer kernel correlation filter response map set is greater than or equal to a given second threshold, the centre position of the target is updated with the coordinates of the maximum value; if the current frame image is not the last frame, the template in step S3 is updated to the weighting of the template of the current frame image and the centre position of the current-frame target, and after updating the process goes to step S3 for the next frame; otherwise tracking ends;
If the maximum response value in the multi-layer kernel correlation filter response map set is below the given second threshold, tracking ends if the current frame image is the last frame; otherwise the next frame image is read as the current frame and the process goes to step S8.
The method comprises the following specific steps:
s8.1: when the maximum response value max(response) in the multi-layer kernel correlation filter response map set is less than T_1, the target is occluded and the template is not updated, i.e. m_{t+1} = m_t;
S8.2: according to the target state x in the previous frame image t-1 Predicting the target state of the current frame image, and taking out the coordinates of the center position of the target from the predicted target state, namely the predicted coordinates, wherein the target state comprises the center position and the speed of the target, and as the template is not updated, namely the template between two adjacent frames does not change greatly, the target is considered to do uniform motion, and the specific formula for predicting the target state of the current frame image is calculated as follows:
x t =A·x t-1 +B·u t-1 +w t-1
where A is the state transition matrix, B is the matrix relating to the external control parameters, x t-1 Is the target state in the t-1 frame image, u t-1 Is the acceleration of the target in the t-1 frame image, and makes uniform motion, namely 0 t-1 For describing process noise, subject to a Gaussian distribution w t-1 ~N(0,Q t-1 ),p x And p y As coordinates of the center position of the object in the t-th frame image, v x ,v y Setting a state transfer matrix to be ^ based on a uniform motion model for the speeds of the central position of the target in the t-th frame image on the x axis and the y axis>Therefore, the formula for predicting the target state of the current frame image is as follows:
s8.3: with the predicted centre position of the target as the centre, an area of template size is weighted with the target area actually matched in the current frame image to serve as the matching result of the current frame image; the specific steps are as follows:
s8.3.1: if the centre position of the target in the previous frame image was obtained by prediction, the centre position of the target area actually matched in the current frame image is the centre position of the target in the previous frame image, and the size of the target area is consistent with the template; if the centre position of the target in the previous frame image was not obtained by prediction, the centre position of the target area actually matched in the current frame image is the position of the maximum response value in the current frame image, taken as the centre of the actually matched target of the current frame, and the size of the target area is consistent with the template;
s8.3.2: the covariance matrix of the prior estimate of the current frame image, i.e. the t-th frame image, is calculated as:
P_t⁻ = A·P_{t−1}·A^T + Q
where P_{t−1} is the posterior error covariance of the (t−1)-th frame image, whose initial value is given, A^T is the transpose of A, and Q is the given per-frame process noise covariance;
s8.3.3: the filter gain matrix K_t of the current frame image is calculated as:
K_t = P_t⁻·H^T·(H·P_t⁻·H^T + R_t)⁻¹
where H is the observation matrix mapping the state to the measured centre position, H^T is its transpose, R_t is the observation noise covariance, taken as a constant value R, and (X)⁻¹ denotes the inverse of X;
s8.3.4: the best estimated position x̂_t, i.e. the matching result, is calculated from the filter gain matrix K_t of the current frame image and the predicted target state x_t:
x̂_t = x_t + K_t·(z_t − H·x_t)
where z_t denotes the centre position of the target area actually matched in the current frame image, i.e. the measured value, and z_t − H·x_t is the error v_t between the measured value z_t and the predicted coordinates, with v_t obeying the Gaussian distribution v_t ~ N(0, R_t);
s8.3.5: if the current frame is not the last frame, the posterior error covariance of the current frame image is updated from the filter gain matrix K_t, the observation matrix H and the prior estimate covariance P_t⁻:
P_t = (I − K_t·H)·P_t⁻
where I is the identity matrix;
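Steps s8.2–s8.3.5 amount to one predict/correct cycle of a constant-velocity Kalman filter over the state [p_x, p_y, v_x, v_y]^T. The sketch below is a generic textbook implementation under the assumptions stated in the comments (the Q and R values are placeholders, and H is the observation matrix selecting the measured centre position):

```python
import numpy as np

# Constant-velocity Kalman filter over the state [p_x, p_y, v_x, v_y]^T.
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])      # state transition matrix (uniform motion)
H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])      # observation: only the centre position is measured
Q = np.eye(4) * 1e-2                  # process noise covariance (assumed value)
R = np.eye(2) * 1.0                   # observation noise covariance (assumed value)

def kalman_step(x, P, z):
    # predict (s8.2): x_t = A x_{t-1};  P_t^- = A P A^T + Q (s8.3.2)
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # filter gain (s8.3.3): K = P^- H^T (H P^- H^T + R)^-1
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # best estimate (s8.3.4) and posterior covariance update (s8.3.5)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new
```

Fed a target moving at constant velocity, the filter's position and velocity estimates converge to the true trajectory, which is what allows coasting through occlusions.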
s8.4: the centre position of the target of the current frame image is updated from the best estimated position x̂_t:
(pos_x, pos_y) = (p_x, p_y)
where pos_x and pos_y are the updated centre position of the target, and p_x and p_y are the position coordinates of the best estimated position x̂_t;
s8.5: with the centre position of the matching result, i.e. the updated centre position of the target, as the centre, a rectangular box of 3 times the size of the matching result in the current frame image is taken as the search box; the set of regions to be matched is obtained by traversal within the search box, the histogram of oriented gradients and Haar features of each region to be matched in the set are extracted, and subsequent processing yields the correlation filter response map set (the calculation method is the same as in step S3); the specific steps are as follows:
s8.5.1: with the centre position (pos_x, pos_y) of the matching result as the centre, the length and width of the search box are determined to be 3 times the length and width of the template; the whole search box is traversed with the template size w_m, l_m to obtain the set of regions to be matched;
s8.5.2: the fused features corresponding to the regions to be matched are then obtained from the set of regions to be matched, and the multi-layer kernel correlation filter response map set response_c is calculated from the fused features, the corresponding target model and the target regression coefficient;
s8.6: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is greater than or equal to T_2, the centre position of the target of the current frame is updated as follows:
pos_x = p_x − 1.5·w_m + vert_x, pos_y = p_y − 1.5·l_m + vert_y
where pos_x, pos_y are the updated centre position of the target in the current frame image, p_x, p_y are the centre position of the target obtained in step S8.4, w_m, l_m are the size of the template, and vert_x, vert_y are respectively the numbers of pixels by which the maximum response value is offset from the upper-left corner (p_x − 1.5·w_m, p_y − 1.5·l_m) of the search box;
s8.7: if the current frame is the last frame, the loop ends; otherwise the template in step S3 is updated to the weighting of the template m_t of the current frame image and the centre position of the target in the current frame image, calculated as:
m_{t+1} = (1 − interp_factor)·m_t + interp_factor·p_c_max
where m_{t+1} is the updated template, m_t is the template of the current frame image, p_c_max is the region to be matched corresponding to the maximum response value max(response_c), and interp_factor is a weight adjustment factor;
s8.8: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is less than T_2, tracking ends if the current frame image is the last frame; otherwise the next frame image is read as the current frame and the process goes to step S8.
The above are merely representative examples of the many specific applications of the present invention and do not limit its protection scope in any way. All technical solutions formed by transformation or equivalent substitution fall within the protection scope of the present invention.
Claims (8)
1. An anti-shielding infrared target tracking method, characterized by comprising the following steps:
s1: reading an infrared image sequence, framing a target in an initial frame image to obtain the central position and the size of the target, taking the target in the initial frame image as a template, obtaining a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image;
s2: obtaining a two-dimensional cosine window according to the size of the template and the size of the cell unit;
s3: extracting the histogram of oriented gradients and Haar features of the template and linearly fusing them, applying the two-dimensional cosine window to the linearly fused features to obtain the fused features, and calculating the target regression coefficient from the fused features; if the target regression coefficient is calculated from the second frame image, the target model and the target regression coefficient are initialized with it; if it is calculated from the last frame image, no processing is performed; otherwise the target model and the target regression coefficient are updated;
s4: determining a search frame of the current frame image by taking the template center of the current frame image as the center position of the search frame;
s5: traversing within the search box in the current frame image based on the template size to obtain the set of regions to be matched, obtaining the fused features corresponding to the regions to be matched from the set, and calculating the multi-layer kernel correlation filter response map of each region to be matched from the fused features, the corresponding target model and the target regression coefficient, to obtain the multi-layer kernel correlation filter response map set;
s6: judging whether the maximum response value in the multi-layer kernel correlation filtering response image set is larger than or equal to a given first threshold value, if so, turning to the step S7, otherwise, turning to the step S8;
s7: calculating the center position of the target of the current frame image by using the horizontal and vertical coordinates of the maximum response value, if the current frame image is not the last frame, updating the template in the step S3 to be the weighting of the template of the current frame image and the center position of the target of the current frame image, turning to the step S3 to process the next frame after updating, and otherwise, ending the tracking;
s8: the template is not updated; the target state in the current frame image is predicted by Kalman filtering to obtain predicted coordinates; with the predicted coordinates as the centre, a target area of template size is weighted with the target area actually matched in the current frame image to serve as the matching result of the current frame image, where a target area is an area of template size centred on the centre position of the previous-frame target or on the coordinates of the maximum response value; with the centre position of the matching result as the centre, a search box of 3 times the size of the matching result is traversed to obtain the set of regions to be matched; the histogram of oriented gradients and Haar features of each region to be matched in the set are extracted to obtain the corresponding fused features, and a multi-layer kernel correlation filter response map set is then obtained from the fused features;
if the maximum response value in the multi-layer kernel correlation filtering response image set is larger than or equal to a given second threshold value, updating the center position of the target by the horizontal and vertical coordinates of the maximum value, if the current frame image is not the last frame, updating the template in the step S3 to be the weighting of the template of the current frame image and the center position of the current frame target, turning to the step S3 to process the next frame after updating, and otherwise, ending the tracking;
if the maximum response value in the multi-layer kernel correlation filtering response image set is lower than a given second threshold value, ending tracking if the current frame image is the last frame, otherwise, reading the next frame image as the current frame, and turning to the step S8;
the specific steps of the step S2 are as follows:
s2.1: determining a search box according to the size target_sz of the template, the size of the search box being window_sz = target_sz·(1 + padding), where padding determines the ratio of the search box size to the target size;
s2.2: determining a characteristic regression label yf according to the given cell size cell _ size, the size target _ sz of the template and the size window _ sz of the search frame, and then obtaining a two-dimensional cosine window cos _ window based on the characteristic regression label yf;
the method comprises the following specific steps:
s2.2.1: calculating the bandwidth σ of the Gaussian regression label according to the size target_sz of the template and the cell size cell_size, with:
σ = √(w·h)·a / cell_size
where w and h are the width and height of the template, and a is the spatial bandwidth, proportional to the target size;
s2.2.2: calculating a regression label yf according to the bandwidth sigma of the Gaussian regression label and the size window_sz of the search box, wherein the calculation formula is as follows:

y′(r, s) = exp(−((r − ⌈m/2⌉)² + (s − ⌈n/2⌉)²) / (2σ²)),  1 ≤ r ≤ m, 1 ≤ s ≤ n

wherein m and n are respectively the numbers of rows and columns of the label, i.e. the size window_sz of the search box divided by the cell size cell_size; after y′ is obtained through calculation, a cyclic shift moves the regression label peak to the upper left corner to obtain y, and a Fourier transform then yields the regression label yf;
s2.2.3: calculating a two-dimensional cosine window cos_window by means of the Hann function according to the size of the regression label yf;
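By way of illustration only, steps S2.2.1-S2.2.3 can be sketched in Python/NumPy as follows (the bandwidth factor a = 0.1, the default cell size and the function name are assumptions of the sketch, not taken from the claims):

```python
import numpy as np

def gaussian_label_and_window(target_sz, window_sz, cell_size=4, a=0.1):
    """Gaussian regression label yf (S2.2.1-S2.2.2) and the 2-D cosine
    window (S2.2.3); target_sz / window_sz are (height, width) in pixels."""
    h, w = target_sz
    sigma = a * np.sqrt(h * w) / cell_size        # label bandwidth (S2.2.1)
    m, n = window_sz[0] // cell_size, window_sz[1] // cell_size
    r, s = np.meshgrid(np.arange(1, m + 1) - m // 2,
                       np.arange(1, n + 1) - n // 2, indexing='ij')
    y = np.exp(-(r**2 + s**2) / (2 * sigma**2))   # Gaussian peak at the centre
    # cyclic shift moves the peak to the upper-left corner before the FFT
    y = np.roll(y, (-(m // 2) + 1, -(n // 2) + 1), axis=(0, 1))
    yf = np.fft.fft2(y)
    # outer product of two 1-D Hann windows gives the 2-D cosine window
    cos_window = np.outer(np.hanning(m), np.hanning(n))
    return yf, cos_window
```

Multiplying extracted features by cos_window suppresses the boundary discontinuities introduced by the cyclic-shift model.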
the specific steps of step S3 are as follows:
s3.1: extracting the features of the template m_t based on HOG and Haar, and point-multiplying the linearly fused features by the two-dimensional cosine window cos_window to obtain the fusion features, wherein Haar denotes the Haar-like features and HOG is the histogram of oriented gradients;
s3.2: obtaining a target regression coefficient according to the fusion characteristics;
s3.3: if the current frame image is the second frame image, go to step S3.4; if the current frame image is the last frame image, no processing is carried out; otherwise, go to step S3.6;
s3.4: when the target tracks the second frame image, the target model mode_xf is initialized with the fusion feature xf of the frequency domain of the current frame image, namely

mode_xf^t = xf^t

wherein t denotes the second frame image, xf^t is the fusion feature xf in the frequency domain, and mode_xf^t is the target model mode_xf;
s3.5: when the target tracks the second frame image, the target regression coefficient mode_alpha is initialized with the target regression coefficient alpha of the current frame image, namely

mode_α^t = α^t

wherein t denotes the second frame image, α^t is the target regression coefficient alpha, and mode_α^t is the target regression coefficient mode_alpha;
s3.6: when the target tracks the third frame or a later image, the target model mode_xf is updated by linear interpolation, that is

mode_xf^t = (1 − η)·mode_xf^{t−1} + η·xf^t

wherein η is a given learning rate, xf^t is the fusion feature of the current frame image, mode_xf^{t−1} is the target model of the previous frame image, and mode_xf^t is the updated target model mode_xf;
s3.7: target regression coefficient modela updated by linear interpolation, i.e. when the target tracks the third frame or an image following the third frame
2. The anti-occlusion infrared target tracking method according to claim 1, characterized in that the specific steps of the step S1 are as follows: reading an infrared image sequence, framing a target in an initial frame image, recording the central position and the size of the target, and using the framed target in the initial frame image as a template, wherein the central position and the size of the template are the central position and the size of the target; and acquiring a second frame image as a current frame image, and taking the template of the initial frame image as the template of the current frame image.
3. The anti-occlusion infrared target tracking method according to claim 1, characterized in that the specific steps of step S3.1 are as follows:
s3.1.1: based on a given cell size cell_size, extracting from the template m_t, using the piotr_toolbox toolkit for MATLAB, the 31-dimensional FHOG feature g_0, i.e. the histogram of oriented gradients feature;
s3.1.2: the current frame image is the t-th frame image; the integral image SAT of the template m_t is calculated, the calculation formula of each pixel value in the integral image SAT being:

SAT(x, y) = SAT(x, y−1) + SAT(x−1, y) − SAT(x−1, y−1) + m_t(x, y)

wherein SAT(x, y−1) is the integral image value directly above the current pixel position (x, y), SAT(x−1, y) the value directly to its left, and SAT(x−1, y−1) the value to its upper left; the initial boundary of the integral image SAT is SAT(−1, y) = SAT(x, −1) = SAT(−1, −1) = 0, where SAT(−1, y) is the left boundary value, SAT(x, −1) the upper boundary value and SAT(−1, −1) the upper-left vertex value; the initial boundaries SAT(−1, y), SAT(x, −1) and SAT(−1, −1) are used to calculate SAT(0, y) and SAT(x, 0);
s3.1.3: dividing the integral image SAT according to the cell size cell_size; for any cell unit, the sum of the upper-half pixels is SAT_A, the sum of the lower-half pixels is SAT_B, the sum of the left-half pixels is SAT_C and the sum of the right-half pixels is SAT_D; the 1-dimensional vertical Haar feature g_1 of each cell unit is the difference of SAT_A and SAT_B, and the 1-dimensional horizontal Haar feature g_2 is the difference of SAT_C and SAT_D; the features g_1 and g_2 of all cell units constitute the Haar features;
s3.1.4: the 31-dimensional FHOG feature g_0, all 1-dimensional vertical Haar features g_1 and all 1-dimensional horizontal Haar features g_2 are linearly fused and then point-multiplied by the two-dimensional cosine window cos_window to obtain the 33-dimensional fusion feature g.
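Steps S3.1.2-S3.1.3 can be sketched as follows (a sketch under the assumption of an even cell_size that divides the image size; the function and variable names are illustrative):

```python
import numpy as np

def cell_haar_features(img, cell_size=4):
    """Per-cell 1-D vertical/horizontal Haar features via an integral image.

    g1 is the difference between the upper- and lower-half pixel sums of
    each cell, g2 the difference between its left- and right-half sums."""
    h, w = img.shape
    # summed-area table with an extra zero row/column as the initial boundary
    sat = np.zeros((h + 1, w + 1))
    sat[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)

    def box_sum(r0, c0, r1, c1):      # inclusive pixel box [r0,r1] x [c0,c1]
        return (sat[r1 + 1, c1 + 1] - sat[r0, c1 + 1]
                - sat[r1 + 1, c0] + sat[r0, c0])

    rows, cols = h // cell_size, w // cell_size
    g1 = np.zeros((rows, cols))
    g2 = np.zeros((rows, cols))
    half = cell_size // 2
    for i in range(rows):
        for j in range(cols):
            r, c = i * cell_size, j * cell_size
            top = box_sum(r, c, r + half - 1, c + cell_size - 1)
            bottom = box_sum(r + half, c, r + cell_size - 1, c + cell_size - 1)
            left = box_sum(r, c, r + cell_size - 1, c + half - 1)
            right = box_sum(r, c + half, r + cell_size - 1, c + cell_size - 1)
            g1[i, j] = top - bottom
            g2[i, j] = left - right
    return g1, g2
```

On a uniform image both features vanish, which is the expected behaviour of difference-type Haar responses.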
4. The anti-occlusion infrared target tracking method according to claim 1, characterized in that the specific steps of step S3.2 are as follows:
s3.2.1: performing a fast Fourier transform on the fusion feature g to obtain the fusion feature xf of the template m_t in the frequency domain, with the formula:

xf = F(g)

wherein g is the 33-dimensional fusion feature extracted from the template m_t and F denotes the Fourier transform;
s3.2.2: calculating the Gaussian autocorrelation kernel matrix kf in the frequency domain from the fusion feature xf of the frequency domain, according to the Gaussian kernel correlation function:

K^{xx′} = exp(−(‖x‖² + ‖x′‖² − 2·F^{−1}((x̂)* ⊙ x̂′)) / (σ²·N))

wherein K^{xx′} is the Gaussian kernel correlation matrix of x and x′; x and x′ denote the features used to calculate the kernel correlation matrix and are replaced by the corresponding features in the actual calculation; ‖x‖² is the sum of the squared moduli of the elements of the feature x divided by N, where N is the product of the two dimensions of the matrix x; x̂ is the form of x in the Fourier domain, (x̂)* is its complex conjugate, ⊙ denotes the element-wise product, F^{−1} is the inverse Fourier transform, and σ is the bandwidth of the Gaussian kernel;
replacing both x and x′ in the Gaussian kernel correlation function with the frequency-domain fusion feature xf yields the Gaussian autocorrelation kernel matrix kf in the frequency domain;
s3.2.3: calculating the target regression coefficient alpha from the Gaussian autocorrelation kernel matrix kf, wherein the calculation formula, in the frequency domain, is:

α = yf / (kf + λ)

wherein yf is the regression label, λ is a given regularization parameter, and the division is element-wise.
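A sketch of S3.2.2-S3.2.3, the Gaussian kernel correlation in the frequency domain and the resulting regression coefficients (the kernel bandwidth 0.5 and the regularization λ = 1e-4 are assumed values, not taken from the claims):

```python
import numpy as np

def gaussian_correlation(xf, zf, sigma=0.5):
    """Gaussian kernel correlation of two multi-channel features given in
    the frequency domain (arrays of shape (h, w, c)), as in step S3.2.2."""
    h, w, c = xf.shape
    N = h * w
    xx = np.real(np.sum(xf * np.conj(xf))) / N   # ||x||^2 via Parseval
    zz = np.real(np.sum(zf * np.conj(zf))) / N   # ||z||^2
    # cross term: inverse FFT of the per-channel product, summed over channels
    xz = np.sum(np.real(np.fft.ifft2(xf * np.conj(zf), axes=(0, 1))), axis=2)
    k = np.exp(-np.maximum(xx + zz - 2 * xz, 0) / (sigma**2 * h * w * c))
    return np.fft.fft2(k)

def train_alpha(xf, yf, sigma=0.5, lam=1e-4):
    """Target regression coefficients of S3.2.3 in the frequency domain:
    alpha = yf / (kf + lambda), kf being the Gaussian autocorrelation."""
    kf = gaussian_correlation(xf, xf, sigma)
    return yf / (kf + lam)
```

Evaluating the trained coefficients on the training sample itself should approximately reproduce the regression label, peaked at the origin.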
5. The anti-occlusion infrared target tracking method according to any one of claims 1 to 4, characterized in that the specific steps of the step S4 are as follows:
according to the template m_t of the current frame image, a search box s_t of the current frame image is determined; the size of the search box s_t is 1.5 times the template size target_sz and its center position is the center position of the template m_t.
6. The anti-occlusion infrared target tracking method according to claim 5, wherein the specific steps of the step S5 are as follows:
S5.1: traversing the search box s_t in the current frame image with the template size target_sz to obtain a plurality of regions to be matched p_i, forming the set of regions to be matched A_t;
S5.2: based on a given cell size cell_size, sequentially extracting the FHOG and Haar features of each region to be matched p_i in the set A_t, linearly fusing them, point-multiplying by the two-dimensional cosine window cos_window and applying a fast Fourier transform to obtain the fusion feature zf_i of each region p_i in the frequency domain; the set of frequency-domain fusion features of all regions to be matched in A_t is zf;
s5.3: for each fusion feature zf_i, calculating according to the Gaussian kernel correlation function the Gaussian cross-correlation kernel matrix kzf_i in the frequency domain of each region to be matched p_i: in the formula of the Gaussian kernel correlation function, x is replaced by the target model mode_xf and x′ by zf_i; the set of Gaussian cross-correlation kernel matrices of all regions to be matched in A_t is kzf;
obtaining the fusion-feature kernel correlation filter response map set response according to the ridge regression response score function and the Gaussian cross-correlation kernel matrix set kzf, specifically:
obtaining, according to the ridge regression response score function, the regression response of each Gaussian cross-correlation kernel matrix kzf_i, wherein the ridge regression response score function in the frequency domain is:

f̂ = kzf_i ⊙ mode_α

wherein kzf_i is one Gaussian cross-correlation kernel matrix in the set kzf, mode_α is the target regression coefficient, and ⊙ denotes the element-wise product; each element of f̂ is a single regression response value;
the single regression response values of each Gaussian cross-correlation kernel matrix kzf_i are arranged into a matrix in row-column order, an inverse Fourier transform returns the matrix to the time domain and the real part is retained, giving a multi-layer kernel correlation filter response map; the whole Gaussian cross-correlation kernel matrix set kzf thus yields the multi-layer kernel correlation filter response map set response.
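The response computation of steps S5.3-S6 can be sketched as follows (kzf_list stands for the frequency-domain cross-correlation kernels of the candidate regions; the function names are illustrative):

```python
import numpy as np

def response_maps(kzf_list, mode_alpha_f):
    """Per-region response maps: element-wise product of each
    cross-correlation kernel kzf_i with the regression coefficients,
    inverse-transformed to the time domain, real part kept."""
    return [np.real(np.fft.ifft2(kzf * mode_alpha_f)) for kzf in kzf_list]

def best_match(responses):
    """Index of the region with the largest response, the (row, col)
    coordinates of that peak, and the peak value itself."""
    i = max(range(len(responses)), key=lambda j: responses[j].max())
    vert = np.unravel_index(np.argmax(responses[i]), responses[i].shape)
    return i, vert, responses[i].max()
```

The returned peak value is what the method compares against the thresholds T_1 and T_2 to decide whether the target is occluded.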
7. The anti-occlusion infrared target tracking method according to claim 6, wherein the specific steps of the step S7 are as follows:
s7.1: if the maximum response value max(response) in the multi-layer kernel correlation filter response map set is greater than or equal to T_1, the coordinates of max(response) in the current frame image are taken as the center position pos of the target of the current frame image;
s7.2: if the current frame image is the last frame, the loop ends; otherwise, the template in step S3 is updated as the weighting of the template of the current frame image and the area at the target center position of the current frame image, and the process returns to step S3 for the next frame; the update formula is as follows:
m_{t+1} = (1 − interp_factor)·m_t + interp_factor·p_max

wherein m_{t+1} is the updated template, m_t is the current frame image template, p_max is the region to be matched corresponding to the maximum response value max(response), and interp_factor is a weight adjustment factor.
8. The anti-occlusion infrared target tracking method according to claim 7, wherein the specific steps of the step S8 are as follows:
s8.1: if the maximum response value max(response) in the multi-layer kernel correlation filter response map set is below T_1, the target is occluded and the template is not updated: m_{t+1} = m_t;
S8.2: according to the target state x in the previous frame image t-1 Predicting the target state of the current frame image, and taking out the coordinates of the center position of the target from the predicted target state, namely predicted coordinates, wherein the target state comprises the center position and the speed of the target, and the target is considered to move at a constant speed because the template is not updated, namely the template between two adjacent frames does not change greatly, and the specific formula for predicting the target state of the current frame image is calculated as follows:
x t =A·x t-1 +B·u t-1 +w t-1
where A is the state transition matrix, B is the matrix relating to the external control parameters, x t-1 Is the target state in the t-1 frame image, u t-1 Is the acceleration of the target in the t-1 frame image, and makes uniform motion, namely 0 t-1 For describing process noise, subject to a Gaussian distribution w t-1 ~N(0,Q t-1 ),p x And p y As coordinates of the center position of the object in the t-th frame image, v x ,v y Setting a state transition matrix as the speed of the central position of the target in the t-th frame image on the x-axis and the y-axis according to a uniform motion modelTherefore, the formula for predicting the target state of the current frame image is as follows:
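A sketch of the constant-velocity prediction of S8.2 together with the prior covariance of S8.3.2 (the state ordering [p_x, p_y, v_x, v_y] and a unit frame interval are assumptions of the sketch):

```python
import numpy as np

def kalman_predict(x_prev, P_prev, Q, dt=1.0):
    """Predict state and prior covariance under the uniform-motion model:
    x_t = A x_{t-1} (B u = 0) and P_t^- = A P_{t-1} A^T + Q."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    x_pred = A @ x_prev
    P_pred = A @ P_prev @ A.T + Q
    return x_pred, P_pred
```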
s8.3: taking the predicted target center position as the center, the target area of template size is weighted with the target area actually matched in the current frame image to give the matching result of the current frame image, with the following specific steps:
s8.3.1: if the target center position in the previous frame image was obtained by prediction, the center position of the target area actually matched in the current frame image is the target center position of the previous frame image, the size of the target area being consistent with the template; if the target center position in the previous frame image was not obtained by prediction, the center position of the actually matched target area is the position of the maximum response value in the current frame image, the size of the target area again being consistent with the template;
s8.3.2: calculating the covariance matrix of the prior estimate of the current frame image, i.e. the t-th frame image, with the formula:

P_t⁻ = A·P_{t−1}·A^T + Q

wherein P_{t−1} is the posterior error covariance of the (t−1)-th frame image, whose initial value is a given value, A^T is the transpose of A, and Q is the given process noise covariance of a frame image;
s8.3.3: calculating the filter gain matrix K_t of the current frame image, with the formula:

K_t = P_t⁻·H^T·(H·P_t⁻·H^T + R_t)^{−1}

wherein H is the observation matrix extracting the measured center position from the state, H^T is its transpose, R_t is the observation noise covariance, taken as a constant value R, and (X)^{−1} denotes the inverse of X;
s8.3.4: according to the filter gain matrix K_t of the current frame image and the predicted target state x_t, calculating the best estimated position x̂_t, i.e. the matching result, with the formula:

x̂_t = x_t + K_t·(z_t − H·x_t)

wherein z_t is the center position of the target area actually matched in the current frame image, i.e. the measured value; the error between the measured value z_t and the predicted coordinates is denoted v_t, which satisfies the Gaussian distribution v_t ~ N(0, R_t);
S8.3.5: if the current frame is not the last frame, the posterior error covariance of the current frame image is updated based on the filter gain matrix K_t, the observation matrix H and the prior estimate covariance P_t⁻, with the formula:

P_t = (I − K_t·H)·P_t⁻
s8.4: updating the center position of the target of the current frame image based on the best estimated position x̂_t:

(pos_x, pos_y) = (p_x, p_y)

wherein pos_x and pos_y are the updated center position of the target, and p_x and p_y are the position coordinates of the best estimate x̂_t;
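The measurement update of S8.3.3-S8.3.5 and the position read-out of S8.4 can be sketched as follows (the observation matrix H, which extracts (p_x, p_y) from the state, is an assumption of the sketch):

```python
import numpy as np

def kalman_update(x_pred, P_pred, z, R):
    """Gain K_t (S8.3.3), best estimate (S8.3.4) and posterior error
    covariance (S8.3.5) for a position-only measurement z."""
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # filter gain matrix
    x_est = x_pred + K @ (z - H @ x_pred)    # best estimated state
    P_post = (np.eye(4) - K @ H) @ P_pred    # posterior error covariance
    return x_est, K, P_post

def read_out_position(x_est):
    """S8.4: the updated target centre (pos_x, pos_y) is the position
    part of the best estimate."""
    return x_est[0], x_est[1]
```

With a very small measurement noise R the estimate snaps to the measurement, while a large R keeps it near the prediction; this is the weighting between prediction and actual match described in S8.3.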
s8.5: taking the center position of the matching result, i.e. the updated target center position, as the center, a rectangular frame of 3 times the size of the matching result in the current frame image is used as the search box; the set of regions to be matched is obtained by traversing the search box, the histogram of oriented gradients and Haar features of each region to be matched are extracted, and the subsequent processing yields the correlation filter response map set, with the specific steps:
s8.5.1: with the center position (pos_x, pos_y) of the matching result as the center, the length and width of the search box are set to 3 times the length and width of the template; the whole search box is traversed with the template size w_m, l_m to obtain the set of regions to be matched;
s8.5.2: obtaining the fusion features corresponding to the regions to be matched from the set of regions to be matched, and calculating the multi-layer kernel correlation filter response map set response_c based on the fusion features, the corresponding target model and the target regression coefficient;
s8.6: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is greater than or equal to T_2, the center position of the target of the current frame is updated as follows:

pos_x = p_x − 1.5·w_m + vert_x,  pos_y = p_y − 1.5·l_m + vert_y

wherein pos_x, pos_y are the updated center position of the target in the current frame image, p_x, p_y are the center position of the target obtained in step S8.4, w_m, l_m are the size of the template, and vert_x, vert_y are respectively the numbers of pixels by which the maximum response value is offset from the top-left corner (p_x − 1.5·w_m, p_y − 1.5·l_m) of the search box;
s8.7: if the current frame is the last frame, the loop ends; otherwise, the template in step S3 is updated as the weighting of the template m_t of the current frame image and the area at the target center position in the current frame image, with the formula:

m_{t+1} = (1 − interp_factor)·m_t + interp_factor·p_c_max

wherein m_{t+1} is the updated template, m_t is the template of the current frame image, p_c_max is the region to be matched corresponding to the maximum response value max(response_c), and interp_factor is a weight adjustment factor;
s8.8: if the maximum response value max(response_c) in the multi-layer kernel correlation filter response map set is below T_2, tracking ends if the current frame image is the last frame; otherwise, the next frame image is read as the current frame and the process goes to step S8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910547576.2A CN110276785B (en) | 2019-06-24 | 2019-06-24 | Anti-shielding infrared target tracking method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110276785A CN110276785A (en) | 2019-09-24 |
CN110276785B true CN110276785B (en) | 2023-03-31 |
Family
ID=67961532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910547576.2A Active CN110276785B (en) | 2019-06-24 | 2019-06-24 | Anti-shielding infrared target tracking method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110276785B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728697B (en) * | 2019-09-30 | 2023-06-13 | 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) | Infrared dim target detection tracking method based on convolutional neural network |
CN110796687B (en) * | 2019-10-30 | 2022-04-01 | 电子科技大学 | Sky background infrared imaging multi-target tracking method |
CN111563919B (en) * | 2020-04-03 | 2023-12-29 | 深圳市优必选科技股份有限公司 | Target tracking method, device, computer readable storage medium and robot |
CN111721420B (en) * | 2020-04-27 | 2021-06-29 | 浙江智物慧云技术有限公司 | Semi-supervised artificial intelligence human body detection embedded algorithm based on infrared array time sequence |
CN113076949B (en) * | 2021-03-31 | 2023-04-18 | 成都唐源电气股份有限公司 | Method and system for quickly positioning parts of contact net |
CN112991394B (en) * | 2021-04-16 | 2024-01-19 | 北京京航计算通讯研究所 | KCF target tracking method based on cubic spline interpolation and Markov chain |
CN114066934B (en) * | 2021-10-21 | 2024-03-22 | 华南理工大学 | Anti-occlusion cell tracking method for targeting micro-operation |
CN115631359B (en) * | 2022-11-17 | 2023-03-14 | 诡谷子人工智能科技(深圳)有限公司 | Image data processing method and device for machine vision recognition |
CN115631216B (en) * | 2022-12-21 | 2023-05-12 | 金城集团有限公司 | Multi-feature filter fusion-based holder target tracking system and method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103700112A (en) * | 2012-09-27 | 2014-04-02 | 中国航天科工集团第二研究院二O七所 | Sheltered target tracking method based on mixed predicting strategy |
CN105981075A (en) * | 2013-12-13 | 2016-09-28 | 英特尔公司 | Efficient facial landmark tracking using online shape regression method |
CN106887012A (en) * | 2017-04-11 | 2017-06-23 | 山东大学 | A kind of quick self-adapted multiscale target tracking based on circular matrix |
CN108550161A (en) * | 2018-03-20 | 2018-09-18 | 南京邮电大学 | A kind of dimension self-adaption core correlation filtering fast-moving target tracking method |
CN108665481A (en) * | 2018-03-27 | 2018-10-16 | 西安电子科技大学 | Multilayer depth characteristic fusion it is adaptive resist block infrared object tracking method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011013179A1 (en) * | 2009-07-31 | 2011-02-03 | 富士通株式会社 | Mobile object position detecting device and mobile object position detecting method |
Non-Patent Citations (6)
Title |
---|
A novel method for quantifying target tracking difficulty of the infrared image sequence; Zheng, Haichao; Infrared Physics & Technology; 2015-09-30; vol. 72; pp. 8-18 *
Target tracking based on biological-like vision identity via improved sparse representation and particle filtering; Li, Gun; Cognitive Computation; 2016-11-16; vol. 8, no. 5; pp. 910-923 *
Anti-occlusion target tracking algorithm based on KCF and SIFT features; Bao Xiao'an et al.; Computer Measurement & Control; 2018-05-25; no. 5; pp. 154-158 *
Research on target detection and tracking algorithms based on video surveillance; Jiang Dan; China Master's Theses Full-text Database, Information Science & Technology; 2018-12-15; no. 12; p. I136-602 *
Research on infrared human body target detection technology in complex backgrounds; Liu Zhaoxiong; China Master's Theses Full-text Database, Information Science & Technology; 2015-01-15; no. 1; p. I140-623 *
Detection and tracking of blurred scintillating imaging targets against a strong background; Xu Junping; High Power Laser and Particle Beams; 2008-04; no. 4; pp. 537-541 *
Also Published As
Publication number | Publication date |
---|---|
CN110276785A (en) | 2019-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276785B (en) | Anti-shielding infrared target tracking method | |
US11763485B1 (en) | Deep learning based robot target recognition and motion detection method, storage medium and apparatus | |
CN108615027B (en) | Method for counting video crowd based on long-term and short-term memory-weighted neural network | |
CN108665481B (en) | Self-adaptive anti-blocking infrared target tracking method based on multi-layer depth feature fusion | |
CN105335986B (en) | Method for tracking target based on characteristic matching and MeanShift algorithm | |
CN107424171B (en) | Block-based anti-occlusion target tracking method | |
CN109242884B (en) | Remote sensing video target tracking method based on JCFNet network | |
CN107424177B (en) | Positioning correction long-range tracking method based on continuous correlation filter | |
CN111311647B (en) | Global-local and Kalman filtering-based target tracking method and device | |
CN110956653B (en) | Satellite video dynamic target tracking method with fusion of correlation filter and motion estimation | |
CN104050488B (en) | A kind of gesture identification method of the Kalman filter model based on switching | |
CN109949340A (en) | Target scale adaptive tracking method based on OpenCV | |
CN106295564B (en) | A kind of action identification method of neighborhood Gaussian structures and video features fusion | |
CN104680559B (en) | The indoor pedestrian tracting method of various visual angles based on motor behavior pattern | |
CN107169994B (en) | Correlation filtering tracking method based on multi-feature fusion | |
CN111523447B (en) | Vehicle tracking method, device, electronic equipment and storage medium | |
CN110097575B (en) | Target tracking method based on local features and scale pool | |
CN102156995A (en) | Video movement foreground dividing method in moving camera | |
CN110827262B (en) | Weak and small target detection method based on continuous limited frame infrared image | |
CN112488057A (en) | Single-camera multi-target tracking method utilizing human head point positioning and joint point information | |
CN104050685A (en) | Moving target detection method based on particle filtering visual attention model | |
Li et al. | Object tracking in satellite videos: Correlation particle filter tracking method with motion estimation by Kalman filter | |
CN110569706A (en) | Deep integration target tracking algorithm based on time and space network | |
CN107609571A (en) | A kind of adaptive target tracking method based on LARK features | |
CN111027586A (en) | Target tracking method based on novel response map fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||