CN104766343B

CN104766343B - A visual target tracking method based on sparse representation

Info

Publication number: CN104766343B
Application number: CN201510142274.9A
Authority: CN
Inventors: 解梅; 张碧武; 何磊; 卜英家
Original assignee: University of Electronic Science and Technology of China
Current assignee: Houpu Clean Energy Group Co ltd
Priority date: 2015-03-27
Filing date: 2015-03-27
Publication date: 2017-08-25
Anticipated expiration: 2035-03-27
Also published as: CN104766343A

Abstract

The invention discloses a visual target tracking method based on sparse representation, in the technical field of computer vision. The method first uses the target image to build the judgment dictionary, the matching dictionary and the gray matrix T needed for subsequent tracking. Each frame to be tracked is then processed as follows. A first candidate image sample set is sampled, K representative cluster centers are selected from it by K-means clustering, and their confidence values are computed with the judgment dictionary. The sampling center is placed at the cluster center with the highest confidence to obtain a second candidate image sample set, from which the N candidate images with the highest confidence are selected. Each of these candidates is sampled fragment by fragment and, using the matching dictionary, its gray matrix is obtained; the candidate whose gray matrix is most similar to T is taken as the tracked target in the current frame. The invention is intended for intelligent surveillance and is highly robust to changes in target pose, changes in ambient illumination, and occlusion.

Description

A visual target tracking method based on sparse representation

Technical field

The invention belongs to the field of computer vision, in particular to intelligent surveillance, and more particularly relates to visual target tracking based on sparse representation.

Background

Vision-based target tracking (generally, tracking a target in a video or image sequence) detects, extracts, recognizes and tracks a target object across a series of images to obtain its parameters, such as position, velocity, scale and trajectory. The tracking results can be further processed and analyzed to understand the behavior of the target object or to accomplish higher-level tasks. This rapidly growing research area within computer vision has broad application prospects and scientific value. Over the past two to three decades, many tracking algorithms have been proposed in China and abroad, including several classical algorithms and innovative algorithms built on them.

According to their underlying principles, tracking algorithms can be divided into discriminative-model methods and generative-model methods. Discriminative methods treat tracking as a classification problem; in single-target tracking in particular, it becomes a binary classification problem. Classical classifiers include the SVM (Support Vector Machine) and AdaBoost classifiers, and recent popular classifier-based algorithms include CSK, Struck and CT. In generative methods, the goal of tracking is to find the region in a neighborhood that is most similar to the target, that is, the state that maximizes the similarity function with the target, which is the optimal estimate of the target state. Effective classical algorithms of this kind include Kalman filtering and particle filtering, and recent popular ones include MTT and LSK.

Sparse tracking algorithms, a class of generative-model methods, have developed rapidly in recent years. Because they derive the final result from a series of templates, they are fairly robust to illumination changes, complex environments and pose changes; see, for example, X. Mei and H. Ling, "Robust visual tracking using L1 minimization," 12th International Conference on Computer Vision, Kyoto, Japan, 2009, pp. 1436-1443. However, their templates usually use the whole target as the feature, so they handle target occlusion poorly.

Most of the above algorithms can track robustly in real time in simple scenes, but tracking a target in unconstrained everyday video remains a major challenge: illumination, scale changes, occlusion, target rotation and complex backgrounds all make tracking difficult. Designing a more robust tracking algorithm that performs well under harsher conditions is therefore an important problem in computer vision.

Summary of the invention

The invention proposes a visual tracking method based on sparse representation that is highly robust to changes in target pose, changes in ambient illumination and occlusion, and that improves the handling of target occlusion.

The method has two parts. First, the judgment dictionary D, the matching dictionary A and the gray matrix T of the tracked target are constructed. Then, in each current frame, D, A and T are used to find, among the candidate images, the one most similar to the tracked target, which gives the target image of the currently tracked target. The detailed procedure is as follows:

Step 1: Determine the initialization image and its target image. Usually the first frame of the video or image sequence is taken as the initialization image, and its target image (an image block), i.e. the target template, is determined. The target image (target size and position coordinates in the image) can be obtained from a target position set manually in the initialization image.

Step 2: Generate the judgment dictionary D, the matching dictionary A and the target-template gray matrix T from the current target image:

In the initialization image, the rectangular region within P1 pixels of the target image center is taken as the sampling region of the foreground template (P1 is a preset value chosen empirically for the application, usually 1 to 10). The rectangular region within P2 pixels of the target image center, minus the foreground sampling region, is taken as the sampling region of the background template, i.e. a ring-shaped region surrounding the foreground sampling region (P2 is a preset value greater than P1, chosen empirically; the difference P2 - P1 is usually 1 to 5). The foreground sampling region is randomly sampled, and N_p sampled images of the same size as the target image are selected as the foreground template D₊ (N_p is a preset value chosen empirically, usually in [30, 60]). The background sampling region is randomly sampled, and N_n sampled images of the same size as the target image are selected as the background template D₋ (N_n is a preset value chosen empirically, usually in [200, 300]). The foreground template D₊ and the background template D₋ together form the judgment dictionary D.
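The two sampling regions can be illustrated with a short Python sketch. The helper `sample_template_centers` is hypothetical, not part of the patent; it assumes that "within P pixels of the target center" means a square neighborhood (Chebyshev distance max(|dx|, |dy|) ≤ P) and returns only crop centers, leaving the cropping of target-sized images to the caller.

```python
import random


def sample_template_centers(cx, cy, p1, p2, n_pos, n_neg, seed=0):
    """Hypothetical helper: crop centers for foreground and background templates.

    Assumption: "within P pixels" is read as Chebyshev distance <= P.
    Foreground centers: distance <= p1.
    Background centers: ring p1 < distance <= p2 (P2 region minus P1 region).
    """
    if not 0 < p1 < p2:
        raise ValueError("require 0 < p1 < p2")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    pos = [(cx + rng.randint(-p1, p1), cy + rng.randint(-p1, p1))
           for _ in range(n_pos)]
    neg = []
    while len(neg) < n_neg:
        dx, dy = rng.randint(-p2, p2), rng.randint(-p2, p2)
        if max(abs(dx), abs(dy)) > p1:  # reject offsets inside the foreground square
            neg.append((cx + dx, cy + dy))
    return pos, neg


# Each center would be used to crop an image of the target's size;
# the crops form D+ (foreground) and D- (background).
pos, neg = sample_template_centers(cx=100, cy=80, p1=4, p2=8, n_pos=40, n_neg=250)
print(len(pos), len(neg))  # 40 250
```

Rejection sampling keeps the background centers uniform over the ring without enumerating it; the example values respect the stated ranges (P1 = 4, P2 - P1 = 4, N_p = 40, N_n = 250).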

A preset rectangular sliding window of size w*h is slid left/right and up/down over the target image to sample it, with a constant horizontal sampling interval and a constant vertical sampling interval. The width w and height h are preset values chosen empirically for the application, usually relative to the target image size; for example, w can be 1/8, 1/16, 1/32 or 1/64 of the target image width, and h 1/8, 1/16, 1/32 or 1/64 of its height. The horizontal interval is usually w/2 and the vertical interval h/2, although other values such as w/4 and h/4 can be used, depending on the required granularity. This yields the fragment set of the target image. K fragments (the K representative fragments) are then selected from the fragment set by K-means clustering; they form the training set from which the matching dictionary A is obtained. Here K is the usual parameter of K-means clustering, a preset value chosen empirically, usually in [50, 100]; the K-means steps described below may use different values of K, also usually in [50, 100]. A gray matrix T is also set up to store the gray value of each fragment in the fragment set of the target image; the gray value of a fragment is the sum of the pixel values of all its points.
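The fragment sampling and the gray matrix T can be sketched in Python as follows. The function names are illustrative, images are plain lists of pixel rows, and w and h are assumed even so that the intervals w/2 and h/2 are whole pixels; the window stops at the last position where it still fits inside the image.

```python
def extract_fragments(image, w, h):
    """Slide a w*h window over `image` (list of rows) with horizontal step w//2
    and vertical step h//2; return a grid of fragments, each a flat pixel list."""
    H, W = len(image), len(image[0])
    sx, sy = max(w // 2, 1), max(h // 2, 1)
    grid = []
    for top in range(0, H - h + 1, sy):
        row = []
        for left in range(0, W - w + 1, sx):
            row.append([image[top + dy][left + dx]
                        for dy in range(h) for dx in range(w)])
        grid.append(row)
    return grid


def gray_matrix(grid):
    """Gray value of a fragment = sum of all its pixel values (matrix T)."""
    return [[sum(frag) for frag in row] for row in grid]


# Toy 16x16 "target image" whose pixel value equals its column index.
image = [[x for x in range(16)] for _ in range(16)]
T = gray_matrix(extract_fragments(image, w=4, h=4))
print(len(T), len(T[0]))  # 7 7, i.e. (16/(4/2) - 1) * (16/(4/2) - 1) fragments
print(T[0][0], T[0][1])   # 24 56
```

With a step of half the window, neighboring fragments overlap by 50 %, and the grid size matches the count [W/(w/2) - 1] * [H/(h/2) - 1] given in the embodiment whenever W and H are multiples of w/2 and h/2.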

Step 3: Using the constructed judgment dictionary D, matching dictionary A and gray matrix T, determine the target position in each frame following the initialization image, i.e., perform target tracking on every frame other than the first:

301: Candidate particle sampling and candidate particle pre-selection

First, candidate particles are sampled. The current frame is sampled to obtain the first candidate image sample set (this step can use conventional techniques and is not limited by the invention; particle filtering is commonly used, and the number of sampled particles depends on the situation, typically around 500). K cluster centers are then chosen from the first candidate sample set by K-means clustering, and the confidence value H_c of each cluster center is computed as follows:

The corresponding sparse coefficient is computed by formula (1):

$$\alpha = \arg\min_{\alpha} \|X - D\alpha\|_2^2 + \mu\|\alpha\|_1 \qquad (1)$$

Wherein D (refers specifically to foreground template D for judgement dictionary₊(it can also claim foreground template dictionary D₊) and background template D_-( Background template dictionary D can be claimed_-)), X is some sample (referring specifically to each candidate image in the present invention), can be asked by L1 optimization problems Sparse coefficient α is obtained, wherein μ is the predetermined coefficient of rarefaction representation formula, and occurrence is rule of thumb set according to different experimental situations It is fixed.‖.‖₁、‖.‖₂1 norm, 2 norms of correspondence vector are represented respectively.

A sample with a small reconstruction error on the foreground template set is likely to be the target; conversely, a sample with a small reconstruction error on the background template set is likely to be background. The confidence value H_c of a sample can therefore be constructed from its reconstruction errors on the foreground and background template sets:

$$H_c = \exp\big(-(\varepsilon_f - \varepsilon_b)\big) \qquad (2)$$

In the above formula, $\varepsilon_f = \|X - D_+\alpha_+\|_2^2$ is the reconstruction error of the sample (candidate image) X on the foreground template D₊, where α₊ is the sparse coefficient of the candidate image X corresponding to a cluster center with respect to D₊ (solved by formula (1)); $\varepsilon_b = \|X - D_-\alpha_-\|_2^2$ is the reconstruction error of X on the background template D₋, where α₋ is the sparse coefficient of X with respect to the background template D₋ (solved by formula (1)).
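Formula (2) and the two reconstruction errors can be checked with a toy Python sketch. The coefficients are assumed to have been obtained beforehand from formula (1), and dictionaries are given as lists of column vectors; the names are illustrative.

```python
import math


def reconstruction_error(x, atoms, coeffs):
    """||x - D coeffs||_2^2 with D given by its columns `atoms`."""
    err = 0.0
    for k, xk in enumerate(x):
        approx = sum(c * col[k] for c, col in zip(coeffs, atoms))
        err += (xk - approx) ** 2
    return err


def confidence(x, pos_atoms, alpha_pos, neg_atoms, alpha_neg):
    """Formula (2): H_c = exp(-(eps_f - eps_b))."""
    eps_f = reconstruction_error(x, pos_atoms, alpha_pos)  # error on D+
    eps_b = reconstruction_error(x, neg_atoms, alpha_neg)  # error on D-
    return math.exp(-(eps_f - eps_b))


# Toy 2-D example: D+ holds [1, 0] and D- holds [0, 1].
target_like = confidence([1.0, 0.0], [[1.0, 0.0]], [1.0], [[0.0, 1.0]], [0.0])
background_like = confidence([0.0, 1.0], [[1.0, 0.0]], [0.0], [[0.0, 1.0]], [1.0])
print(round(target_like, 4), round(background_like, 4))  # 2.7183 0.3679
```

A sample explained well by D₊ and poorly by D₋ gets H_c > 1, while the reverse gives H_c < 1, which is why the largest H_c marks the most target-like sample.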

Among the K cluster centers, the one with the largest confidence value is taken as the candidate image sampling center, and sampling around it yields the second candidate image sample set.

Next, the candidate particles are pre-selected. This step discards the many candidates with large deviations to further narrow the candidate range. Specifically, the confidence value of each candidate image in the second candidate image sample set is computed by formulas (1) and (2), and the N candidates with the highest confidence values (N is an empirical value, usually in [20, 60]) are kept as new candidate images for further processing. The single highest-confidence candidate is not selected directly because, under occlusion, it may not be the best candidate; the best candidate can only be selected after the occlusion handling described below.
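The two-stage selection of step 301 can be summarized by a small Python sketch. `two_stage_select` is a hypothetical helper that takes the confidence computation and the resampling step as callables; in the toy usage, states are (x, y) positions, a made-up confidence peaking at (50, 40) stands in for H_c, and Gaussian perturbation is an assumed stand-in for particle-filter resampling.

```python
import random


def two_stage_select(cluster_centers, confidence, resample, n_top):
    """Sketch of step 301.

    cluster_centers: states chosen by K-means from the first sample set.
    confidence: callable returning H_c for a state (formulas (1) and (2)).
    resample: callable drawing the second candidate set around a state.
    Returns the n_top states of the second set with the highest confidence.
    """
    center = max(cluster_centers, key=confidence)   # best cluster center
    second_set = resample(center)                    # second candidate set
    return sorted(second_set, key=confidence, reverse=True)[:n_top]


def toy_confidence(s):
    # Hypothetical stand-in for H_c: highest at (50, 40).
    return -((s[0] - 50) ** 2 + (s[1] - 40) ** 2)


rng = random.Random(1)


def toy_resample(s, n=200, sigma=3.0):
    # Gaussian perturbation, an assumed stand-in for particle-filter sampling.
    return [(s[0] + rng.gauss(0, sigma), s[1] + rng.gauss(0, sigma)) for _ in range(n)]


picked = two_stage_select([(30, 30), (48, 41), (70, 20)], toy_confidence,
                          toy_resample, n_top=20)
print(len(picked))  # 20
```

Keeping the top N rather than the single best candidate leaves room for the occlusion handling to overrule the confidence ranking.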

During subsequent tracking, the target may be occluded by the background. Without occlusion handling, the tracker may mistake the target for background and produce wrong results. Occlusion handling is therefore performed so that the invention is also robust to occlusion. It works as follows:

First, each new candidate image is sampled into fragments with a sliding window (the coordinates of each fragment must be recorded), in the same way the target image is sampled in step 2; the sampled fragments cover every part of the candidate image. For each fragment, the corresponding sparse coefficient β_i is computed from the matching dictionary A with formula (1), and the reconstruction error of each fragment y_i is computed as $\varepsilon_i = \|y_i - A\beta_i\|_2^2$ (the subscript i identifies the fragment). Fragments belonging to the background have large reconstruction errors and fragments belonging to the foreground have small ones, so a threshold ε₀ (usually in [0.3, 0.6]) is set and each fragment of the candidate image is classified in turn: if ε_i > ε₀, the fragment belongs to the background; if ε_i ≤ ε₀, it belongs to the foreground. An attribute matrix recording the attribute of each fragment is then set up: the value at a fragment's position is 1 if the fragment belongs to the foreground and 0 if it belongs to the background, giving a 0-1 matrix for each candidate. (For example, if the target image or a candidate image contains m*n fragments, with m fragments per row and n fragments per column, the attribute matrix is an m*n two-dimensional matrix. It may also be stored row by row or column by column as a vector, provided the target image and the candidate images use the same storage layout so that they can be matched.) In the same way, a 0-1 matrix is built for the target in the initialization image (the first frame) as the target template (the target-template attribute matrix). Each candidate is then matched against the template, and the candidate image with the highest matching degree is the target image of the current frame. If tracking is done by matching the attribute matrix of the target image directly against those of the candidate images, the target-template gray matrix T need not be generated in step 2; only the target attribute matrix of the target image described above is generated.

The specific procedure is as follows:

For each new candidate image c, a w*h rectangular sliding window is first set up and slid over candidate image c to obtain its fragment set. Using the matching dictionary A (also called the sparse dictionary A), the sparse coefficient vector β_i of each fragment y_i is obtained by solving formula (3):

$$\beta_i = \arg\min_{\beta_i} \|y_i - A\beta_i\|_2^2 + \mu\|\beta_i\|_1 \qquad (3)$$

That is, the judgment dictionary in formula (1) is replaced by the matching dictionary A, and the sample is each fragment.

The reconstruction error of each fragment y_i is then computed as $\varepsilon_i = \|y_i - A\beta_i\|_2^2$, the attribute value of each fragment is set according to whether ε_i exceeds the preset threshold ε₀, and the attribute values of the fragments are recorded in order in the attribute matrix S_c.
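Building S_c from the fragment reconstruction errors can be sketched as below. `attribute_matrix` is a hypothetical helper that takes the solver for formula (3) as a callable; in the toy usage the matching dictionary has orthonormal atoms, and plain projection onto them (the unregularized least-squares code) stands in for the sparse solver.

```python
def reconstruction_error(y, atoms, beta):
    """eps_i = ||y_i - A beta_i||_2^2, with A given by its columns `atoms`."""
    return sum((yk - sum(b * col[k] for b, col in zip(beta, atoms))) ** 2
               for k, yk in enumerate(y))


def attribute_matrix(fragment_grid, atoms, code_fn, eps0):
    """Build S_c: 1 where a fragment is foreground (eps_i <= eps0), else 0.

    code_fn(y, atoms) must return beta_i for fragment y (formula (3)).
    """
    return [[1 if reconstruction_error(y, atoms, code_fn(y, atoms)) <= eps0 else 0
             for y in row]
            for row in fragment_grid]


def project(y, atoms):
    # Stand-in for formula (3): exact least squares for orthonormal atoms.
    return [sum(a * b for a, b in zip(col, y)) for col in atoms]


A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]        # toy matching dictionary
grid = [[[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]],    # well explained by A
        [[0.0, 0.0, 1.0], [0.5, 0.5, 0.3]]]    # first has a large third component
S_c = attribute_matrix(grid, A, project, eps0=0.3)
print(S_c)  # [[1, 1], [0, 1]]
```

The fragment [0.0, 0.0, 1.0] cannot be represented by A at all (error 1.0 > ε₀), so it is marked as background, which is how an occluding object is expected to show up.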

If tracking is done by matching the attribute matrix of the target image directly against those of the candidate images, each of the N attribute matrices S_c is matched against the target attribute matrix S, and the candidate image corresponding to the S_c with the highest matching degree is the target image of the current frame.

The invention can also track the target in the current frame using the similarity between the gray matrix of the target and the gray matrices of the new candidate images. In this case, the target-template gray matrix T (which stores, in order, the gray value of each fragment of the target image) is first determined in step 2 from the target image of the initialization image. Then, during occlusion handling, a gray matrix F^c is set up for each new candidate image c to store the gray value of each of its fragments. To simplify computation, F^c is normalized using the attribute matrix S_c of the candidate: where an attribute value in S_c is 0, the corresponding entry of F^c is set to 0; where it is 1, the entry of F^c is the gray value of the corresponding fragment, i.e., the sum of the pixel values of all points in the fragment. Finally, the new candidate image c whose gray matrix F^c is most similar to the gray matrix T is taken as the target image of the current frame.
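The normalization of F^c by S_c amounts to masking the fragment sums, as in this short Python sketch (the function name is illustrative; fragments are flat lists of pixel values arranged in the same grid as S_c).

```python
def masked_gray_matrix(fragment_grid, S):
    """F^c: gray value (sum of pixels) of each fragment, set to 0 where S is 0."""
    return [[sum(frag) if s == 1 else 0 for frag, s in zip(frag_row, s_row)]
            for frag_row, s_row in zip(fragment_grid, S)]


grid = [[[10, 20], [30, 40]],
        [[5, 5], [1, 2]]]
S_c = [[1, 0],
       [1, 1]]
print(masked_gray_matrix(grid, S_c))  # [[30, 0], [10, 3]]
```

Fragments judged to be background, typically occluded parts, thus contribute nothing to the later comparison with T.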

Further, the similarity L_c between candidate image c and the target image can be computed by formula (4), in which M is the sum of the gray values of all points of the target image (i.e., of the target-template gray matrix T) and is used to normalize the result, F^c_ij denotes each element of the gray matrix F^c of candidate image c, and T_ij denotes each element of the target-template gray matrix T. The similarity obtained this way lies in [0, 1]. The candidate with the largest similarity L_c among all candidates matches the target image best, so that candidate image is taken as the tracking target (target image) of the current frame.
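Formula (4) itself is not reproduced in this text. The sketch below therefore assumes a histogram-intersection form, L_c = Σ_ij min(F^c_ij, T_ij) / M, which is one form consistent with the stated properties (normalization by M, values in [0, 1] for non-negative gray values); it should not be read as the exact formula of the invention.

```python
def similarity(F, T):
    """ASSUMED form of formula (4): histogram intersection normalized by M.

    L_c = sum_ij min(F_ij, T_ij) / M, with M = sum_ij T_ij.
    """
    M = sum(sum(row) for row in T)
    inter = sum(min(f, t) for f_row, t_row in zip(F, T) for f, t in zip(f_row, t_row))
    return inter / M


def best_candidate(gray_matrices, T):
    """Index of the candidate whose F^c is most similar to T."""
    return max(range(len(gray_matrices)), key=lambda c: similarity(gray_matrices[c], T))


T = [[30, 70], [10, 3]]
candidates = [[[30, 0], [10, 3]],    # one fragment masked to 0 (occluded)
              [[28, 65], [12, 3]]]   # close to the template everywhere
print([round(similarity(F, T), 3) for F in candidates])  # [0.381, 0.938]
print(best_candidate(candidates, T))  # 1
```

Under this assumed form, a masked fragment simply loses its share of M, so a candidate is penalized in proportion to how much of the template's gray mass it fails to match.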

To preserve the real-time nature of tracking, an update step is performed after the target image of the current frame has been determined. The update is performed as follows:

Based on the target image of the current frame, the foreground template D₊, the background template D₋, the matching dictionary A, and the target-template attribute matrix S or the target-template gray matrix T are updated according to step 2;

Or, based on the target image of the current frame, the matching dictionary A and the gray matrix T are updated according to step 2; in addition, every 5-10 frames, the sampling region of the background template around the target image of the current frame is randomly sampled according to step 2, N′ sampled images of the same size as the target image are selected and added to the background template D₋, and the N′ sampled images in D₋ that are oldest relative to the current frame are deleted, where 1 ≤ N′ ≤ N_n. (As tracking proceeds, the tracking environment keeps changing: the tracked object changes little, but the background changes considerably, so the background template can be updated periodically.) Further, the target image of the current frame can also be added to the foreground template D₊, and the sampled image in D₊ that is oldest relative to the current frame deleted.
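The periodic background-template update can be sketched with a bounded deque. The class name, the period of 5 frames and the labeled placeholder "images" are illustrative assumptions; the sampling step is passed in as a callable.

```python
from collections import deque


class BackgroundTemplates:
    """Sketch of the periodic update of the background template D-.

    Templates are stored oldest-first in a deque whose maxlen is N_n, so
    appending a new sample to the full deque discards the oldest one.
    """

    def __init__(self, initial, period=5):
        self.templates = deque(initial, maxlen=len(initial))
        self.period = period  # update every `period` frames (5-10 in the text)

    def update(self, frame_idx, sample_fn, n_new):
        """Every `period` frames, add n_new (= N') fresh samples from sample_fn."""
        if not 1 <= n_new <= self.templates.maxlen:
            raise ValueError("require 1 <= N' <= N_n")
        if frame_idx % self.period == 0:
            self.templates.extend(sample_fn(n_new))


# Toy run with N_n = 6 labeled "images" and N' = 2.
bg = BackgroundTemplates([("init", i) for i in range(6)], period=5)
for frame in range(1, 11):
    bg.update(frame, lambda n: [(frame, j) for j in range(n)], n_new=2)
print(list(bg.templates))
# [('init', 4), ('init', 5), (5, 0), (5, 1), (10, 0), (10, 1)]
```

Because the deque is bounded at N_n, adding N′ new samples and deleting the N′ oldest happen in one operation, and the dictionary size stays constant.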

In summary, with the above technical solution, the invention is highly robust to changes in target pose, changes in ambient illumination and occlusion, and improves the handling of target occlusion.

Brief description of the drawings

The invention is described by way of embodiments with reference to the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the processing flow of an embodiment of the invention;

Fig. 2 is a schematic diagram of the sampling of the foreground template (positive template) and the background template (negative template) in an embodiment of the invention;

Fig. 3 is an example of fragment extraction in an embodiment of the invention.

Embodiment

To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments and the accompanying drawings.

Referring to Fig. 1, for the current frame, it is first determined whether it is the first frame. If it is, the judgment dictionary (foreground template and background template), the matching dictionary and the other information needed for tracking are computed from the target image information in the first frame (the position of the target in the image, its size (width W and height H), and so on, obtained in advance). Specifically, for the first frame, templates are first sampled in the sampling regions shown in Fig. 2, and the resulting positive and negative templates are used as the training set. The positive templates are images sampled randomly by translating the target position (marked by the central rectangle) up, down, left and right by 1-5 pixels; N_p cluster centers are then selected by K-means clustering as the positive templates (foreground template). The negative templates are images sampled at positions relatively far from the marked position, i.e. in a ring-shaped region at a certain distance from the target center, likewise normalized to size W*H; N_n of them are also selected by K-means clustering and used as the negative templates (background template).

The matching dictionary A and the gray matrix T are also computed in the first frame, from the fragments of the target template. A w*h rectangular sliding window is slid left/right and up/down over the target image, with a horizontal sampling interval of w/2 and a vertical sampling interval of h/2. If the target image has height H and width W, the number of fragments is [W/(w/2) - 1] * [H/(h/2) - 1] (8*8 in the example of Fig. 3). K-means clustering is first applied to these fragments to obtain the K most representative fragments, which are stacked as column vectors to form the training set, and the matching dictionary A is generated from it by an approximation method (e.g., convex relaxation). Each entry of the target-template gray matrix T = [T_ij] is the gray value of one local fragment of the target image.
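Selecting the K representative fragments can be sketched with plain Lloyd's K-means in Python. Two points are assumptions: each selected fragment is taken to be the actual fragment nearest to a cluster center, and the selected fragments are used directly as the columns of A, without the approximation (convex relaxation) step mentioned above.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on equal-length vectors; returns k centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def representative_fragments(fragments, k, seed=0):
    """For each K-means center, return the closest actual fragment."""
    centers = kmeans(fragments, k, seed=seed)
    return [min(fragments, key=lambda f: sq_dist(f, c)) for c in centers]


# Each fragment is a flattened pixel vector; the selected fragments are the
# columns of the training matrix (used directly as the atoms of A here).
fragments = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
             [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
A_columns = representative_fragments(fragments, k=2)
print(sorted(A_columns))  # [[0.0, 0.0], [5.0, 5.0]]
```

Returning real fragments rather than cluster means keeps every atom of A an actual piece of the target image; the same routine could pick the N_p and N_n templates from their sampled sets.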

If the current frame is not the first frame, the target position in the current frame must be estimated (i.e., the target image of the current frame must be determined). The current frame is first sampled by particle filtering to obtain a number of candidate images as the first candidate image sample set. K cluster centers are then selected by K-means clustering, the confidence value of each cluster center is computed by formulas (1) and (2), and the cluster center with the largest confidence value is taken as the candidate sampling center. A number of candidate images are then sampled again around this center by particle filtering to form the second candidate image sample set. The confidence value of each candidate image in the second set is computed, and the 20 candidates with the highest confidence values are taken as new candidate images.

Each of the 20 new candidate images is divided into fragments in the same way as in the first frame. For each fragment, the reconstruction error ε_i is computed using the matching dictionary A: if ε_i > ε₀, the fragment belongs to the background and its value in the attribute matrix S_c is 0; if ε_i ≤ ε₀, it belongs to the foreground and the value at its position in S_c is set to 1. Repeating this for every fragment gives the attribute matrix S_c of each candidate image. The gray matrix F^c = [F^c_ij] of each candidate image is then computed in the same way as the gray matrix T in the first frame, except that the attribute matrix S_c is also applied: where the attribute value of a point in S_c is 0, the normalized gray value of the corresponding point in the gray matrix is also 0, and where it is 1, the normalized gray value is unchanged. Referring to Fig. 3 and taking 8*8 fragments as an example, the attribute matrix S_c, the target-template gray matrix T and the gray matrix F^c of a new candidate image are all 8*8 two-dimensional matrices. If the fragment at (3,4) has attribute value 0, then S_34 in S_c is 0 and F_34 in F^c is also 0; if the fragment at (4,2) has attribute value 1, then S_42 in S_c is 1 and F_42 in F^c is the sum of the pixel values of all points in fragment (4,2).

Finally, the similarity between each candidate and the template is computed by formula (4). The candidate image with the highest similarity is taken as the target image of the current frame and is used to update the foreground template, the background template, the matching dictionary A and the target-template gray matrix T.

Claims

1. A visual target tracking method based on sparse representation, characterized by comprising the following steps:

Step 1: determining an initialization image and the target image of the initialization image;

Step 2: generating, from the current target image, a foreground template D₊, a background template D₋, a matching dictionary A and a target-template gray matrix T:

in the initialization image, determining the rectangular region within P1 pixels of the target image center as the sampling region of the foreground template, and determining the rectangular region within P2 pixels of the target image center, minus the sampling region of the foreground template, as the sampling region of the background template, wherein P1 and P2 are preset values and P2 is greater than P1; randomly sampling the sampling region of the foreground template and selecting N_p sampled images of the same size as the target image as the foreground template D₊; randomly sampling the sampling region of the background template and selecting N_n sampled images of the same size as the target image as the background template D₋;

sliding a preset rectangular window left/right and up/down over the target image to sample it, with equal vertical sampling intervals and equal horizontal sampling intervals, to obtain the fragment set of the target image; selecting K fragments from the fragment set by K-means clustering, forming a training set from the K fragments and obtaining the matching dictionary A; computing, based on the matching dictionary A, the sparse coefficient β_i of each fragment y_i of the fragment set, the subscript i identifying different fragments, and computing the reconstruction error of fragment y_i as $\varepsilon_i = \|y_i - A\beta_i\|_2^2$; if ε_i is greater than a preset threshold ε₀, setting the attribute value of fragment y_i to 0, and otherwise to 1;

setting up a target-template gray matrix T that stores, in order, the gray value of each fragment of the fragment set of the target image, the gray value of each fragment in T being the sum of the pixel values of all points of the fragment;

Step 3: determining the target position in each frame following the initialization image:

301: sampling the current frame to obtain a first candidate image sample set; selecting K cluster centers from the first candidate sample set by K-means clustering and computing the confidence value H_c of each cluster center; taking the cluster center with the largest confidence value among the K cluster centers as the candidate image sampling center, sampling around it to obtain a second candidate image sample set, computing the confidence value H_c of each candidate sample in the second candidate sample set, and taking the N candidate images with the highest confidence values as new candidate images; each image sample in the first and second candidate image sample sets having the same size as the target image;

the confidence value of each cluster center being computed as $H_c = \exp(-(\varepsilon_f - \varepsilon_b))$, with the foreground reconstruction error $\varepsilon_f = \|X - D_+\alpha_+\|_2^2$, where α₊ is the sparse coefficient of the candidate image X corresponding to the cluster center with respect to the foreground template D₊, and the background reconstruction error $\varepsilon_b = \|X - D_-\alpha_-\|_2^2$, where α₋ is the sparse coefficient of the candidate image X corresponding to the cluster center with respect to the background template D₋;

302: obtaining the fragment set of each new candidate image c by the same sliding-window sampling used in step 2 to obtain the fragment set of the target image; computing, based on the matching dictionary A, the sparse coefficient β_i of each fragment y_i, the subscript i identifying different fragments; computing the reconstruction error of fragment y_i as $\varepsilon_i = \|y_i - A\beta_i\|_2^2$; and, if ε_i is greater than the preset threshold ε₀, setting the attribute value of fragment y_i to 0, and otherwise to 1;

303: setting up an attribute matrix S_c that stores, in order, the attribute of each fragment of the new candidate image c, and a gray matrix F^c that stores, in order, the gray value of each fragment of c, the gray value of a fragment being the sum of the pixel values of all its points; and adjusting the values of F^c based on S_c: if an attribute value in S_c is 0, setting the value at the corresponding position of F^c to 0;

304: taking the new candidate image c whose gray matrix F^c is most similar to the target-template gray matrix T as the target image of the current frame.

2. The method according to claim 1, characterized in that, in step 304, the similarity L_c between the target-template gray matrix T and the gray matrix F^c is computed by a formula in which M denotes the sum of all elements of the gray matrix T, and i and j denote element positions in the matrices.

3. The method according to claim 1 or 2, characterized in that step 3 further comprises 305: based on the target image of the current frame, updating the foreground template D₊, the background template D₋, the matching dictionary A and the target-template gray matrix T according to step 2.

4. The method according to claim 1 or 2, characterized in that step 3 further comprises 305: based on the target image of the current frame, updating the matching dictionary A and the gray matrix T according to step 2; and, every 5-10 frames, randomly sampling the sampling region of the background template according to step 2 based on the target image of the current frame, selecting N′ sampled images of the same size as the target image, adding them to the background template D₋, and deleting from D₋ the N′ sampled images that are oldest relative to the current frame, where 1 ≤ N′ ≤ N_n.

5. The method according to claim 4, characterized in that step 305 further comprises, every 5-10 frames, adding the target image of the current frame to the foreground template D₊ and deleting from D₊ the sampled image that is oldest relative to the current frame.

6. The method according to claim 1, characterized in that, in step 2, P1 is 1 to 10 and the difference between P2 and P1 is 1 to 5.

7. The method according to claim 1, characterized in that the initialization image is the first frame of a video or image sequence.

8. The method according to claim 1, characterized in that, when sampling with the preset rectangular sliding window, the horizontal sampling interval is w/2 and the vertical sampling interval is h/2, where w is the width and h the height of the rectangular sliding window.

9. The method according to claim 1, characterized in that, in step 301, the first candidate image sample set and the second candidate image sample set are obtained by particle-filter sampling.