Summary of the Invention
The present invention proposes a visual tracking method based on sparse representation that is highly robust to changes in the pose of the tracked target, changes in ambient illumination, and occlusion, and that improves the handling of target occlusion.
The present invention comprises two parts. First, a judgment dictionary D, a matching dictionary A and a gray matrix T are constructed for the tracked target. Then, based on the judgment dictionary D, the matching dictionary A and the gray matrix T, the candidate images in the current frame are searched for the candidate image most similar to the tracked target, which determines the target image of the tracked target in the current frame. The detailed process is as follows:
Step 1: Determine the initialization image and the target image of the initialization image. In this step, the first frame of the video or image sequence is generally taken as the initialization image, and the target image (an image block) of the initialization image, i.e., the target template, is determined. The target image can be obtained from a manually set initial position in the initialization image (the target size and its position coordinates in the image).
Step 2: Based on the current target image, generate the judgment dictionary D, the matching dictionary A and the target template gray attribute matrix T:
In the initialization image, the rectangular region within P1 pixels of the center of the target image is taken as the sample region of the foreground template, where P1 is a preset value set empirically for the processing environment, usually 1 to 10. The part of the rectangular region within P2 pixels of the center of the target image that lies outside the sample region of the foreground template is taken as the sample region of the background template, i.e., the hollow rectangular (ring-shaped) region surrounding the foreground sample region; P2 is a preset value greater than P1, set empirically for the processing environment, and the difference between P2 and P1 is usually 1 to 5. The sample region of the foreground template is sampled randomly, and N_p sampled images equal in size to the target image are selected as the foreground template D+, where N_p is a preset value set empirically for the processing environment, usually in the range [30, 60]. The sample region of the background template is sampled randomly, and N_n sampled images equal in size to the target image are selected as the background template D-, where N_n is a preset value set empirically for the processing environment, usually in the range [200, 300]. The foreground template D+ and the background template D- together constitute the judgment dictionary D;
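The template sampling described above can be sketched in Python as follows. This is a minimal illustration, assuming NumPy, a grayscale image stored as a 2-D array, and a target far enough from the image border that every crop stays inside the image. The function names `crop` and `sample_templates` are hypothetical, and storing each sampled image as one column of the dictionary is an assumption.

```python
import numpy as np

def crop(image, cx, cy, w, h):
    """Crop a w*h patch centred at (cx, cy); the caller keeps it inside the image."""
    x0, y0 = cx - w // 2, cy - h // 2
    return image[y0:y0 + h, x0:x0 + w]

def sample_templates(image, center, size, p1, p2, n_pos, n_neg, rng):
    """Sample foreground patches within p1 pixels of the centre and
    background patches in the ring between p1 and p2 pixels."""
    cx, cy = center
    w, h = size
    pos, neg = [], []
    while len(pos) < n_pos:
        dx, dy = rng.integers(-p1, p1 + 1, size=2)
        pos.append(crop(image, cx + dx, cy + dy, w, h).ravel())
    while len(neg) < n_neg:
        dx, dy = rng.integers(-p2, p2 + 1, size=2)
        # Keep only offsets outside the foreground sample region.
        if max(abs(dx), abs(dy)) > p1:
            neg.append(crop(image, cx + dx, cy + dy, w, h).ravel())
    d_pos = np.array(pos, dtype=float).T   # one template per column (assumed layout)
    d_neg = np.array(neg, dtype=float).T
    return d_pos, d_neg, np.hstack([d_pos, d_neg])
```

With `rng = np.random.default_rng(0)`, a call such as `sample_templates(img, (32, 32), (8, 8), 3, 6, 40, 250, rng)` would return D+, D- and the combined judgment dictionary D for an 8*8 target centred at (32, 32).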
Based on a preset rectangular sliding window of size w*h, sliding sampling is performed on the target image in the horizontal and vertical directions, with a constant horizontal sampling interval and a constant vertical sampling interval, to obtain the fragment set of the target image. The width w and height h are preset values set empirically for the processing environment, generally from the size of the target image; for example, w may be set to 1/8, 1/16, 1/32 or 1/64 of the target image width, and h to 1/8, 1/16, 1/32 or 1/64 of the target image height. The horizontal sampling interval may usually be set to w/2 and the vertical sampling interval to h/2; other settings such as w/4 and h/4 are also possible, depending on the required fineness of the calculation. K representative fragments are then selected from the fragment set by K-means clustering, and these K fragments form the training set from which the matching dictionary A is obtained. Here K denotes the number of clusters of the K-means method; it is a preset value set empirically for the processing environment, usually in the range [50, 100], and each use of K-means clustering described below may set its own value of K within that range. A gray matrix T is set up to store the gray value of each fragment of the fragment set of the target image, where the gray value of a fragment is the sum of the pixel values of all points in the fragment.
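The fragment sampling, the gray matrix T and the K-means selection can be sketched as follows. This is an illustrative sketch assuming NumPy; the function names are hypothetical. Two simplifications are assumptions rather than statements of the method: the cluster centers themselves are used as the K representative fragments, and they are used directly as the columns of the matching dictionary A without any further dictionary learning.

```python
import numpy as np

def extract_fragments(patch, w, h):
    """Slide a w*h window with steps w/2 and h/2; return the flattened
    fragments (row-major order) and the grid shape (rows, cols)."""
    step_x, step_y = max(w // 2, 1), max(h // 2, 1)
    height, width = patch.shape
    ys = range(0, height - h + 1, step_y)
    xs = range(0, width - w + 1, step_x)
    frags = np.array([patch[y:y + h, x:x + w].ravel() for y in ys for x in xs],
                     dtype=float)
    return frags, (len(ys), len(xs))

def gray_matrix(frags, grid):
    """Gray value of a fragment = sum of its pixel values."""
    return frags.sum(axis=1).reshape(grid)

def kmeans_centers(data, k, iters, rng):
    """Plain Lloyd iterations; returns k cluster centers (one per row)."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers
```

The matching dictionary would then be `A = kmeans_centers(frags, K, 10, rng).T`, with one representative fragment per column.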
Step 3: Based on the constructed judgment dictionary D, matching dictionary A and gray matrix T, determine the target position in each frame following the initialization image, i.e., perform target tracking on every frame other than the first:
301: Sampling and pre-selection of candidate particles
The first operation is the sampling of candidate particles. The current frame image is sampled to obtain the first candidate image sample set. This sampling is carried out directly with conventional techniques and is not limited by the present invention; it can generally be done by particle filtering, with the number of particles chosen according to the actual situation, generally about 500. K cluster centers are then selected from the first candidate sample set by K-means clustering, and the confidence value H_c of each cluster center is calculated:
The corresponding sparse coefficient is calculated according to formula (1):
$$\min_{\alpha} \; \|X - D\alpha\|_2^2 + \mu\|\alpha\|_1 \qquad (1)$$
where D is the judgment dictionary (specifically the foreground template D+, also called the foreground template dictionary D+, and the background template D-, also called the background template dictionary D-), X is a sample (in the present invention, a candidate image), and the sparse coefficient α is obtained by solving this L1 optimization problem. μ is the preset coefficient of the sparse representation formula, set empirically for the experimental environment. ‖·‖_1 and ‖·‖_2 denote the 1-norm and the 2-norm of a vector, respectively.
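One common way to solve an L1 optimization problem of this kind is the iterative shrinkage-thresholding algorithm (ISTA). The sketch below is illustrative only: it assumes NumPy, assumes that formula (1) takes the usual form min ‖X - Dα‖_2^2 + μ‖α‖_1, and uses ISTA as a stand-in solver, since the text does not name one. The function name `sparse_code` is hypothetical.

```python
import numpy as np

def sparse_code(D, x, mu, iters=500):
    """Minimise ||x - D a||_2^2 + mu * ||a||_1 over a using ISTA."""
    # Lipschitz constant of the gradient of the quadratic term.
    lipschitz = 2.0 * np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = 2.0 * D.T @ (D @ a - x)
        z = a - grad / lipschitz
        # Soft-thresholding is the proximal operator of the L1 term.
        a = np.sign(z) * np.maximum(np.abs(z) - mu / lipschitz, 0.0)
    return a
```

For an identity dictionary the minimiser is the soft-thresholded sample, so `sparse_code(np.eye(3), np.array([3.0, 0.1, -2.0]), 1.0)` returns approximately `[2.5, 0.0, -1.5]`. A larger μ drives more coefficients to zero.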
If a sample has a small reconstruction error on the foreground template set, the sample is likely to be the target; conversely, if a sample has a small reconstruction error on the background template set, the sample is likely to be background. The confidence value H_c of a sample can therefore be constructed from its reconstruction error on the foreground template set and its reconstruction error on the background template set:
$$H_c = \exp(-\varepsilon_f + \varepsilon_b) \qquad (2)$$
In the above formula, $\varepsilon_f = \|X - D^{+}\alpha^{+}\|_2^2$ is the reconstruction error of the sample (candidate image) X on the foreground template D+, where α+ is the sparse coefficient, solved according to formula (1), of the candidate image X corresponding to the cluster center with respect to the foreground template D+; $\varepsilon_b = \|X - D^{-}\alpha^{-}\|_2^2$ is the reconstruction error of the sample X on the background template D-, where α- is the sparse coefficient, solved according to formula (1), of the candidate image X corresponding to the cluster center with respect to the background template D-.
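As a worked illustration of formula (2), the sketch below computes H_c from already-solved sparse coefficients. It assumes NumPy and assumes that each reconstruction error is the squared 2-norm of the residual; the function name `confidence` is hypothetical.

```python
import numpy as np

def confidence(x, d_pos, d_neg, a_pos, a_neg):
    """H_c = exp(-eps_f + eps_b), with squared-residual reconstruction errors."""
    eps_f = np.sum((x - d_pos @ a_pos) ** 2)   # error on the foreground template
    eps_b = np.sum((x - d_neg @ a_neg) ** 2)   # error on the background template
    return float(np.exp(-eps_f + eps_b))
```

A sample that the foreground template reconstructs well and the background template reconstructs poorly gets H_c greater than 1, and the reverse case gets H_c less than 1.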
The cluster center with the largest confidence value among the K cluster centers is taken as the candidate image sampling center, and sampling is performed around this sampling center to obtain the second candidate image sample set.
The second operation is the pre-selection of candidate particles. Pre-selection removes a large number of candidate particles that deviate considerably from the target, further narrowing the candidate range. It proceeds as follows: for the second candidate image sample set, the confidence value of each candidate image is calculated according to formulas (1) and (2), and the N candidates with the highest confidence values are selected as the new candidate images for further processing below, where N is an empirical value usually in the range [20, 60]. The single candidate image with the highest confidence value is not selected directly here because, when the target is occluded, the candidate with the highest confidence value may not be the best candidate; the best candidate can only be selected after the occlusion processing described below.
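The pre-selection step amounts to a top-N selection on the confidence values, which can be sketched in plain Python as follows. The function name `preselect` is hypothetical.

```python
def preselect(confidences, n):
    """Return the indices of the n largest confidence values, best first."""
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    return order[:n]
```

For example, `preselect([0.1, 0.9, 0.5, 0.7], 2)` returns `[1, 3]`; the candidates at those indices would be passed on to the occlusion processing.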
In subsequent tracking the target may be occluded by the background. Without occlusion processing, the target may be treated as background during tracking, producing a wrong tracking result. Occlusion processing is therefore required so that the present invention is also robust to occlusion. The occlusion processing is as follows:
First, fragments are sampled from each new candidate image with a sliding window (the coordinate position of each fragment must be recorded), in the same way as the target image is sampled in Step 2, so that the sampled fragments cover every part of the candidate image. For each fragment, the corresponding sparse coefficient β_i is calculated from the matching dictionary A using formula (1), and the reconstruction error ε_i of each fragment y_i is then calculated as $\varepsilon_i = \|y_i - A\beta_i\|_2^2$ (the subscript i identifies the fragment). A fragment belonging to the background has a larger reconstruction error, and a fragment belonging to the foreground has a smaller one. A threshold ε_0 (usually in the range [0.3, 0.6]) is set and used to judge the attribute of each fragment of the candidate image in turn: when ε_i > ε_0, the fragment belongs to the background; when ε_i ≤ ε_0, the fragment belongs to the foreground. An attribute matrix is then set up to record the attribute of each fragment in order. For example, if the total number of fragments of the target image or of a candidate image is m*n, where m is the number of fragments per row and n the number of fragments per column, the attribute matrix is a two-dimensional matrix of order m*n; the attributes may of course also be stored in order as a row vector or a column vector, provided that the target image and the candidate images use the same storage layout so that they can be matched. When a fragment belongs to the foreground, the value at the corresponding position of the matrix is set to 1; when it belongs to the background, the value is set to 0. A 0-1 matrix is thus set up for each candidate. In the same way, a 0-1 matrix is set up for the target in the initialization image (the first frame) as the target template (the target template attribute matrix). Each candidate is then matched against the template, and the candidate image with the highest matching degree is the target image of the current frame. If the target is tracked by directly matching the attribute matrix of the target image against the attribute matrices of the candidate images, the target template gray attribute matrix T need not be generated in Step 2; only the target attribute matrix of the target image is generated, as described above.
The specific implementation is as follows:
For each new candidate image c, a rectangular sliding window of size w*h is first set, and sliding sampling is performed on the candidate image c with this window to obtain the fragment set of the candidate image c. Based on the matching dictionary A (also called the sparse dictionary A), the sparse coefficient vector β_i of each fragment y_i is obtained by solving formula (3):
$$\min_{\beta_i} \; \|y_i - A\beta_i\|_2^2 + \mu\|\beta_i\|_1 \qquad (3)$$
That is, the judgment dictionary in formula (1) is replaced by the matching dictionary A, and the sample is each fragment.
The reconstruction error ε_i of each fragment y_i is then calculated as $\varepsilon_i = \|y_i - A\beta_i\|_2^2$, the attribute value of each fragment is set according to the relation between the reconstruction error ε_i and the preset threshold ε_0, and the attribute values of the fragments are recorded in order in the attribute matrix S_c.
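The thresholding that produces the attribute matrix can be sketched as follows. The sketch assumes NumPy and assumes the reconstruction errors are listed in the same row-major order in which the fragments were sampled; the function name `attribute_matrix` is hypothetical.

```python
import numpy as np

def attribute_matrix(errors, grid, eps0):
    """1 where eps_i <= eps0 (foreground), 0 where eps_i > eps0 (background)."""
    return (np.asarray(errors, dtype=float).reshape(grid) <= eps0).astype(int)
```

For example, errors `[0.1, 0.5, 0.4, 0.9]` on a 2*2 grid with ε_0 = 0.4 give the matrix `[[1, 0], [1, 0]]`.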
If the target is tracked by directly matching the attribute matrix of the target image against the attribute matrices of the candidate images, the N attribute matrices S_c are each matched against the target attribute matrix S, and the candidate image corresponding to the attribute matrix S_c with the highest matching degree is taken as the target image of the current frame.
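The text does not define how the matching degree between two attribute matrices is measured. Purely as an assumption for illustration, the sketch below counts the positions at which a candidate's attribute matrix agrees with the target's and picks the candidate with the most agreements. It assumes NumPy, and the function name `best_match` is hypothetical.

```python
import numpy as np

def best_match(candidate_attrs, target_attr):
    """Return (index of best candidate, list of agreement counts).
    Matching degree here = number of equal entries (an assumed measure)."""
    scores = [int(np.sum(s == target_attr)) for s in candidate_attrs]
    return int(np.argmax(scores)), scores
```

Other measures, such as counting only the positions where both matrices hold 1, would fit the description equally well.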
The present invention can also track the target image of the current frame based on the similarity between the gray matrix of the target and the gray matrices of the new candidate images. Specifically, the target template gray attribute matrix T is first determined in Step 2 from the target image of the initialization image (it stores the gray value of each fragment of the target image in order). Then, in the occlusion processing, a gray matrix F_c is set up for each new candidate image c to store the gray value of each fragment of that candidate image. To simplify the calculation, the gray matrix F_c of each candidate image is normalized using its attribute matrix S_c: when an attribute value in S_c is 0, the value at the corresponding position of F_c is set to 0; when an attribute value in S_c is 1, the value at the corresponding position of F_c is the gray value of the corresponding fragment, i.e., the sum of the pixel values of all points in the fragment. Finally, the new candidate image c whose gray matrix F_c is most similar to the gray matrix T is taken as the target image of the current frame.
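The masking of the gray matrix by the attribute matrix can be sketched as follows, assuming NumPy and a matrix of per-fragment pixel sums laid out like the attribute matrix; the function name `masked_gray_matrix` is hypothetical.

```python
import numpy as np

def masked_gray_matrix(frag_sums, attr):
    """Keep the fragment gray value where the attribute is 1, else set 0."""
    return np.where(np.asarray(attr) == 1, np.asarray(frag_sums, dtype=float), 0.0)
```

For example, fragment sums `[[10, 20], [30, 40]]` with attributes `[[1, 0], [0, 1]]` give F_c = `[[10, 0], [0, 40]]`, so fragments judged to be background (occluded) contribute nothing to the later comparison with T.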
Further, the similarity L_c between a candidate image and the target image can be calculated according to formula (4), in which M is the sum of the gray values of all points of the target image (the target template gray attribute matrix T) and normalizes the calculated value, F_ij denotes an element of the gray matrix F_c of the candidate image c, and T_ij denotes an element of the target template gray attribute matrix T. The similarity obtained from this formula lies in the range [0, 1]. The candidate with the largest similarity L_c among all candidates is found; this candidate image has the highest matching degree with the target image and can therefore be taken as the tracked target (target image) of the current frame.
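Formula (4) is not reproduced in the text, so the sketch below is an assumption: it uses a normalized intersection, L_c = (1/M) Σ_ij min(F_ij, T_ij), which is one measure consistent with the stated properties (elements F_ij and T_ij, normalization by M, and a result in [0, 1]). It assumes NumPy and non-negative gray values, and the function name `similarity` is hypothetical.

```python
import numpy as np

def similarity(f_c, t):
    """Assumed form of the similarity: sum of element-wise minima divided by M."""
    f_c = np.asarray(f_c, dtype=float)
    t = np.asarray(t, dtype=float)
    m = t.sum()                      # M: total gray value of the target template
    return float(np.minimum(f_c, t).sum() / m)
```

With T = `[[10, 20], [30, 40]]` and F_c = `[[10, 0], [0, 50]]`, the element-wise minima sum to 50 and M is 100, giving L_c = 0.5. The tracked target would be the candidate with the largest L_c.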
To maintain the real-time nature of the tracking process, update processing is performed after the target image of the current frame has been determined. The specific update processing is:
Based on the target image of the current frame, the foreground template D+, the background template D-, the matching dictionary A, and the target template attribute matrix S or the target template gray attribute matrix T are updated according to Step 2;
Alternatively, based on the target image of the current frame, the matching dictionary A and the gray matrix T are updated according to Step 2; and, every 5 to 10 frames, based on the target image of the current frame, the sample region of the background template is sampled randomly according to Step 2, N' sampled images equal in size to the target image are selected and added to the background template D-, and the N' sampled images in the background template D- that are oldest relative to the current frame are deleted, where 1 ≤ N' ≤ N_n. (During subsequent tracking the environment changes continually: the tracked object changes little but the background changes greatly, so the background template can be updated periodically.) Further, the target image of the current frame may also be added to the foreground template D+, and the sampled image in the foreground template D+ that is oldest relative to the current frame deleted.
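The periodic background update behaves like a first-in, first-out buffer, which can be sketched as follows. The sketch assumes NumPy and assumes that the columns of D- are kept ordered from oldest to newest; the function names `update_background` and `should_update` and the default interval of 5 frames (taken from the lower end of the stated 5 to 10 frame range) are illustrative choices.

```python
import numpy as np

def update_background(d_neg, new_samples):
    """Drop the N' oldest columns of d_neg and append the N' new samples."""
    n_new = new_samples.shape[1]
    if not 1 <= n_new <= d_neg.shape[1]:
        raise ValueError("N' must satisfy 1 <= N' <= N_n")
    return np.hstack([d_neg[:, n_new:], new_samples])

def should_update(frame_index, interval=5):
    """True on every interval-th frame."""
    return frame_index % interval == 0
```

The size of D- stays constant at N_n columns, so the cost of solving formula (1) does not grow as tracking proceeds.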
In summary, by adopting the above technical solution, the beneficial effects of the present invention are as follows: it is highly robust to changes in the pose of the tracked target, changes in ambient illumination, and occlusion, and it improves the handling of target occlusion.
Detailed Description of the Embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to an embodiment and the accompanying drawings.
Referring to Fig. 1, for the current frame image it is first determined whether the current frame is the first frame. If it is, the judgment dictionary (foreground template and background template), the matching dictionary and the other information needed for tracking are computed from the target image information in the first frame (the position of the target in the image and its size W*H, obtained in advance). Specifically, for the first frame, template sampling is first performed in the sample regions shown in Fig. 2, and the resulting positive and negative templates are used as the training set. For the positive templates, a number of images are randomly sampled by shifting the target position, marked by the central rectangular box, up, down, left and right by 1 to 5 pixels, and N_p cluster centers are then selected by K-means clustering as the positive templates (foreground template). For the negative templates, a number of images are sampled at positions farther from the marked position, i.e., in a ring-shaped region at a certain distance from the target center, with their size likewise normalized to W*H; N_n images are then selected by K-means clustering, and these sampled images are used as the negative templates (background template).
The matching dictionary A and the gray matrix T must also be computed in the first frame, from the fragments of the target template. The method is as follows: a rectangular sliding window of size w*h is first set, and sliding sampling is performed on the target image in the horizontal and vertical directions with this window, with a horizontal sampling interval of w/2 and a vertical sampling interval of h/2. If the height and width of the target image are H and W, respectively, the number of samples is [W/(w/2) - 1] * [H/(h/2) - 1] (as shown in Fig. 3, where the number of samples is 8*8). The K most representative fragments are likewise first obtained from these fragments by K-means clustering; each is then stacked row by row into a column vector to form the training set, from which the matching dictionary A is generated by an approximation method (such as a convex relaxation method). Each element T_ij of the target template gray attribute matrix T corresponds to the gray value of one local fragment of the target image.
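The sample count [W/(w/2) - 1] * [H/(h/2) - 1] can be checked with a short worked example in Python. The 36*36 target and 8*8 window below are hypothetical sizes chosen only because they reproduce an 8*8 grid; the sketch also assumes that W is a multiple of w/2 and H a multiple of h/2. The function names are hypothetical.

```python
def fragment_grid(W, H, w, h):
    """Closed-form count: (W / (w/2) - 1) columns by (H / (h/2) - 1) rows."""
    return (2 * W) // w - 1, (2 * H) // h - 1

def fragment_grid_by_enumeration(W, H, w, h):
    """Count window positions directly, stepping by w/2 and h/2."""
    cols = len(range(0, W - w + 1, w // 2))
    rows = len(range(0, H - h + 1, h // 2))
    return cols, rows
```

Both functions return `(8, 8)` for `W = H = 36` and `w = h = 8`: the window can start at x = 0, 4, ..., 28, which is 36/4 - 1 = 8 positions per row.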
If the current frame is not the first frame, the target position in the current frame must be estimated (i.e., the target image of the current frame must be determined). The current frame image is first sampled by particle filtering to obtain a number of candidate images as the first candidate image sample set. K cluster centers are then selected by K-means clustering, the confidence value of each cluster center is calculated according to formulas (1) and (2), and the cluster center with the largest confidence value is taken as the candidate sampling center. A number of candidate images are then obtained around this sampling center, again by particle filtering, as the second candidate image sample set. The confidence value of each candidate image in the second candidate image sample set is calculated, and the 20 candidates with the highest confidence values are taken as the new candidate images.
For each of the 20 new candidate images, fragments are obtained in the same way as in the first frame. For each fragment, the reconstruction error ε_i is computed using the matching dictionary A. When ε_i > ε_0, the fragment belongs to the background, and the value at its position in the attribute matrix S_c is set to 0; when ε_i ≤ ε_0, the fragment belongs to the foreground, and the value at its position in the attribute matrix S_c is set to 1. Repeating this calculation yields the attribute matrix S_c of each candidate image. The gray matrix F_c of each candidate image is then computed in a way similar to the gray matrix T in the first frame. The difference from the first frame is that the attribute matrix S_c must be taken into account: when the attribute value at a point of the attribute matrix is 0, the normalized gray value at the corresponding point of the gray matrix is also 0, and when the attribute value is 1, the normalized gray value is unchanged. Referring to Fig. 3 and taking a total of 8*8 fragments as an example, the attribute matrix S_c, the target template gray attribute matrix T and the gray matrix F_c of a new candidate image are all 8*8 two-dimensional matrices. If the attribute value of fragment (3,4) is 0, the value of S_34 in the attribute matrix S_c is 0 and the value of F_34 in the gray matrix F_c is also 0; if the attribute value of fragment (4,2) is 1, the value of S_42 in the attribute matrix S_c is 1 and the value of F_42 in the gray matrix F_c is the sum of the pixel values of all points of fragment (4,2).
Finally, the similarity between each candidate and the template is calculated according to formula (4), the candidate image with the highest similarity is taken as the target image of the current frame, and this target image is used to update the foreground template, the background template, the matching dictionary A and the target template gray attribute matrix T.