CN109993775A - Single-object tracking method based on feature compensation
Single-object tracking method based on feature compensation
- Publication number
- CN109993775A CN109993775A CN201910258571.8A CN201910258571A CN109993775A CN 109993775 A CN109993775 A CN 109993775A CN 201910258571 A CN201910258571 A CN 201910258571A CN 109993775 A CN109993775 A CN 109993775A
- Authority
- CN
- China
- Prior art keywords
- target
- pixel
- frame
- histograms
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a video target tracking method based on feature compensation among a posterior pixel colour histogram, a histogram of oriented gradients (HOG), and convolutional neural network features: simple features are used in simple scenes to guarantee real-time performance, while more complex features are used in complex scenes to guarantee accuracy. By combining the two features of the posterior pixel histogram and the histogram of oriented gradients, the resulting response map adapts well to relatively simple video scenes. A classifier is trained to judge when the response obtained by fusing the former two features is unreliable. Finally, according to the classifier's decision, the method chooses whether to switch in a slower but more robust convolutional-neural-network tracker, which corrects a drifting prediction or recovers a lost target. The invention improves the precision with which the target's size and position are estimated in video, and it adapts well to long-term tracking tasks, reaching the level required by practical applications.
Description
Technical field
The invention belongs to the technical field of single-object tracking in computer vision, and in particular relates to a single-object tracking method based on feature compensation.
Background technique
In computer vision, tracking has always been a core problem, widely applied in video surveillance, human-computer interaction, robot visual perception, military guidance, and many other areas. In single-object tracking, the position and size of the target are manually annotated with a rectangle in the first frame of the video; the tracker must then keep following that manually annotated object with a rectangle in every subsequent frame. The related task of object detection scans and searches the whole frame of a still image or dynamic video for targets; in short, detection is concerned with localisation and classification, whereas tracking is concerned with locking onto a particular person or object in real time, regardless of what it is tracking. Because tracking must run in real time, whole-frame search is computationally far too expensive and clearly unsuitable for this scenario; fortunately the tracked object is continuous in time and space, so the search range can be greatly reduced. However, precisely because of this continuity, complex scenes containing illumination changes, appearance deformation, fast motion, occlusion, and background clutter force most trackers to continually update their own model during the tracking task. Once the model learns background information, errors arise, accumulate continuously, and eventually cause the target to be lost.
At present, the vast majority of mainstream tracking algorithms perform short-term tracking, and they mainly suffer from the following defects:
(1) Poor robustness
The model cannot recover after losing the target. Such algorithms focus on the precision of the target's position and size, lack robustness, cannot adapt to long-term tracking tasks, and therefore cannot really be used well in practical scenes.
(2) Slow speed
Both end-to-end neural-network trackers and trackers that combine deep convolutional feature maps with correlation filtering achieve high accuracy but spend a high computation time, so they are seldom applied in practical scenes. Other traditional correlation-filter trackers reach very high speed but perform poorly in accuracy and robustness.
(3) Error accumulation
Because of the various disturbances in video scenes, the model cannot correctly locate the target in every frame, so template updates inevitably learn background or other interference information. This error accumulates continuously, and the process is irreversible.
To overcome the above defects and be well applied in practical scenes, the entry point of this invention remains long-term tracking: improving robustness as much as possible, and adapting to long tracking tasks while keeping the speed real-time.
Summary of the invention
The purpose of the present invention is to provide a video target tracking method based on feature compensation among a posterior pixel colour histogram, histogram of oriented gradients, and convolutional neural network features, so as to construct a tracker that is robust and fast: while ensuring accuracy and improving robustness, it fully guarantees a high frame rate. The invention improves the precision with which the target's size and position are estimated in video, and it adapts well to long-term tracking tasks, reaching the level required by practical applications.
The technical scheme adopted by the invention is a feature-compensation video target tracking method based on feature fusion, comprising the following steps:
S1, establish the colour-histogram branch of the target tracking model:
S11, call the OpenCV toolkit and, before the tracking task starts, crop out a target sub-image E containing background information, based on the manually annotated target image;
S12, separate the target sub-image (with background information) into a foreground region and a background region in a fixed proportion of the target size. Meanwhile, compress the pixel scale so that pixel values lie in the integer range 0-32, and use a foreground mask and a background mask of identical size to compute, for each pixel value, the pixel ratio in the corresponding foreground and background regions, i.e. the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B):
ρ(O) = N(O)/|O|; (1-1)
ρ(B) = N(B)/|B|; (1-2)
where O denotes the image region of the foreground O and B the image region of the background B; N(O) is the number of non-zero pixel values in the image region of foreground O, and N(B) the number of non-zero pixel values in the image region of background B; |O| is the total number of pixels in the image region of foreground O, and |B| the total number in the image region of background B. From formulas (1-1) and (1-2), the weight β_t of the current-frame posterior pixel colour histogram template is calculated:
β_t = ρ_t(O)/(ρ_t(O) + ρ_t(B) + λ); (2)
where t denotes the current frame and λ is a hyperparameter;
S13, in the next video frame, within the image range centred on the previous frame's target centre as the search region, crop a sub-image e as in S12 and compress its pixel scale to obtain ψ. From formulas (1-1), (1-2) and (2), the weight β_{t-1} of the previous frame's posterior pixel colour histogram template is obtained, and the colour-histogram response f_hist is finally computed with an integral image:
f_hist(ψ_t) = (1/|H|) Σ_{u∈H} β_{t-1}^T ψ_t[u]; (3)
where ψ is the M-channel pixel-compressed sub-image, defined on the cropped picture e of the current frame; ψ_t is the current frame's M-channel pixel-compressed sub-image; H denotes the integer grid corresponding to each pixel of the picture; u denotes a cell of the grid H; ψ[u] is the corresponding pixel on ψ; and superscript T is matrix transposition;
S14, each time the tracking of a frame is completed, update the weight β_t of the posterior pixel histogram template at the predicted position of the current frame, i.e. update the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B) separately, obtaining the updated current-frame ratios ρ_t(O) and ρ_t(B):
ρ_t(O) = (1 - η_hist) ρ_{t-1}(O) + η_hist ρ′_t(O)
ρ_t(B) = (1 - η_hist) ρ_{t-1}(B) + η_hist ρ′_t(B); (4)
where ρ′_t(O) is the pixel ratio in the foreground region of the current frame, ρ′_t(B) the pixel ratio in the background region of the current frame, ρ_{t-1}(O) and ρ_{t-1}(B) the corresponding ratios of the previous frame, and η_hist the update weight of the pixel ratios;
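The colour-histogram scoring of S13 can be sketched as follows. This is a minimal single-channel illustration, not the patented implementation: `hist_response` is a hypothetical helper that looks up the per-bin weight β for every pixel of the search patch and then averages it over every target-sized window with an integral image.

```python
import numpy as np

def hist_response(search_gray, beta, target_h, target_w, n_bins=32):
    """Per-pixel foreground likelihood via histogram lookup, then a box
    filter (integral image) the size of the target gives the response map."""
    # Compress the pixel scale: 256 gray levels -> n_bins bins.
    bins = (search_gray.astype(np.int64) * n_bins) // 256
    per_pixel = beta[bins]                       # lookup: weight of each pixel's bin
    # Integral image (zero-padded first row/column) for O(1) box sums.
    ii = np.pad(per_pixel, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    H, W = per_pixel.shape
    h, w = target_h, target_w
    resp = (ii[h:H + 1, w:W + 1] - ii[h:H + 1, :W - w + 1]
            - ii[:H - h + 1, w:W + 1] + ii[:H - h + 1, :W - w + 1]) / (h * w)
    return resp                                  # one score per candidate window
```

The argmax of `resp` gives the candidate window whose pixels look most like the foreground histogram.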
S2, establish the histogram-of-oriented-gradients branch of the target tracking model:
S21, on the target image to be tracked selected in S11, crop another target-area sub-image E′ of different size but containing the same target with background information, extract its K-channel three-dimensional histogram-of-oriented-gradients (HOG) feature φ_k, and multiply it by the cosine window function from the OpenCV package; the numerator Â^i_t and denominator B̂_t of the HOG feature template are then computed in the frequency domain:
Â^i_t = ĝ* ⊙ φ̂^i; (5)
B̂_t = Σ_{i=1}^{K} φ̂^{i*} ⊙ φ̂^i; (6)
so that the template of channel i is ĥ^i = Â^i_t/(B̂_t + λ). Here the hatted variables are defined in the frequency domain and obtained by the discrete Fourier transform; u denotes a cell in the grid Γ, and Γ denotes the integer grid corresponding to φ_k; superscript i indexes each of the K channels; ĝ* is the conjugate of the Fourier transform of a Gaussian signal; * denotes conjugation in the frequency domain and ⊙ denotes element-wise multiplication; φ̂ is the Fourier transform of each channel of the HOG feature φ_k; K is the number of channels;
S22, apply the inverse Fourier transform to the HOG template ĥ obtained in S21 to get h[u]. In the next video frame, crop a sub-image e′ within the image range centred on the previous frame's target centre, extract the HOG feature φ of the current sub-image, and compute the current frame's HOG score f_hog with a linear function:
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u]; (7)
S23, after completing the tracking of each frame, update the HOG feature template at the predicted position of the current frame, i.e. obtain the updated final signals Â^i_t and B̂_t:
Â^i_t = (1 - η_hog) Â^i_{t-1} + η_hog Â′^i_t
B̂_t = (1 - η_hog) B̂_{t-1} + η_hog B̂′_t; (8)
where Â′^i_t and B̂′_t are the current-frame signals computed from formulas (5) and (6), Â^i_{t-1} and B̂_{t-1} denote the previous frame's signals, Â^i_t and B̂_t denote the updated final signals, and η_hog is the update weight of the HOG template;
S3, fuse the features and establish a classifier:
S31, fuse the colour-histogram response f_hist obtained in S13 and the HOG score f_hog obtained in S22 through a linear function f(x):
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x); (9)
where γ_hog is the weight of the HOG response and γ_hist the weight of the colour-histogram response; the coordinate of the point at which f(x) attains its maximum is the centre coordinate of the target;
S32, train a classifier from f and f_hog: select a batch of video sequences and, through the feature fusion of S31, output f and f_hog for each frame. Let the input of the data set be X = [max(f_hog); max(f)] and the output label be h′_θ, the ground truth of the data set: h′_θ is the integer 0 or 1, where 0 means the model's tracking box has drifted off the target and 1 means it has not. Let the logistic regression function h_θ denote the output of the classifier:
h_θ(X) = 1/(1 + e^{-θ^T X}); (10)
Divide the data into a training set and a validation set in the ratio 7:3. On the training set, use the cross-entropy loss function and gradient descent; after the iterations converge, the parameter θ of the logistic regression model in formula (10) is obtained. Then fine-tune the hyperparameters on the validation set: for several candidate values, compute the classification accuracy and select the value with the highest accuracy as the final hyperparameter, so that the classifier achieves good results on the validation set;
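The classifier of S32 is plain logistic regression trained with cross-entropy and gradient descent, which can be sketched as follows (the two-column input [max(f_hog); max(f)] is synthetic here, and the function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, iters=2000):
    """Batch gradient descent on the cross-entropy loss for formula (10).
    X: (n, 2) inputs [max(f_hog), max(f)]; y: 0/1 labels (1 = on target)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])    # bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = sigmoid(Xb @ theta)
        theta -= lr * Xb.T @ (p - y) / len(y)    # gradient of cross-entropy
    return theta

def classify(theta, x):
    """Classifier output h_theta; > 0.5 means the fused result is trusted."""
    return sigmoid(np.array([1.0, *x]) @ theta)
```

In use, frames with high fused responses would be labelled 1 and drifting frames 0, then θ is fit on the 7:3 split described above.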
S4, judge whether the convolutional-neural-network tracker needs to be switched in:
S41, input f and f_hog into the classifier obtained in S32 and take its output. From the continuous output of the classifier (i.e. formula (10)), 0.5 is selected as the threshold. When the output is greater than 0.5, the result of the fusion model is credible and there is no need to switch in the convolutional-neural-network tracker; when the output is less than 0.5, the result of the fusion model is not trusted, and the convolutional-neural-network tracker must be switched in;
S42, when the target response score predicted by the convolutional-neural-network tracker for the current frame is high, reuse S14 and S23 to update the posterior pixel histogram template and the HOG template respectively; then proceed to the tracking task of the next frame, until all video frames are completed.
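The switching logic of S41-S42 amounts to a per-frame decision. The sketch below assumes hypothetical tracker objects with `predict`/`update_templates` methods; it illustrates the control flow, not the patented code:

```python
def track_frame(frame, fusion_tracker, cnn_tracker, theta, classify, tau=0.5):
    """One frame of the compensation scheme: trust the fast fused model
    unless the classifier says its response is unreliable, then fall back
    to the CNN tracker (e.g. DaSiamRPN)."""
    box, f_max, fhog_max = fusion_tracker.predict(frame)
    if classify(theta, [fhog_max, f_max]) > tau:
        fusion_tracker.update_templates(frame, box)   # formulas (4) and (8)
        return box
    box, score = cnn_tracker.predict(frame)           # slower, more robust
    if score > cnn_tracker.confident_threshold:
        fusion_tracker.update_templates(frame, box)   # update only when confident
    return box
```

This mirrors the text: the templates are updated after every trusted frame, but after a CNN fallback only when the CNN's own response score is high, so tracking failures do not pollute the templates.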
The beneficial effects of the present invention are:
(1) The invention fuses multiple features and combines their characteristics, so it copes well with video scenes containing illumination change, object deformation, motion blur, and occlusion; in simple scenes it completes the tracking task quickly with simple features, and in complex scenes it switches to more robust features to reduce the influence of interference information.
(2) The invention adds a self-checking classifier, making the model more intelligent when switching features and updating feature templates; it suppresses the learning of invalid information and reduces error accumulation. Meanwhile, the classifier is relatively simple and does not need much computational overhead.
(3) The neural-network tracker selected by the invention does not update its template, so it does not learn interference information and performs well under target occlusion.
Detailed description of the invention
To explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the foreground and background masks.
Fig. 2 is a schematic diagram of the posterior pixel histogram and its response map.
Fig. 3 shows the histogram of oriented gradients and its response map.
Fig. 4 is a schematic diagram of the single-object tracking algorithm based on feature compensation.
Fig. 5 is the accuracy-robustness distribution of each algorithm under the reset mechanism.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.
In the field of target tracking, the main difficulties are deformation, illumination change, fast motion, background clutter, in-plane rotation, scale change, occlusion, and leaving the field of view.
The detailed process is as follows:
S1, establish the colour-histogram branch of the target tracking model:
S11, since this method is built for the single-object tracking scenario, before the tracking task starts the OpenCV toolkit is called, the target to be tracked is selected with a manually annotated rectangle, and a target sub-image E with background information is cropped out. The model then distinguishes the target according to the characteristics of the selected target and its background, completing the subsequent tracking task. Under this scenario, whichever feature the model is based on, an initial feature template is generated, in the model's own way, from the image in the target box selected on the first frame of the video; it is used to match candidate regions of subsequent frames and thus predict the position and size of the target.
S12, on the basis of the first frame in which the target to be tracked has been framed, this colour-histogram model separates the target sub-image E (with background information) into a foreground region and a background region in a fixed proportion of the target size. Because the pixel value range is 0-255, computing with raw pixel values would cost a great deal of time, so the pixel scale is compressed; the scale chosen here is 8, i.e. computation is done in the integer range 0-32, which greatly raises the model's speed. With the help of a foreground mask of identical size (as shown in Fig. 1-a, a single-channel image whose white target region has value 1 and whose black background region has value 0) and a background mask (as shown in Fig. 1-b, a single-channel image whose black target region has value 0 and whose white background region has value 1), the pixel ratio of each pixel value in the two regions is computed, i.e. the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B):
ρ(O) = N(O)/|O|; (1-1)
ρ(B) = N(B)/|B|; (1-2)
where O denotes the image region of the foreground O and B the image region of the background B; N(O) is the number of non-zero pixel values in the image region of foreground O, and N(B) the number in the image region of background B; |O| is the total number of pixels in the image region of foreground O, and |B| the total number in the image region of background B. Once the foreground and background pixel ratios have been obtained, the weight β_t of the current-frame posterior pixel colour histogram template can be computed:
β_t = ρ_t(O)/(ρ_t(O) + ρ_t(B) + λ); (2)
where t denotes the current frame and λ is a hyperparameter.
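Under the usual reading of formulas (1-1), (1-2) and (2) — per-bin pixel counts normalised by region size, then a regularised posterior weight — the computation can be sketched as follows (a Staple-style interpretation; `pixel_ratios` and `beta_weights` are illustrative names):

```python
import numpy as np

def pixel_ratios(patch_gray, fg_mask, n_bins=32):
    """Per-bin foreground/background pixel ratios (formulas (1-1), (1-2))
    after compressing 256 gray levels into n_bins bins (scale 8)."""
    bins = (patch_gray.astype(np.int64) * n_bins) // 256
    rho_o = np.bincount(bins[fg_mask], minlength=n_bins) / max(fg_mask.sum(), 1)
    rho_b = np.bincount(bins[~fg_mask], minlength=n_bins) / max((~fg_mask).sum(), 1)
    return rho_o, rho_b

def beta_weights(rho_o, rho_b, lam=1e-3):
    """Posterior per-bin weight beta_t = rho(O) / (rho(O) + rho(B) + lam),
    formula (2); lam is the hyperparameter regularising empty bins."""
    return rho_o / (rho_o + rho_b + lam)
```

Bins that occur mostly in the foreground get weights near 1, bins that occur mostly in the background get weights near 0.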
S13, after the posterior pixel histogram template has been established, in the next video frame a sub-image e is likewise cropped within the image range centred on the previous frame's target centre, and its pixel scale is compressed to obtain ψ. From formulas (1-1) and (1-2), the weight β_{t-1} of the previous frame's posterior pixel colour histogram template is obtained, as shown in Fig. 2-a, Fig. 2-b and Fig. 2-c, and the colour-histogram response f_hist is finally computed with an integral image:
f_hist(ψ_t) = (1/|H|) Σ_{u∈H} β_{t-1}^T ψ_t[u]; (3)
where ψ is the M-channel pixel-compressed sub-image, defined on the cropped picture e of the current frame; ψ_t is the current frame's M-channel pixel-compressed sub-image; H denotes the integer grid corresponding to each pixel of the picture; u denotes a cell of the grid H; ψ[u] is the corresponding pixel on ψ; and superscript T denotes matrix transposition.
S14, during online tracking, the scene in the video is constantly changing, subtly or drastically; for the colour-histogram feature, disturbances such as illumination change and motion blur are especially serious. Therefore, to better adapt to the many changes in the video scene, each time the tracking of a frame is completed, the weight β_t of the posterior pixel histogram template must be updated at the predicted position of the current frame, i.e. the foreground and background pixel ratios ρ(O) and ρ(B) are updated separately:
ρ_t(O) = (1 - η_hist) ρ_{t-1}(O) + η_hist ρ′_t(O)
ρ_t(B) = (1 - η_hist) ρ_{t-1}(B) + η_hist ρ′_t(B); (4)
where ρ′_t(O) is the pixel ratio in the foreground region of the current frame, ρ′_t(B) the pixel ratio in the background region of the current frame, ρ_{t-1}(O) and ρ_{t-1}(B) the corresponding ratios of the previous frame, and η_hist the update weight of the pixel ratios;
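Both template updates — formula (4) here and formula (8) in the HOG branch — are the same linear interpolation between the old template and the current-frame statistics, e.g.:

```python
import numpy as np

def ema_update(prev, current, eta):
    """Linear-interpolation (exponential moving average) template update
    used in formulas (4) and (8): new = (1 - eta) * prev + eta * current."""
    return (1.0 - eta) * prev + eta * current
```

A small eta makes the template change slowly, which damps the error accumulation discussed in the background section; repeated updates toward a stable value converge to it geometrically.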
S2, establish the histogram-of-oriented-gradients branch of the target tracking model:
S21, on the basis of the first frame in which the tracking target has been manually framed, a target-area sub-image E′ with background information is cropped out, and its K-channel three-dimensional histogram-of-oriented-gradients feature φ_k is extracted, as shown in Fig. 3-a and Fig. 3-b; it is multiplied by the cosine window function in the OpenCV package to suppress the influence of the periphery of the sub-image. The numerator and denominator of the HOG feature template are then computed:
Â^i_t = ĝ* ⊙ φ̂^i; (5)
B̂_t = Σ_{i=1}^{K} φ̂^{i*} ⊙ φ̂^i; (6)
and the template of channel i is ĥ^i = Â^i_t/(B̂_t + λ). Here the hatted variables are defined in the frequency domain and obtained through the discrete Fourier transform: the correlation-filter model contains cross-correlation operations, which would cost a very high computation time, so after the variables are Fourier-transformed, convolution in the time domain is converted into element-wise products in the frequency domain, greatly reducing the computation time. u denotes a cell in the grid Γ, and Γ denotes the integer grid corresponding to φ_k; superscript i indexes each of the K channels; ĝ* is the conjugate of the Fourier transform of a Gaussian signal; * denotes conjugation in the frequency domain and ⊙ denotes element-wise multiplication; φ̂ is the Fourier transform of each channel of φ_k; K is the number of channels.
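The speed argument above rests on the correlation theorem: circular cross-correlation in the spatial domain equals an element-wise product in the frequency domain. A one-dimensional check, assuming nothing beyond NumPy's FFT:

```python
import numpy as np

def circ_xcorr_spatial(a, b):
    """Brute-force circular cross-correlation: c[d] = sum_n a[n] * b[(n+d) % N]."""
    n = len(a)
    return np.array([sum(a[i] * b[(i + d) % n] for i in range(n))
                     for d in range(n)])

def circ_xcorr_fft(a, b):
    """Same result via the frequency domain: conj(A) element-wise times B."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)))
```

The spatial version is O(N^2) while the FFT version is O(N log N), which is why the templates in formulas (5)-(6) are kept in the Fourier domain.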
S22, after the HOG template ĥ is established, the inverse Fourier transform is applied to obtain h[u]. In the next video frame, a search sub-image e′ is cropped within the image range centred on the previous frame's target centre, the HOG feature φ of the current sub-image is extracted, and the current frame's HOG score f_hog can then be computed with a linear function, with the effect shown in Fig. 3-c:
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u]; (7)
S23, in the online tracking stage, the histogram of oriented gradients is likewise disturbed as the target changes in the scene, and it is especially affected by object deformation. Therefore, after the tracking of each frame is completed, the HOG feature template must also be updated at the predicted position of the current frame:
Â^i_t = (1 - η_hog) Â^i_{t-1} + η_hog Â′^i_t
B̂_t = (1 - η_hog) B̂_{t-1} + η_hog B̂′_t; (8)
where Â′^i_t and B̂′_t are the current-frame signals computed from formulas (5) and (6), Â^i_{t-1} and B̂_{t-1} denote the previous frame's signals, Â^i_t and B̂_t denote the updated final signals, and η_hog is the update weight of the HOG template.
S3, fuse the features and establish a classifier:
S31, the colour-histogram feature is strongly affected by disturbances such as illumination change and motion blur in the scene, while the HOG feature is strongly affected by disturbances such as target deformation and fast motion. Fusing the two features therefore reduces the interference of these factors to some extent and improves the accuracy and robustness of the tracking model, so that during tracking it can predict the target's position and size more accurately and is less likely to lose the target. Here the colour-histogram response f_hist obtained in S13 and the HOG score f_hog obtained in S22 are fused through a linear function f(x):
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x); (9)
where γ_hog is the weight of the HOG response and γ_hist the weight of the colour-histogram response; the coordinate of the point at which f(x) attains its maximum is the centre coordinate of the target.
S32, although the two fused features perform well in most scenes, for some of the more complex video scenes, such as background clutter, occlusion, and leaving the field of view, there is still considerable room for performance improvement. Therefore a more robust, better-performing neural-network tracker is added to improve the model. Considering that neural networks run slowly and current general-purpose hardware cannot meet the real-time requirement, the neural-network tracker is used only when the fusion model of the first two features cannot complete the tracking of the current frame well; this maximises the performance of the overall model. The key to meeting this demand is letting the fusion model know when it needs to switch to the neural-network tracker. By analysing how the response scores f, f_hist and f_hog change in different scenes (the bare symbols denote the mappings; the input (x) is omitted), it can be seen that when the target undergoes large deformation or occlusion, f and f_hog fluctuate strongly, so a classifier can be trained on these two values to serve as the flag for switching trackers.
A batch of video sequences is selected and, through the feature fusion of S31, f and f_hog are output for each frame. Let the input of the data set be X = [max(f_hog); max(f)] and the output label be h′_θ, the ground truth of the data set: h′_θ is the integer 0 or 1, where 0 means the model's tracking box has drifted off the target and 1 means it has not. Object detection has the concept of Intersection-over-Union (IoU), the overlap rate between the predicted image box and the ground-truth box; it is used here as the criterion for measuring whether the tracker has drifted off the target. Experiments with multiple values showed that 0.35 is an appropriate cut-off, i.e. h′_θ = 1 when IoU > 0.35, and h′_θ = 0 when IoU < 0.35. Let the logistic regression function h_θ denote the output of the classifier:
h_θ(X) = 1/(1 + e^{-θ^T X}); (10)
The data are divided into a training set and a validation set in the ratio 7:3. On the training set, the cross-entropy loss function and gradient descent are used; after the iterations converge, the weight θ in formula (10) is obtained. The hyperparameters are then fine-tuned on the validation set: for several candidate values, the classification accuracy is computed and the value with the highest accuracy is selected as the final hyperparameter, so that the classifier achieves good results on the validation set.
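The IoU labelling rule used to build the training set (h′ = 1 when IoU > 0.35) can be sketched for axis-aligned (x, y, w, h) boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def label(pred_box, gt_box, thresh=0.35):
    """Ground-truth label h' for the classifier: 1 = still on target."""
    return 1 if iou(pred_box, gt_box) > thresh else 0
```

Note that a half-overlapping box already falls to IoU = 1/3, just below the 0.35 cut-off, which matches the intent of treating such frames as drift.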
S4, judge whether the convolutional-neural-network tracker needs to be switched in:
S41, after the classifier model has been trained and fine-tuned in the previous step, it can judge, during the tracking stage, whether the fusion model of colour-histogram and HOG features can still adapt to the current video scene, and hence whether the neural-network tracker needs to be switched in. During tracking, the f_hog and f obtained from formulas (7) and (9) serve as the input of the classifier, i.e. formula (10), and its output serves as the switching flag. When tuning on the validation data in the previous step, a suitable threshold of 0.5 was selected: when the output is greater than this threshold, the result of the fusion model is trusted and no switch is made; when the output is less than this threshold, the result of the fusion model is not trusted, and the neural-network tracker is switched in. The neural-network tracker selected here is DaSiamRPN, which combines ideas from object detection with the RPN (Region Proposal Network) structure. It copes well with complex scenes, fits the target's size after deformation more accurately, and does not need to update the target template online, so there is no situation in which accumulated error pollutes the template.
S42, during the online tracking stage, the posterior pixel-histogram template and the histogram-of-oriented-gradients template must be updated with formulas (4) and (8) respectively, to adapt to scene changes in the video. Likewise, after the switched-in DaSiamRPN tracker completes the tracking task of the current frame, these two formulas are still used for the update. However, because DaSiamRPN can also fail, the templates are updated only when the target response score it predicts for the current frame is high. The method then proceeds to the tracking task of the next frame, until all video frames are completed. The whole tracking flow is shown in Figure 4.
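The confidence-gated update of S42 can be sketched as follows. The learning rate and the score threshold are illustrative values, since the text only states that the update runs when the response score is "higher":

```python
def maybe_update(template, new_estimate, response_score,
                 eta=0.02, min_score=0.9):
    """Blend the stored template toward the current frame's estimate,
    but only when the tracker's response score is high; otherwise keep
    the template frozen so a low-confidence frame cannot pollute it.
    The values of eta and min_score are illustrative assumptions."""
    if response_score < min_score:
        return template                       # low confidence: no update
    return [(1 - eta) * t + eta * n
            for t, n in zip(template, new_estimate)]

kept = maybe_update([1.0], [0.0], response_score=0.5)      # frozen
blended = maybe_update([1.0], [0.0], response_score=0.95)  # updated
```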
Embodiment
To assess the performance of the invention, experiments must be run on a test set of video sequences. The evaluation method, data set, and evaluation system of the VOT (Visual Object Tracking) challenge are adopted here. The data set contains 60 video sequences covering scenes such as occlusion, illumination change, target motion, scale change, camera motion, and the target leaving the field of view; several of these attributes may occur within one sequence, and the attributes differ from frame to frame, which allows a more accurate evaluation of the model. Before VOT was proposed, the popular evaluation protocol initialized the tracker on the first frame of a sequence and then let it run to the last frame. A tracker, however, may lose the target (fail) in the first few frames because of one or two of these factors, so the final evaluation would use only a very small part of the sequence, which is wasteful. VOT therefore proposes that the evaluation system should detect a failure whenever the tracker loses the target and reinitialize the tracker five frames after the failure, so that the data set is fully used.
First, consider the experiment scores under the reset mechanism, shown in Table 1:
Table 1. Scores of different algorithms under the reset mechanism
In Table 1, the A-R rank denotes the ranking index of accuracy (Accuracy) and robustness (Robustness). Overlap corresponds to accuracy: it is the overlap ratio between the target predicted by the tracker and the manually annotated ground-truth target, and a larger Overlap means a more accurate prediction. Failure evaluates tracking stability: the smaller the value, the better the stability. Compared with seven other tracking methods, this method ranks first in accuracy and third in stability. The scoring trend of all the algorithms in the table can also be seen more intuitively in Figure 5. In a real scene, however, there is no reset after a tracking failure, so an evaluation system without resets is clearly more relevant to practice; its experiment scores are given in Table 2:
Table 2. Scores of different algorithms without the reset mechanism
In Table 2, AUC (Area Under Curve, the area enclosed by the curve and the coordinate axes) is an index of algorithm quality: the larger the value, the better the algorithm's performance. The speed index FPS (Frames Per Second) is likewise better when larger. It can be seen that without the reset mechanism, i.e. when the scoring system does not relocate the target after a tracking failure, this method reaches the highest accuracy relative to the other seven methods, and is the fastest of the three algorithms ranked highest in accuracy. Tested on a machine configured with CPU: Intel Core i7-6700 and GPU: GeForce GT 730, the fastest speed obtained by the SiamFC method in the table is only 3 FPS, whereas this method reaches up to 30 FPS. Compared with the other methods, it therefore keeps its original speed while achieving higher accuracy, and is better suited to real scenes.
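For reference, the Overlap (accuracy) measure used by the VOT evaluation above is the intersection-over-union of the predicted and ground-truth boxes, which can be computed as:

```python
def overlap(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x, y, w, h); this is the Overlap accuracy measure of the VOT
    evaluation (1.0 = identical boxes, 0.0 = disjoint)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```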
The above is merely a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (1)
1. A single-target tracking method based on feature compensation, characterized by comprising the following steps:
S1, establishing the color-histogram-feature branch of the target tracking model:
S11, before the target tracking task starts, calling the OpenCV toolkit on the basis of the manually annotated target image to cut out a target sub-image E that carries background information;
S12, separating the target sub-image with background information into a foreground region and a background region in a certain proportion according to the size of the target; meanwhile, scale-compressing the pixel values into the integer range 0-32 and, relying on foreground and background masks of identical size, computing for each pixel value the pixel ratio within the corresponding foreground and background regions, namely the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B), whose expressions are:
ρ (O)=N (O)/| O |; (1-1)
ρ (B)=N (B)/| B |; (1-2)
wherein the symbol O denotes the image region of the foreground O and the symbol B denotes the image region of the background B; N(O) denotes the number of non-zero pixel values in the image region of the foreground O and N(B) the number in the image region of the background B; |O| denotes the total number of pixel values in the image region of the foreground O and |B| the total number in the image region of the background B; based on formulas (1-1) and (1-2), the weight β_t of the current frame's posterior pixel color-histogram template is calculated:
wherein t denotes the current frame and λ is a hyperparameter;
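As an illustration of S12 (not part of the claim language), the per-bin pixel ratios of formulas (1-1) and (1-2) can be sketched for a single-channel image as follows. The 32-level scale compression follows the claim, while the exact per-bin counting is an assumption based on the surrounding text:

```python
import numpy as np

def pixel_ratios(patch, fg_mask, bins=32):
    """Per-bin foreground/background pixel ratios in the spirit of
    formulas (1-1) and (1-2): compress pixel values to `bins` levels,
    count them under each mask, and divide by the region sizes |O|, |B|.
    Single-channel sketch; the claim's M-channel version repeats this
    per channel."""
    q = (patch.astype(np.int64) * bins) // 256    # scale compression to 0..bins-1
    fg = q[fg_mask]                               # pixels inside the foreground O
    bg = q[~fg_mask]                              # pixels inside the background B
    rho_o = np.bincount(fg, minlength=bins) / max(fg.size, 1)
    rho_b = np.bincount(bg, minlength=bins) / max(bg.size, 1)
    return rho_o, rho_b

patch = np.full((4, 4), 128, dtype=np.uint8)      # uniform mid-gray patch
mask = np.zeros((4, 4), dtype=bool)
mask[:2] = True                                   # top half = foreground
rho_o, rho_b = pixel_ratios(patch, mask)          # all mass lands in bin 16
```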
S13, in the next video frame, within the image range whose search-region center is the target center of the previous frame, cutting out a sub-image e as in S12 and scale-compressing its pixels to obtain ψ; obtaining, according to formulas (1-1), (1-2) and formula (2), the weight β_{t-1} of the previous frame's posterior pixel color-histogram template; and finally obtaining the color-histogram response f_hist with the integral-image formula:
wherein ψ is the M-channel pixel-compressed sub-image, defined on the cut image e of the current frame; ψ_t is the M-channel pixel-compressed sub-image of the current frame; H represents the integer range corresponding to each pixel of the image; u represents each corresponding cell in the grid H; ψ[u] is the corresponding pixel on ψ; and the superscript T denotes the matrix transpose;
S14, each time the tracking task of a frame is completed, at the predicted position of the current frame, updating the weight β_t of the posterior pixel-histogram template, that is, updating the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B) separately, to obtain the updated pixel ratio ρ_t(O) of the current frame's foreground O and the updated pixel ratio ρ_t(B) of the current frame's background B:
ρ_t(O) = (1 - η_hist) ρ_{t-1}(O) + η_hist ρ'_t(O)
ρ_t(B) = (1 - η_hist) ρ_{t-1}(B) + η_hist ρ'_t(B);    (4)
wherein ρ'_t(O) is the pixel ratio in the image region of the current frame's foreground O, ρ'_t(B) is the pixel ratio in the image region of the current frame's background B, ρ_{t-1}(O) is the pixel ratio in the image region of the previous frame's foreground O, and ρ_{t-1}(B) is that of the previous frame's background B; η_hist is the weight of the pixel-ratio update;
S2, establishing the histogram-of-oriented-gradients-feature branch of the target tracking model:
S21, selecting with a rectangular box on the target image to be tracked in S11, cutting out another target-region sub-image E' that differs in size but again carries background information, and extracting its K-channel three-dimensional histogram-of-oriented-gradients feature φ_k; after multiplication by the cosine window function in the OpenCV package, the template of the histogram-of-oriented-gradients feature is calculated:
wherein the template is a variable defined in the frequency domain and obtained by the discrete Fourier transform; u represents each corresponding cell in the grid Γ, and Γ represents the integer range corresponding to each cell of φ_k; the superscript i denotes each of the K channels; the conjugated term is the conjugate of the Gaussian label signal after the Fourier transform; * denotes conjugation in the frequency domain and e denotes element-wise multiplication; the transformed term is each channel element of the histogram-of-oriented-gradients feature φ_k obtained by the Fourier transform; and K is the number of channels;
S22, applying the inverse Fourier transform to the histogram-of-oriented-gradients template obtained in S21 to obtain h[u]; in the next video frame, within the image range whose search-region center is the target center of the previous frame, cutting out a sub-image e' and extracting the histogram-of-oriented-gradients feature φ of the current sub-image; the current frame's histogram-of-oriented-gradients score f_hog is then calculated with a linear function:
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u];    (7)
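Formula (7) is an inner product between the template h and the feature φ over all cells and channels; for a single candidate position it can be sketched as follows (illustrative, not claim language; in practice the score is evaluated densely over translations to get a response map):

```python
import numpy as np

def hog_score(h, phi):
    """Formula (7): sum over every cell u (and channel) of the inner
    product h[u]^T phi[u].  With h and phi stored as arrays of shape
    (cells_y, cells_x, channels) this is a single element-wise sum."""
    return float(np.sum(h * phi))

h = np.ones((2, 2, 3))            # toy template
phi = np.full((2, 2, 3), 0.5)     # toy HOG feature of the search patch
score = hog_score(h, phi)         # 12 terms of 0.5 each
```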
S23, after the tracking task of each frame is completed, at the predicted position of the current frame, updating the template of the histogram-of-oriented-gradients feature, that is, obtaining the updated final signals:
wherein the current-frame signals are computed separately from formula (6), the previous-frame signals are carried over from the last update, and blending the two gives the updated final signals; η_hog is the weight of the histogram-of-oriented-gradients template update;
S3, fusing the features and establishing a classifier:
S31, fusing the color-histogram response f_hist obtained in S13 and the histogram-of-oriented-gradients score f_hog obtained in S22 through a defined linear function f(x), i.e.:
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x);    (9)
wherein γ_hog is the weight of the histogram-of-oriented-gradients response and γ_hist is the weight of the color-histogram response; the coordinate of the point at which f(x) takes its maximum value is the center coordinate of the target;
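The fusion of formula (9) and the subsequent peak search can be sketched as follows; the γ values here are illustrative assumptions, since the claim does not fix them numerically:

```python
import numpy as np

def fuse_and_locate(f_hog, f_hist, gamma_hog=0.7, gamma_hist=0.3):
    """Formula (9): weighted sum of the two response maps; the target
    centre is the coordinate of the fused map's maximum."""
    f = gamma_hog * f_hog + gamma_hist * f_hist
    idx = np.unravel_index(np.argmax(f), f.shape)
    return tuple(int(v) for v in idx)

f_hog = np.zeros((3, 3)); f_hog[1, 2] = 1.0    # HOG peak at (1, 2)
f_hist = np.zeros((3, 3)); f_hist[2, 0] = 1.0  # colour peak at (2, 0)
peak = fuse_and_locate(f_hog, f_hist)          # the heavier HOG weight wins
```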
S32, training a classifier with f and f_hog: selecting a batch of video sequences and, through the feature fusion in S31, outputting f and f_hog respectively; letting the input of the data set be X = [max(f_hog); max(f)] and the output label be h'_θ, the ground-truth value of the data set, where h'_θ is the integer 0 or 1, 0 indicating that the model's tracking box has deviated from the target and 1 indicating that it has not; letting the logistic regression function h_θ denote the output of the classifier:
dividing the data into a training set and a validation set in the ratio 7:3; on the training set, obtaining the parameter θ of the logistic regression model in formula (10) through the cross-entropy loss function and the gradient-descent algorithm after the iterative computation converges; then fine-tuning the hyperparameters with the validation-set data: setting the parameters to different values, computing the proportion of correct results under each value, and selecting the value with the highest accuracy as the final parameter value, so that the classifier achieves good classification results on the validation set;
S4, judging whether the convolutional-neural-network tracker needs to be engaged:
S41, inputting f and f_hog into the classifier obtained in S32 to obtain an output; among the continuous values output by the classifier in S32, 0.5 is selected as the threshold; when the output is greater than 0.5, the result of the fusion model is credible and there is no need to switch to the convolutional-neural-network tracker; when the output is less than 0.5, the result of the fusion model is rejected and the method switches to the convolutional-neural-network tracker;
S42, when the target response score predicted by the convolutional-neural-network tracker for the current frame is high, reusing S14 and S23 to update the posterior pixel-histogram template and the histogram-of-oriented-gradients template respectively; then entering the tracking task of the next frame, until all video frames are completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910258571.8A CN109993775B (en) | 2019-04-01 | 2019-04-01 | Single target tracking method based on characteristic compensation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109993775A true CN109993775A (en) | 2019-07-09 |
CN109993775B CN109993775B (en) | 2023-03-21 |
Family
ID=67132176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910258571.8A Active CN109993775B (en) | 2019-04-01 | 2019-04-01 | Single target tracking method based on characteristic compensation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109993775B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110490148A (en) * | 2019-08-22 | 2019-11-22 | 四川自由健信息科技有限公司 | A kind of recognition methods for behavior of fighting |
CN110647836A (en) * | 2019-09-18 | 2020-01-03 | 中国科学院光电技术研究所 | Robust single-target tracking method based on deep learning |
CN110675423A (en) * | 2019-08-29 | 2020-01-10 | 电子科技大学 | Unmanned aerial vehicle tracking method based on twin neural network and attention model |
CN110738149A (en) * | 2019-09-29 | 2020-01-31 | 深圳市优必选科技股份有限公司 | Target tracking method, terminal and storage medium |
CN111046796A (en) * | 2019-12-12 | 2020-04-21 | 哈尔滨拓博科技有限公司 | Low-cost space gesture control method and system based on double-camera depth information |
CN111260686A (en) * | 2020-01-09 | 2020-06-09 | 滨州学院 | Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window |
CN112991395A (en) * | 2021-04-28 | 2021-06-18 | 山东工商学院 | Vision tracking method based on foreground condition probability optimization scale and angle |
CN115063449A (en) * | 2022-07-06 | 2022-09-16 | 西北工业大学 | Hyperspectral video-oriented three-channel video output method for target tracking |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0795385A (en) * | 1993-09-21 | 1995-04-07 | Dainippon Printing Co Ltd | Method and device for clipping picture |
EP0951182A1 (en) * | 1998-04-14 | 1999-10-20 | THOMSON multimedia S.A. | Method for detecting static areas in a sequence of video pictures |
EP1126414A2 (en) * | 2000-02-08 | 2001-08-22 | The University Of Washington | Video object tracking using a hierarchy of deformable templates |
WO2010001364A2 (en) * | 2008-07-04 | 2010-01-07 | Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi | Complex wavelet tracker |
DE102009038364A1 (en) * | 2009-08-23 | 2011-02-24 | Friedrich-Alexander-Universität Erlangen-Nürnberg | Method and system for automatic object recognition and subsequent object tracking according to the object shape |
CN102750708A (en) * | 2012-05-11 | 2012-10-24 | 天津大学 | Affine motion target tracing algorithm based on fast robust feature matching |
US20130083192A1 (en) * | 2011-09-30 | 2013-04-04 | Siemens Industry, Inc. | Methods and System for Stabilizing Live Video in the Presence of Long-Term Image Drift |
CN103426178A (en) * | 2012-05-17 | 2013-12-04 | 深圳中兴力维技术有限公司 | Target tracking method and system based on mean shift in complex scene |
CN103793926A (en) * | 2014-02-27 | 2014-05-14 | 西安电子科技大学 | Target tracking method based on sample reselecting |
CN104299247A (en) * | 2014-10-15 | 2015-01-21 | 云南大学 | Video object tracking method based on self-adaptive measurement matrix |
CN104361611A (en) * | 2014-11-18 | 2015-02-18 | 南京信息工程大学 | Group sparsity robust PCA-based moving object detecting method |
US20150146022A1 (en) * | 2013-11-25 | 2015-05-28 | Canon Kabushiki Kaisha | Rapid shake detection using a cascade of quad-tree motion detectors |
WO2017088050A1 (en) * | 2015-11-26 | 2017-06-01 | Sportlogiq Inc. | Systems and methods for object tracking and localization in videos with adaptive image representation |
WO2017132830A1 (en) * | 2016-02-02 | 2017-08-10 | Xiaogang Wang | Methods and systems for cnn network adaption and object online tracking |
WO2017143589A1 (en) * | 2016-02-26 | 2017-08-31 | SZ DJI Technology Co., Ltd. | Systems and methods for visual target tracking |
CN108346159A (en) * | 2018-01-28 | 2018-07-31 | 北京工业大学 | A kind of visual target tracking method based on tracking-study-detection |
CN108447078A (en) * | 2018-02-28 | 2018-08-24 | 长沙师范学院 | The interference of view-based access control model conspicuousness perceives track algorithm |
US20180372499A1 (en) * | 2017-06-25 | 2018-12-27 | Invensense, Inc. | Method and apparatus for characterizing platform motion |
CN109360223A (en) * | 2018-09-14 | 2019-02-19 | 天津大学 | A kind of method for tracking target of quick spatial regularization |
Non-Patent Citations (4)
Title |
---|
Dai Fengzhi et al.: "A survey of research progress in deep-learning-based video tracking", Computer Engineering and Applications *
Li Jie et al.: "Template matching tracking algorithm based on particle swarm optimization", Journal of Computer Applications *
Wu Xing et al.: "Research on robust feature recognition and precise path tracking for vision-guided AGVs", Transactions of the Chinese Society for Agricultural Machinery *
Lu Weijian et al.: "A robust moving-target tracking method based on multiple templates", Transducer and Microsystem Technologies *
Also Published As
Publication number | Publication date |
---|---|
CN109993775B (en) | 2023-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109993775A (en) | Monotrack method based on feature compensation | |
CN111797716A (en) | Single target tracking method based on Siamese network | |
CN104902267B (en) | No-reference image quality evaluation method based on gradient information | |
CN106228528B (en) | A kind of multi-focus image fusing method based on decision diagram and rarefaction representation | |
CN106709936A (en) | Single target tracking method based on convolution neural network | |
CN108182388A (en) | A kind of motion target tracking method based on image | |
CN110443763B (en) | Convolutional neural network-based image shadow removing method | |
CN108573222A (en) | The pedestrian image occlusion detection method for generating network is fought based on cycle | |
CN108198201A (en) | A kind of multi-object tracking method, terminal device and storage medium | |
CN102034247B (en) | Motion capture method for binocular vision image based on background modeling | |
CN105357519B (en) | Quality objective evaluation method for three-dimensional image without reference based on self-similarity characteristic | |
CN104992403B (en) | Mixed operation operator image redirection method based on visual similarity measurement | |
CN108460790A (en) | A kind of visual tracking method based on consistency fallout predictor model | |
CN110322445A (en) | A kind of semantic segmentation method based on maximization prediction and impairment correlations function between label | |
CN109886356A (en) | A kind of target tracking method based on three branch's neural networks | |
Wang et al. | Background extraction based on joint gaussian conditional random fields | |
CN109711267A (en) | A kind of pedestrian identifies again, pedestrian movement's orbit generation method and device | |
CN106791822A (en) | It is a kind of based on single binocular feature learning without refer to stereo image quality evaluation method | |
CN104902268A (en) | Non-reference three-dimensional image objective quality evaluation method based on local ternary pattern | |
CN109840905A (en) | Power equipment rusty stain detection method and system | |
CN112818849A (en) | Crowd density detection algorithm based on context attention convolutional neural network of counterstudy | |
CN110866473B (en) | Target object tracking detection method and device, storage medium and electronic device | |
Luo et al. | Bi-GANs-ST for perceptual image super-resolution | |
Liu et al. | Spatio-temporal interactive laws feature correlation method to video quality assessment | |
Da et al. | Perceptual quality assessment of nighttime video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||