CN109993775A - Single-target tracking method based on feature compensation - Google Patents

Single-target tracking method based on feature compensation

Info

Publication number
CN109993775A
CN109993775A (application CN201910258571.8A)
Authority
CN
China
Prior art keywords
target
pixel
frame
histograms
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910258571.8A
Other languages
Chinese (zh)
Other versions
CN109993775B (en)
Inventor
杨云
白杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN201910258571.8A priority Critical patent/CN109993775B/en
Publication of CN109993775A publication Critical patent/CN109993775A/en
Application granted granted Critical
Publication of CN109993775B publication Critical patent/CN109993775B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a feature-compensation video target tracking method based on a posterior pixel color histogram, a histogram of oriented gradients, and a convolutional neural network: simple features are used in simple scenes to guarantee real-time operation, and complex features are used in complex scenes to guarantee accuracy. By combining the two features of the posterior pixel histogram and the histogram of oriented gradients, the obtained response map adapts well to relatively simple video scenes; a classifier is trained to judge when the target response obtained by fusing the two features is unreliable; finally, according to the judgment of the classifier, the slower but more robust convolutional neural network tracker is optionally switched in to correct a track that has drifted or to recover a target that has been lost. The present invention improves the precision with which the size and position of the target in the video are estimated, and it adapts well to long-term target tracking tasks, so as to meet practical application scenarios.

Description

Single-target tracking method based on feature compensation
Technical field
The invention belongs to the technical field of single-target tracking in computer vision, and particularly relates to a single-target tracking method based on feature compensation.
Background technique
In the field of computer vision, tracking has always been a core problem; it is widely used in video surveillance, human-computer interaction, robot visual perception, military guidance, and many other areas. In single-target tracking, the position and size of the target are manually marked with a rectangular box in the first frame of the video, and the tracker must then keep the rectangular box on that manually marked object in the subsequent frames of the video. Target detection, by contrast, scans and searches for targets over the whole frame of a still image or a dynamic video; broadly speaking, detection is concerned with localization and classification, whereas tracking is concerned with locking onto a particular person or object in real time, regardless of what that object is. Because tracking methods must run in real time, searching the whole frame is far too expensive and clearly unsuitable for this scenario; fortunately, the tracked object is continuous in time and space, so the search range can be greatly reduced. However, precisely because of this continuity, complex scenes encountered during tracking, with disturbing factors such as illumination change, appearance deformation, fast motion, occlusion, and background clutter, force most tracking models to continuously update themselves during the tracking task. Once a model learns background information, errors arise, and these errors keep accumulating until the target is eventually lost.
At present, the vast majority of mainstream tracking algorithms perform short-term tracking, and they mainly have the following defects:
(1) Poor robustness
Once the tracking model loses the target, it cannot recover it. Such algorithms mainly focus on the precision of the position and size of the tracked target; they are not very robust and do not adapt to long-term tracking tasks, so in fact such models cannot be used well in real scenes.
(2) Slow speed
Both end-to-end neural network tracking models and tracking models that combine deep convolutional feature maps with correlation filtering can obtain high accuracy, but they cost a great deal of computation time and are therefore rarely applied in real scenes. Other traditional tracking models based on correlation filtering can reach very high speed, but their accuracy and robustness are unsatisfactory.
(3) Error accumulation
Because of the various disturbing factors in a video scene, the model can hardly track the target correctly in every frame, so the updated template learns background or other interfering information; this error accumulates continuously, and the process is irreversible.
In view of the defects of the above tracking methods, to be well applied in real scenes the entry point is still placed on long-term tracking, i.e. improving robustness as much as possible and adapting to long-term tracking tasks while guaranteeing real-time speed.
Summary of the invention
The purpose of the present invention is to provide a feature-compensation video target tracking method based on a posterior pixel color histogram, a histogram of oriented gradients, and a convolutional neural network, so as to construct a tracker that is robust and fast: while accuracy and robustness are improved, the model is guaranteed to keep a high frame rate. The present invention improves the precision with which the size and position of the target in the video are estimated, and it adapts well to long-term target tracking tasks, so as to meet practical application scenarios.
The technical scheme adopted by the invention is a feature-compensation video target tracking method based on feature fusion, comprising the following steps:
S1. Establish the target tracking model branch based on the color histogram feature:
S11. Before the target tracking task starts, call the OpenCV toolkit and, based on the manually marked target image, cut out a target sub-image E that contains background information;
S12. According to the size of the target, separate the target sub-image E with background information into a foreground area and a background area at a certain ratio; meanwhile, compress the pixel values into the integer range 0-32 and, using a foreground mask and a background mask of the same size, calculate for each pixel value the pixel ratio in the corresponding foreground and background areas, i.e. the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B). The expressions of the pixel ratio ρ are as follows:
ρ(O) = N(O)/|O|;  (1-1)
ρ(B) = N(B)/|B|;  (1-2)
Wherein O denotes the image region of the foreground O and B denotes the image region of the background B; N(O) is the number of non-zero pixel values in the image region of the foreground O and N(B) is the number of non-zero pixel values in the image region of the background B; |O| is the total number of pixels in the image region of the foreground O and |B| is the total number of pixels in the image region of the background B. Based on formulas (1-1) and (1-2), the weight β_t of the posterior pixel color histogram template of the current frame is calculated by formula (2):
Wherein t denotes the current frame and λ is a hyperparameter;
S13. In the next video frame, within the image range centered on the target center of the previous frame as the search region, cut out a sub-image e as in S12 and compress its pixel values to obtain ψ; the weight β_{t-1} of the posterior pixel color histogram template of the previous frame is obtained from formulas (1-1), (1-2) and formula (2), and the color histogram response f_hist is finally obtained with the integral image formula (3):
Wherein ψ is the pixel-compressed sub-image with M channels, defined on the cropped picture e of the current frame; ψ_t is the pixel-compressed M-channel sub-image of the current frame; H represents the integer range of pixel values of the picture; u represents each grid cell in H; ψ[u] is the corresponding pixel on ψ; and the superscript T denotes matrix transposition;
S14. Each time the tracking task of a frame is completed, at the predicted position of the current frame the weight β_t of the posterior pixel histogram template is updated, i.e. the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B) are updated respectively, giving the updated foreground pixel ratio ρ_t(O) and background pixel ratio ρ_t(B) of the current frame:
ρ_t(O) = (1 - η_hist)ρ_{t-1}(O) + η_hist ρ′_t(O);
ρ_t(B) = (1 - η_hist)ρ_{t-1}(B) + η_hist ρ′_t(B);  (4)
Wherein ρ′_t(O) is the pixel ratio in the image region of the foreground O of the current frame, ρ′_t(B) is the pixel ratio in the image region of the background B of the current frame, ρ_{t-1}(O) is the pixel ratio in the image region of the foreground O of the previous frame, and ρ_{t-1}(B) is the pixel ratio in the image region of the background B of the previous frame; η_hist is the weight of the pixel ratio update;
S2. Establish the target tracking model branch based on the histogram of oriented gradients (HOG) feature:
S21. On the target image to be tracked in S11, select a rectangular box and cut out another target area sub-image E′ of a different size, again with background information; extract the three-dimensional HOG feature φ_k with K channels, multiply it by the cosine window function in the OpenCV package, and calculate the template of the HOG feature by formulas (5) and (6):
Wherein the hatted variables are defined in the frequency domain and obtained by the discrete Fourier transform; u represents each grid cell in Γ, and Γ represents the integer range of grid cells on φ_k; the superscript i indexes each of the K channels, and the conjugated term is the conjugate of the Fourier transform of a Gaussian signal; * denotes conjugation in the frequency domain, ⊙ denotes element-wise multiplication, and the transformed terms are the channels of the HOG feature φ_k obtained by the Fourier transform; K is the number of channels;
S22. Apply the inverse Fourier transform to the HOG template obtained in S21 to obtain h[u]; in the next video frame, within the image range centered on the target center of the previous frame as the search region, cut out a sub-image e′, extract the HOG feature φ of the current sub-image, and calculate the HOG score f_hog of the current frame with a linear function:
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u];  (7)
S23. After the tracking task of each frame is completed, at the predicted position of the current frame the template of the HOG feature is updated, i.e. the updated final signals of formula (8) are obtained:
Wherein the first pair of signals are calculated from formula (6) and represent the current frame, the second pair represent the previous frame, and the third pair are the updated final signals; η_hog is the weight of the HOG template update;
S3. Fuse the features and establish a classifier:
S31. Fuse the color histogram response f_hist obtained in S13 and the HOG score f_hog obtained in S22 by defining a linear function f(x):
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x);  (9)
Wherein γ_hog is the weight of the HOG response and γ_hist is the weight of the color histogram response; the coordinate of the point corresponding to the maximum value of f(x) is taken as the center coordinate of the target;
S32. Train a classifier with f and f_hog: select a batch of video sequences and, through the feature fusion in S31, output f and f_hog respectively; let the input of the data set be X = [max(f_hog); max(f)] and the output label be h′_θ, the ground truth of the data set; h′_θ is the integer 0 or 1, where 0 indicates that the tracking box of the model has drifted off the target and 1 indicates that it has not; let the logistic regression function h_θ denote the output of the classifier, formula (10):
Divide the data into a training set and a validation set in a ratio of 7:3; on the training set, after several iterations of gradient descent on the cross-entropy loss function until convergence, the parameter θ of the logistic regression model in formula (10) is obtained; then fine-tune the hyperparameters with the validation set data, i.e. set the parameters to different values, calculate the classification results under each value, and select the value with the highest accuracy as the final parameter value, so that the classifier achieves good classification results on the validation set;
S4. Judge whether the convolutional neural network tracker needs to be switched in:
S41. Input f and f_hog into the classifier obtained in S32 and obtain the output; in the continuous output of the classifier of S32 (formula (10)), 0.5 is selected as the threshold; when the output is greater than 0.5, the result of the fusion model is considered reliable and there is no need to switch to the convolutional neural network tracker; when the output is less than 0.5, the result of the fusion model is not trusted and the convolutional neural network tracker needs to be switched in;
S42. When the target response score predicted by the convolutional neural network tracker for the current frame is high, S14 and S23 are used again to update the posterior pixel histogram template and the HOG template respectively; the tracking task then proceeds to the next frame, until all video frames are completed.
The beneficial effects of the present invention are:
(1) The present invention fuses multiple features and combines the characteristics of each, so it copes well with video scenes involving illumination change, object deformation, motion blur, and occlusion; in simple scenes it completes the tracking task quickly with simple features, and when the scene becomes complex it switches to more robust features, reducing the influence of interfering information.
(2) The present invention adds a self-checking classifier, which makes the model more intelligent when switching features; when updating the feature templates it suppresses the learning of invalid information and reduces error accumulation; at the same time, the classifier is relatively simple and does not require much computational overhead.
(3) The neural network tracker selected by the present invention does not update its template, so it does not learn interfering information and performs well when the target is occluded.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of the foreground and background masks.
Fig. 2 is a schematic diagram of the posterior pixel histogram and its response map.
Fig. 3 shows the histogram of oriented gradients and its response map.
Fig. 4 is a schematic diagram of the single-target tracking algorithm based on feature compensation.
Fig. 5 shows the accuracy and robustness distribution of each algorithm under the reset mechanism.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the field of target tracking, the main difficulties are deformation, illumination change, fast motion, background clutter, in-plane rotation, scale change, occlusion, and leaving the field of view.
The detailed process is as follows:
S1. Establish the target tracking model branch based on the color histogram feature:
S11. Since this method is built for the single-target tracking scenario, before the target tracking task starts the OpenCV toolkit is called, the target to be tracked is selected manually with a rectangular box, and a target sub-image E containing background information is cut out; the model can then distinguish the target according to the characteristics of the selected target and its background and complete the subsequent tracking task. Therefore, in such a scenario, whichever feature the model is based on, it generates an initial feature template, in its own way, from the image in the target box selected in the first frame of the video, and this template is then matched against candidate regions of the subsequent frame images to predict the position and size of the target.
S12. On the basis of the first frame in which the target to be tracked has been selected, the color histogram model separates the target sub-image E with background information into a foreground area and a background area at a certain ratio according to the size of the target. Because the pixel value range is 0~255, computing with the original pixel values would cost a large amount of time, so a scale compression is applied to the pixel values; the scale selected here is 8, i.e. the computation is carried out in the integer range 0~32, which greatly increases the speed of the model. Then, with a foreground mask (as shown in Fig. 1-a, a single-channel image in which the white target region has value 1 and the black background region has value 0) and a background mask (as shown in Fig. 1-b, a single-channel image in which the black target region has value 0 and the white background region has value 1) of the same size, the pixel ratio of each pixel value in the two regions is calculated, i.e. the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B):
ρ(O) = N(O)/|O|;  (1-1)
ρ(B) = N(B)/|B|;  (1-2)
Wherein O denotes the image region of the foreground O and B denotes the image region of the background B; N(O) is the number of non-zero pixel values in the image region of the foreground O and N(B) is the number of non-zero pixel values in the image region of the background B; |O| is the total number of pixels in the image region of the foreground O and |B| is the total number of pixels in the image region of the background B. After the pixel ratios of the foreground and background are obtained, the weight β_t of the posterior pixel color histogram template of the current frame can be calculated by formula (2):
Wherein t denotes the current frame and λ is a hyperparameter.
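As an illustration only, the following Python sketch computes the pixel ratios of formulas (1-1) and (1-2) per compressed pixel value; because formula (2) is not reproduced in this text, the regularised posterior weight ρ(O)/(ρ(O)+ρ(B)+λ) used below is an assumption, and all function and variable names are hypothetical.
```python
import numpy as np

def color_bin_weights(patch, fg_mask, bg_mask, n_bins=32, lam=1e-3):
    """Pixel ratios of formulas (1-1)/(1-2) and an assumed posterior weight.

    patch   : HxW uint8 sub-image E (one channel), background included
    fg_mask : HxW array, 1 inside the target region, 0 elsewhere (Fig. 1-a)
    bg_mask : HxW array, 1 in the background region, 0 elsewhere (Fig. 1-b)
    """
    bins = (patch.astype(np.int32) * n_bins) // 256        # scale compression to 0..n_bins-1
    fg = bins[fg_mask.astype(bool)]
    bg = bins[bg_mask.astype(bool)]
    rho_O = np.bincount(fg, minlength=n_bins) / max(fg.size, 1)   # formula (1-1)
    rho_B = np.bincount(bg, minlength=n_bins) / max(bg.size, 1)   # formula (1-2)
    # Assumed form of the per-bin weight beta_t; formula (2) is not shown in the text.
    beta = rho_O / (rho_O + rho_B + lam)
    return rho_O, rho_B, beta
```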
S13. After the posterior pixel histogram template has been established, in the next video frame, within the image range centered on the target center of the previous frame as the search region, a sub-image e is likewise cut out and its pixel values are scale-compressed to obtain ψ. The weight β_{t-1} of the posterior pixel color histogram template of the previous frame is obtained from formulas (1-1) and (1-2), as shown in Fig. 2-a, Fig. 2-b and Fig. 2-c, and the color histogram response f_hist is finally obtained with the integral image formula (3):
Wherein ψ is the pixel-compressed sub-image with M channels, defined on the cropped picture e of the current frame; ψ_t is the pixel-compressed M-channel sub-image of the current frame; H represents the integer range of pixel values of the picture; u represents each grid cell in H; ψ[u] is the corresponding pixel on ψ; and the superscript T denotes matrix transposition;
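Formula (3) itself is not reproduced above, so the sketch below assumes the common integral-image realisation in which the response at each candidate centre is the average posterior weight inside a target-sized box; the names and the averaging choice are assumptions.
```python
import numpy as np

def hist_response(search_patch, beta, box_h, box_w, n_bins=32):
    """Color histogram response f_hist over a search sub-image e (step S13).

    Each candidate box score is the mean per-pixel posterior weight inside a
    target-sized window, computed in O(1) per location with an integral image."""
    bins = (search_patch.astype(np.int32) * n_bins) // 256   # scale-compressed image psi
    per_pixel = beta[bins]                                    # back-project beta onto the pixels
    ii = np.zeros((per_pixel.shape[0] + 1, per_pixel.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(per_pixel, axis=0), axis=1)
    H, W = per_pixel.shape
    box_sum = (ii[box_h:H + 1, box_w:W + 1] - ii[:H - box_h + 1, box_w:W + 1]
               - ii[box_h:H + 1, :W - box_w + 1] + ii[:H - box_h + 1, :W - box_w + 1])
    return box_sum / (box_h * box_w)          # one f_hist value per candidate box position
```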
S14. During online tracking, the scene in the video is changing, subtly or drastically, all the time; for the color histogram feature, the influence of disturbing factors such as illumination change and motion blur is especially serious. Therefore, in order to better adapt to the many changes present in the video scene, each time the tracking task of a frame is completed, at the predicted position of the current frame the weight β_t of the posterior pixel histogram template needs to be updated, i.e. the pixel ratios ρ(O) and ρ(B) of the foreground and background are updated respectively:
ρ_t(O) = (1 - η_hist)ρ_{t-1}(O) + η_hist ρ′_t(O);
ρ_t(B) = (1 - η_hist)ρ_{t-1}(B) + η_hist ρ′_t(B);  (4)
Wherein ρ′_t(O) is the pixel ratio in the image region of the foreground O of the current frame, ρ′_t(B) is the pixel ratio in the image region of the background B of the current frame, ρ_{t-1}(O) is the pixel ratio in the image region of the foreground O of the previous frame, and ρ_{t-1}(B) is the pixel ratio in the image region of the background B of the previous frame; η_hist is the weight of the pixel ratio update;
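Formula (4) is a plain linear interpolation, so a one-line sketch suffices; the default learning rate below is illustrative, not a value stated in this text.
```python
def update_ratio(rho_prev, rho_curr, eta_hist=0.04):
    """Formula (4): rho_t = (1 - eta_hist) * rho_{t-1} + eta_hist * rho'_t,
    applied separately to the foreground and background ratio vectors."""
    return (1.0 - eta_hist) * rho_prev + eta_hist * rho_curr
```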
S2. Establish the target tracking model branch based on the histogram of oriented gradients feature:
S21. On the basis of the first frame in which the tracking target has been manually marked, a target area sub-image E′ containing background information is cut out, and the three-dimensional HOG feature φ_k with K channels is extracted, as shown in Fig. 3-a and Fig. 3-b; it is multiplied by the cosine window function in the OpenCV package to suppress the influence of the periphery of the sub-image, and the template of the HOG feature is calculated by formulas (5) and (6):
Wherein the hatted variables are defined in the frequency domain and obtained by the discrete Fourier transform: because the correlation filtering model contains cross-correlation operations, which cost a great deal of computation time, the variables are Fourier transformed so that convolution in the time domain becomes element-wise multiplication in the frequency domain, which greatly reduces the computation time. u represents each grid cell in Γ, and Γ represents the integer range of grid cells on φ_k; the superscript i indexes each of the K channels, and the conjugated term is the conjugate of the Fourier transform of a Gaussian signal; * denotes conjugation in the frequency domain, ⊙ denotes element-wise multiplication, and the transformed terms are the channels of the HOG feature φ_k obtained by the Fourier transform; K is the number of channels.
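Formulas (5) and (6) are not reproduced above, so the sketch below assumes the usual multi-channel correlation-filter template: a Gaussian label is transformed to the frequency domain, each channel gets a numerator ŷ*⊙x̂_i, and all channels share a regularised denominator; every name here is an assumption.
```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Gaussian regression label y with its peak shifted to the (0, 0) corner."""
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-0.5 * (((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / sigma ** 2))
    return np.roll(np.roll(g, -(h // 2), axis=0), -(w // 2), axis=1)

def hog_template(feat, lam=1e-2):
    """Assumed frequency-domain HOG template (stand-in for formulas (5)-(6)).

    feat : HxWxK windowed HOG feature of E'. Returns one numerator per channel
    and a shared regularised denominator."""
    H, W, K = feat.shape
    y_hat = np.fft.fft2(gaussian_label(H, W))
    x_hat = np.fft.fft2(feat, axes=(0, 1))                    # per-channel DFT
    num = np.conj(y_hat)[..., None] * x_hat                   # y_hat* (.) x_hat_i
    den = np.sum(x_hat * np.conj(x_hat), axis=2).real + lam   # shared denominator
    return num, den
```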
S22. After the HOG template has been established, the inverse Fourier transform is applied to obtain h[u]; in the next video frame, within the image range centered on the target center of the previous frame as the search region, a search sub-image e′ is cut out and the HOG feature φ of the current sub-image is extracted; the HOG score f_hog of the current frame can then be calculated with a linear function, whose effect is shown in Fig. 3-c:
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u];  (7)
S23. In the online tracking stage, the HOG feature is likewise disturbed as the target changes in the scene, and it is especially affected by object deformation. Therefore, after the tracking task of each frame is completed, the template of the HOG feature also needs to be updated at the predicted position of the current frame, by formula (8):
Wherein the first pair of signals are calculated from formula (6) and represent the current frame, the second pair represent the previous frame, and the third pair are the updated final signals; η_hog is the weight of the HOG template update.
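Continuing the same assumed formulation, the per-frame score of formula (7) can be evaluated for every shift at once in the frequency domain, and the update of formula (8) mirrors the linear interpolation of formula (4); the learning rate is illustrative.
```python
import numpy as np

def hog_score(feat, num, den):
    """Formula (7): evaluate f_hog for every cyclic shift via the frequency domain."""
    z_hat = np.fft.fft2(feat, axes=(0, 1))
    filt = num / den[..., None]                              # frequency-domain template h
    return np.real(np.fft.ifft2(np.sum(np.conj(filt) * z_hat, axis=2)))

def update_hog_template(num, den, num_new, den_new, eta_hog=0.01):
    """Formula (8), assumed to interpolate the two frequency-domain signals
    exactly as formula (4) interpolates the pixel ratios."""
    return ((1.0 - eta_hog) * num + eta_hog * num_new,
            (1.0 - eta_hog) * den + eta_hog * den_new)
```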
S3. Fuse the features and establish the classifier:
S31. The color histogram feature is strongly affected when disturbing factors such as illumination change and motion blur appear in the scene, while the HOG feature is strongly affected by disturbing factors such as target deformation and fast motion. Fusing the two features can therefore reduce the interference of these factors to a certain extent and improve the accuracy and robustness of the tracking model, so that in the tracking task it predicts the position and size of the target more accurately and does not easily lose the target. Here the color histogram response f_hist obtained in S13 and the HOG score f_hog obtained in S22 are fused by defining a linear function f(x):
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x);  (9)
Wherein γ_hog is the weight of the HOG response and γ_hist is the weight of the color histogram response; the coordinate of the point corresponding to the maximum value of f(x) is taken as the center coordinate of the target.
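Assuming the two response maps have been brought onto a common grid, the fusion of formula (9) and the choice of the target centre reduce to a few lines; the weights below are illustrative, not values stated in this text.
```python
import numpy as np

def fuse_responses(f_hog, f_hist, gamma_hog=0.7, gamma_hist=0.3):
    """Formula (9): f = gamma_hog * f_hog + gamma_hist * f_hist.
    Both maps must already share the same shape/grid."""
    f = gamma_hog * f_hog + gamma_hist * f_hist
    cy, cx = np.unravel_index(np.argmax(f), f.shape)   # centre of the target
    return f, (cy, cx)
```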
S32. Although the fusion of the two features performs well in most scenes, there is still considerable room for improvement in more complex video scenes such as background clutter, occlusion, and leaving the field of view. Therefore, a more robust and better-performing tracker with a neural network structure is added to improve the performance of the model. Considering that a neural network runs slowly and that common current hardware cannot meet the real-time requirement, the neural network tracker is used only when the model fusing the first two features cannot complete the tracking task of the current frame well; only in this way can the performance of the model be exploited to the maximum. The key to meeting this requirement is letting the feature fusion model know when it needs to switch to the neural network tracker. By analysing the mapping relations of f, f_hist and f_hog (the input (x) is omitted when only the symbol is written) and how the three response scores change under different scenes, it can be seen that when the target undergoes large deformation or occlusion, f and f_hog fluctuate strongly; therefore a classifier can be trained on these two values and used as the flag for switching trackers.
A batch of video sequences is selected and, through the feature fusion in S31, f and f_hog are output respectively; the input of the data set is X = [max(f_hog); max(f)], and the output label is h′_θ, the ground truth of the data set; h′_θ is the integer 0 or 1, where 0 indicates that the tracking box of the model has drifted off the target and 1 indicates that it has not. Target detection has the concept of intersection over union (Intersection-over-Union, IoU), which expresses the overlap between the predicted image box and the real image box; here IoU is used as the criterion for whether the target has been lost. Experiments with several values show that 0.35 is a suitable cut-off, i.e. when IoU > 0.35, h′_θ = 1, and when IoU < 0.35, h′_θ = 0. Let the logistic regression function h_θ denote the output of the classifier, formula (10):
The data are divided into a training set and a validation set in a ratio of 7:3; on the training set, after several iterations of gradient descent on the cross-entropy loss function until convergence, the weight θ in formula (10) is obtained. The hyperparameters are then fine-tuned with the validation set data: parameters with different values are tried, the classification results under each value are calculated, and the value with the highest accuracy is selected as the final parameter value, so that the classifier achieves good classification results on the validation set.
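A minimal training sketch follows; the sigmoid form 1/(1+exp(-θᵀX)) is assumed for formula (10), a bias column is added for illustration, and the IoU labelling at the 0.35 cut-off follows the text.
```python
import numpy as np

def train_switch_classifier(max_fhog, max_f, iou, lr=0.1, iters=5000):
    """Logistic regression on X = [max(f_hog); max(f)], labels from IoU > 0.35."""
    X = np.stack([max_fhog, max_f, np.ones_like(max_f)], axis=1)   # bias column (assumption)
    y = (iou > 0.35).astype(np.float64)                            # ground truth h'_theta
    n_train = int(0.7 * len(y))                                    # 7:3 train/validation split
    Xtr, ytr = X[:n_train], y[:n_train]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):                          # gradient descent on the cross-entropy loss
        p = 1.0 / (1.0 + np.exp(-(Xtr @ theta)))    # assumed sigmoid form of formula (10)
        theta -= lr * Xtr.T @ (p - ytr) / len(ytr)
    p_val = 1.0 / (1.0 + np.exp(-(X[n_train:] @ theta)))
    val_acc = np.mean((p_val > 0.5) == y[n_train:].astype(bool))
    return theta, val_acc
```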
S4. Judge whether the convolutional neural network tracker needs to be switched in:
S41. After the classifier model has been trained and fine-tuned in the previous step, in the tracking stage it can judge whether the model fusing the color histogram feature and the HOG feature can still adapt to the current video scene, and hence whether the neural network tracker needs to be switched in. During tracking, the f_hog and f obtained from formulas (7) and (9) are used as the input of the classifier, i.e. formula (10), and the obtained output is used as the switching flag. When the parameters were tuned with the validation data in the previous step, a suitable threshold of 0.5 was selected: when the output is greater than this threshold, the result of the fusion model is trusted and no switching is needed; when the output is less than this threshold, the result of the fusion model is not trusted, and the neural network tracker is switched in. The neural network tracker selected here is DaSiamRPN, which incorporates the idea and structure of the RPN (Region Proposal Network) from target detection; it copes well with complex scenes, fits the size of the deformed target more accurately, and does not need to update the target template online, so there is no situation in which accumulated errors pollute the template.
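The switching rule of S41 then reduces to thresholding the classifier output; the bias convention matches the training sketch above and is, likewise, an assumption.
```python
import numpy as np

def fusion_is_trusted(theta, max_fhog, max_f, threshold=0.5):
    """S41: classifier output above the threshold keeps the fusion model;
    otherwise the convolutional neural network tracker is switched in."""
    x = np.array([max_fhog, max_f, 1.0])       # same bias convention as the training sketch
    return 1.0 / (1.0 + np.exp(-(x @ theta))) > threshold
```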
S42. In the online tracking stage, formulas (4) and (8) are used to update the posterior pixel histogram template and the HOG template respectively, so as to adapt to the changes of the scene in the video. Likewise, after the DaSiamRPN tracker has been switched in and has completed the tracking task of the current frame, these two formulas still need to be used for the update. However, because DaSiamRPN can also fail, the templates are updated only when the target response score it predicts for the current frame is high. The tracking task then proceeds to the next frame, until all video frames are completed. The entire tracking flow is shown in Fig. 4.
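Putting the steps together, the per-frame flow of Fig. 4 might be organised as in the skeleton below; fusion_tracker, cnn_tracker, their methods, and the response-score threshold for accepting a CNN prediction are all hypothetical stand-ins, with DaSiamRPN intended as the CNN tracker.
```python
def track_sequence(frames, init_box, fusion_tracker, cnn_tracker,
                   classifier, cnn_score_min=0.8):
    """Skeleton of the per-frame flow of Fig. 4: fuse first, fall back to the
    CNN tracker only when the classifier distrusts the fusion result (S41),
    and update the histogram/HOG templates only on confident frames (S42)."""
    fusion_tracker.init(frames[0], init_box)
    cnn_tracker.init(frames[0], init_box)
    boxes = [init_box]
    for frame in frames[1:]:
        box, f_max, fhog_max = fusion_tracker.predict(frame)
        if classifier(fhog_max, f_max):                # fusion result is trusted
            fusion_tracker.update_templates(frame, box)
        else:                                          # switch to the CNN tracker (DaSiamRPN)
            box, score = cnn_tracker.predict(frame)
            if score >= cnn_score_min:                 # update only on a confident prediction
                fusion_tracker.update_templates(frame, box)
        boxes.append(box)
    return boxes
```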
Embodiment
In order to evaluate the performance of the invention, experiments are carried out on a test set of video sequences. The evaluation method, data set, and evaluation system of the VOT (Visual Object Tracking) challenge are used for this experiment. The data set contains 60 video sequences covering scenes such as occlusion, illumination change, target motion, scale change, camera motion, and leaving the field of view; several of these attributes may appear in one video sequence, and the attributes differ from frame to frame, so the model can be evaluated more accurately. Before VOT was proposed, the popular evaluation system initialised the tracker in the first frame of a sequence and then let it run to the last frame. However, a tracker may lose the target (fail) in the first few frames because of one or two factors, so the final evaluation uses only a very small part of the sequence, which is wasteful. VOT therefore proposes that the evaluation system should detect a failure when the tracker loses the target and reinitialise the tracker 5 frames after the failure, so that the data set is fully utilised.
First refer to the experimental scores under the reset mechanism, as shown in Table 1:
Table 1. Scores of the different algorithms under the reset mechanism
In Table 1, A-R rank is the index of accuracy (Accuracy) and robustness (Robustness) ranking; Overlap corresponds to accuracy and represents the overlap between the target predicted by the tracker and the manually marked ground-truth target, and the larger the Overlap, the more accurate the prediction; Failure evaluates the stability of tracking, and the smaller the value, the better the stability. Compared with 7 other tracking methods, the present method ranks first in accuracy and third in stability. The scoring tendency of all the algorithms in the table can also be seen more intuitively in Fig. 5. However, in a real scene it is impossible to reset the tracker after a failure, so the evaluation system without resets has more reference value for real scenes; the experimental scores are shown in Table 2:
Table 2. Scores of the different algorithms without the reset mechanism
The AUC in Table 2 (Area Under Curve, the area enclosed by the curve and the coordinate axes) is an index for evaluating the quality of an algorithm; the larger the value, the better the performance of the algorithm. The speed index FPS (Frames Per Second) is likewise the larger the better. It can be seen that without the reset mechanism, i.e. when the target is not relocated by the scoring system after a tracking failure, the present method achieves the highest accuracy compared with the 7 other methods, and it is the fastest among the three algorithms with the highest accuracy. In addition, on a machine configured with CPU: Intel Core i7-6700 and GPU: GeForce GT 730, the fastest competing method in the table, SiamFC, reaches only 3 FPS, whereas the present method reaches up to 30 FPS; therefore, compared with the other methods, it maintains its speed while achieving higher accuracy and is better adapted to real scenes.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (1)

1. A single-target tracking method based on feature compensation, characterized by comprising the following steps:
S1. Establish the target tracking model branch based on the color histogram feature:
S11. Before the target tracking task starts, call the OpenCV toolkit and, based on the manually marked target image, cut out a target sub-image E that contains background information;
S12. According to the size of the target, separate the target sub-image with background information into a foreground area and a background area at a certain ratio; meanwhile, compress the pixel values into the integer range 0-32 and, using a foreground mask and a background mask of the same size, calculate for each pixel value the pixel ratio in the corresponding foreground and background areas, i.e. the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B), the expressions of the pixel ratio ρ being as follows:
ρ(O) = N(O)/|O|;  (1-1)
ρ(B) = N(B)/|B|;  (1-2)
Wherein O denotes the image region of the foreground O and B denotes the image region of the background B; N(O) is the number of non-zero pixel values in the image region of the foreground O and N(B) is the number of non-zero pixel values in the image region of the background B; |O| is the total number of pixels in the image region of the foreground O and |B| is the total number of pixels in the image region of the background B; based on formulas (1-1) and (1-2), the weight β_t of the posterior pixel color histogram template of the current frame is calculated by formula (2):
Wherein t denotes the current frame and λ is a hyperparameter;
S13. In the next video frame, within the image range centered on the target center of the previous frame as the search region, cut out a sub-image e as in S12 and compress its pixel values to obtain ψ; the weight β_{t-1} of the posterior pixel color histogram template of the previous frame is obtained from formulas (1-1), (1-2) and formula (2), and the color histogram response f_hist is finally obtained with the integral image formula (3):
Wherein ψ is the pixel-compressed sub-image with M channels, defined on the cropped picture e of the current frame; ψ_t is the pixel-compressed M-channel sub-image of the current frame; H represents the integer range of pixel values of the picture; u represents each grid cell in H; ψ[u] is the corresponding pixel on ψ; and the superscript T denotes matrix transposition;
S14. Each time the tracking task of a frame is completed, at the predicted position of the current frame the weight β_t of the posterior pixel histogram template is updated, i.e. the foreground pixel ratio ρ(O) and the background pixel ratio ρ(B) are updated respectively, giving the updated foreground pixel ratio ρ_t(O) and background pixel ratio ρ_t(B) of the current frame:
ρ_t(O) = (1 - η_hist)ρ_{t-1}(O) + η_hist ρ′_t(O);
ρ_t(B) = (1 - η_hist)ρ_{t-1}(B) + η_hist ρ′_t(B);  (4)
Wherein ρ′_t(O) is the pixel ratio in the image region of the foreground O of the current frame, ρ′_t(B) is the pixel ratio in the image region of the background B of the current frame, ρ_{t-1}(O) is the pixel ratio in the image region of the foreground O of the previous frame, and ρ_{t-1}(B) is the pixel ratio in the image region of the background B of the previous frame; η_hist is the weight of the pixel ratio update;
S2. Establish the target tracking model branch based on the histogram of oriented gradients (HOG) feature:
S21. On the target image to be tracked in S11, select a rectangular box and cut out another target area sub-image E′ of a different size, again with background information; extract the three-dimensional HOG feature φ_k with K channels, multiply it by the cosine window function in the OpenCV package, and calculate the template of the HOG feature by formulas (5) and (6):
Wherein the hatted variables are defined in the frequency domain and obtained by the discrete Fourier transform; u represents each grid cell in Γ, and Γ represents the integer range of grid cells on φ_k; the superscript i indexes each of the K channels, and the conjugated term is the conjugate of the Fourier transform of a Gaussian signal; * denotes conjugation in the frequency domain, ⊙ denotes element-wise multiplication, and the transformed terms are the channels of the HOG feature φ_k obtained by the Fourier transform; K is the number of channels;
S22. Apply the inverse Fourier transform to the HOG template obtained in S21 to obtain h[u]; in the next video frame, within the image range centered on the target center of the previous frame as the search region, cut out a sub-image e′, extract the HOG feature φ of the current sub-image, and calculate the HOG score f_hog of the current frame with a linear function:
f_hog(φ, h) = Σ_{u∈Γ} h[u]^T φ[u];  (7)
S23. After the tracking task of each frame is completed, at the predicted position of the current frame the template of the HOG feature is updated, i.e. the updated final signals of formula (8) are obtained:
Wherein the first pair of signals are calculated from formula (6) and represent the current frame, the second pair represent the previous frame, and the third pair are the updated final signals; η_hog is the weight of the HOG template update;
S3. Fuse the features and establish a classifier:
S31. Fuse the color histogram response f_hist obtained in S13 and the HOG score f_hog obtained in S22 by defining a linear function f(x):
f(x) = γ_hog f_hog(x) + γ_hist f_hist(x);  (9)
Wherein γ_hog is the weight of the HOG response and γ_hist is the weight of the color histogram response; the coordinate of the point corresponding to the maximum value of f(x) is taken as the center coordinate of the target;
S32. Train a classifier with f and f_hog: select a batch of video sequences and, through the feature fusion in S31, output f and f_hog respectively; let the input of the data set be X = [max(f_hog); max(f)] and the output label be h′_θ, the ground truth of the data set; h′_θ is the integer 0 or 1, where 0 indicates that the tracking box of the model has drifted off the target and 1 indicates that it has not; let the logistic regression function h_θ denote the output of the classifier, formula (10):
Divide the data into a training set and a validation set in a ratio of 7:3; on the training set, after several iterations of gradient descent on the cross-entropy loss function until convergence, the parameter θ of the logistic regression model in formula (10) is obtained; then fine-tune the hyperparameters with the validation set data, i.e. set the parameters to different values, calculate the classification results under each value, and select the value with the highest accuracy as the final parameter value, so that the classifier achieves good classification results on the validation set;
S4. Judge whether the convolutional neural network tracker needs to be switched in:
S41. Input f and f_hog into the classifier obtained in S32 and obtain the output; in the continuous output of the classifier of S32, 0.5 is selected as the threshold; when the output is greater than 0.5, the result of the fusion model is considered reliable and there is no need to switch to the convolutional neural network tracker; when the output is less than 0.5, the result of the fusion model is not trusted and the convolutional neural network tracker needs to be switched in;
S42. When the target response score predicted by the convolutional neural network tracker for the current frame is high, S14 and S23 are reused to update the posterior pixel histogram template and the HOG template respectively; the tracking task then proceeds to the next frame, until all video frames are completed.
CN201910258571.8A 2019-04-01 2019-04-01 Single target tracking method based on characteristic compensation Active CN109993775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910258571.8A CN109993775B (en) 2019-04-01 2019-04-01 Single target tracking method based on characteristic compensation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910258571.8A CN109993775B (en) 2019-04-01 2019-04-01 Single target tracking method based on characteristic compensation

Publications (2)

Publication Number Publication Date
CN109993775A true CN109993775A (en) 2019-07-09
CN109993775B CN109993775B (en) 2023-03-21

Family

ID=67132176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910258571.8A Active CN109993775B (en) 2019-04-01 2019-04-01 Single target tracking method based on characteristic compensation

Country Status (1)

Country Link
CN (1) CN109993775B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490148A (en) * 2019-08-22 2019-11-22 四川自由健信息科技有限公司 A kind of recognition methods for behavior of fighting
CN110647836A (en) * 2019-09-18 2020-01-03 中国科学院光电技术研究所 Robust single-target tracking method based on deep learning
CN110675423A (en) * 2019-08-29 2020-01-10 电子科技大学 Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN110738149A (en) * 2019-09-29 2020-01-31 深圳市优必选科技股份有限公司 Target tracking method, terminal and storage medium
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN111260686A (en) * 2020-01-09 2020-06-09 滨州学院 Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window
CN112991395A (en) * 2021-04-28 2021-06-18 山东工商学院 Vision tracking method based on foreground condition probability optimization scale and angle
CN115063449A (en) * 2022-07-06 2022-09-16 西北工业大学 Hyperspectral video-oriented three-channel video output method for target tracking


Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0795385A (en) * 1993-09-21 1995-04-07 Dainippon Printing Co Ltd Method and device for clipping picture
EP0951182A1 (en) * 1998-04-14 1999-10-20 THOMSON multimedia S.A. Method for detecting static areas in a sequence of video pictures
EP1126414A2 (en) * 2000-02-08 2001-08-22 The University Of Washington Video object tracking using a hierarchy of deformable templates
WO2010001364A2 (en) * 2008-07-04 2010-01-07 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Complex wavelet tracker
DE102009038364A1 (en) * 2009-08-23 2011-02-24 Friedrich-Alexander-Universität Erlangen-Nürnberg Method and system for automatic object recognition and subsequent object tracking according to the object shape
US20130083192A1 (en) * 2011-09-30 2013-04-04 Siemens Industry, Inc. Methods and System for Stabilizing Live Video in the Presence of Long-Term Image Drift
CN102750708A (en) * 2012-05-11 2012-10-24 天津大学 Affine motion target tracing algorithm based on fast robust feature matching
CN103426178A (en) * 2012-05-17 2013-12-04 深圳中兴力维技术有限公司 Target tracking method and system based on mean shift in complex scene
US20150146022A1 (en) * 2013-11-25 2015-05-28 Canon Kabushiki Kaisha Rapid shake detection using a cascade of quad-tree motion detectors
CN103793926A (en) * 2014-02-27 2014-05-14 西安电子科技大学 Target tracking method based on sample reselecting
CN104299247A (en) * 2014-10-15 2015-01-21 云南大学 Video object tracking method based on self-adaptive measurement matrix
CN104361611A (en) * 2014-11-18 2015-02-18 南京信息工程大学 Group sparsity robust PCA-based moving object detecting method
WO2017088050A1 (en) * 2015-11-26 2017-06-01 Sportlogiq Inc. Systems and methods for object tracking and localization in videos with adaptive image representation
WO2017132830A1 (en) * 2016-02-02 2017-08-10 Xiaogang Wang Methods and systems for cnn network adaption and object online tracking
WO2017143589A1 (en) * 2016-02-26 2017-08-31 SZ DJI Technology Co., Ltd. Systems and methods for visual target tracking
US20180372499A1 (en) * 2017-06-25 2018-12-27 Invensense, Inc. Method and apparatus for characterizing platform motion
CN108346159A (en) * 2018-01-28 2018-07-31 北京工业大学 A kind of visual target tracking method based on tracking-study-detection
CN108447078A (en) * 2018-02-28 2018-08-24 长沙师范学院 The interference of view-based access control model conspicuousness perceives track algorithm
CN109360223A (en) * 2018-09-14 2019-02-19 天津大学 A kind of method for tracking target of quick spatial regularization

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
戴凤智 et al.: "Survey of research progress in video tracking based on deep learning", Computer Engineering and Applications *
李杰 et al.: "Template matching tracking algorithm based on particle swarm optimization", Journal of Computer Applications *
武星 et al.: "Research on robust feature recognition and accurate path tracking of vision-guided AGVs", Transactions of the Chinese Society for Agricultural Machinery *
陆惟见 et al.: "Robust moving target tracking method based on multiple templates", Transducer and Microsystem Technologies *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490148A (en) * 2019-08-22 2019-11-22 四川自由健信息科技有限公司 A kind of recognition methods for behavior of fighting
CN110675423A (en) * 2019-08-29 2020-01-10 电子科技大学 Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN110647836A (en) * 2019-09-18 2020-01-03 中国科学院光电技术研究所 Robust single-target tracking method based on deep learning
CN110738149A (en) * 2019-09-29 2020-01-31 深圳市优必选科技股份有限公司 Target tracking method, terminal and storage medium
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN111260686A (en) * 2020-01-09 2020-06-09 滨州学院 Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window
CN111260686B (en) * 2020-01-09 2023-11-10 滨州学院 Target tracking method and system for anti-shielding multi-feature fusion of self-adaptive cosine window
CN112991395A (en) * 2021-04-28 2021-06-18 山东工商学院 Vision tracking method based on foreground condition probability optimization scale and angle
CN112991395B (en) * 2021-04-28 2022-04-15 山东工商学院 Vision tracking method based on foreground condition probability optimization scale and angle
CN115063449A (en) * 2022-07-06 2022-09-16 西北工业大学 Hyperspectral video-oriented three-channel video output method for target tracking

Also Published As

Publication number Publication date
CN109993775B (en) 2023-03-21

Similar Documents

Publication Publication Date Title
CN109993775A (en) Single-target tracking method based on feature compensation
CN111797716A (en) Single target tracking method based on Siamese network
CN104902267B (en) No-reference image quality evaluation method based on gradient information
CN106228528B (en) A kind of multi-focus image fusing method based on decision diagram and rarefaction representation
CN106709936A (en) Single target tracking method based on convolution neural network
CN108182388A (en) A kind of motion target tracking method based on image
CN110443763B (en) Convolutional neural network-based image shadow removing method
CN108573222A (en) Pedestrian image occlusion detection method based on cycle generative adversarial network
CN108198201A (en) A kind of multi-object tracking method, terminal device and storage medium
CN102034247B (en) Motion capture method for binocular vision image based on background modeling
CN105357519B (en) Quality objective evaluation method for three-dimensional image without reference based on self-similarity characteristic
CN104992403B (en) Mixed operation operator image redirection method based on visual similarity measurement
CN108460790A (en) A kind of visual tracking method based on consistency fallout predictor model
CN110322445A (en) A kind of semantic segmentation method based on maximization prediction and impairment correlations function between label
CN109886356A (en) A kind of target tracking method based on three-branch neural networks
Wang et al. Background extraction based on joint gaussian conditional random fields
CN109711267A (en) A kind of pedestrian re-identification and pedestrian motion trajectory generation method and device
CN106791822A (en) It is a kind of based on single binocular feature learning without refer to stereo image quality evaluation method
CN104902268A (en) Non-reference three-dimensional image objective quality evaluation method based on local ternary pattern
CN109840905A (en) Power equipment rusty stain detection method and system
CN112818849A (en) Crowd density detection algorithm based on context attention convolutional neural network of counterstudy
CN110866473B (en) Target object tracking detection method and device, storage medium and electronic device
Luo et al. Bi-GANs-ST for perceptual image super-resolution
Liu et al. Spatio-temporal interactive laws feature correlation method to video quality assessment
Da et al. Perceptual quality assessment of nighttime video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant