CN109102521A - Video target tracking method based on parallel attention correlation filtering - Google Patents

Video target tracking method based on parallel attention correlation filtering

Info

Publication number
CN109102521A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810647331.2A
Other languages
Chinese (zh)
Other versions
CN109102521B (en)
Inventor
宋慧慧
樊佳庆
张开华
刘青山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN201810647331.2A
Publication of CN109102521A
Application granted
Publication of CN109102521B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/262 Analysis of motion using transform domain methods, e.g. Fourier domain methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20048 Transform domain processing
    • G06T2207/20056 Discrete and fast Fourier transform, [DFT, FFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video target tracking method based on parallel attention correlation filtering, belonging to the technical field of image processing. The tracking problem is formulated as estimating the probability of a target position; spatial selective attention (SSA) and appearance selective attention (ASA) are integrated, and an objective function is obtained by taking the logarithm (Log function), achieving sustained and effective tracking of the video target. SSA is modeled first: a series of binary maps is generated and filtered to obtain a position response map. Then a series of distractor regions is sampled from the semi-local domain around the tracked target, an anti-interference distance metric is learned within correlation filtering, and anti-interference metric regularized correlation filtering is performed so that distractors are pushed into the negative domain, yielding the ASA target map. Finally, the objective function obtained via the logarithm fuses the results processed in the local domain and the semi-local domain to track the target. The method offers advantages such as more robust and accurate handling of the problem, strong adaptability and good tracking performance.

Description

Video target tracking method based on parallel attention correlation filtering
Technical field
The invention relates to a video target tracking method based on parallel attention correlation filtering and belongs to the technical field of image processing.
Background art
Visual tracking is a prerequisite in several important computer vision applications, such as video surveillance, action recognition, video retrieval and human-computer interaction. Although visual tracking has made considerable progress in recent years, continuously tracking a generic target in unconstrained environments, given only the target position in the first frame, remains challenging, because the target's appearance is severely affected by disturbing factors such as occlusion, fast motion and deformation.
The task of target tracking is to find the target position and determine the target's properties, that is, the "where" and "what" problems, which are related to the attention selection mechanism in human visual perception. Evidence from psychology and cognitive science shows that human visual perception is active and selective, so that the human visual system can concentrate on quickly processing the relevant important visual information. There are two main visual attention mechanisms in human visual perception: one is spatial selective attention (SSA), which reduces the receptive field of a neuron and improves sensitivity to a specific position in the visual field; the other is appearance selective attention (ASA), which enhances responses by specially processing different types of features, thereby enhancing activity in different regions of the cerebral cortex.
After leaving the eyes, the scene input signals entering the prefrontal cerebral cortex are divided into a dorsal stream and a ventral stream: the former exploits spatial relationships (i.e. "where"), while the latter emphasizes appearance features (i.e. "what"). Some perceptual studies have shown that the two types of function may be processed in parallel, and that these mechanisms play an important role in handling distractors, blur and occlusion of the tracked target. How to use these findings to address the "where" and "what" problems in correlation filter trackers is of great significance for target tracking in complex environments.
Summary of the invention
The technical problem to be solved by the invention is that existing target tracking methods cannot continuously track generic targets. To overcome this shortcoming, a video target tracking method based on parallel attention correlation filtering is proposed, which achieves sustained and effective tracking of video targets by fusing spatial selective attention and appearance selective attention.
To solve the above technical problem, the invention provides a video target tracking method based on parallel attention correlation filtering, in which the tracking problem is formulated as estimating the probability of a target position, spatial selective attention (SSA) and appearance selective attention (ASA) are integrated, and an objective function is obtained by taking the logarithm (Log function), achieving sustained and effective tracking of the video target. The method comprises the following steps:
(1) Obtain the SSA position response map: first, for the tracked target, in the local domain around the target, generate a series of binary maps that describe the topological structure between the target and its surrounding scene at different granularities; arrange the maps from top to bottom in order of description granularity, from coarse to fine, to obtain a set of Boolean maps Bi (i = 1, 2, ..., Nb) of the tracked target. Coarse-grained Boolean maps encode global shape information and describe targets with large appearance changes, while fine-grained Boolean maps describe spatial structural details. Then define a binary filter F for the tracked target and apply F to the Boolean maps Bi to obtain conditional position response maps; learn the weights by minimizing a linear regression function, so that an optimal weight is learned for each Boolean map, and weight the maps to obtain the final position response map;
(2) Obtain the ASA target map: first, sample a series of distractor regions from the semi-local domain around the tracked target, treat the ridge regression objective function as approximately equivalent to a metric-learning correlation filter, and learn an anti-interference distance metric within correlation filtering, which models the correlation between positive samples; then introduce an anti-interference metric regularization term and apply anti-interference metric regularized correlation filtering to the target image, learning the anti-interference distance metric within correlation filtering while also taking into account the useful correlations from real negative samples, so that distractors are pushed into the negative domain and the target tracking map is obtained;
(3) Track the video target continuously: build the objective function that integrates SSA and ASA by taking the logarithm, use this function to track the video target, and update the parameters online, thereby achieving effective tracking of the video target.
The specific steps of the video target tracking method based on parallel attention correlation filtering are as follows:
(1) Obtain the SSA position response map
(1.1) For the tracked target, in the local domain around the target, generate a series of binary maps by the following formula to describe the topological structure between the target and its surrounding scene at different granularities:
where I(j) denotes the intensity of the j-th pixel, U(·) is a unary function, R(·) is a rounding function, the input is the RGB color channel map of an image patch, and T denotes transposition;
The maps are arranged from top to bottom in order of description granularity, from coarse to fine, giving a set of Boolean maps Bi (i = 1, 2, ..., Nb) of the tracked target; coarse-grained Boolean maps encode global shape information and describe targets with large appearance changes, while fine-grained Boolean maps describe spatial structural details;
(1.2) Weight learning: in the conventional manner, define a binary filter F for the tracked target and apply F to the Boolean maps Bi obtained in step (1.1) to obtain a set of conditional position response maps; learn the weights by minimizing the following linear regression function, so that an optimal weight is learned for each Boolean map, and weight the maps to obtain the final set of position response maps P(Bi, F | I ∈ Ωo):
where Ωo is the region of the scene in which the target appears, Ωb is the background region of the scene, dw is the feature width and dh is the feature height; the formula also involves the classifier parameter vector of the k-th frame, the number of non-blank pixels in the target region and the number of non-blank pixels in the background region; βk is a weight coefficient to be optimized. The weight coefficients are updated online to adapt to changes in target appearance over time, where βt is the updated weight coefficient vector, η is the fusion coefficient, and the remaining term is the weight coefficient vector of the current frame;
(2) Obtain the ASA target map
(2.1) Sample a series of distractor regions from the semi-local domain around the tracked target, treat the following ridge regression objective function as approximately equivalent to a metric-learning correlation filter, and learn the anti-interference distance metric within correlation filtering;
where Xi is a sample matrix and wi is the correlation filter weight corresponding to the i-th sample matrix Xi; the formula also uses the DFT of the vector x, the i-th row of the corresponding matrix and the vector formed by all wi; y is the Gaussian-shaped label, dw′ and dh′ are the width and height of the feature matrix respectively, λ is the regularization coefficient, and the distance used is the Mahalanobis distance;
(2.2) Introduce the anti-interference metric regularization term into the correlation filtering objective function to obtain the anti-interference metric regularized correlation filtering model; apply this model to the target image obtained in step (2.1) to perform anti-interference metric regularized correlation filtering, which strengthens the discrimination and tracking of target features and pushes the filtered-out distractors into the negative domain, yielding the positive-space target tracking map P(Xi, wi | I ∈ Ωo):
where the formula involves the k-th sub-vector of the anti-interference metric regularized correlation filter weights, the k-th sub-vector of the overall sample vector and the k-th sub-vector of the Gaussian label vector; wi is the weight vector corresponding to the i-th circulant sample matrix; the filter is obtained by online updating from the tracking result of frame t, which is computed by an inverse FFT; a conjugate transpose also appears; I is the identity matrix, λ is the regularization coefficient and η is the fusion coefficient;
It is defined as:
where xi is the i-th sample vector, the two circulant samples are the m-th and the n-th circulant samples of the k-th base sample, and wmn is the inter-sample difference weight (it measures the similarity between samples i and j; the larger the weight, the greater the difference between samples and the more discriminative the learned appearance features);
(3) Track the video target continuously
By taking the logarithm, the SSA and ASA maps are integrated to obtain the following objective function:
where P(Bi, F | I ∈ Ωo) denotes the obtained SSA position response map, the Boolean maps form a series of Nb channels with a corresponding Boolean map filter, P(Xi, wi | I ∈ Ωo) denotes the obtained ASA target map, * denotes a spatial correlation operation, βi denotes a weight coefficient to be optimized, e(·) denotes the exponential function, Ωo ∈ R2 denotes the target region, o denotes the target appearing in the scene, and the sample matrices form a series of Nx circulant matrices (each obtained by cyclically shifting a base HOG feature channel vector; all feature channels are assumed to be independently distributed) with a corresponding ASA filter;
Using this objective function, the video target is tracked and the parameters are updated online, achieving effective tracking of the target.
The regularization coefficient λ is 0.001 and the fusion coefficient η is 0.006.
The principle of the present invention is:
The core of the invention is to formulate the tracking problem as estimating the probability of a target position, seamlessly integrating SSA and ASA:
Here Ωo ∈ R2 denotes the target region and o denotes the target appearing in the scene; the Boolean maps form a series of Nb channels, and the sample matrices form a series of Nx circulant matrices, each obtained by cyclically shifting a base HOG feature channel vector, together with their corresponding filters. In addition, for simplicity, all feature channels are assumed to be independently distributed. Finally, taking the logarithm of both sides of formula (1) gives:
Here P(Bi, F | I ∈ Ωo) and P(Xi, wi | I ∈ Ωo) are defined as:
Here * is a spatial correlation operation, βi is a weight coefficient to be optimized, and e(·) is the exponential function.
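Because the formulas themselves are not reproduced in this text, the following is only a minimal sketch of the fusion idea, under the assumption that each probability is an exponential of a filter response, so that the logarithm of their product reduces to a sum of response maps. The function name `fuse_and_locate` and the array shapes are hypothetical and chosen for illustration.

```python
import numpy as np

def fuse_and_locate(ssa_maps, betas, asa_maps):
    """Sum the log-probabilities of both attention branches and find the peak.

    ssa_maps: (N_b, H, W) filtered Boolean-map responses (hypothetical input)
    betas:    (N_b,) weights of the Boolean maps
    asa_maps: (N_x, H, W) per-channel ASA filter responses (hypothetical input)
    """
    # Assumption: log of a product of exponentials is the sum of the exponents.
    log_p = np.tensordot(betas, ssa_maps, axes=1) + asa_maps.sum(axis=0)
    row, col = np.unravel_index(np.argmax(log_p), log_p.shape)
    return (int(row), int(col)), log_p

# Toy responses that both peak at row 1, column 2.
ssa = np.zeros((2, 4, 5)); ssa[:, 1, 2] = 1.0
asa = np.zeros((3, 4, 5)); asa[:, 1, 2] = 0.5
position, log_p = fuse_and_locate(ssa, np.array([0.6, 0.4]), asa)
print(position)
```

The estimated target position is simply the location of the maximum of the fused map; any normalization constants are ignored here because they do not move the peak.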
To model SSA, the invention first generates a series of binary maps, i.e. the BMR (Boolean map representation), describing the topological structure between the target and its surrounding scene at different granularities. In Fig. 2, the description granularity of the Boolean maps goes from coarse to fine from top to bottom. Coarse-grained Boolean maps encode global shape information and are robust to targets with large appearance changes, whereas fine-grained maps describe spatial structural details and are effective for accurate target localization. A predefined binary filter is then applied to these maps to obtain a set of conditional position response maps, each of which is weighted to produce the final position response map; the goal is to learn an optimal weight for each Boolean map.
BMR is inspired by recent studies of human visual attention, which show that momentary conscious awareness of a scene can be represented by a set of Boolean maps. Specifically, given the RGB color channel maps of an image patch, the corresponding Boolean maps are obtained by the following formula
Here the thresholds θi are drawn independently from a uniform distribution over [0, 255] (yielding black-and-white binary maps), and the symbol ≥ denotes an element-wise inequality. For simplicity, the thresholds are set to θi = Nb(i-1)/255, i.e. sampled between 0 and 255 with a fixed step δ = Nb/255, because fixed-step sampling is equivalent to uniform sampling in the limit δ → 0. The corresponding relation is therefore easy to prove. Moreover, the j-th pixel intensity I(j) can be expressed as
Here U(·) is a unary function, e.g. U(2) = [1; 1; 0] and U(3) = [1; 1; 1] with 3 discrete levels, and R(·) denotes a rounding function. The input is the RGB color channel map of an image patch.
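The thresholding and unary encoding described above can be sketched as follows. This is an illustration rather than the patented implementation: the helper names `boolean_maps` and `unary` are hypothetical, and the thresholds are assumed to be evenly spaced over [0, 255) with step 255/N_b, which is one reading of the fixed-step sampling in the text.

```python
import numpy as np

def boolean_maps(patch, n_maps):
    """Threshold every channel of an image patch into n_maps Boolean maps."""
    # Assumption: evenly spaced thresholds 0, 255/n_maps, 2*255/n_maps, ...
    thresholds = np.arange(n_maps) * 255.0 / n_maps
    # Result shape (n_maps, H, W, C); True where intensity >= threshold.
    return patch[None, ...] >= thresholds[:, None, None, None]

def unary(level, n_levels):
    """Unary code: the first `level` entries are 1, the remaining ones 0."""
    return (np.arange(n_levels) < level).astype(int)

patch = np.array([[[0, 128, 255]]], dtype=np.uint8)  # one pixel, three channels
maps = boolean_maps(patch, n_maps=4)
print(maps.shape)
print(unary(2, 3), unary(3, 3))
```

Counting, for each pixel, how many Boolean maps are true recovers a quantized version of its intensity, which is the sense in which a stack of Boolean maps acts as a unary code of the pixel values.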
For weight learning, the invention learns the weights by minimizing the following linear regression function:
Here || · ||F denotes the Frobenius norm. Clearly, minimizing the objective in formula (6) is equivalent to minimizing the following objective function:
Here formula (5) has been replaced by formula (7). Ωo and Ωb denote the target and background regions respectively, and
Under this setting, minimizing the objective yields the solution {βi}
To adapt to changes in target appearance over time, the coefficients are updated online
Here the current-frame coefficients are computed by formula (8) from the tracking result in frame t.
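The update formula itself is not reproduced in this text. A common choice for such online updates, and the one assumed in this sketch, is a linear interpolation between the previous model and the current-frame estimate, controlled by the fusion coefficient η. The function name `online_update` is hypothetical.

```python
import numpy as np

def online_update(previous, current, eta):
    """Blend the previous coefficients with the current-frame estimate.

    Assumption: new = (1 - eta) * previous + eta * current.
    """
    previous = np.asarray(previous, dtype=float)
    current = np.asarray(current, dtype=float)
    return (1.0 - eta) * previous + eta * current

beta_prev = [1.0, 0.0]   # coefficients carried over from earlier frames
beta_hat = [0.0, 1.0]    # coefficients estimated on the current frame
beta_new = online_update(beta_prev, beta_hat, eta=0.3)
print(beta_new)
```

Under this assumption a small η makes the model change slowly, while a larger η lets it follow appearance changes faster at the cost of drifting more easily.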
To address interference, the invention focuses on learning appearance features using the ASA principle of human visual perception: by learning an anti-interference distance metric, distractors are pushed into the negative space, which strengthens the discriminative power of the features and yields robust tracking in the presence of distractors, since the target can be distinguished well from them. Learning a correlation filter is first approximated as learning a distance metric, which models the correlation between positive samples; the anti-interference distance metric is then learned within correlation filtering while also taking into account the useful correlations from real negative samples.
In learning the distance metric, learning a correlation filter (CF) is expressed as a spatial ridge regression objective function:
Here y is a Gaussian regression target and λ is a regularization coefficient. Note that if the filter is rescaled by any a ≠ 0, formula (10) can be rewritten in an equivalent form: apart from rescaling y by the ratio 1/a, it is equivalent to formula (10), and because the position of the peak response is unchanged, it produces the same tracking result.
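As a point of reference, the single-channel ridge regression filter has a well-known closed form in the Fourier domain. The sketch below assumes a one-dimensional signal and a circulant sample matrix whose i-th row is the base sample shifted right by i positions; it is not the multi-channel model of the invention, and the function names are hypothetical.

```python
import numpy as np

def ridge_filter_fft(x, y, lam):
    """Solve min_w ||X w - y||^2 + lam ||w||^2 for circulant X via the FFT."""
    x_hat, y_hat = np.fft.fft(x), np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.real(np.fft.ifft(w_hat))

def ridge_filter_direct(x, y, lam):
    """Same problem solved with the explicit circulant matrix, for checking."""
    n = len(x)
    X = np.stack([np.roll(x, i) for i in range(n)])  # row i = x shifted by i
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def respond(x, w):
    """Filter response X @ w computed as a circular correlation in the FFT domain."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))

rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n)
k = np.arange(n)
y = np.exp(-0.5 * np.minimum(k, n - k) ** 2)   # Gaussian-shaped label, peak at 0
w_fft = ridge_filter_fft(x, y, lam=0.001)
w_direct = ridge_filter_direct(x, y, lam=0.001)
print(np.max(np.abs(w_fft - w_direct)))
```

The FFT route costs O(n log n) instead of the O(n^3) of a direct solve, which is the reason correlation filter trackers work in the Fourier domain.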
Based on this scale invariance, to show clearly the relationship between correlation filter learning and metric learning, a specific setting and rescaling are applied in formula (10), which is equivalent to adding a constraint.
Next, with a symbol denoting the i-th row of the sample matrix, the data term in formula (10) is rewritten as:
Here the distance is the Mahalanobis distance and one of the vectors is an all-ones vector. Learning a correlation filter can therefore essentially be regarded as learning an optimal distance metric.
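The exact form of the rewritten data term is not reproduced in this text. One identity consistent with the description, and assumed in this sketch, is that when the filter satisfies 1ᵀw = 1, each squared residual (xᵢᵀw − yᵢ)² equals a Mahalanobis distance between xᵢ and yᵢ·1 under the metric M = w wᵀ. The variable names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(a, b, M):
    """Squared Mahalanobis distance (a - b)^T M (a - b)."""
    d = a - b
    return float(d @ M @ d)

rng = np.random.default_rng(0)
n = 5
w = rng.random(n) + 0.1
w = w / w.sum()                 # assumed constraint: entries of w sum to one
x_i = rng.standard_normal(n)    # one sample (one row of the sample matrix)
y_i = 0.7                       # its regression label
M = np.outer(w, w)              # rank-one metric induced by the filter

residual_sq = (x_i @ w - y_i) ** 2
distance_sq = mahalanobis_sq(x_i, y_i * np.ones(n), M)
print(residual_sq, distance_sq)
```

Under this reading, fitting the filter w and learning the metric M are the same optimization, which is the sense in which correlation filter learning can be viewed as distance metric learning.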
However, formula (11) considers only the relationships between positive samples, which limits its ability to discriminate the target from the background. To solve this problem, an anti-interference metric regularization term, composed of relationships in the negative space, is added to formula (10); it acts as a force that pushes distractors into the negative space.
For anti-interference metric regularized correlation filtering, a series of distractor regions is first sampled from the semi-local domain around the target; the interactions between them are then modeled and integrated into formula (10) as a regularization term:
Here γ is a regularization coefficient and wmn is a weight that measures the similarity between samples i and j. The larger the weight, the greater the difference between samples, so that the learned appearance features are more discriminative.
Formula (12) can be reformulated as:
The minimizer of this objective can be obtained from:
Here the matrix is a block matrix with Nx×Nx blocks
HereAnd
Because circular matrix X meets
Here F indicates Discrete Fourier Transform (DFT) matrix,Indicate the DFT of reference vector x, and FH=(F*)T Indicate conjugate transposition.Using this modeling, formula (15) can be diagonally melted into
In addition, substituting (16) into (14), its right-hand side can be rewritten as
Substituting formulas (17) and (18) gives the FFT of the solution of formula (14)
Here its i-th element is the k-th element of the corresponding vector. Similarly to formula (9), the filter is obtained by online updating
Here the current-frame estimate is computed by formula (19) and is the tracking result of frame t.
It is defined as
Because of the large number of rows and columns of the matrix involved, directly computing the inverse in formula (19) is impractical. Instead, the inverse is computed through a transformation. All of the resulting sub-problems can be computed in parallel, after which the optimal solution of formula (14) is obtained by an inverse FFT.
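The circulant-matrix property that this derivation relies on can be checked numerically. The sketch below assumes the common convention in which the i-th row of X is the base vector shifted right by i positions and F is the unitary DFT matrix; the helper names are hypothetical.

```python
import numpy as np

def circulant_rows(x):
    """Circulant matrix whose i-th row is x cyclically shifted right by i."""
    return np.stack([np.roll(x, i) for i in range(len(x))])

def dft_matrix(n):
    """Unitary DFT matrix F, so that F @ v equals fft(v) / sqrt(n)."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)

x = np.array([1.0, 2.0, 3.0, 4.0])
X = circulant_rows(x)
F = dft_matrix(len(x))
# Diagonalization: X = F diag(fft(x)) F^H under the conventions above.
X_diag = F @ np.diag(np.fft.fft(x)) @ F.conj().T
print(np.allclose(X, X_diag))
```

Because every circulant block shares the same eigenvectors, products and inverses of such blocks reduce to element-wise operations on their DFT coefficients, which is what makes the per-frequency sub-problems small and independent.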
Based on human visual perception, the invention proposes a correlation-filter tracking algorithm that reflects the SSA and ASA mechanisms of human visual perception and, by processing a local domain and a semi-local background domain in parallel, enhances the robustness and anti-interference capability of target tracking. For the local domain, to model SSA, a simple yet effective BMR is introduced into correlation filter learning; it characterizes the local topological structure of the target and its scene through randomly binarized image color channels and is invariant to various transformations. For the semi-local domain, to model ASA, an anti-interference metric regularization term is introduced into the correlation filtering objective function; it acts as a force that pushes distractors into the negative domain, enhancing tracking robustness when challenging distractors similar to the target are encountered. The method offers advantages such as more robust and accurate handling of the problem, strong adaptability and good tracking performance, and achieves sustained and effective tracking of video targets.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the invention.
Fig. 2 is a flowchart of SSA modeling in the invention.
Detailed description of the embodiments
Specific embodiments of the invention are described in further detail below with reference to the drawings. Techniques or products not specified in the embodiments are prior art or conventional products obtainable by purchase.
Embodiment 1: as shown in Figs. 1 and 2, the video target tracking method based on parallel attention correlation filtering formulates the tracking problem as estimating the probability of a target position, integrates spatial selective attention (SSA) and appearance selective attention (ASA), and obtains an objective function by taking the logarithm (Log function), achieving sustained and effective tracking of the video target. It comprises the following steps:
(1) Obtain the SSA position response map: first, for the tracked target, in the local domain around the target, generate a series of binary maps that describe the topological structure between the target and its surrounding scene at different granularities; arrange the maps from top to bottom in order of description granularity, from coarse to fine, to obtain a set of Boolean maps Bi (i = 1, 2, ..., Nb) of the tracked target. Coarse-grained Boolean maps encode global shape information and describe targets with large appearance changes, while fine-grained Boolean maps describe spatial structural details. Then define a binary filter F for the tracked target and apply F to the Boolean maps Bi to obtain conditional position response maps; learn the weights by minimizing a linear regression function, so that an optimal weight is learned for each Boolean map, and weight the maps to obtain the final position response map;
(2) Obtain the ASA target map: first, sample a series of distractor regions from the semi-local domain around the tracked target, treat the ridge regression objective function as approximately equivalent to a metric-learning correlation filter, and learn an anti-interference distance metric within correlation filtering, which models the correlation between positive samples; then introduce an anti-interference metric regularization term and apply anti-interference metric regularized correlation filtering to the target image, learning the anti-interference distance metric within correlation filtering while also taking into account the useful correlations from real negative samples, so that distractors are pushed into the negative domain and the target tracking map is obtained;
(3) Track the video target continuously: build the objective function that integrates SSA and ASA by taking the logarithm, use this function to track the video target, and update the parameters online, thereby achieving effective tracking of the video target.
The specific steps of the video target tracking method based on parallel attention correlation filtering are as follows:
(1) Obtain the SSA position response map
(1.1) For the tracked target, in the local domain around the target, generate a series of binary maps by the following formula to describe the topological structure between the target and its surrounding scene at different granularities:
where I(j) denotes the intensity of the j-th pixel, U(·) is a unary function, R(·) is a rounding function, the input is the RGB color channel map of an image patch, and T denotes transposition;
The maps are arranged from top to bottom in order of description granularity, from coarse to fine, giving a set of Boolean maps Bi (i = 1, 2, ..., Nb) of the tracked target; coarse-grained Boolean maps encode global shape information and describe targets with large appearance changes, while fine-grained Boolean maps describe spatial structural details;
(1.2) Weight learning: in the conventional manner, define a binary filter F for the tracked target and apply F to the Boolean maps Bi obtained in step (1.1) to obtain a set of conditional position response maps; learn the weights by minimizing the following linear regression function, so that an optimal weight is learned for each Boolean map, and weight the maps to obtain the final set of position response maps P(Bi, F | I ∈ Ωo):
where Ωo is the region of the scene in which the target appears, Ωb is the background region of the scene, dw is the feature width and dh is the feature height; the formula also involves the classifier parameter vector of the k-th frame, the number of non-blank pixels in the target region and the number of non-blank pixels in the background region; βk is a weight coefficient to be optimized. The weight coefficients are updated online to adapt to changes in target appearance over time, where βt is the updated weight coefficient vector, η is the fusion coefficient, and the remaining term is the weight coefficient vector of the current frame;
(2) Obtain the ASA target map
(2.1) Sample a series of distractor regions from the semi-local domain around the tracked target, treat the following ridge regression objective function as approximately equivalent to a metric-learning correlation filter, and learn the anti-interference distance metric within correlation filtering;
where Xi is a sample matrix and wi is the correlation filter weight corresponding to the i-th sample matrix Xi; the formula also uses the DFT of the vector x, the i-th row of the corresponding matrix and the vector formed by all wi; y is the Gaussian-shaped label, dw′ and dh′ are the width and height of the feature matrix respectively, λ is the regularization coefficient, and the distance used is the Mahalanobis distance;
(2.2) Introduce the anti-interference metric regularization term into the correlation filtering objective function to obtain the anti-interference metric regularized correlation filtering model; apply this model to the target image obtained in step (2.1) to perform anti-interference metric regularized correlation filtering, which strengthens the discrimination and tracking of target features and pushes the filtered-out distractors into the negative domain, yielding the positive-space target tracking map P(Xi, wi | I ∈ Ωo):
where the formula involves the k-th sub-vector of the anti-interference metric regularized correlation filtering model, the k-th sub-vector of the overall sample vector and the k-th sub-vector of the Gaussian label vector; wi is the weight vector corresponding to the i-th circulant sample matrix; the filter is obtained by online updating from the tracking result of frame t, which is computed by an inverse FFT; a conjugate transpose also appears; I is the identity matrix, λ is the regularization coefficient and η is the fusion coefficient;
It is defined as:
where xi is the i-th sample vector, the two circulant samples are the m-th and the n-th circulant samples of the k-th base sample, and wmn is the inter-sample difference weight (it measures the similarity between samples i and j; the larger the weight, the greater the difference between samples and the more discriminative the learned appearance features);
(3) Track the video target continuously
By taking the logarithm, the SSA and ASA maps are integrated to obtain the following objective function:
where P(Bi, F | I ∈ Ωo) denotes the obtained SSA position response map, the Boolean maps form a series of Nb channels with a corresponding Boolean map filter, P(Xi, wi | I ∈ Ωo) denotes the obtained ASA target map, * denotes a spatial correlation operation, βi denotes a weight coefficient to be optimized, e(·) denotes the exponential function, Ωo ∈ R2 denotes the target region, o denotes the target appearing in the scene, and the sample matrices form a series of Nx circulant matrices (each obtained by cyclically shifting a base HOG feature channel vector; all feature channels are assumed to be independently distributed) with a corresponding ASA filter;
Using this objective function, the video target is tracked and the parameters are updated online, achieving effective tracking of the target.
In this example, the regularization coefficient λ = 0.001 and the fusion coefficient η = 0.3.
Embodiment 2: As shown in Figures 1 and 2, the video target tracking method based on parallel attention correlation filtering formulates the tracking problem as estimating the probability of a target position, integrates spatial selective attention (SSA) and appearance selective attention (ASA), and obtains the objective function using a Log function, achieving continuous and effective tracking of the video target. The method comprises the following steps:
(1) Obtain the SSA position response map: First, for the tracked target, generate a series of binary maps that describe the topological structure between the target and its surrounding scene at different granularities. The maps are arranged from top to bottom in order of description granularity, from coarse to fine, to obtain a set of Boolean maps B_i of the tracked target; the coarse-grained Boolean maps encode global shape information to describe pronounced appearance changes of the target, and the fine-grained Boolean maps describe spatial structural details. Then, define a binary filter F for the tracked target and apply F to the Boolean maps B_i to obtain conditional position response maps; learn the weights by minimizing a linear regression function, so that an optimal weight is learned for each Boolean map, and weight the maps to obtain the final position response map:
(2) Obtain the ASA target map: First, sample a series of interference regions in the semi-local neighborhood around the tracked target, and treat the ridge regression objective function as approximately equivalent to a correlation filter with metric learning, so that an anti-interference distance metric is learned within the correlation filtering and the correlations among positive samples are modeled. Then, introduce an anti-interference metric regularization term and apply anti-interference metric-regularized correlation filtering to the target image; this learns the anti-interference distance metric within the correlation filtering while taking into account the useful correlations from true negative samples, pushes the distractors into the negative domain, and yields the target tracking map:
(3) Continuously track the video target: obtain the objective function integrating SSA and ASA through Log-function modeling, use this function to track the video target, and update the parameters online, achieving effective tracking of the video target.
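Step (1) can be pictured with a small Python sketch that builds a stack of Boolean maps from one intensity channel and combines them with per-map weights. The evenly spaced thresholds, the function names `boolean_maps` and `weighted_response`, and the example weights are assumptions; the sketch omits the binary filter F and the regression that learns the weights.

```python
import numpy as np

def boolean_maps(channel, n_maps):
    """Threshold one channel (values in [0, 255]) at n_maps evenly spaced levels."""
    channel = np.asarray(channel, dtype=float)
    # Assumed threshold schedule: n_maps levels strictly inside (0, 255).
    thresholds = 255.0 * np.arange(1, n_maps + 1) / (n_maps + 1)
    maps = np.stack([channel > t for t in thresholds])
    return maps, thresholds

def weighted_response(maps, weights):
    """Weighted sum of the Boolean maps (stand-in for the weighted response map)."""
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, maps.astype(float), axes=1)

patch = np.array([[0, 100],
                  [200, 255]])               # hypothetical 2x2 intensity patch
maps, thresholds = boolean_maps(patch, n_maps=3)
response = weighted_response(maps, weights=[0.5, 0.3, 0.2])
print(maps.sum(axis=0))
print(response)
```

Each Boolean map keeps only the pixels above its threshold, so the stack captures the patch structure at several intensity levels; the weights decide how much each level contributes to the combined map.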
The specific steps of this embodiment are the same as in Embodiment 1, with regularization coefficient λ = 0.001 and fusion coefficient η = 0.3.
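For orientation, the ridge regression that underlies a correlation filter can be solved in closed form in the Fourier domain. The Python sketch below shows only this standard single-channel baseline with λ = 0.001; it does not include the anti-interference metric regularization term of the invention, and the helper names `gaussian_label`, `train_filter`, and `locate`, as well as the random test patch, are assumptions.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Gaussian regression target peaked at index (0, 0) with wrap-around."""
    rows = np.fft.fftfreq(shape[0]) * shape[0]   # 0, 1, ..., -2, -1
    cols = np.fft.fftfreq(shape[1]) * shape[1]
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.exp(-(rr ** 2 + cc ** 2) / (2.0 * sigma ** 2))

def train_filter(x, y, lam=0.001):
    """Closed-form ridge regression over all cyclic shifts of x."""
    x_hat = np.fft.fft2(x)
    y_hat = np.fft.fft2(y)
    return np.conj(x_hat) * y_hat / (np.conj(x_hat) * x_hat + lam)

def locate(w_hat, z):
    """Apply the filter to patch z and return the index of the response peak."""
    response = np.real(np.fft.ifft2(np.fft.fft2(z) * w_hat))
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return int(row), int(col)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))            # stand-in for one feature channel
w_hat = train_filter(x, gaussian_label(x.shape), lam=0.001)
z = np.roll(x, shift=(3, 5), axis=(0, 1))    # the "target" moved by (3, 5)
peak = locate(w_hat, z)
print(peak)
```

Because the training samples are all cyclic shifts of one base patch, the data matrix is circulant and the regression decouples per frequency, which is what makes this family of trackers fast.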
The technical content of the present invention has been described above with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the described content. Within the knowledge of a person of ordinary skill in the art, various changes may be made to the technical content of the present invention without departing from its concept. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (3)

1. A video target tracking method based on parallel attention correlation filtering, characterized in that: the tracking problem is formulated as estimating the probability of a target position, spatial selective attention (SSA) and appearance selective attention (ASA) are integrated, and the objective function is obtained using a Log function, achieving continuous and effective tracking of the video target, the method comprising the following steps:
(1) Obtain the SSA position response map: first, for the tracked target, in the local neighborhood around the tracked target, generate a series of binary maps that describe the topological structure between the target and its surrounding scene at different granularities; arrange the maps from top to bottom in order of description granularity, from coarse to fine, to obtain a set of Boolean maps B_i (i = 1, 2, ..., N_b) of the tracked target, wherein the coarse-grained Boolean maps encode global shape information to describe pronounced appearance changes of the target and the fine-grained Boolean maps describe spatial structural details; then, define a binary filter F for the tracked target, apply F to the Boolean maps B_i to obtain conditional position response maps, learn the weights by minimizing a linear regression function so that an optimal weight is learned for each Boolean map, and weight the maps to obtain the final position response map:
(2) Obtain the ASA target map: first, sample a series of interference regions in the semi-local neighborhood around the tracked target, and treat the ridge regression objective function as approximately equivalent to a correlation filter with metric learning, so that an anti-interference distance metric is learned within the correlation filtering; then, introduce an anti-interference metric regularization term and apply anti-interference metric-regularized correlation filtering to the target image, pushing the distractors into the negative domain and obtaining the target tracking map:
(3) Continuously track the video target: obtain the objective function integrating SSA and ASA through Log-function modeling, use this function to track the video target, and update the parameters online, achieving effective tracking of the video target.
2. The video target tracking method based on parallel attention correlation filtering according to claim 1, characterized in that the specific steps of the video target tracking method are as follows:
(1) Obtain the SSA position response map
(1.1) For the tracked target, in the local neighborhood around the tracked target, generate a series of binary maps by the following formula to describe the topological structure between the target and its surrounding scene at different granularities:
where I(j) denotes the j-th pixel intensity, U(·) is a function of a single variable, R(·) denotes a rounding function, T denotes transposition, and the formula operates on the RGB color channel map of an image patch;
The maps are arranged from top to bottom in order of description granularity, from coarse to fine, to obtain a set of Boolean maps B_i (i = 1, 2, ..., N_b) of the tracked target; the coarse-grained Boolean maps encode global shape information to describe pronounced appearance changes of the target, and the fine-grained Boolean maps describe spatial structural details;
(1.2) Perform weight learning: define a binary filter F for the tracked target and apply F to the tracked-target Boolean maps B_i obtained in step (1.1) to obtain a set of conditional position response maps; learn the weights by minimizing the following linear regression function, so that an optimal weight is learned for each Boolean map; weight each map to obtain the final position response map P(B_i, F | I ∈ Ω_o):
where Ω_o is the region of the scene in which the target appears, Ω_b is the background region of the scene, d_w is the feature width, d_h is the feature height, and β_k is a weight coefficient to be optimized; the formula also involves the classifier parameter vector of the k-th frame, the number of non-blank pixels in the target region, and the number of non-blank pixels in the background region. The weight coefficients are updated online, where β_t is the weight coefficient vector after the update, η is the fusion coefficient, and the remaining term is the weight coefficient vector of the current frame;
(2) Obtain the ASA target map
(2.1) Sample a series of interference regions in the semi-local neighborhood around the tracked target; the following ridge regression objective function is treated as approximately equivalent to a correlation filter with metric learning, so that an anti-interference distance metric is learned within the correlation filtering;
where X_i is the i-th sample matrix, w_i is the correlation filter weight corresponding to the i-th sample matrix X_i, y is the Gaussian label, d_w′ and d_h′ are respectively the width and height of the feature matrix, and λ is the regularization coefficient; the formula also involves the DFT of the vector x, the i-th row of a matrix, the vector composed of all w_i, and the Mahalanobis distance;
(2.2) Introduce an anti-interference metric regularization term into the correlation filtering objective function to obtain the anti-interference metric-regularized correlation filtering model. Using this model, further apply anti-interference metric-regularized correlation filtering to the target image obtained in step (2.1), which strengthens the discrimination and tracking of target features and pushes the filtered distractors into the negative domain, yielding the positive-space target tracking map P(X_i, w_i | I ∈ Ω_o):
where w_i is the weight vector corresponding to the i-th circulant sample matrix, I is the identity matrix, λ is the regularization coefficient, and η is the fusion coefficient; the formula also involves the k-th subvector of the anti-interference metric-regularized correlation filter weights, the k-th subvector of the total sample vector, the k-th subvector of the Gaussian label vector, and a conjugate transpose. The filter is updated online, and the tracking result of frame t is obtained by taking the inverse FFT;
It is defined as:
where x_i is the i-th sample vector, w_mn is the inter-sample difference weight, and the two remaining terms are the m-th and the n-th cyclic samples of the k-th base sample;
(4) Continuously track the video target
Through Log-function modeling, the SSA and ASA maps are integrated to obtain the following objective function:
where P(B_i, F | I ∈ Ω_o) denotes the obtained SSA position response map, the Boolean maps form a series of N_b channels, F denotes the Boolean map filter, P(X_i, w_i | I ∈ Ω_o) denotes the obtained ASA target map, * denotes a spatial correlation operation, β_i denotes a weight coefficient to be optimized, e^(·) denotes the exponential function, Ω_o ∈ R^2 denotes the target region, o denotes the target appearing in the scene, X_i denotes a series of N_x circulant matrices (each obtained by cyclically shifting one basic HOG feature channel vector; all feature channels are independently distributed), and w denotes the ASA filter;
Using this objective function, the video target is tracked and the parameters are updated online, achieving effective tracking of the target.
3. The video target tracking method based on parallel attention correlation filtering according to claim 2, characterized in that the value of the regularization coefficient λ is 0.001 and the value of the fusion coefficient η is 0.3.
CN201810647331.2A 2018-06-22 2018-06-22 Video target tracking method based on parallel attention-dependent filtering Active CN109102521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810647331.2A CN109102521B (en) 2018-06-22 2018-06-22 Video target tracking method based on parallel attention-dependent filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810647331.2A CN109102521B (en) 2018-06-22 2018-06-22 Video target tracking method based on parallel attention-dependent filtering

Publications (2)

Publication Number Publication Date
CN109102521A true CN109102521A (en) 2018-12-28
CN109102521B CN109102521B (en) 2021-08-27

Family

ID=64844863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810647331.2A Active CN109102521B (en) 2018-06-22 2018-06-22 Video target tracking method based on parallel attention-dependent filtering

Country Status (1)

Country Link
CN (1) CN109102521B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809713A (en) * 2016-03-03 2016-07-27 南京信息工程大学 Object tracing method based on online Fisher discrimination mechanism to enhance characteristic selection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Qingshan Liu et al.: "Visual Tracking via Nonlocal Similarity Learning", IEEE Transactions on Circuits and Systems for Video Technology *
Zhetao Li et al.: "Visual Tracking With Weighted Adaptive Local Sparse Appearance Model via Spatio-Temporal Context Learning", IEEE Transactions on Image Processing *
Fan Jiaqing et al.: "Real-time visual tracking algorithm via channel stability weighted complementary learning", Journal of Computer Applications (计算机应用) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919114A (en) * 2019-03-14 2019-06-21 浙江大学 One kind is based on the decoded video presentation method of complementary attention mechanism cyclic convolution
CN109993777A (en) * 2019-04-04 2019-07-09 杭州电子科技大学 A kind of method for tracking target and system based on double-template adaptive threshold
CN110102050A (en) * 2019-04-30 2019-08-09 腾讯科技(深圳)有限公司 Virtual objects display methods, device, electronic equipment and storage medium
US11615570B2 (en) 2019-04-30 2023-03-28 Tencent Technology (Shenzhen) Company Limited Virtual object display method and apparatus, electronic device, and storage medium
CN110335290A (en) * 2019-06-04 2019-10-15 大连理工大学 Twin candidate region based on attention mechanism generates network target tracking method
CN110443852B (en) * 2019-08-07 2022-03-01 腾讯科技(深圳)有限公司 Image positioning method and related device
CN110443852A (en) * 2019-08-07 2019-11-12 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of framing
CN110807437A (en) * 2019-11-08 2020-02-18 腾讯科技(深圳)有限公司 Video granularity characteristic determination method and device and computer-readable storage medium
CN110807437B (en) * 2019-11-08 2023-01-03 腾讯科技(深圳)有限公司 Video granularity characteristic determination method and device and computer-readable storage medium
CN112085765A (en) * 2020-09-15 2020-12-15 浙江理工大学 Video target tracking method combining particle filtering and metric learning
CN112085765B (en) * 2020-09-15 2024-05-31 浙江理工大学 Video target tracking method combining particle filtering and metric learning
CN113704684A (en) * 2021-07-27 2021-11-26 浙江工商大学 Centralized fusion robust filtering method
CN113704684B (en) * 2021-07-27 2023-08-29 浙江工商大学 Centralized fusion robust filtering method
CN113808171A (en) * 2021-09-27 2021-12-17 山东工商学院 Unmanned aerial vehicle visual tracking method based on dynamic feature selection of feature weight pool

Also Published As

Publication number Publication date
CN109102521B (en) 2021-08-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant