CN108734109B - Visual target tracking method and system for image sequence - Google Patents

Visual target tracking method and system for image sequence

Info

Publication number
CN108734109B
CN108734109B (application CN201810373435.9A)
Authority
CN
China
Prior art keywords
target
convolution
size
regression model
network
Prior art date
Legal status
Active
Application number
CN201810373435.9A
Other languages
Chinese (zh)
Other versions
CN108734109A (en)
Inventor
刘李漫
刘佳
Current Assignee
Hangzhou Tuke Intelligent Information Technology Co.,Ltd.
Original Assignee
South Central University for Nationalities
Priority date
Filing date
Publication date
Application filed by South Central University for Nationalities filed Critical South Central University for Nationalities
Priority to CN201810373435.9A
Publication of CN108734109A
Application granted
Publication of CN108734109B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42: Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual target tracking method and system oriented to an image sequence. The method comprises the following steps: training a convolution regression model for target tracking using a given initialization image and the rectangular frame of the target to be tracked; predicting the position of the target with the trained convolution regression model; further predicting the size of the target on the basis of the position prediction result; and updating the convolution regression model according to the tracked position and size of the target. The invention involves whole-target regression model training, target-texture regression model training, target position prediction, target size prediction and tracking-model updating, can overcome the interference of various environmental factors in the tracking scene, achieves accurate prediction of the target position and size, and has considerable commercial value and research significance.

Description

Visual target tracking method and system for image sequence
Technical Field
The invention relates to the technical field of computer vision, and in particular to a visual target tracking method and system for an image sequence.
Background
In the field of computer vision, video information generally needs to be recognized and analyzed automatically by intelligent algorithms so as to realize intelligent control of equipment. A target tracking algorithm based on a visual image sequence can make full use of existing single-image target detection algorithms, quickly and reliably track the motion trajectory of a target in a video, and provide technical support for video understanding and analysis.
With the rapid expansion of industrial production, the degree of automation and intelligence in production processes must also keep improving. For example, video surveillance systems need intelligent algorithms to automatically identify and detect abnormal events occurring in video. A visual target tracking algorithm can automatically track each target in the video and obtain its motion trajectory, providing a key technical means for analyzing and understanding abnormal events. However, conventional visual target tracking algorithms have the following defects:
(1) The target size cannot be predicted well. In particular, when the target deforms significantly, a traditional tracking algorithm cannot predict the target size accurately, so the target is lost in subsequent tracking and no reliable low-level information can be provided for video analysis and understanding.
(2) The target cannot be tracked accurately and reliably under the interference of various environmental factors.
In view of this, it is urgent to improve existing visual target tracking algorithms and provide one that overcomes the interference of various environmental factors and accurately predicts the position and size of the target.
Disclosure of Invention
The technical problem the invention aims to solve is that existing visual target tracking algorithms cannot predict the position and size of the target, are easily disturbed by the environment during tracking, and therefore cannot track the target accurately and reliably.
To solve the above problem, the technical solution adopted by the invention is a visual target tracking method for an image sequence, comprising the following steps:
training a convolution regression model for target tracking using a given initialization image and the rectangular frame of the target to be tracked;
predicting the position of the target with the trained convolution regression model;
further predicting the size of the target on the basis of the position prediction result;
and updating the convolution regression model according to the tracked position and size of the target.
In the above scheme, the method for training the convolution regression model comprises the following steps:
step 10, constructing a feature extraction network for expressing the target's morphological characteristics; the network can be implemented with any feature extraction method that expresses target information;
step 11, extracting the features of the current input image with the feature extraction network of step 10;
step 12, constructing a whole-target convolution regression model implemented as a single convolution layer, wherein the convolution kernel size matches the target size in feature space, the layer has one output channel, and its output is used to predict the target position;
step 13, generating the corresponding training label map from the features extracted in step 11, wherein the label map follows a two-dimensional Gaussian function whose peak corresponds to the true target position, and the single convolution layer of step 12 is iteratively optimized with a gradient descent algorithm;
step 14, constructing a target-texture convolution regression model implemented as a single convolution layer, wherein the convolution kernel size matches the target size in feature space, the layer has one output channel, and its output is used to predict the target foreground;
step 15, generating the corresponding training label map from the features extracted in step 11, wherein a rectangular frame marks the target foreground in the label map, and the single convolution layer of step 14 is iteratively optimized with a gradient descent algorithm;
and step 16, the initial training of the convolution regression model is complete.
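The training phase can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch implementation of the single-layer regression models of steps 12 and 14 and their gradient-descent training (steps 13 and 15); the function names, learning rate and iteration count are assumptions and not values taken from the patent, and `feats` is assumed to be a feature tensor of shape (1, C, H, W) from the step-10 network with a label map of the same spatial size.

import torch
import torch.nn as nn

def build_regressor(feat_channels: int, kh: int, kw: int) -> nn.Conv2d:
    """Single-convolution-layer regression model (steps 12 and 14): the
    kernel size (kh, kw) matches the target size in feature space, and
    the single output channel is read directly as the position (or
    foreground) prediction map. Odd kh/kw are assumed so that the
    padding keeps the output the same size as the input."""
    return nn.Conv2d(feat_channels, 1, kernel_size=(kh, kw),
                     padding=(kh // 2, kw // 2), bias=False)

def train_regressor(reg: nn.Conv2d, feats: torch.Tensor,
                    label_map: torch.Tensor,
                    steps: int = 100, lr: float = 1e-6) -> None:
    """Iterative gradient-descent optimization of the single convolution
    layer against a training label map (steps 13 and 15)."""
    opt = torch.optim.SGD(reg.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(reg(feats), label_map)
        loss.backward()
        opt.step()

A whole-target model and a target-texture model would each be built and trained this way, differing only in the label map they regress to (a Gaussian peak versus a foreground rectangle).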
In the above scheme, the target position prediction method specifically comprises the following steps:
step 20, extracting the features of the current input image with the feature extraction network constructed in step 10, in preparation for subsequent tracking;
step 21, feeding the image features obtained in step 20 into the whole-target convolution regression network obtained in step 12, and computing the target position prediction map H(x_t, y_t) based on the whole-target regression model;
step 22, feeding the image features obtained in step 20 into the target-texture convolution regression network obtained in step 14, and computing the target foreground prediction map T(x_t, y_t) based on the target-texture regression model;
step 23, applying a mean filter, whose template size matches the target size, to the foreground prediction map obtained in step 22, yielding the foreground-based target position prediction map F(x_t, y_t);
step 24, superposing the two position prediction maps obtained in steps 21 and 23 into the final target position prediction map, and predicting the target position as the index of its maximum value:

(x_t, y_t) = argmax_{(x, y)} [ H(x, y) + F(x, y) ]
In the above scheme, the foreground-based target position prediction map F(x_t, y_t) of step 23 is computed as:

F(x_t, y_t) = (1 / (w_{t-1} * h_{t-1})) * Σ_{(i,j) ∈ R(x_t, y_t, w_{t-1}, h_{t-1})} T(i, j)

where w_{t-1} and h_{t-1} denote the target size obtained when tracking the previous frame, R(x_t, y_t, w_{t-1}, h_{t-1}) denotes the rectangular frame centered at coordinates (x_t, y_t) with size (w_{t-1}, h_{t-1}), and T(i, j) is the value of the target foreground prediction map of the target-texture regression model at pixel (i, j) inside that rectangular frame.
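A compact sketch of steps 21-24 follows, under the same assumptions as above (PyTorch; `whole_reg` and `texture_reg` are the two trained regressors, `feats` the current-frame features, and the previous target size is expressed in feature-map coordinates; forcing odd template sizes for same-size mean filtering is an implementation choice, not from the patent):

import torch
import torch.nn.functional as nnf

def predict_position(whole_reg, texture_reg, feats, w_prev: int, h_prev: int):
    """Fuse the whole-target response H and the mean-filtered foreground
    response F, then take the argmax index as the predicted position."""
    H = whole_reg(feats)                       # H(x, y), step 21
    T = texture_reg(feats)                     # T(x, y), step 22
    kh, kw = h_prev | 1, w_prev | 1            # force odd template sizes
    # Step 23: mean filtering with a template of the previous target size
    # computes F(x, y) = mean of T over R(x, y, w_prev, h_prev).
    F = nnf.avg_pool2d(T, kernel_size=(kh, kw), stride=1,
                       padding=(kh // 2, kw // 2))
    response = H + F                           # step 24: superposition
    idx = int(torch.argmax(response))          # flattened argmax index
    y, x = divmod(idx, response.shape[-1])
    return x, y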
In the above scheme, the target size prediction method comprises the following steps:
step 30, extracting the features of the current input image with the feature extraction network constructed in step 10, in preparation for subsequent tracking;
step 31, feeding the image features obtained in step 30 into the target-texture convolution regression network obtained in step 14, and computing the target foreground prediction map T(x_t, y_t) based on the target-texture regression model;
step 32, given the current target position (x_t, y_t) and the target size (w_{t-1}, h_{t-1}) known from the previous frame, computing the posterior probability that the current target size is (w_t, h_t);
step 33, repeating the computation of step 32 for each candidate target size, and selecting the size with the maximum posterior probability as the final target size prediction;
step 34, the target size prediction is finished.
In the above scheme, the posterior probability of step 32 that the target size is (w_t, h_t) is computed as:

P(w_t, h_t | O, x_t, y_t, w_{t-1}, h_{t-1}) = P(O | x_t, y_t, w_t, h_t) * P(w_t, h_t | w_{t-1}, h_{t-1})

where P(O | x_t, y_t, w_t, h_t) is the probability that the target O has position-and-size state (x_t, y_t, w_t, h_t), and P(w_t, h_t | w_{t-1}, h_{t-1}) is the state transition probability of the target size between two adjacent frames:

[Equation: explicit form of the size transition probability P(w_t, h_t | w_{t-1}, h_{t-1})]

P(O | x_t, y_t, w_t, h_t) = A(w_t, h_t) - B(w_t, h_t)

where A(w_t, h_t) is the average target foreground probability inside the candidate rectangular frame (x_t, y_t, w_t, h_t), and B(w_t, h_t) is the average target foreground probability of the background area surrounding that frame.
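As a sketch, the likelihood term A(w_t, h_t) - B(w_t, h_t) can be evaluated directly on the foreground prediction map T. The margin defining the "surrounding background area" is an assumption, since the patent does not fix its extent:

import torch

def size_likelihood(T: torch.Tensor, x: int, y: int, w: int, h: int,
                    margin: int = 4) -> float:
    """P(O | x, y, w, h) = A(w, h) - B(w, h): mean foreground probability
    inside the candidate box minus the mean over a surrounding ring."""
    fg = T[0, 0]                                   # (H, W) foreground map
    Hf, Wf = fg.shape
    x0, y0 = max(x - w // 2, 0), max(y - h // 2, 0)
    x1, y1 = min(x + w // 2 + 1, Wf), min(y + h // 2 + 1, Hf)
    inner = fg[y0:y1, x0:x1]
    X0, Y0 = max(x0 - margin, 0), max(y0 - margin, 0)
    X1, Y1 = min(x1 + margin, Wf), min(y1 + margin, Hf)
    outer = fg[Y0:Y1, X0:X1]
    a = float(inner.mean())                            # A(w, h)
    ring = float(outer.sum() - inner.sum())
    b = ring / max(outer.numel() - inner.numel(), 1)   # B(w, h)
    return a - b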
In the above scheme, updating the convolution regression model comprises the following steps:
step 40, generating the label map for training the whole-target convolution regression model from the predicted target position, and updating the parameters of the single convolution layer of step 12 by gradient descent;
step 41, generating the label map for training the target-texture convolution regression model from the predicted target size, and updating the parameters of the single convolution layer of step 14 by gradient descent;
and step 42, the update of the convolution regression model is complete.
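A sketch of this update step, reusing the `train_regressor` routine above; `make_gaussian_label` and `make_box_label` are the label generators of steps 13 and 15 (sketched in the detailed description below), and the number of update iterations is an assumption:

def update_models(whole_reg, texture_reg, feats, x, y, w, h,
                  steps: int = 2) -> None:
    """Steps 40-41: regenerate both label maps at the newly tracked
    position/size and take a few gradient steps on each conv layer."""
    shape = feats.shape[-2:]                       # feature-map (H, W)
    train_regressor(whole_reg, feats,
                    make_gaussian_label(shape, x, y), steps=steps)
    train_regressor(texture_reg, feats,
                    make_box_label(shape, x, y, w, h), steps=steps)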
The invention also provides a visual target tracking system for an image sequence, comprising:
a training module for training a convolution regression model for target tracking;
a target position prediction module for predicting the position of the target with the trained convolution regression model;
a target size prediction module for further predicting the size of the target on the basis of the position prediction result;
and an update module for updating the convolution regression model according to the tracked position and size of the target.
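The four modules might compose as follows. This is an illustrative skeleton only: the class and method names are assumptions, and the helper functions are the sketches given elsewhere in this description.

class ImageSequenceTracker:
    """Training, position prediction, size prediction and update modules
    wrapped around a shared feature extraction network."""

    def __init__(self, feature_net, whole_reg, texture_reg, init_box):
        self.feature_net = feature_net     # step-10 feature extractor
        self.whole_reg = whole_reg         # trained whole-target model
        self.texture_reg = texture_reg     # trained target-texture model
        self.box = init_box                # (x, y, w, h) from the last frame

    def track(self, frame):
        feats = self.feature_net(frame)
        x, y = predict_position(self.whole_reg, self.texture_reg,
                                feats, self.box[2], self.box[3])
        w, h = predict_size(self.texture_reg(feats), x, y,
                            self.box[2], self.box[3])
        update_models(self.whole_reg, self.texture_reg, feats, x, y, w, h)
        self.box = (x, y, w, h)
        return self.box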
Compared with the prior art, the invention involves whole-target regression model training, target-texture regression model training, target position prediction, target size prediction and tracking-model updating; it can overcome the interference of various environmental factors in the tracking scene, achieves accurate prediction of the target position and size, and has considerable commercial value and research significance.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is an input initial frame image of the present invention;
FIG. 3 is a schematic diagram of the training process of the convolution regression model of the present invention;
FIG. 4 is a schematic diagram of the whole-target regression model of the present invention;
FIG. 5 is a schematic diagram of the target-texture regression model of the present invention;
FIG. 6 is a schematic diagram of the target position prediction process of the present invention;
FIG. 7 is a target position prediction map based on the whole-target regression model of the present invention;
FIG. 8 is a target foreground prediction map based on the target-texture regression model of the present invention;
FIG. 9 is a target position prediction map based on the target-texture regression model of the present invention;
FIG. 10 is a flowchart of the target size prediction process of the present invention.
Detailed Description
The invention provides a visual target tracking method for an image sequence, which can overcome the interference of various environmental factors in the tracking scene and accurately predict the position and size of the target, and which has considerable commercial value and research significance. The invention is described in detail below with reference to the drawings and specific embodiments.
As shown in FIG. 1 and FIG. 2, the image-sequence-oriented visual target tracking method of the present invention may specifically include the following steps:
training a convolution regression model for target tracking using a given initialization image and the rectangular frame of the target to be tracked;
predicting the position of the target with the trained convolution regression model;
further predicting the size of the target on the basis of the position prediction result;
and finally, updating the convolution regression model according to the tracked position and size of the target.
Correspondingly, the invention also provides a visual target tracking system for an image sequence, comprising a training module, a target position prediction module, a target size prediction module and an update module.
The training module trains a convolution regression model for target tracking;
the target position prediction module predicts the position of the target with the trained convolution regression model;
the target size prediction module further predicts the size of the target on the basis of the position prediction result;
and the update module updates the convolution regression model according to the tracked position and size of the target.
The invention involves whole-target regression model training, target-texture regression model training, target position prediction, target size prediction and tracking-model updating; it can overcome the interference of various environmental factors in the tracking scene, achieves accurate prediction of the target position and size, and has considerable commercial value and research significance.
As shown in FIG. 3, the method for training the convolution regression model specifically comprises the following steps:
constructing a feature extraction network for expressing the target's morphological characteristics; the network can be implemented with any feature extraction method that expresses target information;
extracting the features of the current input image with the feature extraction network;
constructing a whole-target convolution regression model implemented as a single convolution layer, wherein the convolution kernel size matches the target size in feature space, the layer has one output channel, and its output is used to predict the target position;
generating the corresponding training label map from the extracted features, wherein the label map follows a two-dimensional Gaussian function whose peak corresponds to the true target position, and iteratively optimizing the single convolution layer with a gradient descent algorithm;
constructing a target-texture convolution regression model implemented as a single convolution layer, wherein the convolution kernel size matches the target size in feature space, the layer has one output channel, and its output is used to predict the target foreground;
generating the corresponding training label map from the extracted features, wherein a rectangular frame marks the target foreground in the label map, and iteratively optimizing the single convolution layer with a gradient descent algorithm;
and completing the initial training of the convolution regression model; the two label maps are sketched below.
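The two label maps can be sketched as follows, in feature-space coordinates. The Gaussian bandwidth `sigma` is an assumed value: the patent only states that the label follows a two-dimensional Gaussian peaked at the true position, while the texture label marks the foreground rectangle.

import torch

def make_gaussian_label(shape, cx: int, cy: int, sigma: float = 2.0):
    """Two-dimensional Gaussian label map whose peak sits at the true
    target position (whole-target regression model)."""
    Hf, Wf = shape
    ys = torch.arange(Hf, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(Wf, dtype=torch.float32).view(1, -1)
    g = torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return g.view(1, 1, Hf, Wf)

def make_box_label(shape, cx: int, cy: int, w: int, h: int):
    """Binary label map with a rectangular frame marking the target
    foreground (target-texture regression model)."""
    Hf, Wf = shape
    lbl = torch.zeros(1, 1, Hf, Wf)
    y0, y1 = max(cy - h // 2, 0), min(cy + h // 2 + 1, Hf)
    x0, x1 = max(cx - w // 2, 0), min(cx + w // 2 + 1, Wf)
    lbl[..., y0:y1, x0:x1] = 1.0
    return lbl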
As shown in FIG. 4 to FIG. 9, the target position prediction method specifically comprises the following steps:
extracting the features of the current input image with the constructed feature extraction network, in preparation for subsequent tracking;
feeding the obtained image features into the whole-target convolution regression network, and computing the target position prediction map H(x_t, y_t) based on the whole-target regression model;
feeding the obtained image features into the target-texture convolution regression network, and computing the target foreground prediction map T(x_t, y_t) based on the target-texture regression model;
applying a mean filter, whose template size matches the target size, to the foreground prediction map, yielding the foreground-based target position prediction map F(x_t, y_t), computed as:

F(x_t, y_t) = (1 / (w_{t-1} * h_{t-1})) * Σ_{(i,j) ∈ R(x_t, y_t, w_{t-1}, h_{t-1})} T(i, j)

where w_{t-1} and h_{t-1} denote the target size obtained when tracking the previous frame, R(x_t, y_t, w_{t-1}, h_{t-1}) denotes the rectangular frame centered at (x_t, y_t) with size (w_{t-1}, h_{t-1}), and (i, j) ranges over the pixels inside that rectangular frame;
and superposing the two target position prediction maps H(x_t, y_t) and F(x_t, y_t) into the final target position prediction map, predicting the target position as the index of its maximum value:

(x_t, y_t) = argmax_{(x, y)} [ H(x, y) + F(x, y) ]
As shown in FIG. 10, the target size prediction method comprises the following steps:
extracting the features of the current input image with the constructed feature extraction network, in preparation for subsequent tracking;
feeding the obtained image features into the target-texture convolution regression network, and computing the target foreground prediction map T(x_t, y_t) based on the target-texture regression model;
given the current target position (x_t, y_t) and the target size (w_{t-1}, h_{t-1}) known from the previous frame, computing the posterior probability P(w_t, h_t | O, x_t, y_t, w_{t-1}, h_{t-1}) that the current target size is (w_t, h_t), according to:

P(w_t, h_t | O, x_t, y_t, w_{t-1}, h_{t-1}) = P(O | x_t, y_t, w_t, h_t) * P(w_t, h_t | w_{t-1}, h_{t-1})

where P(O | x_t, y_t, w_t, h_t) is the probability that the target O has position-and-size state (x_t, y_t, w_t, h_t), and P(w_t, h_t | w_{t-1}, h_{t-1}) is the state transition probability of the target size between two adjacent frames:

[Equation: explicit form of the size transition probability P(w_t, h_t | w_{t-1}, h_{t-1})]

P(O | x_t, y_t, w_t, h_t) = A(w_t, h_t) - B(w_t, h_t)

where A(w_t, h_t) is the average target foreground probability inside the candidate rectangular frame (x_t, y_t, w_t, h_t), and B(w_t, h_t) is the average target foreground probability of the background area surrounding that frame;
repeating this computation for each candidate target size, and selecting the size with the maximum posterior probability as the final target size prediction (a candidate search in this style is sketched below);
the target size prediction ends.
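The repeated posterior evaluation over candidate sizes can be sketched as below. Because the explicit form of the transition probability is not recoverable from this publication, an isotropic Gaussian around the previous size is assumed here, and the candidate offsets and bandwidth are likewise assumptions:

import math

def predict_size(T, x: int, y: int, prev_w: int, prev_h: int,
                 deltas=(-2, -1, 0, 1, 2), sigma: float = 1.0):
    """Scan candidate sizes around the previous one and keep the size
    maximizing likelihood * transition prior (steps 32-33)."""
    best_post, best_wh = -float("inf"), (prev_w, prev_h)
    for dw in deltas:
        for dh in deltas:
            w, h = prev_w + dw, prev_h + dh
            if w < 1 or h < 1:
                continue
            # Assumed Gaussian transition prior P(w_t, h_t | w_{t-1}, h_{t-1}).
            prior = math.exp(-(dw * dw + dh * dh) / (2 * sigma * sigma))
            # Clamp the A - B likelihood at zero so a weak prior cannot
            # promote a strongly negative candidate (sketch-level choice).
            post = max(size_likelihood(T, x, y, w, h), 0.0) * prior
            if post > best_post:
                best_post, best_wh = post, (w, h)
    return best_wh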
Updating the convolution regression model mainly comprises the following steps:
generating the label map for training the whole-target convolution regression model from the predicted target position, and updating the parameters of the single convolution layer by gradient descent;
generating the label map for training the target-texture convolution regression model from the predicted target size, and updating the parameters of the single convolution layer by gradient descent;
and completing the update of the convolution regression model.
The method takes a continuous video image sequence as input; once the rectangular frame of the target to be tracked is given, continuous tracking of the target is achieved through whole-target regression model training, target-texture regression model training, target position prediction, target size prediction and tracking-model updating. The method can track the target accurately when it rotates or is occluded, solves the difficulty traditional visual tracking algorithms have in predicting the target size, and can predict the size accurately when the target deforms. The method also features high tracking accuracy, fast running speed and insensitivity to background interference, and has very broad application prospects in industrial control, automated production and similar settings. A usage sketch of the complete pipeline follows.
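The sketch below runs the whole pipeline on a video file, assuming OpenCV for frame I/O. The file name, the initial box, the preprocessing and the names `feature_net`, `whole_reg` and `texture_reg` are placeholders referring to the illustrative helpers from the preceding sections, and the coordinates are kept in feature-map scale; a real implementation would map them back to image coordinates.

import cv2
import torch

def to_tensor(frame):
    # HxWxC uint8 BGR frame -> 1xCxHxW float tensor in [0, 1]
    return torch.from_numpy(frame).permute(2, 0, 1).unsqueeze(0).float() / 255.0

cap = cv2.VideoCapture("sequence.avi")
ok, frame0 = cap.read()
init_box = (120, 80, 40, 60)      # given target rectangle (x, y, w, h)

# feature_net, whole_reg and texture_reg are assumed to have been built
# and initially trained on frame0 as in the training sketch above.
tracker = ImageSequenceTracker(lambda f: feature_net(to_tensor(f)),
                               whole_reg, texture_reg, init_box)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    x, y, w, h = tracker.track(frame)
    cv2.rectangle(frame, (x - w // 2, y - h // 2),
                  (x + w // 2, y + h // 2), (0, 255, 0), 2)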
The present invention is not limited to the above preferred embodiments; any structural change made under the teaching of the present invention that is identical or similar to the technical solution of the present invention falls within the scope of protection of the present invention.

Claims (5)

1. A visual target tracking method for an image sequence, characterized by comprising the following steps:
training a convolution regression model for target tracking using a given initialization image and the rectangular frame of the target to be tracked;
predicting the position of the target with the trained convolution regression model;
further predicting the size of the target on the basis of the position prediction result;
updating the convolution regression model according to the tracked position and size of the target;
the method for training the convolution regression model comprising the following steps:
step 10, constructing a feature extraction network for expressing the target's morphological characteristics, the network being implementable with any feature extraction method that expresses target information;
step 11, extracting the features of the current input image with the feature extraction network of step 10;
step 12, constructing a whole-target convolution regression model implemented as a single convolution layer, wherein the convolution kernel size matches the target size in feature space, the layer has one output channel, and its output is used to predict the target position;
step 13, generating the corresponding training label map from the features extracted in step 11, wherein the label map follows a two-dimensional Gaussian function whose peak corresponds to the true target position, and the single convolution layer of step 12 is iteratively optimized with a gradient descent algorithm;
step 14, constructing a target-texture convolution regression model implemented as a single convolution layer, wherein the convolution kernel size matches the target size in feature space, the layer has one output channel, and its output is used to predict the target foreground;
step 15, generating the corresponding training label map from the features extracted in step 11, wherein a rectangular frame marks the target foreground in the label map, and the single convolution layer of step 14 is iteratively optimized with a gradient descent algorithm;
step 16, completing the initial training of the convolution regression model;
the target position prediction method specifically comprising the following steps:
step 20, extracting the features of the current input image with the feature extraction network constructed in step 10, in preparation for subsequent tracking;
step 21, feeding the image features obtained in step 20 into the whole-target convolution regression network obtained in step 12, and computing the target position prediction map H(x_t, y_t) based on the whole-target regression model;
step 22, feeding the image features obtained in step 20 into the target-texture convolution regression network obtained in step 14, and computing the target foreground prediction map T(x_t, y_t) based on the target-texture regression model;
step 23, applying a mean filter, whose template size matches the target size, to the foreground prediction map obtained in step 22, yielding the foreground-based target position prediction map F(x_t, y_t);
step 24, superposing the two position prediction maps obtained in steps 21 and 23 into the final target position prediction map, and predicting the target position as the index of its maximum value:

(x_t, y_t) = argmax_{(x, y)} [ H(x, y) + F(x, y) ]
the target size prediction method comprising the following steps:
step 30, extracting the features of the current input image with the feature extraction network constructed in step 10, in preparation for subsequent tracking;
step 31, feeding the image features obtained in step 30 into the target-texture convolution regression network obtained in step 14, and computing the target foreground prediction map T(x_t, y_t) based on the target-texture regression model;
step 32, given the current target position (x_t, y_t) and the target size (w_{t-1}, h_{t-1}) known from the previous frame, computing the posterior probability that the current target size is (w_t, h_t);
step 33, repeating the computation of step 32 for each candidate target size, and selecting the size with the maximum posterior probability as the final target size prediction;
step 34, the target size prediction is finished.
2. The visual target tracking method for an image sequence according to claim 1, characterized in that the foreground-based target position prediction map F(x_t, y_t) of step 23 is computed as:

F(x_t, y_t) = (1 / (w_{t-1} * h_{t-1})) * Σ_{(i,j) ∈ R(x_t, y_t, w_{t-1}, h_{t-1})} T(i, j)

where w_{t-1} and h_{t-1} denote the target size obtained when tracking the previous frame, R(x_t, y_t, w_{t-1}, h_{t-1}) denotes the rectangular frame centered at (x_t, y_t) with size (w_{t-1}, h_{t-1}), and (i, j) ranges over the pixels inside that rectangular frame.
3. The visual target tracking method for an image sequence according to claim 1, characterized in that the posterior probability of step 32 that the target size is (w_t, h_t) is computed as:

P(w_t, h_t | O, x_t, y_t, w_{t-1}, h_{t-1}) = P(O | x_t, y_t, w_t, h_t) * P(w_t, h_t | w_{t-1}, h_{t-1})

where P(O | x_t, y_t, w_t, h_t) is the probability that the target O has position-and-size state (x_t, y_t, w_t, h_t),

[Equation: explicit form of the size transition probability P(w_t, h_t | w_{t-1}, h_{t-1})]

P(O | x_t, y_t, w_t, h_t) = A(w_t, h_t) - B(w_t, h_t)

where A(w_t, h_t) is the average target foreground probability inside the candidate rectangular frame (x_t, y_t, w_t, h_t), and B(w_t, h_t) is the average target foreground probability of the background area surrounding that frame.
4. The visual target tracking method for an image sequence according to claim 1, characterized in that updating the convolution regression model comprises the following steps:
step 40, generating the label map for training the whole-target convolution regression model from the predicted target position, and updating the parameters of the single convolution layer of step 12 by gradient descent;
step 41, generating the label map for training the target-texture convolution regression model from the predicted target size, and updating the parameters of the single convolution layer of step 14 by gradient descent;
step 42, completing the update of the convolution regression model.
5. A visual target tracking system for an image sequence, characterized by comprising:
a training module for training a convolution regression model for target tracking;
a target position prediction module for predicting the position of the target with the trained convolution regression model;
a target size prediction module for further predicting the size of the target on the basis of the position prediction result;
and an update module for updating the convolution regression model according to the tracked position and size of the target;
the method for training the convolution regression model comprising the following steps:
step 10, constructing a feature extraction network for expressing the target's morphological characteristics, the network being implementable with any feature extraction method that expresses target information;
step 11, extracting the features of the current input image with the feature extraction network of step 10;
step 12, constructing a whole-target convolution regression model implemented as a single convolution layer, wherein the convolution kernel size matches the target size in feature space, the layer has one output channel, and its output is used to predict the target position;
step 13, generating the corresponding training label map from the features extracted in step 11, wherein the label map follows a two-dimensional Gaussian function whose peak corresponds to the true target position, and the single convolution layer of step 12 is iteratively optimized with a gradient descent algorithm;
step 14, constructing a target-texture convolution regression model implemented as a single convolution layer, wherein the convolution kernel size matches the target size in feature space, the layer has one output channel, and its output is used to predict the target foreground;
step 15, generating the corresponding training label map from the features extracted in step 11, wherein a rectangular frame marks the target foreground in the label map, and the single convolution layer of step 14 is iteratively optimized with a gradient descent algorithm;
step 16, completing the initial training of the convolution regression model;
the target position prediction method specifically comprising the following steps:
step 20, extracting the features of the current input image with the feature extraction network constructed in step 10, in preparation for subsequent tracking;
step 21, feeding the image features obtained in step 20 into the whole-target convolution regression network obtained in step 12, and computing the target position prediction map H(x_t, y_t) based on the whole-target regression model;
step 22, feeding the image features obtained in step 20 into the target-texture convolution regression network obtained in step 14, and computing the target foreground prediction map T(x_t, y_t) based on the target-texture regression model;
step 23, applying a mean filter, whose template size matches the target size, to the foreground prediction map obtained in step 22, yielding the foreground-based target position prediction map F(x_t, y_t);
step 24, superposing the two position prediction maps obtained in steps 21 and 23 into the final target position prediction map, and predicting the target position as the index of its maximum value:

(x_t, y_t) = argmax_{(x, y)} [ H(x, y) + F(x, y) ]

the target size prediction method comprising the following steps:
step 30, extracting the features of the current input image with the feature extraction network constructed in step 10, in preparation for subsequent tracking;
step 31, feeding the image features obtained in step 30 into the target-texture convolution regression network obtained in step 14, and computing the target foreground prediction map T(x_t, y_t) based on the target-texture regression model;
step 32, given the current target position (x_t, y_t) and the target size (w_{t-1}, h_{t-1}) known from the previous frame, computing the posterior probability that the current target size is (w_t, h_t);
step 33, repeating the computation of step 32 for each candidate target size, and selecting the size with the maximum posterior probability as the final target size prediction;
step 34, the target size prediction is finished.
Application CN201810373435.9A, filed 2018-04-24, priority date 2018-04-24: Visual target tracking method and system for image sequence. Granted as CN108734109B (Active).

Priority Applications (1)

Application Number: CN201810373435.9A; Priority Date: 2018-04-24; Filing Date: 2018-04-24; Title: Visual target tracking method and system for image sequence

Publications (2)

CN108734109A, published 2018-11-02
CN108734109B, granted 2020-11-17

Family

ID: 63939209

Family Applications (1)

CN201810373435.9A (granted): Visual target tracking method and system for image sequence

Country Status (1)

CN: CN108734109B

Families Citing this family (5)

* Cited by examiner, † Cited by third party

CN109829936B * (priority 2019-01-29, published 2021-12-24), 青岛海信网络科技股份有限公司: Target tracking method and device
CN110110787A (priority 2019-05-06, published 2019-08-09), 腾讯科技(深圳)有限公司: Location acquiring method, device, computer equipment and the storage medium of target
CN112465859A (priority 2019-09-06, published 2021-03-09), 顺丰科技有限公司: Method, device, equipment and storage medium for detecting fast moving object
CN111027586A (priority 2019-11-04, published 2020-04-17), 天津大学: Target tracking method based on novel response map fusion
CN112378397B * (priority 2020-11-02, published 2023-10-10), 中国兵器工业计算机应用技术研究所: Unmanned aerial vehicle target tracking method and device and unmanned aerial vehicle

Citations (7)

* Cited by examiner, † Cited by third party

CN102881012A * (priority 2012-09-04, published 2013-01-16), 上海交通大学: Vision target tracking method aiming at target scale change
CN103106667A * (priority 2013-02-01, published 2013-05-15), 山东科技大学: Motion target tracing method towards shielding and scene change
CN103632382A * (priority 2013-12-19, published 2014-03-12), 中国矿业大学(北京): Compressive sensing-based real-time multi-scale target tracking method
CN105321188A * (priority 2014-08-04, published 2016-02-10), 江南大学: Foreground probability based target tracking method
EP3229206A1 * (priority 2016-04-04, published 2017-10-11), Xerox Corporation: Deep data association for online multi-class multi-object tracking
US20180032817A1 * (priority 2016-07-27, published 2018-02-01), Conduent Business Services, LLC: System and method for detecting potential mugging event via trajectory-based analysis
CN107403175A * (priority 2017-09-21, published 2017-11-28), 昆明理工大学: Visual tracking method and Visual Tracking System under a kind of movement background


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party

Chao Ma et al., "Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking", International Journal of Computer Vision, 2018-03-16. *
Kai Chen et al., "Convolutional Regression for Visual Tracking", arXiv, 2016-11-15. *
Yibing Song et al., "CREST: Convolutional Residual Learning for Visual Tracking", 2017 IEEE International Conference on Computer Vision, 2017, pp. 2574-2581. *
Qi Fei et al., "A Survey of Mean-Shift-Based Visual Target Tracking Methods" (基于均值漂移的视觉目标跟踪方法综述), Computer Engineering (计算机工程), Vol. 33, No. 21, November 2007. *
Martin Danelljan et al., "Accurate Scale Estimation for Robust Visual Tracking", BMVC 2014, pp. 1-11. *

Also Published As

CN108734109A, published 2018-11-02

Similar Documents

CN108734109B (en) Visual target tracking method and system for image sequence
CN110097568B (en) Video object detection and segmentation method based on space-time dual-branch network
CN107194559B (en) Workflow identification method based on three-dimensional convolutional neural network
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN110287826B (en) Video target detection method based on attention mechanism
CN110232330B (en) Pedestrian re-identification method based on video detection
Rout A survey on object detection and tracking algorithms
JP2008538832A (en) Estimating 3D road layout from video sequences by tracking pedestrians
CN111340881B (en) Direct method visual positioning method based on semantic segmentation in dynamic scene
CN111199556A (en) Indoor pedestrian detection and tracking method based on camera
CN111886600A (en) Device and method for instance level segmentation of image
CN106952293A (en) A kind of method for tracking target based on nonparametric on-line talking
Mayr et al. Self-supervised learning of the drivable area for autonomous vehicles
CN106023249A (en) Moving object detection method based on local binary similarity pattern
Doulamis Coupled multi-object tracking and labeling for vehicle trajectory estimation and matching
CN105809718A (en) Object tracking method with minimum trajectory entropy
CN112149612A (en) Marine organism recognition system and recognition method based on deep neural network
CN114943840A (en) Training method of machine learning model, image processing method and electronic equipment
Roy et al. A comprehensive survey on computer vision based approaches for moving object detection
KR101690050B1 (en) Intelligent video security system
CN110688512A (en) Pedestrian image search algorithm based on PTGAN region gap and depth neural network
Zhang et al. An optical flow based moving objects detection algorithm for the UAV
CN116385493A (en) Multi-moving-object detection and track prediction method in field environment
He et al. Building extraction based on U-net and conditional random fields
Li et al. Fast visual tracking using motion saliency in video

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
TR01: Transfer of patent right

Effective date of registration: 2023-10-31

Address after: No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province
Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd.

Address before: No. 182, Minzu Avenue, Hongshan District, Wuhan City, Hubei Province
Patentee before: South Central University for Nationalities