CN112116630A - Target tracking method - Google Patents

Target tracking method

Info

Publication number
CN112116630A
CN112116630A
Authority
CN
China
Prior art keywords
target
frame
network
layer
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010834753.8A
Other languages
Chinese (zh)
Inventor
许剑华 (Xu Jianhua)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Supremind Intelligent Technology Co Ltd
Original Assignee
Shanghai Supremind Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Supremind Intelligent Technology Co Ltd filed Critical Shanghai Supremind Intelligent Technology Co Ltd
Priority to CN202010834753.8A priority Critical patent/CN112116630A/en
Publication of CN112116630A publication Critical patent/CN112116630A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to video surveillance technology, and in particular to a target tracking method. The method implements a single-target tracking scheme based on a reinforcement learning (RL) algorithm, treating each tracking step of each target as a policy problem: the control policy is optimized from the tracked state reward, which in turn trains the deep learning network. Unlike general RL schemes, which predict the target directly from a cropped image, the method combines a Siamese network structure with the RL scheme and predicts correlation features from the target template and the full image, so it can use the information around the target, improving its semantic background discrimination and tracking robustness. General RL schemes also use DQN for tracking, which does not decouple value-function estimation from policy optimization, so they easily fail to converge or overfit during training.

Description

Target tracking method
Technical Field
The invention relates to video surveillance technology, and in particular to a target tracking method.
Background
Video surveillance technology is widely applied in intelligent traffic systems. Current research focuses on vehicle detection, identification, tracking, traffic statistics, traffic dispersion, and violation detection.
Existing target tracking methods mainly include the following.
First, the patent with application No. 201810592957.8, "Multi-target tracking based on multi-agent deep reinforcement learning":
in that patent, each frame is detected with YOLOv3, targets are cropped out using the detection boxes, an agent is created for each cropped target, features are then extracted from the cropped targets, and an action is predicted for each target through an LSTM and a DQN.
The single-target tracking in that scheme extracts features for each target with a CNN, uses an LSTM to extract the correlated feature information of all targets, and predicts each target box's action (the prediction of the target box position) through the policy learned by the DQN. A total of 9 actions are involved.
That patent has the following disadvantages:
1) the DQN couples action selection with the value function, so it overfits easily and is hard to train;
2) the target prediction problem is treated as a discrete problem, which limits tracking precision; the change of target position should be treated as a continuous problem in space (the two formulations are contrasted in the sketch below).
Second, the patent with application No. 201810220513.1, "Multi-target tracking method based on deep reinforcement learning":
in that patent, each target is assigned an agent, and the target box is then predicted by a DQN network; the image fed into the DQN network is the cropped detection-box image.
That patent has the following disadvantage: DQN is an action-value optimization algorithm that optimizes the policy by maximizing the action value and converges to a local maximum; its convergence is sensitive to the reward, and it does not converge easily.
Third, SiameseRPN-series single-target tracking:
the template image and a search region of the global image are fed separately into a CNN, correlated by an RPN network, and the target's box and category are predicted by two different branches on the correlated features.
This method has the following disadvantages: the SiameseRPN-series schemes assume that the target's current position is near its position in the previous frame, so the correlation is not computed over the whole image; instead, an image patch in a small range around the target's previous-frame position is cropped for correlation. The main problems with this are that the semantic background discrimination of the SiameseRPN correlation is poor, which easily causes mis-tracking, and that the range of feature extraction is limited. Meanwhile, the SiameseRPN network does not consider the target's historical position information and relies only on the current template and the current frame for prediction.
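For reference, the correlation operation at the heart of SiameseRPN-style trackers (and of the correlation feature map used by the method of this disclosure) can be sketched as follows; PyTorch and a SiamFC/SiamRPN-style depthwise cross-correlation are assumed, and all names are illustrative.

    # Minimal sketch of depthwise cross-correlation between template features
    # and search-region features; assumes PyTorch, names are illustrative.
    import torch
    import torch.nn.functional as F

    def depthwise_xcorr(search_feat: torch.Tensor, template_feat: torch.Tensor) -> torch.Tensor:
        # search_feat:   (B, C, Hs, Ws) features of the search image/region
        # template_feat: (B, C, Ht, Wt) features of the target template
        # returns:       (B, C, Hs-Ht+1, Ws-Wt+1) correlation response map
        b, c, h, w = search_feat.shape
        # Fold the batch into the channel dimension so each sample is
        # correlated only with its own template (grouped convolution).
        x = search_feat.view(1, b * c, h, w)
        kernel = template_feat.view(b * c, 1, *template_feat.shape[2:])
        out = F.conv2d(x, kernel, groups=b * c)
        return out.view(b, c, out.size(2), out.size(3))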
Accordingly, there is a need for improvements in the art.
Disclosure of Invention
The invention aims to provide an efficient target tracking method.
In order to solve the above technical problem, the present invention provides a target tracking method, including the following steps:
1) Input a target box that selects the target in the current frame image; go to step 2.
2) The Siamese feature extraction module crops the target using the input target box (or, on later frames, the predicted target box) to obtain a target patch, then extracts features from the target patch and from the current frame image with the same CNN; the two resulting feature maps are correlated to obtain a correlation feature map, which serves as the input of the RL tracking module (see the correlation sketch in the Background above); go to step 3.
3) The RL tracking module predicts the position of the target box in the next frame image based on the correlation feature map of the current frame extracted in step 2, obtaining the offset of the next frame relative to the current frame, and computes the predicted target box position from this offset and the current target box position (the box update is sketched after these steps); it then crops features from the next frame image according to the predicted target box position and predicts a score, yielding the tracked state reward; go to step 4.
4) Optimize the control policy according to the tracked state reward obtained in step 3.
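A minimal sketch of the box update in step 3, assuming the box is parameterized by its center, width and height, and that Δx, Δy, Δw, Δh are plain additive offsets; the disclosure does not fix an exact parameterization, so this is one consistent reading.

    from typing import Tuple

    Box = Tuple[float, float, float, float]  # (cx, cy, w, h): center, width, height

    def apply_deltas(box: Box, dx: float, dy: float, dw: float, dh: float) -> Box:
        # Apply the predicted offsets to the current target box. Assumes the
        # offsets are plain additive differences between frames, one plausible
        # reading of the disclosure's dx, dy, dw, dh.
        cx, cy, w, h = box
        return (cx + dx, cy + dy, w + dw, h + dh)

    # Example: a box centered at (100, 80) drifting right and growing slightly.
    pred_box = apply_deltas((100.0, 80.0, 40.0, 30.0), dx=3.0, dy=-1.0, dw=2.0, dh=0.5)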
As an improvement of the target tracking method of the present invention:
the score prediction may use IoU, GIoU, L2 distance, and the like (sketched below).
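Hedged sketches of the IoU and GIoU measures named above, for axis-aligned boxes given as (x1, y1, x2, y2) corners; this is standard geometry, not code taken from the disclosure.

    def iou(a, b):
        # Intersection-over-union of two (x1, y1, x2, y2) boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def giou(a, b):
        # Generalized IoU: IoU minus the fraction of the smallest enclosing box
        # not covered by the union, so non-overlapping boxes still get a useful
        # (negative) signal.
        cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
        cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
        c_area = (cx2 - cx1) * (cy2 - cy1)
        inter = max(0.0, min(a[2], b[2]) - max(a[0], b[0])) * \
                max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return (inter / union if union > 0 else 0.0) - (c_area - union) / c_area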
As a further improvement of the target tracking method of the present invention:
the Siamese feature extraction module comprises three convolutional layers;
the RL tracking module comprises an Actor network and a Critic network; the Actor and Critic networks share three convolutional layers; the Actor network further comprises a convolutional layer, two FC layers and an action layer, and the Critic network further comprises a Crop layer, a convolutional layer, two FC layers and a score layer;
the Siamese feature extraction module feeds the output of its last convolutional layer into the convolutional layer of the Actor network and the Crop layer of the Critic network, and the output of the Actor network's action layer is fed into the Crop layer of the Critic network.
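The layer layout just described can be sketched as below, assuming PyTorch; the channel counts, kernel sizes, and the use of roi_align as the Crop layer are assumptions, since the disclosure specifies only the layer types and their wiring.

    import torch
    import torch.nn as nn
    from torchvision.ops import roi_align  # assumed realization of the Crop layer

    class ActorCritic(nn.Module):
        # Layer layout per the description above: three shared conv layers;
        # Actor = conv + two FC + action layer; Critic = Crop + conv + two FC +
        # score layer. Channel counts and sizes are illustrative assumptions.

        def __init__(self, in_ch: int = 256, ch: int = 128):
            super().__init__()
            self.shared = nn.Sequential(  # three shared convolutional layers
                nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            )
            self.actor_conv = nn.Conv2d(ch, ch, 3, padding=1)
            self.actor_fc = nn.Sequential(nn.Flatten(),
                                          nn.LazyLinear(256), nn.ReLU(),
                                          nn.Linear(256, 256), nn.ReLU())
            self.action = nn.Linear(256, 4)   # action layer -> (dx, dy, dw, dh)
            self.critic_conv = nn.Conv2d(ch, ch, 3, padding=1)
            self.critic_fc = nn.Sequential(nn.Flatten(),
                                           nn.LazyLinear(256), nn.ReLU(),
                                           nn.Linear(256, 256), nn.ReLU())
            self.score = nn.Linear(256, 1)    # score layer -> predicted IoU

        def features(self, corr_feat: torch.Tensor) -> torch.Tensor:
            return self.shared(corr_feat)

        def act(self, f: torch.Tensor) -> torch.Tensor:
            return self.action(self.actor_fc(self.actor_conv(f)))

        def criticize(self, f: torch.Tensor, box: torch.Tensor) -> torch.Tensor:
            # Crop layer: cut the box region out of the shared feature map;
            # rois are (batch_index, x1, y1, x2, y2) rows in feature coordinates.
            idx = torch.arange(box.size(0), dtype=box.dtype, device=box.device)
            rois = torch.cat([idx.unsqueeze(1),
                              box[:, :2] - box[:, 2:] / 2,
                              box[:, :2] + box[:, 2:] / 2], dim=1)
            crops = roi_align(f, rois, output_size=(7, 7))
            return self.score(self.critic_fc(self.critic_conv(crops)))

        def forward(self, corr_feat: torch.Tensor, cur_box: torch.Tensor):
            # cur_box: (B, 4) boxes as (cx, cy, w, h) in feature-map coordinates.
            f = self.features(corr_feat)
            deltas = self.act(f)
            return deltas, self.criticize(f, cur_box + deltas)  # action -> Crop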
As a further improvement of the target tracking method of the present invention:
in step 3:
the Actor network predicts the action of the next-frame target box based on the current target box, obtaining the offset of the next frame relative to the current frame, including but not limited to the following parameters: Δx, Δy, Δw, Δh;
where Δx and Δy are the predicted change of the target's center point in the next frame relative to the current frame, and Δw and Δh are the predicted change of the target's width and height in the next frame relative to the current frame;
the predicted target box position is obtained from the current target box position and Δx, Δy, Δw, Δh;
the Critic network crops the target at an intermediate layer using the predicted target box position, i.e., it crops the corresponding features from an intermediate layer of the Critic network according to the target position and continues feature extraction from them;
the target's IoU is then predicted through the convolutional layer and the FC layers; this predicted value serves as the action-value score of the RL network and is used to evaluate the reliability of the current action.
The target tracking method has the following technical advantages:
the method implements a single-target tracking scheme based on a reinforcement learning algorithm, treating each tracking step of each target as a policy problem: the control policy is optimized from the tracked state reward, which in turn trains the deep learning network.
1) Unlike general RL schemes, which perform target prediction directly on a cropped image, the method predicts correlation features from the target template and the full image, so it can use the information around the target, improving its semantic background discrimination and tracking robustness.
2) The prior art predicts the target box position from a cropped image lacking the target's surroundings, so its tracking robustness is poor; it also uses DQN for tracking, which does not decouple value-function estimation from policy optimization, so training easily fails to converge or overfits.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the algorithmic framework of the target tracking system of the present invention;
FIG. 2 is a flow chart of the target tracking method of the present invention;
FIG. 3 is a block diagram of the Siamese feature extraction module;
FIG. 4 is a block diagram of the Siamese feature extraction module connected to the RL tracking module.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto.
Embodiment 1: a target tracking system, as shown in FIGS. 1-4, comprising a Siamese feature extraction module and an RL tracking module connected to each other.
The RL tracking module performs the tracking operation using the features obtained by the Siamese network. It adopts an actor-critic scheme and trains the network with a policy gradient, so it does not depend directly on the reward, which increases the robustness of the network; the actor-critic scheme also decouples policy training from action-value prediction. The RL tracking module comprises an Actor network and a Critic network.
The Actor and Critic networks share features through the preceding shared convolutional layers.
Actor network:
the Actor predicts the action of the current target box based on the previous-frame target box, including but not limited to the following parameters: Δx, Δy, Δw, Δh;
Δx and Δy are the predicted change of the target's center point in the current frame relative to the previous frame, and Δw and Δh are the changes in the width and height of the target box.
Critic network:
the Critic network crops the target at an intermediate layer using the predicted target position, then predicts the target's IoU through a convolutional layer and the FC layers. The predicted value serves as the action value of the RL network.
The Siamese feature extraction module comprises three convolutional layers; the Actor and Critic networks share three convolutional layers; the Actor network further comprises a convolutional layer, two FC layers and an action layer, and the Critic network further comprises a Crop layer, a convolutional layer, two FC layers and a score layer.
The Siamese feature extraction module feeds the output of its last convolutional layer into the convolutional layers of the Actor network and the Critic network, and the output of the Actor network's action layer is fed into the Crop layer of the Critic network.
The target tracking method comprises the following steps:
1) First, a target box selecting the target in the current frame image is input externally to initialize the tracking algorithm; go to step 2.
2) The Siamese module crops the target using the input target box or the predicted target box to obtain a target patch, then extracts features from the target patch and the current frame image with the same CNN; the two feature maps are correlated to obtain a correlation feature map, which serves as the input of the RL tracking module. Go to step 3.
3) The RL tracking module predicts the position of the target box in the next frame image based on the correlation feature map of the current frame extracted in step 2, obtaining the offset of the next frame relative to the current frame, and computes the predicted target box position from this offset and the current target box position; features are then extracted from the next frame image according to the predicted target box position, and the network outputs a score as the score prediction, which evaluates the difference between the predicted box and the ground-truth box and serves as the tracked state reward; the score prediction generally uses IoU, GIoU, L2 distance, or a similar measure.
4) Optimize the control policy according to the tracked state reward obtained in step 3, and train the network of the RL tracking module with a policy gradient (a hedged training sketch follows these steps).
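The disclosure says the policy is trained with a policy gradient and that the Critic's predicted IoU serves as the action value. The sketch below is one consistent reading, not the disclosure's verbatim procedure: Gaussian exploration around the Actor's output (an assumption; the disclosure does not describe exploration), a Critic regressed to the measured IoU, and a REINFORCE-style actor update weighted by the Critic's score. ActorCritic is the sketch from earlier, and reward_fn is a caller-supplied function returning the measured IoU of a box against the ground-truth box.

    # Hedged training sketch for step 4; assumes PyTorch and the ActorCritic
    # sketch above. Hyperparameters and the exploration scheme are assumptions.
    import torch
    import torch.nn.functional as F

    model = ActorCritic()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    SIGMA = 0.05  # illustrative exploration noise

    def train_step(corr_feat, cur_box, reward_fn):
        # corr_feat: (B, C, H, W) correlation features; cur_box: (B, 4) boxes
        # as (cx, cy, w, h); reward_fn(box) -> (B,) measured IoU vs. ground
        # truth (the tracked state reward).
        f = model.features(corr_feat)
        deltas = model.act(f)

        # Explore around the deterministic action (assumption, see lead-in).
        dist = torch.distributions.Normal(deltas, SIGMA)
        sample = dist.sample()  # no gradient flows through the sample itself
        pred_box = cur_box + sample
        reward = reward_fn(pred_box)

        # Critic update: regress the score layer toward the measured IoU.
        score = model.criticize(f, pred_box)
        critic_loss = F.mse_loss(score.squeeze(1), reward)

        # Actor update: REINFORCE-style policy gradient, with the Critic's
        # (detached) predicted IoU as the return of the sampled action.
        actor_loss = -(dist.log_prob(sample).sum(dim=1)
                       * score.detach().squeeze(1)).mean()

        loss = critic_loss + actor_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        return critic_loss.item(), actor_loss.item()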
The specific prediction method is as follows:
the Actor network predicts the action of the next-frame target box based on the current target box, obtaining the offset of the next frame relative to the current frame, including but not limited to the following parameters: Δx, Δy, Δw, Δh;
where Δx and Δy are the predicted change of the target's center point in the next frame relative to the current frame, and Δw and Δh are the predicted change of the target's width and height.
The predicted target box position is obtained from the current target box position and Δx, Δy, Δw, Δh.
The Critic network crops the target at an intermediate layer using the predicted target box position, i.e., it crops the corresponding features from an intermediate layer of the Critic network according to the target position and continues feature extraction from them;
the target's IoU is then predicted through the convolutional layer and the FC layers. The predicted value serves as the action value of the RL network and is used to evaluate the reliability of the current action.
If multiple single targets need to be tracked, multiple tracking instances can be started to track them, as sketched below.
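A minimal sketch of that multi-instance arrangement; SingleTargetTracker is a hypothetical wrapper name for the Siamese and RL modules described above, stubbed here so the example runs.

    # Hedged sketch: one independent tracker instance per target, keyed by a
    # caller-assigned id. SingleTargetTracker is a hypothetical wrapper around
    # the Siamese feature extraction module and the RL tracking module.
    class SingleTargetTracker:
        def __init__(self, frame, init_box):
            self.box = init_box   # stand-in for real module initialization

        def update(self, next_frame):
            return self.box       # stand-in for a real prediction step

    trackers = {}

    def start_tracking(target_id, frame, init_box):
        trackers[target_id] = SingleTargetTracker(frame, init_box)

    def step_all(next_frame):
        # Each instance independently tracks its own single target.
        return {tid: t.update(next_frame) for tid, t in trackers.items()}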
Finally, it should be noted that the above merely illustrates a few specific embodiments of the invention. Obviously, the invention is not limited to the above embodiments, and many variations are possible. All modifications that a person skilled in the art can directly derive or suggest from the disclosure of the present invention shall be considered within the scope of the invention.

Claims (4)

1. A target tracking method, characterized in that it comprises the following steps:
1) inputting a target box that selects the target in the current frame image; executing step 2;
2) the Siamese feature extraction module crops the target using the input target box or the predicted target box to obtain a target patch, then extracts features from the target patch and the current frame image with the same CNN; the two feature maps are correlated to obtain a correlation feature map, which serves as the input of the RL tracking module; executing step 3;
3) the RL tracking module predicts the position of the target box in the next frame image based on the correlation feature map of the current frame extracted in step 2, obtaining the offset of the next frame relative to the current frame, and obtains the predicted target box position from this offset and the current target box position; performing feature extraction from the next frame image according to the predicted target box position, the network outputting a score as the score prediction, which serves as the tracked state reward; executing step 4;
4) optimizing the control policy according to the tracked state reward obtained in step 3.
2. The target tracking method according to claim 1, characterized in that:
the score prediction includes, but is not limited to, IoU, GIoU, L2 distance, and the like.
3. The target tracking method according to claim 2, characterized in that:
the Siamese feature extraction module comprises three convolutional layers;
the RL tracking module comprises an Actor network and a Critic network; the Actor and Critic networks share three convolutional layers; the Actor network further comprises a convolutional layer, two FC layers and an action layer, and the Critic network further comprises a Crop layer, a convolutional layer, two FC layers and a score layer;
the Siamese feature extraction module feeds the output of its last convolutional layer into the convolutional layer of the Actor network and the Crop layer of the Critic network, and the output of the Actor network's action layer is fed into the Crop layer of the Critic network.
4. The target tracking method according to claim 3, characterized in that:
in step 3:
the Actor network predicts the action of the next-frame target box based on the current target box, obtaining the offset of the next frame relative to the current frame, including but not limited to the following parameters: Δx, Δy, Δw, Δh;
where Δx and Δy are the predicted change of the target's center point in the next frame relative to the current frame, and Δw and Δh are the predicted change of the target's width and height in the next frame relative to the current frame;
the predicted target box position is obtained from the current target box position and Δx, Δy, Δw, Δh;
the Critic network crops the target at an intermediate layer using the predicted target box position, i.e., it crops the corresponding features from an intermediate layer of the Critic network according to the target position and continues feature extraction;
the target's IoU is then predicted through the convolutional layer and the FC layers; the predicted value serves as the action-value score of the RL network and is used to evaluate the reliability of the current action.
CN202010834753.8A 2020-08-19 2020-08-19 Target tracking method Pending CN112116630A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010834753.8A CN112116630A (en) 2020-08-19 2020-08-19 Target tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010834753.8A CN112116630A (en) 2020-08-19 2020-08-19 Target tracking method

Publications (1)

Publication Number Publication Date
CN112116630A true CN112116630A (en) 2020-12-22

Family

ID=73803769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010834753.8A Pending CN112116630A (en) 2020-08-19 2020-08-19 Target tracking method

Country Status (1)

Country Link
CN (1) CN112116630A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767456A (en) * 2019-01-09 2019-05-17 上海大学 A kind of method for tracking target based on SiameseFC frame and PFP neural network
CN110120064A (en) * 2019-05-13 2019-08-13 南京信息工程大学 A kind of depth related objective track algorithm based on mutual reinforcing with the study of more attention mechanisms
US20200026954A1 (en) * 2019-09-27 2020-01-23 Intel Corporation Video tracking with deep siamese networks and bayesian optimization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XINGYU WAN et al.: "Visual Tracking Using Online Deep Reinforcement Learning with Heatmap", 2019 2nd China Symposium on Cognitive Computing and Hybrid Intelligence (CCHI) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023535672A (en) * 2021-06-30 2023-08-21 Beijing Baidu Netcom Science Technology Co., Ltd. Object segmentation method, object segmentation apparatus, and electronic device
JP7372487B2 2021-06-30 2023-10-31 Beijing Baidu Netcom Science Technology Co., Ltd. Object segmentation method, object segmentation device and electronic equipment
CN113421287A (en) * 2021-07-16 2021-09-21 上海微电机研究所(中国电子科技集团公司第二十一研究所) Robot based on vision active target tracking and control method and system thereof

Similar Documents

Publication Publication Date Title
Hoermann et al. Dynamic occupancy grid prediction for urban autonomous driving: A deep learning approach with fully automatic labeling
CN107516321B (en) Video multi-target tracking method and device
CN106845487B (en) End-to-end license plate identification method
Hoermann et al. Object detection on dynamic occupancy grid maps using deep learning and automatic label generation
CN112052802B (en) Machine vision-based front vehicle behavior recognition method
CN112734808B (en) Trajectory prediction method for vulnerable road users in vehicle driving environment
CN112489081B (en) Visual target tracking method and device
WO2011028380A2 (en) Foreground object detection in a video surveillance system
WO2011022277A2 (en) Inter-trajectory anomaly detection using adaptive voting experts in a video surveillance system
CN112116630A (en) Target tracking method
JP2012190159A (en) Information processing device, information processing method, and program
CN111950498A (en) Lane line detection method and device based on end-to-end instance segmentation
Pool et al. Crafted vs learned representations in predictive models—A case study on cyclist path prediction
CN116985793B (en) Automatic driving safety control system and method based on deep learning algorithm
Lim et al. Gaussian process auto regression for vehicle center coordinates trajectory prediction
CN113947208A (en) Method and apparatus for creating machine learning system
CN110111358B (en) Target tracking method based on multilayer time sequence filtering
CN111160089A (en) Trajectory prediction system and method based on different vehicle types
Tran et al. A probabilistic discriminative approach for situation recognition in traffic scenarios
CN115861386A (en) Unmanned aerial vehicle multi-target tracking method and device through divide-and-conquer association
WO2022127819A1 (en) Sequence processing for a dataset with frame dropping
CN110244746B (en) Robot dynamic barrier avoiding method and system based on visual attention
EP3879461A1 (en) Device and method for training a neuronal network
CN111273779B (en) Dynamic gesture recognition method based on self-adaptive space supervision
Priya et al. Vehicle Detection in Autonomous Vehicles Using Computer Vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201222