CN113256680A - High-precision target tracking system based on unsupervised learning - Google Patents

High-precision target tracking system based on unsupervised learning

Info

Publication number
CN113256680A
CN113256680A
Authority
CN
China
Prior art keywords
tracker
target
tracking
frame
selection module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110523935.8A
Other languages
Chinese (zh)
Inventor
胡硕
王洁
周思恩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202110523935.8A priority Critical patent/CN113256680A/en
Publication of CN113256680A publication Critical patent/CN113256680A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a high-precision target tracking system based on unsupervised learning, which comprises an image acquisition module, a tracking module and a selection module. The image acquisition module is used for acquiring video images. The tracking module comprises a tracker 1 and a tracker 2 and is used for obtaining image features and the target rectangular frame. The selection module comprises two fully connected layers and a softmax layer, wherein each fully connected layer comprises a linear fully connected layer and a ReLU activation function; it takes the feature maps of the candidate trackers and the tracker results as input and outputs the best tracking result through the selector. The invention obtains results through two different trackers, outputs the optimal result through the judgment of the selector, and continues tracking with that result in subsequent frames, so as to adapt to target tracking in different scenes.

Description

High-precision target tracking system based on unsupervised learning
Technical Field
The invention relates to the technical field of computer vision tracking, in particular to a high-precision target tracking system based on unsupervised learning.
Background
Target tracking is a fundamental task in computer vision, whose purpose is to locate a target object in a video given a bounding-box annotation in the first frame. Target tracking currently has a wide range of applications, such as intelligent transportation systems, the medical field, human-computer interaction and athlete match analysis.
Current advanced deep tracking methods typically use pre-trained CNN models for feature extraction. These models are trained in a supervised fashion and require a large number of annotation labels; manual labeling is expensive and time consuming, whereas unlabeled videos are readily available on the Internet. It is therefore worth exploring how to perform visual tracking using unlabeled video sequences.
Data are now extremely easy to obtain on the Internet, and the development of unsupervised techniques alleviates the manual-labeling problem and plays a large role in deep-learning-based target tracking. Unsupervised learning on video has produced a great deal of research: to learn visual features from unlabeled data, unsupervised methods explore the intrinsic information inside images or videos from different perspectives as supervision signals, and train by designing loss functions and proxy tasks. In the prior art, for example, the patent "Unsupervised correlation-filter target tracking method and system based on a jigsaw task" trains a network on a prediction task that indexes image-block positions in order to learn feature-extraction capability, and mainly trains a single tracker to adapt to different scenes; however, a single tracker usually has inherent defects that make it difficult to adapt to different scenes. The prior art therefore has certain defects and shortcomings, and there is room for further improvement.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a high-precision target tracking system based on unsupervised learning. The system adapts to target tracking in different scenes by using two different trackers, and a selection module, based on a comparison of the tracking results, selects the optimal result for continued tracking in subsequent frames. The system has a simple structure and improves both precision and robustness.
The technical scheme adopted by the invention is as follows:
the invention provides a high-precision target tracking system based on unsupervised learning, which comprises the following modules:
the image acquisition module is used for acquiring a video image;
the tracking module comprises a tracker 1 and a tracker 2 and is used for obtaining the characteristics of the image and the target rectangular frame;
the selection module comprises two fully connected layers and a softmax layer, wherein each fully connected layer comprises a linear fully connected layer and a ReLU activation function; the module takes the feature maps and results of the candidate trackers as input and outputs the best tracking result through the selector.
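By way of illustration, a minimal sketch of such a selection network is given below. It assumes PyTorch, global-average pooling of the input feature map, and illustrative layer sizes; none of these implementation details are specified in the patent.

import torch
import torch.nn as nn

class TrackerSelector(nn.Module):
    """Selection-module sketch: two fully connected layers (each linear + ReLU)
    followed by a softmax layer; it scores one tracker's feature map with the
    probability that the corresponding target box is accurate."""
    def __init__(self, in_channels=512, hidden_dim=256):
        super().__init__()
        self.fc1 = nn.Sequential(nn.Linear(in_channels, hidden_dim), nn.ReLU())
        self.fc2 = nn.Sequential(nn.Linear(hidden_dim, 2), nn.ReLU())
        self.softmax = nn.Softmax(dim=1)

    def forward(self, feature_map):
        # feature_map: (batch, channels, height, width); pooling is an assumed fusion step.
        x = feature_map.flatten(start_dim=2).mean(dim=2)
        x = self.fc2(self.fc1(x))
        return self.softmax(x)[:, 1]  # probability that the predicted box is accurate

At inference time both trackers' feature maps are scored with this module and the box with the higher probability is kept, as described in the detailed description below.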
Further, tracker 1 and tracker 2 in the tracking module are two different trackers, suited to tracking in different scenes, and are trained with two different loss functions, specifically as follows:
L_C = ||Z_T - R_T||_2^2 (1)
L_m = Σ_i z_i (2)
z_i = 0.5 (Î_t^i - I_t^i)^2 if |Î_t^i - I_t^i| < 1, and z_i = |Î_t^i - I_t^i| - 0.5 otherwise (3)
wherein L_C is the loss function of tracker 1, R_T is the label obtained by cropping the template patch from the initial frame, which is a Gaussian response centered at the center of the initial bounding box, and Z_T is the response map generated from the second search frame during backward tracking; training uses cycle consistency. L_m is the Huber loss function of tracker 2, Î_t is the reconstructed frame and I_t is the real frame; training uses the consistency of pixel reconstruction.
Further, the specific content of the selection module is as follows:
the selection module aims at selecting the most appropriate target result according to the tracking results; in the selection module, the results of the two trackers need to be run simultaneously, so that the better of tracker 1 and tracker 2 can be selected;
(1) acquiring the overlap value IOU between the candidate box and the pseudo label: after the two trackers track forward to obtain a predicted target position, the predicted target position is used as a pseudo label and tracked backward across an interval of n frames; a new predicted position is obtained in the initial frame, and the overlap IOU is computed between the newly estimated target frame and the annotation frame in the initial frame; the label P required for selector training is obtained from the IOU value, and P is computed as in equation (4) (an illustrative sketch of the IOU computation and selector training follows this list):
(Equation (4), giving the label P as a function of the IOU value, is presented as an image in the original publication.)
(2) The selection module consists of two fully connected layers and a softmax layer, wherein each fully connected layer comprises a linear fully connected layer and a ReLU activation function; the feature map obtained by each candidate tracker is input into the selection module, which yields a probability value for the precision estimate of each of the two target frames; the selector is trained using a cross-entropy loss function, given as equation (5):
L_s = -Σ [p ln a + (1 - p) ln(1 - a)] (5)
wherein a is the probability value of the target-box precision estimate produced by the selector from the features, and p is the label required for selector training;
(3) in the tracking stage, the trackers locate the target, the feature maps and localization results obtained by the trackers are used as the input of the selection module, and the selection module directly judges and outputs the result of the better tracker.
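The sketch below illustrates the IOU computation and one selector training step. The (x, y, w, h) box format, the helper names and the 0.5 IOU threshold used to form the label p are assumptions made for illustration; the threshold merely stands in for equation (4), whose exact form is given only as an image in the original.

import torch
import torch.nn.functional as F

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    iw = max(0.0, min(ax2, bx2) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(ay2, by2) - max(box_a[1], box_b[1]))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def selector_training_step(selector, feature_map, estimated_box, annotation_box, optimizer):
    """One training step of the selector with the cross-entropy loss of equation (5).
    The label p is derived here by thresholding the IOU at 0.5 (an assumption)."""
    p = torch.tensor([1.0 if iou(estimated_box, annotation_box) > 0.5 else 0.0])
    a = selector(feature_map)              # probability that the predicted box is accurate
    loss = F.binary_cross_entropy(a, p)    # equation (5)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()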
Compared with the prior art, the invention has the following beneficial effects:
according to the invention, a combined model is designed to comprise an image acquisition module, a tracking module and a selection module, results obtained by two different trackers are judged by a selector to obtain the optimal result output, and tracking is continued in subsequent frames; the target tracking algorithm faces huge challenges such as motion blur, shielding and the like, and the method has the advantages that the two trackers have different applied target motion scenes, the result of the proper tracker is selected for tracking, the structure is simple, and the precision and the robustness can be effectively improved.
Drawings
FIG. 1 is a block flow diagram of the system of the present invention;
FIG. 2 is a schematic diagram of a training method of the tracker 1;
fig. 3 is a schematic diagram of a training method of the tracker 2.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
As shown in fig. 1, the high-precision target tracking system based on unsupervised learning proposed by the present invention includes the following modules:
the image acquisition module is used for acquiring a video image;
the tracking module comprises a tracker 1 and a tracker 2 and is used for obtaining the characteristics of the image and the target rectangular frame;
the selection module comprises two fully connected layers and a softmax layer, wherein each fully connected layer comprises a linear fully connected layer and a ReLU activation function; the module takes the feature maps and results of the candidate trackers as input and outputs the best tracking result through the selector.
The tracking module comprises two branches, tracker 1 and tracker 2. Tracker 1 tracks targets in scenes without abrupt changes with high precision, good robustness and high speed; tracker 2 has a memory bank and can therefore track moving targets with higher precision under occlusion or temporary target loss. By complementing each other's advantages, the two trackers adjust automatically in different scenes to ensure high target-tracking precision. The two trackers are trained as follows:
Tracker 1 is trained with the idea of cycle consistency, as shown in fig. 2. The specific steps are as follows: three patches are randomly selected within 10 consecutive frames of a video; any one patch is set as the first-frame template and the remaining patches are set as search patches. Given the target object annotated on the template frame, forward tracking is performed twice over the two subsequent frames; the position predicted in the last frame is then used as the initial target annotation to track backward directly to the first frame. In principle the initial annotation of the first frame should coincide with the target position predicted by backward tracking in the first frame, so the error between the initial annotation and the backward-tracking result is used for training. The training loss is equation (1):
L_C = ||Z_T - R_T||_2^2 (1)
wherein L_C is the loss function of tracker 1, R_T is the label obtained by cropping the template patch from the initial frame, which is a Gaussian response centered at the center of the initial bounding box, and Z_T is the response map generated from the second search frame during backward tracking; training uses cycle consistency.
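A minimal sketch of one such cycle-consistency training step follows. The callable tracker(feats_a, label_a, feats_b), assumed to return the response map of the target on frame b given its (pseudo) label on frame a, and the helper to_label that turns a response map into a pseudo label, are illustrative stand-ins rather than the patent's actual interfaces.

import torch.nn.functional as F

def cycle_consistency_step(tracker, to_label, feats, gaussian_label, optimizer):
    """One training step for tracker 1: forward-track twice, backward-track to the
    first frame, and minimize equation (1) between the back-tracked response map
    and the Gaussian label of the initial annotation."""
    f0, f1, f2 = feats                       # features of three patches from 10 consecutive frames
    # Forward tracking: frame 0 -> 1 -> 2, using predicted responses as pseudo labels.
    resp_1 = tracker(f0, gaussian_label, f1)
    resp_2 = tracker(f1, to_label(resp_1), f2)
    # Backward tracking: frame 2 -> 0, starting from the last forward prediction.
    resp_back = tracker(f2, to_label(resp_2), f0)
    # Cycle-consistency loss, equation (1).
    loss = F.mse_loss(resp_back, gaussian_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()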
Tracker 2 uses fine-grained pixel matching, as shown in fig. 3, and uses a memory bank to store information from multiple frames; its advantage is that it can exploit more feature information. The tracker is designed with long-term and short-term memory: in a video sequence the target changes over time, so if previously good features are not reused when the target changes during tracking, errors are easily amplified and the target may even be lost in subsequent tracking. The memory bank is configured to store 5 frames of information; frame 0 and frame 5 are fixed as long-term memory, which ensures that earlier feature information is always retained over a long video sequence, while I_{t-5}, I_{t-3} and I_{t-1} are used as short-term memory to provide the latest feature information. Tracker 2 is trained by reconstructing the target frame as a linear combination of pixels from the reference frames (e.g. I_{t-1}). Specifically, for each input frame I_t there is a triplet (Q_t, K_t, V_t), i.e. Query, Key and Value. Taking the current frame and several past frames in the memory bank as input, a trained feature encoder computes the affinity matrix between Q of the target frame and K of the reference frames, and the pixels of frame t are reconstructed as in equations (2) and (3):
A_t^{ij} = exp(<Q_t^i, K_r^j>) / Σ_p exp(<Q_t^i, K_r^p>) (2)
Î_t^i = Σ_j A_t^{ij} V_r^j (3)
wherein <·,·> is the dot product of two vectors; Q and K are the feature representations obtained after the twin network, with Q the features of the current target frame I_t and K the features of the several reference frames; A_t^{ij} is the affinity between pixel i of the target frame and pixel j of the reference frames; and V is the original reference frame.
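The attention-style reconstruction of equations (2) and (3) can be sketched as follows; the tensor shapes and the flattening of spatial positions into rows are assumptions made for illustration.

import torch

def reconstruct_target_frame(q, k, v):
    """Reconstruct the target-frame pixels from the memory bank.
    q: (N, C) query features of the target frame, one row per pixel
    k: (M, C) key features of the reference frames in the memory bank
    v: (M, D) values of the reference-frame pixels (e.g. colors)
    Returns the reconstructed pixels, shape (N, D)."""
    affinity = torch.softmax(q @ k.t(), dim=1)   # equation (2): A_t[i, j]
    return affinity @ v                          # equation (3): weighted sum of reference pixels

# Illustrative usage with random features: 5 memory frames of 64 pixels each.
q = torch.randn(64, 128)
k = torch.randn(5 * 64, 128)
v = torch.randn(5 * 64, 3)
reconstructed = reconstruct_target_frame(q, k, v)   # (64, 3)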
The tracker is trained with the loss between the reconstructed frame Î_t and the original frame I_t, with the loss function given by equations (4) and (5):
L_m = Σ_i z_i (4)
z_i = 0.5 (Î_t^i - I_t^i)^2 if |Î_t^i - I_t^i| < 1, and z_i = |Î_t^i - I_t^i| - 0.5 otherwise (5)
wherein L_m is the Huber loss function of tracker 2, trained using the consistency of pixel reconstruction.
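Under the assumption of a threshold of 1, this Huber loss corresponds to PyTorch's built-in smooth-L1 loss, as the brief sketch below shows.

import torch.nn.functional as F

def reconstruction_loss(reconstructed, original):
    """Huber loss L_m between the reconstructed frame and the real frame,
    equations (4) and (5); summing over pixels matches L_m = sum_i z_i."""
    return F.smooth_l1_loss(reconstructed, original, reduction="sum", beta=1.0)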
The purpose of the selection module is to select the most suitable target result according to the tracking results; in the selection module, the results of the two trackers need to be run simultaneously, so that the better of tracker T1 and tracker T2 can be selected. The specific contents are as follows:
(1) acquiring the overlap value IOU between the candidate box and the pseudo label: since unsupervised learning provides no manually annotated label, after the two trackers track forward to obtain the predicted target position, the predicted target position is used as a pseudo label and tracked backward across an interval of n frames; a new predicted position is obtained in the initial frame, and the overlap IOU is computed between the new predicted target frame and the annotation frame in the initial frame. The label P required for selector training is obtained from the IOU value, as in equation (6):
(Equation (6), giving the label P as a function of the IOU value, is presented as an image in the original publication.)
(2) The selection module consists of two fully connected layers and a softmax layer, wherein each fully connected layer comprises a linear fully connected layer and a ReLU activation function. The feature map obtained by each candidate tracker is input into the selection module, which yields a probability value for the precision estimate of each of the two target frames. The selector is trained using a cross-entropy loss function, given as equation (7):
L_s = -Σ [p ln a + (1 - p) ln(1 - a)] (7)
wherein a is the probability value of the target-box precision estimate produced by the selector from the features, and p is the label required for selector training.
(3) In the tracking stage, the trackers locate the target, the feature maps and localization results obtained by the trackers are used as the input of the selection module, and the selection module directly judges and outputs the result of the better tracker, as in the sketch below.
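A minimal sketch of this inference-time selection is given below; it assumes the TrackerSelector sketched earlier and generic tracker objects with a hypothetical track(frame) method returning a (box, feature_map) pair.

import torch

def select_best_result(selector, tracker1, tracker2, frame):
    """Run both trackers on the current frame, score each feature map with the
    selection module, and keep the box of the tracker with the higher probability."""
    box1, feat1 = tracker1.track(frame)
    box2, feat2 = tracker2.track(frame)
    with torch.no_grad():
        p1 = selector(feat1.unsqueeze(0)).item()   # probability that box1 is accurate
        p2 = selector(feat2.unsqueeze(0)).item()
    return box1 if p1 >= p2 else box2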
The above-mentioned embodiments merely illustrate preferred embodiments of the present invention and do not limit its scope; various modifications and improvements made to the technical solution of the present invention by those skilled in the art, without departing from the spirit of the present invention, shall fall within the protection scope defined by the claims of the present invention.

Claims (3)

1. A high-precision target tracking system based on unsupervised learning is characterized in that: the system comprises the following modules:
the image acquisition module is used for acquiring a video image;
the tracking module comprises a tracker 1 and a tracker 2 and is used for obtaining the characteristics of the image and the target rectangular frame;
the selection module comprises two fully connected layers and a softmax layer, wherein each fully connected layer comprises a linear fully connected layer and a ReLU activation function; the module takes the feature maps and results of the candidate trackers as input and outputs the best tracking result through the selector.
2. The high-precision target tracking system based on unsupervised learning of claim 1, characterized in that: tracker 1 and tracker 2 in the tracking module are two different trackers, suited to tracking in different scenes, and are trained with two different loss functions, specifically as follows:
L_C = ||Z_T - R_T||_2^2 (1)
L_m = Σ_i z_i (2)
z_i = 0.5 (Î_t^i - I_t^i)^2 if |Î_t^i - I_t^i| < 1, and z_i = |Î_t^i - I_t^i| - 0.5 otherwise (3)
wherein L_C is the loss function of tracker 1, R_T is the label obtained by cropping the template patch from the initial frame, which is a Gaussian response centered at the center of the initial bounding box, and Z_T is the response map generated from the second search frame during backward tracking, trained with cycle consistency; L_m is the Huber loss function of tracker 2, Î_t is the reconstructed frame and I_t is the real frame, trained with the consistency of pixel reconstruction.
3. The high-precision target tracking system based on unsupervised learning of claim 1, characterized in that the specific contents of the selection module are as follows:
the selection module aims at selecting the most appropriate target result according to the tracking results; in the selection module, the results of the two trackers need to be run simultaneously, so that the better of tracker 1 and tracker 2 can be selected;
(1) acquiring the overlap value IOU between the candidate box and the pseudo label: after the two trackers track forward to obtain a predicted target position, the predicted target position is used as a pseudo label and tracked backward across an interval of n frames; a new predicted position is obtained in the initial frame, and the overlap IOU is computed between the newly estimated target frame and the annotation frame in the initial frame; the label P required for selector training is obtained from the IOU value, as in equation (4):
(Equation (4), giving the label P as a function of the IOU value, is presented as an image in the original publication.)
(2) The selection module consists of two fully connected layers and a softmax layer, wherein each fully connected layer comprises a linear fully connected layer and a ReLU activation function; the feature map obtained by each candidate tracker is input into the selection module, which yields a probability value for the precision estimate of each of the two target frames; the selector is trained using a cross-entropy loss function, given as equation (5):
L_s = -Σ [p ln a + (1 - p) ln(1 - a)] (5)
wherein a is the probability value of the target-box precision estimate produced by the selector from the features, and p is the label required for selector training;
(3) in the tracking stage, the trackers locate the target, the feature maps and localization results obtained by the trackers are used as the input of the selection module, and the selection module directly judges and outputs the result of the better tracker.
CN202110523935.8A 2021-05-13 2021-05-13 High-precision target tracking system based on unsupervised learning Pending CN113256680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110523935.8A CN113256680A (en) 2021-05-13 2021-05-13 High-precision target tracking system based on unsupervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110523935.8A CN113256680A (en) 2021-05-13 2021-05-13 High-precision target tracking system based on unsupervised learning

Publications (1)

Publication Number Publication Date
CN113256680A true CN113256680A (en) 2021-08-13

Family

ID=77181801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110523935.8A Pending CN113256680A (en) 2021-05-13 2021-05-13 High-precision target tracking system based on unsupervised learning

Country Status (1)

Country Link
CN (1) CN113256680A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682696A (en) * 2016-12-29 2017-05-17 华中科技大学 Multi-example detection network based on refining of online example classifier and training method thereof
US20190228266A1 (en) * 2018-01-22 2019-07-25 Qualcomm Incorporated Failure detection for a neural network object tracker
US20190347806A1 (en) * 2018-05-09 2019-11-14 Figure Eight Technologies, Inc. Video object tracking
CN109978045A (en) * 2019-03-20 2019-07-05 深圳市道通智能航空技术有限公司 A kind of method for tracking target, device and unmanned plane
CN110569793A (en) * 2019-09-09 2019-12-13 西南交通大学 Target tracking method for unsupervised similarity discrimination learning
CN111161558A (en) * 2019-12-16 2020-05-15 华东师范大学 Method for judging forklift driving position in real time based on deep learning
CN111950367A (en) * 2020-07-08 2020-11-17 中国科学院大学 Unsupervised vehicle re-identification method for aerial images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BINENG ZHONG等: "Visual tracking via weakly supervised learning from multiple imperfect oracles", 《2010 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
NING WANG等: "Unsupervised deep tracking", 《2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
ZIHANG LAI等: "MAST: A Memory-Augmented Self-Supervised Tracker", 《2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117951648A (en) * 2024-03-26 2024-04-30 成都正扬博创电子技术有限公司 Airborne multisource information fusion method and system
CN117951648B (en) * 2024-03-26 2024-06-07 成都正扬博创电子技术有限公司 Airborne multisource information fusion method and system

Similar Documents

Publication Publication Date Title
Sun et al. Simultaneous detection and tracking with motion modelling for multiple object tracking
CN103336957B (en) A kind of network homology video detecting method based on space-time characteristic
Lu et al. Monet: Motion-based point cloud prediction network
CN111182364B (en) Short video copyright detection method and system
CN112446342A (en) Key frame recognition model training method, recognition method and device
CN111523463B (en) Target tracking method and training method based on matching-regression network
Oh et al. Space-time memory networks for video object segmentation with user guidance
Porav et al. Don’t worry about the weather: Unsupervised condition-dependent domain adaptation
GB2579262A (en) Space-time memory network for locating target object in video content
CN112801068A (en) Video multi-target tracking and segmenting system and method
CN111612825A (en) Image sequence motion occlusion detection method based on optical flow and multi-scale context
Wu et al. Cavit: Contextual alignment vision transformer for video object re-identification
Yang et al. SAM-Net: Semantic probabilistic and attention mechanisms of dynamic objects for self-supervised depth and camera pose estimation in visual odometry applications
CN113256680A (en) High-precision target tracking system based on unsupervised learning
Xi et al. Implicit motion-compensated network for unsupervised video object segmentation
Venator et al. Self-supervised learning of domain-invariant local features for robust visual localization under challenging conditions
Li et al. Collaborative convolution operators for real-time coarse-to-fine tracking
Wang et al. Monocular VO based on deep siamese convolutional neural network
CN115131362A (en) Large-scale point cloud local area feature coding method
CN114882067A (en) Encoder, encoder and decoder framework and multi-target tracking and partitioning method
CN113963021A (en) Single-target tracking method and system based on space-time characteristics and position changes
Li et al. Traffic4d: Single view reconstruction of repetitious activity using longitudinal self-supervision
Wang et al. Cross complementary fusion network for video salient object detection
Li et al. Traffic4d: Single view longitudinal 4d reconstruction of repetitious activity using self-supervised experts
CN118262275B (en) Weak supervision target tracking method based on co-saliency learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210813)