CN114299113A - Target tracking method and device based on twin network - Google Patents


Info

Publication number: CN114299113A
Application number: CN202111614814.0A
Authority: CN (China)
Prior art keywords: network, target, image, label, target tracking
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: Wei Zhenzhong (魏振忠), Cai Yannan (蔡雁南), Tan Ke (谈可)
Original and current assignee: Beihang University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Beihang University
Priority to CN202111614814.0A, published as CN114299113A

Abstract

The invention relates to a target tracking method and device based on a twin network. The method comprises: acquiring an image for target tracking; enhancing and cropping the image to obtain a template image and a search image; obtaining a classification score map and a regression score map from the template image and the search image using a target tracking network, where the target tracking network comprises a feature extraction network, a feature fusion network and a classification-regression network connected in sequence, and assigns sample labels during the training stage using a dynamic sample label assignment method; and determining a target prediction box using the classification score map and the regression score map. The invention improves the accuracy of target localization while preserving the speed of the tracker.

Description

Target tracking method and device based on twin network
Technical Field
The invention relates to the field of target tracking, and in particular to a target tracking method and device based on a twin network (i.e., a Siamese network).
Background
Target tracking refers to automatically estimating the motion trajectory of an arbitrary target in a continuously changing video sequence. It provides a basis for further semantic analysis (e.g., pose estimation and scene recognition), has important research significance in civil security and military defense, and is widely applied in fields such as autonomous driving, security monitoring, visual navigation and human-computer interaction. The main difficulty of current target tracking is tracking drift caused by the target's unknown motion, appearance deformation and environmental changes.
In recent years, twin network-based target trackers have achieved great success due to their favorable balance between performance and speed. Such a tracker treats tracking as a template matching problem and decomposes the tracking task into classification and regression subtasks. To obtain an accurate target box, whether a multi-scale search strategy, preset anchor boxes, or an anchor-free design is adopted, an accurate target position in the response map must first be provided; the performance of the classification branch is therefore the foundation of the algorithm's performance. Reasonably defining positive and negative samples can improve the accuracy of the classification branch and close the performance gap of anchor-based tracking algorithms. Existing label assignment methods can generally be divided into fixed label assignment and dynamic label assignment, where fixed label assignment can be further divided into IoU-threshold-based and location-distribution-based methods. However, the IoU-threshold-based method is highly sensitive to the choice of threshold, while the location-distribution-based method mainly measures the distance between a sample and the target center point; implementations differ slightly, but both adapt poorly to different kinds of sample distributions. Compared with fixed sample label assignment, dynamic sample label assignment has shown great potential in object detection algorithms and has become a current research hotspot, but it has not yet been exploited in the field of target tracking.
Disclosure of Invention
The invention aims to provide a target tracking method and device based on a twin network that improve the accuracy of target localization while preserving the speed of the tracker.
In order to achieve the purpose, the invention provides the following scheme:
a twin network-based target tracking method comprises the following steps:
acquiring an image tracked by a target;
enhancing and cutting the image to obtain a template graph and a search graph;
obtaining a classification score map and a regression score map by using a target tracking network according to the template map and the search map; the target tracking network comprises a feature extraction network, a feature fusion network and a classification regression network which are connected in sequence; the target tracking network utilizes a dynamic sample label distribution method to distribute sample labels in a training stage;
and determining a target prediction frame by using the classification score map and the regression score map.
Optionally, the training process of the target tracking network specifically includes:
performing label assignment on the search and template images of the training set using a dynamic sample label assignment method to obtain labeled positive samples and labeled negative samples;
performing feature extraction on the search and template images of the training set using the feature extraction network to obtain feature maps;
performing feature fusion on the feature maps using the feature fusion network to obtain a fused feature map;
inputting the fused feature map into the classification-regression network to obtain a training-set classification score map and a training regression score map;
determining a network loss function from the training-set classification score map, the training regression score map, and the labeled positive and negative samples;
and training the target tracking network by stochastic gradient descent according to the network loss function to obtain the trained target tracking network.
Optionally, performing label assignment on the search and template images of the training set using a dynamic sample label assignment method to obtain labeled positive samples and labeled negative samples specifically includes:
setting anchor points centered on the search image and the template image respectively, and tiling a plurality of anchor boxes on each anchor point;
computing the Euclidean distances between all anchor points and the center point of the real target box, and selecting a set number of anchor points according to these distances;
and determining labeled positive and negative samples according to the intersection-over-union (IoU) between the anchor boxes of the selected anchor points and the real target box.
Optionally, determining labeled positive and negative samples according to the IoU between the selected anchor points' boxes and the real target box specifically includes:
computing the mean and standard deviation of all IoU values;
summing the mean and standard deviation to determine a threshold parameter;
judging whether the IoU of each candidate anchor box (the anchor boxes corresponding to the selected anchor points) is greater than or equal to the threshold parameter;
if so, the candidate anchor box is a positive sample, with label 1;
if not, performing random assignment in a set ratio to obtain negative samples and ignored samples, where the label of a negative sample is 0 and the label of an ignored sample is -1.
Optionally, the expression of the network loss function is:

$$L_{cls} = -\frac{1}{w\,h\,m}\sum_{i=1}^{w}\sum_{j=1}^{h}\sum_{k=1}^{m}\Big[y_{ijk}\log\hat{y}_{ijk} + (1-y_{ijk})\log(1-\hat{y}_{ijk})\Big]$$

where $L_{cls}$ is the network loss function, w is the length of the classification score map, h is its width, m is its height (the number of anchor boxes per point), y is the sample label, and $\hat{y}$ is the classification score map, where (i, j, k) are respectively the abscissa, ordinate and anchor-box index of a sample on the classification score map.
Optionally, enhancing and cropping the image to obtain a template image and a search image specifically includes:
cropping and padding the image to obtain a resampled image;
performing data enhancement on the resampled image to obtain an enhanced image;
and cropping the enhanced image to obtain the template image and the search image.
Optionally, determining a target prediction box using the classification score map and the regression score map specifically includes:
obtaining the classification scores of the anchor boxes from the classification score map;
determining a predicted target box according to the classification scores, the predicted target box being the anchor box with the highest classification score;
applying scale and aspect-ratio penalties according to the offsets of the regression score map to obtain a predicted regression box;
and linearly combining the predicted regression box and the predicted target box to obtain the target prediction box.
Optionally, the expression of the classification score is:
s = (1 - λ) * penalty * s1 + λ * ω
where s is the classification score, s1 is the classification score map, penalty is the scale and aspect-ratio penalty term, ω is a Hanning window, and λ is a weighting coefficient.
A twin network-based target tracking device applying any of the above twin network-based target tracking methods, the device comprising: an image acquisition module, a tracking calculation module, an image processing module and a servo control module;
the image acquisition module is used to acquire an image for target tracking; the image processing module is used to train and optimize the target tracking network using the image; the tracking calculation module is used to determine a target prediction box from the image, determine the miss distance from the target prediction box, and transmit the miss distance to the servo control module; and the servo control module is used to control the image acquisition module to track the target according to the miss distance.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the target tracking method and device based on the twin network, provided by the invention, are used for acquiring an image of target tracking; enhancing and cutting the image to obtain a template graph and a search graph; obtaining a classification score map and a regression score map by using a target tracking network according to the template map and the search map; the target tracking network comprises a feature extraction network, a feature fusion network and a classification regression network which are connected in sequence; the target tracking network utilizes a dynamic sample label distribution method to distribute sample labels in a training stage; and determining a target prediction frame by using the classification score map and the regression score map. The dynamic sample label distribution method only occurs in the training stage and does not perform the image testing stage, so that the accuracy of target positioning is improved under the condition of ensuring the speed of the tracker.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a twin network-based target tracking method provided by the present invention;
FIG. 2 is a schematic diagram of a target tracking network structure provided by the present invention;
FIG. 3 is a flow chart of the training-data preparation process provided by the present invention;
FIG. 4 is a schematic diagram of sample label assignment provided by the present invention;
FIG. 5 is a flow chart of the online tracking provided by the present invention;
FIG. 6 is a schematic diagram of a twin network based target tracking device provided by the present invention;
FIG. 7 is a schematic diagram of experimental results of the twin network-based target tracking method on the OTB2015 dataset;
FIG. 8 is an effect diagram of the twin network-based target tracking method.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1, the target tracking method based on the twin network provided by the present invention includes:
step 101: an image of the target tracking is acquired.
Step 102: and enhancing and cutting the image to obtain a template picture and a search picture. The enhancing and cutting the image to obtain a template graph and a search graph specifically comprises the following steps: and cutting and filling the image to obtain a resampled image. And performing data enhancement on the resampled image to obtain an enhanced image. And cutting the enhanced image to obtain a template image and a search image.
Step 103: a classification score map and a regression score map are obtained from the template image and the search image using the target tracking network. As shown in fig. 2, the target tracking network includes a feature extraction network, a feature fusion network, and a classification-regression network connected in sequence; in the training stage, the target tracking network assigns sample labels using a dynamic sample label assignment method. The feature extraction network 101 is a twin network, i.e., the two branches share weights. The backbone adopts a modified ResNet-50, so that the feature maps output by the last three stages have the same spatial resolution. After multi-layer fusion and center cropping of the template feature map, a template branch of size 7 × 7 × 256 and a search branch of size 31 × 31 × 256 are output.
The feature fusion network 102 further adjusts the spatial resolutions of the template and search feature maps to 5 × 5 × 256 and 29 × 29 × 256 through two 3 × 3 convolutional layers. A depth-wise cross-correlation is then applied to the two branches, producing a 25 × 25 × 256 feature map.
The target classification-regression network 103 obtains a 25 × 25 × 10 classification score map and a 25 × 25 × 20 regression score map through two 1 × 1 convolutions, used for target localization and target box estimation respectively.
The network model is implemented in the Python language on the PyTorch framework.
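The depth-wise cross-correlation between the template and search branches can be sketched as follows. This is an illustrative numpy sketch, not the patent's implementation (a PyTorch model would typically use a grouped convolution instead); the channel count is reduced from 256 for brevity.

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Depth-wise cross-correlation: each channel of the template slides
    over the matching channel of the search feature map (no padding), so
    a (C, 5, 5) template on a (C, 29, 29) search map yields (C, 25, 25)."""
    c, hs, ws = search.shape
    _, ht, wt = template.shape
    ho, wo = hs - ht + 1, ws - wt + 1
    out = np.empty((c, ho, wo))
    for ch in range(c):
        for i in range(ho):
            for j in range(wo):
                out[ch, i, j] = np.sum(
                    search[ch, i:i + ht, j:j + wt] * template[ch])
    return out

# Spatial sizes from the description (channels reduced from 256 to 8 for speed)
template = np.random.randn(8, 5, 5)
search = np.random.randn(8, 29, 29)
response = depthwise_xcorr(search, template)  # shape (8, 25, 25)
```

Because the correlation is computed channel by channel, the fused map keeps one response plane per feature channel, which the 1 × 1 convolutions of network 103 then map to classification and regression channels.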
Step 104: a target prediction box is determined using the classification score map and the regression score map. Specifically: the classification scores of the anchor boxes are obtained from the classification score map; a predicted target box is determined according to the classification scores, the predicted target box being the anchor box with the highest classification score; scale and aspect-ratio penalties are applied according to the offsets of the regression score map to obtain a predicted regression box; and the predicted regression box and the predicted target box are linearly combined to obtain the target prediction box. The expression of the classification score is:
s = (1 - λ) * penalty * s1 + λ * ω
where s is the classification score, s1 is the classification score map, penalty is the scale and aspect-ratio penalty term, ω is a Hanning window, and λ is a weighting coefficient.
In practical application, the training process of the target tracking network specifically includes:
and performing label distribution on the search graph and the template graph of the training set by using a dynamic sample label distribution method to obtain a positive sample with labels and a negative sample with labels. The method for distributing labels to the search graph and the template graph of the training set by using a dynamic sample label distribution method to obtain a positive sample with labels and a negative sample with labels specifically comprises the following steps: respectively setting an anchor point by taking the search drawing and the template drawing as centers and tiling a plurality of anchor frames on the anchor point; calculating Euclidean distances between all anchor points and the center point of a real target frame, and selecting a set number of anchor points according to the Euclidean distances; and determining a positive sample with a label and a negative sample with a label according to the intersection ratio between the set number of anchor points and the real target frame. The determining a positive sample with a label and a negative sample with a label according to the intersection ratio between the anchor points with the set number and the real target frame specifically comprises: calculating the mean value and standard deviation of all intersection ratios; summing according to the mean value and the standard deviation to determine a set parameter; judging whether the intersection ratio of the set anchor frames is greater than or equal to the set parameter or not, wherein the set anchor frames are the anchor frames corresponding to the anchor points with the set number; if yes, determining that the set anchor frame is a positive sample, and determining that the label of the positive sample is 1; if not, performing random distribution according to a set proportion to obtain a negative sample and a neglected sample; the label of the negative sample is 0; the label of the ignore sample is-1.
Feature extraction is performed on the search and template images of the training set using the feature extraction network, obtaining feature maps.
Feature fusion is performed on the feature maps using the feature fusion network, obtaining a fused feature map.
The fused feature map is input into the classification-regression network to obtain a training-set classification score map and a training regression score map.
A network loss function is determined from the training-set classification score map, the training regression score map, and the labeled positive and negative samples.
The target tracking network is then trained by stochastic gradient descent according to the network loss function, yielding the trained target tracking network. The expression of the network loss function is:

$$L_{cls} = -\frac{1}{w\,h\,m}\sum_{i=1}^{w}\sum_{j=1}^{h}\sum_{k=1}^{m}\Big[y_{ijk}\log\hat{y}_{ijk} + (1-y_{ijk})\log(1-\hat{y}_{ijk})\Big]$$

where $L_{cls}$ is the network loss function, w is the length of the classification score map, h is its width, m is its height (the number of anchor boxes per point), y is the sample label, and $\hat{y}$ is the classification score map, where (i, j, k) are respectively the abscissa, ordinate and anchor-box index of a sample on the classification score map.
Training data preparation:
(1) As shown in fig. 3, two images N frames apart in a video sequence are selected from a labeled public target tracking training dataset. The real target box in each image can be represented as (cx, cy, w, h), where (cx, cy) is the coordinate of the center point of the rectangular box and w and h are its width and height.
(2) Centered on (cx, cy), the image is cropped with a side length computed from the target width and height; if the crop exceeds the boundary of the original image, the border is padded with the RGB mean. The result is finally resampled to 511 × 511, so that the center point of the real target box is also the center point of the image.
(3) To increase the diversity of training samples and avoid a center-position bias, a series of data enhancement operations such as translation, flipping and scale variation are applied to the images; this prevents overfitting and makes the network model more robust.
(4) The two data-enhanced images are cropped, centered at (255, 255), into a template image with resolution 127 × 127 and a search image with resolution 255 × 255. The preprocessed template-search image pair can be fed directly to the network for training.
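The final cropping step can be sketched as a simple center crop of the 511 × 511 resampled image (a minimal illustration; function names are ours, not the patent's):

```python
import numpy as np

def center_crop(img, size):
    """Crop a size x size square centred on the image centre, as used to
    cut the 511 x 511 target-centred image into the 127 x 127 template
    and the 255 x 255 search image."""
    h, w = img.shape[:2]
    top = h // 2 - size // 2
    left = w // 2 - size // 2
    return img[top:top + size, left:left + size]

full = np.zeros((511, 511, 3), dtype=np.uint8)  # resampled, target-centred image
template_img = center_crop(full, 127)
search_img = center_crop(full, 255)
```

Because the resampled image is target-centred, both crops keep the real target box at their center, which is what the label assignment in the next section assumes.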
Dynamic sample label assignment
(1) As shown in fig. 4, in order to estimate the size of the real target, anchor points are set on the preprocessed 255 × 255 × 3 search image, centered at (127, 127): one anchor point every 8 pixels (stride 8), giving 25 × 25 anchor points in total. Several anchor boxes with different aspect ratios are tiled on each anchor point; the aspect ratios are generally set to {1:3, 1:2, 1:1, 2:1, 3:1}, so the total number of anchor boxes is 25 × 25 × 5. The distribution of the anchor boxes and anchor points is shown in fig. 4.
(2) Each anchor box is called a sample, and each sample is then labeled according to the dynamic sample label assignment rules. The dynamic classification label assignment rules are as follows.
(3) The Euclidean distances between all anchor points and the center point of the real target box are computed, and the K nearest anchor points are selected.
(4) Five anchor boxes are tiled on each anchor point. IoU (intersection over union) is computed between the selected 5K anchor boxes and the real target box, together with the mean M_g and standard deviation V_g of all IoU values. The IoU threshold measuring the statistical properties of the target is the sum of the mean and standard deviation: T_g = M_g + V_g.
(5) Among the selected 5K anchor boxes, any box whose IoU is greater than or equal to T_g is defined as a positive sample, with label 1.
(6) All anchor boxes other than the positive samples are randomly assigned as negative and ignored samples in a ratio of 1:4, labeled 0 and -1 respectively.
(7) The regression label assignment rule is consistent with the region proposal network, i.e., the spatial transformation between a positive sample and the real target box.
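The steps above can be sketched in a few lines of numpy. This is a simplified illustration: it thresholds the candidate anchor boxes directly rather than first selecting K anchor points and then their 5K tiled boxes, and the function and parameter names are ours, not the patent's; neg_ratio = 0.2 mirrors the 1:4 negative/ignore split of step (6).

```python
import numpy as np

def iou_xywh(a, b):
    """IoU between two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2]/2, a[1] - a[3]/2, a[0] + a[2]/2, a[1] + a[3]/2
    bx1, by1, bx2, by2 = b[0] - b[2]/2, b[1] - b[3]/2, b[0] + b[2]/2, b[1] + b[3]/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def assign_labels(anchors, gt, k=9, neg_ratio=0.2, rng=None):
    """Dynamic label assignment: candidates are the k anchors whose centres
    are nearest (Euclidean) to the target centre; those with
    IoU >= mean + std become positives (label 1). The remaining anchors
    are randomly split into negatives (0) and ignored samples (-1)."""
    if rng is None:
        rng = np.random.default_rng(0)
    labels = np.full(len(anchors), -1, dtype=int)
    d = np.hypot(anchors[:, 0] - gt[0], anchors[:, 1] - gt[1])
    cand = np.argsort(d)[:k]                       # k nearest anchor centres
    ious = np.array([iou_xywh(a, gt) for a in anchors[cand]])
    thresh = ious.mean() + ious.std()              # dynamic IoU threshold T_g
    pos = cand[ious >= thresh]
    labels[pos] = 1
    rest = np.setdiff1d(np.arange(len(anchors)), pos)
    neg = rng.choice(rest, size=max(1, int(len(rest) * neg_ratio)), replace=False)
    labels[neg] = 0
    return labels

anchors = np.array([[127, 127, 64, 64],   # matches the target exactly
                    [127, 135, 64, 64],
                    [200, 200, 64, 64],
                    [50, 50, 64, 64],
                    [127, 127, 32, 96]], dtype=float)
labels = assign_labels(anchors, (127, 127, 64, 64), k=5)
```

Because the threshold adapts to the IoU statistics of each target, well-shaped targets yield a strict threshold and oddly-shaped ones a looser one, which is the adaptivity the fixed IoU-threshold methods in the background section lack.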
Network parameter optimization
The network loss is a weighted sum of the classification loss and the regression loss. The classification loss is the binary cross-entropy between the classification score map output by the network and the classification labels:

$$L_{cls} = -\frac{1}{w\,h\,m}\sum_{i=1}^{w}\sum_{j=1}^{h}\sum_{k=1}^{m}\Big[y_{ijk}\log\hat{y}_{ijk} + (1-y_{ijk})\log(1-\hat{y}_{ijk})\Big]$$

where $\hat{y}$ is the classification score map output by the network; w, h and m are respectively the length and width of the classification score map and the number of anchor boxes per point; y denotes the sample labels; and (i, j, k) are the abscissa, ordinate and anchor-box index of a sample on the classification score map. Positive samples are labeled 1, negative samples 0, and ignored samples (which do not participate in the computation) -1. The regression loss is the smooth L1 loss between the regression score map output by the network and the regression labels.
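A minimal numpy sketch of the classification loss, assuming the scores are already probabilities; ignored samples (label -1) are masked out so they do not participate in the computation:

```python
import numpy as np

def classification_loss(scores, labels):
    """Binary cross-entropy between predicted scores and labels,
    averaged over the non-ignored samples (label -1 is excluded)."""
    mask = labels != -1
    p = np.clip(scores[mask], 1e-7, 1 - 1e-7)  # guard against log(0)
    y = labels[mask].astype(float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# toy flat example: one positive, one negative, one ignored sample
scores = np.array([0.9, 0.1, 0.5])
labels = np.array([1, 0, -1])
loss = classification_loss(scores, labels)
```

In practice the score and label arrays would have the w × h × m shape of the classification score map; flattening does not change the averaged loss.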
The network model is updated by stochastic gradient descent with a learning-rate warm-up strategy. End-to-end offline training is performed on public datasets such as VOT, LaSOT and GOT10K; each epoch trains on 800,000 template-search image pairs, and 15-20 epochs are trained in total. This completes the training phase of the network model.
On-line tracking
As shown in fig. 5:
(1) Loading model parameters
The network model parameters trained during network parameter optimization are loaded, the processed images are fed into the network model, and its parameters are kept fixed. The feature map produced by the template image through the feature extraction network 101 does not need to be updated during tracking.
(2) Tracking image pre-processing
Given the prior information, i.e., the first frame of the video sequence to be tracked and its real target box, an image of a certain size is cropped from the first frame centered on the real target box and, after resampling, processed into a 127 × 127 template image. From the second frame onward, each frame is cropped and resampled, centered on the predicted target box of the previous frame, into a 255 × 255 search image.
(3) Predicting the tracking box
Classification score maps of size 25 × 25 × 10 and regression score maps of size 25 × 25 × 20 are output by the target classification-regression network 103. The classification score map contains the prediction score of every anchor box. In target tracking it is usually assumed that the size and position of the target change little between adjacent frames, so two penalty terms on scale and aspect ratio are added to suppress abrupt size changes, and a Hanning window is used to suppress large displacements. After adding these spatio-temporal constraints, the final classification score of each anchor box is:
s = (1 - λ) * penalty * s1 + λ * ω
where s1 is the classification score output by network 103, penalty is the scale and aspect-ratio penalty term, ω is the Hanning window, and λ is a weighting coefficient. The anchor box with the highest final score is the predicted anchor box. Finally, to smooth the tracking result, the predicted regression box is linearly combined with the previous frame's target prediction box to obtain the final target prediction box, and the target state is updated.
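The windowed score can be sketched as follows. This is an illustrative implementation; the value λ = 0.4 and the scalar penalty are assumptions for the demonstration, not values from the patent.

```python
import numpy as np

def final_score(cls_score, penalty, lam=0.4):
    """Final per-anchor score s = (1 - lam) * penalty * s1 + lam * omega,
    where omega is a Hanning (cosine) window over the response map that
    suppresses large displacements from the previous target position."""
    h, w = cls_score.shape
    omega = np.outer(np.hanning(h), np.hanning(w))
    return (1 - lam) * penalty * cls_score + lam * omega

# with a flat score map and no penalty, the window biases the peak
# towards the centre of the 25 x 25 response map
s = final_score(np.full((25, 25), 0.5), penalty=1.0)
peak = np.unravel_index(np.argmax(s), s.shape)
```

The argmax of s over the map (and over the m anchor shapes per point, omitted here) gives the predicted anchor box described above.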
As shown in fig. 6, the twin network-based target tracking device according to the present invention applies the twin network-based target tracking method, and includes: the device comprises an image acquisition module, a tracking calculation module, an image processing module and a servo control module.
The image acquisition module is used for acquiring an image tracked by a target; the image processing module is used for training and optimizing a target tracking network according to the image; the tracking calculation module is used for determining a target prediction frame according to the image, determining the miss distance according to the target prediction frame and transmitting the miss distance to the servo control module; the servo control module is used for controlling the image acquisition module to track the target according to the miss distance.
The main hardware of the image acquisition module is an industrial camera and a visible-light zoom lens. The module captures images of the field of view in real time, stores historical images, and transmits the acquired images to the image processing module and the tracking calculation module. The image processing module can train and fine-tune the model parameters of the tracking algorithm using stored public datasets and the stored historical images. After the initial target box is given manually or by a detection algorithm, the tracking calculation module determines the position of the target box in subsequent frames using the tracking algorithm, computes the deviation between the center of the target box and the center point of the field of view, i.e., the miss distance, and transmits it to the servo control module. The servo control module controls the image acquisition module to follow the target in azimuth and pitch according to the miss distance, keeping the target in the central area of the image and achieving continuous tracking. The focal length of the lens is adjusted according to the proportion of the target box in the image, so that the size of the target in the captured image remains constant.
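The miss distance computed by the tracking calculation module reduces to a simple center offset (a sketch; the function and parameter names are illustrative, not the patent's):

```python
def miss_distance(pred_box, frame_w, frame_h):
    """Pixel offset between the predicted target-box centre and the
    field-of-view centre, which the servo controller drives to zero.
    pred_box is (cx, cy, w, h)."""
    cx, cy, _, _ = pred_box
    return cx - frame_w / 2.0, cy - frame_h / 2.0

# a target centred in a 640 x 480 field of view yields zero miss distance
dx, dy = miss_distance((320.0, 240.0, 50.0, 80.0), 640, 480)
```

A nonzero (dx, dy) is then mapped by the servo control module to azimuth and pitch corrections.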
The invention provides a method for automatically selecting positive and negative samples according to the statistical characteristics of the target. Since only the training process of the model is changed and the testing process is unaffected, the tracking speed of the original algorithm is preserved, the real-time requirement for engineering deployment is met, and performance is improved without extra overhead. The method consists of two parts: network structure design and dynamic label assignment. In the network structure design, a preprocessed pair of fixed-size template and search images is fed into a weight-sharing ResNet-50 backbone for feature extraction; the template and search branches output by the twin network are fused by depth-wise cross-correlation; and a region proposal network finally outputs a classification score map and a regression score map. In the dynamic label assignment, a fixed number of anchor boxes are tiled on a search image whose real regression box is known; the anchor boxes whose center points are nearest (in Euclidean distance) to the center of the real regression box are selected to participate in the computation; the IoU values between the selected anchor boxes and the real regression box are statistically processed to obtain a dynamic threshold; anchor boxes on the search image whose IoU with the real regression box exceeds the threshold are marked as positive samples; and the remaining anchor boxes are randomly assigned as negative or ignored samples in a certain ratio. The labels of positive, negative, and ignored samples are set to 1, 0 and -1 respectively.
The target tracker is trained offline, end to end, on large-scale image data. The total loss of the model is the sum of the cross-entropy loss between the classification score map output by the network and the positive/negative sample labels, and the smooth L1 loss between the regression score map and the ground-truth regression targets of the positive samples. The network model parameters are optimized with stochastic gradient descent. Dynamic label assignment occurs only in the training stage and does not affect the testing stage, so the accuracy of the tracker is improved without reducing its speed.
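A minimal sketch of this two-term training objective, assuming per-anchor foreground probabilities and 4-dimensional box offsets; the exact weighting and parameterization in the patent may differ:

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1 (Huber with transition at |x| = 1)."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)


def total_loss(cls_scores, labels, reg_pred, reg_target):
    """Cross-entropy over labeled (non-ignored) anchors plus smooth L1
    over positive anchors, mirroring the training objective.

    cls_scores: (N,) predicted foreground probabilities in [0, 1].
    labels:     (N,) in {1, 0, -1} for positive / negative / ignored.
    reg_pred, reg_target: (N, 4) box offsets.
    """
    eps = 1e-12
    used = labels >= 0                              # ignored samples contribute nothing
    y, p = labels[used], cls_scores[used]
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    pos = labels == 1
    reg = smooth_l1(reg_pred[pos] - reg_target[pos]).mean() if pos.any() else 0.0
    return ce + reg
```

In training, this scalar would be minimized by stochastic gradient descent over the network parameters, as the description states.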
OTB2015 is a benchmark for evaluating the performance of target tracking algorithms, proposed by Wu et al. at CVPR 2013. It contains 100 video sequences with manually annotated target positions, covers 11 common tracking challenges such as illumination change, scale change, and occlusion, and is widely used in the target tracking field. OTB evaluates tracking performance with an IoU-based success plot and a center-location-error-based precision plot. The IoU between the tracker's predicted target frame and the ground-truth target frame is computed on every frame; when the IoU exceeds a given threshold, the tracker is considered to have successfully predicted the target in that frame, and the success plot shows the fraction of successful frames as the threshold varies. The precision plot shows the fraction of frames in which the pixel deviation between the centers of the predicted and ground-truth target frames is below a given threshold. The AUC, the area under the success plot curve, is an important measure of algorithm accuracy.
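The success plot, its AUC, and the precision metric can be sketched as follows; the threshold grid and the 20-pixel default follow common OTB practice and are assumptions here, not values from this document:

```python
import numpy as np

def success_auc(ious, thresholds=np.linspace(0, 1, 21)):
    """Success plot: fraction of frames whose IoU exceeds each threshold;
    AUC is the mean of those fractions over the threshold grid."""
    ious = np.asarray(ious)
    success = np.array([(ious > t).mean() for t in thresholds])
    return success, success.mean()


def precision_at(center_errors, threshold=20):
    """Precision at a pixel threshold (OTB conventionally reports 20 px)."""
    errs = np.asarray(center_errors)
    return (errs < threshold).mean()
```

Given per-frame IoUs and center errors from a tracking run, these two functions reproduce the numbers that the success and precision plots summarize.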
The performance of the algorithm under one-pass evaluation (OPE) is shown in fig. 7, where fig. 7(a) is the success plot and fig. 7(b) is the precision plot. Compared with current mainstream target trackers (SiamRCNN, SiamCAR, DaSiamRPN, ECO_HC, DiMP, ATOM), the AUC of the invention is 70.6% and its precision is 92%, indicating that the accuracy of the invention is higher than that of the other algorithms. The tracking results of the invention are shown in fig. 8, where fig. 8(a)-(e) show the tracking results on the Bird1, Human2, Liquor, Twinnings, and Trans video sequences of the OTB100 dataset, respectively; the number in the upper left corner of each picture is the frame index of the video sequence. The speed of the invention reaches 70 fps on an NVIDIA RTX 2080 Ti GPU, meeting the real-time requirement.
The high performance of the algorithm comes mainly from two aspects: 1) in the training stage, the adaptive positive/negative sample selection method based on the statistical characteristics of the target reduces the influence of hand-set hyper-parameters in label assignment on the classification branch, improving the accuracy of the classification branch's target localization; 2) in the testing stage, the offline-trained network model is used and no model parameters are updated during tracking, so the speed of the tracker is not affected.
The article Li, B., et al., High Performance Visual Tracking with Siamese Region Proposal Network, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018. However, because it adopts only an unpadded AlexNet backbone, the network is too shallow, which limits the performance of the algorithm to a certain extent.
The article Li, B., et al., SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, applies deeper modern networks to target tracking by improving the sampling strategy. It upgrades the cross-correlation between the template and search branches to lightweight depth-wise cross-correlation, which reduces the number of parameters, stabilizes training, and further improves and accelerates the algorithm. However, it still adopts a fixed, IoU-based label assignment method, and the associated hyper-parameters have a significant impact on tracker performance.
The article Guo, D., et al., SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, proposes an anchor-free tracker to eliminate the many hyper-parameters introduced by preset anchor frames in the region proposal network and to reduce the interference of artificial priors; it decomposes the target tracking task into pixel-wise prediction and bounding-box regression, a framework that is simple and effective. However, it still uses a fixed, distance-based label assignment method, which affects the performance of the tracker.
The article Zhang, S., et al., Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, analyzes theoretically and experimentally the essential difference between the performance of anchor-based and anchor-free target detectors, and proposes an adaptive positive/negative sample selection method based on statistical characteristics. However, due to the differences between target tracking and target detection, the method must be adapted and redesigned before it can meet the requirements of a target tracking algorithm.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and identical or similar parts can be understood by cross-reference. Since the system disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant details, refer to the description of the method.
The principles and embodiments of the present invention have been described here using specific examples, which are provided only to help understand the method and the core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (9)

1. A target tracking method based on a twin network is characterized by comprising the following steps:
acquiring an image for target tracking;
enhancing and cutting the image to obtain a template graph and a search graph;
obtaining a classification score map and a regression score map by using a target tracking network according to the template map and the search map; the target tracking network comprises a feature extraction network, a feature fusion network and a classification regression network which are connected in sequence; the target tracking network utilizes a dynamic sample label distribution method to distribute sample labels in a training stage;
and determining a target prediction frame by using the classification score map and the regression score map.
2. The twin network-based target tracking method according to claim 1, wherein the training process of the target tracking network specifically includes:
performing label distribution on a search graph and a template graph of a training set by using a dynamic sample label distribution method to obtain a positive sample with a label and a negative sample with the label;
carrying out feature extraction on the search graph and the template graph of the training set by using the feature extraction network to obtain a feature graph;
performing feature fusion on the feature map by using the feature fusion network to obtain a feature fusion map;
inputting the feature fusion graph into the classification regression network to obtain a training set classification score graph and a training regression score graph;
determining a network loss function according to the training set classification score map, the training regression score map, the positive sample with the label and the negative sample with the label;
and training the target tracking network by using a random gradient descent method according to the network loss function to obtain the trained target tracking network.
3. The twin network-based target tracking method according to claim 2, wherein the label assignment is performed on the search graph and the template graph of the training set by using a dynamic sample label assignment method to obtain a positive sample with a label and a negative sample with a label, and specifically comprises:
setting anchor points respectively with the search graph and the template graph as centers, and tiling a plurality of anchor frames on each anchor point;
calculating Euclidean distances between all anchor points and the center point of a real target frame, and selecting a set number of anchor points according to the Euclidean distances;
and determining a positive sample with a label and a negative sample with a label according to the intersection ratio between the set number of anchor points and the real target frame.
4. The twin network-based target tracking method according to claim 3, wherein the determining the labeled positive sample and the labeled negative sample according to the intersection ratio between the set number of anchor points and the real target frame specifically comprises:
calculating the mean value and standard deviation of all intersection ratios;
summing according to the mean value and the standard deviation to determine a set parameter;
judging whether the intersection ratio of the set anchor frames is greater than or equal to the set parameter or not, wherein the set anchor frames are the anchor frames corresponding to the anchor points with the set number;
if yes, determining that the set anchor frame is a positive sample, and determining that the label of the positive sample is 1;
if not, performing random distribution according to a set proportion to obtain a negative sample and a neglected sample; the label of the negative sample is 0; the label of the ignore sample is-1.
5. The twin network based target tracking method of claim 2, wherein the expression of the network loss function is:
L = -1/(w*h*m) * Σ_{i=1}^{w} Σ_{j=1}^{h} Σ_{k=1}^{m} [y*log(p_{(i,j,k)}) + (1-y)*log(1-p_{(i,j,k)})]
wherein L is the network loss function, w is the length of the classification score map, h is the width of the classification score map, m is the height of the classification score map, y is the label of the sample, and p_{(i,j,k)} is the value of the classification score map at (i, j, k), where (i, j, k) are respectively the abscissa, ordinate and anchor frame index of the sample on the classification score map.
6. The twin network-based target tracking method according to claim 1, wherein the enhancing and cropping the image to obtain a template map and a search map specifically comprises:
cutting and filling the image to obtain a resampled image;
performing data enhancement on the resampled image to obtain an enhanced image;
and cutting the enhanced image to obtain a template image and a search image.
7. The twin network-based target tracking method according to claim 1, wherein the determining a target prediction box by using the classification score map and the regression score map specifically comprises:
obtaining classification scores of a plurality of anchor frames according to the classification score map;
determining a prediction target frame according to the classification score; the predicted target frame is an anchor frame with the highest classification score;
punishment is carried out on the scale and the aspect ratio according to the deviation amount of the regression score map to obtain a prediction regression frame;
and performing linear processing on the prediction regression frame and the prediction target frame to obtain a target prediction frame.
8. The twin network based target tracking method of claim 7, wherein the classification score is expressed by:
s = (1-λ)*penalty*s1 + λ*ω
wherein s is the classification score, s1 is the classification score map, penalty is the scale and aspect ratio penalty term, ω is the Hanning window, and λ is a weight coefficient.
9. A twin network based target tracking device, characterized in that the twin network based target tracking device applies the twin network based target tracking method of any one of claims 1 to 8, the twin network based target tracking device comprising: the device comprises an image acquisition module, a tracking calculation module, an image processing module and a servo control module;
the image acquisition module is used for acquiring an image tracked by a target; the image processing module is used for training and optimizing a target tracking network according to the image; the tracking calculation module is used for determining a target prediction frame according to the image, determining the miss distance according to the target prediction frame and transmitting the miss distance to the servo control module; the servo control module is used for controlling the image acquisition module to track the target according to the miss distance.
CN202111614814.0A 2021-12-27 2021-12-27 Target tracking method and device based on twin network Pending CN114299113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111614814.0A CN114299113A (en) 2021-12-27 2021-12-27 Target tracking method and device based on twin network


Publications (1)

Publication Number Publication Date
CN114299113A true CN114299113A (en) 2022-04-08

Family

ID=80969788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111614814.0A Pending CN114299113A (en) 2021-12-27 2021-12-27 Target tracking method and device based on twin network

Country Status (1)

Country Link
CN (1) CN114299113A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200884A (en) * 2020-09-08 2021-01-08 浙江大华技术股份有限公司 Method and device for generating lane line
CN113158862A (en) * 2021-04-13 2021-07-23 哈尔滨工业大学(深圳) Lightweight real-time face detection method based on multiple tasks
CN113255611A (en) * 2021-07-05 2021-08-13 浙江师范大学 Twin network target tracking method based on dynamic label distribution and mobile equipment
CN113643329A (en) * 2021-09-01 2021-11-12 北京航空航天大学 Twin attention network-based online update target tracking method and system
CN113673482A (en) * 2021-09-03 2021-11-19 四川大学 Cell antinuclear antibody fluorescence recognition method and system based on dynamic label distribution


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BO LI ET AL: "SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks", 《ARXIV:1812.11703V1》 *
LUCA BERTINETTO ET AL: "Fully-Convolutional Siamese Networks for Object Tracking", 《ARXIV:1606.09549V3》 *
SHIFENG ZHANG ET AL: "Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection", 《ARXIV:1912.02424V4》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118097194A (en) * 2024-04-23 2024-05-28 湖南快乐阳光互动娱乐传媒有限公司 Score information extraction method and device, electronic equipment and storage medium
CN118097194B (en) * 2024-04-23 2024-07-09 湖南快乐阳光互动娱乐传媒有限公司 Score information extraction method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination