CN117152206A - Multi-target long-term tracking method for unmanned aerial vehicle - Google Patents

Multi-target long-term tracking method for unmanned aerial vehicle

Info

Publication number: CN117152206A
Application number: CN202311070990.1A
Authority: CN (China)
Prior art keywords: target, frame, detection, apparent, representing
Legal status: Pending
Priority date / Filing date: 2023-08-24
Publication date: 2023-12-01
Other languages: Chinese (zh)
Inventors: 李成哲, 殷艳坤, 赵凯敏, 陈萧冰, 侯静静
Current Assignee: China Ordnance Equipment Group Ordnance Equipment Research Institute
Original Assignee: China Ordnance Equipment Group Ordnance Equipment Research Institute
Application filed by China Ordnance Equipment Group Ordnance Equipment Research Institute
Priority to CN202311070990.1A
Publication of CN117152206A

Classifications

    • G06T7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/277: Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V20/17: Terrestrial scenes taken from planes or by drones
    • G06T2207/30232: Subject of image; Surveillance
    • G06T2207/30241: Subject of image; Trajectory
    • G06V2201/07: Target detection
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a multi-target long-term tracking method for an unmanned aerial vehicle, which comprises the following steps: a detection module identifies the target detection frame images and target relative position information of all monitored targets from environmental image information, obtaining a target detection frame image sequence and a target relative position information sequence; a cross-camera tracking module extracts the multi-dimensional apparent features of each target detection frame image; and for each frame in the target detection frame image sequence, after an anti-shake operation is performed on the frame, the detections are divided into high-score target frames and low-score target frames according to confidence, a fusion loss matrix is obtained based on the multi-dimensional apparent features and the target relative position information, and trajectory matching is then performed through the Hungarian algorithm to complete target tracking. By fusing a lens anti-shake strategy into the matching process and introducing the cross-camera tracking module, the application solves the problem of erroneous tracking when the unmanned aerial vehicle shakes, and the problem of newly assigned IDs caused by camera rotation.

Description

Multi-target long-term tracking method for unmanned aerial vehicle
Technical Field
The application belongs to the technical field of target tracking, and particularly relates to a multi-target long-term tracking method for an unmanned aerial vehicle.
Background
Multi-target tracking has developed rapidly over the last decade. Typical early methods include MeanShift (mean shift) and particle filtering, but their overall accuracy was low and they were mainly used for single-target tracking. As the accuracy of target detection continued to improve, tracking-by-detection frameworks appeared; SORT, DeepSORT, ByteTrack, and BoT-SORT are typical algorithms.
The tracking-by-detection pipeline is mainly divided into a detection part and a matching part. Because the framework is flexible, the detection part can be updated continuously as target detection algorithms develop, while the accuracy of the subsequent matching process depends on the detection results. SORT is the classic early detection-based tracking algorithm. The core of its matching process is the combination of Kalman filtering and the Hungarian algorithm: the position in the current frame of each detection target frame from the previous frame is first predicted by Kalman filtering, and the Hungarian algorithm then computes an optimal matching between the current detection target frames and the predicted target frames. DeepSORT is an improved version of SORT: it adds apparent features extracted from the target frames by a CNN, increases robustness to occlusion, and proposes a cascade matching strategy that combines motion information and appearance information as the association metric between real and predicted target frames. ByteTrack proposes a new data association method. Unlike SORT, it also considers the low-confidence detection frames produced by occlusion: using the similarity between detection frames and tracks, it keeps the high-score detections while separating real objects from background among the low-score results, which improves trajectory consistency. High-score frames are first matched against the existing tracks, low-score frames are then matched against the tracks left unmatched in the first round, and new tracks are created only from high-score detection frames. BoT-SORT is a more robust tracker whose overall matching flow is largely consistent with ByteTrack, but it adds a ReID module and a camera motion compensation method.
Existing multi-target tracking schemes such as SORT and ByteTrack use only position information for matching, so accurate matching presumes that the background changes little between consecutive frames and that targets move relatively slowly. If this premise is not met, the matching results suffer from severe ID switches (identity switches), and even short-term continuous tracking of a target may be impossible; when the camera shakes and rotates, continuous tracking by position information alone is nearly hopeless. DeepSORT and BoT-SORT add ReID (cross-camera re-identification) to re-identify objects under the monitoring lens, which alleviates ID switches to some extent; but when a target is occluded for a long time or the camera rotates through a large angle, the position information used in matching no longer helps and the same target cannot be re-associated.
Disclosure of Invention
In order to solve the above technical problems, the application provides a multi-target long-term tracking method for an unmanned aerial vehicle.
The first aspect of the application discloses a multi-target long-term tracking method for an unmanned aerial vehicle; in the method, a plurality of targets acquired by the unmanned aerial vehicle are positioned, tracked, and monitored based on a tracking system, wherein the tracking system comprises a detection module, a cross-camera tracking module, and a matching module;
the method comprises the following steps:
s1, the detection module identifies target detection frame images and target relative position information of all monitoring targets from environment image information to obtain a target detection frame image sequence and a target relative position information sequence;
s2, the cross-mirror tracking module extracts multidimensional apparent features of each target detection frame image;
s3, for each frame in the target detection frame image sequence, after the matching module carries out anti-shake operation on the frame, the frame is divided into a high-resolution target frame and a low-resolution target frame according to the confidence level, a fusion loss matrix is obtained based on the multi-dimensional apparent characteristics and the target relative position information, and then the scored target frame is subjected to track matching through a Hungary algorithm, so that target tracking is completed.
According to the method of the first aspect of the present application, in the step S3, the step of performing the anti-shake operation includes:
and acquiring a homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame, and then obtaining the perspective-transformed current detection target frames as the product of the previous frame's target detection frames and the homography transformation matrix.
According to the method of the first aspect of the present application, the step of obtaining the homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame includes:
respectively extracting the feature points of the target detection frame image of the current frame and of the previous frame using the ORB operator, and computing the corresponding descriptor of each feature point;
matching the descriptors of the feature points to obtain matched feature point pairs;
filtering abnormal matches from the feature point pairs to obtain the correct feature point pairs;
and selecting the 8 best feature point pairs based on a random sample consensus algorithm, and calculating the homography transformation matrix from these 8 feature point pairs.
According to the method of the first aspect of the present application, in the step S3, the step of obtaining a fusion loss matrix based on the multi-dimensional apparent features and the target relative position information includes:
predicting, by Kalman filtering, the position in the current frame of the last target frame of each track in the track pool, to obtain the predicted target frames;
modeling the current detection target frame and the prediction target frame into 2D Gaussian distribution based on the target relative position information;
calculating the distance between the current detection target frame and Gaussian distribution corresponding to the prediction target frame;
the calculation formula of the distance between the Gaussian distributions is as follows:

$$W_2^2(\mathcal{N}_a,\mathcal{N}_b)=\left\|\left(cx_a,\,cy_a,\,\tfrac{w_a}{2},\,\tfrac{h_a}{2}\right)^{\mathrm{T}}-\left(cx_b,\,cy_b,\,\tfrac{w_b}{2},\,\tfrac{h_b}{2}\right)^{\mathrm{T}}\right\|_2^2$$

wherein $W_2^2(\mathcal{N}_a,\mathcal{N}_b)$ represents the distance between the Gaussian distributions of the current detection target frame a and the predicted target frame b, $\mathcal{N}_a$ represents the Gaussian distribution of the current detection target frame a, $\mathcal{N}_b$ represents the Gaussian distribution of the predicted target frame b, $(cx_a, cy_a)$ represents the center-point coordinates of the current detection target frame a, $w_a$ represents the width of the current detection target frame a, $h_a$ represents the height of the current detection target frame a, $(cx_b, cy_b)$ represents the center-point coordinates of the predicted target frame b, $w_b$ represents the width of the predicted target frame b, and $h_b$ represents the height of the predicted target frame b;
normalizing the distance between the Gaussian distributions to obtain the position similarity between the current detection target frame and the prediction target frame;
the calculation formula of the position similarity is as follows:

$$\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a,\mathcal{N}_b)}}{C}\right)$$

wherein $\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$ represents the position similarity between the current detection target frame a and the predicted target frame b, and C represents a dataset-dependent constant;
calculating an NWD loss matrix according to the position similarity;
the NWD loss matrix formula is:

$$L^{\mathrm{NWD}}_{a,b}=1-\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$$
calculating the similarity between the multi-dimensional apparent features corresponding to the current detection target frame and the prediction target frame by adopting cos similarity;
the calculation formula of the similarity between the multi-dimensional apparent features is as follows:

$$\cos(f_i,f_j)=\frac{\sum_{k=1}^{n} f_i^{(k)}\,f_j^{(k)}}{\sqrt{\sum_{k=1}^{n}\bigl(f_i^{(k)}\bigr)^2}\,\sqrt{\sum_{k=1}^{n}\bigl(f_j^{(k)}\bigr)^2}}$$

wherein $f_i$ represents the apparent feature of the i-th target frame, $f_j$ represents the apparent feature of the j-th target frame, $f_i^{(k)}$ represents the value of the k-th dimension of the i-th target, $f_j^{(k)}$ represents the value of the k-th dimension of the j-th target, and n represents the dimension of the apparent feature;
normalizing the similarity between the multidimensional apparent features to obtain an apparent feature loss matrix;
the apparent feature loss matrix is:

$$L^{\mathrm{app}}_{i,j}=1-\cos(f_i,f_j)$$
obtaining a fusion loss matrix according to the NWD loss matrix and the apparent characteristic loss matrix;
the fusion loss matrix is as follows:

$$C^{\mathrm{fuse}}_{a,b}=\begin{cases}\min\bigl(L^{\mathrm{NWD}}_{a,b},\,L^{\mathrm{app}}_{a,b}\bigr), & L^{\mathrm{NWD}}_{a,b}<\theta_{\mathrm{pos}}\ \text{and}\ L^{\mathrm{app}}_{a,b}<\theta_{\mathrm{app}}\\ 1, & \text{otherwise}\end{cases}$$

wherein $\theta_{\mathrm{pos}}$ represents the threshold on the position distance, currently taking the value 0.3, and $\theta_{\mathrm{app}}$ represents the threshold on the apparent feature.
According to the method of the first aspect of the application, the track pool matched against the high-score target frames consists of the tracks that can currently be associated, and the track pool matched against the low-score target frames consists of the tracks that were not successfully matched against the high-score target frames.
According to the method of the first aspect of the present application, step S3 further comprises:
and calculating a cosine loss matrix using the apparent features of the unmatched high-score targets and the apparent features of the targets in the deleted tracks, and performing association matching with the Hungarian algorithm based on the cosine loss matrix.
According to the method of the first aspect of the application, the tracks successfully matched are updated;
the updating is specifically performed by the following formula:

$$\mathrm{new\_fea}=\alpha\cdot\mathrm{fea}+(1-\alpha)\cdot\mathrm{cur\_fea}$$

wherein new_fea represents the updated apparent feature, $\alpha$ represents the weight parameter, currently taking the value 0.9, fea represents the apparent feature of the target in the previous frame, and cur_fea represents the apparent feature of the current target.
According to the method of the first aspect of the application, step S2 comprises:
performing image preprocessing on each target detection frame image;
extracting the feature map [1,2048,16,8] of each preprocessed target detection frame image using ResNet50 as the backbone network;
and using an aggregation network to perform dimensionality reduction and normalization on the feature map [1,2048,16,8] to obtain the multi-dimensional apparent feature [1,2048,1,1].
The second aspect of the application discloses an electronic device. The electronic device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps in a multi-target long-term tracking method for a drone of any one of the first aspects of the present disclosure.
A third aspect of the application discloses a computer-readable storage medium. A computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps in a multi-target long-term tracking method for a drone of any one of the first aspects of the present disclosure.
In summary, the scheme provided by the application has the following technical effects: by integrating a camera anti-shake strategy into the matching process, the application solves the problem of erroneous tracking when the unmanned aerial vehicle shakes under the influence of weather; by introducing the ReID model and adding a matching post-processing strategy, it solves the problem that the same person is assigned a new ID after disappearing and reappearing while the camera rotates; it thereby achieves long-term, stable multi-target tracking from the unmanned aerial vehicle's viewpoint.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a multi-target long-term tracking method for an unmanned aerial vehicle according to an embodiment of the present application;
fig. 2 is a block diagram of a multi-target long-term tracking system for a drone according to an embodiment of the present application;
fig. 3 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
With the development of unmanned aerial vehicle technology, UAV application scenarios in the military field are becoming more and more common; target tracking is a basic capability for UAV use, and long-term stable tracking of targets is one of the research difficulties. Multi-target tracking from the UAV viewpoint differs from target tracking at other viewpoints mainly as follows: 1) the targets are mostly small targets, generally occupying only a small number of pixels; 2) the camera is severely affected by weather, and its stability is poor in adverse weather such as strong wind; 3) when the camera rotates, the same target is more likely to undergo an ID Switch (identity switch) when it reappears after disappearing. These factors are the main reasons a target cannot be tracked stably for a long period. Therefore, in order to achieve long-term stable target tracking for UAVs, we propose a multi-target long-term trajectory tracking method. The method provided by the application can solve the above technical problems.
Referring to fig. 1, the first aspect of the present application discloses a multi-target long-term tracking method for an unmanned aerial vehicle; the method positions, tracks, and monitors the various targets acquired by the unmanned aerial vehicle based on a tracking system. Referring to fig. 2, the tracking system includes a detection module, a cross-camera tracking module, and a matching module;
the method comprises the following steps:
s1, the detection module identifies target detection frame images and target relative position information of all monitoring targets from environment image information to obtain a target detection frame image sequence and a target relative position information sequence;
the detection module is an early basis of a tracking algorithm, the tracked input is the output of the detection module, and the accuracy of the detection module also has a certain influence on the tracking effect. The current detection model uses a YOLOX algorithm, and the input end uses technologies of more mainstream including mosaics, mixUp, random overturn and the like, and the backbone network is a Darknet53 model, and an SPP layer is added on the basis. The training data set is formed by combining self-labeling data and open source data in a mixed mode, wherein the self-labeling data comprises 10w+ images, the open source data extracts images of people from COCO data, and 5000 images of a COCO2017 test set are used for verification.
Assume that $I_k$ represents the k-th image and the detection module detects N targets in the k-th frame; the output result is represented as

$$D_k=\{d_k^1, d_k^2, \ldots, d_k^N\},\qquad d_k^i=(x^i,\,y^i,\,\mathrm{width}^i,\,\mathrm{height}^i)$$

wherein $(x, y)$ represents the center point of the target frame, width represents the width of the target frame, and height represents the height of the target frame.
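As a minimal sketch of this output convention (the helper name and the YOLOX-style (x1, y1, x2, y2, conf) input layout are assumptions for illustration, not part of the patent):

```python
import numpy as np

def to_cxcywh(dets_xyxy: np.ndarray) -> np.ndarray:
    """Convert detector rows (x1, y1, x2, y2, conf) into the
    (x, y, width, height) center form used for d_k^i above."""
    x1, y1, x2, y2, conf = dets_xyxy.T
    return np.stack([(x1 + x2) / 2.0,   # center x
                     (y1 + y2) / 2.0,   # center y
                     x2 - x1,           # width
                     y2 - y1,           # height
                     conf], axis=1)

# Example: the k-th frame with N = 2 detections
D_k = to_cxcywh(np.array([[10., 20., 30., 60., 0.9],
                          [40., 40., 50., 55., 0.4]]))
```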
S2, the cross-camera tracking module extracts the multi-dimensional apparent features of each target detection frame image;
the cross-mirror tracking (ReID) module is mainly divided into image preprocessing, a backbone network, an aggregation network, a Head layer and a Loss, and the sub-method of each layer of module has various implementation modes, so that the application is not particularly limited. The step S2 comprises the following steps:
performing image preprocessing on each target detection frame image;
extracting the feature map [1,2048,16,8] of each preprocessed target detection frame image using ResNet50 as the backbone network;
and using an aggregation network to perform dimensionality reduction and normalization on the feature map [1,2048,16,8] to obtain the multi-dimensional apparent feature [1,2048,1,1].
The input to the ReID module is the set of detected target image crops, which can be expressed as $\{I_k^1, I_k^2, \ldots, I_k^N\}$.
Image preprocessing is one of the common enhancement strategies in deep learning. The preprocessing here mainly includes resizing, random flipping, random erasing, random patch, and Cutout; cleaning, standardizing, and augmenting the data strengthens the robustness and generalization ability of the model.
The input image size affects the size of the feature map, and a larger image lets the model learn clearer high-dimensional features. The images are uniformly resized to 384×128, a fixed size, and the target features are unified to 2048 dimensions, which corresponds to the ReID model backbone using ResNet50.
The pictures are randomly flipped, including random horizontal flipping and random vertical flipping. Random erasing removes a rectangular block within a predefined size range from the original images in different training epochs: a rectangular block whose height and width are between 1/6 and 1/2 of the original image is randomly generated in the input image and erased, i.e., the RGB values within the random rectangular block are set to random values.
The random patch operation is somewhat similar to random erasing, except that the RGB values of the random rectangular block are taken from other images. The Cutout operation sets the removed region to 0.
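A sketch of this preprocessing chain with torchvision (the probabilities are assumptions; the erased-area fraction below derives from the 1/6–1/2 height/width range given above, i.e. between 1/36 and 1/4 of the image area; the zero-filled second erase approximates Cutout):

```python
import torchvision.transforms as T

# ReID preprocessing: 384x128 resize, random flips, random-value erasing
# (RGB set to random values), and a zero-filled erase approximating Cutout.
reid_preprocess = T.Compose([
    T.Resize((384, 128)),
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.ToTensor(),
    T.RandomErasing(p=0.5, scale=(1 / 36, 1 / 4), value="random"),
    T.RandomErasing(p=0.2, scale=(1 / 36, 1 / 4), value=0),
])
```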
ReID model backbone network: the backbone network uses the classic ResNet50. Because ResNet50 incorporates residual structures, gradients do not vanish as the network structure is deepened. The network comprises 4 blocks; to reduce the feature dimension, lower the model's computational cost, and increase its nonlinear expressive power, each block contains a different number of bottleneck structures, and each bottleneck consists of a 1×1 convolution, followed by a 3×3 convolution, and finally a 1×1 convolution, retaining enough representational power to capture the detailed information of an image.
To further improve the diversity and robustness of the features, an aggregation network is added between the backbone network and the head layer; the feature map [1,2048,16,8] generated by ResNet50 is output as [1,2048,1,1] after the pooling layer.
The head layer takes the relatively high-dimensional feature, which needs dimensionality reduction and normalization, and comprises a BN layer and a decision layer.
The loss layer uses a cross-entropy loss function, a loss function commonly used for classification problems; the ReID model treats each target identity as a class and outputs the class probability of the target.
To increase the robustness of the model, self-collected and multi-source data are integrated: the training datasets come from Market-1501, DukeMTMC, MSMT17, and self-labeled data, and the number of classes reaches 10,903. The self-labeled data were annotated by a professional labeling team from video captured by unmanned aerial vehicles in real scenes, with pedestrians appearing in fewer than 100 frames deleted and filtered out.
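A sketch of the loss layer described above, treating each of the 10,903 identities as a class (the plain linear decision layer over the 2048-dimensional feature is an assumption consistent with the head layer description):

```python
import torch

num_ids = 10903                          # identity classes in the mixed training set
head = torch.nn.Linear(2048, num_ids)    # decision layer over the 2048-d feature
criterion = torch.nn.CrossEntropyLoss()  # per-identity classification loss

feats = torch.randn(8, 2048)              # a batch of aggregated features
labels = torch.randint(0, num_ids, (8,))  # identity labels
loss = criterion(head(feats), labels)
```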
ReID produces a 2048-dimensional apparent feature for each target, and the apparent features of all targets in the k-th frame can be expressed as

$$F_k=\{f_k^1, f_k^2, \ldots, f_k^N\},\qquad f_k^i\in\mathbb{R}^{2048}.$$
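A minimal sketch of the backbone-plus-aggregation path (ImageNet weights stand in for the patent's own training; note the stock torchvision ResNet50 yields a [N, 2048, 12, 4] map for a 384×128 crop, so the [1,2048,16,8] shape above presumably reflects the patent's own backbone configuration; the final L2 normalization is an assumption that matches the cosine matching used later):

```python
import torch
import torch.nn.functional as F
import torchvision

class ReIDExtractor(torch.nn.Module):
    """ResNet50 backbone -> spatial feature map -> pooled 2048-d feature."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
        # keep all stages up to (and excluding) the classifier's avgpool/fc
        self.body = torch.nn.Sequential(*list(backbone.children())[:-2])
        self.pool = torch.nn.AdaptiveAvgPool2d(1)   # aggregation to 1x1

    def forward(self, crops: torch.Tensor) -> torch.Tensor:
        fmap = self.body(crops)            # [N, 2048, 12, 4] for 384x128 crops
        feat = self.pool(fmap).flatten(1)  # [N, 2048]
        return F.normalize(feat, dim=1)    # unit norm for cosine matching

extractor = ReIDExtractor().eval()
with torch.no_grad():
    F_k = extractor(torch.randn(4, 3, 384, 128))  # four 384x128 target crops
```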
S3, for each frame in the target detection frame image sequence, the matching module performs an anti-shake operation on the frame, divides the detections into high-score target frames and low-score target frames according to confidence, obtains a fusion loss matrix based on the multi-dimensional apparent features and the target relative position information, and then performs trajectory matching on the scored target frames through the Hungarian algorithm to complete target tracking.
In the step S3, the step of performing the anti-shake operation includes:
and acquiring a homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame, and then obtaining the perspective-transformed current detection target frames as the product of the previous frame's target detection frames and the homography transformation matrix.
According to the method of the first aspect of the present application, the step of obtaining the homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame includes:
respectively extracting the feature points of the target detection frame image of the current frame and of the previous frame using the ORB operator, and computing the corresponding descriptor of each feature point;
matching the descriptors of the feature points to obtain matched feature point pairs;
filtering abnormal matches from the feature point pairs to obtain the correct feature point pairs;
and selecting the 8 best feature point pairs based on a random sample consensus algorithm, and calculating the homography transformation matrix from these 8 feature point pairs.
When the unmanned aerial vehicle is in the air, it is easily affected by weather; when the wind is strong, lens shake is severe, which causes IDs to be matched incorrectly. To reduce ID switches from this cause, an anti-shake procedure is added. The core of the whole anti-shake step is to find 8 correctly matched feature points between the current frame and the previous frame, for which a parallelism check is added to the matching. The feature descriptors and feature points of the previous frame's image are denoted $des_{k-1}$ and $kp_{k-1}$ respectively, and those of the current frame's image are denoted $des_k$ and $kp_k$.
To reduce the possibility of incorrect matches, abnormal-match filtering is performed by computing the parallelism between the virtual line segments connecting matched feature points: most feature points are matched correctly, so these segments are largely parallel, and for the few incorrect matches the parallel relation disappears. An angle matrix between the matched point pairs is therefore computed; when a point pair is parallel with fewer than 50% of all other point pairs, that point pair is deleted.
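A sketch of the anti-shake step with OpenCV; RANSAC here plays the role of selecting a consistent subset of the matched pairs, and the parallelism pre-filter described above is omitted for brevity (the ORB feature count and reprojection threshold are assumptions):

```python
import cv2
import numpy as np

def estimate_homography(prev_gray: np.ndarray, cur_gray: np.ndarray):
    """ORB keypoints + Hamming brute-force matching + RANSAC homography."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    if des1 is None or des2 is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 8:                   # need at least 8 pairs, as above
        return None
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H

def warp_centers(boxes_cxcywh: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map previous-frame box centers into current-frame coordinates."""
    pts = boxes_cxcywh[:, :2].reshape(-1, 1, 2).astype(np.float32)
    out = boxes_cxcywh.copy()
    out[:, :2] = cv2.perspectiveTransform(pts, H).reshape(-1, 2)
    return out
```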
According to the method of the first aspect of the present application, in the step S3, the step of obtaining a fusion loss matrix based on the multi-dimensional apparent features and the target relative position information includes:
predicting, by Kalman filtering, the position in the current frame of the last target frame of each track in the track pool, to obtain the predicted target frames;
modeling the current detection target frame and the prediction target frame into 2D Gaussian distribution based on the target relative position information;
for small objects there will always be some background pixels in the object box, as it is not possible for a real object to be exactly a rectangle. In the target frame, foreground pixels are typically centered and background pixels are typically centered on the edges, with the importance of the pixels decreasing from center to boundary. To better weight each pixel in the target frame, the target frame may be modeled as a 2D gaussian distribution. Specifically, for a horizontal target box, the inscribed ellipse can be expressed as:
$$\frac{(x-\mu_x)^2}{\sigma_x^2}+\frac{(y-\mu_y)^2}{\sigma_y^2}=1$$

wherein $(\mu_x,\mu_y)$ is the center point of the ellipse and $\sigma_x,\sigma_y$ are its radii along the x and y axes; they correspond to the target frame by $\mu_x=cx$, $\mu_y=cy$, $\sigma_x=w/2$, $\sigma_y=h/2$, where $(cx,cy)$ represents the center point of the target frame, $w$ represents the width of the target frame, and $h$ represents the target frame height.
This ellipse is a density contour of a 2D Gaussian distribution. Thus, the target frame may be modeled as a 2D Gaussian distribution, and the similarity between two target frames can be represented by the distance between the two Gaussian distributions.
Calculating the distance between the current detection target frame and Gaussian distribution corresponding to the prediction target frame;
the calculation formula of the distance between the Gaussian distributions is as follows:

$$W_2^2(\mathcal{N}_a,\mathcal{N}_b)=\left\|\left(cx_a,\,cy_a,\,\tfrac{w_a}{2},\,\tfrac{h_a}{2}\right)^{\mathrm{T}}-\left(cx_b,\,cy_b,\,\tfrac{w_b}{2},\,\tfrac{h_b}{2}\right)^{\mathrm{T}}\right\|_2^2$$

wherein $W_2^2(\mathcal{N}_a,\mathcal{N}_b)$ represents the distance between the Gaussian distributions of the current detection target frame a and the predicted target frame b, $\mathcal{N}_a$ represents the Gaussian distribution of the current detection target frame a, $\mathcal{N}_b$ represents the Gaussian distribution of the predicted target frame b, $(cx_a, cy_a)$ represents the center-point coordinates of the current detection target frame a, $w_a$ represents the width of the current detection target frame a, $h_a$ represents the height of the current detection target frame a, $(cx_b, cy_b)$ represents the center-point coordinates of the predicted target frame b, $w_b$ represents the width of the predicted target frame b, and $h_b$ represents the height of the predicted target frame b;
normalizing the distance between the Gaussian distributions to obtain the position similarity between the current detection target frame and the prediction target frame;
the calculation formula of the position similarity is as follows:

$$\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a,\mathcal{N}_b)}}{C}\right)$$

wherein $\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$ represents the position similarity between the current detection target frame a and the predicted target frame b, and C represents a dataset-dependent constant;
calculating an NWD loss matrix according to the position similarity;
the NWD loss matrix formula is:

$$L^{\mathrm{NWD}}_{a,b}=1-\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$$
calculating the similarity between the multi-dimensional apparent features corresponding to the current detection target frame and the prediction target frame by adopting cos similarity;
the calculation formula of the similarity between the multi-dimensional apparent features is as follows:

$$\cos(f_i,f_j)=\frac{\sum_{k=1}^{n} f_i^{(k)}\,f_j^{(k)}}{\sqrt{\sum_{k=1}^{n}\bigl(f_i^{(k)}\bigr)^2}\,\sqrt{\sum_{k=1}^{n}\bigl(f_j^{(k)}\bigr)^2}}$$

wherein $f_i$ represents the apparent feature of the i-th target frame, $f_j$ represents the apparent feature of the j-th target frame, $f_i^{(k)}$ represents the value of the k-th dimension of the i-th target, $f_j^{(k)}$ represents the value of the k-th dimension of the j-th target, and n represents the dimension of the apparent feature;
normalizing the similarity between the multidimensional apparent features to obtain an apparent feature loss matrix;
the apparent feature loss matrix is:

$$L^{\mathrm{app}}_{i,j}=1-\cos(f_i,f_j)$$
obtaining a fusion loss matrix according to the NWD loss matrix and the apparent characteristic loss matrix;
the fusion loss matrix is as follows:

$$C^{\mathrm{fuse}}_{a,b}=\begin{cases}\min\bigl(L^{\mathrm{NWD}}_{a,b},\,L^{\mathrm{app}}_{a,b}\bigr), & L^{\mathrm{NWD}}_{a,b}<\theta_{\mathrm{pos}}\ \text{and}\ L^{\mathrm{app}}_{a,b}<\theta_{\mathrm{app}}\\ 1, & \text{otherwise}\end{cases}$$

wherein $\theta_{\mathrm{pos}}$ represents the threshold on the position distance, currently taking the value 0.3, and $\theta_{\mathrm{app}}$ represents the threshold on the apparent feature.
The position loss for small targets in current tracking algorithms is based on an IoU metric, and IoU is very sensitive to positional changes of small targets: when the unmanned aerial vehicle shakes or the lens rotates, the position information contributes almost nothing to the matching process. Considering the scene requirements, the IoU position loss is not suitable for the unmanned aerial vehicle scenario, because even a small lens shake drives the overlap between targets to zero and the positional relation between targets can no longer be measured accurately. To reduce this influence, the Normalized Gaussian Wasserstein Distance is adopted to compute the similarity between the current frame's target frames and the previous frame's target frames.
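A numpy sketch of the NWD position loss and the gated fusion under the reconstruction above (the constant C is dataset-dependent, and the value 12.8 from the original NWD paper, the min-gate fusion form, and θ_app = 0.25 are assumptions; θ_pos = 0.3 comes from the text):

```python
import numpy as np

def nwd_loss(det: np.ndarray, pred: np.ndarray, C: float = 12.8) -> np.ndarray:
    """det: [N,4], pred: [M,4] boxes as (cx, cy, w, h). Each box is modeled
    as a 2D Gaussian N((cx, cy), diag(w^2/4, h^2/4)); returns 1 - NWD."""
    a = np.concatenate([det[:, :2], det[:, 2:4] / 2.0], axis=1)  # (cx, cy, w/2, h/2)
    b = np.concatenate([pred[:, :2], pred[:, 2:4] / 2.0], axis=1)
    w2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)          # squared W2 distance
    return 1.0 - np.exp(-np.sqrt(w2) / C)                        # NWD loss matrix

def fused_loss(nwd_l: np.ndarray, app_l: np.ndarray,
               theta_pos: float = 0.3, theta_app: float = 0.25) -> np.ndarray:
    """Gated fusion of position and appearance losses; pairs exceeding
    either threshold are forbidden (cost 1)."""
    cost = np.minimum(nwd_l, app_l)
    cost[(nwd_l >= theta_pos) | (app_l >= theta_app)] = 1.0
    return cost
```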
According to the first aspect of the application, the track pool matched against the high-score target frames consists of the tracks that can currently be associated, and the track pool matched against the low-score target frames consists of the tracks that were not successfully matched against the high-score target frames.
Specifically, high-score target matching:

The high-score matching process is shown in part 2-1 of FIG. 2. Target frames with detection confidence > 0.7 are defined as high-score target frames. A track is the series of target frames formed by the same target, and the track pool holds the tracks that can currently be associated, mainly activated tracks and lost tracks. The position in the current frame of each track's last target frame is predicted by Kalman filtering, yielding the predicted target frames. The fusion loss matrix between the current frame's high-score target frames and the predicted target frames is computed, and association matching is performed with the Hungarian algorithm on this loss matrix. When a target frame cannot be associated with any track in the track pool, i.e., the association fails, the track is defined as a lost track; when association fails for more than 30 consecutive frames, it is defined as a deleted track. The high-score matching outputs the matched high-score target frames and the unmatched high-score target frames, and the track-pool tracks are divided into matched tracks and unmatched tracks.
Low-score target matching: the low-score matching flow is shown in part 2-2 of FIG. 2. Target frames with detection confidence < 0.7 are defined as low-score target frames. Here the track pool consists of the tracks not successfully matched in the high-score stage. The position in the current frame of the last target frame of each such track is predicted by Kalman filtering, yielding the predicted target frames; the fusion loss matrix between the low-score target frames and these predicted target frames is computed, and association matching is performed with the Hungarian algorithm on this loss matrix. The low-score matching outputs the matched low-score target frames and the unmatched low-score target frames, and the current track-pool tracks are divided into matched tracks and unmatched tracks.
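A sketch of one association round with scipy's Hungarian solver; the surrounding track bookkeeping (activated/lost/deleted states, the 0.7 confidence split, the 30-frame deletion rule) follows the text, while the gating value is an assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost: np.ndarray, gate: float = 0.98):
    """One Hungarian round; pairs whose cost exceeds the gate stay unmatched."""
    if cost.size == 0:
        return [], list(range(cost.shape[0])), list(range(cost.shape[1]))
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
    used_r = {r for r, _ in matches}
    used_c = {c for _, c in matches}
    un_rows = [r for r in range(cost.shape[0]) if r not in used_r]
    un_cols = [c for c in range(cost.shape[1]) if c not in used_c]
    return matches, un_rows, un_cols

# Stage 1: high-score detections (conf > 0.7) vs. the full track pool.
# Stage 2: low-score detections vs. the tracks left unmatched by stage 1.
# Tracks unmatched for 30 consecutive frames move to the deleted state.
```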
Step S3 further comprises long-term vanished-target matching. The flow is shown in part 2-3 of FIG. 2; it retrieves targets among the unmatched high-score target frames after a long-term target loss. The specific steps comprise:
and calculating a cosine loss matrix using the apparent features of the unmatched high-score targets and the apparent features of the targets in the deleted tracks, and performing association matching with the Hungarian algorithm based on the cosine loss matrix.
The long-term vanished-target matching outputs matched high-score target frames and unmatched high-score target frames; each unmatched high-score target frame is assigned a new ID and regarded as a new target. The deleted tracks are divided into matched deleted tracks and unmatched deleted tracks, and the state of each matched deleted track is updated to an activated track.
According to the method of the first aspect of the application, the tracks successfully matched are updated;
the updating is specifically performed by the following formula:

$$\mathrm{new\_fea}=\alpha\cdot\mathrm{fea}+(1-\alpha)\cdot\mathrm{cur\_fea}$$

wherein new_fea represents the updated apparent feature, $\alpha$ represents the weight parameter, currently taking the value 0.9, fea represents the apparent feature of the target in the previous frame, and cur_fea represents the apparent feature of the current target.
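As a one-line sketch, the update above is a standard exponential moving average:

```python
def update_track_feature(fea, cur_fea, alpha: float = 0.9):
    """EMA update of a matched track's apparent feature (alpha = 0.9 per the text)."""
    return alpha * fea + (1.0 - alpha) * cur_fea
```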
The second aspect of the application discloses an electronic device. The electronic device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps in a multi-target long-term tracking method for a drone of any one of the first aspects of the present disclosure.
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 3, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the electronic device is used for conducting wired or wireless communication with an external terminal, and the wireless communication can be achieved through WIFI, an operator network, near Field Communication (NFC) or other technologies. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the electronic equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of a portion related to the technical solution of the present disclosure, and does not constitute a limitation of the electronic device to which the technical solution of the present disclosure is applied, and that a specific electronic device may include more or less components than those shown in the drawings, or may combine some components, or have different component arrangements.
A third aspect of the application discloses a computer-readable storage medium. A computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements steps of a method for multi-target long-term tracking of a drone of any one of the first aspects of the present disclosure.
In summary, the scheme provided by the application has the following technical effects: by integrating a camera anti-shake strategy into the matching process, the application solves the problem of erroneous tracking when the unmanned aerial vehicle shakes under the influence of weather; by introducing the ReID model and adding a matching post-processing strategy, it solves the problem that the same person is assigned a new ID after disappearing and reappearing while the camera rotates; it thereby achieves long-term, stable multi-target tracking from the unmanned aerial vehicle's viewpoint.
Note that the technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be regarded as the scope of the description. The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A multi-target long-term tracking method for an unmanned aerial vehicle, characterized in that
a plurality of targets acquired by the unmanned aerial vehicle are positioned, tracked, and monitored based on a tracking system, wherein the tracking system comprises a detection module, a cross-camera tracking module, and a matching module;
the method comprises the following steps:
s1, the detection module identifies target detection frame images and target relative position information of all monitoring targets from environment image information to obtain a target detection frame image sequence and a target relative position information sequence;
s2, the cross-mirror tracking module extracts multidimensional apparent features of each target detection frame image;
s3, for each frame in the target detection frame image sequence, after the matching module carries out anti-shake operation on the frame, the frame is divided into a high-resolution target frame and a low-resolution target frame according to the confidence level, a fusion loss matrix is obtained based on the multi-dimensional apparent characteristics and the target relative position information, and then track matching is carried out on the scored target frame through a Hungary algorithm, so that target tracking is completed.
2. The method according to claim 1, wherein in the step S3, the step of performing an anti-shake operation includes:
and acquiring a homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame, and then obtaining the perspective-transformed current detection target frames as the product of the previous frame's target detection frames and the homography transformation matrix.
3. The method for multi-target long-term tracking of an unmanned aerial vehicle according to claim 2, wherein the step of acquiring the homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame comprises:
respectively extracting the feature points of the target detection frame image of the current frame and of the previous frame using the ORB operator, and computing the corresponding descriptor of each feature point;
matching the descriptors of the feature points to obtain matched feature point pairs;
filtering abnormal matches from the feature point pairs to obtain the correct feature point pairs;
and selecting the 8 best feature point pairs based on a random sample consensus algorithm, and calculating the homography transformation matrix from these 8 feature point pairs.
4. A method of multi-target long-term tracking of unmanned aerial vehicles according to claim 3, wherein in step S3, the step of obtaining a fusion loss matrix based on the multi-dimensional apparent features and the target relative position information comprises:
predicting, by Kalman filtering, the position in the current frame of the last target frame of each track in the track pool, to obtain the predicted target frames;
modeling the current detection target frame and the prediction target frame into 2D Gaussian distribution based on the target relative position information;
calculating the distance between the current detection target frame and Gaussian distribution corresponding to the prediction target frame;
the calculation formula of the distance between the Gaussian distributions is as follows:

$$W_2^2(\mathcal{N}_a,\mathcal{N}_b)=\left\|\left(cx_a,\,cy_a,\,\tfrac{w_a}{2},\,\tfrac{h_a}{2}\right)^{\mathrm{T}}-\left(cx_b,\,cy_b,\,\tfrac{w_b}{2},\,\tfrac{h_b}{2}\right)^{\mathrm{T}}\right\|_2^2$$

wherein $W_2^2(\mathcal{N}_a,\mathcal{N}_b)$ represents the distance between the Gaussian distributions of the current detection target frame a and the predicted target frame b, $\mathcal{N}_a$ represents the Gaussian distribution of the current detection target frame a, $\mathcal{N}_b$ represents the Gaussian distribution of the predicted target frame b, $(cx_a, cy_a)$ represents the center-point coordinates of the current detection target frame a, $w_a$ represents the width of the current detection target frame a, $h_a$ represents the height of the current detection target frame a, $(cx_b, cy_b)$ represents the center-point coordinates of the predicted target frame b, $w_b$ represents the width of the predicted target frame b, and $h_b$ represents the height of the predicted target frame b;
normalizing the distance between the Gaussian distributions to obtain the position similarity between the current detection target frame and the prediction target frame;
the calculation formula of the position similarity is as follows:

$$\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a,\mathcal{N}_b)}}{C}\right)$$

wherein $\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$ represents the position similarity between the current detection target frame a and the predicted target frame b, and C represents a dataset-dependent constant;
calculating an NWD loss matrix according to the position similarity;
the NWD loss matrix formula is:

$$L^{\mathrm{NWD}}_{a,b}=1-\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$$
calculating the similarity between the multi-dimensional apparent features corresponding to the current detection target frame and the prediction target frame by adopting cos similarity;
the calculation formula of the similarity between the multi-dimensional apparent features is as follows:

$$\cos(f_i,f_j)=\frac{\sum_{k=1}^{n} f_i^{(k)}\,f_j^{(k)}}{\sqrt{\sum_{k=1}^{n}\bigl(f_i^{(k)}\bigr)^2}\,\sqrt{\sum_{k=1}^{n}\bigl(f_j^{(k)}\bigr)^2}}$$

wherein $f_i$ represents the apparent feature of the i-th target frame, $f_j$ represents the apparent feature of the j-th target frame, $f_i^{(k)}$ represents the value of the k-th dimension of the i-th target, $f_j^{(k)}$ represents the value of the k-th dimension of the j-th target, and n represents the dimension of the apparent feature;
normalizing the similarity between the multidimensional apparent features to obtain an apparent feature loss matrix;
the apparent feature loss matrix is:

$$L^{\mathrm{app}}_{i,j}=1-\cos(f_i,f_j)$$
obtaining a fusion loss matrix according to the NWD loss matrix and the apparent characteristic loss matrix;
the fusion loss matrix is as follows:

$$C^{\mathrm{fuse}}_{a,b}=\begin{cases}\min\bigl(L^{\mathrm{NWD}}_{a,b},\,L^{\mathrm{app}}_{a,b}\bigr), & L^{\mathrm{NWD}}_{a,b}<\theta_{\mathrm{pos}}\ \text{and}\ L^{\mathrm{app}}_{a,b}<\theta_{\mathrm{app}}\\ 1, & \text{otherwise}\end{cases}$$

wherein $\theta_{\mathrm{pos}}$ represents the threshold on the position distance, currently taking the value 0.3, and $\theta_{\mathrm{app}}$ represents the threshold on the apparent feature.
5. The method of claim 4, wherein the track pool matched against the high-score target frames consists of the tracks that can currently be associated, and the track pool matched against the low-score target frames consists of the tracks that were not successfully matched against the high-score target frames.
6. The multi-target long-term tracking method for an unmanned aerial vehicle according to claim 5, further comprising, after step S3:
and calculating a cosine loss matrix using the apparent features of the unmatched high-score targets and the apparent features of the targets in the deleted tracks, and performing association matching with the Hungarian algorithm based on the cosine loss matrix.
7. The unmanned aerial vehicle multi-target long-term tracking method of claim 6, wherein the tracks that are successfully matched are updated;
the updating is specifically performed by the following formula:

$$\mathrm{new\_fea}=\alpha\cdot\mathrm{fea}+(1-\alpha)\cdot\mathrm{cur\_fea}$$

wherein new_fea represents the updated apparent feature, $\alpha$ represents the weight parameter, currently taking the value 0.9, fea represents the apparent feature of the target in the previous frame, and cur_fea represents the apparent feature of the current target.
8. The unmanned aerial vehicle multi-target long-term tracking method according to claim 1, wherein step S2 comprises:
performing image preprocessing on each target detection frame image;
extracting the feature map [1,2048,16,8] of each preprocessed target detection frame image using ResNet50 as the backbone network;
and using an aggregation network to perform dimensionality reduction and normalization on the feature map [1,2048,16,8] to obtain the multi-dimensional apparent feature [1,2048,1,1].
9. An electronic device comprising a memory storing a computer program and a processor implementing the steps of a multi-target long-term tracking method of an unmanned aerial vehicle according to any one of claims 1 to 8 when the computer program is executed by the processor.
10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of a multi-target long-term tracking method of a drone according to any of claims 1 to 8.
CN202311070990.1A 2023-08-24 2023-08-24 Multi-target long-term tracking method for unmanned aerial vehicle Pending CN117152206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311070990.1A CN117152206A (en) 2023-08-24 2023-08-24 Multi-target long-term tracking method for unmanned aerial vehicle

Publications (1)

Publication Number Publication Date
CN117152206A true CN117152206A (en) 2023-12-01

Family

ID=88905484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311070990.1A Pending CN117152206A (en) 2023-08-24 2023-08-24 Multi-target long-term tracking method for unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN117152206A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117495915A (en) * 2023-12-29 2024-02-02 图灵人工智能研究院(南京)有限公司 Multi-target tracking method and system
CN117495915B (en) * 2023-12-29 2024-04-02 图灵人工智能研究院(南京)有限公司 Multi-target tracking method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination