CN117152206A - Multi-target long-term tracking method for unmanned aerial vehicle - Google Patents

Multi-target long-term tracking method for unmanned aerial vehicle

Info

Publication number: CN117152206A
Application number: CN202311070990.1A
Authority: CN (China)
Prior art keywords: target, frame, detection, apparent, representing
Legal status: Pending
Priority date / Filing date: 2023-08-24
Publication date: 2023-12-01
Other languages: Chinese (zh)
Inventors: 李成哲, 殷艳坤, 赵凯敏, 陈萧冰, 侯静静
Current Assignee: China Ordnance Equipment Group Ordnance Equipment Research Institute
Original Assignee: China Ordnance Equipment Group Ordnance Equipment Research Institute
Application filed by China Ordnance Equipment Group Ordnance Equipment Research Institute
Priority to CN202311070990.1A
Publication of CN117152206A

Classifications

    • G06T7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/277: Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V20/17: Terrestrial scenes taken from planes or by drones
    • G06T2207/30232: Subject of image; Surveillance
    • G06T2207/30241: Subject of image; Trajectory
    • G06V2201/07: Target detection
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a multi-target long-term tracking method for an unmanned aerial vehicle, which comprises the following steps: a detection module identifies the target detection frame images and target relative position information of all monitored targets from environmental image information, obtaining a target detection frame image sequence and a target relative position information sequence; a cross-camera tracking module extracts the multi-dimensional apparent features of each target detection frame image; and for each frame in the target detection frame image sequence, after an anti-shake operation is performed on the frame, the detections are divided into high-score target frames and low-score target frames according to confidence, a fusion loss matrix is obtained based on the multi-dimensional apparent features and the target relative position information, and trajectory matching is then performed through the Hungarian algorithm to complete target tracking. By fusing a lens anti-shake strategy into the matching process and introducing the cross-camera tracking module, the application solves the problem of erroneous tracking when the unmanned aerial vehicle shakes, and the problem of newly assigned IDs caused by camera rotation.

Description

Multi-target long-term tracking method for unmanned aerial vehicle
Technical Field
The application belongs to the technical field of target tracking, and particularly relates to a multi-target long-term tracking method for an unmanned aerial vehicle.
Background
Multi-target tracking has developed rapidly over the last decade. Typical early methods include MeanShift (mean shift) and particle filtering, but their overall accuracy was low and they were mainly used for single-target tracking. As the accuracy of target detection continued to improve, tracking-by-detection frameworks appeared; SORT, DeepSORT, ByteTrack, and BoT-SORT are typical algorithms.
The tracking-by-detection pipeline is mainly divided into a detection part and a matching part. Because the framework is flexible, the detection part can be updated continuously as target detection algorithms develop, while the accuracy of the subsequent matching process depends on the detection results. SORT is the classic early detection-based tracking algorithm. The core of its matching process is the combination of Kalman filtering and the Hungarian algorithm: the position in the current frame of each detection target frame from the previous frame is first predicted by Kalman filtering, and the Hungarian algorithm then computes an optimal matching between the current detection target frames and the predicted target frames. DeepSORT is an improved version of SORT: it adds apparent features extracted from the target frames by a CNN, increases robustness to occlusion, and proposes a cascade matching strategy that combines motion information and appearance information as the association metric between real and predicted target frames. ByteTrack proposes a new data association method. Unlike SORT, it also considers the low-confidence detection frames produced by occlusion: using the similarity between detection frames and tracks, it keeps the high-score detections while separating real objects from background among the low-score results, which improves trajectory consistency. High-score frames are first matched against the existing tracks, low-score frames are then matched against the tracks left unmatched in the first round, and new tracks are created only from high-score detection frames. BoT-SORT is a more robust tracker whose overall matching flow is largely consistent with ByteTrack, but it adds a ReID module and a camera motion compensation method.
Existing multi-target tracking schemes such as SORT and ByteTrack use only position information for matching, so accurate matching presumes that the background changes little between consecutive frames and that targets move relatively slowly. If this premise is not met, the matching results suffer from severe ID switches (identity switches), and even short-term continuous tracking of a target may be impossible; when the camera shakes and rotates, continuous tracking by position information alone is nearly hopeless. DeepSORT and BoT-SORT add ReID (cross-camera re-identification) to re-identify objects under the monitoring lens, which alleviates ID switches to some extent; but when a target is occluded for a long time or the camera rotates through a large angle, the position information used in matching no longer helps and the same target cannot be re-associated.
Disclosure of Invention
In order to solve the above technical problems, the application provides a multi-target long-term tracking method for an unmanned aerial vehicle.
The first aspect of the application discloses a multi-target long-term tracking method for an unmanned aerial vehicle; in the method, a plurality of targets acquired by the unmanned aerial vehicle are positioned, tracked, and monitored based on a tracking system, wherein the tracking system comprises a detection module, a cross-camera tracking module, and a matching module;
the method comprises the following steps:
s1, the detection module identifies target detection frame images and target relative position information of all monitoring targets from environment image information to obtain a target detection frame image sequence and a target relative position information sequence;
s2, the cross-mirror tracking module extracts multidimensional apparent features of each target detection frame image;
s3, for each frame in the target detection frame image sequence, after the matching module carries out anti-shake operation on the frame, the frame is divided into a high-resolution target frame and a low-resolution target frame according to the confidence level, a fusion loss matrix is obtained based on the multi-dimensional apparent characteristics and the target relative position information, and then the scored target frame is subjected to track matching through a Hungary algorithm, so that target tracking is completed.
According to the method of the first aspect of the present application, in the step S3, the step of performing the anti-shake operation includes:
and acquiring a homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame, and then obtaining the perspective-transformed current detection target frames as the product of the previous frame's target detection frames and the homography transformation matrix.
According to the method of the first aspect of the present application, the step of obtaining the homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame includes:
respectively extracting the feature points of the target detection frame image of the current frame and of the previous frame using the ORB operator, and computing the corresponding descriptor of each feature point;
matching the descriptors of the feature points to obtain matched feature point pairs;
filtering abnormal matches from the feature point pairs to obtain the correct feature point pairs;
and selecting the 8 best feature point pairs based on a random sample consensus algorithm, and calculating the homography transformation matrix from these 8 feature point pairs.
According to the method of the first aspect of the present application, in the step S3, the step of obtaining a fusion loss matrix based on the multi-dimensional apparent features and the target relative position information includes:
predicting, by Kalman filtering, the position in the current frame of the last target frame of each track in the track pool, to obtain the predicted target frames;
modeling the current detection target frame and the prediction target frame into 2D Gaussian distribution based on the target relative position information;
calculating the distance between the current detection target frame and Gaussian distribution corresponding to the prediction target frame;
the calculation formula of the distance between the Gaussian distributions is as follows:

$$W_2^2(\mathcal{N}_a,\mathcal{N}_b)=\left\|\left(cx_a,\,cy_a,\,\tfrac{w_a}{2},\,\tfrac{h_a}{2}\right)^{\mathrm{T}}-\left(cx_b,\,cy_b,\,\tfrac{w_b}{2},\,\tfrac{h_b}{2}\right)^{\mathrm{T}}\right\|_2^2$$

wherein $W_2^2(\mathcal{N}_a,\mathcal{N}_b)$ represents the distance between the Gaussian distributions of the current detection target frame a and the predicted target frame b, $\mathcal{N}_a$ represents the Gaussian distribution of the current detection target frame a, $\mathcal{N}_b$ represents the Gaussian distribution of the predicted target frame b, $(cx_a, cy_a)$ represents the center-point coordinates of the current detection target frame a, $w_a$ represents the width of the current detection target frame a, $h_a$ represents the height of the current detection target frame a, $(cx_b, cy_b)$ represents the center-point coordinates of the predicted target frame b, $w_b$ represents the width of the predicted target frame b, and $h_b$ represents the height of the predicted target frame b;
normalizing the distance between the Gaussian distributions to obtain the position similarity between the current detection target frame and the prediction target frame;
the calculation formula of the position similarity is as follows:

$$\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a,\mathcal{N}_b)}}{C}\right)$$

wherein $\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$ represents the position similarity between the current detection target frame a and the predicted target frame b, and C represents a dataset-dependent constant;
calculating an NWD loss matrix according to the position similarity;
the NWD loss matrix formula is:

$$L^{\mathrm{NWD}}_{a,b}=1-\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$$
calculating the similarity between the multi-dimensional apparent features corresponding to the current detection target frame and the prediction target frame by adopting cos similarity;
the calculation formula of the similarity between the multi-dimensional apparent features is as follows:

$$\cos(f_i,f_j)=\frac{\sum_{k=1}^{n} f_i^{(k)}\,f_j^{(k)}}{\sqrt{\sum_{k=1}^{n}\bigl(f_i^{(k)}\bigr)^2}\,\sqrt{\sum_{k=1}^{n}\bigl(f_j^{(k)}\bigr)^2}}$$

wherein $f_i$ represents the apparent feature of the i-th target frame, $f_j$ represents the apparent feature of the j-th target frame, $f_i^{(k)}$ represents the value of the k-th dimension of the i-th target, $f_j^{(k)}$ represents the value of the k-th dimension of the j-th target, and n represents the dimension of the apparent feature;
normalizing the similarity between the multidimensional apparent features to obtain an apparent feature loss matrix;
the apparent feature loss matrix is:

$$L^{\mathrm{app}}_{i,j}=1-\cos(f_i,f_j)$$
obtaining a fusion loss matrix according to the NWD loss matrix and the apparent characteristic loss matrix;
the fusion loss matrix is as follows:

$$C^{\mathrm{fuse}}_{a,b}=\begin{cases}\min\bigl(L^{\mathrm{NWD}}_{a,b},\,L^{\mathrm{app}}_{a,b}\bigr), & L^{\mathrm{NWD}}_{a,b}<\theta_{\mathrm{pos}}\ \text{and}\ L^{\mathrm{app}}_{a,b}<\theta_{\mathrm{app}}\\ 1, & \text{otherwise}\end{cases}$$

wherein $\theta_{\mathrm{pos}}$ represents the threshold on the position distance, currently taking the value 0.3, and $\theta_{\mathrm{app}}$ represents the threshold on the apparent feature.
According to the method of the first aspect of the application, the track pool matched against the high-score target frames consists of the tracks that can currently be associated, and the track pool matched against the low-score target frames consists of the tracks that were not successfully matched against the high-score target frames.
According to the method of the first aspect of the present application, step S3 further comprises:
and calculating a cosine loss matrix using the apparent features of the unmatched high-score targets and the apparent features of the targets in the deleted tracks, and performing association matching with the Hungarian algorithm based on the cosine loss matrix.
According to the method of the first aspect of the application, the tracks successfully matched are updated;
the updating is specifically performed by the following formula:

$$\mathrm{new\_fea}=\alpha\cdot\mathrm{fea}+(1-\alpha)\cdot\mathrm{cur\_fea}$$

wherein new_fea represents the updated apparent feature, $\alpha$ represents the weight parameter, currently taking the value 0.9, fea represents the apparent feature of the target in the previous frame, and cur_fea represents the apparent feature of the current target.
According to the method of the first aspect of the application, step S2 comprises:
performing image preprocessing on each target detection frame image;
extracting the feature map [1,2048,16,8] of each preprocessed target detection frame image using ResNet50 as the backbone network;
and using an aggregation network to perform dimensionality reduction and normalization on the feature map [1,2048,16,8] to obtain the multi-dimensional apparent feature [1,2048,1,1].
The second aspect of the application discloses an electronic device. The electronic device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps in a multi-target long-term tracking method for a drone of any one of the first aspects of the present disclosure.
A third aspect of the application discloses a computer-readable storage medium. A computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps in a multi-target long-term tracking method for a drone of any one of the first aspects of the present disclosure.
In summary, the scheme provided by the application has the following technical effects: by integrating a camera anti-shake strategy into the matching process, the application solves the problem of erroneous tracking when the unmanned aerial vehicle shakes under the influence of weather; by introducing the ReID model and adding a matching post-processing strategy, it solves the problem that the same person is assigned a new ID after disappearing and reappearing while the camera rotates; it thereby achieves long-term, stable multi-target tracking from the unmanned aerial vehicle's viewpoint.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a multi-target long-term tracking method for an unmanned aerial vehicle according to an embodiment of the present application;
fig. 2 is a block diagram of a multi-target long-term tracking system for a drone according to an embodiment of the present application;
fig. 3 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
With the development of unmanned aerial vehicle technology, UAV application scenarios in the military field are becoming more and more common; target tracking is a basic capability for UAV use, and long-term stable tracking of targets is one of the research difficulties. Multi-target tracking from the UAV viewpoint differs from target tracking at other viewpoints mainly as follows: 1) the targets are mostly small targets, generally occupying only a small number of pixels; 2) the camera is severely affected by weather, and its stability is poor in adverse weather such as strong wind; 3) when the camera rotates, the same target is more likely to undergo an ID Switch (identity switch) when it reappears after disappearing. These factors are the main reasons a target cannot be tracked stably for a long period. Therefore, in order to achieve long-term stable target tracking for UAVs, we propose a multi-target long-term trajectory tracking method. The method provided by the application can solve the above technical problems.
Referring to fig. 1, the first aspect of the present application discloses a multi-target long-term tracking method for an unmanned aerial vehicle; the method positions, tracks, and monitors the various targets acquired by the unmanned aerial vehicle based on a tracking system. Referring to fig. 2, the tracking system includes a detection module, a cross-camera tracking module, and a matching module;
the method comprises the following steps:
s1, the detection module identifies target detection frame images and target relative position information of all monitoring targets from environment image information to obtain a target detection frame image sequence and a target relative position information sequence;
the detection module is an early basis of a tracking algorithm, the tracked input is the output of the detection module, and the accuracy of the detection module also has a certain influence on the tracking effect. The current detection model uses a YOLOX algorithm, and the input end uses technologies of more mainstream including mosaics, mixUp, random overturn and the like, and the backbone network is a Darknet53 model, and an SPP layer is added on the basis. The training data set is formed by combining self-labeling data and open source data in a mixed mode, wherein the self-labeling data comprises 10w+ images, the open source data extracts images of people from COCO data, and 5000 images of a COCO2017 test set are used for verification.
Assume that $I_k$ represents the k-th image and the detection module detects N targets in the k-th frame; the output result is represented as

$$D_k=\{d_k^1, d_k^2, \ldots, d_k^N\},\qquad d_k^i=(x^i,\,y^i,\,\mathrm{width}^i,\,\mathrm{height}^i)$$

wherein $(x, y)$ represents the center point of the target frame, width represents the width of the target frame, and height represents the height of the target frame.
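As a minimal sketch of this output convention (the helper name and the YOLOX-style (x1, y1, x2, y2, conf) input layout are assumptions for illustration, not part of the patent):

```python
import numpy as np

def to_cxcywh(dets_xyxy: np.ndarray) -> np.ndarray:
    """Convert detector rows (x1, y1, x2, y2, conf) into the
    (x, y, width, height) center form used for d_k^i above."""
    x1, y1, x2, y2, conf = dets_xyxy.T
    return np.stack([(x1 + x2) / 2.0,   # center x
                     (y1 + y2) / 2.0,   # center y
                     x2 - x1,           # width
                     y2 - y1,           # height
                     conf], axis=1)

# Example: the k-th frame with N = 2 detections
D_k = to_cxcywh(np.array([[10., 20., 30., 60., 0.9],
                          [40., 40., 50., 55., 0.4]]))
```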
S2, the cross-camera tracking module extracts the multi-dimensional apparent features of each target detection frame image;
the cross-mirror tracking (ReID) module is mainly divided into image preprocessing, a backbone network, an aggregation network, a Head layer and a Loss, and the sub-method of each layer of module has various implementation modes, so that the application is not particularly limited. The step S2 comprises the following steps:
performing image preprocessing on each target detection frame image;
extracting the feature map [1,2048,16,8] of each preprocessed target detection frame image using ResNet50 as the backbone network;
and using an aggregation network to perform dimensionality reduction and normalization on the feature map [1,2048,16,8] to obtain the multi-dimensional apparent feature [1,2048,1,1].
The input to the ReID module is the set of detected target image crops, which can be expressed as $\{I_k^1, I_k^2, \ldots, I_k^N\}$.
Image preprocessing is one of the common enhancement strategies in deep learning. The preprocessing here mainly includes resizing, random flipping, random erasing, random patch, and Cutout; cleaning, standardizing, and augmenting the data strengthens the robustness and generalization ability of the model.
The input image size affects the size of the feature map, and a larger image lets the model learn clearer high-dimensional features. The images are uniformly resized to 384×128, a fixed size, and the target features are unified to 2048 dimensions, which corresponds to the ReID model backbone using ResNet50.
The pictures are randomly flipped, including random horizontal flipping and random vertical flipping. Random erasing removes a rectangular block within a predefined size range from the original images in different training epochs: a rectangular block whose height and width are between 1/6 and 1/2 of the original image is randomly generated in the input image and erased, i.e., the RGB values within the random rectangular block are set to random values.
The random patch operation is somewhat similar to random erasing, except that the RGB values of the random rectangular block are taken from other images. The Cutout operation sets the removed region to 0.
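A sketch of this preprocessing chain with torchvision (the probabilities are assumptions; the erased-area fraction below derives from the 1/6–1/2 height/width range given above, i.e. between 1/36 and 1/4 of the image area; the zero-filled second erase approximates Cutout):

```python
import torchvision.transforms as T

# ReID preprocessing: 384x128 resize, random flips, random-value erasing
# (RGB set to random values), and a zero-filled erase approximating Cutout.
reid_preprocess = T.Compose([
    T.Resize((384, 128)),
    T.RandomHorizontalFlip(p=0.5),
    T.RandomVerticalFlip(p=0.5),
    T.ToTensor(),
    T.RandomErasing(p=0.5, scale=(1 / 36, 1 / 4), value="random"),
    T.RandomErasing(p=0.2, scale=(1 / 36, 1 / 4), value=0),
])
```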
ReID model backbone network: the backbone network uses the classic ResNet50. Because ResNet50 incorporates residual structures, gradients do not vanish as the network structure is deepened. The network comprises 4 blocks; to reduce the feature dimension, lower the model's computational cost, and increase its nonlinear expressive power, each block contains a different number of bottleneck structures, and each bottleneck consists of a 1×1 convolution, followed by a 3×3 convolution, and finally a 1×1 convolution, retaining enough representational power to capture the detailed information of an image.
To further improve the diversity and robustness of the features, an aggregation network is added between the backbone network and the head layer; the feature map [1,2048,16,8] generated by ResNet50 is output as [1,2048,1,1] after the pooling layer.
The head layer takes the relatively high-dimensional feature, which needs dimensionality reduction and normalization, and comprises a BN layer and a decision layer.
The loss layer uses a cross-entropy loss function, a loss function commonly used for classification problems; the ReID model treats each target identity as a class and outputs the class probability of the target.
To increase the robustness of the model, self-collected and multi-source data are integrated: the training datasets come from Market-1501, DukeMTMC, MSMT17, and self-labeled data, and the number of classes reaches 10,903. The self-labeled data were annotated by a professional labeling team from video captured by unmanned aerial vehicles in real scenes, with pedestrians appearing in fewer than 100 frames deleted and filtered out.
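A sketch of the loss layer described above, treating each of the 10,903 identities as a class (the plain linear decision layer over the 2048-dimensional feature is an assumption consistent with the head layer description):

```python
import torch

num_ids = 10903                          # identity classes in the mixed training set
head = torch.nn.Linear(2048, num_ids)    # decision layer over the 2048-d feature
criterion = torch.nn.CrossEntropyLoss()  # per-identity classification loss

feats = torch.randn(8, 2048)              # a batch of aggregated features
labels = torch.randint(0, num_ids, (8,))  # identity labels
loss = criterion(head(feats), labels)
```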
ReID produces a 2048-dimensional apparent feature for each target, and the apparent features of all targets in the k-th frame can be expressed as

$$F_k=\{f_k^1, f_k^2, \ldots, f_k^N\},\qquad f_k^i\in\mathbb{R}^{2048}.$$
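A minimal sketch of the backbone-plus-aggregation path (ImageNet weights stand in for the patent's own training; note the stock torchvision ResNet50 yields a [N, 2048, 12, 4] map for a 384×128 crop, so the [1,2048,16,8] shape above presumably reflects the patent's own backbone configuration; the final L2 normalization is an assumption that matches the cosine matching used later):

```python
import torch
import torch.nn.functional as F
import torchvision

class ReIDExtractor(torch.nn.Module):
    """ResNet50 backbone -> spatial feature map -> pooled 2048-d feature."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
        # keep all stages up to (and excluding) the classifier's avgpool/fc
        self.body = torch.nn.Sequential(*list(backbone.children())[:-2])
        self.pool = torch.nn.AdaptiveAvgPool2d(1)   # aggregation to 1x1

    def forward(self, crops: torch.Tensor) -> torch.Tensor:
        fmap = self.body(crops)            # [N, 2048, 12, 4] for 384x128 crops
        feat = self.pool(fmap).flatten(1)  # [N, 2048]
        return F.normalize(feat, dim=1)    # unit norm for cosine matching

extractor = ReIDExtractor().eval()
with torch.no_grad():
    F_k = extractor(torch.randn(4, 3, 384, 128))  # four 384x128 target crops
```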
S3, for each frame in the target detection frame image sequence, the matching module performs an anti-shake operation on the frame, divides the detections into high-score target frames and low-score target frames according to confidence, obtains a fusion loss matrix based on the multi-dimensional apparent features and the target relative position information, and then performs trajectory matching on the scored target frames through the Hungarian algorithm to complete target tracking.
In the step S3, the step of performing the anti-shake operation includes:
and acquiring a homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame, and then obtaining the perspective-transformed current detection target frames as the product of the previous frame's target detection frames and the homography transformation matrix.
According to the method of the first aspect of the present application, the step of obtaining the homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame includes:
respectively extracting the feature points of the target detection frame image of the current frame and of the previous frame using the ORB operator, and computing the corresponding descriptor of each feature point;
matching the descriptors of the feature points to obtain matched feature point pairs;
filtering abnormal matches from the feature point pairs to obtain the correct feature point pairs;
and selecting the 8 best feature point pairs based on a random sample consensus algorithm, and calculating the homography transformation matrix from these 8 feature point pairs.
When the unmanned aerial vehicle is in the air, it is easily affected by weather; when the wind is strong, lens shake is severe, which causes IDs to be matched incorrectly. To reduce ID switches from this cause, an anti-shake procedure is added. The core of the whole anti-shake step is to find 8 correctly matched feature points between the current frame and the previous frame, for which a parallelism check is added to the matching. The feature descriptors and feature points of the previous frame's image are denoted $des_{k-1}$ and $kp_{k-1}$ respectively, and those of the current frame's image are denoted $des_k$ and $kp_k$.
To reduce the possibility of incorrect matches, abnormal-match filtering is performed by computing the parallelism between the virtual line segments connecting matched feature points: most feature points are matched correctly, so these segments are largely parallel, and for the few incorrect matches the parallel relation disappears. An angle matrix between the matched point pairs is therefore computed; when a point pair is parallel with fewer than 50% of all other point pairs, that point pair is deleted.
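A sketch of the anti-shake step with OpenCV; RANSAC here plays the role of selecting a consistent subset of the matched pairs, and the parallelism pre-filter described above is omitted for brevity (the ORB feature count and reprojection threshold are assumptions):

```python
import cv2
import numpy as np

def estimate_homography(prev_gray: np.ndarray, cur_gray: np.ndarray):
    """ORB keypoints + Hamming brute-force matching + RANSAC homography."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    if des1 is None or des2 is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 8:                   # need at least 8 pairs, as above
        return None
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H

def warp_centers(boxes_cxcywh: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map previous-frame box centers into current-frame coordinates."""
    pts = boxes_cxcywh[:, :2].reshape(-1, 1, 2).astype(np.float32)
    out = boxes_cxcywh.copy()
    out[:, :2] = cv2.perspectiveTransform(pts, H).reshape(-1, 2)
    return out
```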
According to the method of the first aspect of the present application, in the step S3, the step of obtaining a fusion loss matrix based on the multi-dimensional apparent features and the target relative position information includes:
predicting, by Kalman filtering, the position in the current frame of the last target frame of each track in the track pool, to obtain the predicted target frames;
modeling the current detection target frame and the prediction target frame into 2D Gaussian distribution based on the target relative position information;
for small objects there will always be some background pixels in the object box, as it is not possible for a real object to be exactly a rectangle. In the target frame, foreground pixels are typically centered and background pixels are typically centered on the edges, with the importance of the pixels decreasing from center to boundary. To better weight each pixel in the target frame, the target frame may be modeled as a 2D gaussian distribution. Specifically, for a horizontal target box, the inscribed ellipse can be expressed as:
$$\frac{(x-\mu_x)^2}{\sigma_x^2}+\frac{(y-\mu_y)^2}{\sigma_y^2}=1$$

wherein $(\mu_x,\mu_y)$ is the center point of the ellipse and $\sigma_x,\sigma_y$ are its radii along the x and y axes; they correspond to the target frame by $\mu_x=cx$, $\mu_y=cy$, $\sigma_x=w/2$, $\sigma_y=h/2$, where $(cx,cy)$ represents the center point of the target frame, $w$ represents the width of the target frame, and $h$ represents the target frame height.
This ellipse is a density contour of a 2D Gaussian distribution. Thus, the target frame may be modeled as a 2D Gaussian distribution, and the similarity between two target frames can be represented by the distance between the two Gaussian distributions.
Calculating the distance between the current detection target frame and Gaussian distribution corresponding to the prediction target frame;
the calculation formula of the distance between the Gaussian distributions is as follows:

$$W_2^2(\mathcal{N}_a,\mathcal{N}_b)=\left\|\left(cx_a,\,cy_a,\,\tfrac{w_a}{2},\,\tfrac{h_a}{2}\right)^{\mathrm{T}}-\left(cx_b,\,cy_b,\,\tfrac{w_b}{2},\,\tfrac{h_b}{2}\right)^{\mathrm{T}}\right\|_2^2$$

wherein $W_2^2(\mathcal{N}_a,\mathcal{N}_b)$ represents the distance between the Gaussian distributions of the current detection target frame a and the predicted target frame b, $\mathcal{N}_a$ represents the Gaussian distribution of the current detection target frame a, $\mathcal{N}_b$ represents the Gaussian distribution of the predicted target frame b, $(cx_a, cy_a)$ represents the center-point coordinates of the current detection target frame a, $w_a$ represents the width of the current detection target frame a, $h_a$ represents the height of the current detection target frame a, $(cx_b, cy_b)$ represents the center-point coordinates of the predicted target frame b, $w_b$ represents the width of the predicted target frame b, and $h_b$ represents the height of the predicted target frame b;
normalizing the distance between the Gaussian distributions to obtain the position similarity between the current detection target frame and the prediction target frame;
the calculation formula of the position similarity is as follows:

$$\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a,\mathcal{N}_b)}}{C}\right)$$

wherein $\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$ represents the position similarity between the current detection target frame a and the predicted target frame b, and C represents a dataset-dependent constant;
calculating an NWD loss matrix according to the position similarity;
the NWD loss matrix formula is:

$$L^{\mathrm{NWD}}_{a,b}=1-\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$$
calculating the similarity between the multi-dimensional apparent features corresponding to the current detection target frame and the prediction target frame by adopting cos similarity;
the calculation formula of the similarity between the multi-dimensional apparent features is as follows:

$$\cos(f_i,f_j)=\frac{\sum_{k=1}^{n} f_i^{(k)}\,f_j^{(k)}}{\sqrt{\sum_{k=1}^{n}\bigl(f_i^{(k)}\bigr)^2}\,\sqrt{\sum_{k=1}^{n}\bigl(f_j^{(k)}\bigr)^2}}$$

wherein $f_i$ represents the apparent feature of the i-th target frame, $f_j$ represents the apparent feature of the j-th target frame, $f_i^{(k)}$ represents the value of the k-th dimension of the i-th target, $f_j^{(k)}$ represents the value of the k-th dimension of the j-th target, and n represents the dimension of the apparent feature;
normalizing the similarity between the multidimensional apparent features to obtain an apparent feature loss matrix;
the apparent feature loss matrix is:

$$L^{\mathrm{app}}_{i,j}=1-\cos(f_i,f_j)$$
obtaining a fusion loss matrix according to the NWD loss matrix and the apparent characteristic loss matrix;
the fusion loss matrix is as follows:

$$C^{\mathrm{fuse}}_{a,b}=\begin{cases}\min\bigl(L^{\mathrm{NWD}}_{a,b},\,L^{\mathrm{app}}_{a,b}\bigr), & L^{\mathrm{NWD}}_{a,b}<\theta_{\mathrm{pos}}\ \text{and}\ L^{\mathrm{app}}_{a,b}<\theta_{\mathrm{app}}\\ 1, & \text{otherwise}\end{cases}$$

wherein $\theta_{\mathrm{pos}}$ represents the threshold on the position distance, currently taking the value 0.3, and $\theta_{\mathrm{app}}$ represents the threshold on the apparent feature.
The position loss for small targets in current tracking algorithms is based on an IoU metric, and IoU is very sensitive to positional changes of small targets: when the unmanned aerial vehicle shakes or the lens rotates, the position information contributes almost nothing to the matching process. Considering the scene requirements, the IoU position loss is not suitable for the unmanned aerial vehicle scenario, because even a small lens shake drives the overlap between targets to zero and the positional relation between targets can no longer be measured accurately. To reduce this influence, the Normalized Gaussian Wasserstein Distance is adopted to compute the similarity between the current frame's target frames and the previous frame's target frames.
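A numpy sketch of the NWD position loss and the gated fusion under the reconstruction above (the constant C is dataset-dependent, and the value 12.8 from the original NWD paper, the min-gate fusion form, and θ_app = 0.25 are assumptions; θ_pos = 0.3 comes from the text):

```python
import numpy as np

def nwd_loss(det: np.ndarray, pred: np.ndarray, C: float = 12.8) -> np.ndarray:
    """det: [N,4], pred: [M,4] boxes as (cx, cy, w, h). Each box is modeled
    as a 2D Gaussian N((cx, cy), diag(w^2/4, h^2/4)); returns 1 - NWD."""
    a = np.concatenate([det[:, :2], det[:, 2:4] / 2.0], axis=1)  # (cx, cy, w/2, h/2)
    b = np.concatenate([pred[:, :2], pred[:, 2:4] / 2.0], axis=1)
    w2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)          # squared W2 distance
    return 1.0 - np.exp(-np.sqrt(w2) / C)                        # NWD loss matrix

def fused_loss(nwd_l: np.ndarray, app_l: np.ndarray,
               theta_pos: float = 0.3, theta_app: float = 0.25) -> np.ndarray:
    """Gated fusion of position and appearance losses; pairs exceeding
    either threshold are forbidden (cost 1)."""
    cost = np.minimum(nwd_l, app_l)
    cost[(nwd_l >= theta_pos) | (app_l >= theta_app)] = 1.0
    return cost
```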
According to the first aspect of the application, the track pool matched against the high-score target frames consists of the tracks that can currently be associated, and the track pool matched against the low-score target frames consists of the tracks that were not successfully matched against the high-score target frames.
Specifically, high-score target matching:

The high-score matching process is shown in part 2-1 of FIG. 2. Target frames with detection confidence > 0.7 are defined as high-score target frames. A track is the series of target frames formed by the same target, and the track pool holds the tracks that can currently be associated, mainly activated tracks and lost tracks. The position in the current frame of each track's last target frame is predicted by Kalman filtering, yielding the predicted target frames. The fusion loss matrix between the current frame's high-score target frames and the predicted target frames is computed, and association matching is performed with the Hungarian algorithm on this loss matrix. When a target frame cannot be associated with any track in the track pool, i.e., the association fails, the track is defined as a lost track; when association fails for more than 30 consecutive frames, it is defined as a deleted track. The high-score matching outputs the matched high-score target frames and the unmatched high-score target frames, and the track-pool tracks are divided into matched tracks and unmatched tracks.
Low-score target matching: the low-score matching flow is shown in part 2-2 of FIG. 2. Target frames with detection confidence < 0.7 are defined as low-score target frames. Here the track pool consists of the tracks not successfully matched in the high-score stage. The position in the current frame of the last target frame of each such track is predicted by Kalman filtering, yielding the predicted target frames; the fusion loss matrix between the low-score target frames and these predicted target frames is computed, and association matching is performed with the Hungarian algorithm on this loss matrix. The low-score matching outputs the matched low-score target frames and the unmatched low-score target frames, and the current track-pool tracks are divided into matched tracks and unmatched tracks.
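A sketch of one association round with scipy's Hungarian solver; the surrounding track bookkeeping (activated/lost/deleted states, the 0.7 confidence split, the 30-frame deletion rule) follows the text, while the gating value is an assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost: np.ndarray, gate: float = 0.98):
    """One Hungarian round; pairs whose cost exceeds the gate stay unmatched."""
    if cost.size == 0:
        return [], list(range(cost.shape[0])), list(range(cost.shape[1]))
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
    used_r = {r for r, _ in matches}
    used_c = {c for _, c in matches}
    un_rows = [r for r in range(cost.shape[0]) if r not in used_r]
    un_cols = [c for c in range(cost.shape[1]) if c not in used_c]
    return matches, un_rows, un_cols

# Stage 1: high-score detections (conf > 0.7) vs. the full track pool.
# Stage 2: low-score detections vs. the tracks left unmatched by stage 1.
# Tracks unmatched for 30 consecutive frames move to the deleted state.
```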
Step S3 further comprises long-term vanished-target matching. The flow is shown in part 2-3 of FIG. 2; it retrieves targets among the unmatched high-score target frames after a long-term target loss. The specific steps comprise:
and calculating a cosine loss matrix using the apparent features of the unmatched high-score targets and the apparent features of the targets in the deleted tracks, and performing association matching with the Hungarian algorithm based on the cosine loss matrix.
The long-term vanished-target matching outputs matched high-score target frames and unmatched high-score target frames; each unmatched high-score target frame is assigned a new ID and regarded as a new target. The deleted tracks are divided into matched deleted tracks and unmatched deleted tracks, and the state of each matched deleted track is updated to an activated track.
According to the method of the first aspect of the application, the tracks successfully matched are updated;
the updating is specifically performed by the following formula:

$$\mathrm{new\_fea}=\alpha\cdot\mathrm{fea}+(1-\alpha)\cdot\mathrm{cur\_fea}$$

wherein new_fea represents the updated apparent feature, $\alpha$ represents the weight parameter, currently taking the value 0.9, fea represents the apparent feature of the target in the previous frame, and cur_fea represents the apparent feature of the current target.
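As a one-line sketch, the update above is a standard exponential moving average:

```python
def update_track_feature(fea, cur_fea, alpha: float = 0.9):
    """EMA update of a matched track's apparent feature (alpha = 0.9 per the text)."""
    return alpha * fea + (1.0 - alpha) * cur_fea
```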
The second aspect of the application discloses an electronic device. The electronic device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps in a multi-target long-term tracking method for a drone of any one of the first aspects of the present disclosure.
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 3, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the electronic device is used for conducting wired or wireless communication with an external terminal, and the wireless communication can be achieved through WIFI, an operator network, near Field Communication (NFC) or other technologies. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the electronic equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of a portion related to the technical solution of the present disclosure, and does not constitute a limitation of the electronic device to which the technical solution of the present disclosure is applied, and that a specific electronic device may include more or less components than those shown in the drawings, or may combine some components, or have different component arrangements.
A third aspect of the application discloses a computer-readable storage medium. A computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements steps of a method for multi-target long-term tracking of a drone of any one of the first aspects of the present disclosure.
In summary, the scheme provided by the application has the following technical effects: by integrating a camera anti-shake strategy into the matching process, the application solves the problem of erroneous tracking when the unmanned aerial vehicle shakes under the influence of weather; by introducing the ReID model and adding a matching post-processing strategy, it solves the problem that the same person is assigned a new ID after disappearing and reappearing while the camera rotates; it thereby achieves long-term, stable multi-target tracking from the unmanned aerial vehicle's viewpoint.
Note that the technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be regarded as the scope of the description. The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A multi-target long-term tracking method for an unmanned aerial vehicle, characterized in that
a plurality of targets acquired by the unmanned aerial vehicle are positioned, tracked, and monitored based on a tracking system, wherein the tracking system comprises a detection module, a cross-camera tracking module, and a matching module;
the method comprises the following steps:
s1, the detection module identifies target detection frame images and target relative position information of all monitoring targets from environment image information to obtain a target detection frame image sequence and a target relative position information sequence;
s2, the cross-mirror tracking module extracts multidimensional apparent features of each target detection frame image;
s3, for each frame in the target detection frame image sequence, after the matching module carries out anti-shake operation on the frame, the frame is divided into a high-resolution target frame and a low-resolution target frame according to the confidence level, a fusion loss matrix is obtained based on the multi-dimensional apparent characteristics and the target relative position information, and then track matching is carried out on the scored target frame through a Hungary algorithm, so that target tracking is completed.
2. The method according to claim 1, wherein in the step S3, the step of performing an anti-shake operation includes:
and acquiring a homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame, and then obtaining the perspective-transformed current detection target frames as the product of the previous frame's target detection frames and the homography transformation matrix.
3. The method for multi-target long-term tracking of an unmanned aerial vehicle according to claim 2, wherein the step of acquiring the homography transformation matrix based on the target detection frame image of the current frame and the target detection frame image of the previous frame comprises:
respectively extracting the feature points of the target detection frame image of the current frame and of the previous frame using the ORB operator, and computing the corresponding descriptor of each feature point;
matching the descriptors of the feature points to obtain matched feature point pairs;
filtering abnormal matches from the feature point pairs to obtain the correct feature point pairs;
and selecting the 8 best feature point pairs based on a random sample consensus algorithm, and calculating the homography transformation matrix from these 8 feature point pairs.
4. A method of multi-target long-term tracking of unmanned aerial vehicles according to claim 3, wherein in step S3, the step of obtaining a fusion loss matrix based on the multi-dimensional apparent features and the target relative position information comprises:
predicting, by Kalman filtering, the position in the current frame of the last target frame of each track in the track pool, to obtain the predicted target frames;
modeling the current detection target frame and the prediction target frame into 2D Gaussian distribution based on the target relative position information;
calculating the distance between the current detection target frame and Gaussian distribution corresponding to the prediction target frame;
the calculation formula of the distance between the Gaussian distributions is as follows:

$$W_2^2(\mathcal{N}_a,\mathcal{N}_b)=\left\|\left(cx_a,\,cy_a,\,\tfrac{w_a}{2},\,\tfrac{h_a}{2}\right)^{\mathrm{T}}-\left(cx_b,\,cy_b,\,\tfrac{w_b}{2},\,\tfrac{h_b}{2}\right)^{\mathrm{T}}\right\|_2^2$$

wherein $W_2^2(\mathcal{N}_a,\mathcal{N}_b)$ represents the distance between the Gaussian distributions of the current detection target frame a and the predicted target frame b, $\mathcal{N}_a$ represents the Gaussian distribution of the current detection target frame a, $\mathcal{N}_b$ represents the Gaussian distribution of the predicted target frame b, $(cx_a, cy_a)$ represents the center-point coordinates of the current detection target frame a, $w_a$ represents the width of the current detection target frame a, $h_a$ represents the height of the current detection target frame a, $(cx_b, cy_b)$ represents the center-point coordinates of the predicted target frame b, $w_b$ represents the width of the predicted target frame b, and $h_b$ represents the height of the predicted target frame b;
normalizing the distance between the Gaussian distributions to obtain the position similarity between the current detection target frame and the prediction target frame;
the calculation formula of the position similarity is as follows:

$$\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)=\exp\!\left(-\frac{\sqrt{W_2^2(\mathcal{N}_a,\mathcal{N}_b)}}{C}\right)$$

wherein $\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$ represents the position similarity between the current detection target frame a and the predicted target frame b, and C represents a dataset-dependent constant;
calculating an NWD loss matrix according to the position similarity;
the NWD loss matrix formula is:

$$L^{\mathrm{NWD}}_{a,b}=1-\mathrm{NWD}(\mathcal{N}_a,\mathcal{N}_b)$$
calculating the similarity between the multi-dimensional apparent features corresponding to the current detection target frame and the prediction target frame by adopting cos similarity;
the calculation formula of the similarity between the multi-dimensional apparent features is as follows:

$$\cos(f_i,f_j)=\frac{\sum_{k=1}^{n} f_i^{(k)}\,f_j^{(k)}}{\sqrt{\sum_{k=1}^{n}\bigl(f_i^{(k)}\bigr)^2}\,\sqrt{\sum_{k=1}^{n}\bigl(f_j^{(k)}\bigr)^2}}$$

wherein $f_i$ represents the apparent feature of the i-th target frame, $f_j$ represents the apparent feature of the j-th target frame, $f_i^{(k)}$ represents the value of the k-th dimension of the i-th target, $f_j^{(k)}$ represents the value of the k-th dimension of the j-th target, and n represents the dimension of the apparent feature;
normalizing the similarity between the multidimensional apparent features to obtain an apparent feature loss matrix;
the apparent feature loss matrix is:

$$L^{\mathrm{app}}_{i,j}=1-\cos(f_i,f_j)$$
obtaining a fusion loss matrix according to the NWD loss matrix and the apparent characteristic loss matrix;
the fusion loss matrix is as follows:

$$C^{\mathrm{fuse}}_{a,b}=\begin{cases}\min\bigl(L^{\mathrm{NWD}}_{a,b},\,L^{\mathrm{app}}_{a,b}\bigr), & L^{\mathrm{NWD}}_{a,b}<\theta_{\mathrm{pos}}\ \text{and}\ L^{\mathrm{app}}_{a,b}<\theta_{\mathrm{app}}\\ 1, & \text{otherwise}\end{cases}$$

wherein $\theta_{\mathrm{pos}}$ represents the threshold on the position distance, currently taking the value 0.3, and $\theta_{\mathrm{app}}$ represents the threshold on the apparent feature.
5. The method of claim 4, wherein the track pool matched against the high-score target frames consists of the tracks that can currently be associated, and the track pool matched against the low-score target frames consists of the tracks that were not successfully matched against the high-score target frames.
6. The multi-target long-term tracking method for an unmanned aerial vehicle according to claim 5, further comprising, after step S3:
and calculating a cosine loss matrix using the apparent features of the unmatched high-score targets and the apparent features of the targets in the deleted tracks, and performing association matching with the Hungarian algorithm based on the cosine loss matrix.
7. The unmanned aerial vehicle multi-target long-term tracking method of claim 6, wherein the tracks that are successfully matched are updated;
the updating is specifically performed by the following formula:

$$\mathrm{new\_fea}=\alpha\cdot\mathrm{fea}+(1-\alpha)\cdot\mathrm{cur\_fea}$$

wherein new_fea represents the updated apparent feature, $\alpha$ represents the weight parameter, currently taking the value 0.9, fea represents the apparent feature of the target in the previous frame, and cur_fea represents the apparent feature of the current target.
8. The unmanned aerial vehicle multi-target long-term tracking method according to claim 1, wherein step S2 comprises:
performing image preprocessing on each target detection frame image;
extracting the feature map [1,2048,16,8] of each preprocessed target detection frame image using ResNet50 as the backbone network;
and using an aggregation network to perform dimensionality reduction and normalization on the feature map [1,2048,16,8] to obtain the multi-dimensional apparent feature [1,2048,1,1].
9. An electronic device comprising a memory storing a computer program and a processor implementing the steps of a multi-target long-term tracking method of an unmanned aerial vehicle according to any one of claims 1 to 8 when the computer program is executed by the processor.
10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of a multi-target long-term tracking method of a drone according to any of claims 1 to 8.
CN202311070990.1A 2023-08-24 2023-08-24 Multi-target long-term tracking method for unmanned aerial vehicle Pending CN117152206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311070990.1A CN117152206A (en) 2023-08-24 2023-08-24 Multi-target long-term tracking method for unmanned aerial vehicle

Publications (1)

Publication Number Publication Date
CN117152206A true CN117152206A (en) 2023-12-01

Family

ID=88905484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311070990.1A Pending CN117152206A (en) 2023-08-24 2023-08-24 Multi-target long-term tracking method for unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN117152206A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117495915A (en) * 2023-12-29 2024-02-02 图灵人工智能研究院(南京)有限公司 Multi-target tracking method and system
CN117495915B (en) * 2023-12-29 2024-04-02 图灵人工智能研究院(南京)有限公司 Multi-target tracking method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination