CN114757974A - Trajectory tracking method and system for multi-rotor unmanned aerial vehicle - Google Patents

Trajectory tracking method and system for multi-rotor unmanned aerial vehicle

Info

Publication number
CN114757974A
CN114757974A CN202210445628.7A
Authority
CN
China
Prior art keywords
image
unmanned aerial
frame
layer
rotor unmanned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210445628.7A
Other languages
Chinese (zh)
Inventor
蔡浩
覃大创
许建龙
熊智
朱长盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shantou University
Original Assignee
Shantou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shantou University filed Critical Shantou University
Priority to CN202210445628.7A priority Critical patent/CN114757974A/en
Publication of CN114757974A publication Critical patent/CN114757974A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a trajectory tracking method and system for a multi-rotor unmanned aerial vehicle, wherein the method comprises the following steps: acquiring video images of a plurality of multi-rotor unmanned aerial vehicles while they execute flight missions; performing target identification detection on each frame image contained in the video images, and acquiring the position detection results and the classes of all multi-rotor unmanned aerial vehicles contained in each frame image; performing appearance feature extraction on each frame image according to the position detection results of all multi-rotor unmanned aerial vehicles contained in each frame image, so as to obtain the appearance feature detection results of all multi-rotor unmanned aerial vehicles contained in each frame image; and applying the appearance feature detection results of all multi-rotor unmanned aerial vehicles contained in each frame image to a Deepsort algorithm, and matching the motion trajectories of the plurality of multi-rotor unmanned aerial vehicles frame by frame according to the position detection results of all multi-rotor unmanned aerial vehicles contained in each frame image. The invention can improve the motion trajectory tracking effect for a plurality of multi-rotor unmanned aerial vehicles.

Description

Trajectory tracking method and system for multi-rotor unmanned aerial vehicle
Technical Field
The invention relates to the technical field of unmanned aerial vehicle application, in particular to a trajectory tracking method and system for a multi-rotor unmanned aerial vehicle.
Background
Traditional unmanned aerial vehicle aerial monitoring systems rely mainly on radar detection. However, multi-rotor unmanned aerial vehicles have a small radar detection area, so missed detections frequently occur when radar is used to determine the three-dimensional position of an unmanned aerial vehicle and capture its flight path. Acoustic detection and radio-frequency detection were later proposed for unmanned aerial vehicle position detection, but both detection modes require a shared frequency band, which increases the probability of false detection. Therefore, how to reduce the probability of false position detection when a plurality of multi-rotor unmanned aerial vehicles execute flight missions, and how to realize efficient trajectory tracking, are the technical problems to be solved by the invention.
Disclosure of Invention
The invention provides a trajectory tracking method and a trajectory tracking system for a multi-rotor unmanned aerial vehicle, which are used for solving one or more technical problems existing in the prior art and at least provide a beneficial alternative or create favorable conditions therefor.
The embodiment of the invention provides a trajectory tracking method of a multi-rotor unmanned aerial vehicle, which comprises the following steps:
acquiring video images of a plurality of multi-rotor unmanned aerial vehicles when the multi-rotor unmanned aerial vehicles execute flight tasks;
performing target identification detection on each frame of image contained in the video image, and acquiring position detection results of all multi-rotor unmanned aerial vehicles contained in each frame of image and the types of the unmanned aerial vehicles;
Performing appearance feature extraction on each frame of image according to the position detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image to obtain the appearance feature detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image;
and applying the appearance characteristic detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image to a Deepsort algorithm, and performing frame-by-frame matching on the motion tracks of the plurality of multi-rotor unmanned aerial vehicles according to the position detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image.
Further, the performing target identification detection on each frame of image included in the video image, and acquiring the position detection results of all multi-rotor drones included in each frame of image and the classes of the drones includes:
performing main feature extraction on each frame of image contained in the video image by using a first convolutional neural network built in advance to obtain low-level feature data and high-level feature data contained in each frame of image;
performing feature fusion on the low-level feature data and the high-level feature data contained in each frame of image by using a UAV-FPN neural network to obtain fusion feature data contained in each frame of image;
And performing feature conversion on the fused feature data contained in each frame of image by using a YOLOHead prediction network to obtain the position detection results of all multi-rotor unmanned aerial vehicles contained in each frame of image and the types of the unmanned aerial vehicles.
Further, the first convolutional neural network comprises an input processing module, a low-layer feature extraction module and a high-layer feature extraction module which are connected in sequence; the input processing module is used for extracting down-sampling feature data from any frame of image, the low-layer feature extraction module is used for extracting low-layer feature data from the down-sampling feature data, and the high-layer feature extraction module is used for further extracting high-layer feature data from the low-layer feature data.
Further, the input processing module comprises an input layer and a Focus structural layer which are sequentially connected, the low-layer feature extraction module comprises a first BaseBlock, a second BaseBlock, a first residual convolution layer, a third BaseBlock, a second residual convolution layer, a fourth BaseBlock, a first spatial pyramid pooling layer and a third residual convolution layer which are sequentially connected, and the high-layer feature extraction module comprises a fifth BaseBlock, a second spatial pyramid pooling layer and a fourth residual convolution layer which are sequentially connected.
Further, any BaseBlock includes an input layer, a two-dimensional convolutional layer, a max-pooling layer, and an active layer, which are connected in sequence.
Further, the performing appearance feature extraction on each frame of image according to the position detection results of all the multi-rotor unmanned aerial vehicles included in each frame of image to obtain the appearance feature detection results of all the multi-rotor unmanned aerial vehicles included in each frame of image includes:
according to the position detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image, intercepting the area image of each multi-rotor unmanned aerial vehicle from each frame of image;
and carrying out feature extraction on the image of the region where each multi-rotor unmanned aerial vehicle is located by utilizing a pre-built second convolution neural network to obtain the appearance feature vector of each multi-rotor unmanned aerial vehicle.
Further, the second convolutional neural network comprises an input layer, a convolutional layer, an average pooling layer and a normalization layer which are connected in sequence; the convolution layer is used for extracting global appearance characteristic data of each multi-rotor unmanned aerial vehicle from the image of the area where the multi-rotor unmanned aerial vehicle is located, the average pooling layer is used for performing vector mode adjustment on the global appearance characteristic data, and the normalization layer is used for converting the adjusted global appearance characteristic data into appearance characteristic vectors.
In addition, an embodiment of the present invention further provides a trajectory tracking system for a multi-rotor drone, where the system includes:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the trajectory tracking method for a multi-rotor drone according to any one of the above.
The invention has at least the following beneficial effects: by combining the first convolutional neural network, the UAV-FPN neural network and the YOLOHead prediction network, all multi-rotor unmanned aerial vehicles can be identified and detected more accurately from each frame image contained in the video image, which addresses, in the vision domain, the prior-art problem that multi-rotor unmanned aerial vehicles executing flight missions are easily missed by detection. By combining the second convolutional neural network with the Deepsort algorithm, the appearance features of the multi-rotor unmanned aerial vehicles are fully taken into account, which effectively improves the motion trajectory tracking effect for a plurality of multi-rotor unmanned aerial vehicles and addresses the prior-art problem that multi-rotor unmanned aerial vehicles executing flight missions are easily falsely detected.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification; they illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention without constituting a limitation thereof.
Fig. 1 is a schematic flow chart of a trajectory tracking method for a multi-rotor drone in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a first convolutional neural network in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the architecture of the UAV-FPN neural network in an embodiment of the present invention;
fig. 4 is a schematic structural composition diagram of the second convolutional neural network in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
It is noted that although a division of functional modules is shown in the system diagram and a logical order is shown in the flowchart, in some cases the steps shown or described may be performed with a module division different from that in the system diagram, or in an order different from that in the flowchart. The terms "first", "second" and the like in the description, the claims and the drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Referring to fig. 1, fig. 1 is a schematic flowchart of a trajectory tracking method for a multi-rotor drone according to an embodiment of the present invention, where the method includes the following steps:
s101, video images of a plurality of multi-rotor unmanned aerial vehicles during flight tasks are obtained.
In the embodiment of the invention, the video images of the multiple multi-rotor unmanned aerial vehicles during the flight mission are acquired by a camera device built on the ground, and the types of the multiple multi-rotor unmanned aerial vehicles are different from each other.
S102, performing target identification detection on each frame of image contained in the video image, and acquiring position detection results of all multi-rotor unmanned aerial vehicles contained in each frame of image and the types of the unmanned aerial vehicles.
In the embodiment of the invention, firstly, a first convolutional neural network built in advance is utilized to extract the main features of each frame of image contained in the video image, so as to obtain the low-level feature data and the high-level feature data contained in each frame of image; secondly, feature fusion is carried out on the low-level feature data and the high-level feature data contained in each frame of image by utilizing a UAV-FPN neural network to obtain fused feature data contained in each frame of image; and then, feature conversion is performed on the fused feature data contained in each frame of image by using a YOLOHead prediction network to obtain the position detection results of all multi-rotor unmanned aerial vehicles contained in each frame of image and the types of the unmanned aerial vehicles.
In the embodiment of the present invention, the first convolutional neural network comprises an input processing module, a low-layer feature extraction module and a high-layer feature extraction module which are connected in sequence, as shown in fig. 2; the input processing module is used for extracting down-sampling feature data from any frame image with an input dimension of 640 × 640 × 3, the low-layer feature extraction module is used for extracting low-layer feature data with a dimension of 40 × 40 × 256 from the down-sampling feature data, and the high-layer feature extraction module is used for further extracting high-layer feature data with a dimension of 20 × 20 × 512 from the low-layer feature data.
Further, the input processing module comprises an input layer and a Focus structure layer which are connected in sequence; the Focus structure layer slices each frame image provided by the input layer on a per-pixel basis and stacks the slices, so that the feature information of each frame image is extracted to the greatest possible extent. The low-layer feature extraction module comprises a first BaseBlock, a second BaseBlock, a first residual convolution layer, a third BaseBlock, a second residual convolution layer, a fourth BaseBlock, a first spatial pyramid pooling layer and a third residual convolution layer which are connected in sequence; the high-layer feature extraction module comprises a fifth BaseBlock, a second spatial pyramid pooling layer and a fourth residual convolution layer which are connected in sequence. Any one BaseBlock comprises an input layer, a two-dimensional convolution layer, a max-pooling layer and an activation layer which are connected in sequence, and the activation layer is provided with a SiLU (Sigmoid Weighted Linear Unit) activation function.
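As a minimal sketch of how a Focus-style slicing layer and a BaseBlock of the kind described above could be realized in PyTorch: the slicing pattern follows the common YOLOv5-style Focus operation, and all channel counts, kernel sizes and strides below are illustrative assumptions rather than the patented configuration.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice every other pixel in H and W, stack the four slices on the
    channel axis, then mix them with a convolution (YOLOv5-style Focus)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> (B, 4C, H/2, W/2)
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

class BaseBlock(nn.Module):
    """Input -> 2-D convolution -> max pooling -> SiLU activation,
    matching the BaseBlock layout described in the text."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.pool(self.conv(x)))

# Example: a 640 x 640 x 3 frame is halved by Focus and again by a BaseBlock.
frame = torch.randn(1, 3, 640, 640)
feat = BaseBlock(32, 64)(Focus(3, 32)(frame))   # -> (1, 64, 160, 160)
```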
Because the multi-rotor unmanned aerial vehicle is small in size, when it executes a high-altitude flight mission its body may occupy only a small proportion of each frame image contained in the video image, and its flight distance may change at any time. In order to effectively extract more feature data of the multi-rotor unmanned aerial vehicle from each frame image, the embodiment of the invention extracts low-layer feature data and high-layer feature data with fixed dimensions from each frame image, namely the low-layer feature data with the dimension of 40 × 40 × 256 is extracted by using the first spatial pyramid pooling layer, and then the high-layer feature data with the dimension of 20 × 20 × 512 is extracted by using the second spatial pyramid pooling layer.
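The text does not give the internals of the spatial pyramid pooling layers; the sketch below shows one common realization (parallel stride-1 max-pooling branches concatenated and re-projected), which keeps the spatial dimension fixed while enlarging the receptive field. The kernel sizes and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Parallel max-pooling branches with stride 1 keep the H x W size,
    so the output dimension stays fixed (e.g. 40 x 40 or 20 x 20)."""
    def __init__(self, in_ch: int, out_ch: int, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels
        )
        self.project = nn.Conv2d(in_ch * (len(kernels) + 1), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        branches = [x] + [pool(x) for pool in self.pools]
        return self.project(torch.cat(branches, dim=1))

low = SPP(256, 256)(torch.randn(1, 256, 40, 40))    # stays 1 x 256 x 40 x 40
high = SPP(512, 512)(torch.randn(1, 512, 20, 20))   # stays 1 x 512 x 20 x 20
```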
Because the activation functions in the first convolutional neural network cause an inevitable loss of feature information during extraction, and network degradation easily occurs as the number of network layers increases, the embodiment of the invention uses residual convolution layers to preserve the network's feature-learning capability and prevent network degradation. Any one residual convolution layer comprises an input layer, a channel stack layer, three BaseBlocks and two-dimensional convolution layers, and the channel stack layer is used for combining low-layer features processed by a single BaseBlock with high-layer features processed by two BaseBlocks and a single two-dimensional convolution layer.
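A minimal sketch of such a two-branch residual convolution layer follows. The pooling strides are set to 1 so the two branches keep matching spatial sizes, and the final 1×1 mixing convolution is an assumption, since the exact strides and channel counts are not given in the text.

```python
import torch
import torch.nn as nn

def base_block(ch: int) -> nn.Sequential:
    # BaseBlock as described: conv -> max pool -> SiLU (stride 1 here so the
    # two branches keep matching spatial sizes; real strides are not given).
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=3, padding=1),
        nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
        nn.SiLU(),
    )

class ResidualConv(nn.Module):
    """Channel-stack (concatenation) of a single-BaseBlock branch and a
    two-BaseBlock-plus-convolution branch, followed by a 1x1 mixing conv."""
    def __init__(self, ch: int):
        super().__init__()
        self.short = base_block(ch)
        self.deep = nn.Sequential(base_block(ch), base_block(ch),
                                  nn.Conv2d(ch, ch, kernel_size=3, padding=1))
        self.mix = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mix(torch.cat([self.short(x), self.deep(x)], dim=1))

y = ResidualConv(256)(torch.randn(1, 256, 40, 40))   # -> (1, 256, 40, 40)
```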
In the embodiment of the present invention, the UAV-FPN neural network (UAV: Unmanned Aerial Vehicle; FPN: Feature Pyramid Network) is mainly used for performing feature fusion on the high-layer feature data output by the high-layer feature extraction module and the low-layer feature data output by the low-layer feature extraction module. The UAV-FPN neural network comprises a BaseBlock, an upsampling layer, a residual convolution layer, a two-dimensional convolution layer and two channel stack layers, as shown in fig. 3, and its implementation process is as follows: firstly, the high-layer feature data is processed by the BaseBlock and the upsampling layer, and its scale is enlarged by nearest-neighbour interpolation to be consistent with the scale of the low-layer feature data; secondly, the enlarged high-layer feature data is fused with the low-layer feature data by a single channel stack layer to obtain preliminary fused feature data; then, the preliminary fused feature data is processed by the residual convolution layer and the two-dimensional convolution layer, and its scale is reduced to be consistent with the scale of the high-layer feature data; finally, the reduced preliminary fused feature data is fused with the BaseBlock-processed high-layer feature data by a single channel stack layer to obtain the final fused feature data.
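The fusion flow just described can be sketched as below; ordinary convolutions stand in for the BaseBlock and residual convolution layer, and the channel widths (256 low-layer, 512 high-layer) match the dimensions stated earlier, while everything else is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UAVFPNFusion(nn.Module):
    """Sketch of the two-stage fusion: upsample the high-layer map to the
    low-layer scale (nearest neighbour), concatenate, reduce back to the
    high-layer scale, and concatenate again."""
    def __init__(self, low_ch: int = 256, high_ch: int = 512):
        super().__init__()
        self.pre = nn.Conv2d(high_ch, low_ch, kernel_size=1)        # stands in for the BaseBlock
        self.reduce = nn.Conv2d(2 * low_ch, low_ch, kernel_size=3,
                                stride=2, padding=1)                # residual conv + conv, stride 2
        self.out = nn.Conv2d(2 * low_ch, high_ch, kernel_size=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        h = self.pre(high)                                          # (B, 256, 20, 20)
        h_up = F.interpolate(h, scale_factor=2, mode="nearest")     # enlarge to 40 x 40
        prelim = torch.cat([h_up, low], dim=1)                      # first channel stack
        prelim_down = self.reduce(prelim)                           # shrink back to 20 x 20
        return self.out(torch.cat([prelim_down, h], dim=1))         # second channel stack

fused = UAVFPNFusion()(torch.randn(1, 256, 40, 40), torch.randn(1, 512, 20, 20))
print(fused.shape)   # torch.Size([1, 512, 20, 20])
```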
In the embodiment of the present invention, the YOLOHead prediction network is mainly used for performing feature conversion on the fused feature data, and its network structure comprises an Obj branch structure, a Cls branch structure and a Reg branch structure, wherein the Obj branch structure is used to determine whether the fused feature data contains multi-rotor unmanned aerial vehicles, the Cls branch structure is used to identify the class information of all multi-rotor unmanned aerial vehicles contained in the fused feature data, and the Reg branch structure is used to extract the position frame information of all multi-rotor unmanned aerial vehicles contained in the fused feature data.
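A minimal decoupled-head sketch with the three named branches is shown below, assuming one prediction per feature-map cell; anchor handling, activation functions and channel widths are omitted or assumed.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Three branches as named in the text: Obj (objectness), Cls (drone
    class scores) and Reg (bounding-box offsets)."""
    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.obj = nn.Conv2d(in_ch, 1, kernel_size=1)             # is a drone present here?
        self.cls = nn.Conv2d(in_ch, num_classes, kernel_size=1)   # which drone type?
        self.reg = nn.Conv2d(in_ch, 4, kernel_size=1)             # box (x, y, w, h)

    def forward(self, fused: torch.Tensor):
        return self.obj(fused), self.cls(fused), self.reg(fused)

obj, cls_, reg = DetectionHead(512, num_classes=3)(torch.randn(1, 512, 20, 20))
print(obj.shape, cls_.shape, reg.shape)   # (1,1,20,20) (1,3,20,20) (1,4,20,20)
```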
S103, extracting appearance characteristics of each frame of image according to the position detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image to obtain the appearance characteristic detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image.
In the embodiment of the invention, firstly, according to the position detection results of all multi-rotor unmanned aerial vehicles contained in each frame of image, the image of the area where each multi-rotor unmanned aerial vehicle is located is intercepted from each frame of image; secondly, carrying out feature extraction on the image of the region where each multi-rotor unmanned aerial vehicle is located by utilizing a pre-built second convolution neural network to obtain the appearance feature vector of each multi-rotor unmanned aerial vehicle.
More specifically, in the embodiment of the present invention, for any frame image with a dimension of 640 × 640 × 3 included in the video image, the position detection results of all the multi-rotor drones included in the frame image are taken as a reference center, an image of an area where each multi-rotor drone is located is captured from the frame image, and in the whole capturing process, the dimension output of the image of the area where each multi-rotor drone is located is adjusted to 64 × 128 by calling the existing Reshape function.
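The text mentions adjusting the cropped region image to 64 × 128 with a Reshape function; the sketch below instead uses an OpenCV resize, which is an assumption about the intended behaviour (a pure reshape would not rescale the patch). Box coordinates and the clipping logic are illustrative.

```python
import cv2
import numpy as np

def crop_drone_regions(frame: np.ndarray, boxes) -> list:
    """Cut out each detected drone from a 640 x 640 x 3 frame and resize the
    patch to the 64 x 128 (width x height) input expected by the appearance
    network. `boxes` holds (x1, y1, x2, y2) detection results in pixels."""
    patches = []
    h, w = frame.shape[:2]
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        patch = frame[y1:y2, x1:x2]
        patches.append(cv2.resize(patch, (64, 128)))   # dsize is (width, height)
    return patches
```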
In the embodiment of the present invention, the second convolutional neural network comprises an input layer, convolution layers, an average pooling layer and a normalization layer which are connected in sequence, as shown in fig. 4; the convolution layers are used for extracting global appearance feature data of each multi-rotor unmanned aerial vehicle from the image of the area where it is located. Preferably, four convolution layers are provided, each composed of an input layer, two activation layers and a network layer formed by combining a two-dimensional convolution with BN (Batch Normalization), and any one activation layer is provided with a ReLU (Rectified Linear Unit) activation function. The average pooling layer is used for performing vector mode adjustment on the global appearance feature data, and the normalization layer is used for converting the adjusted global appearance feature data into appearance feature vectors.
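A minimal sketch of such an appearance extractor follows, assuming four conv+BN+ReLU stages, adaptive average pooling and L2 normalization of the output vector; the feature dimension of 128 and the strides are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch: int, out_ch: int, stride: int = 2) -> nn.Sequential:
    # One convolution stage: 2-D convolution combined with BN, ReLU activation.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class AppearanceNet(nn.Module):
    """Convolution stages -> average pooling -> normalization, producing one
    appearance feature vector per 128 x 64 drone crop."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            conv_bn_relu(3, 32), conv_bn_relu(32, 64),
            conv_bn_relu(64, 128), conv_bn_relu(128, feat_dim),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)    # average pooling to a single vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.pool(self.backbone(x)).flatten(1)
        return F.normalize(v, dim=1)           # unit-norm appearance feature vector

vec = AppearanceNet()(torch.randn(4, 3, 128, 64))   # 4 crops -> (4, 128) vectors
```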
And S104, applying the appearance characteristic detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image to a Deepsort algorithm, and performing frame-by-frame matching on the motion tracks of the multiple multi-rotor unmanned aerial vehicles according to the position detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image.
The implementation process of step S104 in the embodiment of the invention comprises the following steps:
step 1, initializing parameters of a Kalman filter according to position detection results of all multi-rotor unmanned aerial vehicles contained in the first two frames of images in the video image through a Deepsort algorithm to obtain motion tracks of the multi-rotor unmanned aerial vehicles.
Step 2, using the Kalman filter to predict, from the position tracking results of all multi-rotor unmanned aerial vehicles contained in the previous frame image, the position prediction tracking results of all multi-rotor unmanned aerial vehicles contained in the current frame image.
It should be noted that, in the embodiment of the present invention, a loop operation is created from step 2, and the loop operation is performed starting from the third frame image in the video images, where the position tracking results of all the multi-rotor drones included in the second frame image in the video images are actually the position detection results of all the multi-rotor drones included in the second frame image.
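As context for steps 1 and 2, a minimal constant-velocity Kalman prediction step is sketched below. The Deepsort algorithm typically tracks an eight-dimensional box state; the four-dimensional centre-plus-velocity state, the unit time step and the noise values here are simplifying assumptions.

```python
import numpy as np

def kalman_predict(x: np.ndarray, P: np.ndarray, dt: float = 1.0, q: float = 1e-2):
    """One prediction step of a constant-velocity Kalman filter.
    State x = [cx, cy, vx, vy]: box centre and its velocity between frames."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt           # position is advanced by velocity * dt
    Q = q * np.eye(4)                # process noise (assumed diagonal)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

# Initialised from the first two frames: position from frame 2, velocity from
# the difference between the frame-1 and frame-2 detections.
x0 = np.array([320.0, 240.0, 2.0, -1.0])
x1, P1 = kalman_predict(x0, np.eye(4))
```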
Step 3, calculating motion offset measurement results of all multi-rotor unmanned aerial vehicles contained in the current frame image by using the Mahalanobis distance according to the position prediction tracking results and the position detection results of all multi-rotor unmanned aerial vehicles contained in the current frame image; the calculation formula of the motion offset measurement result of any one multi-rotor unmanned aerial vehicle contained in the current frame image is as follows:
d_1(i) = (a_i - b_i)^T Σ_i^{-1} (a_i - b_i)

wherein d_1(i) is the motion offset measurement result of the i-th multi-rotor unmanned aerial vehicle contained in the current frame image, a_i is the position detection result of the i-th multi-rotor unmanned aerial vehicle, b_i is the position prediction tracking result of the i-th multi-rotor unmanned aerial vehicle, T is the transpose symbol, and Σ_i is the covariance matrix between the position detection result and the position prediction tracking result of the i-th multi-rotor unmanned aerial vehicle.
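A direct transcription of the d_1(i) formula above; the two-dimensional positions and covariance values in the example are illustrative only.

```python
import numpy as np

def motion_offset(a_i: np.ndarray, b_i: np.ndarray, sigma_i: np.ndarray) -> float:
    """d1(i) = (a_i - b_i)^T Sigma_i^{-1} (a_i - b_i): squared Mahalanobis
    distance between the detected position a_i and the Kalman-predicted
    position b_i, weighted by the covariance Sigma_i."""
    diff = a_i - b_i
    return float(diff @ np.linalg.inv(sigma_i) @ diff)

d1 = motion_offset(np.array([320.0, 240.0]),        # detection (assumed 2-D centre)
                   np.array([318.0, 244.0]),        # prediction
                   np.array([[4.0, 0.0], [0.0, 9.0]]))
```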
Step 4, extracting the appearance features of the current frame image according to the position prediction tracking results of all multi-rotor unmanned aerial vehicles contained in the current frame image, so as to obtain the appearance feature prediction tracking results of all multi-rotor unmanned aerial vehicles contained in the current frame image.
Step 5, according to the appearance feature prediction tracking results and the appearance feature detection results of all multi-rotor unmanned aerial vehicles contained in the current frame image, calculating the appearance difference measurement results of all multi-rotor unmanned aerial vehicles contained in the current frame image by using the cosine distance; wherein the calculation formula of the appearance difference measurement result of any one multi-rotor unmanned aerial vehicle contained in the current frame image is as follows:

d_2(i) = min{ 1 - r_i^T r : r ∈ R_i }

wherein d_2(i) is the appearance difference measurement result of the i-th multi-rotor unmanned aerial vehicle contained in the current frame image, r_i is the appearance feature detection result of the i-th multi-rotor unmanned aerial vehicle, and R_i is the set of appearance feature prediction tracking results that were successfully matched in all frame images preceding the current frame image; all appearance feature vectors lie in a real vector space.
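A small sketch of the d_2(i) computation, assuming the appearance feature vectors are already unit-normalised (as the normalization layer of the second network would produce); the vectors in the example are illustrative.

```python
import numpy as np

def appearance_difference(r_i: np.ndarray, track_history) -> float:
    """d2(i) = min over stored track features r of (1 - r_i . r), assuming all
    appearance feature vectors are unit-normalised."""
    return float(min(1.0 - float(r_i @ r) for r in track_history))

r_det = np.array([0.6, 0.8])
history = [np.array([0.707, 0.707]), np.array([0.0, 1.0])]
print(appearance_difference(r_det, history))   # smallest cosine distance
```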
Step 6, combining the motion offset measurement results and the appearance difference measurement results of all multi-rotor unmanned aerial vehicles contained in the current frame image, matching the position detection results of all multi-rotor unmanned aerial vehicles contained in the current frame image against the motion trajectories formed by each multi-rotor unmanned aerial vehicle before the current frame image by using the existing Hungarian matching algorithm, and then outputting the position tracking results of all multi-rotor unmanned aerial vehicles contained in the current frame image and the updated motion trajectory of each multi-rotor unmanned aerial vehicle.
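One way step 6 could be realized is sketched below using SciPy's Hungarian solver; the weighting factor `lam`, the gating threshold `gate` and the large sentinel cost are assumed values, not taken from the patent.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections_to_tracks(d1: np.ndarray, d2: np.ndarray, lam: float = 0.5,
                               gate: float = 9.4877):
    """Combine the motion offset matrix d1 (tracks x detections) and the
    appearance difference matrix d2 into one cost matrix and solve it with
    the Hungarian algorithm; `gate` forbids implausible motion."""
    cost = lam * d1 + (1.0 - lam) * d2
    cost = np.where(d1 > gate, 1e5, cost)          # gate out implausible pairs
    rows, cols = linear_sum_assignment(cost)
    return [(t, d) for t, d in zip(rows, cols) if cost[t, d] < 1e5]

d1 = np.array([[0.5, 8.0], [7.5, 0.3]])
d2 = np.array([[0.1, 0.9], [0.8, 0.2]])
print(match_detections_to_tracks(d1, d2))          # [(0, 0), (1, 1)]
```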
Step 7, updating the parameters of the Kalman filter by using the position tracking results of all multi-rotor unmanned aerial vehicles contained in the current frame image, and returning to step 2 to continue the prediction operation on the next frame image, until the motion trajectories of all multi-rotor unmanned aerial vehicles matched and output for the last frame image of the video image are obtained.
In the embodiment of the invention, by combining the first convolutional neural network, the UAV-FPN neural network and the YOLOHead prediction network, all multi-rotor unmanned aerial vehicles can be identified and detected more accurately from each frame image contained in the video image, which addresses, in the vision domain, the prior-art problem that multi-rotor unmanned aerial vehicles executing flight missions are easily missed by detection. By combining the second convolutional neural network with the Deepsort algorithm, the appearance features of the multi-rotor unmanned aerial vehicles are fully taken into account, which effectively improves the motion trajectory tracking effect for a plurality of multi-rotor unmanned aerial vehicles and addresses the prior-art problem that multi-rotor unmanned aerial vehicles executing flight missions are easily falsely detected.
In addition, an embodiment of the present invention further provides a trajectory tracking system for a multi-rotor drone, where the system includes:
at least one processor;
at least one memory for storing at least one program;
when executed by the at least one processor, the at least one program causes the at least one processor to implement the method for trajectory tracking of a multi-rotor drone of any of the embodiments described above.
The contents of the method embodiments are all applicable to the system embodiments, the functions realized by the system embodiments are the same as those of the method embodiments, and the beneficial effects achieved by the system embodiments are the same as those of the method embodiments.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the trajectory tracking system of the multi-rotor drone, and uses various interfaces and lines to connect the various parts of the entire trajectory tracking system of the multi-rotor drone.
The memory may be configured to store the computer programs and/or modules, and the processor implements the various functions of the trajectory tracking system of the multi-rotor drone by running or executing the computer programs and/or modules stored in the memory and by invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein: the program storage area stores an operating system and the application programs required by at least one function (such as a sound playing function and an image playing function); the data storage area stores data created according to the use of the device (such as audio data, a phone book and the like). In addition, the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
While the description of the present application has been presented in considerable detail and with reference to a number of illustrated embodiments, it is not intended to be limited to any such detail or to any particular embodiment; rather, the appended claims should be given a broad interpretation in view of the prior art so as to effectively cover the intended scope of the application. Moreover, the foregoing describes the application in terms of embodiments foreseen by the inventor for which an enabling description was available, notwithstanding that insubstantial modifications of the application, not presently foreseen, may nonetheless represent equivalents thereto.

Claims (8)

1. A method of trajectory tracking for a multi-rotor drone, the method comprising:
acquiring video images of a plurality of multi-rotor unmanned aerial vehicles when the multi-rotor unmanned aerial vehicles execute flight missions;
performing target identification detection on each frame of image contained in the video image, and acquiring position detection results of all multi-rotor unmanned aerial vehicles contained in each frame of image and the types of the unmanned aerial vehicles;
performing appearance feature extraction on each frame of image according to the position detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image to obtain the appearance feature detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image;
And applying the appearance characteristic detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image to a Deepsort algorithm, and performing frame-by-frame matching on the motion tracks of the multiple multi-rotor unmanned aerial vehicles according to the position detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image.
2. The method for tracking the trajectory of multi-rotor drones according to claim 1, wherein the step of performing target recognition detection on each frame of image included in the video image, and the step of obtaining the position detection results and the classes of all the multi-rotor drones included in each frame of image comprises:
performing main feature extraction on each frame of image contained in the video image by using a first convolutional neural network which is set up in advance to obtain low-level feature data and high-level feature data contained in each frame of image;
performing feature fusion on the low-level feature data and the high-level feature data contained in each frame of image by using a UAV-FPN neural network to obtain fusion feature data contained in each frame of image;
and performing feature conversion on the fusion feature data contained in each frame of image by using a YOLOHead prediction network to obtain the position detection results of all multi-rotor unmanned aerial vehicles contained in each frame of image and the types of the unmanned aerial vehicles.
3. The trajectory tracking method for a multi-rotor drone of claim 2, wherein the first convolutional neural network includes an input processing module, a low-layer feature extraction module, and a high-layer feature extraction module connected in sequence; the input processing module is used for extracting down-sampling feature data from any frame of image, the low-layer feature extraction module is used for extracting low-layer feature data from the down-sampling feature data, and the high-layer feature extraction module is used for further extracting high-layer feature data from the low-layer feature data.
4. The trajectory tracking method for a multi-rotor unmanned aerial vehicle according to claim 3, wherein the input processing module comprises an input layer and a Focus structural layer which are connected in sequence, the low-layer feature extraction module comprises a first BaseBlock, a second BaseBlock, a first residual convolution layer, a third BaseBlock, a second residual convolution layer, a fourth BaseBlock, a first spatial pyramid pooling layer and a third residual convolution layer which are connected in sequence, and the high-layer feature extraction module comprises a fifth BaseBlock, a second spatial pyramid pooling layer and a fourth residual convolution layer which are connected in sequence.
5. The method for trajectory tracking of multi-rotor drones, according to claim 4, characterized in that any one BaseBlock comprises, connected in sequence, an input layer, a two-dimensional convolution layer, a max-pooling layer, and an activation layer.
6. The method according to claim 1, wherein performing appearance feature extraction on each frame of image according to the position detection results of all the multi-rotor drones included in each frame of image to obtain the appearance feature detection results of all the multi-rotor drones included in each frame of image includes:
according to the position detection results of all the multi-rotor unmanned aerial vehicles contained in each frame of image, intercepting the area image of each multi-rotor unmanned aerial vehicle from each frame of image;
and carrying out feature extraction on the image of the area where each multi-rotor unmanned aerial vehicle is located by utilizing a pre-built second convolution neural network to obtain the appearance feature vector of each multi-rotor unmanned aerial vehicle.
7. The method of claim 6, wherein the second convolutional neural network comprises an input layer, a convolutional layer, an average pooling layer, and a normalization layer connected in series; the convolution layer is used for extracting global appearance feature data of each multi-rotor unmanned aerial vehicle from the image of the area where the multi-rotor unmanned aerial vehicle is located, the average pooling layer is used for performing vector mode adjustment on the global appearance feature data, and the normalization layer is used for converting the adjusted global appearance feature data into appearance feature vectors.
8. A trajectory tracking system for multi-rotor drones, the system comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method for trajectory tracking of a multi-rotor drone of any one of claims 1 to 7.
CN202210445628.7A 2022-04-24 2022-04-24 Trajectory tracking method and system for multi-rotor unmanned aerial vehicle Pending CN114757974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210445628.7A CN114757974A (en) 2022-04-24 2022-04-24 Trajectory tracking method and system for multi-rotor unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210445628.7A CN114757974A (en) 2022-04-24 2022-04-24 Trajectory tracking method and system for multi-rotor unmanned aerial vehicle

Publications (1)

Publication Number Publication Date
CN114757974A true CN114757974A (en) 2022-07-15

Family

ID=82334133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210445628.7A Pending CN114757974A (en) 2022-04-24 2022-04-24 Trajectory tracking method and system for multi-rotor unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN114757974A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953704A (en) * 2023-01-18 2023-04-11 北京理工大学 Unmanned aerial vehicle detection method
CN115953704B (en) * 2023-01-18 2023-10-03 北京理工大学 Unmanned aerial vehicle detection method
CN116883686A (en) * 2023-07-26 2023-10-13 东方空间技术(山东)有限公司 Rocket recovery sub-level air recognition and tracking method, device and storage medium
CN116883686B (en) * 2023-07-26 2024-03-12 东方空间技术(山东)有限公司 Rocket recovery sub-level air recognition and tracking method, device and storage medium

Similar Documents

Publication Publication Date Title
CN109035304B (en) Target tracking method, medium, computing device and apparatus
CN111754394B (en) Method and device for detecting object in fisheye image and storage medium
CN114757974A (en) Trajectory tracking method and system for multi-rotor unmanned aerial vehicle
US20190362173A1 (en) Spatio-temporal awareness engine for priority tree based region selection across multiple input cameras and multimodal sensor empowered awareness engine for target recovery and object path prediction
CN114359851A (en) Unmanned target detection method, device, equipment and medium
CN111462155B (en) Motion detection method, device, computer equipment and storage medium
CN111160365A (en) Unmanned aerial vehicle target tracking method based on combination of detector and tracker
CN114255407B (en) High-resolution-based anti-unmanned aerial vehicle multi-target identification and tracking video detection method
Werner et al. DeepMoVIPS: Visual indoor positioning using transfer learning
CN111985475A (en) Ship board identification method, computing device and storage medium
CN112597920A (en) Real-time object detection system based on YOLOv3 pruning network
CN113869282A (en) Face recognition method, hyper-resolution model training method and related equipment
CN113706584A (en) Streetscape flow information acquisition method based on computer vision
CN116235209A (en) Sparse optical flow estimation
CN115410100A (en) Small target detection method and system based on unmanned aerial vehicle image
Xiao et al. Fast recognition method for citrus under complex environments based on improved YOLOv3
CN115908831B (en) Image detection method and device
CN112070035A (en) Target tracking method and device based on video stream and storage medium
CN117036984A (en) Cascade U-shaped network cloud detection method and system integrating attention mechanisms
CN112561961A (en) Instance tracking method and device
CN111292331A (en) Image processing method and device
CN110866500A (en) Face detection alignment system, method, device, platform, mobile terminal and storage medium
CN114998611A (en) Target contour detection method based on structure fusion
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN115761552B (en) Target detection method, device and medium for unmanned aerial vehicle carrying platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination