CN114511793B - Unmanned aerial vehicle ground detection method and system based on synchronous detection tracking - Google Patents

Unmanned aerial vehicle ground detection method and system based on synchronous detection tracking

Info

Publication number
CN114511793B
CN114511793B (application CN202011285803.8A)
Authority
CN
China
Prior art keywords
target position
video frame
frame
target
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011285803.8A
Other languages
Chinese (zh)
Other versions
CN114511793A (en)
Inventor
苏龙飞
王之元
凡遵林
管乃洋
张天昊
王浩
沈天龙
黄强娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN202011285803.8A
Publication of CN114511793A
Application granted
Publication of CN114511793B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and a system for unmanned aerial vehicle ground detection based on synchronous detection and tracking. The method comprises: counting the video frames acquired by an unmanned aerial vehicle; based on the trained model file and weight file of a target detection deep neural network, performing forward inference on the video frame whose count is 1, acquiring the target position area, and initializing a target tracker; for each subsequently acquired video frame, simultaneously acquiring the detected target position area and the tracked target position area; and, if tracking succeeds and the image size of the target position area conforms to the preset image size, comparing whether the detected and tracked target position areas overlap to determine the final output target position area. By adopting a synchronous detection-tracking method, the technical scheme of the invention reduces the interference caused by false detections, reduces the amount of computation in the unmanned aerial vehicle detection process, and improves accuracy.

Description

Unmanned aerial vehicle ground detection method and system based on synchronous detection tracking
Technical Field
The invention relates to the technical field of computer vision, in particular to a ground detection method and system of an unmanned aerial vehicle based on synchronous detection and tracking.
Background
Deep neural networks are developing rapidly and are applied ever more widely. Methods that use deep neural networks for target detection or search in video or images mainly comprise two-step methods represented by Faster R-CNN, R-CNN and the like, and one-step methods represented by YOLO, SSD and the like. Although Faster R-CNN is an excellent two-step algorithm, it reaches only about 5 FPS even with the powerful computing capability of a K40 GPU, which cannot meet real-time requirements. The one-step detectors YOLO and SSD can reach 15 FPS or more and thereby meet real-time requirements, but the computing power of a Titan X or M40 GPU is necessary to support them. The better-performing and faster target tracking algorithms are represented by correlation-filtering algorithms, which track stably at high speed and can reach 172 FPS even under limited computing capability.
An unmanned aerial vehicle is a reusable, unmanned aircraft controlled by radio remote control or an autonomous program; it has a simple structure, low cost, strong survivability and good maneuverability, and can complete a variety of tasks. However, because its payload capacity is low, an unmanned aerial vehicle cannot carry computing devices with strong computing performance, so deploying a target detection algorithm based on a deep neural network is difficult; small onboard computers such as the Raspberry Pi or ODROID are light but have limited computing capability. Even if Tiny-YOLO or MobileNets-SSD, the fast one-step methods, are deployed on an ODROID onboard computer, the target detection speed does not exceed 3 FPS and cannot meet real-time requirements. The retired Predator unmanned aerial vehicle mainly acquires data through its onboard sensors and returns the data to the ground, where the data are interpreted manually. The improved Global Hawk, with its portable signal sensors and ground-moving-target-detection radar, has preliminary onboard target detection and monitoring capability (distinguishing dynamic from static, detecting moving targets), but the detection technology is not yet mature. The Rainbow (CH-series) unmanned aerial vehicle likewise acquires data through its onboard sensors and returns them to the ground for manual interpretation and further back-end processing. An artificial intelligence algorithm was tested on the ScanEagle: after only a few days of testing, the computer's recognition accuracy for objects such as personnel, vehicles and buildings reached 60%, and after one week it improved to 80%; however, this processing is still completed on the ground. From this point of view, current technology still cannot, in real time, track and detect targets in the data acquired by the unmanned aerial vehicle's onboard camera and carry out the processing for the next instruction.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an unmanned aerial vehicle ground detection method and system based on synchronous detection tracking. By combining a deep-neural-network target detection algorithm with a tracking algorithm, video frames from the data acquired by the onboard camera during flight are counted in real time, and specific targets in the video frames are detected and tracked synchronously, so that a tactical unmanned aerial vehicle can monitor and search ground targets, directionally track moving targets, and synchronously detect and track air targets.
The invention aims at adopting the following technical scheme:
the invention provides an unmanned aerial vehicle ground detection method based on synchronous detection and tracking, which is characterized by comprising the following steps:
training a target detection depth neural network model to obtain a model file and a weight file;
step (2) collecting real-time video data frame by frame;
step (3) initializing a frame number counter h=0;
step (4) let h=h+1, and simultaneously executing step (5) and step (8);
step (5) based on the trained model file and weight file of the target detection deep neural network, performing forward inference on the real-time video data collected frame by frame, and obtaining the target position area detected in the h-th video frame;
step (6) judging whether h is 1; if so, executing step (7); otherwise, passing the target position area detected in the h-th video frame to step (12);
step (7) initializing a target tracker according to the target position area detected in the h-th video frame;
step (8) judging whether h is 1; if so, executing step (4); if not, executing step (9);
step (9) obtaining the candidate region corresponding to the target position area detected in the (h-1)-th video frame, finding the region in the h-th video frame that matches that candidate region, taking it as the candidate region corresponding to the detected target position area in the h-th video frame, and obtaining the target position area tracked in the h-th video frame from the candidate region of the (h-1)-th video frame;
step (10) judging whether target tracking in the h-th video frame succeeded; if so, executing step (11); otherwise, executing step (3);
step (11) judging whether the pixel coordinates of the target position area image tracked in the h-th video frame exceed the coordinate range of the preset video frame image; if so, executing step (3); if not, executing step (12);
step (12) judging whether the detected and tracked target position areas in the h-th video frame overlap; if so, outputting the detected target position area and executing step (4); if not, outputting the tracked target position area and executing step (4).
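The overlap test in step (12) is not specified further in the text; a common way to decide it is an intersection-over-union (IoU) comparison. The sketch below is an illustrative interpretation, not the patent's mandated formula; the box layout (x1, y1, x2, y2) and the 0.5 threshold are assumptions.

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2) pixel coordinates, x2 > x1 and y2 > y1.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1]) +
             (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union else 0.0

def choose_output(detected, tracked, iou_threshold=0.5):
    # Step (12): if the detected and tracked regions overlap, prefer the
    # detector's box; otherwise fall back to the tracker's box.
    if detected is not None and iou(detected, tracked) > iou_threshold:
        return detected
    return tracked
```

In this reading, the detector's box wins when the two sources agree, which lets the single-scale tracker inherit the detector's scale updates frame by frame.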
Preferably, the step (1) includes:
marking each type of target in the historical video data collected frame by frame;
constructing training data by utilizing historical video data marked frame by frame, and training a target detection depth neural network model by utilizing the training data;
and obtaining a model file and a weight file of the trained target detection depth neural network.
Preferably, the step (5) includes:
using a forward inference framework, sequentially reading the label corresponding to the target, the trained model file, the weight file, and the real-time video data acquired frame by frame, and obtaining the target position output by the forward inference framework.
Preferably, obtaining the candidate region corresponding to the target position area detected in the (h-1)-th video frame comprises:
expanding the target position area detected in the (h-1)-th video frame by a preset multiple.
Further, the value range of the preset multiple is [1.5,3].
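As a sketch of that candidate-region construction, the box detected in frame h-1 can be enlarged about its center by a factor in the stated [1.5, 3] range and clipped to the frame. The function below is illustrative; the (x1, y1, x2, y2) pixel-box convention and the clipping behavior are assumptions, not taken from the patent text.

```python
def expand_region(box, frame_w, frame_h, factor=2.0):
    # box = (x1, y1, x2, y2); factor assumed to lie in [1.5, 3]
    # as stated above for the preset multiple.
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    half_w = (box[2] - box[0]) * factor / 2.0
    half_h = (box[3] - box[1]) * factor / 2.0
    # Clip so the candidate region stays inside the video frame.
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(frame_w), cx + half_w), min(float(frame_h), cy + half_h))
```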
Preferably, the step (10) includes:
analyzing the candidate region corresponding to the detected target position area in the h-th video frame with the classifier of the (h-1)-th video frame, and obtaining the score of that candidate region;
if the score of the candidate region corresponding to the target position area detected in the h-th video frame is larger than the preset classifier-score threshold, target tracking succeeds; otherwise, target tracking fails.
Further, the training process of the classifier of the (h-1)-th video frame comprises:
taking candidate regions in the (h-1)-th video frame that contain the detected target position area as positive sample data for training the binary classifier;
taking candidate regions in the (h-1)-th video frame that do not contain the detected target position area as negative sample data for training the binary classifier;
constructing the sample data for training the binary classifier from the positive and negative sample data;
and running the classifier algorithm on that sample data to obtain the trained classifier of the (h-1)-th video frame.
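The text leaves the choice of binary classifier open. As a minimal stand-in (not the patent's actual classifier), a nearest-centroid scorer over feature vectors illustrates the fit/score interface the per-frame tracking check needs: fit on positive and negative samples from frame h-1, then score candidates from frame h against a threshold.

```python
class CentroidClassifier:
    """Toy stand-in for the per-frame binary classifier: scores a
    candidate feature vector by how much closer it lies to the mean of
    the positive samples than to the mean of the negatives.
    A positive score means "more target-like"."""

    def fit(self, positives, negatives):
        # positives / negatives: lists of equal-length feature vectors.
        dim = len(positives[0])
        self.pos = [sum(v[i] for v in positives) / len(positives) for i in range(dim)]
        self.neg = [sum(v[i] for v in negatives) / len(negatives) for i in range(dim)]
        return self

    def score(self, vec):
        d_pos = sum((a - b) ** 2 for a, b in zip(vec, self.pos)) ** 0.5
        d_neg = sum((a - b) ** 2 for a, b in zip(vec, self.neg)) ** 0.5
        return d_neg - d_pos
```

Tracking success in step (10) then amounts to `clf.score(candidate_features) > threshold`, with the threshold being the preset classifier-score value mentioned above.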
The invention provides an unmanned aerial vehicle ground detection system based on synchronous detection and tracking, which is characterized by comprising the following components:
the training module is used for training the target detection deep neural network model and obtaining a model file and a weight file;
the acquisition module is used for acquiring real-time video data frame by frame;
an initialization module i, configured to initialize a frame number counter h=0;
the assignment module is used for enabling h=h+1 and executing the detection module and the judgment module b at the same time;
the detection module is used for performing forward inference on the real-time video data collected frame by frame, based on the trained model file and weight file of the target detection deep neural network, and obtaining the target position area detected in the h-th video frame;
the judging module a is used for judging whether h is 1; if so, it executes initialization module II; if not, it passes the target position area detected in the h-th video frame to judging module e;
the initialization module II is used for initializing a target tracker according to the target position area detected in the h-th video frame;
the judging module b is used for judging whether h is 1; if so, it executes the assignment module; if not, it executes the tracking module;
the tracking module is used for obtaining the candidate region corresponding to the target position area detected in the (h-1)-th video frame, finding the region in the h-th video frame that matches that candidate region, taking it as the candidate region corresponding to the detected target position area in the h-th video frame, and obtaining the target position area tracked in the h-th video frame from the candidate region of the (h-1)-th video frame;
the judging module c is used for judging whether target tracking in the h-th video frame succeeded; if so, it executes judging module d; if not, it executes initialization module I;
the judging module d is used for judging whether the pixel coordinates of the target position area image tracked in the h-th video frame exceed the coordinate range of the preset video frame image; if so, it executes initialization module I; if not, it executes judging module e;
the judging module e judges whether the detected and tracked target position areas in the h-th video frame overlap; if so, it outputs the detected target position area and executes the assignment module; if not, it outputs the tracked target position area and executes the assignment module.
Preferably, the training module is specifically configured to:
marking each type of target in the historical video data collected frame by frame;
constructing training data by utilizing historical video data marked frame by frame, and training a target detection depth neural network model by utilizing the training data;
and obtaining a model file and a weight file of the trained target detection depth neural network.
Preferably, the detection module is specifically configured to:
using a forward inference framework, sequentially reading the label corresponding to the target, the trained model file, the weight file, and the real-time video data acquired frame by frame, and obtaining the target position output by the forward inference framework.
Preferably, obtaining the candidate region corresponding to the target position area detected in the (h-1)-th video frame comprises:
expanding the target position area detected in the (h-1)-th video frame by a preset multiple.
Further, the value range of the preset multiple is [1.5,3].
Preferably, the judging module c is specifically configured to:
analyzing the candidate region corresponding to the detected target position area in the h-th video frame with the classifier of the (h-1)-th video frame, and obtaining the score of that candidate region;
if the score of the candidate region corresponding to the target position area detected in the h-th video frame is larger than the preset classifier-score threshold, target tracking succeeds; otherwise, target tracking fails.
Further, the training process of the classifier of the (h-1)-th video frame comprises:
taking candidate regions in the (h-1)-th video frame that contain the detected target position area as positive sample data for training the binary classifier;
taking candidate regions in the (h-1)-th video frame that do not contain the detected target position area as negative sample data for training the binary classifier;
constructing the sample data for training the binary classifier from the positive and negative sample data;
and running the classifier algorithm on that sample data to obtain the trained classifier of the (h-1)-th video frame.
Compared with the closest prior art, the invention has the following beneficial effects:
according to the technical scheme, based on a trained model file and a weight file of a target detection depth neural network, counting video frames acquired by an unmanned aerial vehicle, performing forward reasoning on the video frames with the count of 1, acquiring a target position area, initializing a target tracker, simultaneously acquiring a detected target position area and a tracked target position area for each video frame acquired subsequently, and if tracking is successful and the image size of the target position area accords with the preset image size, comparing whether the detected target position area and the tracked target position area are overlapped or not, and determining a final output target position area; the scheme can reduce the interference caused by false detection of target detection; by combining the advantages of the accuracy of the deep learning target detection algorithm and the stability of the target tracking algorithm, the instability caused by jump of the target detection position can be avoided on the premise of keeping the advantage of high accuracy of the target detection algorithm of the deep neural network; meanwhile, the target tracking algorithm with a single scale can adapt to the multi-scale change of the target by means of the target detection algorithm; the technical scheme provided by the invention has small calculated amount, can directly utilize the calculation performance of the onboard GPU of the unmanned aerial vehicle, and has high application value.
Drawings
FIG. 1 is a flow chart of a method of unmanned aerial vehicle ground detection based on synchronous detection tracking;
FIG. 2 is a training flow diagram of a target detection model based on synchronous detection tracking in an embodiment of the invention;
FIG. 3 is a flow chart of real-time target detection based on synchronous detection tracking in an embodiment of the invention;
Fig. 4 is a block diagram of a ground detection system of an unmanned aerial vehicle based on synchronous detection tracking.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the drawings.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides an unmanned aerial vehicle ground detection method based on synchronous detection tracking, which is shown in figure 1 and comprises the following steps:
training a target detection depth neural network model to obtain a model file and a weight file;
step (2) collecting real-time video data frame by frame;
step (3) initializing a frame number counter h=0;
step (4) let h=h+1, and simultaneously executing step (5) and step (8);
step (5) based on the trained model file and weight file of the target detection deep neural network, performing forward inference on the real-time video data collected frame by frame, and obtaining the target position area detected in the h-th video frame;
step (6) judging whether h is 1; if so, executing step (7); otherwise, passing the target position area detected in the h-th video frame to step (12);
step (7) initializing a target tracker according to the target position area detected in the h-th video frame;
step (8) judging whether h is 1; if so, executing step (4); if not, executing step (9);
step (9) obtaining the candidate region corresponding to the target position area detected in the (h-1)-th video frame, finding the region in the h-th video frame that matches that candidate region, taking it as the candidate region corresponding to the detected target position area in the h-th video frame, and obtaining the target position area tracked in the h-th video frame from the candidate region of the (h-1)-th video frame;
step (10) judging whether target tracking in the h-th video frame succeeded; if so, executing step (11); otherwise, executing step (3);
step (11) judging whether the pixel coordinates of the target position area image tracked in the h-th video frame exceed the coordinate range of the preset video frame image; if so, executing step (3); if not, executing step (12);
step (12) judging whether the detected and tracked target position areas in the h-th video frame overlap; if so, outputting the detected target position area and executing step (4); if not, outputting the tracked target position area and executing step (4).
Preferably, step (1) includes:
marking each type of target in the historical video data collected frame by frame;
constructing training data by utilizing historical video data marked frame by frame, and training a target detection depth neural network model by utilizing the training data;
and obtaining a model file and a weight file of the trained target detection depth neural network.
Preferably, step (5) comprises:
using a forward inference framework, sequentially reading the label corresponding to the target, the trained model file, the weight file, and the real-time video data acquired frame by frame, and obtaining the target position output by the forward inference framework.
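The raw output format of the forward inference framework is not fixed by the text. Assuming an SSD-style detection layout of [image_id, class_id, confidence, x1, y1, x2, y2] rows with coordinates normalized to [0, 1] (an assumption, not stated in the patent), post-processing into pixel boxes might look like:

```python
def parse_ssd_output(detections, frame_w, frame_h, conf_threshold=0.5):
    """Convert SSD-style detection rows into (class_id, confidence,
    pixel_box) tuples, keeping only rows above the confidence threshold.
    The row layout and the 0.5 threshold are illustrative assumptions."""
    boxes = []
    for _, cls, conf, x1, y1, x2, y2 in detections:
        if conf >= conf_threshold:
            boxes.append((int(cls), conf,
                          (int(x1 * frame_w), int(y1 * frame_h),
                           int(x2 * frame_w), int(y2 * frame_h))))
    return boxes
```

The highest-confidence surviving box would then serve as the detected target position area for the current frame.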
In an embodiment of the present invention, offline training of the target detection depth neural network includes:
Step A-1, for the specific target that the unmanned aerial vehicle is to detect and track, label video data of the same type, and train the deep neural network offline with the labeled data on a GPU server or another computer with strong performance;
Step A-2, decompose video data of the same type acquired by the unmanned aerial vehicle into images; the number of images should be as large as possible, usually not less than 10,000, to avoid overfitting and improve generalization. Label the targets (automobiles, people, tanks, unmanned aerial vehicles and the like) in each image; specifically, frame the target with a rectangle, and record, in a fixed format, either the pixel coordinates of the top-left and bottom-right vertices of the rectangle, or the top-left vertex plus the rectangle's width and height, together with the corresponding target label;
Step A-3, set up a deep neural network training platform (TensorFlow, Darknet, Caffe and the like), set parameters such as batch size and learning rate, read a deep neural network model such as MobileNets-SSD, and update the parameters of the deep neural network model of the specific target detection algorithm on the labeled data;
Step A-4, after training a specific number of rounds (more than 10,000), save the trained deep neural network model and obtain its model file and weight file.
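Step A-2's annotation record (rectangle coordinates plus class label in a fixed format) could be serialized along the following lines; the field names and the JSON-lines choice are illustrative, since the patent fixes no concrete schema.

```python
import json

def annotation_record(image_path, x1, y1, x2, y2, label):
    """One labeled target per record, as described in step A-2: top-left
    pixel coordinates plus width/height of the bounding rectangle and
    the class label. Field names are hypothetical."""
    return {"image": image_path,
            "bbox": {"x1": x1, "y1": y1,
                     "width": x2 - x1, "height": y2 - y1},
            "label": label}

def save_annotations(records, path):
    # One JSON object per line keeps the "fixed format" requirement simple.
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```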
Secondly, detecting a target:
step B-1, loading video data and reading video frames;
b-2, initializing a frame number counter to 0;
step B-3, adding 1 to the frame counter, and simultaneously executing step B-4 and step B-7;
step B-4, load the pre-trained model of the deep learning algorithm and detect the specific target in the read video frame with a deep learning forward inference mechanism: read the target category labels, the pre-trained parameter model file, the weight file and the video frame to be detected, and perform forward inference on the new video frame to acquire the target position information and confidence;
step B-5, judge whether h is 1; if so, execute step B-6; if not, pass the detected target position area to step C-4;
step B-6, initialize the target tracker with the target position detected by the target detector as the tracking starting point;
step B-7, judge whether h is 1; if so, execute step B-3; if not, execute step C-1;
finally, tracking the target:
step C-1, track the target with the tracking algorithm and update the target position on the new video frame: determine the position of the candidate region from the previous frame and extract its features; search the current video frame for the region that best matches the features of the candidate region, take it as the tracked object, and obtain the target position area tracked in the video frame;
c-2, judging whether target tracking is successful or not through a preset threshold value, executing the step B-2 when the target tracking is unsuccessful, and executing the next step when the target tracking is successful;
c-3, judging whether the pixel coordinates of the image of the position area of the output target exceed the coordinate range of the preset video frame image, if so, executing the step B-2, and if not, outputting the position of the target and executing the step C-4;
and C-4, judging whether the detected target position area and the tracked target position area in the current video frame are overlapped, if so, outputting the detected target position area, executing the step B-3, and if not, outputting the tracked target position area, and executing the step B-3.
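Step C-1's search for the best-matching region inside the candidate area can be illustrated with a brute-force sum-of-squared-differences template match. A real implementation would use the correlation-filtering features mentioned in the background; this is only a toy stand-in to show the shape of the search.

```python
def match_in_candidate(template, candidate):
    """Slide the previous frame's target template over the candidate
    region and return the top-left (x, y) offset with the smallest
    sum of squared differences. Inputs are 2-D lists of grayscale
    values; exhaustive search is used purely for clarity."""
    th, tw = len(template), len(template[0])
    ch, cw = len(candidate), len(candidate[0])
    best, best_pos = None, (0, 0)
    for y in range(ch - th + 1):
        for x in range(cw - tw + 1):
            ssd = sum((candidate[y + i][x + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best:
                best, best_pos = ssd, (x, y)
    return best_pos
```

Adding the returned offset to the candidate region's origin gives the tracked target position area in the current frame.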
Preferably, the obtaining a candidate region corresponding to the detected target position region in the h-1 video frame includes:
and expanding the detected target position area in the h-1 video frame by a preset multiple.
Further, the value range of the preset multiple is [1.5,3].
Preferably, the training process of the classifier of the h-1 th video frame comprises:
taking candidate areas corresponding to the detected target position areas contained in the h-1 video frame as positive sample data for training the second classifier;
taking candidate areas corresponding to the target position areas which do not contain detection in the h-1 video frame as negative sample data for training the second classifier;
constructing sample data for training a two-classifier by utilizing the positive sample data and the negative sample data;
and executing a classifier algorithm on the sample data of the training second classifier to obtain the trained classifier of the h-1 video frame.
In the embodiment of the invention, in step C-2, the candidate region of the current frame serves as a template, and whether it contains the real target frame determines whether it is a positive sample for training the classification algorithm, which yields a classifier. A prediction template on the next frame's image is obtained from the template of the current frame's real target frame, and a number of alternative templates are generated with a circulant matrix. The classifier generated from the current frame is run on the next frame's image with the alternative templates as samples, obtaining a label for every sample; the alternative frame whose label indicates that it contains the real target position is taken as the target prediction template in the next frame. Comparing the relative positions of the next frame's prediction template and the enlarged template of the current frame's real target yields the change in the target position, and thus the new target position in the next frame. The classification value obtained by the classifier is compared with a preset value M: if it is larger than M, tracking succeeds; if it is smaller than M, tracking fails.
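The circulant-matrix trick mentioned above amounts to treating every cyclic shift of the base patch as one alternative template, so an h x w patch yields h*w candidates without explicitly cropping the image. A literal (unoptimized) enumeration of those shifts, for illustration only — practical correlation-filter trackers evaluate all shifts at once in the Fourier domain rather than materializing them:

```python
def cyclic_shifts(patch):
    """All 2-D cyclic shifts of the base patch (a 2-D list), enumerated
    row-shift-major: index dy * width + dx holds the patch shifted down
    by dy rows and right by dx columns."""
    h, w = len(patch), len(patch[0])

    def shift(dy, dx):
        return [[patch[(i - dy) % h][(j - dx) % w] for j in range(w)]
                for i in range(h)]

    return [shift(dy, dx) for dy in range(h) for dx in range(w)]
```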
Based on the technical scheme provided by the invention, the embodiment of the invention also provides a training flow chart of the confidence-based target detection model, as shown in fig. 2:
S1, offline training of the target detection model:
S11, acquiring videos or images of the specific area to be monitored, where the acquired scenes should be as similar as possible to the actual monitoring area of the unmanned aerial vehicle;
S12, marking multiple types of targets (vehicles, personnel, trees and the like) frame by frame in the acquired videos or images. The marking frame is preferably a rectangle, positioned either by its top-left and bottom-right corner points or by its top-left corner point together with its width and height; the marked coordinates and category labels are saved as xml or txt files in a fixed format, and an index file is established to map image paths and file names one-to-one to xml or txt file paths and file names;
S13, selecting a training platform for the deep neural network; the platform may be Caffe, TensorFlow, PyTorch or Darknet, but is not limited to these;
S14, selecting a target detection deep neural network, including but not limited to the MobileNet-SSD target detection neural network, setting parameters such as the training batch size and learning rate, reading the training images and the corresponding xml or txt files according to the index file, and training on the platform selected in S13 with the marked data;
S15, performing N rounds of training on the acquired data in the training process of S14, where N is usually not less than 10000, and saving the obtained model file for later use in the real-time target detection process.
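The S12 bookkeeping might be sketched as below: one txt annotation file per image holding one rectangle per line, plus an index file mapping each image path to its annotation path. The exact "label x1 y1 x2 y2" line format is an assumption; the patent only requires a fixed format and a one-to-one index.

```python
# Sketch: per-image txt annotations and the image-to-annotation index file.

def save_annotation(txt_path, boxes):
    # boxes: iterable of (class_label, x1, y1, x2, y2) rectangles
    with open(txt_path, "w") as f:
        for cls, x1, y1, x2, y2 in boxes:
            f.write(f"{cls} {x1} {y1} {x2} {y2}\n")

def build_index(index_path, pairs):
    # pairs: iterable of (image_path, annotation_path), one line each
    with open(index_path, "w") as f:
        for img_path, ann_path in pairs:
            f.write(f"{img_path} {ann_path}\n")
```

The training platform of S13–S14 would read the index file, then load each image together with its annotation file.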
Based on the technical scheme provided by the invention, the embodiment of the invention also provides a confidence-based real-time target detection flow chart, as shown in fig. 3:
S2, online real-time target detection:
S21, reading camera video or image data frame by frame on the unmanned aerial vehicle in real time;
S22, running a lightweight forward inference framework that is convenient to deploy on a mobile platform, including but not limited to the OpenCV DNN module and the TensorRT and Tengine forward inference modules;
S23, reading the model weight file trained and saved in S15, detecting the selected targets on the video or images read frame by frame, and obtaining and outputting the corresponding target position rectangle, confidence, category label and other information.
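The S21–S23 loop can be sketched as follows, with a stand-in `detect` callable in place of the deployed inference framework; the (box, confidence, label) output format and the threshold value are assumptions:

```python
# Sketch of the online loop: read frames, run forward inference, and keep
# only detections whose confidence exceeds a threshold.

def filter_detections(raw_detections, conf_threshold=0.5):
    """Keep detections above the confidence threshold."""
    return [(box, conf, label) for box, conf, label in raw_detections
            if conf > conf_threshold]

def run_online(frames, detect, conf_threshold=0.5):
    results = []
    for frame in frames:                       # S21: frame-by-frame input
        raw = detect(frame)                    # S22/S23: forward inference
        results.append(filter_detections(raw, conf_threshold))
    return results
```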
the invention provides an unmanned aerial vehicle ground detection system based on synchronous detection tracking, as shown in fig. 4, comprising:
the training module is used for training the target detection deep neural network model and obtaining a model file and a weight file;
the acquisition module is used for acquiring real-time video data frame by frame;
an initialization module I, configured to initialize a frame number counter h=0;
the assignment module is used for setting h=h+1 and simultaneously executing the detection module and the judging module b;
the detection module is used for carrying out forward inference on the real-time video data collected frame by frame, based on the trained model file and weight file of the target detection deep neural network, and obtaining the target position region detected in the h-th video frame;
the judging module a is used for judging whether h is 1, if so, executing the initialization module II, and if not, passing the target position region detected in the h-th video frame to the judging module e;
the initialization module II is used for initializing a target tracker according to the target position region detected in the h-th video frame;
the judging module b is used for judging whether h is 1, if so, executing the assignment module, and if not, executing the tracking module;
the tracking module is used for obtaining the candidate region corresponding to the detected target position region in the (h-1)-th video frame and the region in the h-th video frame that coincides with that candidate region, taking the latter as the candidate region corresponding to the detected target position region in the h-th video frame, and obtaining the tracked target position region in the h-th video frame according to the candidate region corresponding to the detected target position region in the (h-1)-th video frame;
the judging module c is used for judging whether the target tracking in the h-th video frame is successful, if so, executing the judging module d, and if not, executing the initialization module I;
the judging module d is used for judging whether the pixel coordinates of the tracked target position region image in the h-th video frame exceed the preset coordinate range of the video frame image, if so, executing the initialization module I, and if not, executing the judging module e;
and the judging module e judges whether the detected target position region and the tracked target position region in the h-th video frame overlap; if so, the detected target position region is output and the assignment module is executed, and if not, the tracked target position region is output and the assignment module is executed.
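The interaction of the modules above (detect each frame, initialize or re-initialize the tracker, and prefer the detected region when it overlaps the tracked one) might be sketched as follows. The `detect` and `track` callables, the box format, and the simplified reset behavior (the sketch does not restart the frame counter) are all assumptions:

```python
# Sketch of the synchronous detect-and-track loop.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def detect_track_loop(frames, detect, track, img_w, img_h):
    outputs, tracker_box = [], None
    for frame in frames:
        det = detect(frame)                      # detection branch (every frame)
        if tracker_box is None:                  # first frame or after a reset
            tracker_box = det                    # initialize the tracker
            outputs.append(det)
            continue
        trk = track(frame, tracker_box)          # tracking branch
        in_bounds = (trk is not None and trk[0] >= 0 and trk[1] >= 0
                     and trk[2] <= img_w and trk[3] <= img_h)
        if trk is None or not in_bounds:         # tracking failed or left image
            tracker_box = None                   # re-initialize next frame
            outputs.append(det)
            continue
        # prefer the detected region when it overlaps the tracked one
        chosen = det if iou(det, trk) > 0 else trk
        outputs.append(chosen)
        tracker_box = chosen
    return outputs
```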
Preferably, the training module is specifically configured to:
marking each type of target in the historical video data collected frame by frame;
constructing training data from the historical video data marked frame by frame, and training the target detection deep neural network model with the training data;
and obtaining the model file and weight file of the trained target detection deep neural network.
Preferably, the detection module is specifically configured to:
sequentially read, using the forward inference framework, the label corresponding to the target, the trained model file, the weight file and the real-time video data acquired frame by frame, and obtain the target position output by the forward inference framework.
Preferably, the obtaining a candidate region corresponding to the detected target position region in the h-1 video frame includes:
and expanding the detected target position area in the h-1 video frame by a preset multiple.
Further, the value range of the preset multiple is [1.5,3].
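A possible sketch of this candidate-region expansion: the detected box is enlarged about its centre by a preset multiple (the patent gives the range [1.5, 3]) and clamped to the image bounds. The (x1, y1, x2, y2) box representation is an assumption:

```python
# Sketch: expand a detected target box by a preset multiple, clamped to the image.

def expand_region(box, multiple, img_w, img_h):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * multiple / 2.0
    half_h = (y2 - y1) * multiple / 2.0
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(img_w), cx + half_w), min(float(img_h), cy + half_h))
```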
Preferably, the judging module c is specifically configured to:
analyze the candidate region corresponding to the detected target position region in the h-th video frame with the classifier of the (h-1)-th video frame, and obtain the score of that candidate region;
if the score of the candidate region corresponding to the target position region detected in the h-th video frame is greater than the preset classifier score threshold, the target tracking is successful; otherwise, the target tracking fails.
Further, the training process of the classifier of the (h-1)-th video frame comprises:
taking candidate regions in the (h-1)-th video frame that contain a detected target position region as positive sample data for training the binary classifier;
taking candidate regions in the (h-1)-th video frame that do not contain a detected target position region as negative sample data for training the binary classifier;
constructing sample data for training the binary classifier from the positive sample data and the negative sample data;
and running a classification algorithm on the sample data for training the binary classifier to obtain the trained classifier of the (h-1)-th video frame.
The unmanned aerial vehicle ground detection system or the electronic equipment loaded with the unmanned aerial vehicle ground detection method can be deployed on the unmanned aerial vehicle so as to monitor and track the target.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (14)

1. An unmanned aerial vehicle ground detection method based on synchronous detection tracking is characterized by comprising the following steps:
step (1) training a target detection deep neural network model to obtain a model file and a weight file;
step (2) collecting real-time video data frame by frame;
step (3) initializing a frame number counter h=0;
step (4) let h=h+1, and simultaneously executing step (5) and step (8);
step (5) based on the trained model file and weight file of the target detection deep neural network, forward inference is carried out on real-time video data collected frame by frame, and the target position area detected in the h video frame is obtained;
step (6) judging whether h is 1, if so, executing step (7), otherwise, storing the target position area detected in the h video frame to step (12);
initializing a target tracker according to the target position area detected in the h video frame;
step (8) judging whether h is 1, if so, executing step (4), and if not, executing step (9);
step (9) obtaining a candidate region corresponding to the detected target position region in the h-1 video frame and a region in the h video frame, which is consistent with the candidate region corresponding to the h-1 video frame, and taking the region as the candidate region corresponding to the detected target position region in the h video frame, and obtaining the tracked target position region in the h video frame according to the candidate region corresponding to the detected target position region in the h-1 video frame;
step (10) judging whether the target tracking in the h video frame is successful, if so, executing the step (11), otherwise, executing the step (3);
step (11), judging whether the pixel coordinates of the target position area image tracked in the h video frame exceed the coordinate range of the preset video frame image, if so, executing the step (3), and if not, executing the step (12);
and (12) judging whether the detected target position area and the tracked target position area in the h video frame are overlapped or not, if so, outputting the detected target position area and executing the step (4), and if not, outputting the tracked target position area and executing the step (4).
2. The method of claim 1, wherein the step (1) comprises:
marking each type of target in the historical video data collected frame by frame;
constructing training data by utilizing historical video data marked frame by frame, and training a target detection deep neural network model by utilizing the training data;
and obtaining a model file and a weight file of the trained target detection deep neural network.
3. The method of claim 1, wherein the step (5) comprises:
and sequentially reading the label corresponding to the target, the trained model file, the weight file and the real-time video data acquired frame by frame by using the forward inference framework, and acquiring the detected target position area output by the forward inference framework.
4. The method of claim 1, wherein the acquiring the candidate region corresponding to the detected target location region in the h-1 th video frame comprises:
and expanding the detected target position area in the h-1 video frame by a preset multiple.
5. The method of claim 4, wherein the predetermined multiple has a value in the range of [1.5,3].
6. The method of claim 1, wherein the step (10) comprises:
analyzing candidate areas corresponding to the detected target position areas in the h video frame by using a classifier of the h-1 video frame, and obtaining scores of the candidate areas corresponding to the detected target position areas in the h video frame;
if the score of the candidate region corresponding to the target position region detected in the h video frame is larger than the preset value of the classifier score, the target tracking is successful, otherwise, the target tracking fails.
7. The method of claim 6, wherein the training process of the classifier of the h-1 th video frame comprises:
taking candidate areas corresponding to the detected target position areas contained in the h-1 video frame as positive sample data for training the binary classifier;
taking candidate areas in the h-1 video frame that do not contain a detected target position area as negative sample data for training the binary classifier;
constructing sample data for training the binary classifier by utilizing the positive sample data and the negative sample data;
and executing a classification algorithm on the sample data for training the binary classifier to obtain the trained classifier of the h-1 video frame.
8. Unmanned aerial vehicle ground detection system based on synchronous detection tracking, characterized in that the system includes:
the training module is used for training the target detection deep neural network model and obtaining a model file and a weight file;
the acquisition module is used for acquiring real-time video data frame by frame;
an initialization module i, configured to initialize a frame number counter h=0;
the assignment module is used for enabling h=h+1 and executing the detection module and the judgment module b at the same time;
the detection module is used for carrying out forward inference on real-time video data collected frame by frame based on a trained model file and a weight file of the target detection deep neural network, and obtaining a target position area detected in an h video frame;
the judging module a is used for judging whether h is 1, if so, executing an initializing module II, and if not, storing the target position area detected in the h video frame into a judging module e;
the initialization module II is used for initializing a target tracker according to the target position area detected in the h video frame;
the judging module b is used for judging whether h is 1, if yes, executing the assignment module, and if no, executing the tracking module;
the tracking module is used for acquiring a candidate region corresponding to the detected target position region in the h-1 video frame and a region, corresponding to the candidate region corresponding to the h-1 video frame, in the h video frame, and taking the region as the candidate region corresponding to the detected target position region in the h video frame, and acquiring the tracked target position region in the h video frame according to the candidate region corresponding to the detected target position region in the h-1 video frame;
the judging module c is used for judging whether the target tracking in the h video frame is successful, if so, executing the judging module d, and if not, executing the initializing module I;
the judging module d is used for judging whether the pixel coordinates of the target position area image tracked in the h video frame exceed the coordinate range of the preset video frame image, if so, executing the initializing module I, and if not, executing the judging module e;
and the judging module e judges whether the detected target position area and the tracked target position area in the h video frame are overlapped or not, if so, the detected target position area is output, the assignment module is executed, and if not, the tracked target position area is output, and the assignment module is executed.
9. The system of claim 8, wherein the training module is specifically configured to:
marking each type of target in the historical video data collected frame by frame;
constructing training data by utilizing historical video data marked frame by frame, and training a target detection deep neural network model by utilizing the training data;
and obtaining a model file and a weight file of the trained target detection deep neural network.
10. The system of claim 8, wherein the detection module is specifically configured to:
and sequentially reading the label corresponding to the target, the trained model file, the weight file and the video data acquired frame by frame by utilizing the forward inference framework, and acquiring the detected target position area output by the forward inference framework.
11. The system of claim 8, wherein the acquiring the candidate region corresponding to the detected target location region in the h-1 th video frame comprises:
and expanding the detected target position area in the h-1 video frame by a preset multiple.
12. The system of claim 11, wherein the predetermined multiple has a value in the range of [1.5,3].
13. The system of claim 8, wherein the judging module c is specifically configured to:
analyzing candidate areas corresponding to the detected target position areas in the h video frame by using a classifier of the h-1 video frame, and obtaining scores of the candidate areas corresponding to the detected target position areas in the h video frame;
if the score of the candidate region corresponding to the target position region detected in the h video frame is larger than the preset value of the classifier score, the target tracking is successful, otherwise, the target tracking fails.
14. The system of claim 13, wherein the training process of the classifier of the h-1 st video frame comprises:
taking candidate areas corresponding to the detected target position areas contained in the h-1 video frame as positive sample data for training the binary classifier;
taking candidate areas in the h-1 video frame that do not contain a detected target position area as negative sample data for training the binary classifier;
constructing sample data for training the binary classifier by utilizing the positive sample data and the negative sample data;
and executing a classification algorithm on the sample data for training the binary classifier to obtain the trained classifier of the h-1 video frame.
CN202011285803.8A 2020-11-17 2020-11-17 Unmanned aerial vehicle ground detection method and system based on synchronous detection tracking Active CN114511793B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011285803.8A CN114511793B (en) 2020-11-17 2020-11-17 Unmanned aerial vehicle ground detection method and system based on synchronous detection tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011285803.8A CN114511793B (en) 2020-11-17 2020-11-17 Unmanned aerial vehicle ground detection method and system based on synchronous detection tracking

Publications (2)

Publication Number Publication Date
CN114511793A CN114511793A (en) 2022-05-17
CN114511793B true CN114511793B (en) 2024-04-05

Family

ID=81547239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011285803.8A Active CN114511793B (en) 2020-11-17 2020-11-17 Unmanned aerial vehicle ground detection method and system based on synchronous detection tracking

Country Status (1)

Country Link
CN (1) CN114511793B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862705A (en) * 2017-11-21 2018-03-30 重庆邮电大学 A kind of unmanned plane small target detecting method based on motion feature and deep learning feature
WO2019041519A1 (en) * 2017-08-29 2019-03-07 平安科技(深圳)有限公司 Target tracking device and method, and computer-readable storage medium
WO2019101220A1 (en) * 2017-12-11 2019-05-31 珠海大横琴科技发展有限公司 Deep learning network and average drift-based automatic vessel tracking method and system
CN110399808A (en) * 2019-07-05 2019-11-01 桂林安维科技有限公司 A kind of Human bodys' response method and system based on multiple target tracking
CN111784737A (en) * 2020-06-10 2020-10-16 中国人民解放军军事科学院国防科技创新研究院 Automatic target tracking method and system based on unmanned aerial vehicle platform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11216954B2 (en) * 2018-04-18 2022-01-04 Tg-17, Inc. Systems and methods for real-time adjustment of neural networks for autonomous tracking and localization of moving subject

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019041519A1 (en) * 2017-08-29 2019-03-07 平安科技(深圳)有限公司 Target tracking device and method, and computer-readable storage medium
CN107862705A (en) * 2017-11-21 2018-03-30 重庆邮电大学 A kind of unmanned plane small target detecting method based on motion feature and deep learning feature
WO2019101220A1 (en) * 2017-12-11 2019-05-31 珠海大横琴科技发展有限公司 Deep learning network and average drift-based automatic vessel tracking method and system
CN110399808A (en) * 2019-07-05 2019-11-01 桂林安维科技有限公司 A kind of Human bodys' response method and system based on multiple target tracking
CN111784737A (en) * 2020-06-10 2020-10-16 中国人民解放军军事科学院国防科技创新研究院 Automatic target tracking method and system based on unmanned aerial vehicle platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Single-shot multi-object tracking algorithm based on convolutional neural network detection; Min Zhaoyang; Zhao Wenjie; Ship Electronic Engineering; 2017-12-20 (12); full text *
Fast TLD visual object tracking using kernelized correlation filtering; Wang Jiaoyao; Hou Zhiqiang; Yu Wangsheng; Liao Xiufeng; Chen Chuanhua; Journal of Image and Graphics; 2018-11-16 (11); full text *

Also Published As

Publication number Publication date
CN114511793A (en) 2022-05-17

Similar Documents

Publication Publication Date Title
US11205274B2 (en) High-performance visual object tracking for embedded vision systems
CN109166094B (en) Insulator fault positioning and identifying method based on deep learning
CN106874854B (en) Unmanned aerial vehicle tracking method based on embedded platform
WO2020186678A1 (en) Three-dimensional map constructing method and apparatus for unmanned aerial vehicle, computer device, and storage medium
Alexandrov et al. Analysis of machine learning methods for wildfire security monitoring with an unmanned aerial vehicles
CN103208008B (en) Based on the quick adaptive method of traffic video monitoring target detection of machine vision
US20150138310A1 (en) Automatic scene parsing
CN106709475B (en) Obstacle recognition method and device, computer equipment and readable storage medium
CN110021033A (en) A kind of method for tracking target based on the twin network of pyramid
CN111784737B (en) Automatic target tracking method and system based on unmanned aerial vehicle platform
CN110555420B (en) Fusion model network and method based on pedestrian regional feature extraction and re-identification
CN111461209A (en) Model training device and method
CN114511792B (en) Unmanned aerial vehicle ground detection method and system based on frame counting
CN109919223B (en) Target detection method and device based on deep neural network
Wu et al. Multivehicle object tracking in satellite video enhanced by slow features and motion features
CN113936340A (en) AI model training method and device based on training data acquisition
CN111831010A (en) Unmanned aerial vehicle obstacle avoidance flight method based on digital space slice
CN112487892B (en) Unmanned aerial vehicle ground detection method and system based on confidence
CN114194180A (en) Method, device, equipment and medium for determining auxiliary parking information
CN114511793B (en) Unmanned aerial vehicle ground detection method and system based on synchronous detection tracking
CN112487889A (en) Unmanned aerial vehicle ground detection method and system based on deep neural network
CN116434150A (en) Multi-target detection tracking method, system and storage medium for congestion scene
CN116453109A (en) 3D target detection method, device, equipment and storage medium
CN113139985B (en) Tracking target framing method for eliminating communication delay influence of unmanned aerial vehicle and ground station
CN114494355A (en) Trajectory analysis method and device based on artificial intelligence, terminal equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant