CN112070085B - Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network - Google Patents

Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network

Info

Publication number
CN112070085B
CN112070085B (application CN202010925838.7A)
Authority
CN
China
Prior art keywords
network
training
aerial vehicle
unmanned aerial
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010925838.7A
Other languages
Chinese (zh)
Other versions
CN112070085A (en)
Inventor
胡天江
李铭慧
郑勋臣
张嘉榕
朱波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202010925838.7A
Publication of CN112070085A
Application granted
Publication of CN112070085B
Active legal-status Current
Anticipated expiration legal-status

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an unmanned aerial vehicle multi-feature point detection method and device based on a two-stage cascade depth network. The method comprises the following steps: performing category and bounding box labeling on a plurality of characteristic areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image to obtain corresponding training images; inputting each training image into a pre-constructed bounding box positioning network for training, so that the bounding box positioning network outputs a plurality of characteristic region prediction frames; extracting corresponding regions of interest according to each characteristic region prediction frame, and inputting all the regions of interest into a pre-constructed feature point regression network for training; and performing multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained bounding box positioning network and the feature point regression network to obtain a plurality of feature point coordinates. The invention can stably and accurately detect a plurality of feature points of the unmanned aerial vehicle in real time.

Description

Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network
Technical Field
The invention relates to the technical field of image processing, in particular to an unmanned aerial vehicle multi-feature point detection method and device based on a two-stage cascade depth network.
Background
In the autonomous landing process of the unmanned aerial vehicle, the attitude of the unmanned aerial vehicle is an important control index. The existing unmanned aerial vehicle attitude estimation technology mainly depends on means such as inertial measurement elements and visual cooperative marker positioning. Although a navigation system based on an inertial measurement unit has obvious advantages in this application field, it can only be installed on the unmanned aerial vehicle itself, so ground personnel cannot obtain accurate unmanned aerial vehicle attitude information when the communication link is interfered with. The attitude resolving method based on cooperative markers requires the markers to be arranged in advance, has certain requirements on ambient illumination, and still depends on active communication between the unmanned aerial vehicle and the ground station.
To advance unmanned aerial vehicle attitude estimation, current research mainly studies how to acquire unmanned aerial vehicle attitude information directly on the ground based on vision, and one of the challenges to be faced is unmanned aerial vehicle multi-feature point detection. Traditional unmanned aerial vehicle multi-feature point detection methods extract feature points by building separate shape and texture models from bottom-layer features such as shape, color and edges, and they struggle to meet application requirements in terms of detection speed and robustness.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides the unmanned aerial vehicle multi-feature point detection method and device based on the two-stage cascade depth network, which can stably and accurately detect a plurality of feature points of the unmanned aerial vehicle in real time.
In order to solve the above technical problems, in a first aspect, an embodiment of the present invention provides an unmanned aerial vehicle multi-feature point detection method based on a two-stage cascade depth network, including:
the method comprises the steps that category and boundary box labeling is carried out on a plurality of characteristic areas of an unmanned aerial vehicle in each acquired unmanned aerial vehicle image, and corresponding training images are obtained;
inputting each training image into a pre-constructed boundary frame positioning network for training, so that the boundary frame positioning network outputs a plurality of characteristic region prediction frames;
extracting corresponding regions of interest according to each characteristic region prediction frame, and inputting all the regions of interest into a pre-constructed characteristic point regression network for training;
and performing multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network to obtain a plurality of feature point coordinates.
Further, the classifying and bounding box labeling are performed on a plurality of characteristic areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image, so as to obtain a corresponding training image, which specifically comprises:
acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
Further, the training image is input into a pre-constructed bounding box positioning network to train, so that the bounding box positioning network outputs a plurality of feature area prediction frames, specifically:
constructing the boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a tripod region prediction frame;
and reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, finishing training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold, otherwise, continuously inputting the next training image into the boundary frame positioning network for training.
Further, the extracting a corresponding region of interest according to each of the feature region prediction frames specifically includes:
and extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as the corresponding region of interest.
Further, inputting all the interested areas into a pre-constructed feature point regression network for training, specifically:
constructing the characteristic point regression network, and inputting all the regions of interest corresponding to a training image into the characteristic point regression network for training;
and reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, finishing training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the regions of interest corresponding to the next training image into the characteristic point regression network for training.
In a second aspect, an embodiment of the present invention provides an unmanned aerial vehicle multi-feature point detection device based on a dual-stage cascade depth network, including:
the training image acquisition module is used for marking a plurality of characteristic areas of the unmanned aerial vehicle in the acquired images of each unmanned aerial vehicle by category and boundary frames to obtain corresponding training images;
the boundary frame positioning network training module is used for inputting each training image into a pre-constructed boundary frame positioning network for training, so that the boundary frame positioning network outputs a plurality of characteristic region prediction frames;
the feature point regression network training module is used for extracting corresponding regions of interest according to each feature region prediction frame, and inputting all the regions of interest into a pre-constructed feature point regression network for training;
and the multi-feature point detection module is used for carrying out multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network to obtain a plurality of feature point coordinates.
Further, the classifying and bounding box labeling are performed on a plurality of characteristic areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image, so as to obtain a corresponding training image, which specifically comprises:
acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
Further, the training image is input into a pre-constructed bounding box positioning network to train, so that the bounding box positioning network outputs a plurality of feature area prediction frames, specifically:
constructing the boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a tripod region prediction frame;
and reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, finishing training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold, otherwise, continuously inputting the next training image into the boundary frame positioning network for training.
Further, the extracting a corresponding region of interest according to each of the feature region prediction frames specifically includes:
and extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as the corresponding region of interest.
Further, inputting all the interested areas into a pre-constructed feature point regression network for training, specifically:
constructing the characteristic point regression network, and inputting all the regions of interest corresponding to a training image into the characteristic point regression network for training;
and reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, finishing training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the regions of interest corresponding to the next training image into the characteristic point regression network for training.
The embodiment of the invention has the following beneficial effects:
the method comprises the steps of marking a plurality of characteristic areas of an unmanned aerial vehicle in each acquired unmanned aerial vehicle image by category and boundary frames, obtaining corresponding training images, inputting each training image into a pre-built boundary frame positioning network for training, enabling the boundary frame positioning network to output a plurality of characteristic area prediction frames, extracting corresponding interested areas according to each characteristic area prediction frame, inputting all interested areas into a pre-built characteristic point regression network for training, and detecting a plurality of characteristic points of an image to be detected through a two-stage cascade depth network consisting of the trained boundary frame positioning network and the characteristic point regression network, so as to obtain a plurality of characteristic point coordinates, and detecting a plurality of characteristic points of the unmanned aerial vehicle. Compared with the prior art, the method and the device have the advantages that the boundary box positioning network is trained by utilizing the training image, a plurality of feature area prediction frames are directly output by the boundary box positioning network according to the training image, the feature point regression network is trained by utilizing the region of interest corresponding to the feature area prediction frames, a plurality of feature point coordinates are output by the feature point regression network according to the region of interest, and therefore a plurality of feature points of the unmanned aerial vehicle in the image to be detected can be stably and accurately detected in real time through a two-stage cascade depth network formed by the trained boundary box positioning network and the feature point regression network.
Drawings
Fig. 1 is a schematic flow chart of a method for detecting multiple feature points of an unmanned aerial vehicle based on a two-stage cascade depth network in a first embodiment of the invention;
FIG. 2 is a schematic diagram of a dual-cascaded depth network according to a first embodiment of the present invention;
fig. 3 is a schematic structural diagram of an unmanned aerial vehicle multi-feature point detection device based on a two-stage cascade depth network according to a second embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made more apparent and fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, the step numbers herein are only for convenience of explanation of the specific embodiments, and are not used as limiting the order of execution of the steps.
First embodiment:
as shown in fig. 1, a first embodiment provides a method for detecting multiple feature points of an unmanned aerial vehicle based on a two-stage cascade depth network, which includes steps S1 to S4:
s1, marking a plurality of characteristic areas of an unmanned aerial vehicle and a boundary box in each acquired unmanned aerial vehicle image to obtain a corresponding training image;
s2, inputting each training image into a pre-constructed boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames;
s3, extracting corresponding regions of interest according to each characteristic region prediction frame, and inputting all the regions of interest into a pre-constructed characteristic point regression network for training;
and S4, performing multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network, and obtaining a plurality of feature point coordinates.
It should be noted that the unmanned aerial vehicle image is an RGB three-channel image, the bounding box positioning network is a bboxLocate-Net network, the feature point regression network is a pointRefine-Net network, and the two-stage cascade depth network is obtained by cascading the bboxLocate-Net network and the pointRefine-Net network. A schematic diagram of the structure of the two-stage cascade depth network is shown in fig. 2.
As an example, in step S1, an unmanned aerial vehicle in the landing process is photographed by a ground camera to obtain unmanned aerial vehicle images, and categories and bounding boxes of a plurality of feature areas of the unmanned aerial vehicle are marked in each unmanned aerial vehicle image to obtain corresponding training images, so that the bounding box positioning network is trained by the training images later, and a plurality of feature points of the unmanned aerial vehicle are detected on the ground based on vision.
In a preferred embodiment, a plurality of feature areas of the unmanned aerial vehicle are labeled in each acquired unmanned aerial vehicle image by category and bounding box, and a corresponding training image is obtained, specifically: acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
According to the method, the boundary boxes of the unmanned aerial vehicle body, the left wing, the right wing, the left tail wing, the right tail wing, the middle foot rest and other areas are marked in each unmanned aerial vehicle image by using the boundary box marking tool, so that the boundary box positioning network is trained by using training images marked with a plurality of characteristic areas, and the detection efficiency, the detection robustness and the detection precision of the boundary box positioning network to the unmanned aerial vehicle multi-characteristic areas are improved.
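The patent does not specify the labeling tool or the on-disk annotation format. As a minimal sketch only, the snippet below assumes a YOLO-style text record (class index followed by a normalized center/width/height box), which is one common way such bounding box annotations are stored; the class list and record layout are illustrative assumptions.

# Hypothetical YOLO-style annotation record for one labeled feature region.
# The class order and record layout are assumptions for illustration only.
CLASSES = ["body", "left_wing", "right_wing", "left_tail", "right_tail", "middle_foot_rest"]

def to_yolo_record(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into a
    normalized 'class cx cy w h' line, one line per feature region."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0 / img_w
    cy = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Example: a left-wing box annotated in a 1280 x 720 frame.
print(to_yolo_record(CLASSES.index("left_wing"), (412, 305, 498, 342), 1280, 720))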
As an example, in step S2, after obtaining the training images, each training image is directly input into the pre-constructed bounding box positioning network in sequence to train, so that the bounding box positioning network outputs a plurality of feature area prediction frames, and network parameters of the bounding box positioning network are iteratively updated.
In a preferred embodiment, each training image is input into a pre-constructed bounding box positioning network for training, so that the bounding box positioning network outputs a plurality of feature area prediction frames, specifically: constructing a boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest region prediction frame; and reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, ending training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold value, otherwise, continuing inputting the next training image into the boundary frame positioning network for training.
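A minimal sketch of the training control flow just described follows: one training image per iteration, a reverse (back-propagated) update of the network parameters from the network loss, and termination once the loss falls below the first preset threshold. The optimizer, learning rate and threshold value are illustrative assumptions; the composition of the loss is described in the construction and training steps below.

import torch

def train_bbox_positioning_network(model, loss_fn, training_images, targets, threshold, lr=1e-3):
    """Sketch of the loop described above: train on images one by one and
    stop as soon as the network loss drops below the first preset threshold."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for image, target in zip(training_images, targets):
        optimizer.zero_grad()
        prediction = model(image.unsqueeze(0))   # one training image per step
        loss = loss_fn(prediction, target)       # bounding box positioning network loss
        loss.backward()                          # reverse update of network parameters
        optimizer.step()
        if loss.item() < threshold:              # first preset threshold reached
            break                                # training of this network ends
    return model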
The specific processes of constructing, training and applying the bounding box positioning network are as follows:
1. constructing a bboxLocate-Net network;
The bboxLocate-Net network is improved on the basis of YOLOv3. Its input is a 416 x 416 RGB three-channel image, its feature extraction network is a 23-layer Darknet-tiny network formed by stacking convolution-pooling units, and its detection layer predicts bounding boxes at two scales, 13 x 13 and 26 x 26.
2. Training a bboxLocate-Net network;
An input training image is scaled to 416 x 416 and fed in batches into the Darknet-tiny network for feature extraction; convolution and pooling operations then yield output predictions at two different scales, and the loss value of the prediction results is calculated in the YOLO detection layer. The loss value includes a bounding box coordinate loss, a confidence loss and a class confidence loss, each calculated from the corresponding loss function.
3. Detecting and positioning six characteristic areas of the unmanned aerial vehicle;
The image to be detected is input into the bboxLocate-Net network to obtain a prediction tensor, which comprises the center coordinate values (t_x, t_y), the width and height values (t_w, t_h), as well as the confidence and the category.
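Step 3 states that the prediction tensor carries center values (t_x, t_y), width and height values (t_w, t_h), a confidence and a category. The sketch below shows the usual YOLOv3-style decoding of one such entry into an absolute box; the anchor size and grid cell used in the example are illustrative assumptions, not values taken from the patent.

import math

def decode_yolo_box(t_x, t_y, t_w, t_h, cell_x, cell_y, grid_size, anchor_w, anchor_h, img_size=416):
    """Decode a single predicted entry into an absolute box, YOLOv3-style.
    grid_size is 13 or 26, matching the two detection scales mentioned above."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    stride = img_size / grid_size                # pixels per grid cell
    b_x = (sigmoid(t_x) + cell_x) * stride       # absolute center x
    b_y = (sigmoid(t_y) + cell_y) * stride       # absolute center y
    b_w = anchor_w * math.exp(t_w)               # absolute width
    b_h = anchor_h * math.exp(t_h)               # absolute height
    return b_x, b_y, b_w, b_h

# Example: a prediction in cell (6, 7) of the 13 x 13 scale with a hypothetical 90 x 60 anchor.
print(decode_yolo_box(0.2, -0.1, 0.3, 0.1, 6, 7, 13, 90, 60))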
As an example, in step S3, after the bounding box positioning network outputs a plurality of feature region prediction frames, a region of interest (ROI) corresponding to the feature region prediction frames is input into a pre-constructed feature point regression network for training, which is beneficial to improving the detection efficiency, the detection robustness and the detection precision of the feature point network on the multi-feature points of the unmanned aerial vehicle.
In a preferred embodiment, the extracting a corresponding region of interest according to each feature region prediction frame is specifically: and extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as a corresponding region of interest.
In a preferred embodiment, all the regions of interest are input into a pre-constructed feature point regression network for training, specifically: constructing a feature point regression network, and inputting all the regions of interest corresponding to a training image into the feature point regression network for training; and reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, ending training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the interested areas corresponding to the next training image into the characteristic point regression network for training.
The specific process of constructing, training and applying the characteristic point regression network is as follows:
1. constructing a pointRefine-Net network;
The pointRefine-Net network is a point regression network group comprising five sub-networks. The input image size of each sub-network is 15 x 15; after feature extraction by the backbone, the last layer is a fully connected layer that outputs a 1 x 2 tensor, which is the predicted value of the feature point coordinates.
2. Training a pointRefine-Net network;
Each sub-network of the pointRefine-Net network is trained independently. Training samples are randomly cropped near the feature points: 10 small regions are randomly generated near each feature point of each image to serve as regions of interest. The regression target of the network is the offset of the feature point relative to the top-left corner coordinates of the region; with the goal of minimizing the offset loss value, the convolution kernel parameters are iteratively updated by the backward gradient propagation algorithm. When the loss value falls below a certain threshold, training stops and the final feature point regression network is obtained.
3. On the basis of coarse extraction of six characteristic areas of the unmanned aerial vehicle, fine positioning is carried out on the characteristic points in each area;
After the bounding boxes of the six characteristic areas of the unmanned aerial vehicle are obtained, an ROI with a width and height of 0.15w and 0.15h respectively, centered on the coarse positioning point (the center of the bounding box), is generated and sent into the corresponding pointRefine-Net network to obtain more accurate feature point coordinates.
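A minimal sketch of this fine-positioning stage, assuming a PyTorch implementation: a sub-network whose layer sizes are chosen only to match the stated 15 x 15 input and 1 x 2 output (the patent does not fix them), and an ROI crop of 0.15w x 0.15h around the coarse point. The regressed 2-vector is then interpreted, as in the training description above, as an offset relative to the ROI's top-left corner.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PointRegressionSubNet(nn.Module):
    """Hypothetical pointRefine-Net sub-network: 15 x 15 RGB crop in, 1 x 2 tensor out."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 3 * 3, 2)           # fully connected last layer -> (dx, dy)

    def forward(self, x):                            # x: (N, 3, 15, 15)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> (N, 16, 7, 7)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> (N, 32, 3, 3)
        return self.fc(x.flatten(1))                 # -> (N, 2) predicted offset

def crop_roi(image, center_x, center_y, box_w, box_h, out_size=15):
    """Cut a 0.15w x 0.15h region around the coarse point from a (3, H, W) tensor
    and resize it to 15 x 15; also return the crop's top-left corner."""
    _, H, W = image.shape
    rw = max(int(0.15 * box_w), 2)
    rh = max(int(0.15 * box_h), 2)
    x0 = min(max(int(center_x - rw / 2), 0), max(W - rw, 0))
    y0 = min(max(int(center_y - rh / 2), 0), max(H - rh, 0))
    roi = image[:, y0:y0 + rh, x0:x0 + rw].unsqueeze(0).float()
    roi = F.interpolate(roi, size=(out_size, out_size), mode="bilinear", align_corners=False)
    return roi, (x0, y0)

# Usage sketch: absolute point = (x0 + dx, y0 + dy) with (dx, dy) = subnet(roi)[0].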
As an example, in step S4, after the training of the bounding box positioning network and the feature point regression network is completed, the bounding box positioning network and the feature point regression network are cascaded to obtain a two-stage cascade depth network, so that the multi-feature point detection can be performed on the image to be detected through the two-stage cascade depth network, thereby realizing the real-time stable and accurate detection of a plurality of feature points of the unmanned aerial vehicle.
The first-stage network of the two-stage cascade depth network, bboxLocate-Net, is a category and bounding box regression network. It is trained on a large number of training images; once an image is input, the bboxLocate-Net network can perform category classification and frame positioning of the unmanned aerial vehicle body, left wing, right wing, left tail wing, right tail wing and tripod regions. The second-stage network, pointRefine-Net, is a network group consisting of five sub-networks, each responsible for the regression of one feature point; its training samples are small regions containing the feature point, randomly cropped at 10 different positions near each feature point of each image, so the feature points can be accurately located on the basis of the feature region prediction frames output by the bboxLocate-Net network.
The bboxLocate-Net and pointRefine-Net networks have three output tasks in total. The first task of the bboxLocate-Net network outputs a discrete classification probability, i.e., whether one of the five characteristic regions is present; the second task outputs the specific coordinates and the frame width and height values of the characteristic region. The only task of the pointRefine-Net network is to output the coordinate vector of the feature point regression.
Taking the left wing tip region as an example, formula (1) is the loss function of the classification task, formula (2) is the loss function of the bounding box regression task, and formula (3) is the loss function of the wing tip feature point regression task:
In formula (1), y_i represents the true label of sample x_i, taking the value 0 or 1, where 0 represents a non-left-wing-tip region and 1 represents the left wing tip region, and p_i represents the probability that the network judges the sample to be a left wing tip;
In formula (2), ŷ represents the frame position increment of each candidate window predicted by the network, and y represents the actual bounding box position increment;
In formula (3), ŷ represents the unmanned aerial vehicle feature point position vector predicted by the network, and y represents the real feature point vector.
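A standard form consistent with the variable definitions above — a binary cross-entropy for the classification task and squared Euclidean (L2) distances for the two regression tasks — is given below as a reconstruction; these are assumed forms, not the patent's verbatim equations:

\begin{aligned}
L_{\text{cls}} &= -\bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr] && (1) \\
L_{\text{box}} &= \lVert \hat{y} - y \rVert_2^2 && (2) \\
L_{\text{point}} &= \lVert \hat{y} - y \rVert_2^2 && (3)
\end{aligned}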
After the image to be detected is input, the approximate regions where the feature points are located are determined by the bboxLocate-Net network; an ROI with a width and height of 0.15w and 0.15h respectively is then cut out around the center point of each region and sent to the corresponding trained pointRefine-Net point regression network, so that the feature point coordinates can be accurately located within the ROI.
In the unmanned aerial vehicle multi-feature point detection method based on the two-stage cascade depth network provided by this embodiment, the image to be detected is first input into the bboxLocate-Net network to obtain the prediction frames of the six characteristic areas of the unmanned aerial vehicle, and then each sub-region is sent to the corresponding second-stage pointRefine-Net network to regress the five feature point coordinates.
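As an end-to-end sketch of the cascade at inference time, under the same assumptions as the sketches above (a decode step producing (class_name, cx, cy, w, h) detections, the crop_roi helper, and one point sub-network per feature region); all names here are placeholders for illustration, not identifiers from the patent.

import torch

REGION_TO_SUBNET = {   # hypothetical mapping from region class to point sub-network index
    "left_wing": 0, "right_wing": 1, "left_tail": 2, "right_tail": 3, "middle_foot_rest": 4,
}

@torch.no_grad()
def detect_feature_points(image, bbox_net, point_subnets, decode_detections):
    """Run the two-stage cascade on one (3, H, W) image tensor: locate the six
    characteristic regions, then refine one feature point per relevant region."""
    detections = decode_detections(bbox_net(image.unsqueeze(0)))   # [(class_name, cx, cy, w, h), ...]
    points = {}
    for class_name, cx, cy, w, h in detections:
        if class_name not in REGION_TO_SUBNET:                     # e.g. the body region has no point sub-network
            continue
        roi, (x0, y0) = crop_roi(image, cx, cy, w, h)              # 0.15w x 0.15h crop, resized to 15 x 15
        dx, dy = point_subnets[REGION_TO_SUBNET[class_name]](roi)[0].tolist()
        points[class_name] = (x0 + dx, y0 + dy)                    # offset added to the ROI top-left corner
    return points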
Second embodiment:
as shown in fig. 3, a second embodiment provides an unmanned aerial vehicle multi-feature point detection device based on a dual-stage cascade depth network, including: the training image obtaining module 21 is configured to perform category and bounding box labeling on a plurality of feature areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image, so as to obtain a corresponding training image; the bounding box positioning network training module 22 is configured to input each training image into a pre-constructed bounding box positioning network for training, so that the bounding box positioning network outputs a plurality of feature area prediction frames; the feature point regression network training module 23 is configured to extract a corresponding region of interest according to each feature region prediction frame, and input all the regions of interest into a pre-constructed feature point regression network for training; the multi-feature point detection module 24 is configured to perform multi-feature point detection on an image to be detected through a two-stage cascade depth network composed of a trained bounding box positioning network and a feature point regression network, so as to obtain a plurality of feature point coordinates.
It should be noted that the unmanned aerial vehicle image is an RGB three-channel image, the bounding box positioning network is a bboxLocate-Net network, the feature point regression network is a pointRefine-Net network, and the two-stage cascade depth network is obtained by cascading the bboxLocate-Net network and the pointRefine-Net network.
Illustratively, through the training image acquisition module 21, the ground camera shoots the unmanned aerial vehicle in the landing process to obtain unmanned aerial vehicle images, and marks the categories and the bounding boxes of a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image to obtain corresponding training images, so that the bounding box positioning network is trained by the training images, and a plurality of characteristic points of the unmanned aerial vehicle are detected on the ground based on vision.
In a preferred embodiment, a plurality of feature areas of the unmanned aerial vehicle are labeled in each acquired unmanned aerial vehicle image by category and bounding box, and a corresponding training image is obtained, specifically: acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
According to the embodiment, the boundary frame marking tool is utilized to mark the boundary frames of the areas such as the unmanned aerial vehicle body, the left wing, the right wing, the left tail wing, the right tail wing, the middle foot rest and the like in each unmanned aerial vehicle image through the training image acquisition module 21, so that the boundary frame positioning network is trained by the training images marked with a plurality of characteristic areas, and the detection efficiency, the detection robustness and the detection precision of the boundary frame positioning network to the multi-characteristic areas of the unmanned aerial vehicle are improved.
Illustratively, by the bounding box positioning network training module 22, after obtaining training images, each training image is directly input into a pre-constructed bounding box positioning network in sequence for training, so that the bounding box positioning network outputs a plurality of feature area prediction frames, and network parameters of the bounding box positioning network are iteratively updated.
In a preferred embodiment, each training image is input into a pre-constructed bounding box positioning network for training, so that the bounding box positioning network outputs a plurality of feature area prediction frames, specifically: constructing a boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest region prediction frame; and reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, ending training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold value, otherwise, continuing inputting the next training image into the boundary frame positioning network for training.
The specific processes of constructing, training and applying the bounding box positioning network are as follows:
1. constructing a bboxLocate-Net network;
The bboxLocate-Net network is improved on the basis of YOLOv3. Its input is a 416 x 416 RGB three-channel image, its feature extraction network is a 23-layer Darknet-tiny network formed by stacking convolution-pooling units, and its detection layer predicts bounding boxes at two scales, 13 x 13 and 26 x 26.
2. Training a bboxLocate-Net network;
An input training image is scaled to 416 x 416 and fed in batches into the Darknet-tiny network for feature extraction; convolution and pooling operations then yield output predictions at two different scales, and the loss value of the prediction results is calculated in the YOLO detection layer. The loss value includes a bounding box coordinate loss, a confidence loss and a class confidence loss, each calculated from the corresponding loss function.
3. Detecting and positioning six characteristic areas of the unmanned aerial vehicle;
The image to be detected is input into the bboxLocate-Net network to obtain a prediction tensor, which comprises the center coordinate values (t_x, t_y), the width and height values (t_w, t_h), as well as the confidence and the category.
As an example, through the feature point regression network training module 23, after the bounding box positioning network outputs a plurality of feature region prediction frames, the region of interest (ROI) corresponding to the feature region prediction frames is input into the pre-constructed feature point regression network for training, which is beneficial to improving the detection efficiency, the detection robustness and the detection precision of the feature point network to the multi-feature points of the unmanned aerial vehicle.
In a preferred embodiment, the extracting a corresponding region of interest according to each feature region prediction frame is specifically: and extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as a corresponding region of interest.
In a preferred embodiment, all the regions of interest are input into a pre-constructed feature point regression network for training, specifically: constructing a feature point regression network, and inputting all the regions of interest corresponding to a training image into the feature point regression network for training; and reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, ending training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the interested areas corresponding to the next training image into the characteristic point regression network for training.
The specific process of constructing, training and applying the characteristic point regression network is as follows:
1. constructing a pointRefine-Net network;
The pointRefine-Net network is a point regression network group comprising five sub-networks. The input image size of each sub-network is 15 x 15; after feature extraction by the backbone, the last layer is a fully connected layer that outputs a 1 x 2 tensor, which is the predicted value of the feature point coordinates.
2. Training a pointRefine-Net network;
Each sub-network of the pointRefine-Net network is trained independently. Training samples are randomly cropped near the feature points: 10 small regions are randomly generated near each feature point of each image to serve as regions of interest. The regression target of the network is the offset of the feature point relative to the top-left corner coordinates of the region; with the goal of minimizing the offset loss value, the convolution kernel parameters are iteratively updated by the backward gradient propagation algorithm. When the loss value falls below a certain threshold, training stops and the final feature point regression network is obtained.
3. On the basis of coarse extraction of six characteristic areas of the unmanned aerial vehicle, fine positioning is carried out on the characteristic points in each area;
After the bounding boxes of the six characteristic areas of the unmanned aerial vehicle are obtained, an ROI with a width and height of 0.15w and 0.15h respectively, centered on the coarse positioning point (the center of the bounding box), is generated and sent into the corresponding pointRefine-Net network to obtain more accurate feature point coordinates.
Illustratively, by the multi-feature point detection module 24, after the bounding box positioning network and the feature point regression network are trained, the bounding box positioning network and the feature point regression network are cascaded to obtain a two-stage cascade depth network, so that the multi-feature point detection can be performed on the image to be detected through the two-stage cascade depth network, thereby realizing the real-time stable and accurate detection of a plurality of feature points of the unmanned aerial vehicle.
The first-stage network of the two-stage cascade depth network, bboxLocate-Net, is a category and bounding box regression network. It is trained on a large number of training images; once an image is input, the bboxLocate-Net network can perform category classification and frame positioning of the unmanned aerial vehicle body, left wing, right wing, left tail wing, right tail wing and tripod regions. The second-stage network, pointRefine-Net, is a network group consisting of five sub-networks, each responsible for the regression of one feature point; its training samples are small regions containing the feature point, randomly cropped at 10 different positions near each feature point of each image, so the feature points can be accurately located on the basis of the feature region prediction frames output by the bboxLocate-Net network.
The bboxLocate-Net and pointRefine-Net networks have three output tasks in total. The first task of the bboxLocate-Net network outputs a discrete classification probability, i.e., whether one of the five characteristic regions is present; the second task outputs the specific coordinates and the frame width and height values of the characteristic region. The only task of the pointRefine-Net network is to output the coordinate vector of the feature point regression.
Taking the left wing tip region as an example, formula (4) is the loss function of the classification task, formula (5) is the loss function of the bounding box regression task, and formula (6) is the loss function of the wing tip feature point regression task:
In formula (4), y_i represents the true label of sample x_i, taking the value 0 or 1, where 0 represents a non-left-wing-tip region and 1 represents the left wing tip region, and p_i represents the probability that the network judges the sample to be a left wing tip;
In formula (5), ŷ represents the frame position increment of each candidate window predicted by the network, and y represents the actual bounding box position increment;
In formula (6), ŷ represents the unmanned aerial vehicle feature point position vector predicted by the network, and y represents the real feature point vector.
In this embodiment, after the image to be detected is input, the multi-feature point detection module 24 determines the approximate regions where the feature points are located through the bboxLocate-Net network; an ROI with a width and height of 0.15w and 0.15h respectively is then cut out around the center point of each region and sent to the corresponding trained pointRefine-Net point regression network, so that the feature point coordinates can be accurately located within the ROI.
In summary, the embodiment of the invention has the following beneficial effects:
the method comprises the steps of marking a plurality of characteristic areas of an unmanned aerial vehicle in each acquired unmanned aerial vehicle image by category and boundary frames, obtaining corresponding training images, inputting each training image into a pre-built boundary frame positioning network for training, enabling the boundary frame positioning network to output a plurality of characteristic area prediction frames, extracting corresponding interested areas according to each characteristic area prediction frame, inputting all interested areas into a pre-built characteristic point regression network for training, and detecting a plurality of characteristic points of an image to be detected through a two-stage cascade depth network consisting of the trained boundary frame positioning network and the characteristic point regression network, so as to obtain a plurality of characteristic point coordinates, and detecting a plurality of characteristic points of the unmanned aerial vehicle. According to the embodiment of the invention, the boundary frame positioning network is trained by utilizing the training image, so that the boundary frame positioning network directly outputs a plurality of feature area prediction frames according to the training image, and the feature point regression network is trained by utilizing the region of interest corresponding to the feature area prediction frames, so that the feature point regression network outputs a plurality of feature point coordinates according to the region of interest, and therefore, a plurality of feature points of the unmanned aerial vehicle in the image to be detected can be stably and accurately detected in real time through a two-stage cascade depth network consisting of the trained boundary frame positioning network and the feature point regression network.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.
Those skilled in the art will appreciate that implementing all or part of the above-described embodiments may be accomplished by way of computer programs, which may be stored on a computer readable storage medium, which when executed may comprise the steps of the above-described embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.

Claims (4)

1. The unmanned aerial vehicle multi-feature point detection method based on the two-stage cascade depth network is characterized by comprising the following steps of:
the method comprises the steps that category and boundary box labeling is carried out on a plurality of characteristic areas of an unmanned aerial vehicle in each acquired unmanned aerial vehicle image, and corresponding training images are obtained;
inputting each training image into a pre-constructed boundary frame positioning network for training, so that the boundary frame positioning network outputs a plurality of characteristic region prediction frames; inputting each training image into a pre-constructed boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames, wherein the method specifically comprises the following steps: constructing the boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a tripod region prediction frame; reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, ending training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold value, otherwise, continuing inputting the next training image into the boundary frame positioning network for training;
extracting corresponding regions of interest according to each characteristic region prediction frame, and inputting all the regions of interest into a pre-constructed characteristic point regression network for training; the extracting the corresponding interested region according to each characteristic region prediction frame specifically comprises the following steps: extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as the corresponding region of interest; inputting all the interested areas into a pre-constructed characteristic point regression network for training, wherein the training is specifically as follows: constructing the characteristic point regression network, and inputting all the regions of interest corresponding to a training image into the characteristic point regression network for training; reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, ending training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the regions of interest corresponding to the next training image into the characteristic point regression network for training;
and performing multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network to obtain a plurality of feature point coordinates.
2. The method for detecting the multiple feature points of the unmanned aerial vehicle based on the two-stage cascade depth network according to claim 1, wherein the method is characterized in that the category and the bounding box of the multiple feature areas of the unmanned aerial vehicle are marked in each acquired unmanned aerial vehicle image to obtain the corresponding training image, specifically:
acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
3. An unmanned aerial vehicle multi-feature point detection device based on a two-stage cascade depth network, characterized by comprising:
the training image acquisition module is used for marking a plurality of characteristic areas of the unmanned aerial vehicle in the acquired images of each unmanned aerial vehicle by category and boundary frames to obtain corresponding training images;
the boundary frame positioning network training module is used for inputting each training image into a pre-constructed boundary frame positioning network for training, so that the boundary frame positioning network outputs a plurality of characteristic region prediction frames; the method is particularly used for: constructing the boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a tripod region prediction frame; reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, ending training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold value, otherwise, continuing inputting the next training image into the boundary frame positioning network for training;
the feature point regression network training module is used for extracting corresponding regions of interest according to each feature region prediction frame, and inputting all the regions of interest into a pre-constructed feature point regression network for training; the method is particularly used for: extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as the corresponding region of interest; constructing the characteristic point regression network, and inputting all the regions of interest corresponding to a training image into the characteristic point regression network for training; reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, ending training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the regions of interest corresponding to the next training image into the characteristic point regression network for training;
and the multi-feature point detection module is used for carrying out multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network to obtain a plurality of feature point coordinates.
4. The unmanned aerial vehicle multi-feature point detection device based on the two-stage cascade depth network according to claim 3, wherein the classification and bounding box labeling are performed on a plurality of feature areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image, so as to obtain a corresponding training image, specifically:
acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
CN202010925838.7A 2020-09-04 2020-09-04 Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network Active CN112070085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010925838.7A CN112070085B (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010925838.7A CN112070085B (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network

Publications (2)

Publication Number Publication Date
CN112070085A (en) 2020-12-11
CN112070085B (en) 2023-07-28

Family

ID=73662753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010925838.7A Active CN112070085B (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network

Country Status (1)

Country Link
CN (1) CN112070085B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470077A (en) * 2018-05-28 2018-08-31 广东工业大学 A kind of video key frame extracting method, system and equipment and storage medium
CN109117794A (en) * 2018-08-16 2019-01-01 广东工业大学 A kind of moving target behavior tracking method, apparatus, equipment and readable storage medium storing program for executing
CN109492416A (en) * 2019-01-07 2019-03-19 南京信息工程大学 A kind of guard method of big data image and system based on safety zone
CN109741318A (en) * 2018-12-30 2019-05-10 北京工业大学 The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field

Also Published As

Publication number Publication date
CN112070085A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN108171748B (en) Visual identification and positioning method for intelligent robot grabbing application
US20210390329A1 (en) Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN107507167B (en) Cargo tray detection method and system based on point cloud plane contour matching
Keller et al. A new benchmark for stereo-based pedestrian detection
CN109977997B (en) Image target detection and segmentation method based on convolutional neural network rapid robustness
Alidoost et al. A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image
CN106767812A (en) A kind of interior semanteme map updating method and system based on Semantic features extraction
CN105574527A (en) Quick object detection method based on local feature learning
Ji et al. Integrating visual selective attention model with HOG features for traffic light detection and recognition
CN113139453A (en) Orthoimage high-rise building base vector extraction method based on deep learning
Ji et al. RGB-D SLAM using vanishing point and door plate information in corridor environment
US20200226392A1 (en) Computer vision-based thin object detection
Zelener et al. Cnn-based object segmentation in urban lidar with missing points
CN112329559A (en) Method for detecting homestead target based on deep convolutional neural network
Zhang et al. Out-of-region keypoint localization for 6D pose estimation
CN114358133B (en) Method for detecting looped frames based on semantic-assisted binocular vision SLAM
CN114219936A (en) Object detection method, electronic device, storage medium, and computer program product
Liang et al. DIG-SLAM: an accurate RGB-D SLAM based on instance segmentation and geometric clustering for dynamic indoor scenes
CN111598033B (en) Goods positioning method, device, system and computer readable storage medium
Zhuang et al. Contextual classification of 3D laser points with conditional random fields in urban environments
CN117496401A (en) Full-automatic identification and tracking method for oval target points of video measurement image sequences
CN117036484A (en) Visual positioning and mapping method, system, equipment and medium based on geometry and semantics
CN112070085B (en) Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network
CN109543700B (en) Anti-shielding clothing key point detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant