CN112070085B - Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network - Google Patents

Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network

Info

Publication number
CN112070085B
CN112070085B (application CN202010925838.7A)
Authority
CN
China
Prior art keywords
network
training
aerial vehicle
unmanned aerial
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010925838.7A
Other languages
Chinese (zh)
Other versions
CN112070085A (en)
Inventor
胡天江
李铭慧
郑勋臣
张嘉榕
朱波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202010925838.7A
Publication of CN112070085A
Application granted
Publication of CN112070085B
Active legal-status Current
Anticipated expiration legal-status

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an unmanned aerial vehicle multi-feature point detection method and device based on a two-stage cascade depth network. The method comprises the following steps: performing category and bounding box labeling on a plurality of characteristic areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image to obtain corresponding training images; inputting each training image into a pre-constructed bounding box positioning network for training, so that the bounding box positioning network outputs a plurality of characteristic region prediction frames; extracting corresponding regions of interest according to each characteristic region prediction frame, and inputting all the regions of interest into a pre-constructed feature point regression network for training; and performing multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained bounding box positioning network and the feature point regression network to obtain a plurality of feature point coordinates. The invention can stably and accurately detect a plurality of feature points of the unmanned aerial vehicle in real time.

Description

Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network
Technical Field
The invention relates to the technical field of image processing, in particular to an unmanned aerial vehicle multi-feature point detection method and device based on a two-stage cascade depth network.
Background
In the autonomous landing process of the unmanned aerial vehicle, the attitude of the unmanned aerial vehicle is an important control index. The existing unmanned aerial vehicle attitude estimation technology mainly depends on means such as inertial measurement elements and visual cooperative marker positioning. Although a navigation system based on an inertial measurement unit has obvious advantages in this application field, it can only be installed on the unmanned aerial vehicle itself, so ground personnel cannot obtain accurate unmanned aerial vehicle attitude information when the communication link is interfered with. The attitude resolving method based on cooperative markers requires the markers to be arranged in advance, has certain requirements on ambient illumination, and still depends on active communication between the unmanned aerial vehicle and the ground station.
To advance unmanned aerial vehicle attitude estimation, current research mainly studies how to acquire unmanned aerial vehicle attitude information directly on the ground based on vision, and one of the challenges to be faced is unmanned aerial vehicle multi-feature point detection. Traditional unmanned aerial vehicle multi-feature point detection methods extract feature points by building separate shape and texture models from bottom-layer features such as shape, color and edges, and they struggle to meet application requirements in terms of detection speed and robustness.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides the unmanned aerial vehicle multi-feature point detection method and device based on the two-stage cascade depth network, which can stably and accurately detect a plurality of feature points of the unmanned aerial vehicle in real time.
In order to solve the above technical problems, in a first aspect, an embodiment of the present invention provides an unmanned aerial vehicle multi-feature point detection method based on a two-stage cascade depth network, including:
the method comprises the steps that category and boundary box labeling is carried out on a plurality of characteristic areas of an unmanned aerial vehicle in each acquired unmanned aerial vehicle image, and corresponding training images are obtained;
inputting each training image into a pre-constructed boundary frame positioning network for training, so that the boundary frame positioning network outputs a plurality of characteristic region prediction frames;
extracting corresponding regions of interest according to each characteristic region prediction frame, and inputting all the regions of interest into a pre-constructed characteristic point regression network for training;
and performing multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network to obtain a plurality of feature point coordinates.
Further, the classifying and bounding box labeling are performed on a plurality of characteristic areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image, so as to obtain a corresponding training image, which specifically comprises:
acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
Further, the training image is input into a pre-constructed bounding box positioning network to train, so that the bounding box positioning network outputs a plurality of feature area prediction frames, specifically:
constructing the boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a tripod region prediction frame;
and reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, finishing training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold, otherwise, continuously inputting the next training image into the boundary frame positioning network for training.
Further, the extracting a corresponding region of interest according to each of the feature region prediction frames specifically includes:
and extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as the corresponding region of interest.
Further, inputting all the interested areas into a pre-constructed feature point regression network for training, specifically:
constructing the characteristic point regression network, and inputting all the regions of interest corresponding to a training image into the characteristic point regression network for training;
and reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, finishing training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the regions of interest corresponding to the next training image into the characteristic point regression network for training.
In a second aspect, an embodiment of the present invention provides an unmanned aerial vehicle multi-feature point detection device based on a dual-stage cascade depth network, including:
the training image acquisition module is used for marking a plurality of characteristic areas of the unmanned aerial vehicle in the acquired images of each unmanned aerial vehicle by category and boundary frames to obtain corresponding training images;
the boundary frame positioning network training module is used for inputting each training image into a pre-constructed boundary frame positioning network for training, so that the boundary frame positioning network outputs a plurality of characteristic region prediction frames;
the feature point regression network training module is used for extracting corresponding regions of interest according to each feature region prediction frame, and inputting all the regions of interest into a pre-constructed feature point regression network for training;
and the multi-feature point detection module is used for carrying out multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network to obtain a plurality of feature point coordinates.
Further, the classifying and bounding box labeling are performed on a plurality of characteristic areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image, so as to obtain a corresponding training image, which specifically comprises:
acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
Further, the training image is input into a pre-constructed bounding box positioning network to train, so that the bounding box positioning network outputs a plurality of feature area prediction frames, specifically:
constructing the boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a tripod region prediction frame;
and reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, finishing training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold, otherwise, continuously inputting the next training image into the boundary frame positioning network for training.
Further, the extracting a corresponding region of interest according to each of the feature region prediction frames specifically includes:
and extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as the corresponding region of interest.
Further, inputting all the interested areas into a pre-constructed feature point regression network for training, specifically:
constructing the characteristic point regression network, and inputting all the regions of interest corresponding to a training image into the characteristic point regression network for training;
and reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, finishing training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the regions of interest corresponding to the next training image into the characteristic point regression network for training.
The embodiment of the invention has the following beneficial effects:
the method comprises the steps of marking a plurality of characteristic areas of an unmanned aerial vehicle in each acquired unmanned aerial vehicle image by category and boundary frames, obtaining corresponding training images, inputting each training image into a pre-built boundary frame positioning network for training, enabling the boundary frame positioning network to output a plurality of characteristic area prediction frames, extracting corresponding interested areas according to each characteristic area prediction frame, inputting all interested areas into a pre-built characteristic point regression network for training, and detecting a plurality of characteristic points of an image to be detected through a two-stage cascade depth network consisting of the trained boundary frame positioning network and the characteristic point regression network, so as to obtain a plurality of characteristic point coordinates, and detecting a plurality of characteristic points of the unmanned aerial vehicle. Compared with the prior art, the method and the device have the advantages that the boundary box positioning network is trained by utilizing the training image, a plurality of feature area prediction frames are directly output by the boundary box positioning network according to the training image, the feature point regression network is trained by utilizing the region of interest corresponding to the feature area prediction frames, a plurality of feature point coordinates are output by the feature point regression network according to the region of interest, and therefore a plurality of feature points of the unmanned aerial vehicle in the image to be detected can be stably and accurately detected in real time through a two-stage cascade depth network formed by the trained boundary box positioning network and the feature point regression network.
Drawings
Fig. 1 is a schematic flow chart of a method for detecting multiple feature points of an unmanned aerial vehicle based on a two-stage cascade depth network in a first embodiment of the invention;
FIG. 2 is a schematic diagram of a dual-cascaded depth network according to a first embodiment of the present invention;
fig. 3 is a schematic structural diagram of an unmanned aerial vehicle multi-feature point detection device based on a two-stage cascade depth network according to a second embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made more apparent and fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, the step numbers herein are only for convenience of explanation of the specific embodiments, and are not used as limiting the order of execution of the steps.
First embodiment:
as shown in fig. 1, a first embodiment provides a method for detecting multiple feature points of an unmanned aerial vehicle based on a two-stage cascade depth network, which includes steps S1 to S4:
s1, marking a plurality of characteristic areas of an unmanned aerial vehicle and a boundary box in each acquired unmanned aerial vehicle image to obtain a corresponding training image;
s2, inputting each training image into a pre-constructed boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames;
s3, extracting corresponding regions of interest according to each characteristic region prediction frame, and inputting all the regions of interest into a pre-constructed characteristic point regression network for training;
and S4, performing multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network, and obtaining a plurality of feature point coordinates.
It should be noted that the unmanned aerial vehicle image is an RGB three-channel image, the bounding box positioning network is a bboxLocate-Net network, the feature point regression network is a pointRefine-Net network, and the two-stage cascade depth network is obtained by cascading the bboxLocate-Net network and the pointRefine-Net network. A schematic diagram of the structure of the two-stage cascade depth network is shown in fig. 2.
As an example, in step S1, an unmanned aerial vehicle in the landing process is photographed by a ground camera to obtain unmanned aerial vehicle images, and categories and bounding boxes of a plurality of feature areas of the unmanned aerial vehicle are marked in each unmanned aerial vehicle image to obtain corresponding training images, so that the bounding box positioning network is trained by the training images later, and a plurality of feature points of the unmanned aerial vehicle are detected on the ground based on vision.
In a preferred embodiment, a plurality of feature areas of the unmanned aerial vehicle are labeled in each acquired unmanned aerial vehicle image by category and bounding box, and a corresponding training image is obtained, specifically: acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
According to the method, the boundary boxes of the unmanned aerial vehicle body, the left wing, the right wing, the left tail wing, the right tail wing, the middle foot rest and other areas are marked in each unmanned aerial vehicle image by using the boundary box marking tool, so that the boundary box positioning network is trained by using training images marked with a plurality of characteristic areas, and the detection efficiency, the detection robustness and the detection precision of the boundary box positioning network to the unmanned aerial vehicle multi-characteristic areas are improved.
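The patent does not specify the labeling tool or the on-disk annotation format. As a minimal sketch only, the snippet below assumes a YOLO-style text record (class index followed by a normalized center/width/height box), which is one common way such bounding box annotations are stored; the class list and record layout are illustrative assumptions.

# Hypothetical YOLO-style annotation record for one labeled feature region.
# The class order and record layout are assumptions for illustration only.
CLASSES = ["body", "left_wing", "right_wing", "left_tail", "right_tail", "middle_foot_rest"]

def to_yolo_record(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into a
    normalized 'class cx cy w h' line, one line per feature region."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0 / img_w
    cy = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Example: a left-wing box annotated in a 1280 x 720 frame.
print(to_yolo_record(CLASSES.index("left_wing"), (412, 305, 498, 342), 1280, 720))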
As an example, in step S2, after obtaining the training images, each training image is directly input into the pre-constructed bounding box positioning network in sequence to train, so that the bounding box positioning network outputs a plurality of feature area prediction frames, and network parameters of the bounding box positioning network are iteratively updated.
In a preferred embodiment, each training image is input into a pre-constructed bounding box positioning network for training, so that the bounding box positioning network outputs a plurality of feature area prediction frames, specifically: constructing a boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest region prediction frame; and reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, ending training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold value, otherwise, continuing inputting the next training image into the boundary frame positioning network for training.
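A minimal sketch of the training control flow just described follows: one training image per iteration, a reverse (back-propagated) update of the network parameters from the network loss, and termination once the loss falls below the first preset threshold. The optimizer, learning rate and threshold value are illustrative assumptions; the composition of the loss is described in the construction and training steps below.

import torch

def train_bbox_positioning_network(model, loss_fn, training_images, targets, threshold, lr=1e-3):
    """Sketch of the loop described above: train on images one by one and
    stop as soon as the network loss drops below the first preset threshold."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for image, target in zip(training_images, targets):
        optimizer.zero_grad()
        prediction = model(image.unsqueeze(0))   # one training image per step
        loss = loss_fn(prediction, target)       # bounding box positioning network loss
        loss.backward()                          # reverse update of network parameters
        optimizer.step()
        if loss.item() < threshold:              # first preset threshold reached
            break                                # training of this network ends
    return model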
The specific processes of constructing, training and applying the bounding box positioning network are as follows:
1. constructing a bboxLocate-Net network;
The bboxLocate-Net network is improved on the basis of YOLOv3. Its input is a 416 x 416 RGB three-channel image, its feature extraction network is a 23-layer Darknet-tiny network formed by stacking convolution-pooling units, and its detection layer predicts bounding boxes at two scales, 13 x 13 and 26 x 26.
2. Training a bboxLocate-Net network;
An input training image is scaled to 416 x 416 and fed in batches into the Darknet-tiny network for feature extraction; convolution and pooling operations then yield output predictions at two different scales, and the loss value of the prediction results is calculated in the YOLO detection layer. The loss value includes a bounding box coordinate loss, a confidence loss and a class confidence loss, each calculated from the corresponding loss function.
3. Detecting and positioning six characteristic areas of the unmanned aerial vehicle;
The image to be detected is input into the bboxLocate-Net network to obtain a prediction tensor, which comprises the center coordinate values (t_x, t_y), the width and height values (t_w, t_h), as well as the confidence and the category.
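Step 3 states that the prediction tensor carries center values (t_x, t_y), width and height values (t_w, t_h), a confidence and a category. The sketch below shows the usual YOLOv3-style decoding of one such entry into an absolute box; the anchor size and grid cell used in the example are illustrative assumptions, not values taken from the patent.

import math

def decode_yolo_box(t_x, t_y, t_w, t_h, cell_x, cell_y, grid_size, anchor_w, anchor_h, img_size=416):
    """Decode a single predicted entry into an absolute box, YOLOv3-style.
    grid_size is 13 or 26, matching the two detection scales mentioned above."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    stride = img_size / grid_size                # pixels per grid cell
    b_x = (sigmoid(t_x) + cell_x) * stride       # absolute center x
    b_y = (sigmoid(t_y) + cell_y) * stride       # absolute center y
    b_w = anchor_w * math.exp(t_w)               # absolute width
    b_h = anchor_h * math.exp(t_h)               # absolute height
    return b_x, b_y, b_w, b_h

# Example: a prediction in cell (6, 7) of the 13 x 13 scale with a hypothetical 90 x 60 anchor.
print(decode_yolo_box(0.2, -0.1, 0.3, 0.1, 6, 7, 13, 90, 60))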
As an example, in step S3, after the bounding box positioning network outputs a plurality of feature region prediction frames, a region of interest (ROI) corresponding to the feature region prediction frames is input into a pre-constructed feature point regression network for training, which is beneficial to improving the detection efficiency, the detection robustness and the detection precision of the feature point network on the multi-feature points of the unmanned aerial vehicle.
In a preferred embodiment, the extracting a corresponding region of interest according to each feature region prediction frame is specifically: and extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as a corresponding region of interest.
In a preferred embodiment, all the regions of interest are input into a pre-constructed feature point regression network for training, specifically: constructing a feature point regression network, and inputting all the regions of interest corresponding to a training image into the feature point regression network for training; and reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, ending training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the interested areas corresponding to the next training image into the characteristic point regression network for training.
The specific process of constructing, training and applying the characteristic point regression network is as follows:
1. constructing a pointRefine-Net network;
The pointRefine-Net network is a point regression network group comprising five sub-networks. The input image size of each sub-network is 15 x 15; after feature extraction by the backbone, the last layer is a fully connected layer that outputs a 1 x 2 tensor, which is the predicted value of the feature point coordinates.
2. Training a pointRefine-Net network;
Each sub-network of the pointRefine-Net network is trained independently. Training samples are randomly cropped near the feature points: 10 small regions are randomly generated near each feature point of each image to serve as regions of interest. The regression target of the network is the offset of the feature point relative to the top-left corner coordinates of the region; with the goal of minimizing the offset loss value, the convolution kernel parameters are iteratively updated by the backward gradient propagation algorithm. When the loss value falls below a certain threshold, training stops and the final feature point regression network is obtained.
3. On the basis of coarse extraction of six characteristic areas of the unmanned aerial vehicle, fine positioning is carried out on the characteristic points in each area;
After the bounding boxes of the six characteristic areas of the unmanned aerial vehicle are obtained, an ROI with a width and height of 0.15w and 0.15h respectively, centered on the coarse positioning point (the center of the bounding box), is generated and sent into the corresponding pointRefine-Net network to obtain more accurate feature point coordinates.
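A minimal sketch of this fine-positioning stage, assuming a PyTorch implementation: a sub-network whose layer sizes are chosen only to match the stated 15 x 15 input and 1 x 2 output (the patent does not fix them), and an ROI crop of 0.15w x 0.15h around the coarse point. The regressed 2-vector is then interpreted, as in the training description above, as an offset relative to the ROI's top-left corner.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PointRegressionSubNet(nn.Module):
    """Hypothetical pointRefine-Net sub-network: 15 x 15 RGB crop in, 1 x 2 tensor out."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 3 * 3, 2)           # fully connected last layer -> (dx, dy)

    def forward(self, x):                            # x: (N, 3, 15, 15)
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> (N, 16, 7, 7)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> (N, 32, 3, 3)
        return self.fc(x.flatten(1))                 # -> (N, 2) predicted offset

def crop_roi(image, center_x, center_y, box_w, box_h, out_size=15):
    """Cut a 0.15w x 0.15h region around the coarse point from a (3, H, W) tensor
    and resize it to 15 x 15; also return the crop's top-left corner."""
    _, H, W = image.shape
    rw = max(int(0.15 * box_w), 2)
    rh = max(int(0.15 * box_h), 2)
    x0 = min(max(int(center_x - rw / 2), 0), max(W - rw, 0))
    y0 = min(max(int(center_y - rh / 2), 0), max(H - rh, 0))
    roi = image[:, y0:y0 + rh, x0:x0 + rw].unsqueeze(0).float()
    roi = F.interpolate(roi, size=(out_size, out_size), mode="bilinear", align_corners=False)
    return roi, (x0, y0)

# Usage sketch: absolute point = (x0 + dx, y0 + dy) with (dx, dy) = subnet(roi)[0].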
As an example, in step S4, after the training of the bounding box positioning network and the feature point regression network is completed, the bounding box positioning network and the feature point regression network are cascaded to obtain a two-stage cascade depth network, so that the multi-feature point detection can be performed on the image to be detected through the two-stage cascade depth network, thereby realizing the real-time stable and accurate detection of a plurality of feature points of the unmanned aerial vehicle.
The first-stage network of the two-stage cascade depth network, bboxLocate-Net, is a category and bounding box regression network. It is trained on a large number of training images; once an image is input, the bboxLocate-Net network can perform category classification and frame positioning of the unmanned aerial vehicle body, left wing, right wing, left tail wing, right tail wing and tripod regions. The second-stage network, pointRefine-Net, is a network group consisting of five sub-networks, each responsible for the regression of one feature point; its training samples are small regions containing the feature point, randomly cropped at 10 different positions near each feature point of each image, so the feature points can be accurately located on the basis of the feature region prediction frames output by the bboxLocate-Net network.
The bboxLocate-Net and pointRefine-Net networks have three output tasks in total. The first task of the bboxLocate-Net network outputs a discrete classification probability, i.e., whether one of the five characteristic regions is present; the second task outputs the specific coordinates and the frame width and height values of the characteristic region. The only task of the pointRefine-Net network is to output the coordinate vector of the feature point regression.
Taking the left wing tip region as an example, formula (1) is the loss function of the classification task, formula (2) is the loss function of the bounding box regression task, and formula (3) is the loss function of the wing tip feature point regression task:
In formula (1), y_i represents the true label of sample x_i, taking the value 0 or 1, where 0 represents a non-left-wing-tip region and 1 represents the left wing tip region, and p_i represents the probability that the network judges the sample to be a left wing tip;
In formula (2), ŷ represents the frame position increment of each candidate window predicted by the network, and y represents the actual bounding box position increment;
In formula (3), ŷ represents the unmanned aerial vehicle feature point position vector predicted by the network, and y represents the real feature point vector.
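A standard form consistent with the variable definitions above — a binary cross-entropy for the classification task and squared Euclidean (L2) distances for the two regression tasks — is given below as a reconstruction; these are assumed forms, not the patent's verbatim equations:

\begin{aligned}
L_{\text{cls}} &= -\bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\bigr] && (1) \\
L_{\text{box}} &= \lVert \hat{y} - y \rVert_2^2 && (2) \\
L_{\text{point}} &= \lVert \hat{y} - y \rVert_2^2 && (3)
\end{aligned}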
After the image to be detected is input, the approximate regions where the feature points are located are determined by the bboxLocate-Net network; an ROI with a width and height of 0.15w and 0.15h respectively is then cut out around the center point of each region and sent to the corresponding trained pointRefine-Net point regression network, so that the feature point coordinates can be accurately located within the ROI.
In the unmanned aerial vehicle multi-feature point detection method based on the two-stage cascade depth network provided by this embodiment, the image to be detected is first input into the bboxLocate-Net network to obtain the prediction frames of the six characteristic areas of the unmanned aerial vehicle, and then each sub-region is sent to the corresponding second-stage pointRefine-Net network to regress the five feature point coordinates.
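As an end-to-end sketch of the cascade at inference time, under the same assumptions as the sketches above (a decode step producing (class_name, cx, cy, w, h) detections, the crop_roi helper, and one point sub-network per feature region); all names here are placeholders for illustration, not identifiers from the patent.

import torch

REGION_TO_SUBNET = {   # hypothetical mapping from region class to point sub-network index
    "left_wing": 0, "right_wing": 1, "left_tail": 2, "right_tail": 3, "middle_foot_rest": 4,
}

@torch.no_grad()
def detect_feature_points(image, bbox_net, point_subnets, decode_detections):
    """Run the two-stage cascade on one (3, H, W) image tensor: locate the six
    characteristic regions, then refine one feature point per relevant region."""
    detections = decode_detections(bbox_net(image.unsqueeze(0)))   # [(class_name, cx, cy, w, h), ...]
    points = {}
    for class_name, cx, cy, w, h in detections:
        if class_name not in REGION_TO_SUBNET:                     # e.g. the body region has no point sub-network
            continue
        roi, (x0, y0) = crop_roi(image, cx, cy, w, h)              # 0.15w x 0.15h crop, resized to 15 x 15
        dx, dy = point_subnets[REGION_TO_SUBNET[class_name]](roi)[0].tolist()
        points[class_name] = (x0 + dx, y0 + dy)                    # offset added to the ROI top-left corner
    return points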
Second embodiment:
as shown in fig. 3, a second embodiment provides an unmanned aerial vehicle multi-feature point detection device based on a dual-stage cascade depth network, including: the training image obtaining module 21 is configured to perform category and bounding box labeling on a plurality of feature areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image, so as to obtain a corresponding training image; the bounding box positioning network training module 22 is configured to input each training image into a pre-constructed bounding box positioning network for training, so that the bounding box positioning network outputs a plurality of feature area prediction frames; the feature point regression network training module 23 is configured to extract a corresponding region of interest according to each feature region prediction frame, and input all the regions of interest into a pre-constructed feature point regression network for training; the multi-feature point detection module 24 is configured to perform multi-feature point detection on an image to be detected through a two-stage cascade depth network composed of a trained bounding box positioning network and a feature point regression network, so as to obtain a plurality of feature point coordinates.
It should be noted that the unmanned aerial vehicle image is an RGB three-channel image, the bounding box positioning network is a bboxLocate-Net network, the feature point regression network is a pointRefine-Net network, and the two-stage cascade depth network is obtained by cascading the bboxLocate-Net network and the pointRefine-Net network.
Illustratively, through the training image acquisition module 21, the ground camera shoots the unmanned aerial vehicle in the landing process to obtain unmanned aerial vehicle images, and marks the categories and the bounding boxes of a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image to obtain corresponding training images, so that the bounding box positioning network is trained by the training images, and a plurality of characteristic points of the unmanned aerial vehicle are detected on the ground based on vision.
In a preferred embodiment, a plurality of feature areas of the unmanned aerial vehicle are labeled in each acquired unmanned aerial vehicle image by category and bounding box, and a corresponding training image is obtained, specifically: acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
According to the embodiment, the boundary frame marking tool is utilized to mark the boundary frames of the areas such as the unmanned aerial vehicle body, the left wing, the right wing, the left tail wing, the right tail wing, the middle foot rest and the like in each unmanned aerial vehicle image through the training image acquisition module 21, so that the boundary frame positioning network is trained by the training images marked with a plurality of characteristic areas, and the detection efficiency, the detection robustness and the detection precision of the boundary frame positioning network to the multi-characteristic areas of the unmanned aerial vehicle are improved.
Illustratively, by the bounding box positioning network training module 22, after obtaining training images, each training image is directly input into a pre-constructed bounding box positioning network in sequence for training, so that the bounding box positioning network outputs a plurality of feature area prediction frames, and network parameters of the bounding box positioning network are iteratively updated.
In a preferred embodiment, each training image is input into a pre-constructed bounding box positioning network for training, so that the bounding box positioning network outputs a plurality of feature area prediction frames, specifically: constructing a boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest region prediction frame; and reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, ending training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold value, otherwise, continuing inputting the next training image into the boundary frame positioning network for training.
The specific processes of constructing, training and applying the bounding box positioning network are as follows:
1. constructing a bboxLocate-Net network;
The bboxLocate-Net network is improved on the basis of YOLOv3. Its input is a 416 x 416 RGB three-channel image, its feature extraction network is a 23-layer Darknet-tiny network formed by stacking convolution-pooling units, and its detection layer predicts bounding boxes at two scales, 13 x 13 and 26 x 26.
2. Training a bboxLocate-Net network;
An input training image is scaled to 416 x 416 and fed in batches into the Darknet-tiny network for feature extraction; convolution and pooling operations then yield output predictions at two different scales, and the loss value of the prediction results is calculated in the YOLO detection layer. The loss value includes a bounding box coordinate loss, a confidence loss and a class confidence loss, each calculated from the corresponding loss function.
3. Detecting and positioning six characteristic areas of the unmanned aerial vehicle;
The image to be detected is input into the bboxLocate-Net network to obtain a prediction tensor, which comprises the center coordinate values (t_x, t_y), the width and height values (t_w, t_h), as well as the confidence and the category.
As an example, through the feature point regression network training module 23, after the bounding box positioning network outputs a plurality of feature region prediction frames, the region of interest (ROI) corresponding to the feature region prediction frames is input into the pre-constructed feature point regression network for training, which is beneficial to improving the detection efficiency, the detection robustness and the detection precision of the feature point network to the multi-feature points of the unmanned aerial vehicle.
In a preferred embodiment, the extracting a corresponding region of interest according to each feature region prediction frame is specifically: and extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as a corresponding region of interest.
In a preferred embodiment, all the regions of interest are input into a pre-constructed feature point regression network for training, specifically: constructing a feature point regression network, and inputting all the regions of interest corresponding to a training image into the feature point regression network for training; and reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, ending training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the interested areas corresponding to the next training image into the characteristic point regression network for training.
The specific process of constructing, training and applying the characteristic point regression network is as follows:
1. constructing a pointRefine-Net network;
The pointRefine-Net network is a point regression network group comprising five sub-networks. The input image size of each sub-network is 15 x 15; after feature extraction by the backbone, the last layer is a fully connected layer that outputs a 1 x 2 tensor, which is the predicted value of the feature point coordinates.
2. Training a pointRefine-Net network;
Each sub-network of the pointRefine-Net network is trained independently. Training samples are randomly cropped near the feature points: 10 small regions are randomly generated near each feature point of each image to serve as regions of interest. The regression target of the network is the offset of the feature point relative to the top-left corner coordinates of the region; with the goal of minimizing the offset loss value, the convolution kernel parameters are iteratively updated by the backward gradient propagation algorithm. When the loss value falls below a certain threshold, training stops and the final feature point regression network is obtained.
3. On the basis of coarse extraction of six characteristic areas of the unmanned aerial vehicle, fine positioning is carried out on the characteristic points in each area;
After the bounding boxes of the six characteristic areas of the unmanned aerial vehicle are obtained, an ROI with a width and height of 0.15w and 0.15h respectively, centered on the coarse positioning point (the center of the bounding box), is generated and sent into the corresponding pointRefine-Net network to obtain more accurate feature point coordinates.
Illustratively, by the multi-feature point detection module 24, after the bounding box positioning network and the feature point regression network are trained, the bounding box positioning network and the feature point regression network are cascaded to obtain a two-stage cascade depth network, so that the multi-feature point detection can be performed on the image to be detected through the two-stage cascade depth network, thereby realizing the real-time stable and accurate detection of a plurality of feature points of the unmanned aerial vehicle.
The first-stage network of the two-stage cascade depth network, bboxLocate-Net, is a category and bounding box regression network. It is trained on a large number of training images; once an image is input, the bboxLocate-Net network can perform category classification and frame positioning of the unmanned aerial vehicle body, left wing, right wing, left tail wing, right tail wing and tripod regions. The second-stage network, pointRefine-Net, is a network group consisting of five sub-networks, each responsible for the regression of one feature point; its training samples are small regions containing the feature point, randomly cropped at 10 different positions near each feature point of each image, so the feature points can be accurately located on the basis of the feature region prediction frames output by the bboxLocate-Net network.
The bboxLocate-Net and pointRefine-Net networks have three output tasks in total. The first task of the bboxLocate-Net network outputs a discrete classification probability, i.e., whether one of the five characteristic regions is present; the second task outputs the specific coordinates and the frame width and height values of the characteristic region. The only task of the pointRefine-Net network is to output the coordinate vector of the feature point regression.
Taking the left wing tip region as an example, formula (4) is the loss function of the classification task, formula (5) is the loss function of the bounding box regression task, and formula (6) is the loss function of the wing tip feature point regression task:
In formula (4), y_i represents the true label of sample x_i, taking the value 0 or 1, where 0 represents a non-left-wing-tip region and 1 represents the left wing tip region, and p_i represents the probability that the network judges the sample to be a left wing tip;
In formula (5), ŷ represents the frame position increment of each candidate window predicted by the network, and y represents the actual bounding box position increment;
In formula (6), ŷ represents the unmanned aerial vehicle feature point position vector predicted by the network, and y represents the real feature point vector.
In this embodiment, after the image to be detected is input, the multi-feature point detection module 24 determines the approximate regions where the feature points are located through the bboxLocate-Net network; an ROI with a width and height of 0.15w and 0.15h respectively is then cut out around the center point of each region and sent to the corresponding trained pointRefine-Net point regression network, so that the feature point coordinates can be accurately located within the ROI.
In summary, the embodiment of the invention has the following beneficial effects:
the method comprises the steps of marking a plurality of characteristic areas of an unmanned aerial vehicle in each acquired unmanned aerial vehicle image by category and boundary frames, obtaining corresponding training images, inputting each training image into a pre-built boundary frame positioning network for training, enabling the boundary frame positioning network to output a plurality of characteristic area prediction frames, extracting corresponding interested areas according to each characteristic area prediction frame, inputting all interested areas into a pre-built characteristic point regression network for training, and detecting a plurality of characteristic points of an image to be detected through a two-stage cascade depth network consisting of the trained boundary frame positioning network and the characteristic point regression network, so as to obtain a plurality of characteristic point coordinates, and detecting a plurality of characteristic points of the unmanned aerial vehicle. According to the embodiment of the invention, the boundary frame positioning network is trained by utilizing the training image, so that the boundary frame positioning network directly outputs a plurality of feature area prediction frames according to the training image, and the feature point regression network is trained by utilizing the region of interest corresponding to the feature area prediction frames, so that the feature point regression network outputs a plurality of feature point coordinates according to the region of interest, and therefore, a plurality of feature points of the unmanned aerial vehicle in the image to be detected can be stably and accurately detected in real time through a two-stage cascade depth network consisting of the trained boundary frame positioning network and the feature point regression network.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.
Those skilled in the art will appreciate that implementing all or part of the above-described embodiments may be accomplished by way of computer programs, which may be stored on a computer readable storage medium, which when executed may comprise the steps of the above-described embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.

Claims (4)

1. The unmanned aerial vehicle multi-feature point detection method based on the two-stage cascade depth network is characterized by comprising the following steps of:
the method comprises the steps that category and boundary box labeling is carried out on a plurality of characteristic areas of an unmanned aerial vehicle in each acquired unmanned aerial vehicle image, and corresponding training images are obtained;
inputting each training image into a pre-constructed boundary frame positioning network for training, so that the boundary frame positioning network outputs a plurality of characteristic region prediction frames; inputting each training image into a pre-constructed boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames, wherein the method specifically comprises the following steps: constructing the boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a tripod region prediction frame; reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, ending training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold value, otherwise, continuing inputting the next training image into the boundary frame positioning network for training;
extracting corresponding regions of interest according to each characteristic region prediction frame, and inputting all the regions of interest into a pre-constructed characteristic point regression network for training; the extracting the corresponding interested region according to each characteristic region prediction frame specifically comprises the following steps: extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as the corresponding region of interest; inputting all the interested areas into a pre-constructed characteristic point regression network for training, wherein the training is specifically as follows: constructing the characteristic point regression network, and inputting all the regions of interest corresponding to a training image into the characteristic point regression network for training; reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, ending training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the regions of interest corresponding to the next training image into the characteristic point regression network for training;
and performing multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network to obtain a plurality of feature point coordinates.
2. The method for detecting the multiple feature points of the unmanned aerial vehicle based on the two-stage cascade depth network according to claim 1, wherein the method is characterized in that the category and the bounding box of the multiple feature areas of the unmanned aerial vehicle are marked in each acquired unmanned aerial vehicle image to obtain the corresponding training image, specifically:
acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
3. An unmanned aerial vehicle multi-feature point detection device based on a two-stage cascade depth network, characterized by comprising:
the training image acquisition module is used for marking a plurality of characteristic areas of the unmanned aerial vehicle in the acquired images of each unmanned aerial vehicle by category and boundary frames to obtain corresponding training images;
the boundary frame positioning network training module is used for inputting each training image into a pre-constructed boundary frame positioning network for training, so that the boundary frame positioning network outputs a plurality of characteristic region prediction frames; the method is particularly used for: constructing the boundary frame positioning network, inputting a training image into the boundary frame positioning network for training, and enabling the boundary frame positioning network to output a plurality of characteristic region prediction frames; the characteristic region prediction frame comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a tripod region prediction frame; reversely updating network parameters of the boundary frame positioning network according to the network loss of the boundary frame positioning network, ending training the boundary frame positioning network when the network loss of the boundary frame positioning network is smaller than a first preset threshold value, otherwise, continuing inputting the next training image into the boundary frame positioning network for training;
the feature point regression network training module is used for extracting corresponding regions of interest according to each feature region prediction frame, and inputting all the regions of interest into a pre-constructed feature point regression network for training; the method is particularly used for: extracting an image region with a preset size based on the central point of the characteristic region prediction frame, and taking the image region as the corresponding region of interest; constructing the characteristic point regression network, and inputting all the regions of interest corresponding to a training image into the characteristic point regression network for training; reversely updating network parameters of the characteristic point regression network according to the network loss of the characteristic point regression network, ending training the characteristic point regression network when the network loss of the characteristic point regression network is smaller than a second preset threshold, otherwise, inputting all the regions of interest corresponding to the next training image into the characteristic point regression network for training;
and the multi-feature point detection module is used for carrying out multi-feature point detection on the image to be detected through a two-stage cascade depth network consisting of the trained boundary box positioning network and the feature point regression network to obtain a plurality of feature point coordinates.
4. The unmanned aerial vehicle multi-feature point detection device based on the two-stage cascade depth network according to claim 3, wherein the classification and bounding box labeling are performed on a plurality of feature areas of the unmanned aerial vehicle in each acquired unmanned aerial vehicle image, so as to obtain a corresponding training image, specifically:
acquiring unmanned aerial vehicle images, and marking a plurality of characteristic areas of the unmanned aerial vehicle in each unmanned aerial vehicle image by using a boundary frame marking tool to obtain corresponding training images; the characteristic area comprises an unmanned aerial vehicle body, a left wing, a right wing, a left tail wing, a right tail wing and a middle foot rest area.
CN202010925838.7A 2020-09-04 2020-09-04 Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network Active CN112070085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010925838.7A CN112070085B (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010925838.7A CN112070085B (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network

Publications (2)

Publication Number Publication Date
CN112070085A (en) 2020-12-11
CN112070085B (en) 2023-07-28

Family

ID=73662753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010925838.7A Active CN112070085B (en) 2020-09-04 2020-09-04 Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network

Country Status (1)

Country Link
CN (1) CN112070085B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108470077A (en) * 2018-05-28 2018-08-31 广东工业大学 A kind of video key frame extracting method, system and equipment and storage medium
CN109117794A (en) * 2018-08-16 2019-01-01 广东工业大学 A kind of moving target behavior tracking method, apparatus, equipment and readable storage medium storing program for executing
CN109492416A (en) * 2019-01-07 2019-03-19 南京信息工程大学 A kind of guard method of big data image and system based on safety zone
CN109741318A (en) * 2018-12-30 2019-05-10 北京工业大学 The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field

Also Published As

Publication number Publication date
CN112070085A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN108171748B (en) Visual identification and positioning method for intelligent robot grabbing application
US20210390329A1 (en) Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN107507167B (en) Cargo tray detection method and system based on point cloud plane contour matching
Keller et al. A new benchmark for stereo-based pedestrian detection
CN109977997B (en) Image target detection and segmentation method based on convolutional neural network rapid robustness
Alidoost et al. A CNN-based approach for automatic building detection and recognition of roof types using a single aerial image
CN106767812A (en) A kind of interior semanteme map updating method and system based on Semantic features extraction
CN105574527A (en) Quick object detection method based on local feature learning
Ji et al. Integrating visual selective attention model with HOG features for traffic light detection and recognition
CN113139453A (en) Orthoimage high-rise building base vector extraction method based on deep learning
Ji et al. RGB-D SLAM using vanishing point and door plate information in corridor environment
US20200226392A1 (en) Computer vision-based thin object detection
Zelener et al. Cnn-based object segmentation in urban lidar with missing points
CN112329559A (en) Method for detecting homestead target based on deep convolutional neural network
Zhang et al. Out-of-region keypoint localization for 6D pose estimation
CN114358133B (en) Method for detecting looped frames based on semantic-assisted binocular vision SLAM
CN114219936A (en) Object detection method, electronic device, storage medium, and computer program product
Liang et al. DIG-SLAM: an accurate RGB-D SLAM based on instance segmentation and geometric clustering for dynamic indoor scenes
CN111598033B (en) Goods positioning method, device, system and computer readable storage medium
Zhuang et al. Contextual classification of 3D laser points with conditional random fields in urban environments
CN117496401A (en) Full-automatic identification and tracking method for oval target points of video measurement image sequences
CN117036484A (en) Visual positioning and mapping method, system, equipment and medium based on geometry and semantics
CN112070085B (en) Unmanned aerial vehicle multi-feature point detection method and device based on two-stage cascade depth network
CN109543700B (en) Anti-shielding clothing key point detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant