CN118155104A - Unmanned aerial vehicle autonomous landing method and system - Google Patents

Unmanned aerial vehicle autonomous landing method and system

Info

Publication number
CN118155104A
CN118155104A (application CN202410571923.6A)
Authority
CN
China
Prior art keywords
layer
branch
aerial vehicle
unmanned aerial
improved
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410571923.6A
Other languages
Chinese (zh)
Other versions
CN118155104B (en)
Inventor
谢家东
陈敏
吴志刚
朱耀晖
张萌
李锦沅
陈世杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANCHANG CAMPUS OF JIANGXI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Original Assignee
NANCHANG CAMPUS OF JIANGXI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NANCHANG CAMPUS OF JIANGXI UNIVERSITY OF SCIENCE AND TECHNOLOGY
Priority to CN202410571923.6A priority Critical patent/CN118155104B/en
Priority claimed from CN202410571923.6A external-priority patent/CN118155104B/en
Publication of CN118155104A publication Critical patent/CN118155104A/en
Application granted granted Critical
Publication of CN118155104B publication Critical patent/CN118155104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses an unmanned aerial vehicle autonomous landing method and system, which relate to the technical field of unmanned aerial vehicle applications. The method comprises the following steps: taking aerial ground images captured by the unmanned aerial vehicle as a dataset, inputting the dataset into an improved YOLOv5 model structure, and outputting a prediction bounding box, wherein the improved YOLOv5 model structure comprises an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer and a head prediction layer; calculating a loss function for the prediction bounding box, adjusting the parameters of the improved YOLOv5 model structure according to the loss difference between the prediction bounding box and the actual bounding box, and deploying the improved YOLOv5 model structure and the adjusted parameters on the unmanned aerial vehicle, so that the unmanned aerial vehicle detects the input aerial ground image and outputs a landing bounding box; and adjusting the flight state of the unmanned aerial vehicle according to the landing bounding box so that the unmanned aerial vehicle lands autonomously. The method can solve the technical problem in the prior art that unmanned aerial vehicle landing guided only by a positioning system is easily affected by the environment and therefore has low accuracy.

Description

Unmanned aerial vehicle autonomous landing method and system
Technical Field
The invention relates to the technical field of unmanned aerial vehicle application, in particular to an unmanned aerial vehicle autonomous landing method and system.
Background
As logistics and distribution systems develop toward intelligent and unmanned operation, autonomous delivery and landing technology for unmanned aerial vehicles is becoming one of the core driving forces behind the transformation of the logistics industry. Unmanned aerial vehicles have been successfully applied in many practical scenarios such as express package delivery, transportation of emergency medical supplies and replenishment of remote areas, and show great potential in fields such as retail and emergency response. Although current unmanned aerial vehicle delivery and landing systems generally rely on combined GPS and inertial navigation to plan flight paths and achieve preliminary autonomous navigation, a series of technical challenges remain to be overcome in accurate goods delivery and autonomous landing.
In the traditional unmanned aerial vehicle delivery and landing process, the approximate position is determined through GPS and other auxiliary positioning systems, and the delivery task is executed according to a preset flight plan. However, because GPS signals are easily blocked and interfered with in urban environments dense with tall buildings, positioning errors increase and the high-precision requirement for landing autonomously in a designated narrow area cannot be met. In addition, complex weather conditions, wind speed variations and the uncertainty of the target landing bounding box further increase the difficulty of autonomous delivery and landing operations of the unmanned aerial vehicle.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide an unmanned aerial vehicle autonomous landing method and system, so as to solve the technical problem in the prior art that unmanned aerial vehicle landing guided by a positioning system is easily affected by the environment and therefore has low accuracy.
A first aspect of the present invention provides an autonomous landing method of an unmanned aerial vehicle, the autonomous landing method of an unmanned aerial vehicle comprising:
Taking aerial ground images captured by the unmanned aerial vehicle as a dataset, inputting the dataset into an improved YOLOv5 model structure, and outputting a prediction bounding box, wherein the improved YOLOv5 model structure comprises an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer and a head prediction layer, and the method comprises the following steps:
inputting the dataset into the input layer for preprocessing,
performing feature extraction through the improved backbone network extraction layer to obtain multi-scale feature maps, wherein the improved backbone network extraction layer comprises at least one CBM layer, a C3GC layer embedded after the CBM layer, a scSENET layer and a SPPELAN layer embedded after the scSENET layer; the improvement of the C3GC layer comprises introducing a GCNET layer into the original C3 layer; the C3GC layer comprises several CBM layers and an improved GCSP1_X layer connected after them; the improved GCSP1_X layer comprises a first branch and a second branch arranged in parallel, the first branch comprising, in sequence, a CBM layer, at least one GCRes unit layer and a convolution layer, the GCRes unit layer comprising the GCNET layer, and the second branch comprising a CBM layer; the first branch and the second branch are spliced through a splicing layer and then output sequentially through a batch normalization layer, a ReLU layer and a CBM layer,
inputting the feature maps of different scales into the improved neck feature fusion layer for feature fusion to obtain a multi-scale fused feature map, wherein the improvement of the neck feature fusion layer comprises replacing the splicing layers with DWACM layers,
then performing prediction through the head prediction layer to obtain the prediction bounding box;
performing loss function calculation on the prediction bounding box, adjusting the parameters of the improved YOLOv5 model structure according to the loss difference between the prediction bounding box and the actual bounding box, and deploying the improved YOLOv5 model structure and the adjusted parameters on an unmanned aerial vehicle, so that the unmanned aerial vehicle detects the input aerial ground image and outputs a landing bounding box;
and adjusting the flight state of the unmanned aerial vehicle according to the landing bounding box, so that the unmanned aerial vehicle lands autonomously.
Compared with the prior art, the invention has the following beneficial effects: the unmanned aerial vehicle autonomous landing method provided by the invention can effectively improve landing accuracy. Specifically, the improved YOLOv5 model structure comprises an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer and a head prediction layer, wherein the improved backbone network extraction layer comprises at least one CBM layer, a C3GC layer embedded after the CBM layer, a scSENET layer and a SPPELAN layer embedded after the scSENET layer. Through the embedded C3GC layer, global context features can be extracted effectively and the input feature map undergoes deep compression and kernel decoupling; the response of the feature map is readjusted by learning the weight distribution among channels, the features are recalibrated, irrelevant or redundant feature information is filtered out, the feature expression of the detection target is enhanced, and the discrimination capability and generalization performance of the model are improved. The scSENET layer combines a spatial attention mechanism and a channel attention mechanism, which effectively improves the quality of the feature representation and the ability of the model to capture key information. The jump connection branch of the SPPELAN layer allows deep features processed by pooling at different levels to be clearly distinguished from the original shallow features that are not pooled, and prevents the loss of part of the feature information. The improved neck feature fusion layer comprises DWACM layers, which combine top-down global attention modulation and bottom-up local attention modulation to capture high-level semantic information and low-level detail information and perform feature fusion through an attention mechanism; a depthwise separable convolution layer is added at the end so that the number of channels is doubled during feature fusion, avoiding the halving of the channel count caused by replacing the splicing layer. Therefore, the improved backbone network extraction layer and the improved neck feature fusion layer can effectively improve the ability to capture and fuse features, thereby improving the accuracy and precision of prediction. In addition, the flight state of the unmanned aerial vehicle is adjusted according to the landing bounding box data, which improves the landing accuracy of the unmanned aerial vehicle and thus solves the widespread technical problem that unmanned aerial vehicle landing guided by a positioning system is easily affected by the environment and has low accuracy.
According to an aspect of the above technical solution, the improved backbone network extraction layer comprises four CBM layers stacked in sequence, a C3GC layer embedded after each CBM layer, and a scSENET layer and a SPPELAN layer introduced in sequence at the output end of the fourth C3GC layer.
According to an aspect of the above technical solution, the GCRes unit layer comprises a jump connection branch and a unit branch arranged in parallel, which are connected and output through pixel-by-pixel superposition; the unit branch comprises a CBM layer, a GCNET layer, a CBM layer and a CBM layer connected in sequence; the GCNET layer comprises a first GC branch and a second GC branch arranged in parallel, as well as a jump connection branch; the first GC branch comprises a convolution layer and a Softmax layer, its output is spliced through a splicing layer and input to the second GC branch, then passes sequentially through the convolution layer, the layer normalization layer, the ReLU layer and the convolution layer of the second GC branch, and is finally spliced with the jump connection branch through a splicing layer and output.
According to an aspect of the above technical solution, the scSENET layer combines a spatial attention mechanism and a channel attention mechanism so that features are selectively extracted in both the spatial and channel dimensions; the SPPELAN layer comprises a jump connection branch and an SPP branch arranged in parallel, and the jump connection branch and the SPP branch are connected and output through pixel-by-pixel superposition.
According to an aspect of the above technical solution, the DWACM layer comprises a top-down global attention branch and a bottom-up local attention branch, performs feature fusion on the global attention branch and the local attention branch, and outputs the result through a depthwise separable convolution layer, where the depthwise separable convolution layer comprises a depthwise convolution layer and a point-by-point convolution layer arranged in series.
According to an aspect of the foregoing technical solution, the global attention branch comprises a first global branch and a second global branch arranged in parallel, and the local attention branch comprises a first local branch and a second local branch arranged in parallel; the first global branch and the second local branch are jump connection branches, the second global branch comprises, in sequence, at least one point-by-point convolution layer and a Sigmoid layer, and the first local branch comprises, in sequence, a global average pooling two-dimensional layer, at least one fully connected layer and a Sigmoid layer;
a first feature map to be fused is obtained through pixel-by-pixel multiplication of the first global branch and the first local branch, a second feature map to be fused is obtained through pixel-by-pixel multiplication of the second global branch and the second local branch, the first feature map to be fused and the second feature map to be fused are fused through pixel-by-pixel addition, and the result is output through the depthwise separable convolution layer.
A second aspect of the present invention provides an autonomous landing system for an unmanned aerial vehicle, for implementing the autonomous landing method for an unmanned aerial vehicle, where the system includes:
a prediction bounding box acquisition module, configured to take aerial ground images captured by the unmanned aerial vehicle as a dataset, input the dataset into an improved YOLOv5 model structure, and output a prediction bounding box, where the improved YOLOv5 model structure includes an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer and a head prediction layer, and the module is configured for:
inputting the dataset into the input layer for preprocessing,
performing feature extraction through the improved backbone network extraction layer to obtain multi-scale feature maps, wherein the improved backbone network extraction layer comprises at least one CBM layer, a C3GC layer embedded after the CBM layer, a scSENET layer and a SPPELAN layer embedded after the scSENET layer; the improvement of the C3GC layer comprises introducing a GCNET layer into the original C3 layer; the C3GC layer comprises several CBM layers and an improved GCSP1_X layer connected after them; the improved GCSP1_X layer comprises a first branch and a second branch arranged in parallel, the first branch comprising, in sequence, a CBM layer, at least one GCRes unit layer and a convolution layer, the GCRes unit layer comprising the GCNET layer, and the second branch comprising a CBM layer; the first branch and the second branch are spliced through a splicing layer and then output sequentially through a batch normalization layer, a ReLU layer and a CBM layer,
inputting the feature maps of different scales into the improved neck feature fusion layer for feature fusion to obtain a multi-scale fused feature map, wherein the improvement of the neck feature fusion layer comprises replacing the splicing layers with DWACM layers,
then performing prediction through the head prediction layer to obtain the prediction bounding box;
a landing bounding box acquisition module, configured to perform loss function calculation on the prediction bounding box, adjust the parameters of the improved YOLOv5 model structure according to the loss difference between the prediction bounding box and the actual bounding box, and deploy the improved YOLOv5 model structure and the adjusted parameters on an unmanned aerial vehicle, so that the unmanned aerial vehicle detects the input aerial ground image and outputs a landing bounding box;
and an autonomous landing module, configured to adjust the flight state of the unmanned aerial vehicle according to the landing bounding box so that the unmanned aerial vehicle lands autonomously.
A third aspect of the present invention is to provide a readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the unmanned aerial vehicle autonomous landing method described above.
A fourth aspect of the present invention is to provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the unmanned aerial vehicle autonomous landing method described above when the program is executed.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
fig. 1 is a flowchart of an unmanned aerial vehicle autonomous landing method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the improved YOLOv5 model structure according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram of a part of the structure of an improved backbone network extraction layer according to a first embodiment of the present invention;
FIG. 4 is a schematic diagram of scSENET layers in a first embodiment of the present invention;
FIG. 5 is a schematic diagram of SPPELAN layers in a first embodiment of the present invention;
FIG. 6 is a schematic illustration of a portion of the structure of a modified neck feature fusion layer in accordance with a first embodiment of the present invention;
FIG. 7 is a schematic diagram of the structure of a prediction bounding box and an actual bounding box according to the first embodiment of the present invention;
FIG. 8 is a schematic view of a ground image according to a first embodiment of the present invention;
Fig. 9 is a block diagram of an unmanned aerial vehicle autonomous landing system according to a second embodiment of the present invention;
Description of the drawings element symbols:
A prediction bounding box acquisition module 100, a landing bounding box acquisition module 200, and an autonomous landing module 300.
Detailed Description
In order to make the objects, features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Example 1
Referring to fig. 1 to 7, an autonomous landing method of an unmanned aerial vehicle according to a first embodiment of the present invention is shown, and the method includes steps S10 to S12:
Step S10, taking aerial ground images captured by the unmanned aerial vehicle as a dataset, inputting the dataset into an improved YOLOv5 model structure, and outputting a prediction bounding box, where the improved YOLOv5 model structure includes an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer and a head prediction layer, and includes:
inputting the dataset into the input layer for preprocessing,
performing feature extraction through the improved backbone network extraction layer to obtain multi-scale feature maps, wherein the improved backbone network extraction layer comprises at least one CBM layer, a C3GC layer embedded after the CBM layer, a scSENET layer and a SPPELAN layer embedded after the scSENET layer; the improvement of the C3GC layer comprises introducing a GCNET layer into the original C3 layer; the C3GC layer comprises several CBM layers and an improved GCSP1_X layer connected after them; the improved GCSP1_X layer comprises a first branch and a second branch arranged in parallel, the first branch comprising, in sequence, a CBM layer, at least one GCRes unit layer and a convolution layer, the GCRes unit layer comprising the GCNET layer, and the second branch comprising a CBM layer; the first branch and the second branch are spliced through a splicing layer and then output sequentially through a batch normalization layer, a ReLU layer and a CBM layer,
inputting the feature maps of different scales into the improved neck feature fusion layer for feature fusion to obtain a multi-scale fused feature map, wherein the improvement of the neck feature fusion layer comprises replacing the splicing layers with DWACM layers,
then performing prediction through the head prediction layer to obtain the prediction bounding box;
The input layer serves as the input end and preprocesses the dataset; it mainly comprises three parts: data augmentation, image size processing and adaptive anchor box calculation, where the initial image size is 320×320.
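As an illustration of the image size processing step, the following is a minimal sketch of letterbox-style resizing of an aerial ground image to the 320×320 input size; the function name and the padding value are assumptions for illustration and are not taken from the patent, and the data augmentation and adaptive anchor calculation steps are not shown.

```python
import cv2
import numpy as np

def letterbox_resize(image, target_size=320, pad_value=114):
    """Resize an aerial ground image to target_size x target_size while
    keeping its aspect ratio, padding the remainder with a constant value."""
    h, w = image.shape[:2]
    scale = target_size / max(h, w)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    canvas = np.full((target_size, target_size, 3), pad_value, dtype=resized.dtype)
    top = (target_size - new_h) // 2
    left = (target_size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (left, top)
```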
In addition, as shown in fig. 2, the improved backbone network extraction layer includes a focusing layer (Focus), four CBM layers stacked in sequence, a C3GC layer embedded after each CBM layer, and a scSENET layer and a SPPELAN layer introduced in sequence at the output end of the fourth C3GC layer.
Specifically, the improved Backbone network extraction layer (Backbone) sequentially comprises a stacked focusing layer (Focus), a first CBM layer, a first C3GC layer, a second CBM layer, a second C3GC layer, a third CBM layer, a third C3GC layer, a fourth CBM layer, a fourth C3GC layer, scSENET layers and SPPELAN layers.
Further, as shown in fig. 3 (a), the focusing layer (Focus) includes CBR layers and a stitching layer (Concat); the image extraction operation is performed through several CBR layers arranged in parallel, and the focused feature map is then obtained through the fusion stitching of the stitching layer (Concat). Compared with the traditional image extraction operation based on slice processing, the CBR layers adopted in this embodiment not only accelerate model convergence but also alleviate, to a certain extent, gradient dispersion in the improved YOLOv5 model structure, making the improved YOLOv5 model structure simpler and more stable.
Further, as shown in (b) of fig. 3, the CBM layer includes a convolution layer (Conv), a batch normalization layer (BN) and a Mish layer (Mish activation function layer); the LeakyReLU activation function in the activation layer is replaced by the Mish function. The Mish function introduces a more complex nonlinear transformation, captures more interaction information in the input data and enhances the expressive power of the network; its gradient is smoother and more adaptive, which effectively avoids the slowdown in training caused by vanishing and exploding gradients, prevents neuron necrosis, and enhances the generalization capability of the network model.
Specifically, the Mish activation function is:
Mish(j) = j · tanh(ln(1 + e^j))
where j is the input data and Mish(j) is the Mish activation output.
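For illustration, a minimal PyTorch sketch of a CBM layer (convolution + batch normalization + Mish) as described above; the kernel size and other parameters are assumptions.

```python
import torch.nn as nn

class CBM(nn.Module):
    """Conv + BatchNorm + Mish, i.e. the CBM layer described above."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.Mish()  # Mish(x) = x * tanh(ln(1 + e^x))

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```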
It should be noted that, a C3GC layer is introduced at the output end of the CBM layer, and the improvement of the C3GC layer includes introducing GCNET layers in the original C3 layer, as shown in (C) of fig. 3, the C3GC layer includes several CBM layers and an improved GCSP1_x layer connected after the several CBM layers.
Further, as shown in (d) of fig. 3, the modified GCSP1_x layer includes a first leg and a second leg disposed in parallel, the first leg including, in order, a CBM layer, at least one GCRes unit layer, and a convolutional layer (Conv); the second branch comprises a CBM layer, the first branch and the second branch are spliced and connected through a splicing layer (Concat), and then sequentially output through a batch normalization layer (BN), a Relu layer and the CBM layer.
As shown in (e) of fig. 3, the GCRes unit layer includes a jump connection branch and a unit branch arranged in parallel, which are connected and output through pixel-by-pixel superposition. The unit branch includes a CBM layer, a GCNET layer, a CBM layer and a CBM layer connected in sequence: it first extracts global context features through a 1×1 CBM layer and the processing of the GCNET layer, which effectively performs deep compression and kernel decoupling on the input feature map; the response of the feature map is readjusted by learning the weight distribution among channels, the features are recalibrated, irrelevant or redundant feature information is filtered out, the feature expression of the detection target is enhanced, and the discrimination capability and generalization performance of the model are improved.
Global context features can be effectively extracted through the processing of the GCNET layer, improving the ability to capture features. As shown in (f) of fig. 3, the GCNET layer includes a first GC branch and a second GC branch arranged in parallel, as well as a jump connection branch. The first GC branch includes a convolution layer (Conv) and a Softmax layer; its output is spliced through a splicing layer (Concat) and input to the second GC branch, then passes sequentially through the convolution layer (Conv), the layer normalization layer (LN), the ReLU layer and the convolution layer (Conv) of the second GC branch, and is finally spliced with the jump connection branch through a splicing layer (Concat) and output. The Softmax layer performs a nonlinear mapping of the output through the Softmax activation function, providing nonlinear modeling capability and handling multi-class problems.
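For illustration, a minimal PyTorch sketch of a global-context block in the spirit of the GCNET layer described above: a 1×1 convolution followed by Softmax forms the attention weights of the first GC branch, and a 1×1 convolution, layer normalization, ReLU and 1×1 convolution form the second GC branch. The reduction ratio is an assumption, and the final fusion here uses element-wise addition for simplicity, whereas the text above describes splicing.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Sketch of a GCNet-style global context layer: a 1x1 conv + Softmax
    produces spatial attention weights that pool the input into a global
    context vector; a 1x1 conv / LayerNorm / ReLU / 1x1 conv transform then
    re-weights the feature map. Fusion uses addition here for simplicity."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)      # first GC branch
        self.softmax = nn.Softmax(dim=-1)
        hidden = max(channels // reduction, 1)
        self.transform = nn.Sequential(                         # second GC branch
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        weights = self.softmax(self.attn(x).view(b, 1, h * w))              # B x 1 x HW
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))   # B x C x 1
        context = context.view(b, c, 1, 1)
        return x + self.transform(context)                      # jump-connection fusion
```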
The scSENET layer combines a spatial attention mechanism and a channel attention mechanism so that features are selectively extracted in both the spatial and channel dimensions, which effectively improves the quality of the feature representation and the ability of the model to capture key information.
Specifically, as shown in fig. 4, the scSENET layer includes an sSE part and a cSE part arranged in parallel: the sSE part consists of the two upper branches of the scSENET layer and introduces the spatial attention mechanism, while the cSE part consists of the two lower branches and introduces the channel attention mechanism; the output of the sSE part and the output of the cSE part are added and output, completing the flow of the scSENET layer.
The input of the scSENET layer is B (batch size) × C (channel number) × H (height) × W (width). The sSE part passes through a convolution layer (Conv) and a Sigmoid layer to obtain feature data of size B×1×H×W, which means that all features are gathered into a single feature channel containing all the spatial information of the features (but no inter-channel information); this feature data is then multiplied with the original data to obtain the output result.
In addition, the cSE part is pooled by a global average pooling two-dimensional layer (GlobalAvgPool2D) to obtain B×C×1×1, which means that all feature data are output with a height of 1 and a width of 1 and stacked; the number of channels is reduced to half by a convolution layer (Conv), then restored to the original number of channels through a ReLU layer and a convolution layer (Conv), weights are obtained through a Sigmoid layer, and the output result, which contains rich channel information, is obtained by multiplying these weights with the original data.
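For illustration, a minimal PyTorch sketch of a concurrent spatial and channel attention block in the spirit of the scSENET layer described above; the reduction ratio of 2 follows the halving of channels described in the text, and the remaining details are assumptions.

```python
import torch.nn as nn

class SCSEBlock(nn.Module):
    """Sketch of concurrent spatial (sSE) and channel (cSE) attention: the
    sSE path compresses the channels to a single spatial map, the cSE path
    squeezes spatial information into per-channel weights, and the two
    re-weighted results are added."""
    def __init__(self, channels, reduction=2):
        super().__init__()
        # sSE: B x C x H x W -> B x 1 x H x W spatial attention map
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # cSE: global average pool -> channel bottleneck -> per-channel weights
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.spatial_gate(x) + x * self.channel_gate(x)
```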
Further, as shown in (b) of fig. 5, the SPPELAN layer includes a jump connection branch and an SPP branch arranged in parallel, and the jump connection branch and the SPP branch are connected and output through pixel-by-pixel superposition.
Specifically, the SPP branch includes a first CBS layer, and a first sub-branch and a second sub-branch that are each connected to the output end of the first CBS layer. The first sub-branch includes a first maximum pooling two-dimensional layer (Maxpool2D), whose output goes both to a splicing layer (Concat) and to a second maximum pooling two-dimensional layer (Maxpool2D); the output of the second maximum pooling two-dimensional layer (Maxpool2D) goes both to the splicing layer (Concat) and to a third maximum pooling two-dimensional layer (Maxpool2D); the third maximum pooling two-dimensional layer (Maxpool2D) and the second sub-branch output directly to the splicing layer (Concat). The output end of the splicing layer (Concat) is connected to a second CBS layer, and the output of the second CBS layer and the output of the jump connection branch are connected through pixel-by-pixel superposition.
The CBS layer, as shown in fig. 5 (a), includes a convolution layer (Conv), a batch normalization layer (BN) and a SiLU layer (SiLU activation function layer).
It should be noted that, in the SPPELAN layer, all the pooling operations of the maximum pooling two-dimensional layers (Maxpool2D) share the same parameter configuration, which effectively reduces the total parameter count of the model and facilitates a lightweight model. The pooled output of each maximum pooling two-dimensional layer (Maxpool2D) is directly used for the pooling operation of the next layer, and the multi-scale features are gathered together through the stitching operation of the stitching layer (Concat), which helps the model make full use of feature layers of different granularities for accurate recognition and decision-making. To further improve the feature expression capability of the model, a jump connection branch is introduced, so that deep features of the input that have undergone pooling at different levels can be clearly distinguished from the original, un-pooled shallow features, while preventing the possible loss of part of the feature information.
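For illustration, a minimal PyTorch sketch of an SPPELAN-style block as described above; the hidden channel count and the pooling kernel size are assumptions.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, i.e. the CBS layer described above."""
    def __init__(self, in_channels, out_channels, kernel_size=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPELAN(nn.Module):
    """Sketch of the SPPELAN structure described above: a CBS layer feeds a
    chain of max-pooling layers sharing one configuration, the pooled
    outputs are concatenated with the un-pooled branch, a second CBS layer
    restores the channel count, and a skip connection is added pixel by pixel."""
    def __init__(self, channels, hidden=None, pool_size=5):
        super().__init__()
        hidden = hidden or channels // 2
        self.cbs1 = CBS(channels, hidden)
        self.pool = nn.MaxPool2d(pool_size, stride=1, padding=pool_size // 2)
        self.cbs2 = CBS(hidden * 4, channels)

    def forward(self, x):
        y = self.cbs1(x)
        p1 = self.pool(y)
        p2 = self.pool(p1)      # each pooled output feeds the next pooling step
        p3 = self.pool(p2)
        return x + self.cbs2(torch.cat([y, p1, p2, p3], dim=1))  # jump connection
```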
The improved neck feature fusion layer (Neck) performs top-down and bottom-up multi-scale feature fusion, as shown in fig. 2, so as to fully extract and fuse features.
The multi-scale feature fusion branch comprises a first CBM layer, a first upsampling layer (Upsample), a first DWACM layer, a first CSP2_X layer, a second CBM layer, a second upsampling layer (Upsample) and a second DWACM layer, connected in sequence to the output of the SPPELAN layer.
Further, the output characteristic result Q2 of the third C3GC layer is also output to the first DWACM layers, and the output characteristic result Q1 of the second C3GC layer is also output to the second DWACM layers.
In addition, the multi-scale feature fusion branch circuit sequentially comprises a second CSP2_X layer, a third CBM layer, a third DWACM layer, a third CSP2_X layer, a fourth CBM layer, a fourth DWACM layer and a fourth CSP2_X layer which are connected with the output end of the second DWACM layer from top to bottom.
Further, the output end of the second CBM layer is further connected to the third DWACM layer, and the output end of the first CBM layer is further connected to the fourth DWACM layer.
Wherein the shallow features include: in the improved backbone network extraction layer, the output characteristic result Q1 of the second C3GC layer and the output characteristic result Q2 of the third C3GC layer; in the neck feature fusion layer, the first CBM layer outputs a feature result Q3, and the second CBM layer outputs a feature result Q4.
Further, the deep features include: in the neck feature fusion layer, a first upsampling layer (Upsample) upsamples the output feature result S1, a second upsampling layer (Upsample) upsamples the output feature result S2, a third CBM layer outputs the feature result S3, and a fourth CBM layer outputs the feature result S4.
In particular, the improved neck feature fusion layer includes replacing the splice layer (Concat) with a DWACM layer. To recover the channel number resulting from the replaced splice layer (Concat) operation, the DWACM layer adds a depth separable convolutional layer at the end of the original ACM layer, doubling the number of channels while reducing the number of model parameters. Wherein introducing DWACM layers to replace the splice layer (Concat) comprises: q2 is used as a shallow layer feature, S1 is used as a deep layer feature and is input into a first DWACM layer; q1 is used as a shallow layer feature, S2 is used as a deep layer feature and is input into a second DWACM layer; q4 is used as a shallow layer feature, S3 is used as a deep layer feature and is input into a third DWACM layer; q3 is input as shallow features and S4 is input as deep features into the fourth DWACM layers.
The DWACM layer combines global attention modulation from top to bottom and local attention modulation from bottom to top to capture high-level semantic information and bottom-level detail information, and performs feature fusion through an attention mechanism; and the depth separable convolution layer is added at the tail end, so that the number of channels is doubled in the characteristic fusion process, and the problem that the number of channels is halved due to the operation of replacing a splicing layer is avoided. Compared with the conventional convolution layer, the depth separable convolution layer has lower parameter quantity, can effectively control the complexity of a model and improve the efficiency of the model while doubling the channel.
Specifically, as shown in (b) of fig. 6, the improvement of DWACM layers includes a global attention branch from top to bottom and a local attention branch from bottom to top, and the global attention branch and the local attention branch are subjected to feature fusion and output through a depth separable convolution layer, where the depth separable convolution layer includes a depth convolution layer and a point-by-point convolution layer that are arranged in series.
The global attention branch comprises a first global branch and a second global branch arranged in parallel, and the local attention branch comprises a first local branch and a second local branch arranged in parallel; the first global branch and the second local branch are jump connection branches, the second global branch comprises, in sequence, at least one point-by-point convolution layer and a Sigmoid layer, and the first local branch comprises, in sequence, a global average pooling two-dimensional layer (GlobalAvgPool2D), at least one fully connected layer (Fully Connected) and a Sigmoid layer;
The first global branch and the first local branch are connected and output through pixel-by-pixel multiplication to obtain a first feature map to be fused, the second global branch and the second local branch are connected and output through pixel-by-pixel multiplication to obtain a second feature map to be fused, the first feature map to be fused and the second feature map to be fused are connected and fused through pixel-by-pixel addition, and then the first feature map to be fused and the second feature map to be fused are output through the depth separable convolution layer.
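For illustration, a minimal PyTorch sketch of a DWACM-style fusion block following the structure described above; the assignment of the deep and shallow feature maps to the global and local attention branches, the equal channel counts and spatial sizes of the two inputs, and the reduction ratio are assumptions made for this sketch.

```python
import torch.nn as nn

class DWACM(nn.Module):
    """Sketch of a DWACM-style fusion block: a global attention branch
    (pointwise convs + Sigmoid) and a local attention branch (global average
    pooling + fully connected layers + Sigmoid) modulate the two inputs, the
    modulated maps are added pixel by pixel, and a depthwise separable
    convolution doubles the channel count."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # global attention branch: pointwise convolutions + Sigmoid (applied to the deep feature)
        self.global_attn = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # local attention branch: GAP + fully connected layers + Sigmoid (applied to the shallow feature)
        self.local_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )
        # depthwise separable convolution: depthwise conv + pointwise conv, doubling the channels
        self.dwconv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels),
            nn.Conv2d(channels, channels * 2, kernel_size=1),
        )

    def forward(self, shallow, deep):
        chan_w = self.local_attn(shallow).unsqueeze(-1).unsqueeze(-1)
        fused1 = deep * chan_w                      # deep feature (skip) x channel attention of shallow
        fused2 = self.global_attn(deep) * shallow   # pointwise attention of deep x shallow feature (skip)
        return self.dwconv(fused1 + fused2)         # pixel-by-pixel addition, then channel doubling
```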
In addition, as shown in fig. 6 (a), the CSP2_X layer includes a single CBM layer and two consecutive CBM layers arranged in parallel; the single CBM layer and the two consecutive CBM layers are spliced through a splicing layer (Concat) and then output through a batch normalization layer (BN), a ReLU layer and a CBM layer.
As shown in fig. 2, the head prediction layer (Head) includes a first convolution layer (Conv) connected to the output end of the second CSP2_X layer, a second convolution layer (Conv) connected to the output end of the third CSP2_X layer, and a third convolution layer (Conv) connected to the output end of the fourth CSP2_X layer; the first, second and third convolution layers (Conv) output image feature maps with sizes of 76×76×255, 38×38×255 and 19×19×255, respectively, from which the prediction bounding box is obtained.
Step S11, performing loss function calculation on the prediction bounding box, adjusting the parameters of the improved YOLOv5 model structure according to the loss difference between the prediction bounding box and the actual bounding box, and deploying the improved YOLOv5 model structure and the adjusted parameters on an unmanned aerial vehicle, so that the unmanned aerial vehicle detects the input aerial ground image and outputs a landing bounding box;
As shown in fig. 7, B is the prediction bounding box and B' is the actual bounding box. Specifically, the position parameters of the prediction bounding box and the actual bounding box are obtained, and the angle loss, the distance loss and the shape loss between them are calculated respectively, where the angle loss is calculated as:
Λ = 1 − 2·sin²(arcsin(h/σ) − π/4)
where Λ is the angle loss, h is the difference between the center point height of the prediction bounding box and the center point height of the actual bounding box, and σ is the distance between the center point of the prediction bounding box and the center point of the actual bounding box;
The distance loss is calculated as:
Δ = Σ_{i=x,y} (1 − e^(−γ·ρ_i)) = (1 − e^(−γ·ρ_x)) + (1 − e^(−γ·ρ_y))
where Δ is the distance loss, w is the difference between the center point width coordinate of the prediction bounding box and that of the actual bounding box, w_gt and h_gt are the width and height of the actual bounding box, w_p and h_p are the width and height of the prediction bounding box, ρ_i (i = x, y) is the prediction deviation of the prediction bounding box from the actual bounding box in the width (ρ_x) or height (ρ_y) direction, and γ is a weight parameter;
the shape loss is calculated as:
Ω = (1 − e^(−ω_w)) + (1 − e^(−ω_h))
where Ω is the shape loss, ω_w is the normalized deviation of the prediction bounding box from the actual bounding box in the width direction, and ω_h is the normalized deviation of the prediction bounding box from the actual bounding box in the height direction;
a loss function is constructed from the angle loss, the distance loss, and the shape loss.
The loss function is constructed as:
Loss = 1 − IoU + (Δ + Ω)/2
where Loss is the function loss value and IoU is the ratio of the intersection to the union of the prediction bounding box and the actual bounding box.
By introducing the angle loss, the shape loss and the distance loss, the loss function is redefined with a multi-dimensional loss composition, which improves prediction accuracy and precision; by minimizing the loss value, the prediction of the target point converges faster and achieves higher positioning precision.
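For illustration, a minimal PyTorch sketch of a bounding-box loss composed of the angle, distance and shape terms described above (a SIoU-style construction); the normalization of the center-point offsets by the smallest enclosing box and the choice γ = 2 − Λ are assumptions not stated in the patent.

```python
import math
import torch

def siou_style_loss(pred, target, eps=1e-7):
    """Sketch of an IoU loss with angle, distance and shape terms.
    Boxes are (x1, y1, x2, y2) tensors of shape (N, 4)."""
    # widths, heights and center points
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2

    # intersection over union
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # smallest enclosing box, used here to normalize the center-point offsets (assumption)
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0]) + eps
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1]) + eps

    # angle loss: 1 - 2 * sin^2(arcsin(h / sigma) - pi / 4)
    dx, dy = tcx - pcx, tcy - pcy
    sigma = torch.sqrt(dx ** 2 + dy ** 2) + eps
    sin_alpha = (dy.abs() / sigma).clamp(0, 1 - eps)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # distance loss: sum over x, y of (1 - exp(-gamma * rho))
    gamma = 2 - angle                       # assumed weight parameter
    rho_x, rho_y = (dx / cw) ** 2, (dy / ch) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape loss: sum over w, h of (1 - exp(-omega))
    omega_w = (pw - tw).abs() / (torch.max(pw, tw) + eps)
    omega_h = (ph - th).abs() / (torch.max(ph, th) + eps)
    shape = (1 - torch.exp(-omega_w)) + (1 - torch.exp(-omega_h))

    # total loss: 1 - IoU + (distance + shape) / 2
    return (1 - iou + (dist + shape) / 2).mean()
```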
And step S12, adjusting the flight state of the unmanned aerial vehicle according to the landing boundary box, so that the unmanned aerial vehicle lands autonomously.
As shown in fig. 8, the triangle is the center point of the landing bounding box, the dot is the ground image center point, and the line connecting the triangle and the dot is the line connecting the center point of the landing bounding box and the ground image center point. Specifically, when the unmanned aerial vehicle lands autonomously, a ground image of a landing boundary box is acquired in real time, and ground image center point data and landing boundary box data are obtained;
performing deviation calculation on the ground image center point data and the landing boundary frame data to obtain relative deviation information;
And constructing a flight state function of the relative deviation information and the unmanned aerial vehicle, and controlling the flight speed and direction of the unmanned aerial vehicle until the unmanned aerial vehicle reaches the upper part of the landing boundary box.
Performing deviation calculation on the ground image center point data and the landing boundary box data to obtain relative deviation information, wherein the method specifically comprises the following steps of:
Taking the upper left corner of the ground image as the origin of coordinates, the data coordinates of the ground image center point are obtained as (x_c, y_c) = (W/2, H/2),
where x_c is the width coordinate of the ground image center point, y_c is the height coordinate of the ground image center point, W is the width of the ground image and H is the height of the ground image;
the upper left corner data coordinates of the landing bounding box are obtained as (x_1, y_1) and the lower right corner data coordinates as (x_2, y_2), and the deviation is calculated with the following formulas:
Δx = (x_1 + x_2)/2 − x_c,  Δy = (y_1 + y_2)/2 − y_c
where Δx is the deviation width value and Δy is the deviation height value.
In addition, when the pixel deviation between the landing bounding box center and the ground image center point is less than 30, the unmanned aerial vehicle is considered to have reached the position directly above the landing bounding box and proceeds to the next operation.
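For illustration, a minimal sketch of the deviation calculation and the 30-pixel alignment check described above; applying the threshold to both axes independently is an assumption.

```python
def landing_deviation(box, image_width, image_height):
    """Compute the pixel deviation between the landing bounding box center
    and the ground image center, with the origin at the top-left corner.
    `box` is (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    center_x, center_y = image_width / 2, image_height / 2   # ground image center
    box_cx, box_cy = (x1 + x2) / 2, (y1 + y2) / 2             # landing box center
    return box_cx - center_x, box_cy - center_y

def above_landing_box(box, image_width, image_height, threshold=30):
    """The UAV is treated as directly above the landing bounding box once
    the pixel deviation falls below the threshold (30 px in the text)."""
    dx, dy = landing_deviation(box, image_width, image_height)
    return abs(dx) < threshold and abs(dy) < threshold
```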
To further illustrate this embodiment, the improved YOLOv5 model structure constructed in this embodiment is compared with the classical YOLOv5 model structure. The dataset used in the experiment is a manually labeled dataset of 3120 images, divided into a training set and a test set with a ratio of about 7:3. During training, the batch size is set to 12, the actual input image size is 320 pixels × 320 pixels, errors are reduced over 300 rounds of iteration, and the highest precision reached during the iterations is taken as the final precision. The graphics card of the experimental platform is an NVIDIA GeForce 4080 with 16 GB of video memory, the CUDA version is 11.7, and the PyTorch version is 1.13. The experimental results are shown in Table 1.
Table 1:
Note: mAP: average prediction accuracy; mAP_75: average prediction accuracy at an intersection-over-union threshold of 0.75; mAP_m: average prediction accuracy for medium-sized targets; mAP_l: average prediction accuracy for large targets.
Wherein the classical YOLOv5 model is the traditional YOLOv5 model;
experiment group one replaces the CBR layers with CBM layers in the backbone network extraction layer of the classical YOLOv5 model;
experiment group two embeds C3GC layers in the backbone network extraction layer of the classical YOLOv5 model;
experiment group three embeds scSENET layers in the backbone network extraction layer of the classical YOLOv5 model;
experiment group four embeds SPPELAN layers in the backbone network extraction layer of the classical YOLOv5 model;
experiment group five replaces the splicing layers (Concat) with DWACM layers in the neck feature fusion layer of the classical YOLOv5 model;
experiment group six replaces the CBR layers with CBM layers in the backbone network extraction layer of the classical YOLOv5 model and embeds C3GC layers;
experiment group seven replaces the CBR layers with CBM layers in the backbone network extraction layer of the classical YOLOv5 model and embeds C3GC layers and scSENET layers;
experiment group eight replaces the CBR layers with CBM layers in the backbone network extraction layer of the classical YOLOv5 model and embeds C3GC layers, scSENET layers and SPPELAN layers;
the improved YOLOv5 model replaces the CBR layers with CBM layers in the backbone network extraction layer of the classical YOLOv5 model and embeds C3GC layers, scSENET layers and SPPELAN layers, while also replacing the splicing layers (Concat) with DWACM layers in the neck feature fusion layer.
As can be seen from Table 1, the improved YOLOv5 model structure of this embodiment achieves higher prediction accuracy through the improved backbone network extraction layer and the improved neck feature fusion layer, thereby improving the accuracy and precision of prediction.
Compared with the prior art, the unmanned aerial vehicle autonomous landing method provided by this embodiment has the following beneficial effects: the method can effectively improve landing accuracy. Specifically, the improved YOLOv5 model structure comprises an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer and a head prediction layer, wherein the improved backbone network extraction layer comprises at least one CBM layer, a C3GC layer embedded after the CBM layer, a scSENET layer and a SPPELAN layer embedded after the scSENET layer. Through the embedded C3GC layer, global context features can be extracted effectively and the input feature map undergoes deep compression and kernel decoupling; the response of the feature map is readjusted by learning the weight distribution among channels, the features are recalibrated, irrelevant or redundant feature information is filtered out, the feature expression of the detection target is enhanced, and the discrimination capability and generalization performance of the model are improved. The scSENET layer combines a spatial attention mechanism and a channel attention mechanism, which effectively improves the quality of the feature representation and the ability of the model to capture key information. The jump connection branch of the SPPELAN layer allows deep features processed by pooling at different levels to be clearly distinguished from the original shallow features that are not pooled, and prevents the loss of part of the feature information. The improved neck feature fusion layer comprises DWACM layers, which combine top-down global attention modulation and bottom-up local attention modulation to capture high-level semantic information and low-level detail information and perform feature fusion through an attention mechanism; a depthwise separable convolution layer is added at the end so that the number of channels is doubled during feature fusion, avoiding the halving of the channel count caused by replacing the splicing layer. Therefore, the improved backbone network extraction layer and the improved neck feature fusion layer can effectively improve the ability to capture and fuse features, thereby improving the accuracy and precision of prediction. In addition, the flight state of the unmanned aerial vehicle is adjusted according to the landing bounding box data, which improves the landing accuracy of the unmanned aerial vehicle and thus solves the widespread technical problem that unmanned aerial vehicle landing guided by a positioning system is easily affected by the environment and has low accuracy.
Example two
Referring to fig. 9, an autonomous landing system for an unmanned aerial vehicle according to a second embodiment of the present invention is shown, the system includes:
A prediction bounding box acquisition module 100, configured to take aerial ground images captured by the unmanned aerial vehicle as a dataset, input the dataset into an improved YOLOv5 model structure, and output a prediction bounding box, where the improved YOLOv5 model structure includes an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer and a head prediction layer, and the module is configured for:
inputting the dataset into the input layer for preprocessing,
performing feature extraction through the improved backbone network extraction layer to obtain multi-scale feature maps, wherein the improved backbone network extraction layer comprises at least one CBM layer, a C3GC layer embedded after the CBM layer, a scSENET layer and a SPPELAN layer embedded after the scSENET layer; the improvement of the C3GC layer comprises introducing a GCNET layer into the original C3 layer; the C3GC layer comprises several CBM layers and an improved GCSP1_X layer connected after them; the improved GCSP1_X layer comprises a first branch and a second branch arranged in parallel, the first branch comprising, in sequence, a CBM layer, at least one GCRes unit layer and a convolution layer, the GCRes unit layer comprising the GCNET layer, and the second branch comprising a CBM layer; the first branch and the second branch are spliced through a splicing layer and then output sequentially through a batch normalization layer, a ReLU layer and a CBM layer,
inputting the feature maps of different scales into the improved neck feature fusion layer for feature fusion to obtain a multi-scale fused feature map, wherein the improvement of the neck feature fusion layer comprises replacing the splicing layers with DWACM layers,
then performing prediction through the head prediction layer to obtain the prediction bounding box;
a landing bounding box acquisition module 200, configured to perform loss function calculation on the prediction bounding box, adjust the parameters of the improved YOLOv5 model structure according to the loss difference between the prediction bounding box and the actual bounding box, and deploy the improved YOLOv5 model structure and the adjusted parameters on an unmanned aerial vehicle, so that the unmanned aerial vehicle detects the input aerial ground image and outputs a landing bounding box;
and an autonomous landing module 300, configured to adjust the flight state of the unmanned aerial vehicle according to the landing bounding box so that the unmanned aerial vehicle lands autonomously.
Compared with the prior art, the unmanned aerial vehicle autonomous landing system provided by this embodiment has the following beneficial effects: the system can effectively improve landing accuracy. Specifically, the YOLOv5 model structure used by the prediction bounding box acquisition module is improved, and the improved backbone network extraction layer and the improved neck feature fusion layer effectively improve the ability to capture and fuse features, thereby improving the accuracy and precision of prediction and solving the technical problem that unmanned aerial vehicle landing guided by a positioning system is easily affected by the environment and has low accuracy.
The third embodiment of the present invention further provides a readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the unmanned aerial vehicle autonomous landing method according to the first embodiment.
The fourth embodiment of the present invention further provides an electronic device, where the electronic device includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor implements the steps of the unmanned aerial vehicle autonomous landing method according to the first embodiment when executing the program.
The technical features of the above embodiments may be arbitrarily combined, and for brevity, all of the possible combinations of the technical features of the above embodiments are not described, however, they should be considered as the scope of the description of the present specification as long as there is no contradiction between the combinations of the technical features.
Those of skill in the art will appreciate that the logic and/or steps represented in the flow diagrams or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable storage medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (9)

1. An unmanned aerial vehicle autonomous landing method, characterized in that the unmanned aerial vehicle autonomous landing method comprises:
Taking unmanned aerial vehicle aerial ground images as a dataset, inputting the dataset into an improved YOLOV model structure, and outputting a prediction bounding box, wherein the improved YOLOV model structure comprises an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer and a head prediction layer, and the method comprises the following steps:
inputting the data set into the input layer for preprocessing,
Performing feature extraction through the improved backbone network extraction layer to obtain multi-scale feature maps, wherein the improved backbone network extraction layer comprises at least one CBM layer, a C3GC layer embedded behind the CBM layer, a scSENET layer, and a SPPELAN layer embedded behind the scSENET layer; the improvement of the C3GC layer comprises introducing a GCNET layer into the original C3 layer; the C3GC layer comprises a plurality of CBM layers and an improved GCSP1_X layer connected behind the CBM layers; the improved GCSP1_X layer comprises a first branch and a second branch arranged in parallel, the first branch sequentially comprising a CBM layer, at least one GCRes unit layer and a convolution layer, the GCRes unit layer comprising the GCNET layer, and the second branch comprising a CBM layer; and the first branch and the second branch are connected by concatenation through a splicing layer and then output sequentially through a batch normalization layer, a Relu layer and a CBM layer,
Inputting the feature maps of different scales into the improved neck feature fusion layer for feature fusion to obtain a multi-scale fused feature map, wherein the improvement of the improved neck feature fusion layer comprises replacing the splicing layer with a DWACM layer,
and then performing prediction through the head prediction layer to obtain the prediction bounding box;
Calculating a loss function of the prediction bounding box, adjusting the parameters of the improved YOLOV model structure according to the loss difference between the prediction bounding box and the actual bounding box, deploying the improved YOLOV model structure with the adjusted parameters on an unmanned aerial vehicle, detecting an input aerial ground image by the unmanned aerial vehicle, and outputting a landing bounding box;
and adjusting the flight state of the unmanned aerial vehicle according to the landing bounding box so that the unmanned aerial vehicle lands autonomously.
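For readers implementing a comparable network, the following sketch (PyTorch) gives one possible reading of the CBM layer and the improved GCSP1_X block of claim 1: two parallel branches joined by a splicing (concatenation) layer and then passed through batch normalization, a Relu layer and a CBM layer. "CBM" is interpreted here as convolution + batch normalization + Mish, the channel widths and kernel sizes are assumptions, and GCRes is only a plain residual placeholder (a GCNet-augmented version is sketched after claim 3); none of these choices are fixed by the claim.

```python
import torch
import torch.nn as nn

class CBM(nn.Module):
    """Conv + BatchNorm + Mish (one common reading of 'CBM'; assumed here)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.Mish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class GCRes(nn.Module):
    """Placeholder residual unit; the GCNet-augmented version appears after claim 3."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(CBM(c, c, 1), CBM(c, c, 3))

    def forward(self, x):
        return x + self.body(x)

class GCSP1_X(nn.Module):
    """Improved GCSP1_X block as read from claim 1: two parallel branches,
    spliced by concatenation, then BN + Relu + CBM."""
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_mid = c_out // 2  # assumed hidden width
        # First branch: CBM -> n x GCRes unit layers -> plain convolution
        self.branch1 = nn.Sequential(
            CBM(c_in, c_mid, 1),
            *[GCRes(c_mid) for _ in range(n)],
            nn.Conv2d(c_mid, c_mid, 1, bias=False),
        )
        # Second branch: a single CBM layer
        self.branch2 = CBM(c_in, c_mid, 1)
        self.bn = nn.BatchNorm2d(2 * c_mid)
        self.relu = nn.ReLU(inplace=True)
        self.out = CBM(2 * c_mid, c_out, 1)

    def forward(self, x):
        y = torch.cat((self.branch1(x), self.branch2(x)), dim=1)  # splicing layer
        return self.out(self.relu(self.bn(y)))
```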
2. The unmanned aerial vehicle autonomous landing method of claim 1, wherein the improved backbone network extraction layer comprises four CBM layers stacked in sequence, a C3GC layer embedded behind each CBM layer, and a scSENET layer and a SPPELAN layer introduced in sequence at the output of the fourth C3GC layer.
3. The unmanned aerial vehicle autonomous landing method according to claim 1, wherein the GCRes unit layer comprises a jump connection branch and a unit branch arranged in parallel, the jump connection branch and the unit branch being connected and output by pixel-by-pixel superposition; the unit branch comprises a CBM layer, a GCNET layer, a CBM layer and a CBM layer connected in sequence; the GCNET layer comprises a first GC branch and a second GC branch arranged in parallel, and a jump connection branch; the first GC branch comprises a convolution layer and a Softmax layer; and the output of the first GC branch is spliced through a splicing layer and input to the second GC branch, and, after sequentially passing through the convolution layer, the layer normalization layer, the Relu layer and the convolution layer of the second GC branch, is connected with the jump connection branch and output.
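The GCNET layer of claim 3 reads like the global-context block of GCNet (a 1×1 convolution with Softmax over spatial positions producing a context vector, a conv–LayerNorm–Relu–conv transform, and a skip connection). The sketch below is one such interpretation, reusing the CBM helper from the sketch after claim 1; the channel-reduction ratio and the exact splicing details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCBlock(nn.Module):
    """Global-context block in the spirit of GCNet, matching the claim-3 reading:
    a context branch (1x1 conv + Softmax over positions), a transform branch
    (conv -> LayerNorm -> Relu -> conv), and a skip connection."""
    def __init__(self, c, r=4):
        super().__init__()
        self.context_conv = nn.Conv2d(c, 1, kernel_size=1)   # first GC branch
        self.transform = nn.Sequential(                       # second GC branch
            nn.Conv2d(c, c // r, kernel_size=1),
            nn.LayerNorm([c // r, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Attention weights over all spatial positions
        attn = self.context_conv(x).view(b, 1, h * w)
        attn = F.softmax(attn, dim=-1).unsqueeze(-1)          # (b, 1, hw, 1)
        feat = x.view(b, c, h * w).unsqueeze(1)               # (b, 1, c, hw)
        context = torch.matmul(feat, attn).view(b, c, 1, 1)   # global context vector
        return x + self.transform(context)                    # jump connection

class GCResUnit(nn.Module):
    """GCRes unit per claim 3: unit branch CBM -> GC block -> CBM -> CBM,
    added to the input by pixel-by-pixel superposition (reuses CBM from above)."""
    def __init__(self, c):
        super().__init__()
        self.unit = nn.Sequential(CBM(c, c, 1), GCBlock(c), CBM(c, c, 3), CBM(c, c, 1))

    def forward(self, x):
        return x + self.unit(x)
```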
4. The unmanned aerial vehicle autonomous landing method of claim 2, wherein the scSENET layer combines a spatial attention mechanism and a channel attention mechanism to selectively extract features in both the spatial and channel dimensions, and the SPPELAN layer comprises a jump connection branch and an SPP branch arranged in parallel, the jump connection branch and the SPP branch being connected and output by pixel-by-pixel superposition.
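A common construction that matches the scSENET layer of claim 4 is the concurrent spatial and channel squeeze-and-excitation (scSE) block. The sketch below follows that reading; the reduction ratio and the element-wise addition used to combine the two recalibrated maps are assumptions rather than details fixed by the claim.

```python
import torch
import torch.nn as nn

class scSE(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation (one reading of
    'scSENET'): channel attention via global pooling + fully connected layers,
    spatial attention via a 1x1 convolution; the two recalibrated maps are combined."""
    def __init__(self, c, r=16):
        super().__init__()
        # Channel attention (cSE)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(c, c // r), nn.ReLU(inplace=True),
            nn.Linear(c // r, c), nn.Sigmoid(),
        )
        # Spatial attention (sSE)
        self.spatial = nn.Sequential(nn.Conv2d(c, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w_channel = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        cse = x * w_channel            # channel-wise recalibration
        sse = x * self.spatial(x)      # spatial recalibration
        return cse + sse               # element-wise fusion of the two attentions
```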
5. The unmanned aerial vehicle autonomous landing method of claim 1, wherein the DWACM layer comprises a top-down global attention branch and a bottom-up local attention branch, the global attention branch and the local attention branch are feature-fused and output through a depthwise separable convolution layer, and the depthwise separable convolution layer comprises a depthwise convolution layer and a pointwise convolution layer arranged in series.
6. The unmanned aerial vehicle autonomous landing method of claim 5, wherein the global attention branch comprises a first global branch and a second global branch arranged in parallel, the local attention branch comprises a first local branch and a second local branch arranged in parallel, the first global branch and the second local branch are jump connection branches, the second global branch sequentially comprises at least one pointwise convolution layer and a Sigmoid layer, and the first local branch sequentially comprises a two-dimensional global average pooling layer, at least one fully connected layer and a Sigmoid layer;
the first global branch and the first local branch are connected through pixel-by-pixel multiplication to obtain a first feature map to be fused, the second global branch and the second local branch are connected through pixel-by-pixel multiplication to obtain a second feature map to be fused, the first feature map to be fused and the second feature map to be fused are fused through pixel-by-pixel addition, and the result is then output through the depthwise separable convolution layer.
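Claims 5 and 6 describe the DWACM layer as a cross-gated fusion of two attention branches followed by a depthwise separable convolution. The sketch below is one possible arrangement; in particular, the assignment of the top-down and bottom-up inputs to the global and local branches, the channel-reduction ratios, and the kernel size of the depthwise convolution are assumptions.

```python
import torch
import torch.nn as nn

class DWACM(nn.Module):
    """Sketch of a DWACM-style fusion module per claims 5-6, assuming it fuses a
    top-down feature `x_td` and a bottom-up feature `x_bu` of identical shape."""
    def __init__(self, c, r=4):
        super().__init__()
        # Second global branch: pointwise convolutions + Sigmoid (spatial gate)
        self.global_gate = nn.Sequential(
            nn.Conv2d(c, c // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid(),
        )
        # First local branch: global average pooling + fully connected layers + Sigmoid
        self.local_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c // r), nn.ReLU(inplace=True),
            nn.Linear(c // r, c), nn.Sigmoid(),
        )
        # Depthwise separable convolution: depthwise conv followed by pointwise conv
        self.dwconv = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False),  # depthwise
            nn.Conv2d(c, c, 1, bias=False),                       # pointwise
            nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        )

    def forward(self, x_td, x_bu):
        b, c, _, _ = x_td.shape
        w_local = self.local_gate(x_bu).view(b, c, 1, 1)   # channel weights from bottom-up path
        w_global = self.global_gate(x_td)                   # spatial weights from top-down path
        fused1 = x_td * w_local                              # first feature map to be fused
        fused2 = w_global * x_bu                             # second feature map to be fused
        return self.dwconv(fused1 + fused2)                  # pixel-by-pixel addition, then DW-separable conv
```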
7. An unmanned aerial vehicle autonomous landing system for implementing the unmanned aerial vehicle autonomous landing method of any of claims 1 to 6, the system comprising:
a prediction bounding box acquisition module, configured to input a dataset into an improved YOLOV model structure by using an aerial ground image of an unmanned aerial vehicle as the dataset, and output a prediction bounding box, where the improved YOLOV model structure includes an input layer, an improved backbone network extraction layer, an improved neck feature fusion layer, and a head prediction layer, and includes:
inputting the data set into the input layer for preprocessing,
Performing feature extraction through the improved backbone network extraction layer to obtain multi-scale feature maps, wherein the improved backbone network extraction layer comprises at least one CBM layer, a C3GC layer embedded behind the CBM layer, a scSENET layer, and a SPPELAN layer embedded behind the scSENET layer; the improvement of the C3GC layer comprises introducing a GCNET layer into the original C3 layer; the C3GC layer comprises a plurality of CBM layers and an improved GCSP1_X layer connected behind the CBM layers; the improved GCSP1_X layer comprises a first branch and a second branch arranged in parallel, the first branch sequentially comprising a CBM layer, at least one GCRes unit layer and a convolution layer, the GCRes unit layer comprising the GCNET layer, and the second branch comprising a CBM layer; and the first branch and the second branch are connected by concatenation through a splicing layer and then output sequentially through a batch normalization layer, a Relu layer and a CBM layer,
Inputting the feature maps of different scales into the improved neck feature fusion layer for feature fusion to obtain a multi-scale fused feature map, wherein the improvement of the improved neck feature fusion layer comprises replacing the splicing layer with a DWACM layer,
and then performing prediction through the head prediction layer to obtain the prediction bounding box;
a landing bounding box acquisition module, configured to calculate a loss function of the prediction bounding box, adjust the parameters of the improved YOLOV model structure according to the loss difference between the prediction bounding box and the actual bounding box, deploy the improved YOLOV model structure with the adjusted parameters on an unmanned aerial vehicle, detect an input aerial ground image by the unmanned aerial vehicle, and output a landing bounding box;
and an autonomous landing module, configured to adjust the flight state of the unmanned aerial vehicle according to the landing bounding box so that the unmanned aerial vehicle lands autonomously.
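Claims 1 and 7 leave open how the landing bounding box is converted into a flight-state adjustment. Purely as an illustration, and not as the patented control scheme, the following minimal sketch maps the offset of the box centre from the image centre to proportional horizontal velocity corrections and triggers descent once the target is roughly centred; every gain, threshold and sign convention is an assumption.

```python
def landing_velocity_command(box, image_w, image_h,
                             k_xy=0.002, vz_descend=-0.4, center_tol=30):
    """Map a detected landing bounding box to a body-frame velocity command.

    `box` is (x_min, y_min, x_max, y_max) in pixels. All gains, the descent
    rate, the centring tolerance, and the sign conventions (which depend on
    camera mounting) are illustrative assumptions, not taken from the patent.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    err_x = cx - image_w / 2.0           # horizontal offset of box centre from image centre
    err_y = cy - image_h / 2.0           # vertical offset of box centre from image centre
    vx = -k_xy * err_y                   # forward/backward correction
    vy = k_xy * err_x                    # left/right correction
    centred = abs(err_x) < center_tol and abs(err_y) < center_tol
    vz = vz_descend if centred else 0.0  # descend only once roughly centred over the target
    return vx, vy, vz

# Example: a 40x40 px box slightly right of centre in a 640x480 image
print(landing_velocity_command((340, 220, 380, 260), 640, 480))
```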
8. A readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the unmanned aerial vehicle autonomous landing method according to any of claims 1 to 6.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the unmanned aerial vehicle autonomous landing method of any of claims 1 to 6 when the program is executed.
CN202410571923.6A 2024-05-10 Unmanned aerial vehicle autonomous landing method and system Active CN118155104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410571923.6A CN118155104B (en) 2024-05-10 Unmanned aerial vehicle autonomous landing method and system

Publications (2)

Publication Number Publication Date
CN118155104A 2024-06-07
CN118155104B CN118155104B (en) 2024-07-26

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469881A (en) * 2021-06-10 2021-10-01 桂林电子科技大学 Method for improving resolution of aerial image of unmanned aerial vehicle based on deep learning
CN113807464A (en) * 2021-09-29 2021-12-17 东南大学 Unmanned aerial vehicle aerial image target detection method based on improved YOLO V5
US11222217B1 (en) * 2020-08-14 2022-01-11 Tsinghua University Detection method using fusion network based on attention mechanism, and terminal device
WO2022016563A1 (en) * 2020-07-23 2022-01-27 南京科沃信息技术有限公司 Ground monitoring system for plant-protection unmanned aerial vehicle, and monitoring method for same
WO2023285959A1 (en) * 2021-07-13 2023-01-19 Elbit Systems C4I and Cyber Ltd. Devices, systems and methods for navigating a mobile platform
WO2023064041A1 (en) * 2021-10-11 2023-04-20 Siemens Corporation Automated aerial data capture for 3d modeling of unknown objects in unknown environments
CN116363532A (en) * 2023-03-28 2023-06-30 合肥工业大学 Unmanned aerial vehicle image traffic target detection method based on attention mechanism and re-parameterization
CN116469021A (en) * 2023-04-30 2023-07-21 桂林电子科技大学 Unmanned aerial vehicle aerial photographing target detection method based on improved YOLOv5
WO2023207163A1 (en) * 2022-04-24 2023-11-02 城云科技(中国)有限公司 Object detection model and method for detecting object occupying fire escape route, and use
CN117690046A (en) * 2023-11-24 2024-03-12 西北工业大学 Fixed wing unmanned aerial vehicle high-altitude nodding airport runway target detection method based on improved YOLOv5

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
姚桐; 于雪媛; 王越; 唐云龙: "Improved SSD for small-target recognition in UAV aerial images" (改进SSD无人机航拍小目标识别), Ship Electronic Engineering (舰船电子工程), no. 09, 20 September 2020 (2020-09-20) *

Similar Documents

Publication Publication Date Title
CN111666921B (en) Vehicle control method, apparatus, computer device, and computer-readable storage medium
US10430691B1 (en) Learning method and learning device for object detector based on CNN, adaptable to customers' requirements such as key performance index, using target object merging network and target region estimating network, and testing method and testing device using the same to be used for multi-camera or surround view monitoring
KR101930940B1 (en) Apparatus and method for analyzing image
CN111462193B (en) Object detection method and device for multi-camera or surround view monitoring
CN114359851A (en) Unmanned target detection method, device, equipment and medium
CN111368600A (en) Method and device for detecting and identifying remote sensing image target, readable storage medium and equipment
KR20190119864A (en) Small object detection based on deep learning
CN113591872A (en) Data processing system, object detection method and device
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
CN113724388B (en) High-precision map generation method, device, equipment and storage medium
CN113052108A (en) Multi-scale cascade aerial photography target detection method and system based on deep neural network
CN116052026A (en) Unmanned aerial vehicle aerial image target detection method, system and storage medium
CN111488783B (en) Method and device for detecting pseudo 3D boundary box based on CNN
CN114565842A (en) Unmanned aerial vehicle real-time target detection method and system based on Nvidia Jetson embedded hardware
KR20200095357A (en) Learning method and learning device for heterogeneous sensor fusion by using merging network which learns non-maximum suppression
Zhao et al. YOLO‐Highway: An Improved Highway Center Marking Detection Model for Unmanned Aerial Vehicle Autonomous Flight
CN115527098A (en) Infrared small target detection method based on global mean contrast space attention
Zhou et al. Context-aware 3D object detection from a single image in autonomous driving
CN114519819A (en) Remote sensing image target detection method based on global context awareness
CN117036895B (en) Multi-task environment sensing method based on point cloud fusion of camera and laser radar
CN114118247A (en) Anchor-frame-free 3D target detection method based on multi-sensor fusion
Zhao et al. Building outline delineation: From very high resolution remote sensing imagery to polygons with an improved end-to-end learning framework
Ouyang et al. Multiview cnn model for sensor fusion based vehicle detection
CN118155104B (en) Unmanned aerial vehicle autonomous landing method and system
CN118155104A (en) Unmanned aerial vehicle autonomous landing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant