CN117853955A - Unmanned aerial vehicle small target detection method based on improved YOLOv5 - Google Patents
- Publication number
- CN117853955A (application number CN202311844165.2A)
- Authority
- CN
- China
- Prior art keywords
- network
- improved
- neck
- module
- unmanned aerial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses an unmanned aerial vehicle small target detection method based on improved YOLOv5, comprising the following steps: first, the YOLOv5 network is improved to obtain an improved YOLOv5 network; the improved YOLOv5 network is then trained on an unmanned aerial vehicle image dataset to obtain an unmanned aerial vehicle small target detection network model, which performs target detection on the unmanned aerial vehicle image to be detected. By adding an additional prediction head, features are extracted from shallower layers of the network, improving the model's detection precision for small unmanned aerial vehicle targets; a CBAM attention mechanism is introduced to enhance feature extraction for small unmanned aerial vehicle targets and reduce the interference of complex background elements; the BiFPN network structure is used to enhance higher-level feature fusion and improve detection speed. The invention improves both the detection precision and the detection speed for small unmanned aerial vehicle targets, achieving higher detection performance.
Description
Technical Field
The invention relates to an unmanned aerial vehicle small target detection method, belonging to the technical field of target detection, and in particular to an unmanned aerial vehicle small target detection method based on improved YOLOv5.
Background
With the development and maturation of unmanned aerial vehicle technology, civil unmanned aerial vehicles have entered public view and are widely applied in many fields. Although they bring great convenience, they also introduce many hidden dangers, so detecting small unmanned aerial vehicle targets is highly necessary. Currently, most unmanned aerial vehicle detection technologies are based on radar or photoelectric methods; visual detection costs less than these approaches. In particular, with the introduction of deep learning algorithms, visual small target detection of unmanned aerial vehicles has become one of the most economical, rapid and accurate detection methods. Mainstream detection networks are divided into single-stage and two-stage detection algorithms. Two-stage algorithms have higher detection accuracy but poor real-time performance; single-stage algorithms balance detection efficiency and detection accuracy and are widely applied in the field of target detection. Among them, the YOLO series is the most actively researched, and the YOLOv5 target detection model is one of the best-known target recognition methods, striking a good balance between speed and accuracy. Although the YOLOv5 algorithm performs well in detection speed and accuracy, challenges remain in detecting small unmanned aerial vehicle targets against complex backgrounds: false detections and missed detections are frequent, and the detection accuracy for small unmanned aerial vehicle targets with few features in the dataset still needs improvement.
Disclosure of Invention
In order to solve the problems in the background art, the invention aims to provide an unmanned aerial vehicle small target detection method based on improved YOLOv5, which enhances the detection performance of the unmanned aerial vehicle small target and improves the light weight performance of a detection algorithm.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
1. Unmanned aerial vehicle small target detection method based on improved YOLOv5
S1: the YOLOv5 network is improved to obtain an improved YOLOv5 network;
S2: the improved YOLOv5 network is trained with an unmanned aerial vehicle image dataset to obtain an unmanned aerial vehicle small target detection network model;
S3: the unmanned aerial vehicle image to be detected is input into the unmanned aerial vehicle small target detection network model, and the model outputs the unmanned aerial vehicle small target detection result.
S1 specifically comprises the following steps:
introducing an attention mechanism module CBAM into the Backbone network of the YOLOv5 network to obtain an improved Backbone network;
replacing the PANet structure in the Neck network of the YOLOv5 network with a bidirectional feature pyramid network to obtain an improved Neck network;
adding a prediction head to the YOLOv5 network, denoted the fourth prediction head.
The attention mechanism module CBAM is introduced into the Backbone network of the YOLOv5 network to obtain the improved Backbone network, specifically:
an attention mechanism module CBAM is added after each C3 module of the Backbone network; the output of each C3 module serves as the input of the corresponding attention mechanism module CBAM, and the output of each attention mechanism module CBAM serves as the input of the next network layer.
The improved Neck network specifically comprises a plurality of convolution layers, a plurality of up-sampling layers and a plurality of C3 modules. The SPPF module of the Backbone network is connected with the first convolution layer of the improved Neck network, and the first convolution layer is connected with the first up-sampling layer; the output of the third C3 module of the Backbone network is cascaded with the output of the first up-sampling layer and then input into the first C3 module of the improved Neck network. The first C3 module of the improved Neck network is connected with the second convolution layer, and the second convolution layer is connected with the second up-sampling layer; the output of the second C3 module of the Backbone network is cascaded with the output of the second up-sampling layer and then input into the second C3 module of the improved Neck network. The second C3 module of the improved Neck network is connected with the third convolution layer, and the third convolution layer is connected with the third up-sampling layer; the output of the first C3 module of the Backbone network is cascaded with the output of the third up-sampling layer and then input into the third C3 module of the improved Neck network. The third C3 module of the improved Neck network is also connected with a fourth convolution layer; the output of the fourth convolution layer, the output of the third convolution layer and the output of the second C3 module of the Backbone network are cascaded and then input into the fourth C3 module of the improved Neck network, which is connected with the third detection head. The fourth C3 module of the improved Neck network is also connected with a fifth convolution layer; the output of the fifth convolution layer, the output of the second convolution layer and the output of the third C3 module of the Backbone network are cascaded and then input into the fifth C3 module of the improved Neck network, which is connected with the second detection head. The fifth C3 module of the improved Neck network is connected with a sixth convolution layer; the output of the sixth convolution layer and the output of the first convolution layer are cascaded and then input into the sixth C3 module of the improved Neck network, which is connected with the first detection head.
In S2, the CIoU_Loss loss function of the YOLOv5 network is replaced by the EIoU_Loss loss function to complete supervised training of the improved YOLOv5 network.
2. Computer equipment
The computer device comprises a memory and a processor; the memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
3. Computer readable storage medium
The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method.
The beneficial effects of the invention are as follows:
By adding an additional prediction head, features are extracted from shallower layers of the network, improving the model's detection precision for small unmanned aerial vehicle targets; a CBAM attention mechanism is introduced to enhance feature extraction for small unmanned aerial vehicle targets and reduce the interference of complex background elements; the BiFPN network structure is used to enhance higher-level feature fusion and improve detection speed; the EIoU loss function improves sensitivity to width and height, optimizes the efficiency of small target detection, and accelerates model convergence. The invention improves both the detection precision and the detection speed for small unmanned aerial vehicle targets, achieving higher detection performance.
Drawings
FIG. 1 is a schematic overall flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a basic network structure of the improved YOLOv5 of the present invention;
FIG. 3 is a schematic diagram of the CBAM attention mechanism module introduced in the present invention, wherein (a) shows the positions of the CBAM attention mechanism modules introduced in the Backbone network, and (b) shows the structure of the CBAM attention mechanism module;
FIG. 4 is a schematic diagram of the three structures FPN, PAN and BiFPN in the present invention;
FIG. 5 is a schematic diagram of a prediction frame and a real frame according to the present invention;
FIG. 6 is a schematic diagram of a portion of a dataset according to the present invention;
fig. 7 is a schematic diagram of a detection result of an unmanned aerial vehicle in the present invention.
Detailed Description
The specific operation of the present invention will be described in further detail with reference to the accompanying drawings.
The definition of small objects includes an absolute size definition and a relative size definition. Under the absolute definition, an object whose pixel size is smaller than 32 × 32 is a small object. Under the relative definition, an object is small when the ratio of its bounding box area to the total image area is between 0.08% and 0.58%, or when the ratio of the width and height of its bounding box to the width and height of the image is less than 0.1.
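The two definitions above can be captured in a short illustrative helper (not part of the patented method; the 32 × 32 threshold and the 0.08%–0.58% area-ratio band follow the text above):

```python
def is_small_object(box_w, box_h, img_w, img_h):
    """Return True if a bounding box counts as a small object under
    either the absolute or the relative size definition above."""
    # Absolute definition: fewer pixels than a 32 x 32 patch.
    absolute_small = box_w * box_h < 32 * 32
    # Relative definition: box area is 0.08%-0.58% of the image area.
    area_ratio = (box_w * box_h) / (img_w * img_h)
    relative_small = 0.0008 <= area_ratio <= 0.0058
    return absolute_small or relative_small
```

For example, a 20 × 20 pixel drone in a 1920 × 1080 frame qualifies under the absolute definition.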
The implementation process of the invention is shown in fig. 1, and comprises the following steps:
S1: the YOLOv5 network is improved to obtain an improved YOLOv5 network;
S1 specifically comprises the following steps:
introducing an attention mechanism module CBAM into the Backbone network of the YOLOv5 network to obtain an improved Backbone network; the module mainly focuses on feature fusion across convolution operations of different channels, strengthens target features, and increases the ability to detect the target accurately;
replacing the PANet structure in the Neck network of the YOLOv5 network with a bidirectional feature pyramid network (BiFPN) to obtain an improved Neck network; BiFPN enhances the pyramid structure, strengthens feature fusion, and improves the robustness of the model in detecting small targets;
adding a prediction head to the YOLOv5 network, denoted the fourth prediction head; it mainly receives input from shallow convolution layers, adds low-level high-resolution feature maps to the network, and fuses high-level and low-level feature information, effectively reducing the negative effects of drastic changes in object scale in unmanned aerial vehicle images and enhancing the model's ability to detect small targets.
The attention mechanism module CBAM is introduced into the Backbone network of the YOLOv5 network to obtain the improved Backbone network, specifically:
as shown in fig. 3 (a), an attention mechanism module CBAM is added after each C3 module of the Backbone network; the output of each C3 module serves as the input of the corresponding attention mechanism module CBAM, and the output of each attention mechanism module CBAM serves as the input of the next network layer, namely the next convolution layer or the SPPF module. The structure of each attention mechanism module CBAM is shown in fig. 3 (b).
The attention mechanism module CBAM is a hybrid attention module used in feedforward neural networks that combines feature-channel and feature-space attention: the input feature map is processed from both the channel and spatial perspectives, yielding a more comprehensive representation of image feature relations.
The channel attention feature M_c(F) is expressed as:

M_c(F) = σ( W_1(W_0(AvgPool(F))) + W_1(W_0(MaxPool(F))) )

where σ(·) is the Sigmoid activation function, AvgPool(F) and MaxPool(F) denote global average pooling and global max pooling of the input feature map F, and W_0 and W_1 are the weight parameters of the shared multilayer perceptron.
The spatial attention feature M_s(F) is expressed as:

M_s(F) = σ( f^{7×7}( [AvgPool(F); MaxPool(F)] ) )

where f^{7×7} denotes a convolution operation with a 7 × 7 kernel, and [·;·] denotes concatenation of the channel-wise average-pooled and max-pooled feature maps.
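As a rough sketch of how the two attention maps are computed and applied in sequence, the following NumPy code mirrors the formulas above. It is illustrative only: the 7 × 7 convolution of the spatial branch is replaced by a simple element-wise combination of the pooled maps for brevity (an assumption, not the patented structure):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w0, w1):
    """M_c(F): shared MLP (w0, w1) applied to global average- and
    max-pooled channel descriptors, summed, then passed through a sigmoid."""
    # feat: (C, H, W); pool over the spatial dims to get (C,) descriptors.
    avg_desc = feat.mean(axis=(1, 2))
    max_desc = feat.max(axis=(1, 2))
    mlp = lambda v: w1 @ np.maximum(w0 @ v, 0.0)  # ReLU hidden layer
    return sigmoid(mlp(avg_desc) + mlp(max_desc))  # (C,) channel weights

def spatial_attention(feat):
    """M_s(F): combine channel-wise average and max maps; the 7x7
    convolution is approximated by an element-wise sum for brevity."""
    avg_map = feat.mean(axis=0)
    max_map = feat.max(axis=0)
    return sigmoid(avg_map + max_map)  # (H, W) spatial weights

def cbam(feat, w0, w1):
    """Apply channel attention, then spatial attention, as in CBAM."""
    refined = feat * channel_attention(feat, w0, w1)[:, None, None]
    return refined * spatial_attention(refined)[None, :, :]
```

Because both attention maps lie in (0, 1), the output is an element-wise rescaling of the input feature map that suppresses less informative channels and locations.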
The improved Neck network specifically comprises a plurality of convolution layers, a plurality of up-sampling layers and a plurality of C3 modules. The SPPF module of the Backbone network is connected with the first convolution layer of the improved Neck network, and the first convolution layer is connected with the first up-sampling layer; the output of the third C3 module of the Backbone network is cascaded with the output of the first up-sampling layer and then input into the first C3 module of the improved Neck network. The first C3 module of the improved Neck network is connected with the second convolution layer, and the second convolution layer is connected with the second up-sampling layer; the output of the second C3 module of the Backbone network is cascaded with the output of the second up-sampling layer and then input into the second C3 module of the improved Neck network. The second C3 module of the improved Neck network is connected with the third convolution layer, and the third convolution layer is connected with the third up-sampling layer; the output of the first C3 module of the Backbone network is cascaded with the output of the third up-sampling layer and then input into the third C3 module of the improved Neck network. The third C3 module of the improved Neck network is also connected with a fourth convolution layer; the output of the fourth convolution layer, the output of the third convolution layer and the output of the second C3 module of the Backbone network are cascaded and then input into the fourth C3 module of the improved Neck network, which is connected with the third detection head. The fourth C3 module of the improved Neck network is also connected with a fifth convolution layer; the output of the fifth convolution layer, the output of the second convolution layer and the output of the third C3 module of the Backbone network are cascaded and then input into the fifth C3 module of the improved Neck network, which is connected with the second detection head. The fifth C3 module of the improved Neck network is connected with a sixth convolution layer; the output of the sixth convolution layer and the output of the first convolution layer are cascaded and then input into the sixth C3 module of the improved Neck network, which is connected with the first detection head.
FIGS. 4 (a), 4 (b) and 4 (c) show the three structures FPN, PAN and BiFPN. The Neck network of YOLOv5 uses a PANet structure to handle multi-scale features: it adds a bottom-up path aggregation network on top of the FPN, jointly considering the semantic information of the top layers and the location information of the bottom layers. The PANet structure can fuse shallow strong-localization features and high-level strong-semantic features in the network, but when fusing features of different resolutions it simply adds them, without setting weight parameters for the different feature layers. A weighted bidirectional feature pyramid network (BiFPN) is therefore adopted to improve the Neck network structure of YOLOv5. BiFPN uses weighting, normalization and bidirectional connections to enhance the interaction and integration of multi-scale features, extracting and fusing multi-level, multi-scale information, and can better capture and process the detail and context information needed to detect small unmanned aerial vehicle targets. The BiFPN structure is built on a PANet basis.
The BiFPN structure is shown in fig. 4 (c). Relative to the PANet network, BiFPN omits the intermediate nodes of the P5 and P4 links, since these two nodes contribute little to the fused output features; the first node of the P5 link is input to the intermediate node of the P4 link, the intermediate node of the P4 link is input to the second node of the P3 link, and the first node of the P4 link is also input to its third node. This simplifies the network structure while retaining the original shallow information nodes, so more features can be fused without increasing the computation, improving detection speed. BiFPN also adds a learnable weight to each input, and the BiFPN block is stacked multiple times in the network to obtain more high-level feature fusion.
The weighted feature fusion is expressed as:

O = Σ_i (w_i · I_i) / (ε + Σ_j w_j)

where O is the node output value; w_i is a learnable weight; I_i is the input value of node i; the sum over j runs over all input nodes; and ε = 0.0001 enhances the numerical stability of the value.
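A minimal sketch of this fast normalized fusion, treating each input feature map as a flat list of values (illustrative only, not the patented implementation; weights are clamped to be non-negative before normalization, as is usual for this fusion rule):

```python
def weighted_fusion(inputs, weights, eps=1e-4):
    """Fast normalized fusion: O = sum(w_i * I_i) / (eps + sum(w_j)).

    inputs  -- list of equally sized feature maps (flat lists of floats)
    weights -- one learnable scalar weight per input
    """
    # Clamp weights to be non-negative so the normalized sum stays stable.
    w = [max(0.0, wi) for wi in weights]
    total = eps + sum(w)
    # Element-wise weighted sum over the input feature maps.
    return [sum(wi * x[k] for wi, x in zip(w, inputs)) / total
            for k in range(len(inputs[0]))]
```

With equal weights this reduces (up to ε) to a plain average of the inputs; training can then shift the weights to favor the more informative feature level.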
S2: training the improved YOLOv5 network by using the unmanned aerial vehicle image dataset to obtain a unmanned aerial vehicle small target detection network model;
and S2, replacing the CIoU_Loss Loss function of the Yolov5 network by adopting the EIoU_Loss Loss function, completing the supervision training of the improved Yolov5 network, and optimizing the efficiency of detecting the micro target by the model.
The CIoU_Loss function is deficient in that its aspect-ratio measurement is overly complex, slowing convergence, and the aspect ratio cannot substitute for the width and height individually. Replacing CIoU_Loss with EIoU_Loss decomposes the aspect-ratio factor, following the CIoU principle, into two terms, one for the width and one for the height of the bounding box, and computes them separately. The EIoU_Loss function considers not only the difference between center points but also the actual width and height differences between the real frame and the prediction frame, minimizing these differences directly. This accelerates model convergence, reduces the training time of the algorithm model, improves localization accuracy, and speeds up detection of small unmanned aerial vehicle targets.
L_EIoU = L_IoU + L_dis + L_asp = 1 − IoU + ρ²(b, b^gt)/c² + ρ²(w, w^gt)/c_w² + ρ²(h, h^gt)/c_h²

where L_EIoU, L_IoU, L_dis and L_asp are the EIoU loss, the intersection-over-union loss, the center-point distance loss and the width-height loss, respectively; IoU is the ratio of the intersection area of the prediction frame and the real frame to their union area; ρ(·) denotes the Euclidean distance; b and b^gt are the center points of the prediction frame and the real frame; c_w and c_h are the width and height of the smallest rectangle enclosing both frames, and c is the diagonal length of that rectangle; w, h and w^gt, h^gt are the width and height of the prediction frame and of the real frame, respectively.
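The formula above can be sketched as follows for axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates (the coordinate convention is an assumption; this is an illustrative implementation, not the patented code):

```python
def eiou_loss(pred, gt, eps=1e-9):
    """EIoU loss: 1 - IoU + center-distance term + width term + height term."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection-over-union term.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / (union + eps)
    # Smallest enclosing rectangle: width c_w, height c_h, diagonal c.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw * cw + ch * ch + eps
    # Squared distance between the two box centers (rho^2(b, b_gt)).
    dist = (((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2
            + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2)
    # Separate width and height difference terms (the EIoU decomposition).
    dw = ((px2 - px1) - (gx2 - gx1)) ** 2
    dh = ((py2 - py1) - (gy2 - gy1)) ** 2
    return 1.0 - iou + dist / c2 + dw / (cw * cw + eps) + dh / (ch * ch + eps)
```

For identical boxes every term vanishes and the loss approaches zero; disjoint boxes are penalized by both the IoU term and the center-distance term.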
S3: and inputting the unmanned aerial vehicle image to be detected into an unmanned aerial vehicle small target detection network model, and outputting an unmanned aerial vehicle small target detection result by the model.
In a specific implementation, the collected unmanned aerial vehicle images should cover different illumination conditions, shooting backgrounds, angles, weather conditions and unmanned aerial vehicle models, as shown in fig. 6. An unmanned aerial vehicle dataset is constructed and divided into a training set and a verification set. The improved YOLOv5 model is trained with the annotated unmanned aerial vehicle image training set to obtain the detection model, and the trained YOLOv5 model is evaluated with the test set; a schematic diagram of the detection results is shown in fig. 7.
By adding an additional prediction head, features are extracted from shallower layers of the network, improving the model's detection precision for small unmanned aerial vehicle targets; a CBAM attention mechanism is introduced to enhance feature extraction for small unmanned aerial vehicle targets and reduce the interference of complex background elements; the BiFPN network structure is used to enhance higher-level feature fusion and improve detection speed; the EIoU loss function improves sensitivity to width and height, optimizes the efficiency of small target detection, and accelerates model convergence. The invention improves both the detection precision and the detection speed for small unmanned aerial vehicle targets, achieving higher detection performance.
Any modification, equivalent replacement or improvement made within the scope and spirit of the present invention shall be included in the scope of the claims of the present invention, and the protection scope of the present invention shall be determined by the contents of the claims.
Claims (7)
1. An unmanned aerial vehicle small target detection method based on improved YOLOv5, characterized by comprising the following steps:
S1: the YOLOv5 network is improved to obtain an improved YOLOv5 network;
S2: the improved YOLOv5 network is trained with an unmanned aerial vehicle image dataset to obtain an unmanned aerial vehicle small target detection network model;
S3: the unmanned aerial vehicle image to be detected is input into the unmanned aerial vehicle small target detection network model, and the model outputs the unmanned aerial vehicle small target detection result.
2. The unmanned aerial vehicle small target detection method based on improved YOLOv5 of claim 1, wherein S1 specifically comprises:
introducing an attention mechanism module CBAM into the Backbone network of the YOLOv5 network to obtain an improved Backbone network;
replacing the PANet structure in the Neck network of the YOLOv5 network with a bidirectional feature pyramid network to obtain an improved Neck network;
adding a prediction head to the YOLOv5 network, denoted the fourth prediction head.
3. The unmanned aerial vehicle small target detection method based on improved YOLOv5 of claim 2, wherein the attention mechanism module CBAM is introduced into the Backbone network of the YOLOv5 network to obtain the improved Backbone network, specifically:
an attention mechanism module CBAM is added after each C3 module of the Backbone network; the output of each C3 module serves as the input of the corresponding attention mechanism module CBAM, and the output of each attention mechanism module CBAM serves as the input of the next network layer.
4. The unmanned aerial vehicle small target detection method based on improved YOLOv5 of claim 2, wherein the improved Neck network specifically comprises a plurality of convolution layers, a plurality of up-sampling layers and a plurality of C3 modules. The SPPF module of the Backbone network is connected with the first convolution layer of the improved Neck network, and the first convolution layer is connected with the first up-sampling layer; the output of the third C3 module of the Backbone network is cascaded with the output of the first up-sampling layer and then input into the first C3 module of the improved Neck network. The first C3 module of the improved Neck network is connected with the second convolution layer, and the second convolution layer is connected with the second up-sampling layer; the output of the second C3 module of the Backbone network is cascaded with the output of the second up-sampling layer and then input into the second C3 module of the improved Neck network. The second C3 module of the improved Neck network is connected with the third convolution layer, and the third convolution layer is connected with the third up-sampling layer; the output of the first C3 module of the Backbone network is cascaded with the output of the third up-sampling layer and then input into the third C3 module of the improved Neck network. The third C3 module of the improved Neck network is also connected with a fourth convolution layer; the output of the fourth convolution layer, the output of the third convolution layer and the output of the second C3 module of the Backbone network are cascaded and then input into the fourth C3 module of the improved Neck network, which is connected with the third detection head. The fourth C3 module of the improved Neck network is also connected with a fifth convolution layer; the output of the fifth convolution layer, the output of the second convolution layer and the output of the third C3 module of the Backbone network are cascaded and then input into the fifth C3 module of the improved Neck network, which is connected with the second detection head. The fifth C3 module of the improved Neck network is connected with a sixth convolution layer; the output of the sixth convolution layer and the output of the first convolution layer are cascaded and then input into the sixth C3 module of the improved Neck network, which is connected with the first detection head.
5. The unmanned aerial vehicle small target detection method based on improved YOLOv5 of claim 2, wherein in S2, the CIoU_Loss loss function of the YOLOv5 network is replaced by the EIoU_Loss loss function to complete supervised training of the improved YOLOv5 network.
6. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311844165.2A CN117853955A (en) | 2023-12-28 | 2023-12-28 | Unmanned aerial vehicle small target detection method based on improved YOLOv5 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117853955A true CN117853955A (en) | 2024-04-09 |
Family
ID=90547635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311844165.2A Pending CN117853955A (en) | 2023-12-28 | 2023-12-28 | Unmanned aerial vehicle small target detection method based on improved YOLOv5 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117853955A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118072146A (en) * | 2024-04-17 | 2024-05-24 | 西南科技大学 | Unmanned aerial vehicle aerial photography small target detection method based on multi-level feature fusion |
CN118072146B (en) * | 2024-04-17 | 2024-07-05 | 西南科技大学 | Unmanned aerial vehicle aerial photography small target detection method based on multi-level feature fusion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111126472B | SSD-based improved target detection method | |
WO2021139069A1 (en) | General target detection method for adaptive attention guidance mechanism | |
CN111126202B (en) | Optical remote sensing image target detection method based on void feature pyramid network | |
CN110188705B (en) | Remote traffic sign detection and identification method suitable for vehicle-mounted system | |
CN111179217A (en) | Attention mechanism-based remote sensing image multi-scale target detection method | |
CN114202672A (en) | Small target detection method based on attention mechanism | |
CN111091105A (en) | Remote sensing image target detection method based on new frame regression loss function | |
CN111461083A (en) | Rapid vehicle detection method based on deep learning | |
CN113255589B (en) | Target detection method and system based on multi-convolution fusion network | |
CN113780211A | Lightweight aircraft detection method based on improved YOLOv4-tiny | |
CN110321891A | Large-infusion medical fluid foreign matter detection method combining deep neural network and clustering algorithm | |
CN111898668A (en) | Small target object detection method based on deep learning | |
CN113160062B (en) | Infrared image target detection method, device, equipment and storage medium | |
CN113920107A (en) | Insulator damage detection method based on improved yolov5 algorithm | |
CN110197152A | Road target recognition method for automated driving systems | |
CN113989797A (en) | Three-dimensional dynamic target detection method and device based on voxel point cloud fusion | |
CN111079739A (en) | Multi-scale attention feature detection method | |
CN112784756B (en) | Human body identification tracking method | |
CN113591795A (en) | Lightweight face detection method and system based on mixed attention feature pyramid structure | |
Lu et al. | A CNN-Transformer hybrid model based on CSWin Transformer for UAV image object detection | |
CN116824413A (en) | Aerial image target detection method based on multi-scale cavity convolution | |
CN114049572A (en) | Detection method for identifying small target | |
CN111368775A (en) | Complex scene dense target detection method based on local context sensing | |
CN115937736A (en) | Small target detection method based on attention and context awareness | |
CN113496480A (en) | Method for detecting weld image defects |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||