CN116310273A - Unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual - Google Patents

Unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual

Info

Publication number
CN116310273A
CN116310273A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
convolution
feature
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310032053.0A
Other languages
Chinese (zh)
Inventor
张会娟
李坤鹏
姬淼鑫
张弛
袁航
刘建娟
刘振江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology filed Critical Henan University of Technology
Priority to CN202310032053.0A priority Critical patent/CN116310273A/en
Publication of CN116310273A publication Critical patent/CN116310273A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS › G06 — COMPUTING; CALCULATING OR COUNTING › G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/255 — Detecting or recognising potential candidate objects based on visual cues, e.g. shapes (under G06V 10/20, image preprocessing)
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting (under G06V 10/70, pattern recognition or machine learning)
    • G06V 10/806 — Fusion of extracted features, i.e. combining data at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 — Image or video recognition or understanding using neural networks
    • Y02T 10/40 — Engine management systems (under Y02T, climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an unmanned aerial vehicle detection method based on multi-cavity (dilated) convolution and an SE attention residual, comprising the following steps: acquiring an unmanned aerial vehicle image sample set, dividing it into a training set and a verification set, and preprocessing the image samples in the training set; improving the target detection model YOLOv5 to obtain an unmanned aerial vehicle target detection model, specifically by introducing an SE attention residual network module after the SPP structure of the YOLOv5 backbone network, and introducing a multi-cavity convolution fusion module after each convolution layer in the PANet structure of the YOLOv5 feature fusion network; constructing a GPU training environment and setting training parameters; inputting the training set and verification set into the improved model for training and verification to obtain a trained unmanned aerial vehicle target detection model; and acquiring an unmanned aerial vehicle target image to be identified and inputting it into the trained model for detection, obtaining the unmanned aerial vehicle target detection result.

Description

Unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual
Technical Field
The invention relates to the field of target detection, and in particular to an unmanned aerial vehicle detection method based on multi-cavity convolution and an SE attention residual.
Background
With the gradual opening of low-altitude airspace in recent years, unmanned aerial vehicles have entered the daily lives of ordinary people, bringing economic benefits and convenience. However, as unmanned aerial vehicle technology and functions have matured, safety problems have also been exposed: inadequate supervision has led to a rise in unauthorized ("black") flights, posing serious challenges to national security and public safety. Unmanned aerial vehicles are light, small, easy to carry and simple to operate, and can be fitted with cameras, communication equipment and the like. Their wide military and civil application also gives lawbreakers an opening, for example using unmanned aerial vehicles for crime or attacks, and incidents in which unauthorized flights endanger airport flight safety occur frequently. Counter-unmanned-aerial-vehicle technology must therefore develop alongside unmanned aerial vehicle technology; detection is the basis and key of countering, since a drone can only be countered once its target has been accurately detected.
Compared with other detection technologies, photoelectric detection offers intuitive results, high precision and wide applicability, and has gradually become an indispensable part of counter-unmanned-aerial-vehicle technology.
Traditional target detection algorithms rely on manually designed graphic features and related parameters, extract those hand-crafted feature points from the image, and finally identify the image through different image matching algorithms. Such algorithms are severely limited and cannot meet the real-time and high-precision requirements of current detection tasks.
In recent years, with the development of deep learning, unmanned aerial vehicle detection based on deep learning has become a popular research direction in photoelectric detection. However, when a drone flies at low altitude it occupies a very small proportion of the image and its features are not obvious against the background. A large amount of feature information is lost when features are extracted through a multi-layer convolution network, so the drone target information in high-level feature maps is severely degraded, the detection precision at the output end is low, and the false detection rate is high.
In view of these problems in current unmanned aerial vehicle detection, an effective technical solution is urgently needed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an unmanned aerial vehicle detection method based on multi-cavity convolution and an SE attention residual, so as to improve the detection precision of unmanned aerial vehicle targets.
In order to achieve the above purpose, the invention adopts the following technical scheme: an unmanned aerial vehicle detection method based on multi-cavity convolution and an SE attention residual, comprising the following steps:
acquiring an unmanned aerial vehicle image sample set, dividing the sample set into a training set and a verification set, and preprocessing image samples in the training set;
improving a target detection model YOLOv5 to obtain an unmanned aerial vehicle target detection model, which specifically comprises the following steps:
introducing an SE attention residual network module after the SPP structure of the backbone network of the target detection model YOLOv5;
introducing a multi-cavity convolution fusion module after each convolution layer in the PANet structure of the feature fusion network of the target detection model YOLOv5;
constructing a GPU training environment, setting training parameters, and loading a data configuration file and a model configuration file;
inputting the training set and the verification set into an improved unmanned aerial vehicle target detection model for training and verification to obtain a trained unmanned aerial vehicle target detection model;
and acquiring an unmanned aerial vehicle target image to be identified and inputting it into the trained unmanned aerial vehicle target detection model for detection, obtaining the unmanned aerial vehicle target detection result.
In some embodiments, the SE attention residual network module comprises a multi-cavity convolution fusion module and an SE attention module;
the multi-cavity convolution fusion module is connected with the SPP structure and is used for performing receptive field expansion processing on the topmost feature map output by the SPP structure to obtain a fusion feature map;
the SE attention module is connected with the multi-cavity convolution fusion module and receives the fusion feature map output by the multi-cavity convolution fusion module; it performs the Squeeze operation using global average pooling to obtain a global feature map, and uses two fully connected layers to reduce and then restore the channels of the global feature map, reducing the amount of computation and obtaining attention weights;
the attention weights are then applied, through a weighting operation, to the topmost feature map output by the SPP structure to obtain an enhanced feature map.
In some embodiments, the multi-cavity convolution fusion module comprises three cavity convolution layers with different cavity rates, a splicing module, a convolution layer and a feature fusion module, wherein each cavity convolution layer comprises a depthwise separable convolution module and a cavity convolution module;
the three cavity convolution layers with different cavity rates perform depthwise separable convolution and cavity convolution on the topmost feature map respectively, obtaining three first feature maps representing different scales;
the three first feature maps representing different scales are spliced, the spliced feature map is convolved to obtain a second feature map, and a feature addition operation between the second feature map and the input topmost feature map yields the fusion feature map.
The present invention also provides a computing device comprising: one or more processing units;
a storage unit for storing one or more programs,
wherein the one or more programs, when executed by the one or more processing units, cause the one or more processing units to perform the steps of the unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual as described above.
The present invention also provides a computer readable storage medium having processor-executable non-volatile program code which, when executed by a processor, implements the steps of the unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual as described above.
Compared with the prior art, the invention has outstanding substantive features and represents remarkable progress. In particular, after the SPP structure of the backbone network of the target detection model YOLOv5, the invention introduces an SE attention residual network module comprising a multi-cavity convolution fusion module and an SE attention module. The topmost feature map output by the SPP structure is sent to the multi-cavity convolution fusion module to expand the receptive field and capture richer multi-scale detail features; the output of the multi-cavity convolution fusion module then serves as the input of the SE attention module, which produces attention weights for the different channels. Multiplying these attention weights with the topmost feature map yields an enhanced feature map, raising the feature weights of useful channels so that the model focuses more on the target unmanned aerial vehicle region.
In the PANet structure of the feature fusion network of the target detection model YOLOv5, a multi-cavity convolution fusion module is introduced after each convolution layer. The bidirectional fusion paths are used to fuse cavity convolution features of different scales, yielding feature maps of different scales that carry contextual information about the unmanned aerial vehicle, retaining more high-resolution detail and enhancing features before the feature maps of the fusion network are spliced.
The unmanned aerial vehicle target detection model of the invention thus obtains multi-scale fused feature information from unmanned aerial vehicle feature maps of different resolutions without losing resolution, improving the detection accuracy of unmanned aerial vehicle targets.
Drawings
Fig. 1 is a flow chart of the unmanned aerial vehicle detection method of the invention.
Fig. 2 is a schematic diagram of the structure of the SE attention residual network of the present invention.
Fig. 3 is a schematic diagram of the structure of the cavity convolution layer of the invention.
Fig. 4 is a schematic diagram of the structure of the improved target detection model of the invention.
Detailed Description
The technical scheme of the invention is further described in detail through the following specific embodiments.
Example 1
The embodiment provides an unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual, as shown in fig. 1, comprising the following steps:
acquiring an unmanned aerial vehicle image sample set, dividing the sample set into a training set and a verification set, and preprocessing image samples in the training set;
improving a target detection model YOLOv5 to obtain an unmanned aerial vehicle target detection model, which specifically comprises the following steps:
introducing an SE attention residual network module after the SPP structure of the backbone network of the target detection model YOLOv5;
introducing a multi-cavity convolution fusion module after each convolution layer in the PANet structure of the feature fusion network of the target detection model YOLOv5;
constructing a GPU training environment, setting training parameters, and loading a data configuration file and a model configuration file;
inputting the training set and the verification set into an improved unmanned aerial vehicle target detection model for training and verification to obtain a trained unmanned aerial vehicle target detection model;
and acquiring an unmanned aerial vehicle target image to be identified and inputting it into the trained unmanned aerial vehicle target detection model for detection, obtaining the unmanned aerial vehicle target detection result.
In implementation, as shown in fig. 2, the SE attention residual network module includes a multi-cavity convolution fusion module and an SE attention module.
Cavity convolution (dilated convolution) was proposed to address the problem that downsampling in image semantic segmentation reduces image resolution and loses information. Its advantage is that the receptive field is enlarged without pooling, so each convolution output contains information from a larger range. However, a single cavity convolution loses local information; to overcome the grid effect and this local information loss, cavity convolutions with small receptive fields must also be selected, so as to obtain more high-resolution detail about the unmanned aerial vehicle and avoid resolution loss.
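By way of illustration, the following minimal PyTorch sketch (not part of the original disclosure; the tensor sizes and variable names are illustrative assumptions) shows how a dilated 3×3 convolution enlarges the effective receptive field without pooling and without changing the resolution of the feature map:

```python
# A minimal sketch of cavity (dilated) convolution in PyTorch.
import torch
import torch.nn as nn

x = torch.randn(1, 64, 20, 20)  # an illustrative feature map: N x C x H x W

# With dilation d, a 3x3 kernel covers an effective (2d+1) x (2d+1) window,
# so the receptive field grows without any pooling or loss of resolution.
for d in (1, 2, 3, 5):
    conv = nn.Conv2d(64, 64, kernel_size=3, dilation=d, padding=d)  # padding=d keeps H, W
    y = conv(x)
    print(f"dilation={d}: output {tuple(y.shape)}, effective kernel {2*d+1}x{2*d+1}")
```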
Based on the above principle, this embodiment designs the multi-cavity convolution fusion module. As shown in fig. 3, the multi-cavity convolution fusion module includes three cavity convolution layers with different cavity rates, a splicing module, a convolution layer and a feature fusion module.
In use, the multi-cavity convolution fusion module is connected with the SPP structure: the topmost feature map output by the SPP structure is input into the module, and three cavity convolution layers with different cavity rates perform depthwise separable convolution and cavity convolution on it respectively, obtaining three first feature maps representing different scales. A cavity convolution layer with a small cavity rate has a small receptive field and obtains higher-level feature information, giving good extraction capability for small unmanned aerial vehicle targets; a cavity convolution layer with a large cavity rate has a large receptive field and obtains low-level feature information, making detection of small unmanned aerial vehicle targets more accurate. Superimposing cavity convolutions with large and small cavity rates effectively reduces the grid effect and local information loss caused by cavity convolution.
The three first feature maps representing different scales are spliced, the spliced feature map is convolved to obtain a second feature map, and a feature addition operation between the second feature map and the input topmost feature map yields the fusion feature map.
The fusion feature map has the same size as the topmost feature map but retains more high-resolution detail and serves to connect context.
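For illustration, a minimal PyTorch sketch of the multi-cavity convolution fusion module is given below, assuming 3×3 kernels and the cavity rates 2, 3 and 5 given in Example 2; the branch layout (depthwise separable convolution followed by cavity convolution) follows the description of fig. 3, while the exact kernel sizes and channel handling are assumptions rather than limitations:

```python
# A minimal sketch of the multi-cavity-convolution fusion module (MDCF).
import torch
import torch.nn as nn

class DilatedBranch(nn.Module):
    """One cavity convolution layer: depthwise separable conv + dilated conv."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)  # depthwise, cheap
        self.pw = nn.Conv2d(channels, channels, 1)                              # pointwise
        self.dilated = nn.Conv2d(channels, channels, 3,
                                 padding=dilation, dilation=dilation)           # expands receptive field

    def forward(self, x):
        return self.dilated(self.pw(self.dw(x)))

class MDCF(nn.Module):
    """Three branches -> splice (concat) -> 1x1 conv -> feature addition with the input."""
    def __init__(self, channels: int, rates=(2, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(DilatedBranch(channels, r) for r in rates)
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)  # 1x1 conv restores channel count

    def forward(self, x):
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)  # splice the three scales
        return self.fuse(multi_scale) + x  # residual-style fusion with the input feature map

x = torch.randn(1, 512, 20, 20)
print(MDCF(512)(x).shape)  # torch.Size([1, 512, 20, 20]) -- same size as the input
```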
The SE attention module comprises a compression module (Squeeze) and an excitation module (Excitation). The compression module uses global average pooling; after the compression operation, the feature map is compressed into a 1 × 1 × C vector. The excitation module comprises two fully connected layers which introduce a scaling factor, reducing the number of channels and the amount of computation, thereby improving computational efficiency.
In use, the SE attention module is connected with the multi-cavity convolution fusion module and receives the fusion feature map it outputs. The global average pooling module performs the compression operation to obtain a global feature map of size 1 × 1 × C; the two fully connected layers reduce and then restore the feature channels of the global feature map; finally a feature recalibration operation is performed: the weights output by the excitation module express the importance of each feature channel after feature selection, and are multiplied channel by channel onto the feature map, completing the recalibration of the channel dimension.
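A minimal sketch of this SE attention module is given below; the reduction ratio of 16 used for the scaling factor is an illustrative assumption, not stated in the disclosure:

```python
# A minimal sketch of the SE attention module: Squeeze -> Excitation -> reweight.
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)           # N x C x H x W -> N x C x 1 x 1
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # reduce channels, cutting computation
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # restore channels
            nn.Sigmoid(),                                # per-channel weights in (0, 1)
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(n, c)).view(n, c, 1, 1)
        return x * w  # recalibrate: weight each feature channel by its importance

x = torch.randn(2, 512, 20, 20)
print(SEAttention(512)(x).shape)  # unchanged shape, reweighted channels
```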
It can be understood that, because the unmanned aerial vehicle occupies a small proportion of the frame, its features are mainly concentrated in low-level feature maps. Processing with the multi-cavity convolution fusion module expands the receptive field and captures shallow-level feature information of the unmanned aerial vehicle, improving the detection precision of the model on small unmanned aerial vehicle targets. The SE attention module assigns different weights to different feature channels, so that the improved target detection model focuses on channels with large weight values, boosting the feature channels useful for the current detection task and suppressing those that are not. In the PANet structure of the feature fusion network, a multi-cavity convolution fusion module is introduced after each convolution layer to obtain more feature maps carrying contextual information and to retain more high-resolution detail.
In a specific implementation, acquiring an unmanned aerial vehicle image sample set, dividing the sample set into a training set and a verification set, and preprocessing the image samples in the training set specifically comprises the following steps:
acquiring pictures of unmanned aerial vehicles, birds, kites and the like, and establishing an unmanned aerial vehicle image sample set; labelling and classifying all pictures in the sample set with the LabelImg annotation tool, which automatically generates txt label files containing category and position information;
after labelling and classification are completed, dividing the unmanned aerial vehicle image sample set into a training set and a verification set at a ratio of 8:2;
for the training and verification sets, modifying the picture paths and category parameters in the data set configuration file according to the labelling results, processing the labelled boxes with a k-means clustering algorithm to generate candidate frames, and replacing the candidate frame sizes in the model configuration file;
performing basic data enhancement on each target image sample in the training set, and using the Mosaic data enhancement method to crop and splice four random pictures from one batch into a single image of fixed size 640 × 640, which is input into the improved target detection model.
Basic data enhancement and Mosaic data enhancement enrich the training samples and speed up network training. The basic data enhancement operations include random cropping, scaling, rotation, translation, colour transformation and the like.
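For illustration, a minimal sketch of the Mosaic splice is given below; label handling is omitted, and the random splice point, grey padding value and naive top-left cropping are assumptions rather than the exact procedure of the disclosure:

```python
# A minimal sketch of Mosaic augmentation: four images spliced into one 640x640 image.
import random
import numpy as np

def mosaic(images, out_size: int = 640):
    """Stitch 4 HxWx3 uint8 images around a random centre into one mosaic image."""
    cx = random.randint(out_size // 4, 3 * out_size // 4)  # random splice point
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # grey padding
    corners = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, corners):
        h, w = y2 - y1, x2 - x1
        canvas[y1:y2, x1:x2] = img[:h, :w]  # naive top-left crop of each source image
    return canvas

imgs = [np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8) for _ in range(4)]
print(mosaic(imgs).shape)  # (640, 640, 3)
```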
It will be appreciated that, before training, a GPU environment needs to be built on a high-performance device (an NVIDIA 3080 graphics card is used here), and the data configuration file and model configuration file need to be loaded.
It can be understood that, during training of the unmanned aerial vehicle target detection model, three prediction feature maps are obtained after each training pass; redundant candidate frames in each prediction feature map are eliminated by non-maximum suppression, keeping the frame with the highest confidence score, and the model weights are continuously adjusted through back propagation using the loss function.
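A minimal NumPy sketch of this non-maximum suppression step is given below; the IoU threshold of 0.45 is an illustrative assumption:

```python
# A minimal sketch of non-maximum suppression (NMS) over candidate frames.
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.45):
    """boxes: (N, 4) as x1, y1, x2, y2; returns indices of the kept boxes."""
    order = scores.argsort()[::-1]           # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best box against the remaining candidates
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou < iou_thresh]  # suppress redundant candidate frames
    return keep
```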
Further, after each training run, a model file, the training results and the training data of each round are obtained. The training results include index parameters such as precision, recall, mean average precision (mAP) and loss; precision, recall and mAP serve as the key parameters for evaluating model performance and can be used to verify the feasibility of the invention. The prediction accuracy and false detection rate of each category can be seen from the confusion matrix diagram; in addition, the tfevents file can be used to visualize the training results.
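For illustration, the precision and recall indices mentioned above can be computed as in the minimal sketch below; the true/false positive and false negative counts are illustrative only:

```python
# A minimal sketch of the precision and recall evaluation indices.
def precision_recall(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of detections that are real drones
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of real drones that are found
    return precision, recall

# mAP averages the area under the precision-recall curve over classes.
print(precision_recall(tp=90, fp=10, fn=15))  # (0.9, 0.857...)
```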
Example 2
This embodiment provides a specific embodiment for improving the object detection model.
As shown in fig. 4, the unmanned aerial vehicle target detection model of this embodiment is based on the target detection model YOLOv5, which comprises three parts: a backbone network, a feature fusion network and an output network.
The backbone network comprises one Focus layer, four 3×3 Conv modules and one SPP structure, where each Conv module contains standard convolution, normalization and SiLU activation operations; the Focus layer and the four 3×3 Conv modules perform a 32× downsampling operation on the input image; the SPP structure converts a feature map of arbitrary size into fixed-size feature vectors.
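A minimal sketch of an SPP structure is given below, following the common YOLOv5 layout of parallel size-preserving max-pools plus the identity path; the kernel sizes 5, 9 and 13 are an assumption taken from YOLOv5, not stated in this disclosure:

```python
# A minimal sketch of an SPP structure with parallel max-pools.
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, channels: int, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)  # size-preserving pools
        self.fuse = nn.Conv2d(channels * (len(kernels) + 1), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))

print(SPP(512)(torch.randn(1, 512, 20, 20)).shape)  # torch.Size([1, 512, 20, 20])
```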
after the SPP structure, introducing an SE attention residual error network module MDSE; after the features of the target image are extracted through 32 times of downsampling operation, the topmost features are sent to an SE attention residual error network module MDSE for feature enhancement processing, so that the attention degree of unmanned aerial vehicle features is improved.
Specifically, the SE attention residual network module comprises a multi-cavity convolution fusion module (MDCF) and an SE attention module; the SE attention module comprises a compression module (Squeeze) and an excitation module (Excitation), where the compression module uses global average pooling and the excitation module comprises two fully connected layers. The multi-cavity convolution fusion module comprises three cavity convolution layers with cavity rates of 2, 3 and 5, a splicing module, a 1×1 convolution layer (Conv) and a feature fusion module. Each cavity convolution layer comprises a depthwise separable convolution module (DWconv2d), which reduces the amount of computation, and a cavity convolution module (Dilatedconv2d), which expands the receptive field of the input feature map; the 1×1 convolution layer adjusts the number of output channels.
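Putting the pieces together, a minimal sketch of the MDSE module is given below, reusing the MDCF and SEAttention classes from the sketches in Example 1; the residual-style weighting of the SPP output follows the description given there, while the class and variable names are illustrative assumptions:

```python
# A minimal sketch of the SE attention residual module (MDSE), reusing the
# MDCF and SEAttention sketches defined earlier in this document.
import torch
import torch.nn as nn

class MDSE(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mdcf = MDCF(channels, rates=(2, 3, 5))  # receptive-field expansion
        self.se = SEAttention(channels)              # channel attention

    def forward(self, x):                # x: topmost feature map from the SPP structure
        fused = self.mdcf(x)             # fusion feature map, same size as x
        n, c, _, _ = fused.shape
        w = self.se.excite(self.se.squeeze(fused).view(n, c)).view(n, c, 1, 1)
        return x * w                     # weight the SPP output -> enhanced feature map

print(MDSE(512)(torch.randn(1, 512, 20, 20)).shape)  # torch.Size([1, 512, 20, 20])
```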
The feature fusion network comprises a PANet structure with two up-sampling units and two down-sampling units. Each up-sampling unit comprises a 1×1 Conv module, an up-sampling module, a Concat module and a C3 module; each down-sampling unit comprises a 3×3 Conv module, a Concat module and a C3 module. The PANet structure adds a bottom-up fusion path to the top-down feature fusion of the FPN structure, further strengthening target feature information. The multi-cavity convolution fusion module MDCF is arranged after the Conv module of each up-sampling and down-sampling unit (see the sketch below), so that more feature maps carrying contextual information are obtained and more high-resolution detail is retained.
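For illustration, one up-sampling unit with the MDCF module inserted after its 1×1 Conv might look like the sketch below, again reusing the MDCF class from Example 1; the channel counts are assumptions, and a plain convolution stands in for the full C3 module for brevity:

```python
# A minimal sketch of a PANet up-sampling unit with MDCF after its 1x1 Conv.
import torch
import torch.nn as nn

class UpUnit(nn.Module):
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 1)     # the 1x1 Conv module
        self.mdcf = MDCF(c_out)                   # feature enhancement before splicing
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.c3 = nn.Conv2d(c_out * 2, c_out, 3, padding=1)  # stand-in for the C3 module

    def forward(self, x, skip):                   # skip: lateral feature map for Concat
        x = self.up(self.mdcf(self.conv(x)))
        return self.c3(torch.cat([x, skip], dim=1))

u = UpUnit(1024, 512)
print(u(torch.randn(1, 1024, 20, 20), torch.randn(1, 512, 40, 40)).shape)
# torch.Size([1, 512, 40, 40])
```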
In order to detect targets at large, medium and small scales, the output network produces three prediction feature maps of different sizes (20×20, 40×40 and 80×80), each containing the position information, confidence and classification error of 3 prior frames.
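For a 640 × 640 input, the shapes of these prediction tensors can be illustrated as follows; the number of classes is an assumption (e.g. drone, bird and kite, as in the sample set of Example 1):

```python
# A minimal sketch of the three output heads' prediction tensor shapes.
num_classes = 3       # illustrative assumption
anchors_per_cell = 3  # 3 prior frames per grid cell
for grid in (20, 40, 80):
    shape = (1, anchors_per_cell, grid, grid, 5 + num_classes)  # x, y, w, h, conf + classes
    print(f"{grid}x{grid} head -> prediction tensor {shape}")
```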
Example 3
The present embodiment provides a computing device including:
one or more processing units;
a storage unit for storing one or more programs;
wherein the one or more programs, when executed by the one or more processing units, cause the one or more processing units to perform the steps of the unmanned aerial vehicle detection method described in embodiment 1 or embodiment 2.
The specific connection medium between the processing unit and the storage unit is not limited in the embodiments of the present application.
In the embodiment of the present application, the storage unit stores instructions executable by the at least one processing unit, and the at least one processing unit executes the steps of the unmanned aerial vehicle detection method described above by executing the instructions stored in the storage unit.
The processing unit is the control centre of the computer device. It connects the various parts of the computer device through various interfaces and lines, and performs the detection method by running or executing instructions stored in the storage unit and calling data stored in the storage unit. In some embodiments, the processing unit may include one or more processing units and may integrate an application processor, which mainly handles the operating system, user interface and application programs, with a modem processor, which mainly handles wireless communication; it will be appreciated that the modem processor may also not be integrated into the processing unit. In some embodiments, the processing unit and the storage unit may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processing unit may be a general purpose processor, such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The storage unit is a non-volatile computer-readable storage medium that can be used to store non-volatile software programs, non-volatile computer-executable programs and modules. It may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk or an optical disk. More generally, the storage unit may be, without limitation, any medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The storage unit in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, used to store program instructions and/or data.
Example 4
The present embodiment provides a computer-readable storage medium having processor-executable non-volatile program code which, when executed by a processor, implements the steps of the unmanned aerial vehicle detection method described in embodiment 1 or embodiment 2.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical scheme of the invention and not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art will appreciate that modifications may be made to the specific embodiments of the invention, or equivalents substituted for some of its technical features, without departing from the spirit of the invention; all such changes are intended to fall within the scope of the invention as claimed.

Claims (6)

1. An unmanned aerial vehicle detection method based on multi-cavity convolution and an SE attention residual, characterized by comprising the following steps:
acquiring an unmanned aerial vehicle image sample set, dividing the sample set into a training set and a verification set, and preprocessing image samples in the training set;
improving a target detection model YOLOv5 to obtain an unmanned aerial vehicle target detection model, which specifically comprises the following steps:
introducing an SE attention residual network module after the SPP structure of the backbone network of the target detection model YOLOv5;
introducing a multi-cavity convolution fusion module after each convolution layer in the PANet structure of the feature fusion network of the target detection model YOLOv5;
constructing a GPU training environment, setting training parameters, and loading a data configuration file and a model configuration file;
inputting the training set and the verification set into an improved unmanned aerial vehicle target detection model for training and verification to obtain a trained unmanned aerial vehicle target detection model;
and acquiring an unmanned aerial vehicle target image to be identified and inputting it into the trained unmanned aerial vehicle target detection model for detection, obtaining the unmanned aerial vehicle target detection result.
2. The unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual according to claim 1, wherein: the SE attention residual network module comprises a multi-cavity convolution fusion module and an SE attention module;
the multi-cavity convolution fusion module is connected with the SPP structure and is used for performing receptive field expansion processing on the topmost feature map output by the SPP structure to obtain a fusion feature map;
the SE attention module is connected with the multi-cavity convolution fusion module, receives the fusion feature map output by the multi-cavity convolution fusion module, performs the Squeeze operation using global average pooling to obtain a global feature map, and uses two fully connected layers to reduce and restore the channels of the global feature map to obtain attention weights;
the attention weights are then applied, through a weighting operation, to the topmost feature map output by the SPP structure to obtain an enhanced feature map.
3. The unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual according to claim 1 or 2, wherein: the multi-cavity convolution fusion module comprises three cavity convolution layers with different cavity rates, a splicing module, a convolution layer and a feature fusion module, wherein each cavity convolution layer comprises a depthwise separable convolution module and a cavity convolution module;
the three cavity convolution layers with different cavity rates perform depthwise separable convolution and cavity convolution on the topmost feature map respectively, obtaining three first feature maps representing different scales;
the three first feature maps representing different scales are spliced, the spliced feature map is convolved to obtain a second feature map, and a feature addition operation between the second feature map and the input topmost feature map yields the fusion feature map.
4. The unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual according to claim 3, wherein: the cavity rates of the three cavity convolution layers are 2, 3 and 5, respectively.
5. A computing device, characterized by: comprising the following steps:
one or more processing units;
a storage unit for storing one or more programs,
wherein the one or more programs, when executed by the one or more processing units, cause the one or more processing units to perform the steps of the unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual according to any one of claims 1 to 4.
6. A computer readable storage medium having processor-executable non-volatile program code, characterized in that the program code, when executed by a processor, implements the steps of the unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual according to any one of claims 1 to 4.
CN202310032053.0A 2023-01-10 2023-01-10 Unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual Pending CN116310273A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310032053.0A CN116310273A (en) 2023-01-10 2023-01-10 Unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310032053.0A CN116310273A (en) 2023-01-10 2023-01-10 Unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual

Publications (1)

Publication Number Publication Date
CN116310273A (en) 2023-06-23

Family

ID=86831361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310032053.0A Pending CN116310273A (en) 2023-01-10 2023-01-10 Unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual error

Country Status (1)

Country Link
CN (1) CN116310273A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503428A (en) * 2023-06-27 2023-07-28 吉林大学 Image feature extraction method and segmentation method based on refined global attention mechanism
CN116503428B (en) * 2023-06-27 2023-09-08 吉林大学 Image feature extraction method and segmentation method based on refined global attention mechanism
CN116721302A (en) * 2023-08-10 2023-09-08 成都信息工程大学 Ice and snow crystal particle image classification method based on lightweight network
CN116721302B (en) * 2023-08-10 2024-01-12 成都信息工程大学 Ice and snow crystal particle image classification method based on lightweight network
CN116977880A (en) * 2023-08-25 2023-10-31 内蒙古农业大学 Grassland rat hole detection method based on unmanned aerial vehicle image

Similar Documents

Publication Publication Date Title
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN116310273A (en) Unmanned aerial vehicle detection method based on multi-cavity convolution and SE attention residual
CN111126359B (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN111047551A (en) Remote sensing image change detection method and system based on U-net improved algorithm
CN111127493A (en) Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN110163878A (en) A kind of image, semantic dividing method based on dual multiple dimensioned attention mechanism
CN111079739B (en) Multi-scale attention feature detection method
CN112132156A (en) Multi-depth feature fusion image saliency target detection method and system
Gong et al. Object detection based on improved YOLOv3-tiny
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN114155481A (en) Method and device for recognizing unstructured field road scene based on semantic segmentation
CN113762209A (en) Multi-scale parallel feature fusion road sign detection method based on YOLO
CN109948593A (en) Based on the MCNN people counting method for combining global density feature
CN108921850B (en) Image local feature extraction method based on image segmentation technology
CN112597920A (en) Real-time object detection system based on YOLOv3 pruning network
CN117079163A (en) Aerial image small target detection method based on improved YOLOX-S
CN113822383A (en) Unmanned aerial vehicle detection method and system based on multi-domain attention mechanism
CN113297959A (en) Target tracking method and system based on corner attention twin network
CN117036770A (en) Detection model training and target detection method and system based on cascade attention
Wan et al. Small object detection leveraging density‐aware scale adaptation
Zhang et al. Pre-locate net for object detection in high-resolution images
CN115035429A (en) Aerial photography target detection method based on composite backbone network and multiple measuring heads
CN114898290A (en) Real-time detection method and system for marine ship
CN114494827A (en) Small target detection method for detecting aerial picture
Zhang et al. Traffic sign detection algorithm based on YOLOv5 combined with BIFPN and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination