CN114863301A - Small target detection method for aerial image of unmanned aerial vehicle - Google Patents

Small target detection method for aerial image of unmanned aerial vehicle

Info

Publication number
CN114863301A
CN114863301A
Authority
CN
China
Prior art keywords
feature
small target
detection
training
unmanned aerial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210488938.7A
Other languages
Chinese (zh)
Inventor
张红英
张奇
罗向东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology
Priority to CN202210488938.7A
Publication of CN114863301A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a small target detection method for unmanned aerial vehicle (UAV) aerial images. First, to address the small size of the targets, the detection head sizes are changed so that more original small target information is obtained. Second, to accurately detect dense, contiguous small targets in complex scenes, a residual convolution module fused with a CBAM attention mechanism is adopted in the feature extraction stage, increasing the weight of densely connected regions of interest in the training pictures. Then, to alleviate background clutter and pixel blurring, a jump multi-scale feature enhancement module is introduced: two same-scale residual paths and three cross-scale residual paths are added to the top-down feature fusion, shallow and deep feature information are fully fused, and detection heads of different scales obtain sufficient image semantic and spatial information. The invention fuses the multi-scale features of the image through multi-directional jump residual connection paths, achieving excellent small target detection performance and wide applicability.

Description

Small target detection method for aerial image of unmanned aerial vehicle
Technical Field
The invention relates to an image processing technology, in particular to a small target detection method for aerial images of unmanned aerial vehicles.
Background
Target detection combines the two tasks of target localization and target classification, and it underpins higher-level visual tasks such as segmentation, scene understanding, target tracking, image description and event detection. Early target detection algorithms obtained predicted object positions with sliding windows; with the rise of deep learning, algorithms that extract regions of interest from generated region proposals appeared, and the anchor-based RPN quickly replaced earlier region proposal generation. Anchors have therefore been widely used in target detection applications. However, the fixed sizes of anchors cannot flexibly adapt to predicting targets of different scales, which limits the generalization ability of the model; dense anchors also aggravate the imbalance between positive and negative samples and significantly increase computation and memory usage. Anchor-free target detection methods have consequently developed.
Unmanned aerial vehicles are low in cost, small in size, high in resolution, flexible and easy to operate, and can conveniently capture aerial images in many environments, so they are widely used in industries where manual operation is difficult, such as traffic monitoring, electric power inspection, military reconnaissance and environmental supervision; research on UAV target detection is therefore of great significance. Images shot by UAVs over long distances have highly complex backgrounds, small target sizes and blurred appearances, which pose great challenges to the UAV small target detection task. Small targets are ubiquitous in aerial images, whose natural scenes usually suffer from severe illumination change, target occlusion, dense target packing and target scale variation; these factors degrade small-target features even more severely and further increase the difficulty of detection. Network training and target prediction in small target detection depend mainly on feature extraction and feature fusion, and good extraction and fusion methods supply the key high-frequency information for small target detection. Because a small target occupies few pixels in the original image, carries limited information, lacks appearance cues such as texture, shape and color, and has its feature information dispersed or even lost in deep feature maps through downsampling, rich network context semantics, position information and feature representation have become the research focus of the UAV small target detection task.
Disclosure of Invention
The invention aims to solve the low accuracy, high false detection rate and high missed detection rate of small target detection in UAV aerial images: given aerial images captured under different illumination conditions, climates and scenes, a detection model for small target detection is obtained through deep learning network training.
To this end, the invention provides a small target detection method for UAV aerial images based on the YOLOX network. The method adopts feature fusion over multi-scale jump residual paths together with a residual structure embedded with an attention mechanism to achieve better small target detection, and comprises five parts: preprocessing the data set, extracting features from the UAV small target data set, fusing the features from different stages, performing classification and regression prediction at different scales on the fused feature maps, and training and testing the network to obtain an optimal model for UAV small target detection.
The first part comprises two steps:
step 1, downloading a public unmanned aerial vehicle small target data set, selecting images with complex natural scenes, varied angles and severe illumination change as the test set, and performing no image enhancement on the test set;
step 2, resizing the pictures of the public data set to 640 × 640 pixels, dividing each picture in the training set into four parts and flipping them, placing the flipped pictures back at the corresponding division positions, and finally applying color gamut transformation and similar operations to the pictures, enhancing the training sample set and obtaining the final training samples;
the second part comprises two steps:
step 3, inputting the training samples obtained in step 2 into a weight-shared convolutional network, transforming the number of channels through a convolutional residual structure, and preliminarily obtaining feature maps of the aerial image from the RGB space;
step 4, performing multi-scale feature extraction on the feature map obtained in the step 3 to obtain feature maps feat0, feat1, feat2 and feat3 with channel numbers of 128, 256, 512 and 1024 respectively;
the third part comprises four steps:
step 5, performing multi-scale jump residual processing on the feature maps feat0, feat1, feat2 and feat3 obtained in step 4: performing the concatenate operation 3 times along a bottom-up path, adding another bottom-up path, and jump-connecting the deepest feature feat3 across multiple scales to the shallowest feature feat0 to obtain fused feature maps P0, P1 and P2;
step 6, fusing P0, P1 and P2 obtained in step 5 again along a top-down path to obtain feature maps P3 and P4 in which different scales are fused with each other;
step 7, on the basis of steps 5 and 6, adding two additional same-scale residual paths during the top-down feature fusion, and jump-fusing P3 and P4 obtained in step 6 with feat1 and feat2 from feature extraction to obtain feature maps P5 and P6 that preserve more detail;
step 8, adding two bottom-up paths, fusing the feature map feat3 obtained in step 4 with the feature map P6 obtained in step 7 again to obtain P7, and fusing P0 with P5 again to obtain P8;
the fourth section comprises three steps:
step 9, adding a 160 × 160 × 128 detection head Head0 and discarding the 20 × 20 × 1024 detection head Head3 of the original YOLOX network; the feature map P2 obtained in step 5 passes through a feature extraction and enhancement module fused with a CBAM attention mechanism to obtain a refined feature map, which is input into Head0 to complete target prediction for small target detection;
step 10, passing the feature maps P7 and P8 obtained in step 8 each through its own input feature extraction module, and inputting the extracted feature information into a detection head Head1 of size 80 × 80 × 256 and a detection head Head2 of size 40 × 40 × 512 respectively to complete target prediction for small target detection;
step 11, completing the classification and regression tasks of small target detection with the two convolution branches of each of the detection heads Head0, Head1 and Head2 from steps 9 and 10;
the fifth part comprises two steps:
step 12, tuning the network structure hyper-parameters of steps 3 to 10 to obtain the final training model;
and step 13, inputting the test set from step 1 into the training model from step 12 to obtain the test results of unmanned aerial vehicle small target detection.
The invention provides a small target detection method for UAV aerial images. First, to address the facts that small targets occupy few pixels and that boundaries between targets are unclear, the method fuses into the feature extraction stage a feature refinement module containing n CBAM attention mechanisms, which searches for regions of interest in dense object scenes and increases attention on densely packed objects. Second, to alleviate the problems caused by background clutter and pixel blurring, a multi-scale jump-connection feature enhancement module is adopted: by fully fusing shallow and deep feature information and jump-connecting the original information with the features after the first fusion, detection heads of different scales obtain sufficient image semantic and spatial information. Finally, to address the small size and large scale span of targets in UAV images, a 160 × 160 detection head suited to small target sizes is added in place of the 20 × 20 × 1024 detection head of the original YOLOX network; changing the detection head size increases the amount of original-image information onto which the feature maps are mapped.
Drawings
FIG. 1 is a network overall framework diagram of the present invention;
FIG. 2 is a diagram of a feature fusion network framework of the present invention;
FIG. 3 is a frame diagram of a feature extraction architecture of the present invention;
FIG. 4 is a frame diagram of the detection head structure of the present invention;
FIG. 5 shows the test set results output by the present invention.
Detailed Description
To better understand the present invention, the small target detection method for UAV aerial images is described in more detail below with reference to specific embodiments. Detailed descriptions of known prior art that might obscure the subject matter of the invention are omitted.
Step 1, downloading the public unmanned aerial vehicle small target data set VisDrone, selecting images with complex natural scenes, varied angles and severe illumination change as the test set, resizing all test images uniformly to 640 × 640, and performing no image enhancement on the test set;
step 2, resizing the pictures of the training data set to 640 × 640 pixels and applying mosaic data enhancement for the first 90% of the total training rounds: dividing each picture in the training set into four parts and flipping them, placing the flipped pictures back at the corresponding division positions, and finally applying color gamut transformation and similar operations to the pictures, enhancing the training sample set and obtaining the final training samples;
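The four-way split-flip-and-recompose operation described in step 2 is essentially a mosaic-style composition. Below is a minimal sketch of the idea in Python with OpenCV and NumPy; it is an illustration under assumptions rather than the patent's own code, and the names `mosaic_four` and `hsv_jitter` are invented for this sketch (remapping of the ground-truth boxes is omitted):

```python
import cv2
import numpy as np

def mosaic_four(images, out_size=640):
    """Compose four pictures into one mosaic: each picture is resized to a
    quadrant, flipped, and placed back at its division position."""
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    offsets = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (y, x) in zip(images, offsets):
        patch = cv2.flip(cv2.resize(img, (half, half)), 1)  # horizontal flip
        canvas[y:y + half, x:x + half] = patch
    return canvas

def hsv_jitter(img, h_gain=0.015, s_gain=0.7, v_gain=0.4):
    """Color-gamut transformation: random multiplicative gains in HSV space."""
    gains = np.random.uniform(-1, 1, 3) * [h_gain, s_gain, v_gain] + 1
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] * gains[0]) % 180   # hue wraps at 180 in OpenCV
    hsv[..., 1:] = np.clip(hsv[..., 1:] * gains[1:], 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```

In a real pipeline the four pictures' bounding boxes would be scaled, flipped and translated with the same transform before training.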
fig. 1 is a specific network model diagram of a small target detection method for an aerial image of an unmanned aerial vehicle according to the present invention, and in the present embodiment, the method is performed according to the following steps:
step 3, concentrating the width and height information of the training picture into the channels through a Focus structure, expanding the number of channels from the original 3 to 64, then performing a 1 × 1 convolution to further expand the channel information, and using the SiLU activation function to increase the nonlinearity of the network model;
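The Focus slicing of step 3 takes every second pixel in each spatial direction, turning spatial resolution into channels before the convolution. A minimal PyTorch sketch, where the 3-to-64 channel expansion follows the text and the batch normalization is an assumption:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice the input into four pixel-interleaved sub-images (3 -> 12
    channels, half resolution), then expand to 64 channels with a 1 x 1
    convolution followed by SiLU."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        sliced = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                            x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(sliced)))
```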
step 4, extracting image information features with the four residual modules Resblock1, Resblock2, Resblock3 and Resblock4 shown in fig. 1, and outputting the 4 intermediate feature maps feat0, feat1, feat2 and feat3 with 128, 256, 512 and 1024 channels respectively;
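A compact stand-in for step 4's four-stage extraction, assuming each Resblock halves the resolution while widening the channels as stated; the internal block design shown here is a simplification, not the patent's exact structure:

```python
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    def __init__(self, c1, c2, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Resblock(nn.Module):
    """Stride-2 downsampling followed by a residual refinement."""
    def __init__(self, c1, c2):
        super().__init__()
        self.down = ConvBNSiLU(c1, c2, k=3, s=2)
        self.body = nn.Sequential(ConvBNSiLU(c2, c2, k=1), ConvBNSiLU(c2, c2, k=3))

    def forward(self, x):
        x = self.down(x)
        return x + self.body(x)

# Focus stem output: 64 channels at 320 x 320 for a 640 x 640 input.
stages = nn.ModuleList([Resblock(64, 128), Resblock(128, 256),
                        Resblock(256, 512), Resblock(512, 1024)])

def extract_features(x):
    feats = []                 # feat0..feat3: 128, 256, 512, 1024 channels
    for stage in stages:
        x = stage(x)
        feats.append(x)
    return feats
```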
step 5, fusing features of different scales through multi-scale jump residual paths to obtain fused feature maps P0, P1 and P2, implemented as follows:
step 5-1, the multi-scale jump residual feature fusion structure is shown in fig. 2, with a feature fusion module added to the original feature fusion structure: feat1 is passed through a 1 × 1 convolution that changes its channels from 256 to 128, upsampled by 2× nearest-neighbour interpolation so that the feature size grows from 80 × 80 to 160 × 160, and concatenated with feat0 generated in the first stage of feature extraction to obtain P2_1; features are fused in this way 3 times;
step 5-2, a jump residual path across multiple scales is added: feat3, generated in the last stage of feature extraction, is upsampled 3 times by 2× nearest-neighbour interpolation, transforming the 20 × 20 feat3 into a 160 × 160 feature map; a convolution with kernel size 1 and stride 1 then changes the number of channels to 128; finally the result is fused with P2_1 from step 5-1 by concatenation to obtain P2;
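Steps 5-1 and 5-2 combine 1 × 1 channel reduction, 2× nearest-neighbour upsampling and concatenation. A sketch of the two jump paths in PyTorch, where the tensor shapes follow the text and everything else is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv_feat1 = nn.Conv2d(256, 128, kernel_size=1)    # channel change for feat1
conv_feat3 = nn.Conv2d(1024, 128, kernel_size=1)   # channel change for feat3

def fuse_step5_1(feat0, feat1):
    """Same-scale jump: feat1 (256 ch, 80 x 80) -> 128 ch, 160 x 160,
    concatenated with feat0 (128 ch, 160 x 160) to give P2_1."""
    x = conv_feat1(feat1)
    x = F.interpolate(x, scale_factor=2, mode='nearest')
    return torch.cat([x, feat0], dim=1)

def fuse_step5_2(p2_1, feat3):
    """Cross-multi-scale jump: feat3 (1024 ch, 20 x 20) is upsampled three
    times by 2x nearest-neighbour to 160 x 160, reduced to 128 channels,
    and concatenated with P2_1 to give P2."""
    x = feat3
    for _ in range(3):
        x = F.interpolate(x, scale_factor=2, mode='nearest')
    x = conv_feat3(x)
    return torch.cat([x, p2_1], dim=1)
```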
step 6, fusing P0, P1 and P2 obtained in step 5 again along a top-down path to obtain feature maps P3 and P4 in which different scales are fused with each other;
step 7, on the basis of steps 5 and 6, adding two additional same-scale residual paths during the top-down feature fusion, and jump-fusing P3 and P4 obtained in step 6 with feat1 and feat2 from feature extraction to obtain feature maps P5 and P6;
step 8, adding two bottom-up paths, fusing feat3 from step 4 with P6 from step 7 again, and fusing P0 obtained in step 5 with P5 from step 7 again, to obtain P7 and P8 respectively, as follows:
Specifically, feat3 is upsampled by 2× nearest-neighbour interpolation to obtain a 40 × 40 feature map, then passed through a 1 × 1 convolution, normalization and the SiLU activation function to obtain a feature map of unchanged size with 512 channels, which is fused with P6 from step 7 again; similarly, the feature map obtained after P0 from step 5 passes through a feature extraction module is upsampled by 2× nearest-neighbour interpolation, a 1 × 1 convolution adjusts the number of channels to 256, and the result is fused with P5 from step 7 by concatenation;
step 9, as shown in fig. 4, adding the detection head Head0 and discarding the detection head Head3; the feature map P2 obtained in step 5 is passed through a feature extraction enhancement module fused with the CBAM attention mechanism and input into Head0 to complete target prediction for small target detection, implemented as follows:
step 9-1, the feat0 features are introduced into the feature fusion module newly added to the feature fusion structure; a 160 × 160 × 128 detection head Head0 is generated from feat0, and the 20 × 20 × 1024 detection head Head3 generated from feat3 is discarded.
Step 9-2, in the feature extraction enhancement module shown in fig. 3, the input features pass through two 1 × 1 convolutions and are divided into two branches; one branch passes through n stacked Bottleneck residual structures: one branch of each Bottleneck performs feature extraction through a 1 × 1 convolution and a 3 × 3 convolution, with a CBAM attention mechanism added between the two convolutions, the other branch of the Bottleneck performs no operation, and the two Bottleneck branches are fused by add, so the CBAM is repeated n times along with the n stacked Bottlenecks; the other branch is a residual edge branch that performs only one 1 × 1 convolution; finally the two branches are connected by concatenation for output;
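A hedged sketch of step 9-2's module: a standard CBAM (channel attention followed by spatial attention) is inserted between the 1 × 1 and 3 × 3 convolutions of each Bottleneck, a residual edge branch carries a single 1 × 1 convolution, and the two branches are concatenated. Class names, the reduction ratio and the 7 × 7 spatial kernel are assumptions:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel then spatial attention."""
    def __init__(self, ch, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1, bias=False), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # channel attention from average- and max-pooled global descriptors
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # spatial attention from per-pixel average and max over channels
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

class Bottleneck(nn.Module):
    """1 x 1 then 3 x 3 convolution with CBAM between them; the other
    branch does nothing and the two are fused by elementwise add."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 1, bias=False)
        self.cbam = CBAM(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)

    def forward(self, x):
        return x + self.conv2(self.cbam(self.conv1(x)))

class FeatureRefine(nn.Module):
    """Two 1 x 1 branches: one stacks n CBAM-Bottlenecks, the other is a
    plain residual edge; their outputs are concatenated."""
    def __init__(self, ch, n=1):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(ch, ch // 2, 1),
                                     *[Bottleneck(ch // 2) for _ in range(n)])
        self.branch2 = nn.Conv2d(ch, ch // 2, 1)

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch2(x)], dim=1)
```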
step 10, passing the feature maps P7 and P8 obtained in step 8 each through its own input feature extraction module, and inputting the extracted feature information into a detection head Head1 of size 80 × 80 × 256 and a detection head Head2 of size 40 × 40 × 512 respectively to complete target prediction for small target detection;
step 11, completing the classification and regression tasks of small target detection with the two convolution branches of each of the detection heads Head0, Head1 and Head2 from steps 9 and 10;
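The two convolution branches of step 11 correspond to an anchor-free decoupled head in the YOLOX style: one branch predicts per-pixel class scores, the other box offsets plus objectness. A minimal sketch, with channel widths per the text; the branch depth and the 10 VisDrone classes are illustrative assumptions:

```python
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Anchor-free prediction head: shared stem, then separate
    classification and regression/objectness branches."""
    def __init__(self, in_ch, num_classes, width=128):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, width, 1), nn.SiLU())
        self.cls_branch = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, num_classes, 1))        # class scores
        self.reg_branch = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU())
        self.box = nn.Conv2d(width, 4, 1)            # x, y, w, h offsets
        self.obj = nn.Conv2d(width, 1, 1)            # objectness

    def forward(self, x):
        x = self.stem(x)
        reg = self.reg_branch(x)
        return self.cls_branch(x), self.box(reg), self.obj(reg)

# one head per fused scale: Head0 on 160 x 160 x 128, Head1 on 80 x 80 x 256,
# Head2 on 40 x 40 x 512
heads = nn.ModuleList(DecoupledHead(c, num_classes=10) for c in (128, 256, 512))
```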
step 12, tuning the network structure hyper-parameters of steps 3 to 10 and setting the network model parameters: the number of epochs is set to 130 and transfer training is adopted; for the first 50 epochs the backbone network is frozen, the learning rate is set to 0.001, a pre-training model is loaded, and the batch size is set to 4; for the following 80 epochs the backbone network is unfrozen, the learning rate is set to 0.0001 and the batch size to 2; the learning rate is multiplied by 0.92 after each epoch, and the final training model is obtained after training;
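The two-phase transfer-training schedule of step 12 can be sketched as below; `model.backbone`, `train_one_epoch` and the data loaders are placeholders assumed for illustration, and the optimizer choice is not specified by the patent:

```python
import torch

def fit(model, loader_bs4, loader_bs2, train_one_epoch):
    # phase 1: freeze the backbone and train the rest for 50 epochs
    for p in model.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad),
                           lr=1e-3)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.92)
    for _ in range(50):
        train_one_epoch(model, loader_bs4, opt)   # batch size 4
        sched.step()                              # lr *= 0.92 each epoch

    # phase 2: unfreeze the backbone and train 80 more epochs at a lower lr
    for p in model.backbone.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.92)
    for _ in range(80):
        train_one_epoch(model, loader_bs2, opt)   # batch size 2
        sched.step()
```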
and step 13, inputting the test set from step 1 into the training model from step 12 to obtain the test results of UAV small target detection. FIG. 5 shows the test set results output by the present invention.
According to the characteristics of small targets in UAV aerial images and deep-learning-based target detection methods, the invention provides an anchor-free small target detection method with multi-scale feature fusion. Meanwhile, the detection head sizes of the network are improved: a 160 × 160 × 128 detection head is added, making the network better suited to small-size target detection, and the 20 × 20 × 1024 detection head of the original YOLOX network, which has a larger number of parameters, is discarded, reducing the parameter count of the network. In the feature extraction module, a CBAM attention mechanism is adopted to find densely packed target regions, increase attention on hard samples, raise the weight of important information, and reduce the large amount of computation and feature redundancy the network would otherwise spend searching for regions of interest. The method is algorithmically simple, highly operable and widely applicable.
While the invention has been described with reference to illustrative embodiments, it is to be understood that the invention is not limited thereto and is intended to cover various changes and modifications that are obvious to those skilled in the art and fall within the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. A small target detection method for unmanned aerial vehicle aerial images, characterized in that a residual structure fused with an attention mechanism is introduced for feature extraction, multi-directional multi-scale residual jump paths are used for feature fusion, and predictions are made by multiple detection heads at different scales in an anchor-free manner; the method comprises five parts: preprocessing the data set, extracting features from the unmanned aerial vehicle small target data set, fusing features from different stages, performing classification and regression prediction at different scales on the fused feature maps, and training and testing the network;
the first part comprises two steps:
step 1, downloading the public unmanned aerial vehicle small target data set VisDrone, selecting images with complex natural scenes, varied angles and severe illumination change as the test set, resizing all test images uniformly to 640 × 640, and performing no image enhancement on the test set;
step 2, resizing the pictures of the training data set to 640 × 640 pixels and applying mosaic data enhancement for the first 90% of the total training rounds: dividing each picture in the training set into four parts and flipping them, placing the flipped pictures back at the corresponding division positions, and finally applying color gamut transformation and similar operations to the pictures, enhancing the training sample set and obtaining the final training samples;
the second part comprises two steps:
step 3, concentrating the width and height information of the training picture into the channels through a Focus structure, expanding the number of channels from the original 3 to 64, then performing a 1 × 1 convolution to further expand the channel information, and using the SiLU activation function to increase the nonlinearity of the network model;
step 4, extracting image information features with the four residual modules Resblock1, Resblock2, Resblock3 and Resblock4 shown in fig. 1, and outputting the 4 intermediate feature maps feat0, feat1, feat2 and feat3 with 128, 256, 512 and 1024 channels respectively;
the third part comprises four steps:
step 5, performing multi-scale jump residual processing on the feature maps feat0, feat1, feat2 and feat3 obtained in step 4: performing the concatenate operation 3 times along a bottom-up path, adding another bottom-up path, and jump-connecting the deepest feature map feat3 across multiple scales to the shallowest feature map feat0 to obtain fused feature maps P0, P1 and P2;
Step 6, fusing P0, P1 and P2 obtained in step 5 again along a top-down path to obtain feature maps P3 and P4 in which different scales are fused with each other;
step 7, on the basis of steps 5 and 6, adding two additional same-scale residual paths during the top-down feature fusion, and jump-fusing P3 and P4 obtained in step 6 with feat1 and feat2 from feature extraction to obtain feature maps P5 and P6;
step 8, adding two bottom-up paths, fusing feat3 from step 4 with P6 from step 7 again to obtain feature map P7, and fusing P0 obtained in step 5 with P5 from step 7 again to obtain P8, implemented as follows:
performing 2× nearest-neighbour interpolation upsampling on feat3 to obtain a 40 × 40 feature map, then applying a 1 × 1 convolution, normalization and the SiLU activation function to obtain a feature map of unchanged size with 512 channels, and fusing it with P6 from step 7 again; similarly, the feature map obtained after P0 from step 5 passes through a feature extraction module is upsampled by 2× nearest-neighbour interpolation, a 1 × 1 convolution adjusts the number of channels to 256, and the result is fused with P5 from step 7 by concatenation;
the fourth part comprises three steps:
step 9, adding the detection head Head0 and discarding the detection head Head3 of the original YOLOX network; the feature map P2 obtained in step 5 is passed through a feature extraction enhancement module fused with the CBAM attention mechanism and input into Head0 to complete target prediction for small target detection, implemented as follows:
step 9-1, the feat0 features are introduced into the feature fusion module newly added to the feature fusion structure; a 160 × 160 × 128 detection head Head0 is generated from feat0, and the 20 × 20 × 1024 detection head Head3 generated from feat3 is discarded;
step 9-2, in the feature extraction enhancement module shown in fig. 3, the input features pass through two 1 × 1 convolutions and are divided into two branches; one branch passes through n stacked Bottleneck residual structures: one branch of each Bottleneck performs feature extraction through a 1 × 1 convolution and a 3 × 3 convolution, with a CBAM attention mechanism added between the two convolutions, the other branch of the Bottleneck performs no operation, and the two Bottleneck branches are fused by add, so the CBAM is repeated n times along with the n stacked Bottlenecks; the other branch is a residual edge branch that performs only one 1 × 1 convolution; finally the two branches are connected by concatenation for output;
step 10, passing the feature maps P7 and P8 obtained in step 8 each through its own input feature extraction module, and inputting the extracted feature information into a detection head Head1 of size 80 × 80 × 256 and a detection head Head2 of size 40 × 40 × 512 respectively to complete target prediction for small target detection;
step 11, completing the classification and regression tasks of small target detection with the two convolution branches of each of the detection heads Head0, Head1 and Head2 from steps 9 and 10;
the fifth part comprises two steps:
step 12, tuning the network structure hyper-parameters of steps 3 to 10 and setting the network model parameters: the number of epochs is set to 130 and transfer training is adopted; for the first 50 epochs the backbone network is frozen, the learning rate is set to 0.001, a pre-training model is loaded, and the batch size is set to 4; for the following 80 epochs the backbone network is unfrozen, the learning rate is set to 0.0001 and the batch size to 2; the learning rate is multiplied by 0.92 after each epoch, and the final training model is obtained after training;
and step 13, inputting the test set from step 1 into the training model from step 12 to obtain the test results of unmanned aerial vehicle small target detection.
2. The small target detection method for unmanned aerial vehicle aerial images according to claim 1, characterized in that feature fusion over multiple reverse recursive paths is added in steps 5 and 7, adding fine-grained deep feature information to the shallow feature layers.
3. The small target detection method for unmanned aerial vehicle aerial images according to claim 1, characterized in that two bottom-up paths are added in step 8, and the features after the first fusion undergo further feature fusion through channel number conversion, enriching the diversity of context semantic information.
4. The small target detection method for unmanned aerial vehicle aerial images according to claim 1, characterized in that step 9 uses a CBAM attention mechanism to focus attention on features beneficial to sample classification, reducing feature redundancy.
5. The small target detection method for unmanned aerial vehicle aerial images according to claim 1, characterized in that step 9 improves the detection head sizes by adding the detection head Head0 and discarding the detection head Head3 of the original YOLOX network.
CN202210488938.7A 2022-05-07 2022-05-07 Small target detection method for aerial image of unmanned aerial vehicle Pending CN114863301A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210488938.7A CN114863301A (en) 2022-05-07 2022-05-07 Small target detection method for aerial image of unmanned aerial vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210488938.7A CN114863301A (en) 2022-05-07 2022-05-07 Small target detection method for aerial image of unmanned aerial vehicle

Publications (1)

Publication Number Publication Date
CN114863301A true CN114863301A (en) 2022-08-05

Family

ID=82635792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210488938.7A Pending CN114863301A (en) 2022-05-07 2022-05-07 Small target detection method for aerial image of unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN114863301A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035354A (en) * 2022-08-12 2022-09-09 江西省水利科学院 Reservoir water surface floater target detection method based on improved YOLOX
CN115035354B (en) * 2022-08-12 2022-11-08 江西省水利科学院 Reservoir water surface floater target detection method based on improved YOLOX
CN115272814A (en) * 2022-09-28 2022-11-01 南昌工学院 Long-distance space self-adaptive multi-scale small target detection method
CN115375677A (en) * 2022-10-24 2022-11-22 山东省计算中心(国家超级计算济南中心) Wine bottle defect detection method and system based on multi-path and multi-scale feature fusion
CN116958774A (en) * 2023-09-21 2023-10-27 北京航空航天大学合肥创新研究院 Target detection method based on self-adaptive spatial feature fusion
CN116958774B (en) * 2023-09-21 2023-12-01 北京航空航天大学合肥创新研究院 Target detection method based on self-adaptive spatial feature fusion
CN118155106A (en) * 2024-05-13 2024-06-07 齐鲁空天信息研究院 Unmanned aerial vehicle pedestrian detection method, system, equipment and medium for mountain rescue


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination