CN117197687A - Unmanned aerial vehicle aerial photography-oriented detection method for dense small targets - Google Patents

Unmanned aerial vehicle aerial photography-oriented detection method for dense small targets

Info

Publication number
CN117197687A
Authority
CN
China
Prior art keywords
network
loss
detection
small target
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310235724.3A
Other languages
Chinese (zh)
Inventor
张红英
蒲俊涛
袁明东
黄语涵
曾静超
曾芸芸
杨靖儒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University of Science and Technology
Original Assignee
Southwest University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University of Science and Technology filed Critical Southwest University of Science and Technology
Priority to CN202310235724.3A priority Critical patent/CN117197687A/en
Publication of CN117197687A publication Critical patent/CN117197687A/en
Pending legal-status Critical Current

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a detection method for dense small targets in unmanned aerial vehicle aerial photography. Firstly, a lightweight backbone network CSPDarknet-tiny is constructed on the basis of the YOLOv7 model, and the downsampling ratio is reduced so that more semantic information and detail features are retained; secondly, a multi-head self-attention mechanism MHSA is introduced into the neck network, which effectively relieves the interference of irrelevant information brought by complex backgrounds, helps the network focus on extracting small target feature information, and improves small target detection precision; finally, to address the sensitivity of the IOU Loss to positional differences of small targets, the NWD Loss is introduced, and combining the IOU Loss and the NWD Loss with a certain weight ratio significantly improves small target detection precision. The application improves small target detection under unmanned aerial vehicle aerial photography conditions, increases small target detection accuracy, reduces the small target miss rate, ensures excellent detection performance for small targets, and has wide applicability.

Description

Unmanned aerial vehicle aerial photography-oriented detection method for dense small targets
Technical field:
The application relates to image processing technology, in particular to a detection method for dense small targets in unmanned aerial vehicle aerial photography, which combines multi-scale feature fusion with a Wasserstein-distance-based metric, introduces the multi-head attention mechanism MHSA, and constructs a robust small target detection network.
Technical background:
The unmanned aerial vehicle has the advantages of low operation cost, high maneuverability, portability, multiple viewing angles and small size, and can make up for the shortcomings of remote sensing satellite information acquisition; with the gradual opening of low-altitude airspace and the continuous development of unmanned aerial vehicle research and development technology, it has increasingly become a research hotspot for experts and scholars at home and abroad.
Object detection technology distinguishes objects of interest from the background in images and videos, identifying both the category and the position of each object. Early target detection methods relied on hand-crafted features, which cannot capture abstract semantic features well and can only recognize a single specified category, so recognition efficiency and detection performance were low. Because aerial images contain more complex scenes and targets than everyday images, traditional target detection methods are at an even greater disadvantage and cannot meet the requirements of aerial image target detection. Meanwhile, aerial images often involve huge data volumes and real-time detection is frequently required, which places stricter demands on detection methods. In recent years, with the rapid development of deep learning, convolutional neural networks have greatly improved detection performance compared with traditional methods. The algorithms are mainly divided into two categories, single-stage and two-stage. Two-stage algorithms follow the basic idea of region-based detection and split the detection process into two steps: candidate regions that may contain targets are first generated through methods such as selective search, edge detection or a region proposal network to extract features; a convolutional neural network then performs classification and position regression on the candidate boxes. Existing two-stage algorithms generally have low false detection and miss rates and good detection results, but require multiple rounds of detection and classification and are therefore slow; representatives include R-CNN, Fast R-CNN, Mask R-CNN and SPP-Net. Unlike the two-stage approach, a single-stage detector obtains detection boxes directly without generating candidate regions in advance, so single-stage algorithms are generally faster but less accurate; examples include SSD and the YOLO series.
Current target detection algorithms face the following problems when applied to unmanned aerial vehicle aerial images: 1) target scales vary greatly, placing high demands on the algorithm's feature fusion; 2) targets are small, densely distributed and set against complex backgrounds, and there is a contradiction between small target feature extraction and downsampling, which increases detection difficulty; 3) YOLO-based algorithm models have large parameter counts and complex computation.
Summary of the application:
The application aims to solve the problems of low target detection accuracy and high miss rate caused by the large number of targets and the high proportion of small targets in unmanned aerial vehicle aerial images. Given aerial images taken at different times and under different weather and illumination conditions, an algorithm network model is designed and trained with a deep neural network to perform target detection, thereby addressing small target detection under unmanned aerial vehicle aerial photography conditions, improving small target detection accuracy and reducing the small target miss rate.
In order to achieve the above objective, the present application provides an unmanned aerial vehicle aerial photography target detection model based on the YOLOv7 network. The method uses YOLOv7 as the backbone network, reduces the downsampling ratio, introduces a multi-head self-attention mechanism (MHSA) so that the model focuses more on target feature information, and introduces the Normalized Wasserstein Distance (NWD) loss function when calculating the regression loss to make up for shortcomings in small target detection. It comprises three parts: the first part preprocesses the data set, the second part constructs the improved YOLOv7 network, and the third part trains and tests the network and outputs the best detection result on the aerial data set.
The first part comprises three steps:
step 1: dividing a training set, a verification set and a test set by adopting an unmanned aerial vehicle aerial photography public data set VisDrone;
step 2: the obtained data set pictures are adjusted to 640×640 pixels, random flipping, scaling, color gamut conversion and other operations are performed on each training picture through Mosaic data enhancement, and four pictures are stitched together in a picture-splicing manner to obtain the final data set;
step 3: aiming at the data set obtained in the step 2, carrying out K-means++ clustering on the frames of the data set to obtain new anchor frame sizes, comparing the results with the originally set anchor frames, calculating the matching accuracy, and selecting the optimal anchor frame size setting;
the second part comprises three steps:
step 4: and establishing a lightweight trunk feature extraction network CSPDarknet-tiny. The downsampling multiplying power is reduced on the backbone network of the original YOLOv7, the downsampling is reduced from 32 times to 16 times, and the output characteristic diagrams comprise 160×160×256 characteristic diagrams map1, 80×80×512 characteristic diagrams map2 and 40×40×512 characteristic diagrams map3;
step 5: processing the feature map3 obtained in the step 4 by using SPPCSPC to obtain a feature map P1 of 40 multiplied by 256;
step 6: and establishing a feature fusion network. In the feature extraction network of the neck, a path fusion network of the YOLOv7 is reserved, different feature layers and detection layers are fused, the FPN up-sampling conveys semantic features, and the PAN down-sampling conveys positioning features, and the implementation is as follows:
(1) The P1 obtained in the step 5 is transmitted into a deep feature extraction module C3MS, wherein the C3MS introduces a multi-head attention mechanism MHSA on the basis of C3, so that the feature extraction capability of a network can be effectively enhanced, and a feature map P2 is obtained;
(2) The map1, map2 and map3 obtained in step 4 are fused through top-down and bottom-up paths, and the final feature maps P3, P4 and P5 are output;
the third part comprises four steps:
step 7: the feature maps P3, P4 and P5 output by step 6 are channel-adjusted by REPConv, and three 1×1 convolution layers predict the objectness, class and bbox parts; the detection heads finally used are head0 of 40×40×512, head1 of 80×80×256 and head2 of 160×160×128;
step 8: adjust the network structure hyperparameters and set the network model parameters: the number of training epochs is set to 200, Momentum momentum = 0.937, and the initial learning rate lr = 0.01;
step 9: training an aerial target detection model by using a training set to obtain a prediction result of a target in each sample, wherein the prediction result comprises a target prediction boundary frame and a center point position of the prediction boundary frame;
step 10: calculating total loss according to the sample prediction result and the label difference obtained in the step 9, and updating network model parameters based on the total loss to obtain a final training model;
when the regression Loss in the total Loss is calculated, NWD Loss is introduced, the IOU Loss and the NWD Loss are combined through a certain weight proportion, and the regression Loss function is as follows:
Loss_box = λ1 × (1.0 − IOU) + λ2 × (1.0 − NWD(N_a, N_b))
where λ1 and λ2 are both taken as 0.5, so that the IOU's shortcomings in small target detection are fully compensated, the original model's detection precision for large and medium targets is retained, and the model's detection capability for small targets is remarkably improved;
step 11: and (3) inputting the test set in the step (2) into the training model in the step (10) to obtain a test result of unmanned aerial vehicle small target detection.
The application is mainly characterized by the following improvements to the YOLOv7 network model:
(1) A lightweight backbone network CSPDarknet-tiny is constructed and the downsampling ratio is reduced from the original 32 times to 16 times, so that more semantic information and detail features are retained; the output feature maps change from the original 80×80×512, 40×40×1024 and 20×20×1024 to 160×160×256, 80×80×512 and 40×40×1024, the parameter count of the model is obviously reduced, the information loss caused by an excessive downsampling ratio for targets in unmanned aerial vehicle aerial images is effectively alleviated, and small target detection precision is improved;
(2) Because the background of an aerial image is complex and the targets to be detected are small, network extraction of target features is hindered: when the backbone network extracts feature information there is a large amount of interference from irrelevant information, which greatly affects the detection result. The application therefore introduces the multi-head attention mechanism MHSA, which can capture image target information over a large range, and combines C3 with multi-head attention to design the deep feature extraction module C3MS, effectively relieving the interference of irrelevant information in aerial images on feature extraction and enhancing the model's ability to extract small target feature information;
(3) Targets in aerial images have large scale differences and many of them are small, while the loss function of the initial YOLOv7 model uses CIOU; because the IOU is quite sensitive to positional differences of targets at different scales, the model performs less than ideally on aerial image data sets. The application introduces the NWD Loss to make up for the deficiency of the IOU Loss: the NWD is insensitive to target scale changes and facilitates similarity comparison between small targets, but using only the NWD Loss is unfavourable for detecting large and medium scale targets, so the application combines the NWD Loss and the IOU Loss with specific weights, which both compensates for the IOU's weakness on small targets and retains the superiority of the IOU for detecting large and medium scale targets.
Drawings
FIG. 1 is a general flow chart of the present application;
FIG. 2 is a diagram of the overall network framework of the present application;
FIG. 3 is a diagram of a feature fusion network framework of the present application;
FIG. 4 is a frame diagram of the MHSA structure of the present application;
FIG. 5 is a graph showing test set results output by the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments are described clearly and completely below; detailed descriptions of the prior art that might obscure the subject matter of the present application are omitted. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. An overall flow chart of an embodiment of the present application is shown in fig. 1, and the application is further described below with reference to the accompanying drawings.
Step 1: acquiring an unmanned aerial vehicle aerial image data set VisDrone, wherein the data set contains 10209 images (6471 training sets, 548 verification sets and 3190 test sets) and contains 10 types of targets (pedestrians, people, bicycles, automobiles, trucks, tricycles, awning tricycles, buses and automobiles), and the sizes of the training pictures are unified to 640 multiplied by 640;
step 2: perform Mosaic data enhancement on the data set: four pictures are randomly selected from the data set and each is subjected to a random flip (horizontal flip of the original picture), random scaling (size scaling of the original picture) and color gamut change (changes to brightness, saturation and hue); corresponding regions of the four pictures are then cropped and stitched, placed in the upper-left, lower-left, upper-right and lower-right areas respectively, to obtain pictures with richer backgrounds and more information, thereby enhancing the training samples;
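For illustration only, the Mosaic step above could be sketched roughly as follows in Python with OpenCV and NumPy; the function name mosaic_4, the jitter ranges and the quadrant layout are assumptions for this sketch, not the patent's actual implementation:

```python
import cv2
import numpy as np

def mosaic_4(images, out_size=640):
    """Minimal Mosaic sketch: flip / scale / colour-jitter four images and tile
    them into the four quadrants of one out_size x out_size training picture.
    (Box labels would need the same geometric transforms; omitted here.)"""
    assert len(images) == 4
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    slots = [(0, 0), (0, half), (half, 0), (half, half)]       # (y, x) of each quadrant
    for img, (y, x) in zip(images, slots):
        if np.random.rand() < 0.5:
            img = cv2.flip(img, 1)                             # random horizontal flip
        scale = np.random.uniform(0.5, 1.5)                    # random scaling
        img = cv2.resize(img, None, fx=scale, fy=scale)
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv[..., 1:] *= np.random.uniform(0.7, 1.3)            # jitter saturation / value
        img = cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)
        canvas[y:y + half, x:x + half] = cv2.resize(img, (half, half))
    return canvas
```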
step 3: for the data set obtained in step 2, the K-means++ clustering algorithm is used to cluster the widths and heights of all target bounding boxes in the training set to obtain a new anchor box combination; the prior box values are continuously updated through back propagation so that the data set is better fitted, finally yielding the optimal anchor box combinations [3,4,4,9,8,7], [8,14,16,9,14,18], [31,17,25,33,58,42];
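A rough sketch of this anchor clustering, assuming scikit-learn's KMeans for the k-means++ initialization (the helper names cluster_anchors and avg_iou are hypothetical and only illustrate how a clustered anchor set could be compared with the default one):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_anchors(wh, n_anchors=9):
    """Cluster ground-truth box (width, height) pairs with k-means++ to get
    candidate anchor sizes; wh is an (N, 2) array of box widths and heights."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0)
    km.fit(wh)
    anchors = km.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]    # sort small -> large

def avg_iou(wh, anchors):
    """Mean best IoU between boxes and anchors (boxes aligned at the origin),
    a simple score for comparing anchor sets."""
    inter = np.minimum(wh[:, None, :], anchors[None, :, :]).prod(axis=2)
    union = wh.prod(axis=1)[:, None] + anchors.prod(axis=1)[None, :] - inter
    return (inter / union).max(axis=1).mean()
```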
fig. 2 shows the improved network model based on the YOLOv7 model. First, a 640×640 picture is input; the trunk feature extraction network outputs three feature maps of different scales to the neck network; the neck network outputs feature maps of corresponding scales through the path fusion network; and the detection heads output prediction results through Rep and CBS. In this embodiment, the steps are as follows:
step 4: establish the lightweight trunk feature extraction network CSPDarknet-tiny, as shown in fig. 3. First, a 640×640×3 picture is input and a 2x-downsampled 320×320×64 feature map is output through three CBS convolution layers, after which a 4x-downsampled feature map is obtained through a CBS convolution with kernel size 3 and stride 2, where CBS consists of Conv (convolution) + BN (Batch Normalization) + SiLU (Sigmoid linear unit), as sketched below. The 4x-downsampled feature map is then processed by an ELAN module, which strengthens the learning capability of the network without destroying the original gradient path, and the ELAN module outputs the 160×160×256 feature map map1. map1 is downsampled further; YOLOv7 implements this with two branches: one branch performs spatial downsampling through MP (max pooling) in parallel with a 1×1 convolution channel compression, and the other branch downsamples through a 1×1 convolution channel compression followed by a 3×3 convolution with stride 2; the two branches are finally merged to obtain an 80×80×256 feature map, and the 80×80×512 feature map map2 and the 40×40×512 feature map map3 are obtained by continuing the ELAN and MP operations;
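The CBS unit referred to above (Conv + BN + SiLU) can be illustrated with a minimal PyTorch sketch; this is only an assumed, simplified rendering of the block, not the patent's code:

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the basic block the backbone is built from."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Example: a 640x640x3 input through a stem whose last CBS has stride 2
# gives a 2x-downsampled 320x320 map, matching the description above.
if __name__ == "__main__":
    x = torch.randn(1, 3, 640, 640)
    stem = nn.Sequential(CBS(3, 32, 3, 1), CBS(32, 64, 3, 2), CBS(64, 64, 3, 1))
    print(stem(x).shape)   # torch.Size([1, 64, 320, 320])
```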
step 5: the feature map map3 obtained in step 4 is processed by SPPCSPC. SPPCSPC continues the idea of SPP and enlarges the receptive field: different receptive fields are obtained through maximum pooling, and the feature map is split into two parts, one undergoing conventional convolution and the other the SPP operation, which improves precision while raising speed, yielding the 40×40×256 feature map P1 (a simplified sketch of this module is given below);
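As an illustration of the two-branch idea described in step 5, the following is a simplified, assumed sketch (PyTorch) of an SPPCSPC-like module with one SPP branch and one plain convolution branch; the real SPPCSPC in YOLOv7 contains additional convolutions:

```python
import torch
import torch.nn as nn

class SPPCSPCLite(nn.Module):
    """Simplified SPPCSPC sketch: one branch applies spatial pyramid pooling
    (parallel max-pools with different kernel sizes), the other is a plain
    1x1 convolution; the two branches are concatenated and fused."""
    def __init__(self, c_in, c_out, pool_sizes=(5, 9, 13)):
        super().__init__()
        c_mid = c_out // 2
        self.branch_spp = nn.Conv2d(c_in, c_mid, 1)
        self.branch_conv = nn.Conv2d(c_in, c_mid, 1)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes)
        self.fuse = nn.Conv2d(c_mid * (len(pool_sizes) + 1) + c_mid, c_out, 1)

    def forward(self, x):
        a = self.branch_spp(x)
        a = torch.cat([a] + [p(a) for p in self.pools], dim=1)  # pyramid pooling
        b = self.branch_conv(x)                                  # shortcut branch
        return self.fuse(torch.cat([a, b], dim=1))
```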
step 6: establish the feature fusion network. The application extends the path aggregation network PAFPN of YOLO, consisting of a top-down path and a bottom-up path, implemented as follows:
(1) The P1 obtained in step 5 is passed into the deep feature extraction module C3MS, where C3MS introduces the multi-head attention mechanism MHSA on the basis of C3, effectively enhancing the feature extraction capability of the network. The Multi-Head Self-Attention mechanism MHSA consists of multiple self-attention modules that capture global information in different subspaces and splice the obtained information into a new feature map; its structure is shown in fig. 4, and P2 is obtained (a minimal sketch of such an attention block is given after this list);
(2) The map1, map2 and map3 obtained in step 4 are fused through the top-down and bottom-up paths to obtain the final feature maps P3, P4 and P5;
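A minimal sketch of a multi-head self-attention block over a feature map, as referred to in step 6, using PyTorch's nn.MultiheadAttention; positional encodings and the surrounding C3MS structure are omitted, so this is an assumption-laden illustration rather than the exact module used in the patent:

```python
import torch
import torch.nn as nn

class MHSA2d(nn.Module):
    """Flatten HxW into a token sequence, apply multi-head self-attention to
    capture global context, and reshape back to a feature map."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)      # self-attention over all positions
        return out.transpose(1, 2).reshape(b, c, h, w) + x   # residual connection

# e.g. MHSA2d(256)(torch.randn(1, 256, 40, 40)) -> shape (1, 256, 40, 40)
```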
step 7: the feature maps P3, P4 and P5 output by step 6 are channel-adjusted by REPConv, and three 1×1 convolution layers predict the objectness, class and bbox parts; the detection heads finally used are head0 of 40×40×512, head1 of 80×80×256 and head2 of 160×160×128;
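For illustration, a detection head of the kind described in step 7 might be sketched as follows (PyTorch); the 3×3 convolution here merely stands in for REPConv, the class count of 10 matches the VisDrone categories, and the anchor count per cell is an assumption:

```python
import torch
import torch.nn as nn

class DetectHead(nn.Module):
    """Sketch of one detection head: a conv block adjusts channels (stand-in
    for REPConv), then separate 1x1 convolutions predict objectness, class
    scores and box offsets for each of n_anchors anchors per grid cell."""
    def __init__(self, c_in, n_classes=10, n_anchors=3):
        super().__init__()
        self.adjust = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_in), nn.SiLU())
        self.obj = nn.Conv2d(c_in, n_anchors, 1)               # objectness
        self.cls = nn.Conv2d(c_in, n_anchors * n_classes, 1)   # class scores
        self.box = nn.Conv2d(c_in, n_anchors * 4, 1)           # (cx, cy, w, h)

    def forward(self, x):
        x = self.adjust(x)
        return self.obj(x), self.cls(x), self.box(x)

# One head per scale, e.g. DetectHead(512) for the 40x40x512 map.
```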
step 8: adjust the network structure hyperparameters and set the network model parameters: the number of training epochs is set to 200, Momentum momentum = 0.937, and the initial learning rate lr = 0.01;
step 9: training an aerial target detection model by using a training set to obtain a prediction result of a target in each sample, wherein the prediction result comprises a target prediction boundary frame and a center point position of the prediction boundary frame;
step 10: calculating total loss according to the sample prediction result and the label difference obtained in the step 9, and updating network model parameters based on the total loss to obtain a final training model;
the total Loss function employed by the original YOLOv7 network in calculating the Loss includes a confidence Loss (Loss obj ) Regression Loss (Loss) box ) And Loss of classification (Loss cls ) The Loss function Loss is as follows:
Loss = λ1 × Loss_obj + λ2 × Loss_box + λ3 × Loss_cls
where λ1, λ2 and λ3 represent the weights of the different loss terms in the total loss function and are all set to 1 in the application; the target confidence loss and the classification loss use BCEWithLogitsLoss (binary cross entropy loss with logits), and the regression loss uses CIOU Loss;
When calculating the regression loss, a novel small target evaluation metric based on the Wasserstein distance, called the Normalized Wasserstein Distance (NWD), is introduced. It computes the similarity between targets through Gaussian distributions; because it measures distributional similarity, it can evaluate detected targets regardless of whether their boxes overlap, is insensitive to target scale, and is therefore better suited to measuring the similarity between small targets. The NWD formula is as follows:
NWD(N_a, N_b) = exp(−√(W₂²(N_a, N_b)) / C)
where C is a constant closely related to the data set, W₂²(N_a, N_b) is the squared 2nd-order Wasserstein distance between the two distributions, and N_a and N_b are the Gaussian distributions modeled from A = (cx_a, cy_a, w_a, h_a) and B = (cx_b, cy_b, w_b, h_b);
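A small sketch of the NWD computation for axis-aligned boxes, following the closed-form Wasserstein distance between the two Gaussians described above; the constant C = 12.8 is only a placeholder assumption, since the description states C is dataset-dependent without giving a value here:

```python
import torch

def nwd(box_a, box_b, C=12.8):
    """Normalized Wasserstein Distance between boxes given as (cx, cy, w, h)
    tensors. Each box is modeled as a 2-D Gaussian with mean (cx, cy) and
    covariance diag(w^2/4, h^2/4); the squared 2nd-order Wasserstein distance
    between two such Gaussians has the closed form below, and the exponential
    maps it to a (0, 1] similarity. C is a dataset-dependent constant."""
    cxa, cya, wa, ha = box_a.unbind(-1)
    cxb, cyb, wb, hb = box_b.unbind(-1)
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return torch.exp(-torch.sqrt(w2_sq.clamp(min=1e-7)) / C)
```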
However, completely replacing the IOU Loss with the NWD Loss does not improve the results: although the detection precision for small and tiny targets rises, the detection performance for large and medium targets falls. The application therefore retains the IOU Loss and combines the IOU Loss and the NWD Loss with a certain weight ratio; the complete regression loss function is as follows:
Loss_box = λ1 × (1.0 − IOU) + λ2 × (1.0 − NWD(N_a, N_b))
where λ1 and λ2 are both taken as 0.5 in the application; this fully compensates for the IOU's shortcomings in small target detection, retains the original model's detection precision for large and medium targets, and remarkably improves the model's detection capability for small targets;
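Combining the two terms as in the regression loss above is then a single weighted sum; in the sketch below, iou would come from any standard IoU/CIoU routine and nwd_sim from the nwd() sketch shown earlier (the function name is hypothetical):

```python
def box_regression_loss(iou, nwd_sim, lambda1=0.5, lambda2=0.5):
    """Weighted combination of the IoU term and the NWD term, with both
    weights taken as 0.5 as stated in the description; inputs are per-box
    tensors (or floats) computed elsewhere."""
    return lambda1 * (1.0 - iou) + lambda2 * (1.0 - nwd_sim)
```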
step 11: input the test set from step 2 into the trained model from step 10 to obtain the test results of unmanned aerial vehicle small target detection, as shown in fig. 5.
Aiming at the large target scale differences, the many small targets and the large parameter counts involved when deep learning is applied to unmanned aerial vehicle aerial images, the application provides a lightweight detection method with multi-scale feature fusion. First, the K-means++ clustering algorithm is used to obtain an optimal anchor box combination for the VisDrone unmanned aerial vehicle aerial image data set. In addition, on the backbone network of the original YOLOv7 model, the downsampling ratio is reduced from 32 times to 16 times, which reduces the target information lost by small targets during downsampling and retains more semantic information and target features; the parameter count is reduced from 71.1M to 17.8M, and experiments show that small target detection precision is improved. To address the interference of complex backgrounds in aerial images, the application introduces the multi-head attention mechanism MHSA and fuses it with C3 to design the deep feature extraction module C3MS, effectively relieving the interference of irrelevant information brought by complex backgrounds, helping the network focus on extracting small target feature information and improving small target detection precision. Furthermore, since the IOU Loss function adopted by the original model is very sensitive to positional differences of targets at different scales, the NWD Loss is introduced to make up for this deficiency, and the NWD Loss and the IOU Loss are combined with a certain weight ratio, which retains the superiority of the IOU for large and medium target detection while improving small target detection capability. On the prediction side, the original 20×20 detection head is abandoned and a 160×160 detection head is added, which benefits the detection of very small targets in unmanned aerial vehicle aerial images.
While the foregoing describes illustrative embodiments of the present application, the application is not limited to the scope of those embodiments; various changes apparent to those skilled in the art that fall within the spirit and scope of the application as defined by the appended claims are all intended to be protected.

Claims (4)

1. The unmanned aerial vehicle aerial photography dense small target detection method is characterized in that it is improved on the basis of YOLOv7 and combines the target characteristics of unmanned aerial vehicle aerial images: a lightweight trunk feature extraction network is established, the multi-head attention mechanism MHSA is fused in, and the NWD Loss is introduced; the method specifically comprises three parts: preprocessing the data set, constructing the improved YOLOv7 network, and training and testing the network:
the first part comprises three steps:
step 1: dividing a training set, a verification set and a test set by adopting an unmanned aerial vehicle aerial photography public data set VisDrone;
step 2: the obtained data set pictures are adjusted to 640×640 pixels, random flipping, scaling and color gamut transformation operations are performed on each training picture through Mosaic data enhancement, and four pictures are stitched together to obtain the final data set;
step 3: aiming at the data set obtained in the step 2, carrying out K-means++ clustering on the frames of the data set to obtain new anchor frame sizes, comparing the results with the originally set anchor frames, calculating the matching accuracy, and selecting the optimal anchor frame size setting;
the second part comprises three steps:
step 4: establish the lightweight trunk feature extraction network CSPDarknet-tiny. The downsampling ratio is reduced on the backbone network of the original YOLOv7, from 32 times to 16 times, and the output feature maps are the 160×160×256 feature map map1, the 80×80×512 feature map map2 and the 40×40×512 feature map map3;
step 5: the feature map map3 obtained in step 4 is processed by SPPCSPC to obtain the 40×40×256 feature map P1;
step 6: establish the feature fusion network. In the neck feature extraction network, the path fusion network of YOLOv7 is retained and different feature layers and detection layers are fused; FPN upsampling conveys semantic features and PAN downsampling conveys positioning features, implemented as follows:
the P1 obtained in the step 5 is transmitted into a deep feature extraction module C3MS, wherein the C3MS introduces a multi-head attention mechanism MHSA on the basis of C3, so that the feature extraction capability of a network can be effectively enhanced, and a feature map P2 is obtained;
the map1, map2 and map3 obtained in step 4 are fused through top-down and bottom-up paths, and the final feature maps P3, P4 and P5 are output;
the third part comprises four steps:
step 7: the feature maps P3, P4 and P5 output by step 6 are channel-adjusted by REPConv, and three 1×1 convolution layers predict the objectness, class and bbox parts; the detection heads finally used are head0 of 40×40×512, head1 of 80×80×256 and head2 of 160×160×128;
step 8: adjust the network structure hyperparameters and set the network model parameters: the number of training epochs is set to 200, Momentum momentum = 0.937, and the initial learning rate lr = 0.01;
step 9: training an aerial target detection model by using a training set to obtain a prediction result of a target in each sample, wherein the prediction result comprises a target prediction boundary frame and a center point position of the prediction boundary frame;
step 10: calculating total loss according to the sample prediction result and the label difference obtained in the step 9, and updating network model parameters based on the total loss to obtain a final training model;
when the regression Loss in the total Loss is calculated, the NWD Loss is introduced, the IOU Loss and the NWD Loss are combined through a certain weight proportion, and the regression Loss function is as follows:
Loss_box = λ1 × (1.0 − IOU) + λ2 × (1.0 − NWD(N_a, N_b))
where λ1 and λ2 are both taken as 0.5, so that the IOU's shortcomings in small target detection are fully compensated, the original model's detection precision for large and medium targets is retained, and the model's detection capability for small targets is remarkably improved;
step 11: input the test set from step 2 into the trained model from step 10 to obtain the test results of unmanned aerial vehicle small target detection.
2. The unmanned aerial vehicle aerial photography dense small target detection method according to claim 1, characterized in that in step 4 the lightweight trunk feature extraction network CSPDarknet-tiny is established and the downsampling ratio is reduced on the backbone network of the original YOLOv7, so that more feature information is retained and the parameter count is reduced while small target detection precision is improved.
3. The unmanned aerial vehicle aerial photography dense small target detection method according to claim 1, characterized in that in step 6 the deep feature extraction module C3MS, designed by introducing the multi-head attention mechanism MHSA on the basis of C3, is used to reduce the interference of complex background noise in aerial images, so that the backbone network focuses more on extracting small target feature information and ignores irrelevant information.
4. The unmanned aerial vehicle aerial photography dense small target detection method according to claim 1, characterized in that step 10 introduces the NWD Loss to compensate for the sensitivity of the IOU Loss to positional differences of targets at different scales, and the two are combined with a certain weight ratio, obviously improving small target detection precision.
CN202310235724.3A 2023-03-13 2023-03-13 Unmanned aerial vehicle aerial photography-oriented detection method for dense small targets Pending CN117197687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310235724.3A CN117197687A (en) 2023-03-13 2023-03-13 Unmanned aerial vehicle aerial photography-oriented detection method for dense small targets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310235724.3A CN117197687A (en) 2023-03-13 2023-03-13 Unmanned aerial vehicle aerial photography-oriented detection method for dense small targets

Publications (1)

Publication Number Publication Date
CN117197687A true CN117197687A (en) 2023-12-08

Family

ID=88993052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310235724.3A Pending CN117197687A (en) 2023-03-13 2023-03-13 Unmanned aerial vehicle aerial photography-oriented detection method for dense small targets

Country Status (1)

Country Link
CN (1) CN117197687A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117830813A (en) * 2024-01-21 2024-04-05 昆明理工大学 Small celestial body surface rock detection method
CN117830813B (en) * 2024-01-21 2024-06-11 昆明理工大学 Small celestial body surface rock detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination