CN116469020A - Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance - Google Patents

Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance

Info

Publication number
CN116469020A
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
detection method
gaussian
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310402925.8A
Other languages
Chinese (zh)
Inventor
Li Hongguang
Meng Lingjie
Yang Lichun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202310402925.8A
Publication of CN116469020A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Remote Sensing (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance, relating to the technical field of aerial image processing. The method combines low-layer and high-layer feature fusion with a scale-insensitive metric, and comprises the following steps: S1: establishing an unmanned aerial vehicle image target dataset and preprocessing the image data; S2: slicing the input image and splicing the slicing results; S3: enriching the receptive field of the feature map by fusing multi-scale pooling information; S4: introducing an NWD metric based on the Gaussian Wasserstein distance; S5: for unmanned aerial vehicle images containing small targets in the test set, performing target prediction with the trained, improved feature extraction network. The method improves small target detection precision, improves deep detection algorithms designed for conventional-scale targets, realizes effective detection of targets with limited pixels, and achieves higher accuracy and recall.

Description

Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
Technical Field
The invention relates to the technical field of aviation image processing, in particular to an unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distances.
Background
A limited-pixel target in an unmanned aerial vehicle image is a target that occupies only a few pixels in the image. Under long-distance imaging conditions, especially when a medium-to-high-altitude unmanned aerial vehicle observes the ground at a long-range oblique viewing angle, a ground target occupies very few pixels in the image. Effectively analyzing and processing unmanned aerial vehicle image data with a computer, identifying targets of different categories, and marking their positions is one of the basic problems in computer vision tasks. It is widely applied in fields such as the military, agriculture and forestry, maritime affairs, disaster prevention and relief, and city planning, which place ever higher demands on target detection in unmanned aerial vehicle images.
Detecting small targets against a complex background is an important research direction in the field of image analysis and processing. Compared with images of natural scenes, unmanned aerial vehicle images have high background complexity, small target sizes, and weak features because of the long imaging distance. Because the imaging environment is complex and highly variable (weather, platform speed, altitude, and stability), the images also suffer from low resolution, low color saturation, and environmental noise distortion, all of which increase the difficulty of target detection.
Existing target detection algorithms fall into two main classes: algorithms based on traditional image processing and algorithms based on deep learning. Target detection methods based on traditional image processing are mostly applied to infrared dim small target detection; they introduce a visual attention mechanism and exploit the differences between the target, the background, and noise to selectively find a target region of interest. However, hand-designed features lack representativeness, are easily disturbed by complex backgrounds, and cannot be directly applied to unmanned aerial vehicle image target detection tasks. Target detection algorithms based on deep neural networks perform well on conventional datasets but have lower detection precision for small targets: a convolutional neural network is generally built from stacked convolution and pooling layers, so as the network deepens, the feature map size gradually shrinks and the information of the target to be detected is further reduced, making it difficult to detect.
Therefore, it is necessary to provide an unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance to solve the above problems.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance, which improves small target detection precision, improves deep detection algorithms designed for conventional-scale targets, realizes effective detection of targets with limited pixels, and achieves higher accuracy and recall.
In order to achieve the above purpose, the invention provides an unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance, comprising the following steps:
S1: establishing an unmanned aerial vehicle image target dataset and preprocessing the image data;
S2: slicing the input image and splicing the slicing results;
S3: enriching the receptive field of the feature map by fusing multi-scale pooling information;
S4: introducing an NWD metric based on the Gaussian Wasserstein distance;
S5: for unmanned aerial vehicle images containing small targets in the test set, performing target prediction with the trained, improved feature extraction network.
Preferably, in step S1, the original images are cut into uniform 800×800-pixel crops, the target categories are determined according to the frequency and size with which targets appear in the images, images are selected according to the proportion of the target in the image, samples of X categories are taken as the training set, and samples of the remaining categories are taken as the test set.
Preferably, in step S2, the slicing operation sets up a Focus structure that performs downsampling, splitting the high-resolution image into several low-resolution images while retaining the feature information of small targets.
Preferably, in step S3, an SPP module is introduced before the last convolution layer of the backbone network to fuse feature information of different scales.
Preferably, in step S4, the NWD metric is designed by modeling the bounding box as a two-dimensional Gaussian distribution; for a horizontal bounding box, its inscribed ellipse equation is expressed as:

$$\frac{(x-\mu_x)^2}{\sigma_x^2}+\frac{(y-\mu_y)^2}{\sigma_y^2}=1$$

where $(\mu_x,\mu_y)$ is the center coordinate of the ellipse, and $\sigma_x$ and $\sigma_y$ represent the semi-axis lengths along the x and y axes respectively, with $\mu_x=c_x$, $\mu_y=c_y$, $\sigma_x=w/2$, $\sigma_y=h/2$.
Preferably, in step S4, the probability density function of the two-dimensional Gaussian distribution is expressed as:

$$f(\mathbf{x}\mid\boldsymbol{\mu},\Sigma)=\frac{\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)}{2\pi\lvert\Sigma\rvert^{1/2}}$$

where $\mathbf{x}$, $\boldsymbol{\mu}$ and $\Sigma$ represent the coordinates, mean vector and covariance matrix of the Gaussian distribution, respectively.
Preferably, in step S4, when $(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})=1$, the horizontal bounding box $R=(c_x,c_y,w,h)$ is modeled as a two-dimensional Gaussian distribution $N(\boldsymbol{\mu},\Sigma)$, where:

$$\boldsymbol{\mu}=\begin{bmatrix}c_x\\ c_y\end{bmatrix},\qquad \Sigma=\begin{bmatrix}\frac{w^2}{4}&0\\ 0&\frac{h^2}{4}\end{bmatrix}$$

The similarity between two bounding boxes is thereby converted into a distance between two Gaussian distributions. For two-dimensional Gaussian distributions $\mu_1=N(m_1,\Sigma_1)$ and $\mu_2=N(m_2,\Sigma_2)$, the second-order Wasserstein distance between $\mu_1$ and $\mu_2$ is abbreviated as:

$$W_2^2(\mu_1,\mu_2)=\lVert m_1-m_2\rVert_2^2+\left\lVert \Sigma_1^{1/2}-\Sigma_2^{1/2}\right\rVert_F^2$$

where $\lVert\cdot\rVert_F$ represents the Frobenius norm;

for the Gaussian distributions $N_a$ and $N_b$ modeled from bounding boxes $A=(cx_a,cy_a,w_a,h_a)$ and $B=(cx_b,cy_b,w_b,h_b)$, this further simplifies to:

$$W_2^2(N_a,N_b)=\left\lVert\left[cx_a,\ cy_a,\ \tfrac{w_a}{2},\ \tfrac{h_a}{2}\right]^{\mathsf T}-\left[cx_b,\ cy_b,\ \tfrac{w_b}{2},\ \tfrac{h_b}{2}\right]^{\mathsf T}\right\rVert_2^2$$

Its exponential form is used for normalization as the similarity measure between the two bounding boxes:

$$NWD(N_a,N_b)=\exp\!\left(-\frac{\sqrt{W_2^2(N_a,N_b)}}{C}\right)$$

where C is the average absolute size of the targets in the dataset; as the target size decreases, the drop in an IoU-based index caused by the same position offset grows, whereas the NWD metric remains insensitive to scale.
Preferably, in step S4, the loss function is a weighted combination of the target confidence loss, the classification loss and the bounding box regression loss, where the target confidence loss and the classification loss use binary cross entropy, and the bounding box regression loss is expressed as a normalized weighted sum of the CIoU loss and the NWD loss between the predicted bounding box and the ground-truth bounding box. The loss function is expressed as:

$$Loss=\lambda_1 L_{cls}+\lambda_2 L_{obj}+\lambda_3\left[\alpha L_{CIoU}+(1-\alpha)L_{NWD}\right]$$

$$L_{NWD}=1-NWD(N_p,N_g)$$

where $NWD(N_p,N_g)$ represents the exponentially normalized Wasserstein distance between the predicted box and the ground-truth box.
Preferably, in step S5, algorithm performance is evaluated using AP50, AP75 and mAP as the model evaluation indexes, the effect of the improved feature extraction network is tested on the test dataset, and the influence of the introduced NWD metric on model performance is analyzed.
Therefore, the unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance has the following beneficial effects:
(1) The invention uses a multi-scale feature extraction module that adopts a bidirectional feature pyramid network (BiFPN) to fuse low-level and high-level features bidirectionally in the Neck network, enriching the expression of limited-pixel target feature information.
(2) The invention improves the recall rate of detection by fusing the space-time information of multi-frame images.
(3) The invention ensures that the detection result has reliability by extracting and combining various image visual characteristics.
(4) The invention adopts a scale-insensitive normalized Gaussian Wasserstein distance metric in the non-maximum suppression stage and the bounding box regression loss to evaluate the similarity between the predicted box and the ground-truth box, improving small target detection precision.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of an unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distances;
FIG. 2 is a schematic diagram of a Focus architecture employed in the present invention;
FIG. 3 is a block diagram of an SPP module employed in the present invention;
FIG. 4 is a schematic diagram of position offset curves under the NWD metric based on the Gaussian Wasserstein distance employed in the present invention.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
The invention adopts the unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance, combining low-layer and high-layer feature fusion with a scale-insensitive metric, and comprises the following steps: S1: establishing an unmanned aerial vehicle image target dataset and preprocessing the image data; S2: slicing the input image and splicing the slicing results; S3: enriching the receptive field of the feature map by fusing multi-scale pooling information; S4: introducing an NWD metric based on the Gaussian Wasserstein distance; S5: for unmanned aerial vehicle images containing small targets in the test set, performing target prediction with the trained, improved feature extraction network.
In step S1, the original images are cut into uniform 800×800-pixel crops, the target categories are determined according to the frequency and size with which targets appear in the images, images are selected according to the proportion of the target in the image, samples of X categories are taken as the training set, and samples of the remaining categories are taken as the test set.
The AI-TOD dataset, assembled from several large-scale public aerial remote sensing image datasets such as DIOR, DOTA, xView and VisDrone, is used as the basis for the unmanned aerial vehicle limited-pixel small-target dataset; the target categories are determined mainly as airplanes, ships, vehicles, people, etc., according to the frequency and size with which the targets appear in the images.
The original images are cut, with overlap, into uniform 800×800-pixel crops, and images whose targets are no larger than 64 pixels are selected according to the proportion of the target in the image. The dataset contains 28,036 images and 700,621 target instances; the average target size is 12.8 pixels with a variance of 5.9 pixels, far smaller than in other remote sensing datasets. Samples of X categories are taken as the training set and samples of the remaining categories as the test set.
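For illustration, a minimal sketch of this overlapped cropping step is given below; the 100-pixel overlap and the border handling are assumptions, since the text only states that crops are taken in an overlapping manner:

```python
import numpy as np

def tile_image(img: np.ndarray, tile: int = 800, overlap: int = 100):
    """Cut an H x W x C image into overlapping tile x tile crops.
    The 100 px overlap is an assumption; real pipelines usually pad or
    shift the last row/column of tiles so the image borders are covered."""
    h, w = img.shape[:2]
    stride = tile - overlap
    crops = []
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            crops.append(((x, y), img[y:y + tile, x:x + tile]))
    return crops
```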
In step S2, the slicing operation sets up a Focus structure that performs downsampling, splitting the high-resolution image into several low-resolution images while retaining the feature information of small targets. Focus is a special downsampling method; the specific processing is shown in FIG. 2: values are taken at intervals of one pixel and recombined into low-resolution images, so the number of channels becomes 4 times the original. By splitting the high-resolution image into several low-resolution images, the width and height information is concentrated uniformly into the channel dimension, which reduces the computation, avoids the information loss caused by downsampling, retains more feature information of small targets, and improves network training and inference speed.
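As a minimal, illustrative sketch only (not the patent's verbatim implementation), such a Focus slicing structure could be written in PyTorch as follows; the convolution kernel size, normalization and activation are assumptions:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice the input into 4 pixel-interleaved sub-images, concatenate
    them on the channel dimension (C -> 4C, H x W -> H/2 x W/2), then
    fuse with a convolution."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch * 4, out_ch, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # take every other pixel in both spatial dimensions -> 4 slices
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)
```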
In step S3, an SPP module is introduced before the last convolution layer of the backbone network to fuse feature information of different scales. The SPP module structure is shown in FIG. 3: the input features first pass through a 1×1 convolution layer, then through three max-pooling windows of different scales (5×5, 7×7 and 13×13) in parallel; the pooled features of the three scales are concatenated with the input features and passed through another 1×1 convolution layer to finally obtain a fixed-size feature vector. The SPP layer enriches the receptive field of the feature map by fusing multi-scale pooling information, enhancing its feature expression capability.
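A minimal PyTorch sketch of such an SPP module is shown below, using the 5×5, 7×7 and 13×13 pooling windows stated above; the channel widths are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: a 1x1 conv, three parallel max-pool
    branches, concatenation with the input branch, then a final 1x1 conv."""
    def __init__(self, in_ch: int, out_ch: int, kernels=(5, 7, 13)):
        super().__init__()
        hidden = in_ch // 2
        self.cv1 = nn.Conv2d(in_ch, hidden, 1, 1)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels
        )
        self.cv2 = nn.Conv2d(hidden * (len(kernels) + 1), out_ch, 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        # stride-1 pooling with same-padding keeps spatial size constant
        return self.cv2(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```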
In step S4, ioU, which represents the degree of overlap between the prediction frame and the real frame, is widely used in the target detection frame based on the anchor frame, for example, the Non-maximum suppression (Non-MaximumSuppression, NMS) stage filters the prediction frame with higher overlap rate by using IoU index, and replaces L2 loss with index based on IoU in the loss function as the regression loss of the boundary frame, but the evaluation index based on IoU is very sensitive to small target position offset, and small position offset can cause IoU to drop rapidly, thereby affecting the performance of the detector based on the anchor frame. The similarity between the two bounding boxes is calculated using a normalized gaussian wasperstein distance. The NWD metric design process is to model the bounding box as a two-dimensional gaussian distribution, and for a horizontal bounding box, its inscribed ellipse equation is expressed as:
wherein (mu) xy ) Is the center coordinate of ellipse, sigma x Sum sigma y Respectively represent the half-axis length along the x and y axes, mu x =c x ,μ y =c y ,σ x =w/2,σ y =h/2。
In step S4, the probability density function of the two-dimensional Gaussian distribution is expressed as:

$$f(\mathbf{x}\mid\boldsymbol{\mu},\Sigma)=\frac{\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)}{2\pi\lvert\Sigma\rvert^{1/2}}$$

where $\mathbf{x}$, $\boldsymbol{\mu}$ and $\Sigma$ represent the coordinates, mean vector and covariance matrix of the Gaussian distribution, respectively.
In step S4, when $(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})=1$, the horizontal bounding box $R=(c_x,c_y,w,h)$ can be modeled as a two-dimensional Gaussian distribution $N(\boldsymbol{\mu},\Sigma)$, where:

$$\boldsymbol{\mu}=\begin{bmatrix}c_x\\ c_y\end{bmatrix},\qquad \Sigma=\begin{bmatrix}\frac{w^2}{4}&0\\ 0&\frac{h^2}{4}\end{bmatrix}$$

The similarity between two bounding boxes is thereby converted into a distance between two Gaussian distributions. For two-dimensional Gaussian distributions $\mu_1=N(m_1,\Sigma_1)$ and $\mu_2=N(m_2,\Sigma_2)$, the second-order Wasserstein distance between $\mu_1$ and $\mu_2$ is abbreviated as:

$$W_2^2(\mu_1,\mu_2)=\lVert m_1-m_2\rVert_2^2+\left\lVert \Sigma_1^{1/2}-\Sigma_2^{1/2}\right\rVert_F^2$$

where $\lVert\cdot\rVert_F$ denotes the Frobenius norm.

For the Gaussian distributions $N_a$ and $N_b$ modeled from bounding boxes $A=(cx_a,cy_a,w_a,h_a)$ and $B=(cx_b,cy_b,w_b,h_b)$, this further simplifies to:

$$W_2^2(N_a,N_b)=\left\lVert\left[cx_a,\ cy_a,\ \tfrac{w_a}{2},\ \tfrac{h_a}{2}\right]^{\mathsf T}-\left[cx_b,\ cy_b,\ \tfrac{w_b}{2},\ \tfrac{h_b}{2}\right]^{\mathsf T}\right\rVert_2^2$$

Its exponential form is used for normalization as the similarity measure between the two bounding boxes:

$$NWD(N_a,N_b)=\exp\!\left(-\frac{\sqrt{W_2^2(N_a,N_b)}}{C}\right)$$

where C is the average absolute size of the targets in the dataset; as the target size decreases, the drop in an IoU-based index caused by the same position offset grows. As shown in FIG. 4, the four curves corresponding to NWD coincide exactly, showing insensitivity to box scale variation; the NWD curve is also smoother and less sensitive to offset, and even when bounding box A contains bounding box B or the two boxes do not intersect at all, the NWD index can still reflect their similarity, giving stronger robustness.
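Putting the derivation together, a minimal sketch of the NWD computation for two boxes in (cx, cy, w, h) form might look as follows; defaulting C to 12.8, the dataset's average target size quoted earlier, is an assumption:

```python
import math

def nwd(box_a, box_b, C: float = 12.8) -> float:
    """Normalized Gaussian Wasserstein distance between two horizontal
    boxes in (cx, cy, w, h) form, following the formulas above."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # squared second-order Wasserstein distance between the two Gaussians
    w2 = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
          + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2) / C)
```

For example, nwd((10, 10, 8, 8), (11, 10, 8, 8)) stays close to 1 for a one-pixel offset of an 8-pixel box, while IoU would already drop noticeably.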
In step S4, the loss function is a weighted combination of the target confidence loss, the classification loss and the bounding box regression loss, where the target confidence loss and the classification loss use binary cross entropy, and the bounding box regression loss is expressed as a normalized weighted sum of the CIoU loss and the NWD loss between the predicted bounding box and the ground-truth bounding box. The loss function is expressed as:

$$Loss=\lambda_1 L_{cls}+\lambda_2 L_{obj}+\lambda_3\left[\alpha L_{CIoU}+(1-\alpha)L_{NWD}\right]$$

$$L_{NWD}=1-NWD(N_p,N_g)$$

where $NWD(N_p,N_g)$ represents the exponentially normalized Wasserstein distance between the predicted box and the ground-truth box.
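A minimal sketch of how these terms could be combined is given below; the lambda weights are illustrative placeholders, and alpha = 0.65 only mirrors the NWD loss weight of 0.35 reported in the experiments later:

```python
def detection_loss(l_cls, l_obj, l_ciou, l_nwd,
                   lambdas=(0.5, 1.0, 0.05), alpha=0.65):
    """Total loss per the formula above: weighted confidence,
    classification and box-regression terms, the last mixing the CIoU
    and NWD losses, with l_nwd = 1 - NWD(N_p, N_g)."""
    l1, l2, l3 = lambdas
    return l1 * l_cls + l2 * l_obj + l3 * (alpha * l_ciou + (1.0 - alpha) * l_nwd)
```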
In step S5, algorithm performance is evaluated using AP50, AP75 and mAP as the model evaluation indexes, the effect of the improved feature extraction network is tested on the test dataset, and the influence of the introduced NWD metric on model performance is analyzed.
Algorithm performance is evaluated using AP50, AP75 and mAP as the model evaluation indexes. The average precision (AP) for each category is the area under its P-R curve, and mAP is the mean of the average precision over all categories. In the COCO dataset, mAP denotes the index obtained by computing AP at ten IoU thresholds from 0.5 to 0.95 in steps of 0.05 and averaging them, while AP50 and AP75 denote the average precision of each class computed with IoU thresholds of 0.5 and 0.75, respectively.
The algorithm is implemented on the deep learning framework PyTorch, with hardware configured as CPU: Intel Xeon, 24 cores, 1.9 GHz, 64 GB RAM; GPU: GeForce RTX 3080 Ti. Parameters are initialized with the official YOLOv5 pre-trained model and fine-tuned on the remote sensing image target detection dataset. The initial learning rate is set to 0.01, a Warmup strategy is adopted before training, and the learning rate is dynamically decayed with a cosine annealing schedule. Each model is trained for 1000 epochs; to prevent overfitting, training is stopped early when the index on the validation set has not improved for 100 epochs. The batch_size is set to 128 for training and 1 for testing.
A multi-scale training mode is adopted, and the K-means algorithm automatically clusters the ground-truth bounding box labels of the dataset to generate new optimal anchor box sizes, adapting to targets of different scales in different datasets.
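A minimal sketch of such anchor clustering on ground-truth (w, h) labels is given below; it uses plain Euclidean k-means, whereas YOLO-style pipelines often use an IoU-based distance, so that choice is an assumption:

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100) -> np.ndarray:
    """Cluster ground-truth box (w, h) pairs into k anchor sizes."""
    wh = wh.astype(np.float64)
    rng = np.random.default_rng(0)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each box to the nearest anchor center
        d = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = wh[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # sort anchors by area
```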
The effect of the improved feature extraction network is tested on the test dataset. A BiFPN structure connects the multi-layer feature maps bidirectionally and fuses them with dynamic weights according to feature importance, improving the feature expression capability of the network, and an additional detection head is added for small targets. The experimental results are shown in Table 1: on the AI-TOD dataset, the mAP value increased by 0.3%, APm by 2.2% and AP75 by 1.0%.
Table 1: Performance comparison of the improved network structure
The influence of the introduced NWD metric on model performance is analyzed. Replacing IoU with the scale-insensitive NWD metric in the NMS stage effectively avoids the growth of redundant detection boxes that are kept because their IoU with the highest-scoring predicted box falls below the threshold, which would otherwise inflate the false positive rate. For the bounding box regression loss function, introducing the NWD loss helps alleviate the sensitivity of the CIoU loss to small target position deviations, so the network can learn and optimize better for small targets. The experimental results are shown in Table 2.
Table 2: Influence of the NWD metric on detection performance
Introducing the NWD metric at the NMS stage yields an mAP of 16.2%, 1.2% higher than with the IoU metric used by YOLOv5. The NWD loss complements the CIoU loss in a normalized weighted manner; when the NWD loss weight is set to 0.35, mAP increases by 1.6% compared with the CIoU loss alone. The experimental results show that introducing the NWD metric in both the NMS stage and the bounding box regression loss yields a definite improvement in small target detection performance.
Performance is compared with other classical and advanced target detection methods on the AI-TOD dataset, using the officially provided COCO API interface adapted for AI-TOD to ensure the objectivity and credibility of the model performance comparison; the comparison results are shown in Table 3.
Table 3(1): Performance comparison of different algorithms on the AI-TOD dataset
Table 3(2): Performance comparison of different algorithms on the AI-TOD dataset (continued)
The multi-class mean average precision (mAP) of the proposed method reaches 17.8%, with AP50 and AP75 of 41.4% and 12.4%, respectively. Compared with the baseline YOLOv5, mAP improves by 3.0%, AP50 by 4.6% and AP75 by 3.3%, and all three indexes are higher than those of classical anchor-based and anchor-free target detection algorithms. Compared with classical multi-stage target detection methods such as Faster R-CNN and Cascade R-CNN, single-stage methods such as YOLOv3, SSD and RetinaNet have lower mAP values and poorer detection performance on small targets. The anchor-free detector CenterNet avoids the poor robustness of discrete-size anchor boxes to multi-scale targets, and the multi-center-point anchor-free detector further improves extremely-small-target detection through its multi-center-point and offset-target design, achieving the highest APvt index of 6.1%; the method disclosed herein, however, accounts for target instances of all scales across the whole dataset and shows outstanding performance advantages overall. Compared with the advanced DetectoRS algorithm, the APt index improves by 4.9% and APvt by 3.4%, a clear performance gain in detecting very small targets. The comparison experiments show that the proposed method performs better than some current methods on the remote sensing image small-target detection task, proving its effectiveness.
Therefore, the unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance combines low-layer and high-layer feature fusion with a scale-insensitive metric to improve the accuracy of detecting limited-pixel small targets in unmanned aerial vehicle images.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the invention and not for limiting it. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the invention.

Claims (9)

1. An unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance, characterized by comprising the following steps:
S1: establishing an unmanned aerial vehicle image target dataset and preprocessing the image data;
S2: slicing the input image and splicing the slicing results;
S3: enriching the receptive field of the feature map by fusing multi-scale pooling information;
S4: introducing an NWD metric based on the Gaussian Wasserstein distance;
S5: for unmanned aerial vehicle images containing small targets in the test set, performing target prediction with the trained, improved feature extraction network.
2. The unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance according to claim 1, characterized in that: in step S1, the original images are cut into uniform 800×800-pixel crops, the target categories are determined according to the frequency and size with which targets appear in the images, images are selected according to the proportion of the target in the image, samples of X categories are taken as the training set, and samples of the remaining categories are taken as the test set.
3. The unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance according to claim 1, characterized in that: in step S2, the slicing operation sets up a Focus structure that performs downsampling, splitting the high-resolution image into several low-resolution images while retaining the feature information of small targets.
4. The unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance according to claim 1, characterized in that: in step S3, an SPP module is introduced before the last convolution layer of the backbone network to fuse feature information of different scales.
5. The unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance according to claim 1, characterized in that: in step S4, the NWD metric design procedure is:
modeling the bounding box as a two-dimensional Gaussian distribution; for a horizontal bounding box, its inscribed ellipse equation is expressed as:

$$\frac{(x-\mu_x)^2}{\sigma_x^2}+\frac{(y-\mu_y)^2}{\sigma_y^2}=1$$

where $(\mu_x,\mu_y)$ is the center coordinate of the ellipse, and $\sigma_x$ and $\sigma_y$ represent the semi-axis lengths along the x and y axes respectively, with $\mu_x=c_x$, $\mu_y=c_y$, $\sigma_x=w/2$, $\sigma_y=h/2$.
6. The unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance according to claim 5, characterized in that: in step S4, the probability density function of the two-dimensional Gaussian distribution is expressed as:

$$f(\mathbf{x}\mid\boldsymbol{\mu},\Sigma)=\frac{\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)}{2\pi\lvert\Sigma\rvert^{1/2}}$$

where $\mathbf{x}$, $\boldsymbol{\mu}$ and $\Sigma$ represent the coordinates, mean vector and covariance matrix of the Gaussian distribution, respectively.
7. The unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance according to claim 6, characterized in that: in step S4, when $(\mathbf{x}-\boldsymbol{\mu})^{\mathsf T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})=1$, the horizontal bounding box $R=(c_x,c_y,w,h)$ is modeled as a two-dimensional Gaussian distribution $N(\boldsymbol{\mu},\Sigma)$, where:

$$\boldsymbol{\mu}=\begin{bmatrix}c_x\\ c_y\end{bmatrix},\qquad \Sigma=\begin{bmatrix}\frac{w^2}{4}&0\\ 0&\frac{h^2}{4}\end{bmatrix}$$

the similarity between two bounding boxes is converted into a distance between two Gaussian distributions; for two-dimensional Gaussian distributions $\mu_1=N(m_1,\Sigma_1)$ and $\mu_2=N(m_2,\Sigma_2)$, the second-order Wasserstein distance between $\mu_1$ and $\mu_2$ is abbreviated as:

$$W_2^2(\mu_1,\mu_2)=\lVert m_1-m_2\rVert_2^2+\left\lVert \Sigma_1^{1/2}-\Sigma_2^{1/2}\right\rVert_F^2$$

where $\lVert\cdot\rVert_F$ represents the Frobenius norm;

for the Gaussian distributions $N_a$ and $N_b$ modeled from bounding boxes $A=(cx_a,cy_a,w_a,h_a)$ and $B=(cx_b,cy_b,w_b,h_b)$, this further simplifies to:

$$W_2^2(N_a,N_b)=\left\lVert\left[cx_a,\ cy_a,\ \tfrac{w_a}{2},\ \tfrac{h_a}{2}\right]^{\mathsf T}-\left[cx_b,\ cy_b,\ \tfrac{w_b}{2},\ \tfrac{h_b}{2}\right]^{\mathsf T}\right\rVert_2^2$$

its exponential form is used for normalization as the similarity measure between the two bounding boxes:

$$NWD(N_a,N_b)=\exp\!\left(-\frac{\sqrt{W_2^2(N_a,N_b)}}{C}\right)$$

where C is the average absolute size of the targets in the dataset.
8. The unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance according to claim 7, characterized in that: in step S4, the loss function is a weighted combination of the target confidence loss, the classification loss and the bounding box regression loss, where the target confidence loss and the classification loss use binary cross entropy, and the bounding box regression loss is expressed as a normalized weighted sum of the CIoU loss and the NWD loss between the predicted bounding box and the ground-truth bounding box; the loss function is expressed as:

$$Loss=\lambda_1 L_{cls}+\lambda_2 L_{obj}+\lambda_3\left[\alpha L_{CIoU}+(1-\alpha)L_{NWD}\right]$$

$$L_{NWD}=1-NWD(N_p,N_g)$$

where $NWD(N_p,N_g)$ represents the exponentially normalized Wasserstein distance between the predicted box and the ground-truth box.
9. The unmanned aerial vehicle image target detection method based on multi-scale and Gaussian Wasserstein distance according to claim 1, characterized in that: in step S5, algorithm performance is evaluated using AP50, AP75 and mAP as the model evaluation indexes, the effect of the improved feature extraction network is tested on the test dataset, and the influence of the introduced NWD metric on model performance is analyzed.
CN202310402925.8A 2023-04-17 2023-04-17 Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance Pending CN116469020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310402925.8A CN116469020A (en) 2023-04-17 2023-04-17 Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310402925.8A CN116469020A (en) 2023-04-17 2023-04-17 Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance

Publications (1)

Publication Number Publication Date
CN116469020A true CN116469020A (en) 2023-07-21

Family

ID=87183772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310402925.8A Pending CN116469020A (en) 2023-04-17 2023-04-17 Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance

Country Status (1)

Country Link
CN (1) CN116469020A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775622A (en) * 2023-08-24 2023-09-19 中建五局第三建设有限公司 Method, device, equipment and storage medium for generating structural data
CN116775622B (en) * 2023-08-24 2023-11-07 中建五局第三建设有限公司 Method, device, equipment and storage medium for generating structural data
CN117333512A (en) * 2023-10-17 2024-01-02 大连理工大学 Aerial small target tracking method based on detection frame tracking

Similar Documents

Publication Publication Date Title
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN111723748A (en) Infrared remote sensing image ship detection method
CN110263712B (en) Coarse and fine pedestrian detection method based on region candidates
CN108108657A (en) A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning
CN110059586B (en) Iris positioning and segmenting system based on cavity residual error attention structure
CN112836713A (en) Image anchor-frame-free detection-based mesoscale convection system identification and tracking method
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
CN109919223B (en) Target detection method and device based on deep neural network
CN114359851A (en) Unmanned target detection method, device, equipment and medium
CN106408030A (en) SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN112580480B (en) Hyperspectral remote sensing image classification method and device
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN114255403A (en) Optical remote sensing image data processing method and system based on deep learning
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN114049572A (en) Detection method for identifying small target
CN115187786A (en) Rotation-based CenterNet2 target detection method
CN116416503A (en) Small sample target detection method, system and medium based on multi-mode fusion
CN116091946A (en) Yolov 5-based unmanned aerial vehicle aerial image target detection method
Khoshboresh-Masouleh et al. Robust building footprint extraction from big multi-sensor data using deep competition network
CN109284752A (en) A kind of rapid detection method of vehicle
CN117218545A (en) LBP feature and improved Yolov 5-based radar image detection method
CN110334703B (en) Ship detection and identification method in day and night image
CN111582057B (en) Face verification method based on local receptive field
Yin et al. M2F2-RCNN: Multi-functional faster RCNN based on multi-scale feature fusion for region search in remote sensing images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination