CN112287899A

CN112287899A - Unmanned aerial vehicle aerial image river drain detection method and system based on YOLO V5

Info

Publication number: CN112287899A
Application number: CN202011350152.6A
Authority: CN
Inventors: 谷永辉; 刘昌军
Original assignee: Shandong Jiexun Communication Technology Co ltd
Current assignee: Shandong Jiexun Communication Technology Co ltd
Priority date: 2020-11-26
Filing date: 2020-11-26
Publication date: 2021-01-29

Abstract

The disclosure provides an unmanned aerial vehicle aerial image river sewage outlet detection method and system based on YOLO V5, comprising: establishing an unmanned aerial vehicle aerial photography river sewage outlet image data set; preprocessing the established data set, and establishing a training set and a test set; and constructing a YOLO V5 model, testing the performance of the trained YOLO V5 model by using the test set, and then using the model to identify sewage outlets in unmanned aerial vehicle aerial images. This solves the problem that traditional target detection methods have low detection precision for small targets such as sewage outlets in unmanned aerial vehicle aerial images. The method assists manual inspection of river sewage outlets in aerial images and improves the detection precision of sewage outlets.

Description

Unmanned aerial vehicle aerial image river drain detection method and system based on YOLO V5

Technical Field

The disclosure belongs to the field of unmanned aerial vehicle aerial image target detection, and particularly relates to a method and a system for detecting river sewage outlets of unmanned aerial vehicle aerial images based on YOLO V5.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Domestic sewage treatment plants, industrial sewage treatment plants and mixed sewage treatment plants are important sources of modern sewage outlets. Domestic sewage and industrial wastewater are major causes of river pollution, and the sewage outlets entering rivers are large in number, widely distributed, fast-changing and well concealed. With traditional methods, sewage outlets are difficult to find, the cost is high, many outlets are missed, and the survey period is long. An unmanned aerial vehicle is cheap to build and operate, highly maneuverable, easy to take off and land, low in safety risk and capable of high-resolution image acquisition, so it can be used for image acquisition and investigation of river sewage outlets.

The inventors found that the identification of river sewage outlets in unmanned aerial vehicle aerial images currently relies mainly on expert experience: professionally trained image interpreters mark the images and determine the position of each sewage outlet from the image information. However, compared with an ordinary image, an unmanned aerial vehicle aerial image has a resolution as high as 7952 × 5304, a sewage outlet occupies less than 0.01% of the whole image, the outlets vary widely in type and form, and the background is semantically rich and contains much interfering information. Manual interpretation depends on the skill of the interpreter, so missed detections and false detections occur easily, which increases the verification workload of field workers. Meanwhile, the unmanned aerial vehicle captures tens of thousands of images, and detecting the sewage outlets in them manually is time-consuming and labor-intensive.

Disclosure of Invention

In order to overcome the defects of the prior art, the present disclosure provides an unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5, which completes the task of identifying small targets against complex background information.

In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:

In a first aspect, an unmanned aerial vehicle (UAV) aerial image river sewage outlet detection method based on YOLO V5 is disclosed, which comprises the following steps:

establishing an unmanned aerial vehicle aerial photography river drain outlet image data set;

preprocessing the established data set, and establishing a training set and a testing set;

constructing a YOLO V5 model and training it with the training set, wherein during training and detection the already blocked image is cut again so as to increase the proportion of the image occupied by the target, the size of the image input to the network is set to the size of the finally cut image, and random flipping is applied as data enhancement to the sewage outlet class that has less data;

the performance of the trained YOLO V5 model is tested by using a test set, and then the model is used for identifying the aerial sewage outlet image of the unmanned aerial vehicle.

According to a further technical scheme, the established data set is preprocessed, and the preprocessing comprises the steps of blocking the high-resolution image, labeling the sewage outlets on the blocked image, and cutting the labeled image again.

According to a further technical scheme, the image data set contains two types of sewage outlet: the sewage discharge dam and the sewage discharge pipe. In the labeling process, a sewage outlet target is selected, a label is entered for the selected target, and the labeled information is then stored as a file in a designated folder, wherein the file stores the category of the labeled sewage outlet.

According to a further technical scheme, the built YOLO V5 model comprises an input end, a feature extraction part, a Neck part and a Head part;

the input end carries out data enhancement processing, self-adaptive anchor frame calculation and self-adaptive picture scaling processing on the image;

the characteristic extraction part is used for carrying out slicing operation on the input image and obtaining a characteristic diagram through convolution operation;

the Neck part classifies and segments the feature map;

the Head section is used to obtain predictions of different scales.
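As a rough illustration of what "predictions of different scales" means, the following sketch computes the Head output shapes for a square input. It assumes the three strides (32, 16, 8) and the three anchors per grid cell commonly used by YOLO-family detectors, neither of which is stated in this disclosure; `head_output_shapes` is a hypothetical helper, not a library function.

```python
def head_output_shapes(input_size, num_classes, strides=(32, 16, 8), anchors_per_cell=3):
    """Return (grid_h, grid_w, channels) for each detection scale.

    Assumption: each anchor predicts 4 box coordinates, 1 objectness
    score and one score per class.
    """
    channels = anchors_per_cell * (num_classes + 5)
    shapes = []
    for stride in strides:
        grid = input_size // stride  # coarser stride -> smaller grid -> larger targets
        shapes.append((grid, grid, channels))
    return shapes


coco_shapes = head_output_shapes(608, num_classes=80)
outlet_shapes = head_output_shapes(608, num_classes=2)
print(coco_shapes)
print(outlet_shapes)
```

With 80 classes this gives 255 channels, the figure quoted in the detailed description for a 608 × 608 input; under the same assumptions, the two sewage outlet classes of this data set would give 21 channels.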

According to a further technical scheme, the Neck part adopts a feature pyramid network (FPN) and a path aggregation network (PAN). The feature pyramid network consists of a top-down part and a bottom-up part, wherein the top-down part is used for extracting features of the aerial images, and the bottom-up part is used for fusing feature information of different scales;

the PAN network is used together with the FPN network to create a bottom-up enhanced path that shortens the information path, and the precise localization signals stored in the low-level feature layers are used to propagate low-level information to the higher layers for classification and segmentation.

According to a further technical scheme, weighted non-maximum suppression is used in the Head part: in the process of eliminating anchor boxes, the confidence of each anchor box is used as a weight to obtain a new rectangular box, this rectangle is taken as the final predicted box, and the anchor boxes with lower scores are then eliminated.
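The weighted suppression described above can be sketched in plain Python. This is only one possible reading of the text: the IoU threshold of 0.5, the greedy clustering around the highest-scoring box and the function names are assumptions, and the actual YOLO V5 code may differ in detail.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_nms(boxes, scores, iou_threshold=0.5):
    """Merge overlapping boxes into confidence-weighted averages.

    Scores are assumed to be positive. Returns (box, score) pairs.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order[0]
        # The top-scoring box plus every remaining box that overlaps it.
        cluster = [best] + [i for i in order[1:]
                            if iou(boxes[best], boxes[i]) >= iou_threshold]
        total = sum(scores[i] for i in cluster)
        merged = tuple(sum(scores[i] * boxes[i][k] for i in cluster) / total
                       for k in range(4))
        kept.append((merged, scores[best]))
        # Lower-scoring members of the cluster are suppressed.
        order = [i for i in order if i not in cluster]
    return kept


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.6, 0.8]
detections = weighted_nms(boxes, scores)
print(detections)
```

In this toy input the first two boxes overlap strongly and are merged into a single box whose corners are pulled slightly toward the lower-scoring box, while the third box is kept unchanged.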

According to a further technical scheme, YOLO V5 models with different sizes are selected according to different requirements.

In a second aspect, an unmanned aerial vehicle aerial image river sewage outlet detection system based on YOLO V5 is disclosed, comprising:

the data set establishing and processing module is used for establishing an unmanned aerial vehicle aerial photography river drain outlet image data set; preprocessing the established data set, and establishing a training set and a testing set;

the sewage outlet image module is used for constructing a YOLO V5 model and training it with the training set, wherein during training and detection the already blocked image is cut again so as to increase the proportion of the image occupied by the target, the size of the image input to the network is set to the size of the finally cut image, and random flipping is applied as data enhancement to the sewage outlet class that has less data;

The above one or more technical solutions have the following beneficial effects:

The technical scheme solves the problem that traditional target detection methods have low detection precision for small targets such as sewage outlets in unmanned aerial vehicle aerial images. It assists manual inspection of river sewage outlets in aerial images and improves the detection precision of sewage outlets.

According to the technical scheme, the image is cut multiple times and the YOLO V5 model is used, so that the method is adapted to small-target detection of sewage outlets in aerial river images and the target detection precision is improved.

According to the technical scheme, the image is cut in two stages, converting the original small-target detection task into a larger-target detection task. YOLO V5 is used to reduce the influence of complex background information on the target detection precision, and a better target detection result is obtained.

The technical scheme of the disclosure offers four model sizes, corresponding to different levels of requirement. The smallest model is only 27M in size, which reduces the deployment cost and facilitates rapid deployment of the model.

An advantage of the technical scheme of the disclosure is that the inference time of YOLO V5 on a single image (batch size of 1) is within 20 milliseconds at the fastest. With the default batched inference of YOLO V5 (batch size of 36), dividing the batch time by the number of images in the batch gives a per-image inference time as low as 7 ms, i.e. about 140 FPS, which is state of the art in the current field of target detection.

The technical scheme of the disclosure is implemented with the PyTorch framework, which makes it easy for users to extend and apply, and it has practical value for unmanned aerial vehicle aerial surveys of sewage outlets.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.

FIG. 1 is a simplified diagram of the YOLO V5 model in accordance with an embodiment of the present disclosure;

FIG. 2 is a diagram illustrating a result of detecting a target at a sewage outlet of an aerial photography river according to an embodiment of the disclosure.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

Deep learning is one of the important methods for image target detection. Currently, target detection methods are generally classified into one-stage and two-stage methods. The one-stage methods include SSD, RetinaNet, the YOLO series and the like; the two-stage methods include RCNN, Fast-RCNN and the like. Existing target detection data sets include VOC, COCO, ImageNet and others; the targets in these data sets are large, and both one-stage and two-stage methods perform well on them. The sewage outlets in unmanned aerial vehicle aerial images, by contrast, are very small targets against a rich background. Taking this into account, the technical scheme of the disclosure adopts an unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5, which completes the task of identifying small targets against complex background information.

The overall technical concept is as follows:

In the technical scheme, an unmanned aerial vehicle is used to collect images of river sewage outlets; the sewage outlets in the images are labeled and the images are cut, and a training set and a test set of sewage outlet images are established. Because the background of the sewage outlet images is complex, the target occupies less than 0.01% of the picture, and the complex background information interferes strongly with the target, the one-stage YOLO V5 model is selected to reduce the influence of background information on identification. Models of the YOLO series detect small targets poorly, so in the training and detection stages the already blocked image is cut again to increase the proportion of the image occupied by the target. In addition, random flipping is applied as data enhancement to the sewage outlet class that has less data. After preprocessing, the YOLO V5 model is trained with the training set and evaluated on the test set; the trained YOLO V5 model is then used for aerial image river sewage outlet detection. The method accurately detects sewage outlets in aerial images, is highly practical, overcomes the influence of background information on identification, and combines high speed with a small model size to obtain a better average precision of target detection.
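The random-flip enhancement for the under-represented class can be sketched as follows. Only the annotation side is shown (the pixels would be mirrored the same way); the horizontal flip direction, the flip probability, the class names `"pipe"` and `"dam"`, and both helper functions are illustrative assumptions rather than details given in the disclosure.

```python
import random


def hflip_boxes(boxes, image_width):
    """Mirror (xmin, ymin, xmax, ymax) boxes about the vertical centre line."""
    return [(image_width - xmax, ymin, image_width - xmin, ymax)
            for (xmin, ymin, xmax, ymax) in boxes]


def augment_minority(samples, minority_label, image_width, prob=0.5, seed=0):
    """Append flipped copies of samples that contain the minority class.

    Each sample is a dict {"boxes": [...], "labels": [...]}.
    """
    rng = random.Random(seed)
    extra = []
    for sample in samples:
        if minority_label in sample["labels"] and rng.random() < prob:
            extra.append({"boxes": hflip_boxes(sample["boxes"], image_width),
                          "labels": list(sample["labels"])})
    return samples + extra


samples = [
    {"boxes": [(10, 20, 110, 220)], "labels": ["pipe"]},
    {"boxes": [(0, 0, 5, 5)], "labels": ["dam"]},
]
augmented = augment_minority(samples, "pipe", image_width=600, prob=1.0)
print(len(augmented), augmented[-1]["boxes"])
```

Restricting the extra copies to one class is a simple way to reduce class imbalance without duplicating the majority class.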

Example one

The embodiment discloses an unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5, which comprises the following steps:

Step 1: A river is photographed from the air with the unmanned aerial vehicle, experts screen the captured images for those containing sewage outlets or suspected sewage outlets, and an unmanned aerial vehicle aerial photography river sewage outlet image data set is established.

Step 2: The established data set is preprocessed: the high-resolution image is divided into blocks, experts label the sewage outlets on the blocked images, the labeled images are cut again, and a training set and a test set for the model are established.

Step 21: In the invention, the river sewage outlet images photographed by the unmanned aerial vehicle are high-resolution images. To make it convenient for experts to label the images with rectangular boxes and to improve the training efficiency of the model, the high-resolution image is divided into blocks. Assuming that the size of the high-resolution image is W × H, the number of divisions is calculated using the following formulas:

$$\mathrm{Num}_x = W / 7$$

$$\mathrm{Num}_y = H / 7$$

In the above formulas, $\mathrm{Num}_x$ and $\mathrm{Num}_y$ represent the number of blocks along the width and the height, respectively, when the image is divided.

Step 22: The blocked images obtained in step 21 are labeled with the labelImg tool. The data set established in step 1 contains two types of sewage outlet: the sewage discharge dam and the sewage discharge pipe. In the labeling process, the expert selects a sewage outlet target with the labelImg tool and enters a label for the selected target. labelImg then stores the labeled information in XML format in a specified folder. The XML file stores the type of the labeled sewage outlet, the X and Y coordinates of the upper-left corner of the labeling box, and the X and Y coordinates of its lower-right corner.
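A labelImg annotation in its Pascal VOC style XML form can be read with the Python standard library, as in the short sketch below. The sample file content, the file name and the class name `pipe` are made up for illustration; only the `object`, `name` and `bndbox` (`xmin`, `ymin`, `xmax`, `ymax`) elements are relied on.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation for one blocked image.
SAMPLE_XML = """<annotation>
  <filename>block_example.jpg</filename>
  <object>
    <name>pipe</name>
    <bndbox>
      <xmin>412</xmin><ymin>230</ymin><xmax>448</xmax><ymax>262</ymax>
    </bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return one dict per labeled sewage outlet in the XML text."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "label": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),  # upper-left corner
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),  # lower-right corner
            "ymax": int(box.findtext("ymax")),
        })
    return objects


annotations = parse_annotation(SAMPLE_XML)
print(annotations)
```

For files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring` with the rest of the function unchanged.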

Step 23: After step 22, the sewage outlet is still small relative to the blocked image, occupying less than 1% of it. Therefore, the labeled image is cut a second time with an overlapping sliding-window method, and the size of the cut image is 600 × 600. The sewage outlet target now occupies a larger proportion of the image, so the detection of a small target is converted into the detection of a larger target.

Step 3: The YOLO V5 model is constructed. In the model, CSPDarkNet53 is improved into CSP_X, which further enhances the feature extraction capability of the network. A better regression loss function and improved non-maximum suppression are used to obtain higher target detection precision. A diagram of the YOLO V5 model is shown in FIG. 1. The s specification is selected, which is the most lightweight and smallest model. The main part of the model consists of an input part, a feature extraction part, a Neck part and an output part.
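The second, overlapping cut can be sketched as follows. The disclosure fixes only the 600 × 600 tile size; the overlap of 100 pixels, the rule that a box must be at least half visible to be kept, and the function names are assumptions made for this example.

```python
def tile_origins(length, tile=600, overlap=100):
    """Start offsets of overlapping tiles that cover the range [0, length)."""
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # align the last tile with the far edge
    return starts


def crop_boxes(boxes, x0, y0, tile=600, min_visible=0.5):
    """Clip (label, xmin, ymin, xmax, ymax) boxes to the tile at (x0, y0).

    Coordinates are shifted into the tile's frame; boxes with less than
    `min_visible` of their area inside the tile are dropped.
    """
    kept = []
    for label, xmin, ymin, xmax, ymax in boxes:
        cx1, cy1 = max(xmin, x0), max(ymin, y0)
        cx2, cy2 = min(xmax, x0 + tile), min(ymax, y0 + tile)
        if cx2 <= cx1 or cy2 <= cy1:
            continue  # no overlap with this tile
        visible = (cx2 - cx1) * (cy2 - cy1) / ((xmax - xmin) * (ymax - ymin))
        if visible >= min_visible:
            kept.append((label, cx1 - x0, cy1 - y0, cx2 - x0, cy2 - y0))
    return kept


origins = tile_origins(1136)
shifted = crop_boxes([("pipe", 520, 30, 560, 70)], x0=500, y0=0)
print(origins, shifted)
```

The overlap means an outlet that straddles one tile border is likely to appear whole in a neighbouring tile, which is the usual motivation for overlapping rather than disjoint crops.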

Step 31: At the input end, Mosaic data enhancement, adaptive anchor box calculation and adaptive picture scaling are added. Mosaic data enhancement is derived from CutMix: four images are recombined into one new image, the length and width of each image are random during recombination, and operations such as random flipping, random scaling and random arrangement can be added at random. Mosaic data enhancement improves the detection of small targets to some extent. In YOLO V4, the anchor boxes must be calculated separately with the K-means clustering algorithm and are determined only once before training. In YOLO V5, however, the anchor box calculation is integrated into the model, and suitable anchor boxes are calculated adaptively for each training set. Because input images may differ in size, a resize operation is used to change every input image to a fixed size. However, input images may have very different aspect ratios, so that the resized image contains large gray padded regions, which slows down inference. Adaptive picture scaling is therefore designed so that the gray padding is kept to an adaptive minimum, which accelerates the inference of the model.
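The idea behind adaptive picture scaling can be illustrated with a few lines of arithmetic. The sketch assumes a target side of 608 and a network stride of 32, and pads the shorter side only up to the next multiple of the stride instead of up to the full square; `letterbox_params` is a hypothetical helper and not the YOLO V5 implementation itself.

```python
def letterbox_params(width, height, target=608, stride=32):
    """Return (new_w, new_h, pad_w, pad_h) for adaptive picture scaling."""
    # Scale so that the longer side fits the target while keeping the aspect ratio.
    ratio = min(target / width, target / height)
    new_w, new_h = round(width * ratio), round(height * ratio)
    # Pad only up to the next multiple of the stride, not to target x target.
    pad_w = (target - new_w) % stride
    pad_h = (target - new_h) % stride
    return new_w, new_h, pad_w, pad_h


params = letterbox_params(800, 600)
print(params)
```

For an 800 × 600 image the resized picture is 608 × 456 and only 24 rows of gray padding are added (giving 608 × 480), compared with 152 rows if the image were padded to a full 608 × 608 square.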

Step 32: In the feature extraction part, a Focus structure is added to the YOLO V5 model. It performs a slicing operation on the input image (for example, a 608 × 608 × 3 input is sliced into a 304 × 304 × 12 tensor), and the feature map is then obtained through a convolution with 32 convolution kernels. The subsequent network structure is improved from the traditional DarkNet53 by adding a CSP structure: a convolution branch is added alongside the Residual Block of DarkNet53, and the new structure enhances the feature extraction capability. An SPP (Spatial Pyramid Pooling) network is added at the end of the network; it fuses the convolved results through kernels of different sizes to expand the receptive field of the network.
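The slicing step of the Focus structure is a space-to-depth rearrangement, sketched here on nested Python lists so that it runs without any dependency. The channel ordering of the four sub-samples is an assumption of this example, and the subsequent convolution is omitted.

```python
def focus_output_shape(height, width, channels):
    """Shape after slicing: half the height and width, four times the channels."""
    return height // 2, width // 2, channels * 4


def focus_slice(image):
    """Rearrange an H x W x C nested list into H/2 x W/2 x 4C.

    H and W are assumed to be even.
    """
    out = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[0]), 2):
            # Stack the channels of the four pixels of each 2 x 2 cell.
            row.append(image[y][x] + image[y + 1][x]
                       + image[y][x + 1] + image[y + 1][x + 1])
        out.append(row)
    return out


# A 4 x 4 single-channel image whose pixel values are 0..15.
image = [[[4 * y + x] for x in range(4)] for y in range(4)]
sliced = focus_slice(image)
print(focus_output_shape(608, 608, 3), sliced[0][0])
```

No pixel values are discarded: the spatial resolution is halved while the information moves into the channel dimension.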

Step 33: The Neck part employs a Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN). The feature pyramid network consists of two parts, top-down and bottom-up. The top-down network is used for extracting features of the aerial images, and the bottom-up network is used for fusing feature information of different scales. The PAN network is mainly used together with the FPN network: it creates a bottom-up enhanced path that shortens the information path, and it uses the precise localization signals stored in the low-level feature layers to propagate low-level information to the higher layers, which helps classification and segmentation.

Step 34: The Head produces prediction results at three different scales. Taking an input image size of 608 × 608 as an example, the prediction sizes are 19 × 19 × 255, 38 × 38 × 255 and 76 × 76 × 255, and results of different sizes are used to predict targets of different sizes. In the post-processing of target detection, a non-maximum suppression (NMS) operation is usually required to screen the many candidate anchor boxes: the anchor boxes are ranked by confidence and those with lower scores are suppressed. YOLO V5 uses weighted NMS: in the process of eliminating anchor boxes, the confidence of each anchor box is used as a weight to obtain a new rectangular box, this rectangle is taken as the final predicted box, and the anchor boxes with lower scores are then removed. The loss function uses the CIOU_Loss function, and the formula is as follows:

where IOU is the intersection over union of boxes A and B, C denotes the smallest region enclosing both A and B, and $|C \setminus (A \cup B)| / |C|$ denotes the area of C that is covered by neither A nor B, as a fraction of the total area of C.
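The formula itself is not reproduced in this text. The terms defined above (IOU, the enclosing region C, and the uncovered fraction of C) match the commonly used generalized IoU, GIoU = IOU − |C \ (A ∪ B)| / |C|, with the loss taken as 1 − GIoU, which would also agree with the GIOU_Loss term in the total loss. The sketch below computes that quantity under this assumption and should not be read as the exact formula of the disclosure.

```python
def giou(a, b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union
    # C: smallest axis-aligned box that encloses both A and B.
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (c_area - union) / c_area


def giou_loss(a, b):
    """Regression loss: 0 for identical boxes, approaching 2 for distant ones."""
    return 1.0 - giou(a, b)


overlap_loss = giou_loss((0, 0, 2, 2), (1, 1, 3, 3))
identical_loss = giou_loss((0, 0, 2, 2), (0, 0, 2, 2))
print(overlap_loss, identical_loss)
```

Unlike a plain IoU loss, this quantity still changes when the boxes do not overlap at all, because the enclosing region C grows as the boxes move apart.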

The total loss function is

$$L = \mathrm{GIOU\_Loss} + L_{conf} + L_{cls}$$

YOLO V5 comes in four model sizes, namely s, m, l and x, which are suitable for different requirements.

Step 4: The YOLO V5 model is trained using the training set, with the size of the image input to the network set to the size of the finally cut image.

Step 41: In the training process, a portion with an intersection over union IOU > 0.7 is considered a foreground object; when IOU < 0.3, it is considered a background object. The loss function is as given in step 34, where GIOU_Loss is the regression loss of the anchor box, $L_{conf}$ is the confidence loss, and $L_{cls}$ is the classification loss.
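The thresholding rule and the three-term loss can be written out as a small sketch. The disclosure does not say how candidates with an IOU between 0.3 and 0.7 are treated; returning `"ignored"` for them is an assumption of this example, as are the function names and the sample loss values.

```python
def assign_label(iou, fg_threshold=0.7, bg_threshold=0.3):
    """Classify a candidate box by its IOU with the ground truth."""
    if iou > fg_threshold:
        return "foreground"
    if iou < bg_threshold:
        return "background"
    return "ignored"  # assumption: in-between candidates do not contribute


def total_loss(giou_loss, conf_loss, cls_loss):
    """L = GIOU_Loss + L_conf + L_cls, an unweighted sum as in the formula."""
    return giou_loss + conf_loss + cls_loss


labels = [assign_label(v) for v in (0.85, 0.5, 0.1)]
loss = total_loss(0.25, 0.5, 0.125)  # made-up component values
print(labels, loss)
```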

Finally, the model is optimized with the Adam method, and the learning rate is set to 0.001. Training on the training set yields a sewage outlet detection model that assists in completing the inference task for sewage outlets.

Step 5: The performance of the trained YOLO V5 model is tested using the test set; the model is then used to identify sewage outlets in unmanned aerial vehicle aerial images, and the identification result is output.

The statistics of the training set and test set of sewage outlet aerial images used in this embodiment are shown in Table 1.

Table 1 Statistics of the river sewage outlet aerial image data set

Data set	Sewage discharge dam	Sewage discharge pipe
Training set	7650	3520
Test set	2434	1165
Total	10084	4685

In order to evaluate the sewage outlet detection results, the invention adopts the precision (Precision) and recall (Recall) indexes, calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where TP (true positive) is the number of sewage outlets that are correctly detected, FP (false positive) is the number of sewage outlets that are wrongly detected, and FN (false negative) is the number of sewage outlets that are not detected.

From the precision and the recall, a P-R curve can be obtained. The area under the P-R curve is the AP (average precision), and each type of sewage outlet has its own AP value. The mAP (mean average precision) over the two types of sewage outlet can then be calculated to measure the precision of the detection algorithm.
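These metrics can be sketched in plain Python. The area under the P-R curve is approximated here by a simple rectangle sum over the ranked detections, without the interpolation used by some benchmarks; the disclosure does not say which variant was used, and all numbers in the example are made up.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(detections, num_ground_truth):
    """Area under the P-R curve for one class.

    `detections` is a list of (confidence, is_true_positive) pairs.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        ap += precision * (recall - prev_recall)  # rectangle under the curve
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


p, r = precision_recall(tp=8, fp=2, fn=4)
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
print(p, r, ap)
```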

FIG. 2 shows a sewage outlet detection result. The results of the method of the invention, compared with those of Faster RCNN, a conventional two-stage algorithm, are shown in Table 2.

Table 2 Comparison of the detection results of the method of the invention and the conventional Faster RCNN method

Method	AP (sewage discharge dam)	AP (sewage discharge pipe)	mAP
Faster RCNN	68.3%	55.2%	61.8%
The method of the invention	89.04%	69.4%	81.4%

As can be seen from the table, compared with the conventional Faster RCNN, the detection precision of the method of the invention on the sewage discharge dam and the sewage discharge pipe is improved by 20.74 and 14.2 percentage points respectively, and the mean average precision is improved by 19.6 percentage points, so the method of the invention has a clear advantage.

Example two

The present embodiment is directed to a computing device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement the specific steps in the first implementation example.

Example three

An object of the present embodiment is to provide a computer-readable storage medium.

A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, performs the specific steps of the first embodiment.

Example four

The purpose of this embodiment is to provide an unmanned aerial vehicle aerial image river sewage outlet detection system based on YOLO V5, comprising:

a sewage outlet image module, which is used for constructing a YOLO V5 model, training the YOLO V5 model with a training set, setting the size of the image input to the network to the size of the finally cut image, testing the performance of the trained YOLO V5 model with a test set, and then using the model to identify sewage outlets in unmanned aerial vehicle aerial images.

The steps involved in the apparatuses of the above second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present disclosure.

Those skilled in the art will appreciate that the modules or steps of the present disclosure described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code executable by computing means, whereby the modules or steps may be stored in memory means for execution by the computing means, or separately fabricated into individual integrated circuit modules, or multiple modules or steps thereof may be fabricated into a single integrated circuit module. The present disclosure is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims

1. An unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5, characterized by comprising the following steps:

2. The unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5 as claimed in claim 1, wherein the established data set is preprocessed, and the preprocessing comprises the steps of blocking the high-resolution image, labeling the sewage outlets on the blocked image, and cutting the labeled image again.

3. The unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5 as claimed in claim 1, wherein the image data set comprises two types of sewage outlet: the sewage discharge dam and the sewage discharge pipe; in the labeling process, a sewage outlet target is selected, a label is entered for the selected target, and the labeled information is then stored as a file in a designated folder, wherein the file stores the category of the labeled sewage outlet.

4. The unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5 as claimed in claim 1, wherein the constructed YOLO V5 model comprises an input end, a feature extraction part, a Neck part and a Head part;

the Neck part classifies and segments the feature map;

the Head section is used to obtain predictions of different scales.

5. The unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5 as claimed in claim 4, wherein the Neck part adopts a feature pyramid network (FPN) and a PAN network, the feature pyramid network is composed of a top-down part and a bottom-up part, the top-down part is used for extracting features of the aerial image, and the bottom-up part is used for fusing feature information of different scales;

6. The unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5 as claimed in claim 4, wherein in the Head part, weighted non-maximum suppression is used: in the process of eliminating anchor boxes, the confidence of each anchor box is used as a weight to obtain a new rectangular box, this rectangle is taken as the final predicted box, and the anchor boxes with lower scores are then eliminated.

7. The unmanned aerial vehicle aerial image river sewage outlet detection method based on YOLO V5 as claimed in claim 1, wherein YOLO V5 models of different sizes are selected according to different requirements.

8. An unmanned aerial vehicle aerial image river sewage outlet detection system based on YOLO V5, characterized by comprising:

9. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any of claims 1-7 when executing the program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of the preceding claims 1-7.