CN110852243B - Road intersection detection method and device based on improved YOLOv3 - Google Patents


Info

Publication number
CN110852243B
CN110852243B
Authority
CN
China
Prior art keywords
multiplied
channel
feature
convolution
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201911078236.6A
Other languages
Chinese (zh)
Other versions
CN110852243A (en)
Inventor
金飞
陈佳怡
王龙飞
刘智
芮杰
王淑香
官恺
吕虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN201911078236.6A
Publication of CN110852243A
Application granted
Publication of CN110852243B
Legal status: Expired - Fee Related

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a road intersection detection method and device based on improved YOLOv3. The method mainly comprises: first acquiring road images; then performing network training to construct an improved YOLOv3 network model, which comprises a feature extraction end and a feature detection end, the feature detection end comprising a plurality of channels in each of which the corresponding convolution module is first widened transversely to generate different feature maps and then aggregated longitudinally; and finally identifying the road image to be detected with the improved YOLOv3 network model and outputting the result. Because the convolution module in each channel of the improved YOLOv3 feature detection end is first expanded transversely and then aggregated longitudinally, the network width of each channel's convolution module is increased, the expression capability of the network is enhanced, the difficulty of detecting small-size road intersections in complex remote sensing scenes is reduced, and the detection precision is improved.

Description

Road intersection detection method and device based on improved YOLOv3
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a road intersection detection method and device based on improved YOLOv3.
Background
A road intersection, as a junction where roads connect, provides important information such as accurate position, direction and topological relation for the rapid construction of a road network. During road network extraction, the extracted roads are disturbed by various complex factors and may become discontinuous. In such cases, road network construction can be assisted and guided by taking the positions of road intersections as base points and using information such as direction and topological relation.
However, a road intersection generally appears in a remote sensing image as a planar object of small size. Common detection algorithms detect road intersections mainly from features such as texture, shape and gray scale, so they perform well on intersections with simple backgrounds and obvious contour features, but small-size road intersections in complex remote sensing scenes are very difficult for them to detect; considerable manual intervention must be introduced, and the degree of automation and the detection precision are low.
Disclosure of Invention
The invention aims to provide a road intersection detection method and device based on improved YOLOv3, to solve the problems of high detection difficulty and low detection precision for small-size road intersections in complex remote sensing scenes.
In order to solve the technical problems, the technical scheme of the invention is as follows: a road intersection detection method based on improved YOLOv3 comprises the following steps:
1) acquiring a road image;
2) carrying out network training, and constructing an improved YOLOv3 network model; the improved YOLOv3 network model comprises a feature extraction end and a feature detection end, wherein the feature detection end comprises a plurality of channels, and in each channel, a corresponding convolution module is firstly subjected to transverse widening to generate different feature maps, and then is subjected to longitudinal aggregation;
3) identifying the road image to be detected with the improved YOLOv3 network model and outputting the result.
The invention has the beneficial effects that: the convolution modules in each channel of the improved YOLOv3 feature detection end are transversely widened to generate different feature maps, and then are longitudinally aggregated, so that the network width of the convolution modules of each channel is wider, the network expression capacity is enhanced, the detection difficulty of small-size road intersections in complex remote sensing scenes is reduced, and the detection precision is improved.
Further, the activation function of the feature extraction end is as follows:
f(x_i) = x_i,       x_i > 0
f(x_i) = a_i · x_i, x_i ≤ 0        (1)

where x_i is the feature value, and the slope a_i is updated during back propagation as

Δa_i = μ · Δa_i + ε · ∂L/∂a_i        (2)

where μ is the momentum factor, ε is the learning rate, a_i is initialized to 0.25, and a_i varies within (0, 1).
Further, the manner of widening the convolution module of each channel includes: symmetric widening and asymmetric widening.
The invention also provides a road intersection detection device based on the improved YOLOv3, which comprises a processor and a memory, wherein the processor is connected with an acquisition interface for acquiring a road image; the processor executes the following method instructions stored in memory:
1) acquiring a road image;
2) carrying out network training, and constructing an improved YOLOv3 network model; the improved YOLOv3 network model comprises a feature extraction end and a feature detection end, wherein the feature detection end comprises a plurality of channels, and in each channel, a corresponding convolution module is firstly subjected to transverse widening to generate different feature maps, and then is subjected to longitudinal aggregation;
3) identifying the road image to be detected with the improved YOLOv3 network model and outputting the result.
Further, the activation function of the feature extraction end is as follows:
f(x_i) = x_i,       x_i > 0
f(x_i) = a_i · x_i, x_i ≤ 0        (1)

where x_i is the feature value, and the slope a_i is updated during back propagation as

Δa_i = μ · Δa_i + ε · ∂L/∂a_i        (2)

where μ is the momentum factor, ε is the learning rate, a_i is initialized to 0.25, and a_i varies within (0, 1).
Further, the manner of widening the convolution module of each channel includes: symmetric widening and asymmetric widening.
Drawings
FIG. 1 is a schematic diagram of the improved YOLOv3 network model of the present invention;
FIG. 2 is a diagram of a symmetrically widened convolution module channel of the improved YOLOv3 network model of the present invention;
FIG. 3 is a diagram of an asymmetrically widened convolution module channel of the improved YOLOv3 network model of the present invention;
FIG. 4 is a multi-scale feature fusion diagram of the improved YOLOv3 network model of the present invention;
FIG. 5-1 is a prior art YOLOv3 network model test curve;
FIG. 5-2 is a modified YOLOv3 network model test curve of the present invention;
FIG. 6-1 is a diagram of a detection result of a road intersection in an existing Yolov3 network model occlusion environment;
FIG. 6-2 is a diagram of road intersection detection results in an occluded environment of the improved YOLOv3 network model of the present invention;
FIG. 7-1 is a diagram of a detection result of a road intersection in an environment with a similar background color of an existing YOLOv3 network model;
FIG. 7-2 is a diagram of the detection result of the intersection under the environment with the similar background color of the improved YOLOv3 network model of the invention;
FIG. 8-1 is a diagram of the detection results of a dense intersection of a conventional YOLOv3 network model;
Fig. 8-2 is a diagram of the detection results of dense road intersections for the improved YOLOv3 network model of the present invention.
Detailed Description
To describe the objects, technical solutions and advantages of the present invention in detail, the invention is further described below with reference to specific embodiments and the accompanying drawings.
Embodiment of road intersection detection method
The invention provides a road intersection detection method based on improved YOLOv3, which mainly comprises: first collecting road images; then performing network training to construct an improved YOLOv3 network model, which comprises a feature extraction end and a feature detection end, the feature detection end comprising a plurality of channels in each of which the corresponding convolution module is first widened transversely to generate different feature maps and then aggregated longitudinally; and finally identifying the road image to be detected with the improved YOLOv3 network model and outputting the result.
Specifically, the road intersection detection method based on the improved YOLOv3 comprises the following steps:
(1) Acquiring road image data.
In total, 3106 images covering 7 common road intersection types (cross, T-shaped, X-shaped, Y-shaped, staggered, roundabout and multi-way) are acquired from data platforms such as an image database and Google Maps; the image resolution is between 0.5 and 2.5 meters, and the image size is 416 × 416 pixels.
In computer vision, a planar object is usually represented by a bounding box. To obtain accurate box label information for the targets, the targets are labeled one by one manually with the LabelImg tool, yielding information such as the center coordinates, width and height of each box and its category, which is stored in an xml file corresponding to the image.
Finally, the whole data set is divided into a test set, a validation set and a training set at a ratio of 1:2:7.
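For illustration, the following is a minimal Python sketch of how such LabelImg-style xml annotations can be parsed and the 1:2:7 split produced; the directory name, random seed and helper names are assumptions for the example, not part of the patented method.

```python
import random
import xml.etree.ElementTree as ET
from pathlib import Path

def parse_annotation(xml_path):
    """Return (category, cx, cy, w, h) for every labeled box in one image."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        xmin, ymin = float(bb.find("xmin").text), float(bb.find("ymin").text)
        xmax, ymax = float(bb.find("xmax").text), float(bb.find("ymax").text)
        # LabelImg stores corner coordinates; convert to center/width/height.
        boxes.append((name, (xmin + xmax) / 2, (ymin + ymax) / 2,
                      xmax - xmin, ymax - ymin))
    return boxes

xml_files = sorted(Path("annotations").glob("*.xml"))  # assumed directory name
random.seed(0)
random.shuffle(xml_files)
n = len(xml_files)
test_set = xml_files[: n // 10]              # 1 part
val_set = xml_files[n // 10 : 3 * n // 10]   # 2 parts
train_set = xml_files[3 * n // 10 :]         # 7 parts
```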
(2) Constructing the improved YOLOv3 network model and training it.
The YOLOv3 network is an end-to-end object detection network comprising two parts: the feature extraction end Darknet and the feature detection end YOLO. Darknet is a deep network composed of 52 convolution modules and 23 residual modules; its task is to extract features from the original image layer by layer to form semantic feature maps of different scales.
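For orientation, a minimal PyTorch sketch of the kind of residual module Darknet stacks is given below; the channel counts and LeakyReLU slope are conventional Darknet choices assumed for the example, not details taken from the patent.

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """1x1 channel-reducing conv + 3x3 conv, with an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        hidden = channels // 2
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.LeakyReLU(0.1, inplace=True),  # the patent replaces this with PReLU
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection keeps the shape unchanged

print(DarknetResidual(64)(torch.randn(1, 64, 52, 52)).shape)
```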
The improved YOLOv3 network model of the embodiment is further improved on the basis of the existing YOLOv3 network, and the improvement comprises the following two points:
1. the convolution layer is activated in the convolution module using the PReLU function.
Compared with the LReLU function, which maps negative features to weak features with a linear function of fixed small slope, the PReLU function automatically adjusts the slope of the linear function according to the data characteristics and retains more negative features related to the target.
The PReLU function is defined as follows:

f(x_i) = x_i,       x_i > 0
f(x_i) = a_i · x_i, x_i ≤ 0        (1)

As shown in equation (1), the PReLU function performs an identity mapping when the feature value is greater than 0 and a non-fixed linear mapping when it is less than 0.

Δa_i = μ · Δa_i + ε · ∂L/∂a_i        (2)

During back propagation, the slope a_i is updated using the momentum factor and learning rate of the network. In equation (2), μ is the momentum factor, ε is the learning rate, a_i is initialized to 0.25, and a_i varies within (0, 1).
In a complex remote sensing scene, if many similar interference factors exist in the detection box, the slope of the linear function keeps increasing during back propagation, which raises the attention paid to negative features and establishes a correlation between road intersection features and the interference factors. When the feature detection end performs classification and regression on targets, it integrates the context semantic information with these negative features, which solves to a certain extent the problem of road intersection detection under similar interference factors.
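A minimal PyTorch sketch of this activation follows; nn.PReLU learns the slope a_i by back propagation, and the SGD momentum/learning-rate pairing shown mirrors the form of equation (2). The specific hyperparameter values here are assumptions for the example.

```python
import torch
import torch.nn as nn

# One learnable slope a_i, initialized to 0.25 as stated in the description.
act = nn.PReLU(num_parameters=1, init=0.25)
x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])
print(act(x))  # negative inputs are scaled by a_i; positive inputs pass through

# The slope is trained like any other weight; SGD with momentum mirrors
# equation (2): delta_a <- mu * delta_a + eps * dL/da.
optimizer = torch.optim.SGD(act.parameters(), lr=1e-4, momentum=0.9)
```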
2. The convolution module of each channel in the feature detection end is widened in one of two ways: symmetric widening and asymmetric widening.
Specifically, 3 convolution layers are arranged in parallel before the output layer Concat, with convolution kernel sizes of 3 × 3, 1 × 1 and 3 × 3 in sequence, as shown in fig. 2; that is, 3 convolution channels are constructed transversely with 1 × 1 and 3 × 3 convolution kernels, and each channel generates a different feature map after its convolution operation. A BN layer and a PReLU activation function are added after each convolution layer: the BN layer is a regularization function that keeps the input of each network layer identically distributed during deep network training, improving the gradient descent speed and accuracy, and the PReLU activation function linearly activates the feature data. Finally, the 3 convolution layers output feature maps with the same number of channels and the same size; to guarantee the same size, SAME padding (zero padding of the feature map borders) is used in the convolution operations, and the same-sized feature maps are then input into the output layer Concat for the merging operation.
The BN and PReLU functions in the above embodiment are placed after each convolution kernel and are in practice combined with the convolution layer, so the convolution layer may be considered to include the BN and PReLU functions.
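The following is a minimal PyTorch sketch of such a symmetrically widened module under the description above: three parallel branches (3 × 3, 1 × 1, 3 × 3), each a Conv-BN-PReLU stack, with SAME-style padding so the branch outputs share one spatial size and can be concatenated. The channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

def branch(in_ch, out_ch, k):
    """One widened branch: convolution, then BN, then PReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.PReLU(out_ch),
    )

class SymmetricConvModule(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = branch(in_ch, out_ch, 3)
        self.b2 = branch(in_ch, out_ch, 1)
        self.b3 = branch(in_ch, out_ch, 3)

    def forward(self, x):
        # Concatenate the three same-sized feature maps along the channel axis.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)

feat = torch.randn(1, 256, 13, 13)
print(SymmetricConvModule(256, 128)(feat).shape)  # torch.Size([1, 384, 13, 13])
```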
The 3 parallel convolution layers in the above embodiment constitute a symmetric convolution module, as shown in fig. 2. As another embodiment, an asymmetric convolution module can be constructed by replacing one of the above 3 × 3 convolution kernels with 1 × 3 and 3 × 1 convolution kernels, as shown in fig. 3.
It should be noted that when the feature map size is within the range 12 × 12 to 12 × 20, the asymmetric convolution module extracts features better than the symmetric convolution module and reduces the number of network parameters by 1/3. Therefore, in the invention, the asymmetric convolution module is adopted in the 1st detection channel, whose feature map size is 13 × 13, while symmetric convolution modules are adopted in the other, larger-scale channels.
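Continuing the sketch above, an asymmetric variant replaces one 3 × 3 branch with stacked 1 × 3 and 3 × 1 convolutions, roughly the substitution applied on the 13 × 13 channel; the channel counts again are assumptions.

```python
import torch
import torch.nn as nn

def branch(in_ch, out_ch, k):
    # Same Conv-BN-PReLU helper as in the symmetric sketch above.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.PReLU(out_ch),
    )

class AsymmetricConvModule(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.b1 = branch(in_ch, out_ch, 3)  # retained 3x3 branch
        self.b2 = branch(in_ch, out_ch, 1)  # retained 1x1 branch
        self.b3 = nn.Sequential(            # one 3x3 replaced by 1x3 then 3x1
            nn.Conv2d(in_ch, out_ch, kernel_size=(1, 3), padding=(0, 1), bias=False),
            nn.BatchNorm2d(out_ch),
            nn.PReLU(out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=(3, 1), padding=(1, 0), bias=False),
            nn.BatchNorm2d(out_ch),
            nn.PReLU(out_ch),
        )

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)

print(AsymmetricConvModule(256, 128)(torch.randn(1, 256, 13, 13)).shape)
```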
Therefore, the overall structure of the improved YOLOv3 network of the invention is shown in fig. 1, where Convolutional-0 denotes a symmetric convolution module, Convolutional-1 denotes an asymmetric convolution module, and Convolutional-Set is a combination of 5 convolution modules.
In addition, in this embodiment a multi-scale fusion method is adopted for feature extraction from the road image. The method comprises a bottom-up path, a top-down path and lateral connections. The bottom-up path is the feed-forward computation of the Darknet network: its feature hierarchy consists of feature maps at several scales with a scaling step of 2, and the output of the last layer of each network stage is selected as the reference feature map set. The top-down path enlarges and migrates the feature scales for fusion, after which the features are enhanced from the bottom-up path through the lateral connections; each lateral connection merges feature maps of the same spatial size from the bottom-up path and the top-down path.
Specifically, as shown in fig. 4, the multi-scale fusion in this embodiment proceeds as follows: first, the 13 × 13 feature map output by the last layer of the Darknet network is input into the 1st channel of YOLO to detect large targets; second, the 13 × 13 feature map is passed down to the 2nd channel, upsampled, and fused with the 26 × 26 feature map output by Darknet to detect sub-large targets; third, the feature map of the 2nd channel is passed on to the 3rd channel, upsampled, and fused with the 52 × 52 feature map to detect sub-small targets; fourth, the 52 × 52 feature map of the 3rd channel is upsampled one last time and fused with the 104 × 104 feature map, after which small targets are detected, realizing multi-scale fused feature extraction.
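A shape-level PyTorch sketch of this four-channel fusion follows; only the spatial sizes (13, 26, 52, 104) come from the text, the channel counts are assumptions, and in the real network convolution sets process each channel's features before the next upsampling.

```python
import torch
import torch.nn.functional as F

# Darknet outputs at the four scales named in the text (channel counts assumed):
d13 = torch.randn(1, 1024, 13, 13)
d26 = torch.randn(1, 512, 26, 26)
d52 = torch.randn(1, 256, 52, 52)
d104 = torch.randn(1, 128, 104, 104)

f13 = d13                                                        # channel 1: large targets
f26 = torch.cat([F.interpolate(f13, scale_factor=2), d26], 1)    # channel 2: sub-large
f52 = torch.cat([F.interpolate(f26, scale_factor=2), d52], 1)    # channel 3: sub-small
f104 = torch.cat([F.interpolate(f52, scale_factor=2), d104], 1)  # channel 4: small
print(f26.shape, f52.shape, f104.shape)
```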
Based on the improved YOLOv3 network model established above, the parameters of the feature extraction end are initialized with a model obtained by training the Darknet network on the MSCOCO data set, and the model is then trained with the training set of the road image data.
The total number of training iterations is set to 30000, the Adam optimization algorithm is adopted to update the weight gradients, the iteration batch size is 32, the momentum parameter is 0.9, the learning rate of the first 20000 iterations is 0.0001, and the learning rate of the last 10000 iterations is reduced to 0.00005.
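A minimal sketch of this schedule in PyTorch is given below; the model placeholder and the elided loss computation are assumptions standing in for the improved YOLOv3 network and its data pipeline.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3)  # placeholder standing in for the improved YOLOv3 model
# The 0.9 momentum parameter maps to Adam's first-moment coefficient beta1.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

for iteration in range(30000):
    if iteration == 20000:
        for group in optimizer.param_groups:
            group["lr"] = 5e-5  # drop the learning rate for the last 10000 iterations
    # ... load a batch of 32 images, compute the loss, then:
    # optimizer.zero_grad(); loss.backward(); optimizer.step()
    pass
```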
Specifically, the training process of the improved YOLOv3 network model is as follows:
1) Importing training data: the sample images are listed, in order and at the ratio above, in a training file train.txt, a validation file val.txt and a test file test.txt, and these 3 files, the annotation data and the sample images are imported into the network as training data;
2) Selecting a pre-training model: a model obtained by training Darknet on the MSCOCO data set is taken as the pre-training model to initialize the network parameters of the feature extraction end.
3) Network training: iterative network training is performed, and indexes of the real-time training such as accuracy, loss value and recall rate are stored in a training log during training.
4) Finally, a trained improved YOLOv3 network model is obtained.
(3) Road intersection detection is carried out on the road images to be detected (the test set) with the trained improved YOLOv3 network model.
To verify the effectiveness of the algorithm, the networks before and after improvement are trained and tuned on the road intersection data set, and multiple types of intersections are then detected on multi-source remote sensing images.
To accurately evaluate the improved YOLOv3 network algorithm of the present invention, the detection performance of the model is measured by the precision, recall, PR curve, average precision (AP) and mean average precision (mAP) of the experiments.
The indices are calculated as follows:

P_re = TP / (TP + FP)        (3)
R_ec = TP / (TP + FN)        (4)
mAP = (1/N) · Σ_{i=1}^{N} AP_i        (5)

where TP is the number of correctly detected road intersections; FP is the number of detection results that do not match actual road intersections; FN is the number of undetected road intersections; and N is the number of road intersection types. P_re is the precision, the proportion of correctly detected road intersections among all detection results; R_ec is the recall, the proportion of detected correct road intersections among all road intersections. The PR curve combines the precision and recall criteria, taking the recall as the abscissa and the precision as the ordinate; the area enclosed by the curve and the abscissa axis is the AP value. The mAP is the mean of the average precisions and measures the detection effect of the network on the whole test set.
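The following Python sketch computes these indices from per-class counts and PR-curve samples; the sample values are hypothetical and serve only to illustrate equations (3)-(5).

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from true/false positive and false negative counts."""
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    return pre, rec

def average_precision(recalls, precisions):
    """AP as the trapezoidal area under the PR curve (recall on the x-axis)."""
    r = np.asarray(recalls, dtype=float)
    p = np.asarray(precisions, dtype=float)
    order = np.argsort(r)
    r, p = r[order], p[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

aps = [0.82, 0.75, 0.90]         # hypothetical per-class AP values (N = 3)
map_value = sum(aps) / len(aps)  # mAP = (1/N) * sum of AP_i
print(precision_recall(tp=45, fp=5, fn=10), map_value)
```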
Figs. 5-1 and 5-2 are the PR curves drawn from the precision and recall of the models obtained by training the YOLOv3 network before and after improvement, respectively, on a test set covering 7 types of road intersections: cross (crossing), T-junction (tjunction), staggered (malposition), X-type (xshape), Y-type (yshape), roundabout and multi-way (multiple).
Compared with the PR curve of fig. 5-1, the PR curve of fig. 5-2 is smoother, and both the precision P_re and the recall R_ec are improved; the mean average precision mAP improves by 3.17%. Different types of road intersections have different relative sizes in remote sensing images. The T-shaped intersection is the smallest, and its detection improves most obviously, with the AP value rising by 8%; for the 4 larger types (cross, staggered, X-type and Y-type intersections), the AP values rise by 3.01%, 3.25%, 4.5% and 2.08%, respectively; the roundabout and multi-way intersections, which are the largest, give similar results on the two network models, with AP improvements below 1%.
The experimental results show that, with the road intersection test set as experimental data, the improved YOLOv3 network is more robust and its test results are relatively stable; at the same time, by enhancing the extraction of detailed target features, the network obviously improves the expression of small-size road intersection targets.
To verify the effectiveness of the algorithm in detecting road intersections in complex environments, images with many interference factors were selected from the test set and tested with the YOLOv3 network models before and after improvement.
For cement roads in village and town residential areas, some road intersections are occluded by surrounding trees; these incomplete contour features were tested with the YOLOv3 network models before and after improvement, and the detection results are shown in figs. 6-1 and 6-2. Road intersections whose color is similar to that of adjacent buildings, and whose features are not obvious under ground-object background interference, were tested in the same way, with the detection results shown in figs. 7-1 and 7-2. In the detection results of the pre-improvement YOLOv3 network on the 2 images (figs. 6-1 and 7-1), 9 road intersections affected by interference are not identified, giving a high miss rate; in the detection results of the improved YOLOv3 network (figs. 6-2 and 7-2), the intersections with partially unobvious contour features are accurately identified, improving intersection detection in complex environments.
To verify the applicability of the algorithm, urban images were randomly captured on the Google Earth platform as test data, and migration tests were carried out with the YOLOv3 models before and after improvement; the detection results are shown in figs. 8-1 and 8-2.
The spatial resolution of the images in figs. 8-1 and 8-2 is about 1 meter, corresponding to a ground area of 1760 m × 1096 m containing 65 road intersections of 4 types: cross, T-shaped, staggered and X-type. Because the numbers of staggered and X-type road intersections in the image are small, to avoid inflated indices all types of road intersections are grouped into one class when calculating the recall and average precision; the statistics of all evaluation indexes are shown in Table 1.
Table 1. Comparison of the detection performance of YOLOv3 and improved YOLOv3
(Table 1 is reproduced as an image in the original publication.)
As can be seen from Table 1, compared with the YOLOv3 algorithm, the improved YOLOv3 algorithm has obvious advantages in detecting dense road intersections: it reduces the numbers of missed and false detections and improves the average precision by 12.18%. The experimental results show that the improved YOLOv3 algorithm can effectively transfer the road intersection features learned from the data set and has strong applicability to other road images. According to the statistical results, the average precision of road intersection detection is high, and the method can provide auxiliary information for the rapid construction of road networks.
Embodiments of a detection device for road intersections
The invention also provides a road intersection detection device based on the improved YOLOv3. The device is in fact a computer or other equipment with data processing capability, comprising a processor and a memory, the processor being connected with an acquisition interface for obtaining road images. The processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit or the like, and executes instructions to implement the road intersection detection method described above.
While the present invention has been described in detail with reference to preferred embodiments, the above description should not be taken as limiting the invention. Various modifications and alternatives will become apparent to those skilled in the art upon reading the foregoing description, and the scope of the invention should therefore be determined by the appended claims.

Claims (4)

1. A road intersection detection method based on improved YOLOv3 is characterized by comprising the following steps:
1) acquiring a road image;
2) carrying out network training, and constructing an improved YOLOv3 network model; the improved YOLOv3 network model comprises two parts, namely a feature extraction end Darknet and a feature detection end yolo, wherein the feature detection end yolo comprises four channels, and in each channel, a corresponding convolution module is firstly subjected to transverse widening to generate different feature maps, and then is subjected to longitudinal aggregation;
The convolution module is a symmetric convolution module or an asymmetric convolution module; 3 parallel convolution layers with convolution kernels of 3 × 3, 1 × 1 and 3 × 3 form a symmetric convolution module, and replacing one 3 × 3 convolution kernel in the symmetric convolution module with 1 × 3 and 3 × 1 convolution kernels forms an asymmetric convolution module; after the convolution operation is carried out in each channel, feature maps with different sizes are generated; for the channels whose feature map size is within the range of 12 × 12 to 12 × 20, an asymmetric convolution module is adopted, and the channels of other sizes are provided with symmetric convolution modules;
when the feature extraction is carried out, a multi-scale fusion method is adopted, which specifically comprises: firstly, inputting the feature map of size 13 × 13 output by the last layer of the Darknet network into the 1st channel of YOLO to detect large targets; secondly, transmitting the feature map of size 13 × 13 down to the 2nd channel, upsampling it and fusing it with the feature map of size 26 × 26 output by the Darknet network, and then detecting sub-large targets; thirdly, transmitting the feature map of the 2nd channel on to the 3rd channel, upsampling it and fusing it with the feature map of size 52 × 52 to detect sub-small targets; fourthly, performing a last upsampling on the feature map of size 52 × 52 of the 3rd channel, fusing it with the feature map of size 104 × 104 of the 4th channel, and then detecting small targets, thereby realizing feature extraction after multi-scale fusion;
3) identifying the road image to be detected with the improved YOLOv3 network model and outputting the result.
2. The road intersection detection method based on improved YOLOv3 according to claim 1, wherein the activation function of the feature extraction end is:

f(x_i) = x_i,       x_i > 0
f(x_i) = a_i · x_i, x_i ≤ 0        (1)

where x_i is the feature value, and the slope a_i is updated during back propagation as

Δa_i = μ · Δa_i + ε · ∂L/∂a_i        (2)

where μ is the momentum factor, ε is the learning rate, a_i is initialized to 0.25, and a_i varies within (0, 1).
3. A road intersection detection device based on improved YOLOv3, comprising a processor and a memory, wherein the processor is connected with an acquisition interface for acquiring a road image, and the processor executes the following method instructions stored in the memory:
1) acquiring a road image;
2) carrying out network training, and constructing an improved YOLOv3 network model; the improved YOLOv3 network model comprises two parts, namely a feature extraction end Darknet and a feature detection end yolo, wherein the feature detection end yolo comprises four channels, and in each channel, a corresponding convolution module is firstly subjected to transverse widening to generate different feature maps, and then is subjected to longitudinal aggregation;
the convolution module is a symmetric convolution module or an asymmetric convolution module; 3 parallel convolution layers with convolution kernels of 3 × 3, 1 × 1 and 3 × 3 form a symmetric convolution module, and replacing one 3 × 3 convolution kernel in the symmetric convolution module with 1 × 3 and 3 × 1 convolution kernels forms an asymmetric convolution module; after the convolution operation is carried out in each channel, feature maps with different sizes are generated; for the channels whose feature map size is within the range of 12 × 12 to 12 × 20, an asymmetric convolution module is adopted, and the channels of other sizes are provided with symmetric convolution modules;
when the feature extraction is carried out, a multi-scale fusion method is adopted, which specifically comprises: firstly, inputting the feature map of size 13 × 13 output by the last layer of the Darknet network into the 1st channel of YOLO to detect large targets; secondly, transmitting the feature map of size 13 × 13 down to the 2nd channel, upsampling it and fusing it with the feature map of size 26 × 26 output by Darknet, and then detecting sub-large targets; thirdly, transmitting the feature map of the 2nd channel on to the 3rd channel, upsampling it and fusing it with the feature map of size 52 × 52 to detect sub-small targets; fourthly, performing a last upsampling on the feature map of size 52 × 52 of the 3rd channel, fusing it with the feature map of size 104 × 104 of the 4th channel, and then detecting small targets, thereby realizing feature extraction after multi-scale fusion;
3) identifying the road image to be detected with the improved YOLOv3 network model and outputting the result.
4. The road intersection detection device based on improved YOLOv3 according to claim 3, wherein the activation function of the feature extraction end is:

f(x_i) = x_i,       x_i > 0
f(x_i) = a_i · x_i, x_i ≤ 0        (1)

where x_i is the feature value, and the slope a_i is updated during back propagation as

Δa_i = μ · Δa_i + ε · ∂L/∂a_i        (2)

where μ is the momentum factor, ε is the learning rate, a_i is initialized to 0.25, and a_i varies within (0, 1).
CN201911078236.6A 2019-11-06 2019-11-06 Road intersection detection method and device based on improved YOLOv3 Expired - Fee Related CN110852243B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911078236.6A CN110852243B (en) 2019-11-06 2019-11-06 Road intersection detection method and device based on improved YOLOv3


Publications (2)

Publication Number Publication Date
CN110852243A CN110852243A (en) 2020-02-28
CN110852243B true CN110852243B (en) 2022-06-28

Family

ID=69599911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911078236.6A Expired - Fee Related CN110852243B (en) 2019-11-06 2019-11-06 Road intersection detection method and device based on improved YOLOv3

Country Status (1)

Country Link
CN (1) CN110852243B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408321B (en) * 2020-03-16 2023-08-22 中国人民解放军战略支援部队信息工程大学 Real-time target detection method and device for lightweight image and video data
CN111462488B (en) * 2020-04-01 2021-09-10 北京工业大学 Intersection safety risk assessment method based on deep convolutional neural network and intersection behavior characteristic model
CN112487864A (en) * 2020-11-02 2021-03-12 江阴市智行工控科技有限公司 Method for detecting small target safety helmet and protective clothing for construction site
CN112508030A (en) * 2020-12-18 2021-03-16 山西省信息产业技术研究院有限公司 Tunnel crack detection and measurement method based on double-depth learning model
CN114881992B (en) * 2022-05-24 2023-04-07 北京安德医智科技有限公司 Skull fracture detection method and device and storage medium

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201203419Y (en) * 2007-12-26 2009-03-04 河海大学常州校区 Single frame image detection device of city road crossing vehicle queue length
CN102708378B (en) * 2012-04-28 2014-06-11 浙江工业大学 Method for diagnosing fault of intelligent traffic capturing equipment based on image abnormal characteristic
CN102842039B (en) * 2012-07-11 2015-06-24 河海大学 Road image detection method based on Sobel operator
CN103077617B (en) * 2012-12-24 2015-11-04 南京航空航天大学 Based on crossing intelligent traffic light supervisory systems and the method for computer vision
CN105868691B (en) * 2016-03-08 2019-05-21 青岛邃智信息科技有限公司 City vehicle method for tracing based on fast area convolutional neural networks
CN106408015A (en) * 2016-09-13 2017-02-15 电子科技大学成都研究院 Road fork identification and depth estimation method based on convolutional neural network
CN107316001A (en) * 2017-05-31 2017-11-03 天津大学 Small and intensive method for traffic sign detection in a kind of automatic Pilot scene
CN108416776B (en) * 2018-03-16 2021-04-30 京东方科技集团股份有限公司 Image recognition method, image recognition apparatus, computer product, and readable storage medium
CN109033939A (en) * 2018-06-04 2018-12-18 上海理工大学 Improved YOLOv2 object detecting method under a kind of cluttered environment
US10452959B1 (en) * 2018-07-20 2019-10-22 Synapse Tehnology Corporation Multi-perspective detection of objects
CN109902609A (en) * 2019-02-22 2019-06-18 淮阴工学院 A kind of road traffic sign detection and recognition methods based on YOLOv3
CN110059554B (en) * 2019-03-13 2022-07-01 重庆邮电大学 Multi-branch target detection method based on traffic scene
CN110135267B (en) * 2019-04-17 2020-09-25 电子科技大学 Large-scene SAR image fine target detection method
CN110119726B (en) * 2019-05-20 2023-04-25 四川九洲视讯科技有限责任公司 Vehicle brand multi-angle identification method based on YOLOv3 model
CN110175574A (en) * 2019-05-28 2019-08-27 中国人民解放军战略支援部队信息工程大学 A kind of Road network extraction method and device
CN110348311B (en) * 2019-06-13 2021-03-19 中国人民解放军战略支援部队信息工程大学 Deep learning-based road intersection identification system and method

Also Published As

Publication number Publication date
CN110852243A (en) 2020-02-28

Similar Documents

Publication Publication Date Title
CN110852243B (en) Road intersection detection method and device based on improved YOLOv3
CN110287932B (en) Road blocking information extraction method based on deep learning image semantic segmentation
CN104680542B (en) Remote sensing image variation detection method based on on-line study
CN112418212B (en) YOLOv3 algorithm based on EIoU improvement
CN106875380B (en) A kind of heterogeneous image change detection method based on unsupervised deep neural network
CN110321891A (en) A kind of big infusion medical fluid foreign matter object detection method of combined depth neural network and clustering algorithm
CN103226826B (en) Based on the method for detecting change of remote sensing image of local entropy visual attention model
CN109727226A (en) A kind of position table automatic generation method based on machine learning
CN112634369A (en) Space and or graph model generation method and device, electronic equipment and storage medium
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
Qiao et al. A crack identification method for concrete structures using improved U‐Net convolutional neural networks
CN113239753A (en) Improved traffic sign detection and identification method based on YOLOv4
CN110307903A (en) A kind of method of the contactless temperature dynamic measurement of poultry privileged site
CN110716998B (en) Fine scale population data spatialization method
Zhang et al. SaltISCG: Interactive salt segmentation method based on CNN and graph cut
CN105740901B (en) Mutative scale object-oriented Classification in Remote Sensing Image antidote based on ontology
CN105631849B (en) The change detecting method and device of target polygon
CN109753896A (en) A kind of unsupervised heterologous method for detecting change of remote sensing image based on general character autocoder
CN110059544B (en) Pedestrian detection method and system based on road scene
CN104574345B (en) False change detecting method in a kind of Land_use change/cover diverse vector figure based on SYMMETRY THEORY
CN116189139A (en) Traffic sign detection method based on Transformer
CN113077458B (en) Cloud and shadow detection method and system in remote sensing image
CN109697474A (en) Synthetic Aperture Radar images change detecting method based on iteration Bayes
CN115546539A (en) Magnetic flip plate liquid level reading method and device based on machine vision and readable medium
Yu Quality assessment in point feature generalization with pattern preserved

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220628