CN113408410A - Traffic sign detection method based on YOLOv4 algorithm - Google Patents

Traffic sign detection method based on YOLOv4 algorithm Download PDF

Info

Publication number
CN113408410A
CN113408410A
Authority
CN
China
Prior art keywords
algorithm
model
traffic sign
yolov4
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110676065.8A
Other languages
Chinese (zh)
Inventor
彭军
龚宇
李小兵
杨志
谭玉春
许可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Science and Technology
Original Assignee
Chongqing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Science and Technology filed Critical Chongqing University of Science and Technology
Priority to CN202110676065.8A priority Critical patent/CN113408410A/en
Publication of CN113408410A publication Critical patent/CN113408410A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a traffic sign detection method based on the YOLOv4 algorithm, relating to the technical field of image detection. The method comprises the following steps: model training is carried out on preprocessed traffic sign images through the original YOLOv4 algorithm; through training iteration, the model with the optimal training parameters and the minimum loss function is saved; the images in the test set are then tested with the trained model, and the bounding box with the highest confidence is finally selected for output, completing the detection of the traffic sign.

Description

Traffic sign detection method based on YOLOv4 algorithm
Technical Field
The invention relates to the technical field of image detection, in particular to a traffic sign detection method based on a YOLOv4 algorithm.
Background
Road traffic safety is a common concern worldwide, with approximately 1.25 million deaths resulting from traffic accidents every year. A study [1] showed that warning the driver 1.5 seconds before a traffic accident occurs could reduce accidents by nearly 90%. A target detection algorithm for road traffic signs that detects targets in real time is therefore particularly important. Traditional target detection algorithms are mainly divided into three steps: candidate region segmentation, feature extraction, and candidate region detection. In the prior art there are traffic sign detection algorithms based on SIFT features, which extract characteristic regions from the input image and compute SIFT descriptors over them, as well as algorithms that combine HOG-based deformable part models (DPM) with an SVM classifier to effectively detect traffic signs of different shapes. However, traditional detection algorithms require manual pre-extraction of features, which consumes a lot of time and is prone to missed detections; they also place high demands on image quality and struggle to reach an ideal recognition accuracy. In recent years, with deep learning widely applied in the image processing field, target detection algorithms based on convolutional neural networks have achieved great success, but their real-time performance still needs to be improved.
Therefore, the invention discloses a traffic sign detection method based on the YOLOv4 algorithm. Compared with the prior art, the invention improves detection speed while keeping computation cost under control. The invention improves the original YOLOv4 backbone extraction network through depthwise separable convolution to obtain a new backbone extraction network. The designed experiments show that the mAP of the improved YOLOv4 model on the CTSD traffic sign dataset differs from that of the original YOLOv4 model by only 0.82 percent, while the detection speed is improved by nearly 3 times and the number of model parameters is reduced to a certain extent.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a traffic sign detection method based on a YOLOv4 algorithm.
The invention is realized by the following technical scheme: a traffic sign detection method based on YOLOv4 algorithm, the method comprising the steps of:
model training is carried out on the preprocessed traffic sign images through the original YOLOv4 algorithm;
through training iteration, the model with the optimal training parameters and the minimum loss function is saved;
and the images in the test set are tested with the trained model, and the bounding box with the highest confidence is finally selected for output, completing the detection of the traffic sign.
Preferably, the backbone extraction network CSPDarknet of the original YOLOv4 algorithm is processed by depthwise separable convolution, said processing comprising the following steps:
Step 100: performing depthwise separable convolution on the multi-channel feature map in the original algorithm; specifically, each channel is convolved with a 3 × 3 convolution kernel, decomposing the multi-channel feature map into single-channel feature maps;
Step 200: the single-channel feature maps are convolved again with a 1 × 1 convolution kernel to adjust the number of channels, and a second feature map is output.
Preferably, the traffic sign images comprise indication signs, prohibition signs and warning signs.
The invention discloses a traffic sign detection method based on a YOLOv4 algorithm, which is compared with the prior art that:
the invention discloses a method for detecting actual life traffic signs based on a YOLOv4 target detection algorithm, which is characterized in that on the basis of an original YOLOv4 algorithm, the network structure of a trunk extraction network CSPDarknet53 is improved by means of the idea of deep separable convolution, and input images are respectively subjected to channel-by-channel convolution and point-by-point convolution to obtain a new trunk characteristic extraction network, the mAP value of the improved YOLOv4 network model for detecting three types of traffic signs reaches 92.63 percent, compared with the original YOLOv4 network, the mAP is only reduced by 0.82 percent, through comparison, the improved YOLOv4 algorithm can achieve the detection accuracy rate of remote small traffic signs, the detection speed of the improved YOLOv4 model is improved by nearly 3 times, and the number of model parameters is greatly reduced.
Drawings
FIG. 1 is a diagram of the depthwise separable convolution;
FIG. 2 is a diagram of the improved network structure;
FIG. 3 is a diagram of the traffic sign detection process;
FIG. 4 is a schematic diagram of examples of traffic signs in the embodiment;
FIG. 5a is a comparison graph of the indication sign AP values;
FIG. 5b is a comparison graph of the prohibition sign AP values;
FIG. 5c is a comparison graph of the warning sign AP values.
Detailed Description
The following embodiments describe the detailed implementation and specific operation of the present invention, but the scope of protection of the present invention is not limited to these embodiments.
The invention discloses a traffic sign detection method based on a YOLOv4 algorithm, which comprises the following steps:
model training is carried out on the preprocessed traffic sign images through the original YOLOv4 algorithm;
through training iteration, the model with the optimal training parameters and the minimum loss function is saved;
and the images in the test set are tested with the trained model, and the bounding box with the highest confidence is finally selected for output, completing the detection of the traffic sign.
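The three steps above can be sketched in PyTorch-style pseudocode as follows. This is only an illustration under assumed helpers (compute_loss and detect are hypothetical placeholders for a concrete YOLOv4 implementation), not the patent's actual training script.

```python
import torch

# Illustrative sketch of the three steps; compute_loss() and detect() are
# hypothetical helpers standing in for a concrete YOLOv4 implementation.
def train_and_keep_best(model, train_loader, epochs, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        epoch_loss = 0.0
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = compute_loss(model(images), targets)   # box + objectness + class terms
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss < best_loss:                        # save the model with the minimum loss
            best_loss = epoch_loss
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)
    return model

def detect_highest_confidence(model, image):
    boxes = detect(model, image)            # each box: (x1, y1, x2, y2, confidence, class_id)
    return max(boxes, key=lambda b: b[4]) if boxes else None
```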
For ease of understanding, YOLOv4 is introduced in detail here. The YOLOv4 algorithm is an enhanced version obtained by improving the single-stage target detection algorithm YOLOv3. Although it brings no qualitative change to the development of target detection algorithms, it improves accuracy markedly without reducing FPS. As a single-stage target detection algorithm, it uses three feature layers of different scales for classification and regression prediction. The feature map used for detection at each scale is divided into an S × S grid; if the center coordinate of a target's ground-truth labeled box falls within a grid cell, that cell is responsible for detecting the target. The YOLOv4 algorithm improves on the YOLOv3 algorithm in four main ways.
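As an illustration of the grid assignment just described (not part of the patent text), the cell responsible for a ground-truth box can be found from its center coordinates:

```python
# Which cell of an S x S grid is responsible for a ground-truth box centre?
def responsible_cell(cx, cy, img_w, img_h, S):
    """cx, cy: box centre in pixels; returns (row, col) of the responsible grid cell."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# Example: a sign centred at (412, 230) in a 608 x 608 image, on the 19 x 19 grid
print(responsible_cell(412, 230, 608, 608, 19))   # -> (7, 12)
```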
First: improvement of the backbone feature extraction network. Darknet53 in the YOLOv3 algorithm is replaced with CSPDarknet53. The CSPNet structure splits the stack of original residual blocks into two parts: the trunk part continues to stack the original residual blocks, while the other part is connected directly to the end after a small amount of processing.
Second: improvement of the feature enhancement network. The feature pyramid part uses an SPP structure and a PANet structure. The SPP structure greatly increases the receptive field and separates out the most salient context features, while the distinguishing characteristic of the PANet structure is that features are extracted repeatedly to obtain richer feature information.
Third: improvement of training techniques. Mosaic data augmentation is used during training to increase the robustness of the network, and CIoU is used as the loss function, which avoids problems such as divergence during training and makes the regression of the target box more stable.
Fourth: improvement of the activation function. The Mish activation function replaces the LeakyReLU activation function, improving the accuracy and generalization of the model.
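The Mish activation mentioned in the fourth improvement is defined as mish(x) = x · tanh(softplus(x)); a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(ln(1 + e^x)), a smooth alternative to LeakyReLU."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))
```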
The method also processes the backbone extraction network CSPDarknet of the original YOLOv4 algorithm with depthwise separable convolution. Compared with standard convolution, depthwise separable convolution decomposes the convolution operation into two parts: channel-by-channel convolution and point-by-point convolution. In the channel-by-channel (Depthwise) convolution, each feature channel is processed by its own convolution kernel to extract features; the point-by-point (Pointwise) convolution then performs secondary feature extraction on the resulting feature map using N 1 × 1 convolution kernels to obtain the final feature map. The depthwise separable convolution structure is shown in FIG. 1.
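A minimal PyTorch sketch of the depthwise separable convolution in FIG. 1: a 3 × 3 channel-by-channel (Depthwise) convolution followed by N 1 × 1 point-by-point (Pointwise) kernels. The batch-normalization and activation choices here are assumptions, not specified in the text.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: one 3x3 kernel per input channel (groups = in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        # Pointwise: N 1x1 kernels adjust the number of output channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)   # assumed; not stated in the patent
        self.act = nn.LeakyReLU(0.1)             # assumed; not stated in the patent

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```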
Specifically, assume that in a conventional convolutional layer the size of the input feature map is D_x × D_x × M, the size of the convolution kernel is D_y × D_y × N, and the size of the output feature map is D_z × D_z × N, where D_x and D_z represent the width and height of the input and output feature maps respectively, D_y is the spatial dimension of the convolution kernel, and M and N represent the numbers of channels of the input and output feature maps respectively. The calculation formulas are as follows:
The number of parameters of the conventional convolution is:
D_y × D_y × M × N (1)
The computation amount of the conventional convolution is:
D_x × D_x × M × N × D_y × D_y (2)
The number of parameters of the depthwise separable convolution is:
D_y × D_y × M + M × N (3)
The computation amount of the depthwise separable convolution is:
D_x × D_x × M × D_y × D_y + D_x × D_x × M × N (4)
The ratio of the number of parameters of the depthwise separable convolution to that of the conventional convolution is:
(D_y × D_y × M + M × N) / (D_y × D_y × M × N) = 1/N + 1/D_y²
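A quick numerical check of formulas (1)-(4) for one illustrative layer (the channel counts and feature-map size below are arbitrary, not taken from the experiments):

```python
Dx, Dy, M, N = 52, 3, 64, 128     # feature-map size, kernel size, input/output channels

std_params = Dy * Dy * M * N                          # (1) standard convolution parameters
std_ops    = Dx * Dx * M * N * Dy * Dy                # (2) standard convolution computations
dsc_params = Dy * Dy * M + M * N                      # (3) depthwise + pointwise parameters
dsc_ops    = Dx * Dx * M * Dy * Dy + Dx * Dx * M * N  # (4) depthwise + pointwise computations

print(dsc_params / std_params)    # ~0.119, i.e. 1/N + 1/Dy^2
print(dsc_ops / std_ops)          # same ratio
```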
in order to make the YOLOv4 algorithm more suitable for detecting traffic signs, further improve the real-time performance of detection and meet the requirement of detecting traffic signs for drivers in real life, the trunk extraction network CSPDarknet of the original YOLOv4 network is improved by using the deep separable convolution, the multichannel feature map in the original algorithm is convolved by using the deep separable convolution, each channel is firstly convolved by using a convolution kernel of 3 × 3 to be decomposed into a feature map of a single channel, then the feature map of the single channel is convolved by using a convolution kernel of 1 × 1 to adjust the number of the channels, the feature map is output, and the parameter number and the calculation amount of the model are greatly reduced. The improved network structure is shown in fig. 2 below.
To illustrate the embodiments more rigorously, the invention also discloses the following specific experiments:
the experimental framework structure is as follows: when the improved YOLOv4 detection algorithm is used for detecting the traffic sign, firstly, configuration files in the algorithm are modified, detection categories are modified into three categories according to experimental requirements, then a light-weighted trunk extraction network replaces the original CSPDarknet-53 and is added into the network, network parameters are reasonably modified according to the actual use condition of experimental equipment, and finally, model training is carried out on pictures in a training set, the improved traffic sign detection process is shown in a figure 3, firstly, the improved YOLOv4 algorithm is used for carrying out model training on a preprocessed traffic sign image, and through training iteration, a model with optimal model training parameters and minimum loss functions is stored. And testing the images in the test set by using the trained model, and finally selecting the frame with the highest confidence coefficient for output. The experiment identifies three types of common traffic signs in daily life, specifically an indication sign, a prohibition sign and a warning sign.
The experimental dataset is prepared as follows. The dataset selected for the experiment is the Chinese traffic sign CTSD dataset released by the Chinese Academy of Sciences; three types of traffic signs are detected, namely indication signs (mandatory), prohibition signs (prohibitory) and warning signs (warning). Examples of the three types of traffic signs are shown in FIG. 4. The CTSD dataset consists of 1100 images collected under different scenes and weather conditions. Because the training samples of this dataset are too few to meet the requirements of model training, about 9000 images containing traffic signs were collected from real life and network resources on the basis of the original dataset. The image quality of the dataset is improved through region cropping, median filtering denoising, color histogram equalization and other processing; the traffic signs in the dataset are then labeled according to the annotation format of the Pascal VOC 2007 dataset to generate the corresponding XML files; finally, the dataset is divided into a training set and a test set at a ratio of 9:1.
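A sketch of the 9:1 train/test split of the Pascal VOC-style annotation files; the directory layout and random seed are assumptions, not details from the original experiment.

```python
import random
from pathlib import Path

def split_dataset(annotation_dir, train_ratio=0.9, seed=0):
    """Split the VOC-style XML annotation files into training and test sets."""
    xml_files = sorted(Path(annotation_dir).glob("*.xml"))
    random.Random(seed).shuffle(xml_files)
    n_train = int(len(xml_files) * train_ratio)
    return xml_files[:n_train], xml_files[n_train:]

# e.g. ~10000 annotated images would yield roughly 9000 training and 1000 test samples
train_files, test_files = split_dataset("Annotations")
```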
The experimental procedures and results were analyzed as follows.
The YOLOv4 network and the improved YOLOv4 network are each trained under the Windows 10 operating system to obtain the final models; the hardware parameters of the computer used for the experiment are detailed in Table 1.
TABLE 1 Experimental computer configuration parameters
The initial parameters of model training are shown in Table 2 below; the learning rate is reduced to 10% of the initial learning rate when the number of training epochs reaches 50.
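The learning-rate drop described above (10% of the initial rate after 50 training epochs) could be expressed in PyTorch as follows; the optimizer, initial rate and training-loop helper are assumptions rather than the experiment's actual script.

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)    # initial LR is illustrative
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50], gamma=0.1)

for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)    # hypothetical helper for one pass over the training set
    scheduler.step()                     # LR becomes 10% of the initial value after epoch 50
```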
TABLE 2 Experimental initial parameters
The evaluation indexes used in the experiment to measure the performance of the detection model are the AP (average precision) value and the detection rate. The AP value is the area under the curve obtained by combining Precision and Recall. Precision refers to the proportion of samples predicted to be positive that are indeed positive among all samples predicted to be positive. Recall refers to the proportion of samples predicted to be positive that are indeed positive among all positive samples. The Precision and Recall calculation formulas are as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
In the formulas, TP (true positive) denotes the number of positive samples detected correctly, FP (false positive) denotes the number of negative samples incorrectly detected as positive, and FN (false negative) denotes the number of positive samples that were missed.
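The Precision, Recall and mAP computations can be sketched as follows; the per-class AP values in the example call are illustrative, not the experimental results.

```python
def precision(tp, fp):
    """Proportion of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    """Proportion of all true positives that were detected."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def mean_average_precision(per_class_ap):
    """mAP: the mean of the per-class AP values (areas under the P-R curves)."""
    return sum(per_class_ap.values()) / len(per_class_ap)

# Example with illustrative values:
print(mean_average_precision({"indication": 0.93, "prohibitory": 0.95, "warning": 0.90}))
```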
The images in the training set are trained with the original YOLOv4 algorithm and the improved YOLOv4 algorithm, and the resulting detection models are used to test the images in the test set; the comparison graphs of the AP values of the three types of traffic signs are shown in FIG. 5.
The average precision and detection rate obtained are shown in Table 3 below; the mAP value is the mean of the AP values of all classes.
TABLE 3 original YOLOv4 Algorithm and improved YOLOv4 Algorithm model test results
As can be seen from FIG. 5 and Table 3, the mAP of the original YOLOv4 model is 93.45% and the mAP of the improved YOLOv4 model is 92.63%; by comparison, the difference in accuracy between the YOLOv4 models before and after the improvement is only 0.82%, which is small, but the detection FPS of the improved model rises from 11 to 31, and the number of model parameters is reduced from 6.17 × 10^7 to 4.12 × 10^7, about two thirds of the original, which greatly reduces the computation of the model. The improved YOLOv4 model can therefore raise the detection speed and lower the computational cost while keeping the detection accuracy of the original YOLOv4 algorithm. FIG. 5 shows the detection effect of the improved traffic sign detection model. According to the detection results, the algorithm can accurately detect the three different types of traffic signs, and the prediction boxes frame the traffic signs correctly. The method can effectively help drivers detect traffic signs according to real-time road conditions in actual driving.
In summary, compared with the prior art, the invention introduces a method for detecting real-life traffic signs based on the YOLOv4 target detection algorithm. On the basis of the original YOLOv4 algorithm, the network structure of the backbone extraction network CSPDarknet53 is improved using the idea of depthwise separable convolution, with channel-by-channel convolution and point-by-point convolution operations performed respectively on the input image to obtain a new backbone feature extraction network. The improved YOLOv4 network model reaches an mAP of 92.63% for the three types of traffic signs, only 0.82% lower than the original YOLOv4 network. By comparison, the improved YOLOv4 algorithm can still achieve good detection accuracy for distant small traffic signs, its detection speed is improved by nearly 3 times, and the number of model parameters is greatly reduced.
The above description covers only preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent alternative or modification that a person skilled in the art can conceive according to the technical solution and the inventive concept of the present invention falls within the scope of protection of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (3)

1. A traffic sign detection method based on a YOLOv4 algorithm is characterized by comprising the following steps:
model training is carried out on the preprocessed traffic sign images through the original YOLOv4 algorithm;
through training iteration, the model with the optimal training parameters and the minimum loss function is saved;
and the images in the test set are tested with the trained model, and the bounding box with the highest confidence is finally selected for output, completing the detection of the traffic sign.
2. The method of claim 1, wherein the backbone extraction network CSPDarknet of the original YOLOv4 algorithm is processed by depthwise separable convolution, the processing comprising the following steps:
Step 100: performing depthwise separable convolution on the multi-channel feature map in the original algorithm; specifically, each channel is convolved with a 3 × 3 convolution kernel, decomposing the multi-channel feature map into single-channel feature maps;
Step 200: the single-channel feature maps are convolved again with a 1 × 1 convolution kernel to adjust the number of channels, and a second feature map is output.
3. The method as claimed in claim 2, wherein the traffic sign images comprise indication signs, prohibition signs and warning signs.
CN202110676065.8A 2021-06-18 2021-06-18 Traffic sign detection method based on YOLOv4 algorithm Pending CN113408410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110676065.8A CN113408410A (en) 2021-06-18 2021-06-18 Traffic sign detection method based on YOLOv4 algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110676065.8A CN113408410A (en) 2021-06-18 2021-06-18 Traffic sign detection method based on YOLOv4 algorithm

Publications (1)

Publication Number Publication Date
CN113408410A true CN113408410A (en) 2021-09-17

Family

ID=77685123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110676065.8A Pending CN113408410A (en) 2021-06-18 2021-06-18 Traffic sign detection method based on YOLOv4 algorithm

Country Status (1)

Country Link
CN (1) CN113408410A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723377A (en) * 2021-11-02 2021-11-30 南京信息工程大学 Traffic sign detection method based on LD-SSD network
CN114495061A (en) * 2022-01-25 2022-05-13 青岛海信网络科技股份有限公司 Road traffic sign board identification method and device
CN114495061B (en) * 2022-01-25 2024-04-05 青岛海信网络科技股份有限公司 Road traffic sign board identification method and device
CN114830915A (en) * 2022-04-13 2022-08-02 华南农业大学 Litchi vision picking robot based on laser radar navigation and implementation method thereof
CN114830915B (en) * 2022-04-13 2023-09-26 华南农业大学 Litchi vision picking robot based on laser radar navigation and implementation method thereof
CN115810183A (en) * 2022-12-09 2023-03-17 燕山大学 Traffic sign detection method based on improved VFNet algorithm
CN115810183B (en) * 2022-12-09 2023-10-24 燕山大学 Traffic sign detection method based on improved VFNet algorithm

Similar Documents

Publication Publication Date Title
CN113408410A (en) Traffic sign detection method based on YOLOv4 algorithm
CN108664996B (en) Ancient character recognition method and system based on deep learning
CN108446678B (en) Dangerous driving behavior identification method based on skeletal features
CN110348357B (en) Rapid target detection method based on deep convolutional neural network
CN110175649B (en) Rapid multi-scale estimation target tracking method for re-detection
WO2019080203A1 (en) Gesture recognition method and system for robot, and robot
CN110163069B (en) Lane line detection method for driving assistance
CN107092884B (en) Rapid coarse-fine cascade pedestrian detection method
CN110503613A (en) Based on the empty convolutional neural networks of cascade towards removing rain based on single image method
CN111986699B (en) Sound event detection method based on full convolution network
CN111008608B (en) Night vehicle detection method based on deep learning
CN110705558A (en) Image instance segmentation method and device
CN112101219A (en) Intention understanding method and system for elderly accompanying robot
CN112861785B (en) Instance segmentation and image restoration-based pedestrian re-identification method with shielding function
CN105893971A (en) Traffic signal lamp recognition method based on Gabor and sparse representation
WO2024051296A1 (en) Method and apparatus for obstacle detection in complex weather
CN116630932A (en) Road shielding target detection method based on improved YOLOV5
CN116340746A (en) Feature selection method based on random forest improvement
CN110909674B (en) Traffic sign recognition method, device, equipment and storage medium
CN112233105A (en) Road crack detection method based on improved FCN
CN110060221B (en) Bridge vehicle detection method based on unmanned aerial vehicle aerial image
CN106971377A (en) A kind of removing rain based on single image method decomposed based on sparse and low-rank matrix
CN115761834A (en) Multi-task mixed model for face recognition and face recognition method
CN116469020A (en) Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance
CN114359601A (en) Target similarity calculation method and device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication