CN116597413A - Real-time traffic sign detection method based on improved YOLOv5

Info

Publication number
CN116597413A
Authority
CN
China
Prior art keywords
traffic sign
sign detection
model
yolov5
real
Prior art date
Legal status
Pending
Application number
CN202310480371.3A
Other languages
Chinese (zh)
Inventor
罗悦晨
慈玉生
魏晓丽
蒋世鑫
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202310480371.3A
Publication of CN116597413A
Legal status: Pending (current)

Classifications

    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/774: Arrangements for image or video recognition or understanding using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • Y02T 10/40: Climate change mitigation technologies related to transportation; road transport; engine management systems


Abstract

The application provides a real-time traffic sign detection method based on improved YOLOv5, which comprises the following steps: constructing a traffic sign dataset; improving the YOLOv5 model to obtain an original traffic sign detection model; training the original traffic sign detection model on the traffic sign dataset; and detecting traffic sign information in real time with the trained traffic sign detection model. The application improves detection accuracy as much as possible while keeping the network lightweight, so the new model has both high accuracy and high speed; it can be deployed on edge devices, alleviating the shortage of cloud-platform computing resources.

Description

Real-time traffic sign detection method based on improved YOLOv5
Technical Field
The application belongs to the technical field of intelligent traffic sign detection, and particularly relates to a real-time traffic sign detection method based on improved YOLOv5.
Background
In recent years, object detection technology has seen growing use in the intelligent driving field, and research on traffic sign detection in particular has attracted considerable attention. The practicality of a traffic sign detection technique is mainly measured by detection speed and detection accuracy, and several existing algorithms already perform well. For example, Chinese patent publication CN114882469A proposes a traffic sign detection method and system based on a DL-SSD model, which enlarges the receptive field of the images and uses a position channel attention mechanism (PCA) to multiply the feature maps by a weight matrix to obtain local features, raising accuracy to about 85%; Chinese patent publication CN114863384A proposes a traffic sign detection technique based on the YOLOv4 algorithm, which adds residual structures and the attention network SENet to the backbone network and detection module of the YOLOv4 model, highlighting traffic sign features to improve detection accuracy.
Although some traffic sign detection algorithms achieve good results, they are usually designed for favorable detection environments. Real traffic sign detection environments are often disturbed by external conditions such as bad weather, poor lighting, and occluded or degraded signs, and under such adverse conditions the recognition performance of these algorithms typically drops sharply. Moreover, most existing studies take cloud computing as the research background. Intelligent driving has extremely demanding real-time requirements, and high real-time performance implies the generation, transmission, and processing of massive data; when a centralized cloud server faces such volumes, lagging processing and unstable latency easily occur, which undoubtedly poses a serious threat to the safety of intelligent driving.
In summary, although existing deep learning methods have achieved some results in the traffic sign detection task, they still have clear limitations in complex natural environments and real-time edge detection: when deployed on edge platforms, both inference speed and accuracy are low.
Therefore, for application scenarios with high-speed, high-intelligence, and high-real-time requirements such as intelligent driving, research on detection techniques with higher accuracy and speed remains significant.
Disclosure of Invention
In order to solve the above technical problems, the application provides a real-time traffic sign detection method based on improved YOLOv5, which improves detection accuracy as much as possible while keeping the network lightweight, so that the new model has both high accuracy and high speed.
In order to achieve the above object, the present application provides a real-time traffic sign detection method based on improved YOLOv5, comprising the following steps:
constructing a traffic sign data set;
improving the YOLOv5 model to obtain an original traffic sign detection model;
training an original traffic sign detection model based on the traffic sign dataset;
and detecting the information of the traffic sign in real time based on the trained traffic sign detection model.
Optionally, improving the YOLOv5 model comprises: adding a coordinate attention mechanism module into the backbone network of the YOLOv5 model.
Optionally, improving the YOLOv5 model further comprises: using an EIoU loss function.
Optionally, the backbone network of the YOLOv5 model after adding the coordinate attention mechanism module comprises: a Conv module, a C3 module, a coordinate attention mechanism module, and an SPPF module connected in sequence.
Optionally, the EIoU loss function is:

$$L_{EIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$

wherein $b$, $b^{gt}$ are the center points of the prediction box and the label box, respectively; $w^{gt}$, $h^{gt}$ and $w$, $h$ are the width and height of the label box and of the prediction box, respectively; $\rho$ denotes calculating the distance between the two center points; $c$ is the farthest distance between the two box boundaries; $C_w$ and $C_h$ respectively represent the width and height of the smallest enclosing box covering the two boxes; $IoU$ is the overlap ratio between the real box and the predicted box.
Optionally, training the original traffic sign detection model includes: training the coordinate attention mechanism module;
training the coordinate attention mechanism module includes:
the input traffic sign data is firstly subjected to a residual module and then divided into two parts, wherein one part is respectively subjected to average pooling in the X, Y direction, then is spliced by Concat, is subjected to Conv2d convolution, is subjected to batch normalization and nonlinear operation, is subjected to Conv2d and an activation function Sigmoid, and is finally connected with the other part to be used as output.
Optionally, training the original traffic sign detection model further includes:
Comparing the confidence, target box coordinates, and category information output by the original traffic sign detection model with the ground truth of the traffic sign sample images, and correcting the original traffic sign detection model.
Optionally, the traffic sign dataset comprises a training set and a test set; wherein the test set comprises: an original comprehensive test set, a bad weather test set and a night test set.
Compared with the prior art, the application has the following advantages and technical effects:
(1) Detection accuracy is improved as much as possible while the network stays lightweight, so the new model has both high accuracy and high speed; (2) the deep learning algorithm can be deployed on edge devices, alleviating the shortage of cloud-platform computing resources; (3) the method also raises detection speed on edge devices, improving the safety of autonomous and assisted driving vehicles.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a schematic flow chart of an improved traffic sign detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a model structure of an improved traffic sign detection method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the Coordinate Attention module according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the Jetson Nano inference device according to an embodiment of the present application;
fig. 5 is a schematic diagram of YOLOv5 structure according to an embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The application provides a real-time traffic sign detection method based on improved YOLOv5, which comprises the following steps:
1. In the backbone network part of YOLOv5, a coordinate attention mechanism, Coordinate Attention (CA for short), is added; the Conv modules, C3 modules, CA module, and SPPF module are connected in sequence to form the backbone network part of the YOLOv5 neural network;
Further, in step 1, in the Coordinate Attention module the input first passes through a residual module and is then divided into two parts: one part undergoes average pooling along the X and Y directions respectively, is spliced by Concat, convolved by Conv2d, passed through batch normalization and a nonlinear operation, then through another Conv2d and the Sigmoid activation function, and is finally recombined with the other part as the output; the detailed process is shown in fig. 3.
2. In the loss function of YOLOv5, the original localization loss function CIoU is changed to EIoU, modifying the loss function mechanism;
Further, in step 2, the EIoU loss function is calculated as follows:

$$L_{EIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$

wherein $b$, $b^{gt}$ are the center points of the prediction box and the label box, respectively; $w^{gt}$, $h^{gt}$ and $w$, $h$ are the width and height of the label box and of the prediction box, respectively; $\rho$ denotes calculating the distance between the two center points; $c$ is the farthest distance between the two box boundaries; $C_w$ and $C_h$ represent the width and height of the smallest enclosing box covering the two boxes. IoU, a calculation index commonly used in the localization loss, represents the overlap ratio between the real box and the predicted box, and its expression is:

$$IoU = \frac{\left| B \cap B^{gt} \right|}{\left| B \cup B^{gt} \right|}$$

wherein $B$ is the prediction box and $B^{gt}$ is the label box.
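As a further illustration, a hedged PyTorch sketch of this EIoU loss follows; the (x1, y1, x2, y2) box format and the small epsilon guards are implementation assumptions, not details specified in the patent.

```python
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """EIoU loss for boxes in (x1, y1, x2, y2) format, shape (..., 4)."""
    # Widths and heights of the prediction box and the label box.
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]

    # IoU: overlap ratio between the real box and the predicted box.
    iw = (torch.min(pred[..., 2], target[..., 2]) -
          torch.max(pred[..., 0], target[..., 0])).clamp(min=0)
    ih = (torch.min(pred[..., 3], target[..., 3]) -
          torch.max(pred[..., 1], target[..., 1])).clamp(min=0)
    inter = iw * ih
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Smallest enclosing box: width C_w, height C_h, squared diagonal c^2.
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance rho^2 between the center points b and b^gt.
    rho2 = ((pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) ** 2 +
            (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) ** 2) / 4

    # L_EIoU = 1 - IoU + rho^2/c^2 + (w - w_gt)^2/C_w^2 + (h - h_gt)^2/C_h^2
    return (1 - iou + rho2 / c2
            + (w1 - w2) ** 2 / (cw ** 2 + eps)
            + (h1 - h2) ** 2 / (ch ** 2 + eps))
```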
3. The improved YOLOv5 input network, YOLOv5 backbone network, YOLOv5 neck network, and YOLOv5 output network are connected in sequence; input pictures pass through to the output end, which yields information such as confidence, target box coordinates, and category, and through a series of operations this output is compared with the ground truth of the traffic sign sample images to correct the YOLOv5 traffic sign detection model;
4. The traffic sign dataset is divided according to a fixed proportion, each traffic sign sample image is taken as input, and the improved YOLOv5 network model is trained in depth using pre-trained weights, finally yielding the improved YOLOv5 traffic detection model;
5. The trained final model is packaged and deployed on an NVIDIA Jetson Nano; a camera captures the street view in front of the automobile while driving, inference is carried out in real time, and the type and position of traffic signs are detected.
Further, for the selected traffic sign dataset, the test set is subdivided into three subsets: an original comprehensive test set, a bad-weather test set, and a night test set.
Further, during training the input image size is set to 640x640, and the training set images are processed with methods such as Mosaic, MixUp, adaptive picture scaling, and adaptive anchor box computation.
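A minimal sketch of the adaptive picture scaling step is given below, under the assumption that it follows the usual letterbox recipe (resize preserving aspect ratio, then pad); the gray padding value of 114 is a common convention, not a value taken from the patent.

```python
import cv2
import numpy as np

def letterbox(img: np.ndarray, new_shape=(640, 640), color=(114, 114, 114)) -> np.ndarray:
    """Resize with preserved aspect ratio, then pad to new_shape (h, w)."""
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)     # scale so the image fits
    nh, nw = int(round(h * r)), int(round(w * r))
    img = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
    top = (new_shape[0] - nh) // 2                  # split padding evenly
    bottom = new_shape[0] - nh - top
    left = (new_shape[1] - nw) // 2
    right = new_shape[1] - nw - left
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=color)
```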
Further, in order to verify the feasibility of the algorithm of this embodiment, a performance test is carried out on the improved YOLOv5 neural network, measuring mean average precision (mAP), precision (P), and recall (R) on the test set, and inference time on Jetson Nano;
These are calculated according to the following formulas:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$AP = \int_0^1 P(R)\,dR, \qquad mAP = \frac{1}{m} \sum_{i=1}^{m} AP_i$$

where TP refers to samples predicted positive that are actually positive; FP refers to samples predicted positive that are actually negative; FN refers to samples predicted negative that are actually positive; AP refers to the average precision of a single category; m refers to the preset number of categories.
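Purely as an illustration of these relations, a toy Python helper follows (an assumption for clarity, not the patent's evaluation code); the per-class AP values are taken as given.

```python
def precision(tp: int, fp: int) -> float:
    # P = TP / (TP + FP)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    # R = TP / (TP + FN)
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_ap(ap_per_class) -> float:
    # mAP averages the per-class AP over the m preset categories.
    return sum(ap_per_class) / len(ap_per_class)

# Example: 90 TP, 10 FP, 30 FN -> P = 0.90, R = 0.75.
assert abs(precision(90, 10) - 0.90) < 1e-9
assert abs(recall(90, 30) - 0.75) < 1e-9
```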
After the target model is obtained, the target model can be used for image detection of the target to be detected.
Further, the improved network is first subjected to an ablation experiment on the original test set; the results are shown in Table 1. The mAP@0.5 of the improved YOLOv5 model reaches 78.2%; compared with the original YOLOv5 network, Precision, Recall, and mAP@0.5 are improved by 1.7%, 4.5%, and 4.3%, respectively, a good result. In addition, each individual improvement brings a certain accuracy gain over the original network.
TABLE 1
Further, in order to verify the detection accuracy, detection speed, comprehensive performance, and model memory footprint of the improved algorithm on different types of test sets, this embodiment selects six object detection algorithms (YOLOv5n, YOLOv5s, YOLOv5-ghost, YOLOv5-shufflenetv2, YOLOv5-mobilenetv3, and YOLOv3-tiny) for comparison experiments on the three datasets divided in this embodiment, at 640x384 resolution, evaluating each algorithm on four indexes: Precision, Recall, mAP@0.5, and inference time on Jetson Nano. The results of the comparison experiments are shown in Table 2.
It can be seen that the mAP of the model provided by this embodiment on the original test set, the bad-weather test set, and the night test set is 78.2%, 91.6%, and 81.9%, respectively, with an inference time of 58 ms. Although YOLOv5s attains the highest mAP, its detection speed on Jetson Nano is slow, differing from this embodiment's algorithm by about 70 ms. YOLOv5n has the shortest inference time on Jetson Nano, but its accuracy on all three test sets is lower than this embodiment's algorithm: on the original test set this embodiment improves accuracy by 4.3% over YOLOv5n while detection speed drops by only 4 ms, and on the bad-weather and night test sets this embodiment performs even better, with accuracy 3.9% and 5.4% higher than YOLOv5n. Against the other algorithms in the experiment, this embodiment's algorithm likewise shows better performance. The algorithm thus achieves a definite accuracy improvement while the embedded device detects in real time, performs well under extreme weather and poor night lighting, and proves its feasibility.
TABLE 2
YOLOv5 is currently the most mainstream single-stage object detection algorithm. According to network depth and width, it can be divided into five models: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Since the model is to be deployed on an embedded device, this embodiment selects YOLOv5n, the smallest, as the base network model.
Fig. 5 is the network structure diagram of YOLOv5; it can be seen that YOLOv5 is divided into 4 parts: Input, Backbone, Neck, and Prediction.
First, the Input stage enriches the dataset through Mosaic data enhancement, has low requirements on hardware, and finally feeds in 640x640 standard-size images.
Next, the Backbone part consists mainly of Conv, C3, and SPPF modules; it is the core of the network and is responsible for extracting features from the image. The Conv module combines Conv2d with BN and the Swish activation function and replaces the earlier Focus module, giving higher operating efficiency. The C3 module borrows the idea of CSPNet and replaces the earlier BottleneckCSP module: it splits the features into two paths and concatenates (Concat) them after processing, where the number of Bottlenecks on one path is determined by the model's network depth. SPPF improves on SPP by cascading several small pooling kernels instead of the single large kernels of the SPP module, further raising running speed while keeping the original function, i.e., fusing feature maps of different receptive fields to enrich their expressive power; a minimal SPPF sketch is given below.
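The cascaded-pooling idea can be sketched as follows; the channel sizes and the 5x5 kernel are assumptions in line with common YOLOv5 practice, not values given in the patent.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """SPPF sketch: one small max-pool applied three times in cascade; two
    chained 5x5 pools cover a 9x9 receptive field, three cover 13x13."""

    def __init__(self, c_in: int, c_out: int, k: int = 5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, kernel_size=1)
        self.cv2 = nn.Conv2d(c_mid * 4, c_out, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        # Fuse feature maps of different receptive fields by concatenation.
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```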
In addition, the Neck part consists mainly of FPN and PANet and fuses feature information of different scales. On the basis of FPN, PANet introduces a bottom-up path, so bottom-up feature fusion can follow the top-down fusion; the positional information of the bottom layers can thus be passed to the deep layers, strengthening localization capability at multiple scales.
Finally, the Prediction part expands the channel number of the feature maps of different scales obtained from the Neck through Conv; the expanded features comprise the abscissa and ordinate of the prediction box center point, its width, its height, and the confidence.
The loss function of YOLOv5 consists of three parts: classification loss, localization loss, and confidence loss.
Classification loss measures whether the class predicted for an anchor matches its calibrated class.
Localization loss measures the position error between the prediction box and the calibration box.
Confidence loss measures the confidence error of the network.
YOLOv5 calculates both the classification loss ($L_{cls}$) and the confidence loss ($L_{obj}$) with BCEWithLogitsLoss, whose per-element formula is:

$$BCE(p, y) = -\left[ y \log \sigma(p) + (1 - y) \log(1 - \sigma(p)) \right]$$
a commonly used computational index for the loss of localization in Yolov5 is IoU, which represents the overlap ratio between the real and predicted frames, expressed as:
the original Yolov5 uses CIoU as a positioning loss function, and adds an influence factor av on the basis of the penalty of DIoU, wherein the factor takes the aspect ratio of a prediction frame and the aspect ratio of a real frame into consideration, namely, the penalty term of CIoU is as follows:
therefore, the loss calculation formula of CIoU is:
where
wherein b, b gt The center points of the prediction frame and the label frame, w gt 、h gt W, h are the width and height of the label frame and the width and height of the predicted frame, respectively, ρ represents the distance of the center points of the two frames calculated, i.e., d in the lower graph, and c is the furthest distance of the two frame boundaries.
YOLOv5 algorithm with the Coordinate Attention module
The visual attention mechanism is a brain signal processing mechanism special to human vision. Because of the bottleneck in information processing, a human selects part of all available information while ignoring the rest. Similarly to this selective visual attention, the core purpose of attention mechanisms in neural networks is to select the information most relevant to the current task. By introducing an attention mechanism, each part of the input is given a different weight, strengthening important information, concentrating on what currently matters most and reducing attention to the rest, thereby lowering the computational load and improving model performance. In this embodiment, Coordinate Attention is added to the backbone network.
CA is a neural network attention mechanism proposed by Hou et al. in 2021. It not only captures cross-channel information but also encodes direction-aware and position-sensitive information, so the model can locate and identify target regions more accurately and localize small-target traffic signs more finely, effectively improving model accuracy while adding only a small amount of computation.
CA encodes channel relationships and long-range dependencies with precise positional information; the specific operation is divided into 2 steps: coordinate information embedding and Coordinate Attention generation. Its structure is shown in fig. 3.
First comes coordinate information embedding. Channel attention commonly encodes spatial information globally by global pooling, but compressing global spatial information into a channel descriptor makes positional information hard to preserve. To enable the attention module to capture long-range spatial interactions with precise positional information, global pooling is factorized into a pair of one-dimensional feature encoding operations according to the following formulas.
Given an input X, each channel is first encoded along the horizontal and vertical coordinates using pooling kernels of size (H, 1) and (1, W), respectively. The output of the c-th channel at height h can thus be expressed as:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i)$$

Likewise, the output of the c-th channel at width w can be written as:

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)$$

Then comes Coordinate Attention generation. After the transformations of the information-embedding step, this part concatenates the two outputs and transforms them with a convolutional transformation function:

$$f = \delta(F_1([z^h, z^w]))$$

$$g^h = \sigma(F_h(f^h))$$

$$g^w = \sigma(F_w(f^w))$$

Finally, the output Y of the Coordinate Attention block can be written as:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$
improvement of loss function
Although the CIoU adopted by the original algorithm accelerates prediction-box regression to a certain extent, problems remain: during regression, once the aspect ratios of the prediction box and the real box reach a linear proportion, the predicted w and h cannot increase or decrease simultaneously, and regression optimization cannot continue. Therefore, this embodiment replaces the CIoU loss function with an EIoU loss function.
The EIoU loss is calculated as given above, where $C_w$ and $C_h$ are the width and height of the smallest enclosing box covering the prediction box and the real box. It jointly considers the overlap area, the distance between the center points, and the true differences in width and height, and at the same time draws on Focal Loss to address the problems in the CIoU loss function, so the model converges faster, the regression process is more stable, and the regression accuracy of the prediction box improves.
In order to make the technical scheme of the present application better understood by those skilled in the art, the technical scheme of the present application will be described in further detail with reference to the accompanying drawings and examples.
As shown in fig. 1, this embodiment provides an improved traffic sign detection method based on improved YOLOv5 and Jetson Nano. The main idea of the technical scheme is as follows: obtain a standard traffic sign dataset comprising training set and test set images, all annotated with target box positions and traffic sign category information; process the test set to obtain three test sets for different scenes, namely an original comprehensive test set, a bad-weather test set, and a night test set; obtain the YOLOv5 network model, enhance the data with Mosaic and similar means, and re-derive suitable anchor box sizes with the K-means algorithm; modify the YOLOv5 model by introducing the coordinate attention mechanism Coordinate Attention into the backbone network of the original model, keeping the rest unchanged, to obtain the target training model; train and optimize the target model on the training set, setting SGD+Momentum training parameters and changing the localization loss function to EIoU; complete the optimization to obtain the traffic sign detection model; build a PyTorch deep learning framework on the Jetson Nano, deploy the trained network model on it, install and fix the embedded device on a vehicle, drive the vehicle, capture the front image with a dedicated camera, process the images in real time, and run inference with the improved YOLOv5 model to obtain the information of the signs in the actual scene, i.e., the position and category of the target traffic signs.
The specific implementation steps provided in this embodiment are as follows:
(1) The CCTSDB dataset is selected. It was created by Professor Zhang Jianming's team at the Hunan Provincial Key Laboratory of Intelligent Processing of Integrated Transportation Big Data, Changsha University of Science and Technology, and contains text files with the traffic sign positions and class labels of the images, stored in YOLO format (class, x, y, w, h), where class is the category label, x the ratio of the bbox center abscissa to the image width, y the ratio of the bbox center ordinate to the image height, w the ratio of the bbox width to the image width, and h the ratio of the bbox height to the image height. The dataset is divided according to its original proportions, and the test set is screened and re-divided into three test sets: an original test set, a bad-weather test set, and a night test set, with 1500 images in the original test set, 325 bad-weather images, and 423 night images. The CCTSDB images are Chinese street views captured by dashboard cameras; the dataset covers traffic sign images in a wide range of traffic environments and better matches real traffic scenes, making the detection results more persuasive and comparable.
(2) In the model input stage, the YOLOv5 network model is obtained, and several data processing methods, such as Mosaic, MixUp, and adaptive image scaling, are used to unify all pictures of the training and test sets to 640x640. In addition, if the anchors differ significantly from the target sizes, the K-means algorithm is used to find the most appropriate anchor sizes and use them for training (a hedged sketch is given below).
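The patent names K-means but not an implementation, so the IoU-based distance and the anchor count k=9 below are assumptions in line with common YOLO practice.

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 9, iters: int = 100) -> np.ndarray:
    """wh: (N, 2) array of labelled box widths and heights (pixels)."""
    wh = wh.astype(float)
    rng = np.random.default_rng(0)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        # Distance = 1 - IoU between each box and each anchor (both taken
        # as if sharing a corner), the usual YOLO clustering metric.
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0]) *
                 np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = (wh[:, None, 0] * wh[:, None, 1] +
                 centers[None, :, 0] * centers[None, :, 1] - inter)
        assign = np.argmax(inter / union, axis=1)      # highest IoU wins
        for j in range(k):
            if np.any(assign == j):                    # update each cluster mean
                centers[j] = wh[assign == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]   # sort anchors by area
```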
(3) The YOLOv5 network model is modified: Coordinate Attention is added to the Backbone while the rest remains unchanged, and the localization loss function is changed to EIoU, producing the target training model. The modified network model is shown in fig. 2, and the structure of the added Coordinate Attention module in fig. 3.
(4) The model parameters are trained on the training set with the SGD+Momentum optimization algorithm, finally yielding the target traffic sign detection model.
In order to apply the deep detection algorithm to autonomous vehicles while reducing the device's computational load and balancing detection speed and accuracy, this embodiment also provides an improved traffic sign detection technique based on YOLOv5 and Jetson Nano: an NVIDIA Jetson Nano is installed on the vehicle, real-time traffic sign detection is carried out during normal driving, and the available lightweight neural network model is deployed to the mobile terminal device, realizing real-time edge detection without relying on a cloud platform. The application steps are as follows:
(1) Train the target model and store the network parameters of each layer of the target model.
(2) Prepare the hardware environment. This embodiment adopts the embedded mobile platform NVIDIA Jetson Nano, as shown in fig. 4: a CSI camera, keyboard, screen, and mouse are connected to the Jetson Nano, the corresponding environment is installed on the hardware, and PyTorch is used as the deep learning framework.
(3) Deploy the YOLOv5 framework and the trained target model to the hardware device, set the relevant detection parameters, and unify the input picture size to 640x384 for detection (a minimal inference-loop sketch is given after these steps).
(4) After the relevant software is deployed, fix the Jetson Nano in front of the vehicle windshield, with the camera fixed at a certain height and facing forward.
(5) Drive the vehicle normally, start the Jetson Nano traffic sign detection function, and detect the traffic signs encountered in front of the vehicle.
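As referenced in step (3), the following is a minimal real-time inference-loop sketch for steps (3) to (5); the torch.hub loading call, the weight file name best.pt, the camera index, and the drawing code are illustrative assumptions, not the patent's exact deployment code.

```python
import cv2
import torch

# Load the trained weights (hypothetical file name) via the YOLOv5 hub API.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
model.conf = 0.25                         # confidence threshold (assumed)

cap = cv2.VideoCapture(0)                 # CSI/USB camera on the Jetson Nano
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[..., ::-1], size=640)   # BGR -> RGB; letterboxed internally
    # results.xyxy[0] rows: (x1, y1, x2, y2, confidence, class).
    for *xyxy, conf, cls in results.xyxy[0].tolist():
        x1, y1, x2, y2 = map(int, xyxy)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f'{int(cls)} {conf:.2f}', (x1, y1 - 4),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow('traffic signs', frame)
    if cv2.waitKey(1) == 27:              # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```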
Judging by the detection results, with the improved algorithm of this embodiment the Jetson Nano can detect an image in about 60 ms and correctly detect the position and type of the traffic signs in it. Under the premise of guaranteed accuracy, traffic signs ahead are thus detected in real time during normal driving, which benefits the planning of autonomous vehicles, helps them perceive their surroundings in a more real-time and complete way, and offers some technologies for autonomous vehicles to draw on.
The present application is not limited to the above embodiments; any changes or substitutions that can readily occur to those skilled in the art within the technical scope of the present application are intended to fall within its scope. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A real-time traffic sign detection method based on improved YOLOv5, comprising:
constructing a traffic sign data set;
improving the YOLOv5 model to obtain an original traffic sign detection model;
training an original traffic sign detection model based on the traffic sign dataset;
and detecting the information of the traffic sign in real time based on the trained traffic sign detection model.
2. The improved YOLOv5-based real-time traffic sign detection method of claim 1, wherein improving the YOLOv5 model comprises: adding a coordinate attention mechanism module into the backbone network of the YOLOv5 model.
3. The improved YOLOv5-based real-time traffic sign detection method of claim 2, wherein improving the YOLOv5 model further comprises: using an EIoU loss function.
4. The improved YOLOv5-based real-time traffic sign detection method of claim 2, wherein the backbone network of the YOLOv5 model after adding the coordinate attention mechanism module comprises: a Conv module, a C3 module, a coordinate attention mechanism module, and an SPPF module connected in sequence.
5. The improved YOLOv5-based real-time traffic sign detection method of claim 3, wherein the EIoU loss function is:

$$L_{EIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{C_w^2} + \frac{\rho^2(h, h^{gt})}{C_h^2}$$

wherein $b$, $b^{gt}$ are the center points of the prediction box and the label box, respectively; $w^{gt}$, $h^{gt}$ and $w$, $h$ are the width and height of the label box and of the prediction box, respectively; $\rho$ denotes calculating the distance between the two center points; $c$ is the farthest distance between the two box boundaries; $C_w$ and $C_h$ respectively represent the width and height of the smallest enclosing box covering the two boxes; $IoU$ is the overlap ratio between the real box and the predicted box.
6. The improved YOLOv5-based real-time traffic sign detection method of claim 3, wherein training the original traffic sign detection model comprises: training the coordinate attention mechanism module;
training the coordinate attention mechanism module comprises:
the input traffic sign data first passes through a residual module and is then divided into two parts: one part undergoes average pooling along the X and Y directions respectively, is spliced by Concat, passes through a Conv2d convolution, batch normalization and a nonlinear operation, then through another Conv2d and a Sigmoid activation function, and is finally recombined with the other part as the output.
7. The improved YOLOv5-based real-time traffic sign detection method of claim 1, wherein training the original traffic sign detection model further comprises:
comparing the confidence, target box coordinates, and category information output by the original traffic sign detection model with the ground truth of the traffic sign sample images, and correcting the original traffic sign detection model.
8. The improved YOLOv5-based real-time traffic sign detection method of claim 1, wherein the traffic sign dataset comprises a training set and a test set; wherein the test set comprises: an original comprehensive test set, a bad-weather test set, and a night test set.
CN202310480371.3A 2023-04-28 2023-04-28 Real-time traffic sign detection method based on improved YOLOv5 Pending CN116597413A (en)

Priority Applications (1)

Application Number: CN202310480371.3A
Title: Real-time traffic sign detection method based on improved YOLOv5

Publications (1)

Publication Number: CN116597413A (en); Publication Date: 2023-08-15

Family

ID=87594733

Family Applications (1)

Application Number: CN202310480371.3A; CN116597413A (en), Real-time traffic sign detection method based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN116597413A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333845A (en) * 2023-11-03 2024-01-02 东北电力大学 Real-time detection method for small target traffic sign based on improved YOLOv5s
CN117593512A (en) * 2023-12-05 2024-02-23 太原科技大学 Method, system and storage medium for detecting position of foam line of A/O pool in real time
CN117593512B (en) * 2023-12-05 2024-05-28 太原科技大学 Method, system and storage medium for detecting position of foam line of A/O pool in real time


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination