CN116863227A - Hazardous chemical vehicle detection method based on improved YOLOv5 - Google Patents

Hazardous chemical vehicle detection method based on improved YOLOv5

Info

Publication number
CN116863227A
CN116863227A
Authority
CN
China
Prior art keywords
representing
feature
channel
network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310834688.2A
Other languages
Chinese (zh)
Inventor
陈伯伦
朱鹏程
刘步实
许雪
戚梓凡
赵月
于翠莹
于永涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202310834688.2A priority Critical patent/CN116863227A/en
Publication of CN116863227A publication Critical patent/CN116863227A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The patent discloses a hazardous chemical vehicle detection method based on improved YOLOv5, which comprises the following specific steps: firstly, acquiring hazardous chemical vehicle images through road monitoring, annotating the images with Labelme, and converting the label format into the YOLO format; then preprocessing the image dataset; improving the YOLOv5 model by adding an attention mechanism to the backbone network and neck structure of the model, upgrading the neck into a cross-layer path aggregation network, and replacing the spatial pyramid pooling layer with an S-ASPP structure; designing a context feature enhancement network to strengthen the detection of small targets, and using feature refinement to filter noise features; adopting the SIoU loss in place of the original CIoU loss; and inputting the training set into the improved YOLOv5 model to train it and test the results. Compared with the prior art, the method can better identify and locate hazardous chemical vehicles and achieves a considerable improvement in precision.

Description

Hazardous chemical vehicle detection method based on improved YOLOv5
Technical Field
The application relates to the technical field of target detection, in particular to a dangerous chemical vehicle detection method based on an improved YOLOv5 algorithm.
Background
With the continuous development of industrialization, the demand for chemical raw materials keeps increasing. However, some chemical raw materials are explosive, corrosive, flammable, toxic, and so on; once a leakage occurs, they can harm human health and the surrounding environment. Ensuring the safety of hazardous chemicals is therefore important. Hazardous chemicals are mostly transported by road, and as the transport volume grows, the number of accidents involving hazardous chemical vehicles in transit has also risen, making transportation the highest risk in hazardous chemical safety. Consequently, to reduce casualties and property loss, in-transit supervision of hazardous chemical vehicles is indispensable.
In recent years, deep learning and machine vision techniques such as object detection and image segmentation have developed rapidly, providing a technological basis for hazardous chemical vehicle identification. With vehicle detection technology deployed on important roads, once a hazardous chemical vehicle is detected, the road management department can dynamically monitor it in real time, so that traffic accidents can be avoided or reduced, or timely and effective emergency rescue can be provided when a hazardous chemical transport accident occurs, secondary accidents are avoided to the greatest extent, and casualties and property loss are reduced.
In recent years, vehicle detection has attracted the attention of many researchers, and research on it continues to advance. Djenouri et al. propose an improved region-based convolutional neural network that first uses a SIFT extractor to remove noise (a set of outlier images) and then builds the improved network to detect vehicles of different scales. Li et al. propose a new condition-sensitive approach based on Fast R-CNN and Domain Adaptation (DA), making maximum use of labeled daytime images (source domain) to aid vehicle detection in unlabeled nighttime images (target domain), improving nighttime detection capability.
Aiming at the problem of rapid vehicle detection in traffic scenes, Chen et al. propose an improved Single Shot MultiBox Detector (SSD) algorithm: MobileNetv2 is selected as the backbone feature extraction network of the SSD to improve real-time performance, a channel attention mechanism is used for feature weighting, and a deconvolution module is used to construct a bottom-up feature fusion structure to improve detection precision. Kang et al. propose a fast detection method for moving vehicles in remote sensing satellite video with automatic region-of-interest constraint: first, the region of interest of the moving vehicle target is obtained quickly and automatically; second, under the region-of-interest constraint, fast detection of moving vehicles within the region is realized based on an improved Gaussian background difference method.
Among existing deep learning vehicle detection methods, two-stage algorithms achieve higher precision but run more slowly, while single-stage algorithms are faster but less precise. How to design a detection algorithm that is both accurate and fast is therefore very important.
Disclosure of Invention
The application aims to: aiming at the low precision of existing algorithms when detecting hazardous chemical vehicle targets, the application provides a hazardous chemical vehicle detection method based on an improved YOLOv5 algorithm that improves detection precision while maintaining detection speed.
The technical scheme is as follows: the application provides a hazardous chemical vehicle target detection method based on improved YOLOv5, characterized by comprising the following specific steps:
s1, acquiring dangerous chemical vehicle images through road monitoring to form a data set, and dividing the data set into a training set, a verification set and a test set;
s2, preprocessing pictures in the dangerous chemical vehicle data set;
s3, improving the YOLOv5 model, adding an attention mechanism to the backbone network and neck structure of the model, upgrading the neck structure into a cross-layer path aggregation network, and replacing the spatial pyramid pooling layer with an S-ASPP structure;
s4, designing a context feature enhancement network CAH and introducing it at the top of the backbone network to strengthen the detection of small targets, and using a feature refinement module FRB in the neck structure to filter noise features;
s5, modifying the loss function in YOLOv5, replacing the original CIoU loss with the SIoU loss;
s6, inputting the training set into the improved YOLOv5 model to train the model and test the result.
Further, the specific steps of the step S1 are as follows:
s1.1, acquiring a dangerous chemical vehicle image through road monitoring, and marking the image by using a Labelme marking tool;
s1.2, converting the annotated JSON label format into the YOLO format suitable for network input, generating id, x, y, w, h values and normalizing them;
s1.3, dividing the annotated vehicle dataset into training, validation, and test sets at a ratio of 8:1:1.
Further, the specific steps of the preprocessing in the step S2 are as follows:
s2.1, using a Tone mapping contrast enhancement method: calculating the average brightness of the current scene, selecting a suitable brightness domain according to the average brightness, and finally mapping the whole scene into that brightness domain to obtain the required result;
s2.2, using Mixup data enhancement to expand the data volume and strengthen the generalization capability of the network model;
s2.3, using Mosaic data enhancement, splicing 9 pictures together to strengthen the robustness of the model.
Further, the specific steps of adding the attention mechanism to the backbone network and neck structure of the model in the step S3 are as follows:
s3.1, introducing an attention mechanism into the backbone network and neck structure of the YOLOv5 model, suppressing useless information by weighting or hard-selecting parts of the feature maps; the attention module generates attention feature-map information in the channel and spatial dimensions in parallel, and the two kinds of feature-map information are then multiplied with the original input feature map for adaptive feature correction, producing the final attention feature map;
s3.2, the channel attention mechanism compresses the feature map in the spatial dimension: global average pooling and max pooling are applied to the input feature map over the width and height dimensions respectively, and cross-channel interaction is then performed through a one-dimensional convolution of size k to strengthen the connections among channels; the convolution kernel size k is defined as:

k = ψ(C) = |log₂(C)/γ + b/γ|_odd

wherein C is the number of channels of the feature map, γ and b take the values 2 and 1 respectively, and |·|_odd denotes the nearest odd number; the channel attention mechanism is specifically defined as:

M_c(F) = σ(Conv(AvgPool(F)) + Conv(MaxPool(F))) = σ(W_k(F_avg) + W_k(F_max))

wherein F denotes the input feature map, σ denotes the sigmoid activation function, F_avg and F_max denote the feature maps after global average pooling and global max pooling respectively, and W_k denotes the network weights;
s3.3, the spatial attention mechanism compresses the feature map in the channel dimension: channel-based global average pooling and max pooling are applied to the input feature map, the two resulting tensors are spliced along the channel direction, a convolution operation reduces the channels to 1, and a sigmoid activation generates the spatial attention feature, specifically defined as:

M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F_avg; F_max]))

wherein F denotes the input feature map, σ denotes the sigmoid activation function, and f^{7×7} denotes a convolution operation with a 7×7 kernel;
s3.4, using a cross-layer path aggregation network as the neck structure;
s3.5, replacing the SPPF layer with a newly designed spatial pyramid pooling layer, the S-ASPP layer.
Further, the S-ASPP structure in the step S3 uses different dilation rates to extract hazardous chemical vehicle information of different scales in the image, specifically:
the S-ASPP structure is divided into two parallel parts: the first part performs serial convolution through four 3×3 convolution layers with dilation rates {18, 12, 6, 1}, ordered from large to small, and splices the resulting feature information of four different receptive fields in the channel dimension; the second part sequentially performs pooling, convolution, and upsampling on the input original features; the resulting feature map is spliced with the first part's feature maps in the channel dimension, and a 1×1 convolution over the channel dimension yields the final S-ASPP output.
Further, the specific steps of the step S4 are as follows:
s4.1, designing a new context feature enhancement module CAH and introducing it at the top of the backbone network; the context feature enhancement module CAH consists of four context reasoning branches: the first and second branches use dilated convolution kernels with dilation rates 1 and 2 to access the local context; the third branch uses two dilated convolution kernels with dilation rates {2, 4}, and the fourth branch uses two convolution kernels with dilation rates {3, 6}, to perceive larger context information; finally, the outputs of the four parallel branches are spliced and fused in the channel dimension, and the final output is obtained through a 1×1 convolution with dilation rate 1;
s4.2, designing a feature refinement module FRB to filter noise features in the cross-layer path aggregation network, integrated into the neck structure; the feature refinement module FRB first unifies the features output by the prediction heads into feature maps of the same size, then splices and fuses them; the fused features then enter a spatial purification module and a channel purification module respectively, and the outputs of the two purification modules are finally added and fused to produce the final result; the channel purification module aggregates the input features over the spatial dimensions, combining global average pooling and global max pooling, and is specifically defined as:

wherein X_m denotes the input of the m-th layer (m = 1, 2, 3) of the feature refinement module, X_{n→m}(x, y) denotes the result of the n-th to m-th layers at position (x, y), Y_m(x, y) denotes the output of the m-th layer at position (x, y), and a_m, b_m, c_m denote the channel-adaptive weights, of size 1×1;
the spatial purification module uses a softmax function to generate weights under each channel, with the branches specifically defined as:

wherein Y(x, y) denotes the final output feature at position (x, y), k denotes the number of input feature-map channels, c denotes the channel, X^k_{n→m}(x, y) denotes the value of the k-th channel feature map from the n-th to m-th layers at position (x, y), and w^k_{n→m}(x, y) denotes the spatially adaptive weights.
Further, the specific steps of the step S5 are as follows:
s5.1, calculating the target confidence loss from the samples obtained by positive-sample matching, which comprises two parts: the target confidence score p_o in the prediction box, and the IoU value between the prediction box and its corresponding target box; the binary cross entropy of the two gives the final target confidence loss, specifically defined as:

wherein l_obj denotes the confidence loss, C_i denotes the true value, Ĉ_i denotes the predicted value, λ_noobj denotes the weight for non-object targets, S² denotes the number of grid cells, and B denotes the number of anchors in each cell;
s5.2, calculating the category loss from the category score of the prediction box and the one-hot encoding of the target box category, specifically defined as:

wherein l_cls denotes the classification loss, c denotes the category, and P(c) denotes the prediction box's category score for that category;
s5.3, replacing the CIoU in the target box loss with the SIoU; the SIoU consists of four parts: angle, distance, shape, and IoU, specifically defined as:

Loss_SIoU = 1 − IoU + (Δ + Ω)/2

the shape cost is defined as:

Ω = Σ_{t∈{w,h}} (1 − e^{−ω_t})^θ, ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

wherein Ω denotes the shape cost, w, h and w^gt, h^gt are the width and height of the prediction box and the ground-truth box respectively, and θ denotes the importance of the shape loss;

the angle cost is defined as:

Λ = 1 − 2·sin²(arcsin(c_h/σ) − π/4)

wherein σ denotes the distance between the center points of the ground-truth box and the prediction box, c_h denotes the height difference between the two center points, α denotes the angle between the line connecting the two points and the horizontal, (b^gt_cx, b^gt_cy) denotes the center of the ground-truth box, and (b_cx, b_cy) denotes the center of the prediction box;

the distance cost, redefined in terms of the angle cost, is:

Δ = Σ_{t∈{x,y}} (1 − e^{−(2−Λ)ρ_t}), ρ_x = ((b^gt_cx − b_cx)/c_w)², ρ_y = ((b^gt_cy − b_cy)/c_h)²

wherein (b^gt_cx, b^gt_cy) denotes the center point of the ground-truth box, (b_cx, b_cy) denotes the center point of the prediction box, and c_w, c_h are the width and height of the minimum enclosing rectangle of the ground-truth and prediction boxes.
Further, the specific steps of the step S6 are as follows:
s6.1, inputting the training set into the improved YOLOv5 algorithm model to train the model and test the result on the test set;
and S6.2, deploying the trained model on a server, and monitoring the dangerous chemical vehicle in real time.
Compared with the prior art, the hazardous chemical vehicle detection method based on improved YOLOv5 of the present application improves detection precision while maintaining speed, and is characterized in that:
(1) The Tone mapping contrast enhancement method is used to improve the distinguishability of the vehicle from the image background, and Mixup and Mosaic data enhancement are then used to expand the dataset.
(2) The YOLOv5 network is improved: an attention mechanism is added to the backbone network and neck structure of the model, the neck structure is upgraded into a cross-layer path aggregation network, and the spatial pyramid pooling layer is replaced with an S-ASPP structure. A context feature enhancement network (CAH) is introduced at the top of the backbone network to strengthen the detection of small targets, and a feature refinement module (FRB) is used in the neck structure to filter noise features.
(3) The loss function in YOLOv5 is modified, and the SIoU loss is used for replacing the original CIoU loss.
Aiming at the problem of hazardous chemical vehicle detection precision, the application first uses the Tone mapping contrast enhancement method to strengthen target contrast; second, Mixup and Mosaic data enhancement increase the volume of the training set and improve the robustness of the model; an attention mechanism is added to the network model so that it attends more to vehicle information, the path aggregation network is improved so that vehicle feature information is not easily lost, and the spatial pyramid pooling layer is replaced; context feature enhancement is then introduced so that the model can capture fine features, and feature refinement is used to filter out ineffective features; finally, the SIoU loss is used as the loss function for the target box. Compared with methods of the same type, this method effectively improves the model's vehicle detection precision.
Drawings
FIG. 1 is an overall flow diagram of the algorithm;
FIGS. 2-4 are block diagrams of added attention mechanisms;
FIG. 5 is a diagram of a replaced spatial pyramid S-ASPP layer;
FIG. 6 is a diagram of a context feature enhanced network (CAH) architecture;
FIG. 7 is a feature refinement module (FRB) block diagram;
FIG. 8 is a diagram of the improved YOLOv5 model.
Detailed Description
The application is further elucidated below in connection with the drawings and specific embodiments. It is to be understood that these examples are intended only to illustrate the application and not to limit its scope; various equivalent modifications made by those skilled in the art after reading the application will fall within the scope defined by the appended claims.
As shown in fig. 1, the specific steps of the application are as follows:
s1, acquiring dangerous chemical vehicle images through road monitoring to serve as training data for training an improved YOLOv5 network, wherein the method comprises the following specific steps of:
s1.1, acquiring dangerous chemical vehicle images through road monitoring, and marking the images by using a Labelme marking tool.
S1.2, converting the annotated JSON label format into the YOLO format suitable for network input, generating id, x, y, w, h values and normalizing them (a conversion sketch follows step S1.3).
S1.3, dividing the annotated vehicle dataset into training, validation, and test sets at a ratio of 8:1:1.
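As an illustration of step S1.2, the following is a minimal conversion sketch. It assumes Labelme rectangle annotations; the class map, paths, and function name are hypothetical.

```python
import json
from pathlib import Path

CLASS_IDS = {"hazmat_vehicle": 0}  # assumed label map

def labelme_to_yolo(json_path: str, out_dir: str) -> None:
    """Convert one Labelme rectangle-annotation file to a YOLO txt file."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    img_w, img_h = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        (x1, y1), (x2, y2) = shape["points"][:2]
        # Normalize center x/y and width/height to [0, 1], as YOLO expects.
        cx = (x1 + x2) / 2 / img_w
        cy = (y1 + y2) / 2 / img_h
        w = abs(x2 - x1) / img_w
        h = abs(y2 - y1) / img_h
        lines.append(f"{CLASS_IDS[shape['label']]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    out_path = Path(out_dir) / (Path(json_path).stem + ".txt")
    out_path.write_text("\n".join(lines), encoding="utf-8")
```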
S2, preprocessing pictures in the dangerous chemical vehicle data set, wherein the specific steps are as follows:
S2.1, using a Tone mapping contrast enhancement method to improve the distinguishability of the vehicle from the image background: the average brightness of the current scene is calculated, a suitable brightness domain is selected according to it, and the whole scene is finally mapped into that brightness domain to obtain the required result. The log-average scene luminance is specifically defined as:

L̄_w = exp( (1/N) Σ_{x,y} log(δ + L_w(x, y)) )

wherein L_w(x, y) denotes the pixel brightness at position (x, y), N denotes the number of pixels in the scene, and δ is a very small number used to handle pure-black pixels.
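A minimal sketch of this preprocessing step, assuming the log-average formulation above and a simple global mapping operator (the key value 0.18 and the function names are illustrative assumptions):

```python
import numpy as np

def log_average_luminance(lum: np.ndarray, delta: float = 1e-6) -> float:
    """Log-average luminance of an HxW map; delta guards log(0) on black pixels."""
    return float(np.exp(np.mean(np.log(delta + lum))))

def tone_map(lum: np.ndarray, key: float = 0.18, delta: float = 1e-6) -> np.ndarray:
    """Scale scene luminance by the log-average, then compress into [0, 1)."""
    scaled = key * lum / log_average_luminance(lum, delta)
    return scaled / (1.0 + scaled)
```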
S2.2, using Mixup data enhancement to expand the data volume and strengthen the generalization capability of the network model.
S2.3, using Mosaic data enhancement, splicing 9 pictures together to strengthen the robustness of the model.
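For illustration, a minimal Mixup sketch for detection data; the beta parameter and the convention of keeping both label sets for the mixed image are assumptions (Mosaic splicing is longer and omitted here):

```python
import numpy as np

def mixup(img_a, labels_a, img_b, labels_b, alpha: float = 8.0):
    """Blend two images; the mixed image keeps the boxes of both sources."""
    lam = np.random.beta(alpha, alpha)
    mixed = (lam * img_a.astype(np.float32)
             + (1.0 - lam) * img_b.astype(np.float32)).astype(img_a.dtype)
    return mixed, np.concatenate([labels_a, labels_b], axis=0)
```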
S3, improving the YOLOv5 model, adding an attention mechanism to the backbone network and neck structure of the model, and replacing the spatial pyramid pooling layer with the S-ASPP structure, with the following specific steps:
S3.1, introducing an attention mechanism into the backbone network and neck structure of the YOLOv5 model, suppressing useless information by weighting or hard-selecting parts of the feature maps. The attention module generates attention feature-map information in the channel and spatial dimensions in parallel, and the two kinds of feature-map information are then multiplied with the original input feature map for adaptive feature correction, producing the final attention feature map.
S3.2, the channel attention mechanism compresses the feature map in the spatial dimension: global average pooling and max pooling are applied to the input feature map over the width and height dimensions respectively, and cross-channel interaction is then performed through a one-dimensional convolution of size k to strengthen the connections among channels. The convolution kernel size k is defined as:

k = ψ(C) = |log₂(C)/γ + b/γ|_odd

wherein C is the number of channels of the feature map, γ and b take the values 2 and 1 respectively, and |·|_odd denotes the nearest odd number. The channel attention mechanism is specifically defined as:

M_c(F) = σ(Conv(AvgPool(F)) + Conv(MaxPool(F))) = σ(W_k(F_avg) + W_k(F_max))

wherein F denotes the input feature map, σ denotes the sigmoid activation function, F_avg and F_max denote the feature maps after global average pooling and global max pooling respectively, and W_k denotes the network weights.
s3.3, the spatial attention mechanism is to compress the feature map in the channel dimension, make the input feature map global average pooling and maximum pooling based on the channel, splice the two obtained tensors in the channel direction, reduce the channel to 1 by convolution operation, and activate to generate the spatial attention feature through sigmoid. Specifically defined as:
Ms(F)=σ(f 7x7 ([Avgpool(F);Maxpool(F)]))
=σ(f 7x7 ([F avg ;F max ]))
wherein F represents an input feature map, sigma represents a sigmoid activation function, F 7x7 Representing a convolution operation with a convolution kernel size of 7 x 7.
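A PyTorch sketch of the two attention branches described in S3.2 and S3.3; the adaptive kernel-size rule follows the formula above, while the class names and placement in the network are assumptions.

```python
import math
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Avg- and max-pooled channel descriptors through a shared k-sized 1-D conv."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1  # round to the nearest odd kernel size
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):  # x: (B, C, H, W) -> weights: (B, C, 1, 1)
        avg = self.conv(x.mean(dim=(2, 3)).unsqueeze(1))
        mx = self.conv(x.amax(dim=(2, 3)).unsqueeze(1))
        return torch.sigmoid(avg + mx).transpose(1, 2).unsqueeze(-1)

class SpatialAttention(nn.Module):
    """Channel-wise avg/max maps concatenated, 7x7 conv, sigmoid."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):  # x: (B, C, H, W) -> weights: (B, 1, H, W)
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(s))
```

An input feature map would be multiplied by the channel weights and then by the spatial weights, matching the adaptive correction described in S3.1.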
S3.4, using a cross-layer path aggregation network as the neck structure.
S3.5, replacing the SPPF layer with a newly designed spatial pyramid pooling layer, the S-ASPP layer. The S-ASPP structure uses different dilation rates to extract hazardous chemical vehicle information of different scales in the image. It is divided into two parallel parts: the first part performs serial convolution through four 3×3 convolution layers with dilation rates {18, 12, 6, 1}, ordered from large to small, and splices the resulting feature information of four different receptive fields in the channel dimension; the second part sequentially performs pooling, convolution, and upsampling on the input original features, the resulting feature map is spliced with the first part's feature maps in the channel dimension, and a 1×1 convolution over the channel dimension yields the final S-ASPP output.
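A minimal PyTorch sketch of the S-ASPP layer as described; the channel widths, the absence of normalization, and the global pooling granularity are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SASPP(nn.Module):
    """Serial 3x3 dilated convs (rates 18, 12, 6, 1) whose intermediate outputs
    are all kept, plus a pool-conv-upsample branch, fused by a 1x1 conv."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Conv2d(c_in, c_in, 3, padding=r, dilation=r) for r in (18, 12, 6, 1))
        self.pool_conv = nn.Conv2d(c_in, c_in, 1)
        self.fuse = nn.Conv2d(5 * c_in, c_out, 1)

    def forward(self, x):
        feats, y = [], x
        for stage in self.stages:  # serial chain; each stage adds a receptive field
            y = stage(y)
            feats.append(y)
        p = F.adaptive_avg_pool2d(x, 1)          # pooling
        p = self.pool_conv(p)                    # convolution
        p = F.interpolate(p, size=x.shape[2:])   # upsampling back to input size
        feats.append(p)
        return self.fuse(torch.cat(feats, dim=1))  # 1x1 fusion over channels
```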
S4, introducing a new module to enable the model to pay more attention to fine features, wherein the method comprises the following specific steps of:
S4.1, designing a new context feature enhancement module (CAH) and introducing it at the top of the backbone network. The context feature enhancement module consists of four context reasoning branches: the first and second branches use dilated convolution kernels with dilation rates 1 and 2 to access the local context; the third branch uses two dilated convolution kernels with dilation rates {2, 4}, and the fourth branch uses two convolution kernels with dilation rates {3, 6}, to perceive larger context information. Finally, the outputs of the four parallel branches are spliced and fused in the channel dimension, and the final output is obtained through a 1×1 convolution with dilation rate 1.
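A sketch of the four-branch module under these dilation rates; keeping the channel count constant in each branch is an assumption.

```python
import torch
import torch.nn as nn

class CAH(nn.Module):
    """Four dilated-conv context branches (rates 1; 2; 2->4; 3->6),
    concatenated along channels and fused by a 1x1 convolution."""
    def __init__(self, c: int):
        super().__init__()
        def dconv(r):  # 3x3 conv with padding equal to its dilation rate
            return nn.Conv2d(c, c, 3, padding=r, dilation=r)
        self.b1, self.b2 = dconv(1), dconv(2)
        self.b3 = nn.Sequential(dconv(2), dconv(4))
        self.b4 = nn.Sequential(dconv(3), dconv(6))
        self.fuse = nn.Conv2d(4 * c, c, 1)

    def forward(self, x):
        y = torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
        return self.fuse(y)
```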
S4.2, designing a feature refinement module to filter noise features in the cross-layer path aggregation network, integrated into the neck structure. The feature refinement module first unifies the features output by the prediction heads into feature maps of the same size, then splices and fuses them. The fused features then enter the spatial purification module and the channel purification module respectively, and the outputs of the two purification modules are finally added and fused to produce the final result. The channel purification module aggregates the input features over the spatial dimensions, combining global average pooling and global max pooling. Specifically defined as:

wherein X_m denotes the input of the m-th layer (m = 1, 2, 3) of the feature refinement module, X_{n→m}(x, y) denotes the result of the n-th to m-th layers at position (x, y), Y_m(x, y) denotes the output of the m-th layer at position (x, y), and a_m, b_m, c_m denote the channel-adaptive weights, of size 1×1;
the spatial purification module uses a softmax function to generate weights under each channel, with the branches specifically defined as:

wherein Y(x, y) denotes the final output feature at position (x, y), k denotes the number of input feature-map channels, c denotes the channel, X^k_{n→m}(x, y) denotes the value of the k-th channel feature map from the n-th to m-th layers at position (x, y), and w^k_{n→m}(x, y) denotes the spatially adaptive weights.
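The purification equations are not reproduced above, so the following is only a simplified sketch of the described flow — resize and concatenate the pyramid levels, a channel branch built from global avg/max pooling, a spatial branch built from a channel-wise softmax — with all module and parameter names assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FRB(nn.Module):
    """Simplified feature refinement: fuse three levels, then refine with a
    channel-purification branch and a spatial-purification branch, summed."""
    def __init__(self, channels: int):
        super().__init__()
        self.squeeze = nn.Conv2d(3 * channels, channels, 1)
        self.channel_fc = nn.Conv2d(channels, channels, 1)

    def forward(self, feats):  # feats: list of three (B, C, Hi, Wi) maps
        size = feats[0].shape[2:]
        x = torch.cat([F.interpolate(f, size=size) for f in feats], dim=1)
        x = self.squeeze(x)
        # Channel purification: per-channel weights from avg+max descriptors.
        c_w = torch.sigmoid(self.channel_fc(
            F.adaptive_avg_pool2d(x, 1) + F.adaptive_max_pool2d(x, 1)))
        # Spatial purification: softmax across channels gives positional weights.
        s_w = torch.softmax(x, dim=1)
        return x * c_w + x * s_w
```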
S5, modifying a loss function in the Yolov5, and replacing the original CIoU loss by adopting the SIoU loss, wherein the specific steps are as follows:
S5.1, calculating the target confidence loss from the samples obtained by positive-sample matching, which comprises two parts: the target confidence score p_o in the prediction box, and the IoU value between the prediction box and its corresponding target box; the binary cross entropy of the two gives the final target confidence loss. Specifically defined as:

wherein l_obj denotes the confidence loss, C_i denotes the true value, Ĉ_i denotes the predicted value, λ_noobj denotes the weight for non-object targets, S² denotes the number of grid cells, and B denotes the number of anchors in each cell.
S5.2, calculating the category loss from the category score of the prediction box and the one-hot encoding of the target box category. Specifically defined as:

wherein l_cls denotes the classification loss, c denotes the category, and P(c) denotes the prediction box's category score for that category.
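The confidence- and category-loss equations are not reproduced above; as an illustration of the pattern they describe (binary cross entropy against IoU-valued objectness targets and one-hot class targets, with down-weighted no-object cells), here is a hedged sketch — the tensor layouts and the no-object weight are assumptions:

```python
import torch
import torch.nn.functional as F

def objectness_and_class_loss(pred_obj, target_iou, pred_cls, target_onehot,
                              lambda_noobj: float = 0.5):
    """BCE objectness loss with IoU-valued targets, plus BCE class loss.
    pred_* are raw logits; shapes are (N,) and (N, num_classes)."""
    obj_mask = target_iou > 0  # cells matched to a ground-truth box
    bce = F.binary_cross_entropy_with_logits(pred_obj, target_iou, reduction="none")
    l_obj = bce[obj_mask].sum() + lambda_noobj * bce[~obj_mask].sum()
    l_cls = F.binary_cross_entropy_with_logits(
        pred_cls[obj_mask], target_onehot[obj_mask], reduction="sum")
    return l_obj, l_cls
```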
S5.3, replacing the CIoU in the target box loss with the SIoU. The SIoU consists of four parts: angle, distance, shape, and IoU. Specifically defined as:

Loss_SIoU = 1 − IoU + (Δ + Ω)/2

wherein the shape cost is defined as:

Ω = Σ_{t∈{w,h}} (1 − e^{−ω_t})^θ, ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

wherein Ω denotes the shape cost, w, h and w^gt, h^gt are the width and height of the prediction box and the ground-truth box respectively, and θ denotes the importance of the shape loss.

The angle cost is defined as:

Λ = 1 − 2·sin²(arcsin(c_h/σ) − π/4)

wherein σ denotes the distance between the center points of the ground-truth box and the prediction box, c_h denotes the height difference between the two center points, α denotes the angle between the line connecting the two points and the horizontal, (b^gt_cx, b^gt_cy) denotes the center of the ground-truth box, and (b_cx, b_cy) denotes the center of the prediction box.

The distance cost, redefined in terms of the angle cost, is:

Δ = Σ_{t∈{x,y}} (1 − e^{−(2−Λ)ρ_t}), ρ_x = ((b^gt_cx − b_cx)/c_w)², ρ_y = ((b^gt_cy − b_cy)/c_h)²

wherein (b^gt_cx, b^gt_cy) denotes the center point of the ground-truth box, (b_cx, b_cy) denotes the center point of the prediction box, and c_w, c_h are the width and height of the minimum enclosing rectangle of the ground-truth and prediction boxes.
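A self-contained sketch of the SIoU box loss following the standard formulation given above; corner-format (x1, y1, x2, y2) boxes and θ = 4 are assumptions.

```python
import math
import torch

def siou_loss(pred, target, theta: float = 4.0, eps: float = 1e-7):
    """SIoU = 1 - IoU + (distance + shape)/2; boxes are (N, 4) x1y1x2y2 tensors."""
    # IoU term
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = iw * ih
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Centers and the minimum enclosing rectangle
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Angle cost: based on the elevation of the center-to-center line
    sigma = torch.sqrt((tcx - pcx) ** 2 + (tcy - pcy) ** 2) + eps
    sin_alpha = (torch.abs(tcy - pcy) / sigma).clamp(-1, 1)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, re-weighted by the angle cost
    gamma = 2 - angle
    rho_x = ((tcx - pcx) / (cw + eps)) ** 2
    rho_y = ((tcy - pcy) / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # Shape cost
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    omega_w = torch.abs(pw - tw) / torch.max(pw, tw).clamp(min=eps)
    omega_h = torch.abs(ph - th) / torch.max(ph, th).clamp(min=eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    return 1 - iou + (dist + shape) / 2
```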
S6, inputting the training set into the improved YOLOv5 algorithm model to train the model and test the results, with the following specific steps:
s6.1, inputting the training set into the improved YOLOv5 algorithm model to train the model and test the result on the test set;
s6.2, deploying the trained model on a server, and detecting the dangerous chemical vehicle in real time.
The application can be deployed at a mobile terminal to realize real-time detection of dangerous chemical vehicles.
The foregoing embodiments are merely illustrative of the technical concept and features of the present application, and are intended to enable those skilled in the art to understand the present application and to implement the same, not to limit the scope of the present application. All equivalent changes or modifications made according to the spirit of the present application should be included in the scope of the present application.

Claims (8)

1. A hazardous chemical vehicle target detection method based on improved YOLOv5, characterized by comprising the following specific steps:
s1, acquiring dangerous chemical vehicle images through road monitoring to form a data set, and dividing the data set into a training set, a verification set and a test set;
s2, preprocessing pictures in the dangerous chemical vehicle data set;
s3, improving the YOLOv5 model, adding an attention mechanism to the backbone network and neck structure of the model, upgrading the neck structure into a cross-layer path aggregation network, and replacing the spatial pyramid pooling layer with an S-ASPP structure;
s4, designing a context feature enhancement network CAH and introducing it at the top of the backbone network to strengthen the detection of small targets, and using a feature refinement module FRB in the neck structure to filter noise features;
s5, modifying the loss function in YOLOv5, replacing the original CIoU loss with the SIoU loss;
s6, inputting the training set into the improved YOLOv5 model to train the model and test the result.
2. The hazardous chemical vehicle target detection method based on improved YOLOv5 of claim 1, wherein the specific steps of the step S1 are as follows:
s1.1, acquiring a dangerous chemical vehicle image through road monitoring, and marking the image by using a Labelme marking tool;
s1.2, converting the annotated JSON label format into the YOLO format suitable for network input, generating id, x, y, w, h values and normalizing them;
s1.3, dividing the annotated vehicle dataset into training, validation, and test sets at a ratio of 8:1:1.
3. The hazardous chemical vehicle target detection method based on improved YOLOv5 of claim 1, wherein the specific steps of the preprocessing in the step S2 are as follows:
s2.1, using a Tone mapping contrast enhancement method: calculating the average brightness of the current scene, selecting a suitable brightness domain according to the average brightness, and finally mapping the whole scene into that brightness domain to obtain the required result;
s2.2, using Mixup data enhancement to expand the data volume and strengthen the generalization capability of the network model;
s2.3, using Mosaic data enhancement, splicing 9 pictures together to strengthen the robustness of the model.
4. The hazardous chemical vehicle target detection method based on improved YOLOv5 of claim 1, wherein the specific steps of adding the attention mechanism to the backbone network and neck structure of the model in the step S3 are as follows:
s3.1, introducing an attention mechanism into the backbone network and neck structure of the YOLOv5 model, suppressing useless information by weighting or hard-selecting parts of the feature maps; the attention module generates attention feature-map information in the channel and spatial dimensions in parallel, and the two kinds of feature-map information are then multiplied with the original input feature map for adaptive feature correction, producing the final attention feature map;
s3.2, the channel attention mechanism compresses the feature map in the spatial dimension: global average pooling and max pooling are applied to the input feature map over the width and height dimensions respectively, and cross-channel interaction is then performed through a one-dimensional convolution of size k to strengthen the connections among channels; the convolution kernel size k is defined as:

k = ψ(C) = |log₂(C)/γ + b/γ|_odd

wherein C is the number of channels of the feature map, γ and b take the values 2 and 1 respectively, and |·|_odd denotes the nearest odd number; the channel attention mechanism is specifically defined as:

M_c(F) = σ(Conv(AvgPool(F)) + Conv(MaxPool(F))) = σ(W_k(F_avg) + W_k(F_max))

wherein F denotes the input feature map, σ denotes the sigmoid activation function, F_avg and F_max denote the feature maps after global average pooling and global max pooling respectively, and W_k denotes the network weights;
s3.3, the spatial attention mechanism compresses the feature map in the channel dimension: channel-based global average pooling and max pooling are applied to the input feature map, the two resulting tensors are spliced along the channel direction, a convolution operation reduces the channels to 1, and a sigmoid activation generates the spatial attention feature, specifically defined as:

M_s(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])) = σ(f^{7×7}([F_avg; F_max]))

wherein F denotes the input feature map, σ denotes the sigmoid activation function, and f^{7×7} denotes a convolution operation with a 7×7 kernel;
s3.4, using a cross-layer path aggregation network as the neck structure;
s3.5, replacing the SPPF layer with a newly designed spatial pyramid pooling layer, the S-ASPP layer.
5. The hazardous chemical vehicle target detection method based on improved YOLOv5 of claim 1 or 4, wherein the S-ASPP structure in the step S3 uses different dilation rates to extract hazardous chemical vehicle information of different scales in the image, specifically:
the S-ASPP structure is divided into two parallel parts: the first part performs serial convolution through four 3×3 convolution layers with dilation rates {18, 12, 6, 1}, ordered from large to small, and splices the resulting feature information of four different receptive fields in the channel dimension; the second part sequentially performs pooling, convolution, and upsampling on the input original features; the resulting feature map is spliced with the first part's feature maps in the channel dimension, and a 1×1 convolution over the channel dimension yields the final S-ASPP output.
6. The hazardous chemical vehicle target detection method based on improved YOLOv5 of claim 1, wherein the specific steps of the step S4 are as follows:
s4.1, designing a new context feature enhancement module CAH and introducing it at the top of the backbone network; the context feature enhancement module CAH consists of four context reasoning branches: the first and second branches use dilated convolution kernels with dilation rates 1 and 2 to access the local context; the third branch uses two dilated convolution kernels with dilation rates {2, 4}, and the fourth branch uses two convolution kernels with dilation rates {3, 6}, to perceive larger context information; finally, the outputs of the four parallel branches are spliced and fused in the channel dimension, and the final output is obtained through a 1×1 convolution with dilation rate 1;
s4.2, designing a feature refinement module FRB to filter noise features in the cross-layer path aggregation network, integrated into the neck structure; the feature refinement module FRB first unifies the features output by the prediction heads into feature maps of the same size, then splices and fuses them; the fused features then enter a spatial purification module and a channel purification module respectively, and the outputs of the two purification modules are finally added and fused to produce the final result; the channel purification module aggregates the input features over the spatial dimensions, combining global average pooling and global max pooling, and is specifically defined as:

wherein X_m denotes the input of the m-th layer (m = 1, 2, 3) of the feature refinement module, X_{n→m}(x, y) denotes the result of the n-th to m-th layers at position (x, y), Y_m(x, y) denotes the output of the m-th layer at position (x, y), and a_m, b_m, c_m denote the channel-adaptive weights, of size 1×1;
the spatial purification module uses a softmax function to generate weights under each channel, with the branches specifically defined as:

wherein Y(x, y) denotes the final output feature at position (x, y), k denotes the number of input feature-map channels, c denotes the channel, X^k_{n→m}(x, y) denotes the value of the k-th channel feature map from the n-th to m-th layers at position (x, y), and w^k_{n→m}(x, y) denotes the spatially adaptive weights.
7. The hazardous chemical vehicle target detection method based on improved YOLOv5 of claim 1, wherein the specific steps of the step S5 are as follows:
s5.1, calculating the target confidence loss from the samples obtained by positive-sample matching, which comprises two parts: the target confidence score p_o in the prediction box, and the IoU value between the prediction box and its corresponding target box; the binary cross entropy of the two gives the final target confidence loss, specifically defined as:

wherein l_obj denotes the confidence loss, C_i denotes the true value, Ĉ_i denotes the predicted value, λ_noobj denotes the weight for non-object targets, S² denotes the number of grid cells, and B denotes the number of anchors in each cell;
s5.2, calculating the category loss from the category score of the prediction box and the one-hot encoding of the target box category, specifically defined as:

wherein l_cls denotes the classification loss, c denotes the category, and P(c) denotes the prediction box's category score for that category;
s5.3, replacing the CIoU in the target box loss with the SIoU; the SIoU consists of four parts: angle, distance, shape, and IoU, specifically defined as:

Loss_SIoU = 1 − IoU + (Δ + Ω)/2

the shape cost is defined as:

Ω = Σ_{t∈{w,h}} (1 − e^{−ω_t})^θ, ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

wherein Ω denotes the shape cost, w, h and w^gt, h^gt are the width and height of the prediction box and the ground-truth box respectively, and θ denotes the importance of the shape loss;

the angle cost is defined as:

Λ = 1 − 2·sin²(arcsin(c_h/σ) − π/4)

wherein σ denotes the distance between the center points of the ground-truth box and the prediction box, c_h denotes the height difference between the two center points, α denotes the angle between the line connecting the two points and the horizontal, (b^gt_cx, b^gt_cy) denotes the center of the ground-truth box, and (b_cx, b_cy) denotes the center of the prediction box;

the distance cost, redefined in terms of the angle cost, is:

Δ = Σ_{t∈{x,y}} (1 − e^{−(2−Λ)ρ_t}), ρ_x = ((b^gt_cx − b_cx)/c_w)², ρ_y = ((b^gt_cy − b_cy)/c_h)²

wherein (b^gt_cx, b^gt_cy) denotes the center point of the ground-truth box, (b_cx, b_cy) denotes the center point of the prediction box, and c_w, c_h are the width and height of the minimum enclosing rectangle of the ground-truth and prediction boxes.
8. The hazardous chemical vehicle target detection method based on improved YOLOv5 of claim 1, wherein the specific steps of the step S6 are as follows:
s6.1, inputting the training set into the improved YOLOv5 algorithm model to train the model and test the result on the test set;
and S6.2, deploying the trained model on a server, and monitoring the dangerous chemical vehicle in real time.
CN202310834688.2A 2023-07-07 2023-07-07 Hazardous chemical vehicle detection method based on improved YOLOv5 Pending CN116863227A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310834688.2A CN116863227A (en) 2023-07-07 2023-07-07 Hazardous chemical vehicle detection method based on improved YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310834688.2A CN116863227A (en) 2023-07-07 2023-07-07 Hazardous chemical vehicle detection method based on improved YOLOv5

Publications (1)

Publication Number Publication Date
CN116863227A true CN116863227A (en) 2023-10-10

Family

ID=88228133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310834688.2A Pending CN116863227A (en) 2023-07-07 2023-07-07 Hazardous chemical vehicle detection method based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN116863227A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117935186A (en) * 2024-03-25 2024-04-26 福建省高速公路科技创新研究院有限公司 Method for identifying dangerous goods vehicles in tunnel under strong light inhibition



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination