WO2022083784A1 - A road detection method based on the Internet of Vehicles - Google Patents
A road detection method based on the Internet of Vehicles
- Publication number
- WO2022083784A1 (PCT/CN2021/130684)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords: network, yolov3, feature, module, target
Classifications
- G06V20/588—Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V10/255—Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/75—Organisation of the matching processes; coarse-fine approaches, e.g. multi-scale approaches
- G06V10/762—Pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Pattern recognition or machine learning using neural networks
- G06V10/95—Hardware or software architectures specially adapted for image or video understanding, structured as a network, e.g. client-server architectures
- G06V2201/08—Detecting or categorising vehicles
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Combinations of networks
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N5/02—Knowledge representation; symbolic representation
- H04L12/40—Bus networks
- H04L2012/40273—Bus for use in transportation systems, the transportation system being a vehicle
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. networks in vehicles
Definitions
- the invention belongs to the field of image detection, and in particular relates to a road detection method based on the Internet of Vehicles and a vehicle-mounted electronic device.
- the embodiments of the present invention provide a road detection method based on the Internet of Vehicles and an in-vehicle electronic device.
- the specific technical solutions are as follows:
- an embodiment of the present invention provides a road detection method based on the Internet of Vehicles, which is applied to a vehicle terminal, including:
- the method includes: acquiring the target road image captured by the image acquisition end; inputting the target road image into the improved YOLOv3 network obtained by pre-training, and using the backbone network in the form of dense connection to perform feature extraction on the target road image to obtain x feature maps of different scales, where x is a natural number greater than or equal to 4;
- using the improved FPN network to perform top-down, densely connected feature fusion on the x feature maps of different scales to obtain the prediction results corresponding to each scale; and, based on all prediction results, obtaining the attribute information of the target road image, the attribute information including the position and category of the target in the target road image;
- the improved YOLOv3 network includes the densely connected backbone network and the improved FPN network; the improved YOLOv3 network is formed on the basis of the YOLOv3 network by replacing the residual modules in the backbone network with dense connection modules, increasing the feature extraction scale, optimizing the feature fusion method of the FPN network, and performing pruning combined with knowledge-distillation-guided network recovery processing.
- an embodiment of the present invention provides an in-vehicle electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
- the processor is configured to implement the steps of any one of the road detection methods based on the Internet of Vehicles provided in the first aspect when executing the program stored in the memory.
- the residual modules in the backbone network of the YOLOv3 network are replaced with dense connection modules, and the feature fusion mode is changed from parallel to serial, so that when the backbone network performs feature extraction, early feature maps can be used directly as the input of each subsequent layer; the resulting feature maps carry more information and feature transfer is strengthened, so detection accuracy is improved when the target road image is detected.
- at the same time, the number of parameters and the amount of computation can be reduced by reusing the feature map parameters of the shallow network.
- adding finer-grained feature extraction scales for small targets among the multiple feature extraction scales improves the detection accuracy of small targets in target road images.
- the feature fusion method of the FPN network is changed: the feature maps extracted by the backbone network are fused in a top-down, densely connected manner, and deep features are directly upsampled by different multiples so that all transferred feature maps have the same size.
- in this way, high-dimensional semantic information also participates in the shallow network, which helps to improve detection accuracy; at the same time, by directly receiving features from shallower layers, more concrete features are obtained, which effectively reduces feature loss, reduces the amount of parameters to be calculated, improves detection speed, and enables real-time detection.
- through pruning combined with knowledge-distillation-guided network recovery, the network volume is reduced and most of the redundant computation is eliminated, which greatly improves detection speed while maintaining detection accuracy.
- the invention deploys the detection process of the cloud on edge devices with very limited storage and computing resources; the vehicle-mounted device can thus realize road detection beyond the line of sight, achieving high-precision and highly real-time detection of targets on the road, which is beneficial to the driver's safe driving.
- FIG. 1 is a schematic flowchart of a road detection method based on the Internet of Vehicles provided by an embodiment of the present invention
- FIG. 2 is a schematic structural diagram of a YOLOv3 network in the prior art
- FIG. 3 is a schematic structural diagram of an improved YOLOv3 network provided by an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of a transition module provided by an embodiment of the present invention.
- Figure 5-1 is a comparison diagram of mAP curves between YOLOv3 and Dense-YOLO-1 of an embodiment of the present invention
- Figure 5-2 is a comparison diagram of loss curves of YOLOv3 and Dense-YOLO-1 of an embodiment of the present invention
- Figure 6-1 is a comparison diagram of mAP curves of Dense-YOLO-1 and MultiScale-YOLO-1 according to an embodiment of the present invention
- Figure 6-2 is a comparison diagram of loss curves of Dense-YOLO-1 and MultiScale-YOLO-1 according to an embodiment of the present invention
- Figure 7-1 is a comparison diagram of mAP curves of Dense-YOLO-1 and Dense-YOLO-2 according to an embodiment of the present invention
- Figure 7-2 is a comparison diagram of loss curves of Dense-YOLO-1 and Dense-YOLO-2 according to an embodiment of the present invention
- Figure 8-1 is a comparison diagram of mAP curves of Dense-YOLO-1 and MultiScale-YOLO-2 according to an embodiment of the present invention
- Figure 8-2 is a comparison diagram of loss curves of Dense-YOLO-1 and MultiScale-YOLO-2 according to an embodiment of the present invention
- FIG. 9-1 is a weight shift diagram of parameter combination 5 selected in an embodiment of the present invention
- FIG. 9-2 is a weight overlap diagram of parameter combination 5 selected in an embodiment of the present invention
- FIG. 10 is a performance comparison diagram of an improved YOLOv3 network (YOLO-Terse) and a YOLOv3 network according to an embodiment of the present invention
- FIG. 11 is a schematic structural diagram of an in-vehicle electronic device according to an embodiment of the present invention.
- the embodiments of the present invention provide a road detection method based on the Internet of Vehicles and an in-vehicle electronic device.
- the execution subject of the road detection method based on the Internet of Vehicles may be a road detection apparatus based on the Internet of Vehicles, and the apparatus may run on the in-vehicle electronic device.
- the apparatus may be a plug-in in an image processing tool, or a program independent of an image processing tool, which is of course not limited thereto.
- an embodiment of the present invention provides a road detection method based on the Internet of Vehicles.
- the road detection method based on the Internet of Vehicles will be introduced first.
- a road detection method based on the Internet of Vehicles provided by an embodiment of the present invention, applied to a vehicle terminal, may include the following steps:
- the target road image is an image of the road area captured by an image acquisition device at the image acquisition end.
- the image acquisition terminal can be other vehicles, pedestrians, road facilities, service platforms, etc. that are connected to the current vehicle through Internet of Vehicles technology.
- the image acquisition terminal can also be elevated road facilities such as street light poles and overpasses on the roadside, or flying equipment such as drones; image acquisition devices are deployed on these image acquisition terminals.
- the image capturing device may include a still camera, a video camera, a webcam, a mobile phone, etc.; in an optional embodiment, the image capturing device may be a high-resolution camera.
- the image acquisition device can continuously collect road images of the corresponding area at certain time intervals, such as shooting at a rate of 30fps, and the image acquisition terminal where it is located transmits the collected road images to the corresponding vehicle.
- the time interval can also be adjusted according to the density of objects on the road or according to demand.
- a major problem in the Internet of Vehicles is the beyond-line-of-sight problem: since a driver's sight distance on the road is limited, road conditions beyond the sight distance cannot be observed with the naked eye, and the sight distance is even more limited when there are large vehicles or intersections ahead.
- the Internet of Vehicles needs to solve the problem of beyond the line of sight, so that drivers can obtain information on road conditions beyond the line of sight and adjust their driving plans as soon as possible.
- the size of the target road image is 416×416×3. Therefore, in this step, in one embodiment, the vehicle-mounted terminal can directly obtain a target road image of size 416×416×3 from the image acquisition terminal; in another embodiment, the vehicle-mounted terminal can obtain an image of any size sent by the image acquisition terminal and then scale it to obtain a target road image of size 416×416×3, as sketched below.
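- by way of illustration, a minimal sketch of this scaling step (OpenCV and the [0, 1] normalization are assumptions, not details specified in the patent):

```python
# Illustrative preprocessing sketch; OpenCV and the normalization
# convention are assumptions, not details specified in the patent.
import cv2
import numpy as np

def to_network_input(image: np.ndarray, size: int = 416) -> np.ndarray:
    """Scale an arbitrary HxWx3 road image to the size x size x 3 network input."""
    resized = cv2.resize(image, (size, size), interpolation=cv2.INTER_LINEAR)
    return resized.astype(np.float32) / 255.0
```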
- image enhancement operations such as cropping, splicing, smoothing, filtering, and edge filling can also be performed on the acquired image, so as to enhance the features of interest in the image and expand the generalization ability of the data set.
- the network structure of the YOLOv3 network in the prior art is introduced.
- the part inside the dashed box is the YOLOv3 network.
- the part in the dotted line box is the backbone network of the YOLOv3 network, that is, the darknet-53 network; the backbone network of the YOLOv3 network is composed of a CBL module and five resn modules connected in series.
- the CBL module is a convolutional network module, including a serially connected conv (convolutional) layer, a BN (Batch Normalization) layer, and a Leaky ReLU activation layer; CBL stands for conv + BN + Leaky ReLU.
- the resn module is a residual module, and n represents a natural number, as shown in Figure 2.
- the resn module includes a serially connected zero padding layer, a CBL module, and a residual unit group; the residual unit group is denoted Res unit*n, meaning that it includes n residual units (Res unit), and each residual unit adopts the connection form of residual networks (ResNets), as sketched below.
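- an illustrative PyTorch sketch of the CBL module and a residual unit (channel widths and the LeakyReLU slope of 0.1 follow common darknet-53 conventions and are assumptions, not values from the patent):

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """conv + BN + LeakyReLU, as CBL is defined above."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return self.block(x)

class ResUnit(nn.Module):
    """Res unit: two CBLs whose output is added to the input (parallel fusion)."""
    def __init__(self, c: int):
        super().__init__()
        self.body = nn.Sequential(CBL(c, c // 2, k=1), CBL(c // 2, c, k=3))

    def forward(self, x):
        return x + self.body(x)  # summation, the ResNets connection form
```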
- the rest of the main network is the FPN (Feature Pyramid Network) network; the FPN network is divided into three prediction branches Y1 to Y3, whose scales correspond one-to-one with the scales of the feature maps output, along the reverse direction of the input, by the three residual modules res4, res8, and res8.
- the prediction results of each prediction branch are represented by Y1, Y2, and Y3, respectively, and the scales of Y1, Y2, and Y3 increase in sequence.
- Each prediction branch of the FPN network includes a convolutional network module group, specifically including five convolutional network modules, namely CBL*5 in Figure 2.
- the US (up sampling) module is an upsampling module; concat indicates that feature fusion adopts a cascade method (concat is short for concatenate).
- the improved YOLOv3 network includes a densely connected backbone network and an improved FPN network; it is formed on the basis of the YOLOv3 network by replacing the residual modules in the backbone network with dense connection modules, adding a feature extraction scale, optimizing the feature fusion method of the FPN network, and performing pruning combined with knowledge-distillation-guided network recovery processing. The improved YOLOv3 network is obtained by training on sample road images and the positions and categories of the targets in those images; the network training process will be introduced later.
- the structure of the improved YOLOv3 network is introduced below, starting with its backbone network part.
- FIG. 3 is a schematic diagram of the structure of the improved YOLOv3 network provided by the embodiment of the present invention; the part inside the dot-dash box in FIG. 3 is the backbone network.
- compared with the backbone network of the YOLOv3 network, one improvement idea of the backbone network provided by the embodiment of the present invention is to draw on the connection method of the dense convolutional network DenseNet and propose a specific dense connection module to replace the residual module (resn module) of the YOLOv3 backbone; that is, the improved YOLOv3 network adopts a backbone network in the form of dense connection. It is known that ResNets combine features by summation before passing them on, that is, feature fusion in a parallel manner.
- in the dense connection method, all layers (with matching feature map sizes) are directly connected to each other to ensure that information flows between layers in the network to the greatest extent. Specifically, each layer takes all the feature maps of its preceding layers as its input, and its own feature maps are used as the input of all subsequent layers; that is, feature fusion adopts a cascade method (also called a concatenation method). Therefore, compared with the residual modules used in the YOLOv3 network, switching to dense connection modules allows the improved YOLOv3 network to obtain feature maps with more information, which enhances feature propagation and improves detection accuracy when performing road image detection.
- the feature maps are transferred from shallow to deep, and feature maps of at least four scales are extracted, so that the network can detect objects of different scales.
- the detection accuracy can be improved for small targets.
- the small targets in the embodiments of the present invention include objects with small volumes on the road, such as road signs, small obstacles, small animals, etc., or objects with small areas in the image due to the long shooting distance.
- the backbone network in the form of dense connection may include:
- Densely connected modules and transition modules in series at intervals; the densely connected modules are denoted as denm in Figure 3.
- the number of densely connected modules is y; each densely connected module includes a serially connected convolutional network module and a densely connected unit group; the convolutional network module includes serially connected convolutional, BN, and Leaky ReLU layers; the densely connected unit group includes m densely connected units; each densely connected unit includes multiple convolutional network modules connected in dense-connection form, whose output feature maps are fused in a cascaded manner; y is greater than or equal to 4, and m is a natural number greater than 1.
- the number of densely connected modules in Figure 3 is 5.
- the improved YOLOv3 network composed of 5 densely connected modules has higher accuracy.
- the convolutional network module is denoted as CBL as before; the densely connected unit group is denoted as den unit*m, which means that the densely connected unit group includes m densely connected units, and m can be 2.
- Each dense connection unit is represented as a den unit; it includes multiple convolutional network modules connected in the form of dense connections, and uses a cascaded method to fuse the feature maps output by multiple convolutional network modules.
- the cascaded method is concat, which means tensor splicing; this operation is different from the add operation in the residual module: concat expands the (channel) dimension of the tensor, while add sums directly and does not change the dimension, as the sketch below shows.
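- an illustrative comparison of the two fusion operations on tensors of arbitrary shape:

```python
# add keeps the channel dimension; concat expands it.
import torch

a = torch.randn(1, 64, 52, 52)
b = torch.randn(1, 64, 52, 52)

added = a + b                       # residual-style add: (1, 64, 52, 52)
stacked = torch.cat([a, b], dim=1)  # dense-style concat: (1, 128, 52, 52)
print(added.shape, stacked.shape)
```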
- the dense connection module thus changes the feature fusion method from parallel to serial, which allows early feature maps to be used directly as the input of each subsequent layer to strengthen feature transfer, and reduces the number of parameters and the amount of computation by reusing the feature map parameters of the shallow network.
- the backbone network in the form of dense connection extracts feature maps of at least 4 scales for the feature fusion of the subsequent prediction branches; therefore, the number y of dense connection modules is greater than or equal to 4, so that the feature maps it outputs are fused into the corresponding prediction branches. It can be seen that, compared with the YOLOv3 network, the improved YOLOv3 network adds at least one finer-grained feature extraction scale to the backbone network: referring to Figure 3, the feature map output by the fourth dense connection module along the reverse direction of the input is added for subsequent feature fusion.
- the backbone network in the form of dense connection outputs corresponding feature maps respectively along the input reverse direction of the four dense connection modules, and the scales of these four feature maps increase in turn.
- the scales of each feature map are 13 ⁇ 13 ⁇ 72, 26 ⁇ 26 ⁇ 72, 52 ⁇ 52 ⁇ 72, and 104 ⁇ 104 ⁇ 72, respectively.
- five feature extraction scales may also be set, that is, the feature map output by the fifth dense connection module in the reverse direction of the input is added to perform subsequent feature fusion, and so on.
- step S2 x feature maps of different scales are obtained, including:
- the feature maps respectively output by the first dense connection module to the fourth dense connection module along the input reverse direction are obtained, and the size of these four feature maps increases in turn.
- the transition module is a convolutional network module. That is to use the CBL module as a transition module. Then, when building the backbone network of the improved YOLOv3 network, it is only necessary to replace the residual module with a dense connection module, and then connect the dense connection module and the original CBL module in series. In this way, the network construction process will be faster, and the resulting network structure will be simpler.
- such a transition module only uses the convolutional layer for transition, that is, the feature map is dimensionally reduced directly by increasing the stride. This only attends to local-area features and cannot combine the information of the entire image, so much information in the feature map is lost.
- in another implementation, the transition module includes a convolutional network module and a maximum pooling layer; the input of the convolutional network module and the input of the maximum pooling layer are shared, and the feature map output by the convolutional network module and the feature map output by the maximum pooling layer are fused in a cascaded manner.
- FIG. 4 is a schematic structural diagram of a transition module according to an embodiment of the present invention.
- the transition module is represented as a tran module, and the MP layer is a maximum pooling (Maxpool) layer.
- the step size of the MP layer can be selected to be 2.
- the introduced MP layer can reduce the dimension of the feature map with a larger receptive field; it uses relatively few parameters, so the amount of computation does not increase much, which reduces the possibility of overfitting and improves the generalization ability of the network model. Combined with the original CBL module, this can be regarded as dimensionality reduction of feature maps from different receptive fields, so more information can be retained.
- optionally, the number of convolutional network modules included in the transition module is two or three, connected in series; compared with using one convolutional network module, this increases the complexity of the model and extracts features more fully. A sketch of the transition module follows.
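- an illustrative PyTorch sketch of the tran module (a single CBL branch is shown for brevity, though as noted above two or three may be connected in series; channel counts are assumptions):

```python
import torch
import torch.nn as nn

class Transition(nn.Module):
    """tran module: a strided CBL branch and an MP branch share one input;
    their outputs are fused by concatenation (cascade fusion)."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.cbl = nn.Sequential(                        # convolutional branch, stride 2
            nn.Conv2d(c_in, c_out, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1),
        )
        self.mp = nn.MaxPool2d(kernel_size=2, stride=2)  # MP branch, stride 2 as above

    def forward(self, x):
        # both branches halve the spatial size, so concat is well-defined
        return torch.cat([self.cbl(x), self.mp(x)], dim=1)
```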
- the improved FPN network includes x prediction branches Y1 to Yx whose scales increase in turn; the scales of the prediction branches Y1 to Yx correspond one-to-one with the scales of the x feature maps. Exemplarily, the improved FPN network of FIG. 3 has 4 prediction branches Y1 to Y4, whose scales correspond to the scales of the aforementioned 4 feature maps respectively.
- the improved FPN network is used to perform top-down, densely connected feature fusion on feature maps of x different scales, including:
- for a prediction branch Yi, its feature map after convolution processing is cascaded and fused with the feature maps that the prediction branches Yi-1 to Y1 have respectively upsampled and transferred to it;
- taking the prediction branch Y3 as an example, the three feature maps to be fused have the same size, 52×52×72; in this way, the prediction branch can continue with convolution and other processing after the cascade fusion to obtain the prediction result Y3, whose size is 52×52×72.
- as for the prediction branch Y1, it obtains the feature map output by the first dense connection module along the reverse direction of the input and performs the subsequent prediction process by itself; it does not receive feature maps from the other prediction branches for fusion.
- in the feature fusion method of the FPN network of the original YOLOv3 network, the deep and shallow network features are first added and then upsampled together; after the features are added, they must pass through a convolutional layer to extract feature maps, and such operations destroy some of the original feature information.
- the feature fusion combines the horizontal method and the top-down dense connection method.
- the original top-down method thus becomes: each smaller-scale prediction branch transfers its own feature map to every larger-scale prediction branch, and the feature fusion method becomes a dense fusion method, that is, the deep features are directly upsampled by different multiples so that all transferred feature maps have the same size, as sketched below.
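- an illustrative sketch of this dense top-down fusion for the 52×52 branch (channel counts and nearest-neighbor upsampling are assumptions):

```python
# Deeper maps are upsampled directly by different multiples (x4, x2) so
# that all transferred maps share one size, then concatenated.
import torch
import torch.nn.functional as F

f13 = torch.randn(1, 72, 13, 13)   # deepest feature map
f26 = torch.randn(1, 72, 26, 26)
f52 = torch.randn(1, 72, 52, 52)   # local map of the 52x52 branch

fused_52 = torch.cat([
    F.interpolate(f13, scale_factor=4, mode="nearest"),
    F.interpolate(f26, scale_factor=2, mode="nearest"),
    f52,
], dim=1)
print(fused_52.shape)  # torch.Size([1, 216, 52, 52])
```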
- after fusion, each prediction branch mainly uses convolution operations to perform prediction; for details, reference may be made to the related prior art, which will not be described here.
- the above-mentioned top-down and dense connection mode feature fusion can be respectively adopted for the improved YOLOv3 network that adopts two different forms of transition modules.
- this step is implemented in a modified YOLOv3 network employing the transition module shown in FIG. 4 .
- the improved YOLOv3 network refers to the network obtained in Figure 3 combined with Figure 4.
- four prediction branches output feature maps of four scales, which are 13 ⁇ 13 ⁇ 72, 26 ⁇ 26 ⁇ 72, 52 ⁇ 52 ⁇ 72, and 104 ⁇ 104 ⁇ 72 respectively.
- the smallest 13 ⁇ 13 ⁇ 72 feature map has the largest receptive field and is suitable for larger target detection
- the medium 26 ⁇ 26 ⁇ 72 feature map has a medium receptive field and is suitable for detecting medium-sized targets
- the large 52 ⁇ 52 ⁇ 72 feature map has a smaller receptive field and is suitable for detecting smaller targets
- the largest 104×104×72 feature map has the smallest receptive field and is suitable for detecting the smallest targets.
- it can be seen that the embodiment of the present invention divides images more finely, and the prediction results are more targeted for objects of smaller size.
- Network training is done in the server, and network training can include three processes: network pre-training, network pruning, and network fine-tuning. Specifically, the following steps may be included:
- each sample road image is marked in the form of a target frame containing the target, this target frame is true and accurate, and each target frame is marked with coordinate information to reflect the target's position in the image.
- determining the anchor box sizes in the sample road images may include the following steps:
- the anchor boxes are several boxes of different sizes obtained by statistics or clustering over the ground truth of the training set; the anchor boxes actually constrain the range of the predicted objects and add size priors, so as to achieve multi-scale learning.
- since the embodiment of the present invention adds a finer-grained feature extraction scale, a clustering method is needed to cluster the sizes of the target frames (that is, the real frames) marked in the sample road images, so as to obtain anchor box sizes suitable for the scene of the embodiment of the present invention.
- determining the number of clusters for clustering the anchor box sizes in the sample road images includes:
- This step is actually to obtain the size of each target frame in the sample road image.
- the size of each target box may be clustered using the K-Means clustering method to obtain a clustering result of the anchor box size; the clustering process will not be repeated here.
- for different anchor boxes, the distance is defined as the Euclidean distance of their widths and heights:
- d_{1,2} = \sqrt{(w_1 - w_2)^2 + (h_1 - h_2)^2}
- where d_{1,2} represents the Euclidean distance between two anchor boxes, w_1 and w_2 represent the widths of the anchor boxes, and h_1 and h_2 represent their heights.
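- an illustrative K-Means sketch over the ground-truth (w, h) pairs using this Euclidean width/height distance (the initialization and iteration cap are assumptions; k = 12 matches the twelve anchor sizes listed below):

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 12, iters: int = 100, seed: int = 0):
    """wh is an (N, 2) array of ground-truth box widths and heights."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign every box to its nearest center under d_{1,2}
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([wh[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers.prod(axis=1))]  # sorted by box area
```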
- in the embodiment of the present invention, the clustering results of the anchor box sizes can be: (13,18), (20,27), (26,40), (38,35), (36,61), (56,45), (52,89), (70,61), (85,89), (69,155), (127,112), (135,220). Specifically:
- anchor box sizes for prediction branch Y1: (69,155), (127,112), (135,220);
- anchor box sizes for prediction branch Y2: (52,89), (70,61), (85,89);
- anchor box sizes for prediction branch Y3: (38,35), (36,61), (56,45);
- anchor box sizes for prediction branch Y4: (13,18), (20,27), (26,40).
- the clustering result is written into the configuration file of each prediction branch of the road image detection network according to the anchor box size corresponding to different prediction branches, and then the network can be pre-trained.
- pre-training the built network including the following steps:
- the residual modules in the backbone network are changed to dense connection modules, the feature extraction scale is increased, and the feature fusion method of the FPN network is optimized, yielding a complex network;
- layer pruning is performed on the dense connection modules of the backbone network to obtain the YOLOv3-1 network;
- channel pruning is usually performed directly when simplifying the YOLOv3 network, but the inventor found in experiments that it is difficult to achieve a rapid speed improvement through channel pruning alone; therefore, a layer pruning process is added before channel pruning.
- specifically, this step performs layer pruning on the dense connection modules of the backbone network of the aforementioned complex network, that is, layer pruning on the number m of dense connection units in each dense connection module, reducing m to 2 to obtain the YOLOv3-1 network.
- the YOLOv3-1 network is sparsely trained to obtain a YOLOv3-2 network with sparse distribution of BN layer scaling coefficients; it may include:
- the YOLOv3-1 network is sparsely trained; during the training process, sparse regularization is added to the BN-layer scaling factors γ.
- the loss function of sparse training is the normal detection loss plus an L1 sparsity penalty on the BN scaling factors, weighted by a sparsity coefficient λ:
- L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} |\gamma|
- the application scenario of the embodiment of the present invention is road target detection, and the number of target types to be detected can be set to 13, far fewer than the 80 in the original YOLOv3 data set. Therefore, a larger value of λ can be selected without making the convergence of sparse training very slow, and convergence can be further accelerated by increasing the learning rate of the model; however, if these parameters are chosen too large, the accuracy of the network model suffers a certain loss.
- after comparison, the combination of a learning rate of 0.25× and a λ of 0.1× is finally determined as the preferred parameter combination for sparse training.
- the preferred combination of learning rate and weight in the embodiment of the present invention is more favorable to the distribution of weights after sparse training, and the accuracy of the network model is also higher. A sketch of one common way to apply the sparsity term during training follows.
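- a minimal PyTorch sketch of one common realization of the L1 term: after loss.backward(), the subgradient of λ·|γ| is added to every BN scaling-factor gradient (the λ value here is illustrative, not the patent's):

```python
import torch.nn as nn

def add_bn_sparsity_grad(model: nn.Module, lam: float = 0.01) -> None:
    """Add the subgradient of lam * |gamma| to all BN scaling factors."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * m.weight.data.sign())
```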
- pruning a channel basically corresponds to removing all incoming and outgoing connections of that channel, and a lightweight network can be directly obtained without using any special sparse computing package.
- scaling factors act as a proxy for channel selection; since they are co-optimized with network weights, the network can automatically identify insignificant channels that can be safely removed without greatly affecting generalization performance.
- this step may include the following steps:
- the channel pruning ratio may be 60%.
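- a minimal PyTorch sketch of selecting the channels to prune by a global threshold on the BN scaling factors (taking the 60% quantile as the threshold follows the ratio above; the patent's exact selection strategy may differ):

```python
import torch
import torch.nn as nn

def channel_keep_masks(model: nn.Module, ratio: float = 0.6):
    """Keep only channels whose |gamma| is at or above the ratio quantile."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    thresh = torch.quantile(gammas, ratio)  # 60% of channels fall below
    return {name: m.weight.data.abs() >= thresh
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```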
- the YOLOv3-3 network is subjected to knowledge distillation to obtain an improved YOLOv3 network.
- knowledge distillation is introduced into the YOLOv3-3 network, the aforementioned complex network is used as the teacher network, and the YOLOv3-3 network is used as the student network.
- the teacher network guides the student network to restore and adjust the accuracy, and obtain an improved YOLOv3 network.
- specifically, the output before the Softmax layer of the aforementioned complex network can be divided by the temperature coefficient to soften the predicted values finally output by the teacher network; the student network then uses the softened predicted values as labels to assist in training the YOLOv3-3 network, so that the accuracy of the YOLOv3-3 network finally approaches that of the teacher network. The temperature coefficient is a preset value and does not change with network training.
- the reason for introducing the temperature parameter T is that, for a trained network with high accuracy, the classification result of the input data is basically the same as the real label.
- for example, if the real training label is [1, 0, 0], the predicted result may be [0.95, 0.02, 0.03], which is very close to the real label value. Therefore, for the student network, there is little difference between using the classification results of the teacher network to assist training and directly using the data for training.
- the temperature parameter T is used to control the softening degree of the predicted labels, that is, to soften the classification results of the teacher network so that they convey more information, as sketched below.
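- an illustrative sketch of the temperature-softened distillation loss (the KL-divergence form and T = 4 are common conventions assumed here, not values from the patent):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 4.0):
    """Teacher pre-Softmax outputs are divided by the fixed T to soften
    them; the student is trained to match the softened distribution."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
```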
- the in-vehicle device may be a device placed in the car, such as a navigator, a mobile phone, and the like.
- the improved YOLOv3 network also includes a classification network and a non-maximum suppression module; the classification network and the non-maximum suppression module are connected in series after the FPN network.
- the attribute information of the target road image is obtained, including:
- the classification network includes a SoftMax classifier.
- the purpose is to achieve mutually exclusive classification of multiple vehicle classes.
- the classification network can also use the logistic regression of the YOLOv3 network for classification to achieve multiple independent binary classifications.
- the non-maximum suppression module is used for NMS (non-maximum suppression) processing, that is, to exclude prediction boxes with relatively low confidence among multiple prediction boxes that repeatedly select the same target, as sketched below.
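- an illustrative sketch using the ready-made torchvision NMS operator (the IoU threshold of 0.45 is an assumption):

```python
import torch
from torchvision.ops import nms

def suppress(boxes: torch.Tensor, scores: torch.Tensor, iou_thresh: float = 0.45):
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,) confidences.
    Keeps the highest-confidence box among overlapping predictions."""
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```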
- the detection result is in the form of a vector, including the position of the predicted frame, the confidence level of the vehicle in the predicted frame, and the category of the target in the predicted frame.
- the position of the prediction frame is used to represent the position of the target in the target road image; specifically, the position of each prediction frame is represented by four values of bx, by, bw, and bh, and bx and by are used to represent the position of the center point of the prediction frame.
- bw and bh are used to represent the width and height of the prediction box. For example, suppose there are 1 bus, 5 cars and 2 pedestrians on the road, located at different positions of the target road image; if the bus is located 230 pixels horizontally and 180 pixels vertically from the upper left corner of the image (taken as the origin), with a width of 20 and a height of 50, then its attribute information can be "230, 180, 20, 50, bus".
- the category of the target is the category of the object to which the target belongs, such as people, animals, buildings, vehicles, signs and so on.
- the target may only be a vehicle
- the categories may include cars, single-deck buses, double-deck buses, large trucks, vans, bicycles, motorcycles, and the like.
- the method may further include:
- the attribute information may be displayed, including: displaying the attribute information on the in-vehicle device.
- the attribute information may be displayed on a display screen in the vehicle, and the display screen may be a display screen of a navigation device or a display screen of a driver's mobile phone.
- the target road image marked with attribute information can be directly displayed on the display screen in the car, so that the driver in the car can directly observe the attribute information, so as to understand the location and category of each target displayed in the target road image.
- a driver in the distance can obtain the road conditions beyond his line of sight, and make appropriate driving behaviors in advance, such as slowing down, route planning, object avoidance, etc., to achieve the purpose of safe driving.
- the attribute information can also be displayed in the form of other text, which is reasonable.
- the attribute information can be played in the form of voice, so that the driver can easily receive the attribute information even when it is inconvenient to view the image while driving, which is conducive to safe driving.
- the above two methods can be combined.
- displaying the attribute information on the in-vehicle device may include:
- a special reminder can be given for small targets.
- when determining the small targets to be reminded, the size of the prediction frame where a target is located can be determined, and whether it is smaller than a preset prediction frame size can be judged; if so, the target is determined to belong to the small targets to be reminded. Alternatively, target categories can be divided in advance, with some obviously small object categories, such as signs, preset as small-target categories; whether a target belongs to the small targets to be reminded is then determined by judging whether its category belongs to a preset small-target category. A simple judgment sketch follows below.
- the position and category of the target can be combined to determine the small target to be reminded.
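- an illustrative sketch of this small-target judgment (the size threshold and the preset category set are assumptions):

```python
SMALL_CATEGORIES = {"sign"}    # illustrative preset small-object categories
MIN_W, MIN_H = 24.0, 24.0      # illustrative preset prediction-frame size

def needs_reminder(bw: float, bh: float, category: str) -> bool:
    """True if the prediction box is below the preset size, or the
    category is preset as a small-target category."""
    return (bw < MIN_W and bh < MIN_H) or category in SMALL_CATEGORIES
```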
- for the small targets to be reminded, the attribute information can be displayed in a reminder mode on the in-vehicle device; for example, marked on the target road image with brightly colored fonts, or marked in flashing form, or supplemented by voice prompts, and so on.
- a combination of multiple reminder methods can be used.
- the general mode can be used to display the attribute information on the in-vehicle device, that is, a consistent mode is adopted for all targets, which will not be repeated here.
- the method may further include:
- the driver can send the attribute information to the image acquisition terminal or other vehicles, pedestrians, etc., so that multiple terminals in the Internet of Vehicles system can obtain the attribute information to achieve information statistics, safe driving and other purposes.
- when sending the attribute information, the vehicle can attach its current position information, such as coordinates obtained through GPS (Global Positioning System), and the current time information, so that the receiver has a clearer understanding of the road condition information.
- a plurality of target road images within a predetermined period of time may be acquired to perform target detection, and the position and category of the same target may be used to achieve target trajectory tracking, and so on.
- the original YOLOv3 network contains many convolutional layers because it detects many target categories (80 in total).
- in the embodiment of the present invention, the targets are mainly objects on the road and the number of target categories is small, so a large number of convolutional layers is unnecessary; it would waste network resources and reduce processing speed.
- therefore, the number of densely connected units contained in each densely connected module is set to 2, which reduces the number of convolutional layers in the backbone network for the target road images of the embodiment of the present invention without affecting network accuracy.
- the improved YOLOv3 network can also be obtained by adjusting the value of k in the convolutional network module group of each prediction branch in the FPN network, that is, reducing k from 5 in the original YOLOv3 network to 4 or 3 (changing the original CBL*5 to CBL*4 or CBL*3); this likewise reduces the number of convolutional layers in the FPN network without affecting network accuracy, simplifies the overall number of network layers for the target road images of the embodiment of the present invention, and improves network processing speed.
- the present invention selects the UA-DETRAC data set for experiments.
- the UA-DETRAC dataset was shot at road overpasses in Beijing and Tianjin with a Canon EOS 550D; the video frame rate is 25 FPS, the data format is JPEG, and the image size is 960×540.
- the dataset contains 60 videos, shot on sunny days, cloudy days, rainy days and nights, including data under different weather conditions.
- the total number of images is 82,085, and the objects are annotated manually, so the annotation data is accurate. All images of each video are numbered sequentially under the same folder, and the annotation data of all images of each video is recorded in an XML file with the same name as the video folder.
- the random sampling method is used to extract the data in the data set.
- the entire dataset contains 82,085 images in total, of which 10,000 are sampled for the experiments and divided into a training set and a test set at a ratio of 4:1. To ensure that the training set and the test set do not contain the same images, the 10,000 extracted images are randomly shuffled again before the allocation.
- training the YOLO network requires data in VOC or COCO format, that is, five numbers representing the type of the framed object, the position of its upper-left corner, and its width and height, stored in text documents. Therefore, a Python script is used to convert the annotation format of the dataset, and the types and proportions of targets in the dataset are counted at the same time; a sketch of such a script follows.
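- a hedged sketch of the conversion script (the XML element and attribute names and the class mapping are assumptions about the UA-DETRAC annotation layout, not details given in the patent; the output follows the five-number format described above):

```python
import xml.etree.ElementTree as ET

CLASS_IDS = {"car": 0, "bus": 1, "van": 2, "others": 3}  # illustrative mapping

def convert(xml_path: str, out_path: str) -> None:
    """Write one line per object: class, left, top, width, height."""
    root = ET.parse(xml_path).getroot()
    with open(out_path, "w") as f:
        for target in root.iter("target"):
            box = target.find("box")
            if box is None:
                continue
            attr = target.find("attribute")
            cls = attr.get("vehicle_type", "others") if attr is not None else "others"
            f.write("{} {} {} {} {}\n".format(
                CLASS_IDS.get(cls, 3),
                box.get("left"), box.get("top"),
                box.get("width"), box.get("height")))
```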
- the network obtained after replacing the residual modules in the backbone network of the YOLOv3 network with dense connection modules and improving the transition modules is named Dense-YOLO-1; the structure of the Dense-YOLO-1 network can be understood with reference to FIG. 2 and the backbone network of FIG. 3, and will not be repeated here.
- to test Dense-YOLO-1 against the YOLOv3 network, the mAP (Mean Average Precision) of the models is selected as the evaluation metric. The value of mAP is between 0 and 1; the larger the mAP, the better the model accuracy. The loss curves of the models are also consulted to observe their convergence.
- Figure 5-1 is a comparison diagram of mAP curves of YOLOv3 and Dense-YOLO-1 according to the embodiment of the present invention
- Figure 5-2 is a comparison diagram of loss curves of YOLOv3 and Dense-YOLO-1 according to the embodiment of the present invention
- Table 1 The volumes of the YOLOv3 and Dense-YOLO-1 network models and their detection times on different platforms
- the volumes of the two network models and their road image detection times on different platforms are shown in Table 1. It can be seen that adding dense connections to the network reduces the size of the network and the time required for detection.
- on the basis of Dense-YOLO-1, one multi-scale improvement idea is to add a more fine-grained target detection scale to YOLOv3 so that the network can detect smaller objects.
- specifically, the 104×104 scale is added and the corresponding anchor box sizes are set; the resulting network is named MultiScale-YOLO-1.
- the mAP and loss curves of Dense-YOLO-1 and MultiScale-YOLO-1 networks are shown in Figure 6-1 and Figure 6-2.
- the multi-scale network improves on the densely connected network, but the change is not obvious, only about 7%, and the difference in the loss curves is likewise not obvious. This may be because the dataset does not contain many small-sized objects, so the demand for fine-grained recognition is not strong.
- if the requirements for small-target recognition are high, and time and energy are sufficient but no suitable dataset is available, the dataset can be annotated manually.
- on the basis of Dense-YOLO-1, another multi-scale improvement idea starts from feature fusion: the feature fusion method is improved so that the detection process integrates semantic information of more dimensions, thereby improving target recognition precision. Therefore, the feature fusion method of the FPN network is improved to a top-down, densely connected fusion form, and the resulting network is named Dense-YOLO-2. The network structure is not shown. The mAP and loss curves of the Dense-YOLO-1 and Dense-YOLO-2 networks are shown in Figure 7-1 and Figure 7-2.
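- Before turning to those results, the following PyTorch sketch illustrates the top-down, densely connected fusion form (PyTorch and the 256-channel lateral width are assumptions for illustration; the embodiment does not prescribe a framework or channel count):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseTopDownFusion(nn.Module):
    """Each prediction branch concatenates its own lateral feature with the
    upsampled features of *all* coarser branches, instead of only the
    immediately preceding branch as in a plain FPN."""
    def __init__(self, channels):            # backbone channels, coarsest first
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, 256, 1) for c in channels)

    def forward(self, feats):                 # feature maps, coarsest first
        outs, history = [], []
        for lat, f in zip(self.lateral, feats):
            x = lat(f)
            ups = [F.interpolate(h, size=x.shape[-2:], mode="nearest")
                   for h in history]          # every coarser branch, upsampled
            outs.append(torch.cat([x] + ups, dim=1))
            history.append(x)
        return outs                            # one fused map per scale
```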
- in the comparison curves, the advantages of multi-scale fusion are more obvious. This may be because the densely connected feature fusion method retains more high-dimensional abstract semantic information than lateral connections, allowing the model to discriminate objects more clearly.
- the network accuracy after changing the fusion method is 18.2% higher than the original, and the loss curve is also slightly lower than before. From the above figures it can be seen that the improved fusion method significantly increases the network accuracy.
- the network should have a smaller parameter volume and a faster detection speed.
- the volume of the network model after multi-scale improvement and the time for road image detection on different platforms are shown in Table 2.
- Table 2 The volume of the multi-scale improved network model and its detection time on different platforms
- the method of layer pruning is to change the densely connected blocks from groups of 4 densely connected units to groups of 2, which simplifies the network structure and reduces the parameter and computation counts by nearly half.
- the network after layer pruning is named MultiScale-YOLO-3 network, which can also be referred to as YOLOv3-1 network.
- the YOLOv3-1 network is sparsely trained to obtain a YOLOv3-2 network with sparse distribution of BN layer scaling coefficients;
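- Such sparsity training is commonly realized by adding an L1 penalty λ·|γ| on the BN scaling coefficients γ, which the learning-rate/λ combination table later in this description also suggests; the sketch below assumes PyTorch and the standard network-slimming subgradient step:

```python
import torch.nn as nn

def bn_l1_subgradient(model, lam):
    """Call after loss.backward(): add the subgradient of lam * |gamma| to
    every BN scaling coefficient so unimportant channels drift to zero."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.weight.grad.add_(lam * m.weight.data.sign())

# training step: loss.backward(); bn_l1_subgradient(model, lam); optimizer.step()
```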
- the channel pruning ratio can be set to 60%. This is because target types with few samples in the target road images to be detected are strongly affected during network compression, which directly lowers the mAP. Therefore, both the dataset and the network compression ratio should be considered.
- one option is to merge the target types with smaller sample counts to balance the numbers of the different types, or to directly use a dataset with a more balanced type distribution that matches the application scenario of the embodiment of the present invention.
- the other option is to control the compression ratio so that the prediction accuracy for the less frequent categories does not drop too much. According to the mAP simulation results, a compression ratio of 50%-60% is the turning point of the accuracy change, so a compression ratio of 60% can be initially selected.
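- A minimal sketch of selecting the channels to prune at a given ratio is shown below, assuming the network-slimming convention of ranking BN scaling coefficients globally (PyTorch is assumed):

```python
import torch
import torch.nn as nn

def channel_prune_mask(model, ratio=0.6):
    """Rank all BN scaling coefficients |gamma| across the network and mark
    the smallest `ratio` fraction of channels for removal."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    thresh = torch.sort(gammas).values[int(len(gammas) * ratio)]
    return {name: (m.weight.data.abs() > thresh)   # True = keep this channel
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```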
- the relationship between detection time and model compression ratio should also be considered.
- the image detection time is simulated. From the simulation results it can be found that different network compression ratios have very little impact on the detection time, whereas the time required for NMS (non-maximum suppression) has a much greater impact.
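- For reference, the greedy NMS step whose cost dominates here can be sketched as follows (the standard algorithm; the embodiment does not specify a particular implementation):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression. boxes: (N, 4) array of x1, y1, x2, y2.
    Keeps the highest-scoring box, drops overlapping ones, repeats."""
    order, keep = scores.argsort()[::-1], []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                 (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```

Its cost grows with the number of candidate boxes, which is why the NMS time, rather than the pruned backbone, dominates the detection latency.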
- the YOLOv3-3 network is subjected to knowledge distillation to obtain an improved YOLOv3 network.
- the aforementioned complex network serves as the teacher network.
- the resulting network is named YOLO-Terse.
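- The distillation loss itself is not spelled out in the embodiment; a common Hinton-style formulation for the class predictions, with temperature T and blend weight alpha as assumed hyperparameters, looks like this in PyTorch:

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, hard_loss, T=3.0, alpha=0.5):
    """Blend the ordinary detection loss with a soft-target term that pulls
    the pruned student toward the teacher's softened class distribution."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard_loss + (1.0 - alpha) * soft
```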
- FIG. 10 is a performance comparison diagram of the improved YOLOv3 network (YOLO-Terse) and the YOLOv3 network according to the embodiment of the present invention. It can be seen that the accuracy of YOLO-Terse is 9.0% higher than that of YOLOv3, while the model size is reduced by 72.9%, and the detection times on Tesla V100 and Jetson TX2 are reduced by 18.9% and 15.3%, respectively. This shows that the model volume is greatly reduced and the detection speed on road images is increased, while the accuracy even improves slightly.
- the embodiments of the present invention further provide an in-vehicle electronic device, as shown in FIG. 11, including a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, wherein the processor 1101, the communication interface 1102 and the memory 1103 communicate with each other through the communication bus 1104;
- the processor 1101 is configured to implement the steps of any of the foregoing road detection methods based on the Internet of Vehicles when executing the program stored in the memory 1103 .
- the communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
- the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
- the communication interface is used for communication between the above electronic device and other devices.
- the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage.
- the memory may also be at least one storage device located away from the aforementioned processor.
- the above-mentioned processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Abstract
Description
Network | Model size | Detection time on Tesla V100 | Detection time on Jetson TX2 |
---|---|---|---|
YOLOv3 | 236M | 42.8ms | 221.1ms |
Dense-YOLO-1 | 131M | 39.0ms | 214.7ms |
Network | Model size | Detection time on Tesla V100 | Detection time on Jetson TX2 |
---|---|---|---|
Dense-YOLO-2 | 489M | 35.1ms | 300.0ms |
MultiScale-YOLO-1 | 132M | 41.2ms | 243.4ms |
MultiScale-YOLO-2 | 491M | 44.8ms | 350.6ms |
Combination | Learning rate | λ |
---|---|---|
1 | 1× | 1× |
2 | 1× | 0.1× |
3 | 0.1× | 1× |
4 | 1× | 0.025× |
5 | 0.25× | 0.1× |
Claims (10)
- A road detection method based on the Internet of Vehicles, applied to a vehicle-mounted terminal, comprising: acquiring a target road image captured by an image acquisition terminal; inputting the target road image into a pre-trained improved YOLOv3 network, and performing feature extraction on the target road image with a densely connected backbone network to obtain feature maps of x different scales, where x is a natural number greater than or equal to 4; performing top-down, densely connected feature fusion on the x feature maps of different scales with an improved FPN network to obtain prediction results corresponding to each scale; and obtaining attribute information of the target road image based on all prediction results, the attribute information including the positions and categories of targets in the target road image; wherein the improved YOLOv3 network comprises the densely connected backbone network and the improved FPN network; the improved YOLOv3 network is formed on the basis of the YOLOv3 network by replacing the residual modules in the backbone network with densely connected modules, adding a feature extraction scale, optimizing the feature fusion method of the FPN network, and performing pruning and knowledge-distillation-guided network recovery processing; and the improved YOLOv3 network is trained from sample road images and the positions and categories of the targets corresponding to the sample road images.
- The method according to claim 1, wherein the densely connected backbone network comprises densely connected modules and transition modules connected in series alternately; the number of densely connected modules is y; each densely connected module comprises a convolutional network module and a group of densely connected units connected in series; the convolutional network module comprises a convolutional layer, a BN layer and a Leaky ReLU layer connected in series; the group of densely connected units comprises m densely connected units; each densely connected unit comprises a plurality of the convolutional network modules connected in dense connection form, and the feature maps output by the plurality of convolutional network modules are fused by concatenation; wherein y is a natural number greater than or equal to 4, and m is a natural number greater than 1.
- The method according to claim 2, wherein obtaining the feature maps of x different scales comprises: obtaining the x feature maps, of successively increasing scale, output by the x densely connected modules taken in reverse order along the input direction.
- The method according to claim 2, wherein the transition module comprises the convolutional network module and a max pooling layer; the input of the convolutional network module and the input of the max pooling layer are shared, and the feature map output by the convolutional network module and the feature map output by the max pooling layer are fused by concatenation.
- The method according to claim 4, wherein the number of the convolutional network modules included in the transition module is two or three, and the convolutional network modules are connected in series.
- The method according to claim 3, wherein performing top-down, densely connected feature fusion on the x feature maps of different scales with the improved FPN network comprises: for prediction branch Y_i, obtaining the feature map of the corresponding scale from the x feature maps and performing convolution processing on it; and fusing, by concatenation, the convolved feature map with the feature maps of prediction branches Y_{i-1} to Y_1 after each has been upsampled; wherein the improved FPN network comprises x prediction branches Y_1 to Y_x of successively increasing scale; the scales of the prediction branches Y_1 to Y_x correspond one-to-one to the scales of the x feature maps; the upsampling factor of prediction branch Y_{i-j} is 2^j; i = 2, 3, ..., x; and j is a natural number smaller than i.
- The method according to claim 2, wherein performing pruning and knowledge-distillation-guided network recovery processing comprises: performing layer pruning on the densely connected modules of the backbone network in the network obtained, on the basis of the YOLOv3 network, by replacing the residual modules in the backbone network with densely connected modules, adding a feature extraction scale, and optimizing the feature fusion method of the FPN network, to obtain a YOLOv3-1 network; performing sparsity training on the YOLOv3-1 network to obtain a YOLOv3-2 network with a sparse distribution of BN layer scaling coefficients; performing channel pruning on the YOLOv3-2 network to obtain a YOLOv3-3 network; and performing knowledge distillation on the YOLOv3-3 network to obtain the improved YOLOv3 network.
- The method according to claim 1, wherein before training the improved YOLOv3 network, the method further comprises: determining the number of clusters for the anchor box sizes in sample road images; acquiring several sample road images with annotated target box sizes; obtaining, based on the sample road images with annotated target box sizes, clustering results of the anchor box sizes in the sample road images using the K-Means clustering method; and writing the clustering results into the configuration file of the improved YOLOv3 network.
- The method according to claim 1, wherein the improved YOLOv3 network further comprises a classification network and a non-maximum suppression module; and obtaining the attribute information of the target road image based on all prediction results comprises: classifying all prediction results through the classification network, and then de-duplicating the prediction boxes through the non-maximum suppression module to obtain the attribute information of the target road image; wherein the classification network comprises a SoftMax classifier.
- An in-vehicle electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method steps of any one of claims 1-9 when executing the program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/564,524 US20230154202A1 (en) | 2020-10-23 | 2021-12-29 | Method of road detection based on internet of vehicles |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011147522.6A CN112380921A (zh) | 2020-10-23 | 2020-10-23 | 一种基于车联网的道路检测方法 |
CN202011147522.6 | 2020-10-23 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/564,524 Continuation US20230154202A1 (en) | 2020-10-23 | 2021-12-29 | Method of road detection based on internet of vehicles |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022083784A1 true WO2022083784A1 (zh) | 2022-04-28 |
Family
ID=74580793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/130684 WO2022083784A1 (zh) | 2020-10-23 | 2021-11-15 | 一种基于车联网的道路检测方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230154202A1 (zh) |
CN (1) | CN112380921A (zh) |
WO (1) | WO2022083784A1 (zh) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114881227A (zh) * | 2022-05-13 | 2022-08-09 | 北京百度网讯科技有限公司 | 模型压缩方法、图像处理方法、装置和电子设备 |
CN114912532A (zh) * | 2022-05-20 | 2022-08-16 | 电子科技大学 | 一种自动驾驶汽车多源异构感知数据融合方法 |
CN115019071A (zh) * | 2022-05-19 | 2022-09-06 | 昆明理工大学 | 光学图像与sar图像匹配方法、装置、电子设备及介质 |
CN115116028A (zh) * | 2022-06-06 | 2022-09-27 | 安徽理工大学 | 一种基于Tiny-Yolov4的无人驾驶电机车障碍物检测方法及电子设备 |
CN115115974A (zh) * | 2022-06-08 | 2022-09-27 | 中国船舶集团有限公司***工程研究院 | 基于神经网络的智能航行态势感知*** |
CN115272412A (zh) * | 2022-08-02 | 2022-11-01 | 电子科技大学重庆微电子产业技术研究院 | 一种基于边缘计算的低小慢目标检测方法及跟踪*** |
CN115272763A (zh) * | 2022-07-27 | 2022-11-01 | 四川大学 | 一种基于细粒度特征融合的鸟类识别方法 |
CN115359360A (zh) * | 2022-10-19 | 2022-11-18 | 福建亿榕信息技术有限公司 | 一种电力现场作业场景检测方法、***、设备和存储介质 |
CN115661614A (zh) * | 2022-12-09 | 2023-01-31 | 江苏稻源科技集团有限公司 | 一种基于轻量化YOLO v1的目标检测方法 |
CN116343063A (zh) * | 2023-05-26 | 2023-06-27 | 南京航空航天大学 | 一种路网提取方法、***、设备及计算机可读存储介质 |
CN116434173A (zh) * | 2023-04-12 | 2023-07-14 | 腾讯科技(深圳)有限公司 | 道路图像检测方法、装置、电子设备及存储介质 |
CN116563800A (zh) * | 2023-04-26 | 2023-08-08 | 北京交通大学 | 基于轻量化YOLOv3的隧道内车辆检测方法及*** |
CN116665188A (zh) * | 2023-07-20 | 2023-08-29 | 南京博融汽车电子有限公司 | 一种大客车图像***数据分析方法 |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112380921A (zh) * | 2020-10-23 | 2021-02-19 | 西安科锐盛创新科技有限公司 | 一种基于车联网的道路检测方法 |
CN112949500A (zh) * | 2021-03-04 | 2021-06-11 | 北京联合大学 | 一种基于空间特征编码改进的YOLOv3车道线检测方法 |
CN112949604A (zh) * | 2021-04-12 | 2021-06-11 | 石河子大学 | 一种基于深度学习的主动悬架智能控制方法及装置 |
CN113177937B (zh) * | 2021-05-24 | 2022-09-13 | 河南大学 | 基于改进YOLOv4-tiny的布匹缺陷检测方法 |
CN113592784A (zh) * | 2021-07-08 | 2021-11-02 | 浙江科技学院 | 一种基于轻量级卷积神经网络检测路面病害的方法及装置 |
CN116342894B (zh) * | 2023-05-29 | 2023-08-08 | 南昌工程学院 | 基于改进YOLOv5的GIS红外特征识别***及方法 |
CN116612379B (zh) * | 2023-05-30 | 2024-02-02 | 中国海洋大学 | 一种基于多知识蒸馏的水下目标检测方法及*** |
CN116416626B (zh) * | 2023-06-12 | 2023-08-29 | 平安银行股份有限公司 | 圆形***数据的获取方法、装置、设备及存储介质 |
CN117253123B (zh) * | 2023-08-11 | 2024-05-17 | 中国矿业大学 | 一种基于中间层特征辅助模块融合匹配的知识蒸馏方法 |
CN117218129B (zh) * | 2023-11-09 | 2024-01-26 | 四川大学 | 食道癌图像识别分类方法、***、设备及介质 |
CN117953192A (zh) * | 2024-01-09 | 2024-04-30 | 北京地铁建筑设施维护有限公司 | 一种吊顶病害预警方法及图像采集设备 |
CN117935200A (zh) * | 2024-01-23 | 2024-04-26 | 肇庆学院 | 一种基于改进YOLOv8的自动驾驶路况监测方法 |
CN118233222A (zh) * | 2024-05-24 | 2024-06-21 | 浙江大学 | 一种基于知识蒸馏的工控网络入侵检测方法及装置 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815886A (zh) * | 2019-01-21 | 2019-05-28 | 南京邮电大学 | 一种基于改进YOLOv3的行人和车辆检测方法及*** |
AU2019101142A4 (en) * | 2019-09-30 | 2019-10-31 | Dong, Qirui MR | A pedestrian detection method with lightweight backbone based on yolov3 network |
CN111401148A (zh) * | 2020-02-27 | 2020-07-10 | 江苏大学 | 一种基于改进的多级YOLOv3的道路多目标检测方法 |
CN111553406A (zh) * | 2020-04-24 | 2020-08-18 | 上海锘科智能科技有限公司 | 基于改进yolo-v3的目标检测***、方法及终端 |
CN112380921A (zh) * | 2020-10-23 | 2021-02-19 | 西安科锐盛创新科技有限公司 | 一种基于车联网的道路检测方法 |
- 2020-10-23 CN CN202011147522.6A patent/CN112380921A/zh not_active Withdrawn
- 2021-11-15 WO PCT/CN2021/130684 patent/WO2022083784A1/zh active Application Filing
- 2021-12-29 US US17/564,524 patent/US20230154202A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815886A (zh) * | 2019-01-21 | 2019-05-28 | 南京邮电大学 | 一种基于改进YOLOv3的行人和车辆检测方法及*** |
AU2019101142A4 (en) * | 2019-09-30 | 2019-10-31 | Dong, Qirui MR | A pedestrian detection method with lightweight backbone based on yolov3 network |
CN111401148A (zh) * | 2020-02-27 | 2020-07-10 | 江苏大学 | 一种基于改进的多级YOLOv3的道路多目标检测方法 |
CN111553406A (zh) * | 2020-04-24 | 2020-08-18 | 上海锘科智能科技有限公司 | 基于改进yolo-v3的目标检测***、方法及终端 |
CN112380921A (zh) * | 2020-10-23 | 2021-02-19 | 西安科锐盛创新科技有限公司 | 一种基于车联网的道路检测方法 |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114881227B (zh) * | 2022-05-13 | 2023-07-04 | 北京百度网讯科技有限公司 | 模型压缩方法、图像处理方法、装置和电子设备 |
CN114881227A (zh) * | 2022-05-13 | 2022-08-09 | 北京百度网讯科技有限公司 | 模型压缩方法、图像处理方法、装置和电子设备 |
CN115019071A (zh) * | 2022-05-19 | 2022-09-06 | 昆明理工大学 | 光学图像与sar图像匹配方法、装置、电子设备及介质 |
CN115019071B (zh) * | 2022-05-19 | 2023-09-19 | 昆明理工大学 | 光学图像与sar图像匹配方法、装置、电子设备及介质 |
CN114912532A (zh) * | 2022-05-20 | 2022-08-16 | 电子科技大学 | 一种自动驾驶汽车多源异构感知数据融合方法 |
CN114912532B (zh) * | 2022-05-20 | 2023-08-25 | 电子科技大学 | 一种自动驾驶汽车多源异构感知数据融合方法 |
CN115116028A (zh) * | 2022-06-06 | 2022-09-27 | 安徽理工大学 | 一种基于Tiny-Yolov4的无人驾驶电机车障碍物检测方法及电子设备 |
CN115115974A (zh) * | 2022-06-08 | 2022-09-27 | 中国船舶集团有限公司***工程研究院 | 基于神经网络的智能航行态势感知*** |
CN115272763A (zh) * | 2022-07-27 | 2022-11-01 | 四川大学 | 一种基于细粒度特征融合的鸟类识别方法 |
CN115272763B (zh) * | 2022-07-27 | 2023-04-07 | 四川大学 | 一种基于细粒度特征融合的鸟类识别方法 |
CN115272412A (zh) * | 2022-08-02 | 2022-11-01 | 电子科技大学重庆微电子产业技术研究院 | 一种基于边缘计算的低小慢目标检测方法及跟踪*** |
CN115272412B (zh) * | 2022-08-02 | 2023-09-26 | 电子科技大学重庆微电子产业技术研究院 | 一种基于边缘计算的低小慢目标检测方法及跟踪*** |
CN115359360A (zh) * | 2022-10-19 | 2022-11-18 | 福建亿榕信息技术有限公司 | 一种电力现场作业场景检测方法、***、设备和存储介质 |
CN115661614A (zh) * | 2022-12-09 | 2023-01-31 | 江苏稻源科技集团有限公司 | 一种基于轻量化YOLO v1的目标检测方法 |
CN115661614B (zh) * | 2022-12-09 | 2024-05-24 | 江苏稻源科技集团有限公司 | 一种基于轻量化YOLO v1的目标检测方法 |
CN116434173A (zh) * | 2023-04-12 | 2023-07-14 | 腾讯科技(深圳)有限公司 | 道路图像检测方法、装置、电子设备及存储介质 |
CN116563800A (zh) * | 2023-04-26 | 2023-08-08 | 北京交通大学 | 基于轻量化YOLOv3的隧道内车辆检测方法及*** |
CN116343063A (zh) * | 2023-05-26 | 2023-06-27 | 南京航空航天大学 | 一种路网提取方法、***、设备及计算机可读存储介质 |
CN116343063B (zh) * | 2023-05-26 | 2023-08-11 | 南京航空航天大学 | 一种路网提取方法、***、设备及计算机可读存储介质 |
CN116665188A (zh) * | 2023-07-20 | 2023-08-29 | 南京博融汽车电子有限公司 | 一种大客车图像***数据分析方法 |
CN116665188B (zh) * | 2023-07-20 | 2023-10-10 | 南京博融汽车电子有限公司 | 一种大客车图像***数据分析方法 |
Also Published As
Publication number | Publication date |
---|---|
US20230154202A1 (en) | 2023-05-18 |
CN112380921A (zh) | 2021-02-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022083784A1 (zh) | 一种基于车联网的道路检测方法 | |
CN110766098A (zh) | 基于改进YOLOv3的交通场景小目标检测方法 | |
CN110348384B (zh) | 一种基于特征融合的小目标车辆属性识别方法 | |
CN112487862B (zh) | 基于改进EfficientDet模型的车库行人检测方法 | |
CN111460919B (zh) | 一种基于改进YOLOv3的单目视觉道路目标检测及距离估计方法 | |
CN112800906B (zh) | 一种基于改进YOLOv3的自动驾驶汽车跨域目标检测方法 | |
CN112417973A (zh) | 一种基于车联网的无人驾驶*** | |
CN111428558A (zh) | 一种基于改进YOLOv3方法的车辆检测方法 | |
CN110781850A (zh) | 道路识别的语义分割***和方法、计算机存储介质 | |
CN113762209A (zh) | 一种基于yolo的多尺度并行特征融合路标检测方法 | |
CN114092917B (zh) | 一种基于mr-ssd的被遮挡交通标志检测方法及*** | |
CN112364719A (zh) | 一种遥感图像目标快速检测方法 | |
CN114445430A (zh) | 轻量级多尺度特征融合的实时图像语义分割方法及*** | |
CN112528934A (zh) | 一种基于多尺度特征层的改进型YOLOv3的交通标志检测方法 | |
CN112364721A (zh) | 一种道面异物检测方法 | |
CN114821492A (zh) | 一种基于YOLOv4的道路车辆检测***及方法 | |
CN112990065A (zh) | 一种基于优化的YOLOv5模型的车辆分类检测方法 | |
CN112819000A (zh) | 街景图像语义分割***及分割方法、电子设备及计算机可读介质 | |
CN112288701A (zh) | 一种智慧交通图像检测方法 | |
CN112395953A (zh) | 一种道面异物检测*** | |
CN115346071A (zh) | 高置信局部特征与全局特征学习的图片分类方法及*** | |
CN116740516A (zh) | 基于多尺度融合特征提取的目标检测方法及*** | |
CN114639067A (zh) | 一种基于注意力机制的多尺度全场景监控目标检测方法 | |
CN112364864A (zh) | 一种车牌识别方法、装置、电子设备及存储介质 | |
CN117710841A (zh) | 一种无人机航拍图像的小目标检测方法、装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21882196 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21882196 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.10.2023) |