CN114170581A - Anchor-Free traffic sign detection method based on deep supervision - Google Patents

Anchor-Free traffic sign detection method based on deep supervision

Info

Publication number
CN114170581A
CN114170581A
Authority
CN
China
Prior art keywords
module
layer
traffic sign
output
unit
Prior art date
Legal status
Pending
Application number
CN202111487756.XA
Other languages
Chinese (zh)
Inventor
吕卫
梁芷茵
褚晶辉
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202111487756.XA
Publication of CN114170581A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An Anchor-Free traffic sign detection method based on deep supervision comprises the following steps: constructing a data set and preprocessing the data to form a training set and a test set; building a deep-supervision-based Anchor-Free traffic sign detection neural network model comprising, connected in series in sequence, an input unit, an encoding unit, a layer jump structure unit, a decoding unit and an output prediction unit, the encoding unit also being connected to the decoding unit; training the model with the training set; and testing it with the test set. The invention applies an Anchor-Free method built on an encoding-decoding structure to traffic sign detection, avoiding the manual anchor-box parameter setting required by Anchor-Based methods and enabling the algorithm to adapt to a variety of traffic sign detection scenes.

Description

Anchor-Free traffic sign detection method based on deep supervision
Technical Field
The invention relates to a traffic sign detection method, and in particular to an Anchor-Free traffic sign detection method based on deep supervision.
Background
Traffic signs are among the most critical components of a road traffic system, providing vehicles with advisory or restrictive information such as road conditions and real-time traffic status. When vehicles obey the traffic rules indicated by the signs, congestion and accidents can be greatly reduced. In practical applications, traffic sign detection algorithms are an integral part of autonomous driving systems. In early research, because traffic signs have regular shapes and bright colors, scholars at home and abroad mainly solved traffic sign detection by combining several image processing methods. In recent years, as research on neural networks has deepened, neural-network-based detection methods have achieved better accuracy and higher speed than traditional image processing methods, and have been widely applied in the field of traffic sign detection, where they occupy an important position.
Neural-network-based traffic sign detection algorithms are highly accurate and cope well with the negative effects of illumination changes, occlusion and the like. The detection methods in common use are mainly Anchor-Based, typified by Faster R-CNN [1], SSD [2] and YOLO [3]. The patent "A traffic sign detection algorithm based on the YOLOv5 network structure" (China, 202110305468.1) uses a lightweight feature extraction network on top of YOLOv5 to extract image feature information; the patent "A traffic sign detection and identification method based on a residual SSD model" (China, 201810850416.0) introduces a residual network into the SSD detection model to improve its feature extraction capability. Anchor-Based methods rely on anchor boxes whose hyper-parameters must be set manually; because the anchor boxes encode heuristic prior information, these hyper-parameters have to be tuned per data set according to the distribution of target sizes. Preset anchor boxes are also sensitive to changes in the data, so detection performance degrades when the real scene changes. Anchor-Based methods obtain high recall by densely tiling anchor boxes over the image, but only a small fraction of those boxes overlap real target regions, incurring a large amount of extra computation at detection time.
Disclosure of Invention
The technical problem the invention aims to solve is to overcome the defects of the prior art by providing a deep-supervision-based Anchor-Free traffic sign detection method with higher detection speed.
The technical scheme adopted by the invention is as follows: an Anchor-Free traffic sign detection method based on deep supervision comprises the following steps:
step 1, constructing a data set and carrying out data preprocessing to form a training set and a test set;
step 2, building a deep-supervision-based Anchor-Free traffic sign detection neural network model comprising, connected in series in sequence: an input unit, an encoding unit, a layer jump structure unit, a decoding unit and an output prediction unit, the encoding unit also being connected to the decoding unit;
step 3, training the Anchor-Free traffic sign detection neural network model based on deep supervision with the training set obtained in step 1;
step 4, testing the Anchor-Free traffic sign detection neural network model based on deep supervision with the test set obtained in step 1.
The data set in step 1 uses, for training and testing the neural network, the 45 classes of traffic signs that occur more than 100 times in the Chinese traffic sign data set TT100K published by Tsinghua University and Tencent. The data preprocessing randomly crops the original image into 512 × 512 pixel images according to the regions where the traffic signs are located; each cropped image contains one or more traffic signs, and the annotations of the traffic sign detection boxes in the cropped image are derived from the original annotation file. Incomplete traffic signs in the cropped images are used to simulate occluded traffic signs.
Step 2 builds the deep-supervision-based Anchor-Free traffic sign detection neural network model with the deep learning framework PyTorch. The input unit comprises: a 7 × 7 convolutional layer (kernel size 7 × 7, stride 2, 64 output channels) that performs a first extraction of shallow features on the input original image to obtain a feature map; the feature map then passes in sequence through one BN layer to prevent vanishing gradients, one ReLU activation layer, and one max pooling layer with stride 2 and a 2 × 2 pooling window, forming the input of the encoding unit.
The encoding unit described in step 2 includes:
a first residual module, a second residual module, a third residual module, a fourth residual module and an atrous spatial pyramid pooling (ASPP) module connected in series in sequence. The feature map output by the input unit passes in sequence through the first, second, third and fourth residual modules for further feature extraction and is then enhanced by the ASPP module; the enhanced feature map is sent to the decoding unit, and the outputs of the first, second and third residual modules are respectively connected to the layer jump structure unit.
The ASPP module has five branches: the first branch comprises a global average pooling layer, a fourth 1 × 1 convolutional layer and an upsampling layer connected in series in sequence; the second branch comprises a fifth 1 × 1 convolutional layer; the third branch comprises a first 3 × 3 dilated convolutional layer with dilation rate 6; the fourth branch comprises a second 3 × 3 dilated convolutional layer with dilation rate 12; and the fifth branch comprises a third 3 × 3 dilated convolutional layer with dilation rate 18. The feature map output by the fourth residual module enters the five branches; the branch outputs then pass through a channel-dimension concatenation layer, which fuses the feature information under different receptive fields, and through a sixth 1 × 1 convolutional layer to obtain the output feature map of the ASPP module.
The layer jump structure unit in step 2 comprises a first 1 × 1 convolutional layer, a second 1 × 1 convolutional layer, a third 1 × 1 convolutional layer, and first, second and third deep supervision mechanisms. The first 1 × 1 convolutional layer receives the output feature map of the first residual module in the encoding unit and, after the convolution operation, outputs it to the decoding unit and, during model training, also to the first deep supervision mechanism; the second 1 × 1 convolutional layer receives the output feature map of the second residual module and likewise outputs to the decoding unit and, during training, to the second deep supervision mechanism; the third 1 × 1 convolutional layer receives the output feature map of the third residual module and outputs to the decoding unit and, during training, to the third deep supervision mechanism.
During the model training phase, the first, second and third deep supervision mechanisms respectively receive the output feature maps of the first, second and third 1 × 1 convolutional layers. The three deep supervision mechanisms share the same structure: each performs traffic sign center point prediction, offset prediction and scale prediction on the received feature map through three branches, each branch consisting of two 3 × 3 convolution modules in series. The center point prediction is obtained after the seventh and eighth 3 × 3 convolution modules of the first branch, and a cross entropy loss is computed against the real center points; the offset prediction is obtained after the ninth and tenth 3 × 3 convolution modules of the second branch, and a first L1 loss is computed against the real offsets; the scale prediction is obtained after the eleventh and twelfth 3 × 3 convolution modules of the third branch, and a second L1 loss is computed against the real scales. The cross entropy loss, the first L1 loss and the second L1 loss are summed into an auxiliary loss function, which forms the output value of the deep supervision mechanism.
The decoding unit in step 2 comprises a first decoding module, a second decoding module and a third decoding module. The first decoding module decodes the feature map output by the ASPP module in the encoding unit, adds it to the output of the third 1 × 1 convolutional layer in the layer jump structure unit, and outputs the resulting feature map to the second decoding module and the output prediction unit; the second decoding module decodes the received feature map, adds it to the output of the second 1 × 1 convolutional layer, and outputs the result to the third decoding module and the output prediction unit; the third decoding module decodes the received feature map, adds it to the output of the first 1 × 1 convolutional layer, and outputs the result to the output prediction unit. The three decoding modules share the same structure, each comprising a bilinear interpolation layer and a 3 × 3 convolutional layer connected in series.
The output prediction unit in step 2 comprises: a first bilinear interpolation that receives the sum of the outputs of the second decoding module in the decoding unit and the second 1 × 1 convolutional layer in the layer jump structure unit; a second bilinear interpolation that receives the sum of the outputs of the first decoding module and the third 1 × 1 convolutional layer; and a channel-dimension concatenation layer that receives the sum of the outputs of the third decoding module and the first 1 × 1 convolutional layer, the output of the first bilinear interpolation, and the output of the second bilinear interpolation. The concatenation layer fuses the received feature maps into a single feature map, which then passes through three branches that output the prediction information for the traffic sign center point category and position, the center point offset, and the traffic sign scale, thereby realizing traffic sign detection. The three branches share the same structure of two 3 × 3 convolution modules connected in series.
The 3 × 3 convolution module comprises a 3 × 3 convolutional layer, a BN layer and a ReLU layer connected in series in sequence.
The deep-supervision-based Anchor-Free traffic sign detection method of the invention applies an Anchor-Free method built on an encoding-decoding structure to traffic sign detection, avoiding the manual anchor-box parameter setting required by Anchor-Based methods and enabling the algorithm to adapt to a variety of traffic sign detection scenes. In the invention, an atrous spatial pyramid pooling (ASPP) module is added after the encoding sub-network; the module uses dilated convolutions with several dilation rates to capture feature information at different spatial scales, enhancing the spatial representation capability of the features extracted by the encoding sub-network and thereby improving the detection model's ability to detect traffic sign features at different spatial scales. The invention introduces a layer jump structure between the encoding and decoding structures together with a deep supervision mechanism. The layer jump structure exploits the multi-level features produced by the encoding sub-network, making full use of the edge and detail information of shallow features and the semantic information of deep features; the deep supervision mechanism optimizes model training and reduces the optimization difficulty introduced by using shallow features. The invention concatenates the multi-level decoding features of the decoding structure along the channel dimension, fusing the multi-level features; after fusion, a channel attention mechanism selects the channel information of interest, so that the output integrates richer feature information, including detail information and context information. The invention has the following beneficial effects:
1. The neural network adopts an encoding-decoding structure with an ASPP module added after the encoding sub-network. Dilated convolutions with different dilation rates extract semantic information at different spatial scales, so the module can extract and fuse context information across multiple spatial scales, improving detection performance on multi-scale traffic sign targets.
2. The neural network adds a layer jump structure between the encoding and decoding structures. Because the layer jump structure draws on the multi-level features of the encoding sub-network, intermediate-level features can be used effectively and the feature reuse rate is improved. The intermediate-level features of the encoding sub-network carry richer detail and edge information, and exploiting them improves the accuracy of traffic sign localization.
3. Deep supervision is introduced after the layer jump structure. During training, the losses computed on the feature maps at each level of the layer jump structure are added to the loss function as auxiliary terms, so the intermediate-level feature maps can be optimized directly, reducing the optimization difficulty of the model.
4. The neural network concatenates the outputs of the multi-stage decoding modules along the channel dimension and uses the resulting feature map, weighted by a channel attention mechanism, for prediction. The multi-stage decoding modules carry multi-scale features from the layer jump structure; after concatenation along the channel dimension, the feature map contains both detail information and context information, and the channel attention mechanism increases the weights of the channels of interest, improving feature expression capability.
Drawings
FIG. 1 is a schematic diagram of the deep-supervision-based Anchor-Free traffic sign detection neural network model constructed by the invention;
FIG. 2 is a schematic diagram of the atrous spatial pyramid pooling (ASPP) module structure of the present invention;
FIG. 3 is a schematic diagram of the deep supervision mechanism of the present invention;
FIG. 4 is a schematic diagram of a decoding module according to the present invention;
FIG. 5 is a schematic diagram of the structure of a 3 × 3 convolution module according to the present invention;
FIG. 6 shows an example detection result obtained with the method of the present invention.
Detailed Description
The deep-supervision-based Anchor-Free traffic sign detection method of the invention is explained in detail below with reference to the embodiments and the accompanying drawings.
The invention discloses a deep-supervision-based Anchor-Free traffic sign detection method, which comprises the following steps:
step 1, constructing a data set and carrying out data preprocessing to form a training set and a test set;
the data set is used for training and testing a neural network by adopting data which contains 45 types of traffic signs and has the frequency of occurrence of more than 100 in a Chinese traffic sign data set TT100K published by Qinghua university and Tengchong; the data preprocessing is to cut the original image into 512 x 512 pixel images randomly according to the area where the traffic sign is located, the cut image contains more than one traffic sign, and the annotation of the traffic sign detection frame in the cut image is obtained according to the original annotation file; the incomplete traffic sign in the cut image is used for simulating the condition when the traffic sign is blocked.
Step 2, building the deep-supervision-based Anchor-Free traffic sign detection neural network model shown in FIG. 1, comprising, connected in series in sequence: an input unit 1, an encoding unit 2, a layer jump structure unit 3, a decoding unit 4 and an output prediction unit 5, the encoding unit 2 also being connected to the decoding unit 4. The model is built with the deep learning framework PyTorch.
In the invention, a layer jump structure is introduced between layers of the encoding and decoding sub-networks whose feature maps have the same size, and a deep supervision mechanism is added to the layer jump structures. The layer jump structure effectively exploits the rich detail information contained in the encoding sub-network, so more accurate target localization information can be obtained in the output feature map. The deep supervision mechanism helps the model optimize better during the training phase, reducing the added optimization difficulty of using shallow features.
To exploit the multi-level features of the decoding sub-network, the outputs of the decoding modules at each level are concatenated along the channel dimension, and a channel attention mechanism is used to increase the weights of the channel information of interest.
In the deep-supervision-based Anchor-Free traffic sign detection neural network model of the invention:
1. As shown in FIG. 1, the input unit 1 includes:
a 7 × 7 convolutional layer (kernel size 7 × 7, stride 2, 64 output channels) that performs a first extraction of shallow features on the input original image to obtain a feature map; the feature map then passes in sequence through one BN layer to prevent vanishing gradients, one ReLU activation layer, and one max pooling layer with stride 2 and a 2 × 2 pooling window, forming the input of the encoding unit 2.
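A minimal PyTorch sketch of this input unit follows; the 3-channel RGB input is an assumption.

```python
import torch
import torch.nn as nn

class InputUnit(nn.Module):
    """Input unit as described: 7x7 stride-2 conv with 64 output channels,
    BN, ReLU, then 2x2 max pooling with stride 2."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.stem(x)

# a 512x512 crop becomes a 64-channel 128x128 feature map
feats = InputUnit()(torch.randn(1, 3, 512, 512))
print(feats.shape)  # torch.Size([1, 64, 128, 128])
```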
2. The encoding unit 2 includes:
a first residual module, a second residual module, a third residual module, a fourth residual module and an atrous spatial pyramid pooling (ASPP) module connected in series in sequence. The feature map output by the input unit 1 passes in sequence through the first, second, third and fourth residual modules for further feature extraction and is then enhanced by the ASPP module; the enhanced feature map is sent to the decoding unit 4, and the outputs of the first, second and third residual modules are respectively connected to the layer jump structure unit 3. The first, second, third and fourth residual modules all adopt the structure of ResNet-101 (He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition [C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA (2016.6.27-2016.6.30), 2016: 770-778).
The ASPP module has five branches, as shown in FIG. 2: the first branch comprises a global average pooling layer, a fourth 1 × 1 convolutional layer and an upsampling layer connected in series in sequence; the second branch comprises a fifth 1 × 1 convolutional layer; the third branch comprises a first 3 × 3 dilated convolutional layer with dilation rate 6; the fourth branch comprises a second 3 × 3 dilated convolutional layer with dilation rate 12; and the fifth branch comprises a third 3 × 3 dilated convolutional layer with dilation rate 18. The feature map output by the fourth residual module enters the five branches; the branch outputs then pass through a channel-dimension concatenation layer, which fuses the feature information under different receptive fields, and through a sixth 1 × 1 convolutional layer to obtain the output feature map of the ASPP module.
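Below is a minimal PyTorch sketch of this ASPP module; the output channel width (256), the bias-free convolutions and the example input size are assumptions not specified in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Five branches as described: global-average-pool + 1x1 conv + upsample,
    a plain 1x1 conv, and three 3x3 dilated convs with dilation 6, 12 and 18;
    the outputs are concatenated on the channel dimension and fused by a
    final 1x1 conv."""
    def __init__(self, in_ch, out_ch=256):
        super().__init__()
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.atrous = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d, bias=False)
            for d in (6, 12, 18)])
        self.project = nn.Conv2d(5 * out_ch, out_ch, 1, bias=False)

    def forward(self, x):
        size = x.shape[-2:]
        branches = [F.interpolate(self.pool(x), size=size, mode='bilinear',
                                  align_corners=False),
                    self.conv1x1(x)]
        branches += [conv(x) for conv in self.atrous]
        return self.project(torch.cat(branches, dim=1))

# e.g. applied to a 2048-channel ResNet-101 stage-4 output for a 512x512 crop
out = ASPP(in_ch=2048)(torch.randn(1, 2048, 16, 16))
```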
3. The layer jump structure unit 3 comprises a first 1 × 1 convolutional layer, a second 1 × 1 convolutional layer, a third 1 × 1 convolutional layer, and first, second and third deep supervision mechanisms. The first 1 × 1 convolutional layer receives the output feature map of the first residual module in the encoding unit 2 and, after the convolution operation, outputs it to the decoding unit 4 and, during model training, also to the first deep supervision mechanism; the second 1 × 1 convolutional layer receives the output feature map of the second residual module and likewise outputs to the decoding unit 4 and, during training, to the second deep supervision mechanism; the third 1 × 1 convolutional layer receives the output feature map of the third residual module and outputs to the decoding unit 4 and, during training, to the third deep supervision mechanism.
During the model training phase, the first, second and third deep supervision mechanisms respectively receive the output feature maps of the first, second and third 1 × 1 convolutional layers. The three deep supervision mechanisms share the same structure, shown in FIG. 3: each performs traffic sign center point prediction, offset prediction and scale prediction on the received feature map through three branches, each branch consisting of two 3 × 3 convolution modules in series. The center point prediction is obtained after the seventh and eighth 3 × 3 convolution modules of the first branch, and a cross entropy loss is computed against the real center points; the offset prediction is obtained after the ninth and tenth 3 × 3 convolution modules of the second branch, and a first L1 loss is computed against the real offsets; the scale prediction is obtained after the eleventh and twelfth 3 × 3 convolution modules of the third branch, and a second L1 loss is computed against the real scales. The cross entropy loss, the first L1 loss and the second L1 loss are summed into an auxiliary loss function, which forms the output value of the deep supervision mechanism.
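A minimal PyTorch sketch of one deep supervision mechanism follows. The intermediate channel width, the number of output maps per branch (45 class heatmaps, 2 offset maps, 2 scale maps), the binary form of the cross entropy, the foreground masking of the L1 terms, and the bare final prediction convolution are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv3x3(in_ch, out_ch):
    # the patent's 3x3 convolution module: 3x3 conv, BN, ReLU in series
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class DeepSupervisionHead(nn.Module):
    """Training-time auxiliary head: three branches of two 3x3 convolutions
    predict the center-point heatmap, the center offset and the scale; the
    three losses are summed into the auxiliary loss."""
    def __init__(self, in_ch, num_classes=45, mid_ch=64):
        super().__init__()
        def branch(out_ch):
            # second conv left bare (no BN/ReLU) so the branch emits raw logits
            return nn.Sequential(conv3x3(in_ch, mid_ch),
                                 nn.Conv2d(mid_ch, out_ch, 3, padding=1))
        self.center = branch(num_classes)  # per-class center-point heatmap
        self.offset = branch(2)            # sub-pixel center offset (dx, dy)
        self.scale = branch(2)             # box scale (w, h)

    def forward(self, feat, gt_center, gt_offset, gt_scale, fg_mask):
        # cross entropy on the center heatmap plus two L1 terms; the L1
        # terms are evaluated only at foreground (real center) locations
        ce = F.binary_cross_entropy_with_logits(self.center(feat), gt_center)
        l1_offset = F.l1_loss(self.offset(feat) * fg_mask, gt_offset * fg_mask)
        l1_scale = F.l1_loss(self.scale(feat) * fg_mask, gt_scale * fg_mask)
        return ce + l1_offset + l1_scale   # auxiliary loss of this mechanism
```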
4. The decoding unit 4 comprises a first decoding module, a second decoding module and a third decoding module. The first decoding module decodes the feature map output by the ASPP module in the encoding unit 2, adds it to the output of the third 1 × 1 convolutional layer in the layer jump structure unit 3, and outputs the resulting feature map to the second decoding module and the output prediction unit 5; the second decoding module decodes the received feature map, adds it to the output of the second 1 × 1 convolutional layer, and outputs the result to the third decoding module and the output prediction unit 5; the third decoding module decodes the received feature map, adds it to the output of the first 1 × 1 convolutional layer, and outputs the result to the output prediction unit 5. The three decoding modules share the same structure, shown in FIG. 4: each comprises a bilinear interpolation layer and a 3 × 3 convolutional layer connected in series.
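A minimal PyTorch sketch of one decoding module under these assumptions (a ×2 bilinear upsampling factor and channel widths matched to the skip feature):

```python
import torch.nn as nn
import torch.nn.functional as F

class DecodeModule(nn.Module):
    """One decoding module: a bilinear interpolation layer followed by a
    3x3 convolutional layer; the output is added element-wise to the
    matching 1x1-conv skip feature from the layer jump structure unit."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
        return self.conv(x) + skip  # skip: 1x1-conv output of an encoder stage
```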
5. The output prediction unit 5 comprises: a first bilinear interpolation that receives the sum of the outputs of the second decoding module in the decoding unit 4 and the second 1 × 1 convolutional layer in the layer jump structure unit 3; a second bilinear interpolation that receives the sum of the outputs of the first decoding module and the third 1 × 1 convolutional layer; and a channel-dimension concatenation layer that receives the sum of the outputs of the third decoding module and the first 1 × 1 convolutional layer, the output of the first bilinear interpolation, and the output of the second bilinear interpolation. The concatenation layer fuses the received feature maps into a single feature map, which then passes through three branches that output the prediction information for the traffic sign center point category and position, the center point offset, and the traffic sign scale, thereby realizing traffic sign detection. The three branches share the same structure of two 3 × 3 convolution modules connected in series.
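The following PyTorch sketch illustrates this unit. The interpolation target size (the finest decoder resolution), equal per-input channel widths, the head output dimensions and the bare final prediction convolution are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv3x3(in_ch, out_ch):
    # the patent's 3x3 convolution module: 3x3 conv, BN, ReLU in series
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class OutputPrediction(nn.Module):
    """The two coarser skip-augmented decoder outputs are bilinearly
    interpolated to the resolution of the finest one, all three are
    concatenated on the channel dimension, and three branches of two 3x3
    convolution modules predict the center-point class heatmap, the center
    offset and the sign scale."""
    def __init__(self, ch, num_classes=45):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(conv3x3(3 * ch, ch),
                                 nn.Conv2d(ch, out_ch, 3, padding=1))
        self.center = branch(num_classes)
        self.offset = branch(2)
        self.scale = branch(2)

    def forward(self, d1, d2, d3):
        # d1/d2: first/second decoder sums (coarse), d3: third decoder sum (fine)
        size = d3.shape[-2:]
        up1 = F.interpolate(d1, size=size, mode='bilinear', align_corners=False)
        up2 = F.interpolate(d2, size=size, mode='bilinear', align_corners=False)
        fused = torch.cat([d3, up2, up1], dim=1)  # channel-dimension concatenation
        return self.center(fused), self.offset(fused), self.scale(fused)
```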
The 3 × 3 convolution module, whose structure is shown in FIG. 5, comprises a 3 × 3 convolutional layer, a BN layer and a ReLU layer connected in series in sequence.
Step 3, training the Anchor-Free traffic sign detection neural network model based on deep supervision with the training set obtained in step 1;
The training set images cropped in step 1 are input into the deep-supervision-based Anchor-Free traffic sign detection neural network model, and the forward propagation stage yields the category information of the traffic signs and the position information of the detection boxes. The error between the model's predictions and the position and category information of the real targets is then computed; the error terms are back-propagated layer by layer from the output layer to the hidden layers to update the model parameters, which are continuously optimized with a stochastic gradient descent (SGD) optimizer.
In training the deep-supervision-based Anchor-Free traffic sign detection neural network model, one batch contains 4 images, the number of iterations is set to 180, and the initial learning rate is set to 1.25 × 10⁻⁴, decaying to 1.25 × 10⁻⁵ at the 90th iteration. The trained model is saved.
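A training-loop sketch consistent with this schedule is shown below; `model`, `train_loader` and `compute_loss` are placeholders for the network, data pipeline and combined (main plus auxiliary deep-supervision) loss described above, and the momentum value and the step-decay implementation are assumptions.

```python
import torch

def train(model, train_loader, compute_loss, device='cuda'):
    """Schedule as described: batch size 4 (set in the DataLoader),
    180 iterations (read here as epochs), SGD with initial learning rate
    1.25e-4 decayed to 1.25e-5 at iteration 90."""
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=1.25e-4, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=[90], gamma=0.1)
    for epoch in range(180):
        for images, targets in train_loader:
            optimizer.zero_grad()
            preds = model(images.to(device))
            loss = compute_loss(preds, targets)  # main + auxiliary losses
            loss.backward()   # error propagates back from the output layer
            optimizer.step()
        scheduler.step()
    torch.save(model.state_dict(), 'anchor_free_tsd.pth')  # save trained model
```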
Step 4, testing the Anchor-Free traffic sign detection neural network model based on deep supervision with the test set obtained in step 1.
The cropped test set images obtained in step 1 are input into the deep-supervision-based Anchor-Free traffic sign detection neural network model trained in step 3, and the detection results are output; an example of the results is shown in FIG. 6.
The embodiment measures the algorithm's performance with Average Precision (AP). The 3073 test set images are input for detection, and the computed AP is 95.7.
[1] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks [C]//Advances in Neural Information Processing Systems. 2015: 91-99.
[2] Huang J, Rathod V, Sun C, et al. Speed/accuracy trade-offs for modern convolutional object detectors [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7310-7311.
[3] Redmon J, Divvala S, Girshick R, et al. You Only Look Once: Unified, real-time object detection [C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.

Claims (10)

1. An Anchor-Free traffic sign detection method based on deep supervision is characterized by comprising the following steps:
step 1, constructing a data set and carrying out data preprocessing to form a training set and a test set;
step 2, building a deep-supervision-based Anchor-Free traffic sign detection neural network model comprising, connected in series in sequence: an input unit (1), an encoding unit (2), a layer jump structure unit (3), a decoding unit (4) and an output prediction unit (5), the encoding unit (2) also being connected to the decoding unit (4);
step 3, training the Anchor-Free traffic sign detection neural network model based on deep supervision with the training set obtained in step 1;
and step 4, testing the Anchor-Free traffic sign detection neural network model based on deep supervision with the test set obtained in step 1.
2. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 1, characterized in that the data set in step 1 uses, for training and testing the neural network, the 45 classes of traffic signs that occur more than 100 times in the Chinese traffic sign data set TT100K published by Tsinghua University and Tencent; the data preprocessing randomly crops the original image into 512 × 512 pixel images according to the regions where the traffic signs are located, each cropped image containing one or more traffic signs, the annotations of the traffic sign detection boxes in the cropped image being derived from the original annotation file; and incomplete traffic signs in the cropped images are used to simulate occluded traffic signs.
3. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 1, characterized in that step 2 builds the deep-supervision-based Anchor-Free traffic sign detection neural network model with the deep learning framework PyTorch, and the input unit (1) comprises: a 7 × 7 convolutional layer (kernel size 7 × 7, stride 2, 64 output channels) performing a first extraction of shallow features on the input original image to obtain a feature map; the feature map passes in sequence through one BN layer to prevent vanishing gradients, one ReLU activation layer, and one max pooling layer with stride 2 and a 2 × 2 pooling window, forming the input of the encoding unit (2).
4. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 1, characterized in that the encoding unit (2) in step 2 comprises:
a first residual module, a second residual module, a third residual module, a fourth residual module and an atrous spatial pyramid pooling (ASPP) module connected in series in sequence, wherein the feature map output by the input unit (1) passes in sequence through the first, second, third and fourth residual modules for further feature extraction and is then enhanced by the ASPP module, the enhanced feature map being sent to the decoding unit (4), and the outputs of the first, second and third residual modules being respectively connected to the layer jump structure unit (3).
5. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 4, characterized in that the ASPP module has five branches: the first branch comprises a global average pooling layer, a fourth 1 × 1 convolutional layer and an upsampling layer connected in series in sequence; the second branch comprises a fifth 1 × 1 convolutional layer; the third branch comprises a first 3 × 3 dilated convolutional layer with dilation rate 6; the fourth branch comprises a second 3 × 3 dilated convolutional layer with dilation rate 12; the fifth branch comprises a third 3 × 3 dilated convolutional layer with dilation rate 18; and the feature map output by the fourth residual module enters the five branches, the branch outputs passing through a channel-dimension concatenation layer, which fuses the feature information under different receptive fields, and then through a sixth 1 × 1 convolutional layer to obtain the output feature map of the ASPP module.
6. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 1, characterized in that the layer jump structure unit (3) in step 2 comprises a first 1 × 1 convolutional layer, a second 1 × 1 convolutional layer, a third 1 × 1 convolutional layer, and first, second and third deep supervision mechanisms, wherein the first 1 × 1 convolutional layer receives the output feature map of the first residual module in the encoding unit (2) and, after the convolution operation, outputs it to the decoding unit (4) and, during model training, also to the first deep supervision mechanism; the second 1 × 1 convolutional layer receives the output feature map of the second residual module and, after the convolution operation, outputs it to the decoding unit (4) and, during model training, to the second deep supervision mechanism; and the third 1 × 1 convolutional layer receives the output feature map of the third residual module and, after the convolution operation, outputs it to the decoding unit (4) and, during model training, to the third deep supervision mechanism.
7. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 6, characterized in that, during the model training phase, the first, second and third deep supervision mechanisms respectively receive the output feature maps of the first, second and third 1 × 1 convolutional layers; the three deep supervision mechanisms share the same structure, each performing traffic sign center point prediction, offset prediction and scale prediction on the received feature map through three branches, each branch consisting of two 3 × 3 convolution modules in series; the center point prediction is obtained after the seventh and eighth 3 × 3 convolution modules of the first branch, and a cross entropy loss is computed against the real center points; the offset prediction is obtained after the ninth and tenth 3 × 3 convolution modules of the second branch, and a first L1 loss is computed against the real offsets; the scale prediction is obtained after the eleventh and twelfth 3 × 3 convolution modules of the third branch, and a second L1 loss is computed against the real scales; and the cross entropy loss, the first L1 loss and the second L1 loss are summed into an auxiliary loss function, forming the output value of the deep supervision mechanism.
8. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 1, characterized in that the decoding unit (4) in step 2 comprises a first decoding module, a second decoding module and a third decoding module, wherein the first decoding module decodes the feature map output by the atrous spatial pyramid pooling (ASPP) module in the encoding unit (2), adds it to the output of the third 1 × 1 convolutional layer in the layer jump structure unit (3), and outputs the resulting feature map to the second decoding module and the output prediction unit (5); the second decoding module decodes the received feature map, adds it to the output of the second 1 × 1 convolutional layer in the layer jump structure unit (3), and outputs the result to the third decoding module and the output prediction unit (5); the third decoding module decodes the received feature map, adds it to the output of the first 1 × 1 convolutional layer in the layer jump structure unit (3), and outputs the result to the output prediction unit (5); and the first, second and third decoding modules share the same structure, each comprising a bilinear interpolation layer and a 3 × 3 convolutional layer connected in series in sequence.
9. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 1, characterized in that the output prediction unit (5) of step 2 comprises: a first bilinear interpolation receiving the sum of the outputs of the second decoding module in the decoding unit (4) and the second 1 × 1 convolutional layer in the layer jump structure unit (3); a second bilinear interpolation receiving the sum of the outputs of the first decoding module in the decoding unit (4) and the third 1 × 1 convolutional layer in the layer jump structure unit (3); and a channel-dimension concatenation layer respectively receiving the sum of the outputs of the third decoding module in the decoding unit (4) and the first 1 × 1 convolutional layer in the layer jump structure unit (3), the output of the first bilinear interpolation, and the output of the second bilinear interpolation; the channel-dimension concatenation layer fuses the received feature maps into a single feature map, which passes through three branches that output the prediction information for the traffic sign center point category and position, the center point offset, and the traffic sign scale, the three branches sharing the same structure of two 3 × 3 convolution modules connected in series.
10. The deep-supervision-based Anchor-Free traffic sign detection method according to claim 9, characterized in that the 3 × 3 convolution module comprises a 3 × 3 convolutional layer, a BN layer and a ReLU layer connected in series in sequence.
CN202111487756.XA 2021-12-07 2021-12-07 Anchor-Free traffic sign detection method based on deep supervision Pending CN114170581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111487756.XA CN114170581A (en) 2021-12-07 2021-12-07 Anchor-Free traffic sign detection method based on deep supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111487756.XA CN114170581A (en) 2021-12-07 2021-12-07 Anchor-Free traffic sign detection method based on deep supervision

Publications (1)

Publication Number Publication Date
CN114170581A true CN114170581A (en) 2022-03-11

Family

ID=80484107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111487756.XA Pending CN114170581A (en) 2021-12-07 2021-12-07 Anchor-Free traffic sign detection method based on deep supervision

Country Status (1)

Country Link
CN (1) CN114170581A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114648685A (en) * 2022-03-23 2022-06-21 成都臻识科技发展有限公司 Method and system for converting anchor-free algorithm into anchor-based algorithm
CN114694119A (en) * 2022-04-07 2022-07-01 长沙理工大学 Traffic sign detection method based on reparameterization and feature weighting and related device

Similar Documents

Publication Publication Date Title
CN110147763B (en) Video semantic segmentation method based on convolutional neural network
CN109840471B (en) Feasible road segmentation method based on improved Unet network model
CN111582029B (en) Traffic sign identification method based on dense connection and attention mechanism
CN110781776B (en) Road extraction method based on prediction and residual refinement network
CN114170581A (en) Anchor-Free traffic sign detection method based on deep supervision
CN112801117B (en) Multi-channel receptive field guided characteristic pyramid small target detection network and detection method
CN111291660B (en) Anchor-free traffic sign identification method based on void convolution
CN109993269A (en) Single image people counting method based on attention mechanism
CN114267025A (en) Traffic sign detection method based on high-resolution network and light-weight attention mechanism
CN112101117A (en) Expressway congestion identification model construction method and device and identification method
CN114092815B (en) Remote sensing intelligent extraction method for large-range photovoltaic power generation facility
CN116071668A (en) Unmanned aerial vehicle aerial image target detection method based on multi-scale feature fusion
CN113298817A (en) High-accuracy semantic segmentation method for remote sensing image
Yang et al. Real-time traffic signs detection based on YOLO network model
CN110634127A (en) Power transmission line vibration damper target detection and defect identification method and device
CN115100549A (en) Transmission line hardware detection method based on improved YOLOv5
CN112215231A (en) Large-scale point cloud semantic segmentation method combining space depth convolution and residual error structure
CN116051977A (en) Multi-branch fusion-based lightweight foggy weather street view semantic segmentation algorithm
CN115527096A (en) Small target detection method based on improved YOLOv5
CN112818777B (en) Remote sensing image target detection method based on dense connection and feature enhancement
CN114266805A (en) Twin region suggestion network model for unmanned aerial vehicle target tracking
CN114550135A (en) Lane line detection method based on attention mechanism and feature aggregation
CN113361528A (en) Multi-scale target detection method and system
CN115995002B (en) Network construction method and urban scene real-time semantic segmentation method
CN116434188A (en) Traffic sign detection method based on improved_yolov5s network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination