CN113920499A - Laser point cloud three-dimensional target detection model and method for complex traffic scene - Google Patents

Laser point cloud three-dimensional target detection model and method for complex traffic scene

Info

Publication number
CN113920499A
CN113920499A (application CN202111255417.9A)
Authority
CN
China
Prior art keywords
dimensional
candidate
convolution
frames
voxel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111255417.9A
Other languages
Chinese (zh)
Inventor
王海
陈智宇
蔡英凤
陈龙
刘擎超
李祎承
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN202111255417.9A
Publication of CN113920499A
Pending legal-status Critical Current

Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2431: Pattern recognition; classification techniques relating to the number of classes; multiple classes
    • G06N3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a laser point cloud three-dimensional target detection model and method for complex traffic scenes. The three-dimensional encoder in the model helps detect long-distance targets and small targets, and its sparse convolution and sub-manifold convolution greatly improve the efficiency of voxel feature encoding. The residual structure of the two-dimensional encoder keeps more complete information, which also helps detect long-distance and small targets and makes the network easier to optimize. The self-calibrated convolution extracts features at both the original scale and a self-calibrated scale, enlarging the receptive field and extracting more complete and richer features, while spatial attention and channel attention enhance useful feature expression and suppress useless information along the spatial and channel directions. The final detection accuracy is 81.88% for vehicles, 47.82% for pedestrians and 69.81% for riders, with an average accuracy of 66.25%, an improvement of 9.9% over the existing Voxel R-CNN algorithm; the model reaches 13.8 FPS on an RTX 2080 Ti graphics card, so both the detection accuracy and speed can meet the perception requirements of an intelligent vehicle in a complex traffic environment.

Description

Laser point cloud three-dimensional target detection model and method for complex traffic scene
Technical Field
The invention belongs to the field of intelligent automobile perception, and particularly relates to a laser point cloud three-dimensional target detection model and method for a complex traffic scene.
Background
An intelligent car is a complex system comprising sensing, decision-making and control. Environment perception technology is the foundation of the intelligent vehicle and provides the environmental information needed for subsequent decision-making and control. The accuracy of traditional machine learning algorithms cannot meet the operating requirements of current intelligent vehicles, so perception algorithms based on deep learning have developed rapidly and have made great progress in two-dimensional target detection and segmentation. However, cameras are affected by night, rain, fog, strong light and similar conditions, which degrades detection performance. With the falling cost of laser radar and the increase in computing power, three-dimensional target detection is increasingly applied to intelligent vehicles. Two-dimensional target detection can only provide the position of a target in the two-dimensional image, whereas three-dimensional target detection provides the position, shape and heading angle of the target in the real environment, which is very important for subsequent decision-making and planning.
Disclosure of Invention
The invention addresses the problems that most existing three-dimensional target detection algorithms cannot adapt to complex traffic scenes, detect long-distance vehicles and nearby small targets such as pedestrians and riders poorly, and therefore cannot meet the perception requirements of intelligent vehicles under complex traffic conditions. Based on the Voxel R-CNN algorithm, the invention designs an improved three-dimensional target detection model for complex traffic environments, so that the detection accuracy of long-distance targets and small targets is improved and the model can adapt to complex traffic environments. The invention achieves this technical purpose through the following technical scheme.
The invention provides a laser point cloud three-dimensional target detection model facing a complex traffic scene, which comprises a voxelization processing module, a three-dimensional encoder module, a two-dimensional encoder module, a candidate frame generation module, a candidate frame pooling module and a full-connection layer module;
the voxelization processing module: the system is used for carrying out voxelization processing on input laser radar point cloud data;
the three-dimensional encoder module: extracting the characteristics of the non-empty voxels, and projecting the obtained voxel characteristics to a bird's-eye view to generate a pseudo-image expression;
the two-dimensional encoder module: carrying out feature extraction on the pseudo-image expression to obtain a two-dimensional feature expression;
the candidate box generation module: performing target classification and regression by using the two-dimensional features to generate a high-quality candidate frame;
the candidate frame pooling module: performing voxel pooling on the candidate frames to obtain pooling characteristics;
the full connection layer module: and refining the candidate frame aiming at the pooled features.
Further, the voxelization processing module performs voxelization processing on the laser radar point cloud data in the following specific process:
the method comprises the steps of carrying out voxelization pretreatment on input point cloud along the direction X, Y, Z, dividing the whole point cloud scene into uniform voxels, taking the center of a laser radar as an origin, and taking the forward direction of a vehicle as an X axis, the leftward direction as a Y axis and the upward direction as a Z axis. The range of the whole point cloud scene (X, Y, Z) is [ (-75.2,75.2), (-75.2,75.2), (-5,3) ] meters, the size of each voxel is set to be (0.1,0.1,0.2) meter, therefore, the whole scene is divided into 1504 × 1504 × 40 voxels with equal size, if the number of original points in each voxel exceeds 5, the down sampling is carried out to 5, and the mean value of X, Y, Z, coordinates and reflection intensity of all points in each voxel after the down sampling is taken as the original characteristic of the voxel.
Further, the three-dimensional encoder module comprises a residual structure, sparse convolution and sub-manifold convolution. Its input is a 40 × 1504 × 1504 voxel expression, where each voxel has a 4-dimensional original feature (x, y, z, reflection intensity). After the input, the original voxel features are first extracted by a 3 × 3 × 3 sub-manifold convolution at the original (1×) resolution, followed by 2 consecutive residual sub-manifold blocks for feature extraction; 3 × 3 × 3 sparse convolutions then downsample by 2, 4 and 8 times along the X, Y and Z directions to obtain multi-scale three-dimensional features, and each sparse convolution is followed by 2 consecutive residual sub-manifold blocks for feature extraction. A residual sub-manifold block is formed by two 3 × 3 × 3 sub-manifold convolutions, with the input of the first sub-manifold convolution added to the output of the second, and each sub-manifold convolution is followed by a BatchNorm layer and a ReLU layer. Finally, a sparse convolution downsamples the Z direction by 2 times, and the result is projected onto the bird's-eye view and converted into a 256 × 188 × 188 pseudo-image expression.
Further, the two-dimensional encoder module comprises five parts: a residual structure, two-dimensional convolution, self-calibrated convolution, spatial attention and channel attention. The module downsamples the 256 × 188 × 188 pseudo-image expression obtained by the three-dimensional encoder module by 1, 2 and 4 times with a 3 × 3 two-dimensional convolution to obtain multi-scale features; after each two-dimensional convolution, 2 consecutive self-calibrated convolutions extract the multi-scale features, with the input of the first self-calibrated convolution added to the output of the second, and spatial attention and channel attention mechanisms are added after the self-calibrated convolutions to enhance useful feature expression and reduce useless information along the spatial and channel directions. Finally, the multi-scale features are brought back to the same size by deconvolution, concatenated along the channel direction, and compressed with a two-dimensional convolution to obtain a 64 × 188 × 188 two-dimensional feature expression.
Further, for the two-dimensional feature expression of size 64 × 188 × 188 obtained by the two-dimensional encoder module, the candidate frame generation module places 10 anchor boxes at each pixel of the 188 × 188 two-dimensional feature map, covering 5 categories (car, truck, bus, pedestrian and rider), each in the 0° and 180° directions. Category prediction and size regression (x, y, z, w, h, l, θ) are performed for each anchor box using the two-dimensional feature map, giving 10 × 188 × 188 three-dimensional boxes; during training, the 9000 three-dimensional boxes with the highest classification scores are selected and sent to the non-maximum suppression module to obtain 512 high-quality candidate boxes.
Further, for the 512 high-quality candidate boxes generated by the candidate frame generation module, the candidate frame pooling module randomly samples 128 candidate boxes for the refinement stage, 64 positive samples and 64 negative samples, and extracts multi-scale voxel features with the voxel candidate-box pooling module: 6 × 6 × 6 grid points are first uniformly sampled in each candidate box, non-empty voxels within a certain distance around each grid point are queried, and the non-empty voxel features are then extracted with a PointNet, a fully connected layer and a max pooling layer to obtain the pooled features; when querying the voxels around the grid points, the voxels downsampled 2, 4 and 8 times are queried respectively to obtain multi-scale voxel features.
Further, the fully connected layer module takes the pooled features of the 128 candidate boxes randomly sampled by the candidate frame pooling module as input and performs confidence prediction and regression-box refinement through 2 fully connected layers. For the confidence prediction branch, the classification confidence c_i of each candidate box is predicted, and its target is obtained by the following formula:
c_i* = min(1, max(0, (IoU_i - θ_L) / (θ_H - θ_L)))

where c_i* is the confidence target value of the i-th candidate box, IoU_i is the intersection over union (IoU) between the i-th candidate box and its ground-truth box, and θ_H and θ_L are the foreground and background IoU thresholds, set to 0.75 and 0.25 respectively;
Associating the IoU between the prediction box and the ground truth with the classification confidence alleviates the mismatch between classification confidence and localization accuracy; for the regression-box refinement branch, more accurate position information (x, y, z, w, h, l, θ) is predicted by the fully connected layers.
Further, the dataset used by the model adopts the ONCE dataset as the training dataset and validation dataset, and a class-balanced sampling enhancement method is added during training: training scenes containing rare classes are randomly duplicated and fed into the training dataset, which alleviates class imbalance and expands the number of training samples;
further, the model training adopts end-to-end training, and five categories of cars, trucks, buses, riders and pedestrians are trained simultaneously; 2 RTX 2080Ti GPUS training 90 rounds of the network, wherein an optimizer is Adam, a learning rate change mode adopts a cosine fire mode, the maximum learning rate is 0.003, a frequency division coefficient is 10, momentum is from 0.95 to 0.85, weight attenuation is 0.02, and batch size is 6; total loss L thereofTOTALIncluding RPN loss LRPNAnd RCNN loss LRCNNIn the RPN phase, the classification Loss L is calculated by Focal localclsCalculating the regression Loss L by using Smooth L1 Lossreg1In the RCNN stage, the confidence loss L is calculated by using the binary cross entropy lossiouCalculating the regression Loss L by using Smooth L1 Lossreg2
L_TOTAL = L_RPN + L_RCNN

L_RPN = (1 / N_fg) [ Σ_i L_cls(s_i, s_i*) + α Σ_i L_reg1(r_i, r_i*) ]

where N_fg is the number of foreground candidate boxes, s_i is the predicted category of a candidate box, s_i* is its ground-truth category, α indicates that the regression loss is computed only for foreground candidate boxes, r_i are the predicted box regression parameters, and r_i* are their ground-truth values;

L_RCNN = (1 / N_s) [ Σ_i L_iou(c_i, c_i*) + β Σ_i L_reg2(r_i, r_i*) ]

where N_s is the number of sampled candidate boxes, c_i is the predicted confidence, c_i* is the confidence target value, β indicates that the regression loss is computed only for sampled foreground candidate boxes, r_i are the predicted box regression parameters, and r_i* are their ground-truth values.
Based on the model, the invention provides a detection method of a laser point cloud three-dimensional target facing a complex traffic scene, which comprises the following steps:
Step 1: select or build a dataset as the training and validation datasets of the detection network, adding a class-balanced sampling enhancement method during training.
Step 2: voxelize the laser radar point cloud input to the network.
Step 3: extract the features of the non-empty voxels with a three-dimensional encoder composed of a residual structure, sparse convolution and sub-manifold convolution, and project the obtained voxel features onto the bird's-eye view to generate a pseudo-image expression.
Step 4: perform feature extraction on the pseudo-image expression with a two-dimensional encoder composed of a residual structure, two-dimensional convolution, self-calibrated convolution, spatial attention and channel attention, obtaining a two-dimensional feature expression.
Step 5: classify and regress targets using the two-dimensional features to generate high-quality candidate boxes.
Step 6: perform voxel pooling on the candidate boxes to obtain pooled features.
Step 7: refine the candidate boxes with the fully connected layers using the pooled features.
Step 8: train the network of steps 2 to 7.
Step 9: visualize the detection results.
The specific implementation process of each step is described in the detailed description of the specific embodiments.
The invention has the beneficial effects that:
1. The invention uses the autonomous driving dataset ONCE for network training; data collected on different roads, in different weather and at different times can represent complex traffic conditions, so a network trained on this dataset can adapt to complex traffic conditions.
2. The three-dimensional encoder designed by the invention is composed of a residual error structure, sparse convolution and sub-manifold convolution, more complete information can be kept through the residual error structure, the detection of a long-distance target and a small target is facilitated, the coding efficiency of voxel characteristics can be greatly improved through the sparse convolution and the sub-manifold convolution, and the real-time detection is facilitated.
3. The two-dimensional encoder designed by the invention consists of a residual error structure, two-dimensional convolution, self-calibration convolution, space attention and channel attention. More complete information can be kept through the residual error structure, the detection effect of a long-distance target and a small target is facilitated, and meanwhile, the network is easier to optimize. The self-calibration convolution can be used for respectively extracting the features in the original scale and the self-calibration scale, so that the receptive field is enlarged, and more complete and richer features can be extracted. The useful feature expression can be enhanced in the spatial direction and the channel direction through the spatial attention and the channel attention, and the useless information is suppressed.
4. In the training process, a class balance sampling method is added, so that on one hand, class imbalance is relieved, the detection effect of pedestrians and riders with small sample amount in a data set is improved, and on the other hand, the robustness of an algorithm is improved by expanding the training samples of the data set.
5. The final detection accuracy is 81.88% for vehicles, 47.82% for pedestrians and 69.81% for riders, with an average accuracy of 66.25%, an improvement of 9.9% over the original Voxel R-CNN algorithm; the model reaches 13.8 FPS on an RTX 2080 Ti graphics card, so the detection accuracy and speed can meet the perception requirements of intelligent vehicles in complex traffic environments.
Drawings
FIG. 1 is an algorithm flow of a laser point cloud three-dimensional target detection method for a complex traffic scene.
Fig. 2 is a network configuration diagram of the three-dimensional encoder of the present invention.
Fig. 3 is a network configuration diagram of the two-dimensional encoder of the present invention.
Fig. 4 is a diagram of the detection visualization effect of the present invention.
Detailed Description
The invention will be further explained with reference to the drawings.
The invention provides a laser point cloud three-dimensional target detection method facing a complex traffic environment, which specifically comprises the following processes as shown in figure 1:
Step 1: select the Huawei ONCE dataset as the training and validation datasets of the detection network, and add a class-balanced sampling enhancement method during training.
The Huawei ONCE dataset was acquired with 7 cameras and a 40-beam lidar and contains 5 categories: car, truck, bus, pedestrian and cyclist. It covers scenes in different weather (sunny, cloudy and rainy), at different times (morning, noon, afternoon and evening) and on different road types (city center, suburb, expressway, tunnel and bridge), so it represents complex traffic conditions well. The ONCE dataset collects 144 hours of autonomous driving scenes with roughly 1,000,000 frames, of which 16,000 frames are labeled; 5,000 frames are used for training, 3,000 for validation and 8,000 for testing. Because car samples are plentiful in the training data while truck, bus, pedestrian and rider samples are scarce, a class-balanced sampling enhancement method is adopted during training: training scenes containing rare classes are randomly duplicated and fed into the training dataset, which alleviates class imbalance and improves the detection accuracy of rare classes. At the same time, the training set is expanded from 5,000 to 16,600 frames, a factor of 3.3, and the larger number of training samples improves the robustness of the model.
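As an illustration of the class-balanced sampling enhancement described above, the following Python sketch duplicates frames that contain rare classes before training; the field names (frame_infos, gt_names) and the duplication factor are assumptions for illustration, not the exact procedure used in the invention.

```python
import random

def class_balanced_resample(frame_infos,
                            rare_classes=("Truck", "Bus", "Pedestrian", "Cyclist"),
                            duplicate_factor=3):
    """Sketch of class-balanced sampling: frames containing rare classes are
    randomly duplicated so they appear more often in the training set.
    `frame_infos` is assumed to be a list of dicts with a 'gt_names' entry."""
    resampled = list(frame_infos)
    for info in frame_infos:
        if any(name in rare_classes for name in info.get("gt_names", [])):
            # append extra copies of rare-class frames (factor is illustrative)
            resampled.extend([info] * random.randint(1, duplicate_factor))
    random.shuffle(resampled)
    return resampled
```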
Step 2: voxelize the laser radar point cloud input to the network.
The input point cloud is voxelized along the X, Y and Z directions, dividing the whole point cloud scene into uniform voxels so that point cloud features can be extracted efficiently by sparse convolution and sub-manifold convolution. The center of the laser radar is taken as the origin, with the forward direction of the vehicle as the X axis, the leftward direction as the Y axis and the upward direction as the Z axis. The range of the whole point cloud scene in (X, Y, Z) is [(-75.2, 75.2), (-75.2, 75.2), (-5, 3)] meters and the size of each voxel is set to (0.1, 0.1, 0.2) meters, so the whole scene is divided into 1504 × 1504 × 40 voxels of equal size. If the number of raw points in a voxel exceeds 5, they are downsampled to 5, and the mean of the x, y, z coordinates and reflection intensity of all points remaining in each voxel is taken as the original feature of that voxel. Voxelization greatly improves the efficiency of point cloud feature extraction and facilitates real-time detection in autonomous driving scenes.
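The voxelization step can be illustrated with the following minimal Python sketch, which groups points into the 1504 × 1504 × 40 grid, keeps at most 5 points per voxel and averages (x, y, z, intensity) as the initial voxel feature; this dictionary-based version is for clarity only and is much slower than the batched voxelization used in practice.

```python
import numpy as np

def voxelize(points, pc_range=(-75.2, -75.2, -5.0, 75.2, 75.2, 3.0),
             voxel_size=(0.1, 0.1, 0.2), max_points_per_voxel=5):
    """Sketch of voxelization: `points` is (N, 4) with columns (x, y, z, intensity)."""
    pc_range = np.asarray(pc_range, dtype=np.float32)
    voxel_size = np.asarray(voxel_size, dtype=np.float32)

    # keep only points inside the detection range
    mask = np.all((points[:, :3] >= pc_range[:3]) & (points[:, :3] < pc_range[3:]), axis=1)
    points = points[mask]

    # integer voxel coordinates along x, y, z (a 1504 x 1504 x 40 grid here)
    coords = ((points[:, :3] - pc_range[:3]) / voxel_size).astype(np.int64)

    voxels = {}
    for pt, c in zip(points, map(tuple, coords)):
        bucket = voxels.setdefault(c, [])
        if len(bucket) < max_points_per_voxel:   # keep at most 5 points per voxel
            bucket.append(pt)

    voxel_coords = np.array(list(voxels.keys()), dtype=np.int64)
    voxel_feats = np.array([np.mean(v, axis=0) for v in voxels.values()], dtype=np.float32)
    return voxel_coords, voxel_feats  # (M, 3) indices and (M, 4) mean (x, y, z, intensity)
```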
In fig. 2, an expression of the form SubM(a, b)-c × d × e-(f, g, h) denotes a sub-manifold convolution, where a and b are the input and output dimensions, c × d × e is the convolution kernel size in the x, y, z directions, and (f, g, h) are the strides in the x, y, z directions; an expression of the form SpConv(a, b)-c × d × e-(f, g, h) denotes a sparse convolution with the same meaning of the parameters; an expression of the form Residual SubM Block(a, b) denotes a residual sub-manifold block with input dimension a and output dimension b, whose composition is shown by the dashed boxes in fig. 2.
Step 3: extract the features of the non-empty voxels with a three-dimensional encoder composed of a residual structure, sparse convolution and sub-manifold convolution, and project the obtained voxel features onto the bird's-eye view to generate a pseudo-image expression.
The input of the three-dimensional encoder is a 40 × 1504 × 1504 voxel expression, where each voxel has a 4-dimensional original feature (x, y, z, reflection intensity). The network structure of the three-dimensional encoder is shown in fig. 2. The original voxel features are first extracted by a 3 × 3 × 3 sub-manifold convolution at the original (1×) resolution, followed by 2 consecutive residual sub-manifold blocks for feature extraction. Then 3 × 3 × 3 sparse convolutions downsample by 2, 4 and 8 times along the X, Y and Z directions to obtain multi-scale three-dimensional features, and each sparse convolution is followed by 2 consecutive residual sub-manifold blocks for feature extraction. Specifically, a residual sub-manifold block is formed by two 3 × 3 × 3 sub-manifold convolutions, and the input of the first sub-manifold convolution is added to the output of the second; the residual structure helps keep more complete information during downsampling and improves the detection of small and long-distance targets. Each sub-manifold convolution is followed by a BatchNorm layer and a ReLU layer, which speed up network convergence, prevent gradient explosion and vanishing gradients, and prevent overfitting. Finally, a sparse convolution downsamples the Z direction by 2 times, and the result is projected onto the bird's-eye view and converted into a 256 × 188 × 188 pseudo-image expression. The extensive use of sparse convolution and sub-manifold convolution in the three-dimensional encoder speeds up the extraction of non-empty voxel features.
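The residual sub-manifold block described above can be sketched as follows, assuming the spconv 2.x API (SubMConv3d and SparseConvTensor.replace_feature); the channel width and the placement of the final ReLU after the residual addition are assumptions rather than details specified by the invention.

```python
import torch.nn as nn
import spconv.pytorch as spconv

class ResidualSubMBlock(nn.Module):
    """Sketch of a residual sub-manifold block: two 3x3x3 sub-manifold
    convolutions, each followed by BatchNorm and ReLU, with the block input
    added to the output of the second convolution."""
    def __init__(self, channels, indice_key=None):
        super().__init__()
        self.conv1 = spconv.SubMConv3d(channels, channels, 3, padding=1,
                                       bias=False, indice_key=indice_key)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = spconv.SubMConv3d(channels, channels, 3, padding=1,
                                       bias=False, indice_key=indice_key)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = out.replace_feature(self.relu(self.bn1(out.features)))
        out = self.conv2(out)
        out = out.replace_feature(self.bn2(out.features))
        # residual connection: sub-manifold convolutions keep the active voxel
        # set unchanged, so the input and output features can be added directly
        out = out.replace_feature(self.relu(out.features + identity.features))
        return out
```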
Step 4: perform feature extraction on the pseudo-image expression with a two-dimensional encoder composed of a residual structure, two-dimensional convolution, self-calibrated convolution, spatial attention and channel attention, obtaining a two-dimensional feature expression.
The 256 × 188 × 188 pseudo-image expression obtained in step 3 is sent to the two-dimensional encoder for feature extraction. The network structure of the two-dimensional encoder is shown in fig. 3. The original pseudo-image expression is downsampled 1, 2 and 4 times with a 3 × 3 two-dimensional convolution to obtain multi-scale features. After each two-dimensional convolution, 2 consecutive self-calibrated convolutions extract the multi-scale features, and the input of the first self-calibrated convolution is added to the output of the second. The self-calibrated convolution extracts features in both the original scale space and the self-calibrated scale space, greatly enlarging the receptive field and producing more salient features. In addition, spatial attention and channel attention mechanisms are added after the self-calibrated convolutions, enhancing useful feature expression and reducing useless information along the spatial and channel directions. Finally, the multi-scale features are brought back to the same size by deconvolution, concatenated along the channel direction, and compressed with a two-dimensional convolution to obtain a 64 × 188 × 188 two-dimensional feature expression.
In fig. 3, an expression of the form Conv(a, b)-c × d-(e, f) denotes a two-dimensional convolution, where a and b are the input and output dimensions, c × d is the convolution kernel size in the x, y directions, and (e, f) are the strides in the x, y directions; Conv.T(a, b)-c × d-(e, f) denotes a deconvolution with the same meaning of the parameters; SC-Conv(a, b) denotes a self-calibrated convolution with input dimension a and output dimension b; CBAM(a, b) denotes channel and spatial attention with input dimension a and output dimension b; a shape written as a × b × c denotes a feature with dimension a and sizes b and c in the x and y directions respectively.
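A minimal sketch of the channel and spatial attention (CBAM-style) block applied after the self-calibrated convolutions is given below; the reduction ratio and the spatial kernel size are illustrative assumptions, not values specified by the invention.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of CBAM-style attention: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # channel attention: shared MLP over global average- and max-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # spatial attention: convolution over channel-wise average and max maps
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3), keepdim=True)) +
                           self.mlp(x.amax(dim=(2, 3), keepdim=True)))
        x = x * ca                                    # emphasize informative channels
        sa = torch.sigmoid(self.spatial(torch.cat([x.mean(dim=1, keepdim=True),
                                                   x.amax(dim=1, keepdim=True)], dim=1)))
        return x * sa                                 # emphasize informative locations
```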
Step 5: classify and regress targets using the two-dimensional features to generate high-quality candidate boxes.
The two-dimensional feature expression of size 64 × 188 × 188 is obtained from step 4. 10 anchor boxes (5 categories: car, truck, bus, pedestrian and rider, each in the 0° and 180° directions) are placed at each pixel of the 188 × 188 two-dimensional feature map, and category prediction and size regression (x, y, z, w, h, l, θ) are performed for each anchor box using the two-dimensional feature map, giving 10 × 188 × 188 three-dimensional boxes. During training, the 9000 three-dimensional boxes with the highest classification scores are selected and sent to the non-maximum suppression module to obtain 512 high-quality candidate boxes.
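The proposal selection described above (top 9000 boxes by classification score, then non-maximum suppression down to 512) can be sketched as follows; for simplicity the sketch uses axis-aligned bird's-eye-view NMS from torchvision with an assumed IoU threshold, whereas the actual implementation would use rotated BEV/3D NMS.

```python
import torch
from torchvision.ops import nms

def select_proposals(boxes, scores, pre_nms_top_n=9000, post_nms_top_n=512, iou_thresh=0.8):
    """Sketch of proposal selection. `boxes` is assumed to be (N, 7) as
    (x, y, z, w, l, h, theta) and `scores` is (N,); iou_thresh is illustrative."""
    scores, order = scores.topk(min(pre_nms_top_n, scores.numel()))
    boxes = boxes[order]

    # axis-aligned BEV boxes (x1, y1, x2, y2), ignoring heading, as an approximation
    bev = torch.stack([boxes[:, 0] - boxes[:, 3] / 2, boxes[:, 1] - boxes[:, 4] / 2,
                       boxes[:, 0] + boxes[:, 3] / 2, boxes[:, 1] + boxes[:, 4] / 2], dim=1)
    keep = nms(bev, scores, iou_thresh)[:post_nms_top_n]
    return boxes[keep], scores[keep]
```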
Step 6: perform voxel pooling on the candidate boxes to obtain pooled features.
Of the 512 candidate boxes generated in step 5, 128 are randomly sampled and sent to the refinement stage, 64 positive samples and 64 negative samples. Multi-scale voxel features are extracted with the voxel candidate-box pooling module: 6 × 6 × 6 grid points are first uniformly sampled in each candidate box, non-empty voxels within a certain distance around each grid point are queried, and the non-empty voxel features are then extracted with a PointNet, a fully connected layer and a max pooling layer to obtain the pooled features. When querying the voxels around the grid points, the voxels downsampled 2, 4 and 8 times are queried respectively, so multi-scale voxel features are obtained, which helps detect targets of different sizes.
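Generating the 6 × 6 × 6 grid points inside each candidate box can be sketched as follows; the box layout (x, y, z, w, l, h, θ) and the assignment of w, l, h to the x, y, z extents are assumptions made for illustration.

```python
import torch

def roi_grid_points(rois, grid_size=6):
    """Sketch of grid-point generation for voxel RoI pooling: grid_size^3 points
    are placed uniformly inside each box and rotated by its heading.
    `rois` is assumed to be (N, 7) as (x, y, z, w, l, h, theta)."""
    # normalized grid offsets in (-0.5, 0.5) along each axis
    idx = (torch.arange(grid_size, dtype=torch.float32) + 0.5) / grid_size - 0.5
    gx, gy, gz = torch.meshgrid(idx, idx, idx, indexing="ij")
    grid = torch.stack([gx, gy, gz], dim=-1).reshape(-1, 3)          # (216, 3)

    local = grid.unsqueeze(0) * rois[:, 3:6].unsqueeze(1)            # scale by (w, l, h)
    cos, sin = torch.cos(rois[:, 6]), torch.sin(rois[:, 6])          # rotate around z
    rx = local[..., 0] * cos.unsqueeze(1) - local[..., 1] * sin.unsqueeze(1)
    ry = local[..., 0] * sin.unsqueeze(1) + local[..., 1] * cos.unsqueeze(1)
    rotated = torch.stack([rx, ry, local[..., 2]], dim=-1)
    return rotated + rois[:, :3].unsqueeze(1)                        # (N, 216, 3) coordinates
```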
Step 7: refine the candidate boxes with the fully connected layers using the pooled features.
The pooled features of the 128 candidate boxes from step 6 are used as input, and confidence prediction and regression-box refinement are performed by 2 fully connected layers. For the confidence prediction branch, the classification confidence c_i of each candidate box is predicted, and its target is obtained by the following formula:
c_i* = min(1, max(0, (IoU_i - θ_L) / (θ_H - θ_L)))

where c_i* is the confidence target value of the i-th candidate box, IoU_i is the intersection over union (IoU) between the i-th candidate box and its ground-truth box, and θ_H and θ_L are the foreground and background IoU thresholds, set to 0.75 and 0.25 respectively.
Associating the IoU between the prediction box and the ground truth with the classification confidence alleviates the mismatch between classification confidence and localization accuracy. For the regression-box refinement branch, more accurate position information (x, y, z, w, h, l, θ) is predicted by the fully connected layers.
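The IoU-guided confidence target of the formula above maps directly to a small helper function; the following is a minimal sketch of that mapping.

```python
def confidence_target(iou, theta_h=0.75, theta_l=0.25):
    """IoU-guided confidence target: 0 below the background threshold,
    1 above the foreground threshold, linear in between."""
    return min(1.0, max(0.0, (iou - theta_l) / (theta_h - theta_l)))

# e.g. confidence_target(0.5) == 0.5 and confidence_target(0.8) == 1.0
```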
Step 8: train the network of steps 2 to 7.
Steps 2 to 7 form the network model of the invention, and the network is trained end to end, with the five categories of cars, trucks, buses, riders and pedestrians trained simultaneously. The network is trained for 90 epochs on 2 RTX 2080 Ti GPUs with the Adam optimizer; the learning rate follows a cosine annealing schedule with a maximum learning rate of 0.003, a division factor of 10, momentum from 0.95 to 0.85, weight decay of 0.02 and a batch size of 6. The total loss L_TOTAL consists of the RPN loss L_RPN and the RCNN loss L_RCNN. In the RPN stage, the classification loss L_cls is computed with Focal Loss and the regression loss L_reg1 with Smooth L1 Loss; in the RCNN stage, the confidence loss L_iou is computed with binary cross-entropy loss and the regression loss L_reg2 with Smooth L1 Loss:
L_TOTAL = L_RPN + L_RCNN

L_RPN = (1 / N_fg) [ Σ_i L_cls(s_i, s_i*) + α Σ_i L_reg1(r_i, r_i*) ]

where N_fg is the number of foreground candidate boxes, s_i is the predicted category of a candidate box, s_i* is its ground-truth category, α indicates that the regression loss is computed only for foreground candidate boxes, r_i are the predicted box regression parameters, and r_i* are their ground-truth values.

L_RCNN = (1 / N_s) [ Σ_i L_iou(c_i, c_i*) + β Σ_i L_reg2(r_i, r_i*) ]

where N_s is the number of sampled candidate boxes, c_i is the predicted confidence, c_i* is the confidence target value, β indicates that the regression loss is computed only for sampled foreground candidate boxes, r_i are the predicted box regression parameters, and r_i* are their ground-truth values.
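A rough sketch of the two-stage loss described above is given below: a focal classification loss and Smooth L1 regression loss normalized by the number of foreground anchors for the RPN, and a binary cross-entropy confidence loss with a Smooth L1 refinement loss normalized by the number of sampled proposals for the RCNN head. The tensor layouts, the per-class sigmoid formulation and the weights alpha and beta are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def total_loss(cls_logits, cls_targets, reg_pred, reg_targets, fg_mask,
               iou_pred, iou_targets, rcnn_reg_pred, rcnn_reg_targets, rcnn_fg_mask,
               alpha=1.0, beta=1.0, gamma=2.0):
    """Sketch of L_TOTAL = L_RPN + L_RCNN (assumed shapes: cls_logits/cls_targets
    (N, C), reg_* (N, 7), fg_mask (N,) bool, iou_pred/iou_targets (M,),
    rcnn_reg_* (M, 7), rcnn_fg_mask (M,) bool)."""
    n_fg = fg_mask.sum().clamp(min=1).float()

    # RPN classification: focal loss with focusing parameter gamma (per-class sigmoid)
    p = torch.sigmoid(cls_logits)
    pt = torch.where(cls_targets > 0, p, 1.0 - p)
    l_cls = (-(1.0 - pt) ** gamma * torch.log(pt.clamp(min=1e-6))).sum() / n_fg
    # RPN regression: Smooth L1 on foreground anchors only
    l_reg1 = F.smooth_l1_loss(reg_pred[fg_mask], reg_targets[fg_mask], reduction="sum") / n_fg
    l_rpn = l_cls + alpha * l_reg1

    n_s = float(iou_targets.numel())
    # RCNN confidence: binary cross-entropy against the IoU-guided target
    l_iou = F.binary_cross_entropy_with_logits(iou_pred, iou_targets, reduction="sum") / n_s
    # RCNN regression: Smooth L1 on sampled foreground proposals only
    l_reg2 = F.smooth_l1_loss(rcnn_reg_pred[rcnn_fg_mask], rcnn_reg_targets[rcnn_fg_mask],
                              reduction="sum") / n_s
    l_rcnn = l_iou + beta * l_reg2
    return l_rpn + l_rcnn
```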
Step 9: visualize the detection results.
The point cloud scene is input into the trained network to obtain the category, confidence and refined candidate box of each target. The candidate boxes of the targets in the point cloud road scene are sorted from high to low by classification confidence and boxes with low confidence are filtered out; the box with the highest score is then selected from the remaining boxes, boxes whose IoU with it exceeds a certain threshold are deleted, and this operation (non-maximum suppression) is repeated over the remaining boxes to generate and visualize the final detection result. The visualization effect is shown in fig. 4, where green denotes vehicles (cars, trucks and buses are collectively classified as vehicles), yellow denotes pedestrians and blue denotes riders. Table 1 compares the detection accuracy of the original Voxel R-CNN and the detection model of the invention at different distances.
Table 1: comparison of the detection accuracy of the original Voxel R-CNN and the present algorithm at different distances (the table is provided as an image in the original publication).
As can be seen from Table 1, compared with the original Voxel R-CNN, the detection accuracy of the present algorithm is improved by 7.58% for vehicles, 12.16% for pedestrians and 9.96% for riders, and the average accuracy is improved by 9.9%. This detection accuracy can meet the perception requirements of intelligent vehicles under complex traffic conditions; in addition, measuring the detection time per point cloud frame shows that the model reaches 13.8 FPS on an RTX 2080 Ti graphics card, which satisfies the real-time requirement.
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A laser point cloud three-dimensional target detection model facing a complex traffic scene is characterized by comprising a voxelization processing module, a three-dimensional encoder module, a two-dimensional encoder module, a candidate frame generating module, a candidate frame pooling module and a full-connection layer module;
the voxelization processing module: the system is used for carrying out voxelization processing on input laser radar point cloud data;
the three-dimensional encoder module: extracting the characteristics of the non-empty voxels, and projecting the obtained voxel characteristics to a bird's-eye view to generate a pseudo-image expression;
the two-dimensional encoder module: carrying out feature extraction on the pseudo-image expression to obtain a two-dimensional feature expression;
the candidate box generation module: performing target classification and regression by using the two-dimensional features to generate a high-quality candidate frame;
the candidate frame pooling module: performing voxel pooling on the candidate frames to obtain pooling characteristics;
the full connection layer module: and refining the candidate frame aiming at the pooled features.
2. The complex traffic scene-oriented laser point cloud three-dimensional target detection model as recited in claim 1, wherein the voxelization processing module performs voxelization processing on the laser radar point cloud data in the following specific process:
the method comprises the steps of performing voxelization preprocessing on an input point cloud along the direction X, Y, Z, dividing an entire point cloud scene into uniform voxels, taking the center of a laser radar as an origin, an X axis along the front direction of a vehicle, a Y axis along the left direction and a Z axis along the upper direction, setting the size of each voxel to be (0.1,0.1,0.2) meters, and (5, 3) of the entire point cloud scene, wherein the range of the entire point cloud scene is [ (-75.2,75.2), (-75.2,75.2), and (-5,3) ], dividing the entire scene into 1504 × 1504 × 40 voxels with equal size, downsampling to 5 if the number of original points in each voxel exceeds 5, and taking the mean value of X, Y, Z, coordinates and reflection intensity of all the points in each voxel after downsampling as the original characteristics of the voxel.
3. The laser point cloud three-dimensional target detection model for the complex traffic scene according to claim 1, wherein the three-dimensional encoder module comprises a residual structure, sparse convolution and sub-manifold convolution; the input of the three-dimensional encoder module is a 40 × 1504 × 1504 voxel expression, where each voxel has a 4-dimensional original feature (x, y, z, reflection intensity); after the input, the original voxel features are first extracted by a 3 × 3 × 3 sub-manifold convolution at the original (1×) resolution, followed by 2 consecutive residual sub-manifold blocks for feature extraction; 3 × 3 × 3 sparse convolutions then downsample by 2, 4 and 8 times along the X, Y and Z directions to obtain multi-scale three-dimensional features, and each sparse convolution is followed by 2 consecutive residual sub-manifold blocks for feature extraction; a residual sub-manifold block is formed by two 3 × 3 × 3 sub-manifold convolutions, the input of the first sub-manifold convolution is added to the output of the second, and each sub-manifold convolution is followed by a BatchNorm layer and a ReLU layer; finally, a sparse convolution downsamples the Z direction by 2 times, and the result is projected onto the bird's-eye view and converted into a 256 × 188 × 188 pseudo-image expression.
4. The laser point cloud three-dimensional target detection model for the complex traffic scene according to claim 1, wherein the two-dimensional encoder module comprises five parts: a residual structure, two-dimensional convolution, self-calibrated convolution, spatial attention and channel attention; the module downsamples the 256 × 188 × 188 pseudo-image expression obtained by the three-dimensional encoder module by 1, 2 and 4 times with a 3 × 3 two-dimensional convolution to obtain multi-scale features; after each two-dimensional convolution, 2 consecutive self-calibrated convolutions extract the multi-scale features, and the input of the first self-calibrated convolution is added to the output of the second; spatial attention and channel attention mechanisms are added after the self-calibrated convolutions, enhancing useful feature expression and reducing useless information along the spatial and channel directions; finally, the multi-scale features are brought back to the same size by deconvolution, concatenated along the channel direction, and compressed with a two-dimensional convolution to obtain a 64 × 188 × 188 two-dimensional feature expression.
5. The laser point cloud three-dimensional target detection model oriented to the complex traffic scene as claimed in claim 1, wherein, for the two-dimensional feature expression of size 64 × 188 × 188 obtained by the two-dimensional encoder module, the candidate frame generation module places 10 anchor boxes at each pixel of the 188 × 188 two-dimensional feature map, covering 5 categories (car, truck, bus, pedestrian and rider), each in the 0° and 180° directions; category prediction and size regression (x, y, z, w, h, l, θ) are performed for each anchor box using the two-dimensional feature map, giving 10 × 188 × 188 three-dimensional boxes; during training, the 9000 three-dimensional boxes with the highest classification scores are selected and sent to the non-maximum suppression module to obtain 512 high-quality candidate boxes.
6. The complex traffic scene-oriented laser point cloud three-dimensional target detection model as claimed in claim 1, wherein, for the 512 high-quality candidate boxes generated by the candidate frame generation module, the candidate frame pooling module randomly samples 128 candidate boxes for the refinement stage, 64 positive samples and 64 negative samples, and extracts multi-scale voxel features with the voxel candidate-box pooling module: 6 × 6 × 6 grid points are first uniformly sampled in each candidate box, non-empty voxels within a certain distance around each grid point are queried, and the non-empty voxel features are then extracted with a PointNet, a fully connected layer and a max pooling layer to obtain the pooled features; when querying the voxels around the grid points, the voxels downsampled 2, 4 and 8 times are queried respectively to obtain multi-scale voxel features.
7. The laser point cloud three-dimensional target detection model for the complex traffic scene according to claim 1, wherein the fully connected layer module takes the pooled features of the 128 candidate boxes randomly sampled by the candidate frame pooling module as input and performs confidence prediction and regression-box refinement through 2 fully connected layers; for the confidence prediction branch, the classification confidence c_i of each candidate box is predicted, and its target is obtained by the following formula:
c_i* = min(1, max(0, (IoU_i - θ_L) / (θ_H - θ_L)))

where c_i* is the confidence target value of the i-th candidate box, IoU_i is the intersection over union (IoU) between the i-th candidate box and its ground-truth box, and θ_H and θ_L are the foreground and background IoU thresholds, set to 0.75 and 0.25 respectively;
associating the IoU between the prediction box and the ground truth with the classification confidence alleviates the mismatch between classification confidence and localization accuracy; for the regression-box refinement branch, more accurate position information (x, y, z, w, h, l, θ) is predicted by the fully connected layers.
8. The complex traffic scene-oriented laser point cloud three-dimensional target detection model as claimed in any one of claims 1 to 7, wherein the dataset used by the model adopts the ONCE dataset as the training dataset and validation dataset, and a class-balanced sampling enhancement method is added during training: training scenes containing rare classes are randomly duplicated and fed into the training dataset, which alleviates class imbalance and expands the number of training samples.
9. The complex traffic scene-oriented laser point cloud three-dimensional target detection model as claimed in any one of claims 1 to 7, wherein the model is trained end to end, with the five categories of cars, trucks, buses, riders and pedestrians trained simultaneously; the network is trained for 90 epochs on 2 RTX 2080 Ti GPUs with the Adam optimizer, a cosine annealing learning-rate schedule, a maximum learning rate of 0.003, a division factor of 10, momentum from 0.95 to 0.85, weight decay of 0.02 and a batch size of 6; the total loss L_TOTAL consists of the RPN loss L_RPN and the RCNN loss L_RCNN; in the RPN stage, the classification loss L_cls is computed with Focal Loss and the regression loss L_reg1 with Smooth L1 Loss; in the RCNN stage, the confidence loss L_iou is computed with binary cross-entropy loss and the regression loss L_reg2 with Smooth L1 Loss:
L_TOTAL = L_RPN + L_RCNN

L_RPN = (1 / N_fg) [ Σ_i L_cls(s_i, s_i*) + α Σ_i L_reg1(r_i, r_i*) ]

where N_fg is the number of foreground candidate boxes, s_i is the predicted category of a candidate box, s_i* is its ground-truth category, α indicates that the regression loss is computed only for foreground candidate boxes, r_i are the predicted box regression parameters, and r_i* are their ground-truth values;

L_RCNN = (1 / N_s) [ Σ_i L_iou(c_i, c_i*) + β Σ_i L_reg2(r_i, r_i*) ]

where N_s is the number of sampled candidate boxes, c_i is the predicted confidence, c_i* is the confidence target value, β indicates that the regression loss is computed only for sampled foreground candidate boxes, r_i are the predicted box regression parameters, and r_i* are their ground-truth values.
10. A detection method based on a laser point cloud three-dimensional target detection model facing a complex traffic scene is characterized by comprising the following steps:
s1, performing voxelization on the laser radar point cloud input into the network;
performing voxelization preprocessing on the input point cloud along the X, Y and Z directions and dividing the whole point cloud scene into uniform voxels, wherein the range of the whole point cloud scene in (X, Y, Z) is [(-75.2, 75.2), (-75.2, 75.2), (-5, 3)] meters and the size of each voxel is set to (0.1, 0.1, 0.2) meters, so the whole scene is divided into 1504 × 1504 × 40 voxels of equal size; if the number of raw points in a voxel exceeds 5, they are downsampled to 5, and the mean of the x, y, z coordinates and reflection intensity of all points remaining in each voxel is taken as the original feature of the voxel;
s2, extracting the features of the non-empty voxels by using a three-dimensional encoder, and projecting the obtained voxel features onto a bird' S-eye view to generate a pseudo-image expression;
the input of the three-dimensional encoder is a 40 × 1504 × 1504 voxel expression, where each voxel has a 4-dimensional original feature (x, y, z, reflection intensity); after the input, the original voxel features are first extracted by a 3 × 3 × 3 sub-manifold convolution at the original (1×) resolution, followed by 2 consecutive residual sub-manifold blocks for feature extraction; 3 × 3 × 3 sparse convolutions then downsample by 2, 4 and 8 times along the X, Y and Z directions to obtain multi-scale three-dimensional features, and each sparse convolution is followed by 2 consecutive residual sub-manifold blocks for feature extraction; a residual sub-manifold block is formed by two 3 × 3 × 3 sub-manifold convolutions, the input of the first sub-manifold convolution is added to the output of the second, and each sub-manifold convolution is followed by a BatchNorm layer and a ReLU layer; finally, a sparse convolution downsamples the Z direction by 2 times, and the result is projected onto the bird's-eye view and converted into a 256 × 188 × 188 pseudo-image expression;
s3, performing feature extraction on the pseudo-image expression by using a two-dimensional encoder to obtain two-dimensional feature expression;
the method comprises the steps that 256 × 188 × 188 pseudo-image expressions obtained by a three-dimensional encoder are respectively subjected to 1-time, 2-time and 4-time down-sampling by 13 × 3 two-dimensional convolution to obtain multi-scale features, then 2 continuous self-calibration convolutions are used for extracting the multi-scale features after each two-dimensional convolution, the input of the first self-calibration convolution and the output of the second self-calibration convolution are added, and space attention and channel attention mechanisms are added after the self-calibration convolutions, so that useful feature expressions are enhanced in the space and channel directions, and useless information is reduced; finally, returning the multi-scale features to the same size through deconvolution, splicing along the channel direction, and performing dimension compression by using a two-dimensional convolution to obtain a 64X 188 two-dimensional feature expression;
s4, classifying and regressing the target by using the two-dimensional characteristics to generate a high-quality candidate frame;
for the two-dimensional feature expression of size 64 × 188 × 188 obtained by the two-dimensional encoder module, 10 anchor boxes are placed at each pixel of the 188 × 188 two-dimensional feature map, covering 5 categories (car, truck, bus, pedestrian and rider), each in the 0° and 180° directions; category prediction and size regression (x, y, z, w, h, l, θ) are performed for each anchor box using the two-dimensional feature map, giving 10 × 188 × 188 three-dimensional boxes; during training, the 9000 three-dimensional boxes with the highest classification scores are selected and sent to the non-maximum suppression module to obtain 512 high-quality candidate boxes.
S5, voxel pooling is carried out on the candidate frames to obtain pooling characteristics;
for 512 high-quality candidate frames generated by the candidate frame generation module, randomly sampling 128 candidate frames and sending the candidate frames to a refinement stage, wherein 64 of the candidate frames are positive samples, and 64 of the candidate frames are negative samples, and extracting multi-scale voxel characteristics by using a voxel candidate frame pooling module: firstly, uniformly sampling 6 x 6 grid points for each candidate frame, then inquiring non-empty voxels within a certain distance around the grid points, and then extracting the non-empty voxels features by using a pointnet, a full connection layer and a maximum pooling layer to obtain pooling features; when the voxels around the grid points are inquired, the voxels sampled by 2 times, 4 times and 8 times are inquired respectively, so that the multi-scale voxel characteristics are obtained;
s6, refining the candidate frame by using the full connection layer according to the pooling characteristics;
for the 128 candidate boxes randomly sampled by the candidate frame pooling module, their pooled features are used as input, and confidence prediction and regression-box refinement are performed through 2 fully connected layers; for the confidence prediction branch, the classification confidence c_i of each candidate box is predicted, and its target is obtained by the following formula:
c_i* = min(1, max(0, (IoU_i - θ_L) / (θ_H - θ_L)))

where c_i* is the confidence target value of the i-th candidate box, IoU_i is the intersection over union (IoU) between the i-th candidate box and its ground-truth box, and θ_H and θ_L are the foreground and background IoU thresholds, set to 0.75 and 0.25 respectively;
associating the IoU between the prediction box and the ground truth with the classification confidence alleviates the mismatch between classification confidence and localization accuracy; for the regression-box refinement branch, more accurate position information (x, y, z, w, h, l, θ) is predicted by the fully connected layers;
s7, visualizing the detection effect;
sorting the candidate boxes of the targets in the refined point cloud road scene from high to low by classification confidence, filtering out boxes with low confidence, selecting the box with the highest score from the remaining boxes and deleting boxes whose IoU with it exceeds a threshold, then repeating this non-maximum suppression operation over the remaining boxes to generate and visualize the final detection result.
CN202111255417.9A 2021-10-27 2021-10-27 Laser point cloud three-dimensional target detection model and method for complex traffic scene Pending CN113920499A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111255417.9A CN113920499A (en) 2021-10-27 2021-10-27 Laser point cloud three-dimensional target detection model and method for complex traffic scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111255417.9A CN113920499A (en) 2021-10-27 2021-10-27 Laser point cloud three-dimensional target detection model and method for complex traffic scene

Publications (1)

Publication Number Publication Date
CN113920499A true CN113920499A (en) 2022-01-11

Family

ID=79243235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111255417.9A Pending CN113920499A (en) 2021-10-27 2021-10-27 Laser point cloud three-dimensional target detection model and method for complex traffic scene

Country Status (1)

Country Link
CN (1) CN113920499A (en)


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082902A (en) * 2022-07-22 2022-09-20 松立控股集团股份有限公司 Vehicle target detection method based on laser radar point cloud
CN115082902B (en) * 2022-07-22 2022-11-11 松立控股集团股份有限公司 Vehicle target detection method based on laser radar point cloud
CN116030023A (en) * 2023-02-02 2023-04-28 泉州装备制造研究所 Point cloud detection method and system
CN116229174A (en) * 2023-03-10 2023-06-06 南京审计大学 Hyperspectral multi-class change detection method based on spatial spectrum combined attention mechanism
CN116299247A (en) * 2023-05-19 2023-06-23 中国科学院精密测量科学与技术创新研究院 InSAR atmospheric correction method based on sparse convolutional neural network
CN116299247B (en) * 2023-05-19 2023-08-04 中国科学院精密测量科学与技术创新研究院 InSAR atmospheric correction method based on sparse convolutional neural network
CN116664874A (en) * 2023-08-02 2023-08-29 安徽大学 Single-stage fine-granularity light-weight point cloud 3D target detection system and method
CN116664874B (en) * 2023-08-02 2023-10-20 安徽大学 Single-stage fine-granularity light-weight point cloud 3D target detection system and method
CN116740668A (en) * 2023-08-16 2023-09-12 之江实验室 Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium
CN116740668B (en) * 2023-08-16 2023-11-14 之江实验室 Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium
CN117173655A (en) * 2023-08-28 2023-12-05 南京航空航天大学 Multi-mode 3D target detection method based on semantic propagation and cross-attention mechanism
CN117252899A (en) * 2023-09-26 2023-12-19 探维科技(苏州)有限公司 Target tracking method and device
CN117252899B (en) * 2023-09-26 2024-05-17 探维科技(苏州)有限公司 Target tracking method and device
CN117315724A (en) * 2023-11-29 2023-12-29 烟台大学 Open scene-oriented three-dimensional pedestrian detection method, system, equipment and medium
CN117315724B (en) * 2023-11-29 2024-03-08 烟台大学 Open scene-oriented three-dimensional pedestrian detection method, system, equipment and medium

Similar Documents

Publication Publication Date Title
CN113920499A (en) Laser point cloud three-dimensional target detection model and method for complex traffic scene
CN113128348B (en) Laser radar target detection method and system integrating semantic information
CN111242041B (en) Laser radar three-dimensional target rapid detection method based on pseudo-image technology
CN111695448B (en) Roadside vehicle identification method based on visual sensor
CN110766098A (en) Traffic scene small target detection method based on improved YOLOv3
CN113095152B (en) Regression-based lane line detection method and system
Chao et al. Multi-lane detection based on deep convolutional neural network
CN110599497A (en) Drivable region segmentation method based on deep neural network
CN115187964A (en) Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
CN114764856A (en) Image semantic segmentation method and image semantic segmentation device
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN116486368A (en) Multi-mode fusion three-dimensional target robust detection method based on automatic driving scene
CN114973199A (en) Rail transit train obstacle detection method based on convolutional neural network
CN114782949B (en) Traffic scene semantic segmentation method for boundary guide context aggregation
CN115359455A (en) Lightweight vehicle detection method based on deep learning
CN114821508A (en) Road three-dimensional target detection method based on implicit context learning
CN114495050A (en) Multitask integrated detection method for automatic driving forward vision detection
CN117115690A (en) Unmanned aerial vehicle traffic target detection method and system based on deep learning and shallow feature enhancement
CN116630702A (en) Pavement adhesion coefficient prediction method based on semantic segmentation network
CN114120246B (en) Front vehicle detection algorithm based on complex environment
CN116486352A (en) Lane line robust detection and extraction method based on road constraint
CN114820931B (en) Virtual reality-based CIM (common information model) visual real-time imaging method for smart city
CN115953660A (en) Point cloud 3D target detection method based on pseudo label and oriented to automatic driving
Jiangzhou et al. Research on real-time object detection algorithm in traffic monitoring scene
CN114882205A (en) Target detection method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination