CN111242041A - Laser radar three-dimensional target rapid detection method based on pseudo-image technology - Google Patents
Laser radar three-dimensional target rapid detection method based on pseudo-image technology
- Publication number
- CN111242041A CN111242041A CN202010040820.9A CN202010040820A CN111242041A CN 111242041 A CN111242041 A CN 111242041A CN 202010040820 A CN202010040820 A CN 202010040820A CN 111242041 A CN111242041 A CN 111242041A
- Authority
- CN
- China
- Prior art keywords
- dimensional
- pseudo
- point cloud
- laser radar
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a laser radar three-dimensional target rapid detection method based on a pseudo-image technology: S1, ground points in the original point cloud data are removed; S2, the three-dimensional point cloud data are converted into a two-dimensional pseudo-image data form carrying three-dimensional feature information; S3, a two-dimensional convolutional neural network fusion feature layer is set up to capture feature maps of the pseudo-image at three different scales; S4, based on the pseudo-image feature maps at different scales, an SSD detection and positioning network performs position regression and category identification of the target; S5, a total loss function of the detection network is set; and S6, network training is performed to obtain a standard three-dimensional target detection network for detecting three-dimensional targets from point cloud data acquired by the laser radar. The invention avoids a large amount of calculation, ensuring the speed of the detection process while accurately obtaining the position, size, shape and other information of the detected target.
Description
Technical Field
The invention belongs to the field of unmanned automobiles, relates to an environment sensing method of an unmanned automobile, and particularly relates to a three-dimensional target rapid detection method based on a laser radar.
Background
Point cloud-based three-dimensional target detection is an indispensable task in the development of unmanned driving technology. The control system of an unmanned vehicle relies on sensors to perceive the surrounding environment, including vehicles, pedestrians, traffic signs and the like. Unmanned vehicles are usually equipped with a variety of sensor devices to obtain more information; the different sensors complement each other and provide richer information for the control system to perceive. Lidar sensors and camera sensors are the two most commonly used and most prominent sensor devices in unmanned vehicles. The point cloud data collected by the laser radar is currently the main input for target recognition in unmanned vehicles; three-dimensional target detection is realized either by processing the point cloud data directly or by fusing the point cloud with image information.
The basic process of the existing three-dimensional target detection technology based on point cloud data is to divide the point cloud into blocks using a voxel method, manually extract features of the point cloud in each block, and then perform three-dimensional convolution operations on the encoded block point clouds, thereby classifying the targets in a scene. However, the three-dimensional convolution operation is very computationally expensive in large scenes and may even be infeasible.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a rapid detection method for a laser radar three-dimensional target based on a pseudo-image technology, which is characterized in that after the local characteristics of point cloud are fully considered, three-dimensional point cloud data are converted into a two-dimensional pseudo-image data form with three-dimensional characteristic information, the calculated amount is reduced, and high-precision 3D Box parameters are finally output.
The present invention achieves the above-described object by the following technical means.
A laser radar three-dimensional target rapid detection method based on a pseudo-image technology utilizes a two-dimensional convolution neural network, an optimized feature learning network and an SSD detection positioning network to form a laser radar three-dimensional target rapid detection network for rapidly detecting a three-dimensional target.
Further, the optimized feature learning network comprises a sampling layer, a grouping layer and a PointNet layer.
Further, the output of the sampling layer is a set of sampling points n' × D, where: n' is the number of points in the sample point set obtained by the farthest point sampling algorithm, and D is the original characteristic of the points.
Further, the output of the grouping layer is a set of points n' × k × D, where: and k is the number of points contained in the point cloud cluster.
Further, the PointNet layer performs feature extraction on the point cloud clusters to obtain the global features of the point cloud clusters, i.e., the local features of the point cloud pillar; taking the point cloud clusters as the input unit, the optimized feature extraction process is iterated once to obtain the global features of the point cloud pillars.
Further, the optimized feature learning network converts the feature tensors (D, N, P) of the point cloud data into a pseudo-image form (C, H, W), where C is the number of channels of the pseudo-image, N is the number of points in each point cloud pillar, and P is the total number of grid cells (P = H × W, where H and W are the length and width of the grid).
Further, the loss function adopted by the laser radar three-dimensional target rapid detection network during training is:

L = (1/N_pos)(β_cls·L_cls + β_loc·L_loc + β_dir·L_dir)

where N_pos is the number of positive samples during network training; β_loc, β_cls, β_dir are the weights of the regression loss, classification loss and course angle loss, respectively; and L_cls, L_loc, L_dir are the classification loss, regression loss and course angle loss of the 3D Box, respectively.
Still further, the course angle loss is defined by a softmax classification loss function.
Through the technical scheme, the invention has the following beneficial effects:
(1) the invention converts the three-dimensional point cloud data into a two-dimensional pseudo-image data form with three-dimensional information, overcomes the defects of overlarge calculated amount and excessively slow detection speed in the existing three-dimensional target detection technology, and realizes the rapid detection of the three-dimensional target while ensuring certain detection precision.
(2) The optimized feature learning network comprises a sampling layer, a grouping layer and a PointNet layer; the sampling layer samples in the point cloud columns by utilizing a farthest point sampling algorithm to obtain the central points of the point cloud clusters; the grouping layer searches points near the center point of the point cloud cluster by using a kNN algorithm to obtain the nearest neighbor point of the center point; the central point and the nearest neighbor point are used as a cluster of point clouds, namely, the local division of the point clouds is realized; the PointNet layer learns the characteristics of the point cloud clusters by using a simplified PointNet algorithm, and obtains the global characteristics of each point cloud column through maximum pooling and iterative operation; according to the invention, by setting the optimized feature learning network, the problem that the local features of the point cloud cannot be fully learned in the conventional technology of converting point cloud data into a pseudo image is solved, the full extraction of the local features of the three-dimensional target is realized, and the detection precision is effectively improved.
(3) The loss function of the present invention takes into account the classification loss (L_cls), regression loss (L_loc) and course angle loss (L_dir) of the 3D Box; the course angle loss is defined by a softmax classification loss function, and by including the course angle term in the loss function the network learns the orientation information of the detected target well.
Drawings
FIG. 1 is a flow chart of a rapid detection method for a laser radar three-dimensional target based on a pseudo-image technology;
FIG. 2 is a flow chart of the present invention for generating a pseudo-image from point cloud data;
FIG. 3 is a schematic diagram of an optimized feature learning network of the present invention;
fig. 4 is a basic structural diagram of the two-dimensional convolutional neural network of the present invention.
Detailed Description
The invention will be further described with reference to the following figures and specific examples, but the scope of the invention is not limited thereto.
As shown in fig. 1, the method for rapidly detecting a three-dimensional target of a laser radar based on a pseudo-image technology specifically includes the following steps:
step (1), point cloud data preprocessing
The laser radar scans the surroundings of the vehicle and generates original point cloud data, which are then preprocessed: ground point detection is performed on the point cloud data using the random sample consensus algorithm (RANSAC). In flat areas, ground points are removed from the whole point cloud at once; for uneven areas, the point cloud data are first partitioned into blocks and ground points are then removed within each block.
The principle of detecting and rejecting ground points with the random sample consensus algorithm is as follows: three points are repeatedly drawn at random from the point cloud space to construct a plane equation; if the constructed plane contains enough points, it is considered to be the ground, and the ground points are filtered out directly.
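The RANSAC ground-removal principle above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation; the iteration count and distance threshold are assumed values:

```python
import numpy as np

def remove_ground_ransac(points, n_iters=100, dist_thresh=0.2, seed=0):
    """Remove ground points by RANSAC plane fitting.

    points: (N, 3) array of x, y, z coordinates.
    Returns the non-ground subset of the points.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Randomly draw three points and construct a candidate plane.
        idx = rng.choice(len(points), 3, replace=False)
        p0, p1, p2 = points[idx]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-8:            # degenerate (collinear) sample, skip
            continue
        normal /= norm
        # Distance of every point to the candidate plane.
        dist = np.abs((points - p0) @ normal)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # The plane with the most points is taken as the ground; filter it out.
    return points[~best_inliers]
```

In practice a library routine (e.g. a plane-segmentation function of a point cloud toolkit) would replace this loop, and the threshold would be tuned per sensor.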
Step (2), converting the point cloud data to generate a pseudo image (as shown in figure 2)
Step (2.1): first, a grid is divided on the plane of the top view of the preprocessed point cloud data; the grid cells are then stretched along the z-axis direction to generate individual point cloud pillars that tile the whole plane.
Step (2.2): for each point in a point cloud pillar, an original feature D = (x, y, z, r) is extracted, where x, y, z are the coordinates of the point in the laser radar coordinate system and r is its reflectivity.
Step (2.3): tensors (D, N, P) are constructed, where N is the number of points in each point cloud pillar (a pillar is retained only when its point count exceeds a set threshold) and P is the total number of grid cells (P = H × W, where H and W are the length and width of the grid).
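Steps (2.1) to (2.3) can be sketched as follows. The grid extent, cell size and point caps are illustrative placeholders, not values from the patent:

```python
import numpy as np

def build_pillars(points, x_range=(0, 40), y_range=(-20, 20),
                  cell=0.5, max_pts=32, min_pts=3):
    """Divide the top-view plane into a grid, stretch each cell along z
    into a pillar, and stack the per-pillar point features.

    points: (M, 4) array of (x, y, z, r) features.
    Returns features (D=4, N=max_pts, P) and the (row, col) index of each pillar.
    """
    H = int((x_range[1] - x_range[0]) / cell)   # grid length
    W = int((y_range[1] - y_range[0]) / cell)   # grid width
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    ok = (ix >= 0) & (ix < H) & (iy >= 0) & (iy < W)
    cells = {}
    for p, r, c in zip(points[ok], ix[ok], iy[ok]):
        cells.setdefault((r, c), []).append(p)
    pillars, coords = [], []
    for rc, cell_pts in cells.items():
        if len(cell_pts) < min_pts:                   # drop near-empty pillars
            continue
        cell_pts = np.asarray(cell_pts)[:max_pts]     # cap at N points per pillar
        pad = np.zeros((max_pts - len(cell_pts), 4))  # zero-pad short pillars
        pillars.append(np.vstack([cell_pts, pad]).T)  # (D, N)
        coords.append(rc)
    return np.stack(pillars, axis=-1), coords         # (D, N, P)
```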
Step (2.4), setting an optimized characteristic learning network, which sequentially comprises a sampling layer, a grouping layer and a PointNet layer (as shown in figure 3)
Step (2.4.1), a sampling layer is set, and the purpose of the sampling layer is to select the center of the point cloud cluster
Each point cloud pillar is sampled with the farthest point sampling algorithm (FPS) to obtain a sampling point set. The input of the farthest point sampling algorithm is a point set X = {x_1, x_2, x_3, …, x_n}, where x_i denotes the D-dimensional feature vector of a point in the point cloud data; the output is a sampling point set n′ × D, where n′ is the number of points in the sampling point set obtained by the farthest point sampling algorithm (prior art). The points in the sampling point set are the point cloud cluster centers.
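A compact sketch of the standard farthest point sampling algorithm the sampling layer relies on (greedy selection of the point farthest from the already-chosen set):

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Iteratively pick the point farthest from the already-chosen set.

    points: (n, 3+) array; the first 3 columns are x, y, z.
    Returns the indices of the n' sampled points (the cluster centers).
    """
    xyz = points[:, :3]
    chosen = [0]                                  # start from an arbitrary point
    dist = np.linalg.norm(xyz - xyz[0], axis=1)   # distance to nearest chosen point
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(dist))                # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[nxt], axis=1))
    return np.array(chosen)
```

On 10 collinear points, sampling 3 picks the starting point, the far endpoint, and a midpoint, which illustrates why FPS spreads cluster centers evenly over a pillar.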
Step (2.4.2), a grouping layer is set, and the grouping layer is used for creating a plurality of point cloud clusters in each point cloud pillar
Given a radius, a fixed number of neighboring points is searched with the K nearest neighbor classification algorithm (kNN) taking each sampling point as the center reference; the radii are 0.1, 0.2 and 0.4, with corresponding maximum point counts of 16, 32 and 128 within each circle. A sampling point together with its neighboring points forms a cluster of point cloud, i.e., a local division of the point cloud. The input of the grouping layer is the sampling point set n′ × D, and its output is the point set n′ × k × D (all point cloud clusters of each point cloud pillar), where k is the number of points contained in a point cloud cluster (i.e., the sampling point and its neighbors).
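The grouping layer can be sketched as a radius-bounded nearest-neighbor query around each sampled center. Padding short clusters with the center point is an assumption here (the patent does not specify how clusters smaller than k are handled):

```python
import numpy as np

def group_points(points, center_idx, k, radius):
    """For each sampled center, take up to k points within `radius`
    (the center's nearest neighbors), padding with the center itself
    when fewer than k points fall inside the ball.

    points: (n, D) array whose first 3 columns are x, y, z.
    Returns an (n', k, D) array: one point cloud cluster per center.
    """
    xyz = points[:, :3]
    clusters = []
    for c in center_idx:
        d = np.linalg.norm(xyz - xyz[c], axis=1)
        nbr = np.argsort(d)                  # nearest first (includes the center)
        nbr = nbr[d[nbr] <= radius][:k]      # keep only points inside the ball
        if len(nbr) < k:                     # pad with the center point
            nbr = np.concatenate([nbr, np.full(k - len(nbr), c)])
        clusters.append(points[nbr])
    return np.stack(clusters)                # (n', k, D)
```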
Step (2.4.3), setting a PointNet layer, wherein the PointNet layer is used for obtaining the global characteristics of the point cloud pillar
The point set n′ × k × D is set as the input of the PointNet layer, and the features of each point in a cluster are learned with a simplified PointNet algorithm. Specifically: a cluster of point cloud k × D is input into the PointNet feature learning network, spatially aligned through a T-Net transformation, and then mapped into a 64-dimensional space through an MLP (multi-layer perceptron); this process is repeated once, finally mapping the cluster into a k × 1024-dimensional feature representation, which is the local feature representation of the cluster. For three-dimensional point cloud data, however, a per-point 1024-dimensional representation is redundant, so a maximum pooling operation is introduced: only the largest feature on each of the 1024 channels is retained, yielding a 1 × 1024 feature vector, which is the global feature of the cluster and a local feature of the whole point cloud pillar.
The above learning of point cloud cluster global features with the PointNet algorithm is applied to every cluster of point cloud. After the global features of each point cloud cluster in a pillar have been learned, each point cloud cluster is taken as an input unit and steps (2.4.1), (2.4.2) and (2.4.3) are iterated, finally obtaining the global features of the point cloud pillar; that is, the whole optimized feature learning network learns C channels from the point cloud input through iteration. Throughout this process of extracting local point cloud features, the feature information extracted at each scale contains the local feature information of the points at the previous scale, which overcomes the poor detection performance caused by insufficient extraction of local point cloud features in traditional three-dimensional target detection methods.
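The per-cluster computation of the PointNet layer (shared per-point MLP followed by channel-wise max pooling) can be sketched as below. The T-Net alignment is omitted and the MLP weights are random placeholders; in the real network they are learned. The sketch also demonstrates the key property the max pooling buys: the output is invariant to the ordering of the points in the cluster.

```python
import numpy as np

def simplified_pointnet(cluster, w1, w2):
    """Simplified PointNet on one point cloud cluster.

    cluster: (k, D) points; w1: (D, 64) and w2: (64, 1024) are the
    shared MLP weights (random stand-ins here, learned in the real network).
    A shared per-point MLP followed by channel-wise max pooling yields
    the cluster's global feature of shape (1, 1024).
    """
    h = np.maximum(cluster @ w1, 0)       # (k, 64): ReLU MLP into 64-dim space
    h = np.maximum(h @ w2, 0)             # (k, 1024): per-point 1024-dim features
    return h.max(axis=0, keepdims=True)   # (1, 1024): keep the largest feature per channel
```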
Step (2.5), unfolding to generate a pseudo image
After the point cloud data passes through the optimized feature learning network in the step (2.4), the feature tensors (D, N, P) are converted into (C, N, P), the maximum pooling operation is performed on N, the feature tensors are converted into (C, P), and because P is H × W, the point cloud data is expanded into a pseudo image form (C, H, W), and C is the channel number of the pseudo image.
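The final expansion step, scattering the (C, P) pillar features back to their grid positions to form the (C, H, W) pseudo-image, is a simple indexed write; empty cells stay zero:

```python
import numpy as np

def scatter_to_pseudo_image(pillar_feats, coords, H, W):
    """Scatter per-pillar feature vectors onto the H x W grid,
    producing the (C, H, W) pseudo-image.

    pillar_feats: (C, P) array, one C-channel feature per non-empty pillar;
    coords: list of (row, col) grid indices, one per pillar.
    """
    C, P = pillar_feats.shape
    canvas = np.zeros((C, H, W))          # empty cells remain zero
    for j, (r, c) in enumerate(coords):
        canvas[:, r, c] = pillar_feats[:, j]
    return canvas
```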
Step (3), the two-dimensional Convolution Neural Network (CNN) fuses the feature layer (as shown in FIG. 4)
The two-dimensional convolutional neural network fusion feature layer comprises a down-sampling layer and an up-sampling layer. The pseudo-image data (C, H, W) output in step (2) are input into the two-dimensional convolutional neural network, and after processing and fusion by the down-sampling and up-sampling layers, the feature map (6C, H/2, W/2) is output.
Step (3.1), a down-sampling layer is set, and the down-sampling layer is used for capturing feature maps of the pseudo images under different scales
The down-sampling layer consists of three sampling layers, each comprising a convolution layer, a BatchNorm layer and a ReLU layer. The convolution kernel size is set to (3 × 3). Considering that vehicles are large and the features need not be overly dense, the convolution stride is set to 2 to reduce the amount of calculation and improve detection efficiency; for small three-dimensional objects such as pedestrians and bicycles, the convolution stride is set to 1.
The first layer convolution output of the downsampling layer is the convolution input of the second layer, and the second layer convolution output is the convolution input of the third layer; the input of the first layer is the dummy image data (C, H, W) output in step (2), the output of the first layer is (C, H/2, W/2), the output of the second layer is (2C, H/4, W/4), and the output of the third layer is (2C, H/8, W/8).
The pseudo-image passes through the three down-sampling stages, capturing feature maps at three different scales. The specific structure is shown in fig. 4.
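The shape flow of the three down-sampling stages, (C, H, W) to (C, H/2, W/2) to (2C, H/4, W/4) to (2C, H/8, W/8), can be checked with a naive conv/BatchNorm/ReLU block. The convolution is a from-scratch loop for clarity (a deep learning framework would be used in practice), and BatchNorm is simplified to per-channel standardization over a single sample:

```python
import numpy as np

def conv3x3(x, w, stride):
    """Naive 3x3 strided convolution with zero padding 1.

    x: (C_in, H, W); w: (C_out, C_in, 3, 3).
    Returns (C_out, H // stride, W // stride).
    """
    C_in, H, W = x.shape
    C_out = w.shape[0]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    Ho, Wo = H // stride, W // stride
    out = np.zeros((C_out, Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            patch = xp[:, i * stride:i * stride + 3, j * stride:j * stride + 3]
            out[:, i, j] = np.tensordot(w, patch, axes=3)  # sum over C_in x 3 x 3
    return out

def down_block(x, w, stride=2):
    """One sampling layer: conv -> BatchNorm (simplified) -> ReLU."""
    y = conv3x3(x, w, stride)
    mu = y.mean(axis=(1, 2), keepdims=True)
    sd = y.std(axis=(1, 2), keepdims=True) + 1e-5
    return np.maximum((y - mu) / sd, 0)
```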
And (3.2) setting an upper sampling layer, wherein the upper sampling layer is used for fusing feature maps under three scales
The up-sampling layer comprises three parallel sampling layers, each containing a deconvolution (transposed convolution) layer, a BatchNorm layer and a ReLU layer. Each sampling layer is paired with a sampling layer of the down-sampling layer, i.e., the output feature map of a down-sampling layer is the input of the corresponding up-sampling layer.
The convolution kernels and strides of the up-sampling layers are set to match those of the corresponding down-sampling layers, so that all three sampling layers output (2C, H/2, W/2). To fuse the information of the feature maps at different scales, the output feature maps of the three sampling layers are concatenated along the channel dimension, and the final output feature map of the whole two-dimensional convolutional neural network (CNN) fusion feature layer is (6C, H/2, W/2).
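The fusion step can be sketched with nearest-neighbor upsampling standing in for the learned deconvolution. Note one simplification: a real transposed convolution would also map each branch to 2C channels; here the inputs are assumed to already have 2C channels each, so only the spatial resizing and channel concatenation are shown:

```python
import numpy as np

def upsample_to(x, H, W):
    """Nearest-neighbor upsampling, a stand-in for the deconvolution
    (transposed convolution) layer of each up-sampling branch."""
    C, h, w = x.shape
    return x.repeat(H // h, axis=1).repeat(W // w, axis=2)

def fuse_feature_maps(maps, H, W):
    """Bring the three scale feature maps to a common (., H, W) size
    and concatenate them along the channel dimension."""
    return np.concatenate([upsample_to(m, H, W) for m in maps], axis=0)
```

With three (2C, ., .) branches at scales H/2, H/4 and H/8, the fused output is (6C, H/2, W/2), matching the patent's shape.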
Step (4), detecting and positioning the target classification and regression of the network based on the SSD
In step (3), feature maps at three scales are obtained through down-sampling and up-sampling; feature maps of different scales have different sizes and receptive fields, and the corresponding SSD prior boxes differ in size accordingly. Two sets of convolution filters are therefore applied separately to the feature maps at the different scales to produce a fixed set of predictions. Specifically: a 3 × 3 convolution kernel is applied to the feature maps at each scale, with a Softmax function as the output activation of the detection network, yielding the category scores; and a second 3 × 3 convolution, with an identity function as the activation of the regression network, completes the position regression of the detection target.
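The two parallel heads can be sketched as below. For brevity the 3 × 3 convolutions are reduced to per-location linear maps (equivalent to 1 × 1 convolutions), and the class count and weights are illustrative; what matters is the split into a softmax-activated classification head and an identity-activated regression head over the 7 box parameters (x, y, z, w, l, h, θ):

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def detection_heads(feat, w_cls, w_reg):
    """Two parallel heads on a fused feature map.

    feat: (C, H, W); w_cls: (n_cls, C); w_reg: (7, C).
    The class head ends in softmax; the regression head uses the
    identity activation, as described in the patent.
    """
    C, H, W = feat.shape
    flat = feat.reshape(C, -1)                  # (C, H*W)
    scores = softmax(w_cls @ flat, axis=0)      # (n_cls, H*W) category scores
    boxes = w_reg @ flat                        # (7, H*W) box regression
    return scores.reshape(-1, H, W), boxes.reshape(7, H, W)
```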
Step (5) of setting a loss function considering course angle loss
The whole target detection network is trained with stochastic gradient descent so that the total loss function L decreases to convergence. Specifically: during iterative training of the target detection network, the loss function is computed to penalize the whole detection model, so that the model is gradually optimized during training and finally learns the weights and parameters that enable instance-level target detection.
Because a common angular positioning loss cannot penalize a reversed (flipped) 3D Box, the network cannot learn the course angle of the target well; the invention therefore introduces a course angle loss, defined by a softmax classification loss function, so that the network learns the orientation information of the detected target well. The loss function of the present invention considers the classification loss (L_cls), regression loss (L_loc) and course angle loss (L_dir) of the 3D Box.
The loss function expression is:

L = (1/N_pos)(β_cls·L_cls + β_loc·L_loc + β_dir·L_dir)

where N_pos is the number of positive samples during network training, and β_loc, β_cls, β_dir are the weights of the regression loss, classification loss and course angle loss, respectively.
The classification loss expression is:

L_cls = -α_a (1 - p_a)^γ log p_a

where a denotes an anchor, p_a is the predicted probability of the target class, and the constants are α = 0.25 and γ = 2.
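This classification loss is the familiar focal-loss form: confident correct predictions (p near 1) are down-weighted by the (1 - p)^γ factor, while hard examples dominate the loss. A one-line sketch with the patent's constants:

```python
import numpy as np

def focal_loss(p, alpha=0.25, gamma=2.0):
    """Classification loss for one anchor: L_cls = -alpha * (1 - p)^gamma * log(p),
    where p is the predicted probability of the true class (alpha = 0.25,
    gamma = 2 as in the patent)."""
    return -alpha * (1.0 - p) ** gamma * np.log(p)
```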
The regression loss expression is:

L_loc = Σ_{b ∈ (x, y, z, w, l, h, θ)} SmoothL1(Δb)

where x, y, z give the position of the three-dimensional target detection frame in the laser radar coordinate system; w, l, h are its length, width and height; θ is its orientation angle parameter; the superscript gt denotes the ground-truth label, and Δb is the residual between the ground-truth and predicted values of parameter b.
Step (6), based on the loss function of step (5), network training is carried out
The two-dimensional convolutional neural network is pre-trained on the COCO dataset to initialize it. The pre-trained two-dimensional convolutional neural network, the optimized feature learning network and the SSD-based detection and positioning network are combined into the laser radar three-dimensional target rapid detection network based on the pseudo-image technology; the whole network is then trained with the labeled point cloud data in the KITTI dataset, adjusting training parameters such as the learning rate so that the loss function converges.
The above training process yields a standard laser radar three-dimensional target rapid detection network based on the pseudo-image technology. In actual three-dimensional target detection, point cloud data of the vehicle's surroundings are collected by the laser radar, preprocessed as in step (1), and then input into the standard network, which rapidly and accurately outputs parameters such as the position and size of the detected target.
The present invention is not limited to the above-described embodiments, and any obvious improvements, substitutions or modifications can be made by those skilled in the art without departing from the spirit of the present invention.
Claims (8)
1. The method for rapidly detecting the three-dimensional target of the laser radar based on the pseudo-image technology is characterized in that a two-dimensional convolutional neural network, an optimized feature learning network and an SSD detection positioning network are utilized to form a rapid detection network of the three-dimensional target of the laser radar, and the rapid detection network is used for rapidly detecting the three-dimensional target.
2. The method for rapidly detecting the three-dimensional laser radar target based on the pseudo-image technology as claimed in claim 1, wherein the optimized feature learning network comprises a sampling layer, a grouping layer and a PointNet layer.
3. The method for rapidly detecting the three-dimensional laser radar target based on the pseudo-image technology as claimed in claim 2, wherein the output of the sampling layer is a sampling point set n' × D, wherein: n' is the number of points in the sample point set obtained by the farthest point sampling algorithm, and D is the original characteristic of the points.
4. The method for rapidly detecting the lidar three-dimensional target based on the pseudo-image technology according to claim 3, wherein the output of the grouping layer is a point set n' × k × D, wherein: and k is the number of points contained in the point cloud cluster.
5. The method for rapidly detecting the laser radar three-dimensional target based on the pseudo-image technology as claimed in claim 2, wherein the PointNet layer performs feature extraction on the point cloud cluster to obtain global features of the point cloud cluster, namely local features of a point cloud column; and (4) inputting the point cloud cluster as a unit, iterating the optimization feature extraction process once, and obtaining the global features of the point cloud columns.
6. The method for rapidly detecting the three-dimensional laser radar target based on the pseudo-image technology as claimed in any one of claims 2 to 5, wherein the optimized feature learning network converts feature tensors (D, N, P) of the point cloud data into a pseudo-image form (C, H, W), where C is the number of channels of the pseudo-image, N is the number of points in each point cloud pillar, and P is the total number of grid cells (P = H × W, where H and W are the length and width of the grid).
7. The method for rapidly detecting the three-dimensional target of the laser radar based on the pseudo-image technology as claimed in claim 1, wherein the loss function adopted by the laser radar three-dimensional target rapid detection network during training is: L = (1/N_pos)(β_cls·L_cls + β_loc·L_loc + β_dir·L_dir), where N_pos is the number of positive samples during network training; β_loc, β_cls, β_dir are the weights of the regression loss, classification loss and course angle loss; and L_cls, L_loc, L_dir are the classification loss, regression loss and course angle loss of the 3D Box, respectively.
8. The method of claim 7, wherein the course angle loss is defined by a softmax classification loss function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010040820.9A CN111242041B (en) | 2020-01-15 | 2020-01-15 | Laser radar three-dimensional target rapid detection method based on pseudo-image technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010040820.9A CN111242041B (en) | 2020-01-15 | 2020-01-15 | Laser radar three-dimensional target rapid detection method based on pseudo-image technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111242041A true CN111242041A (en) | 2020-06-05 |
CN111242041B CN111242041B (en) | 2023-05-09 |
Family
ID=70865663
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010040820.9A Active CN111242041B (en) | 2020-01-15 | 2020-01-15 | Laser radar three-dimensional target rapid detection method based on pseudo-image technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242041B (en) |
History
- 2020-01-15: CN application CN202010040820.9A filed; granted as patent CN111242041B (status: Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109932730A (en) * | 2019-02-22 | 2019-06-25 | 东华大学 | Lidar object detection method based on a multi-scale single-stage three-dimensional detection network
CN110032949A (en) * | 2019-03-22 | 2019-07-19 | 北京理工大学 | Target detection and localization method based on a lightweight convolutional neural network
CN110363820A (en) * | 2019-06-28 | 2019-10-22 | 东南大学 | Object detection method based on pre-fusion of laser radar and image data
CN114782729A (en) * | 2022-04-20 | 2022-07-22 | 重庆大学 | Real-time target detection method based on laser radar and vision fusion |
Non-Patent Citations (2)
Title |
---|
YINGFENG CAI: "YOLOv4-5D: An Effective and Efficient Object Detector for Autonomous Driving" * |
王海: "Obstacle detection algorithm for driverless vehicles based on four-line lidar" *
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112053374A (en) * | 2020-08-12 | 2020-12-08 | 哈尔滨工程大学 | 3D target bounding box estimation system based on GIoU |
US20220089419A1 (en) * | 2020-09-23 | 2022-03-24 | Hyundai Motor Company | Pallet Loading Apparatus and Pallet Loading Method |
US11964858B2 (en) * | 2020-09-23 | 2024-04-23 | Hyundai Motor Company | Pallet loading apparatus and pallet loading method |
CN112183330A (en) * | 2020-09-28 | 2021-01-05 | 北京航空航天大学 | Target detection method based on point cloud |
CN112200303A (en) * | 2020-09-28 | 2021-01-08 | 杭州飞步科技有限公司 | Laser radar point cloud 3D target detection method based on context-dependent encoder |
CN112200303B (en) * | 2020-09-28 | 2022-10-21 | 杭州飞步科技有限公司 | Laser radar point cloud 3D target detection method based on context-dependent encoder |
US11462029B2 (en) * | 2020-12-07 | 2022-10-04 | Shenzhen Deeproute.Ai Co., Ltd | Object detection network and method |
CN113160117A (en) * | 2021-02-04 | 2021-07-23 | 成都信息工程大学 | Three-dimensional point cloud target detection method under automatic driving scene |
CN113627434A (en) * | 2021-07-07 | 2021-11-09 | 中国科学院自动化研究所 | Method and device for constructing a processing model for natural images
CN113269172A (en) * | 2021-07-20 | 2021-08-17 | 天津萨瑞德科技有限公司 | Three-dimensional image classification method and device for interferometric inverse synthetic aperture radar and storage medium |
CN113506372A (en) * | 2021-07-26 | 2021-10-15 | 西北工业大学 | Environment reconstruction method and device |
CN113706480B (en) * | 2021-08-13 | 2022-12-09 | 重庆邮电大学 | Point cloud 3D target detection method based on key point multi-scale feature fusion |
CN113706480A (en) * | 2021-08-13 | 2021-11-26 | 重庆邮电大学 | Point cloud 3D target detection method based on key point multi-scale feature fusion |
CN114004978A (en) * | 2021-11-04 | 2022-02-01 | 昆明理工大学 | Point cloud target detection method based on attention mechanism and deformable convolution |
CN114120115A (en) * | 2021-11-19 | 2022-03-01 | 东南大学 | Point cloud target detection method for fusing point features and grid features |
CN114792417A (en) * | 2022-02-24 | 2022-07-26 | 广州文远知行科技有限公司 | Model training method, image recognition method, device, equipment and storage medium |
CN114792417B (en) * | 2022-02-24 | 2023-06-16 | 广州文远知行科技有限公司 | Model training method, image recognition method, device, equipment and storage medium |
CN116071417A (en) * | 2023-01-31 | 2023-05-05 | 河北农业大学 | Sheep body size and weight acquisition system and method based on Azure Kinect
CN116071417B (en) * | 2023-01-31 | 2024-01-12 | 河北农业大学 | Sheep body size and weight acquisition system and method based on Azure Kinect
CN116343192A (en) * | 2023-02-10 | 2023-06-27 | 泉州装备制造研究所 | Outdoor 3D target detection method and system |
Also Published As
Publication number | Publication date |
---|---|
CN111242041B (en) | 2023-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242041B (en) | Laser radar three-dimensional target rapid detection method based on pseudo-image technology | |
CN111626217B (en) | Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion | |
WO2020244653A1 (en) | Object identification method and device | |
CN110942000B (en) | Unmanned vehicle target detection method based on deep learning | |
Jiang et al. | Deep neural networks-based vehicle detection in satellite images | |
CN110263786B (en) | Road multi-target identification system and method based on feature dimension fusion | |
CN113095152B (en) | Regression-based lane line detection method and system | |
JP2016062610A (en) | Feature model creation method and feature model creation device | |
CN113705631B (en) | 3D point cloud target detection method based on graph convolution | |
CN109766873B (en) | Pedestrian re-identification method based on hybrid deformable convolution | |
Vaquero et al. | Dual-branch CNNs for vehicle detection and tracking on LiDAR data | |
Balaska et al. | Enhancing satellite semantic maps with ground-level imagery | |
EP4174792A1 (en) | Method for scene understanding and semantic analysis of objects | |
CN114049572A (en) | Detection method for identifying small target | |
CN114359130A (en) | Road crack detection method based on unmanned aerial vehicle image | |
CN117152414A (en) | Target detection method and system based on scale attention auxiliary learning method | |
CN116279592A (en) | Method for dividing travelable area of unmanned logistics vehicle | |
CN112633064A (en) | Scene recognition method and electronic equipment | |
CN115115917A (en) | 3D point cloud target detection method based on attention mechanism and image feature fusion | |
CN110909656A (en) | Pedestrian detection method and system with integration of radar and camera | |
CN114118247A (en) | Anchor-frame-free 3D target detection method based on multi-sensor fusion | |
CN112200248A (en) | Point cloud semantic segmentation method, system and storage medium under urban road environment based on DBSCAN clustering | |
CN116503602A (en) | Unstructured environment three-dimensional point cloud semantic segmentation method based on multi-level edge enhancement | |
Zhao et al. | DHA: Lidar and vision data fusion-based on road object classifier | |
CN115424225A (en) | Three-dimensional real-time target detection method for automatic driving system |
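The pseudo-image technique named in the title projects an unordered lidar point cloud onto a dense 2D grid of pillars so that standard 2D convolutional detectors can process it. The sketch below is an illustrative NumPy toy, not the encoder claimed in the patent: it simply averages the raw per-point features falling in each grid cell (a real pillar encoder learns per-point features before scattering), and the detection ranges and pillar size are assumed values.

```python
import numpy as np

def pointcloud_to_pseudo_image(points,
                               x_range=(0.0, 69.12),
                               y_range=(-39.68, 39.68),
                               pillar_size=0.16,
                               channels=4):
    """Scatter lidar points (N, 4: x, y, z, intensity) into a bird's-eye-view
    grid of pillars, producing a dense pseudo-image of shape (C, H, W).
    Each cell holds the mean feature vector of the points inside it."""
    w = int(round((x_range[1] - x_range[0]) / pillar_size))
    h = int(round((y_range[1] - y_range[0]) / pillar_size))
    image = np.zeros((channels, h, w), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.float32)

    # Keep only points that fall inside the detection range.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    # Map each point to its pillar (cell) index.
    xi = ((pts[:, 0] - x_range[0]) / pillar_size).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / pillar_size).astype(int)

    # Accumulate features per cell (unbuffered, so duplicate indices add up),
    # then average; empty cells stay zero.
    np.add.at(image, (slice(None), yi, xi), pts[:, :channels].T)
    np.add.at(counts, (yi, xi), 1.0)
    nonzero = counts > 0
    image[:, nonzero] /= counts[nonzero]
    return image
```

The resulting (C, H, W) tensor can be fed to an ordinary 2D CNN backbone, which is what makes the pseudo-image formulation fast compared with 3D convolutions over a voxel grid.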
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||