CN111862101A - 3D point cloud semantic segmentation method under aerial view coding visual angle - Google Patents


Info

Publication number
CN111862101A
CN111862101A (application CN202010681588.7A)
Authority
CN
China
Prior art keywords
point cloud
network
convolution
module
residual error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010681588.7A
Other languages
Chinese (zh)
Inventor
杨树明
李述胜
袁野
王腾
胡鹏宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202010681588.7A priority Critical patent/CN111862101A/en
Publication of CN111862101A publication Critical patent/CN111862101A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]


Abstract

The invention discloses a 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective. The method converts an input 3D point cloud to the bird's-eye view through a voxel-based encoding scheme, extracts the features of each voxel with a simplified PointNet network, and converts them into a feature image that a 2D convolutional network can process directly. The encoded feature image is then processed by a fully convolutional network built from residual modules reconstructed with factorized and dilated convolutions, yielding an end-to-end pixel-level semantic segmentation result. The method accelerates point cloud semantic segmentation and accomplishes high-precision, real-time segmentation of large scenes under limited hardware. It can be applied directly to tasks such as robotics, autonomous driving and unordered grasping; owing to the design of the encoding scheme and the network structure, it combines high-precision point cloud semantic segmentation with low system overhead, making it particularly suitable for hardware-constrained scenarios such as robots and autonomous vehicles.

Description

3D point cloud semantic segmentation method under aerial view coding visual angle
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective.
Background
Since the R-CNN convolutional neural network was proposed in 2014, the hand-crafted feature extraction methods that dominated before the early 2010s have gradually been replaced by feature extraction based on convolutional neural networks. Convolutional approaches to two-dimensional images now drive the development of computer vision technology. The keys to their success are the effective extraction of image features by the convolution operation, the accurate, data-driven fitting of model parameters, and the robustness and extensibility brought by the redundant structure of deep networks. With sufficient data and a carefully designed architecture, convolutional neural networks can accomplish the task of enabling a computer to understand its environment.
Two-dimensional convolution has achieved great success in the image domain, so when understanding and analyzing large-scale three-dimensional scenes it is natural to extend it to three-dimensional convolution and process point clouds with 3D convolution directly. However, because point clouds are generally highly sparse and lack surface texture information, processing them directly with 3D convolution incurs excessive system overhead and makes real-time processing of point cloud information difficult.
To reduce the overhead of processing point clouds directly with 3D convolution, some studies propose dividing the point cloud into voxels and then applying 3D convolution on the voxel grid. This reduces the overhead somewhat but still has limitations: the system overhead and prediction accuracy of voxel-grid methods are closely tied to the grid resolution, forcing researchers to trade prediction accuracy against runtime efficiency. Later work proposed encoding the point cloud structure with octrees, but even this remains insufficient to guarantee efficient processing of large areas of point cloud.
In 2017, Qi et al. of Stanford University proposed PointNet, a pioneering network structure for processing unordered point clouds that opened a new line of end-to-end point cloud feature extraction. The network struggles to capture structural relations between points and is insufficient at acquiring global features, so it is difficult to apply directly to semantic segmentation of large-scale point clouds, but the idea has inspired many other point cloud segmentation networks and frequently appears inside point cloud feature extraction modules.
Point cloud data are usually acquired by sensors such as stereo cameras, depth cameras and lidar, and typically describe only the surface of objects, so the data are highly sparse; at the same time, a point cloud is discontinuous, making it difficult to express the surface texture of objects. This contrasts with truly volumetric three-dimensional data such as medical images. Processing point clouds or voxels directly with 3D convolution therefore generates a large number of wasted operations.
To address the excessive cost of three-dimensional convolution, much recent research projects the point cloud onto a top (bird's-eye) view or a front view, divides it into voxels or pillars of a fixed format, and extracts features from the points inside each fixed voxel to form feature maps that two-dimensional convolution can process directly.
(1) Three-dimensional point cloud semantic segmentation
Semantic segmentation of three-dimensional point clouds is one of the important directions in computer understanding of 3D scenes, but because point cloud data are highly redundant and non-uniform, traditional segmentation methods based on hand-crafted features struggle to obtain satisfactory results in complex scenes. In recent years, with the rise of deep networks, various deep-network-based point cloud segmentation techniques such as PointNet, PointNet++ and VoxelNet have been proposed, and these models have rapidly improved the comprehension of complex scenes.
(2) Development of deep networks for three-dimensional scene understanding
As deep networks continue to evolve in computer vision, more and more research focuses on processing and understanding three-dimensional scene data with deep learning models. Three-dimensional data are inherently more complex: while a two-dimensional image is easily expressed as a matrix, the representation of 3D data varies with the scene and includes point clouds, triangle meshes, voxels, multi-view images and other forms. Deep network models can be divided into two-stage networks and end-to-end networks according to how they process the point cloud, or into 3D-convolution networks and 2D-convolution networks according to their convolution kernels.
Because an end-to-end network performs feature extraction and prediction within a single model, it is generally more efficient. Since 2D convolutions have far fewer parameters than 3D convolutions, an end-to-end 2D convolutional network is better suited to real-time processing of large-scale point clouds.
Disclosure of Invention
The invention provides a 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective, aiming to solve the technical problems of existing point cloud scene understanding: high data sparsity, insufficient robustness of local features and excessive system overhead, all of which make it difficult to process large-scale point clouds in real time. Within the proposed model framework, the semantic segmentation of large-scale point cloud scenes can be completed rapidly, accurately and in real time.
The invention is realized by adopting the following technical scheme:
A 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective comprises the following steps:
(1) Projecting the point cloud under the bird's-eye view and encoding it to construct a feature map that 2D convolution can process directly: first, in the world coordinate system, a grid is drawn under the bird's-eye view; each point of the point cloud is assigned, according to its x, y and z coordinates, to one of the voxels produced by the grid division, and the features of all points within each voxel are extracted with a simplified PointNet to form an (H, W, C) feature map that 2D convolution can process directly;
(2) Point cloud semantic segmentation network: the data processed by the network is the (H, W, C) feature map obtained in step (1); the network structure consists of an encoder and a decoder built from residual modules composed of factorized convolutions and dilated convolutions; the residual modules and downsampling modules form the encoder, the residual modules and upsampling modules form the decoder, and together they constitute an end-to-end pixel-level point cloud semantic segmentation network;
(3) network training:
the network input is unordered point cloud data, and the model is trained in a data-driven manner; the cross-entropy function is used as the loss function, and penalty weights are added to the error losses of the different classes to alleviate the imbalance of the data distribution:
L = -\sum_{c} w_c \, y_c \log(\hat{y}_c)

where the subscript c denotes the class, y_c is the ground-truth label for class c, w_c represents the penalty weight, determined by f_c, f_c represents the frequency with which objects of class c occur in the data set, and \hat{y}_c represents the network's predicted probability for class c;
the total error of the network is calculated with the above loss function, the network parameters are updated by error backpropagation and stochastic gradient descent, and iteration continues until the loss function of the model converges, completing the training.
In a further refinement of the invention, step (1) is implemented as follows:
(1.1) grid division under the view angle of the aerial view:
the grid is drawn under the bird's-eye view and divided in the x-y plane according to a set size; each point p in the point cloud carries features in the three dimensions x, y and z and is assigned, according to its x and y coordinates, to one of the voxels produced by the grid division; then, for every voxel that contains points, the mean of the coordinates of all points inside the voxel is computed and recorded as x_c, y_c, z_c, and the offset of each point from the voxel centre in x and y is computed and recorded as x_p and y_p; after this expansion, every point in the point cloud has a feature of at least D = 9 dimensions;
(1.2) converting the divided point cloud into a feature map:
the maximum number of points per voxel is limited to N, and the point cloud is converted into a (P, N, D) tensor; the features of each point are then mapped to a high-dimensional feature space with a simplified PointNet network, giving an output of shape (P, N, C); if richer point cloud features are desired, several such modules may be stacked; a max-pooling layer over the N dimension is added at the end of the module to keep the most salient feature among all points in each voxel, giving an output of shape (P, C); finally, the P voxels are scattered back to their original positions according to the previously recorded x-y grid indices, producing a feature map of shape (H, W, C), where H and W denote the height and width of the feature map; the positions of empty voxels are filled with zeros.
In a further refinement of the invention, step (2) is implemented as follows:
(2.1) reconstructing residual module architecture:
the reconstructed residual module replaces the 3 × 3 convolutions of the conventional residual module with 1 × 3 and 3 × 1 factorized convolutions and enlarges the receptive field by adding dilated convolution; the size of the feature map is unchanged before and after the reconstructed residual module; by introducing dilated convolution into the convolution layers of the reconstructed residual module, the receptive field of the network is enlarged without changing the number of parameters, so the network sees a wider context and has stronger feature extraction and recognition ability; the skip connection of the residual module accelerates the training of deep networks and helps predict the details of targets;
(2.2) downsampling module architecture:
the downsampling module consists of a max-pooling layer with stride 2 and a convolution layer connected in parallel; the feature map produced by the pooling layer and the feature map produced by the convolution layer are concatenated along the feature-channel dimension C to form a new feature map, which is then output; since both the pooling layer and the convolution layer use stride 2, the size of the feature map is halved after the downsampling module and higher-dimensional features are extracted; processing the pooling layer and the convolution layer in parallel preserves the detail features of the network;
(2.3) overall network architecture:
the whole network consists of an encoder and a decoder; the encoder is composed of downsampling modules and reconstructed residual modules, and stacking several reconstructed residual modules extracts high-dimensional features more accurately and quickly; the decoder is composed of upsampling modules and reconstructed residual modules, where an upsampling module consists of a deconvolution layer whose purpose is to recover the spatial dimensions of the features and restore the size of the feature map; likewise, each upsampling module is followed by several reconstructed residual modules, so the network can recover feature details more finely and achieve pixel-level point cloud segmentation; after the decoder has restored the feature map to the size of the input feature map, two 1 × 1 convolution layers reduce the number of channels to the number of target classes n_class, and the softmax function normalizes the result into a probability distribution

\hat{y}_c = \exp(a_c) / \sum_{c'} \exp(a_{c'})

where a_c is the network output for class c and the subscript c denotes the different classes.
The invention has at least the following beneficial technical effects:
For the point cloud segmentation task, traditional algorithms pursue accuracy by finely dividing the voxels over the entire point cloud scene, which makes the number of parameters grow exponentially, leaves the network model inefficient and the system overhead high, and makes it hard to meet practical requirements. At the same time, the complexity and sparsity of point cloud data make it difficult for a network to extract enough features to predict details of the three-dimensional scene.
The 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective provided by the invention encodes the point cloud under the bird's-eye view, extracts the required features from the data with a simplified and improved PointNet, and uses reconstructed residual modules and downsampling modules to enlarge the receptive field of the convolution kernels while reducing the number of network parameters, so that the network model can be made deeper and wider. An end-to-end pixel-level fully convolutional network model is thus constructed, achieving very high point cloud segmentation efficiency (20 frames per second on a 1080 Ti) while maintaining leading segmentation accuracy. Thanks to its high accuracy, high efficiency and low system overhead, the invention is well suited for direct use in tasks such as robot navigation and autonomous driving.
Drawings
Fig. 1 is a schematic view of encoding under a bird's eye view.
Fig. 2 is a backbone network of the model, each square representing a feature map, and the lower number representing the number of channels of the feature map.
Fig. 3 is a simplified modified structure diagram of PointNet.
FIG. 4(a) is a structure diagram of the reconstructed residual module; FIG. 4(b) is a structure diagram of the downsampling module.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, but the present invention is not limited to the specific embodiments.
The invention provides a 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective, which comprises the following steps:
(1) As shown in FIG. 1, the point cloud is projected under the bird's-eye view and encoded to construct a feature map that 2D convolution can process directly: first, in the world coordinate system, a grid is drawn under the bird's-eye view; each point of the point cloud is assigned, according to its x, y and z coordinates, to one of the voxels produced by the grid division, and the features of all points within each voxel are extracted with a simplified PointNet to form a feature map that 2D convolution can process directly;
(1.1) grid division under the view angle of the aerial view:
the grid is drawn under the bird's-eye view and divided in the x-y plane according to a set size; the grid size can be chosen according to the actual situation, for example 0.16 m × 0.16 m in a large-scale scene. Each point p in the point cloud carries features in the three dimensions x, y and z, and point clouds obtained from sensors such as lidar also carry a reflectivity r; each point is assigned to one of the voxels produced by the grid division according to its x and y coordinates. Then, for every voxel that contains points, the mean of the coordinates of all points inside the voxel is computed and recorded as x_c, y_c, z_c, and the offset of each point from the voxel centre in x and y is computed and recorded as x_p and y_p; after this expansion, every point in the point cloud has a feature of at least D = 9 dimensions;
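For illustration only, a minimal NumPy sketch of this grid assignment and point decoration is given below; the detection range, the 0.16 m cell size and the exact ordering of the D = 9 point feature (x, y, z, r, x_c, y_c, z_c, x_p, y_p) are assumptions and not values fixed by the invention.

```python
import numpy as np

def decorate_points(points, grid_size=0.16, x_range=(0.0, 70.4), y_range=(-40.0, 40.0)):
    """Assign each point to a bird's-eye-view grid cell and expand it to a 9-D
    feature (x, y, z, r, x_c, y_c, z_c, x_p, y_p).  `points` is an (M, 4) array
    of x, y, z, reflectivity; range and cell size are illustrative values."""
    # keep only points that fall inside the chosen bird's-eye-view range
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    points = points[keep]

    ix = ((points[:, 0] - x_range[0]) / grid_size).astype(np.int64)  # column in the BEV grid
    iy = ((points[:, 1] - y_range[0]) / grid_size).astype(np.int64)  # row in the BEV grid
    cell_id = ix * 100_000 + iy                                      # one id per occupied voxel

    feats = np.zeros((points.shape[0], 9), dtype=np.float32)
    feats[:, :4] = points[:, :4]
    for vid in np.unique(cell_id):
        m = cell_id == vid
        feats[m, 4:7] = points[m, :3].mean(axis=0)                   # voxel centroid x_c, y_c, z_c
        center = np.array([(ix[m][0] + 0.5) * grid_size + x_range[0],
                           (iy[m][0] + 0.5) * grid_size + y_range[0]])
        feats[m, 7:9] = points[m, :2] - center                       # offsets x_p, y_p to cell centre
    return feats, ix, iy
```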
(1.2) converting the divided point cloud into a feature map:
Because of the high sparsity of the point cloud, most voxels produced by the grid division are empty. Taking the public KITTI data set as an example, if the point cloud is divided in the x-y plane at a 0.16 m × 0.16 m resolution, the number P of non-empty voxels in the result is roughly 10,000 to 20,000. The maximum number of points per voxel is limited to N, and the point cloud is converted into a (P, N, D) tensor accordingly; the features of each point are then mapped to a high-dimensional feature space with a simplified PointNet network, giving an output of shape (P, N, C). As shown in FIG. 3, the simplified PointNet mainly consists of modules built from a linear transformation layer, a batch normalization layer and a ReLU layer; if richer point cloud features are desired, several such modules may be stacked, and a max-pooling layer over the N dimension is added at the end of the module to keep the most salient feature among all points in each voxel, giving an output of shape (P, C). Finally, the P voxels are scattered back to their original positions according to the previously recorded x-y grid indices, producing a feature map of shape (H, W, C), where H and W denote the height and width of the feature map; the positions of empty voxels are filled with zeros. The processed feature map can then be handled directly by a 2D convolutional network; for example, an RGB colour image is simply a feature map of shape (H, W, C) with C = 3.
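A short PyTorch sketch of the simplified PointNet and of the scatter step is shown below; the module and function names (SimplifiedPointNet, scatter_to_bev) and the channel width C = 64 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimplifiedPointNet(nn.Module):
    """Linear -> BatchNorm -> ReLU applied to every point, followed by a
    max-pool over the N points of each voxel (one module of the simplified
    PointNet; several such modules may be stacked)."""
    def __init__(self, in_dim=9, out_dim=64):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                       # x: (P, N, D) non-empty voxels
        p, n, _ = x.shape
        x = self.linear(x).view(p * n, -1)      # per-point linear transform
        x = self.relu(self.bn(x)).view(p, n, -1)
        return x.max(dim=1).values              # (P, C): most salient feature per voxel


def scatter_to_bev(voxel_feats, ix, iy, height, width):
    """Place the (P, C) voxel features back at their x-y grid positions
    (ix, iy hold one index pair per non-empty voxel) and zero-fill empty
    cells, yielding a (C, H, W) image that 2D convolution can process."""
    c = voxel_feats.shape[1]
    canvas = torch.zeros(c, height, width, dtype=voxel_feats.dtype)
    canvas[:, iy, ix] = voxel_feats.t()
    return canvas
```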
(2) Point cloud semantic segmentation network: the data processed by the network is the (H, W, C) feature map obtained in the previous step; the network structure mainly consists of an encoder and a decoder built from residual modules composed of factorized convolutions and dilated convolutions; the residual modules and downsampling modules form the encoder, the residual modules and upsampling modules form the decoder, and together they constitute an end-to-end pixel-level semantic segmentation network; the overall network framework is shown in FIG. 2;
(2.1) reconstructing residual module architecture:
as shown in FIG. 4(a), the reconstructed residual module replaces the 3 × 3 convolutions of the conventional residual module with 1 × 3 and 3 × 1 factorized convolutions and enlarges the receptive field by adding dilated convolution, which preserves the feature extraction capability and accuracy of the network while reducing its parameters as much as possible to reach a faster processing speed; the size of the feature map is unchanged before and after the reconstructed residual module. By designing a reconstructed residual module with fewer parameters, the parameter count stays small even with a deeper network structure, maintaining the efficiency of data processing; meanwhile, introducing dilated convolution enlarges the receptive field of the network without changing the number of parameters, so the network sees a wider context and has stronger feature extraction and recognition ability; the skip connection of the residual module also lets deep networks train better and faster and predict the details of targets more accurately.
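The reconstructed residual module could be written roughly as the following PyTorch sketch; the dilation rate, the placement of batch normalization and the use of a single 1 × 3 / 3 × 1 pair per block are assumptions made for brevity.

```python
import torch.nn as nn

class ReconstructedResidual(nn.Module):
    """Residual module built from 1x3 and 3x1 factorized convolutions with
    dilation; the spatial size of the feature map is unchanged and a skip
    connection is added around the convolutions."""
    def __init__(self, channels, dilation=2):
        super().__init__()
        self.conv1x3 = nn.Conv2d(channels, channels, kernel_size=(1, 3),
                                 padding=(0, dilation), dilation=(1, dilation))
        self.conv3x1 = nn.Conv2d(channels, channels, kernel_size=(3, 1),
                                 padding=(dilation, 0), dilation=(dilation, 1))
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1x3(x))        # factorized convolution, horizontal
        out = self.bn(self.conv3x1(out))        # factorized convolution, vertical
        return self.relu(out + x)               # skip connection preserves detail
```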
(2.2) downsampling module architecture:
as shown in FIG. 4(b), the downsampling module consists of a max-pooling layer with stride 2 and a convolution layer connected in parallel; the feature map produced by the pooling layer and the feature map produced by the convolution layer are concatenated along the feature-channel dimension C to form a new feature map, which is then output. Since both the pooling layer and the convolution layer use stride 2, the size of the feature map is halved after the downsampling module, so only the salient features are attended to and the number of parameters is reduced; processing the pooling layer and the convolution layer in parallel also lets the module keep as many useful detail features as possible, improving the network's semantic segmentation of small targets.
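A possible PyTorch sketch of this parallel downsampling module follows; giving the convolution branch out_channels − in_channels output channels so that the concatenation yields out_channels (with out_channels > in_channels) is an assumption in the spirit of similar downsampler designs.

```python
import torch
import torch.nn as nn

class DownsamplingModule(nn.Module):
    """Stride-2 max pooling and a stride-2 convolution run in parallel; their
    outputs are concatenated along the channel dimension C, so the spatial
    size of the feature map is halved while detail from both branches is kept."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv = nn.Conv2d(in_channels, out_channels - in_channels,
                              kernel_size=3, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([self.pool(x), self.conv(x)], dim=1)  # concat on channel C
        return self.relu(self.bn(out))
```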
(2.3) overall network architecture:
the whole network consists of an encoder and a decoder; the encoder mainly consists of downsampling modules and reconstructed residual modules, and stacking several reconstructed residual modules extracts high-dimensional features more accurately and quickly. The decoder mainly consists of upsampling modules and reconstructed residual modules, where an upsampling module is built around a deconvolution layer whose purpose is to recover the spatial dimensions of the features and restore the size of the feature map; likewise, each upsampling module is followed by several reconstructed residual modules, so the network can recover feature details more finely and achieve pixel-level point cloud segmentation. After the decoder has restored the feature map to the size of the input feature map, two 1 × 1 convolution layers reduce the number of channels to the number of target classes n_class, and the softmax function normalizes the result into a probability distribution

\hat{y}_c = \exp(a_c) / \sum_{c'} \exp(a_{c'})

where a_c is the network output for class c and the subscript c denotes the different classes.
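The upsampling module and the classification head might be sketched in PyTorch as follows; the intermediate channel width of the two 1 × 1 convolutions is an assumption, and when the weighted cross-entropy loss of step (3) is used for training, the softmax is normally folded into the loss and the head returns the raw logits a_c instead.

```python
import torch
import torch.nn as nn

class UpsamplingModule(nn.Module):
    """Deconvolution (transposed convolution) that doubles the spatial size of
    the feature map to recover the resolution lost in the encoder."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=3,
                                         stride=2, padding=1, output_padding=1)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.deconv(x)))


class SegmentationHead(nn.Module):
    """Two 1x1 convolutions reduce the channel count to n_class; softmax turns
    the logits a_c into a per-pixel probability distribution (for training with
    nn.CrossEntropyLoss one would return the logits instead)."""
    def __init__(self, in_channels, n_class):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 2, n_class, kernel_size=1),
        )

    def forward(self, x):
        return torch.softmax(self.reduce(x), dim=1)  # (B, n_class, H, W)
```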
(3) Network training:
the network input is unordered point cloud data, for example the SemanticKITTI public data set; the model is trained in a data-driven manner, the cross-entropy function is used as the loss function, and penalty weights are added to the error losses of the different classes to alleviate the imbalance of the data distribution:
L = -\sum_{c} w_c \, y_c \log(\hat{y}_c)

where the subscript c denotes the class, y_c is the ground-truth label for class c, \hat{y}_c is the network's predicted probability for class c, w_c represents the penalty weight, mainly determined by f_c, and f_c represents the frequency with which objects of class c occur in the data set. In general, object classes that occur frequently in the data set are given a smaller weight.
The total error of the network is calculated with the above loss function, the network parameters are updated by error backpropagation and stochastic gradient descent, and iteration continues until the loss function of the model converges, completing the training.
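A compact PyTorch sketch of this training procedure is given below. The 1 / ln(k + f_c) mapping from class frequency to penalty weight, the learning rate and the momentum are assumptions — the description only states that w_c is determined by f_c and that frequent classes receive smaller weights; note also that nn.CrossEntropyLoss applies log-softmax internally, so the model passed here should output raw logits.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def class_weights(frequencies, k=1.02):
    """Penalty weights w_c computed from the class frequencies f_c; classes
    that occur often receive a smaller weight (the 1 / ln(k + f_c) form is
    only one possible choice)."""
    return 1.0 / torch.log(k + frequencies)

def train(model, loader, frequencies, epochs=30, lr=1e-3, device="cuda"):
    """Weighted cross-entropy training with backpropagation and SGD."""
    criterion = nn.CrossEntropyLoss(weight=class_weights(frequencies).to(device))
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.to(device).train()
    for _ in range(epochs):
        for bev_features, labels in loader:          # (B, C, H, W) inputs, (B, H, W) labels
            optimizer.zero_grad()
            logits = model(bev_features.to(device))  # raw class scores a_c
            loss = criterion(logits, labels.to(device))
            loss.backward()                          # error backpropagation
            optimizer.step()                         # stochastic gradient descent update
```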
Examples
The invention provides a 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective.
1. Training network model
Training the 3D point cloud semantic segmentation network model under the bird's-eye-view encoding perspective first requires sufficient point cloud data. Each point cloud scene sample should contain the XYZ coordinates, the reflectivity and the semantic class of every point. Taking the SemanticKITTI outdoor lidar point cloud data set as an example, 15,000 frames of scene point clouds are used as the training set and 3,000 frames of real point clouds as the validation set.
After an adequate point cloud data set has been obtained, each frame of point cloud is first encoded under the bird's-eye view into grid voxels seen from above; the features of each point are then mapped to a high-dimensional feature space with the simplified and improved PointNet network, whose structure is shown in FIG. 3. Max pooling is applied at the end of the module to keep the most salient feature of all points in each voxel; finally, the voxels are scattered back to their original positions according to the previously recorded x-y grid indices, with zero filling at the positions of empty voxels, which yields the feature map.
An end-to-end fully convolutional network model is then built according to FIG. 2; the feature map is processed with the reconstructed residual modules and downsampling modules shown in FIG. 4. The point cloud network error and the penalty weight of each class are calculated with the loss function and class weights defined above, the parameters are updated iteratively by gradient backpropagation, and the computation is accelerated with a GPU until the network error falls below a set threshold or the number of iterations meets the requirement.
2. Point cloud semantic segmentation process
For a frame of large-scale point cloud, the frame is first sent to the bird's-eye-view encoding module for encoding and processed with the simplified PointNet for feature mapping to obtain the feature map; the feature map is then fed into the trained semantic segmentation network model for point cloud segmentation, and a predicted label is assigned to each point. The scene point cloud segmentation result can be used directly in tasks such as autonomous driving and robot navigation.
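The inference path of the embodiment might be wired together as in the sketch below; the encoder wrapper and the read-back of one label per point via the stored cell indices are illustrative assumptions built on the earlier sketches.

```python
import torch

@torch.no_grad()
def segment_frame(points, encoder, backbone):
    """Encode a raw point cloud under the bird's-eye view, run the trained 2D
    segmentation network and read one semantic label back for every point.
    `encoder` is assumed to wrap the voxelization, simplified PointNet and
    scatter steps sketched above and to return the (C, H, W) feature map
    together with the per-point cell indices (ix, iy)."""
    bev, ix, iy = encoder(points)                           # (C, H, W) feature map
    probs = backbone(bev.unsqueeze(0))                      # (1, n_class, H, W) probabilities
    labels = probs.argmax(dim=1).squeeze(0).cpu().numpy()   # (H, W) per-pixel labels
    return labels[iy, ix]                                   # one predicted label per input point
```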

Claims (3)

1. A 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective, characterized by comprising the following steps:
(1) Projecting the point cloud under the bird's-eye view and encoding it to construct a feature map that 2D convolution can process directly: first, in the world coordinate system, a grid is drawn under the bird's-eye view; each point of the point cloud is assigned, according to its x, y and z coordinates, to one of the voxels produced by the grid division, and the features of all points within each voxel are extracted with a simplified PointNet to form an (H, W, C) feature map that 2D convolution can process directly;
(2) Point cloud semantic segmentation network: the data processed by the network is the (H, W, C) feature map obtained in step (1); the network structure consists of an encoder and a decoder built from residual modules composed of factorized convolutions and dilated convolutions; the residual modules and downsampling modules form the encoder, the residual modules and upsampling modules form the decoder, and together they constitute an end-to-end pixel-level point cloud semantic segmentation network;
(3) network training:
the network input is unordered point cloud data, and the model is trained in a data-driven manner; the cross-entropy function is used as the loss function, and penalty weights are added to the error losses of the different classes to alleviate the imbalance of the data distribution:
L = -\sum_{c} w_c \, y_c \log(\hat{y}_c)

where the subscript c denotes the class, y_c is the ground-truth label for class c, w_c represents the penalty weight, determined by f_c, f_c represents the frequency with which objects of class c occur in the data set, and \hat{y}_c represents the network's predicted probability for class c;
the total error of the network is calculated with the above loss function, the network parameters are updated by error backpropagation and stochastic gradient descent, and iteration continues until the loss function of the model converges, completing the training.
2. The 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective according to claim 1, characterized in that step (1) is implemented as follows:
(1.1) grid division under the view angle of the aerial view:
the grid is drawn under the bird's-eye view and divided in the x-y plane according to a set size; each point p in the point cloud carries features in the three dimensions x, y and z and is assigned, according to its x and y coordinates, to one of the voxels produced by the grid division; then, for every voxel that contains points, the mean of the coordinates of all points inside the voxel is computed and recorded as x_c, y_c, z_c, and the offset of each point from the voxel centre in x and y is computed and recorded as x_p and y_p; after this expansion, every point in the point cloud has a feature of at least D = 9 dimensions;
(1.2) converting the divided point cloud into a feature map:
the maximum number of points per voxel is limited to N, and the point cloud is converted into a (P, N, D) tensor; the features of each point are then mapped to a high-dimensional feature space with a simplified PointNet network, giving an output of shape (P, N, C); if richer point cloud features are desired, several such modules may be stacked; a max-pooling layer over the N dimension is added at the end of the module to keep the most salient feature among all points in each voxel, giving an output of shape (P, C); finally, the P voxels are scattered back to their original positions according to the previously recorded x-y grid indices, producing a feature map of shape (H, W, C), where H and W denote the height and width of the feature map; the positions of empty voxels are filled with zeros.
3. The 3D point cloud semantic segmentation method under a bird's-eye-view encoding perspective according to claim 2, characterized in that step (2) is implemented as follows:
(2.1) reconstructing residual module architecture:
the reconstructed residual module replaces the 3 × 3 convolutions of the conventional residual module with 1 × 3 and 3 × 1 factorized convolutions and enlarges the receptive field by adding dilated convolution; the size of the feature map is unchanged before and after the reconstructed residual module; by introducing dilated convolution into the convolution layers of the reconstructed residual module, the receptive field of the network is enlarged without changing the number of parameters, so the network sees a wider context and has stronger feature extraction and recognition ability; the skip connection of the residual module accelerates the training of deep networks and helps predict the details of targets;
(2.2) downsampling module architecture:
the downsampling module consists of a max-pooling layer with stride 2 and a convolution layer connected in parallel; the feature map produced by the pooling layer and the feature map produced by the convolution layer are concatenated along the feature-channel dimension C to form a new feature map, which is then output; since both the pooling layer and the convolution layer use stride 2, the size of the feature map is halved after the downsampling module and higher-dimensional features are extracted; processing the pooling layer and the convolution layer in parallel preserves the detail features of the network;
(2.3) overall network architecture:
the whole network consists of an encoder and a decoder; the encoder is composed of downsampling modules and reconstructed residual modules, and in a common configuration several reconstructed residual modules follow one downsampling module to extract point cloud features while reducing the network parameters; stacking several reconstructed residual modules extracts high-dimensional features more accurately and quickly; the decoder is composed of upsampling modules and reconstructed residual modules, where an upsampling module consists of a deconvolution layer whose purpose is to recover the spatial dimensions of the features and restore the size of the feature map; likewise, each upsampling module is followed by several reconstructed residual modules, so the network can recover feature details more finely and achieve pixel-level point cloud segmentation; after the decoder has restored the feature map to the size of the input feature map, two 1 × 1 convolution layers reduce the number of channels to the number of target classes n_class, and the softmax function normalizes the result into a probability distribution

\hat{y}_c = \exp(a_c) / \sum_{c'} \exp(a_{c'})

where a_c is the network output for class c and the subscript c denotes the different classes.
CN202010681588.7A 2020-07-15 2020-07-15 3D point cloud semantic segmentation method under aerial view coding visual angle Pending CN111862101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010681588.7A CN111862101A (en) 2020-07-15 2020-07-15 3D point cloud semantic segmentation method under aerial view coding visual angle


Publications (1)

Publication Number Publication Date
CN111862101A true CN111862101A (en) 2020-10-30

Family

ID=72983162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010681588.7A Pending CN111862101A (en) 2020-07-15 2020-07-15 3D point cloud semantic segmentation method under aerial view coding visual angle

Country Status (1)

Country Link
CN (1) CN111862101A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190096125A1 (en) * 2017-09-28 2019-03-28 Nec Laboratories America, Inc. Generating occlusion-aware bird eye view representations of complex road scenes
US20190108639A1 (en) * 2017-10-09 2019-04-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Semantic Segmentation of 3D Point Clouds
WO2020053611A1 (en) * 2018-09-12 2020-03-19 Toyota Motor Europe Electronic device, system and method for determining a semantic grid of an environment of a vehicle
CN109410307A (en) * 2018-10-16 2019-03-01 大连理工大学 A kind of scene point cloud semantic segmentation method
CN110879994A (en) * 2019-12-02 2020-03-13 中国科学院自动化研究所 Three-dimensional visual inspection detection method, system and device based on shape attention mechanism

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ALEX H. LANG等: ""PointPillars: Fast Encoders for Object Detection from Point Clouds"", 《HTTPS://ARXIV.ORG/ABS/1812.05784》 *
EDUARDO ROMERA等: ""Efficient ConvNet for Real-time Semantic Segmentation"", 《2017 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV)》 *
黄忠义等: ""Kinect点云的平面提取算法研究"", 《全球定位***》 *

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330680A (en) * 2020-11-04 2021-02-05 中山大学 Lookup table-based method for accelerating point cloud segmentation
CN112330680B (en) * 2020-11-04 2023-07-21 中山大学 Method for accelerating point cloud segmentation based on lookup table
CN113759338B (en) * 2020-11-09 2024-04-16 北京京东乾石科技有限公司 Target detection method and device, electronic equipment and storage medium
CN113759338A (en) * 2020-11-09 2021-12-07 北京京东乾石科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112488117A (en) * 2020-12-11 2021-03-12 南京理工大学 Point cloud analysis method based on direction-induced convolution
CN112488117B (en) * 2020-12-11 2022-09-27 南京理工大学 Point cloud analysis method based on direction-induced convolution
CN112560865A (en) * 2020-12-23 2021-03-26 清华大学 Semantic segmentation method for point cloud under outdoor large scene
CN112560865B (en) * 2020-12-23 2022-08-12 清华大学 Semantic segmentation method for point cloud under outdoor large scene
CN112465844A (en) * 2020-12-29 2021-03-09 华北电力大学 Multi-class loss function for image semantic segmentation and design method thereof
CN112731339A (en) * 2021-01-04 2021-04-30 东风汽车股份有限公司 Three-dimensional target detection system based on laser point cloud and detection method thereof
CN112818756A (en) * 2021-01-13 2021-05-18 上海西井信息科技有限公司 Target detection method, system, device and storage medium
CN112819833A (en) * 2021-02-05 2021-05-18 四川大学 Large scene point cloud semantic segmentation method
CN112819833B (en) * 2021-02-05 2022-07-12 四川大学 Large scene point cloud semantic segmentation method
CN113011317A (en) * 2021-03-16 2021-06-22 青岛科技大学 Three-dimensional target detection method and detection device
CN113256793A (en) * 2021-05-31 2021-08-13 浙江科技学院 Three-dimensional data processing method and system
CN113392842A (en) * 2021-06-03 2021-09-14 电子科技大学 Point cloud semantic segmentation method based on point data network structure improvement
CN113392841A (en) * 2021-06-03 2021-09-14 电子科技大学 Three-dimensional point cloud semantic segmentation method based on multi-feature information enhanced coding
CN113392841B (en) * 2021-06-03 2022-11-18 电子科技大学 Three-dimensional point cloud semantic segmentation method based on multi-feature information enhanced coding
CN113378756A (en) * 2021-06-24 2021-09-10 深圳市赛维网络科技有限公司 Three-dimensional human body semantic segmentation method, terminal device and storage medium
CN113378756B (en) * 2021-06-24 2022-06-14 深圳市赛维网络科技有限公司 Three-dimensional human body semantic segmentation method, terminal device and storage medium
CN113936139B (en) * 2021-10-29 2024-06-11 江苏大学 Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN113936139A (en) * 2021-10-29 2022-01-14 江苏大学 Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN114187310A (en) * 2021-11-22 2022-03-15 华南农业大学 Large-scale point cloud segmentation method based on octree and PointNet ++ network
CN114359902A (en) * 2021-12-03 2022-04-15 武汉大学 Three-dimensional point cloud semantic segmentation method based on multi-scale feature fusion
CN114359902B (en) * 2021-12-03 2024-04-26 武汉大学 Three-dimensional point cloud semantic segmentation method based on multi-scale feature fusion
CN114359562A (en) * 2022-03-20 2022-04-15 宁波博登智能科技有限公司 Automatic semantic segmentation and labeling system and method for four-dimensional point cloud
WO2023193400A1 (en) * 2022-04-06 2023-10-12 合众新能源汽车股份有限公司 Point cloud detection and segmentation method and apparatus, and electronic device
WO2023213083A1 (en) * 2022-05-05 2023-11-09 北京京东乾石科技有限公司 Object detection method and apparatus and driverless car
CN114638956A (en) * 2022-05-23 2022-06-17 南京航空航天大学 Whole airplane point cloud semantic segmentation method based on voxelization and three-view
US11836896B2 (en) 2022-05-23 2023-12-05 Nanjing University Of Aeronautics And Astronautics Semantic segmentation method for aircraft point cloud based on voxelization and three views
CN115035296A (en) * 2022-06-15 2022-09-09 清华大学 Flying vehicle 3D semantic segmentation method and system based on aerial view projection
CN115035296B (en) * 2022-06-15 2024-07-12 清华大学 Flying car 3D semantic segmentation method and system based on aerial view projection
WO2024015891A1 (en) * 2022-07-15 2024-01-18 The Regents Of The University Of California Image and depth sensor fusion methods and systems
CN114972763B (en) * 2022-07-28 2022-11-04 香港中文大学(深圳)未来智联网络研究院 Laser radar point cloud segmentation method, device, equipment and storage medium
CN114972763A (en) * 2022-07-28 2022-08-30 香港中文大学(深圳)未来智联网络研究院 Laser radar point cloud segmentation method, device, equipment and storage medium
CN115760886B (en) * 2022-11-15 2024-04-05 中国平安财产保险股份有限公司 Land parcel dividing method and device based on unmanned aerial vehicle aerial view and related equipment
CN115760886A (en) * 2022-11-15 2023-03-07 中国平安财产保险股份有限公司 Plot partitioning method and device based on aerial view of unmanned aerial vehicle and related equipment
CN116958557A (en) * 2023-08-11 2023-10-27 安徽大学 Three-dimensional indoor scene semantic segmentation method based on residual impulse neural network

Similar Documents

Publication Publication Date Title
CN111862101A (en) 3D point cloud semantic segmentation method under aerial view coding visual angle
CN110443842B (en) Depth map prediction method based on visual angle fusion
CN109410307B (en) Scene point cloud semantic segmentation method
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN109410321A (en) Three-dimensional rebuilding method based on convolutional neural networks
CN111161364B (en) Real-time shape completion and attitude estimation method for single-view depth map
CN110135227B (en) Laser point cloud outdoor scene automatic segmentation method based on machine learning
CN111028335B (en) Point cloud data block surface patch reconstruction method based on deep learning
CN110827295A (en) Three-dimensional semantic segmentation method based on coupling of voxel model and color information
CN116229057B (en) Method and device for three-dimensional laser radar point cloud semantic segmentation based on deep learning
CN113256649B (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN112270332A (en) Three-dimensional target detection method and system based on sub-stream sparse convolution
CN115471634B (en) Modeling method and device for urban green plant twins
CN114549537A (en) Unstructured environment point cloud semantic segmentation method based on cross-modal semantic enhancement
CN112001293A (en) Remote sensing image ground object classification method combining multi-scale information and coding and decoding network
CN114373104A (en) Three-dimensional point cloud semantic segmentation method and system based on dynamic aggregation
CN113822825B (en) Optical building target three-dimensional reconstruction method based on 3D-R2N2
CN116402851A (en) Infrared dim target tracking method under complex background
CN115482268A (en) High-precision three-dimensional shape measurement method and system based on speckle matching network
Gu et al. Ue4-nerf: Neural radiance field for real-time rendering of large-scale scene
CN112950786A (en) Vehicle three-dimensional reconstruction method based on neural network
Chen et al. Ground 3D object reconstruction based on multi-view 3D occupancy network using satellite remote sensing image
CN116597071A (en) Defect point cloud data reconstruction method based on K-nearest neighbor point sampling capable of learning
CN110675381A (en) Intrinsic image decomposition method based on serial structure network
CN112785684B (en) Three-dimensional model reconstruction method based on local information weighting mechanism

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201030)