CN115170626A - Unsupervised method for robust point cloud registration based on depth features - Google Patents


Info

Publication number
CN115170626A
Authority
CN
China
Prior art keywords
point cloud
point
tensor
transformation
multiplied
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210847039.1A
Other languages
Chinese (zh)
Inventor
陈明
韦升喜
肖远辉
田旭
李祺峰
吴冬柳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University filed Critical Guangxi Normal University
Priority to CN202210847039.1A priority Critical patent/CN115170626A/en
Publication of CN115170626A publication Critical patent/CN115170626A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/30: Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33: Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an unsupervised method for robust point cloud registration based on depth features, comprising the following steps: 1) acquiring point cloud data; 2) conversion; 3) feature extraction; 4) point cloud registration; 5) training. The method is an unsupervised network: the registration framework is trained in an unsupervised manner, combining global and local high-level features to learn and extract deep features. It requires no expensive computation of point correspondences and offers great advantages in accuracy, robustness to initialization, and computational efficiency.

Description

Unsupervised method for robust point cloud registration based on depth features
Technical Field
The invention relates to three-dimensional object reconstruction and positioning, and in particular to an unsupervised method for robust point cloud registration based on depth features.
Background
Point cloud registration is the problem of estimating the rigid transformation that aligns two point clouds. It has many applications in fields such as autonomous driving, motion and pose estimation, three-dimensional reconstruction, simultaneous localization and mapping (SLAM), and augmented reality.
Recently, some methods based on deep learning (DL) have been proposed to handle large rotation angles (Y. Wang and J. M. Solomon, "Deep closest point: Learning representations for point cloud registration," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019, pp. 3523-3532; Y. Aoki, H. Goforth, R. A. Srivatsan, and S. Lucey, "PointNetLK: Robust & efficient point cloud registration using PointNet," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 7163-7172; V. Sarode, X. Li, H. Goforth, Y. Aoki, R. A. Srivatsan, S. Lucey, and H. Choset, "PCRNet: Point cloud registration network using PointNet encoding," arXiv preprint arXiv:1908.07906, 2019).
Roughly speaking, they can be divided into two categories: supervised methods, which rely on ground-truth correspondences or class labels, and unsupervised methods. Deep Closest Point (DCP) computes the rigid transformation by singular value decomposition (SVD), with correspondences constructed by learning a soft matching map. PointNetLK uses a classical alignment technique, the Lucas-Kanade (LK) algorithm, to align PointNet features, yielding good generalization to shapes unseen during training. However, these methods rely on large amounts of registration label data, which makes them impractical, because 3D registration labels are very labor-intensive to produce. In contrast, achieving registration from unlabeled point cloud data without ground-truth correspondences is a significant challenge. PCRNet alleviates the pose bias exhibited by PointNetLK by replacing the Lucas-Kanade module with a multi-layer perceptron; it recovers the transformation parameters directly from the concatenated global descriptors of the source and target point clouds. FMR-Net implements an unsupervised framework using an encoder-decoder task, achieving registration by minimizing a feature-metric projection error. While these methods show the prominent advantages of unsupervised learning, they rely primarily on globally represented depth features and ignore locally represented ones. The depth features of the point cloud are therefore not fully exploited for registration, and the registration results are imperfect.
Disclosure of Invention
The invention aims to provide an unsupervised method for robust point cloud registration based on depth features, addressing the defects in the prior art. The method is an unsupervised network: the registration framework is trained in an unsupervised manner, combining global and local high-level features to learn and extract deep features. It requires no expensive computation of point correspondences and offers great advantages in accuracy, robustness to initialization, and computational efficiency.
The technical solution for realizing the purpose of the invention is as follows:
An unsupervised method for robust point cloud registration based on depth features comprises the following steps:
1) Acquiring point cloud data: acquiring the point cloud data to be registered, and uniformly sampling 1024 points from the point cloud surface;
2) Conversion: converting the sampled data of the two point clouds into tensors of size 1024 × 3, and then inputting the tensors into the deep learning framework;
3) Feature extraction: the encoder module in the deep learning framework extracts point cloud depth features from the input tensor and finally outputs a one-dimensional tensor representing those features. The specific process is as follows:
3-1) for the input tensor of size 1024 × 3, the EdgeConv module takes each point as a central point, represents the edge features between that point and each of its neighbors, and aggregates these features to obtain a new feature for the point; that is, the local features of the point cloud are obtained by constructing a neighborhood around each vertex. The specific steps are as follows:
$h_\theta(x_i, x_j) = h_\theta(x_i, x_j - x_i)$ (1),
where $x_i$ belongs to the vertex set $X = \{x_1, \ldots, x_n\} \subseteq R^F$, and F represents the feature-space dimension of a point output by a given layer of the neural network; the features are then fed into a perceptron to obtain the edge features:
$e'_{ijm} = \mathrm{ReLU}(\theta_m \cdot (x_j - x_i) + \phi_m \cdot x_i)$ (2),
wherein:
$\Theta = (\theta_1, \ldots, \theta_M, \phi_1, \ldots, \phi_M)$ (3),
Θ is the set of learnable parameters; the features of each neighboring edge are then aggregated:
$x'_i = \square_{j:(i,j)\in\Omega}\, h_\theta(x_i, x_j)$ (4),
where $\square$ denotes the aggregation operation and Ω denotes the set of point pairs that form edges with point $x_i$ as the center; the specific aggregation operation is:
$x'_{im} = \max_{j:(i,j)\in\Omega} e'_{ijm}$ (5),
finally, the EdgeConv module outputs a tensor of size 6 × 1024 × 20, which is fed into the next convolutional layer, as sketched below;
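As an illustration only (not part of the original disclosure), a minimal PyTorch sketch of the edge-feature construction of equations (1)-(5) might look as follows; the function names, the batch dimension, and the choice of k = 20 neighbors are assumptions inferred from the 6 × 1024 × 20 output size stated above:

```python
import torch

def knn(x, k=20):
    # x: (B, 3, N) point coordinates; returns the indices of the k nearest
    # neighbors of every point, shape (B, N, k), under Euclidean distance.
    inner = -2 * torch.matmul(x.transpose(2, 1), x)        # (B, N, N)
    xx = torch.sum(x ** 2, dim=1, keepdim=True)            # (B, 1, N)
    neg_dist = -xx - inner - xx.transpose(2, 1)            # negative squared distance
    return neg_dist.topk(k=k, dim=-1)[1]

def edge_features(x, k=20):
    # Builds the 6-channel edge feature (x_i, x_j - x_i) of equation (1),
    # i.e. the (B, 6, 1024, 20) tensor described in step 3-1.
    B, C, N = x.shape
    idx = knn(x, k)                                        # (B, N, k)
    idx = idx + torch.arange(B, device=x.device).view(-1, 1, 1) * N
    pts = x.transpose(2, 1).contiguous()                   # (B, N, C)
    x_j = pts.view(B * N, C)[idx.view(-1)].view(B, N, k, C)
    x_i = pts.unsqueeze(2).expand(-1, -1, k, -1)
    feat = torch.cat([x_i, x_j - x_i], dim=3)              # (B, N, k, 2C)
    return feat.permute(0, 3, 1, 2).contiguous()           # (B, 2C, N, k)
```

The shared perceptron of equation (2) is then applied channel-wise to this tensor, and a max over the k neighbors implements the aggregation of equation (5).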
3-2) five convolutions are applied to the tensor carrying the local features; after the first convolution, with 64 convolution kernels of two-dimensional size 1 × 1, the output tensor size is 64 × 1024 × 20;
3-3) inspired by the observation that attention mechanisms markedly improve deep learning performance, the output of the first layer is fed into a CBAM attention module that weights the importance of each channel and spatial position; the weighted result then passes in sequence through convolutional layers with 64, 128 and 256 kernels;
3-4) the output tensors of the first four layers are concatenated and reshaped to 512 × 1024 × 1, then convolved with 512 kernels of two-dimensional size 1 × 1 and reshaped to give a tensor of size 512 × 1024; flattening this tensor with a flatten function outputs a tensor of size 512, yielding a depth feature that contains both local and global descriptors;
4) Point cloud registration: registering the two point clouds means finding a rigid transformation matrix G ∈ SE(3), composed of a rotation matrix R ∈ SO(3) and a translation vector t ∈ R³, by solving the minimization objective F(G(R, t)):
$F(G(R,t)) = \lVert \psi(G \cdot P_S) - \psi(P_T) \rVert_2^2$ (6),
where $\psi: R^{3\times N} \to R^K$ is the feature extraction function learned by the encoder and K is the feature dimension;
4-1) G(R, t) is regarded as a special Lie group and represented by an exponential map as follows:
$G = \exp\left(\sum_i \varepsilon_i T_i\right)$ (7),
where the $T_i$ are the generators of the exponential map and ε is the Lie algebra vector, mapped to the transformation matrix G through the exponential;
4-2) the point cloud registration problem is described as $\phi(P_T) = \phi(G \cdot P_S)$; solving for the matrix G, i.e., obtaining the derivative of G by differentiating with respect to the Lie algebra ε, allows ε to be adjusted directly so that the optimal transformation matrix G is optimized indirectly and the feature-space projection error between the two point clouds is minimized for the registration task. Inspired by the inverse compositional (IC) formulation, $\phi(P_T) = \phi(G \cdot P_S)$ is first inverted and its right-hand side linearized:
$\phi(P_S) = \phi(G^{-1} \cdot P_T) \approx \phi(P_T) + J\varepsilon$ (8),
4-3) the transformation estimate is computed iteratively; at each step, the increment ε is estimated by running the inverse compositional (IC) algorithm:
$\varepsilon = (J^T J)^{-1} J^T \delta$ (9),
where $\delta = \phi(P_S) - \phi(P_T)$ is the feature-space projection error between the two point clouds and $J = \partial\delta / \partial\varepsilon$ is the Jacobian matrix of the projection error δ with respect to the transformation parameters ε;
4-4) to compute the Jacobian matrix efficiently, a finite-difference gradient is used in place of the traditional stochastic gradient method:
$J_i = \dfrac{\phi(\exp(-t_i T_i) \cdot P_T) - \phi(P_T)}{t_i}$ (10),
where $t_i$ is an infinitesimal perturbation of the transformation parameters during computation; setting $t_i$ to a fixed value gives relatively good results. $t_i$ comprises three angular parameters for rotation and three perturbation parameters for translation, and its size is set to $2 \times 10^{-2}$ (see the sketch below);
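A rough sketch of the finite-difference Jacobian of equation (10) is given below, under stated assumptions: `encoder` stands for the feature extractor φ mapping an (N, 3) point cloud to a K-dimensional feature, `exp_se3` is a hypothetical helper implementing the exponential map of equation (7), and the generator ordering (three rotations, then three translations) is a convention not taken from the original:

```python
import torch

def exp_se3(twist):
    # Exponential map of equation (7): twist = (w1, w2, w3, v1, v2, v3)
    # is packed into a 4x4 se(3) matrix and exponentiated.
    W = torch.zeros(4, 4)
    W[0, 1], W[0, 2], W[1, 2] = -twist[2], twist[1], -twist[0]
    W[1, 0], W[2, 0], W[2, 1] = twist[2], -twist[1], twist[0]
    W[:3, 3] = twist[3:]
    return torch.matrix_exp(W)

def finite_difference_jacobian(encoder, p_t, t=2e-2):
    # Equation (10): column i is (phi(exp(-t_i T_i) . P_T) - phi(P_T)) / t_i.
    f0 = encoder(p_t)                              # (K,)
    cols = []
    for i in range(6):
        twist = torch.zeros(6)
        twist[i] = -t                              # perturb one generator
        g = exp_se3(twist)
        p = p_t @ g[:3, :3].T + g[:3, 3]           # transform every point
        cols.append((encoder(p) - f0) / t)
    return torch.stack(cols, dim=1)                # (K, 6)
```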
4-5) the transformation matrix is optimized by iterating equation (9), shortening the projection distance between the source point cloud and the target point cloud in the feature space:
$\Delta G \cdot P_S \to P_S$ (11),
where $\Delta G = \exp\left(\sum_i \varepsilon_i T_i\right)$ is the transformation matrix obtained from the transformation parameters ε computed by equation (9); through multiple iterations, $P_S$ is continuously transformed and registered to $P_T$, and finally the optimal transformation matrix $G_{est}$ is output (a sketch of this loop follows equation (12)):
$G_{est} = \Delta G_n \cdots \Delta G_1 \cdot \Delta G_0$ (12),
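Continuing the same illustrative sketch (again an assumption-laden outline reusing `exp_se3` and `finite_difference_jacobian` from above, not the patented implementation), the iterative update of equations (9), (11) and (12) can be written as:

```python
def register(encoder, p_s, p_t, iters=10):
    # Inverse compositional loop: the Jacobian is computed once from P_T,
    # then each step solves eps = (J^T J)^{-1} J^T delta (equation (9)),
    # applies Delta G to the source cloud (equation (11)) and composes the
    # estimate G_est = Delta G_n ... Delta G_0 (equation (12)).
    J = finite_difference_jacobian(encoder, p_t)   # (K, 6)
    pinv = torch.linalg.pinv(J)                    # (6, K) = (J^T J)^{-1} J^T
    g_est = torch.eye(4)
    for _ in range(iters):
        delta = encoder(p_s) - encoder(p_t)        # feature projection error
        eps = pinv @ delta                         # equation (9)
        dg = exp_se3(eps)                          # Delta G
        p_s = p_s @ dg[:3, :3].T + dg[:3, 3]       # equation (11)
        g_est = dg @ g_est                         # equation (12)
    return g_est
```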
5) Training: to enable the deep learning framework to train in an unsupervised manner, a decoder consisting of four fully connected layers is introduced; ReLU is chosen as the activation function of the first three layers and tanh as that of the fourth layer;
5-1) under unsupervised conditions, model training uses the alignment error between the transformed source point cloud and the target point cloud in place of a ground-truth transformation, adopting a robust loss function:
[Equation (13): robust loss function, presented as an image in the original publication]
where $P_T \in \Theta$ is a set of points sampled from the unit square $[0,1]^2$, x is the point cloud feature, $\theta_i$ is the i-th component of the convolutional layer parameters, and φ is the originally input three-dimensional point.
The method achieves high accuracy and computational efficiency with strong robustness to initialization, and requires no expensive computation of point correspondences.
Drawings
FIG. 1 is a schematic diagram of a point cloud registration network architecture in an embodiment;
FIG. 2 is a schematic and effect diagram of an embodiment performing three-dimensional point cloud registration of bottles;
FIG. 3 is a schematic and effect diagram of an embodiment of three-dimensional point cloud registration of a toilet;
FIG. 4 is a schematic and effect diagram of the embodiment for three-dimensional point cloud registration of a noisy mobile phone;
FIG. 5 is a schematic and effect diagram of the embodiment for performing three-dimensional point cloud registration of a real indoor scene;
FIG. 6 is a schematic and effect diagram of an embodiment for point cloud registration of Armadillo in Stanford 3DMatch.
Detailed Description
The invention will be described in further detail with reference to the following drawings and specific examples, but the invention is not limited thereto.
Example:
referring to fig. 1, an unsupervised method for robust point cloud registration based on depth features includes the following steps:
1) Acquiring point cloud data: acquiring the point cloud data to be registered, and uniformly sampling 1024 points from the point cloud surface;
2) Conversion: converting the sampled data of the two point clouds into tensors of size 1024 × 3, and then inputting the tensors into the deep learning framework, as sketched below;
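For illustration, steps 1) and 2) amount to something like the following sketch (the uniform sampling strategy and the function name are assumptions; the patent only fixes the 1024 × 3 tensor size):

```python
import numpy as np
import torch

def to_input_tensor(points, n=1024, seed=0):
    # Uniformly sample n surface points and convert to an n x 3 float tensor.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n, replace=len(points) < n)
    return torch.as_tensor(points[idx], dtype=torch.float32)   # (1024, 3)
```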
3) Feature extraction: the encoder module in the deep learning framework extracts point cloud depth features from the input tensor and finally outputs a one-dimensional tensor representing those features. The specific process is as follows:
3-1) unlike other unsupervised methods that rely on global descriptors, the method of this embodiment additionally attends to local features within the unsupervised training mode. For the input tensor of size 1024 × 3, the EdgeConv module takes each point as a central point, represents the edge features between that point and each of its neighbors, and aggregates these features to obtain a new feature for the point; that is, the local features of the point cloud are obtained by constructing a neighborhood around each vertex. The specific steps are as follows:
$h_\theta(x_i, x_j) = h_\theta(x_i, x_j - x_i)$ (1),
where $x_i$ belongs to the vertex set $X = \{x_1, \ldots, x_n\} \subseteq R^F$, and F represents the feature-space dimension of a point output by a given layer of the neural network; the features are then fed into a perceptron to obtain the edge features:
$e'_{ijm} = \mathrm{ReLU}(\theta_m \cdot (x_j - x_i) + \phi_m \cdot x_i)$ (2),
wherein:
$\Theta = (\theta_1, \ldots, \theta_M, \phi_1, \ldots, \phi_M)$ (3),
Θ is the set of learnable parameters; the features of each neighboring edge are then aggregated:
$x'_i = \square_{j:(i,j)\in\Omega}\, h_\theta(x_i, x_j)$ (4),
where $\square$ denotes the aggregation operation and Ω denotes the set of point pairs that form edges with point $x_i$ as the center; the specific aggregation operation is:
$x'_{im} = \max_{j:(i,j)\in\Omega} e'_{ijm}$ (5),
finally, the EdgeConv module outputs a tensor of size 6 × 1024 × 20, which is fed into the next convolutional layer;
3-2) five convolutions are applied to the tensor carrying the local features; after the first convolution, with 64 convolution kernels of two-dimensional size 1 × 1, the output tensor size is 64 × 1024 × 20;
3-3) inspired by the observation that attention mechanisms markedly improve deep learning performance, the output of the first layer is fed into a CBAM attention module that weights the importance of each channel and spatial position; the weighted result then passes in sequence through convolutional layers with 64, 128 and 256 kernels;
3-4) the output tensors of the first four layers are concatenated and reshaped to 512 × 1024 × 1, then convolved with 512 kernels of two-dimensional size 1 × 1 and reshaped to give a tensor of size 512 × 1024; flattening this tensor with a flatten function outputs a tensor of size 512, yielding a depth feature that contains both local and global descriptors, as sketched below;
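An assumption-heavy PyTorch sketch of the encoder of steps 3-2) to 3-4) is given below; the CBAM module is passed in (an identity placeholder keeps the sketch runnable), and the pooling over neighbors and points is inferred from the stated tensor sizes rather than spelled out in the original:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, cbam=None):
        super().__init__()
        # Four 1x1 convolutions with 64, 64, 128 and 256 kernels.
        self.convs = nn.ModuleList([
            nn.Conv2d(6, 64, 1), nn.Conv2d(64, 64, 1),
            nn.Conv2d(64, 128, 1), nn.Conv2d(128, 256, 1),
        ])
        self.cbam = cbam if cbam is not None else nn.Identity()
        self.final = nn.Conv1d(512, 512, 1)    # 512 kernels of size 1

    def forward(self, x):                      # x: (B, 6, 1024, 20) from EdgeConv
        outs = []
        for i, conv in enumerate(self.convs):
            x = torch.relu(conv(x))            # (B, C_i, 1024, 20)
            if i == 0:
                x = self.cbam(x)               # weight the first layer's output
            outs.append(x)
        # Concatenate the four intermediate outputs: 64+64+128+256 = 512 channels.
        cat = torch.cat([o.max(dim=-1)[0] for o in outs], dim=1)   # (B, 512, 1024)
        feat = torch.relu(self.final(cat))                          # (B, 512, 1024)
        return feat.max(dim=-1)[0]             # (B, 512) local + global descriptor
```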
4) Point cloud registration: registering the two point clouds means finding a rigid transformation matrix G ∈ SE(3), composed of a rotation matrix R ∈ SO(3) and a translation vector t ∈ R³, by solving the minimization objective F(G(R, t)):
$F(G(R,t)) = \lVert \psi(G \cdot P_S) - \psi(P_T) \rVert_2^2$ (6),
where $\psi: R^{3\times N} \to R^K$ is the feature extraction function learned by the encoder and K is the feature dimension;
4-1) G(R, t) is regarded as a special Lie group and represented by an exponential map as follows:
$G = \exp\left(\sum_i \varepsilon_i T_i\right)$ (7),
where the $T_i$ are the generators of the exponential map and ε is the Lie algebra vector, mapped to the transformation matrix G through the exponential;
4-2) the point cloud registration problem is described as $\phi(P_T) = \phi(G \cdot P_S)$; solving for the matrix G, i.e., obtaining the derivative of G by differentiating with respect to the Lie algebra ε, allows ε to be adjusted directly so that the optimal transformation matrix G is optimized indirectly and the feature-space projection error between the two point clouds is minimized for the registration task. Inspired by the inverse compositional (IC) formulation, $\phi(P_T) = \phi(G \cdot P_S)$ is first inverted and its right-hand side linearized:
$\phi(P_S) = \phi(G^{-1} \cdot P_T) \approx \phi(P_T) + J\varepsilon$ (8),
4-3) the transformation estimate is computed iteratively; at each step, the increment ε is estimated by running the inverse compositional (IC) algorithm:
$\varepsilon = (J^T J)^{-1} J^T \delta$ (9),
where $\delta = \phi(P_S) - \phi(P_T)$ is the feature-space projection error between the two point clouds and $J = \partial\delta / \partial\varepsilon$ is the Jacobian matrix of the projection error δ with respect to the transformation parameters ε;
4-4) to compute the Jacobian matrix efficiently, a finite-difference gradient is used in place of the traditional stochastic gradient method:
$J_i = \dfrac{\phi(\exp(-t_i T_i) \cdot P_T) - \phi(P_T)}{t_i}$ (10),
where $t_i$ is an infinitesimal perturbation of the transformation parameters during computation; setting $t_i$ to a fixed value gives relatively good results. $t_i$ comprises three angular parameters for rotation and three perturbation parameters for translation, and its size is set to $2 \times 10^{-2}$;
4-5) the transformation matrix is optimized by iterating equation (9), shortening the projection distance between the source point cloud and the target point cloud in the feature space:
$\Delta G \cdot P_S \to P_S$ (11),
where $\Delta G = \exp\left(\sum_i \varepsilon_i T_i\right)$ is the transformation matrix obtained from the transformation parameters ε computed by equation (9); through multiple iterations, $P_S$ is continuously transformed and registered to $P_T$, and finally the optimal transformation matrix $G_{est}$ is output:
$G_{est} = \Delta G_n \cdots \Delta G_1 \cdot \Delta G_0$ (12),
5) Training: to enable the deep learning framework to train in an unsupervised manner, a decoder consisting of four fully connected layers is introduced; ReLU is chosen as the activation function of the first three layers and tanh as that of the fourth layer; the layer configuration is shown in FIG. 1;
5-1) under unsupervised conditions, model training uses the alignment error between the transformed source point cloud and the target point cloud in place of a ground-truth transformation, adopting a robust loss function:
[Equation (13): robust loss function, presented as an image in the original publication]
where $P_T \in \Theta$ is a set of points sampled from the unit square $[0,1]^2$, x is the point cloud feature, $\theta_i$ is the i-th component of the convolutional layer parameters, and φ is the originally input three-dimensional point. A sketch of the decoder follows.
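As a sketch only: the patent fixes the layer count and activations of the decoder but not its hidden widths or output size, so the widths below (and the 3-dimensional point output implied by step 5-1) are assumptions:

```python
import torch.nn as nn

decoder = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),    # layer 1
    nn.Linear(256, 128), nn.ReLU(),    # layer 2
    nn.Linear(128, 64), nn.ReLU(),     # layer 3
    nn.Linear(64, 3), nn.Tanh(),       # layer 4: tanh activation
)
```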
To demonstrate the effectiveness of the deep learning framework, the most widely used benchmark, ModelNet40, is adopted as the pre-training set. ModelNet40 is a dataset of 12,311 CAD models covering 40 object categories; the deep learning framework is trained on the first 20 categories of ModelNet40 and tested on the last 20. During training and testing, a ground-truth transformation matrix $G_{gt}$ is obtained by randomly generating a rigid transformation matrix G, and the source point cloud is passed through G to generate the target point cloud. The rotation axis is selected arbitrarily, the rotation component of G is initialized within [0, 45] degrees, and the translation component within [-0.5, 0.5]. The mean squared error (MSE), root mean squared error (RMSE) and mean absolute error (MAE) between the rotation and translation components of the true rotation-translation matrix $G_{gt}$ and the predicted rotation-translation matrix $G_{est}$ are used as evaluation indices; the smaller the MSE, RMSE and MAE, the higher the registration accuracy. In the experiments, angles are measured in degrees and registration running time in seconds. The results are shown in the following table (a sketch of the error computation follows the table):
[Results table, presented as an image in the original publication]
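For reference, the evaluation indices can be computed along these lines (a sketch; the Euler-angle convention is an assumption, since the patent only states that angles are measured in degrees):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def registration_errors(g_gt, g_est):
    # MSE, RMSE and MAE over the rotation (as Euler angles in degrees)
    # and translation components of 4x4 rigid transforms.
    def stats(err):
        return np.mean(err**2), np.sqrt(np.mean(err**2)), np.mean(np.abs(err))
    ang_gt = Rotation.from_matrix(g_gt[:3, :3]).as_euler('xyz', degrees=True)
    ang_est = Rotation.from_matrix(g_est[:3, :3]).as_euler('xyz', degrees=True)
    return stats(ang_gt - ang_est), stats(g_gt[:3, 3] - g_est[:3, 3])
```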
Specific examples are shown in FIGS. 2-6.

Claims (1)

1. An unsupervised method for robust point cloud registration based on depth features is characterized by comprising the following steps:
1) Acquiring point cloud data: acquiring the point cloud data to be registered, and uniformly sampling 1024 points from the point cloud surface;
2) Conversion: converting the sampled data of the two point clouds into tensors of size 1024 × 3, and inputting the tensors into the deep learning framework;
3) Feature extraction: the encoder module in the deep learning framework extracts point cloud depth features from the input tensor and finally outputs a one-dimensional tensor representing those features. The specific process is as follows:
3-1) for the input tensor of size 1024 × 3, the EdgeConv module takes each point as a central point, represents the edge features between that point and each of its neighbors, and aggregates these features to obtain a new feature for the point; that is, the local features of the point cloud are obtained by constructing a neighborhood around each vertex. The specific steps are as follows:
$h_\theta(x_i, x_j) = h_\theta(x_i, x_j - x_i)$ (1),
where $x_i$ belongs to the vertex set $X = \{x_1, \ldots, x_n\} \subseteq R^F$, and F represents the feature-space dimension of a point output by a given layer of the neural network; the features are then fed into a perceptron to obtain the edge features:
$e'_{ijm} = \mathrm{ReLU}(\theta_m \cdot (x_j - x_i) + \phi_m \cdot x_i)$ (2),
wherein:
$\Theta = (\theta_1, \ldots, \theta_M, \phi_1, \ldots, \phi_M)$ (3),
Θ is the set of learnable parameters; the features of each neighboring edge are then aggregated:
$x'_i = \square_{j:(i,j)\in\Omega}\, h_\theta(x_i, x_j)$ (4),
where $\square$ denotes the aggregation operation and Ω denotes the set of point pairs that form edges with point $x_i$ as the center; the specific aggregation operation is:
$x'_{im} = \max_{j:(i,j)\in\Omega} e'_{ijm}$ (5),
finally, the EdgeConv module outputs a tensor of size 6 × 1024 × 20, which is fed into the next convolutional layer;
3-2) five convolutions are applied to the tensor carrying the local features; after the first convolution, with 64 convolution kernels of two-dimensional size 1 × 1, the output tensor size is 64 × 1024 × 20;
3-3) inspired by the observation that attention mechanisms markedly improve deep learning performance, the output of the first layer is fed into a CBAM attention module that weights the importance of each channel and spatial position; the weighted result then passes in sequence through convolutional layers with 64, 128 and 256 kernels;
3-4) the output tensors of the first four layers are concatenated and reshaped to 512 × 1024 × 1, then convolved with 512 kernels of two-dimensional size 1 × 1 and reshaped to give a tensor of size 512 × 1024; flattening this tensor with a flatten function outputs a tensor of size 512, yielding a depth feature that contains both local and global descriptors;
4) Point cloud registration: registering the two point clouds means finding a rigid transformation matrix G ∈ SE(3), composed of a rotation matrix R ∈ SO(3) and a translation vector t ∈ R³, by solving the minimization objective F(G(R, t)):
$F(G(R,t)) = \lVert \psi(G \cdot P_S) - \psi(P_T) \rVert_2^2$ (6),
where $\psi: R^{3\times N} \to R^K$ is the feature extraction function learned by the encoder and K is the feature dimension;
4-1) G(R, t) is regarded as a special Lie group and represented by an exponential map as follows:
$G = \exp\left(\sum_i \varepsilon_i T_i\right)$ (7),
where the $T_i$ are the generators of the exponential map and ε is the Lie algebra vector, mapped to the transformation matrix G through the exponential;
4-2) the point cloud registration problem is described as $\phi(P_T) = \phi(G \cdot P_S)$; solving for the matrix G, i.e., obtaining the derivative of G by differentiating with respect to the Lie algebra ε, allows ε to be adjusted directly so that the optimal transformation matrix G is optimized indirectly and the feature-space projection error between the two point clouds is minimized. Inspired by the inverse compositional (IC) formulation, $\phi(P_T) = \phi(G \cdot P_S)$ is first inverted and its right-hand side linearized:
$\phi(P_S) = \phi(G^{-1} \cdot P_T) \approx \phi(P_T) + J\varepsilon$ (8),
4-3) the transformation estimate is computed iteratively; at each step, the increment ε is estimated by running the inverse compositional (IC) algorithm:
$\varepsilon = (J^T J)^{-1} J^T \delta$ (9),
where $\delta = \phi(P_S) - \phi(P_T)$ is the feature-space projection error between the two point clouds and $J = \partial\delta / \partial\varepsilon$ is the Jacobian matrix of the projection error δ with respect to the transformation parameters ε;
4-4) to compute the Jacobian matrix efficiently, a finite-difference gradient is used in place of the traditional stochastic gradient method:
$J_i = \dfrac{\phi(\exp(-t_i T_i) \cdot P_T) - \phi(P_T)}{t_i}$ (10),
where $t_i$ is an infinitesimal perturbation of the transformation parameters during computation; setting $t_i$ to a fixed value gives relatively good results. $t_i$ comprises three angular parameters for rotation and three perturbation parameters for translation, and its size is set to $2 \times 10^{-2}$;
4-5) the transformation matrix is optimized by iterating equation (9), shortening the projection distance between the source point cloud and the target point cloud in the feature space:
$\Delta G \cdot P_S \to P_S$ (11),
where $\Delta G = \exp\left(\sum_i \varepsilon_i T_i\right)$ is the transformation matrix obtained from the transformation parameters ε computed by equation (9); through multiple iterations, $P_S$ is continuously transformed and registered to $P_T$, and finally the optimal transformation matrix $G_{est}$ is output:
$G_{est} = \Delta G_n \cdots \Delta G_1 \cdot \Delta G_0$ (12),
5) Training: to enable the deep learning framework to train in an unsupervised manner, a decoder consisting of four fully connected layers is introduced; ReLU is chosen as the activation function of the first three layers and tanh as that of the fourth layer;
5-1) under unsupervised conditions, model training uses the alignment error between the transformed source point cloud and the target point cloud in place of a ground-truth transformation, adopting a robust loss function:
[Equation (13): robust loss function, presented as an image in the original publication]
where $P_T \in \Theta$ is a set of points sampled from the unit square $[0,1]^2$, x is the point cloud feature, $\theta_i$ is the i-th component of the convolutional layer parameters, and φ is the originally input three-dimensional point.
CN202210847039.1A 2022-07-07 2022-07-07 Unsupervised method for robust point cloud registration based on depth features Withdrawn CN115170626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210847039.1A CN115170626A (en) 2022-07-07 2022-07-07 Unsupervised method for robust point cloud registration based on depth features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210847039.1A CN115170626A (en) 2022-07-07 2022-07-07 Unsupervised method for robust point cloud registration based on depth features

Publications (1)

Publication Number Publication Date
CN115170626A true CN115170626A (en) 2022-10-11

Family

ID=83494865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210847039.1A Withdrawn CN115170626A (en) 2022-07-07 2022-07-07 Unsupervised method for robust point cloud registration based on depth features

Country Status (1)

Country Link
CN (1) CN115170626A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188543A (en) * 2022-12-27 2023-05-30 中国人民解放军61363部队 Point cloud registration method and system based on deep learning unsupervised
CN116188543B (en) * 2022-12-27 2024-03-12 中国人民解放军61363部队 Point cloud registration method and system based on deep learning unsupervised

Similar Documents

Publication Publication Date Title
CN112837356B WGAN-based unsupervised multi-view three-dimensional point cloud joint registration method
CN110533722B (en) Robot rapid repositioning method and system based on visual dictionary
Ranftl et al. Deep fundamental matrix estimation
CN107229757B (en) Video retrieval method based on deep learning and Hash coding
CN107169117B (en) Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN112819853B (en) Visual odometer method based on semantic priori
CN115170626A (en) Unsupervised method for robust point cloud registration based on depth features
CN109919215B (en) Target detection method for improving characteristic pyramid network based on clustering algorithm
CN110097499B (en) Single-frame image super-resolution reconstruction method based on spectrum mixing kernel Gaussian process regression
CN111260702A (en) Laser three-dimensional point cloud and CT three-dimensional point cloud registration method
Chen et al. An improved iterative closest point algorithm for rigid point registration
Jain et al. Robustifying the multi-scale representation of neural radiance fields
Li et al. Sparse-to-local-dense matching for geometry-guided correspondence estimation
Trottier et al. Sparse dictionary learning for identifying grasp locations
Mei Point cloud registration with self-supervised feature learning and beam search
Li et al. Unsupervised partial point set registration via joint shape completion and registration
CN111832399A (en) Attention mechanism fused cross-domain road navigation mark registration algorithm
CN112396089B (en) Image matching method based on LFGC network and compression excitation module
CN112784800B (en) Face key point detection method based on neural network and shape constraint
Li et al. Few-shot meta-learning on point cloud for semantic segmentation
Wang et al. Lgvit: Local-global vision transformer for breast cancer histopathological image classification
Zhang et al. RecFRCN: Few-Shot Object Detection With Recalibrated Faster R-CNN
CN112906432A (en) Error detection and correction method applied to human face key point positioning task
CN111353509A (en) Key point extractor generation method of visual SLAM system
Wenzhi et al. FVLoc-NeRF: Fast Vision-Only Localization within Neural Radiation Field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221011