CN112785526B - Three-dimensional point cloud restoration method for graphic processing - Google Patents

Three-dimensional point cloud restoration method for graphic processing Download PDF

Info

Publication number
CN112785526B
CN112785526B (application CN202110116229.1A)
Authority
CN
China
Prior art keywords
point cloud
point
dimensional
matrix
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110116229.1A
Other languages
Chinese (zh)
Other versions
CN112785526A (en)
Inventor
朱佩浪
张岩
刘琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110116229.1A priority Critical patent/CN112785526B/en
Publication of CN112785526A publication Critical patent/CN112785526A/en
Application granted granted Critical
Publication of CN112785526B publication Critical patent/CN112785526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Architecture (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a three-dimensional point cloud restoration method for graphic processing, which comprises the following steps: step 1, collecting data from an input point cloud model data set; step 2, combining a Self-Attention mechanism-based method with a multi-layer perceptron MLP to obtain a long-distance dependency extraction network, using the long-distance dependency extraction network to map the input point cloud into a global feature vector, and then generating the missing part of the incomplete point cloud with a decoder of topological root-tree structure; and step 3, combining the incomplete point cloud and the generated missing-part point cloud together to obtain the final repaired complete point cloud model.

Description

Three-dimensional point cloud restoration method for graphic processing
Technical Field
The invention belongs to the field of computer three-dimensional model processing and computer graphics, and particularly relates to a three-dimensional point cloud repairing method for graphics processing.
Background
In recent years, large amounts of three-dimensional data have been acquired directly from the real world using LiDAR scanners, depth sensors such as the Kinect, stereo cameras, and the like.
However, the 3D data obtained with these instruments are often incomplete, mainly for the following reasons: the scanning view angle of the scanner is limited, non-target objects cause occlusion, and light refraction and reflection interfere. As a result, geometric and semantic information of the target object is often lost. Investigating how to repair incomplete 3D models is therefore a very necessary study for further applications. In addition, 3D models appear in a large number of representations, such as point clouds, voxels, patches and distance fields. Using point clouds to represent and process 3D data has received increasing attention because of its lower storage cost compared with other representations (e.g., 3D voxel grids) while still representing 3D models finely. The advent of PointNet (Document 1: C. R. Qi, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. Conference on Computer Vision and Pattern Recognition 2017) made it possible to process unordered point sets directly, which greatly promoted the development of deep learning architectures for point clouds, as well as the development of other related research such as 3D scene reconstruction, 3D model segmentation and 3D model repair.
Learning-based 3D point cloud repair works (Document 2: W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert. PCN: Point Completion Network. International Conference on 3D Vision 2018; Document 3: Z. Huang, Y. Yu, J. Xu, F. Ni, and X. Le. PF-Net: Point Fractal Network for 3D Point Cloud Completion. Conference on Computer Vision and Pattern Recognition 2020; Document 4: L. P. Tchapmi, V. Kosaraju, S. H. Rezatofighi, I. Reid, and S. Savarese. TopNet: Structural Point Cloud Decoder. Conference on Computer Vision and Pattern Recognition 2019; etc.) typically use a multi-layer perceptron (MLP) as the feature extractor: an incomplete point cloud is taken as input, each of its points is mapped into feature vectors of different dimensions, and the maximum is extracted from the last per-point feature vectors to obtain the global feature. Meanwhile, since there is currently no generally feasible method to define a local neighborhood of a point cloud, it is difficult to extract features through convolution operations as on 2D images. These methods therefore rely heavily on multiple fully connected layers of similar architecture to capture the features of the input model and the dependencies between different points of the input point cloud. In addition, PF-Net shows that the lower and middle layers of an MLP mostly extract local information, and this local information cannot be exploited to form global features simply by passing it to the higher layers through shared fully connected layers. This means that such methods cannot efficiently extract enough long-range dependency information and embed it into the final global feature vector. Another problem is that, even if limited long-range dependency information can be captured, it usually has to be learned across several fully connected layers. This is detrimental to the efficient capture of long-range dependencies, mainly for the following reasons:
(1) a more targeted model may be needed to represent these long-range dependencies;
(2) it may be difficult for optimization algorithms to find parameter values that coordinate multiple layers to capture these long-range dependencies;
(3) when these parameter settings are applied to a new model that the network has never seen, they may prove statistically fragile.
In recent years, the Attention mechanism has commonly been combined with various methods (such as recurrent and GAN methods) to capture long-range dependency information. It first appeared in the computer vision field and later developed greatly in Natural Language Processing (NLP). Document 5 (V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu. Recurrent Models of Visual Attention. Conference on Neural Information Processing Systems 2014) combined this mechanism with RNNs for image classification and obtained excellent performance. Document 6 (D. Bahdanau, K. Cho, and Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. International Conference on Learning Representations 2015) applied the Attention mechanism to NLP, using it to translate and align simultaneously in the machine translation task. Self-Attention lets the input elements of a set interact with each other to compute weights or responses and to find out which elements deserve more attention. Document 7 (A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, and L. Kaiser. Attention Is All You Need. Conference on Neural Information Processing Systems 2017) showed that applying the Self-Attention mechanism to machine translation achieved the best performance at the time. Document 8 (H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena. Self-Attention Generative Adversarial Networks. International Conference on Machine Learning 2019) integrated the Self-Attention mechanism into the GAN framework, achieving the best class-conditional image generation performance on ImageNet at the time.
Disclosure of Invention
The invention aims to: solve the technical problem, in view of the deficiencies of the prior art, of providing a three-dimensional point cloud restoration method for graphic processing; in particular, the invention discloses a three-dimensional point cloud restoration method that extracts long-distance dependencies based on a self-attention mechanism and is used to repair incomplete 3D models, comprising the following steps:
step 1, inputting a point cloud model data set and collecting data;
step 2, combining a Self-Attention mechanism-based method with a multi-layer perceptron MLP to obtain a long-distance dependency extraction network, using the long-distance dependency extraction network to map the input point cloud into a global feature vector, and then generating the missing part of the incomplete point cloud with a decoder of topological root-tree structure;
and step 3, combining the incomplete point cloud and the generated missing part point cloud together to obtain a final repaired complete point cloud model.
Step 1 comprises the following steps:
step 1-1, setting a single input three-dimensional point cloud model s and presetting 5 viewpoints, namely (1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0) and (-1, 1, 0); several different viewpoints are preset so that the missing part of the incomplete model is random when the training and test data are collected;
step 1-2, randomly selecting a viewpoint as the center point p and presetting a radius r (the radius is defined by the number of points to be removed rather than as a specific length in the mathematical sense: for example, if the removal ratio is set to 25% of the original cloud, the 25% of points closest to p are removed, with p as the center);
step 1-3, for a three-dimensional point cloud model s, taking a randomly selected viewpoint p as a center, and removing points within a preset radius r to obtain an incomplete point cloud model; the removed set of points is the missing part point cloud corresponding to the incomplete point cloud model.
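For illustration, the per-model data collection of step 1 can be sketched as follows. This is a minimal numpy sketch: the viewpoint coordinates follow the reconstruction above, the 25% removal ratio is only the example ratio mentioned in step 1-2, and the function and variable names are illustrative, not part of the patent.

```python
import numpy as np

# The 5 preset viewpoints of step 1-1 (coordinates as reconstructed above).
VIEWPOINTS = np.array([(1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0), (-1, 1, 0)],
                      dtype=np.float32)

def make_incomplete(points, ratio=0.25, rng=np.random.default_rng()):
    """Split an (N, 3) point cloud into (incomplete, missing) parts.

    A randomly chosen preset viewpoint serves as center p; the `ratio`
    fraction of points nearest to p is removed, so the "radius" of steps
    1-2/1-3 is defined by a point count rather than a metric length.
    """
    p = VIEWPOINTS[rng.integers(len(VIEWPOINTS))]
    d = np.linalg.norm(points - p, axis=1)        # distance of every point to p
    order = np.argsort(d)
    k = int(len(points) * ratio)                  # number of points to remove
    return points[order[k:]], points[order[:k]]   # (incomplete, ground-truth missing)
```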
Step 2 comprises the following steps:
step 2-1, dividing the input three-dimensional point cloud model data set S = {S_Train, S_Test} into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_(n+1), s_(n+2), ..., s_(n+j), ..., s_(n+m)}, where s_i represents the i-th three-dimensional point cloud model in the training set and s_(n+j) represents the j-th three-dimensional point cloud model in the test set; i ranges from 1 to n and j from 1 to m;
step 2-2, for the training set S_Train, collecting the incomplete point cloud models P_Train = {p_1, p_2, ..., p_i, ..., p_n} of each three-dimensional point cloud model under random viewpoints together with the corresponding missing-part point cloud models G_Train = {g_1, g_2, ..., g_i, ..., g_n}, and training the whole network on them as input to obtain a trained long-distance dependency extraction network and a decoder of topological root-tree structure, where p_i refers to the incomplete point cloud model corresponding to the i-th three-dimensional point cloud model s_i in the training set S_Train, and g_i refers to the missing-part point cloud model corresponding to s_i;
step 2-3, for the test set S_Test, collecting the incomplete point cloud models P_Test = {p_(n+1), p_(n+2), ..., p_(n+j), ..., p_(n+m)} of each three-dimensional point cloud model under random viewpoints and inputting them into the trained network to obtain the missing-part point cloud corresponding to each incomplete input point cloud, where p_(n+j) refers to the incomplete point cloud model corresponding to the j-th three-dimensional point cloud model s_(n+j) in the test set S_Test.
Step 2-2 includes the steps of:
step 2-2-1, taking the incomplete point clouds P_Train of the training set S_Train as input and using the corresponding missing-part point clouds G_Train as supervision for training; after forward propagation through the first-stage shared multi-layer perceptron (shared MLP), each point of the incomplete point cloud is mapped into a 256-dimensional feature vector. The first-stage shared MLP consists of a two-layer shared fully connected network: the first layer maps each point into a 128-dimensional feature vector, the second layer maps each point into a 256-dimensional feature vector, and the whole input point cloud is mapped into a matrix of dimension 2048×256;
step 2-2-2, denoting the 2048×256 matrix obtained in step 2-2-1 as x = (x_1, x_2, x_3, ..., x_i, ...), which serves as the input of the self-attention module, where x_i is the feature vector corresponding to one point of the input point cloud; x is mapped into two feature spaces Q and K through two 1×1 convolution networks to compute the attention scores of the input point cloud, obtained by the functions h(x) and v(x) respectively, where Q = (h(x_1), h(x_2), h(x_3), ..., h(x_i), ...) = (w_h x_1, w_h x_2, w_h x_3, ..., w_h x_i, ...) and K = (v(x_1), v(x_2), v(x_3), ..., v(x_i), ...) = (w_v x_1, w_v x_2, w_v x_3, ..., w_v x_i, ...); w_h and w_v are the weight matrices to be learned for h(x) and v(x) respectively, implemented by 1×1 convolutions, and both have dimension 32×256; Q is the query matrix of dimension 2048×32: the input point cloud has 2048 points and each point is represented by a 32-dimensional feature vector, i.e., the query of each point has length 32; K is the key matrix corresponding to the input point cloud, also of dimension 2048×32, i.e., the key of each point has dimension 32; Q and K are used to compute the attention score values of the input point cloud;
step 2-2-3, defining a function q(i, j) to calculate a scalar representing the dependency of each point in the input point cloud on the other points; the function is defined as q_(i,j) = h(x_i) v(x_j)^T, i.e., the dot product of the query of point i and the key of point j, where i indexes the points of the matrix Q obtained in step 2-2-2 and j indexes the points of the matrix K obtained in step 2-2-2; for each point of Q (represented by a 32-dimensional vector), its query is multiplied one by one with the keys of all points including itself, i.e., with the 32-dimensional vector corresponding to each point of the matrix K; since the input point cloud has 2048 points, 2048 scalars are obtained for each point, and combining the scalars of all points finally yields a matrix of dimension 2048×2048, namely the attention score map corresponding to the input point cloud;
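The following is a shape-level numpy sketch of steps 2-2-2 and 2-2-3 (the experiments section notes that the actual implementation uses TensorFlow); the random matrices are stand-ins for the learned weights w_h and w_v, stored transposed relative to the 32×256 convention used in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, D = 2048, 256, 32                    # points, feature dim, query/key dim

x = rng.standard_normal((N, C)).astype(np.float32)    # per-point features (step 2-2-1)
w_h = rng.standard_normal((C, D)).astype(np.float32)  # stand-in for the learned w_h
w_v = rng.standard_normal((C, D)).astype(np.float32)  # stand-in for the learned w_v

# A 1x1 convolution over per-point features is just a shared linear map,
# so the 32x256 weight matrices act here as their 256x32 transposes.
Q = x @ w_h                                # query matrix, 2048 x 32
K = x @ w_v                                # key matrix,   2048 x 32

# Step 2-2-3: pairwise dot products give the 2048 x 2048 attention score map,
# scores[i, j] = h(x_i) . v(x_j).
scores = Q @ K.T
```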
step 2-2-4, mapping each point in the input point cloud into a value matrix V to calculate the input signal at point j;
Step 2-2-5, performing Softmax operation on the attention score map obtained in the step 2-2-3;
step 2-2-6, setting the output of the self-attention module as y, and mapping the input x to the output y by using a function phi;
step 2-2-7, carrying out maximum pooling;
step 2-2-8, finally generating a missing point cloud corresponding to the incomplete point cloud model;
and 2-2-9, obtaining a trained long-distance dependency relationship extraction network and a topology root tree generator.
The step 2-2-4 comprises the following steps: defining a function f(x_j) = w_f x_j that maps each point in the input point cloud into the value matrix V to calculate the input signal at point j, where w_f is a weight matrix to be learned, implemented by a 1×1 convolution; the points of the value matrix V correspond one-to-one with the keys of the key matrix K, i.e., the input signal at point j corresponds to the key at point j; V = (f(x_1), f(x_2), f(x_3), ..., f(x_j), ...) = (w_f x_1, w_f x_2, w_f x_3, ..., w_f x_j, ...), of dimension 2048×128.
The step 2-2-5 comprises the following steps: defining the formula a_(i,j) = exp(q_(i,j)) / Σ_k exp(q_(i,k)), with k running over all 2048 points, and performing the Softmax operation on the attention score map obtained in step 2-2-3, where q_(i,j) represents the attention score value of point i in the query matrix Q with respect to point j in the key matrix K.
The steps 2-2-6 include: defining y = Φ(x) = (y_1, y_2, ..., y_i, ..., y_N) = (φ(x_1), φ(x_2), ..., φ(x_i), ..., φ(x_N)), where N is the number of points in the input point cloud, x_N is the feature vector corresponding to the N-th point of the input point cloud, and y_i is the i-th point of the output point cloud, calculated by the formula y_i = φ(x_i) = g(Σ_j a_(i,j) f(x_j)), with j running over all N points, where g(x_i) = w_g x_i and w_g is a weight matrix to be learned, implemented by a 1×1 convolution; finally an output matrix y with the same dimension as the input feature matrix x is obtained, i.e., the dimension of the output matrix y is 2048×256.
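The three steps just detailed (2-2-4 to 2-2-6) can be sketched as follows. Note that applying g after the attention-weighted sum of the values, so that the 128-dimensional values are mapped back to 256 dimensions as in the Self-Attention GAN of Document 8, is inferred from the stated dimensions and is an assumption, as are the random stand-in weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, F = 2048, 256, 128
x = rng.standard_normal((N, C)).astype(np.float32)       # per-point features (step 2-2-1)
scores = rng.standard_normal((N, N)).astype(np.float32)  # score map from step 2-2-3
w_f = rng.standard_normal((C, F)).astype(np.float32)     # stand-in for the learned w_f
w_g = rng.standard_normal((F, C)).astype(np.float32)     # stand-in for the learned w_g

V = x @ w_f                                # step 2-2-4: value matrix, 2048 x 128

# Step 2-2-5: row-wise softmax so each point's scores over all points sum to 1.
e = np.exp(scores - scores.max(axis=1, keepdims=True))   # numerically stabilized
A = e / e.sum(axis=1, keepdims=True)                     # a[i, j]

# Step 2-2-6: attention-weighted sum of the values, mapped back to 256 dims,
# so the output y has the same 2048 x 256 shape as the input x.
y = (A @ V) @ w_g
```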
The steps 2-2-7 comprise: performing maximum pooling on the 2048×256 matrix obtained in step 2-2-6, i.e., selecting in each dimension the maximum value over all points to form a 256-dimensional feature vector; this vector is stacked to obtain a 2048×256 feature matrix of the same shape, which is concatenated with the 2048×256 matrix obtained in step 2-2-6 to form a 2048×512 feature matrix fused with long-distance dependency information.
The steps 2-2-8 include: propagating the 2048×512 feature matrix fused with long-distance dependency information forward through the second-stage shared multi-layer perceptron, mapping each point of the input incomplete point cloud model into a 1024-dimensional vector so that the whole incomplete point cloud is mapped into a 2048×1024 matrix, and then performing maximum pooling on the 2048×1024 matrix, i.e., selecting in each dimension the maximum value over all points, to obtain a 1024-dimensional global feature vector; the global feature vector is input into the decoder of topological root-tree structure, finally generating the missing point cloud corresponding to the incomplete point cloud model.
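A corresponding sketch of the fusion and global-feature extraction of steps 2-2-7 and 2-2-8; modeling the second-stage shared MLP as a single ReLU layer is an assumption made for brevity, since the text does not give its internal layer widths.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2048
y = rng.standard_normal((N, 256)).astype(np.float32)   # self-attention output (step 2-2-6)

# Step 2-2-7: max pool over all points, stack the pooled vector back to 2048
# rows, and concatenate with y to fuse long-distance dependency information.
global_256 = y.max(axis=0)                              # 256-d pooled vector
fused = np.concatenate([y, np.tile(global_256, (N, 1))], axis=1)  # 2048 x 512

# Step 2-2-8: second-stage shared MLP (one ReLU layer assumed here), then
# max pooling again yields the 1024-d global feature vector for the decoder.
w2 = rng.standard_normal((512, 1024)).astype(np.float32)
global_feature = np.maximum(fused @ w2, 0.0).max(axis=0)  # 1024-d vector
```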
The steps 2-2-9 comprise: comparing the generated missing-part point cloud with the real missing point cloud corresponding to the incomplete point cloud, calculating the Loss function and back-propagating, finally obtaining the trained long-distance dependency extraction network and topological root-tree generator.
Step 3 comprises the following steps: synthesizing the missing-part point cloud obtained in step 2 and the incomplete point cloud together to obtain the final repaired complete point cloud model.
the method of the invention aims at solving the problem of repairing the three-dimensional model. The sensor may be used to acquire a large amount of three-dimensional data quickly, but it is often difficult to acquire complete three-dimensional data. The repair and estimation of the complete model based on the partial incomplete model is also widely applied to the fields of computer vision, robots, virtual reality and the like, such as mixed model analysis, target detection and tracking, 3D reconstruction, style migration, robot roaming and grabbing and the like, which also makes the work very significant.
The beneficial effects are that: the method introduces a self-attention mechanism into the three-dimensional point cloud restoration problem instead of relying solely on shared fully connected layers for feature extraction, which helps model the long-distance dependencies among the points of the input point cloud. As the visualization of the self-attention map in FIG. 4 and the comparison results in FIG. 5 show, with the self-attention mechanism the points of the missing-part point cloud generated by the network model of the invention can be finely coordinated with other distant points, and the feature extractor can build global features from information at distant rather than merely local positions, so the prediction results have less noise and deformation and the 3D model restoration effect is improved. The whole method system is efficient and practical. Meanwhile, as Tables 1 and 2 show, compared with other methods capable of repairing three-dimensional point cloud models, the CD (Chamfer Distance) value of the method of the invention is significantly reduced and the repair performance is significantly improved.
Drawings
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention taken in conjunction with the accompanying drawings.
FIG. 1a is an incomplete point cloud model before repair.
FIG. 1b is a point cloud model after repair.
FIG. 2 is a schematic architecture diagram of a self-attention module of the method of the present invention.
FIG. 3 is a system frame diagram of a feature extraction module of the method of the present invention.
Fig. 4 is a visual diagram of attention scores corresponding to an input point cloud model in the method of the present invention.
FIG. 5 is a graph showing the repair effect of the method of the present invention compared to other methods.
Fig. 6 is a flow chart of the present invention.
Detailed Description
As shown in fig. 6, the invention discloses a three-dimensional point cloud restoration method that extracts long-distance dependencies based on a self-attention mechanism. One viewpoint is randomly selected from several preset viewpoints as the center point, and all points within a preset radius are removed to obtain the incomplete model under that viewpoint. The incomplete models of the training set and their corresponding missing parts are input into the network of the method for training, and the incomplete models of the test set are input into the trained network to obtain the missing parts of the corresponding incomplete models; the incomplete model and the missing part are then synthesized to obtain the final repaired model.
For a given set S = {S_Train, S_Test} of 3D models of some class, divided into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_(n+1), s_(n+2), ..., s_(n+j), ..., s_(n+m)}, where s_i represents the i-th model in the training set and s_(n+j) the j-th model in the test set, the invention repairs the models of the test set S_Test through the following steps; the target task is shown in figs. 1a and 1b, and the flow is shown in figs. 2, 3 and 6:
the method specifically comprises the following steps:
step 1, inputting a point cloud model data set and collecting data;
step 2, a long-distance dependency relationship extraction network is obtained by combining a Self-Attention mechanism-based method and a multi-layer perceptron MLP, an input point cloud is mapped into a global feature vector by the long-distance dependency relationship extraction network, and then a decoder of a topological root tree structure is adopted to generate a missing part of an incomplete point cloud;
and step 3, combining the incomplete point cloud and the generated missing part point cloud together to obtain a final repaired complete point cloud model.
Step 1 comprises the following steps:
step 1-1, setting a single input three-dimensional point cloud model s and presetting 5 viewpoints, namely (1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0) and (-1, 1, 0); several different viewpoints are preset so that the missing part of the incomplete model is random when the training and test data are collected;
step 1-2, randomly selecting a viewpoint as the center point p and presetting a radius r (the radius is defined by the number of points to be removed rather than as a specific length in the mathematical sense: for example, if the removal ratio is set to 25% of the original cloud, the 25% of points closest to p are removed, with p as the center);
step 1-3, for a three-dimensional point cloud model s, taking a randomly selected viewpoint p as a center, and removing points within a preset radius r to obtain an incomplete point cloud model; the removed set of points is the missing part point cloud corresponding to the incomplete point cloud model.
Step 2 comprises the following steps:
step 2-1, dividing the input three-dimensional point cloud model data set S = {S_Train, S_Test} into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_(n+1), s_(n+2), ..., s_(n+j), ..., s_(n+m)}, where s_i represents the i-th three-dimensional point cloud model in the training set and s_(n+j) represents the j-th three-dimensional point cloud model in the test set; i ranges from 1 to n and j from 1 to m;
step 2-2, for the training set S_Train, collecting the incomplete point cloud models P_Train = {p_1, p_2, ..., p_i, ..., p_n} of each three-dimensional point cloud model under random viewpoints together with the corresponding missing-part point cloud models G_Train = {g_1, g_2, ..., g_i, ..., g_n}, and training the whole network on them as input to obtain a trained long-distance dependency extraction network and a decoder of topological root-tree structure, where p_i refers to the incomplete point cloud model corresponding to the i-th three-dimensional point cloud model s_i in the training set S_Train, and g_i refers to the missing-part point cloud model corresponding to s_i;
step 2-3, for the test set S_Test, collecting the incomplete point cloud models P_Test = {p_(n+1), p_(n+2), ..., p_(n+j), ..., p_(n+m)} of each three-dimensional point cloud model under random viewpoints and inputting them into the trained network to obtain the missing-part point cloud corresponding to each incomplete input point cloud, where p_(n+j) refers to the incomplete point cloud model corresponding to the j-th three-dimensional point cloud model s_(n+j) in the test set S_Test.
Step 2-2 includes the steps of:
step 2-2-1, taking the incomplete point clouds P_Train of the training set S_Train as input and using the corresponding missing-part point clouds G_Train as supervision for training; after forward propagation through the first-stage shared multi-layer perceptron (shared MLP), each point of the incomplete point cloud is mapped into a 256-dimensional feature vector. The first-stage shared MLP consists of a two-layer shared fully connected network: the first layer maps each point into a 128-dimensional feature vector, the second layer maps each point into a 256-dimensional feature vector, and the whole input point cloud is mapped into a matrix of dimension 2048×256;
step 2-2-2, denoting the 2048×256 matrix obtained in step 2-2-1 as x = (x_1, x_2, x_3, ..., x_i, ...), which serves as the input of the self-attention module, where x_i is the feature vector corresponding to one point of the input point cloud; x is mapped into two feature spaces Q and K through two 1×1 convolution networks to compute the attention scores of the input point cloud, obtained by the functions h(x) and v(x) respectively, where Q = (h(x_1), h(x_2), h(x_3), ..., h(x_i), ...) = (w_h x_1, w_h x_2, w_h x_3, ..., w_h x_i, ...) and K = (v(x_1), v(x_2), v(x_3), ..., v(x_i), ...) = (w_v x_1, w_v x_2, w_v x_3, ..., w_v x_i, ...); w_h and w_v are the weight matrices to be learned for h(x) and v(x) respectively, implemented by 1×1 convolutions, and both have dimension 32×256; Q is the query matrix of dimension 2048×32: the input point cloud has 2048 points and each point is represented by a 32-dimensional feature vector, i.e., the query of each point has length 32; K is the key matrix corresponding to the input point cloud, also of dimension 2048×32, i.e., the key of each point has dimension 32; Q and K are used to compute the attention score values of the input point cloud;
step 2-2-3, defining a function q(i, j) to calculate a scalar representing the dependency of each point in the input point cloud on the other points; the function is defined as q_(i,j) = h(x_i) v(x_j)^T, i.e., the dot product of the query of point i and the key of point j, where i indexes the points of the matrix Q obtained in step 2-2-2 and j indexes the points of the matrix K obtained in step 2-2-2; for each point of Q (represented by a 32-dimensional vector), its query is multiplied one by one with the keys of all points including itself, i.e., with the 32-dimensional vector corresponding to each point of the matrix K; since the input point cloud has 2048 points, 2048 scalars are obtained for each point, and combining the scalars of all points finally yields a matrix of dimension 2048×2048, namely the attention score map corresponding to the input point cloud;
step 2-2-4, defining a function f(x_j) = w_f x_j and mapping each point in the input point cloud into a value matrix V to calculate the input signal at point j (the value matrix will be multiplied with the attention score map obtained in step 2-2-3 to obtain the weighted vectors), where w_f is a weight matrix to be learned, implemented by a 1×1 convolution; the points of the value matrix V correspond one-to-one with the keys of the key matrix K, i.e., the input signal at point j corresponds to the key at point j; V = (f(x_1), f(x_2), f(x_3), ..., f(x_j), ...) = (w_f x_1, w_f x_2, w_f x_3, ..., w_f x_j, ...), of dimension 2048×128;
step 2-2-5, defining the formula a_(i,j) = exp(q_(i,j)) / Σ_k exp(q_(i,k)), with k running over all 2048 points, and performing the Softmax operation on the attention score map obtained in step 2-2-3 (i.e., the Softmax operation is applied to all attention score values of each point so that the attention scores of each point with respect to all other points sum to 1), where q_(i,j) represents the attention score value of point i in the query matrix Q with respect to point j in the key matrix K;
step 2-2-6, setting the output of the self-attention module as y and mapping the input x to the output y with a function Φ; defining y = Φ(x) = (y_1, y_2, ..., y_i, ..., y_N) = (φ(x_1), φ(x_2), ..., φ(x_i), ..., φ(x_N)), where N is the number of points in the input point cloud, x_N is the feature vector corresponding to the N-th point of the input point cloud, and y_i is the i-th point of the output point cloud, calculated by the formula y_i = φ(x_i) = g(Σ_j a_(i,j) f(x_j)), with j running over all N points, where g(x_i) = w_g x_i and w_g is a weight matrix to be learned, implemented by a 1×1 convolution; finally an output matrix y with the same dimension as the input feature matrix x is obtained, i.e., the dimension of the output matrix y is 2048×256.
step 2-2-7, performing maximum pooling on the 2048×256 matrix obtained in step 2-2-6, i.e., selecting in each dimension the maximum value over all points to form a 256-dimensional feature vector; this vector is stacked to obtain a 2048×256 feature matrix of the same shape, which is concatenated with the 2048×256 matrix obtained in step 2-2-6 to form a 2048×512 feature matrix fused with long-distance dependency information;
step 2-2-8, propagating the 2048×512 feature matrix fused with long-distance dependency information forward through the second-stage shared multi-layer perceptron (shared MLP), mapping each point of the input incomplete point cloud model into a 1024-dimensional vector so that the whole incomplete point cloud is mapped into a 2048×1024 matrix, and then performing maximum pooling on the 2048×1024 matrix, i.e., selecting in each dimension the maximum value over all points, to obtain a 1024-dimensional global feature vector; the global feature vector is input into the decoder of topological root-tree structure, finally generating the missing point cloud corresponding to the incomplete point cloud model;
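The decoder of topological root-tree structure follows TopNet (Document 4), and its internals are not restated in this text; the following is therefore only a rough, hedged sketch of such a decoder under assumed settings (two levels with branching factors 4 and 128, so that 512 points, i.e., 25% of 2048, are generated; the feature width, activations and per-child maps are illustrative stand-ins, not the trained network).

```python
import numpy as np

def tree_decode(global_feature, rng=np.random.default_rng(0), branch=(4, 128), feat_dim=8):
    """Shape-level sketch of a TopNet-style root-tree decoder (Document 4).

    Every level expands each node b-fold; each child has its own small map
    (random stand-ins for the learned per-child MLPs) applied to the parent
    feature concatenated with the global feature, and the leaves are mapped
    to xyz coordinates. Depth, widths and activations are assumptions.
    """
    g = np.asarray(global_feature, dtype=np.float32)
    nodes = [np.tanh(g @ rng.standard_normal((g.size, feat_dim)))]  # root feature
    for b in branch:
        ws = [rng.standard_normal((feat_dim + g.size, feat_dim)) for _ in range(b)]
        nodes = [np.tanh(np.concatenate([n, g]) @ w) for n in nodes for w in ws]
    w_xyz = rng.standard_normal((feat_dim, 3))
    return np.stack([n @ w_xyz for n in nodes])   # (4*128 = 512, 3) points

# tree_decode(np.ones(1024)).shape == (512, 3): 25% of a 2048-point cloud.
```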
and step 2-2-9, comparing the generated missing-part point cloud with the real missing point cloud corresponding to the incomplete point cloud, calculating the Loss function and back-propagating, finally obtaining the trained long-distance dependency extraction network and topological root-tree generator.
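The text does not spell out the Loss function of step 2-2-9; since the evaluation metric below is the Chamfer Distance (CD), a minimal numpy sketch of a symmetric squared-distance CD between the generated and real missing parts is given here as one plausible choice, and treating it as the training loss is an assumption.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # N x M squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()      # a->b plus b->a term
```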
Step 3 comprises the following steps:
and (3) synthesizing the point cloud of the missing part obtained in the step (2) and the incomplete point cloud together to obtain the final repaired complete point cloud model.
Examples
The objective task of the invention is shown in figs. 1a and 1b: fig. 1a is the original model to be repaired and fig. 1b is the repaired model. The self-attention module architecture of the method is shown in fig. 2, and the architecture of the whole global feature extractor is shown in fig. 3. The steps of the invention are described below through an example.
Step (1), collecting data of an input point cloud model data set;
step (1.1), setting a single input three-dimensional point cloud model s and presetting 5 viewpoints, namely (1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0) and (-1, 1, 0); several different viewpoints are preset so that the missing part of the incomplete model is random when the training and test data are collected;
step (1.2), randomly selecting a viewpoint as the center point p and presetting a radius r (the radius is defined by the number of points to be removed rather than as a specific length in the mathematical sense: for example, if the removal ratio is set to 25% of the original cloud, the 25% of points closest to p are removed, with p as the center);
step (1.3), for a three-dimensional point cloud model s, taking a randomly selected viewpoint p as a center, removing points within a preset radius r range, and obtaining an incomplete point cloud model; the removed set of points is the missing part point cloud corresponding to the incomplete point cloud model.
Step (2), a long-distance dependency relationship extraction network is obtained by combining a Self-Attention mechanism-based method and a multi-layer perceptron MLP, an input point cloud is mapped into a global feature vector by the long-distance dependency relationship extraction network, and then a decoder of a topological root tree structure is adopted to generate a missing part of an incomplete point cloud;
step (2.1), dividing the input three-dimensional point cloud model data set S = {S_Train, S_Test} into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_(n+1), s_(n+2), ..., s_(n+j), ..., s_(n+m)}, where s_i represents the i-th three-dimensional point cloud model in the training set and s_(n+j) represents the j-th three-dimensional point cloud model in the test set; i ranges from 1 to n and j from 1 to m;
step (2.2), for the training set S_Train, collecting the incomplete point cloud models P_Train = {p_1, p_2, ..., p_i, ..., p_n} of each three-dimensional point cloud model under random viewpoints together with the corresponding missing-part point cloud models G_Train = {g_1, g_2, ..., g_i, ..., g_n}, and training the whole network on them as input to obtain a trained long-distance dependency extraction network and a decoder of topological root-tree structure, where p_i refers to the incomplete point cloud model corresponding to the i-th three-dimensional point cloud model s_i in the training set S_Train, and g_i refers to the missing-part point cloud model corresponding to s_i;
step (2.2.1), taking the incomplete point clouds P_Train of the training set S_Train as input and using the corresponding missing-part point clouds G_Train as supervision for training; after forward propagation through the first-stage shared multi-layer perceptron (shared MLP), each point of the incomplete point cloud is mapped into a 256-dimensional feature vector. The first-stage shared MLP consists of a two-layer shared fully connected network: the first layer maps each point into a 128-dimensional feature vector, the second layer maps each point into a 256-dimensional feature vector, and the whole input point cloud is mapped into a matrix of dimension 2048×256;
step (2.2.2), denoting the 2048×256 matrix obtained in step (2.2.1) as x = (x_1, x_2, x_3, ..., x_i, ...), which serves as the input of the self-attention module, where x_i is the feature vector corresponding to one point of the input point cloud; x is mapped into two feature spaces Q and K through two 1×1 convolution networks to compute the attention scores of the input point cloud, obtained by the functions h(x) and v(x) respectively, where Q = (h(x_1), h(x_2), h(x_3), ..., h(x_i), ...) = (w_h x_1, w_h x_2, w_h x_3, ..., w_h x_i, ...) and K = (v(x_1), v(x_2), v(x_3), ..., v(x_i), ...) = (w_v x_1, w_v x_2, w_v x_3, ..., w_v x_i, ...); w_h and w_v are the weight matrices to be learned for h(x) and v(x) respectively, implemented by 1×1 convolutions, and both have dimension 32×256; Q is the query matrix of dimension 2048×32: the input point cloud has 2048 points and each point is represented by a 32-dimensional feature vector, i.e., the query of each point has length 32; K is the key matrix corresponding to the input point cloud, also of dimension 2048×32, i.e., the key of each point has dimension 32; Q and K are used to compute the attention score values of the input point cloud;
step (2.2.3), defining a function q(i, j) to calculate a scalar representing the dependency of each point in the input point cloud on the other points; the function is defined as q_(i,j) = h(x_i) v(x_j)^T, i.e., the dot product of the query of point i and the key of point j, where i indexes the points of the matrix Q obtained in step (2.2.2) and j indexes the points of the matrix K obtained in step (2.2.2); for each point of Q (represented by a 32-dimensional vector), its query is multiplied one by one with the keys of all points including itself, i.e., with the 32-dimensional vector corresponding to each point of the matrix K; since the input point cloud has 2048 points, 2048 scalars are obtained for each point, and combining the scalars of all points finally yields a matrix of dimension 2048×2048, namely the attention score map corresponding to the input point cloud;
step (2.2.4), defining a function f(x_j) = w_f x_j and mapping each point in the input point cloud into a value matrix V to calculate the input signal at point j (the value matrix will be multiplied with the attention score map obtained in step (2.2.3) to obtain the weighted vectors), where w_f is a weight matrix to be learned, implemented by a 1×1 convolution; the points of the value matrix V correspond one-to-one with the keys of the key matrix K, i.e., the input signal at point j corresponds to the key at point j; V = (f(x_1), f(x_2), f(x_3), ..., f(x_j), ...) = (w_f x_1, w_f x_2, w_f x_3, ..., w_f x_j, ...), of dimension 2048×128;
step (2.2.5), defining the formula a_(i,j) = exp(q_(i,j)) / Σ_k exp(q_(i,k)), with k running over all 2048 points, and performing the Softmax operation on the attention score map obtained in step (2.2.3) (i.e., the Softmax operation is applied to all attention score values of each point so that the attention scores of each point with respect to all other points sum to 1), where q_(i,j) represents the attention score value of point i in the query matrix Q with respect to point j in the key matrix K;
step (2.2.6), setting the output of the self-attention module as y and mapping the input x to the output y with a function Φ; defining y = Φ(x) = (y_1, y_2, ..., y_i, ..., y_N) = (φ(x_1), φ(x_2), ..., φ(x_i), ..., φ(x_N)), where N is the number of points in the input point cloud, x_N is the feature vector corresponding to the N-th point of the input point cloud, and y_i is the i-th point of the output point cloud, calculated by the formula y_i = φ(x_i) = g(Σ_j a_(i,j) f(x_j)), with j running over all N points, where g(x_i) = w_g x_i and w_g is a weight matrix to be learned, implemented by a 1×1 convolution; finally an output matrix y with the same dimension as the input feature matrix x is obtained, i.e., the dimension of the output matrix y is 2048×256.
step (2.2.7), performing maximum pooling on the 2048×256 matrix obtained in step (2.2.6), i.e., selecting in each dimension the maximum value over all points to form a 256-dimensional feature vector; this vector is stacked to obtain a 2048×256 feature matrix of the same shape, which is concatenated with the 2048×256 matrix obtained in step (2.2.6) to form a 2048×512 feature matrix fused with long-distance dependency information;
step (2.2.8), propagating the 2048×512 feature matrix fused with long-distance dependency information forward through the second-stage shared multi-layer perceptron (shared MLP), mapping each point of the input incomplete point cloud model into a 1024-dimensional vector so that the whole incomplete point cloud is mapped into a 2048×1024 matrix, and then performing maximum pooling on the 2048×1024 matrix, i.e., selecting in each dimension the maximum value over all points, to obtain a 1024-dimensional global feature vector; the global feature vector is input into the decoder of topological root-tree structure, finally generating the missing point cloud corresponding to the incomplete point cloud model;
and step (2.2.9), comparing the generated missing-part point cloud with the real missing point cloud corresponding to the incomplete point cloud, calculating the Loss function and back-propagating, finally obtaining the trained long-distance dependency extraction network and topological root-tree generator.
Step (2.3), for the test set S_Test, collecting the incomplete point cloud models P_Test = {p_(n+1), p_(n+2), ..., p_(n+j), ..., p_(n+m)} of each three-dimensional point cloud model under random viewpoints and inputting them into the trained network to obtain the missing-part point cloud corresponding to each incomplete input point cloud, where p_(n+j) refers to the incomplete point cloud model corresponding to the j-th three-dimensional point cloud model s_(n+j) in the test set S_Test. The testing process mainly comprises the following steps:
step (2.3.1), inputting the incomplete point cloud models P_Test of the test model set under random viewpoints into the generator network trained cooperatively with the long-distance dependency extraction network;
and (2.3.2) outputting the missing part point cloud corresponding to the incomplete model of the test model set under the random viewpoint.
And (3) synthesizing the incomplete point cloud and the generated missing part point cloud together to obtain the final repaired complete point cloud model.
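Step 3 involves no learned component; for point arrays the synthesis is a plain concatenation of the two point sets, sketched here with illustrative array sizes.

```python
import numpy as np

incomplete = np.zeros((1536, 3), dtype=np.float32)         # stand-in input model
predicted_missing = np.zeros((512, 3), dtype=np.float32)   # stand-in network output
repaired = np.concatenate([incomplete, predicted_missing], axis=0)  # (2048, 3) full model
```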
Analysis of results
The experimental environment parameters of the method of the invention are as follows:
(1) The experimental platform parameters for data acquisition of the models are: Ubuntu 16.04.4 LTS operating system, Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz, and 32GB of memory; the Python programming language is adopted, and the programming development environment is PyCharm 2019;
(2) The experimental platform parameters for training and testing the long-distance dependency extraction network based on the Self-Attention mechanism are: Ubuntu 16.04.4 LTS operating system, Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz, 32GB of memory, and a TITAN RTX GPU with 24GB of video memory; the Python programming language is adopted, and the implementation uses the TensorFlow third-party open source library.
The comparative experimental results (shown in Tables 1 and 2) of the method of the invention against TopNet, Folding, PCN, AtlasNet and PointNetFCAE are analyzed as follows:
experiments were performed on a subset of the recognized benchmark dataset shape net, the subset 8 different models, the class names of the datasets of each class are shown in the first column of Table 1, wherein each class name means airland, lamp, cabinet, car, chair, couch, table, watercraft; the partitioning of the training set and the test set is shown in the second column of table 1.
The final measure is the average Chamfer Distance (CD) of the repaired complete model, as shown by the CD comparisons of Tables 1 and 2 (Table 1 gives the per-class CD comparison of the method of the invention with the other methods on the 8 ShapeNet classes, and Table 2 gives the average CD over all classes). All CD values shown in the tables have been multiplied by 10^5 after calculation. As can be seen from Tables 1 and 2, every per-class CD value and the class-average CD value of the method of the invention are lower than those of the other methods. FIG. 5 compares the repair results of the method of the invention with the other methods; in the missing parts repaired by the method of the invention, noise and deformation are obviously reduced and the repair effect is obviously improved.
TABLE 1
TABLE 2

Method          AtlasNet   Folding   PCN    TopNet   PointNetFCAE   The method of the invention
Category Avg.   94.4       74.6      67.1   63.9     97.6           55.8
In the self-comparison experiment, the Self-Attention module used to extract long-distance dependency information was removed; the CD comparison of the final experimental results is shown in Table 3, which shows that the optimization of extracting long-distance dependency information significantly reduces the final CD value of the repaired complete model.
In addition, the long-distance dependency relationships learned by the method are visualized in fig. 4. For each point of the incomplete 3D point cloud there is a corresponding long-distance dependency relationship. In each row of fig. 4, the first picture marks three points at representative locations, and the other three pictures show the attention score maps corresponding to these points. Because the method uses a more dedicated mechanism to extract long-distance dependency information, instead of only fully connected shared multi-layer perceptron layers, it can learn sufficient long-distance dependency information and thereby significantly reduce the final CD value of the repaired complete model. Table 3 compares the final results of the method of the invention with the results after removing the self-attention module used for long-distance dependency information extraction.
TABLE 3
The present invention provides a three-dimensional point cloud repairing method for graphic processing, and there are many specific methods and ways to implement the technical scheme; the above is only a preferred embodiment of the invention, and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented with the prior art.

Claims (1)

1. A three-dimensional point cloud restoration method for graphic processing is characterized by comprising the following steps:
step 1, inputting a point cloud model data set and collecting data;
step 2, combining a Self-Attention mechanism-based method with a multi-layer perceptron MLP to obtain a long-distance dependency extraction network, using the long-distance dependency extraction network to map the input point cloud into a global feature vector, and then generating the missing part of the incomplete point cloud with a decoder of topological root-tree structure;
step 3, combining the incomplete point cloud and the generated missing part point cloud together to obtain a final repaired complete point cloud model;
step 1 comprises the following steps:
step 1-1, setting a single input three-dimensional point cloud model s and presetting 5 viewpoints, namely (1, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 0) and (-1, 1, 0);
step 1-2, randomly selecting a viewpoint as a center point p, and presetting a radius r;
step 1-3, for a three-dimensional point cloud model s, taking a randomly selected viewpoint p as a center, and removing points within a preset radius r to obtain an incomplete point cloud model; the removed point set is the missing part point cloud corresponding to the incomplete point cloud model;
step 2 comprises the following steps:
step 2-1, dividing the input three-dimensional point cloud model data set S = {S_Train, S_Test} into a training set S_Train = {s_1, s_2, ..., s_i, ..., s_n} and a test set S_Test = {s_(n+1), s_(n+2), ..., s_(n+j), ..., s_(n+m)}, where s_i represents the i-th three-dimensional point cloud model in the training set and s_(n+j) represents the j-th three-dimensional point cloud model in the test set; i ranges from 1 to n and j from 1 to m;
step 2-2, for the training set S_Train, collecting the incomplete point cloud models P_Train = {p_1, p_2, ..., p_i, ..., p_n} of each three-dimensional point cloud model under random viewpoints together with the corresponding missing-part point cloud models G_Train = {g_1, g_2, ..., g_i, ..., g_n}, and training the whole network on them as input to obtain a trained long-distance dependency extraction network and a decoder of topological root-tree structure, where p_i refers to the incomplete point cloud model corresponding to the i-th three-dimensional point cloud model s_i in the training set S_Train, and g_i refers to the missing-part point cloud model corresponding to s_i;
step 2-3, for the test set S_Test, collecting the incomplete point cloud models P_Test = {p_(n+1), p_(n+2), ..., p_(n+j), ..., p_(n+m)} of each three-dimensional point cloud model under random viewpoints and inputting them into the trained network to obtain the missing-part point cloud corresponding to each incomplete input point cloud, where p_(n+j) refers to the incomplete point cloud model corresponding to the j-th three-dimensional point cloud model s_(n+j) in the test set S_Test;
step 2-2 includes the steps of:
step 2-2-1, taking the incomplete point clouds P_Train of the training set S_Train as input and using the corresponding missing-part point clouds G_Train as supervision; after forward propagation through the first-stage shared multi-layer perceptron, each point of the incomplete point cloud is mapped into a 256-dimensional feature vector, wherein the first-stage shared multi-layer perceptron consists of a two-layer shared fully connected network: the first layer maps each point into a 128-dimensional feature vector, the second layer maps each point into a 256-dimensional feature vector, and the whole input point cloud is mapped into a matrix of dimension 2048×256;
step 2-2-2, denoting the 2048×256 matrix obtained in step 2-2-1 as x = (x_1, x_2, x_3, ..., x_i, ...), which serves as the input of the self-attention module, where x_i is the feature vector corresponding to one point of the input point cloud; x is mapped into two feature spaces Q and K through two 1×1 convolution networks to compute the attention scores of the input point cloud, obtained by the functions h(x) and v(x) respectively, where Q = (h(x_1), h(x_2), h(x_3), ..., h(x_i), ...) = (w_h x_1, w_h x_2, w_h x_3, ..., w_h x_i, ...) and K = (v(x_1), v(x_2), v(x_3), ..., v(x_i), ...) = (w_v x_1, w_v x_2, w_v x_3, ..., w_v x_i, ...); w_h and w_v are the weight matrices to be learned for h(x) and v(x) respectively, implemented by 1×1 convolutions, and both have dimension 32×256; Q is the query matrix of dimension 2048×32: the input point cloud has 2048 points and each point is represented by a 32-dimensional feature vector, i.e., the query of each point has length 32; K is the key matrix corresponding to the input point cloud, also of dimension 2048×32, i.e., the key of each point has dimension 32; Q and K are used to compute the attention score values of the input point cloud;
step 2-2-3, defining a function q(i, j) to calculate a scalar representing the dependency of each point in the input point cloud on the other points; the function is defined as q_(i,j) = h(x_i) v(x_j)^T, i.e., the dot product of the query of point i and the key of point j, where i indexes the points of the matrix Q obtained in step 2-2-2 and j indexes the points of the matrix K obtained in step 2-2-2; for each point of Q, its query is multiplied one by one with the keys of all points including itself, i.e., with the 32-dimensional vector corresponding to each point of the matrix K; since the input point cloud has 2048 points, 2048 scalars are obtained for each point, and combining the scalars of all points finally yields a matrix of dimension 2048×2048, namely the attention score map corresponding to the input point cloud;
step 2-2-4, mapping each point in the input point cloud into a value matrix V to calculate the input signal at point j;
Step 2-2-5, performing Softmax operation on the attention score map obtained in the step 2-2-3;
step 2-2-6, setting the output of the self-attention module as y, and mapping the input x to the output y by using a function phi;
step 2-2-7, carrying out maximum pooling;
step 2-2-8, finally generating a missing point cloud corresponding to the incomplete point cloud model;
step 2-2-9, obtaining a trained long-distance dependency relationship extraction network and a topology root tree generator;
the step 2-2-4 comprises the following steps: defining a function f(x_j) = w_f x_j that maps each point in the input point cloud into the value matrix V to calculate the input signal at point j, where w_f is a weight matrix to be learned, implemented by a 1×1 convolution; the points of the value matrix V correspond one-to-one with the keys of the key matrix K, i.e., the input signal at point j corresponds to the key at point j; V = (f(x_1), f(x_2), f(x_3), ..., f(x_j), ...) = (w_f x_1, w_f x_2, w_f x_3, ..., w_f x_j, ...), of dimension 2048×128;
the step 2-2-5 comprises the following steps: defining the formula q_{i,j} = exp(s_{i,j}) / Σ_{k=1}^{N} exp(s_{i,k}) and performing the Softmax operation on the attention score map obtained in step 2-2-3, where q_{i,j} is the score value representing the attention of point i in the query matrix Q with respect to point j in the key matrix K;
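In PyTorch this row-wise normalization is a single call; the sketch assumes the raw score map S from step 2-2-3:

```python
import torch

S = torch.randn(1, 2048, 2048)          # raw attention score map from step 2-2-3
A = torch.softmax(S, dim=-1)            # q_{i,j}; each row i now sums to 1 over j
```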
the step 2-2-6 comprises the following steps: defining y = Φ(x) = (y_1, y_2, …, y_i, …, y_N) = (φ(x_1), φ(x_2), …, φ(x_i), …, φ(x_N)), where N is the number of points in the input point cloud, x_i is the feature vector corresponding to the i-th point of the input point cloud, and y_i is the i-th point of the output point cloud, calculated by the formula y_i = φ(x_i) = g(Σ_{j=1}^{N} q_{i,j} f(x_j)), in which g(x) = w_g x and w_g is a weight matrix to be learned, implemented by a 1×1 convolution; finally an output matrix y with the same dimensions as the input feature matrix x is obtained, i.e. the dimension of the output matrix y is 2048×256;
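Putting the pieces together, a sketch of the full self-attention output (tensor shapes follow the text; all variable names are ours):

```python
import torch
import torch.nn as nn

A = torch.softmax(torch.randn(1, 2048, 2048), dim=-1)  # attention weights q_{i,j}
V = torch.randn(1, 2048, 128)                          # value matrix from step 2-2-4
g = nn.Conv1d(128, 256, kernel_size=1)                 # w_g, maps 128-d back to 256-d

o = torch.bmm(A, V)                           # (1, 2048, 128); row i = sum_j q_{i,j} f(x_j)
y = g(o.transpose(1, 2)).transpose(1, 2)      # (1, 2048, 256) output matrix y
```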
the step 2-2-7 comprises the following steps: performing maximum pooling on the 2048×256 matrix obtained in step 2-2-6, i.e. selecting, for each of the 256 dimensions, the maximum value over all points, and combining these maxima into a 256-dimensional feature vector; this vector is stacked 2048 times to obtain a feature matrix of the same 2048×256 shape, which is concatenated with the 2048×256 matrix obtained in step 2-2-6 to form a 2048×512 feature matrix fused with long-distance dependency information;
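A sketch of this pool-stack-concatenate step (a random tensor stands in for the attention output y):

```python
import torch

y = torch.randn(1, 2048, 256)             # self-attention output from step 2-2-6
g_feat, _ = y.max(dim=1, keepdim=True)    # (1, 1, 256): per-dimension max over all points
stacked = g_feat.expand(-1, 2048, -1)     # (1, 2048, 256): the same vector at every point
fused = torch.cat([y, stacked], dim=-1)   # (1, 2048, 512) fused feature matrix
```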
the step 2-2-8 comprises the following steps: the 2048×512 feature matrix fused with long-distance dependency information is forward-propagated through the second-stage shared multi-layer perceptron, which maps each point of the input incomplete point cloud model to a 1024-dimensional vector, so that the whole incomplete point cloud is mapped to a 2048×1024 matrix; maximum pooling is then performed on this 2048×1024 matrix, i.e. the maximum value in each dimension over all points is selected, yielding a 1024-dimensional global feature vector; the global feature vector is input to the decoder of topological root-tree structure, which finally generates the missing point cloud corresponding to the incomplete point cloud model;
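The encoder side of this step can be sketched as follows (the topology root-tree decoder itself is not sketched, since the patent gives no layer-level detail for it here):

```python
import torch
import torch.nn as nn

second_mlp = nn.Conv1d(512, 1024, kernel_size=1)  # second-stage shared MLP, point-wise

fused = torch.randn(1, 2048, 512)                 # fused feature matrix from step 2-2-7
h2 = second_mlp(fused.transpose(1, 2))            # (1, 1024, 2048)
global_feat = h2.max(dim=-1).values               # (1, 1024) global feature vector
# global_feat is then fed to the topology root-tree decoder to generate the missing points
```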
the step 2-2-9 comprises the following steps: comparing the generated missing point cloud with the real missing point cloud corresponding to the incomplete point cloud, calculating the loss function, and performing back propagation, finally obtaining the trained long-distance dependency relationship extraction network and topology root tree generator.
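The patent does not name the loss function in this passage; for point cloud completion the symmetric Chamfer distance is a common choice. A minimal sketch under that assumption:

```python
import torch

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (B, N, 3) and q (B, M, 3)."""
    d = torch.cdist(p, q)                              # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean() + d.min(dim=1).values.mean()

pred = torch.randn(1, 512, 3, requires_grad=True)  # generated missing point cloud
gt = torch.randn(1, 512, 3)                        # real missing point cloud G_Train
loss = chamfer_distance(pred, gt)
loss.backward()                                    # back propagation through the network
```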
CN202110116229.1A 2021-01-28 2021-01-28 Three-dimensional point cloud restoration method for graphic processing Active CN112785526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110116229.1A CN112785526B (en) 2021-01-28 2021-01-28 Three-dimensional point cloud restoration method for graphic processing

Publications (2)

Publication Number Publication Date
CN112785526A CN112785526A (en) 2021-05-11
CN112785526B true CN112785526B (en) 2023-12-05

Family

ID=75759307

Country Status (1)

Country Link
CN (1) CN112785526B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298952B (en) * 2021-06-11 2022-07-15 哈尔滨工程大学 Incomplete point cloud classification method based on data expansion and similarity measurement
CN113379646B (en) * 2021-07-07 2022-06-21 厦门大学 Algorithm for performing dense point cloud completion by using generated countermeasure network
CN113486988B (en) * 2021-08-04 2022-02-15 广东工业大学 Point cloud completion device and method based on adaptive self-attention transformation network
CN114663619B (en) * 2022-02-24 2024-06-28 清华大学 Three-dimensional point cloud object prediction method and device based on self-attention mechanism
CN116051633B (en) * 2022-12-15 2024-02-13 清华大学 3D point cloud target detection method and device based on weighted relation perception
CN117671131A (en) * 2023-10-20 2024-03-08 南京邮电大学 Industrial part three-dimensional point cloud repairing method and device based on deep learning


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10824862B2 (en) * 2017-11-14 2020-11-03 Nuro, Inc. Three-dimensional object detection for autonomous robotic systems using image proposals

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389671A (en) * 2018-09-25 2019-02-26 南京大学 A kind of single image three-dimensional rebuilding method based on multistage neural network
EP3767521A1 (en) * 2019-07-15 2021-01-20 Promaton Holding B.V. Object detection and instance segmentation of 3d point clouds based on deep learning
CN112241997A (en) * 2020-09-14 2021-01-19 西北大学 Three-dimensional model establishing and repairing method and system based on multi-scale point cloud up-sampling
CN112070054A (en) * 2020-09-17 2020-12-11 福州大学 Vehicle-mounted laser point cloud marking classification method based on graph structure and attention mechanism
CN112257637A (en) * 2020-10-30 2021-01-22 福州大学 Vehicle-mounted laser point cloud multi-target identification method integrating point cloud and multiple views

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
PartNet: A Recursive Part Decomposition Network for Fine-Grained and Hierarchical Shape Segmentation; Fenggen Yu et al.; IEEE; full text *
A Point Cloud Restoration Model Based on Deep Learning; Bei Zile et al.; Wireless Communication Technology (No. 02); full text *
3D Object Recognition and Model Segmentation Method Based on Point Cloud Data; Niu Chengeng et al.; Journal of Graphics (No. 02); full text *
Research Progress on Neural-Network-Based 3D Point Cloud Generation Models; Qing Du et al.; Robot Technique and Application (No. 06); full text *


Similar Documents

Publication Publication Date Title
CN112785526B (en) Three-dimensional point cloud restoration method for graphic processing
Qiu et al. Geometric back-projection network for point cloud classification
Lee et al. Context-aware synthesis and placement of object instances
Xie et al. Point clouds learning with attention-based graph convolution networks
Chen et al. The face image super-resolution algorithm based on combined representation learning
Lu et al. 3DCTN: 3D convolution-transformer network for point cloud classification
CN110032925B (en) Gesture image segmentation and recognition method based on improved capsule network and algorithm
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN111898173A (en) Empirical learning in virtual worlds
JP2023073231A (en) Method and device for image processing
CN111553869B (en) Method for complementing generated confrontation network image under space-based view angle
CN111898172A (en) Empirical learning in virtual worlds
CN115690522B (en) Target detection method based on multi-pooling fusion channel attention and application thereof
CN109740539B (en) 3D object identification method based on ultralimit learning machine and fusion convolution network
CN109886297A (en) A method of for identifying threedimensional model object from two dimensional image
CN114998638A (en) Multi-view three-dimensional point cloud classification method based on dynamic and static convolution fusion neural network
CN114708380A (en) Three-dimensional reconstruction method based on fusion of multi-view features and deep learning
Kakillioglu et al. 3D capsule networks for object classification with weight pruning
KR20230071052A (en) Apparatus and method for image processing
CN115830375A (en) Point cloud classification method and device
CN112967296B (en) Point cloud dynamic region graph convolution method, classification method and segmentation method
Yin et al. [Retracted] Virtual Reconstruction Method of Regional 3D Image Based on Visual Transmission Effect
CN112837420B (en) Shape complement method and system for terracotta soldiers and horses point cloud based on multi-scale and folding structure
CN114219989A (en) Foggy scene ship instance segmentation method based on interference suppression and dynamic contour
Alhamazani et al. 3DCascade-GAN: Shape completion from single-view depth images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant