CN111428855B - End-to-end point cloud deep learning network model and training method - Google Patents


Info

Publication number: CN111428855B (application number CN202010116881.9A)
Authority: CN (China)
Prior art keywords: point, points, monitoring, sampling, identification
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN111428855A
Inventors: 杨健, 范敬凡, 艾丹妮, 郭龙腾, 王涌天
Original and current assignee: Beijing Institute of Technology (BIT)
Application filed by Beijing Institute of Technology; priority to CN202010116881.9A; published as CN111428855A; granted as CN111428855B.


Classifications

    • G06N3/045: Combinations of networks (Physics; Computing; computing arrangements based on biological models; neural networks; architecture)
    • G06N3/08: Learning methods (neural networks)
    • G06V10/757: Matching configurations of points or features (image or video recognition using pattern recognition or machine learning; pattern matching; organisation of the matching processes)


Abstract

The end-to-end point cloud deep learning network model and training method can locate identification points on faces of different scales simultaneously, and the network has good positioning accuracy and high positioning speed. The network model is a deep learning network structure similar to a convolutional neural network (CNN), and comprises the following components: (1) the network downsamples the input point cloud stage by stage to obtain a series of sampling point sets, and a point distribution feature extractor is used stage by stage to extract the point distribution features of the neighborhood point cloud of each sampling point in each set; these features become progressively more abstract and their spatial receptive fields progressively larger; (2) some of the sampling point sets are selected, and all sampling points in the selected sets are used as monitoring points to locate the identification points; (3) for each monitoring point, the probability that it lies in the neighborhood of each identification point and its offset from each identification point are predicted.

Description

End-to-end point cloud deep learning network model and training method
Technical Field
The invention relates to the technical field of point cloud image processing and deep learning, in particular to an end-to-end point cloud deep learning network model and an end-to-end point cloud deep learning training method.
Background
A three-dimensional image is a special form of information expression, characterized by three-dimensional data in its expression space. Its forms include: depth maps (expressing object-to-camera distance in grayscale), geometric models (built with CAD software), and point cloud models (all reverse-engineering devices sample objects as point clouds). Compared with two-dimensional images, three-dimensional images can decouple an object from its background by means of the information in the third dimension. Point cloud data is the most common and fundamental three-dimensional model. A point cloud model is usually obtained directly by measurement: each point corresponds to one measurement point and no other processing is applied, so the model contains the maximum amount of information. That information is hidden in the point cloud and must be extracted by other means; the process of extracting information from a point cloud is three-dimensional image processing.
A point cloud is a massive set of points that expresses the spatial distribution and surface characteristics of a target under the same spatial reference system; once the spatial coordinates of each sampling point on the object surface are obtained, the resulting set of points is called a point cloud.
The rapid and accurate positioning of the identification points in the point cloud is very important in the fields of identity recognition, 3D model segmentation, 3D model retrieval and the like, wherein the automatic positioning of the identification points in the 3D face point cloud is very important in the aspects of face recognition, expression recognition, head pose recognition, head motion estimation, head point cloud dense matching, lip shape analysis, head operation, disease diagnosis and the like.
However, existing techniques cannot guarantee both the accuracy and the speed of the algorithm: faster algorithms have lower accuracy, and more accurate algorithms are slower, so applications with high requirements on both accuracy and speed cannot be met.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide an end-to-end point cloud deep learning network model which can simultaneously position identification points on faces with different scales, and has the advantages of high positioning accuracy and high positioning speed.
The technical scheme of the invention is as follows: the end-to-end point cloud deep learning network model is a deep learning network structure similar to a convolutional neural network (CNN), and comprises the following steps:
(1) The network gradually downsamples from an input point cloud to obtain a series of sampling point sets, and gradually extracts the point distribution characteristics of the neighborhood point clouds of the sampling points in each sampling point set by using a point distribution characteristic extractor, wherein the point distribution characteristics of the neighborhood point clouds of the sampling points are gradually abstract and the space receptive field is gradually enlarged;
(2) Selecting part of point sets from the sampling point sets, and using all sampling points in the sampling point sets as monitoring points to locate the identification points;
(3) And predicting the probability that each monitoring point is positioned in the neighborhood of different identification points and the offset of each monitoring point and different identification points.
The invention uses a point distribution feature extractor to extract the distribution features of the neighborhood point cloud of each sampling point; these features become progressively more abstract and their spatial receptive fields progressively larger, so they can express the distribution of points over different spatial ranges. By using multiple monitoring point sets with different spatial receptive fields, the network can locate identification points on faces of different scales simultaneously. The network uses an end-to-end training mechanism, so it can achieve high positioning accuracy; the algorithm's time cost is the forward-propagation time of the point cloud through the network, which the lightweight design keeps short and stable.
The invention also provides a training method for the end-to-end point cloud deep learning network model: each monitoring point is matched with multiple identification points; as long as a monitoring point is adjacent to an identification point, that identification point is matched with the monitoring point. The feature of each monitoring point is then used to predict the positions of the identification points matched with it, converting the problem of locating identification points in a point cloud into a multi-label prediction-and-regression problem.
Drawings
Fig. 1 is a flow chart of the structure of a Landmark Net and its application to a set of face points with normal dimensions.
Fig. 2 is a schematic diagram of a simple matching result between monitoring points and target identification points.
Fig. 3 is a flow chart of an end-to-end point cloud deep learning network model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In order that the present disclosure may be more fully described and fully understood, the following description is provided by way of illustration of embodiments and specific examples of the present invention; this is not the only form of practicing or implementing the invention as embodied. The description covers the features of the embodiments and the method steps and sequences for constructing and operating the embodiments. However, other embodiments may be utilized to achieve the same or equivalent functions and sequences of steps.
As shown in fig. 3, this end-to-end point cloud deep learning network model is a deep learning network structure similar to a convolutional neural network (CNN), and includes the following steps:
(1) The network gradually downsamples from an input point cloud to obtain a series of sampling point sets, and gradually extracts the point distribution characteristics of the neighborhood point clouds of the sampling points in each sampling point set by using a point distribution characteristic extractor, wherein the neighborhood point clouds of the sampling points are gradually abstract in point distribution characteristics and the space receptive field is gradually expanded;
(2) Selecting some point sets from the sampling point sets, and using all sampling points in the selected point sets as monitoring points to locate the identification points;
(3) And predicting the probability that each monitoring point is positioned in the neighborhood of different identification points and the offset of each monitoring point and different identification points.
The invention uses the point distribution feature extractor to extract the neighborhood point cloud distribution feature of the sampling point, the neighborhood point distribution feature of the point is abstract step by step and the space receptive field is enlarged step by step, thus the distribution feature of the point in different space ranges can be expressed, the invention uses a plurality of monitoring point sets with different space receptive fields, and the network can simultaneously position the identification points on the faces with different scales; the network uses an end-to-end training mechanism, so that the network can obtain higher positioning precision, and the algorithm consumes time which is time consuming for forward propagation of the point cloud in the network, and is shorter and more stable through light-weight design.
Preferably, in step (1), for any input point cloud P, a Voxel Grid filter is first used to downsample P into a point cloud P_0 with point cloud density D. Then, according to fixed sampling ratios {τ_1, τ_2, …, τ_n}, P_0 is downsampled stage by stage to obtain the sampling point sets {P_1, P_2, …, P_n};
Starting from the first sampling point set P_1, a feature abstraction operation is used stage by stage to extract the abstract features of the sampling points in {P_1, P_2, …, P_n}. The feature abstraction operation acts on the point set P_{i-1} to compute the abstract features of each sampling point in P_i: for the k-th sampling point p_k^i in P_i, find in P_{i-1} the neighborhood subset N(p_k^i) inside the sphere of radius r_i centered at p_k^i, and apply the point distribution feature extractor to the n_i points in N(p_k^i) and their feature vectors to obtain the abstract feature f_k^i of p_k^i, where n_i is positively correlated with the point cloud density D. The features of all sampling points in P_i compose the abstract feature set F_i of P_i; the feature sets {F_1, F_2, …, F_n} of the sampling point sets {P_1, P_2, …, P_n} have spatial receptive fields that expand stage by stage and are progressively more abstract. Finally, the point distribution feature extractor acting on P_n produces one feature vector that expresses the global feature.
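The neighborhood collection in the feature abstraction step can be sketched as a ball query, in the style of PointNet++-like networks. This is an illustrative sketch under stated assumptions, not the patent's exact procedure; the function name and the fixed-size cap on the neighborhood are assumptions (in the patent, the number of collected points n_i grows with the point cloud density D).

```python
import math

def ball_query(center, points, radius, max_points):
    """Collect up to max_points points lying inside the sphere of the
    given radius centered at `center` (the neighborhood subset that is
    fed to the point distribution feature extractor)."""
    neighborhood = []
    for p in points:
        if math.dist(center, p) <= radius:
            neighborhood.append(p)
            if len(neighborhood) == max_points:
                break
    return neighborhood
```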
Next, starting from the last sampling point set P_n, the propagation features of the sampling point sets {P_n, P_{n-1}, …, P_1} are obtained stage by stage; the propagation features of all sampling points in a set compose its propagation feature set. The feature propagation operation acts on the point set P_{i+1} to compute the propagation features of each sampling point in P_i: for the k-th sampling point p_k^i in P_i, the abstract features of the 3 points in P_{i+1} nearest to p_k^i are averaged with weights given by the reciprocals of their distances to p_k^i; the weighted-average result is concatenated with the abstract feature of p_k^i, and several multi-layer perceptrons (MLP) with nonlinear activation functions (ReLU) are applied to the concatenation to obtain the propagation feature of p_k^i. The propagation features of all sampling points in P_i compose the propagation feature set of P_i. Since the stage after the sampling point set P_n is a single global feature vector, that vector is taken as the weighted-average result and concatenated with the abstract feature of each point in P_n to obtain the propagation feature of each sampling point in P_n.
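The interpolation at the heart of the feature propagation operation (a weighted average of the abstract features of the 3 nearest coarser-level points, with weights given by the reciprocals of the distances) can be sketched as follows. The concatenation and MLP steps are omitted, and the names are illustrative, not the patent's.

```python
import math

def propagate_feature(query, coarse_points, coarse_feats, k=3, eps=1e-8):
    """Inverse-distance weighted average of the abstract features of the
    k points in the coarser set nearest to `query`."""
    # indices of the k coarse points nearest to the query point
    order = sorted(range(len(coarse_points)),
                   key=lambda i: math.dist(query, coarse_points[i]))[:k]
    weights = [1.0 / (math.dist(query, coarse_points[i]) + eps) for i in order]
    total = sum(weights)
    dim = len(coarse_feats[0])
    out = [0.0] * dim
    for w, i in zip(weights, order):
        for d in range(dim):
            out[d] += (w / total) * coarse_feats[i][d]
    return out
```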
Preferably, in step (1), the Voxel Grid filter first voxelizes the space, and the centroids of the points located within each voxel form the output point cloud.
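A minimal sketch of the Voxel Grid filter just described: space is divided into cubic voxels of side `voxel_size`, and the points inside each voxel are replaced by their centroid. Function and variable names are illustrative.

```python
import math
from collections import defaultdict

def voxel_grid_filter(points, voxel_size):
    """Voxelize space and replace the points inside each voxel by
    their centroid, forming the downsampled output point cloud."""
    cells = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        cells[key].append((x, y, z))
    out = []
    for pts in cells.values():
        n = len(pts)
        out.append((sum(p[0] for p in pts) / n,
                    sum(p[1] for p in pts) / n,
                    sum(p[2] for p in pts) / n))
    return out
```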
Preferably, in step (2), several point sets are selected from the sampling point sets {P_1, P_2, …, P_n}; these are the monitoring point sets (MPS), and all sampling points in them are called monitoring points. For each monitoring point in the i-th monitoring point set P_i, its abstract feature and its propagation feature are each batch-normalized and then concatenated, and the concatenation is taken as the feature of the monitoring point. The feature of each monitoring point reflects the distribution of the points in its neighborhood, so the features of monitoring points in different regions are discriminative; from the feature of each monitoring point, the network judges which target identification point's neighborhood the monitoring point belongs to and predicts the position of the adjacent target identification point.
Preferably, in step (3), if the number of target identification points is L, then for each monitoring point in the i-th monitoring point set P_i, a single fully connected layer with output dimension L acts on the monitoring point's feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and L single fully connected layers, each with output dimension 3, act on the monitoring point's feature to predict the offset (Δx, Δy, Δz) between the monitoring point and each identification point; the j-th of these layers predicts the offset between the monitoring point and the j-th identification point.
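A minimal sketch of the prediction heads of step (3), with hypothetical names: one fully connected layer of output dimension L produces the neighborhood-probability logits, and L fully connected layers of output dimension 3 produce the per-landmark offsets. A real implementation would use a deep-learning framework; plain Python is used here only to show the shapes.

```python
import random

def linear(in_dim, out_dim, rng):
    """A single fully connected layer as (weights, bias)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b

def apply_linear(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def predict(feature, prob_head, offset_heads):
    """Probability logits (length L) and per-landmark offsets (L rows of
    (dx, dy, dz)); a sigmoid is applied to the logits at loss time."""
    logits = apply_linear(prob_head, feature)
    offsets = [apply_linear(h, feature) for h in offset_heads]
    return logits, offsets
```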
Preferably, in the step (3), the parameters in the fully connected layers are shared in each sampling point set.
Fig. 1 is a flow chart of the structure of a Landmark Net and its application to a set of face points with normal dimensions. The following is a specific description:
The network consists of a number of feature abstraction operations and feature propagation operations. For any input point cloud P, a Voxel Grid filter is first used to downsample P into a point cloud P_0 with point cloud density D. The Voxel Grid filter first voxelizes the space, and the centroids of the points within each voxel form the output point cloud. According to fixed sampling ratios {τ_1, τ_2, …, τ_n}, P_0 is downsampled stage by stage to obtain the sampling point sets {P_1, P_2, …, P_n}. Starting from the first sampling point set P_1, a feature abstraction operation is used stage by stage to extract the abstract features of the sampling points in {P_1, P_2, …, P_n}. The feature abstraction operation acts on P_{i-1} to compute the abstract features of each sampling point in P_i: for the k-th sampling point p_k^i in P_i, find in P_{i-1} the neighborhood subset N(p_k^i) inside the sphere of radius r_i centered at p_k^i, and apply a point distribution feature extractor (e.g., PointNet, RS-CNN) to the n_i points in N(p_k^i) and their feature vectors to obtain the abstract feature f_k^i of p_k^i, where n_i is positively correlated with the point cloud density D. The features of all sampling points in each set compose the abstract feature set F_i of P_i; the feature sets {F_1, F_2, …, F_n} of the sampling point sets {P_1, P_2, …, P_n} have spatial receptive fields that expand stage by stage and are progressively more abstract. Finally, the point distribution feature extractor acting on P_n produces one feature vector that expresses the global feature.
Next, starting from the last sampling point set P_n, the propagation features of the sampling point sets {P_n, P_{n-1}, …, P_1} are obtained stage by stage; the propagation features of all sampling points in a set compose its propagation feature set. The feature propagation operation acts on the point set P_{i+1} to compute the propagation features of each sampling point in P_i: for the k-th sampling point in P_i, the abstract features of the 3 points in P_{i+1} nearest to it are averaged with weights given by the reciprocals of their distances; the weighted-average result is concatenated with the sampling point's abstract feature, and several multi-layer perceptrons (MLP) with nonlinear activation functions (ReLU) are applied to the concatenation to obtain the sampling point's propagation feature. Since the stage after the sampling point set P_n is a single global feature vector, that vector is taken as the weighted-average result and concatenated with the abstract feature of each point in P_n; several MLPs and ReLU activations then yield the propagation feature of each sampling point in P_n.
Several point sets are selected from the sampling point sets {P_1, P_2, …, P_n}; these are called monitoring point sets (MPS), and all sampling points in the monitoring point sets are called monitoring points. For each monitoring point in the i-th monitoring point set P_i, its abstract feature and its propagation feature are each batch-normalized and then concatenated, and the concatenation is taken as the feature of the monitoring point. Since the feature of each monitoring point reflects the distribution of points in its neighborhood, the features of monitoring points in different regions are discriminative; from the feature of each monitoring point, the network can judge which target identification point's neighborhood the monitoring point belongs to and can predict the position of the adjacent target identification point.
If the number of target identification points is L, then for each monitoring point in the i-th monitoring point set P_i, a single fully connected layer with output dimension L acts on the monitoring point's feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and L single fully connected layers, each with output dimension 3, act on the monitoring point's feature to predict the offset (Δx, Δy, Δz) between the monitoring point and each identification point. Different layers predict the offsets to different identification points (e.g., the j-th layer predicts the offset to the j-th identification point). The parameters of these fully connected layers are shared within each sampling point set.
Features with larger spatial receptive fields express the distribution of points over a larger spatial range and can be used to locate identification points on larger-scale faces, and vice versa. Using multiple monitoring point sets with different spatial receptive fields therefore enables the network to locate identification points on faces of different scales at the same time. Because the relative topology of the identification points, and their positions relative to the characteristic regions of the face, are relatively fixed, global information helps to locate the identification points; since the propagation features of the points contain global information, the propagation features of the monitoring points are fused with their abstract features to form the monitoring point features, which improves the positioning stability of the network.
The invention also provides a training method for the end-to-end point cloud deep learning network model: each monitoring point is matched with multiple identification points, and as long as a monitoring point is adjacent to an identification point, that identification point is matched with the monitoring point. The feature of each monitoring point is used to predict the positions of the identification points matched with it, converting the problem of locating identification points in a point cloud into a multi-label prediction-and-regression problem.
Preferably, when the network is used to locate identification points in point sets of multiple scales, the identification points in a point set of a specific scale are matched with monitoring points whose spatial receptive fields have the corresponding size. For this purpose, a series of boxes is set up, centered respectively on the gold-standard identification points and on the monitoring points, called target boxes (TBX) and detection boxes (MBX).
Preferably, the side lengths (l_x^t, l_y^t, l_z^t) of TBX are set from the gold standard of the training data by formula (1), which computes them from the positions of the gold-standard landmarks: the left outer eye corner, the right outer eye corner, the eyebrow, and the chin tip. According to the radius r_i of the sphere used to generate the neighborhood subset of each monitoring point from the upper-level point set, the side lengths (l_x^m, l_y^m, l_z^m) of MBX are set by formula (2):
l_x^m = l_y^m = l_z^m = 2r_i (2)
If the overlap between the TBX of the j-th gold-standard identification point and the MBX of a monitoring point is greater than a threshold th_m, then they are matched according to formula (3):
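The matching test compares the overlap of a TBX and an MBX against the threshold th_m. The original formula image is not reproduced in this text; the sketch below assumes the overlap measure is the 3D intersection-over-union of the two axis-aligned boxes, which is consistent with the th_m = 0.2 value given later but remains an assumption.

```python
def box_iou_3d(c1, s1, c2, s2):
    """3D IoU of two axis-aligned boxes given by center c and side lengths s."""
    inter = 1.0
    for d in range(3):
        lo = max(c1[d] - s1[d] / 2, c2[d] - s2[d] / 2)
        hi = min(c1[d] + s1[d] / 2, c2[d] + s2[d] / 2)
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    v1 = s1[0] * s1[1] * s1[2]
    v2 = s2[0] * s2[1] * s2[2]
    return inter / (v1 + v2 - inter)

def match(landmarks, tbx_sides, monitors, mbx_sides, th_m=0.2):
    """Multi-label matching: each monitoring point matches every landmark
    whose TBX overlaps the point's MBX by more than th_m."""
    return [[j for j, lm in enumerate(landmarks)
             if box_iou_3d(lm, tbx_sides[j], m, mbx_sides[k]) > th_m]
            for k, m in enumerate(monitors)]
```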
preferably, all parameters of the network are trained simultaneously using the loss functions of equation (4), including the classification loss function and the regression loss function
loss = loss_c + λ·loss_r (4)
The classification loss function is formula (5)
wherein i and k are the index of the monitoring point set and the index of the monitoring point within that set, respectively;
loss i,k for monitoring pointsClassification loss of->Is to use sigmoid function to act on +.>The j-th dimension of the output of (a) calculates the predicted monitoring point of the network>The probability of being positioned in the neighborhood of the jth gold standard identification point is defined, at least one monitoring point matched with one gold standard identification point is defined as a positive sample, the monitoring point which is not matched with any gold standard identification point is defined as a negative sample, and N is defined as p N is the number of positive samples e Number of negative samples;
according to loss of i,k Ordering negative samples, selecting loss i,k The largest first few negative samples calculate the classification loss, and ensure that the number of negative samples participating in the calculation is not more than three times the number of positive samples.
The regression loss function is formula (6):
is composed of networkPredicted monitoring Point->Offset from the jth target mark point isAn output of (2); />Is the corresponding gold standard.
Fig. 2 is a schematic diagram of a simple matching result of a monitoring point and a target identification point. The training method is described in detail below.
In the network training stage, the monitoring points are required to be matched with the gold standard in the training data, and the network is trained according to the matching result.
To solve these problems, a multi-label matching strategy (MLM) is proposed: each monitoring point is matched with multiple identification points, and as long as a monitoring point is adjacent to an identification point, that identification point is matched with the monitoring point. The feature of each monitoring point is used to predict the positions of the identification points matched with it, converting the problem of locating identification points in a point cloud into a multi-label prediction-and-regression problem.
When this network is used to locate identification points in point sets of multiple scales, the identification points in a point set of a specific scale must be matched with monitoring points whose spatial receptive fields have the corresponding size. To this end, a series of boxes is set up, centered on the gold-standard identification points and on the monitoring points, called target boxes (TBX) and detection boxes (MBX), respectively. As shown in fig. 2, the two solid black dots and the two bold-line boxes represent two target identification points and their TBXs; the three diagonally filled black dots and the three thin-line boxes are three monitoring points and their MBXs.
In order that the size of TBX reflects the scale of the face in the training data, the side lengths (l_x^t, l_y^t, l_z^t) of TBX are set from the gold standard of the training data by formula (1), which computes them from the positions of the gold-standard landmarks: the left outer eye corner, the right outer eye corner, the eyebrow, and the chin tip.
According to the radius r_i of the sphere used to generate the neighborhood subset of each monitoring point from the upper-level point set, the side lengths (l_x^m, l_y^m, l_z^m) of MBX are set as:
l_x^m = l_y^m = l_z^m = 2r_i
if TBX of j-th gold standard mark point and monitoring pointIs->Is greater than a threshold th m Then they are matched +.>
Loss function: all parameters of the network are synchronously trained using the following loss functions, including classification loss functions and regression loss functions.
loss = loss_c + λ·loss_r
Wherein the classification loss function is as follows:
wherein i and k are the index of the monitoring point set and the index of the monitoring point within that set, respectively; loss_{i,k} is the classification loss of the k-th monitoring point in the i-th monitoring point set. The predicted probability that this monitoring point lies inside the neighborhood of the j-th gold-standard identification point is computed by applying a sigmoid function to the j-th dimension of the output of the probability-prediction fully connected layer; whether they match is given by formula (3). A monitoring point matched with at least one gold-standard identification point is defined as a positive sample, and a monitoring point matched with no gold-standard identification point as a negative sample; N_p is the number of positive samples and N_e is the number of negative samples.
Since the number of negative samples is much larger than the number of positive samples, the negative samples are sorted by loss_{i,k} and the negatives with the largest loss_{i,k} are selected to compute the classification loss, ensuring that the number of negative samples participating in the computation is at most three times the number of positive samples.
The regression loss function is defined as follows:
where the matching indicator is the one obtained by formula (3); the offset between the monitoring point and the j-th target identification point predicted by the network is the output of the j-th offset-prediction fully connected layer, and the corresponding gold-standard offset is the regression target.
In more detail, RS-Conv is used as the point distribution feature extractor in the network, with the 3D Euclidean distance and the coordinate difference (3D-Ed, x_i − x_j) as the low-level relationship information h of the point cloud. The network contains 8 feature abstraction operations and feature propagation operations. The sampling ratios {τ_1, τ_2, …, τ_7} are {7/20, 8/10, 10/15, 15/20, 20/25, 25/60, 60/120}, respectively; the radii {r_1, r_2, …, r_7} used to generate the local sample subset for each sampling point are {8, 10, 15, 20, 25, 60, 120} (mm), respectively; and the last feature abstraction operation acts on the point set P_7. The farthest point sampling method is used to collect the local point cloud subset of each sampling point from the upper-level sampling point set. The numbers of sampling points {s_1, s_2, …, s_7} of the local point cloud subsets are {75/V, 100/V, 50/V, 75/V, 75/V, 200/V, 100/V}, where V is the grid size of the Voxel Grid filter used to downsample the input point set, V = 5 mm. In addition, λ = 1, th_m = 0.2, th_p = 0.9, th_d = 3 mm, th_e = 5 mm.
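The farthest point sampling used to collect sampling points from the upper-level set can be sketched as follows. This is an illustrative pure-Python version; practical implementations vectorize the distance updates.

```python
import math

def farthest_point_sampling(points, m):
    """Iteratively pick the point farthest from the points selected so
    far, yielding m well-spread sampling points."""
    selected = [points[0]]
    dists = [math.dist(points[0], p) for p in points]
    while len(selected) < m:
        idx = max(range(len(points)), key=lambda i: dists[i])
        selected.append(points[idx])
        # each point keeps its distance to the nearest selected point
        for i, p in enumerate(points):
            dists[i] = min(dists[i], math.dist(points[idx], p))
    return selected
```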
The covariance matrix Cov(X) used to predict missing identification points is calculated from the gold standards in the training set; the missing gold-standard identification points in the training data are supplemented so that the matching between gold standards and monitoring points can be computed.
Data enhancement: the training data are rotated sequentially around the x, y and z axes by angles selected at random from the range -2.5° to +2.5°, and random jitter with mean 0 and standard deviation 0.25 mm is added to each point of the training data. Random rotation and random jitter make the training data differ from iteration to iteration, which stabilizes network training and is therefore very important.
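The augmentation just described can be sketched as follows (angle range and jitter scale taken from this section; the NumPy code is illustrative, not the patent's implementation):

```python
import numpy as np

def augment(points, max_deg=2.5, jitter_std=0.25, rng=None):
    """Rotate sequentially about the x, y and z axes by random angles in
    [-max_deg, +max_deg] degrees, then add zero-mean Gaussian jitter (mm)."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.asarray(points, dtype=float)
    for axis in range(3):
        a = np.deg2rad(rng.uniform(-max_deg, max_deg))
        c, s = np.cos(a), np.sin(a)
        i, j = [(1, 2), (0, 2), (0, 1)][axis]   # plane rotated for x, y, z
        rot = np.eye(3)
        rot[i, i], rot[i, j], rot[j, i], rot[j, j] = c, -s, s, c
        out = out @ rot.T
    return out + rng.normal(0.0, jitter_std, size=out.shape)
```

Because each call draws fresh angles and jitter, the effective training set changes every epoch, which is the stabilizing effect the text refers to.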
The present invention is not limited to the preferred embodiments, but can be modified in any way according to the technical principles of the present invention, and all such modifications, equivalent variations and modifications are included in the scope of the present invention.

Claims (9)

1. A prediction method of an end-to-end point cloud deep learning network model, characterized in that the model is a deep learning network structure similar to a convolutional neural network (CNN), comprising the following steps:
(1) The network gradually downsamples the input point cloud to obtain a series of sampling point sets, and a point distribution feature extractor progressively extracts the point distribution features of the neighborhood point cloud of each sampling point in each set; these features become progressively more abstract and their spatial receptive fields progressively enlarge;
(2) Some of the sampling point sets are selected, and all sampling points in the selected sets are used as monitoring points to locate the identification points;
(3) Predicting, for each monitoring point, the probability that it lies in the neighborhood of each identification point and its offset from each identification point;
In step (1), for any input point cloud P, a Voxel Grid filter is first used to downsample P to a point cloud P_0 of density D; according to fixed sampling ratios {τ_1, τ_2, …, τ_n}, P_0 is downsampled step by step to obtain the sampling point sets {P_1, P_2, …, P_n}. Starting from the first sampling point set P_1, the abstract features of the sampling points in {P_1, P_2, …, P_n} are extracted step by step using the feature abstraction operation. The feature abstraction operation acts on point set P_{i-1} to calculate the abstract features of the sampling points in point set P_i: for the k-th sampling point p_i^k in P_i, the spherical neighborhood subset N(p_i^k) centered at p_i^k with radius r_i is found in P_{i-1}; the point distribution features of the n_i points in N(p_i^k) and their feature vectors are extracted, yielding the abstract feature vector f_i^k of p_i^k, where n_i is positively correlated with the point cloud density D. The features f_i^k of all sampling points in each set form the abstract feature set F_i of P_i; the feature sets {F_1, F_2, …, F_n} of the sampling point sets {P_1, P_2, …, P_n} have step-by-step enlarged spatial receptive fields and are step-by-step more abstract. Finally, a point cloud feature extractor acting on P_n produces a feature vector expressing the global feature;
Next, starting from the last sampling point set P_n, the propagation features of all sampling points in {P_n, P_{n-1}, …, P_1} are obtained step by step; the propagation features of all sampling points within a set form its propagation feature set. The feature propagation operation acts on point set P_{i+1} to calculate the propagation features of the sampling points in point set P_i: for the k-th sampling point p_i^k in P_i, the abstract features of the 3 points in P_{i+1} nearest to p_i^k are weighted-averaged, with weights given by the inverse of their distances to p_i^k; the weighted-average result is concatenated with the abstract feature f_i^k of p_i^k, and several multi-layer perceptrons (MLP) together with the nonlinear activation function ReLU act on the concatenation to obtain the propagation feature of p_i^k. Since the stage after the sampling point set P_n is a single feature vector, this feature vector is treated as the weighted-average result and concatenated with the abstract feature of each point in P_n to obtain the propagation feature of each sampling point in P_n.
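The inverse-distance interpolation at the heart of this propagation step can be sketched as follows (NumPy; the MLP, ReLU and concatenation stages are omitted, and the function name is illustrative):

```python
import numpy as np

def interpolate_features(query_pts, src_pts, src_feats, k=3, eps=1e-8):
    """For each query point, average the features of its k nearest source
    points, weighted by the inverse of their distances to the query point."""
    query_pts = np.asarray(query_pts, dtype=float)
    src_pts = np.asarray(src_pts, dtype=float)
    src_feats = np.asarray(src_feats, dtype=float)
    out = np.empty((len(query_pts), src_feats.shape[1]))
    for qi, q in enumerate(query_pts):
        d = np.linalg.norm(src_pts - q, axis=1)
        nn = np.argsort(d)[:k]                 # k nearest source points
        w = 1.0 / (d[nn] + eps)                # inverse-distance weights
        out[qi] = (src_feats[nn] * w[:, None]).sum(0) / w.sum()
    return out
```

A query point that coincides with a source point receives (essentially) that point's feature, since its weight dominates the average.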
2. The method for predicting an end-to-end point cloud deep learning network model according to claim 1, wherein in step (1), the Voxel Grid filter first voxelizes the space, and the centers of gravity of the points located in each voxel form the output point cloud.
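The Voxel Grid filter of this claim — voxelize space, then output the centroid of the points in each occupied voxel — can be sketched as:

```python
import numpy as np

def voxel_grid_filter(points, voxel_size=5.0):
    """Downsample a point cloud by replacing the points in each voxel
    with their centre of gravity (centroid)."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(int)  # integer voxel index
    groups = {}
    for key, p in zip(map(tuple, keys), points):
        groups.setdefault(key, []).append(p)          # bucket points by voxel
    return np.array([np.mean(g, axis=0) for g in groups.values()])

pts = np.array([[0.0, 0, 0], [1.0, 1, 1], [6.0, 6, 6]])
out = voxel_grid_filter(pts, voxel_size=5.0)
# two occupied voxels: centroids (0.5, 0.5, 0.5) and (6, 6, 6)
```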
3. The method for predicting an end-to-end point cloud deep learning network model according to claim 2, wherein in step (2), several point sets are selected from the sampling point sets {P_1, P_2, …, P_n} to form the monitoring point sets (MPS), and all sampling points in the monitoring point sets are called monitoring points. For the k-th monitoring point p_i^k in the i-th monitoring point set P_i, its abstract feature f_i^k and its propagation feature are batch-normalized separately and then concatenated, and the concatenation result is taken as the feature of the monitoring point. The feature of each monitoring point reflects the point distribution in its neighborhood and distinguishes monitoring points in different regions; from the feature of each monitoring point, the network judges in the neighborhoods of which target identification points the monitoring point lies and predicts the positions of the adjacent target identification points.
4. The method for predicting an end-to-end point cloud deep learning network model according to claim 3, wherein in step (3), if the number of target identification points is L, then for the k-th monitoring point p_i^k in the i-th monitoring point set P_i, one single fully connected layer with output dimension L acts on its feature to predict the probability that the monitoring point lies in the neighborhood of each identification point, and L single fully connected layers with output dimension 3 act on its feature to predict the offset (Δx, Δy, Δz) between the monitoring point and each identification point; the j-th such layer predicts the offset between the monitoring point and the j-th identification point.
5. The method for predicting an end-to-end point cloud deep learning network model according to claim 4, wherein in step (3), the parameters of these fully connected layers are shared within each sampling point set.
6. The method for predicting an end-to-end point cloud deep learning network model according to claim 5, wherein each monitoring point can be matched with multiple identification points: as long as a monitoring point is adjacent to an identification point, that identification point is matched with the monitoring point. The positions of the identification points matched with each monitoring point are predicted from that monitoring point's feature, converting the problem of locating identification points in a point cloud into a multi-label prediction and regression problem.
7. The method for predicting an end-to-end point cloud deep learning network model according to claim 6, wherein when the network locates identification points in point sets of multiple scales, the identification points in a point set of a specific scale are matched with monitoring points whose spatial receptive fields have the corresponding size; a series of boxes centered at the gold-standard identification points and at the monitoring points are set, called target boxes (TBX) and monitoring boxes (MBX), respectively.
8. The method for predicting an end-to-end point cloud deep learning network model according to claim 7, wherein the side lengths (l_x^t, l_y^t, l_z^t) of the TBX are set according to the gold standard of the training data, as in equation (1), wherein the gold-standard points referred to include the outer corner of the left eye, the outer corner of the right eye, the eyebrow, and the chin;
The side lengths (l_x^m, l_y^m, l_z^m) of the MBX of each monitoring point are set according to the radius r_i of the sphere used to generate its neighborhood subset from the upper-level point set, as in equation (2):

l_x^m = l_y^m = l_z^m = 2r_i (2)

If the overlap between the TBX of the j-th gold-standard identification point and the MBX of monitoring point p_i^k is greater than a threshold th_m, they are matched according to equation (3):
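Equation (3) itself is not reproduced in this text; a plausible NumPy sketch of matching axis-aligned boxes by an overlap criterion is shown below (intersection-over-union is an assumption here, not something this text states):

```python
import numpy as np

def box_iou(center_a, size_a, center_b, size_b):
    """Intersection-over-union of two axis-aligned 3D boxes given by
    their centers and side lengths."""
    a_min, a_max = center_a - size_a / 2, center_a + size_a / 2
    b_min, b_max = center_b - size_b / 2, center_b + size_b / 2
    # overlap extent per axis, clipped at zero when boxes are disjoint
    edges = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter = np.prod(edges)
    union = np.prod(size_a) + np.prod(size_b) - inter
    return inter / union

def match(tbx_centers, tbx_sizes, mbx_center, mbx_size, th_m=0.2):
    """Indices of gold-standard boxes whose overlap with the monitoring
    box exceeds the threshold th_m (th_m = 0.2 as in this document)."""
    return [j for j, (c, s) in enumerate(zip(tbx_centers, tbx_sizes))
            if box_iou(np.asarray(c), np.asarray(s), mbx_center, mbx_size) > th_m]
```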
9. The method for predicting an end-to-end point cloud deep learning network model according to claim 8, wherein all parameters of the network are learned simultaneously using the loss function of equation (4), which comprises a classification loss function and a regression loss function:
loss = loss_c + λ loss_r (4)
The classification loss function is given by equation (5),
wherein i and k are the index of the monitoring point set and the index of the monitoring point within that set, respectively;
loss_{i,k} is the classification loss of monitoring point p_i^k; q_{i,k,j}, obtained by applying the sigmoid function to the j-th dimension of the output of the classification layer, is the probability predicted by the network that monitoring point p_i^k lies inside the neighborhood of the j-th gold-standard identification point; a monitoring point matched with at least one gold-standard identification point is defined as a positive sample, a monitoring point not matched with any gold-standard identification point is defined as a negative sample, N_p is the number of positive samples, and N_e is the number of negative samples;
the negative samples are sorted by loss_{i,k}, and the negatives with the largest loss_{i,k} are selected to calculate the classification loss, ensuring that the number of negative samples participating in the calculation is not more than three times the number of positive samples;
the regression loss function is formula (6):
Δ_{i,k,j} is the offset between the network-predicted monitoring point p_i^k and the j-th target identification point, i.e., the output of the j-th regression layer; Δ̂_{i,k,j} is the corresponding gold standard.
CN202010116881.9A 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method Active CN111428855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010116881.9A CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method


Publications (2)

Publication Number Publication Date
CN111428855A CN111428855A (en) 2020-07-17
CN111428855B true CN111428855B (en) 2023-11-14

Family

ID=71551571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116881.9A Active CN111428855B (en) 2020-02-25 2020-02-25 End-to-end point cloud deep learning network model and training method

Country Status (1)

Country Link
CN (1) CN111428855B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085123B (en) * 2020-09-25 2022-04-12 北方民族大学 Point cloud data classification and segmentation method based on salient point sampling
CN116045833B (en) * 2023-01-03 2023-12-22 中铁十九局集团有限公司 Bridge construction deformation monitoring system based on big data

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109544700A (en) * 2018-10-12 2019-03-29 深圳大学 Processing method, device and the equipment of point cloud data neural network based
CN110197223A (en) * 2019-05-29 2019-09-03 北方民族大学 Point cloud data classification method based on deep learning
CN110321910A (en) * 2018-03-29 2019-10-11 中国科学院深圳先进技术研究院 Feature extracting method, device and equipment towards cloud

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11379688B2 (en) * 2017-03-16 2022-07-05 Packsize Llc Systems and methods for keypoint detection with convolutional neural networks

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN110321910A (en) * 2018-03-29 2019-10-11 中国科学院深圳先进技术研究院 Feature extracting method, device and equipment towards cloud
CN109544700A (en) * 2018-10-12 2019-03-29 深圳大学 Processing method, device and the equipment of point cloud data neural network based
CN110197223A (en) * 2019-05-29 2019-09-03 北方民族大学 Point cloud data classification method based on deep learning

Non-Patent Citations (1)

Title
PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection; Shaoshuai Shi et al.; arXiv:1912.13192v1; Sections 1 and 3, Figures 1-2 *

Also Published As

Publication number Publication date
CN111428855A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
CN110135267B (en) Large-scene SAR image fine target detection method
CN110472627B (en) End-to-end SAR image recognition method, device and storage medium
CN110472483B (en) SAR image-oriented small sample semantic feature enhancement method and device
CN108230329B (en) Semantic segmentation method based on multi-scale convolution neural network
CN109685768B (en) Pulmonary nodule automatic detection method and system based on pulmonary CT sequence
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN106951923B (en) Robot three-dimensional shape recognition method based on multi-view information fusion
CN112396002A (en) Lightweight remote sensing target detection method based on SE-YOLOv3
CN106023257A (en) Target tracking method based on rotor UAV platform
CN111428855B (en) End-to-end point cloud deep learning network model and training method
Zelener et al. Cnn-based object segmentation in urban lidar with missing points
EP4053734A1 (en) Hand gesture estimation method and apparatus, device, and computer storage medium
Khellal et al. Pedestrian classification and detection in far infrared images
Fan et al. A novel sonar target detection and classification algorithm
CN113128564B (en) Typical target detection method and system based on deep learning under complex background
CN113255779B (en) Multi-source perception data fusion identification method, system and computer readable storage medium
CN112991281B (en) Visual detection method, system, electronic equipment and medium
CN116311387B (en) Cross-modal pedestrian re-identification method based on feature intersection
CN113536920A (en) Semi-supervised three-dimensional point cloud target detection method
CN117152601A (en) Underwater target detection method and system based on dynamic perception area routing
Da et al. Remote sensing image ship detection based on improved YOLOv3
CN114627183A (en) Laser point cloud 3D target detection method
Lavanya et al. Enhancing Real-time Object Detection with YOLO Algorithm
Zhang et al. Multiple Objects Detection based on Improved Faster R-CNN
Zelener Object Localization, Segmentation, and Classification in 3D Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant