CN114782684B

CN114782684B - Point cloud semantic segmentation method and device, electronic equipment and storage medium

Info

Publication number: CN114782684B
Application number: CN202210220298.1A
Authority: CN
Inventors: 宁欣; 王昌硕; 董肖莉; 李卫军; 张丽萍; 孙琳钧
Original assignee: Institute of Semiconductors of CAS
Current assignee: Institute of Semiconductors of CAS
Priority date: 2022-03-08
Filing date: 2022-03-08
Publication date: 2023-04-07
Anticipated expiration: 2042-03-08
Also published as: CN114782684A

Abstract

The invention provides a point cloud semantic segmentation method, a point cloud semantic segmentation device, electronic equipment and a storage medium, wherein the method comprises the following steps: sampling and extracting characteristics of point clouds of a target object to obtain high-order characteristic information of a plurality of sampling points; the high-order characteristic information of the sampling point is obtained based on relative characteristic information and relative position information between the sampling point and surrounding neighborhood points of the sampling point and density information of the neighborhood points; obtaining high-order characteristic information of each point in the point cloud of the target object according to the high-order characteristic information of the plurality of sampling points; and performing semantic segmentation on the point cloud of the target object according to the high-order characteristic information of each point in the point cloud of the target object. The high-order characteristic information can reflect deep semantic information of the point cloud, so that the accuracy of a point cloud semantic segmentation result obtained by the point cloud semantic segmentation method is higher.

Description

Point cloud semantic segmentation method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of three-dimensional images, in particular to a point cloud semantic segmentation method, a point cloud semantic segmentation device, electronic equipment and a storage medium.

Background

In the prior art, the sensing end and the cognitive end of 3D visual sensing technology are separated and cannot interact with each other. The sensing end, namely the visual sensing data acquisition end, can only acquire data indiscriminately over the whole scene, so that sensing redundancy is large and accuracy is low, which in turn makes the cognitive end costly and raises its threshold. Regarding the interaction problem between the sensing end and the cognitive end in 3D visual sensing technology, the cognitive end uses a point cloud semantic segmentation technology when analyzing and identifying the region reacquired by the sensing end.

Point cloud semantic segmentation refers to identifying and extracting different objects in a point cloud, namely identifying and analyzing objects in a scene. One conventional method of point cloud semantic segmentation is to extract feature information from a point cloud, and then obtain the probability of each category to which each point in the point cloud belongs by processing the feature information. In the process of point cloud semantic segmentation, extracting feature information is a step which is quite important. The content and category of the information which can be represented by the characteristic information can have an important influence on the point cloud semantic segmentation effect.

Extracting feature information from point clouds through a neural network is currently a hot topic in the industry. For example, PointNet learns features independently for each point and integrates global features, but this type of network structure ignores local structures. PointNet++ improves on PointNet by dividing the point cloud into different subsets and processing these nested partitions with PointNet to extract local features and combine multi-scale features. However, this method also treats the points in each subset independently and does not consider the connections between the points. Those skilled in the art have subsequently proposed various other types of neural network architectures, such as DGCNN, LDGCNN, PointCNN, PointConv, PointWeb, RS-CNN, and the like. These follow-up works mainly address the problem of local point cloud correlation, but still have the following disadvantages: first, sufficient high-order information cannot be extracted; second, they do not take into account that the distribution of point cloud data in space is irregular and uneven, which can prevent the network from learning local features well; third, most of the selected feature aggregation methods are maximum pooling, which focuses only on the most significant feature in each dimension and inevitably causes information loss.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a point cloud semantic segmentation method, a point cloud semantic segmentation device, electronic equipment and a storage medium.

The invention provides a point cloud semantic segmentation method, which comprises the following steps:

sampling and extracting the characteristics of the point cloud of the target object to obtain high-order characteristic information of a plurality of sampling points; the high-order characteristic information of the sampling point is obtained based on relative characteristic information and relative position information between the sampling point and surrounding neighborhood points of the sampling point and density information of the neighborhood points;

obtaining high-order characteristic information of each point in the point cloud of the target object according to the high-order characteristic information of the plurality of sampling points; and performing semantic segmentation on the point cloud of the target object according to the high-order characteristic information of each point in the point cloud of the target object.

According to the point cloud semantic segmentation method provided by the invention, the point cloud of a target object is sampled and subjected to feature extraction to obtain high-order feature information of a plurality of sampling points, and the method comprises the following steps:

sampling and extracting the point cloud of the target object for multiple times to obtain high-order characteristic information of multiple sampling points; wherein, once sampling and feature extraction include:

determining a second sampling point and a neighborhood point of the second sampling point according to the point cloud of the target object or a first sampling point obtained by previous sampling;

determining high-order characteristic information of the neighborhood points according to the relative characteristic information between the neighborhood points and the second sampling points and the convolution weights of the neighborhood points; the relative feature information between the neighborhood point and the second sampling point is obtained according to the feature information of the neighborhood point and the second sampling point in the point cloud or the feature information obtained after the previous feature extraction of the neighborhood point and the second sampling point; the convolution weight of the neighborhood point is obtained according to the relative position information between the neighborhood point and the second sampling point and the density information of the neighborhood point;

and performing feature aggregation on the high-order feature information of the neighborhood point to obtain the high-order feature information of the second sampling point.

According to the point cloud semantic segmentation method provided by the invention, the determining of the high-order characteristic information of the neighborhood points according to the relative characteristic information between the neighborhood points and the second sampling points and the convolution weights of the neighborhood points comprises the following steps:

calculating relative position information between the neighborhood point and the second sampling point;

generating a first convolution weight of the neighborhood point according to the relative position information;

calculating the density of the neighborhood points in the point cloud;

adjusting the first convolution weight of the neighborhood points according to the density to obtain the convolution weight of the neighborhood points;

calculating relative feature information between the neighborhood point and the second sampling point;

and calculating the high-order characteristic information of the neighborhood points according to the relative characteristic information between the neighborhood points and the second sampling points and the convolution weight of the neighborhood points.

According to the point cloud semantic segmentation method provided by the invention, the feature aggregation is performed on the high-order feature information of the neighborhood points to obtain the high-order feature information of the second sampling point, and the method comprises the following steps:

respectively carrying out average pooling operation and maximum pooling operation on the high-order characteristic information of the neighborhood points;

respectively compressing or expanding the spatial dimension of the average pooling operation result and the maximum pooling operation result to obtain an average pooling operation result and a maximum pooling operation result with the same spatial dimension;

and combining the average pooling operation result and the maximum pooling operation result with the same spatial dimension element by element, and activating the combined result to obtain the high-order characteristic information of the second sampling point.

determining an attention coefficient vector of the neighborhood point according to the position information of the neighborhood point, the position information of the second sampling point and the relative position information between the neighborhood point and the second sampling point;

normalizing the attention coefficient vectors of all neighborhood points around the sampling point in the same channel to obtain a normalized attention coefficient vector;

and obtaining the high-order characteristic information of the second sampling point according to the high-order characteristic information of the neighborhood point and the normalized attention coefficient vector.

According to the point cloud semantic segmentation method provided by the invention, the generating of the first convolution weight of the neighborhood point according to the relative position information comprises the following steps:

splicing the relative position information between the neighborhood point and the second sampling point with the neighborhood point position information and the second sampling point position information to obtain a splicing result;

and respectively performing feature mapping on the splicing result by utilizing a plurality of parallel convolution kernels in the multilayer perceptron, and then combining the results of the feature mapping based on the attention coefficients corresponding to different parallel convolution kernels to obtain a first convolution weight of the neighborhood point.

According to the point cloud semantic segmentation method provided by the invention, the obtaining of the high-order feature information of each point in the point cloud of the target object according to the high-order feature information of the plurality of sampling points comprises the following steps:

and carrying out reverse interpolation, characteristic connection and characteristic mapping on the data of the plurality of sampling points to obtain high-order characteristic information of each point in the point cloud of the target object.

The invention also provides a point cloud semantic segmentation device, which comprises:

the sampling and feature extraction module is used for sampling and feature extracting point clouds of the target object to obtain high-order feature information of a plurality of sampling points; the high-order characteristic information of the sampling point is obtained based on relative characteristic information and relative position information between the sampling point and surrounding neighborhood points of the sampling point and density information of the neighborhood points;

the semantic segmentation module is used for obtaining high-order characteristic information of each point in the point cloud of the target object according to the high-order characteristic information of the plurality of sampling points; and performing semantic segmentation on the point cloud of the target object according to the high-order characteristic information of each point in the point cloud of the target object.

The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the point cloud semantic segmentation method when executing the program.

The invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the point cloud semantic segmentation method as described.

According to the point cloud semantic segmentation method, the point cloud semantic segmentation device, the electronic equipment and the storage medium, the point cloud is sampled and extracted, and high-order feature information of a sampling point can be obtained on the basis of relative feature information and relative position information between the sampling point and a neighborhood point and density information of the neighborhood point; further, the high-order characteristic information of the sampling points is transmitted to each point in the point cloud, so that the semantic segmentation of the point cloud can be realized according to the high-order characteristic information of each point in the point cloud. Because the high-order characteristic information can reflect deep semantic information of the point cloud, the accuracy of the point cloud semantic segmentation result obtained by the point cloud semantic segmentation method is higher.

Drawings

In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a schematic flow chart of a point cloud semantic segmentation method provided by the present invention;

FIG. 2 is a schematic flow chart of a second convolution weight generation process involved in the point cloud semantic segmentation method provided by the present invention;

FIG. 3 is a schematic diagram of a feature aggregation process involved in the point cloud semantic segmentation method provided by the present invention;

FIG. 4 is a schematic diagram of interpolation involved in the point cloud semantic segmentation method provided by the present invention;

FIG. 5 is a schematic structural diagram of a point cloud semantic segmentation apparatus provided by the present invention;

FIG. 6 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The point cloud semantic segmentation method, apparatus, electronic device and storage medium of the present invention are described with reference to FIG. 1 to FIG. 6.

FIG. 1 is a flowchart of the point cloud semantic segmentation method of the present invention. As shown in FIG. 1, the point cloud semantic segmentation method of the present invention includes:

Step 101, sampling and feature extraction are performed on the point cloud of the target object to obtain high-order feature information of a plurality of sampling points.

In the invention, the target object refers to an object to be subjected to category identification through point cloud semantic segmentation. The target object may be a human or an object. For example, there are visitors, pets, flowers, and trees in the park. The whole park area can be a target object, and tourists, pets, flowers and trees in the park can be distinguished by a point cloud semantic segmentation method. As another example, the target object may be a person, and the head, the limbs, and the trunk of the person may be distinguished by the point cloud semantic segmentation method.

Point cloud data refers to a set of vectors in a three-dimensional coordinate system. Point cloud data typically includes position information of the points (typically represented as three-dimensional coordinates x, y, z) as well as feature information of the points.

There are various implementations of the feature information of the point, for example, the position information of the point may be used as the feature information of the point, or the surface normal vector of the point and the position information of the point may be used together as the feature information of the point. If only the position information of the point is used as the feature information of the point, the position information of the point included in the point cloud data is initially the same as the feature information of the point, but the feature information of the point is changed from the feature information of a lower order to the feature information of a higher order as the feature information of the point is further processed, and at this time, the position information of the point and the feature information of the point are different. In this embodiment, the position information of the point may be used as the feature information of the initial point.

The acquisition of the point cloud data of the target object can be realized by existing equipment, such as a laser radar or a 3D camera.

After the point cloud data of the target object is acquired, in this step, the point cloud of the target object can be sampled and feature extracted to obtain data of a plurality of sampling points. The data of the sampling points comprise high-order characteristic information of the sampling points, and the high-order characteristic information of the sampling points is obtained based on relative characteristic information, relative position information and density information of neighborhood points between the sampling points and the surrounding neighborhood points.

As is known from the foregoing description of the prior art, in the prior art, when feature extraction is performed on points in a point cloud, only low-order feature information can be generally extracted. As is known to those skilled in the art, low-order feature information is more focused on detail features, while high-order feature information is more focused on deep semantic information. In the point cloud semantic segmentation process, the category of each point in the point cloud needs to be identified. Therefore, if the high-order characteristic information of the point in the point cloud can be obtained, the accuracy of the point cloud semantic segmentation result can be improved.

Therefore, in the embodiment, a certain number of sampling points can be selected from the point cloud of the target object, and then neighborhood points around the sampling points are determined by taking the sampling points as centers; and then, calculating high-order characteristic information of the neighborhood points by utilizing the relative characteristic information, the relative position information and the density information of the neighborhood points between the neighborhood points and the sampling points, and further obtaining the high-order characteristic information of the sampling points by utilizing the high-order characteristic information of the neighborhood points.

In the following embodiments, the sampling and feature extraction processes involved in this step will be further described.

It should be noted that, in the present embodiment, the division between high-order feature information and low-order feature information may be determined according to the actual situation. Taking the coverage convolution formula as an example, if the exponent p >= 3, the feature information extracted for a point in the point cloud is considered to be high-order feature information. The exponent p of the coverage convolution formula is learnable, so that in a multi-layer neural network built from coverage convolution, the order of the features used by each layer can be determined adaptively through learning. This greatly increases the expressive power of the network compared with a neural network that uses only first-order features.

Step 102, obtaining high-order feature information of each point in the point cloud of the target object according to the high-order feature information of the plurality of sampling points; and performing semantic segmentation on the point cloud of the target object according to the high-order feature information of each point in the point cloud of the target object.

When semantic segmentation is performed on the point cloud, point-by-point detection needs to be performed on each point in the point cloud. In the previous step, only the high-order feature information of a plurality of sampling points (i.e. partial points in the point cloud) is obtained. Therefore, in this step, the high-order feature information of the sampling points needs to be propagated from a plurality of sampling points (i.e., some points in the point cloud) to all the points in the point cloud, so as to obtain the high-order feature information of each point in the point cloud.
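The propagation described above can be illustrated with a minimal sketch. It assumes inverse-distance-weighted interpolation over the k nearest sampling points, a scheme commonly used for feature propagation in point cloud networks; the text does not fix the interpolation rule, and the feature connection and feature mapping that follow interpolation are not shown. The function name `interpolate_features` and the parameters `k` and `eps` are hypothetical.

```python
import numpy as np

def interpolate_features(all_points, sample_points, sample_features, k=3, eps=1e-8):
    """Propagate features from sampling points to every point by inverse-distance weighting.

    Assumption: the weighting rule and k are illustrative, not taken from the source.
    """
    # Pairwise distances between every point and every sampling point, shape (N, S).
    dist = np.linalg.norm(all_points[:, None, :] - sample_points[None, :, :], axis=-1)
    nearest = np.argsort(dist, axis=1)[:, :k]                 # (N, k) nearest sampling points
    nearest_dist = np.take_along_axis(dist, nearest, axis=1)  # (N, k) their distances
    weights = 1.0 / (nearest_dist + eps)                      # closer sampling points weigh more
    weights /= weights.sum(axis=1, keepdims=True)             # normalize per point
    # Weighted sum of the features of the k nearest sampling points, shape (N, C).
    return np.einsum("nk,nkc->nc", weights, sample_features[nearest])

sample_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sample_features = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
all_points = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]])
dense = interpolate_features(all_points, sample_points, sample_features, k=2)
```

A point that coincides with a sampling point receives essentially that sampling point's feature, while a point midway between two sampling points receives their average.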

After the high-order feature information of each point in the point cloud is obtained, semantic segmentation can be performed on the point cloud of the target object based on the high-order feature information of the points. Because the high-order feature information on which the semantic segmentation is based better reflects the deep semantic information of the point cloud, the accuracy of the semantic segmentation result of the point cloud of the target object is higher.

The following takes as an example the experimental results obtained with a single convolution kernel generated by a shared MLP (Multi-Layer Perceptron). "Shared" in shared MLP means that all neighborhood points are transformed using the same convolution kernel, and the MLP is used to generate the convolution kernel. Multiple MLPs generate multiple convolution kernels, and every neighborhood point in the neighborhood shares these convolution kernels. Table 1 (split into Tables 1-1 and 1-2 for ease of reading) gives the experimental data of coverage convolution on the ShapeNet Part object part semantic segmentation dataset, and Table 2 (split into Tables 2-1 and 2-2 for ease of reading) gives the semantic segmentation results of coverage convolution on Area 5 of the S3DIS dataset.

TABLE 1-1

TABLE 1-2

TABLE 2-1

TABLE 2-2

According to the experimental data in the tables 1 and 2, analysis and comparison show that the experimental data is better when the coverage convolution method is adopted, namely the accuracy of the point cloud semantic segmentation result of the target object is higher.

According to the point cloud semantic segmentation method, by sampling and feature extraction of the point cloud, high-order feature information of a sampling point can be obtained on the basis of relative feature information and relative position information between the sampling point and a neighborhood point and density information of the neighborhood point; further, the high-order characteristic information of the sampling points is transmitted to each point in the point cloud, so that the semantic segmentation of the point cloud can be realized according to the high-order characteristic information of each point in the point cloud. The high-order characteristic information can reflect deep semantic information of the point cloud, so that the accuracy of a point cloud semantic segmentation result obtained by the point cloud semantic segmentation method is higher.

Based on any one of the above embodiments, in this embodiment, the sampling and feature extracting the point cloud of the target object to obtain the high-order feature information of the multiple sampling points includes:

The above steps will be further described with reference to examples.

Generally, the number of points included in the point cloud of the target object is huge, and if the subsequent feature extraction processing is performed on all the points, a large amount of computing resources are consumed. Therefore, in the present embodiment, a first number of sample points may be selected from the point cloud.

For example, 10000 points are included in the point cloud of the target object, and 1024 points can be selected from the 10000 points as sampling points.

When the sampling points are selected, various sampling methods may be adopted, such as a Farthest Point Sampling (FPS) method, a random sampling method, and the like. In this embodiment, the farthest point sampling method can be employed. Compared with a random sampling method, the farthest point sampling method can better cover the whole sampling space, so that the sampled sampling points are more representative in the point cloud.
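As an illustration of the farthest point sampling method mentioned above, the following is a minimal NumPy sketch of the usual greedy formulation. The function name `farthest_point_sampling`, the `start_index` parameter and the random test cloud are assumptions made for the example; the text does not prescribe a particular implementation.

```python
import numpy as np

def farthest_point_sampling(points, num_samples, start_index=0):
    """Return indices of num_samples points chosen by greedy farthest point sampling."""
    num_points = points.shape[0]
    selected = np.empty(num_samples, dtype=np.int64)
    # Squared distance from every point to its nearest already-selected point.
    min_dist = np.full(num_points, np.inf)
    current = start_index  # assumption: the first sampling point is fixed by the caller
    for s in range(num_samples):
        selected[s] = current
        dist = np.sum((points - points[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        # The next sampling point is the one farthest from all selected points.
        current = int(np.argmax(min_dist))
    return selected

# Example sizes taken from the text: 1024 sampling points out of 10000 points.
rng = np.random.default_rng(0)
cloud = rng.random((10000, 3))
sample_idx = farthest_point_sampling(cloud, 1024)
```

Each iteration picks the point whose distance to the already selected set is largest, which is why the selected points spread over the whole sampling space rather than clustering as random samples may.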

After a first number of sample points are obtained, the point cloud of the target object is divided into a plurality of subsets centered on the sample points.

As can be seen from the foregoing description, the sampling points are only a part of the point cloud of the target object, and some points in the point cloud of the target object are not considered as sampling points, and these points may be referred to as non-sampling points. In this step, the sampling point is taken as the center, and non-sampling points within a certain range near the sampling point are classified into a subset where the sampling point is located.

For example, the target object is a human body, one of the sampling points is a point at the mouth of the human body, a non-sampling point having a distance within a certain range (e.g., 2 cm) from the sampling point is found with the sampling point as the center, and if there are 5 such points, the 5 points belong to the subset in which the sampling point at the mouth is located.

Points that belong to the same subset as a sampling point are also referred to as neighborhood points of that sampling point. The neighborhood point search for a sampling point can be implemented in various ways, such as a K-nearest neighbor (KNN) algorithm, a spherical neighborhood query method, and the like. In this embodiment, a spherical neighborhood query method may be employed.

It should be noted that there may be some non-sample points in the point cloud of the target object that cannot be included in any subset, and these non-sample points will be discarded.
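The subset construction and the discarding of uncovered points can be sketched as follows. This is a brute-force illustration of a spherical neighborhood query; the function name `ball_query`, the toy coordinates and the radius of 0.02 (2 cm, assuming coordinates in meters) are assumptions made for the example.

```python
import numpy as np

def ball_query(points, sample_idx, radius):
    """For each sampling point, return indices of non-sampling points within radius."""
    is_sample = np.zeros(points.shape[0], dtype=bool)
    is_sample[sample_idx] = True
    candidates = np.flatnonzero(~is_sample)  # only non-sampling points can be neighbors
    subsets = []
    for i in sample_idx:
        dist = np.linalg.norm(points[candidates] - points[i], axis=1)
        subsets.append(candidates[dist <= radius])
    return subsets

points = np.array([[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [0.0, 0.015, 0.0],
                   [1.0, 1.0, 1.0], [1.0, 1.01, 1.0], [5.0, 5.0, 5.0]])
sample_idx = np.array([0, 3])
subsets = ball_query(points, sample_idx, radius=0.02)

# Non-sampling points that fall into no subset are discarded.
covered = np.unique(np.concatenate(subsets))
non_samples = np.setdiff1d(np.arange(len(points)), sample_idx)
discarded = np.setdiff1d(non_samples, covered)
```

In this toy cloud, points 1 and 2 join the subset of sampling point 0, point 4 joins the subset of sampling point 3, and point 5 lies outside every sphere and is discarded.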

After the sampling point and the neighborhood point are determined, relative feature information between the sampling point and the neighborhood point thereof can be calculated.

In this embodiment, the relative feature information refers to a relative relationship between the feature of the neighborhood point in the feature space and the feature of the sampling point, that is, a difference between the feature of the neighborhood point and the feature of the sampling point (central point).

According to the definition of the relative characteristic information, the relative characteristic information between the sampling point and the neighborhood point can be calculated based on the characteristic information of the sampling point and the characteristic information of the neighborhood point.

Let $x_i$ be a sampling point. Its feature can be represented as $f_i = (f_{ih} \mid h = 1, 2, \ldots, n) \in \mathbb{R}^n$, where $n$ is the dimension of the feature and $h$ denotes the $h$-th dimension of the feature. Let $x_j$ be a neighborhood point of $x_i$; its feature can be represented as $f_j = (f_{jh} \mid h = 1, 2, \ldots, n) \in \mathbb{R}^n$. The relative feature information between the sampling point $x_i$ and the neighborhood point $x_j$ may then be expressed as $f_j - f_i$.

If there are k neighborhood points of a sampling point, there are k pieces of relative feature information between the sampling point and the neighborhood points.
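A minimal sketch of this computation, assuming the features are stored as rows of a NumPy array; the function name `relative_features` and the toy values are hypothetical.

```python
import numpy as np

def relative_features(features, center_idx, neighbor_idx):
    """Return f_j - f_i for every neighborhood point j of sampling point i, shape (k, n)."""
    return features[neighbor_idx] - features[center_idx]

# Row 0 is the sampling point; rows 1 and 2 are its k = 2 neighborhood points.
features = np.array([[1.0, 2.0, 3.0],
                     [1.5, 2.0, 2.0],
                     [0.0, 4.0, 3.0]])
rel = relative_features(features, center_idx=0, neighbor_idx=np.array([1, 2]))
```

The result has one row per neighborhood point, so k neighborhood points yield k pieces of relative feature information.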

Besides determining the relative feature information between a sampling point and its neighboring points, the convolution weight of the neighboring points needs to be determined.

In this embodiment, the convolution weights are used to perform a non-linear transformation on the features of the point cloud to extract abstract features in the point cloud.

Specifically, relative position information between a sampling point and a neighborhood point of the sampling point is calculated in each subset, a first convolution weight of the neighborhood point is generated according to the relative position information, and the first convolution weight is adjusted by using point cloud density of the neighborhood point to obtain the convolution weight of the neighborhood point. For the convenience of distinction, the convolution weight of the neighborhood point obtained by final calculation may also be referred to as a second convolution weight of the neighborhood point.

When calculating the first convolution weight, the relative position information between the sampling point and its neighborhood point (i.e. the distance between the sampling point and the neighborhood point) can be calculated first; the relative position information is then spliced with the neighborhood point position information, the sampling point position information, and the difference between the neighborhood point position information and the sampling point position information; finally, the channels of the splicing result are mapped through a shared MLP (Multi-Layer Perceptron) so that they are aligned with the channels of the sampling point feature $f_i$.

The first convolution weight may be formulated as follows:

$$\omega_j = \mathrm{MLP}\left(\lVert x_j - x_i \rVert,\; x_i,\; x_j,\; x_j - x_i\right) \tag{1}$$

The first convolution weight is obtained by transforming the point cloud coordinates with an MLP. In this way, the first convolution weight is driven by the point cloud data and obtained by transforming the characteristics of the point cloud, which is beneficial to learning the shape information implied by the point cloud.
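Formula (1) can be sketched as follows. The splicing order follows the formula; the two-layer structure, the ReLU activation, the hidden width, the output dimension `n_channels` and the randomly initialized parameters are all assumptions made for illustration, since the text does not specify the MLP architecture.

```python
import numpy as np

def build_weight_input(center, neighbors):
    """Splice (||x_j - x_i||, x_i, x_j, x_j - x_i) for each neighborhood point; shape (k, 10)."""
    offset = neighbors - center
    dist = np.linalg.norm(offset, axis=1, keepdims=True)
    center_tiled = np.broadcast_to(center, neighbors.shape)
    return np.concatenate([dist, center_tiled, neighbors, offset], axis=1)

def shared_mlp(x, layers):
    """Apply the same (weight, bias) layers to every row of x, with ReLU between layers."""
    for depth, (w, b) in enumerate(layers):
        x = x @ w + b
        if depth < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
n_channels = 8   # hypothetical feature dimension n of f_i
hidden = 16      # hypothetical hidden width
layers = [(rng.normal(size=(10, hidden)), np.zeros(hidden)),
          (rng.normal(size=(hidden, n_channels)), np.zeros(n_channels))]

center = np.array([0.0, 0.0, 0.0])                        # sampling point x_i
neighbors = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 2.0]])  # neighborhood points x_j
mlp_input = build_weight_input(center, neighbors)
first_weights = shared_mlp(mlp_input, layers)             # one row omega_j per neighborhood point
```

Because the same layers are applied to every row, each neighborhood point gets its own weight vector while all neighborhood points share one set of MLP parameters.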

Given that the distribution of the point cloud in space is irregular and uneven, this can result in neural networks not learning local features well. Therefore, in this embodiment, the first convolution weight is further adjusted by using the point cloud density, so as to obtain a second convolution weight.

Fig. 2 is a schematic flow chart of a second convolution weight generation process involved in the point cloud semantic segmentation method provided by the present invention, and as shown in fig. 2, the method specifically includes: firstly, estimating the Density of the periphery of each point in the point cloud by using Kernel Density Estimation (KDE); then the density value is input into a one-dimensional nonlinear transformed multilayer perceptron, and finally the transformed density output by the multilayer perceptron is copied to the first convolution weight omega _j The same channel as the first convolution weight ω _j Point-by-point multiplication to obtain a second convolution weight omega' _j . The corresponding calculation formula is:

$$\omega'_j = \mathrm{tile}\left(\mathrm{MLP}(m_j)\right) \cdot \omega_j \tag{2}$$

where tile denotes copying the vector $n$ times; $m_j \in \mathbb{R}$ denotes the density at neighborhood point $x_j$; and MLP is the aforementioned multilayer perceptron.
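Formula (2) can be illustrated with the short sketch below. It assumes a Gaussian kernel with a fixed bandwidth for the density estimate, evaluates the density over the neighborhood points only (for brevity), and uses a tiny perceptron with placeholder parameters; none of these choices is specified by the text.

```python
import math

def gaussian_kde(points, bandwidth=1.0):
    """Density at every point as the mean Gaussian kernel value over all points."""
    norm = (bandwidth * math.sqrt(2.0 * math.pi)) ** len(points[0])
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
        densities.append(total / (len(points) * norm))
    return densities

def density_mlp(m_j, w1=(1.0, -1.0), b1=(0.0, 0.5), w2=(0.7, 0.3), b2=0.1):
    """Tiny 1 -> 2 -> 1 perceptron with ReLU; parameter values are placeholders."""
    hidden = [max(0.0, w * m_j + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def tile(value, n):
    """Copy a scalar n times so it matches an n-channel weight."""
    return [value] * n

def second_conv_weight(omega_j, m_j):
    """omega'_j = tile(MLP(m_j)) * omega_j, multiplied channel by channel."""
    scale = tile(density_mlp(m_j), len(omega_j))
    return [s * w for s, w in zip(scale, omega_j)]

neighbors = [(1.0, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
omega = [[0.2, -0.4, 0.6, 0.8]] * 3          # assumed first convolution weights
densities = gaussian_kde(neighbors)
omega_prime = [second_conv_weight(w, m) for w, m in zip(omega, densities)]
```

The two nearby points receive a higher density than the isolated one, so their weights are rescaled by a different factor even though the first convolution weights in this toy input are identical.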

As mentioned above, the first convolution weight is obtained by the position coordinate relationship between the neighborhood point and the sampling point and by the MLP, and thus it can be said that the first convolution weight includes the position relationship. Since the distribution characteristics of the point cloud play an important role in capturing the shape of the point cloud, in this embodiment, the convolution weight is weighted by calculating the density of the sampling point, and the obtained second convolution weight includes both the position information and the density information of the point cloud. In the calculation process of the second convolution weight, the nonlinear transformation of the density around each point in the point cloud obtained by kernel density estimation is performed in order to make the network adaptively determine whether to apply density estimation. This is because some points in the point cloud are important and others are relatively unimportant, and by applying a non-linear transformation to the density, the network can adaptively determine which points it should value.

Because the coordinates of each neighborhood point and the distance between each neighborhood point and the sampling point are different, each neighborhood point can obtain respective first convolution weight and second convolution weight. Assuming that there are k neighborhood points in a subset, the second convolution weights of the k neighborhood points in the subset can be obtained through the present step.

However, the MLP used for calculating the first convolution weight and the second convolution weight is the same for different neighboring points in the same subset, i.e. the MLP is a shared MLP. Shared MLP is employed because the problem of point cloud disorder can be solved by its pooling operation and subsequent feature aggregation.

In the above formula for the second convolution weight, the tile operation copies the vector $n$ times, where $n$ is the dimension of the point. This is because the first convolution weight of a neighborhood point has dimension $n$, whereas the density of the point has dimension 1. The density must therefore be copied $n$ times so that the expanded density has the same dimension as the first convolution weight.

After determining the relative feature information between the sampling point and its neighborhood points and the convolution weights of the neighborhood points, the high-order feature information of the neighborhood points in each subset can be calculated according to the two.

The calculation formula of the high-order characteristic information of the neighborhood point is as follows:

$$y_j = \left(y_{jh} \mid h = 1, 2, \ldots, n\right) \in \mathbb{R}^n \tag{4}$$

where $y_j$ denotes the $n$-dimensional high-order feature information of the $j$-th neighborhood point in the subset; $y_{jh}$ denotes the $h$-th dimension of the high-order feature information of the $j$-th neighborhood point in the subset; $\omega'_{jh}$ denotes the $h$-th dimension of the second convolution weight of the $j$-th neighborhood point in the subset; $f_{jh}$ denotes the $h$-th dimension of the feature of the $j$-th neighborhood point in the subset; $f_{ih}$ denotes the $h$-th dimension of the feature of the sampling point in the subset; $p$ is a power parameter; and $s$ is a parameter that determines the sign of a single term.

After the high-order feature information of the neighborhood points in each subset is obtained, feature aggregation can be performed on the high-order feature information of each neighborhood point to obtain the high-order feature information of the sampling points (central points) in the subsets.

The calculation formula of the high-order characteristic information of the sampling points is as follows:

where $f_i'$ is the high-order feature information of sampling point $x_i$; $\delta$ is the activation function; and $\gamma$ is a feature aggregation function that maintains permutation invariance. In the present application, the operation that produces the high-order feature information $f_i'$ of sampling point $x_i$ is also referred to as coverage convolution.

The feature aggregation function may be a summation operation or a pooling operation, which is not limited in this embodiment.

In the foregoing formula for the high-order feature information $f_i'$ of the sampling point, feature aggregation is performed over the same channel of the neighborhood points. If feature aggregation is performed on the features of the individual points and a summation operation is chosen as the aggregation function, the expression for the high-order feature information of the sampling point in the subset becomes:

$$f_i' = \delta(y_i) \tag{6}$$

where $y_i = \left(y_{ih} \mid h = 1, 2, \ldots, n\right) \in \mathbb{R}^n$,

$k$ denotes the number of neighborhood points of sampling point $x_i$.

When f is _i All 0,s =1,p =1, y _ih The expression of (c) is:

this is an expression of the conventional BP neuron model.

When ω' _j When all the components of (1,s =0,p =2, y _ih The expression of (a) is:

this is an expression for a conventional radial basis neuron.

From the above analysis, the coverage convolution proposed by the present application covers the functions of both the BP neuron and the radial basis neuron. This shows that the geometric shapes the coverage convolution can express are complex and varied, and that it therefore has strong expressive capability.
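The per-term formula for $y_{jh}$ is not reproduced in this text, so the two special cases can only be checked against an assumed form. The sketch below assumes each term is $\operatorname{sign}(z)^s\,\lvert z\rvert^p$ with $z = \omega'_{jh}(f_{jh} - f_{ih})$, which is a hypothetical choice consistent with the symbol definitions above (a power parameter $p$ and a sign-controlling parameter $s$); it should not be read as the patent's own formula. Summation is used as the aggregation function, and the activation and any threshold are omitted.

```python
def coverage_term(w, f_j, f_i, s, p):
    """Assumed per-channel term: sign(z)**s * |z|**p with z = w * (f_j - f_i)."""
    z = w * (f_j - f_i)
    sign = (z > 0) - (z < 0)          # -1, 0 or 1
    return (sign ** s) * abs(z) ** p

def coverage_sum(weights, feats, f_i, s, p):
    """Sum the per-neighbor terms channel by channel (summation aggregation)."""
    return [
        sum(coverage_term(w[h], f[h], f_i[h], s, p) for w, f in zip(weights, feats))
        for h in range(len(f_i))
    ]

weights = [[0.5, -1.0], [2.0, 0.25]]   # second convolution weights of 2 neighbors
feats = [[1.0, 2.0], [3.0, -4.0]]      # features of the same 2 neighbors

# f_i = 0, s = 1, p = 1: reduces to a weighted sum of the inputs (BP-like case).
bp = coverage_sum(weights, feats, [0.0, 0.0], s=1, p=1)

# weights = 1, s = 0, p = 2: reduces to a sum of squared differences (RBF-like case).
rbf = coverage_sum([[1.0, 1.0]] * 2, feats, [1.0, 1.0], s=0, p=2)
```

Under this assumed form, `bp` is the per-channel weighted sum of the neighbor features and `rbf` is the per-channel sum of squared differences from the center feature, matching the two limiting behaviors described in the text.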

In the coverage convolution of the present invention, both the relative position between points and the relative features between points are considered (the latter helps capture the semantic relationships within a local subset). The coverage convolution formula has two learnable exponents, $s$ and $p$. The exponent $p$ ensures that the features learned by the coverage convolution are high-order: through the back propagation algorithm, each layer learns high-order feature information of a different order $p$. A network built from the coverage convolution of the present invention is therefore more expressive than a network that uses only first-order features. Meanwhile, $s$ takes the value 0 or 1 and guarantees the directionality of the $p$-order features processed by the coverage convolution formula. The resulting directional coverage convolution establishes a spatial geometric body between the neighborhood points and the central point (sampling point), so that the shape information implicit in the point cloud can be mined.

The above describes a single round of sampling and feature extraction. Because the features of the sampling points obtained from a single round are not of sufficiently high order, multiple rounds of sampling and feature extraction generally need to be performed to obtain a better semantic segmentation result.

When the sampling and feature extraction operations are performed again, a new sampling point (second sampling point) needs to be selected from the first sampling points obtained in the previous sampling operation, and a new neighborhood point needs to be selected from the sampling points obtained in the previous sampling operation by taking the new sampling point as a central point.

For example, suppose the point cloud of the target object contains 10000 points, and 1024 of them were selected as sampling points in the previous round of sampling. In the new round, one of the 1024 points is selected as a sampling point; taking it as the central point, several of the remaining 1023 points (the 1024 points minus the chosen central point) are selected as neighborhood points. Another point is then selected as a sampling point, and its neighborhood points are selected with that sampling point as the central point. This continues until a second number (e.g., 512) of sampling points has been selected and the second number of subsets, each centered on a sampling point, has been obtained. It should be emphasized that a point that has been selected as a neighborhood point can still be selected as a sampling point (central point).

After the subsets with the second quantity are obtained, second high-order features of the sampling points are calculated in each subset according to relative feature information and relative position information between the neighborhood points and the sampling points (here, the high-order features of the sampling points obtained in the first sampling and feature extraction operation process are recorded as first high-order features, the high-order features of the sampling points obtained in the second sampling and feature extraction operation process are recorded as second high-order features, and the like).

It should be noted that in the previous sampling operation, the high-order feature of the central point (sampling point) has been calculated, and in the new sampling operation, both the newly obtained central point (sampling point) and the newly obtained neighborhood point are selected from the sampling points obtained in the previous sampling operation. Therefore, when the relative feature information between the neighborhood point and the central point is calculated in a new round, the relative feature information is calculated on the basis of the high-order features of the points. Since the position information of the points in the point cloud is not changed in the previous operation, the calculation method of the relative position information is not substantially changed.

After the relative feature information and the relative position information are obtained, a first convolution weight of each new neighborhood point is again generated according to the relative position information, and this first convolution weight is adjusted by the point cloud density of the new neighborhood point to obtain the second convolution weight of the new neighborhood point. Then, the high-order feature of each new neighborhood point is calculated according to the second convolution weight of that neighborhood point in the subset and the relative feature information between the new neighborhood point and the new sampling point. Finally, feature aggregation is performed on the high-order features of the new neighborhood points to obtain the second high-order feature of the new sampling point in the subset.

The specific implementation process of the above steps is similar to the implementation process of calculating the first high-order feature of the sampling point, and therefore, the description is not repeated here. Obviously, the second high-order features of the sampling points involved in the new round of sampling operation are higher in dimensionality than the first high-order features of the points, and can reflect semantic information of the points better.

After a new round of sampling operation is finished, whether a termination condition is met or not needs to be judged, if the termination condition is not met, a round of sampling operation needs to be carried out again, a higher-order feature is calculated for a newly generated sampling point, and if the termination condition is met, the processes of sampling and calculating the high-order feature are stopped.

For example, in the present embodiment, the number of rounds of the sampling operation may be set as the termination condition, such as setting the number of rounds to 4 as the termination condition. Then the sampling operation is carried out for four rounds, in the first round of sampling operation, 1024 points are sampled from 10000 points of the point cloud, and the first high-order features are extracted for the 1024 sampling points; in the second round of sampling operation, 512 sampling points are selected from the 1024 sampling points, and second high-order features are extracted for the 512 sampling points; in the third round of sampling operation, 128 sampling points are selected from the 512 sampling points, and a third high-order feature is extracted for the 128 sampling points; in the fourth sampling operation, 64 more sampling points are selected from the aforementioned 128 sampling points, and the fourth high-order feature is extracted for the 64 sampling points. When the number of sampling operations reaches 4 times, the termination condition is met, and therefore the sampling operation is not continued. In the process, extracting first high-order features based on the initial features of the sampling points; extracting a second high-order feature based on the first high-order feature; similarly, extracting a third high-order feature based on the second high-order feature; and extracting fourth high-order features based on the third high-order features. The order of the features of the extracted sampling points is continuously increased along with the repeated sampling operation, and the features with higher orders can reflect the semantic information of the point cloud.

The number of sampling points of the sampling operations of different rounds described above is only for illustration, and those skilled in the art can easily understand that the number of sampling points of each round of sampling operations can be determined according to actual needs in practical applications. However, the number of sampling points in the previous sampling operation should be greater than the number of sampling points in the next sampling operation, and the sampling points involved in the next sampling operation are selected from the sampling points involved in the previous sampling operation.
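The nesting of the sampling rounds can be sketched as follows. The example is scaled down (1000 points and a 256/128/32/16 schedule instead of 10000 points and 1024/512/128/64) so that it runs quickly, and it assumes uniform random sampling and a k-nearest-neighbor grouping with `k = 8`; both are stand-ins, since this passage does not fix the sampling or grouping strategy.

```python
import random

def sample_round(indices, n_samples, rng):
    """Pick n_samples center points from the points kept by the previous round."""
    return rng.sample(indices, n_samples)

def knn(center, candidates, coords, k):
    """k nearest candidates to `center`, excluding the center itself."""
    def sq_dist(idx):
        return sum((a - b) ** 2 for a, b in zip(coords[center], coords[idx]))
    others = [idx for idx in candidates if idx != center]
    return sorted(others, key=sq_dist)[:k]

rng = random.Random(0)
coords = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]

kept = list(range(len(coords)))      # round 0: every point of the cloud
schedule = [256, 128, 32, 16]        # strictly decreasing sample counts
subsets_per_round = []
for n_samples in schedule:
    previous = kept
    kept = sample_round(previous, n_samples, rng)
    # Neighborhood points are drawn from the previous round's points, so a point
    # may serve as a neighbor in one subset and as a center in another.
    subsets = {c: knn(c, previous, coords, k=8) for c in kept}
    subsets_per_round.append(subsets)
```

Each dictionary in `subsets_per_round` maps a center index to the indices of its neighborhood points, and every center of a later round is also a center of the round before it.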

The point cloud semantic segmentation method provided by the invention performs multiple sampling and feature extraction on the point cloud of a target object, selects a sampling point in the process of one-time sampling and feature extraction, then determines neighborhood points around the sampling point, and then calculates the relative feature information, the relative position information and the density information of the neighborhood points between the sampling point and the neighborhood points, thereby calculating the high-order feature information of the sampling point. Because the high-order characteristic information can reflect deep semantic information of the point cloud, the accuracy of the point cloud semantic segmentation result obtained by the point cloud semantic segmentation method is higher.

Based on any one of the above embodiments, in this embodiment, the performing feature aggregation on the high-order feature information of the neighboring point to obtain the high-order feature information of the second sampling point includes:

Because the point cloud is unordered, an aggregation operation must be applied to the features extracted from the neighborhood points in order to guarantee invariance to the permutation of the points. In the prior art, the aggregation operation is generally summation or pooling, with maximum pooling being the most widely used at present. However, maximum pooling only attends to the most significant feature in each dimension and inevitably loses information.

In order to overcome the disadvantages of the feature aggregation methods in the prior art, in the present embodiment, feature aggregation can be achieved by combining average pooling with maximum pooling.

The feature aggregation may be represented by the following expression:

In this expression, avgPool denotes average pooling, maxPool denotes maximum pooling, $x_j$ is the $j$-th neighborhood point in the subset, $y_j$ denotes the high-order feature information of the $j$-th neighborhood point in the subset, $f_i'$ is the high-order feature information of sampling point $x_i$, and $\delta$ is the activation function.

Fig. 3 is a schematic diagram of a feature aggregation process related to the point cloud semantic segmentation method provided by the present invention. As shown in fig. 3, in the feature aggregation process, the spatial information of each neighborhood point is aggregated by using average pooling and maximum pooling, and then the average pooling result and the maximum pooling result are respectively sent to a shared MLP network, and the spatial dimensions of the average pooling result or the maximum pooling result are compressed or expanded by the MLP network; and then, element-by-element summation and combination are carried out on the average pooling operation result and the maximum pooling operation result with the same spatial dimension, and then the aggregation characteristic of the central point (namely the sampling point) of the local neighborhood is finally generated through a Sigmoid activation function.

In the process, the average pooling is responsive to each feature of the neighborhood point, and the maximum pooling is responsive to the most significant feature of each dimension of the neighborhood point; therefore, the characteristics obtained by final aggregation are richer in characteristic content.
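The aggregation path described for fig. 3 can be sketched as follows: pool over the neighborhood points, pass both pooled vectors through the same shared mapping, add them element by element, and apply a Sigmoid. A single dense layer stands in for the shared MLP here, and the identity weights in the example are placeholders chosen so the intermediate values are easy to follow.

```python
import math

def avg_pool(feats):
    """Channel-wise mean over the k neighborhood points."""
    return [sum(col) / len(feats) for col in zip(*feats)]

def max_pool(feats):
    """Channel-wise maximum over the k neighborhood points."""
    return [max(col) for col in zip(*feats)]

def shared_mlp(vec, weights, bias):
    """Single dense layer standing in for the shared MLP (an assumption)."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def aggregate(feats, weights, bias):
    """Sigmoid(MLP(avgPool(y)) + MLP(maxPool(y))), element by element."""
    a = shared_mlp(avg_pool(feats), weights, bias)
    m = shared_mlp(max_pool(feats), weights, bias)
    return [sigmoid(x + y) for x, y in zip(a, m)]

# High-order features y_j of k = 3 neighborhood points with n = 3 channels.
y = [[1.0, -2.0, 0.5], [3.0, 0.0, -0.5], [-1.0, 2.0, 0.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
f_i = aggregate(y, identity, [0.0, 0.0, 0.0])
```

Both pooling operations ignore the order of the rows of `y`, so the aggregated feature does not depend on the order in which the neighborhood points are listed.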

The point cloud semantic segmentation method provided by the invention simultaneously adopts average pooling and maximum pooling in the feature aggregation process, and integrates the advantages of two pooling modes, so that the finally aggregated features are richer in feature content.

In the previous example, it was described how feature aggregation was done combining maximum pooling with average pooling. In this example, it is described how feature aggregation is accomplished using attention pooling.

Specifically, firstly, according to the position information of the neighborhood point, the position information of the second sampling point and the relative position information between the neighborhood point and the second sampling point, the attention coefficient vector of the neighborhood point is determined.

The formula for calculating the attention coefficient vector is:

$$Q_j = \mathrm{MLP}\left(\lVert x_j - x_i \rVert,\ x_i,\ x_j,\ x_j - x_i\right) \tag{10}$$

where $Q_j = \left(Q_{jh} \mid h = 1, 2, \ldots, n\right) \in \mathbb{R}^n$ denotes the attention coefficient vector of the $j$-th neighborhood point.

Then, the attention coefficient vectors of the $k$ neighborhood points are normalized within each channel by a softmax function, namely:

$$Q'_{jh} = \frac{\exp(Q_{jh})}{\sum_{l=1}^{k} \exp(Q_{lh})} \tag{11}$$

The expression of the normalized attention coefficient vector is:

$$Q'_j = \left(Q'_{jh} \mid h = 1, 2, \ldots, n\right) \in \mathbb{R}^n.$$

The high-order feature $y_j$ of each neighborhood point is multiplied, as a vector, by its attention coefficient vector $Q'_j$, and the high-order feature information of the sampling point is obtained after processing by the activation function.

The corresponding calculation formula is as follows:

$$f_i' = \left(f'_{ih} \mid h = 1, 2, \ldots, n\right) \in \mathbb{R}^n;$$

$$f'_{ih} = \delta\left[y_j \cdot (Q'_j)^{T}\right] \in \mathbb{R} \tag{12}$$

where $f_i'$ is the high-order feature information of the sampling point and $f'_{ih}$ is the value of that high-order feature information in the $h$-th dimension.

Formula (12) is the core of attention pooling: on each channel, it performs a weighted summation of the high-order features of the $k$ neighborhood points according to their attention coefficients. In this way, all neighborhood features of each channel participate in the calculation, which avoids the drawbacks that maximum pooling only focuses on the most significant features and average pooling only focuses on the overall features.
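A minimal sketch of this attention pooling is given below. The raw attention scores are passed in directly rather than produced by the MLP of formula (10), and `math.tanh` is only a placeholder default for the activation function, which the text does not specify.

```python
import math

def softmax_per_channel(scores):
    """Normalise k score vectors so each channel sums to 1 over the k points."""
    columns = []
    for col in zip(*scores):
        top = max(col)                       # subtract the max for stability
        exps = [math.exp(v - top) for v in col]
        total = sum(exps)
        columns.append([e / total for e in exps])
    return [list(row) for row in zip(*columns)]

def attention_pool(feats, scores, activation=math.tanh):
    """Per channel: weighted sum of the k neighbor features, then activation."""
    attn = softmax_per_channel(scores)
    n = len(feats[0])
    return [activation(sum(y[h] * q[h] for y, q in zip(feats, attn)))
            for h in range(n)]

# k = 3 neighborhood points, n = 2 channels.
y = [[1.0, 4.0], [3.0, 0.0], [5.0, 2.0]]     # high-order features y_j
q = [[0.0, 2.0], [0.0, 0.0], [0.0, -1.0]]    # raw attention scores Q_j
f_i = attention_pool(y, q, activation=lambda v: v)
```

In channel 0 the scores are equal, so the result is the plain average of that channel; in channel 1 the first neighbor has the largest score, so the result is pulled toward its feature while the other neighbors still contribute.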

According to the point cloud semantic segmentation method, in the feature aggregation process, attention vectors are utilized for feature aggregation, all neighborhood features of each channel are involved in calculation in the feature aggregation mode, and the defects that maximum pooling only pays attention to the most significant features and average pooling only pays attention to the overall features are avoided.

Based on any one of the foregoing embodiments, in this embodiment, the generating a first convolution weight of the neighborhood point according to the relative position information includes:

In the previous embodiment, when calculating the first convolution weight, the relative position information between the sampling point and its neighborhood point is calculated first; the relative position information is then concatenated with the neighborhood point position information, the sampling point position information, and the difference between the neighborhood point position information and the sampling point position information; finally, a shared MLP maps the channels of the concatenated result so that they are aligned with the channels of the sampling point feature $f_i$.

In the above embodiment, the shared MLP uses a single convolution kernel. The use of a single convolution kernel limits the expressive power of the network since different samples have different responses to different convolution kernels. Thus, in this embodiment, the shared MLP employs a plurality of parallel convolution kernels, any one of which has a respective attention coefficient.

The plurality of parallel convolution kernels in the shared MLP each perform feature mapping on the concatenated result, and the feature mapping results are then combined based on the attention coefficients corresponding to the different convolution kernels to obtain the first convolution weight of the neighborhood point.

The corresponding calculation formula can be expressed as:

In this formula, the MLP has $k$ parallel convolution kernels; $\mathrm{MLP}^z$ denotes the MLP using the $z$-th convolution kernel, where $1 \le z \le k$; and $a_z$ denotes the attention coefficient corresponding to the $z$-th convolution kernel, with $a_1 + a_2 + \cdots + a_k = 1$.

The convolution result computed by the MLP with the $z$-th convolution kernel is weighted by its attention coefficient, and the results for the $k$ convolution kernels are added together.

The attention coefficient can be calculated by the following formula:

$$a = (a_1, a_2, \ldots, a_k) = \mathrm{softmax}\left(\mathrm{MLP}\left(\lVert x_j - x_i \rVert,\ x_i,\ x_j,\ x_j - x_i\right)\right) \tag{15}$$

The point cloud semantic segmentation method provided by the invention uses an MLP with a plurality of convolution kernels when calculating the convolution weights of the neighborhood points, thereby expanding the expressive capability of the network.
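The combination of parallel kernels can be sketched as follows. Each kernel is reduced to a single bias-free dense layer, the attention branch of formula (15) is likewise a single dense layer followed by softmax, and all numeric weights are placeholders; these simplifications are assumptions made only for illustration.

```python
import math

def dense(vec, weights):
    """Bias-free dense layer: one row of `weights` per output channel."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def multi_kernel_weight(u, kernels, attn_weights):
    """Blend the outputs of parallel kernels with input-dependent coefficients."""
    a = softmax(dense(u, attn_weights))            # one coefficient per kernel
    outputs = [dense(u, kernel) for kernel in kernels]
    n = len(outputs[0])
    omega = [sum(a_z * out[h] for a_z, out in zip(a, outputs)) for h in range(n)]
    return omega, a

# u = (||x_j - x_i||, x_i, x_j, x_j - x_i) for x_i = (0,0,0), x_j = (1,0,0).
u = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
kernels = [
    [[1.0] * 10, [0.0] * 10],   # kernel 1: writes the input sum to channel 0
    [[0.0] * 10, [1.0] * 10],   # kernel 2: writes the input sum to channel 1
]
attn_weights = [[0.0] * 10, [0.0] * 10]  # zero logits give equal coefficients
omega, a = multi_kernel_weight(u, kernels, attn_weights)
```

Because the coefficients are computed from the same input as the kernels, different neighborhood points can emphasise different kernels while the coefficients always sum to 1.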

Based on any one of the above embodiments, in this embodiment, the obtaining, according to the high-order feature information of the multiple sampling points, the high-order feature information of each point in the point cloud of the target object includes:

Performing reverse interpolation, feature connection and feature mapping on the data of the plurality of sampling points to obtain the high-order feature information of each point in the point cloud of the target object.

When semantic segmentation is performed on the point cloud, point-by-point detection needs to be performed on each point in the point cloud. In the previous step, only the high-order feature information of a plurality of sampling points (i.e. some points in the point cloud) is obtained. Therefore, it is necessary to propagate the high-order feature information of the sampling points from a plurality of sampling points (i.e., some points in the point cloud) to all the points in the point cloud, so as to obtain the high-order feature information of each point in the point cloud. In the present embodiment, a propagation process of high-order feature information is explained.

Firstly, a plurality of finally obtained sampling points are subjected to reverse interpolation to obtain interpolation points and characteristic information of the interpolation points.

Fig. 4 is a schematic diagram of interpolation involved in the point cloud semantic segmentation method provided by the present invention, and as shown in fig. 4, when performing interpolation operation, first determine the position of an interpolation point, then determine m points nearest to the interpolation point according to the position of the interpolation point, and perform weighted summation on feature information of the m points nearest to the interpolation point, thereby obtaining feature information of the interpolation point.

The corresponding calculation is as follows:

where $f^{(h)}(x)$ is the feature of the $h$-th channel of the interpolation point; $f_j^{(h)}$ is the feature of the $h$-th channel of the $j$-th neighbor of the interpolation point; $d(x, x_j)$ is the distance between the interpolation point and the $j$-th of its $m$ nearest points; and $d_j(x)$ is the weight applied to the feature of the $j$-th nearest neighbor, i.e., the proportion with which that neighbor of the point to be interpolated enters the weighted sum. As can be seen from equation (17), the farther a point is from the interpolation point, the smaller its feature contribution to the interpolation point $x$.
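The interpolation step can be sketched as follows. Since the weighting formula itself is not reproduced in this text, the example assumes normalized inverse-distance weights with an exponent `power = 2` and a small `eps` to avoid division by zero; this choice is an assumption that merely agrees with the stated property that farther points contribute less.

```python
import math

def interpolate(x, points, feats, m=3, power=2, eps=1e-8):
    """Weighted average of the features of the m points nearest to x."""
    nearest = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p))), idx)
        for idx, p in enumerate(points)
    )[:m]
    weights = [1.0 / (d ** power + eps) for d, _ in nearest]  # assumed weighting
    total = sum(weights)
    n = len(feats[0])
    return [
        sum(w * feats[idx][h] for w, (_, idx) in zip(weights, nearest)) / total
        for h in range(n)
    ]

# Four sampling points with a single feature channel each.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (9.0, 9.0, 9.0)]
feats = [[1.0], [3.0], [5.0], [100.0]]

f_mid = interpolate((1.0, 0.0, 0.0), points, feats)   # between the first two points
f_on = interpolate((0.0, 0.0, 0.0), points, feats)    # coincides with a sample
```

With `m = 3` the distant fourth point is ignored entirely, and an interpolation point that coincides with a sampling point essentially inherits that sampling point's feature.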

After the interpolation points and their feature information are obtained, the upsampled features (interpolated features) are connected, via skip connections, with the features from the layer of the same resolution, and feature mapping is performed through a shared MLP. This sequence of reverse interpolation, feature connection and shared-MLP feature mapping is repeated until the high-order feature information of each point in the point cloud of the target object is obtained.

Through the above operations, the fine-grained category features of each point in the point cloud can be restored layer by layer. How to perform the skip connection and how to perform feature mapping using a shared MLP are common knowledge to those skilled in the art, and are therefore not described again here.
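For orientation only, one propagation step (feature connection followed by shared-MLP mapping) might look like the sketch below. Concatenation as the connection operation, the single dense layer with ReLU, and all numeric values are assumptions for illustration.

```python
def shared_mlp(feature, weights, bias):
    """Same dense layer + ReLU applied independently to each point's feature."""
    return [max(0.0, sum(w * v for w, v in zip(row, feature)) + b)
            for row, b in zip(weights, bias)]

def propagate(interpolated, skip, weights, bias):
    """Concatenate upsampled and same-resolution features per point, then map."""
    return [shared_mlp(up + sk, weights, bias) for up, sk in zip(interpolated, skip)]

interpolated = [[0.5, 1.0], [2.0, -1.0]]   # features upsampled from the coarser layer
skip = [[1.0], [3.0]]                      # features kept at the same resolution
weights = [[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
bias = [0.0, 0.0]
fused = propagate(interpolated, skip, weights, bias)
```

Repeating this step once per resolution level carries the features from the coarsest set of sampling points back to every point of the original cloud.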

According to the point cloud semantic segmentation method, the high-order feature information of each point in the point cloud of the target object is obtained through the high-order feature information of the plurality of sampling points through reverse interpolation, feature connection and feature mapping, and the high-order feature information can reflect deep semantic information of the point cloud, so that the accuracy of a point cloud semantic segmentation result obtained by the point cloud semantic segmentation method is higher.

The point cloud semantic segmentation device provided by the invention is described below, and the point cloud semantic segmentation device described below and the point cloud semantic segmentation method described above can be referred to in a corresponding manner.

Fig. 5 is a schematic diagram of a point cloud semantic segmentation apparatus provided by the present invention, and as shown in fig. 5, the point cloud semantic segmentation apparatus provided by the present invention includes:

a sampling and feature extraction module 510, configured to sample and extract features of a point cloud of a target object, so as to obtain high-order feature information of multiple sampling points; the high-order characteristic information of the sampling point is obtained based on relative characteristic information and relative position information between the sampling point and surrounding neighborhood points of the sampling point and density information of the neighborhood points;

the semantic segmentation module 520 is configured to obtain high-order feature information of each point in the point cloud of the target object according to the high-order feature information of the plurality of sampling points; and performing semantic segmentation on the point cloud of the target object according to the high-order characteristic information of each point in the point cloud of the target object.

The point cloud semantic segmentation device provided by the invention can obtain high-order characteristic information of a sampling point on the basis of relative characteristic information, relative position information and density information of neighborhood points between the sampling point and the neighborhood points by sampling and extracting characteristics of the point cloud; further, the high-order characteristic information of the sampling points is transmitted to each point in the point cloud, so that the semantic segmentation of the point cloud can be realized according to the high-order characteristic information of each point in the point cloud. The high-order characteristic information can reflect deep semantic information of the point cloud, so that the accuracy of a point cloud semantic segmentation result obtained by the point cloud semantic segmentation device is higher.

Fig. 6 is a schematic structural diagram of an electronic device provided in the present invention, and as shown in fig. 6, the electronic device may include: a processor (processor) 610, a communication Interface (Communications Interface) 620, a memory (memory) 630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a point cloud semantic segmentation method comprising:

sampling and extracting characteristics of point clouds of a target object to obtain high-order characteristic information of a plurality of sampling points; the high-order characteristic information of the sampling point is obtained based on relative characteristic information and relative position information between the sampling point and surrounding neighborhood points of the sampling point and density information of the neighborhood points;

In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions, when the program instructions are executed by a computer, the computer is capable of executing the point cloud semantic segmentation method provided by the above methods, the method includes:

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor is implemented to perform the point cloud semantic segmentation method provided above, the method comprising:

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented in hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in some parts of the embodiments.
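As a non-limiting illustration of such a software implementation, the following Python sketch shows one way the quantities named in this document could be computed: relative position and relative feature information between a sampling point and its neighborhood points, density information of the neighborhood points, aggregation into a feature for the sampling point, and propagation of sampling-point features to every point of the point cloud. All function names, the k-nearest-neighbor density estimate, the closed-form weight (a stand-in for learned convolution weights), the channel-wise maximum aggregation, and the inverse-distance interpolation are assumptions made for illustration only and are not taken from the embodiments.

```python
import math

def knn(points, query, k):
    """Indices of the k points nearest to `query` (brute-force search)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]

def local_density(points, idx, k):
    """Assumed density estimate: inverse of the mean distance from point `idx`
    to its k nearest other points (a larger value means a denser region)."""
    # Drop the nearest hit, which has distance 0 and is normally the point itself.
    nbrs = knn(points, points[idx], k + 1)[1:]
    if not nbrs:
        raise ValueError("need at least two points to estimate density")
    mean_dist = sum(math.dist(points[idx], points[j]) for j in nbrs) / len(nbrs)
    return 1.0 / (mean_dist + 1e-8)

def sample_point_feature(points, feats, sample_idx, k=3):
    """Aggregate a feature for one sampling point from its k neighborhood points."""
    centre, centre_feat = points[sample_idx], feats[sample_idx]
    weighted = []
    for j in knn(points, centre, k):
        rel_pos = [p - c for p, c in zip(points[j], centre)]       # relative position
        rel_feat = [f - c for f, c in zip(feats[j], centre_feat)]  # relative feature
        # Hypothetical weight: decays with distance and is scaled by inverse
        # density. A real system would learn this weight from rel_pos.
        w = math.exp(-math.hypot(*rel_pos)) / local_density(points, j, k)
        weighted.append([w * f for f in rel_feat])
    # Assumed aggregation: channel-wise maximum over the neighborhood.
    return [max(channel) for channel in zip(*weighted)]

def interpolate_feature(points, sample_ids, sample_feats, query_idx, k=2):
    """Assumed propagation step: inverse-distance-weighted interpolation of the
    features of the k nearest sampling points onto point `query_idx`."""
    q = points[query_idx]
    near = sorted(range(len(sample_ids)),
                  key=lambda s: math.dist(points[sample_ids[s]], q))[:k]
    w = [1.0 / (math.dist(points[sample_ids[s]], q) + 1e-8) for s in near]
    total = sum(w)
    dim = len(sample_feats[0])
    return [sum(wi * sample_feats[s][c] for wi, s in zip(w, near)) / total
            for c in range(dim)]

# Toy usage: four points with two-channel input features, two sampling points.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
feats = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [4.0, 4.0]]
sample_ids = [0, 3]
sample_feats = [sample_point_feature(points, feats, i) for i in sample_ids]
per_point = [interpolate_feature(points, sample_ids, sample_feats, i)
             for i in range(len(points))]
print(per_point)
```

In this sketch the per-point features in `per_point` would then be passed to a classifier to assign a semantic label to each point; that classifier is omitted because its form is not specified here.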

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A point cloud semantic segmentation method, characterized by comprising the following steps:

obtaining high-order characteristic information of each point in the point cloud of the target object according to the high-order characteristic information of the plurality of sampling points; performing semantic segmentation on the point cloud of the target object according to the high-order characteristic information of each point in the point cloud of the target object;

wherein sampling and extracting features of the point cloud of the target object to obtain the high-order feature information of the plurality of sampling points comprises the following steps:

2. The point cloud semantic segmentation method according to claim 1, wherein the determining the high-order feature information of the neighborhood points according to the relative feature information between the neighborhood points and the second sampling points and the convolution weights of the neighborhood points comprises:

calculating the density of the neighborhood points in the point cloud;

and calculating high-order characteristic information of the neighborhood points according to the relative characteristic information between the neighborhood points and the second sampling points and the convolution weights of the neighborhood points.

3. The point cloud semantic segmentation method according to claim 1, wherein the performing feature aggregation on the high-order feature information of the neighborhood points to obtain the high-order feature information of the second sampling point includes:

4. The point cloud semantic segmentation method according to claim 1, wherein the performing feature aggregation on the high-order feature information of the neighborhood points to obtain the high-order feature information of the second sampling point includes:

5. The point cloud semantic segmentation method according to claim 2, wherein the generating of the first convolution weight of the neighborhood point according to the relative position information includes:

6. The point cloud semantic segmentation method according to claim 1, wherein the obtaining the high-order feature information of each point in the point cloud of the target object according to the high-order feature information of the plurality of sampling points comprises:

7. A point cloud semantic segmentation apparatus, comprising:

the semantic segmentation module is used for obtaining high-order characteristic information of each point in the point cloud of the target object according to the high-order characteristic information of the plurality of sampling points; performing semantic segmentation on the point cloud of the target object according to the high-order characteristic information of each point in the point cloud of the target object;

wherein sampling and extracting features of the point cloud of the target object to obtain the high-order characteristic information of the plurality of sampling points comprises the following steps:

8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the point cloud semantic segmentation method of any one of claims 1 to 6.

9. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the point cloud semantic segmentation method according to any one of claims 1 to 6.