CN118018766A - Point cloud coding method based on geometric sampling - Google Patents

Point cloud coding method based on geometric sampling

Info

Publication number
CN118018766A
Authority
CN
China
Prior art keywords
point
point cloud
geometric
points
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311873400.9A
Other languages
Chinese (zh)
Inventor
李婕
徐迪
李佳
王金华
马红军
王兴伟
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学
Priority to CN202311873400.9A priority Critical patent/CN118018766A/en
Publication of CN118018766A publication Critical patent/CN118018766A/en
Pending legal-status Critical Current


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a point cloud coding method based on geometric sampling, relating to the technical field of information acquisition and network transmission. The method comprises the steps of geometrically downsampling the original point cloud by a voxel grid downsampling method, sampling it to a sparse point cloud; performing the encoding, storage, and decoding processes of the point cloud based on MPEG G-PCC; and then adding geometric upsampling, in which the sparse point cloud is geometrically upsampled by a point cloud nearest-neighbor interpolation method and reconstructed into a dense point cloud similar to the original, completing point cloud reconstruction. A smaller binary code stream file and a higher compression ratio can be obtained, and the codec time can also be significantly reduced. The method provided by the invention performs well in terms of point cloud encoding and decoding performance and can cope with situations where network bandwidth is insufficient.

Description

Point cloud coding method based on geometric sampling
Technical Field
The invention relates to the technical field of information acquisition and network transmission, in particular to a point cloud coding method based on geometric sampling.
Background
With the comprehensive and diversified development of three-dimensional information acquisition technology, sensor technology, and algorithm applications, the utilization rate and service quality of point cloud information resources are gradually improving while the cost of information collection continues to fall, so that fine visualization of three-dimensional environments can be realized and applied in emerging industries such as augmented reality, remote communication, intelligent transportation, and the digital earth. However, because point clouds are massive, unstructured, and of uneven density, their storage and transmission pose many problems. A point cloud model contains a large number of points, typically hundreds of thousands to tens of millions; without compression, transmitting a model at 30 frames per second with one million points per frame requires a bandwidth of approximately 3.6 gigabits per second. This places great strain on storage capacity and network transmission bandwidth. Therefore, point cloud compression coding with low bit rate and low distortion has great theoretical significance and practical value when storage capacity and network transmission bandwidth are severely limited.
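The 3.6 Gbit/s figure above can be checked with a quick calculation; the per-point size is an assumption not stated in the text (roughly 15 bytes per point, e.g. three 4-byte coordinates plus 3 bytes of color):

```python
# Rough bandwidth estimate for uncompressed point cloud streaming.
# Assumption (not from the source): ~15 bytes per point
# (3 x 4-byte float coordinates + 3 x 1-byte RGB attributes).
points_per_frame = 1_000_000
frames_per_second = 30
bytes_per_point = 15

bits_per_second = points_per_frame * frames_per_second * bytes_per_point * 8
gigabits_per_second = bits_per_second / 1e9   # -> 3.6
```

Under these assumptions the uncompressed stream is exactly 3.6 Gbit/s, matching the figure cited in the background.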
The literature "W. Zhu et al., 'Lossy Point Cloud Geometry Compression via Region-Wise Processing,' in IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 12, pp. 4575-4589, Dec. 2021" proposes a method for lossy point cloud geometry compression via region-wise processing, which exploits inter-region redundancy and similarity to achieve efficient lossy compression of point cloud geometry. A given point cloud is first divided into a number of local regions; then the regions are grouped into several discriminative clusters, minimizing the similarity between clusters while maximizing the similarity within each cluster, and a reference region with the maximum similarity score relative to the other regions is set in each cluster; finally, the reference region is encoded. The method segments all point cloud data to achieve compression; region-wise processing preserves geometric detail well and guarantees structural consistency among local regions, but it requires substantial bandwidth support. If the number of points to be encoded is too large or the network bandwidth is insufficient, the large amount of generated code stream information makes storage and transmission difficult, so encoding efficiency is low and the final visual reconstruction is affected.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides the point cloud coding method based on geometric sampling, which can not only cope with the shortage of network bandwidth, but also ensure the final reconstruction quality.
In order to solve the technical problems, the invention adopts the following technical scheme:
A point cloud coding method based on geometric sampling comprises geometric downsampling, a point cloud codec, and geometric upsampling. Geometric downsampling is performed on the original point cloud by a voxel-grid-based downsampling method, sampling it to a sparse point cloud, reducing the number of points to be encoded, and relieving network bandwidth pressure. The encoding, storage, and decoding processes of the point cloud are performed based on MPEG G-PCC, where MPEG G-PCC is the geometry-based point cloud compression (G-PCC) framework proposed by MPEG. Geometric upsampling is then added: the sparse point cloud is geometrically upsampled by a point cloud nearest-neighbor interpolation method and reconstructed into a dense point cloud similar to the original point cloud, completing point cloud reconstruction.
Further, the geometric downsampling specifically comprises the following steps:
step 1.1: placing the original point cloud data into a 3D voxel grid, and marking the voxel where each point is located as accessed;
Step 1.2: traversing all voxels, selecting a point closest to the center of the voxel as a representative point for the accessed voxels, and reserving the point which is not accessed voxels;
Step 1.3: and forming a new point cloud by all the selected representative points, and taking the new point cloud as a point cloud result after downsampling.
Further, the specific method for performing the point cloud coding based on the MPEG G-PCC comprises the following steps:
step 2.1: preprocessing the point cloud geometric information, including coordinate transformation, quantization and repeated point removal;
Firstly, converting original coordinates of the point cloud into normalized coordinates which can be processed by a geometric encoder; carrying out certain scale quantization on the geometric coordinates of the point cloud so as to achieve the effect of geometric lossy coding; after quantization and rounding, duplicate points are deleted, and only one point exists in the same geometric position;
step 2.2: the point cloud geometric coding is divided into octree coding and prediction tree coding;
step 2.3: color space conversion, converting RGB color space into YCbCr or YCoCg color space, improving the correlation of data;
Step 2.4: re-coloring;
Step 2.5: performing region self-adaptive hierarchical transformation;
Dividing an image into small blocks, carrying out hierarchical transformation on each small block, representing the small blocks as coefficients composed of local basis functions, and quantizing and encoding the coefficients to realize image compression;
Step 2.6: generating a level of detail;
Step 2.7: a predictive transform and a lifting transform;
step 2.8: residual quantization and arithmetic coding;
and comparing the original point cloud with the reconstructed point cloud, calculating residual information between each point, namely the difference between the actual coordinates and the reconstructed coordinates, discretizing the residual value, converting the discretized residual value into a binary code stream, and carrying out arithmetic coding on the non-zero quantized residual.
Further, the specific method of octree coding in step 2.2 is as follows:
The whole point cloud is placed in a cubic bounding box, and the point cloud is recursively divided to construct an octree: each cube is uniformly divided into eight subcubes, which correspond to eight child nodes. Each subcube is allocated one flag bit, where '1' indicates that the subcube is occupied and can be further divided, and '0' indicates that it is empty and is not divided. The flag bits of the eight child nodes of a parent node record the actual occupancy inside the cube and together form one byte of occupancy information for coding. If the number of points in a non-empty subcube is greater than 1, it continues to be divided until only one point remains or a predetermined depth is reached;
The specific method for coding the prediction tree in the step 2.2 is as follows:
Dividing the point cloud data into small blocks, each comprising a plurality of points; using the previous data point as a reference, the position and attribute information of the current point is predicted, and the current point is coded according to the prediction error.
Further, in the step 2.3, when the lifting transform coding is selected, converting the RGB color space into the YCbCr color space; when predictive transform lossless coding is selected, the RGB color space is converted into YCoCg color space.
Further, the specific method in the step 2.4 is as follows:
Step 2.4.1: for each point in the reconstructed point cloud Searching the nearest point from the input point cloud, and finding the nearest neighbor/>, in the input point cloud, of X n And assign its attributes to/>By finding nearest neighbors/>Will/>Mapping the attributes of the point cloud to corresponding points in the input point cloud, so as to obtain a point cloud model with the same attributes;
Step 2.4.2: for each point X n in the input point cloud, searching the reconstructed point cloud for a point closest to X n if there is more than one point in the input point cloud that is the nearest neighbor in the reconstructed point cloud These points form a set of points Q + (n) with the nearest neighbor as the core;
Step 2.4.3: if Q + (n) is an empty set, let the reconstruction point Attribute/>If Q + (n) is not an empty set, then calculate the reconstruction point/>Attribute/>The formula is as follows:
Wherein H (n) is the number of elements of the point set Q + (n); Is an element in the point set Q + (n).
Further, the specific method of the step 2.6 is as follows:
step 2.6.1: setting all points in the point cloud to be in an unaccessed state, and adding the points to an unaccessed point set NV; meanwhile, defining a set V of points which have been accessed as an empty set;
Step 2.6.2: for the first iteration l=0, find the first point in the point set NV and add it to the first refinement level R 0 and the point set V, and at the same time delete the point from NV to show that this point has been accessed and added to the current LOD set;
Step 2.6.3: for other iterations 1.ltoreq.l.ltoreq.L-1, traversing all points in the non-access point set NV, calculating the minimum distance D between the current point and the point set V, and if D.ltoreq.d l, removing the current point from the NV and adding the current point into the first refinement level R l and the point set V; if D < D l, ignoring the current point and continuing to traverse the non-access point set NV; wherein d l is the minimum distance threshold; after all points in the non-access point set NV are traversed, the refinement level R l is generated; let l=l+1, repeat the above steps to continue generating the next refinement level until the NV is terminated when empty.
Further, the prediction transformation in the step 2.7 specifically includes:
Selecting already-coded points to predict the current point according to the reordering of points in the LOD generation process, and searching for the nearest neighbors of the current point only among the previous i−1 points; for the i-th point to be encoded, its predicted attribute value is
    â_i = ( Σ_{j∈N_i} (1/δ_j²) · ã_j ) / ( Σ_{j∈N_i} (1/δ_j²) )
where â_i is the predicted attribute value of the i-th point to be encoded; ã_j is the attribute value of the j-th neighboring point; N_i is the set of k spatially adjacent points of the i-th point; and δ_j is the Euclidean distance from the i-th point to its j-th neighbor;
the lifting transformation in the step 2.7 comprises three parts, namely segmentation, prediction and updating;
The dividing part performs space division on the input point cloud data and divides the input point cloud data into a high-level point cloud H (N) and a low-level point cloud L (N); the prediction link predicts the attribute of the high-level point cloud by using the attribute of the low-level point cloud, and the attribute value of a certain point is predicted by using the attribute value of a nearby point due to the local correlation of the high-level point cloud and the low-level point cloud in space under the action of a reasonable predictor, so as to obtain a prediction residual error; in the updating link, updating is carried out according to the prediction residual error, and the attribute information of the original low-level point cloud is improved.
Further, the specific method for geometric upsampling is as follows:
step 3.1: preprocessing point cloud data, and calculating normal vector of each point in the point cloud;
Step 3.2: traversing each point in the original point cloud;
Step 3.3: for each point, searching for its adjacent point;
step 3.4: after finding the adjacent point, generating a new point by using an interpolation method;
step 3.5: adding the newly generated points into the up-sampled point cloud;
step 3.6: repeating steps 3.3 to 3.5 until all the original points are processed.
The beneficial effects of adopting the above technical scheme are as follows. In the point cloud coding method based on geometric sampling, geometric downsampling is added before encoding: the original point cloud is downsampled to a sparse point cloud with fewer points, reducing the subsequent encoding burden. The sparse point cloud is sent to the point cloud encoder, and the encoding, storage, and decoding processes are performed in an MPEG G-PCC-based coding mode. Finally, the sparse point cloud is reconstructed via a geometric upsampling module into a dense point cloud that approximates the original point cloud. Compared with encoding the point cloud directly, the sampling process greatly reduces the number of points to be encoded, improves coding efficiency, and relieves network bandwidth pressure. Compared with the traditional MPEG G-PCC method, the proposed method obtains a smaller binary code stream file and a higher compression ratio. Compared with the TMC13 algorithm, the method significantly reduces the binary code stream size, by two-thirds on average, and roughly triples the compression ratio. In addition, the method also markedly reduces the codec time: the overall encoding and decoding time drops from 10.55 s to 2.98 s, and the actual network transmission time is also shorter, by 8 ms on average. In summary, the method provided by the invention performs well in terms of point cloud encoding and decoding performance and can cope with situations where network bandwidth is insufficient.
Drawings
FIG. 1 is a diagram of a G-PCC encoding framework provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a lifting transformation provided in an embodiment of the present invention;
fig. 3 is a schematic diagram of arithmetic coding according to an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
The method of the present embodiment includes three modules: geometric downsampling, point cloud codecs, and geometric upsampling. The number of points to be encoded is reduced through a geometric downsampling process, the network bandwidth pressure is relieved, geometric upsampling is added after encoding, and point cloud reconstruction is completed through upsampling of sparse point clouds, so that good decoding quality is provided, and the method is specifically described below.
Step 1: and geometrically downsampling the original point cloud by adopting a voxel grid downsampling-based method.
Step 1.1: placing the original point cloud data into a 3D voxel grid, and marking the voxel where each point is located as accessed;
Step 1.2: traversing all voxels, selecting a point closest to the center of the voxel as a representative point for the accessed voxels, and reserving the point which is not accessed voxels;
Step 1.3: and forming a new point cloud by all the selected representative points, and taking the new point cloud as a point cloud result after downsampling.
The voxel grid downsampling can effectively compress the point cloud data volume, improve the calculation efficiency and the storage efficiency, and can keep better point cloud structure information.
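Steps 1.1 to 1.3 can be sketched as follows (an illustrative sketch using a hash of voxel indices, not the patent's exact implementation):

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Keep, per occupied voxel, the point closest to the voxel center
    (steps 1.1-1.3). Empty voxels contribute no representative point."""
    idx = np.floor(points / voxel_size).astype(np.int64)   # voxel index per point
    reps = {}                                              # voxel -> (dist, point)
    for p, key in zip(points, map(tuple, idx)):
        center = (np.array(key) + 0.5) * voxel_size        # center of this voxel
        d = np.linalg.norm(p - center)
        if key not in reps or d < reps[key][0]:
            reps[key] = (d, p)                             # closest point wins
    return np.array([p for _, p in reps.values()])

pts = np.array([[0.1, 0.1, 0.1],
                [0.4, 0.5, 0.5],    # same voxel as the point above
                [1.2, 0.2, 0.3]])   # a second voxel
sparse = voxel_downsample(pts, voxel_size=1.0)   # two representative points
```

With a voxel size of 1.0, the first two points share a voxel and only the one nearer the voxel center (0.4, 0.5, 0.5) is kept, so three points collapse to two.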
Step2: the flow chart is shown in fig. 1, based on point cloud encoding of MPEG G-PCC. The MPEG is commonly referred to as Moving Picture Expert Group, the "moving Picture experts group", the G-PCC is commonly referred to as Geometry-based Point Cloud Compression, and the "Geometry-based Point cloud compression".
Step 2.1: the point cloud geometric information preprocessing comprises coordinate transformation, quantization and repeated point removal.
Step 2.1.1: firstly, converting original coordinates of the point cloud into normalized coordinates which can be processed by a geometric encoder;
Step 2.1.2: carrying out certain scale quantization on the geometric coordinates of the point cloud so as to achieve the effect of geometric lossy coding;
step 2.1.3: after quantization and rounding, a plurality of neighboring points are quantized to the same point, so that repeated points need to be deleted, and only one point exists in the same geometric position.
Step 2.2: point cloud geometric coding is divided into octree coding and prediction tree coding.
The specific method for octree coding is as follows:
The whole point cloud is placed in a cubic bounding box, and the point cloud is recursively divided to construct an octree: each cube is uniformly divided into eight subcubes, which correspond to eight child nodes. Each subcube is allocated one flag bit, where '1' indicates that the subcube is occupied and can be further divided, and '0' indicates that it is empty and is not divided. The flag bits of the eight child nodes of a parent node record the actual occupancy inside the cube and together form one byte of occupancy information for coding. If the number of points in a non-empty subcube is greater than 1, it continues to be divided until only one point remains or a predetermined depth is reached.
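The occupancy-byte scheme above can be illustrated with a short sketch for a single level of division (the bit-to-subcube mapping chosen here is an assumption; the real codec also entropy-codes these bytes with context modeling):

```python
import numpy as np

def occupancy_byte(points, origin, size):
    """One level of octree division: return the 8-bit occupancy code of a
    cube (origin, size) split into eight subcubes. Bit layout assumption:
    child index bits encode the (x, y, z) half chosen, low bit = z."""
    half = size / 2.0
    code = 0
    for bit in range(8):
        off = np.array([(bit >> 2) & 1, (bit >> 1) & 1, bit & 1]) * half
        lo, hi = origin + off, origin + off + half
        occupied = np.any(np.all((points >= lo) & (points < hi), axis=1))
        if occupied:
            code |= 1 << bit          # '1' = subcube contains points
    return code

pts = np.array([[0.1, 0.1, 0.1],      # falls in child 0
                [0.9, 0.9, 0.9]])     # falls in child 7
code = occupancy_byte(pts, origin=np.array([0.0, 0.0, 0.0]), size=1.0)
```

Only children 0 and 7 are occupied, so the byte is 0b10000001 = 129; recursion would then divide just those two non-empty subcubes.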
The specific method for predicting the tree coding is as follows:
Dividing the point cloud data into small blocks, each comprising a plurality of points; using the previous data point as a reference, the position and attribute information of the current point is predicted, and the current point is coded according to the prediction error.
Step 2.3: color space conversion.
Converting the RGB color space to YCbCr or YCoCg color space improves the correlation of the data, making the data easier to compress. In this embodiment, the RGB color space is converted into the YCbCr color space when the lifting transform coding is selected, and the RGB color space is converted into the YCoCg color space when the predictive transform lossless coding is selected, so that the compression efficiency of the attribute can be significantly improved.
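For the lossless case, the reversible YCoCg-R integer transform is one standard realization of the RGB-to-YCoCg conversion mentioned above (a sketch; the embodiment does not spell out which YCoCg variant it uses):

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R transform: integer, exactly invertible (lossless)."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R transform: exactly recovers the original RGB."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every step is an integer add/shift pair, the round trip is bit-exact, which is what makes the transform suitable for predictive-transform lossless coding.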
Step 2.4: and (5) re-coloring.
Step 2.4.1: for each point in the reconstructed point cloudSearching the nearest point from the input point cloud, and finding the nearest neighbor/>, in the input point cloud, of X n And assign its attributes to/>By finding nearest neighbors/>Can/>Mapping the attributes of the point cloud to corresponding points in the input point cloud, so as to obtain a point cloud model with the same attributes;
Step 2.4.2: for each point X n in the input point cloud, searching the reconstructed point cloud for a point closest to X n if there is more than one point in the input point cloud that is the nearest neighbor in the reconstructed point cloud These points form a set of points Q + (n) with the nearest neighbor as the core;
Step 2.4.3: if Q + (n) is an empty set, let the reconstruction point Attribute/>If Q + (n) is not an empty set, then calculate the reconstruction point/>Attribute/>The formula is as follows:
Wherein H (n) is the number of elements of the point set Q + (n); Is an element in the point set Q + (n).
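Steps 2.4.1 to 2.4.3 can be sketched with brute-force nearest-neighbor search (illustrative only; a production recoloring pass would use a k-d tree and handle multi-channel attributes):

```python
import numpy as np

def recolor(rec_points, in_points, in_attrs):
    """Transfer attributes from the input cloud to the reconstructed cloud."""
    # pairwise distances: rows = reconstructed points, cols = input points
    d = np.linalg.norm(rec_points[:, None] - in_points[None, :], axis=2)
    # 2.4.1: each reconstructed point takes its nearest input point's attribute
    attrs = in_attrs[np.argmin(d, axis=1)].astype(float)
    # 2.4.2: group input points by their nearest reconstructed point -> Q+(n)
    nearest_rec = np.argmin(d, axis=0)
    # 2.4.3: where Q+(n) is non-empty, average its members' attributes
    for n in range(len(rec_points)):
        q = in_attrs[nearest_rec == n]
        if len(q) > 0:
            attrs[n] = q.mean()        # (1/H(n)) * sum over Q+(n)
    return attrs

rec = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
inp = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
out = recolor(rec, inp, np.array([10.0, 20.0, 40.0]))
```

Here the first two input points both map to the first reconstructed point, so its attribute becomes their average (15.0), while the second reconstructed point keeps 40.0.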
Step 2.5: and (5) region adaptive hierarchical transformation.
Step 2.5.1: dividing the image into small blocks;
Step 2.5.2: performing hierarchical transformation on each small block, and representing the small blocks as coefficients consisting of local basis functions;
step 2.5.3: and then the coefficients are quantized and encoded, thereby realizing image compression.
Step 2.6: a level of detail is generated.
Step 2.6.1: setting all points in the point cloud to be in an unaccessed state, and adding the points to an unaccessed point set NV; meanwhile, defining a set V of points which have been accessed as an empty set;
Step 2.6.2: for the first iteration l=0, the first point is found in the set of points NV and added to the first refinement levels R 0 and V, while the point is deleted from NV to show that this point has been accessed and added to the current LOD set. This is the second step of generating a level of detail, namely generating a first level of refinement R 0, and adding a first point thereto;
Step 2.6.3: for other iterations 1.ltoreq.l.ltoreq.L-1, traversing all points in the non-access point set NV, calculating the minimum distance D between the current point and the point set V, and if D.ltoreq.d l, removing the current point from the NV and adding the current point into the first refinement level R l and the point set V; if D < D l, ignoring the current point and continuing to traverse the non-access point set NV; wherein d l is the minimum distance threshold; after all points in the non-access point set NV are traversed, the refinement level R l is generated; let l=l+1, repeat the above steps to continue generating the next refinement level until the NV is terminated when empty.
Step 2.7: prediction transforms and lifting transforms.
The predictive transformation is specifically: selecting already-coded points to predict the current point according to the reordering of points in the LOD generation process, and searching for the nearest neighbors of the current point only among the previous i−1 points; for the i-th point to be encoded, its predicted attribute value is
    â_i = ( Σ_{j∈N_i} (1/δ_j²) · ã_j ) / ( Σ_{j∈N_i} (1/δ_j²) )
where â_i is the predicted attribute value of the i-th point to be encoded; ã_j is the attribute value of the j-th neighboring point; N_i is the set of k spatially adjacent points of the i-th point; and δ_j is the Euclidean distance from the i-th point to its j-th neighbor.
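The inverse-distance-squared weighting is straightforward to sketch (the normalized 1/δ² form is the common G-PCC choice and is assumed here, since the original formula was lost in extraction):

```python
import numpy as np

def predict_attribute(cur_pos, neigh_pos, neigh_attrs):
    """Predict the current point's attribute from its k nearest
    already-coded neighbors, weighted by 1/delta_j^2 and normalized."""
    d2 = np.sum((neigh_pos - cur_pos) ** 2, axis=1)   # delta_j^2 per neighbor
    w = 1.0 / d2                                      # weights 1/delta_j^2
    return np.sum(w * neigh_attrs) / np.sum(w)        # normalized average

cur = np.array([0.0, 0.0, 0.0])
neigh = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
pred = predict_attribute(cur, neigh, np.array([10.0, 40.0]))
```

The neighbor at distance 1 gets weight 1 and the one at distance 2 gets weight 1/4, so the prediction (10 + 10) / 1.25 = 16 sits much closer to the nearer neighbor's value.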
The lifting transform comprises three parts, segmentation, prediction, update, respectively, as shown in fig. 2. The dividing part performs space division on the input point cloud data and divides the input point cloud data into a high-level point cloud H (N) and a low-level point cloud L (N); the prediction link predicts the attribute of the high-level point cloud by using the attribute of the low-level point cloud, and the attribute value of a certain point is predicted by using the attribute value of a nearby point due to the local correlation of the high-level point cloud and the low-level point cloud in space under the action of a reasonable predictor, so as to obtain a prediction residual error; in the updating link, updating is carried out according to the prediction residual error, and the attribute information of the original low-level point cloud is improved.
Step 2.8: residual quantization and arithmetic coding.
Step 2.8.1: comparing the original point cloud with the reconstructed point cloud;
step 2.8.2: calculating residual information between each point, namely the difference between the actual coordinates and the reconstructed coordinates;
step 2.8.3: discretizing the residual error value and converting the residual error value into a binary code stream;
step 2.8.4: the non-zero quantized residual is arithmetically encoded as shown in fig. 3.
For a quantized residual, the sign is judged first. If the quantized residual is less than zero, the coding flag bit sign = 1 is set, representing a negative value; if the quantized residual is greater than or equal to zero, sign = 0 is set, representing a non-negative value. After the sign is determined, the coding mode adopted depends on the magnitude of the quantized residual, so two flag bits, isZero and isOne, are introduced to judge and code that magnitude.
Specifically, when the quantized residual magnitude is 0, isZero = 1 and isOne = 0 are set; when the magnitude is 1, isZero = 0 and isOne = 1 are set; otherwise the magnitude is processed as an input to arithmetic coding.
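The flag-bit scheme can be sketched per residual as follows (the handling of magnitudes greater than 1 is an assumption: the magnitude itself is passed onward, since any offset applied by the actual arithmetic coder is not specified in the text):

```python
def residual_flags(q):
    """Flag bits for one quantized residual q: sign, isZero, isOne,
    plus the magnitude (if > 1) handed to the arithmetic coder."""
    sign = 1 if q < 0 else 0            # 1 = negative, 0 = non-negative
    mag = abs(q)
    is_zero = 1 if mag == 0 else 0
    is_one = 1 if mag == 1 else 0
    # assumption: for mag > 1 the magnitude itself feeds arithmetic coding
    remainder = mag if mag > 1 else 0
    return {'sign': sign, 'isZero': is_zero, 'isOne': is_one,
            'remainder': remainder}
```

For example, a residual of −1 yields sign = 1, isZero = 0, isOne = 1, so the two magnitude flags fully describe it and nothing further reaches the arithmetic coder.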
Step 3: and geometrically upsampling the sparse point cloud by adopting a point cloud nearest neighbor interpolation upsampling method.
Step 3.1: preprocessing point cloud data, and calculating normal vector of each point in the point cloud;
Step 3.2: traversing each point in the original point cloud;
Step 3.3: for each point, searching for its adjacent point;
step 3.4: after finding the adjacent point, generating a new point by using an interpolation method;
step 3.5: adding the newly generated points into the up-sampled point cloud;
step 3.6: repeating steps 3.3 to 3.5 until all the original points are processed.
The up-sampling of the nearest neighbor interpolation of the point cloud can keep the attribute information of the original point cloud, and can reduce information loss and errors.
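Steps 3.2 to 3.6 can be sketched as follows (an illustrative sketch: one new midpoint per point, attribute copied from the source point; the normal-vector preprocessing of step 3.1 and any normal-guided placement are omitted):

```python
import numpy as np

def upsample_nearest(points, attrs):
    """Insert a midpoint between each point and its nearest neighbor,
    assigning the new point the source point's attribute
    (nearest-neighbor interpolation, steps 3.2-3.6)."""
    new_pts, new_attrs = [], []
    for i, p in enumerate(points):
        d = np.linalg.norm(points - p, axis=1)
        d[i] = np.inf                       # exclude the point itself
        j = int(np.argmin(d))               # step 3.3: nearest neighbor
        new_pts.append((p + points[j]) / 2.0)   # step 3.4: interpolate
        new_attrs.append(attrs[i])          # attribute copied, not blended
    dense = np.vstack([points, np.array(new_pts)])      # step 3.5
    dense_attrs = np.concatenate([attrs, new_attrs])
    return dense, dense_attrs

pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
dense, dense_attrs = upsample_nearest(pts, np.array([1.0, 3.0]))
```

Two input points produce two new midpoints at (1, 0, 0), doubling the point count while keeping every original attribute value intact.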
According to the embodiment, the geometric downsampling operation is added before encoding, and downsampling is carried out on the original point cloud, so that downsampling sparse point cloud with fewer points is obtained, and subsequent encoding pressure is reduced; transmitting the sparse point cloud to a point cloud encoder, and performing encoding storage and decoding processes of the point cloud by adopting an encoding mode based on MPEG G-PCC; finally, the sparse point cloud is reconstructed via a geometric upsampling module to a dense point cloud that approximates the original point cloud. Compared with the direct coding point cloud, the sampling process greatly reduces the number of points to be coded, improves the coding efficiency and relieves the pressure of network bandwidth.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.

Claims (9)

1. A point cloud coding method based on geometric sampling, characterized by comprising: geometric downsampling, a point cloud codec, and geometric upsampling; performing geometric downsampling on the original point cloud by a voxel-grid-based downsampling method, sampling it to a sparse point cloud, reducing the number of points to be encoded, and relieving network bandwidth pressure; performing the encoding, storage, and decoding processes of the point cloud based on MPEG G-PCC, wherein MPEG G-PCC is the geometry-based point cloud compression (G-PCC) framework proposed by MPEG; and then adding geometric upsampling, performing geometric upsampling on the sparse point cloud by a point cloud nearest-neighbor interpolation method, and reconstructing the sparse point cloud into a dense point cloud similar to the original point cloud, completing point cloud reconstruction.
2. The point cloud encoding method based on geometric sampling as claimed in claim 1, wherein: the geometric downsampling specifically comprises the following steps:
step 1.1: placing the original point cloud data into a 3D voxel grid, and marking the voxel where each point is located as accessed;
Step 1.2: traversing all voxels, selecting a point closest to the center of the voxel as a representative point for the accessed voxels, and reserving the point which is not accessed voxels;
Step 1.3: and forming a new point cloud by all the selected representative points, and taking the new point cloud as a point cloud result after downsampling.
3. The point cloud encoding method based on geometric sampling as claimed in claim 1, wherein: the specific method for carrying out the point cloud coding based on the MPEG G-PCC comprises the following steps:
step 2.1: preprocessing the point cloud geometric information, including coordinate transformation, quantization and repeated point removal;
Firstly, converting original coordinates of the point cloud into normalized coordinates which can be processed by a geometric encoder; carrying out certain scale quantization on the geometric coordinates of the point cloud so as to achieve the effect of geometric lossy coding; after quantization and rounding, duplicate points are deleted, and only one point exists in the same geometric position;
step 2.2: the point cloud geometric coding is divided into octree coding and prediction tree coding;
step 2.3: color space conversion, converting RGB color space into YCbCr or YCoCg color space, improving the correlation of data;
Step 2.4: re-coloring;
Step 2.5: performing region self-adaptive hierarchical transformation;
Dividing an image into small blocks, carrying out hierarchical transformation on each small block, representing the small blocks as coefficients composed of local basis functions, and quantizing and encoding the coefficients to realize image compression;
Step 2.6: generating a level of detail;
Step 2.7: a predictive transform and a lifting transform;
step 2.8: residual quantization and arithmetic coding;
The original point cloud is compared with the reconstructed point cloud, and the residual information of each point, i.e. the difference between the actual and reconstructed coordinates, is computed; the residual values are discretized and converted into a binary code stream, and the non-zero quantized residuals are arithmetically coded.
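The residual quantization of step 2.8 can be illustrated with a simple uniform quantizer. This is a hedged sketch, not the G-PCC residual coder: the function names and the single scalar step size are assumptions, and the arithmetic-coding stage is omitted.

```python
import numpy as np

def quantize_residuals(orig, recon, step):
    """Step 2.8 sketch: per-point residual (original minus reconstructed
    coordinate), uniformly quantized with step size `step`."""
    residual = orig - recon
    # Discretize the residual values into integer symbols for entropy coding.
    return np.round(residual / step).astype(np.int64)

def dequantize(q, step):
    """Inverse of the uniform quantizer used at the decoder side."""
    return q * step
```

The integer symbols returned by `quantize_residuals` are what an arithmetic coder would then compress, coding only the non-zero values.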
4. A point cloud encoding method based on geometric sampling as claimed in claim 3, wherein: the specific method for octree coding in the step 2.2 is as follows:
The whole point cloud is placed in a cubic bounding box, and the point cloud is recursively divided to construct an octree: each cube is uniformly split into eight subcubes corresponding to eight child nodes; one flag bit is allocated to each subcube, where '1' indicates that the subcube is occupied and may be divided further and '0' indicates that it is empty and is not divided; the flag bits of the eight child nodes of a parent node describe the actual internal occupancy and together form one byte of information for coding; if a subcube contains more than one point, the non-empty subcube is divided further until only one point remains or a predetermined depth is reached;
The specific method for coding the prediction tree in the step 2.2 is as follows:
The point cloud data is partitioned into small blocks, each containing several points; previously coded points serve as references, the position and attribute information of the current point is predicted, and the current point is coded according to the prediction error.
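The octree occupancy coding described above can be sketched as a recursion that emits one occupancy byte per internal node. This is a minimal numpy illustration under assumed conventions (bit 0 for the x axis, `>=`-center splitting); a real G-PCC encoder entropy-codes these bytes with context modeling.

```python
import numpy as np

def octree_occupancy(points, origin, size, max_depth, codes):
    """Recursively split the bounding cube into 8 subcubes and emit one
    occupancy byte per internal node (one flag bit per child), as in step 2.2."""
    if len(points) <= 1 or max_depth == 0:
        return  # leaf: a single point, or the predetermined depth is reached
    half = size / 2.0
    center = origin + half
    # Child index: 3 bits from the point's position relative to the cube center.
    child_idx = ((points >= center) * np.array([1, 2, 4])).sum(axis=1)
    byte = 0
    children = []
    for c in range(8):
        sub = points[child_idx == c]
        if len(sub):
            byte |= 1 << c  # '1' marks an occupied subcube
            children.append((c, sub))
    codes.append(byte)
    # Only non-empty subcubes are divided further.
    for c, sub in children:
        offs = np.array([c & 1, (c >> 1) & 1, (c >> 2) & 1]) * half
        octree_occupancy(sub, origin + offs, half, max_depth - 1, codes)
```

For two points in opposite corners of an 8-unit cube, the root node's byte has exactly the bits for children 0 and 7 set.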
5. A point cloud coding method based on geometric sampling as claimed in claim 3, wherein in step 2.3, when lifting transform coding is selected, the RGB color space is converted into the YCbCr color space; when predictive transform lossless coding is selected, the RGB color space is converted into the YCoCg color space.
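For the lossless branch, a reversible RGB-to-YCoCg conversion can be written with integer arithmetic only. The sketch below uses the standard YCoCg-R lifting form; whether a given G-PCC implementation uses exactly this variant is an assumption, and the function names are illustrative.

```python
def rgb_to_ycocg_r(r, g, b):
    """Reversible (lossless) YCoCg-R forward transform on integer samples."""
    co = r - b
    t = b + (co >> 1)   # arithmetic shift = floor division by 2
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    """Exact integer inverse of the YCoCg-R transform."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

Because every step is an integer lifting step, the round trip is exact, which is what makes this color conversion suitable for the lossless predictive-transform path.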
6. A point cloud encoding method based on geometric sampling as claimed in claim 3, wherein: the specific method of the step 2.4 is as follows:
Step 2.4.1: for each point in the reconstructed point cloud Searching the nearest point from the input point cloud, and finding the nearest neighbor/>, in the input point cloud, of X n And assign its attributes to/>By finding nearest neighbors/>Will/>Mapping the attributes of the point cloud to corresponding points in the input point cloud, so as to obtain a point cloud model with the same attributes;
Step 2.4.2: for each point X n in the input point cloud, searching the reconstructed point cloud for a point closest to X n if there is more than one point in the input point cloud that is the nearest neighbor in the reconstructed point cloud These points form a set of points Q + (n) with the nearest neighbor as the core;
Step 2.4.3: if Q + (n) is an empty set, let the reconstruction point Attribute/>If Q + (n) is not an empty set, then calculate the reconstruction point/>Attribute/>The formula is as follows:
Wherein H (n) is the number of elements of the point set Q + (n); Is an element in the point set Q + (n).
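The recoloring of steps 2.4.1 to 2.4.3 can be sketched as follows. This is a brute-force numpy illustration (a real implementation would use a k-d tree); the function name and the attribute layout are assumptions.

```python
import numpy as np

def recolor(input_pts, input_attrs, recon_pts):
    """Recoloring sketch (steps 2.4.1-2.4.3): transfer attributes from the
    input point cloud onto the reconstructed point cloud."""
    def nearest(src, q):  # index of the point in src closest to q
        return int(np.argmin(np.linalg.norm(src - q, axis=1)))
    n_rec = len(recon_pts)
    # Step 2.4.2: Q+(n) = input points whose nearest reconstructed point is n.
    q_plus = [[] for _ in range(n_rec)]
    for i, x in enumerate(input_pts):
        q_plus[nearest(recon_pts, x)].append(i)
    attrs = np.empty((n_rec, input_attrs.shape[1]))
    for n, xr in enumerate(recon_pts):
        if q_plus[n]:   # step 2.4.3: average the attributes over Q+(n)
            attrs[n] = input_attrs[q_plus[n]].mean(axis=0)
        else:           # empty Q+(n): fall back to step 2.4.1's nearest neighbor
            attrs[n] = input_attrs[nearest(input_pts, xr)]
    return attrs
```

When two input points map to the same reconstructed point, that point receives the mean of their attributes, matching the H(n)-normalized sum in the formula above.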
7. A point cloud encoding method based on geometric sampling as claimed in claim 3, wherein: the specific method of the step 2.6 is as follows:
Step 2.6.1: set all points of the point cloud to the unvisited state and add them to the unvisited point set NV; at the same time, define the set V of visited points as an empty set;
Step 2.6.2: for the first iteration l = 0, take the first point in the point set NV, add it to the first refinement level R_0 and to the point set V, and delete it from NV, indicating that this point has been visited and added to the current LOD set;
Step 2.6.3: for the other iterations 1 ≤ l ≤ L-1, traverse all points in the unvisited point set NV and compute the minimum distance D between the current point and the point set V; if D ≥ d_l, remove the current point from NV and add it to the refinement level R_l and to the point set V; if D < d_l, ignore the current point and continue traversing NV, where d_l is the minimum distance threshold; after all points in NV have been traversed, the refinement level R_l is generated; let l = l + 1 and repeat the above steps to generate the next refinement level, terminating when NV is empty.
8. A point cloud encoding method based on geometric sampling as claimed in claim 3, wherein: the prediction transformation in the step 2.7 specifically comprises the following steps:
According to the reordering of points during LOD generation, already-coded points are selected to predict the current point, and the nearest neighbors of the current point are searched only among the previous i-1 points; for the i-th point to be coded, its predicted attribute value is

$$\hat{a}_i = \frac{\sum_{j \in N_i} \frac{1}{\delta_j}\, \tilde{a}_j}{\sum_{j \in N_i} \frac{1}{\delta_j}}$$

where \hat{a}_i is the predicted attribute value of the i-th point to be coded, \tilde{a}_j is the attribute value of the j-th neighboring point, N_i is the set of k spatial neighbors of the i-th point, and \delta_j is the Euclidean distance from the i-th point to its j-th neighbor;
The lifting transform in step 2.7 comprises three parts: splitting, prediction, and update;
The splitting part spatially partitions the input point cloud data into a high-level point set H(N) and a low-level point set L(N); the prediction step predicts the attributes of the high-level points from the attributes of the low-level points: because the two sets are locally correlated in space, a reasonable predictor estimates the attribute value of a point from the attribute values of nearby points, producing a prediction residual; the update step then updates the attribute information of the original low-level points according to the prediction residual.
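The inverse-distance-weighted predictor of step 2.7 can be sketched directly from the formula above. A brute-force numpy illustration; the function name and the brute-force neighbor search are assumptions (coincident points with zero distance would need special handling).

```python
import numpy as np

def predict_attribute(positions, attributes, i, k=3):
    """Predict point i's attribute from its k nearest already-coded
    neighbors (points 0..i-1), weighted by inverse Euclidean distance."""
    prev = positions[:i]                    # only previously coded points
    dists = np.linalg.norm(prev - positions[i], axis=1)
    nbrs = np.argsort(dists)[:k]            # N_i: k nearest coded points
    w = 1.0 / dists[nbrs]                   # weights 1/delta_j
    return (w @ attributes[nbrs]) / w.sum() # inverse-distance weighted average
```

The encoder would then code only the residual between the true attribute and this prediction.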
9. The point cloud encoding method based on geometric sampling as claimed in claim 1, wherein: the specific method for geometric upsampling comprises the following steps:
Step 3.1: preprocess the point cloud data and compute the normal vector of each point in the point cloud;
Step 3.2: traverse each point in the original point cloud;
Step 3.3: for each point, search for its neighboring points;
Step 3.4: once the neighboring points are found, generate a new point by interpolation;
Step 3.5: add the newly generated points to the upsampled point cloud;
Step 3.6: repeat steps 3.3 to 3.5 until all original points have been processed.
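Steps 3.2 to 3.6 can be sketched as a midpoint interpolation between each point and its nearest neighbor. This is an assumption-laden illustration: the claim does not specify the interpolation rule or the neighbor count, and the normal vectors of step 3.1 are not used in this minimal version.

```python
import numpy as np

def upsample_nn(points):
    """Nearest-neighbor interpolation upsampling (steps 3.2-3.6): for every
    point, find its nearest neighbor and insert their midpoint as a new point."""
    new_pts = []
    for i, p in enumerate(points):          # step 3.2: traverse each point
        d = np.linalg.norm(points - p, axis=1)
        d[i] = np.inf                       # step 3.3: exclude the point itself
        j = int(np.argmin(d))
        new_pts.append((p + points[j]) / 2) # step 3.4: interpolate a new point
    # Step 3.5: merge the new points into the cloud, dropping duplicates.
    dense = np.vstack([points, new_pts])
    return np.unique(dense, axis=0)
```

Repeated application densifies the sparse decoded cloud toward something resembling the original.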
CN202311873400.9A 2023-12-29 2023-12-29 Point cloud coding method based on geometric sampling Pending CN118018766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311873400.9A CN118018766A (en) 2023-12-29 2023-12-29 Point cloud coding method based on geometric sampling

Publications (1)

Publication Number Publication Date
CN118018766A true CN118018766A (en) 2024-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination