GB2620453A - Method and apparatus for compression and encoding of 3D dynamic point cloud - Google Patents


Info

Publication number
GB2620453A
Authority
GB
United Kingdom
Prior art keywords
predictor
point
prediction functions
bitstream
predictors
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2210096.0A
Other versions
GB202210096D0 (en)
Inventor
Le Floch Hervé
Ouedraogo Naël
Tannhauser Falk
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Application filed by Canon Inc
Priority to GB2210096.0A
Publication of GB202210096D0
Priority to PCT/EP2023/068942 (WO2024008968A1)
Publication of GB2620453A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/001 Model-based coding, e.g. wire frame
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/004 Predictors, e.g. intraframe, interframe coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Disclosed is a method of encoding a dynamic 3D point cloud, the dynamic point cloud being a sequence of 3D point clouds, in a bitstream. The method starts by determining a set of prediction functions and their indexes for the point cloud, determining a set of predictors based on the set of prediction functions, and then selecting a predictor for the current 3D point among the set of predictors based on a cost evaluated for each predictor. Information for reconstructing the set of prediction functions, the index of the selected predictor and the residual are then encoded in the bitstream. The prediction functions may include INTRA and/or INTER prediction functions, and the encoding may include an indication of whether the selected predictor is based on an INTRA or an INTER prediction function. The cost of each predictor may be evaluated by storing a best cost value after evaluating each predictor, the best cost value being the lowest cost among the previously evaluated predictors, and terminating the evaluation of a predictor early when its cost reaches the stored best value.

Description

METHOD AND APPARATUS FOR COMPRESSION AND ENCODING OF 3D DYNAMIC POINT CLOUD
FIELD OF THE INVENTION
The present disclosure concerns a method and a device for the compression and encoding of 3D point clouds, and more particularly of 3D dynamic point clouds. A 3D dynamic point cloud is a temporal sequence of 3D point clouds, each 3D point cloud comprising a variable number of 3D points with variable 3D positions.
BACKGROUND OF INVENTION
In particular, the 3D point cloud can be a set of 3D points captured by a LIDAR mounted on the roof of a car. For example, the LIDAR can be a rotating LIDAR containing several lasers at different elevations, rotating horizontally about the vertical axis.
For compressing 3D points, a standard called G-PCC V1 (ISO/IEC FDIS 23090-9:2022(E)) has emerged, which proposes to encode 3D points based on INTRA prediction. More recently, improvements over the standard have been proposed in an MPEG document called Technology Under Consideration ("ISO/IEC JTC 1/SC 29/WG 7 N00281", referred to herein as TUC). In particular, the TUC document proposes to use INTER prediction, that is, prediction between consecutive frames. Here a frame is a 3D point cloud associated with a single capture instant, by analogy with video compression, where a frame is a picture associated with a capture instant in a video sequence.
The compression techniques proposed in these two documents (the terms compression and encoding are used interchangeably in this document) are based on predictive encoding: the encoding of a particular point relies on the identification of a predictor, which is a previously encoded point, and on the encoding of a residual, which is the difference between the 3D coordinates of the current point and those of the predictor. As the encoding may be lossy, the residual is actually computed as the difference between the current point and the decoded version of the predictor. INTRA modes correspond to modes where the predictor is chosen among the previously encoded points of the current frame, meaning the current 3D point cloud. INTER modes correspond to modes where the predictor is chosen in a reference frame, meaning a previously encoded 3D point cloud.
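The reason for computing the residual against the decoded predictor can be illustrated with a minimal one-dimensional sketch (in 3D the same logic applies to each coordinate). The uniform quantizer, the step size, the fixed predictor 0 for the first value and the function names `quantize`, `closed_loop_encode` and `decode` are hypothetical illustrations, not taken from G-PCC or the TUC document.

```python
def quantize(value: int, step: int) -> int:
    """Uniform scalar quantizer: round |value| / step to nearest, keep the sign."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def dequantize(level: int, step: int) -> int:
    """Inverse quantization: map a quantization level back to a value."""
    return level * step


def closed_loop_encode(values: list[int], step: int) -> list[int]:
    """Encode each value using the previously *decoded* value as predictor."""
    levels = []
    decoded_prev = 0  # hypothetical fixed predictor for the first value
    for v in values:
        level = quantize(v - decoded_prev, step)  # residual vs. decoded predictor
        levels.append(level)
        # Mirror the decoder so that encoder and decoder keep identical predictors.
        decoded_prev += dequantize(level, step)
    return levels


def decode(levels: list[int], step: int) -> list[int]:
    """Rebuild values by adding each dequantized residual to the previous output."""
    out, prev = [], 0
    for level in levels:
        prev += dequantize(level, step)
        out.append(prev)
    return out


values = [0, 7, 15, 22, 31]
decoded = decode(closed_loop_encode(values, step=4), step=4)
print(decoded)  # [0, 8, 16, 24, 32]
```

Because each residual is measured against the value the decoder will actually hold, the reconstruction error of every value stays within half a quantization step instead of accumulating along the chain of predictions.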
Improvements in terms of coding efficiency may be advantageous for both INTRA and INTER modes.
SUMMARY OF THE INVENTION
The present invention has been devised to address one or more of the foregoing concerns. It concerns several improvements over the techniques defined in the standard and in the TUC proposals. In particular, it is proposed to increase the number of predictors for both INTRA and INTER modes, to provide a variable number of predictors, to encode the number of INTRA and/or INTER predictors, to change the coding of the INTRA and/or INTER predictors used, to signal in the bitstream which prediction functions are used among a list of prediction functions, as detailed below, and to change the encoding of the index of the predictor determined by the encoder.
According to a first aspect of the invention there is provided a method of encoding a 3D dynamic point cloud, comprising a sequence of 3D point clouds, each 3D point cloud comprising a set of 3D points, in a bitstream, wherein the method comprises, for encoding a current 3D point: determining a set of prediction functions; determining a set of predictors based on the set of respective prediction functions; selecting a predictor for the current 3D point among the set of predictors based on a cost evaluated for each predictor; and encoding in the bitstream information for reconstructing the set of prediction functions, the index of the predictor selected in the set of predictors, and the residual.
In an embodiment, the set of prediction functions comprises a set of INTRA prediction functions and/or a set of INTER prediction functions.
In an embodiment, the method further comprises encoding in the bitstream an indication of whether the selected predictor is based on an INTRA or an INTER prediction function.
In an embodiment, the information for reconstructing the set of prediction functions comprises a number of INTRA prediction functions.
In an embodiment, the information for reconstructing the set of prediction functions comprises a number of INTER prediction functions.
In an embodiment, the set of INTER prediction functions is determined as a subset of an indexed list of INTER prediction functions.
In an embodiment, the information for reconstructing the set of prediction functions comprises indexes in the indexed list of INTER prediction functions of the INTER prediction functions of the subset.
In an embodiment, the information for reconstructing the set of prediction functions comprises the maximum index, in the indexed list of INTER prediction functions, of the INTER prediction functions of the subset, the subset corresponding to the first INTER prediction functions of the indexed list up to this maximum index.
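The two ways of describing the INTER subset in the embodiments above (explicit indexes in the indexed list, or a maximum index selecting a prefix of that list) can be sketched as follows. The function names, the placeholder list entries and the bit counts (a fixed-length code for the maximum index, one flag per entry for explicit indexes) are hypothetical assumptions used only to compare the two signalling styles; the document does not specify this binarization.

```python
from math import ceil, log2


def subset_from_max_index(indexed_list: list[str], max_index: int) -> list[str]:
    """Subset = first entries of the indexed list, up to and including max_index."""
    if not 0 <= max_index < len(indexed_list):
        raise ValueError("max_index out of range")
    return indexed_list[: max_index + 1]


def subset_from_flags(indexed_list: list[str], flags: list[int]) -> list[str]:
    """Subset = entries whose flag is set (explicit signalling of each index)."""
    if len(flags) != len(indexed_list):
        raise ValueError("one flag per list entry is expected")
    return [f for f, used in zip(indexed_list, flags) if used]


def signalling_bits(list_len: int) -> tuple[int, int]:
    """Bits for a fixed-length max index vs. one flag per entry (assumed codes)."""
    return ceil(log2(list_len)), list_len


inter_list = ["inter_0", "inter_1", "inter_2", "inter_3", "inter_4"]  # placeholders
print(subset_from_max_index(inter_list, 2))            # ['inter_0', 'inter_1', 'inter_2']
print(subset_from_flags(inter_list, [1, 0, 1, 0, 1]))  # ['inter_0', 'inter_2', 'inter_4']
print(signalling_bits(len(inter_list)))                # (3, 5)
```

The prefix form is cheaper to signal but only pays off if the list is ordered so that the most useful functions come first; the flag form can select any subset.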
In an embodiment, the index of the selected predictor is represented in the bitstream according to a truncated unary code.
In an embodiment, the truncated unary code is encoded with an arithmetic encoder.
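A truncated unary code writes an index $i$ as $i$ bins equal to 1 followed by a terminating 0, and drops that terminating 0 when $i$ equals the largest possible index, since the decoder then knows that no further bin can follow. The sketch below shows the binarization and its inverse; the function names are illustrative, and the arithmetic coding of the bins is not modelled.

```python
from typing import Iterator


def truncated_unary(index: int, max_index: int) -> list[int]:
    """Binarize index as `index` ones plus a terminating zero (omitted at max_index)."""
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    bins = [1] * index
    if index < max_index:
        bins.append(0)
    return bins


def parse_truncated_unary(bins: Iterator[int], max_index: int) -> int:
    """Read bins until a zero is found or max_index ones have been read."""
    index = 0
    while index < max_index and next(bins) == 1:
        index += 1
    return index


for i in range(4):
    print(i, truncated_unary(i, max_index=3))
# 0 [0]
# 1 [1, 0]
# 2 [1, 1, 0]
# 3 [1, 1, 1]
```

With only two predictors (max_index = 1), each index costs exactly one bin, whereas a plain unary code would spend two bins on index 1; this is why signalling a small maximal number of predictors can save bits.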
In an embodiment, the selected predictor is the predictor for which the encoding cost is the lowest.
In an embodiment, evaluating the cost of a predictor comprises: storing a best cost value after evaluating each predictor, the best cost value corresponding to the lowest cost among the previously evaluated predictors; and terminating the evaluation of a predictor early as soon as its cost reaches the stored best cost value.
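The early-termination rule can be sketched as below. The cost model is a stand-in assumption: the cost of a predictor is taken as its index (a proxy for the cost of signalling it) plus the sum of absolute residuals over the coordinates (a proxy for the residual rate), accumulated term by term. A real rate-distortion cost would be computed differently; only the control flow is meant to reflect the embodiment, and the function names are hypothetical.

```python
from typing import Optional, Sequence

Point = tuple[int, int, int]


def partial_cost(point: Point, predictor: Point, index: int,
                 best_cost: float) -> Optional[int]:
    """Accumulate the (proxy) cost; return None once it reaches best_cost."""
    cost = index  # proxy for the cost of signalling this predictor index
    if cost >= best_cost:
        return None
    for c, p in zip(point, predictor):
        cost += abs(c - p)  # proxy for the rate of one residual coordinate
        if cost >= best_cost:
            return None  # early termination: cannot beat the stored best cost
    return cost


def select_predictor(point: Point, predictors: Sequence[Point]) -> tuple[int, float]:
    """Return (index, cost) of the cheapest predictor, storing the best cost as we go."""
    best_index, best_cost = -1, float("inf")
    for index, predictor in enumerate(predictors):
        cost = partial_cost(point, predictor, index, best_cost)
        if cost is not None:  # fully evaluated and strictly better
            best_index, best_cost = index, cost
    return best_index, best_cost


print(select_predictor((10, 20, 30), [(0, 0, 0), (10, 20, 29), (10, 20, 30)]))
# (1, 2)
```

Predictor 2 matches the point exactly, but under this proxy its signalling cost alone already equals the stored best cost, so it is discarded without evaluating any residual term.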
According to another aspect of the invention there is provided a method for decoding a 3D dynamic point cloud, comprising a sequence of 3D point clouds, each 3D point cloud comprising a set of 3D points, in a bitstream, wherein the method comprises, for decoding a current 3D point: obtaining from the bitstream information for reconstructing a set of prediction functions; determining a set of predictors based on the set of respective prediction functions; obtaining from the bitstream an index of a selected predictor in the set of predictors; obtaining from the bitstream the residual; and decoding the current 3D point based on the selected predictor and the residual.
According to another aspect of the invention there is provided a computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to the invention, when loaded into and executed by the programmable apparatus.
According to another aspect of the invention there is provided a computer-readable storage medium storing instructions of a computer program for implementing a method according to the invention.
According to another aspect of the invention there is provided a computer program which upon execution causes the method of the invention to be performed.
According to another aspect of the invention there is provided a device for encoding a 3D dynamic point cloud, comprising a sequence of 3D point clouds, each 3D point cloud comprising a set of 3D points, in a bitstream, wherein the device comprises for encoding a current 3D point a processor configured for: determining a set of prediction functions; determining a set of predictors based on the set of respective prediction functions; selecting a predictor for the current 3D point among the set of predictors based on a cost evaluated for each predictor; and encoding in the bitstream information for reconstructing the set of prediction functions, the index of the predictor selected in the set of predictors and the residual in the bitstream.
According to another aspect of the invention there is provided a device for decoding a 3D dynamic point cloud, comprising a sequence of 3D point clouds, each 3D point cloud comprising a set of 3D points, in a bitstream, wherein the device comprises for decoding a current 3D point a processor configured for: obtaining from the bitstream information for reconstructing a set of prediction functions; determining a set of predictors based on the set of respective prediction functions; obtaining from the bitstream an index of a selected predictor in the set of predictors; obtaining from the bitstream the residual; decoding the current 3D point based on the selected predictor and the residual.
At least parts of the methods according to the invention may be computer implemented. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
Since the present invention can be implemented in software, the present invention can be embodied as computer readable code for provision to a programmable apparatus on any suitable carrier medium. A tangible, non-transitory carrier medium may comprise a storage medium such as a floppy disk, a CD-ROM, a hard disk drive, a magnetic tape device or a solid state memory device and the like. A transient carrier medium may include a signal such as an electrical signal, an electronic signal, an optical signal, an acoustic signal, a magnetic signal or an electromagnetic signal, e.g. a microwave or RF signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, and with reference to the following drawings in which:
Figure 1 illustrates an example of a LIDAR system;
Figure 2 illustrates different compression tools proposed in the G-PCC standard;
Figure 3 illustrates an example of the main steps of a general encoding method for a point cloud and the construction of the associated bitstream;
Figure 4 illustrates the main steps of an example of decoding method for decoding a bitstream generated by the encoder;
Figure 5 illustrates examples of points obtained by a rotating lidar;
Figure 6 illustrates an example of tree construction for a rotating lidar;
Figure 7 illustrates the main steps of an example of the compression algorithm used for encoding the 3D geometry of the input points;
Figures 8a and 8b illustrate the INTER prediction as proposed in a first and a second version of the TUC document;
Figure 9 is another illustration of the INTER predictive encoding;
Figure 10 illustrates a proposed improvement over the previously described solutions;
Figure 11 illustrates a second improvement over the previously described method;
Figure 12 illustrates the main steps of an example of encoding method using the proposed improvements;
Figure 13 illustrates the main steps of an example of method for estimating the encoding cost with early termination;
Figure 14 illustrates a block diagram of a device adapted to incorporate the invention.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 illustrates an example of a LIDAR system, where the 4 arrows 1000 represent 4 laser heads. The 4 heads rotate around the z axis at a given rotation speed. Each head regularly emits a laser beam and calculates a 3D position (x, y, z coordinates in a Cartesian system) from the measured feedback of the laser beam emission. In addition to these 3D positions, attributes may be present: a set of attributes can be associated with each 3D position, for example a color, a normal, a reflectance, etc. This kind of LIDAR is called a rotating LIDAR, but the invention is not limited to this particular example of LIDAR and may be adapted to any kind of 3D dynamic point cloud.
In the context of MPEG standardization, a compression standard called G-PCC V1 has emerged and is being finalized. Figure 2 illustrates different compression tools proposed in this standard. In step 2000, the initial point cloud is considered. It corresponds to a frame captured at time 't'. This initial point cloud may optionally be pre-processed in step 2001 to quantize the 3D positions of the 3D points. This first step is often called 'voxelization'. Voxelization slightly moves the 3D positions of the initial point cloud so that each point moves to the center of the nearest voxel in a bounding box.
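The voxelization step can be illustrated with a minimal sketch that snaps each point to the center of the voxel of a regular grid containing it. The cubic voxel size, the bounding-box origin and the function name are assumptions; G-PCC's actual quantization parameters and any handling of duplicate points are not modelled.

```python
import math

Point = tuple[float, float, float]


def voxelize(points: list[Point], voxel_size: float,
             origin: Point = (0.0, 0.0, 0.0)) -> list[Point]:
    """Move each point to the center of the grid voxel containing it."""
    snapped = []
    for p in points:
        snapped.append(tuple(
            o + (math.floor((c - o) / voxel_size) + 0.5) * voxel_size
            for c, o in zip(p, origin)
        ))
    return snapped


print(voxelize([(0.2, 1.7, 2.5)], voxel_size=1.0))  # [(0.5, 1.5, 2.5)]
```

Each coordinate moves by at most half a voxel, which is the "slight" displacement mentioned above.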
A step 2002, called geometry encoding, consists of encoding the 3D positions of the point cloud. G-PCC proposes two alternatives, 2003 and 2004, for the encoding of the geometry:
• The first alternative 2003 is based on the construction of an octree and the encoding of the generated octree structure.
• The second alternative 2004 is based on the construction of a tree where each 3D point position is encoded with reference to previously encoded point positions in the tree. This second alternative is called geometry prediction.
Once the geometry is encoded, using either the octree-based techniques or the geometry prediction techniques, it is decoded to obtain the decoded 3D positions of the points. The decoded positions, which can differ from the initial positions if the geometry is encoded in lossy mode, can be re-colored in step 2005.
This means that the attributes of the encoded-decoded points can be changed according to the differences between the initial 3D position of a point and its decoded position. For example, the color or normal of each point can be recalculated taking into account the displacement between the initial position and the decoded position of the point. If the geometry is encoded in lossless mode, there is no need to apply this step 2005. The result of this recoloring is a new set of attributes related to the encoded-decoded 3D points.
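A very simple form of recoloring is to give each decoded point the attribute of the nearest original point. This brute-force nearest-neighbour transfer and the function name `recolor` are illustrative assumptions only; G-PCC's actual recoloring procedure is not reproduced here.

```python
Point = tuple[float, float, float]


def recolor(original_points: list[Point], original_attrs: list[str],
            decoded_points: list[Point]) -> list[str]:
    """Assign to each decoded point the attribute of its nearest original point."""
    def sq_dist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    new_attrs = []
    for d in decoded_points:
        nearest = min(range(len(original_points)),
                      key=lambda i: sq_dist(original_points[i], d))
        new_attrs.append(original_attrs[nearest])
    return new_attrs


originals = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
colors = ["red", "blue"]
print(recolor(originals, colors, [(1.0, 0.5, 0.0), (9.0, 1.0, 0.0)]))  # ['red', 'blue']
```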
After recoloring, the attributes are encoded in a step 2006. G-PCC proposes two modes for encoding the attributes, called RAHT, step 2007, and LoD/Lifting, step 2008.
Usually, dense point clouds use the octree-based techniques (2003) in the geometry encoding step 2002, i.e. for compressing the positions of the point cloud, called the geometry. Sparse point clouds typically use the geometry prediction techniques of step 2004. Indeed, geometry prediction is more efficient for sparse content, and LIDAR-generated point clouds are often sparse.
This invention is related to step 2004 of geometry prediction.
The contemplated standard G-PCC V1, as illustrated in Figure 2, suffers from drawbacks. One major drawback is that the frames of the temporal point cloud are compressed independently. Temporal independence means that the current frame, whose capture starts at time 't', is compressed and encoded independently of the previous frame, whose capture starts at time 't-1', so the redundant information between the two frames cannot be exploited. That is why the new technologies under development in the G-PCC context, described in the TUC document, include INTER tools that take advantage of temporal redundancy to improve the compression ratio. The INTER tools proposed by the TUC document exist both for geometry encoding (2002) and for attribute encoding (2006).
The predictive encoding of a 3D point consists in determining a predictor of the 3D point to encode. The predictor is the decoded version of a previously encoded point. The predictor is in the current frame, meaning the current 3D point cloud, for INTRA encoding. The predictor is in a previously encoded frame, meaning a previously encoded 3D point cloud, named the reference frame, in case of INTER encoding.
The predictor is determined by evaluating a set of predictors and selecting one of them, based on this evaluation, as the predictor to be used at encoding and decoding. The evaluation consists in computing, for each predictor, a cost function corresponding to the rate-distortion cost of encoding the point based on that predictor.
Each predictor is given by a prediction function used to determine it. A prediction function may be, for example, a constant function, where the predictor is a point of a predetermined value. The prediction function may be based on a location in the tree relative to the point to encode, for example the function determining that the predictor is the direct ancestor, in the tree, of the point to encode. The prediction function may also correspond to a linear combination of several points defined by their locations relative to the point to encode.
Accordingly, a set of prediction functions is used to determine a corresponding set of predictors, which are evaluated to determine the predictor used to encode the current point.
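The relationship between prediction functions and predictors can be sketched by modelling each prediction function as a constant plus a weighted combination of the point's ancestors in the tree (parent first). The weight sets below mirror the none/delta/linear2/linear3 strategies of G-PCC V1 described further on, with (0, 0, 0) used as a placeholder constant; the class, the sum-of-absolute-residuals cost and the rule that a function needing missing ancestors is skipped are assumptions for illustration.

```python
from dataclasses import dataclass

Point = tuple[int, int, int]


@dataclass(frozen=True)
class PredictionFunction:
    weights: tuple[int, ...]      # weights of p0 (parent), p1, p2, ...
    constant: Point = (0, 0, 0)   # added to the weighted combination

    def apply(self, ancestors: list[Point]):
        """Return the predictor, or None if too few ancestors are available."""
        if len(ancestors) < len(self.weights):
            return None
        pred = list(self.constant)
        for w, p in zip(self.weights, ancestors):
            for axis in range(3):
                pred[axis] += w * p[axis]
        return tuple(pred)


FUNCTIONS = [
    PredictionFunction(()),          # constant predictor
    PredictionFunction((1,)),        # p0
    PredictionFunction((2, -1)),     # 2*p0 - p1
    PredictionFunction((1, 1, -1)),  # p0 + p1 - p2
]


def build_predictors(functions, ancestors):
    """Map each available prediction function index to its predictor."""
    predictors = {}
    for index, f in enumerate(functions):
        pred = f.apply(ancestors)
        if pred is not None:
            predictors[index] = pred
    return predictors


def best_predictor(point: Point, predictors: dict[int, Point]) -> int:
    """Index of the predictor with the smallest sum of absolute residuals."""
    return min(predictors,
               key=lambda i: sum(abs(c - p) for c, p in zip(point, predictors[i])))


ancestors = [(10, 10, 10), (8, 9, 10), (6, 8, 10)]  # p0, p1, p2
preds = build_predictors(FUNCTIONS, ancestors)
print(best_predictor((12, 11, 10), preds))  # 2
```

Predictors 2 and 3 both match the point here; `min` keeps the first one, i.e. the lower index, which is also the cheaper one to signal with a unary-style code.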
According to embodiments of the invention, it is proposed to improve these INTER tools in the context of geometry prediction (2004) by one or a combination of the following:
• Adding new INTER prediction functions.
• Signaling in the bitstream the prediction functions used. In a simple implementation, the INTER prediction functions used by the encoder are signaled by setting either the index of the last prediction function used in an ordered list of prediction functions, or the number of prediction functions of this list to be used.
• Encoding the index of the chosen predictor using a truncated unary code.
This is explained in detail below.
The advantage of having new INTER predictors (obtained from new INTER prediction functions) is better compression efficiency. However, computation time may increase on the encoder side, as the new predictors have to be tested in the rate-distortion optimization. Signaling the number of INTER prediction functions in the bitstream enables a low-complexity version of the codec to be chosen. Further, the compression of the indexes of the INTER prediction functions used by the encoder may be optimized by using a truncated unary code.
A second drawback of the standard is the low efficiency of the INTRA coding tools for geometry compression when the geometry prediction tools (2004) are used. In G-PCC V1, the INTRA geometry prediction/compression tools predict the 3D position of a current point from its ancestors in the predictive tree. This way of predicting the current 3D position is not the most efficient solution, which is why new INTRA tools have been proposed in the TUC document.
According to embodiments of the invention, it is proposed to improve these new INTRA tools in the context of geometry prediction (2004). In particular, it is proposed, separately or in combination:
• To have a variable number of INTRA prediction functions (e.g. increasing the number of INTRA prediction functions).
• To signal in the bitstream the maximal number of INTRA prediction functions used by the encoder, which leaves the encoder free to optimize the number of prediction functions or not.
• To encode the index of the chosen predictor using a truncated unary code (the generated code being encoded with an arithmetic encoder with one context per bin index).
It is also possible to use a unary code instead of a truncated unary code, but the preferred embodiment is based on the truncated unary code.
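The idea of one arithmetic-coding context per bin index can be illustrated by estimating the ideal code length of truncated unary bins, each bin position having its own adaptive probability. The count-based probability estimate with a Laplace prior is a simplifying assumption (standard codecs use their own probability-state update rules), the class and function names are hypothetical, and the sketch measures estimated bits rather than producing an actual arithmetic-coded bitstream.

```python
import math


class BinContext:
    """Adaptive estimate of P(bin == 1) from counts, with a Laplace prior."""

    def __init__(self) -> None:
        self.ones, self.total = 1, 2

    def cost_bits(self, bin_value: int) -> float:
        p_one = self.ones / self.total
        return -math.log2(p_one if bin_value else 1.0 - p_one)

    def update(self, bin_value: int) -> None:
        self.ones += bin_value
        self.total += 1


def code_index(index: int, max_index: int, contexts: list[BinContext]) -> float:
    """Estimated bits for one truncated unary index, one context per bin position."""
    bins = [1] * index + ([0] if index < max_index else [])
    bits = 0.0
    for position, bin_value in enumerate(bins):
        ctx = contexts[position]  # context selected by the bin index
        bits += ctx.cost_bits(bin_value)
        ctx.update(bin_value)
    return bits


max_index = 3
contexts = [BinContext() for _ in range(max_index)]
costs = [code_index(i, max_index, contexts) for i in [0, 0, 0, 1, 0, 0]]
print([round(c, 3) for c in costs])  # [1.0, 0.585, 0.415, 3.322, 0.585, 0.485]
```

As index 0 keeps recurring, the first context learns that its bin is usually 0 and the estimated cost drops below one bit per index, while the rarely used second bin position keeps its own separate statistics.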
Increasing the number of INTRA prediction functions improves compression efficiency. The advantage of signaling in the bitstream the maximal number of INTRA prediction functions used by the encoder, and therefore of having a variable number of INTRA prediction functions, is flexibility at the encoding stage. A low-complexity encoder could set a low number of INTRA prediction functions. Conversely, if the encoder prioritizes compression efficiency over computation time, the number of INTRA prediction functions used could be high. Signaling the maximal number of INTRA prediction functions in the bitstream also has a compression advantage, especially when this maximal number is set to a low value. This advantage, obtained by using a truncated unary code, is described in detail below.
Figure 3 illustrates an example of the main steps of a general encoding method for a point cloud and the construction of the associated bitstream. The three main elements are the point cloud/frame 3000 to be encoded, the compression algorithm 3003, and the bitstream 3006 that it generates.
In a step 3001, the point cloud is divided into slices, which are subsets of the 3D points of the point cloud. Each slice is compressed in step 3005 for the geometry and in step 3007 for the attributes of the 3D points.
The compression step 3003 comprises a first initialization step 3004 and a step 3005 of generating the geometry bitstream, followed by a step 3007 of generating the attributes bitstream.
The generated bitstream 3006 comprises a metadata part 3008, 3009 and 3010 followed by data units 3011-3014. The metadata part comprises a sequence parameter set, SPS, 3008 comprising parameters defined for the whole sequence, a geometry parameter set, GPS, 3009, comprising parameters of the geometry, and an attribute parameter set, APS, 3010 comprising parameters of the attributes. The data units are of two sorts: geometry data units, GDUs, and attribute data units, ADUs. Each data unit consists of a header, GDU header 3011 or ADU header 3013, and a payload, GDU data 3012 or ADU data 3014, containing the raw encoded geometry data or attribute data respectively.
The metadata are typically generated during the initialization step 3004, the GDUs during the geometry bitstream generation step 3005, the ADUs during the attributes bitstream generation step 3007.
Figure 4 illustrates an example of the main steps of a decoding method for decoding a bitstream generated by the encoder as specified in relation with Figure 3. In a first step 4009, the bitstream 4000 is obtained (e.g. read). The metadata comprising the SPS 4001, the GPS 4002 and the APS 4003, are obtained in a step 4010. These metadata are, for example, read and decompressed and then used in a step 4011 for initializing the Raw data decompression step 4012.
For example, according to embodiments of the invention, if the number of INTRA prediction functions and/or the number of used INTER prediction functions are specified in the GPS 4002, this information will be read and will be used for initializing the Raw data decompression step 4012. The raw data decompressor will know how many INTRA prediction functions, or which INTER prediction functions are used and will be able to decode the indices of the prediction functions and to associate them with the right INTER prediction functions and/or the right INTRA prediction functions.
For example, according to embodiments of the invention, if the number of INTRA prediction functions and/or the number of used INTER prediction functions are specified in the GDU header 4004, this information will be read and will be used for initializing the raw data decompression step 4012.
The raw data decompression step 4012 is then able to decode the GDUs, 4004, 4005 and the ADUs, 4006, 4007 based on the metadata.
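The decoder-side initialization described for Figure 4 can be sketched as follows. All syntax-element names (`num_intra_pred_functions`, `max_inter_pred_function_index`) are hypothetical, and so is the precedence rule used here, namely that a value carried in a GDU header, when present, applies to that data unit instead of the GPS value; the description only states that the information may be carried in either place. The INTER set is taken as the prefix of the ordered list up to the signalled maximum index, as in one of the embodiments above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GeometryParameterSet:
    num_intra_pred_functions: int        # hypothetical syntax element
    max_inter_pred_function_index: int   # hypothetical syntax element


@dataclass
class GduHeader:
    num_intra_pred_functions: Optional[int] = None
    max_inter_pred_function_index: Optional[int] = None


@dataclass
class GeometryDataUnit:
    header: GduHeader = field(default_factory=GduHeader)
    payload: bytes = b""  # raw encoded geometry data


def init_raw_decoder(gps: GeometryParameterSet,
                     gdu: GeometryDataUnit) -> tuple[int, int]:
    """Return (number of INTRA functions, number of INTER functions) for this GDU."""
    hdr = gdu.header
    n_intra = (hdr.num_intra_pred_functions
               if hdr.num_intra_pred_functions is not None
               else gps.num_intra_pred_functions)
    max_inter = (hdr.max_inter_pred_function_index
                 if hdr.max_inter_pred_function_index is not None
                 else gps.max_inter_pred_function_index)
    # INTER set = first entries of the ordered list up to max_inter (inclusive).
    return n_intra, max_inter + 1


gps = GeometryParameterSet(num_intra_pred_functions=4, max_inter_pred_function_index=2)
print(init_raw_decoder(gps, GeometryDataUnit()))  # (4, 3)
print(init_raw_decoder(gps, GeometryDataUnit(GduHeader(num_intra_pred_functions=2))))  # (2, 3)
```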
Figure 5 illustrates examples of 3D points obtained by a rotating lidar. On the left, the lidar 5000 rotates clockwise about the 'z' axis and obtains points from the reflection of its beams on obstacles, here two cars 5001 and 5002 in front of a wall 5003. The projection onto the ('x', 'y') plane is illustrated. For each point, its coordinates (x, y, z) are obtained.
The right part illustrates the same points expressed in spherical coordinates, where each point is defined by a laser index corresponding to an elevation angle $\theta$ determined by the geometry of each laser head, as illustrated in Figure 1, the azimuthal (rotation) angle $\phi$, and the radius $r$ giving the distance between the lidar and the point. Accordingly, each 3D point may be represented by its spherical coordinates $(r, \phi, \theta)$. The geometry prediction method mainly uses spherical coordinates.
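The conversion from Cartesian coordinates to the spherical representation used by geometry prediction can be sketched as below. The laser elevation table is a hypothetical four-head configuration in the spirit of Figure 1, the radius is taken as the Euclidean distance to the lidar as in the description above, and the laser index is chosen as the laser whose elevation is closest to the point's elevation; G-PCC's actual angular coding details (fixed-point units, exact radius definition) are not reproduced.

```python
import math

# Hypothetical elevations (degrees) of a 4-head rotating lidar, lowest first.
LASER_ELEVATIONS_DEG = [-15.0, -5.0, 5.0, 15.0]


def to_spherical(x: float, y: float, z: float,
                 elevations_deg: list[float] = LASER_ELEVATIONS_DEG
                 ) -> tuple[float, float, int]:
    """Return (r, phi, laser_index) for a point given in lidar-centred coordinates."""
    r = math.sqrt(x * x + y * y + z * z)  # distance between lidar and point
    phi = math.atan2(y, x)                # azimuthal (rotation) angle, radians
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    laser_index = min(range(len(elevations_deg)),
                      key=lambda i: abs(elevations_deg[i] - elevation))
    return r, phi, laser_index


r, phi, laser = to_spherical(0.0, 10.0, 2.68)
print(round(r, 3), round(math.degrees(phi), 1), laser)
```

The point above lies about 15 degrees above the horizontal plane, so it is attributed to the highest laser (index 3).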
The encoding of geometry according to G-PCC V1 is now described. As explained in relation to Figure 2, G-PCC proposes two methods for encoding the geometry of the 3D points. The predictive approach starts by defining a prediction structure on the point cloud. Such a structure can be described by a prediction tree where each point in the point cloud is associated with a vertex of the tree. Each vertex can be predicted only from its ancestors in the tree. Various prediction strategies are possible. The standard proposes four different strategies for determining a predictor: no prediction, meaning the current point is encoded with a predefined predictor; delta prediction, where the predictor is the direct ancestor of the current point in the tree (i.e. $p_0$); linear2 prediction, where the predictor is a linear combination of the two ancestors (i.e. $2p_0 - p_1$); and linear3 prediction, where the predictor is a linear combination of the three ancestors (i.e. $p_0 + p_1 - p_2$), where $p_0$, $p_1$, $p_2$ are the positions of the parent, grandparent and great-grandparent of the current vertex.
The tree structure is encoded by traversing the tree in depth-first order and encoding, for each vertex, its number of children. The positions of the vertices are encoded by storing the chosen prediction mode and the obtained prediction residuals. Arithmetic coding is used to further compress the generated values. Building the optimal prediction tree is an NP-hard problem. Prior to generating each predictive tree, the input points corresponding to the tree are sorted according to a sorting method. This helps guide the tree construction process towards a more efficient tree. The available sorting methods are none, Morton order, azimuth angle order and radial distance order. In particular, azimuthal sorting by rounded azimuth, radius and elevation yields a stable ordering and improves coding efficiency.
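The structure-coding part of the paragraph above (a depth-first traversal emitting each vertex's number of children) can be sketched as follows, together with the decoder-side reconstruction. The function names and the dictionary representation of the tree are hypothetical; position coding and arithmetic coding are omitted, and the decoder numbers vertices in depth-first order.

```python
def encode_tree_structure(children: dict[int, list[int]], root: int = 0) -> list[int]:
    """Depth-first (pre-order) traversal emitting each vertex's number of children."""
    counts = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        counts.append(len(kids))
        stack.extend(reversed(kids))  # so that the first child is visited next
    return counts


def decode_tree_structure(counts: list[int]) -> dict[int, list[int]]:
    """Rebuild the tree; vertices are numbered in the same depth-first order."""
    children: dict[int, list[int]] = {}
    open_parents: list[list[int]] = []  # [vertex, children still expected]
    for vertex, n_children in enumerate(counts):
        children[vertex] = []
        if open_parents:
            parent = open_parents[-1]
            children[parent[0]].append(vertex)
            parent[1] -= 1
            if parent[1] == 0:
                open_parents.pop()
        if n_children > 0:
            open_parents.append([vertex, n_children])
    return children


tree = {0: [1, 4], 1: [2, 3], 4: [5]}
counts = encode_tree_structure(tree)
print(counts)  # [2, 2, 0, 0, 1, 0]
print(decode_tree_structure(counts))
```

The iterative decoder avoids recursion, which matters for LIDAR trees that are often long chains.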
In G-PCC's predictive tree, one predictor index 'n' is encoded in the bitstream for each node of the prediction tree. This index points to a selected predictor PRn among a list of possible predictors. When angular mode (spherical coordinates) is used, the predictors are PR0, PR1, PR2 and PR3, defined as follows: 1) "None": PR0 = (r_min, φ0, θ0), where r_min is the minimum radius value (provided in the geometry parameter set), and φ0 and θ0 are equal to 0 if the node has no parent, or are equal to the φ and θ values of the point coded in the parent node.
2) "Delta": PR1 = p0 = (r0, φ0, θ0), where r0, φ0 and θ0 are respectively the radius r, the azimuthal angle φ and the laser index θ of the parent point p0 coded in the parent node.
3) "Linear2": PR2 = 2*p0-p1, where p0 and p1 are the parent and grandparent points/nodes.
4) "Linear3": PR3 = p0+p1-p2, where p0, p1 and p2 are the parent, the grandparent and the great-grandparent points/nodes.
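As an illustration of the four angular-mode predictors listed above, the following minimal Python sketch assumes each point is an integer tuple (r, φ, θ) and that the ancestors are passed parent first. The function name `gpcc_predictor` is hypothetical and is not the API of the G-PCC reference software.

```python
from typing import Sequence, Tuple

Point = Tuple[int, int, int]  # (r, phi, theta): radius, azimuth, laser index


def gpcc_predictor(n: int, ancestors: Sequence[Point], r_min: int) -> Point:
    """Return predictor PR_n; `ancestors` is [p0, p1, p2] with the parent first."""
    if n == 0:  # "None": minimum radius, angles copied from the parent if any
        if not ancestors:
            return (r_min, 0, 0)
        _, phi0, theta0 = ancestors[0]
        return (r_min, phi0, theta0)
    if n == 1:  # "Delta": the parent itself
        return ancestors[0]
    if n == 2:  # "Linear2": 2*p0 - p1, component-wise
        p0, p1 = ancestors[0], ancestors[1]
        return (2 * p0[0] - p1[0], 2 * p0[1] - p1[1], 2 * p0[2] - p1[2])
    if n == 3:  # "Linear3": p0 + p1 - p2, component-wise
        p0, p1, p2 = ancestors[0], ancestors[1], ancestors[2]
        return (p0[0] + p1[0] - p2[0], p0[1] + p1[1] - p2[1], p0[2] + p1[2] - p2[2])
    raise ValueError("predictor index must be in 0..3")
```

Linear2 and Linear3 need at least two and three ancestors respectively; the sketch simply raises an IndexError when they are missing, whereas an encoder would not offer those predictors for such nodes.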
A prediction residual (r_res, φ_res, θ_res) is obtained in the encoder by
$(r_{res}, \varphi_{res}, \theta_{res}) = (r, \varphi, \theta) - PR_n - (0,\ k \cdot \varphi_{step},\ 0) \qquad (1)$
where PRn is one of the predictors PR0, PR1, PR2 or PR3, and k is a number of azimuthal angle steps (φ_step) to be added to the prediction. The prediction index 'n' and the number 'k' of azimuthal angle steps are encoded in the bitstream for each node, while the value of φ_step is encoded in the geometry parameter set by geom_angular_azimuth_speed_minus1. The residual (r_res, φ_res, θ_res) is also encoded in the bitstream.
The azimuth residual φ_res can additionally be quantized according to the radius r before being encoded in the bitstream.
On both the encoder and decoder sides, the coordinates (r_dec, φ_dec, θ_dec) of the points are retrieved by the reverse process, a reconstructed point being obtained by:
$(r_{dec}, \varphi_{dec}, \theta_{dec}) = PR_n + (0,\ k \cdot \varphi_{step},\ 0) + (r_{res}, \varphi'_{res}, \theta_{res})$
where φ'_res is obtained by inversely quantizing the quantized φ_res.
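Equation (1) and its inverse can be checked with the short Python sketch below. It is a lossless simplification: the quantization of φ_res according to r is omitted, so the dequantized φ'_res equals φ_res. The function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[int, int, int]  # (r, phi, theta)


def residual(point: Point, pred: Point, k: int, phi_step: int) -> Point:
    """Equation (1): (r, phi, theta) - PR_n - (0, k * phi_step, 0)."""
    return (point[0] - pred[0],
            point[1] - pred[1] - k * phi_step,
            point[2] - pred[2])


def reconstruct(pred: Point, k: int, phi_step: int, res: Point) -> Point:
    """Reverse process: PR_n + (0, k * phi_step, 0) + residual."""
    return (pred[0] + res[0],
            pred[1] + k * phi_step + res[1],
            pred[2] + res[2])
```

For example, with a point (120, 1050, 3), a predictor (118, 40, 3), φ_step = 100 and k = 10, the residual is (2, 10, 0), and adding it back to the shifted predictor returns the original point. Signaling k keeps the azimuth residual small even when several laser firings separate the point from its predictor.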
This way of generating predictors is not the most efficient one. In the TUC document, new prediction functions have been proposed, which are now described.
Instead of using the list of G-PCC prediction functions previously explained, a list of N predictors is built from a prediction buffer of N pairs, each made of one radius and one azimuthal angle (r_n, φ_n). The predictor index is coded simply using unary coding with one context per bin index.
The derivation of a predictor is performed as follows. If the point being predicted is the first point of the tree (i.e. there is no parent node), the predictor PR0 is set equal to (r_min, 0, 0) and the other predictors PRn, n>0, are set equal to (0, 0, 0). If the point has a parent point, the predictor PR0 is set equal to (r0, φ0, θ0), where θ0 is the laser index θ of the parent point p0 coded in the parent node, and where (r0, φ0) is the first pair in the buffer (as will be understood from the buffer management, it is also equal to the radius r and the azimuthal angle φ of the parent point p0 coded in the parent node). The predictors PRn, n>0, are set equal to (r_n, φ_n + k·φ_step, θ0), where θ0 is the laser index θ of the parent point p0 coded in the parent node, (r_n, φ_n) is the n-th pair in the buffer, and k equals 0 if |φ0 - φ_n| < φ_step, else k equals the integer division (φ0 - φ_n) / φ_step.
Since integer division is better avoided in the decoder, (φ0 - φ_n) / φ_step is approximated using the divApprox function of G-PCC: k = divApprox(φ0 - φ_n, φ_step, 0).
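The predictor-list derivation above can be sketched in Python as follows. This is an illustration under stated assumptions: the buffer is a list of (r_n, φ_n) pairs with the parent pair first, and an exact division truncating toward zero stands in for G-PCC's divApprox, whose internals are not reproduced here. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, int]        # (r_n, phi_n) buffer entry
Point = Tuple[int, int, int]  # (r, phi, theta)


def trunc_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (stand-in for divApprox)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def derive_predictors(buffer: List[Pair], parent_theta: Optional[int],
                      r_min: int, phi_step: int) -> List[Point]:
    """Build PR_0..PR_{N-1} from the prediction buffer."""
    if parent_theta is None:  # first point of the tree: no parent node
        return [(r_min, 0, 0)] + [(0, 0, 0)] * (len(buffer) - 1)
    r0, phi0 = buffer[0]
    preds = [(r0, phi0, parent_theta)]
    for r_n, phi_n in buffer[1:]:
        delta = phi0 - phi_n
        # Shift the stored azimuth by k whole steps so it lands near phi0.
        k = 0 if abs(delta) < phi_step else trunc_div(delta, phi_step)
        preds.append((r_n, phi_n + k * phi_step, parent_theta))
    return preds
```

With a buffer [(100, 1000), (50, 700), (80, 950)], θ0 = 5 and φ_step = 100, the second entry is moved forward by three steps to azimuth 1000, while the third entry, being less than one step away, is kept as is.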
The buffer used for the predictors' derivation is managed as follows. Each pair of the buffer is first initialized to (0, 0). After the (de)coding of a point, the buffer is updated as follows. If the absolute value of the (de)coded radius residual r_res is higher than a threshold Th, it is considered that the laser has probed a new object. A new element (r0, φ0) is then inserted at the front of the buffer, with r0 and φ0 the reconstructed radius and the reconstructed azimuthal angle of the (de)coded point, and the last element of the buffer is discarded. This is performed by setting each buffer element (r_n, φ_n) equal to (r_{n-1}, φ_{n-1}) for n = 3 down to 1, and then setting the first buffer element from the decoded point.
If the absolute value of the (de)coded r_res is not higher than the threshold Th, it is considered that the laser has probed an object already present in the buffer. The element of the buffer with index predIdx, corresponding to the index of the predictor that has been used for the prediction, is then moved to the front of the list and updated with (r0, φ0), the reconstructed radius and the reconstructed azimuthal angle of the (de)coded point. This is performed by setting each buffer element (r_n, φ_n) equal to (r_{n-1}, φ_{n-1}) for n = predIdx down to 1, and then setting the first buffer element from the decoded point.
Th is equal to ps.predgeom_radius_threshold_for_pred_list and has been fixed in the encoder to 2048 >> ps.geom_angular_radius_inv_scale_log2, with '>>' being a right bit shift.
The parameter 'geom_angular_radius_inv_scale_log2' is an inverse scaling factor applied to the radius in predictive geometry coding. It is used to modify the precision, in bits, of the radius values.
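The buffer management rules above can be sketched as a small Python class. It is a hypothetical illustration: the buffer size is a parameter (the text describes four entries, shifted for n = 3 down to 1), and `radius_threshold` reproduces the encoder choice Th = 2048 >> geom_angular_radius_inv_scale_log2.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (reconstructed radius, reconstructed azimuth)


def radius_threshold(inv_scale_log2: int) -> int:
    """Encoder choice: Th = 2048 >> geom_angular_radius_inv_scale_log2."""
    return 2048 >> inv_scale_log2


class PredictorBuffer:
    def __init__(self, size: int, threshold: int) -> None:
        self.entries: List[Pair] = [(0, 0)] * size  # every pair starts at (0, 0)
        self.threshold = threshold

    def update(self, r_rec: int, phi_rec: int, r_res: int, pred_idx: int) -> None:
        """Update the buffer after a point has been (de)coded."""
        if abs(r_res) > self.threshold:
            src = len(self.entries) - 1  # new object: the last entry is discarded
        else:
            src = pred_idx               # known object: the used entry moves to the front
        for n in range(src, 0, -1):      # (r_n, phi_n) = (r_{n-1}, phi_{n-1})
            self.entries[n] = self.entries[n - 1]
        self.entries[0] = (r_rec, phi_rec)
```

Both branches share the same shift loop; they differ only in where the shift starts, which is why a "new object" behaves like a move-to-front of the last (discarded) entry.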
According to embodiments of the invention, the size of the buffer list is made variable, thus introducing the possibility to use a variable number of INTRA prediction functions, for example storing/using up to 16 INTRA prediction functions instead of 4. This advantageously improves compression efficiency.
As the number of INTRA prediction functions is proposed to be variable, the inventors identified that it is beneficial to signal the number of used prediction functions in the bitstream. This has at least two advantages. First, when the number of INTRA prediction functions is signaled, the decoder can allocate the right amount of memory for the predictor buffer.
Second, when the number of INTRA prediction functions is signaled, a compression gain in the encoding of the predictor indices can be obtained if the encoder decides to use fewer prediction functions than the maximal number of possible prediction functions specified in the standard, for example to reduce computation time. Indeed, in the prior art, the predictor index is encoded using unary coding according to the following scheme:
o If predictor 0 is chosen, it is encoded by the binary word '0'
o If predictor 1 is chosen, it is encoded by the binary word '10'
o If predictor 2 is chosen, it is encoded by the binary word '110'
o If predictor 3 is chosen, it is encoded by the binary word '1110'
If the encoder uses only 2 prediction functions, 2 bits ('10') are encoded each time predictor 1 is used. However, if the decoder knows in advance that the maximal number of prediction functions is 2, because this information has been provided in the bitstream, the final '0' becomes useless and one bit is saved.
For these reasons, it is proposed, according to embodiments of the invention, to encode the number of used prediction functions (i.e. the size of the predictor list) in the bitstream. The number of used prediction functions can be specified in the bitstream, preferably in the metadata part (SPS or GPS) or in the header of the GDU. For example, if the GPS is used, the maximal number of INTRA predictors used by the encoder is set in the geometry parameter set.
In these embodiments, if for example the prediction list has a size of 3 (i.e. the maximal predictor index is 2), the predictor indexes 0, 1 and 2 can be encoded according to the following scheme:
o If predictor 0 is chosen, it is encoded by the binary word '0';
o If predictor 1 is chosen, it is encoded by the binary word '10';
o If predictor 2 is chosen, it is encoded by the binary word '11'.
Indeed, as the (arithmetic) decoder in charge of reading the indices of the used prediction functions knows, by reading the SPS, GPS or GDU header, that the maximal number of prediction functions is 3, after reading the two bits of the binary word '11' it knows that the second '1' is the last bit and stops reading the predictor index. One bit is thus saved each time predictor 2 is used for the coding of the 2D position of a 3D point.
If, for example, the prediction list has a size of 4 (i.e. the maximal predictor index is 3), the predictor indexes 0, 1, 2 and 3 can be encoded according to the following scheme:
o If predictor 0 is chosen, it is encoded by the binary word '0';
o If predictor 1 is chosen, it is encoded by the binary word '10';
o If predictor 2 is chosen, it is encoded by the binary word '110';
o If predictor 3 is chosen, it is encoded by the binary word '111'.
Expressed differently, for truncated unary with NPredDelta = 4:
predIdx 0 → encoded word '0'
predIdx 1 → encoded word '10'
predIdx 2 → encoded word '110'
predIdx 3 → encoded word '111'
Another example is given below with only 2 INTRA predictors used by the encoder, predIdx being the index of the predictor:
predIdx 0 → encoded word '0'
predIdx 1 → encoded word '1'
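The truncated unary binarization in the tables above can be reproduced with the Python sketch below. It works on bit strings only; the arithmetic coding of the bins is left out, and the function names are hypothetical.

```python
from typing import Iterator


def tu_encode(idx: int, num_predictors: int) -> str:
    """idx ones, then a terminating zero unless idx is the last allowed index."""
    if not 0 <= idx < num_predictors:
        raise ValueError("index out of range")
    return "1" * idx + ("0" if idx < num_predictors - 1 else "")


def tu_decode(bits: Iterator[str], num_predictors: int) -> int:
    """Read ones until a zero or until the signaled maximum index is reached."""
    idx = 0
    while idx < num_predictors - 1 and next(bits) == "1":
        idx += 1
    return idx
```

The decoder can stop after reading the last '1' only because `num_predictors` is known in advance, which is exactly why the number of predictors has to be signaled. With a single predictor, no bit is read at all.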
This kind of bit encoding is called truncated unary coding. In short, it is proposed to change the unary coding of the predictor index to a truncated unary coding. The generated words are encoded with an arithmetic encoder with one context per bin index.
It is thus proposed, according to embodiments of the invention, to change the unary coding of 'predIdx' (the index of the selected INTRA predictor in the buffer list) to a truncated unary coding, still with one context per bin index. Advantageously, the terminating zero is not encoded each time the last predictor of the buffer list is used (e.g. predIdx = 3 when NPredDelta = 4, NPredDelta being the size of the buffer list). In particular, when the encoder bounds the size of the predictor list to a low value (e.g. a number of predictors set to 2 to reduce computation time), the truncated unary code can be applied to the last predictor index and remains decodable.
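To illustrate what "one context per bin index" means, the Python sketch below keeps one adaptive probability estimate for each bin position of the truncated unary word and returns the ideal code length. It is a simplified frequency-count model chosen for clarity, not the adaptive binary arithmetic coder of G-PCC; the class name is hypothetical.

```python
import math
from typing import List


class BinContexts:
    """One adaptive probability estimate per bin index (simplified model)."""

    def __init__(self, num_bins: int) -> None:
        # Laplace-smoothed counts of zeros and ones for each bin position.
        self.counts: List[List[int]] = [[1, 1] for _ in range(num_bins)]

    def cost_and_update(self, word: str) -> float:
        """Ideal code length in bits of `word`, updating each bin's context."""
        bits = 0.0
        for pos, b in enumerate(word):
            c = self.counts[pos]  # context selected by the bin index only
            bit = int(b)
            bits -= math.log2(c[bit] / (c[0] + c[1]))
            c[bit] += 1
        return bits
```

For a list of P predictors, `num_bins` would be P - 1, the length of the longest truncated unary word. Because each bin position adapts separately, a frequently chosen predictor 0 quickly makes the first bin cheap, independently of the statistics of the later bins.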
As previously explained, according to embodiments of the invention, it is proposed to set the number of INTRA predictors used by the encoder in the geometry parameter set. With this modification, the decoder knows the maximal value of the predictor indices (by reading the GPS information), which enables it to decode predIdx based on truncated unary coding. The bit truncation can then be applied to the highest used predictor index (the last one of the dynamic predictor list, also called the buffer list as previously discussed), and the decoder is able to decode the truncated unary code.
Expressed differently, according to embodiments of the invention, it is proposed to use truncated unary coding for the coding of predIdx of the INTRA predictors, to set the number of used INTRA predictors in the Geometry Parameter Set, and to modify the decoder to read this new information in the GPS and correctly decode the truncated unary predIdx.
Expressed in yet another way, it is proposed to set the maximal number of INTRA predictors used by the encoder in the geometry parameter set. This modification enables the decoder:
o to decode predIdx based on truncated unary coding;
o to allocate the right memory size for the dynamic list of INTRA predictors (at both the encoder and the decoder).
To save computation, the encoder may decide to use a low number of INTRA predictors (e.g. 2 or 3). The truncated code is applied to the last used predIdx. In such a case, the decoder has to know the number of INTRA predictors in order to decode predIdx (with truncated unary coding).
Figure 6 illustrates an example of tree construction for a rotating lidar.
Input 3D points 6007 are in spherical coordinates (R_i, φ_i, θ_i), i being the index of the point. For a rotating lidar, there is a direct correspondence between the elevation θ_i and the laserId, which is why the two terms are used interchangeably.
Before starting the encoding, the tree is constructed in step 6006. The tree construction consists in sorting the points in increasing azimuth order for a same given elevation, except for a small subset of 3D points. This small subset of points is illustrated with the points 6000 and 6003. As these are the first encoded points for a given (different) elevation, their ancestors have a different elevation and there is no guarantee that the azimuth of the ancestor is lower. For example, in the tree 6008, the point 6001 has the point 6000 as parent, and the point 6002 has the point 6001 as parent. This means that the azimuth φ1 is higher than φ0, and that the azimuth φ2 is higher than φ1. The point 6003 has the point 6000 as parent because it is the 3D point at elevation θ1 with the lowest azimuth: the first point with an elevation higher than θ0 (6003) has the point 6000 as parent. The point 6004 has the point 6003 as parent, and the point 6005 has the point 6004 as parent.
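The tree construction of Figure 6 can be approximated by the Python sketch below. It is a simplified illustration: points are sorted by (laser index, azimuth), each point takes the previous point of the same elevation as parent, and the first point of a new elevation takes the first point of the previous elevation as parent, as in the tree 6008. Azimuth rounding and the other G-PCC sorting options are ignored, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

SphPoint = Tuple[int, int, int]  # (r, phi, laser_id)


def build_tree(points: List[SphPoint]) -> Tuple[List[int], List[Optional[int]]]:
    """Return the coding order and, for each input point, its parent's index."""
    order = sorted(range(len(points)), key=lambda i: (points[i][2], points[i][1]))
    parent: List[Optional[int]] = [None] * len(points)
    prev_first: Optional[int] = None  # first point of the previous elevation
    prev: Optional[int] = None        # previously visited point
    prev_laser: Optional[int] = None
    for i in order:
        laser = points[i][2]
        if laser != prev_laser:
            parent[i] = prev_first    # first point at a new elevation (e.g. 6003)
            prev_first = i
        else:
            parent[i] = prev          # next point in azimuth at the same elevation
        prev, prev_laser = i, laser
    return order, parent
```

The root of the tree (6000 in Figure 6) is the only point with no parent; every other point can then be predicted from its ancestors in coding order.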
In the TUC document, the generated tree is used for successively selecting the 3D points to encode (a point being a node of the tree), and for using the parent as the predictor of the elevation.
The radius and azimuth are predicted using the new INTRA and INTER prediction functions. In a preferred embodiment, it is assumed that this global prediction mode is kept for this invention, even though the INTRA or INTER prediction functions could also be used for elevation prediction, or only for radius or only for azimuth prediction.
Figure 7 illustrates the main steps of an example of the compression algorithm used for encoding the 3D geometry of the input points, corresponding to step 2004 in Figure 2. This version of the algorithm is used, for example, for a 3D point cloud coming from a LIDAR or for any 3D point cloud with a spherical representation.
As explained previously, a tree is constructed. Following this tree, a point is selected in step 7000. This point is in its spherical representation (radius, azimuth, elevation or laserId). In step 7001, a prediction of this point is performed. The prediction can be done using INTRA prediction functions (Figure 5) or INTER prediction functions (Figures 8, 9, 10 and 11). In embodiments of the present invention, it is proposed to use at least three different INTRA prediction functions and/or at least three INTER prediction functions.
The predictors (calculated from the used prediction functions) are previously reconstructed 3D points, reconstructed meaning that these points have been encoded and decoded, coming from a reference frame for INTER prediction or from the current frame for INTRA prediction. The best predictor is chosen among all the predictors. A flag is set to indicate whether it is an INTRA or an INTER predictor. The index of this predictor (INTER or INTRA) is encoded and added to the bitstream in step 7006. The predictors are used for the prediction of the radius and azimuth, the elevation being predicted from the parent in the tree. Once the prediction is done, the residual is calculated in the spherical representation in step 7002, then quantized and encoded in a binary representation in step 7005 to be added to the bitstream in step 7006. The decoded residual in the spherical domain is calculated in step 7003 and added to the prediction in step 7007. The reconstructed point is transformed into the Cartesian domain/representation in step 7008.
In step 7013, the input point is transformed into Cartesian coordinates. The residual in the Cartesian domain is obtained in step 7004. This residual is quantized and encoded in a binary stream in step 7006.
In step 7009, the residual is decoded in the Cartesian domain before being added, in step 7010, to the point obtained in step 7008. The result is the reconstructed point in the Cartesian domain, which is transformed into the spherical domain in step 7011 before being stored in step 7012. The storage used in step 7012 contains encoded/decoded points that can be used for INTRA or INTER prediction.
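The combination of a spherical and a Cartesian residual in Figure 7 can be sketched in Python as below. Assumptions, all hypothetical: θ is an elevation angle in radians and r the 3D distance, the spherical residual is uniformly quantized with step `qstep`, and the Cartesian residual is kept unquantized, so the final reconstruction matches the input up to floating-point rounding.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sph_to_cart(r: float, phi: float, theta: float) -> Vec:
    # Assumption: theta is an elevation angle in radians, r the 3D distance.
    return (r * math.cos(theta) * math.cos(phi),
            r * math.cos(theta) * math.sin(phi),
            r * math.sin(theta))


def encode_point(sph: Vec, pred: Vec, qstep: float) -> Tuple[Tuple[int, ...], Vec, Vec]:
    """Steps 7002 to 7010, simplified; returns (quantized residual, Cartesian residual, reconstruction)."""
    res = [a - b for a, b in zip(sph, pred)]                        # 7002
    q = tuple(round(v / qstep) for v in res)                        # 7005
    rec_sph = [p + qi * qstep for p, qi in zip(pred, q)]            # 7003 + 7007
    rec_cart = sph_to_cart(*rec_sph)                                # 7008
    cart_in = sph_to_cart(*sph)                                     # 7013
    cart_res = (cart_in[0] - rec_cart[0],                           # 7004
                cart_in[1] - rec_cart[1],
                cart_in[2] - rec_cart[2])
    final = (rec_cart[0] + cart_res[0],                             # 7009 + 7010
             rec_cart[1] + cart_res[1],
             rec_cart[2] + cart_res[2])
    return q, cart_res, final
```

The spherical residual captures most of the prediction gain cheaply, while the Cartesian residual corrects the error introduced by the spherical quantization and the coordinate conversion.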
For INTRA prediction, the encoded/decoded points are used as described above for constructing the list of predictors.
For INTER prediction, the stored points are used for the encoding of the next frame: they are used as the reference frame for the next frame, as illustrated by Figure 8.
Figures 8a and 8b illustrate the INTER prediction as proposed in a first and second version of the TUC document.
For INTER prediction, it was initially proposed to predict the radius of a point from a reference frame. The azimuth and laserID are still predicted with INTRA prediction, while the radius is predicted from the point in the reference frame that has the same laserID as the current point and the azimuth closest to the current azimuth.
An improvement of this method, proposed in a subsequent version of the TUC document, also enables INTER prediction of the azimuth and laserID in addition to radius prediction. When inter coding is applied, the radius, azimuth and laserID of the current point are predicted based on a point that is near the azimuth position of a previously decoded point in the reference frame. In addition, separate sets of contexts are used for INTER and INTRA prediction in the arithmetic encoder.
This improved method is illustrated in Figure 8a. The method consists of the following steps: * For a given 3D point in the frame at time 't', called curPoint in the figure, choose the previous decoded point (prevDecP0) in the same frame.
* Choose the position refFrameP0 in the reference frame, which is usually the previous decoded and stored frame at time 't-1', that has the same scaled azimuth and laserID as prevDecP0.
* In the reference point cloud frame, find the first point, called interPredPt, that has an azimuth greater than that of refFrameP0.
When the INTER predictor is chosen, a flag called "interflag" is encoded to signal to the decoder that the INTER predictor is to be used.
According to a further improvement proposed in the TUC document, an additional INTER predictor is added, which is obtained by finding the first point that has azimuth greater than the inter predictor previously obtained as illustrated in Figure 8b. Additional signalling is used to indicate which of the predictors is selected if INTER coding has been applied.
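The search for INTER predictors in Figures 8a and 8b can be sketched in Python with a binary search over the reference points of one laser, sorted by increasing azimuth. Hypothetical simplifications: azimuth scaling and the 3D motion compensation of the reference frame are omitted, and the azimuth of refFrameP0 is taken equal to that of prevDecP0.

```python
import bisect
from typing import Dict, List, Tuple

RefPoint = Tuple[int, int]  # (phi, r) of a point in the reference frame


def inter_predictors(ref: Dict[int, List[RefPoint]], laser_id: int,
                     prev_phi: int, count: int = 2) -> List[RefPoint]:
    """First `count` reference points at `laser_id` with azimuth > prev_phi.

    `ref` maps a laser index to its points sorted by increasing azimuth.
    """
    row = ref.get(laser_id, [])
    # (prev_phi, inf) sorts after every point whose azimuth equals prev_phi.
    start = bisect.bisect_right(row, (prev_phi, float("inf")))
    return row[start:start + count]
```

With `count=2`, the result contains interPredPt and the additional INTER predictor of Figure 8b. Fewer points are returned when the laser row runs out, and an empty list means that no INTER predictor is available for that laser.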
Figure 9 is another illustration of the INTER predictive encoding described above in relation to Figure 8b. The figure illustrates the current frame 9000 and the reference frame 9001. The reference frame 9001 could be any previously encoded frame. The reference frame could be translated and rotated according to 3D parameters representing the 3D motion between the reference frame and the current frame, these parameters being transmitted in the bitstream for the decoder.
For the current and reference frames, several points with the same elevation (θ0) are illustrated. The notions of elevation and laserId are equivalent because the elevation can be calculated from the laserId. In this document, the term elevation is used, but laser ID can be used instead for a rotating LIDAR. In real frames, points at different elevations exist because the LIDAR has several heads, as illustrated in Figure 1, but the representation considers only one elevation (or laserId) for simplicity. As indicated in Figure 6, a tree can be constructed (and the points to encode successively selected according to this tree) according to increasing azimuth for a given elevation. The method is applied similarly for each elevation.
The point to encode 9002 is illustrated for the given elevation. The point is in the spherical representation, wherein φ3 is the value of the azimuth, R3 the value of the radius and θ0 the elevation. The previous decoded point 9003 of the same frame is selected according to the constructed tree.
The reference point 9004 collocated in the reference frame is selected and the two INTER predictors 9005 and 9006 are evaluated. In this example, let us consider that the additional predictor 9006 is the best one. This predictor can be used as the predictor of the current point 9002 for the prediction of the radius, azimuth and elevation.
If the reference point 9004 does not exist because no return from the laser beam has been received, the two INTER predictors are the first two points of the reference frame with the same elevation and an azimuth higher than φ1 (9005 and 9006).
If one of the two INTER predictors 9005 and 9006 is better than the INTRA predictor according to a rate-distortion estimate, the flag signalling the choice between INTRA and INTER mode is set to INTER, and the index of the INTER predictor is encoded as an integer value.
Figure 10 illustrates a proposed improvement over the previously described solutions according to embodiments of the invention. In this figure, it can be seen that the current point to encode 10002 cannot be predicted correctly from the interPred predictor 10005 or from the additional interPred predictor 10006: the two INTER predictors of Figure 8b do not include the best predictor. It is therefore advantageous to provide additional INTER prediction functions.
In these embodiments, four INTER prediction functions are used. The general behaviour of the algorithm is as follows. The point to encode 10002 is represented in spherical representation, wherein φ4 is the value of the azimuth, R4 the value of the radius and θ0 the elevation. The previous decoded point 10003 of the same frame is selected. In an alternative notation, the current point to encode is the point Pt4(curr), where (curr) denotes the current frame, R4(curr) and φ4(curr) are the radius and azimuth associated with this point, and Pt(400, cur) is the previous encoded point.
The reference point 10004 in the reference frame is then retrieved, and the four INTER predictors 10005 to 10008, which are the successive points in increasing azimuth, are tested. In this example, consider that the additional predictor 10008, the fourth one, is the best. This predictor can be used as the predictor of the current point to encode 10002 for the prediction of the radius, azimuth and/or elevation. In the reference frame, the illustration shows that the INTER predictors 'INTER Pred Point' (interPredIdx=0) and 'additional Pred Point' (interPredIdx=1) are not relevant. The point Pt(φ4, ref) is a better choice (interPredIdx=3).
If the reference point 10004 does not exist, the four INTER predictors in the reference frame with an azimuth higher than the azimuth of the point 10003 in the current frame are selected. If the INTER predictor 10008 is better than the INTRA predictor according to a rate-distortion criterion, the flag signalling the INTRA or INTER mode is set to INTER, and the index of the INTER predictor is encoded as an integer value.
These embodiments can be summarized by the following formulation, with reference to Figure 10. Additional INTER prediction functions, defined according to a list of prediction functions, are used. For example:
o F(0): Inter Pred (selecting 10005, interPredIdx=0, as predictor)
o F(1): Additional Inter Pred (selecting 10006, interPredIdx=1, as predictor)
o F(2): 'Additional Additional' Inter Pred (selecting 10007, interPredIdx=2, as predictor)
o F(3): 'Additional Additional Additional' Inter Pred (selecting 10008, interPredIdx=3, as predictor)
o F(max)
As described below, the chosen INTER predictor is encoded with a truncated unary word using arithmetic encoding with one context per bin index.
Embodiments of the invention thus relate to predictive geometry, and especially to INTER predictive geometry.
In short, additional INTER predictors, defined according to a list of prediction functions, are proposed, and the index of the chosen INTER predictor is encoded with truncated unary bits and arithmetic encoding with one context per bin index.
According to embodiments, it is thus proposed to set the maximal number of INTER predictors used by the encoder in the geometry parameter set. The list of prediction functions is known at both the encoder and decoder sides (for example, by defining the maximum number of prediction functions according to an ordered list of prediction functions). The number of INTER prediction functions used by the encoder is specified in the GPS and read by the decoder. The index of the used INTER prediction function is encoded with truncated unary coding with one context per bin index.
Figure 11 illustrates a second improvement over the previously described method according to embodiments of the invention. In this figure, the encoded points of the reference frame 11001 at a different elevation are considered.
The point to encode in the current frame 11000 is in the spherical representation, wherein φ4 is the value of the azimuth, R4 the value of the radius and θ0 the elevation.
This embodiment is similar to the previous one, with the difference that the points in the reference frame are searched at a different elevation (θ1 in this example).
11002 is the point to encode. The point is in the spherical representation, wherein φ4 is the value of the azimuth, R4 the value of the radius and θ0 the elevation. In 11003, the previous encoded/decoded point of the same frame is selected. The reference point 11004 in the reference frame is selected and the four INTER predictors 11005, 11006, 11007 and 11008 are tested. The difference with the previous figure is that the reference points in the reference frame are searched at a different elevation (θ1 in this example).
In this illustration, we consider that the additional predictor 11008 is the best one.
This predictor could be used as predictor of the current point for the radius and azimuth (the elevation for the point 11002 being predicted from the point 11003).
If the reference point 11004 does not exist, the four INTER predictors 11005, 11006, 11007 and 11008 selected in the reference frame are the first four points with the elevation θ1 and an azimuth higher than the azimuth of the point 11003. If the INTER predictor 11008 is better than the INTRA predictors according to a rate-distortion criterion, the flag signalling the INTRA or INTER mode is set to INTER, and the index of the INTER predictor is encoded as an integer value.
According to an embodiment, a list of N prediction functions, for example indexed from 0 to N-1, is defined. The encoder is allowed to select a subset of P prediction functions from this list of N prediction functions for encoding. This subset is signaled in the bitstream to be used by the decoder. In this embodiment, the encoded index of the determined predictor (which is in direct correspondence with a prediction function) for a given point is the index in the subset, namely between 0 and P-1, that is signaled in the bitstream.
According to a first variant of this embodiment, the indexes, in the list of N prediction functions, of the prediction functions belonging to the selected subset are signaled. These indexes may be signaled in the SPS, the GPS or the GDU header.
Accordingly, the subset is valid for the whole sequence, the current frame or the current slice, respectively.
In a second variant, the subset consists of the P first elements of the list of N prediction functions. In this case, the index P-1 of the last element of the subset is signaled in the bitstream: in the SPS if the selection is valid for the whole sequence, in the GPS if the selection is valid for the current frame, and in the GDU header if the selection is valid for the slice.
In other words, the number of INTER predictors (with associated prediction functions) used by the encoder is set in the bitstream, for example in the Sequence Parameter Set or the Geometry Parameter Set.
In an embodiment, a selection at the sequence level signaled in the SPS may be overridden at the frame level or the slice level by signaling in the GPS or the GDU header. A selection made at the frame level may be overridden at the slice level in the GDU header. An example of a list of N predictors (with associated prediction functions for calculating these predictors) is the N successive points in the reference frame, in increasing azimuth, from the point collocated with the previously encoded point in the reference frame (9004 or 10004).
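The two signaling variants for the subset of P prediction functions can be sketched in Python as follows. The class and method names are hypothetical; the sketch only shows how a coded index in 0..P-1 maps back to a function index in the full list of N prediction functions, and that P is the bound used for the truncated unary code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PredictorSubset:
    functions: List[int]  # indices into the full list of N prediction functions

    @classmethod
    def explicit(cls, indices: List[int], n_total: int) -> "PredictorSubset":
        # First variant: the indexes of the selected functions are signaled.
        if any(not 0 <= i < n_total for i in indices):
            raise ValueError("index outside the list of N prediction functions")
        return cls(list(indices))

    @classmethod
    def first_p(cls, last_idx: int) -> "PredictorSubset":
        # Second variant: only P-1 is signaled; the subset is functions 0..P-1.
        return cls(list(range(last_idx + 1)))

    @property
    def size(self) -> int:
        return len(self.functions)  # P, the bound of the truncated unary code

    def function_for(self, coded_idx: int) -> int:
        return self.functions[coded_idx]
```

The second variant costs a single value in the parameter set, while the first lets the encoder skip functions in the middle of the list at the price of signaling each index.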
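The override rule between the SPS, GPS and GDU header can be expressed as a small Python function. The parameter names are hypothetical and do not correspond to actual syntax elements; a value of `None` means that the corresponding level does not signal a selection.

```python
from typing import Optional


def effective_num_predictors(sps: int, gps: Optional[int] = None,
                             gdu: Optional[int] = None) -> int:
    """Slice level (GDU) overrides frame level (GPS), which overrides sequence level (SPS)."""
    for value in (gdu, gps):
        if value is not None:
            return value
    return sps
```

For example, a sequence signaling 4 predictors in the SPS with a frame using 3 in the GPS and one slice using 2 in its GDU header yields 2 for that slice and 3 for the other slices of the frame.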
Accordingly, a variable number of predictors can be used by the encoder. This allows the encoder to determine a tradeoff between computing time at encoding and compression efficiency.
Advantageously, as the number of predictors is signaled in the bitstream, the index of the determined predictor is encoded using a truncated unary code (with arithmetic encoding with one context per bin index).
According to an embodiment, it is proposed to encode the chosen INTER predictor with a truncated unary word using arithmetic encoding with one context per bin index, as illustrated below with max=3, interPredIdx being the index of the chosen INTER predictor:
interPredIdx 0 → encoded word '0'
interPredIdx 1 → encoded word '10'
interPredIdx 2 → encoded word '110'
interPredIdx 3 → encoded word '111'
The encoded words are binary words encoded with arithmetic encoding with one context per bin index.
Another example, where the truncated words (encoded with an arithmetic encoder with one context per bin index) are shown for 2 predictors used by the encoder:
predIdx 0 → encoded word '0'
predIdx 1 → encoded word '1'
In a scenario where the encoder decides to optimize computation time and uses a low number of predictors (e.g. only 1 or 2 INTER predictors), the bit truncation can be used only if the decoder is aware of the number of used INTER predictors.
It is thus proposed in an embodiment to encode the number of INTER predictors used by the encoder in the geometry parameter set. With this modification, the decoder knows the maximal value of the predictor indices (by reading the GPS information), the bit truncation can be applied to the highest used predictor index (the last one as specified in the GPS), and the decoder is able to decode the truncated unary code.
In other words, it is proposed to set the number of INTER predictors used by the encoder in the Geometry Parameter Set. With this modification, when the encoder bounds the number of used inter predictors to a low value (e.g. number of inter predictors set to 2 for optimization of the computation times), the bit truncation can be applied on the last inter predictor index while being decodable at the decoder.
Figure 12 illustrates the main steps of an example of encoding method using the proposed improvements. Let us suppose that we want to encode a new slice in a point cloud.
In step 12001, an INTRA prediction buffer is provided. Its size corresponds to the number of possible INTRA predictors. In step 12002, a set of INTER prediction functions is provided.
In the following of the text, we use indifferently INTRA predictors or INTRA prediction functions. The meaning is the same. An INTRA prediction function corresponds to the selection of one predictor. For example, the INTRA prediction function 'x' will select the predictor 'x' in the INTRA predictor list 12001.
The INTER prediction functions are slightly different. An INTER prediction function calculates a predictor according to the examples given in the figure 10. For example, in reference to the figure 10, the prediction function '0' will select the point 10004 as predictor for prediction. The prediction function '1' will select the point 10005 as predictor for prediction. The prediction function '2' will select the point 10006 as predictor for prediction.
In step 12003, the encoder determines the actual number of INTRA and INTER prediction functions to be used for point encoding, and possibly a subset of INTER prediction functions that is to be actually used at encoding to generate the actual predictor to be used 12006. For example, in reference to the figures 10 and 11, a subset of INTER prediction functions could be the following 4 functions: - One calculating the predictor 10004 (function 0) - One calculating the predictor 10005 (function 1) One calculating the predictor 10006 (function 2) One calculating the predictor 11004 (function 3) These determined number of predictors, both INTRA and INTER prediction functions, are stored in the GPS with possibly a subset of INTER prediction functions, in step 12005.
Alternatively, this information can be written in the SPS or in the GDU header, depending on the desired scope for changing the predictors.
In step 12000, the current point of the predictive tree to be encoded is obtained, together with the previously encoded/decoded point 12017, if it exists.
In step 12006, within a loop over all the predictors, the next prediction function in the list of used prediction functions 12004 is selected and the corresponding predictor is used to evaluate the associated encoding cost in terms of rate-distortion. If the prediction function is an INTRA prediction function, the associated INTRA predictor is selected in the INTRA predictor list 12001 according to the index of the INTRA predictor; in other words, an INTRA prediction function is just a pointer into the INTRA predictor list 12001. If the prediction function is an INTER prediction function, the INTER prediction function in 12002 is selected and the predictor is calculated from the reference frame according to this function (as illustrated in Figures 10 and 11).
In step 12008, the spherical residual 12010 is calculated. It comprises residuals for the azimuth, the radius and the elevation, and can additionally be quantized in step 12008.
In step 12011, the cost (in bits) of the prediction information and of the spherical residual is estimated. The prediction information comprises the information indicating whether INTRA or INTER prediction is selected and the index of the prediction function (either INTRA or INTER). If this bit estimate is the lowest among the predictions tested so far for the current point, it is stored as the bits reference 12013, and the chosen predictor (INTER or INTRA, and its index) is also recorded. Once all the predictors have been tested, the best reference 12013 is known, and the following steps are performed: encoding of the prediction information in step 12014; encoding of the spherical residual in step 12015; encoding of the cartesian residual in step 12016. Indeed, as described in relation with Figure 7, a cartesian residual is generated in addition to the spherical residual.
All of this information is encoded in the bitstream in the Geometry Data Unit 3012.
It is understood that Figure 12 is an illustration and that information other than that shown in this figure could be encoded; the bit estimate for a given predictor would then be adjusted accordingly.
Once all these steps have been performed, the current point (after decoding) is used to update the INTRA predictor list 12001, and the next point of the predictive tree is processed by looping back to steps 12000 and 12007, until the end of the slice. For the next point, the loop of step 12006 starts again from the first prediction function in the list.
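A minimal sketch of the per-point selection loop of Figure 12, under stated assumptions: points are integer (azimuth, radius, elevation) triples; INTER prediction functions are hypothetical zero-argument callables that already capture the reference frame; the bit cost is approximated with truncated unary index lengths and signed Exp-Golomb residual lengths instead of the arithmetic coder's estimates; and the INTRA list update rule (newest point first, oldest dropped) is hypothetical. Quantization and the cartesian residual are omitted.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Sph = Tuple[int, int, int]  # (azimuth, radius, elevation), assumed already quantized


def signed_eg0_len(v: int) -> int:
    """Bit length of a signed order-0 Exp-Golomb code (stand-in cost model)."""
    u = 2 * v - 1 if v > 0 else -2 * v          # map signed value to unsigned
    return 2 * (u + 1).bit_length() - 1


def choose_predictor(cur: Sph,
                     intra_list: Sequence[Sph],
                     inter_funcs: Sequence[Callable[[], Sph]]
                     ) -> Tuple[int, bool, int, Tuple[int, ...]]:
    """Return (bits, is_inter, index, spherical_residual) of the cheapest predictor."""
    candidates = [(False, i, p) for i, p in enumerate(intra_list)]
    candidates += [(True, j, f()) for j, f in enumerate(inter_funcs)]
    best: Optional[Tuple[int, bool, int, Tuple[int, ...]]] = None
    for is_inter, idx, pred in candidates:
        residual = tuple(c - p for c, p in zip(cur, pred))  # spherical residual
        n = len(inter_funcs) if is_inter else len(intra_list)
        idx_bits = idx + (1 if idx < n - 1 else 0)         # truncated unary length
        # 1 bit for the INTER flag + predictor index + residual components
        bits = 1 + idx_bits + sum(signed_eg0_len(r) for r in residual)
        if best is None or bits < best[0]:                 # keep the lowest cost
            best = (bits, is_inter, idx, residual)
    if best is None:
        raise ValueError("at least one predictor is required")
    return best


def update_intra_list(intra_list: List[Sph], decoded: Sph) -> None:
    """Hypothetical update rule: newest decoded point first, oldest one dropped."""
    intra_list.insert(0, decoded)
    intra_list.pop()
```

After the best predictor has been encoded, `update_intra_list` would be called with the decoded point before moving to the next point of the predictive tree.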
In an embodiment, early termination is proposed in the cost estimation function. In other words, an early termination test is added to the function in charge of estimating the bit cost of each tested predictor (INTRA or INTER). The compression ratio is unchanged (no gain and no loss), but computation time is saved. The bit-cost estimate is made of several bit estimates: for the index of the prediction function (either INTER or INTRA), for the INTER flag and for the components of the residual. It is proposed to check, at different stages of the process, whether the accumulated cost is greater than or equal to the cost of the best previously evaluated predictor, stored as the best reference 12013. When the test is true, the cost estimation for the current predictor is terminated, since that predictor cannot be the best one.
The rest of the function is therefore not executed, which saves computation time in this computationally intensive cost evaluation step.
In other words, it is proposed to add several early termination tests to a function 'estimateBits' (the function in charge of estimating the encoding cost) by:
- adding to the function estimateBits a parameter specifying the best bit cost of the previously tested predictors;
- regularly comparing, inside the function, the summed bit cost with this best bit cost parameter, and stopping the function when the early termination condition holds (the calculated bit cost becomes higher than the best bit cost).
Figure 13 illustrates the main steps of an example method for estimating the encoding cost with early termination. The encoding cost is estimated for each predictor tested during the encoding of a point.
In step 13000, the INTER flag, which indicates whether INTER or INTRA prediction is under evaluation, is tested. If it is 1, the cost of compressing the INTER predIdx is estimated in step 13001; if it is 0, the cost of compressing the INTRA predIdx is estimated in step 13002. predIdx is the index of the predictor chosen for the prediction of a given point. The result is the estimated cost, called 'bits' 13012. In step 13003, the early termination test is run: 'bits' is compared with 'bits reference'. If 'bits' is higher than 'bits reference', the cost of the current information already exceeds the cost of a previously tested predictor, so the function can stop and return the 'bits' value. The further steps of the estimation function are skipped, which improves computation time.
In step 13004, the cost of encoding the INTER flag is estimated and added to the previously calculated 'bits', resulting in a new 'bits' value. In step 13005, the early termination test is run as in step 13003.
In step 13006, the cost of encoding the residual information related to the azimuth is estimated and added to the previously calculated 'bits', resulting in a new 'bits' value. In step 13007, the early termination test is run again.
In step 13008, the cost of encoding the residual information related to the radius is estimated and added to the previously calculated 'bits', resulting in a new 'bits' value. In step 13009, the early termination test is run again.
In step 13010, the cost of encoding the residual information related to the elevation (or laserId) is calculated. This value is added to the previous 'bits' value, and the result is returned.
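The flow of Figure 13 can be sketched as follows. Each stage is a hypothetical zero-argument callable returning the bits of one element (predIdx, INTER flag, then the azimuth, radius and elevation/laserId residuals), so the stages after an early return are never computed. Because an early-terminated predictor already costs more than the best one, the selected predictor is the same as with a full evaluation, consistent with the statement that compression is unchanged.

```python
from typing import Callable, Sequence, Tuple

Stage = Callable[[], float]  # returns the estimated bits of one syntax element


def estimate_bits(stages: Sequence[Stage], best_known_bits: float) -> Tuple[float, int]:
    """Accumulate stage costs, stopping once the sum exceeds best_known_bits.

    Returns (bits, stages_evaluated). After an early return, `bits` is only a
    partial sum, but it already exceeds the best cost, which is all the caller needs.
    """
    bits = 0.0
    evaluated = 0
    for stage in stages:
        bits += stage()
        evaluated += 1
        if bits > best_known_bits:      # early termination test (13003, 13005, ...)
            break
    return bits, evaluated


def select_best(candidates: Sequence[Sequence[Stage]]) -> int:
    """Index of the cheapest candidate; identical to a full evaluation of every stage."""
    best_bits, best_idx = float("inf"), -1
    for i, stages in enumerate(candidates):
        bits, _ = estimate_bits(stages, best_bits)
        if bits < best_bits:
            best_bits, best_idx = bits, i
    return best_idx
```

Placing the costliest elements first tends to trigger the test sooner. Using `>=` instead of `>` would leave the decision unchanged as well, since a tie never replaces the stored best cost.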
Several different implementations of this function are possible. For example, the information that is statistically the most costly to encode (in terms of bits) can be estimated first.
The information to estimate can also differ slightly. For example, the TUC document describes an embodiment where the radius residual can be split into sign and magnitude and encoded with different arithmetic contexts. When using spherical coordinates in predictive geometry coding of LiDAR-acquired point clouds, G-PCC Ed.1 describes an embodiment where the prediction of the azimuthal angle of a point can be refined by adding a number 'k' (coded in the bitstream) of azimuthal steps 'Δφ' to the azimuthal angle prediction 'φn' provided by the 'n'-th predictor.
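Written as a formula, with φ_n the azimuth prediction of the n-th predictor, Δφ the azimuthal step, k the integer coded in the bitstream and φ the azimuth of the current point (notation chosen here for illustration), the refined prediction and the azimuth residual taken against it would be:

$$
\hat{\phi} = \phi_n + k\,\Delta\phi, \qquad r_{\phi} = \phi - \hat{\phi}
$$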
In other words, according to embodiments, several early termination tests are proposed in the function 'estimateBits', which is the function in charge of estimating the encoding cost:
- by adding to the function 'estimateBits' a parameter specifying the best bit cost of the previously tested predictors;
- by regularly checking/comparing, inside the function, the summed bit cost (the bits previously calculated inside the function) with the best bit cost parameter, and stopping the function when the early termination condition holds (the calculated bit cost becomes higher than the best bit cost).
For example, the early termination test is run after estimating the bit cost of interFlag:
    bits += estimate(interFlag, _ctxInterFlag[interFlagCtxIdx]);
    if (bits > best_known_bits)
        return bits;
Here, interFlag is a flag indicating whether INTER or INTRA prediction is used for the encoding of the current point; 'estimate' is a function in charge of estimating the cost (in bits) of interFlag; _ctxInterFlag[] holds the arithmetic contexts and interFlagCtxIdx is an integer index into it.
The early termination test is repeated for all the information estimated in the estimateBits function (residuals, predictor index, etc.).
With the early termination test added to the function that estimates the bit cost of each tested predictor (INTRA or INTER), compression ratios are unchanged but computation time is saved. In other words, it is proposed to slightly modify the function in charge of estimating the bit cost by adding early termination stages: a parameter is added to the function estimateBits, bit-cost comparisons are inserted in the function, and the function terminates early according to these comparisons. Results show the same compression performance with better computation time, so the modified encoder can serve as a new anchor in terms of computation time. It may be observed that computation gains are obtained for encoding, compression ratios are unchanged and decoding times are stable.
The decoding process must also be specified. An example relating to the new INTRA and INTER prediction functions is given in the following table. It is assumed here that the new prediction functions are signalled in the Geometry Parameter Set, that they are specified by the size of the INTRA predictor list, and that the subset of INTER prediction functions is defined as the first N INTER prediction functions. In other words, it is proposed to make the number of predictors used for predictive geometry configurable in the GPS. New syntax elements indicating the number of predictors for INTRA and INTER prediction are added in the GPS extension, as illustrated below (hence the addition of GPS information):

geometry_parameter_set( ) {                               Descriptor
    gps_geom_parameter_set_id                             u(4)
    gps_seq_parameter_set_id                              u(4)
    [...]
    gps_extension_present                                 u(1)
    if( gps_extension_present ) {
        [...]
        if( geom_tree_type == 1 ) {
            ptree_num_intra_predictor_minus1              ue(v)
            if( inter_prediction_flag == 1 )
                ptree_num_inter_predictor                 ue(v)
        }
        [...]
        while( more_data_in_data_unit( ) )
            gps_extension_data                            u(1)
    }
    byte_alignment( )
}
gps_geom_parameter_set_id: identifier of the Geometry Parameter Set.
gps_seq_parameter_set_id: identifier of the SPS referenced by the Geometry Parameter Set.
while( more_data_in_data_unit( ) ): used for continuing to read the data.
ptree_num_intra_predictor_minus1 plus 1 specifies the size of the list of INTRA predictors used for INTRA prediction.
ptree_num_inter_predictor specifies the maximum number of INTER predictors.
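A minimal parsing sketch for the two new fields, assuming MSB-first bit order and the usual unsigned Exp-Golomb meaning of the ue(v) descriptor. The parsing of the elided [...] parts of the GPS is not shown, and the class and function names are hypothetical.

```python
class BitReader:
    """MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n: int) -> int:
        """Read an n-bit unsigned integer, u(n)."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def ue(self) -> int:
        """Read an unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


def parse_ptree_predictor_counts(r: BitReader, geom_tree_type: int,
                                 inter_prediction_flag: int) -> dict:
    """Parse only the new GPS-extension fields, following the syntax table."""
    out = {}
    if geom_tree_type == 1:
        # ptree_num_intra_predictor_minus1 plus 1 = size of the INTRA predictor list
        out["num_intra_predictors"] = r.ue() + 1
        if inter_prediction_flag == 1:
            out["ptree_num_inter_predictor"] = r.ue()
    return out
```

For example, the single byte 0x23 (bits 00100 011) decodes to ptree_num_intra_predictor_minus1 = 3 and ptree_num_inter_predictor = 2, i.e. four INTRA predictors and at most two INTER predictors.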
Figure 14 illustrates a block diagram of a device adapted to incorporate the invention.
Preferably, the device comprises a central processing unit (CPU) 14001 capable of executing instructions from program ROM 14003 on powering up of the receiving device, and instructions relating to a software application from main memory 14002 after the powering up. The main memory 14002 is for example of Random Access Memory (RAM) type which functions as a working area of CPU 14001, and the memory capacity thereof can be expanded by an optional RAM connected to an expansion port (not illustrated). Instructions relating to the software application may be loaded to the main memory 14002 from the hard disc (HD) 14006 or the program ROM 14003 for example. Such a software application, when executed by the CPU 14001, causes the steps of the flowcharts shown in the previous figures to be performed.
Reference numeral 14004 is a network interface that allows the connection of the device to the communication network. The software application when executed by the CPU is adapted to receive data streams through the network interface from other devices.
Reference numeral 14005 represents a user interface to display information to, and/or receive inputs from a user.
Any step of the algorithms of the invention may be implemented in software by execution of a set of instructions or program by a programmable computing machine, such as a PC ("Personal Computer"), a DSP ("Digital Signal Processor") or a microcontroller; or else implemented in hardware by a machine or a dedicated component, such as an FPGA ("Field-Programmable Gate Array") or an ASIC ("Application-Specific Integrated Circuit").
Although the present invention has been described hereinabove with reference to specific embodiments, the present invention is not limited to the specific embodiments, and modifications which lie within the scope of the present invention will be apparent to a person skilled in the art.
Many further modifications and variations will suggest themselves to those versed in the art upon making reference to the foregoing illustrative embodiments, which are given by way of example only and which are not intended to limit the scope of the invention, that being determined solely by the appended claims. In particular the different features from different embodiments may be interchanged, where appropriate.
Each of the embodiments of the invention described above can be implemented solely or as a combination of a plurality of the embodiments. Also, features from different embodiments can be combined where necessary or where the combination of elements or features from individual embodiments in a single embodiment is beneficial.
In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that different features are recited in mutually different dependent claims does not indicate that a combination of these features cannot be advantageously used.

Claims (18)

  1. A method of encoding a 3D dynamic point cloud, comprising a sequence of 3D point clouds, each 3D point cloud comprising a set of 3D points, in a bitstream, wherein the method comprises for encoding a current 3D point: determining a set of prediction functions; determining a set of predictors based on the set of respective prediction functions; selecting a predictor for the current 3D point among the set of predictors based on a cost evaluated for each predictor; and encoding in the bitstream information for reconstructing the set of prediction functions, the index of the predictor selected in the set of predictors and the residual in the bitstream.
  2. The method of claim 1, wherein the set of prediction functions comprises a set of INTRA prediction functions and/or a set of INTER prediction functions.
  3. The method of claim 2, wherein the method further comprises: - encoding in the bitstream an indication for indicating whether the selected predictor is based on an INTRA or an INTER prediction function.
  4. The method of claim 2, wherein the information for reconstructing the set of prediction functions comprises a number of INTRA prediction functions.
  5. The method of claim 2, wherein the information for reconstructing the set of prediction functions comprises a number of INTER prediction functions.
  6. The method of claim 2, wherein the set of INTER prediction functions is determined as a subset of an indexed list of INTER prediction functions.
  7. The method of claim 6, wherein the information for reconstructing the set of prediction functions comprises indexes in the indexed list of INTER prediction functions of the INTER prediction functions of the subset.
  8. The method of claim 6, wherein the information for reconstructing the set of prediction functions comprises a maximum index in the indexed list of INTER predictors of INTER prediction functions in the subset, the set of INTER predictors corresponding to the first INTER prediction functions in the indexed list of INTER prediction functions up to the maximum index.
  9. The method of any one of claims 1 to 8, wherein the index of the selected predictor is represented in the bitstream according to a truncated unary code.
  10. The method of claim 9, wherein the truncated unary code is encoded with an arithmetic encoder.
  11. The method of any one of claims 1 to 10, wherein the selected predictor is the predictor for which the encoding cost is the lowest.
  12. The method of any one of claims 1 to 11, wherein evaluating the cost for a predictor comprises: storing a best cost value after evaluating each predictor, the best cost value corresponding to the lowest cost for the previously evaluated predictors; and early terminating the evaluation of a predictor as soon as its cost reaches the stored best cost value.
  13. A method for decoding a 3D dynamic point cloud, comprising a sequence of 3D point clouds, each 3D point cloud comprising a set of 3D points, in a bitstream, wherein the method comprises for decoding a current 3D point: - obtaining from the bitstream information for reconstructing a set of prediction functions; - determining a set of predictors based on the set of respective prediction functions; - obtaining from the bitstream an index of a selected predictor in the set of predictors; - obtaining from the bitstream the residual; - decoding the current 3D point based on the selected predictor and the residual.
  14. A computer program product for a programmable apparatus, the computer program product comprising a sequence of instructions for implementing a method according to any one of claims 1 to 13, when loaded into and executed by the programmable apparatus.
  15. A computer-readable storage medium storing instructions of a computer program for implementing a method according to any one of claims 1 to 13.
  16. A computer program which upon execution causes the method of any one of claims 1 to 13 to be performed.
  17. A device for encoding a 3D dynamic point cloud, comprising a sequence of 3D point clouds, each 3D point cloud comprising a set of 3D points, in a bitstream, wherein the device comprises for encoding a current 3D point a processor configured for: determining a set of prediction functions; determining a set of predictors based on the set of respective prediction functions; selecting a predictor for the current 3D point among the set of predictors based on a cost evaluated for each predictor; and encoding in the bitstream information for reconstructing the set of prediction functions, the index of the predictor selected in the set of predictors and the residual in the bitstream.
  18. A device for decoding a 3D dynamic point cloud, comprising a sequence of 3D point clouds, each 3D point cloud comprising a set of 3D points, in a bitstream, wherein the device comprises for decoding a current 3D point a processor configured for: obtaining from the bitstream information for reconstructing a set of prediction functions; determining a set of predictors based on the set of respective prediction functions; obtaining from the bitstream an index of a selected predictor in the set of predictors; obtaining from the bitstream the residual; decoding the current 3D point based on the selected predictor and the residual.
GB2210096.0A 2022-07-08 2022-07-08 Method and apparatus for compression and encoding of 3D dynamic point cloud Pending GB2620453A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB2210096.0A GB2620453A (en) 2022-07-08 2022-07-08 Method and apparatus for compression and encoding of 3D dynamic point cloud
PCT/EP2023/068942 WO2024008968A1 (en) 2022-07-08 2023-07-07 Method and apparatus for compression and encoding of 3d dynamic point cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2210096.0A GB2620453A (en) 2022-07-08 2022-07-08 Method and apparatus for compression and encoding of 3D dynamic point cloud

Publications (2)

Publication Number Publication Date
GB202210096D0 GB202210096D0 (en) 2022-08-24
GB2620453A true GB2620453A (en) 2024-01-10

Family

ID=84540024

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2210096.0A Pending GB2620453A (en) 2022-07-08 2022-07-08 Method and apparatus for compression and encoding of 3D dynamic point cloud

Country Status (2)

Country Link
GB (1) GB2620453A (en)
WO (1) WO2024008968A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020146341A1 (en) * 2019-01-07 2020-07-16 Futurewei Technologies, Inc. Point cloud bitstream structure and auxiliary information differential coding
US20200366932A1 (en) * 2018-02-11 2020-11-19 Peking University Shenzhen Graduate School Intra-frame prediction-based point cloud attribute compression method
WO2022120594A1 (en) * 2020-12-08 2022-06-16 Oppo广东移动通信有限公司 Point cloud encoding method, point cloud decoding method, encoder, decoder, and computer storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113455007B (en) * 2019-03-22 2023-12-22 腾讯美国有限责任公司 Method and device for encoding and decoding inter-frame point cloud attribute

Also Published As

Publication number Publication date
GB202210096D0 (en) 2022-08-24
WO2024008968A1 (en) 2024-01-11
