CN117999580A - Coding point cloud data in G-PCC using direct mode for inter-prediction - Google Patents


Info

Publication number: CN117999580A
Application number: CN202280063734.7A
Authority: CN (China)
Prior art keywords: node, mode, point, determining, inter
Legal status: Pending
Other languages: Chinese (zh)
Inventors: L. Pham Van, G. Van der Auwera, A. K. Ramasubramonian, M. Karczewicz
Current Assignee: Qualcomm Inc
Original Assignee: Qualcomm Inc
Priority claimed from U.S. Application No. 17/933,300 (published as US 2023/0099908 A1)
Application filed by Qualcomm Inc
Publication of CN117999580A

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An example apparatus for decoding point cloud data includes: a memory configured to store point cloud data; and one or more processors implemented in circuitry and configured to: determine at least one of: 1) a node of an octree of the point cloud data is not inter-predictable, or 2) an angular mode is enabled for the node; determine an Inferred Direct Coding Mode (IDCM) for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) the angular mode is enabled for the node; and decode occupancy data for the node using the determined IDCM.

Description

Coding point cloud data in G-PCC using direct mode for inter-prediction
The present application claims priority to U.S. Patent Application No. 17/933,300, filed September 19, 2022, and U.S. Provisional Application No. 63/261,722, filed September 27, 2021, each of which is incorporated herein by reference in its entirety. U.S. Patent Application No. 17/933,300 claims the benefit of U.S. Provisional Application No. 63/261,722.
Technical Field
The present disclosure relates to point cloud encoding and decoding.
Background
A point cloud is a collection of points in three-dimensional space. These points may correspond to points on objects within a three-dimensional space. Thus, the point cloud may be used to represent the physical content of a three-dimensional space. The point cloud may have utility in a wide variety of situations. For example, a point cloud may be used in the context of an autonomous vehicle to represent the location of an object on a road. In another example, a point cloud may be used in the context of physical content representing an environment in order to locate virtual objects in an Augmented Reality (AR) or Mixed Reality (MR) application. Point cloud compression is a process for encoding and decoding point clouds. Encoding the point cloud may reduce the amount of data required to store and transmit the point cloud.
Disclosure of Invention
In general, this disclosure describes techniques for coding point cloud data using direct mode (e.g., for inter-prediction coding in geometric point cloud compression (G-PCC)). In particular, this disclosure describes techniques for directly coding occupancy data for a node (e.g., directly coding the position of a point of a node of an octree, coding the position of a point of a node as being the same as the position of a point in a reference node for the node, or coding the position of a point of a node as a position offset (or residual) relative to the position of a point in a reference node). The coding mode may be an Inferred Direct Coding Mode (IDCM). The G-PCC coder may determine to enable IDCM for a node when at least one of the following is true: the node is not inter-predictable, or an angular mode is enabled for the node. Conversely, the G-PCC coder may determine to disable IDCM for a node that is inter-predictable and for which the angular mode is not enabled.
In one example, a method of decoding point cloud data includes: determining at least one of: 1) a node of an octree of the point cloud data is not inter-predictable, or 2) an angular mode is enabled for the node; determining an Inferred Direct Coding Mode (IDCM) for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) the angular mode is enabled for the node; and decoding occupancy data for the node using the determined IDCM.

In another example, an apparatus for decoding point cloud data includes: a memory configured to store point cloud data; and one or more processors implemented in circuitry and configured to: determine at least one of: 1) a node of an octree of the point cloud data is not inter-predictable, or 2) an angular mode is enabled for the node; determine an Inferred Direct Coding Mode (IDCM) for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) the angular mode is enabled for the node; and decode occupancy data for the node using the determined IDCM.

In another example, a computer-readable storage medium has instructions stored thereon that, when executed, cause a processor to: determine at least one of: 1) a node of an octree of point cloud data is not inter-predictable, or 2) an angular mode is enabled for the node; determine an Inferred Direct Coding Mode (IDCM) for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) the angular mode is enabled for the node; and decode occupancy data for the node using the determined IDCM.

In another example, an apparatus for decoding point cloud data includes: means for determining at least one of: 1) a node of an octree of the point cloud data is not inter-predictable, or 2) an angular mode is enabled for the node; means for determining an Inferred Direct Coding Mode (IDCM) for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) the angular mode is enabled for the node; and means for decoding occupancy data for the node using the determined IDCM.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example encoding and decoding system that may perform the techniques of this disclosure.
Fig. 2 is a block diagram illustrating an example geometric point cloud compression (G-PCC) encoder in accordance with one or more techniques of the present disclosure.
Fig. 3 is a block diagram illustrating an example G-PCC decoder in accordance with one or more techniques of the present disclosure.
Fig. 4 is a flowchart depicting an example process for performing motion-based inter-prediction for G-PCC in accordance with one or more techniques of this disclosure.
Fig. 5 is a flow diagram illustrating an example process for estimating a local node motion vector in accordance with one or more techniques of the present disclosure.
Fig. 6 is a conceptual diagram illustrating an example of performing occupancy comparisons for inter-prediction in a G-PCC in accordance with one or more techniques of this disclosure.
Fig. 7 is a conceptual diagram illustrating a Planar Coding Mode (PCM) for G-PCC according to one or more techniques of the present disclosure.
Fig. 8 is a conceptual diagram illustrating a laser package (such as a LIDAR sensor or another system including one or more lasers) scanning points in three-dimensional space according to one or more techniques of the present disclosure.
FIG. 9 is a conceptual diagram illustrating an example ranging system that may be used with one or more techniques of this disclosure.
Fig. 10 is a conceptual diagram illustrating an example vehicle-based scenario in which one or more techniques of the present disclosure may be used.
Fig. 11 is a conceptual diagram illustrating an example augmented reality system in which one or more techniques of the present disclosure may be used.
Fig. 12 is a conceptual diagram illustrating an example mobile device system that may use one or more techniques of this disclosure.
Fig. 13 is a flow chart illustrating an example method of encoding point cloud data in accordance with the techniques of this disclosure.
Detailed Description
In general, this disclosure describes techniques related to point cloud coding (encoding and/or decoding). Point cloud coding generally involves recursively dividing a three-dimensional space into nodes and coding data that indicates whether the nodes are occupied by one or more points. Attribute data may also be coded for the points. When coding occupancy data of a node (i.e., data indicating whether the node is occupied by at least one point), various modes may be used, such as intra-prediction, inter-prediction, angular mode, or Inferred Direct Coding Mode (IDCM). This disclosure recognizes that coding the occupancy of a node using IDCM may incur relatively high overhead bit costs. Accordingly, this disclosure describes techniques for restricting the use of IDCM to circumstances in which IDCM is appropriate. For example, if a node is inter-predictable, inter-prediction may code the occupancy of the node more efficiently than IDCM. As another example, this disclosure recognizes that the overhead of IDCM data is significantly reduced when angular mode is enabled for a node. Accordingly, this disclosure describes techniques for enabling IDCM when a node is not inter-predictable or when angular mode is enabled for the node. When IDCM is enabled for a node, IDCM may be used to code the occupancy data for the node.
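As a rough illustration, the enabling condition described above can be sketched as follows (a minimal sketch; the function name and boolean inputs are illustrative, not taken from the G-PCC specification):

```python
def idcm_eligible(inter_predictable: bool, angular_enabled: bool) -> bool:
    """Per the condition above: IDCM may be determined for a node only when
    the node is not inter-predictable or angular mode is enabled for it."""
    return (not inter_predictable) or angular_enabled

# The only case where IDCM is ruled out: the node is inter-predictable
# and angular mode is not enabled for it.
assert idcm_eligible(inter_predictable=False, angular_enabled=False) is True
assert idcm_eligible(inter_predictable=True,  angular_enabled=True)  is True
assert idcm_eligible(inter_predictable=True,  angular_enabled=False) is False
```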
Fig. 1 is a block diagram illustrating an example encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure relate generally to coding (encoding and/or decoding) point cloud data, i.e., to support point cloud compression. In general, point cloud data includes any data for processing a point cloud. Coding may be effective in compressing and/or decompressing point cloud data.
As shown in fig. 1, the system 100 includes a source device 102 and a destination device 116. The source device 102 provides the encoded point cloud data for decoding by the destination device 116. Specifically, in the example of fig. 1, source device 102 provides point cloud data to destination device 116 via computer-readable medium 110. The source device 102 and the destination device 116 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (such as smartphones), televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, ground or marine vehicles, spacecraft, aircraft, robots, LIDAR devices, satellites, and the like. In some cases, the source device 102 and the destination device 116 may be equipped for wireless communication.
In the example of fig. 1, source device 102 includes a data source 104, a memory 106, a G-PCC encoder 200, and an output interface 108. Destination device 116 includes an input interface 122, a G-PCC decoder 300, a memory 120, and a data consumer 118. In accordance with the present disclosure, the G-PCC encoder 200 of the source device 102 and the G-PCC decoder 300 of the destination device 116 may be configured to apply the techniques of the present disclosure related to decoding point cloud data in direct mode. Thus, the source device 102 represents an example of an encoding device, while the destination device 116 represents an example of a decoding device. In other examples, the source device 102 and the destination device 116 may include other components or arrangements. For example, the source device 102 may receive data (e.g., point cloud data) from an internal source or an external source. Likewise, the destination device 116 may interface with an external data consumer without including the data consumer in the same device.
The system 100 as shown in fig. 1 is only one example. In general, other digital encoding and/or decoding devices may perform the techniques of this disclosure related to coding point cloud data in direct mode. Source device 102 and destination device 116 are merely examples of such coding devices, in which source device 102 generates coded data for transmission to destination device 116. This disclosure refers to a "coding" device as a device that performs coding (encoding and/or decoding) of data. Thus, the G-PCC encoder 200 and the G-PCC decoder 300 represent examples of coding devices, specifically, an encoder and a decoder, respectively. In some examples, the source device 102 and the destination device 116 may operate in a substantially symmetrical manner such that each of the source device 102 and the destination device 116 includes an encoding component and a decoding component. Thus, the system 100 may support unidirectional or bidirectional transmission between the source device 102 and the destination device 116, for example, for streaming, playback, broadcasting, telephony, navigation, and other applications.
In general, the data source 104 represents a source of data (i.e., raw, unencoded point cloud data) and may provide a series of sequential "frames" of data to the G-PCC encoder 200, which encodes the data of the frames. The data source 104 of the source device 102 may include a point cloud capture device such as any of a variety of cameras or sensors (e.g., a 3D scanner or light detection and ranging (LIDAR) device, one or more video cameras), an archive containing previously captured data, and/or a data feed interface that receives data from a data content provider. Alternatively or additionally, the point cloud data may be computer generated from a scanner, camera, sensor, or other data. For example, the data source 104 may generate computer graphics-based data as source data, or a combination of real-time data, archived data, and computer-generated data. In each case, G-PCC encoder 200 encodes captured data, pre-captured data, or computer generated data. G-PCC encoder 200 may rearrange the frames from the received order (sometimes referred to as the "display order") to a coding order for coding. G-PCC encoder 200 may generate one or more bitstreams including the encoded data. The source device 102 may then output the encoded data onto the computer-readable medium 110 via the output interface 108 for receipt and/or retrieval by, for example, the input interface 122 of the destination device 116.
The memory 106 of the source device 102 and the memory 120 of the destination device 116 may represent general purpose memory. In some examples, memory 106 and memory 120 may store raw data, e.g., raw data from data source 104 and raw, decoded data from G-PCC decoder 300. Additionally or alternatively, memory 106 and memory 120 may store software instructions executable by, for example, G-PCC encoder 200 and G-PCC decoder 300, respectively. Although memory 106 and memory 120 are shown separate from G-PCC encoder 200 and G-PCC decoder 300 in this example, it should be understood that G-PCC encoder 200 and G-PCC decoder 300 may also include internal memory for functionally similar or equivalent purposes. Further, memory 106 and memory 120 may store encoded data, for example, output from G-PCC encoder 200 and input to G-PCC decoder 300. In some examples, portions of memory 106 and memory 120 may be allocated as one or more buffers, e.g., for storing raw, decoded, and/or encoded data. For example, memory 106 and memory 120 may store data representing a point cloud.
Computer-readable medium 110 may represent any type of medium or device capable of transmitting encoded data from source device 102 to destination device 116. In one example, the computer-readable medium 110 represents a communication medium for enabling the source device 102 to transmit encoded data directly to the destination device 116 in real-time, e.g., via a radio frequency network or a computer-based network. Output interface 108 may modulate a transmission signal including the encoded data and input interface 122 may demodulate a received transmission signal according to a communication standard, such as a wireless communication protocol. The communication medium may include any wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network such as: a local area network, a wide area network, or a global network such as the internet. The communication medium may include a router, switch, base station, or any other device that may be useful for facilitating communication from the source device 102 to the destination device 116.
In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 may access the encoded data from storage device 112 via input interface 122. Storage device 112 may comprise any of a variety of distributed or locally accessed data storage media such as a hard drive, blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded data.
In some examples, source device 102 may output the encoded data to file server 114 or another intermediate storage device that may store the encoded data generated by source device 102. The destination device 116 may access the stored data from the file server 114 via streaming or download. File server 114 may be any type of server device capable of storing encoded data and transmitting the encoded data to destination device 116. File server 114 may represent a web server (e.g., for a web site), a File Transfer Protocol (FTP) server, a content delivery network device, or a Network Attached Storage (NAS) device. The destination device 116 may access the encoded data from the file server 114 through any standard data connection, including an internet connection. This may include a wireless channel (e.g., wi-Fi connection), a wired connection (e.g., digital Subscriber Line (DSL), cable modem, etc.), or a combination of both suitable for accessing the encoded data stored on the file server 114. The file server 114 and the input interface 122 may be configured to operate in accordance with a streaming protocol, a download transfer protocol, or a combination thereof.
Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (e.g., Ethernet cards), wireless communication components that operate according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to a cellular communication standard, such as 4G, 4G-LTE (Long-Term Evolution), LTE Advanced, 5G, or the like. In some examples where output interface 108 comprises a wireless transmitter, output interface 108 and input interface 122 may be configured to transfer data, such as encoded data, according to other wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15 specification (e.g., ZigBee™), a Bluetooth™ standard, or the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-a-chip (SoC) devices.
The techniques of this disclosure may be applicable to encoding and decoding to support any of a variety of applications, such as communications between autonomous vehicles, scanners, cameras, sensors, and processing devices such as local or remote servers, geographic mapping, or other applications.
The input interface 122 of the destination device 116 receives the encoded bitstream from the computer readable medium 110 (e.g., communication medium, storage device 112, file server 114, etc.). The encoded bitstream may include signaling information defined by the G-PCC encoder 200, also used by the G-PCC decoder 300, such as syntax elements having values describing characteristics and/or processing of the coded units (e.g., slices, pictures, groups of pictures, sequences, etc.). The data consumer 118 uses the decoded data. For example, the data consumer 118 may use the decoded data to determine the location of the physical object. In some examples, the data consumer 118 may include a display for rendering images based on a point cloud.
G-PCC encoder 200 and G-PCC decoder 300 may each be implemented as any of a variety of suitable encoder and/or decoder circuitry, such as one or more microprocessors, digital Signal Processors (DSPs), application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combination thereof. When the techniques are implemented in part in software, the apparatus may store instructions for the software in a suitable non-transitory computer readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of G-PCC encoder 200 and G-PCC decoder 300 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device. The devices that include G-PCC encoder 200 and/or G-PCC decoder 300 may include one or more integrated circuits, microprocessors, and/or other types of devices.
The G-PCC encoder 200 and the G-PCC decoder 300 may operate according to a coding standard, such as a video point cloud compression (V-PCC) standard or a geometric point cloud compression (G-PCC) standard. The present disclosure may generally relate to coding (e.g., encoding and decoding) of pictures, including processes of encoding or decoding data. The encoded bitstream typically includes a series of values for syntax elements that represent coding decisions (e.g., coding modes).
The present disclosure may generally relate to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values for syntax elements and/or other data used to decode encoded data. That is, the G-PCC encoder 200 may signal values for syntax elements in the bitstream. In general, "signaling" refers to generating a value in the bitstream. As noted above, source device 102 may transport the bitstream to destination device 116 substantially in real time, or not in real time, such as might occur when storing syntax elements to storage device 112 for later retrieval by destination device 116.
ISO/IEC MPEG (JTC 1/SC 29/WG 11) is studying the potential need for standardization of point cloud coding technology with a compression capability that significantly exceeds that of current approaches, and will target creation of such a standard. The group is working together on this exploration activity in a collaborative effort known as the 3-Dimensional Graphics Team (3DG) to evaluate compression technology designs proposed by experts in this area.
Point cloud compression activities are categorized into two different approaches. The first approach is "video point cloud compression" (V-PCC), which segments the 3D object and projects the segments into multiple 2D planes (represented as "patches" in the 2D frame), which are further coded by a legacy 2D video codec, such as a High Efficiency Video Coding (HEVC) (ITU-T H.265) codec. The second approach is "geometry-based point cloud compression" (G-PCC), which directly compresses the 3D geometry (i.e., the positions of a set of points in 3D space) and associated attribute values (for each point associated with the 3D geometry). G-PCC addresses the compression of point clouds in both Category 1 (static point clouds) and Category 3 (dynamically acquired point clouds). A recent draft of the G-PCC standard is available in G-PCC DIS (ISO/IEC JTC1/SC29/WG11 w19088, Brussels, Belgium, January 2020), and a description of the codec is available in G-PCC Codec Description v6 (ISO/IEC JTC1/SC29/WG11 w19091, Brussels, Belgium, January 2020).
The point cloud contains a set of points in 3D space and may have attributes associated with those points. The attribute may be color information such as R, G, B or Y, cb, cr, or reflectivity information, or other attributes. The point cloud may be captured by various cameras or sensors (such as LIDAR sensors and 3D scanners) and may also be computer generated. The point cloud data is used for a variety of applications including, but not limited to, construction (modeling), graphics (3D models for visualization and animation), and the automotive industry (LIDAR sensors for aiding navigation).
The 3D space occupied by the point cloud data may be enclosed by a virtual bounding box. The position of a point in the bounding box can be represented with a certain accuracy; thus, the location of one or more points may be quantified based on the accuracy. At the minimum level, the bounding box is split into voxels, which are the smallest spatial units represented by the unit cubes. Voxels in the bounding box may be associated with zero, one, or more than one point. The bounding box may be split into multiple cube/cuboid regions, which may be referred to as tiles. Each tile may be coded as one or more slices. Dividing the bounding box into slices and tiles may be based on the number of points in each partition, or based on other considerations (e.g., a particular region may be coded as a tile). The slice region may be further partitioned using a split decision similar to that in a video codec.
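For illustration, mapping a point position to integer voxel coordinates within the bounding box at a given precision might be sketched as follows (the function and parameter names are hypothetical; the normative quantization in G-PCC is defined by scaling parameters signaled in the bitstream):

```python
def quantize_point(pos, bbox_min, voxel_size):
    """Map a continuous 3D position to integer voxel coordinates
    relative to the bounding-box origin."""
    return tuple(int((c - o) / voxel_size) for c, o in zip(pos, bbox_min))

# A voxel may be associated with zero, one, or more than one point:
# two distinct input points can land in the same voxel.
assert quantize_point((1.7, 0.2, 3.9), (0.0, 0.0, 0.0), 1.0) == (1, 0, 3)
assert quantize_point((1.2, 0.8, 3.1), (0.0, 0.0, 0.0), 1.0) == (1, 0, 3)
```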
Fig. 2 provides an overview of a G-PCC encoder 200. Fig. 3 provides an overview of a G-PCC decoder 300. The modules shown are logical and do not necessarily correspond one-to-one to the implemented code in the reference implementation of the G-PCC codec, i.e. TMC13 test model software studied by ISO/IEC MPEG (JTC 1/SC 29/WG 11). In both the G-PCC encoder 200 and the G-PCC decoder 300, the point cloud location is first decoded. Attribute coding depends on the decoded geometry.
For Category 3 data, the compressed geometry is typically represented as an octree from the root all the way down to a leaf level of individual voxels. For Category 1 data, the compressed geometry is typically represented by a pruned octree (i.e., an octree from the root down to a leaf level of blocks larger than voxels) plus a model that approximates the surface within each leaf of the pruned octree. In this way, Category 1 and Category 3 data share the octree coding mechanism, while Category 1 data may in addition approximate the voxels within each leaf with a surface model. The surface model used is a triangulation comprising 1-10 triangles per block, resulting in a "triangle soup." The Category 1 geometry codec is therefore known as the Trisoup geometry codec, while the Category 3 geometry codec is known as the octree geometry codec.
At each node of the octree, occupancy is signaled (when not inferred) for one or more of its child nodes (up to eight nodes). Multiple neighborhoods are specified, including (a) nodes that share a face with the current octree node, (b) nodes that share a face, edge, or vertex with the current octree node, and so on. Within each neighborhood, the occupancy of a node and/or its children may be used to predict the occupancy of the current node or its children. For points that are sparsely populated in certain nodes of the octree, the codec also supports a direct coding mode in which the 3D positions of the points are encoded directly. A flag may be signaled to indicate that direct mode is used. At the lowest level, the number of points associated with the octree node/leaf node may also be coded.
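A simplified sketch of how a child-octant index and an 8-bit occupancy word could be derived for a node at the origin follows (names are illustrative; G-PCC additionally codes these bits with context models, and direct mode instead codes point positions directly):

```python
def child_index(pt, node_size):
    """Index (0..7) of the child octant of a node at the origin that contains pt,
    using one bit per axis: (x-high << 2) | (y-high << 1) | z-high."""
    half = node_size // 2
    x, y, z = pt
    return ((x >= half) << 2) | ((y >= half) << 1) | int(z >= half)

def occupancy(points, node_size):
    """8-bit occupancy word: bit i is set when child octant i holds a point."""
    occ = 0
    for pt in points:
        occ |= 1 << child_index(pt, node_size)
    return occ

# A sparsely populated node: two points fall into octants 0 and 7.
assert occupancy([(0, 0, 0), (5, 6, 7)], node_size=8) == 0b10000001
```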
Once the geometry is decoded, the attributes corresponding to the geometry point are decoded. When there are a plurality of attribute points corresponding to one reconstructed/decoded geometric point, an attribute value representing the reconstructed point can be derived.
There are three attribute decoding methods in G-PCC: the Region Adaptive Hierarchical Transform (RAHT) coding, interpolation-based hierarchical nearest neighbor prediction (predictive transform) and interpolation-based hierarchical nearest neighbor prediction with update/boost steps (boost transform). RAHT and lifting are typically used for category 1 data, while prediction is typically used for category 3 data. However, either method can be used for any data, and as with the geometry codec in G-PCC, the attribute decoding method used to decode the point cloud is specified in the bitstream.
The encoding of the attributes may be performed at levels of detail (LOD), where a finer representation of the point cloud attributes may be obtained by each level of detail. Each level of detail may be specified based on a distance metric from neighboring nodes or based on a sampling distance.
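One way to picture distance-based levels of detail is a greedy subsampling in which each level halves the sampling distance, so every level refines the previous one (a hypothetical sketch for intuition, not the normative G-PCC LOD derivation):

```python
import math

def build_lods(points, base_dist, num_lods):
    """Greedy distance-based LOD construction: a point joins the selected set
    at level l if it is at least base_dist / 2**l away from every point
    already selected; each LOD is a superset of the coarser one."""
    selected = []
    lods = []
    for level in range(num_lods):
        d = base_dist / (2 ** level)
        for p in points:
            if all(math.dist(p, q) >= d for q in selected):
                selected.append(p)
        lods.append(list(selected))
    return lods

lods = build_lods([(0, 0, 0), (1, 0, 0), (4, 0, 0)], base_dist=8, num_lods=2)
assert len(lods[0]) == 1   # coarsest level keeps only the first point
assert len(lods[1]) == 2   # the finer level adds (4, 0, 0)
```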
At the G-PCC encoder 200, the residuals obtained as the output of the coding methods for the attributes are quantized. A residual may be obtained by subtracting from the attribute value a prediction that is derived based on the points in the neighborhood of the current point and on the attribute values of previously encoded points. The quantized residuals may be coded using context-adaptive arithmetic coding.
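The predict-then-quantize step described above can be illustrated as follows (a sketch under simplifying assumptions: integer attributes, a plain weighted-average predictor, and a uniform round-to-nearest quantizer; the names are illustrative, not from the G-PCC specification):

```python
def quantize_residual(attr, neighbor_attrs, weights, qstep):
    """Predict the attribute from a weighted average of neighbor attributes,
    then quantize the prediction residual with step size qstep."""
    pred = round(sum(w * a for w, a in zip(weights, neighbor_attrs)) / sum(weights))
    resid = attr - pred
    sign = 1 if resid >= 0 else -1
    q = sign * ((abs(resid) + qstep // 2) // qstep)  # round-to-nearest quantizer
    return q, pred

def reconstruct(q, pred, qstep):
    """Decoder-side reconstruction: prediction plus dequantized residual."""
    return pred + q * qstep

q, pred = quantize_residual(100, [90, 110, 98], [1, 1, 1], qstep=4)
assert (q, pred) == (0, 99)        # residual of 1 quantizes to 0 at this step size
assert reconstruct(q, pred, 4) == 99
```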
G-PCC also includes an angular coding mode. The angular coding mode may use the characteristics of sensors, such as a typical LIDAR sensor, to improve the coding efficiency of the planar mode. The angular coding mode may optionally be used together with the planar mode, and may improve the coding of the vertical (z) planar position syntax element by employing data on the positions and angles of the sensing laser beams in a typical LIDAR sensor. Furthermore, the angular coding mode may optionally be used to improve the coding of the vertical z-position bits in Inferred Direct Coding Mode (IDCM). The angular coding mode may use simplified context derivation and efficient high-level signaling (HLS) coding of sensor data parameters.
The azimuthal coding mode is similar to the angular mode; it extends the angular mode to the coding of the (x) and (y) planar position syntax elements of the planar mode and improves the coding of the x- or y-position bits in IDCM. The azimuthal coding mode may use a reduced number of contexts.
The specification text related to the planar coding mode is summarized as follows:
8.2.3.1 Eligibility of a node for planar coding mode
Explicit coding of an occupancy plane is conditioned on a probability estimate.
The array PlanarRate, with elements PlanarRate[k] (k = 0..2), is an estimate of the probability that the occupancy of a node forms a single plane perpendicular to the k-th axis.
The variable LocalDensity is an estimate of the mean number of occupied child nodes in a node.
The variable NumNodesUntilPlanarUpdate counts the number of nodes to be parsed before updating PlanarRate and LocalDensity.
At the start of parsing the geometry_octree syntax structure, PlanarRate and LocalDensity are initialized as follows:
for(k=0;k<3;k++)
PlanarRate[k]=1024
LocalDensity=4096
NumNodesUntilPlanarUpdate=0
At the beginning of parsing each geometry_octree_node syntax structure, NumNodesUntilPlanarUpdate is decremented. If NumNodesUntilPlanarUpdate is less than zero, PlanarRate and LocalDensity are updated as follows:
-the number of occupied sibling nodes is determined and used to update the LocalDensity estimate:
let numSiblings=NodeNumChildren[depth-1][sNp][tNp][vNp]
LocalDensity=(255×LocalDensity+1024×numSiblings)>>8
the number of nodes until the next update is:
NumNodesUntilPlanarUpdate=numSiblings-1
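The LocalDensity update above is an exponential moving average in fixed point. The following Python sketch illustrates it (the function name is illustrative; the scale factor 1024 follows the initialization above, so the initial value 4096 represents an average of 4 occupied children):

```python
def update_local_density(local_density: int, num_siblings: int) -> int:
    """Exponential moving average of the number of occupied child nodes.

    local_density is a fixed-point value scaled by 1024; the previous
    estimate is weighted 255/256 and the new observation 1/256.
    """
    return (255 * local_density + 1024 * num_siblings) >> 8
```

Starting from the initial value 4096, observing a node with 4 occupied children leaves the estimate unchanged, since 4.0 is already the tracked average.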
Occupancy information of the parent node is used to determine the presence of a single occupied plane along each axis and to update the corresponding plane probability estimates PlanarRate[k].
At the beginning of parsing each geometry_octree_node syntax structure, it is determined for each axis whether the current node is eligible to signal plane information. The output of this process is the array PlanarEligible with elements PlanarEligible[k] (k = 0..2).
First, the order planeOrder[k], from most likely to least likely, of the three planes is determined using PlanarRate according to Table 18 below.
Then, PlanarEligible is set as follows:
TABLE 18 Determination of planeOrder[k] values from PlanarRate[k]
The G-PCC encoder 200 and the G-PCC decoder 300 may code values for syntax elements (such as the is_planar_flag syntax element) indicating whether a node is planar according to the following semantics: is_planar_flag[axisIdx] equal to 1 indicates that the positions of the current node's children form a single plane perpendicular to the axisIdx-th axis. When present, is_planar_flag[axisIdx] equal to 0 indicates that the positions of the current node's children occupy both planes perpendicular to the axisIdx-th axis. The G-PCC encoder 200 and the G-PCC decoder 300 may code the is_planar_flag using a context index set equal to axisIdx, as indicated in G-PCC DIS, ISO/IEC JTC1/SC29/WG11 w55637, teleconference, November 2020.
The G-PCC standard specification for tracking nodes along an axis is reproduced as follows:
8.2.3.2 buffer tracking nearest nodes along axis
The arrays PlanarPrevPos, PlanarPlane, and IsPlanarNode record information about previously decoded geometry tree nodes for use in determining ctxIdx for the syntax element plane_position. When geometry_planar_enabled_flag is equal to 0 or planar_buffer_disabled_flag is equal to 1, these arrays are not used by the decoding process.
In this process, the variable axisIdx is used to represent one of the three coding axes, and the variable axisPos represents the position of a node along the axisIdx-th axis. axisPos values are in the range 0..0x3fff.
The array IsPlanarNode, with values IsPlanarNode[axisIdx][axisPos], indicates whether the most recently decoded node with an axisIdx-th position component equal to axisPos is planar in a plane perpendicular to the axisIdx-th axis.
The array PlanarPrevPos, with values PlanarPrevPos[axisIdx][axisPos], stores the largest position component of the most recently decoded node with an axisIdx-th position component equal to axisPos.
The array PlanarPlane, with values PlanarPlane[axisIdx][axisPos], indicates the value of plane_position[axisIdx] for the most recently decoded node with an axisIdx-th position component equal to axisPos.
At the beginning of each geometry tree level, each element of the arrays PlanarPrevPos and IsPlanarNode is initialized to 0.
After decoding each geometry_planar_mode_data syntax structure with parameters childIdx and axisIdx, the arrays PlanarPrevPos, PlanarPlane, and IsPlanarNode are updated as follows:
-the variable axisPos representing the position along the axisIdx-th axis is derived as follows:
if(axisIdx==0)axisPos=sN&0x3fff
if(axisIdx==1)axisPos=tN&0x3fff
if(axisIdx==2)axisPos=vN&0x3fff
-updating the array entries corresponding to the nodes as follows:
if(axisIdx==0)maxPos=Max(tN&0x7c0,vN&0x7c0)>>3
if(axisIdx==1)maxPos=Max(sN&0x7c0,vN&0x7c0)>>3
if(axisIdx==2)maxPos=Max(sN&0x7c0,tN&0x7c0)>>3
PlanarPrevPos[axisIdx][axisPos]=maxPos
if(is_planar_flag[axisIdx])
PlanarPlane[axisIdx][axisPos]=plane_position[axisIdx]
IsPlanarNode[axisIdx][axisPos]=is_planar_flag[axisIdx]
8.2.3.3 determine ctxIdx for the syntax element plane_position
The inputs to this process are:
Variables axisIdx identifying axes orthogonal to the plane, and
-The location (sN, tN, vN) of the current node within the geometry tree level.
The output of this process is the variable ctxIdx.
Variable neighOccupied indicates whether there are nodes adjacent to the current node along the axisIdx th axis. The derivation is as follows:
neighOccupied=(NeighbourPattern>>2×axisIdx)&3
adjPlaneCtxInc=neighOccupied==3?0:neighOccupied
if(axisIdx==0&&neighOccupied==3)
adjPlaneCtxInc=((neighOccupied&1)<<1)|(neighOccupied>>1)
When the planar_buffer_disabled_flag is equal to 1, the value of ctxIdx is set equal to adjPlaneCtxInc, and this process performs no further processing. Otherwise, the remainder of this clause applies.
The variable axisPos indicates the 14 least significant bits of the current node's position along the axisIdx-th axis:
if(axisIdx==0)axisPos=sN&0x3fff
if(axisIdx==1)axisPos=tN&0x3fff
if(axisIdx==2)axisPos=vN&0x3fff
The variable dist represents the distance between the current node and the position of the most recently decoded node along the axisIdx-th axis having the same axisPos value. It is derived as follows:
a=PlanarPrevPos[axisIdx][axisPos]
if(axisIdx==0)b=Max(tN&0x7c0,vN&0x7c0)>>3
if(axisIdx==1)b=Max(sN&0x7c0,vN&0x7c0)>>3
if(axisIdx==2)b=Max(sN&0x7c0,tN&0x7c0)>>3
dist=Abs(a-b)
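The distance computation above can be sketched in Python as follows (helper names are illustrative; the mask 0x7C0 and the shift by 3 are taken from the formulas above):

```python
def max_pos(s: int, t: int, v: int, axis_idx: int) -> int:
    # Largest of the two position components orthogonal to axis_idx,
    # masked to bits 6..10 and scaled down by 8, as in the formulas above.
    p, q = [(t, v), (s, v), (s, t)][axis_idx]
    return max(p & 0x7C0, q & 0x7C0) >> 3

def planar_dist(planar_prev_pos: int, s: int, t: int, v: int, axis_idx: int) -> int:
    # dist = Abs(a - b), where a is the stored maxPos of the previously
    # decoded node and b is the maxPos of the current node.
    return abs(planar_prev_pos - max_pos(s, t, v, axis_idx))
```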
The context index ctxIdx is derived as follows:
8.2.3.4 determine planePosIdxAzimuthalS and planePosIdxAzimuthalT for coding horizontal plane positions
The variables planePosIdxAzimuthalS, used for the arithmetic coding of plane_position[0], and planePosIdxAzimuthalT, used for the arithmetic coding of plane_position[1], are obtained as follows.
When the geometry_angular_enabled_flag is equal to 0, the values of both planePosIdxAzimuthalS and planePosIdxAzimuthalT are set equal to planePosIdx. Otherwise, the following applies:
the determination contextAngular for arithmetic coding of the plane_position [2] is performed as described in XREF.
8.2.3.5 Determining planePosIdxAngular for decoding vertical plane position
The variable planePosIdxAngular, used for the arithmetic coding of plane_position[2], is obtained as follows.
When the geometry_angular_enabled_flag is equal to 0, the value of planePosIdxAngular is set equal to planePosIdx. Otherwise, the following applies:
Determining contextAngular for arithmetic coding of plane_position [2] is performed as described in section 8.2.5.3.
The following summarizes the angular mode syntax for G-PCC:
Specific syntax elements carrying LIDAR laser sensor information that may provide coding efficiency benefits for the angular coding mode are discussed below. The semantics of a particular syntax element are specified as follows:
geometry_planar_enabled_flag equal to 1 indicates that the planar coding mode is activated. geometry_planar_enabled_flag equal to 0 indicates that the planar coding mode is not activated. When not present, geometry_planar_enabled_flag is inferred to be 0.
geom_planar_th[i] (i in the range 0..2) specifies the value of the activation threshold, along the i-th most probable direction, for the planar coding mode to be efficient.
geom_idcm_rate_minus1 specifies the rate at which nodes may be eligible for direct coding. When not present, geom_idcm_rate_minus1 is inferred to be 31.
Array IdcmEnableMask is derived as follows:
geometry_angular_enabled_flag equal to 1 indicates that the angular coding mode is activated. geometry_angular_enabled_flag equal to 0 indicates that the angular coding mode is not activated.
geom_slice_angular_origin_present_flag equal to 1 specifies that a slice-relative angular origin is present in the geometry data unit. geom_slice_angular_origin_present_flag equal to 0 specifies that an angular origin is not present in the geometry data unit. When not present, geom_slice_angular_origin_present_flag is inferred to be 0.
geom_angular_origin_bits_minus1 plus 1 is the length, in bits, of the syntax element geom_angular_origin_xyz[k].
geom_angular_origin_xyz[k] specifies the k-th component of the (x, y, z) coordinates of the origin used in the processing of the angular coding mode. When not present, the value of geom_angular_origin_xyz[k] (k = 0..2) is inferred to be 0.
geom_angular_azimuth_scale_log2_minus11 and geom_angular_radius_scale_log2 specify factors used to scale positions coded using a spherical coordinate system during conversion to Cartesian coordinates.
geom_angular_azimuth_step_minus1 plus 1 specifies a unit change in azimuth. Differential prediction residuals used in angular predictive tree coding may be partially represented as multiples of geom_angular_azimuth_step_minus1 plus 1. The value of geom_angular_azimuth_step_minus1 shall be less than (1 << (geom_angular_azimuth_scale_log2_minus11 + 12)).
number_lasers_minus1 plus 1 specifies the number of lasers used for the angular coding mode.
laser_angle_init and laser_angle_diff[i] (i = 0..number_lasers_minus1) specify the tangent of the elevation angle of the i-th laser relative to the horizontal plane defined by the first and second coding axes.
The array LaserAngle[i] (i = 0..number_lasers_minus1) is derived as follows:
It is a requirement of bitstream conformance that the value of LaserAngle[i] (i = 1..number_lasers_minus1) shall be greater than or equal to LaserAngle[i-1].
laser_correction_init and laser_correction_diff[i] (i = 1..number_lasers_minus1) specify a correction, along the second axis, of the i-th laser position relative to GeomAngularOrigin[2].
laser_phi_per_turn_init_minus1 and laser_phi_per_turn_diff[i] (i = 1..number_lasers_minus1) specify the number of samples produced by the i-th laser of a rotating sensing system located at the origin used in the processing of the angular coding mode.
The arrays LaserCorrection[i] and LaserPhiPerTurn[i] (i = 1..number_lasers_minus1) are derived as follows:
It is a requirement of bitstream conformance that the value of LaserPhiPerTurn[i] (i = 0..number_lasers_minus1) shall not be 0.
The arrays DeltaPhi[i] and InvDeltaPhi[i] (i = 0..number_lasers_minus1) are derived as follows:
planar_buffer_disabled_flag equal to 1 indicates that the buffer tracking the nearest nodes is not used when coding the planar mode flag and the plane position in planar mode. planar_buffer_disabled_flag equal to 0 indicates that the buffer tracking the nearest nodes is used. When not present, planar_buffer_disabled_flag is inferred to be !geometry_planar_enabled_flag.
Table 2. Geometry parameter set syntax. The angular mode syntax elements are highlighted.
The data syntax of the planar mode and of the direct mode is included in Tables 3 and 4, respectively.
TABLE 3 Geometry octree planar mode data syntax
TABLE 4 Direct mode data syntax
8.2.4.1 Derivation of angular qualification for nodes
If geometry_angular_enabled_flag is equal to 0, angular_eligible is set equal to 0.
Otherwise, the following applies:
The variable deltaAngle, specifying the minimum angular distance between the lasers, is derived as follows:
Finally, angular_eligible is derived as follows:
midNodeS=1<<(Max(1,ChildNodeSizeLog2[0])-1)
8.2.4.2 derivation of laser index laserIndex associated with a node
If angular_eligible is equal to 0, the laser index laserIndex is set to the preset value UNKNOWN_LASER.
Otherwise, if angular_eligible is equal to 1, the following applies as a continuation of the process described in 8.2.5.1.
First, the inverse rInv of the radial distance of the current node from the Lidar is determined as follows:
r2=sLidar×sLidar+tLidar×tLidar
rInv=IntRecipSqrt(r2)
Then, the angle theta32 is determined as follows:
vLidar=((vNchild-GeomAngularOrigin[2]+midNodeT)<<1)-1
theta=vLidar×rInv
theta32=theta>=0?theta>>15:-((-theta)>>15)
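Note that the conversion from theta to theta32 truncates toward zero rather than flooring, which matters for negative angles. A Python illustration (the shift amount 15 is taken from the formula above):

```python
def to_theta32(theta: int) -> int:
    # theta32 = theta >= 0 ? theta >> 15 : -((-theta) >> 15)
    # i.e., a right shift that rounds toward zero for negative values.
    return theta >> 15 if theta >= 0 else -((-theta) >> 15)
```

For theta = -100000, a plain arithmetic shift floors to -4, while the formula above truncates toward zero to -3.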
Finally, the angular eligibility and the associated laser are determined based on the parent node Parent as follows.
8.2.4.3 Derivation of contexts contextAzimuthalS and contextAzimuthalT for plane coding modes
The following applies as a continuation of the process described in 8.2.5.2.
First, two angles are computed from the node position relative to the angular origin:
sPos=sNchild-GeomAngularOrigin[0]
tPos=tNchild-GeomAngularOrigin[1]
phiNode=IntAtan2(tPos+midNodeT,sPos+midNodeS)
phiNode0=IntAtan2(tPos,sPos)
Second, the azimuth predictor is obtained from the array phiBuffer:
predPhi=phiBuffer[laserIndex]
if(predPhi==0x80000000)
predPhi=phiNode
Two azimuth contexts are initialized as follows
contextAzimuthalS=-1
contextAzimuthalT=-1
Then, if predictor predPhi is not equal to 0x80000000, the following applies to refine both azimuth contexts
/>
8.2.4.4 Derivation process for context contextAngular of plane coding mode
If the laser index laserIndex is equal to UNKNOWN_LASER, contextAngular is set to the preset value UNKNOWN_CONTEXT. Otherwise, if the laser index laserIndex is not equal to UNKNOWN_LASER, the following applies as a continuation of the process described in 8.2.5.2.
First, two angular differences thetaLaserDeltaBot and thetaLaserDeltaTop are determined with respect to the lower and upper planes.
thetaLaserDelta=LaserAngle[laserIndex]-theta32
Hr=LaserCorrection[laserIndex]×rInv;
thetaLaserDelta+=Hr>=0?-(Hr>>17):((-Hr)>>17)
vShift=(rInv<<ChildNodeSizeLog2[2])>>20
thetaLaserDeltaTop=thetaLaserDelta-vShift
thetaLaserDeltaBot=thetaLaserDelta+vShift
Then, the angle context is deduced from the two angle differences.
contextAngular=thetaLaserDelta<0
if(thetaLaserDeltaTop>=0||thetaLaserDeltaBot<0)
contextAngular+=2
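The context derivation above can be sketched directly in Python (a transcription of the formulas, with illustrative names; in the C-like pseudocode, `thetaLaserDelta < 0` yields 0 or 1):

```python
def context_angular(theta_laser_delta: int, v_shift: int) -> int:
    # Contexts 0..3: bit 0 from the sign of thetaLaserDelta, +2 when the
    # corrected laser angle falls above the top plane or below the bottom.
    top = theta_laser_delta - v_shift   # thetaLaserDeltaTop
    bot = theta_laser_delta + v_shift   # thetaLaserDeltaBot
    ctx = 1 if theta_laser_delta < 0 else 0
    if top >= 0 or bot < 0:
        ctx += 2
    return ctx
```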
When in-tree quantization and the angular mode are jointly enabled, scaled versions of one or more of the effective node size, the point position, and the offset may be used in the context derivation for the planar mode, to ensure that the position, offset, and node size are expressed at the same scale as the angular origin. This may be useful, for example, for the correct derivation of the laser index and of the context. Not using scaled values may result in an incorrect derivation of the laser index or of the context.
G-PCC encoder 200 and G-PCC decoder 300 may be configured to decode data using an Inferred Direct Coding Mode (IDCM). The syntax associated with the IDCM mode may include the following:
inferred_direct_coding_mode greater than 0 indicates that direct_mode_flag may be present in the geometry node syntax. inferred_direct_coding_mode equal to 0 indicates that direct_mode_flag is not present in the geometry node syntax.
Joint_2point_idcm_enabled_flag equal to 1 indicates that joint coding of two points is activated in direct coding mode. joint_2point_idcm_enabled_flag equal to 0 indicates that joint coding of two points is not activated.
geom_idcm_rate_minus1 specifies the rate at which nodes may be eligible for direct coding. When not present, geom_idcm_rate_minus1 is inferred to be 31.
Array IdcmEnableMask is derived as follows:
direct_point_cnt_eq2_flag equal to 1 specifies that the current node contains two point_offset values representing residuals for two coding points. direct_point_cnt_eq2_flag equal to 0 specifies that the current node contains a single point_offset value representing a residual of a single point position that is copied zero or more times.
dup_point_cnt_gt0_flag, dup_point_cnt_gt1_flag, and dup_point_cnt_minus2 together specify the number of times a single point_offset value is repeated to represent multiple points with the same position in the reconstructed point cloud. Any of dup_point_cnt_gt0_flag, dup_point_cnt_gt1_flag, or dup_point_cnt_minus2 that is not present is inferred to be 0.
The variable DirectDupPointCnt, representing the number of point repetitions, is derived as follows:
DirectDupPointCnt=
dup_point_cnt_gt0_flag+dup_point_cnt_gt1_flag+dup_point_cnt_minus2
An array PointOffset, having elements PointOffset[i][k] (i = 0..NumDirectPoints-1 and k = 0..2), represents the position of the k-th dimension of the i-th point relative to the full-resolution position of the current node. PointOffset[i][k] consists of EffectiveNodeSizeLog2[k] bits and is derived as follows.
The variable NodeSizeLog2Rem[k] indicates the number of bits of PointOffset[i][k] that remain to be derived, independent of i. NodeSizeLog2Rem and the array PointOffset are initialized for each value of i by
for(k=0;k<3;k++){
NodeSizeLog2Rem[k]=EffectiveNodeSizeLog2[k]
PointOffset[i][k]=0
}
If is_planar_flag[k] is equal to 1, PointOffset[i][k] is derived from plane_position[k]:
same_bit[k][j] equal to 1 specifies that the corresponding j-th bits of PointOffset[0][k] and PointOffset[1][k] are equal. same_bit[k][j] equal to 0 specifies that the two j-th bits are not equal.
value_bit[k][j] indicates the value of the j-th bit of PointOffset[0][k]. When value_bit[k][j] is not present, its value is inferred to be 0.
The variable EligTwoPoints[k] equal to 1 indicates that the k-th component of the points contained in the node is eligible for joint coding of the two points. EligTwoPoints[k] equal to 0 indicates that the k-th component of the points contained in the node is not eligible for joint coding of the two points.
The variable samePrecComp[k] equal to 1 indicates that the components 0 through k-1 of the two points contained in the node are equal. Otherwise, samePrecComp[k] equal to 0 indicates that one of the components 0 through k-1 of the two points differs. samePrecComp[k] is initialized to 1.
for(k=0;k<3;k++)
samePrecComp[k]=1
If joint coding of two points is activated, if there are two points in the node, and if the k-th component is eligible for joint coding, joint two-point coding is performed for that component.
point_offset[i][k][j] is the j-th bit of the k-th component of the s, t, and v coordinates of the i-th point of the current node relative to the origin of the current node.
The NodeSizeLog2Rem[k] remaining bits of each point offset are set as follows:
for(k=0;k<3;k++)
for(j=NodeSizeLog2Rem[k]-1;j>0;j--)
PointOffset[i][k]=(PointOffset[i][k]<<1)+point_offset[i][k][j]
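The loop above accumulates the remaining offset bits most significant bit first. A minimal Python sketch of the accumulation (illustrative names):

```python
def append_offset_bits(partial_offset: int, bits) -> int:
    # Each decoded bit becomes the new least significant bit:
    # partialOffset = (partialOffset << 1) + bit, as in the loop above.
    for b in bits:
        partial_offset = (partial_offset << 1) + b
    return partial_offset
```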
laser_residual_abs_gt0_flag[ptIdx], laser_residual_sign[ptIdx], laser_residual_abs_gt1_flag[ptIdx], laser_residual_abs_gt2_flag[ptIdx], and laser_residual_abs_minus3[ptIdx] together specify the residual laser index value associated with the ptIdx-th point of the current node using the inferred direct coding mode when geometry_angular_enabled_flag is equal to 1. Any of laser_residual_abs_gt0_flag[ptIdx], laser_residual_sign[ptIdx], laser_residual_abs_gt1_flag[ptIdx], laser_residual_abs_gt2_flag[ptIdx], and laser_residual_abs_minus3[ptIdx] that is not present is inferred to be 0.
G-PCC decoder 300 may be configured to parse and inverse binarize syntax elements related to IDCM modes as follows:
10.8 Inferred direct coding mode parsing process
10.8.1 General procedure
The parsing and inverse binarization of the syntax elements same_bit[k][j], value_bit[k][j], and point_offset[i][k][j], for point index i, component index k, and bit index j, are described in sub-clauses 9.8.2 to 9.8.5.
The output of this process is the offset of one point belonging to the current node when direct_point_cnt_eq2_flag is equal to 0, or the offsets of two points belonging to the current node when direct_point_cnt_eq2_flag is equal to 1. When present, these offsets are PointOffset[0][k] for the first point and PointOffset[1][k] for the second point.
Each offset PointOffset[i][k] consists of EffectiveNodeSizeLog2[k] bits, decoded from the most significant bit to the least significant bit, for each component k and each point i. For this purpose, the IDCM process uses the following variables
-NodeSizeLog2Rem[k], the number of bits still to be decoded for the offset of component k, independent of the point index
-partialOffset[i][k], the partially decoded k-th component of the i-th point
At any step of the process, the value partialOffset[i][k] represents the EffectiveNodeSizeLog2[k]-NodeSizeLog2Rem[k] most significant bits of PointOffset[i][k]. During the process, the partialOffset bits are determined one after another, while NodeSizeLog2Rem[k] is decreased by one for each determined bit, to reach a final state where NodeSizeLog2Rem[k] equals 0 and partialOffset[i][k] equals PointOffset[i][k].
The IDCM process proceeds through sub-clauses 9.8.2 to 9.8.5 in the following order and under the following conditions
-sub-clause 9.8.2, for the initialization of the process variables and the inference of the most significant bits of the point offsets by planar mode
-then, if joint coding of two points is activated (joint_2point_idcm_enabled_flag is equal to 1) and there are two points in the current node (direct_point_cnt_eq2_flag is equal to 1), sub-clause 9.8.3
-then, sub-clause 9.8.4 if the angular mode is activated (geometry_angular_enabled_flag is equal to 1), or sub-clause 9.8.5 otherwise (geometry_angular_enabled_flag is equal to 0)
10.8.2 Initialization and plane inference
For all components k and points i, the number of remaining bits and the partial offsets are initialized by
If available (is_planar_flag[k] equal to 1), the most significant bit of the point offset is inferred from the planar mode as follows
When the angular coding mode is activated, the horizontal position of the current node, in the coordinates used in the processing of the angular coding mode, is used to determine a variable byPassSorT indicating which of the S or T components is to be bypass coded
10.8.3 Joint decoding of offsets of two points
The process in this sub-clause applies only when joint_2point_idcm_enabled_flag is equal to 1 and direct_point_cnt_eq2_flag is equal to 1.
First, the value of EligTwoPoints[k], indicating whether the k-th component of the two points is eligible for joint coding, is initialized by
for(k=0;k<3;k++)
EligTwoPoints[k]=!geometry_angular_enabled_flag
Then, when the angular coding mode is activated, the eligibility is further determined using the variable byPassSorT
The array samePrecComp[k], indicating that the components 0 through k-1 of the two points contained in the node are equal, is initialized to
for(k=0;k<3;k++)
samePrecComp[k]=1
The joint decoding process is then applied to the eligible components in ascending order
10.8.4 Angular and azimuthal decoding of point offsets
10.8.4.1 General
The process in this sub-clause applies only when geometry_angular_enabled_flag is equal to 1. It applies the sub-processes described in the sub-clauses below: sub-clause 9.8.4.2 is applied once, and then sub-clauses 9.8.4.3 to 9.8.4.6 are applied to each point i belonging to the current node.
10.8.4.2 Evaluate a laser index associated with a current node
Based on the best knowledge of the position of the first point belonging to the current node (after planar inference and joint decoding), an estimate laserIndexEstimate of the index of the laser that has detected that point is derived.
First, the best-known 3D position bestKnownPos of the first point is obtained by
bestKnownPos[0]=sN<<EffectiveNodeSizeLog2[0]
bestKnownPos[1]=tN<<EffectiveNodeSizeLog2[1]
bestKnownPos[2]=vN<<EffectiveNodeSizeLog2[2]
bestKnownPos[0]+=partialOffset[0][0]<<EffectiveNodeSizeLog2[0]-NodeSizeLog2Rem[0]
bestKnownPos[1]+=partialOffset[0][1]<<EffectiveNodeSizeLog2[1]-NodeSizeLog2Rem[1]
bestKnownPos[2]+=partialOffset[0][2]<<EffectiveNodeSizeLog2[2]-NodeSizeLog2Rem[2]
Next, the best-known position bestKnownPos2Lidar of the point, in the coordinates used in the processing of the angular coding mode, is derived by
Third, the angular value bestKnownAngle associated with this position is determined by
sPoint=bestKnownPos2Lidar[0]<<8
tPoint=bestKnownPos2Lidar[1]<<8
r2=sPoint×sPoint+tPoint×tPoint
rInvPoint=IntRecipSqrt(r2)
bestKnownAngle=bestKnownPos2Lidar[2]*rInvPoint>>14
The laser index estimate laserIndexEstimate is obtained as the index of the laser having the angle closest to bestKnownAngle, as follows
10.8.4.3 Bypass decoding of the first component S or T of Point_offset
The component bypassSorT (whose value is 0 for S and 1 for T) of the i-th point belonging to the current node is bypass decoded.
At the end of this sub-process, NodeSizeLog2Rem[bypassSorT] equals 0: there are no more bits to decode for the bypassSorT-th component of the point offset, and partialOffset[i][bypassSorT] is equal to the full point offset PointOffset[i][bypassSorT].
10.8.4.4 Determining a laser index associated with a point
The laser index residual laserIndexResidual[i] associated with the i-th point belonging to the current node is derived from the decoded values as
laserIndexResidual[i]=
(1-2×laser_residual_sign_flag)
×(laser_residual_abs_gt0_flag+laser_residual_abs_gt1_flag
+laser_residual_abs_gt2_flag+laser_residual_abs_minus3_flag)
The laser index laserIndex[i] associated with the i-th point belonging to the current node is then obtained by the sum
laserIndex[i]=laserIndexEstimate+laserIndexResidual[i]
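The residual reconstruction and the laser index sum can be illustrated in Python (flag names shortened for readability; each flag is 0 or 1 except the remainder value):

```python
def laser_index(estimate: int, sign_flag: int,
                gt0: int, gt1: int, gt2: int, abs_minus3: int) -> int:
    # laserIndexResidual = (1 - 2*sign) * (gt0 + gt1 + gt2 + abs_minus3)
    residual = (1 - 2 * sign_flag) * (gt0 + gt1 + gt2 + abs_minus3)
    return estimate + residual
```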
It is a requirement of bitstream conformance that laserIndex[i] shall be in the range 0..number_lasers_minus1.
10.8.4.5 Azimuthal decoding of the second component S or T of the point offset
The component 1-bypassSorT (whose value is 0 for S and 1 for T) of the i-th point belonging to the current node is decoded using the azimuthal coding mode.
The best-known horizontal position of point i, in the coordinates used in the processing of the angular coding mode, is calculated from the already decoded bits of the partial offset by
posPoint2LidarS[i]=(sN<<EffectiveNodeSizeLog2[0])-GeomAngularOrigin[0]
posPoint2LidarT[i]=(tN<<EffectiveNodeSizeLog2[1])-GeomAngularOrigin[1]
posPoint2LidarS[i]+=partialOffset[i][0]<<NodeSizeLog2Rem[0]
posPoint2LidarT[i]+=partialOffset[i][1]<<NodeSizeLog2Rem[1]
Then, an initial value of the azimuth predictor predPhi is determined from the buffer phiBuffer.
phiNode=IntAtan2(posPoint2LidarT[i],posPoint2LidarS[i])
predPhi=phiBuffer[laserIndex[i]]
if(predPhi==0x80000000)
predPhi=phiNode
nShift=((predPhi-phiNode)*InvDeltaPhi[laserIndex[i]]+536870912)>>30
predPhi-=DeltaPhi[laserIndex[i]]*nShift
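The predictor refinement above snaps predPhi onto the azimuthal sampling grid of the laser, as close as possible to phiNode. A Python sketch, under the assumption (suggested by the rounding constant 536870912 = 2^29) that InvDeltaPhi is an integer approximation of 2^30 / DeltaPhi:

```python
def refine_pred_phi(pred_phi: int, phi_node: int,
                    delta_phi: int, inv_delta_phi: int) -> int:
    # nShift = round((predPhi - phiNode) / DeltaPhi), computed in
    # fixed point with a 2^29 rounding offset before the >> 30.
    n_shift = ((pred_phi - phi_node) * inv_delta_phi + 536870912) >> 30
    return pred_phi - delta_phi * n_shift
```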
The remainder of the point partial offset partialOffset[i][1-bypassSorT] is decoded iteratively over the remaining bits in a loop on j. In the loop, the azimuthal context idcmIdxAzimuthal[i][j] is determined and used to decode the syntax element point_offset[i][1-bypassSorT][j]. The position of the point (posPoint2LidarS[i] or posPoint2LidarT[i]) is also updated iteratively, depending on the component involved in the azimuthal decoding.
The buffer phiBuffer is then updated
phiBuffer[laserIndex[i]]=phiNode
10.8.4.6 Angular decoding of the component V of the point offset
The last component, V, of the i-th point belonging to the current node is decoded using the angular coding mode.
The horizontal positions posPoint2LidarS[i] and posPoint2LidarT[i] are known from the azimuthal decoding, and the inverse horizontal radial distance rInv is computed by
sLidar=(posPoint2LidarS[i]<<8)-128
tLidar=(posPoint2LidarT[i]<<8)-128
r2=sLidar×sLidar+tLidar×tLidar
rInv=IntRecipSqrt(r2)
Using the already decoded bits of the partial offset, the best-known vertical position of point i, in the coordinates used in the processing of the angular coding mode, is calculated by
posPoint2LidarV[i]=(vN<<EffectiveNodeSizeLog2[2])-GeomAngularOrigin[2]
posPoint2LidarV[i]+=partialOffset[i][2]<<NodeSizeLog2Rem[2]
The corrected laser angle ThetaLaser of the laser associated with the point is
Hr=LaserCorrection[laserIndex[i]]×rInv
ThetaLaser=LaserAngle[laserIndex[i]]+(Hr>=0?-(Hr>>17):((-Hr)>>17))
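The correction scales LaserCorrection by the inverse radial distance and applies a truncating (toward-zero) shift by 17, as in the formulas above. A Python sketch with illustrative fixed-point values:

```python
def corrected_laser_angle(laser_angle: int, laser_correction: int,
                          r_inv: int) -> int:
    hr = laser_correction * r_inv
    # ThetaLaser = LaserAngle + (Hr >= 0 ? -(Hr >> 17) : (-Hr) >> 17)
    return laser_angle + (-(hr >> 17) if hr >= 0 else (-hr) >> 17)
```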
The remainder of the point partial offset partialOffset[i][2] is decoded iteratively over the remaining bits in a loop on j. In the loop, the angular context idcmIdxAngular[i][j] is determined and used to decode the syntax element point_offset[i][2][j]. The position posPoint2LidarV[i] of the point is also updated iteratively.
10.8.5 Bypass decoding of all components of the point offsets
The process in this sub-clause applies only when geometry_angular_enabled_flag is equal to 0.
In this process, the remaining bits of the point offsets are determined by bypass decoding of point_offset[i][k][j]. This is performed for each point index i and each component k as follows
At the end of this process, NodeSizeLog2Rem[k] is equal to 0 for all k: no more bits remain to be decoded for the point offsets, and partialOffset[i][k] is equal to the full point offset PointOffset[i][k].
When in-tree quantization, the angular mode, and IDCM are jointly enabled, scaled versions of one or more of the effective node size, the point position, and the offset may be used in the IDCM decoding process, to ensure that the position, offset, and node size are expressed at the same scale as the angular origin. This may be useful, for example, for the correct derivation of the laser index and of the context. Not using scaled values may result in an incorrect derivation of the laser index or of the context.
In the example of fig. 2, the G-PCC encoder 200 may include a coordinate transformation unit 202, a color transformation unit 204, a voxelization unit 206, an attribute transmission unit 208, an octree analysis unit 210, a surface approximation analysis unit 212, an arithmetic coding unit 214, a geometry reconstruction unit 216, a RAHT unit 218, a LOD generation unit 220, a lifting unit 222, a coefficient quantization unit 224, a memory 228, and an arithmetic coding unit 226.
The memory 228 may be configured to store point cloud data to be used as reference data for inter prediction, such as original point cloud data, encoded point cloud data, and/or decoded point cloud data.
As shown in the example of fig. 2, G-PCC encoder 200 may obtain a set of locations and a set of attributes for points in a point cloud. G-PCC encoder 200 may obtain a set of locations and a set of attributes for points in a point cloud from data source 104 (FIG. 1). These locations may include coordinates of points in the point cloud. The attributes may include information about points in the point cloud, such as colors associated with the points in the point cloud. G-PCC encoder 200 may generate a geometric bitstream 203 that includes an encoded representation of the locations of points in the point cloud. The G-PCC encoder 200 may also generate an attribute bit stream 205 comprising an encoded representation of the set of attributes.
The coordinate transformation unit 202 may apply a transformation to the point coordinates to transform the coordinates from an initial domain to a transform domain. The present disclosure may refer to the transformed coordinates as transform coordinates. The color transformation unit 204 may apply a transformation to transform the color information of the attributes to a different domain. For example, the color transformation unit 204 may convert color information from an RGB color space to a YCbCr color space.
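As one concrete possibility, the RGB-to-YCbCr conversion could look like the following Python sketch. The BT.709 full-range coefficients are an assumption for illustration only; the passage above does not mandate a specific conversion matrix.

```python
def rgb_to_ycbcr(r: float, g: float, b: float):
    # BT.709 luma coefficients (assumed here for illustration).
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    # Chroma components centered at 128 for 8-bit samples.
    cb = (b - y) / 1.8556 + 128.0
    cr = (r - y) / 1.5748 + 128.0
    return y, cb, cr
```

For a white point (255, 255, 255), the luma is 255 and both chroma components sit at the neutral value 128.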
Further, in the example of fig. 2, the voxelization unit 206 may voxelize the transformed coordinates. Voxelization of the transformed coordinates may include quantizing and removing some points in the point cloud. In other words, multiple points in the point cloud may be grouped within a single "voxel," which may then be considered a point in some aspects. Furthermore, the octree analysis unit 210 may generate octrees based on the voxelized transformed coordinates. Additionally, in the example of fig. 2, the surface approximation analysis unit 212 may analyze the points to potentially determine a surface representation of the set of points. The arithmetic coding unit 214 may entropy-encode syntax elements representing octree and/or information of the surface determined by the surface approximation analysis unit 212. The G-PCC encoder 200 may output these syntax elements in the geometry bitstream 203. The geometric bitstream 203 may also include other syntax elements, including syntax elements that are not arithmetically encoded.
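Octree construction recursively assigns each point to one of eight child octants per level. A minimal Python sketch of the child-octant index of a point at a given depth (the ordering of the coordinate bits within the 3-bit index is an assumption for illustration):

```python
def octant_index(x: int, y: int, z: int, depth: int) -> int:
    # Select bit `depth` of each coordinate and pack into a 3-bit index,
    # so each node splits its points among up to 8 children.
    return ((((x >> depth) & 1) << 2)
            | (((y >> depth) & 1) << 1)
            | ((z >> depth) & 1))
```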
In accordance with the techniques of this disclosure, arithmetic coding unit 214 may determine how to encode occupancy data for a current node, e.g., whether the current node is occupied by at least one point and/or the locations of points in the current node. In particular, arithmetic coding unit 214 may determine whether an Inferred Direct Coding Mode (IDCM) is available for the current node according to whether the current node is inter-predictable and/or whether an angular mode is enabled for the current node. For example, if the current node is inter-predictable and the angular mode is disabled for the current node, the IDCM mode may be disabled for the current node. On the other hand, if the current node is not inter-predictable or the angular mode is enabled for the current node, the IDCM mode may be enabled and thus may be used to encode the occupancy data for the current node.
In some examples, when the IDCM mode is enabled, arithmetic coding unit 214 may further determine whether to enable a position copy mode for the current node. In the IDCM mode, arithmetic coding unit 214 may directly encode position values for the points in the current node. In the position copy mode, arithmetic coding unit 214 may predict the position values of the points of the current node from the position values of points of a reference node for the current node. The prediction may be such that the position values for the current node are directly copied from the position values of the reference node, or arithmetic coding unit 214 may further encode residual values representing position offsets of the position values for the current node relative to the position values of the reference node.
The geometry reconstruction unit 216 may reconstruct the transform coordinates of points in the point cloud based on the octree, data indicative of the surfaces determined by the surface approximation analysis unit 212, and/or other information. Due to the voxelization and the surface approximation, the number of transform coordinates reconstructed by the geometry reconstruction unit 216 may differ from the original number of points of the point cloud. The present disclosure may refer to the resulting points as reconstruction points. The attribute transfer unit 208 may transfer the attributes of the original points of the point cloud to the reconstruction points of the point cloud.
In addition, RAHT unit 218 may apply RAHT coding to the attributes of the reconstruction points. In some examples, according to RAHT, the attributes of a 2x2x2 block of point positions are obtained and transformed along one direction to obtain four low-frequency nodes (L) and four high-frequency nodes (H). Subsequently, the four low-frequency nodes (L) are transformed in a second direction to obtain two low-frequency nodes (LL) and two high-frequency nodes (LH). The two low-frequency nodes (LL) are transformed in a third direction to obtain one low-frequency node (LLL) and one high-frequency node (LLH). The low-frequency node LLL corresponds to the DC coefficient, and the high-frequency nodes H, LH, and LLH correspond to AC coefficients. The transform in each direction may be a 1D transform with two coefficient weights. The low-frequency coefficients may be taken as the coefficients of the next-higher-level 2x2x2 block of the RAHT transform, and the AC coefficients are encoded unchanged; such transformation continues up to the top root node. The tree traversal for encoding is top-to-bottom and is used to calculate the weights to be used for the coefficients; the transform order is bottom-up. The coefficients may then be quantized and coded.
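The cascade of 1D two-point transforms described above can be illustrated with a simplified, unweighted Haar sketch over a single 2x2x2 block. The real RAHT weights each 1D transform by the child point counts; that bookkeeping is omitted here, and the index layout (bit 0 = x, bit 1 = y, bit 2 = z) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Simplified, unweighted RAHT-style cascade: transform in x, then y, then
// z, each time transforming only the low-frequency survivors of the
// previous pass. After three passes, index 0 holds the DC coefficient and
// the remaining entries hold the H, LH, and LLH AC coefficients.
std::array<double, 8> rahtForward(std::array<double, 8> a) {
  for (int dir = 0; dir < 3; ++dir) {
    int stride = 1 << dir;
    for (int i = 0; i < 8; ++i) {
      if (i & (2 * stride - 1)) continue;  // only low-frequency survivors
      double lo = (a[i] + a[i + stride]) / std::sqrt(2.0);
      double hi = (a[i] - a[i + stride]) / std::sqrt(2.0);
      a[i] = lo;           // carried forward to the next direction
      a[i + stride] = hi;  // AC coefficient, encoded unchanged
    }
  }
  return a;
}
```

For a constant block, all AC coefficients vanish and the DC coefficient carries the (scaled) sum, which matches the intuition that the DC term summarizes the block.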
Alternatively or additionally, LOD generation unit 220 and lifting unit 222 may apply LOD processing and lifting, respectively, to the attributes of the reconstruction points. LOD generation is used to split the attributes into different refinement levels. Each refinement level provides a refinement of the attributes of the point cloud. The first refinement level provides a coarse approximation and contains few points; subsequent refinement levels typically contain more points, and so on. The refinement levels may be constructed using distance-based metrics, or may also use one or more other classification criteria (e.g., subsampling from a particular order). Thus, all reconstruction points may be included in the refinement levels. Each level of detail is generated by taking the union of all points up to a particular refinement level: for example, LOD1 is obtained based on refinement level RL1, and LOD2 is obtained based on RL1 and RL2. LOD generation may be followed by a prediction scheme (e.g., a predicting transform), in which the attributes associated with each point in an LOD are predicted from a weighted average of previous points, and the residual is quantized and entropy coded. The lifting scheme builds on top of the predicting transform mechanism, in which an update operator is used to update the coefficients and adaptive quantization of the coefficients is performed.
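The union-of-refinement-levels construction above (LOD_k = RL1 ∪ ... ∪ RL_k) can be sketched as follows. The distance-based split of points into refinement levels is assumed to have been done already, and integer point indices stand in for points; names are assumptions.

```cpp
#include <cassert>
#include <vector>

// Sketch of level-of-detail assembly: each LOD is the union of all
// refinement levels up to and including its index, so later LODs give
// progressively finer representations of the point cloud.
std::vector<std::vector<int>> buildLods(
    const std::vector<std::vector<int>>& refinementLevels) {
  std::vector<std::vector<int>> lods;
  std::vector<int> acc;
  for (const auto& rl : refinementLevels) {
    acc.insert(acc.end(), rl.begin(), rl.end());
    lods.push_back(acc);  // LODk = RL1 ∪ ... ∪ RLk
  }
  return lods;
}
```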
RAHT unit 218 and lifting unit 222 may generate coefficients based on these attributes. The coefficient quantization unit 224 may quantize the coefficients generated by the RAHT unit 218 or the lifting unit 222. The arithmetic coding unit 226 may apply arithmetic coding to syntax elements representing quantized coefficients. The G-PCC encoder 200 may output these syntax elements in the attribute bit stream 205. The attribute bitstream 205 may also include other syntax elements, including non-arithmetic coded syntax elements.
In the example of fig. 3, the G-PCC decoder 300 may include a geometric arithmetic decoding unit 302, a memory 324, an attribute arithmetic decoding unit 304, an octree synthesis unit 306, an inverse quantization unit 308, a surface approximation synthesis unit 310, a geometric reconstruction unit 312, a RAHT unit 314, a LOD generation unit 316, an inverse lifting unit 318, an inverse transform coordinate unit 320, and an inverse transform color unit 322.
G-PCC decoder 300 may obtain geometry bit stream 203 and attribute bit stream 205. The geometric arithmetic decoding unit 302 of the decoder 300 may apply arithmetic decoding (e.g., context Adaptive Binary Arithmetic Coding (CABAC) or other types of arithmetic decoding) to syntax elements in the geometric bitstream 203. Similarly, the attribute arithmetic decoding unit 304 may apply arithmetic decoding to syntax elements in the attribute bitstream 205.
In accordance with the techniques of this disclosure, the geometric arithmetic decoding unit 302 may determine how to decode occupancy data of a current node. In particular, the geometric arithmetic decoding unit 302 may determine whether an Inferred Direct Coding Mode (IDCM) is available for the current node based on whether the current node is inter-predictable and/or whether an angular mode is enabled for the current node. For example, if the current node is inter-predictable and the angular mode is disabled for the current node, the IDCM mode may be disabled for the current node. On the other hand, if the current node is not inter-predictable or the angular mode is enabled for the current node, the IDCM mode may be enabled and thus may be used to decode the occupancy data for the current node.
In some examples, when the IDCM mode is enabled, the geometric arithmetic decoding unit 302 may further determine whether a position copy mode is enabled for the current node. In the IDCM mode, the geometric arithmetic decoding unit 302 may directly decode position values for the points in the current node. In the position copy mode, the geometric arithmetic decoding unit 302 may predict the position values of the points of the current node from the position values of points of a reference node for the current node. The prediction may be such that the position values for the current node are directly copied from the position values of the reference node, or the geometric arithmetic decoding unit 302 may further decode residual values representing position offsets of the position values for the current node relative to the position values of the reference node.
The octree synthesis unit 306 may synthesize an octree based on syntax elements parsed from the geometric bitstream 203. Starting from the root node of the octree, the occupancy of each of the eight child nodes at each octree level is signaled in the bitstream. When the signaling indicates that a child node at a particular octree level is occupied, the occupancy of the child nodes of that child node is signaled. The signaling of the nodes at each octree level is completed before proceeding to the subsequent octree level. At the final level of the octree, each node corresponds to a voxel position; when a leaf node is occupied, one or more points may be specified as occupying that voxel position. In some examples, some branches of the octree may terminate earlier than the final level due to quantization. In such cases, a leaf node is considered an occupied node that has no child nodes. In examples where a surface approximation is used in the geometric bitstream 203, the surface approximation synthesis unit 310 may determine a surface model based on syntax elements parsed from the geometric bitstream 203 and based on the octree.
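The level-by-level occupancy signaling described above can be sketched by linearizing a pre-built tree of 8-bit occupancy words in breadth-first order. Entropy coding and the surface-approximation path are omitted, and the node structure is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

// Assumed node structure: one occupancy bit per child, with only the
// occupied children actually stored.
struct OctNode {
  uint8_t occupancy = 0;
  std::vector<OctNode> children;
};

// Emit one 8-bit occupancy word per occupied node, finishing each octree
// level before descending to the next, as described in the text.
std::vector<uint8_t> serializeOccupancy(const OctNode& root) {
  std::vector<uint8_t> out;
  std::queue<const OctNode*> q;
  q.push(&root);
  while (!q.empty()) {
    const OctNode* n = q.front();
    q.pop();
    out.push_back(n->occupancy);           // signal this node's children
    for (const OctNode& c : n->children) q.push(&c);  // next level
  }
  return out;
}
```

A decoder would run the mirror-image loop: read a word, and for each set bit enqueue a child to be refined at the next level.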
Further, the geometric reconstruction unit 312 may perform reconstruction to determine the coordinates of the points in the point cloud. For each position at a leaf node of the octree, the geometric reconstruction unit 312 may reconstruct the node position by using the binary representation of the leaf node in the octree. For each respective leaf node, the number of points at that leaf node is signaled; this indicates the number of duplicate points at the same voxel position. When geometry quantization is used, the point positions are scaled in order to determine the reconstructed point position values.
The inverse transformation coordinate unit 320 may apply an inverse transformation to the reconstructed coordinates to convert the reconstructed coordinates (positions) of the points in the point cloud from the transformation domain back to the initial domain. The locations of points in the point cloud may be in the floating point domain, but the points in the G-PCC codec are decoded in the integer domain. An inverse transform may be used to transform these locations back to the original domain.
In addition, in the example of fig. 3, the inverse quantization unit 308 may inversely quantize the attribute value. The attribute value may be based on syntax elements obtained from the attribute bitstream 205 (e.g., including syntax elements decoded by the attribute arithmetic decoding unit 304).
Depending on how the attribute values are encoded, RAHT unit 314 may perform RAHT decoding to determine color values for points in the point cloud based on the inverse-quantized attribute values. RAHT decoding is performed from the top to the bottom of the tree. At each level, the values are derived using the low-frequency coefficients and the high-frequency coefficients obtained from the inverse quantization process. At the leaf nodes, the derived values correspond to the attribute values of the coefficients. The weight derivation process for the points is similar to that used at G-PCC encoder 200. Alternatively, LOD generation unit 316 and inverse lifting unit 318 may use a level-of-detail-based technique to determine color values for the points in the point cloud. LOD generation unit 316 decodes each LOD to give progressively finer representations of the attributes of the points. In the case of the predicting transform, LOD generation unit 316 derives a prediction of a point from a weighted sum of points previously reconstructed in a previous LOD or in the same LOD. LOD generation unit 316 may add the prediction to the residual (which is obtained after inverse quantization) to obtain a reconstructed value of the attribute. When the lifting scheme is used, LOD generation unit 316 may also include an update operator to update the coefficients used to derive the attribute values. In this case, LOD generation unit 316 may also apply inverse adaptive quantization.
Further, in the example of fig. 3, the inverse transform color unit 322 may apply an inverse color transform to the color values. The inverse color transform may be inverse to the color transform applied by the color transform unit 204 of the encoder 200. For example, the color conversion unit 204 may convert color information from an RGB color space to a YCbCr color space. Accordingly, the inverse color transform unit 322 may transform color information from the YCbCr color space to the RGB color space.
Various elements of fig. 2 and 3 are shown to assist in understanding the operations performed by encoder 200 and decoder 300. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits refer to circuits that provide particular functionality and are preset in terms of the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, programmable circuits may execute software or firmware that causes the programmable circuits to operate in a manner defined by the instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally not variable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.
Fig. 4 is a flow chart depicting an example process for performing motion-based inter prediction for G-PCC. The G-PCC encoder 200 and the G-PCC decoder 300 may be configured to perform motion prediction (i.e., inter prediction) as follows.
Two types of motion are involved in the G-PCC InterEM software: a global motion matrix and local node motion vectors. The global motion parameters are defined as a rotation matrix and a translation vector to be applied to all points in the prediction (reference) frame (except the points to which the local motion mode is applied). The local node motion vector of a node of the octree is a motion vector applied only to the points within that node in the prediction (reference) frame. Details of the motion estimation algorithm in InterEM are described below. Fig. 4 depicts a flow chart for the motion estimation algorithm.
Given an input prediction (reference) frame and a current frame, global motion is first estimated at the global scale. After the global motion is applied to the prediction frame, local motion is estimated at a finer scale, i.e., the node level, in the octree. Finally, the estimated local node motion is applied in motion compensation.
Fig. 5 is a flow chart illustrating an example process for estimating a local node motion vector. The G-PCC encoder 200 and the G-PCC decoder 300 may recursively estimate local node motion vectors according to fig. 5. The cost function for selecting the most suitable motion vector may be based on a rate distortion cost.
If the current node is not split into 8 child nodes, the motion vector that results in the lowest cost between the current node and the prediction node is determined. If the current node is split into 8 child nodes, the motion estimation algorithm is applied to each child node, and the total cost under the split condition is obtained by summing the estimated cost values of the child nodes. The decision whether to split or not to split is made by comparing the costs of splitting and not splitting: if split, each child node is assigned its respective motion vector (or each child node may be further split into its own child nodes); if not split, the current node is assigned the motion vector.
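The split/no-split decision described above can be sketched as a comparison of rate-distortion costs. The InterEM motion search that actually produces the per-node and per-child costs is not shown; the costs are abstract values supplied by the caller, and all names are assumptions.

```cpp
#include <cassert>
#include <vector>

struct MotionDecision {
  bool split;   // true: each child carries its own motion vector
  double cost;  // cost of the chosen alternative
};

// Compare the cost of a single motion vector for the whole node against
// the summed cost of per-child motion vectors (plus an assumed signaling
// overhead for the split itself), and keep the cheaper alternative.
MotionDecision decideSplit(double unsplitCost,
                           const std::vector<double>& childCosts,
                           double splitOverheadBits) {
  double splitCost = splitOverheadBits;
  for (double c : childCosts) splitCost += c;  // sum of per-child costs
  if (splitCost < unsplitCost) return {true, splitCost};
  return {false, unsplitCost};
}
```

In the recursive algorithm of fig. 5, each child cost would itself be the result of the same decision applied one level deeper, down to MinPUSize.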
Two parameters that affect the performance of motion vector estimation are block size (BlockSize) and minimum prediction unit size (MinPUSize). The BlockSize defines an upper bound on the node size to which the motion vector estimation is applied, and MinPUSize defines a lower bound.
G-PCC encoder 200 and G-PCC decoder 300 may perform inter-prediction-based occupancy coding (including planar mode coding, with the angular mode disabled).
Fig. 6 is a conceptual diagram illustrating an example in which occupancy comparison for inter prediction is performed in G-PCC. G-PCC encoder 200 and G-PCC decoder 300 may compare current node 452 of parent node 450 of the octree with reference node 456 for parent reference node 454. Reference node 456 is a collocated node in the reference frame with current node 452 of the current frame. The G-PCC encoder 200 and the G-PCC decoder 300 may derive a predicted frame from a reference frame using global motion vectors applied to the reference frame. When the parent reference node 454 is obtained, the G-PCC encoder 200 and the G-PCC decoder 300 may split the parent reference node 454 into 8 cubic child nodes of the same size, including the reference node 456.
G-PCC encoder 200 and G-PCC decoder 300 may count the points in each child node of parent reference node 454 to form an inter-prediction occupancy value (predOccupancy, which may be an array of binary values) and a prediction occupancy strength value (predOccupancyStrong). The predOccupancy data structure may be an array of eight bits, indexed 0 to 7. In some examples, if there is at least one point in the i-th child node of parent reference node 454, the corresponding bit predOccupancy[i] (where i ranges from 0 to 7) is set equal to 1. Otherwise, the corresponding bit of the predOccupancy array is set equal to 0. In some examples, if the number of points in a child node is greater than 2, the corresponding bit in predOccupancyStrong (which may also be an array of eight bits, indexed 0 to 7) is set equal to 1; otherwise, that bit is set equal to 0.
The quality of the inter prediction is then evaluated by a parameter called occupancyIsPredictable. The value of occupancyIsPredictable for a node is derived from the number of sibling nodes that have missed the prediction. Specifically, if the occupancy bit of a child node in the parent node differs from the occupancy bit of the corresponding reference node in the parent reference node, that child node is considered to have missed the prediction. G-PCC encoder 200 and G-PCC decoder 300 may calculate the number of siblings that have missed the prediction (numSiblingsMispredicted) by comparing the occupancy of parent node 450 with the occupancy of parent reference node 454, as shown in FIG. 6. If predOccupancy of the current node is 0 or the number of siblings that have missed the prediction is greater than 5, occupancyIsPredictable is set equal to 0. Otherwise, it is set equal to 1. Thus, in one example, a threshold of 5 may be used to determine whether a node is inter-predictable.
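The derivation of predOccupancy, predOccupancyStrong, and occupancyIsPredictable described above can be sketched as follows. The thresholds (at least 1 point, more than 2 points, more than 5 mispredicted siblings) follow the text; the structure and function signature are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct InterPrediction {
  uint8_t predOccupancy = 0;        // one bit per child of the reference
  uint8_t predOccupancyStrong = 0;  // set where the child has > 2 points
  bool occupancyIsPredictable = false;
};

// Derive the inter-prediction occupancy state of a node from the point
// counts of the reference node's children and from the occupancy words of
// the parent node and the parent reference node.
InterPrediction derivePrediction(const std::array<int, 8>& refChildPointCount,
                                 uint8_t parentOccupancy,
                                 uint8_t parentRefOccupancy) {
  InterPrediction p;
  for (int i = 0; i < 8; ++i) {
    if (refChildPointCount[i] >= 1) p.predOccupancy |= uint8_t(1) << i;
    if (refChildPointCount[i] > 2) p.predOccupancyStrong |= uint8_t(1) << i;
  }
  // Count siblings whose occupancy bit differs from the reference.
  int numSiblingsMispredicted = 0;
  for (int i = 0; i < 8; ++i)
    if (((parentOccupancy >> i) & 1) != ((parentRefOccupancy >> i) & 1))
      ++numSiblingsMispredicted;
  // Predictable only if the reference predicts some occupancy and at most
  // 5 siblings missed the prediction.
  p.occupancyIsPredictable =
      p.predOccupancy != 0 && numSiblingsMispredicted <= 5;
  return p;
}
```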
G-PCC encoder 200 and G-PCC decoder 300 may use occupancyIsPredictable to update predOccupancy, predOccupancyStrong, the planar copy mode eligibility, and the IDCM eligibility. If occupancyIsPredictable is equal to 0, then predOccupancy, predOccupancyStrong, and the planar copy mode eligibility are set equal to 0. If occupancyIsPredictable is 1, then IDCM is disabled for this node in the current InterEM software.
Fig. 7 is a conceptual diagram illustrating a planar copy mode (PCM) for G-PCC. In particular, a node may be said to be planar in a particular direction (e.g., the X-direction, the Y-direction, or the Z-direction) if all of its points lie in a single plane in that direction. There are two candidate planes in each direction: for example, for the X-direction, a left plane and a right plane; for the Y-direction, a top plane and a bottom plane; and for the Z-direction, a front plane and a back plane.
In the planar mode, if a node is coded using the planar copy mode (PCM), the planar information for that node is not signaled in the bitstream. Instead, for example for current node 460, the planar mode and plane positions in the three directions are, in PCM, copied from the planar information of reference node 462, which may be generated according to predOccupancy. An example of a PCM-coded current node 460 and a non-PCM-coded node 464 is shown in fig. 7. For non-PCM-coded node 464, the planar information of reference node 466 is used to provide more flexibility in the selection of contexts for coding the planar mode and plane positions.
The G-PCC encoder 200 and the G-PCC decoder 300 may be configured to perform inter-prediction-based occupancy coding. Inter prediction may be used to improve occupancy coding, in particular the context selection used to encode the occupancy bits of the current node. This is represented in the GeometryOctreeEncoder::encodeOccupancyNeighNZ() and GeometryOctreeEncoder::encodeOccupancyNeighZ() functions as follows:
In the example pseudocode above, the occupancy of the neighboring nodes is used to decide idxAdj. bitIsPredicted and bitPrediction are occupancy bits derived using intra prediction. In an inter frame, these parameters are set equal to 0. The value !!mappedPred indicates whether the predicted occupancy of the inter reference block is non-zero. bitPred and bitPredStrong are the corresponding bits of the child node in predOccupancy and predOccupancyStrong.
The present disclosure recognizes that in the current InterEM version 3 software for G-PCC, inter prediction provides significant coding efficiency gains for both the lossy-lossy and lossless-lossless configurations. Note that in current InterEM version 3, if a node is inter-predictable, the IDCM mode is disabled for its child nodes. In this case, the encoder and decoder run times increase significantly for the lossless-lossless case. When the angular mode is enabled, the overhead of coding an IDCM node is significantly reduced. Thus, when the number of IDCM nodes is reduced by using a determination of whether nodes are inter-predictable, the benefit of IDCM is reduced, and this reduces coding efficiency.
The present disclosure describes various techniques that may be used to solve the above-described problems and to improve coding of IDCM modes (in particular, coding the position of points in G-PCC) using inter-prediction.
In some examples, the IDCM mode is not allowed when the node is inter-predictable. Otherwise, eligibility for the IDCM mode is based on the IDCM mode, the node size, and the number of sibling nodes, per isDirectModeEligible(), which may be defined as follows:
In accordance with the techniques of this disclosure, G-PCC encoder 200 and G-PCC decoder 300 may be configured to use the modified IDCM qualification to more adaptively control the trade-off of the IDCM mode. In one example, inter prediction is not used in IDCM qualification, so the above function may be modified as follows, where "remove" indicates removal from the G-PCC standard:
In another example, the G-PCC encoder 200 and the G-PCC decoder 300 may use inter-frame prediction in an IDCM qualification check that depends on the angular mode. For example, if the angular mode is disabled and the node is inter-predictable, IDCM mode may not be allowed for this node. The above function may be updated accordingly, wherein "add" indicates an addition with respect to the G-PCC standard, and "modify" indicates a modification to the G-PCC standard:
In this example, if the current node has a true value of occupancyIsPredictable (i.e., the node is inter-predictable) and the angular mode is not enabled, the IDCM mode is disabled for that node. Otherwise, i.e., if the current node is not inter-predictable or the angular mode is enabled, the IDCM mode may be enabled for the current node.
As discussed above, to determine whether a node is inter-predictable, G-PCC encoder 200 and G-PCC decoder 300 may determine whether the sibling nodes of the node were predicted correctly or had prediction misses (i.e., missed predictions), which may be determined according to the techniques discussed above. If the number of missed predictions exceeds a threshold (e.g., 5), G-PCC encoder 200 and G-PCC decoder 300 may determine that the node is not inter-predictable. On the other hand, if the number of missed predictions is less than or equal to the threshold, G-PCC encoder 200 and G-PCC decoder 300 may determine that the node is inter-predictable.
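The modified eligibility rule, under which IDCM is disabled only when the node is inter-predictable and the angular mode is disabled, can be sketched as a small predicate. The remaining mode/node-size/sibling-count conditions of isDirectModeEligible() are folded here into a single assumed flag.

```cpp
#include <cassert>

// Sketch of the modified IDCM eligibility check: occupancyIsPredictable
// only disqualifies a node when the angular mode is off. The
// `otherwiseEligible` flag stands in for the usual IDCM-mode, node-size,
// and sibling-count conditions (an assumption, not the normative check).
bool isDirectModeEligibleSketch(bool occupancyIsPredictable,
                                bool angularModeIsEnabled,
                                bool otherwiseEligible) {
  if (occupancyIsPredictable && !angularModeIsEnabled) return false;
  return otherwiseEligible;
}
```

This reproduces the trade-off described above: when the angular mode is enabled, IDCM remains available even for inter-predictable nodes, preserving its run-time and coding-efficiency benefits.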
G-PCC encoder 200 may test the angular pattern for a node and use the angular pattern for the node to determine a Rate Distortion Optimization (RDO) value. If the RDO value indicates that the angular mode should be enabled for the node, G-PCC encoder 200 may encode a value for the syntax element indicating that the angular mode is enabled for the node. On the other hand, if the RDO value indicates that the angular mode should not be enabled for the node, G-PCC encoder 200 may encode a value for the syntax element that indicates that the angular mode should be disabled for the node. G-PCC decoder 300 may use the values of the syntax elements to determine whether to enable an angular mode for a node. The syntax element may be, for example angularModeIsEnabled, as shown above.
In another example, G-PCC encoder 200 and G-PCC decoder 300 may use inter-frame predictability in the selection of IDCM qualification, as follows, where "remove" indicates removal relative to the G-PCC standard and "add" indicates addition relative to the G-PCC standard:
In another example, G-PCC encoder 200 and G-PCC decoder 300 may be configured to perform the following:
The G-PCC encoder 200 and the G-PCC decoder 300 may be configured to perform a position copy mode for IDCM (referred to as the "PCMI mode"), with RD checking for the IDCM mode, as follows. If a node is coded with PCMI, the positions of the points in that node may be copied from the positions of the points in the reference node. A flag may be signaled to indicate whether the node is coded using the PCMI mode.
G-PCC encoder 200 and G-PCC decoder 300 may determine PCMI mode eligibility as follows: the PCMI mode may be applied only to nodes at certain depths, which may be signaled in a header or parameter set. The PCMI mode may be applied only to nodes whose reference nodes have a certain number of points. For example, if the reference node has at most 3 points, PCMI may be used to code the current node. The PCMI eligibility may be set in a configuration file and signaled to the decoder. For example, PCMI may not be applicable to the lossless-lossless case.
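The eligibility conditions above can be sketched as a simple predicate. The depth range and the 3-point limit on the reference node follow the text; the parameter names are assumptions.

```cpp
#include <cassert>

// Sketch of PCMI eligibility: the node depth must fall inside a signaled
// range, and the reference node may hold at most a small number of points
// (3 in the example from the text).
bool pcmiEligible(int nodeDepth, int minDepth, int maxDepth,
                  int refNodePointCount, int maxRefPoints = 3) {
  return nodeDepth >= minDepth && nodeDepth <= maxDepth &&
         refNodePointCount <= maxRefPoints;
}
```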
In some examples, PCMI modes may be applied to PCM nodes.
G-PCC encoder 200 may be configured to determine whether to use the PCMI mode based on rate-distortion optimization (RDO). The distortion may be calculated as the sum of the differences between the reconstructed point positions in the node and the original points. Optionally, position residuals are coded in the bitstream and counted toward the rate. For non-PCMI nodes, the rate is the number of bits used to signal the number of points and the positions of the points.
The G-PCC encoder 200 and the G-PCC decoder 300 may be configured to perform a modified version of joint position coding. In particular, points in the reference node may be used for joint coding together with points in the current node. It should be noted that joint coding techniques are limited to nodes having two points. In the techniques of this disclosure, this limitation may be relaxed where reference points are used in joint coding.
An example case of joint decoding is shown in the following table:
Example Joint coding case
In some techniques, value_bit[k][j] is bypass coded when the bits of the two points are identical (same_bit[k][j] = true, where k is the direction index and j is the bit index in the position). However, in the techniques of this disclosure, G-PCC encoder 200 and G-PCC decoder 300 may avoid coding value_bit[k][j], because it may be derived from the corresponding bit value in the reference point.
In some examples, joint position coding may be applied only to directions in which the current node and the reference node share the same plane information including plane patterns and plane positions.
G-PCC encoder 200 and G-PCC decoder 300 may be configured to perform techniques of the present disclosure that may improve context selection for coding a point offset. The laser index associated with the current node is used to decide the angle or azimuth to be used to select the context for coding the corresponding component of the point offset. Specifically, the azimuth is used to select a context when coding the second component S or T of the point offset (Section 10.8.4.5 of the G-PCC standard), and the angle is used to select a context when coding the component V of the point offset (Section 10.8.4.6 of the current G-PCC standard).
In accordance with the techniques of this disclosure, G-PCC encoder 200 and G-PCC decoder 300 may adaptively select a context for encoding or decoding the components of a point offset using the points in the reference node. A reference point may be defined to represent the points in the reference node. The reference point may be a function of the positions of the points in the reference node. In one example, the reference point may be the average position of the points in the reference node. In another example, the reference point may be the median position of the points in the reference node. In yet another example, the reference point may be the maximum position of the points in the reference node. In yet another example, the reference point may be the minimum position of the points in the reference node.
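One of the options above, the reference point as the per-component average of the reference node's points, can be sketched as follows (the median, maximum, or minimum variants would swap the reduction). The integer coordinate type and truncating average are assumptions; the function assumes a non-empty reference node.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch: derive a single representative reference point as the
// per-component (truncating) average of the reference node's points.
std::array<int, 3> averageReferencePoint(
    const std::vector<std::array<int, 3>>& refPoints) {
  std::array<int64_t, 3> sum{0, 0, 0};
  for (const auto& p : refPoints)
    for (int k = 0; k < 3; ++k) sum[k] += p[k];
  std::array<int, 3> avg{};
  for (int k = 0; k < 3; ++k)
    avg[k] = static_cast<int>(sum[k] / static_cast<int64_t>(refPoints.size()));
  return avg;
}
```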
The G-PCC encoder 200 and the G-PCC decoder 300 may be configured to perform context selection in point offset coding using reference points as follows:
6.2.4.1 reference points for use in context selection for encoding component V of a point offset
In this example, the component z (height) of the reference point may be used to select the context. The z-based context index may be determined as follows:
Section 10.8.4.6 can be updated as follows:
For example, the node position in the vertical direction may be associated with the vertical coordinates of the node boundary plane perpendicular to the vertical axis.
In one example, the value of n may be selected depending on the size of the node. For example, for larger node sizes, a larger value of n may be selected. In another example, the value of the context index may be inferred as follows: ctxRefZ = (node position in the vertical z-direction) * T / n, where T is selected according to the node size. For smaller node sizes, the value of T is greater, and vice versa.
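A hedged sketch of a z-based context index consistent with the description above: the reference point's height within the node is quantized into one of n bins. The normative formula is not reproduced in this text, so this reconstruction is an assumption, not the standard's exact expression.

```cpp
#include <cassert>

// Assumed reconstruction: quantize the reference height refZ (relative to
// the node origin) into one of n context bins, clamping at the top bin.
int ctxRefZ(int refZ, int nodeSizeZ, int n) {
  if (nodeSizeZ <= 0) return 0;
  int idx = (refZ * n) / nodeSizeZ;  // bin of the reference height
  if (idx >= n) idx = n - 1;         // clamp for refZ == nodeSizeZ
  return idx;
}
```

A larger n (or, in the T/n variant above, a node-size-dependent scale) gives finer context granularity for larger nodes.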
As another example, the ctxRefZ value for each bin may be recalculated after updating z based on the previously decoded bins. Similar techniques may be applied to the coding of the S or T point offsets as follows:
6.2.4.2 reference point for use in context selection for encoding a second component S or T of a point offset
Fig. 8 is a conceptual diagram illustrating a laser package 600 (such as a LIDAR sensor or another system including one or more lasers) scanning points in three-dimensional space. The data source 104 (fig. 1) may include a laser package 600.
As shown in fig. 8, a point cloud may be captured using a laser package 600, i.e., a sensor scans points in 3D space. However, it should be understood that some point clouds are not generated by actual LIDAR sensors, but may be encoded as they are generated by actual LIDAR sensors. In the example of fig. 8, the laser package 600 includes a LIDAR head 602 that includes a plurality of lasers 604A-604E (collectively "lasers 604") arranged in a vertical plane at different angles relative to an origin. The laser package 600 may be rotated about a vertical axis 608. The laser package 600 may use the returned laser light to determine the distance and location of points in the point cloud. The laser beams 606A-606E (collectively, "laser beams 606") emitted by the lasers 604 of the laser package 600 may be characterized by a set of parameters. The distances represented by arrows 610, 612 represent example laser correction values for lasers 604B, 604A, respectively.
Fig. 9 is a conceptual diagram illustrating an example ranging system 900 that may be used with one or more techniques of this disclosure. In the example of fig. 9, ranging system 900 includes an illuminator 902 and a sensor 904. The illuminator 902 may emit light 906. In some examples, illuminator 902 can emit light 906 as one or more laser beams. The light 906 may be at one or more wavelengths, such as infrared wavelengths or visible wavelengths. In other examples, light 906 is not coherent laser light. When light 906 encounters an object, such as object 908, light 906 produces return light 910. The return light 910 may include backscattered and/or reflected light. The return light 910 may pass through a lens 911 that directs the return light 910 to create an image 912 of the object 908 on the sensor 904. The sensor 904 generates a signal 914 based on the image 912. Image 912 may include a set of captured points (e.g., as represented by the dots in image 912 of fig. 9).
In some examples, the illuminator 902 and the sensor 904 may be mounted on a rotating structure such that the illuminator 902 and the sensor 904 capture a 360 degree view of the environment. In other examples, ranging system 900 may include one or more optical components (e.g., mirrors, collimators, diffraction gratings, etc.) that enable illuminator 902 and sensor 904 to detect objects within a particular range (e.g., up to 360 degrees). Although the example of fig. 9 shows only a single illuminator 902 and sensor 904, ranging system 900 may include multiple sets of illuminators and sensors.
In some examples, illuminator 902 generates a structured light pattern. In such examples, ranging system 900 may include a plurality of sensors 904 on which respective images of the structured light pattern are formed. Ranging system 900 may use disparities between the images of the structured light pattern to determine a distance to object 908, from which the structured light pattern is backscattered. When the object 908 is relatively close to the sensor 904 (e.g., 0.2 meters to 2 meters), the structured-light-based ranging system may have a high level of accuracy (e.g., accuracy in the sub-millimeter range). This high level of accuracy may be useful in facial recognition applications, such as unlocking mobile devices (e.g., mobile phones, tablet computers, etc.), and in security applications.
In some examples, ranging system 900 is a time-of-flight (ToF)-based system. In some examples where ranging system 900 is a ToF-based system, illuminator 902 generates pulses of light. In other words, the illuminator 902 can modulate the amplitude of the emitted light 906. In such examples, the sensor 904 detects return light 910 from the pulses of light 906 generated by the illuminator 902. Ranging system 900 may then determine a distance to object 908, from which light 906 is backscattered, based on the delay between when light 906 is emitted and when it is detected, and on the known speed of light in air. In some examples, illuminator 902 can modulate the phase of the emitted light 906 instead of (or in addition to) modulating the amplitude of the emitted light 906. In such examples, the sensor 904 may detect the phase of the return light 910 from the object 908 and determine the distance to the point on the object 908 using the speed of light and based on the time difference between when the illuminator 902 generates the light 906 at a particular phase and when the sensor 904 detects the return light 910 at that particular phase.
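The pulse time-of-flight relationship described above can be written down directly: the distance is half the round-trip delay multiplied by the speed of light. A minimal sketch (the constant for the speed of light in air is approximate):

```python
C_AIR = 299_702_547.0  # approximate speed of light in air, m/s

def tof_distance(delay_s):
    """Distance to the backscattering object from the round-trip delay:
    the light travels out and back, so divide the path length by two."""
    return C_AIR * delay_s / 2.0
```

For example, a measured round-trip delay of 1 microsecond corresponds to an object roughly 149.85 meters away.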
In other examples, the point cloud may be generated without the use of the illuminator 902. For example, in some examples, sensor 904 of ranging system 900 may include two or more optical cameras. In such examples, ranging system 900 may use the optical cameras to capture stereoscopic images of an environment including object 908. The ranging system 900 (e.g., the point cloud generator 920) may then calculate the disparities between corresponding locations in the stereoscopic images. The ranging system 900 may then use these disparities to determine the distances to the locations shown in the stereoscopic images. From these distances, the point cloud generator 920 may generate a point cloud.
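The disparity-to-distance step described above follows the standard pinhole stereo relationship Z = f * B / d for rectified cameras; the function and parameter names below are illustrative, not part of the disclosure.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of a point from its disparity between two rectified cameras:
    Z = focal_length * baseline / disparity (pinhole camera model)."""
    if disparity_px <= 0:
        raise ValueError("zero or negative disparity: point at infinity "
                         "or mismatched correspondence")
    return focal_px * baseline_m / disparity_px
```

For example, with a focal length of 1000 pixels and a 0.1 m baseline, a 50-pixel disparity places the point 2 meters away.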
The sensor 904 may also detect other properties of the object 908, such as color and reflectivity information. In the example of fig. 9, a point cloud generator 920 may generate a point cloud based on the signal 918 generated by the sensor 904. Ranging system 900 and/or point cloud generator 920 may form part of data source 104 (fig. 1).
Fig. 10 is a conceptual diagram illustrating an example vehicle-based scenario in which one or more techniques of the present disclosure may be used. In the example of fig. 10, the vehicle 1000 includes a laser package 1002, such as a LIDAR system. The laser package 1002 may be implemented in the same manner as the laser package 600 (fig. 8). Although not shown in the example of fig. 10, vehicle 1000 may also include a data source, such as data source 104 (fig. 1), and a G-PCC encoder, such as G-PCC encoder 200 (fig. 1). In the example of fig. 10, a laser package 1002 emits laser beams 1004 that reflect from pedestrians 1006 or other objects on the road. The data source of the vehicle 1000 may generate a point cloud based on the signals generated by the laser package 1002. The G-PCC encoder of vehicle 1000 may encode the point cloud to generate a bit stream 1008, such as the geometric bit stream of fig. 2 and the attribute bit stream of fig. 2. The bit stream 1008 may include significantly fewer bits than the unencoded point cloud obtained by the G-PCC encoder. Thus, the vehicle 1000 may be able to transmit the bit stream 1008 to other devices more quickly than the unencoded point cloud data could be transmitted. In addition, the bit stream 1008 may require less data storage capacity.
The techniques of this disclosure may further reduce the number of bits in the bit stream 1008. For example, as discussed above, if the current node is coded using at least one of inter-prediction occupancy or plane mask data, then there is no need to code separate occupancy data for the current node. Avoiding coding separate occupancy data in these cases may reduce the number of bits in the bit stream, because occupancy for the current node may be coded more efficiently using the inter-prediction occupancy or plane mask data.
In the example of fig. 10, a vehicle 1000 may transmit a bit stream 1008 to another vehicle 1010. Vehicle 1010 may include a G-PCC decoder, such as G-PCC decoder 300 (FIG. 1). The G-PCC decoder of the vehicle 1010 may decode the bit stream 1008 to reconstruct the point cloud. The vehicle 1010 may use the reconstructed point cloud for various purposes. For example, the vehicle 1010 may determine that the pedestrian 1006 is on a road in front of the vehicle 1000 based on the reconstructed point cloud, and thus begin decelerating, e.g., even before the driver of the vehicle 1010 realizes that the pedestrian 1006 is on the road. Thus, in some examples, the vehicle 1010 may perform an autonomous navigation operation, generate a notification or alert, or perform another action based on the reconstructed point cloud.
Additionally or alternatively, the vehicle 1000 may transmit the bit stream 1008 to the server system 1012. The server system 1012 may use the bit stream 1008 for various purposes. For example, the server system 1012 may store the bit stream 1008 for subsequent reconstruction of the point cloud. In this example, server system 1012 may use the point cloud along with other data (e.g., vehicle telemetry data generated by vehicle 1000) to train an autonomous driving system. In other examples, the server system 1012 may store the bit stream 1008 for subsequent reconstruction of the point cloud for a forensic collision investigation (e.g., if the vehicle 1000 collides with the pedestrian 1006).
Fig. 11 is a conceptual diagram illustrating an example extended reality system in which one or more techniques of the present disclosure may be used. Extended reality (XR) is a term used to cover a range of technologies including augmented reality (AR), mixed reality (MR), and virtual reality (VR). In the example of fig. 11, a first user 1100 is located in a first location 1102. User 1100 wears XR headset 1104. As an alternative to XR headset 1104, user 1100 may use a mobile device (e.g., a mobile phone, tablet computer, etc.). The XR headset 1104 includes a depth detection sensor, such as a LIDAR system, that detects the position of points on the object 1106 at the location 1102. The data source of the XR headset 1104 may use the signals generated by the depth detection sensor to generate a point cloud representation of the object 1106 at the location 1102. XR headset 1104 may include a G-PCC encoder (e.g., G-PCC encoder 200 of fig. 1) configured to encode a point cloud to generate bitstream 1108.
The techniques of this disclosure may further reduce the number of bits in the bitstream 1108. For example, as discussed above, if the current node is coded using at least one of inter-prediction occupancy or plane mask data, then there is no need to code separate occupancy data for the current node. Avoiding coding separate occupancy data in these cases may reduce the number of bits in the bitstream, because occupancy for the current node may be coded more efficiently using the inter-prediction occupancy or plane mask data.
XR headset 1104 may transmit bit stream 1108 to XR headset 1110 worn by user 1112 at second location 1114 (e.g., via a network such as the internet). XR headset 1110 may decode bitstream 1108 to reconstruct the point cloud. The XR headset 1110 may use the point cloud to generate XR visualizations (e.g., AR visualizations, MR visualizations, VR visualizations) that represent the object 1106 at the location 1102. Thus, in some examples, user 1112 at location 1114 may have a 3D immersive experience of location 1102, such as when XR headset 1110 generates a VR visualization. In some examples, XR headset 1110 may determine a location of the virtual object based on the reconstructed point cloud. For example, XR headset 1110 may determine that the environment (e.g., location 1102) includes a flat surface based on the reconstructed point cloud, and then determine that a virtual object (e.g., cartoon character) is to be located on the flat surface. XR headset 1110 may generate an XR visualization in which the virtual object is located at the determined location. For example, XR headset 1110 may show a cartoon character sitting on the flat surface.
Fig. 12 is a conceptual diagram illustrating an example mobile device system that may use one or more techniques of this disclosure. In the example of fig. 12, a mobile device 1200 (such as a mobile phone or tablet computer) includes a depth detection sensor (such as a LIDAR system) that detects a location of a point on an object 1202 in an environment of the mobile device 1200. The data source of the mobile device 1200 may use the signals generated by the depth detection sensor to generate a point cloud representation of the object 1202. Mobile device 1200 may include a G-PCC encoder (e.g., G-PCC encoder 200 of fig. 1) configured to encode a point cloud to generate bit stream 1204.
In the example of fig. 12, mobile device 1200 may transmit a bitstream to a remote device 1206, such as a server system or another mobile device. The remote device 1206 may decode the bitstream 1204 to reconstruct the point cloud. The remote device 1206 may use the reconstructed point cloud for various purposes. For example, the remote device 1206 may use the point cloud to generate an environment map of the mobile device 1200. For example, the remote device 1206 may generate a building interior map based on the reconstructed point cloud. As another example, the remote device 1206 may generate an image (e.g., a computer graphic) based on the point cloud. For example, the remote device 1206 may use points in the point cloud as vertices of the polygon and color attributes of the points as a basis for coloring the polygon. In some examples, the remote device 1206 may perform facial recognition using a point cloud.
Fig. 13 is a flow chart illustrating an example method of decoding point cloud data in accordance with the techniques of this disclosure. The method of fig. 13 may be performed by the G-PCC encoder 200 during a point cloud encoding process, or by the G-PCC decoder 300 during a point cloud decoding process. For purposes of illustration and explanation, the method of fig. 13 is explained with respect to G-PCC decoder 300, but G-PCC encoder 200 may also perform this or a similar method.
Initially, G-PCC decoder 300 may obtain a current node of an octree of point cloud data (500). For example, the G-PCC decoder 300 may extract data for the octree from the bit stream and recursively decode the nodes of the octree starting from the root node. When the method is performed by the G-PCC encoder 200, the G-PCC encoder 200 may recursively encode the octree starting from the root node.
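The recursive octree descent can be illustrated with a small sketch. The probe callback and the child-index-to-coordinate mapping below are illustrative simplifications of the traversal, not the normative G-PCC procedure.

```python
def visit_octree(occupancy_at, origin=(0, 0, 0), size_log2=3, leaves=None):
    """Recursively descend an octree starting from the root node.

    occupancy_at(origin, size_log2) is a caller-supplied probe that
    returns the 8-bit child-occupancy mask of the node at `origin`;
    origins of occupied unit-size leaves are collected.
    """
    if leaves is None:
        leaves = []
    if size_log2 == 0:
        leaves.append(origin)          # reached a unit-size occupied node
        return leaves
    mask = occupancy_at(origin, size_log2)
    half = 1 << (size_log2 - 1)
    for child in range(8):
        if mask & (1 << child):        # only occupied children are split
            dx, dy, dz = child & 1, (child >> 1) & 1, (child >> 2) & 1
            visit_octree(occupancy_at,
                         (origin[0] + dx * half,
                          origin[1] + dy * half,
                          origin[2] + dz * half),
                         size_log2 - 1, leaves)
    return leaves
```

A decoder-side probe would read each occupancy mask from the bit stream instead of computing it; an encoder-side probe would derive the mask from the input points.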
G-PCC decoder 300 may divide each occupied node into eight child nodes, as shown, for example, in figs. 6 and 7. G-PCC decoder 300 may determine whether the current node is inter-predictable (502). For example, G-PCC decoder 300 may determine the number of sibling nodes that missed prediction and whether this number is less than or equal to a threshold (TH). A node may be said to have missed prediction when its occupancy differs from the occupancy of its reference node: for example, when the node is actually occupied but its reference node is unoccupied, or when the node is actually unoccupied but its reference node is occupied. The current node may be considered inter-predictable if it has five or fewer sibling nodes that missed prediction. Otherwise, if the current node has more than five sibling nodes that missed prediction, the node may be considered not inter-predictable.
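The threshold test described above can be sketched as follows. Representing the eight sibling positions of a parent node as an 8-bit occupancy mask is a simplification, and the helper names are illustrative.

```python
def missed_predictions(actual_mask, reference_mask):
    """Number of the eight sibling positions whose occupancy differs
    between the current parent node and its reference node; each
    differing bit is one missed prediction."""
    return bin((actual_mask ^ reference_mask) & 0xFF).count("1")

def is_inter_predictable(actual_mask, reference_mask, threshold=5):
    """A node is treated as inter-predictable when at most `threshold`
    sibling positions missed prediction (five in the example above)."""
    return missed_predictions(actual_mask, reference_mask) <= threshold
```

For example, masks that disagree in two positions pass the test, while masks that disagree in all eight positions fail it.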
If the number of sibling nodes missing predictions is less than or equal to the threshold (the "yes" branch of 502), i.e., if the current node is inter-predictable, G-PCC decoder 300 may further determine whether to enable angular mode for the current node (504). G-PCC decoder 300 may, for example, determine a value (e.g., a geometry_angular_enabled_flag) of a syntax element indicating whether angular mode is enabled for the current node. Alternatively, G-PCC decoder 300 may receive a profile indicating whether or not angular mode is enabled. G-PCC encoder 200 may perform a Rate Distortion Optimization (RDO) procedure to determine whether to enable an angular mode for the current node, and set the value of the syntax element accordingly.
If the number of siblings missing a prediction is less than or equal to the threshold (i.e., the current node is inter-predictable) (the "yes" branch of 502) and the angular mode is not enabled for the current node (the "no" branch of 504), G-PCC decoder 300 may decode occupancy data for the current node using a non-IDCM mode (such as inter-prediction) (506). That is, inter prediction may generally be more efficient than IDCM mode, and thus, if inter prediction is available and angular mode is not available, G-PCC decoder 300 may decode occupancy data of a current node using inter prediction. For example, the G-PCC decoder 300 may determine a context for entropy decoding a value indicating whether the current node is occupied according to whether the reference node of the current node is occupied, and then entropy decode the value using the determined context.
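The reference-based context selection described above can be sketched as follows. The two-context split and the toy adaptive probability estimator are illustrative stand-ins for the actual binary arithmetic coder and its context models.

```python
class AdaptiveBit:
    """Toy adaptive binary model: tracks an estimate of P(bit = 1)
    with Laplace-smoothed counts."""
    def __init__(self):
        self.ones, self.total = 1, 2

    def p_one(self):
        return self.ones / self.total

    def update(self, bit):
        self.ones += bit
        self.total += 1

def occupancy_context(reference_occupied):
    """Context index for the current child's occupancy flag: one context
    when the co-located reference child is occupied, another when not."""
    return 1 if reference_occupied else 0

# One adaptive model per context, as a coder would maintain.
contexts = [AdaptiveBit(), AdaptiveBit()]
```

When the reference is a good predictor, the model in context 1 quickly learns a high probability of occupancy, so occupied children cost few bits to code.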
However, if the number of sibling nodes missing predictions is greater than the threshold (i.e., the current node is not inter-predictable) (the "no" branch of 502) or the angular mode is enabled for the current node (the "yes" branch of 504), G-PCC decoder 300 may determine to enable the IDCM mode for the current node. In some cases, when the IDCM mode is enabled and the current node is inter-predictable, G-PCC decoder 300 may further decode a value indicating whether to use IDCM or inter-prediction to code the current node. Assuming that the current node is coded using the IDCM mode, the G-PCC decoder 300 may decode the occupancy data using the IDCM mode. For example, G-PCC encoder 200 may perform an RDO process to determine whether IDCM or inter-prediction achieves better RDO performance, determine to code occupancy data for the current node using whichever of IDCM or inter-prediction has the better RDO performance, and accordingly encode a value indicating whether occupancy data for the current node is encoded using inter-prediction or IDCM.
In the example of fig. 13, G-PCC decoder 300 may further determine which IDCM mode to use for the current node (508), e.g., a regular IDCM mode or a location copy mode. If the location copy mode is not to be used (the "no" branch of 510), G-PCC decoder 300 may directly decode the occupancy data, i.e., the occupancy data may directly indicate the point locations for the current node (512). On the other hand, if the location copy mode is to be used (the "yes" branch of 510), G-PCC decoder 300 may predict the point locations for the current node from the reference node (514). In some examples, the current node may directly inherit the point locations of the reference node, while in other examples, G-PCC decoder 300 may decode residual values representing position offsets to be applied to the locations of the points in the reference node to obtain the point locations in the current node.
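The location copy prediction described above can be sketched as follows. The one-to-one pairing of reference points with per-point residuals is an illustrative assumption about how the residuals are organized.

```python
def predict_points(ref_points, residuals=None):
    """Location-copy prediction for a directly coded node: each point of
    the current node is the co-indexed point of the reference node,
    plus an optional decoded per-point position residual."""
    if residuals is None:
        return [tuple(p) for p in ref_points]      # direct inheritance
    return [tuple(c + r for c, r in zip(p, res))   # apply (dx, dy, dz) offset
            for p, res in zip(ref_points, residuals)]
```

Signaling only small residuals (or none at all) is what makes the mode cheaper than coding each point position from scratch.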
In this way, the method of fig. 13 represents an example of a method of decoding point cloud data, the method comprising: determining at least one of: 1) The nodes of the octree of the point cloud data are not inter-predictable, or 2) an angular mode is enabled for the node; determining an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) The node is not inter-predictable, or 2) an angular mode is enabled for the node; and decoding occupancy data for the node using the determined IDCM mode.
Various examples of the technology of the present disclosure are summarized in the following clauses:
Clause 1: a method of coding point cloud data, the method comprising: determining at least one of: 1) The nodes of the octree of point cloud data are not inter-predictable, or 2) angular modes are enabled for the nodes; determining an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) The node is not inter-predictable, or 2) an angular mode is enabled for the node; and decoding occupancy data of the node using the determined IDCM mode.
Clause 2: The method of clause 1, wherein the node comprises a first node, the method further comprising: determining that a second node of the octree is inter-predictable and that angular mode is disabled for the second node; and responsive to determining that the second node is inter-predictable and that angular mode is disabled for the second node, coding occupancy data of the second node using inter-prediction.
Clause 3: the method of clause 1, wherein determining that the node of the octree of point cloud data is not inter-predictable comprises determining that a number of sibling nodes of the octree that miss predictions for the node exceeds a threshold.
Clause 4: the method of clause 1, wherein determining to enable angular mode for the node comprises coding a value for a syntax element, the value indicating that angular mode is enabled for the node.
Clause 5: The method of clause 1, wherein decoding the occupancy data using the determined IDCM mode comprises decoding data representing locations of points in the node.
Clause 6: The method of clause 1, wherein the determined IDCM mode comprises a location copy mode, and wherein coding the occupancy data of the node comprises: determining a reference node for the node; determining a location of a point in the reference node; and determining the point of the node from the location of the point in the reference node.
Clause 7: The method of clause 6, further comprising determining that a syntax element has a value indicating that the location copy mode is usable.
Clause 8: The method of clause 6, further comprising determining that the location copy mode is usable based on a depth of the node in the octree.
Clause 9: The method of clause 6, further comprising determining that the location copy mode is usable based on the number of points in the reference node.
Clause 10: the method of clause 6, wherein determining the point of the node comprises determining that the point of the node is located at the location of the point in the reference node.
Clause 11: the method of clause 6, wherein determining the point of the node comprises coding a position residual value for the point of the node, the position residual value representing a position offset between the position of the point of the reference node and a position of the point of the node.
Clause 12: the method of clause 1, wherein coding the occupancy data comprises decoding the occupancy data.
Clause 13: the method of clause 1, wherein coding the occupancy data comprises encoding the occupancy data.
Clause 14: An apparatus for coding point cloud data, the apparatus comprising: a memory configured to store point cloud data; and one or more processors, the one or more processors implemented in circuitry and configured to: determine at least one of: 1) Nodes of the octree of point cloud data are not inter-predictable, or 2) angular modes are enabled for the nodes; determine an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) The node is not inter-predictable, or 2) an angular mode is enabled for the node; and decode occupancy data of the node using the determined IDCM mode.
Clause 15: The apparatus of clause 14, wherein the node comprises a first node, and wherein the one or more processors are further configured to: determine that a second node of the octree is inter-predictable and that angular mode is disabled for the second node; and responsive to determining that the second node is inter-predictable and that angular mode is disabled for the second node, code occupancy data of the second node using inter-prediction.
Clause 16: the apparatus of clause 14, wherein to determine that the node of the octree of point cloud data is not inter-predictable, the one or more processors are configured to determine that a number of sibling nodes of the octree that miss predictions for the node exceeds a threshold.
Clause 17: the apparatus of clause 14, wherein to determine that an angular mode is enabled for the node, the one or more processors are configured to code a value for a syntax element, the value indicating that an angular mode is enabled for the node.
Clause 18: The apparatus of clause 14, wherein to decode the occupancy data using the determined IDCM mode, the one or more processors are configured to decode data representing a location of a point in the node.
Clause 19: The apparatus of clause 14, wherein the determined IDCM mode comprises a location copy mode, and wherein to decode the occupancy data of the node, the one or more processors are configured to: determine a reference node for the node; determine a location of a point in the reference node; and determine the point in the node from the location of the point in the reference node.
Clause 20: the apparatus of clause 19, wherein the one or more processors are configured to determine that the point of the node is located at the location of the point in the reference node.
Clause 21: the apparatus of clause 19, wherein to determine the point of the node, the one or more processors are configured to code a position residual value for the point of the node, the position residual value representing a position offset between the position of the point of the reference node and a position of the point of the node.
Clause 22: A computer-readable storage medium having instructions stored thereon that, when executed, cause a processor to: determine at least one of: 1) The nodes of the octree of point cloud data are not inter-predictable, or 2) angular modes are enabled for the nodes; determine an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) The node is not inter-predictable, or 2) an angular mode is enabled for the node; and decode occupancy data of the node using the determined IDCM mode.
Clause 23: The computer-readable storage medium of clause 22, wherein the node comprises a first node, the computer-readable storage medium further comprising instructions that cause the processor to: determine that a second node of the octree is inter-predictable and that angular mode is disabled for the second node; and responsive to determining that the second node is inter-predictable and that angular mode is disabled for the second node, code occupancy data of the second node using inter-prediction.
Clause 24: the computer-readable storage medium of clause 22, wherein the instructions that cause the processor to determine that the node of the octree of point cloud data is not inter-predictable comprise instructions that cause the processor to determine that a number of sibling nodes of the octree that miss predictions for the node exceeds a threshold.
Clause 25: the computer-readable storage medium of clause 22, wherein the instructions that cause the processor to determine that angular mode is enabled for the node comprise instructions that cause the processor to code a value for a syntax element, the value indicating that angular mode is enabled for the node.
Clause 26: The computer-readable storage medium of clause 22, wherein the instructions that cause the processor to decode the occupancy data using the determined IDCM mode comprise instructions that cause the processor to decode data representing the locations of points in the node.
Clause 27: The computer-readable storage medium of clause 22, wherein the determined IDCM mode comprises a location copy mode, and wherein the instructions that cause the processor to decode the occupancy data of the node comprise instructions that cause the processor to: determine a reference node for the node; determine a location of a point in the reference node; and determine the point in the node from the location of the point in the reference node.
Clause 28: the computer-readable storage medium of clause 27, wherein the instructions that cause the processor to determine the point of the node comprise instructions that cause the processor to determine that the point of the node is located at the location of the point in the reference node.
Clause 29: the computer-readable storage medium of clause 27, wherein the instructions that cause the processor to determine the point of the node comprise instructions that cause the processor to code a position residual value for the point of the node, the position residual value representing a positional offset between the position of the point of the reference node and a position of the point of the node.
Clause 30: an apparatus for coding point cloud data, the apparatus comprising: means for determining at least one of: 1) The nodes of the octree of point cloud data are not inter-predictable, or 2) angular modes are enabled for the nodes; means for determining an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) The node is not inter-predictable, or 2) an angular mode is enabled for the node; and means for decoding occupancy data of the node using the determined IDCM mode.
Clause 31: a method of coding point cloud data, the method comprising: determining at least one of: 1) The nodes of the octree of point cloud data are not inter-predictable, or 2) angular modes are enabled for the nodes; determining an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) The node is not inter-predictable, or 2) an angular mode is enabled for the node; and decoding occupancy data of the node using the determined IDCM mode.
Clause 32: The method of clause 31, wherein the node comprises a first node, the method further comprising: determining that a second node of the octree is inter-predictable and that angular mode is disabled for the second node; and responsive to determining that the second node is inter-predictable and that angular mode is disabled for the second node, coding occupancy data of the second node using inter-prediction.
Clause 33: the method of any one of clauses 31 and 32, wherein determining that the node of the octree of point cloud data is not inter-predictable comprises determining that a number of sibling nodes of the octree that miss predictions for the node exceeds a threshold.
Clause 34: the method of any of clauses 31-33, wherein determining to enable angular mode for the node comprises coding a value for a syntax element, the value indicating that angular mode is enabled for the node.
Clause 35: The method of any of clauses 31-34, wherein decoding the occupancy data using the determined IDCM mode comprises decoding data representing a location of a point in the node.
Clause 36: The method of any of clauses 31-35, wherein the determined IDCM mode comprises a location copy mode, and wherein decoding the occupancy data of the node comprises: determining a reference node for the node; determining a location of a point in the reference node; and determining the point of the node from the location of the point in the reference node.
Clause 37: The method of clause 36, further comprising determining that a syntax element has a value indicating that the location copy mode is usable.
Clause 38: The method of any one of clauses 36 and 37, further comprising determining that the location copy mode is usable based on a depth of the node in the octree.
Clause 39: The method of any of clauses 36 to 38, further comprising determining that the location copy mode is usable according to the number of points in the reference node.
Clause 40: the method of any of clauses 36 to 39, wherein determining the point of the node comprises determining that the point of the node is located at the location of the point in the reference node.
Clause 41: the method of any of clauses 36-39, wherein determining the point of the node comprises coding a position residual value for the point of the node, the position residual value representing a position offset between the position of the point of the reference node and a position of the point of the node.
Clause 42: the method of any of clauses 31 to 41, wherein coding the occupancy data comprises decoding the occupancy data.
Clause 43: the method of any of clauses 31 to 42, wherein coding the occupancy data comprises encoding the occupancy data.
Clause 44: An apparatus for coding point cloud data, the apparatus comprising: a memory configured to store point cloud data; and one or more processors, the one or more processors implemented in circuitry and configured to: determine at least one of: 1) Nodes of the octree of point cloud data are not inter-predictable, or 2) angular modes are enabled for the nodes; determine an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) The node is not inter-predictable, or 2) an angular mode is enabled for the node; and decode occupancy data of the node using the determined IDCM mode.
Clause 45: The apparatus of clause 44, wherein the node comprises a first node, and wherein the one or more processors are further configured to: determine that a second node of the octree is inter-predictable and that angular mode is disabled for the second node; and responsive to determining that the second node is inter-predictable and that angular mode is disabled for the second node, code occupancy data of the second node using inter-prediction.
Clause 46: the apparatus of any one of clauses 44 and 45, wherein to determine that the node of the octree of point cloud data is not inter-predictable, the one or more processors are configured to determine that a number of sibling nodes of the node in the octree for which inter-prediction missed exceeds a threshold.
Clause 47: the apparatus of any one of clauses 44 to 46, wherein to determine that angular mode is enabled for the node, the one or more processors are configured to code a value for a syntax element, the value indicating that angular mode is enabled for the node.
Clause 48: the apparatus of any of clauses 44 to 47, wherein to code the occupancy data using the determined IDCM mode, the one or more processors are configured to code data representing a location of a point in the node.
Clause 49: the apparatus of any one of clauses 44 to 48, wherein the determined IDCM mode comprises a location duplication mode, and wherein to code the occupancy data of the node, the one or more processors are configured to: determine a reference node for the node; determine a location of a point in the reference node; and determine the point in the node from the location of the point in the reference node.
Clause 50: the apparatus of clause 49, wherein the one or more processors are configured to determine that the point in the node is located at the location of the point in the reference node.
Clause 51: the apparatus of clause 49, wherein to determine the point in the node, the one or more processors are configured to code a position residual value for the point in the node, the position residual value representing a position offset between the position of the point in the reference node and the position of the point in the node.
Clause 52: a computer-readable storage medium having instructions stored thereon that, when executed, cause a processor to: determine at least one of: 1) a node of an octree of point cloud data is not inter-predictable, or 2) angular mode is enabled for the node; determine an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) angular mode is enabled for the node; and code occupancy data of the node using the determined IDCM mode.
Clause 53: the computer-readable storage medium of clause 52, wherein the node comprises a first node, the computer-readable storage medium further comprising instructions that cause the processor to: determine that a second node of the octree is inter-predictable and that angular mode is disabled for the second node; and, responsive to determining that the second node is inter-predictable and that angular mode is disabled for the second node, code occupancy data of the second node using inter-prediction.
Clause 54: the computer-readable storage medium of any of clauses 52 and 53, wherein the instructions that cause the processor to determine that the node of the octree of point cloud data is not inter-predictable comprise instructions that cause the processor to determine that a number of sibling nodes of the node in the octree for which inter-prediction missed exceeds a threshold.
Clause 55: the computer-readable storage medium of any of clauses 52 to 54, wherein the instructions that cause the processor to determine that angular mode is enabled for the node comprise instructions that cause the processor to code a value for a syntax element, the value indicating that angular mode is enabled for the node.
Clause 56: the computer-readable storage medium of any of clauses 52 to 55, wherein the instructions that cause the processor to code the occupancy data using the determined IDCM mode comprise instructions that cause the processor to code data representing the locations of points in the node.
Clause 57: the computer-readable storage medium of any of clauses 52 to 56, wherein the determined IDCM mode comprises a location duplication mode, and wherein the instructions that cause the processor to code the occupancy data of the node comprise instructions that cause the processor to: determine a reference node for the node; determine a location of a point in the reference node; and determine the point in the node from the location of the point in the reference node.
Clause 58: the computer-readable storage medium of clause 57, wherein the instructions that cause the processor to determine the point in the node comprise instructions that cause the processor to determine that the point in the node is located at the location of the point in the reference node.
Clause 59: the computer-readable storage medium of clause 57, wherein the instructions that cause the processor to determine the point in the node comprise instructions that cause the processor to code a position residual value for the point in the node, the position residual value representing a position offset between the position of the point in the reference node and the position of the point in the node.
Clause 60: an apparatus for coding point cloud data, the apparatus comprising: means for determining at least one of: 1) a node of an octree of point cloud data is not inter-predictable, or 2) angular mode is enabled for the node; means for determining an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) angular mode is enabled for the node; and means for coding occupancy data of the node using the determined IDCM mode.
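The mode decision that the clauses above repeat for the method, apparatus, storage-medium, and means-plus-function forms can be summarized in a short sketch. This is an illustrative assumption, not code from any G-PCC reference implementation: the names `OctreeNode`, `is_inter_predictable`, and `use_idcm`, and the value of `MISS_THRESHOLD`, are all invented here for clarity.

```python
from dataclasses import dataclass

# Assumed limit on sibling prediction misses; the actual threshold is a
# codec design choice and is not specified in the clauses above.
MISS_THRESHOLD = 2

@dataclass
class OctreeNode:
    angular_mode_enabled: bool
    sibling_prediction_misses: int = 0

def is_inter_predictable(node: OctreeNode) -> bool:
    # Clauses 46/54: a node is not inter-predictable when the number of
    # sibling nodes with missed inter-predictions exceeds a threshold.
    return node.sibling_prediction_misses <= MISS_THRESHOLD

def use_idcm(node: OctreeNode) -> bool:
    # Clauses 44/52/60: IDCM is selected when the node is not
    # inter-predictable OR angular mode is enabled for the node.
    return (not is_inter_predictable(node)) or node.angular_mode_enabled
```

Conversely, per clauses 45/53, a node that is inter-predictable and has angular mode disabled (`use_idcm` returns `False`) has its occupancy data coded using inter-prediction.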
It is to be appreciated that certain acts or events of any of the techniques described herein can be performed in a different order, may be added, combined, or omitted entirely, depending on the example (e.g., not all of the described acts or events are necessary to implement the techniques). Further, in some examples, an action or event may be performed concurrently (e.g., by multi-threaded processing, interrupt processing, or multiple processors) rather than sequentially.
In one or more examples, the described functionality may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media, such as data storage media) or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the terms "processor" and "processing circuitry" as used herein may refer to any one of the foregoing structures or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in combination with appropriate software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (22)

1. A method of coding point cloud data, the method comprising:
determining at least one of: 1) a node of an octree of the point cloud data is not inter-predictable, or 2) angular mode is enabled for the node;
determining an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) angular mode is enabled for the node; and
coding occupancy data of the node using the determined IDCM mode.
2. The method of claim 1, wherein the node comprises a first node, the method further comprising:
determining that a second node of the octree is inter-predictable and that angular mode is disabled for the second node; and
in response to determining that the second node is inter-predictable and that angular mode is disabled for the second node, coding occupancy data of the second node using inter-prediction.
3. The method of claim 1, wherein determining that the node of the octree of point cloud data is not inter-predictable comprises determining that a number of sibling nodes of the node in the octree for which inter-prediction missed exceeds a threshold.
4. The method of claim 1, wherein determining that angular mode is enabled for the node comprises coding a value for a syntax element, the value indicating that angular mode is enabled for the node.
5. The method of claim 1, wherein coding the occupancy data using the determined IDCM mode comprises coding data representing locations of points in the node.
6. The method of claim 1, wherein the determined IDCM mode comprises a location duplication mode, and wherein coding the occupancy data of the node comprises:
determining a reference node for the node;
determining a location of a point in the reference node; and
determining a point in the node from the location of the point in the reference node.
7. The method of claim 6, further comprising determining that a syntax element has a value indicating that the location duplication mode is available for the node.
8. The method of claim 6, further comprising determining that the location duplication mode is available based on a depth of the node in the octree.
9. The method of claim 6, further comprising determining that the location duplication mode is available based on a number of points in the reference node.
10. The method of claim 6, wherein determining the point in the node comprises determining that the point in the node is located at the location of the point in the reference node.
11. The method of claim 6, wherein determining the point in the node comprises coding a position residual value for the point in the node, the position residual value representing a position offset between the position of the point in the reference node and the position of the point in the node.
12. The method of claim 1, wherein coding the occupancy data comprises decoding the occupancy data.
13. The method of claim 1, wherein coding the occupancy data comprises encoding the occupancy data.
14. An apparatus for coding point cloud data, the apparatus comprising:
a memory configured to store point cloud data; and
one or more processors implemented in circuitry and configured to:
determine at least one of: 1) a node of an octree of the point cloud data is not inter-predictable, or 2) angular mode is enabled for the node;
determine an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) angular mode is enabled for the node; and
code occupancy data of the node using the determined IDCM mode.
15. The apparatus of claim 14, wherein the node comprises a first node, and wherein the one or more processors are further configured to:
determine that a second node of the octree is inter-predictable and that angular mode is disabled for the second node; and
in response to determining that the second node is inter-predictable and that angular mode is disabled for the second node, code occupancy data of the second node using inter-prediction.
16. The apparatus of claim 14, wherein to determine that the node of the octree of point cloud data is not inter-predictable, the one or more processors are configured to determine that a number of sibling nodes of the node in the octree for which inter-prediction missed exceeds a threshold.
17. The apparatus of claim 14, wherein to determine that angular mode is enabled for the node, the one or more processors are configured to code a value for a syntax element, the value indicating that angular mode is enabled for the node.
18. The apparatus of claim 14, wherein to code the occupancy data using the determined IDCM mode, the one or more processors are configured to code data representative of a location of a point in the node.
19. The apparatus of claim 14, wherein the determined IDCM mode comprises a location duplication mode, and wherein to code the occupancy data of the node, the one or more processors are configured to:
determine a reference node for the node;
determine a location of a point in the reference node; and
determine a point in the node from the location of the point in the reference node.
20. The apparatus of claim 19, wherein the one or more processors are configured to determine that the point in the node is located at the location of the point in the reference node.
21. The apparatus of claim 19, wherein to determine the point in the node, the one or more processors are configured to code a position residual value for the point in the node, the position residual value representing a position offset between the position of the point in the reference node and the position of the point in the node.
22. A computer-readable storage medium having instructions stored thereon that, when executed, cause a processor to:
determine at least one of: 1) a node of an octree of point cloud data is not inter-predictable, or 2) angular mode is enabled for the node;
determine an Inferred Direct Coding Mode (IDCM) mode for the node in response to determining the at least one of: 1) the node is not inter-predictable, or 2) angular mode is enabled for the node; and
code occupancy data of the node using the determined IDCM mode.
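The location duplication mode of claims 6 to 11 and 19 to 21 can likewise be sketched: a point's position is either taken directly from the collocated point in the reference node, or refined by a coded position residual. This is an illustrative assumption made for clarity; the names `Point` and `reconstruct_points` do not come from any G-PCC reference implementation.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int, int]  # (x, y, z) position of a point

def reconstruct_points(reference_points: List[Point],
                       residuals: Optional[List[Point]] = None) -> List[Point]:
    """Recover a node's points from the collocated reference node."""
    # Claims 10/20: with no coded residuals, the node's points are taken
    # to sit exactly at the reference-node positions.
    if residuals is None:
        return list(reference_points)
    # Claims 11/21: otherwise each residual is a per-axis offset between
    # the reference point's position and the current point's position.
    return [(rx + dx, ry + dy, rz + dz)
            for (rx, ry, rz), (dx, dy, dz) in zip(reference_points, residuals)]
```

In the residual case, an encoder would code the offsets and a decoder would apply them as above, which is cheaper than coding full point positions when the reference node is a good match.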
CN202280063734.7A 2021-09-27 2022-09-20 Coding point cloud data in G-PCC using direct mode for inter-prediction Pending CN117999580A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/261,722 2021-09-27
US17/933,300 2022-09-19
US17/933,300 US20230099908A1 (en) 2021-09-27 2022-09-19 Coding point cloud data using direct mode for inter-prediction in g-pcc
PCT/US2022/076704 WO2023049698A1 (en) 2021-09-27 2022-09-20 Coding point cloud data using direct mode for inter prediction in g-pcc

Publications (1)

Publication Number Publication Date
CN117999580A true CN117999580A (en) 2024-05-07

Family

ID=90889574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280063734.7A Pending CN117999580A (en) 2021-09-27 2022-09-20 Coding point cloud data in G-PCC using direct mode for inter-prediction

Country Status (1)

Country Link
CN (1) CN117999580A (en)

Similar Documents

Publication Publication Date Title
US20210407143A1 (en) Planar and azimuthal mode in geometric point cloud compression
JP2023520855A (en) Coding laser angles for angular and azimuthal modes in geometry-based point cloud compression
CN115298698A (en) Decoding of laser angles for angle and azimuth modes in geometry-based point cloud compression
WO2022147015A1 (en) Hybrid-tree coding for inter and intra prediction for geometry coding
CN118160006A (en) Inter-prediction coding with radius interpolation for predictive geometry-based point cloud compression
CA3189639A1 (en) Gpcc planar mode and buffer simplification
US20230099908A1 (en) Coding point cloud data using direct mode for inter-prediction in g-pcc
CN117999580A (en) Coding point cloud data in G-PCC using direct mode for inter-prediction
US11910021B2 (en) Planar and direct mode signaling in G-PCC
US20230018907A1 (en) Occupancy coding using inter prediction in geometry point cloud compression
US20230177739A1 (en) Local adaptive inter prediction for g-pcc
US11871037B2 (en) Sorted laser angles for geometry-based point cloud compression (G-PCC)
KR20240087699A (en) Coding of point cloud data using direct mode for inter prediction in G-PCC
WO2023059446A1 (en) Planar and direct mode signaling in g-pcc
CN117561544A (en) Occupied coding using inter prediction in geometric point cloud compression
CN117121492A (en) Performance enhancements to Geometric Point Cloud Compression (GPCC) plane modes using inter-prediction
WO2023102484A1 (en) Local adaptive inter prediction for g-pcc
KR20230170908A (en) Improved performance of geometric point cloud compression (GPCC) planar mode using inter-prediction
KR20240088764A (en) Planar and direct mode signaling in G-PCC
WO2024086604A1 (en) Decoding attribute values in geometry-based point cloud compression
CN116648914A (en) Global motion estimation using road and ground object markers for geometry-based point cloud compression
TW202408244A (en) Inter prediction coding for geometry point cloud compression
KR20230125786A (en) Global motion estimation using road and ground object labels for geometry-based point cloud compression
TW202420822A (en) Using vertical prediction for geometry point cloud compression
CN116636204A (en) Mixed tree coding for inter and intra prediction for geometric coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination