CN116648914A - Global motion estimation using road and ground object markers for geometry-based point cloud compression

Publication number: CN116648914A
Application number: CN202180088307.XA
Authority: CN (China)
Legal status: Pending
Inventors: L. Pham Van, A. K. Ramasubramonian, B. Ray, G. Van der Auwera, M. Karczewicz
Assignee: Qualcomm Inc
Priority claimed from U.S. Application No. 17/558,362 (US11949909B2) and PCT/US2021/064869 (WO2022146827A2)
Application filed by Qualcomm Inc

Abstract

An example apparatus for coding point cloud data includes: a memory configured to store data representing points of a point cloud; and one or more processors implemented in circuitry and configured to: determine height values of points in the point cloud; classify the points into a set of ground points or a set of object points according to the height values; and encode or decode the ground points and the object points according to the classification. The one or more processors may determine top and bottom thresholds and classify the ground points and object points based on the top and bottom thresholds. The one or more processors may also encode or decode a data structure including data representing the top and bottom thresholds, such as a geometry parameter set (GPS).

Description

Global motion estimation using road and ground object markers for geometry-based point cloud compression
Cross Reference to Related Applications
The present application claims priority from U.S. Application Ser. No. 17/558,362, filed December 21, 2021, U.S. Provisional Application Ser. No. 63/131,637, filed December 29, 2020, and U.S. Provisional Application Ser. No. 63/171,945, filed April 7, 2021, each of which is incorporated herein by reference in its entirety. U.S. Application Ser. No. 17/558,362 claims the benefit of U.S. Provisional Application Ser. No. 63/131,637, filed December 29, 2020, and U.S. Provisional Application Ser. No. 63/171,945, filed April 7, 2021.
Technical Field
The present disclosure relates to point cloud encoding and decoding.
Background
The point cloud contains a set of points in 3D space. Each point may have a set of attributes associated with the point. The attributes may be color information such as R, G, B or Y, Cb, Cr information, reflectivity information, or other attributes. The point cloud may be captured by various cameras or sensors, such as LIDAR sensors and 3D scanners, and may also be computer generated. The point cloud data may be used in a variety of applications including, but not limited to, construction (modeling), graphics (3D models for visualization and animation), and the automotive industry (LIDAR sensors used to aid navigation).
A point cloud encoder/decoder (codec) may enclose the 3D space occupied by the point cloud data in a virtual bounding box. The positions of points in the bounding box may be represented with a certain precision; thus, the point cloud codec may quantize the positions of one or more points based on that precision. At the smallest level, the point cloud codec divides the bounding box into voxels, which are the smallest units of space represented by a unit cube. A voxel in the bounding box may be associated with zero, one, or more than one point. The point cloud codec may split the bounding box into multiple cube/cuboid regions, which may be referred to as tiles. The point cloud codec may encode a tile into one or more slices. The partitioning of the bounding box into slices and tiles may be based on the number of points in each partition, or based on other considerations (e.g., a particular region may be encoded as a tile). The slice regions may be further partitioned using partitioning decisions similar to those in video codecs.
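As an illustration of the voxelization step described above, the following is a minimal sketch (not the normative G-PCC process; the helper name and uniform quantization step are assumptions) of mapping raw positions to occupied voxels:

```python
import numpy as np

def voxelize(points, origin, voxel_size):
    """Map raw (x, y, z) positions to integer voxel coordinates.

    points: (N, 3) float array of raw positions.
    origin: (3,) float array, the minimum corner of the bounding box.
    voxel_size: edge length of the unit cube (the quantization step).
    Multiple points may fall into the same voxel, so duplicates are removed.
    """
    voxels = np.floor((points - origin) / voxel_size).astype(np.int64)
    return np.unique(voxels, axis=0)

# Example: three points, two of which share a voxel.
pts = np.array([[0.12, 0.40, 1.90],
                [0.14, 0.41, 1.91],
                [3.20, 0.10, 0.50]])
occupied = voxelize(pts, origin=np.array([0.0, 0.0, 0.0]), voxel_size=0.25)
print(occupied)  # two occupied voxels remain
```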
Disclosure of Invention
In general, this disclosure describes techniques for coding (encoding and decoding) point cloud data. In particular, a G-PCC encoder and/or decoder may determine whether points in the point cloud are ground/road points or object points, and then use these classifications to code (encode or decode) the points. For example, a G-PCC encoder or decoder may generate a set of global motion information for object points only. In some examples, the G-PCC encoder or decoder may also generate a set of global motion information for only ground/road points. Alternatively, the G-PCC encoder or decoder may use local motion information and/or intra prediction to code the ground/road points.
In one example, a method of encoding and decoding G-PCC data includes: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
In another example, an apparatus for encoding and decoding G-PCC data includes: a memory configured to store data representing points of a point cloud; and one or more processors implemented in the circuitry configured to: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
In another example, a computer-readable storage medium has instructions stored thereon that, when executed, cause a processor to: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
In another example, an apparatus for encoding and decoding G-PCC data includes: means for determining a height value of a point in the point cloud; means for classifying the points into a ground point set or an object point set according to the height value; and means for encoding and decoding the ground points and the object points according to the classification.
The details of one or more examples of the disclosure are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.
Drawings
FIG. 1 is a block diagram illustrating an example encoding and decoding system that may perform the techniques of this disclosure.
Fig. 2 is a block diagram illustrating an example geometric point cloud compression (G-PCC) encoder that may be configured to perform the techniques of this disclosure.
Fig. 3 is a conceptual diagram illustrating an example of inter prediction coding in G-PCC.
Fig. 4 is a block diagram illustrating an example G-PCC decoder that may be configured to perform the techniques of this disclosure.
Fig. 5 is a conceptual diagram illustrating an example of inter prediction decoding in G-PCC.
Fig. 6 is a conceptual diagram illustrating an example prediction tree that may be used in performing the techniques of this disclosure.
Fig. 7 is a conceptual diagram illustrating an example rotational LIDAR acquisition model.
FIG. 8 is a flow chart illustrating an example motion estimation process of G-PCC InterEM software.
FIG. 9 is a flowchart illustrating an example process for estimating global motion.
Fig. 10 is a flow chart illustrating an example process for estimating local node motion vectors.
Fig. 11 is a diagram illustrating an example of classifying a cloud into a ground (road) and an object using two thresholds of z-values of points according to the techniques of this disclosure.
Fig. 12 is a diagram illustrating an example of deriving a threshold using a histogram in accordance with the techniques of this disclosure.
Fig. 13 is a conceptual diagram illustrating marking points in a cloud as roads and objects according to the techniques of this disclosure.
Fig. 14 is a flowchart illustrating an example method of encoding a point cloud according to the techniques of this disclosure.
Fig. 15 is a flowchart illustrating an example method of decoding a point cloud in accordance with the techniques of this disclosure.
Fig. 16 is a conceptual diagram illustrating a laser package (such as a LIDAR sensor or other system including one or more laser components) scanning a point in three-dimensional space.
FIG. 17 is a conceptual diagram illustrating an example ranging system 900 that may be used with one or more techniques of this disclosure.
FIG. 18 is a conceptual diagram illustrating an example vehicle-based scenario in which one or more techniques of the present disclosure may be used.
Fig. 19 is a conceptual diagram illustrating an example augmented reality system in which one or more techniques of the present disclosure may be used.
Fig. 20 is a conceptual diagram illustrating an example mobile device system in which one or more techniques of this disclosure may be used.
Detailed Description
The point cloud data may be generated using, for example, a LIDAR system mounted on a car. The LIDAR system may fire lasers in bursts in a plurality of different directions over time as the car moves. Thus, for a given laser burst, a point cloud may be formed. To compress the point cloud data, the respective point clouds (frames) may be encoded relative to each other, for example, using intra-frame prediction or inter-frame prediction. The present disclosure recognizes that because most objects around a car will remain relatively stationary, a common global motion vector (which can be expected to generally correspond to the direction and offset traversed by the car) can be used to predict points in a point cloud corresponding to the objects. Points along the ground, however, typically remain stationary between frames, because the lasers can be expected to identify points at relatively the same locations within each frame, since the road or ground beneath the car is expected to be relatively flat.
Accordingly, the present disclosure describes techniques that may reduce signaling overhead and the amount of coded information. In particular, a geometric point cloud compression (G-PCC) encoder and a G-PCC decoder may be configured to code object points and road/ground points separately. That is, the G-PCC encoder may be configured to classify points in the point cloud as either object points or ground/road points, and then encode the object points using a global motion vector while separately encoding the ground/road points (e.g., using zero motion vectors, a second, different global motion vector, corresponding local motion vectors, intra prediction, or other different encoding techniques). Similarly, the G-PCC decoder may decode object points separately from road/ground points. Using a single global motion vector for all object points in this way consumes fewer bits than using separate local motion vectors for each point in the point cloud. Likewise, coding all road/ground points together may reduce signaling overhead and coded data. These techniques may also reduce the number of processing operations required to encode and decode the point cloud. In this way, these techniques may improve the operating efficiency of G-PCC encoding and decoding devices, as well as the field of geometry-based point cloud compression generally.
Fig. 1 is a block diagram illustrating an example encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure are generally directed to encoding (encoding and/or decoding) point cloud data, i.e., supporting point cloud compression. In general, point cloud data includes any data used to process a point cloud. The codec may effectively compress and/or decompress the point cloud data.
As shown in fig. 1, the system 100 includes a source device 102 and a destination device 116. The source device 102 provides encoded point cloud data to be decoded by the destination device 116. Specifically, in the example of fig. 1, the source device 102 provides point cloud data to the destination device 116 via the computer readable medium 110. The source device 102 and the destination device 116 may comprise any of a variety of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as smart phones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, land or marine vehicles, spacecraft, aircraft, robots, LIDAR devices, satellites, and the like. In some cases, the source device 102 and the destination device 116 may be equipped for wireless communication.
In the example of fig. 1, source device 102 includes a data source 104, a memory 106, a G-PCC encoder 200, and an output interface 108. Destination device 116 includes an input interface 122, a G-PCC decoder 300, a memory 120, and a data consumer 118. In accordance with the present disclosure, the G-PCC encoder 200 of the source device 102 and the G-PCC decoder 300 of the destination device 116 may be configured to apply the techniques of the present disclosure in relation to marking points in a point cloud as ground points or object points according to their altitude values. Thus, the source device 102 represents an example of an encoding apparatus, and the destination device 116 represents an example of a decoding device. In other examples, the source device 102 and the destination device 116 may include other components or arrangements. For example, the source device 102 may receive data (e.g., point cloud data) from an internal or external source. Likewise, the destination device 116 may interface with external data consumers rather than including the data consumers in the same device.
The system 100 as shown in fig. 1 is merely one example. In general, other digital encoding and/or decoding devices may perform the techniques of the present disclosure related to marking points in a point cloud as ground points or object points based on their altitude values. Source device 102 and destination device 116 are merely examples of devices in which source device 102 generates decoded data for transmission to destination device 116. The present disclosure refers to "codec" devices as devices that perform the codec (encoding and/or decoding) of data. Thus, the G-PCC encoder 200 and the G-PCC decoder 300 represent examples of codec devices (specifically, encoders and decoders), respectively. In some examples, the source device 102 and the destination device 116 may operate in a substantially symmetrical manner such that each of the source device 102 and the destination device 116 includes encoding and decoding components. Thus, the system 100 may support unidirectional or bidirectional transmission between the source device 102 and the destination device 116, for example, for streaming, playback, broadcasting, telephony, navigation, and other applications.
In general, data source 104 represents a source of data (i.e., raw, unencoded point cloud data) and may provide G-PCC encoder 200 with a continuous series of "frames" of data, which G-PCC encoder 200 encodes. The data source 104 of the source device 102 may include a point cloud capture device, such as any of a variety of cameras or sensors, for example, a 3D scanner or light detection and ranging (LIDAR) device, one or more cameras, an archive containing previously captured data, and/or a data feed interface that receives data from a data content provider. Alternatively or additionally, the point cloud data may be computer-generated from scanner, camera, sensor, or other data. For example, the data source 104 may generate computer graphics-based data as source data, or a combination of real-time data, archived data, and computer-generated data. In each case, the G-PCC encoder 200 encodes captured, pre-captured, or computer-generated data. The G-PCC encoder 200 may rearrange the frames from the order in which they were received (sometimes referred to as "display order") into a coding order for coding. G-PCC encoder 200 may generate one or more bitstreams including the encoded data. The source device 102 may then output the encoded data via the output interface 108 onto the computer-readable medium 110 for receipt and/or retrieval, for example, by the input interface 122 of the destination device 116.
The memory 106 of the source device 102 and the memory 120 of the destination device 116 may represent general purpose memory. In some examples, memory 106 and memory 120 may store raw data, such as raw data from data source 104 and raw, decoded data from G-PCC decoder 300. Additionally or alternatively, memory 106 and memory 120 may store software instructions executable by, for example, G-PCC encoder 200 and G-PCC decoder 300, respectively. Although memory 106 and memory 120 are shown separately from G-PCC encoder 200 and G-PCC decoder 300 in this example, it should be appreciated that G-PCC encoder 200 and G-PCC decoder 300 may also include internal memory for functionally similar or equivalent purposes. Further, memory 106 and memory 120 may store encoded data, for example, output from G-PCC encoder 200 and input to G-PCC decoder 300. In some examples, memory 106 and portions of memory 120 may be allocated as one or more buffers, e.g., for storing raw, decoded, and/or encoded data. For example, memory 106 and memory 120 may store data representing a point cloud.
Computer-readable medium 110 may represent any type of medium or device capable of transmitting encoded data from source device 102 to destination device 116. In one example, the computer-readable medium 110 represents a communication medium that enables the source device 102 to transmit encoded data directly to the destination device 116 in real-time, e.g., via a radio frequency network or a computer-based network. According to a communication standard, such as a wireless communication protocol, output interface 108 may modulate a transmission signal including encoded data, and input interface 122 may demodulate a received transmission signal. The communication medium may include any wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include a router, switch, base station, or any other device that facilitates communication from source device 102 to destination device 116.
In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 may access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media such as a hard disk, blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded data.
In some examples, source device 102 may output the encoded data to file server 114 or another intermediate storage device that may store the encoded data generated by source device 102. The destination device 116 may access the stored data from the file server 114 via streaming or download. File server 114 may be any type of server device capable of storing encoded data and transmitting the encoded data to destination device 116. File server 114 may represent a web server (e.g., for a web site), a File Transfer Protocol (FTP) server, a content delivery network device, or a Network Attached Storage (NAS) device. The destination device 116 may access the encoded data from the file server 114 via any standard data connection, including an internet connection. This may include a wireless channel (e.g., wi-Fi connection), a wired connection (e.g., digital Subscriber Line (DSL), cable modem, etc.), or a combination of both, adapted to access the encoded data stored on the file server 114. The file server 114 and the input interface 122 may be configured to operate in accordance with a streaming protocol, a download transmission protocol, or a combination thereof.
Output interface 108 and input interface 122 may represent wireless transmitters/receivers, modems, wired network components (e.g., Ethernet cards), wireless communication components operating according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 comprise wireless components, output interface 108 and input interface 122 may be configured to transmit data, such as encoded data, in accordance with cellular communication standards, such as 4G, 4G-LTE (Long Term Evolution), LTE-Advanced, 5G, and the like. In some examples where output interface 108 includes a wireless transmitter, output interface 108 and input interface 122 may be configured to transmit data, such as encoded data, in accordance with other wireless standards, such as the IEEE 802.11 specification, the IEEE 802.15 specification (e.g., ZigBee™), the Bluetooth™ standard, and the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-chip (SoC) devices. For example, source device 102 may include a SoC device that performs the functions attributed to G-PCC encoder 200 and/or output interface 108, and destination device 116 may include a SoC device that performs the functions attributed to G-PCC decoder 300 and/or input interface 122.
The techniques of this disclosure may be applied to encoding and decoding to support any of a variety of applications, such as communications between autonomous vehicles, scanners, cameras, communications between sensors and processing devices (e.g., local or remote servers), geographic mapping, or other applications.
The input interface 122 of the destination device 116 receives the encoded bitstream from the computer readable medium 110 (e.g., communication medium, storage device 112, file server 114, etc.). The encoded bitstream may include signaling information defined by the G-PCC encoder 200, which is also used by the G-PCC decoder 300, such as syntax elements having values describing characteristics and/or processing of the codec unit (e.g., slice, picture, group of pictures, sequence, etc.). The data consumer 118 uses the decoded data. For example, the data consumer 118 may use the decoded data to determine the location of the physical object. In some examples, the data consumer 118 may include a display that presents images based on the point cloud.
G-PCC encoder 200 and G-PCC decoder 300 may each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of the G-PCC encoder 200 and the G-PCC decoder 300 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (codec) in the respective device. The devices that include G-PCC encoder 200 and/or G-PCC decoder 300 may include one or more integrated circuits, microprocessors, and/or other types of devices.
The G-PCC encoder 200 and the G-PCC decoder 300 may operate according to a codec standard, such as a video point cloud compression (V-PCC) standard or a geometric point cloud compression (G-PCC) standard. The present disclosure may generally relate to encoding and decoding (e.g., encoding and decoding) of pictures, including processes of encoding or decoding data. The encoded bitstream typically includes a series of values representing syntax elements of a codec decision (e.g., a codec mode).
The present disclosure may generally refer to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values of syntax elements and/or other data used to decode the encoded data. That is, the G-PCC encoder 200 may signal values for syntax elements in the bitstream. Typically, signaling refers to generating a value in the bitstream. As noted above, the source device 102 may transport the bitstream to the destination device 116 substantially in real time or not in real time, such as may occur when storing syntax elements to the storage device 112 for later retrieval by the destination device 116.
The potential need for standardization of point cloud coding technology with a compression capability that significantly exceeds that of current approaches is being investigated by ISO/IEC MPEG (JTC 1/SC 29/WG 11), which will strive to create the standard. The group is working together on this exploration activity in a collaborative effort known as the Three-Dimensional Graphics Team (3DG) to evaluate compression technology designs proposed by experts in this area.
Point cloud compression activity is categorized into two different approaches. The first approach is "video point cloud compression" (V-PCC), which segments a 3D object and projects the segments into multiple 2D planes (represented as "patches" in 2D frames), which are further encoded by a conventional 2D video codec such as a High Efficiency Video Coding (HEVC) (ITU-T H.265) codec. The second approach is "geometry-based point cloud compression" (G-PCC), which directly compresses the 3D geometry (i.e., the positions of a set of points in 3D space) and associated attribute values (attribute values for each point associated with the 3D geometry). G-PCC addresses compression of point clouds in class 1 (static point clouds) and class 3 (dynamically acquired point clouds). A draft of the G-PCC standard is available in G-PCC DIS, ISO/IEC JTC1/SC29/WG11 w19522, MPEG-131, teleconference, July 2020, and a description of the codec is available in G-PCC Codec Description, ISO/IEC JTC1/SC29/WG11 w19525, MPEG-131, teleconference, July 2020.
The units shown are logical and do not necessarily correspond one-to-one to the code implemented in the reference implementation of the G-PCC codec, i.e. the TMC13 test model software studied by ISO/IEC MPEG (JTC 1/SC29/WG 11). Similarly, the illustrated elements do not necessarily correspond one-to-one with the hardware elements in a hardware implementation of the G-PCC codec.
In both the G-PCC encoder 200 and the G-PCC decoder 300, the point cloud positions are coded first. Attribute coding depends on the decoded geometry. For class 3 data, the compressed geometry is typically represented as an octree from the root all the way down to a leaf level of individual voxels. For class 1 data, the compressed geometry is typically represented by a pruned octree (i.e., an octree from the root down to a leaf level of blocks larger than voxels) plus a model that approximates the surface within each leaf of the pruned octree. In this way, class 1 and class 3 data share the octree coding mechanism, while class 1 data may additionally approximate the voxels within each leaf with a surface model. The surface model used is a triangulation comprising 1-10 triangles per block, resulting in a triangle soup. Thus, the class 1 geometry codec is referred to as the Trisoup geometry codec, while the class 3 geometry codec is referred to as the Octree geometry codec.
At each node of the octree, occupancy is signaled (when not inferred) for one or more of its child nodes (up to eight nodes). Multiple neighborhoods are specified, including (a) nodes that share a face with the current octree node, (b) nodes that share a face, edge, or vertex with the current octree node, and so on. Within each neighborhood, the occupancy of a node and/or its child nodes may be used to predict the occupancy of the current node or its child nodes. For points that are sparsely populated in certain nodes of the octree, the codec also supports a direct coding mode in which the 3D positions of the points are encoded directly. A flag may be signaled to indicate that the direct mode is signaled. At the lowest level, the number of points associated with the octree node/leaf node may also be coded.
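As a rough illustration of child-node occupancy (the child indexing and helper function are hypothetical, not the normative G-PCC syntax or context derivation), the occupancy of a node's eight children can be summarized as an 8-bit mask:

```python
def child_occupancy(points, node_origin, node_size):
    """Compute an 8-bit occupancy mask for the eight child octants of a node.

    Bit k is set if at least one point falls into child octant k, where k is
    formed from the (x, y, z) half-space tests. The child ordering here is
    illustrative only.
    """
    half = node_size / 2.0
    mask = 0
    for x, y, z in points:
        cx = int(x >= node_origin[0] + half)
        cy = int(y >= node_origin[1] + half)
        cz = int(z >= node_origin[2] + half)
        mask |= 1 << (cx * 4 + cy * 2 + cz)
    return mask

# A node at (0, 0, 0) with size 2: the two points land in different octants.
print(bin(child_occupancy([(0.1, 0.2, 0.3), (1.5, 1.7, 0.4)], (0.0, 0.0, 0.0), 2.0)))
```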
Once the geometry is encoded, the attributes corresponding to the geometry points are encoded. When there are multiple attribute points corresponding to one reconstructed/decoded geometry point, an attribute value representing the reconstructed point may be derived.
There are three attribute coding methods in G-PCC: Region Adaptive Hierarchical Transform (RAHT) coding, interpolation-based hierarchical nearest-neighbor prediction (Predicting Transform), and interpolation-based hierarchical nearest-neighbor prediction with an update/lifting step (Lifting Transform). RAHT and Lifting are typically used for class 1 data, while Predicting is typically used for class 3 data. However, either method may be used for any data, and, just as with the geometry codecs in G-PCC, the attribute coding method used to code the point cloud is specified in the bitstream.
The encoding and decoding of the attributes may be performed in a level of detail (LOD), where for each level of detail a finer representation of the point cloud attributes may be obtained. Each level of detail may be specified based on a distance metric from the neighboring node or based on a sampling distance.
The G-PCC encoder 200 may quantize a residual obtained as an output of the codec method of the attribute. The G-PCC encoder 200 may entropy encode the quantized residual using context adaptive arithmetic coding.
In accordance with the techniques of this disclosure, G-PCC encoder 200 and G-PCC decoder 300 may be configured to separately encode/decode points of a point cloud based on classification of the points. In particular, the G-PCC encoder 200 and the G-PCC decoder 300 may be configured to classify points into, for example, ground (or road) points and object points. In some examples, a LIDAR system mounted on a car may project laser light into the surrounding environment to build a point cloud. The present disclosure recognizes that the ground or road on which the automobile is traveling will likely remain relatively flat and stable between frames (i.e., between corresponding point cloud build instances). Thus, the points collected at the ground or road location should be nearly identical between the respective frames.
For other portions of the point cloud, the identified points may correspond to non-road/ground objects. Thus, the relative position of each point corresponding to a non-road/ground object may vary from frame to frame in substantially the same manner, due to the speed of the car. As such, it may be efficient to code points corresponding to objects using a global motion vector and to code points corresponding to the road or ground using a different mechanism (e.g., a different global motion vector (e.g., a zero-value global motion vector), local motion vectors, or intra prediction).
G-PCC encoder 200 may determine a threshold for classifying a point as either a ground/road point (hereinafter commonly referred to as a "ground" point) or an object point. For example, the G-PCC encoder 200 may determine a top threshold and a bottom threshold, typically representing the top and bottom of a ground or road. Thus, if points are between these two thresholds, these points may be classified as ground points, while other points (e.g., points above the top threshold or below the bottom threshold) may be classified as object points. G-PCC encoder 200 may encode data representing the top and bottom thresholds in a data structure, such as a Sequence Parameter Set (SPS), a Geometric Parameter Set (GPS), or a geometric data unit header (GDH). G-PCC encoder 200 and G-PCC decoder 300 may thus encode or decode occupancy of nodes above a top threshold or below a bottom threshold using global motion vectors, and encode or decode occupancy of nodes between the top threshold and the bottom threshold using a second, different global motion vector, local motion vector, intra-prediction, or other different prediction techniques.
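A minimal sketch of the classification rule described above (the helper name and example threshold values are hypothetical; in practice the thresholds would be determined by G-PCC encoder 200 and signaled, e.g., in the SPS, GPS, or geometry data unit header):

```python
def classify_points(points, top_threshold, bottom_threshold):
    """Split points into ground and object sets by their height (z) value.

    Points whose z value lies between the bottom and top thresholds are
    treated as ground/road points; points above the top threshold or below
    the bottom threshold are treated as object points.
    """
    ground, objects = [], []
    for p in points:
        z = p[2]
        if bottom_threshold <= z <= top_threshold:
            ground.append(p)
        else:
            objects.append(p)
    return ground, objects

# Example: a flat road near z = -1.6 plus elevated and low-lying object points.
pts = [(1.0, 2.0, -1.62), (5.0, -3.0, -1.58), (4.0, 1.0, 0.75), (9.0, 2.0, -2.4)]
ground_pts, object_pts = classify_points(pts, top_threshold=-1.4, bottom_threshold=-1.8)
print(len(ground_pts), len(object_pts))  # 2 ground points, 2 object points
```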
In this way, the techniques of this disclosure may result in more efficient encoding and decoding of object points. Instead of using the respective local motion vectors to encode points in the point cloud, a single global motion vector may be used to predict all object points between the respective clouds. Thus, the signaling overhead associated with signaling motion information of object points can be significantly reduced. Furthermore, because it can be largely assumed that the ground point will remain constant between frames, the ground point codec technique can consume a relatively small number of bits.
Fig. 2 is a block diagram illustrating example components of G-PCC encoder 200 of fig. 1 that may be configured to perform the techniques of this disclosure. In the example of fig. 2, G-PCC encoder 200 includes a memory 228, a coordinate transformation unit 202, a color transformation unit 204, a voxelization unit 206, an attribute transfer unit 208, an octree analysis unit 210, a surface approximation analysis unit 212, an arithmetic coding unit 214, a geometry reconstruction unit 216, a RAHT unit 218, a LOD generation unit 220, a lifting unit 222, a coefficient quantization unit 224, and an arithmetic coding unit 226. In fig. 2, gray shaded cells are options that are commonly used for category 1 data. In fig. 2, the diagonal cross-hatched cells are the options commonly used for category 3 data. All other units are common between category 1 and category 3.
As shown in the example of fig. 2, G-PCC encoder 200 may receive a set of locations and a set of attributes. The location may include coordinates of points in the point cloud. The attributes may include information about points in the point cloud, such as colors associated with the points in the point cloud.
The coordinate transformation unit 202 may apply a transformation to the point coordinates to transform the coordinates from an initial domain to a transformation domain. The present disclosure may refer to transformed coordinates as transformed coordinates. The color transformation unit 204 may apply a transformation to transform the color information of the attribute to a different domain. For example, the color conversion unit 204 may convert color information from an RGB color space to a YCbCr color space.
Further, in the example of fig. 2, the voxelization unit 206 may voxelize the transformed coordinates. Voxelization of the transformed coordinates may include quantizing and removing some points of the point cloud. In other words, multiple points of the point cloud may be subsumed within a single "voxel," which may thereafter be treated in some respects as one point. The octree analysis unit 210 may also store data representing occupied voxels (i.e., voxels occupied by points of the point cloud) in the memory 228 (e.g., in a history buffer of the memory 228).
Furthermore, the arithmetic coding unit 214 may entropy-encode data representing occupancy of the octree. In some examples, the arithmetic coding unit 214 may entropy encode the occupancy data based only on the data of the current point cloud (this may be referred to as "intra prediction" of the current point cloud). In other examples, the arithmetic coding unit 214 may entropy encode the occupancy data with reference to a previous octree of a previous point cloud, e.g., buffered in the memory 228 (this may be referred to as "inter prediction" of the current point cloud relative to the reference cloud). The arithmetic coding unit 214 may use local or global motion vectors to perform inter prediction, for example, as discussed in more detail below with respect to fig. 3.
In particular, according to the techniques of this disclosure, arithmetic coding unit 214 may entropy encode data representing thresholds (e.g., top and bottom thresholds) used to define ground points (or road points) and object points. The top and bottom thresholds may correspond to a series of frames (point clouds). The arithmetic coding unit 214 may also entropy encode data representing a global motion vector for a current point cloud of the series of frames. The arithmetic coding unit 214 may form a predicted cloud from a previous point cloud buffered in the memory 228 using the global motion vector, and determine a context using the occupancy of nodes in the predicted cloud to entropy encode occupancy data for any of the nodes of the current cloud above the top threshold or below the bottom threshold. The arithmetic coding unit 214 may use different prediction techniques for the ground/road points, such as a different global motion vector, local motion vectors, intra prediction, or another alternative entropy encoding/prediction technique.
Additionally, in the example of fig. 2, the surface approximation analysis unit 212 may analyze the points to potentially determine a surface representation of the set of points. The arithmetic coding unit 214 may entropy-code syntax elements representing the information of the surface and/or octree determined by the surface approximation analysis unit 212. The G-PCC encoder 200 may output these syntax elements in a geometric bitstream.
The geometry reconstruction unit 216 may reconstruct transformed coordinates of points in the point cloud based on the octree, data indicative of the surface determined by the surface approximation analysis unit 212, and/or other information. The number of transformed coordinates reconstructed by the geometry reconstruction unit 216 may be different from the number of original points of the point cloud due to the voxelization and surface approximation. The present disclosure may refer to the resulting points as reconstructed points. The attribute transfer unit 208 may transfer the attributes of the original points of the point cloud to the reconstructed points of the point cloud.
Furthermore, the RAHT unit 218 may apply RAHT codec to the attributes of the reconstructed points. Alternatively or additionally, the LOD generation unit 220 and the lifting unit 222 may apply LOD processing and lifting, respectively, to the properties of the reconstructed points. The RAHT unit 218 and the lifting unit 222 may generate coefficients based on the attributes. The coefficient quantization unit 224 may quantize the coefficients generated by the RAHT unit 218 or the lifting unit 222. The arithmetic coding unit 226 may apply arithmetic coding to syntax elements representing quantized coefficients. The G-PCC encoder 200 may output these syntax elements in the attribute bitstream.
Fig. 3 is a conceptual diagram illustrating an example of inter prediction coding in G-PCC. In some examples, G-PCC encoder 200 may decode/reproduce the point cloud to form reference cloud 130. In other examples, G-PCC encoder 200 may simply store an unencoded historical version of the previous point cloud. Reference cloud 130 may be stored in a decoded frame buffer or history buffer (i.e., memory) of G-PCC encoder 200. G-PCC encoder 200 may also use inter-frame prediction, at least in part, to obtain current cloud 140 to be encoded. For example, G-PCC encoder 200 may use the techniques of this disclosure to determine a set of points of current cloud 140 that are to be predicted using global motion (rather than local motion or intra prediction).
G-PCC encoder 200 may compare the location of the point of current cloud 140 to be inter-predicted with the location of the point of reference cloud 130 and calculate global motion vector 132. Global motion vector 132 may represent a global motion vector that most accurately predicts the location of a point of the current cloud to be inter-predicted using global motion relative to reference cloud 130. G-PCC encoder 200 may then form predicted cloud 134 by applying global motion vector 132 to reference cloud 130. That is, G-PCC encoder 200 may construct predicted cloud 134 by applying global motion vector 132 to each point of reference cloud 130 at various locations, and setting the occupancy of the nodes to include points in predicted cloud 134 at the corresponding locations offset by global motion vector 132.
The G-PCC encoder 200 (and in particular, the arithmetic coding unit 214) may then encode the points of the nodes of the current cloud 140 using the corresponding points within the nodes of the predicted cloud 134 to determine a context for context-based entropy encoding (e.g., context Adaptive Binary Arithmetic Coding (CABAC)). For example, the arithmetic coding unit 214 may encode the occupancy of the current node 142 of the current cloud 140 using the occupancy of the reference node 136 (which corresponds to the location of the current node 142 indicated by the vector 144) to determine a context for encoding the value of the occupancy of the current node 142.
For example, if the reference node 136 is occupied (that is, includes a point), the arithmetic coding unit 214 may determine a first context for encoding a value representing the occupancy of the current node 142. The first context may indicate that the most probable symbol for the value representing the occupancy of the current node 142 is a value indicating that the current node 142 is occupied (e.g., "1"). On the other hand, if the reference node 136 is unoccupied (that is, does not include any points), the arithmetic coding unit 214 may determine a second context for encoding the value representing the occupancy of the current node 142. The second context may indicate that the most probable symbol for the value representing the occupancy of the current node 142 is a value indicating that the current node 142 is unoccupied (e.g., "0"). The arithmetic coding unit 214 may then determine whether the current node 142 is actually occupied, determine a value indicating whether the current node 142 is actually occupied, and then entropy encode the value using the determined context (e.g., the first context or the second context). The arithmetic coding unit 214 may add the entropy-encoded value to the bitstream 146 and advance to the next node of the current cloud 140 (or to the next cloud).
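The following simplified sketch illustrates the idea of choosing an entropy-coding context from the occupancy of the motion-compensated reference. It assumes a two-context model and grid-aligned nodes with hypothetical helper names; the actual G-PCC CABAC context derivation involves additional state:

```python
def build_predicted_occupancy(reference_points, global_mv, node_size):
    """Apply a global motion vector to each reference point and record which
    nodes of the prediction cloud become occupied (node indices on a grid)."""
    occupied = set()
    for x, y, z in reference_points:
        px, py, pz = x + global_mv[0], y + global_mv[1], z + global_mv[2]
        occupied.add((int(px // node_size), int(py // node_size), int(pz // node_size)))
    return occupied

def context_for_node(node_index, predicted_occupancy):
    """Pick context 0 if the co-located predicted node is occupied, else 1."""
    return 0 if node_index in predicted_occupancy else 1

# Reference cloud shifted by the car's motion; the co-located node is occupied,
# so the value signaling occupancy of the current node would use context 0.
ref = [(10.0, 4.0, 1.0), (12.0, 5.0, 1.2)]
pred = build_predicted_occupancy(ref, global_mv=(2.0, 0.0, 0.0), node_size=4.0)
print(context_for_node((3, 1, 0), pred))
```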
Fig. 4 is a block diagram illustrating example components of G-PCC decoder 300 of fig. 1, which may be configured to perform the techniques of this disclosure. In the example of fig. 4, the G-PCC decoder 300 includes a geometric arithmetic decoding unit 302, a memory 324, an attribute arithmetic decoding unit 304, an octree synthesis unit 306, an inverse quantization unit 308, a surface approximation synthesis unit 310, a geometric reconstruction unit 312, a RAHT unit 314, a LOD generation unit 316, an inverse lifting unit 318, an inverse transform coordinate unit 320, and an inverse transform color unit 322. In fig. 4, gray shaded cells are options that are commonly used for category 1 data. In fig. 4, the diagonal cross-hatched cells are the options commonly used for category 3 data. All other units are common between category 1 and category 3.
G-PCC decoder 300 may obtain a geometry bitstream and an attribute bitstream. The geometric arithmetic decoding unit 302 of the decoder 300 may apply arithmetic decoding (e.g., context Adaptive Binary Arithmetic Coding (CABAC) or other types of arithmetic decoding) to syntax elements in the geometric bitstream. Similarly, the attribute arithmetic decoding unit 304 may apply arithmetic decoding to syntax elements in the attribute bitstream.
The geometric arithmetic decoding unit 302 may entropy-decode data representing occupancy of the octree of the current point cloud. In some examples, the geometric arithmetic decoding unit 302 may entropy decode the occupancy data based only on the data of the current point cloud (this may be referred to as "intra prediction" of the current point cloud). In other examples, the geometric arithmetic decoding unit 302 may entropy decode the occupancy data with reference to a previous octree of a previous point cloud, e.g., buffered in the memory 324 (this may be referred to as "inter prediction" of the current point cloud relative to the reference cloud). The geometric arithmetic decoding unit 302 may use local or global motion vectors to perform inter prediction, for example, as discussed in more detail below with respect to fig. 5.
In particular, according to the techniques of this disclosure, the geometric arithmetic decoding unit 302 may entropy decode data representing thresholds (e.g., top and bottom thresholds) used to define ground points (or road points) and object points. The top and bottom thresholds may correspond to a series of frames (point clouds). The geometric arithmetic decoding unit 302 may also entropy decode data representing a global motion vector of a current point cloud of the series of frames. The geometric arithmetic decoding unit 302 may form a predicted cloud using global motion vectors from previous point clouds buffered in the memory 324 and determine a context using occupancy of nodes in the predicted cloud to entropy decode occupancy data of any of nodes above a top threshold or nodes below a bottom threshold of the current cloud. The geometric arithmetic decoding unit 302 may use different prediction techniques for the ground/road points, such as different global motion vectors, local motion vectors, intra prediction, or another alternative entropy decoding/prediction technique.
The octree synthesis unit 306 may synthesize octrees based on data of syntax elements parsed from the geometric bitstream and entropy-decoded by the geometric arithmetic decoding unit 302. In examples where surface approximations are used in the geometric bitstream, the surface approximation synthesis unit 310 may determine the surface model based on syntax elements parsed from the geometric bitstream and based on octree.
Further, the geometric reconstruction unit 312 may perform reconstruction to determine coordinates of points in the point cloud. The inverse transform coordinate unit 320 may apply an inverse transform to the reconstructed coordinates to convert the reconstructed coordinates (positions) of the points in the point cloud from the transform domain back to the initial domain.
Additionally, in the example of fig. 4, the inverse quantization unit 308 may inverse quantize the attribute values. The attribute value may be based on syntax elements obtained from the attribute bitstream (e.g., including syntax elements decoded by the attribute arithmetic decoding unit 304).
Depending on how the attribute values are encoded, the RAHT unit 314 may perform RAHT codec to determine color values of points of the point cloud based on the inversely quantized attribute values. Alternatively, the LOD generation unit 316 and the inverse boost unit 318 may use a level of detail based technique to determine color values of points of the point cloud.
Further, in the example of fig. 4, the inverse transform color unit 322 may apply an inverse color transform to the color values. The inverse color transform may be the inverse of the color transform applied by the color transform unit 204 of the encoder 200. For example, the color conversion unit 204 may convert color information from an RGB color space to a YCbCr color space. Accordingly, the inverse color transform unit 322 may transform color information from the YCbCr color space to the RGB color space.
The various elements of fig. 2 and 4 are illustrated to aid in understanding the operations performed by the encoder 200 and decoder 300. These units may be implemented as fixed function circuits, programmable circuits or a combination thereof. The fixed function circuit refers to a circuit that provides a specific function and presets an operation that can be performed. Programmable circuitry refers to circuitry that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, the programmable circuit may run software or firmware that causes the programmable circuit to operate in a manner defined by instructions of the software or firmware. Fixed function circuitry may execute software instructions (e.g., to receive parameters or output parameters) but the type of operation that fixed function circuitry performs is typically not variable. In some examples, one or more of the units may be different circuit blocks (fixed function or programmable), and in some examples, one or more of the units may be integrated circuits.
Fig. 5 is a conceptual diagram illustrating an example of inter prediction decoding in G-PCC. In accordance with the techniques of this disclosure, G-PCC decoder 300 may decode a set of points of current cloud 160 using the global motion vector inter-prediction technique of fig. 5, and decode a second set of points of current cloud 160 using local motion vector inter-prediction or intra-prediction. G-PCC decoder 300 may receive and decode data of bit stream 166 that indicates whether a set of points of one or more nodes are to be decoded using global motion vector inter prediction.
G-PCC decoder 300 may initially decode one or more previous point clouds and store the previously decoded point clouds in a buffer or history buffer of decoded frames (i.e., memory of G-PCC decoder 300). G-PCC decoder 300 may also decode motion information including data of global motion vector 152 and identify reference cloud 150 from among the previously decoded point clouds.
G-PCC decoder 300 may apply global motion vector 152 to reference cloud 150 to generate predicted cloud 154. That is, G-PCC decoder 300 may construct predicted cloud 154 by applying global motion vector 152 to each point of reference cloud 150 at a respective location, and setting occupancy of nodes of predicted cloud 154 (e.g., reference node 156) to include the point at the respective location offset by global motion vector 152.
The geometric arithmetic decoding unit 302 may then use the occupancy of the nodes (e.g., reference node 156) of the predicted cloud 154 to determine a context for decoding a value representing the occupancy of the current node 162 of the current cloud 160. The current node 162 corresponds to the reference node 156, as indicated by vector 164. For example, if the reference node 156 is occupied (that is, includes a point), the geometric arithmetic decoding unit 302 may determine a first context for decoding the value representing the occupancy of the current node 162. The first context may indicate that the most probable symbol for the value representing the occupancy of the current node 162 is a value indicating that the current node 162 is occupied (e.g., "1"). On the other hand, if the reference node 156 is unoccupied (that is, does not include any points), the geometric arithmetic decoding unit 302 may determine a second context for decoding the value representing the occupancy of the current node 162. The second context may indicate that the most probable symbol for the value representing the occupancy of the current node 162 is a value indicating that the current node 162 is unoccupied (e.g., "0"). The geometric arithmetic decoding unit 302 may then use the determined context to decode the value of the bitstream 166 representing the occupancy of the current node 162.
Fig. 6 is a conceptual diagram illustrating an example prediction tree that may be used in performing the techniques of this disclosure. Predictive geometry coding was introduced in ISO/IEC JTC1/SC29/WG11 document N18096, "Exploratory model for inter-prediction in G-PCC," Macau, October 2018, as an alternative to octree geometry coding. In predictive geometry coding, nodes are arranged in a tree structure (which defines the prediction structure), and various prediction strategies are used to predict the coordinates of each node in the tree relative to its predictors.
Fig. 6 illustrates an example prediction tree 350, which is a directed graph with arrows pointing in the prediction direction. The prediction tree 350 includes various types of nodes according to the number of child nodes (e.g., 0 to 3). In the example of fig. 6, node 352 is an example of a branch vertex having three child nodes, node 354 is an example of a branch node having two child nodes, node 356 is an example of a branch node having one child node, node 358 represents an example of a leaf vertex, and node 360 represents an example of a root vertex. As a root vertex, node 360 has no predicted value. Each node in the prediction tree 350 has at most one parent node.
Four prediction strategies may be specified for the current node based on its parent node (p0), grandparent node (p1), and great-grandparent node (p2): 1) no prediction/zero prediction (0); 2) delta prediction (p0); 3) linear prediction (2*p0 - p1); and 4) parallelogram prediction (2*p0 + p1 - p2).
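A small sketch of the four predictors listed above, applied to three-component positions (using the formulas as stated in the text; the function name is illustrative):

```python
def predict(mode, p0=None, p1=None, p2=None):
    """Return the predicted position for one of the four prediction strategies.

    mode 0: no prediction / zero prediction
    mode 1: delta prediction (p0)
    mode 2: linear prediction (2*p0 - p1)
    mode 3: parallelogram prediction (2*p0 + p1 - p2)
    """
    if mode == 0:
        return (0, 0, 0)
    if mode == 1:
        return p0
    if mode == 2:
        return tuple(2 * a - b for a, b in zip(p0, p1))
    if mode == 3:
        return tuple(2 * a + b - c for a, b, c in zip(p0, p1, p2))
    raise ValueError("unknown prediction mode")

# The encoder codes the residual between the actual position and the predictor.
p0, p1, p2 = (10, 0, 2), (8, 0, 2), (6, 0, 2)
print(predict(2, p0, p1))      # linear prediction -> (12, 0, 2)
print(predict(3, p0, p1, p2))  # parallelogram prediction -> (22, 0, 2)
```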
The G-PCC encoder 200 may employ any algorithm to generate the prediction tree. The G-PCC encoder 200 may determine the algorithm to use according to the application/use case and may use several strategies. Some strategies are described in N18096.
For each node, G-PCC encoder 200 may encode residual coordinate values in the bitstream starting from the root node in a depth-first manner. Predictive geometric codec may be particularly useful for class 3 (e.g., LIDAR acquired) point cloud data, e.g., for low latency applications.
Fig. 7 is a conceptual diagram illustrating an example rotational LIDAR acquisition model. Fig. 7 illustrates a LIDAR 380 that includes a plurality of sensors that transmit and receive respective lasers 382. G-PCC includes an angular mode for predictive geometry coding. In angular mode, the characteristics of the LIDAR sensor may be used to encode the prediction tree more efficiently. In angular mode, the coordinates of a position are converted into values of radius (r) 384, azimuth angle (phi) 386, and laser index (i) 388. The G-PCC encoder 200 and the G-PCC decoder 300 may perform prediction in this domain. That is, the G-PCC encoder 200 and the G-PCC decoder 300 may code residual values in the (r, phi, i) domain.
Due to rounding errors, coding in the (r, phi, i) domain is not lossless. Thus, the G-PCC encoder 200 and the G-PCC decoder 300 may code a second set of residuals corresponding to the Cartesian coordinates. The description of the encoding and decoding strategies of the angular mode for predictive geometry coding from N18096 is reproduced below.
The angular pattern of predictive geometry codec focuses on the point cloud acquired using the rotating Lidar model. In the example of fig. 7, the LIDAR 380 has N lasers 382 that rotate about the Z-axis according to the azimuth angle phi (e.g., where N may be equal to 16, 32, 64, or some other value). Each laser 382 may have a different elevation angle θ (i) i=1…N And height ofLet it be assumed that the laser i hits a point M having cartesian integer coordinates (x, y, z) defined according to a coordinate system.
According to N18096, the position of the point M can be modeled with three parameters (r, phi, i) which can be calculated as follows:
r = sqrt(x² + y²)
φ = atan2(y, x)
i = arg min over j = 1…N of |z + ς(j) − r × tan(θ(j))|
More specifically, the G-PCC encoder 200 and the G-PCC decoder 300 may use quantized versions of (r, φ), denoted (r̃, φ̃), where the three integers r̃, φ̃, and i can be calculated as follows:
r̃ = floor(sqrt(x² + y²) / q_r + o_r)
φ̃ = sign(atan2(y, x)) × floor(|atan2(y, x)| / q_φ + o_φ)
i = arg min over j = 1…N of |z + ς(j) − r × tan(θ(j))|
where:
● (q_r, o_r) and (q_φ, o_φ) are quantization parameters controlling the precision of r̃ and φ̃, respectively.
● sign(t) is the function that returns 1 if t is positive and (−1) otherwise.
● |t| is the absolute value of t.
To avoid reconstruction mismatches due to the use of floating-point operations, the G-PCC encoder 200 and the G-PCC decoder 300 may pre-compute and quantize the values of ς(i), i = 1…N, and tan(θ(i)), i = 1…N, as follows:
z̃(i) = sign(ς(i)) × floor(|ς(i)| / q_ς + o_ς)
t̃(i) = sign(tan(θ(i))) × floor(|tan(θ(i))| / q_θ + o_θ)
where (q_ς, o_ς) and (q_θ, o_θ) are quantization parameters controlling the precision of z̃(i) and t̃(i), respectively.
The G-PCC encoder 200 and the G-PCC decoder 300 may obtain reconstructed cartesian coordinates as follows:
x̂ = round(r̃ × q_r × app_cos(φ̃ × q_φ))
ŷ = round(r̃ × q_r × app_sin(φ̃ × q_φ))
ẑ = round(r̃ × q_r × t̃(i) × q_θ − z̃(i) × q_ς)
where app_cos(·) and app_sin(·) are approximations of cos(·) and sin(·), respectively. The computation may use a fixed-point representation, a look-up table, and/or linear interpolation.
For various reasons, such as quantization, approximation, model inaccuracy, and/or model parameter inaccuracy, (x̂, ŷ, ẑ) may be different from (x, y, z).
Let (r_x, r_y, r_z) be the reconstruction residuals, defined as follows:
r_x = x − x̂
r_y = y − ŷ
r_z = z − ẑ
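A simplified Python sketch of this conversion, quantization, and reconstruction is shown below. It is illustrative only: it uses floating-point math instead of the fixed-point app_cos/app_sin approximations, and laser_z and laser_tan are assumed stand-ins for the per-laser heights ς(i) and tangents tan(θ(i)):

import math

def to_r_phi_i(x, y, z, laser_z, laser_tan, q_r, o_r, q_phi, o_phi):
    # Convert Cartesian (x, y, z) to the quantized (r~, phi~, i) representation.
    r = math.sqrt(x * x + y * y)
    phi = math.atan2(y, x)
    r_q = math.floor(r / q_r + o_r)
    phi_q = int(math.copysign(math.floor(abs(phi) / q_phi + o_phi), phi))
    # Laser index: the laser whose beam passes closest to the point.
    i = min(range(len(laser_z)), key=lambda j: abs(z + laser_z[j] - r * laser_tan[j]))
    return r_q, phi_q, i

def reconstruct_xyz(r_q, phi_q, i, laser_z, laser_tan, q_r, q_phi):
    # Approximate inverse conversion; residuals (r_x, r_y, r_z) are coded against this estimate.
    x_hat = round(r_q * q_r * math.cos(phi_q * q_phi))
    y_hat = round(r_q * q_r * math.sin(phi_q * q_phi))
    z_hat = round(r_q * q_r * laser_tan[i] - laser_z[i])
    return x_hat, y_hat, z_hat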
The G-PCC encoder 200 may proceed as follows:
● Encode the model parameters t̃(i) and z̃(i) and the quantization parameters q_r, q_ς, q_θ, and q_φ;
● Apply the geometry prediction scheme described in w19522 to the representation (r̃, φ̃, i);
A new predicted value using characteristics of LIDAR may be introduced. For example, the rotational speed of the LIDAR scanner about the z-axis is typically constant. Thus, the G-PCC encoder 200 may predict the current φ̃(j) as follows:
φ̃(j) = φ̃(j − 1) + n(j) × δ_φ(k)
where:
■ (δ_φ(k)), k = 1…K, is a set of potential speeds from which the encoder can select. The G-PCC encoder 200 may explicitly encode the index k into the bitstream, or the G-PCC decoder 300 may infer the index k from the context based on a deterministic strategy applied by both the G-PCC encoder 200 and the G-PCC decoder 300, and
■ n(j) is the number of skipped points. The G-PCC encoder 200 may explicitly encode n(j) into the bitstream, or the G-PCC decoder 300 may infer n(j) from the context based on a deterministic strategy applied by both the G-PCC encoder 200 and the G-PCC decoder 300. n(j) is also referred to as the "phi multiplier." n(j) may be used with the delta prediction value.
● Encode with each node the reconstruction residuals (r_x, r_y, r_z).
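As an illustration of the azimuth predictor described in the encoder steps above, the following hedged Python sketch shows how an encoder might evaluate the candidate speeds δ_φ(k) and the phi multiplier n(j) when predicting the quantized azimuth of the current point from the previous one (the names, the search range, and the exhaustive search are assumptions, not the reference implementation):

def predict_phi(phi_prev, phi_cur, speeds, max_skip=8):
    # Return (k, n, prediction) minimizing |phi_cur - (phi_prev + n * speeds[k])|.
    best = None
    for k, d in enumerate(speeds):           # candidate rotation speeds delta_phi(k)
        for n in range(1, max_skip + 1):     # candidate numbers of skipped points n(j)
            pred = phi_prev + n * d
            err = abs(phi_cur - pred)
            if best is None or err < best[0]:
                best = (err, k, n, pred)
    _, k, n, pred = best
    return k, n, pred

# The encoder would signal (or let the decoder infer) k and n, and then codec the
# azimuth residual phi_cur - pred.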
G-PCC decoder 300 may proceed as follows:
● Decode the model parameters t̃(i) and z̃(i) and the quantization parameters q_r, q_ς, q_θ, and q_φ;
● Decode the (r̃, φ̃, i) parameters associated with the nodes according to the geometry prediction scheme described in w19522;
● Compute the reconstructed coordinates (x̂, ŷ, ẑ) as described above;
● Decode the residuals (r_x, r_y, r_z).
○ Lossy compression may be supported by quantizing the reconstructed residuals (r_x, r_y, r_z).
● Compute the original coordinates (x, y, z) as follows:
x = x̂ + r_x
y = ŷ + r_y
z = ẑ + r_z
If the G-PCC encoder 200 applies quantization to the reconstructed residuals (r_x, r_y, r_z) or drops points, lossy compression may be implemented. The quantized reconstructed residuals may be calculated as follows:
r̃_x = sign(r_x) × floor(|r_x| / q_x + o_x)
r̃_y = sign(r_y) × floor(|r_y| / q_y + o_y)
r̃_z = sign(r_z) × floor(|r_z| / q_z + o_z)
where (q_x, o_x), (q_y, o_y), and (q_z, o_z) are quantization parameters controlling the precision of r̃_x, r̃_y, and r̃_z, respectively.
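The following Python sketch illustrates the residual quantization implied by the formulas above, together with a simple decoder-side dequantization and reconstruction; the inverse scaling by q is an assumption made for illustration, not the reference implementation:

import math

def quantize_residual(r, q, o):
    # r~ = sign(r) * floor(|r| / q + o)
    return int(math.copysign(math.floor(abs(r) / q + o), r))

def dequantize_residual(r_q, q):
    # Assumed approximate inverse: scale the quantized residual back by q.
    return r_q * q

# Decoder-side reconstruction of one coordinate in the lossy case (illustrative):
# x = x_hat + dequantize_residual(quantize_residual(r_x, q_x, o_x), q_x)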
Trellis (grid) quantization may be used to further improve RD (rate-distortion) performance results. Quantization parameters may be changed at the sequence/frame/slice/block level to achieve region-adaptive quality and for rate control purposes.
FIG. 8 is a flow chart illustrating an example motion estimation process of the G-PCC InterEM software. Two types of motion, a global motion matrix and local node motion vectors, are involved in the G-PCC InterEM software. Global motion parameters are defined as a rotation matrix and a translation vector that will be applied to all points in the predicted (reference) frame. The local node motion vector of a node of the octree is a motion vector that is applied only to points within that node in the predicted (reference) frame. Details of the motion estimation algorithm in InterEM are described below. Fig. 8 illustrates a flow chart of the motion estimation algorithm.
Given an input predicted (reference) frame and a current frame, G-PCC encoder 200 may first estimate global motion on a global scale (400). G-PCC encoder 200 may then apply the estimated global motion to the predicted (reference) frame (402). After applying global motion to the predicted (reference) frames, the G-PCC encoder 200 may estimate local motion at a finer scale (404), e.g., node level in an octree. Finally, the G-PCC encoder 200 may perform motion compensation (406) to encode the estimated local node motion vectors and points.
FIG. 9 is a flowchart illustrating an example process for estimating global motion. In the InterEM software, the global motion matrix is estimated by matching feature points between the predicted (reference) frame and the current frame. FIG. 9 illustrates a pipeline for estimating global motion. The global motion estimation algorithm can be divided into three steps: finding feature points (410), sampling feature point pairs (412), and motion estimation using a Least Mean Square (LMS) algorithm (414).
The algorithm defines feature points as those points that have a large change in position between the predicted frame and the current frame. For each point in the current frame, G-PCC encoder 200 finds the nearest point in the predicted frame and establishes a point pair between the current frame and the predicted frame. If the distance between the pairs of points is greater than the threshold, G-PCC encoder 200 treats the pairs of points as feature points.
After finding the feature points, the G-PCC encoder 200 performs sampling on the feature points to reduce the scale of the problem (e.g., by selecting a subset of the feature points to reduce the complexity of motion estimation). The G-PCC encoder 200 then applies an LMS algorithm to derive the motion parameters by attempting to reduce the error between the corresponding feature points in the predicted frame and the current frame.
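A hedged NumPy sketch of this pipeline is shown below. It selects feature points by nearest-neighbor distance, subsamples them, and fits a rotation matrix and translation vector by least squares; the use of scipy's KD-tree and the simple least-squares formulation are assumptions made for illustration, and many details of the InterEM implementation are omitted:

import numpy as np
from scipy.spatial import cKDTree

def estimate_global_motion(ref_pts, cur_pts, dist_thresh=2.0, max_pairs=2000):
    # 1) Feature points: pairs whose nearest-neighbor distance exceeds a threshold.
    d, idx = cKDTree(ref_pts).query(cur_pts)
    feat = d > dist_thresh
    src, dst = ref_pts[idx[feat]], cur_pts[feat]
    # 2) Subsample the feature pairs to reduce the scale of the problem.
    if len(src) > max_pairs:
        sel = np.random.choice(len(src), max_pairs, replace=False)
        src, dst = src[sel], dst[sel]
    # 3) Least-squares fit of dst ≈ src @ R.T + t.
    A = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(A, dst, rcond=None)
    R, t = sol[:3].T, sol[3]
    return R, t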
Fig. 10 is a flow chart illustrating an example process for estimating local node motion vectors. G-PCC encoder 200 may recursively estimate motion vectors for nodes of the octree. G-PCC encoder 200 may evaluate a cost function for selecting the most appropriate motion vector based on a Rate Distortion (RD) cost.
In the example of fig. 10, G-PCC encoder 200 receives the current node (420). If the current node is not divided into 8 children nodes, G-PCC encoder 200 determines a motion vector that will result in the lowest cost between the current node and the predicted node (422). On the other hand, if the current node is divided into 8 children nodes, G-PCC encoder 200 divides the current node into 8 children nodes (424), finds motion for each child node (426), and adds all returned estimated costs (428). That is, the G-PCC encoder 200 applies a motion estimation algorithm and obtains the total cost under the division condition by adding the estimated cost value of each child node. G-PCC encoder 200 may determine whether to divide or un-divide the node by comparing costs between dividing and un-dividing. If partitioning is performed, G-PCC encoder 200 may assign each child node its respective motion vector (or further partition the node into respective child nodes). If no partitioning is performed, G-PCC encoder 200 may assign its motion vector to the node. G-PCC encoder 200 may then compare the costs to determine whether to partition or not partition the current node (430).
Two parameters that may affect motion vector estimation performance are block size (BlockSize) and minimum prediction unit size (MinPUSize). The BlockSize defines the upper bound of the node size to which motion vector estimation is applied, while the MinPUSize defines the lower bound.
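The recursion described above might be outlined as follows in Python. This is a schematic sketch only; find_best_mv, mv_cost, and split_cost_overhead are hypothetical helpers, and block_size and min_pu_size play the roles of BlockSize and MinPUSize described in the preceding paragraph:

def estimate_local_motion(node, block_size, min_pu_size):
    # Nodes at or below MinPUSize (or that cannot split) get a single motion vector.
    if node.size <= min_pu_size or not node.can_split():
        mv = find_best_mv(node)                       # hypothetical motion search
        return mv_cost(node, mv), [(node, mv)]
    # Option 1: keep the node whole (only allowed at or below BlockSize).
    whole_cost, whole_assign = float("inf"), []
    if node.size <= block_size:
        mv = find_best_mv(node)
        whole_cost, whole_assign = mv_cost(node, mv), [(node, mv)]
    # Option 2: split into 8 children, recurse, and sum the returned costs.
    split_cost, split_assign = split_cost_overhead(node), []
    for child in node.split_into_8():
        c, a = estimate_local_motion(child, block_size, min_pu_size)
        split_cost += c
        split_assign += a
    # Choose the cheaper option by comparing RD costs.
    if whole_cost <= split_cost:
        return whole_cost, whole_assign
    return split_cost, split_assign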
U.S. Provisional Patent Application No. 63/090,657, filed on October 12, 2020, which was converted to U.S. Patent Application No. 17/495,28, filed in October 2021, describes an improved global motion estimation technique based on an iterative closest point approach. In this scheme, an initial translation vector is first estimated by minimizing the mean squared error between the current frame and the reference frame. When estimating the initial translation vector, a flag indicating whether a point belongs to the ground may be taken into account; for example, if the point is a ground point, the point is excluded from the estimation. The initial translation vector, combined with the identity matrix, may then be fed into an iterative closest point scheme or a similar scheme to estimate the rotation matrix and translation vector. Here, too, whether a point is a ground point may be taken into account, e.g., by excluding the point from the estimation. Alternatively, the rotation matrix may be estimated first based on the flag indicating whether the point is ground. The flag may be derived by the G-PCC encoder 200 and signaled to the G-PCC decoder 300, or the G-PCC encoder 200 and the G-PCC decoder 300 may both derive the flag. The flag may be derived based on a ground estimation algorithm; such an algorithm may be based on the height of the point, the density of the point cloud near the point, the relative distance of the point from the LIDAR origin/fixed point, etc.
In practical applications, objects such as automobiles and the ground regions of a point cloud often have different motion. For example, ground points may have zero or small motion, while objects may have larger motion. In the conventional approach to estimating global motion in the InterEM software, both ground points and object points are used to derive global motion. As a result, the estimated output may be inaccurate.
U.S. Provisional Patent Application No. 63/090,657, filed on October 12, 2020, describes several marking methods for classifying objects and ground. For example, in these methods, the G-PCC encoder 200 may derive a flag and signal the flag to the G-PCC decoder 300, or both the G-PCC encoder 200 and the G-PCC decoder 300 may derive the flag. The flag may be derived based on a ground estimation algorithm; such an algorithm may be based on the height of the point, the density of the point cloud near the point, the relative distance of the point from the LIDAR origin/fixed point, etc.
This disclosure describes techniques for marking surfaces and objects to improve the performance of global motion estimation. In particular, the G-PCC encoder 200 and the G-PCC decoder 300 may be configured to classify ground/road and object data in the point cloud, which may improve the performance of global motion estimation.
Fig. 11 is a diagram illustrating an example of classifying a cloud into ground (road) and objects using two thresholds on the z-values of points according to the techniques of this disclosure. The G-PCC encoder 200 and the G-PCC decoder 300 may be configured to use the height (or z-value) of points in the cloud to classify ground and object points. In one example, G-PCC encoder 200 and G-PCC decoder 300 may be configured with definitions of two thresholds, e.g., z_top 452 and z_bottom 454 as shown in FIG. 11.
If the height (z value) of a point is less than z_bottom 454 or higher than z_top 452, then G-PCC encoder 200 and G-PCC decoder 300 may classify the point as an object. Otherwise, if the point has a height (z value) between z_bottom 454 and z_top 452, then G-PCC encoder 200 and G-PCC decoder 300 may classify the point as a ground (road).
In some examples, G-PCC encoder 200 and G-PCC decoder 300 may use a set of value ranges to specify ground points and classify as ground any point that satisfies at least one value range. For example, for the (x, y, z) coordinates, G-PCC encoder 200 and G-PCC decoder 300 may be configured with a specification of the i-th value range {(x_min_i, x_max_i), (y_min_i, y_max_i), (z_min_i, z_max_i)}. The G-PCC encoder 200 and G-PCC decoder 300 may be configured with N such ranges, with i in [1, N]. If, for some value of i in [1, N] (alternatively, [0, N−1]), the point at (x, y, z) satisfies (x_min_i ≤ x ≤ x_max_i) && (y_min_i ≤ y ≤ y_max_i) && (z_min_i ≤ z ≤ z_max_i), G-PCC encoder 200 and G-PCC decoder 300 may classify the point as a ground point.
The G-PCC encoder 200 and the G-PCC decoder 300 may be configured with minimum values as low as minus infinity and maximum values as high as infinity. In the example above, in which points with z-values between z_bottom 454 and z_top 452 are classified as ground and the other points are classified as objects, x_min_i and y_min_i may be set to minus infinity and x_max_i and y_max_i may be set to infinity, while z_min_i may be set to z_bottom 454 and z_max_i may be set to z_top 452.
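A minimal NumPy sketch of both classification variants described above (the two z-thresholds and the more general list of value ranges) might look as follows; the function and array names are illustrative only:

import numpy as np

def classify_by_z(points, z_bottom, z_top):
    # points: (N, 3) array of (x, y, z); returns True for ground, False for object.
    z = points[:, 2]
    return (z >= z_bottom) & (z <= z_top)

def classify_by_ranges(points, ranges):
    # ranges: list of ((x_min, x_max), (y_min, y_max), (z_min, z_max)) tuples.
    ground = np.zeros(len(points), dtype=bool)
    for (xmin, xmax), (ymin, ymax), (zmin, zmax) in ranges:
        ground |= ((points[:, 0] >= xmin) & (points[:, 0] <= xmax) &
                   (points[:, 1] >= ymin) & (points[:, 1] <= ymax) &
                   (points[:, 2] >= zmin) & (points[:, 2] <= zmax))
    return ground

# The two-threshold case corresponds to the single range
# ((-np.inf, np.inf), (-np.inf, np.inf), (z_bottom, z_top)).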
When the G-PCC encoder 200 is configured to quantize the point cloud by a scaling factor prior to encoding, the G-PCC encoder 200 and the G-PCC decoder 300 may also quantize the thresholds by the same scaling factor.
Additionally or alternatively, the G-PCC encoder 200 and the G-PCC decoder 300 may be configured to use the output of the classification of points (e.g., classification as road and ground points) in global motion estimation and prediction. As described above, the G-PCC encoder 200 and the G-PCC decoder 300 may estimate global motion according to techniques of InterEM software or the method described in U.S. provisional application No. 63/090,657.
Alternatively, in some examples, G-PCC encoder 200 and G-PCC decoder 300 may derive two global motion sets. The G-PCC encoder 200 and the G-PCC decoder 300 may use the first set of global motion information to predict the ground/road points and the second set of global motion information to predict the object points. To derive a global set of motions for the ground/road, only points with "ground/road" markers may be used. To derive a global set of motions of an object, only points with a marker "object" can be used.
As yet another example, G-PCC encoder 200 and G-PCC decoder 300 may derive only one global motion set to predict an object point. In this example, the G-PCC encoder 200 and the G-PCC decoder 300 may use zero motion (translation and rotation set equal to zero) to predict the ground/road point.
In some examples, which may be used in addition to the various techniques described above, G-PCC encoder 200 and G-PCC decoder 300 may define thresholds for different levels of sharing. For example, G-PCC encoder 200 and G-PCC decoder 300 may independently determine thresholds for different frames. G-PCC encoder 200 may determine a threshold value for each frame and encode data representing the threshold value in the bitstream such that G-PCC decoder 300 may determine the threshold value from the encoded data of the bitstream.
In some examples, G-PCC encoder 200 and G-PCC decoder 300 may define the threshold at a group of pictures (GOP) level. In this example, all frames in the GOP may share the same threshold. The G-PCC encoder 200 and the G-PCC decoder 300 may determine the shared threshold at the start of a GOP and codec the threshold data along with the coded information for the first frame of the GOP in coding order.
In some examples, G-PCC encoder 200 and G-PCC decoder 300 may define the thresholds at a sequence level. That is, all frames in a sequence may share the same threshold. In this example, the threshold may be specified in encoder configuration data (e.g., an encoder configuration file) of the G-PCC encoder 200, and the G-PCC encoder 200 may encode data representing the threshold in the bitstream such that the G-PCC decoder 300 may determine the threshold from the encoded data.
G-PCC encoder 200 may use various techniques to derive thresholds applied to two or more sets of frames (e.g., GOP, sequence, etc.):
● In the simplest case, G-PCC encoder 200 may select the threshold of the sequentially first frame in the set as the threshold of the frames in the set. The sequential first frame may be a sequential first frame in an output order or a decoding order of the point cloud.
● In some examples, G-PCC encoder 200 may derive the threshold used from a weighted average derived for/applicable to two or more frames in the set. For example, if there are 10 frames in the set and t1_i, t2_i refer to the thresholds derived for the i-th frame, then for n equal to 1 and 2 the final threshold may be derived as tn = Σ_{i=1…10} w_i × tn_i, where w_i is the weight assigned to the i-th frame (see the sketch following this list).
One example may set the weights uniformly for all frames in the set. Weights may also be assigned so that only some frames are used to calculate the final threshold; for example, every 8th frame may be given a non-zero weight, while all other frames may be given a weight of 0. Weights may also be specified based on the temporal IDs of the point cloud frames; frames with a lower temporal ID may be given a larger weight, while frames with a higher temporal ID may be given a smaller weight.
● In some alternatives, G-PCC encoder 200 may be configured with a constraint that the sum of the weights used to derive the threshold is equal to 1.
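The sketch below illustrates the weighted-average derivation outlined in the list above; per_frame_thresholds holds the (t1_i, t2_i) pairs derived per frame, and weights is assumed to sum to 1:

def combine_thresholds(per_frame_thresholds, weights):
    # per_frame_thresholds: list of (t1_i, t2_i); weights: one weight per frame, summing to 1.
    t1 = sum(w * t1_i for w, (t1_i, _) in zip(weights, per_frame_thresholds))
    t2 = sum(w * t2_i for w, (_, t2_i) in zip(weights, per_frame_thresholds))
    return t1, t2

# Example: only every 8th frame contributes (the weights would then be renormalized to sum to 1).
# weights = [1.0 if i % 8 == 0 else 0.0 for i in range(10)]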
In some examples, G-PCC encoder 200 may derive a threshold and encode data in the bitstream such that G-PCC decoder 300 may determine the threshold from the encoded data. In some examples, the threshold may be derived from the same technique by both the G-PCC encoder 200 and the G-PCC decoder 300.
In some examples, when G-PCC encoder 200 signals the threshold, G-PCC encoder 200 may signal the threshold in a Sequence Parameter Set (SPS), a Geometry Parameter Set (GPS), or a geometry data unit header (GDH). In one example, where the G-PCC encoder 200 signals the threshold at the GPS/GDH level, the G-PCC encoder 200 may be configured to signal the threshold only conditionally, for example, only when the angular mode is enabled. Thus, when the angular mode is enabled, the G-PCC encoder 200 and the G-PCC decoder 300 may be configured to codec the threshold data in the GPS/GDH, and when the angular mode is disabled, the G-PCC encoder 200 and the G-PCC decoder 300 may be configured to omit the threshold data. Alternatively, the G-PCC encoder 200 and the G-PCC decoder 300 may be configured to unconditionally codec the threshold data.
The G-PCC encoder 200 and the G-PCC decoder 300 may codec the threshold data as se(v) or ue(v) values. An se(v) value is a signed integer 0-th order Exp-Golomb-coded syntax element with the left bit first, and a ue(v) value is an unsigned integer 0-th order Exp-Golomb-coded syntax element with the left bit first. In one example, the GPS may be modified as shown in Table 1 below, where [add: "added text"] represents an addition to the existing G-PCC standard, and G-PCC encoder 200 and G-PCC decoder 300 may be configured according to the modified standard:
TABLE 1
Alternatively, if the threshold is large enough, fixed-length codec may also be performed, including indicating the number of bits to be used for the fixed-length codec, followed by the actual fixed-length codec of the threshold using s(v) codec, e.g., according to Table 2 below. An s(v) value is a signed, fixed-length coded syntax element:
TABLE 2
As shown in fig. 11, in a scene in which top, road, and bottom regions are classified, there may be at most two thresholds (z_bottom, z_top). In a typical scenario, the origin of the frame may be the center of the LIDAR system, which implies that both z_top and z_bottom may be negative, as the LIDAR system center/frame origin may be much higher than the road. Second, the thresholds may always be arranged in descending order, i.e., z_top > z_bottom. In this case, the G-PCC encoder 200 and the G-PCC decoder 300 may be configured to codec the first threshold directly, and for the second threshold, the G-PCC encoder 200 and the G-PCC decoder 300 may codec the difference between the second threshold and the first threshold. Furthermore, since this offset is always negative, its sign can be inferred, and therefore only the magnitude of the difference may be coded. Furthermore, the difference cannot be zero, so the magnitude of the difference minus 1 may be coded (see the sketch following Table 3). In some scenarios, a single threshold may be sufficient when the bottom region is not very distinct. Thus, the G-PCC encoder 200 and the G-PCC decoder 300 may be configured to codec a flag indicating whether the second threshold is present. The syntax modification of the GPS (according to which the G-PCC encoder 200 and the G-PCC decoder 300 may be configured) may be as shown in Table 3 below:
TABLE 3
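The following Python sketch illustrates the delta-coding idea described above (code the first threshold directly; for the second threshold, code only the magnitude of the difference minus 1, since the sign is known and the difference is non-zero). The write_se/write_ue and read_se/read_ue calls are placeholders for se(v)/ue(v) entropy-coding routines:

def encode_thresholds(write_se, write_ue, z_top, z_bottom=None):
    write_se(z_top)                        # first threshold, coded directly
    has_second = z_bottom is not None
    write_ue(1 if has_second else 0)       # flag: is the second threshold present?
    if has_second:
        write_ue(z_top - z_bottom - 1)     # magnitude of the difference minus 1 (z_top > z_bottom)

def decode_thresholds(read_se, read_ue):
    z_top = read_se()
    if read_ue():
        z_bottom = z_top - (read_ue() + 1)
        return z_top, z_bottom
    return z_top, None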
In another example, threshold 1 may be signaled at all times, and threshold 0 may be signaled conditionally based on the value of the flag.
In another example, the midpoint of the two thresholds may be signaled (m), and the distance (w) of the midpoint from either threshold may be signaled; these two thresholds can then be derived as m-w and m+w. These values may be signaled using fixed length or variable length codecs.
In another example, G-PCC encoder 200 and G-PCC decoder 300 may codec the data of these thresholds in the GPS level, possibly overriding/refining the thresholds in the GDH level.
In another example, the G-PCC encoder 200 and the G-PCC decoder 300 may codec these thresholds along with global motion information (rotation and translation factors).
In another example, G-PCC encoder 200 and G-PCC decoder 300 may codec thresholds in separate parameter sets, such as parameter sets dedicated to motion-related parameters.
In some examples, in addition to or as an alternative to the techniques discussed above, G-PCC encoder 200 and G-PCC decoder 300 may be configured to implicitly classify points, e.g., as objects or ground points, by encoding and decoding the points in the slices corresponding to the categories. For example, if the point cloud includes object and ground (or road) point classifications, the G-PCC encoder 200 and the G-PCC decoder 300 may encode object slices that include a first subset of points that are all classified as object points, and ground or road slices that include a second subset of points that are all classified as ground or road points. More than two classifications may be used in this manner. In general, G-PCC encoder 200 and G-PCC decoder 300 may be configured to determine that each class of points has one slice, and that all points within a given slice will be classified according to the corresponding class of the given slice. In this example, an explicit classification algorithm is not necessary, which may reduce the computations that G-PCC encoder 200 and G-PCC decoder 300 are to perform.
More generally, G-PCC encoder 200 and G-PCC decoder 300 may be configured to perform the following techniques, alone or in any combination with various other techniques of the present disclosure:
1. The points of the point cloud are classified (or partitioned) into M groups. G-PCC encoder 200 and G-PCC decoder 300 may be configured according to one or more of the techniques of this disclosure to enable classification of points into M groups.
a. Examples of groups include roads, dividers, nearby cars or vehicles, buildings, signs, traffic lights, pedestrians, and the like. Note that each car/vehicle/building, etc. may be classified into separate groups.
b. The group may include points representing the object or points adjacent to each other in the spatial domain.
2. G-PCC encoder 200 and G-PCC decoder 300 may specify N slice groups (N ≤ M). G-PCC encoder 200 and G-PCC decoder 300 may associate each of the M groups with one of the N slice groups. The G-PCC encoder 200 and the G-PCC decoder 300 may codec together the points belonging to a slice group.
a. For example, a "ground" slice group may include points belonging to a "road" and "divider" group, a "static" slice group may include points belonging to a "building" and "sign", and a "dynamic" slice group may include groups such as automobiles/vehicles or "pedestrians".
b. More generally, G-PCC encoder 200 and G-PCC decoder 300 may encode one or more groups that share some attributes into a slice group. For example, groups having similar relative motion with respect to the LIDAR sensor/vehicle may be encoded into one slice group.
c. In another example, G-PCC encoder 200 and G-PCC decoder 300 may be configured to determine that each point group having certain attributes belongs to a separate slice group.
d. Points in a group may be associated with more than one slice group (e.g., the points may be repeated).
3. G-PCC encoder 200 and G-PCC decoder 300 may codec the points belonging to each slice group in one or more slices.
4.G-PCC encoder 200 and G-PCC decoder 300 may identify slices belonging to a slice group based on an index value (e.g., slice index) or a flag (slice type or slice group type).
a. Each slice group may be associated with a slice type/slice group type that may be signaled in each slice of the slice group.
i. For example, an index/flag in [0, N−1] may be associated with each slice group, and G-PCC encoder 200 and G-PCC decoder 300 may codec the index/flag "i" in slices belonging to the i-th slice group (0 ≤ i ≤ N−1).
in another example, the point cloud may have two slice groups S1 and S2, and each slice group may be encoded into 3 slices, such that there are 6 slices in total. Each slice of S1 may have a slice type of 0, while each slice of S2 may have a slice type of 1.
b. In another example, each slice may be associated with a slice number or slice index; the slices belonging to a particular slice group may be identified by the slice number/index.
i. For example, the point cloud may have two slice groups S1 and S2, and each slice group may be encoded into 3 slices, so that there are 6 slices in total. The slices of S1 may have slice numbers 0,1, and 2, while the slices of S2 may have slice numbers 3, 4, and 5.
c. In some examples, the slice identifier may be a combination of a slice group identifier/type and a slice number.
i. For example, the point cloud may have two slice groups S1 and S2, and each slice group may be encoded into 3 slices, so that there are 6 slices in total. The slices of S1 may have identifiers (0, 0), (0, 1), (0, 2), where the first number of each tuple is the slice type and the second number is the slice number within the slice group. Similarly, the slice of S2 may have identifiers (1, 0), (1, 1), (1, 2).
d. The slice type, slice group type, slice number, and/or slice identifier may be signaled in the slice.
5. The G-PCC encoder 200 and the G-PCC decoder 300 may codec the data of a slice with reference to slices used for prediction. A slice may be predicted with reference to another slice. The reference slice may belong to the same picture (intra prediction) or to another picture (inter prediction).
a.G-PCC encoder 200 and G-PCC decoder 300 may identify the reference slice using one or more of:
i. reference frame number or frame counter
ii. Reference slice identifier (slice type/slice group type, slice number, slice identifier, etc.)
b. In some examples, G-PCC encoder 200 and G-PCC decoder 300 may be configured according to the restriction that a slice can only refer to other slices belonging to the same slice type/slice group type. In this case, the reference slice type/slice group type need not be signaled.
c. In another example, slices may be allowed to reference all points belonging to a frame or slice group; in this case, the reference slice number may not be signaled, as all slices of the frame/slice group may be referenced for prediction.
d. In another example, two or more slice identifiers may be signaled to identify a plurality of slices that may be referenced for prediction.
6. G-PCC encoder 200 and G-PCC decoder 300 may associate a first set of motion parameters with each point; the motion parameters may be used to compensate the position of the points; the compensated positions may be used as a reference for prediction.
a. In one example, the motion parameter associated with a point may be a motion parameter associated with a slice containing the point.
b. In one example, the motion parameter associated with a slice may be a motion parameter associated with a slice group that includes the slice.
c. In one example, the motion parameter associated with a slice group may be a motion parameter associated with a frame containing the slice group.
d. The motion parameters may be signaled in a parameter set such as SPS, GPS, etc., a slice header, or other portion of the bitstream.
e. The above description relates to motion parameters, but this can be applied to any set of motion parameters (e.g., rotation matrix/parameters, translation vector/parameters, etc.)
f. In some examples, the motion parameters used to apply motion compensation to points in the reference frame may be signaled in the current frame or in frames other than the reference frame. For example, if frame 1 predicts using a point from frame 0, frame 1 may be used to signal the motion parameters applied to the point in frame 0.
g. In one example, the reference index of a slice/slice group of a reference frame may be signaled (in a parameter set or slice or other syntax structure) in the current frame.
i. In one example, one or more tuples (motion parameters, reference indices) may be signaled with a current frame (or slice), where the reference indices identify points in the reference frame (slice/slice group/region) to which the corresponding motion parameters apply.
h. In one example, the motion parameter may be a global set of motion parameters applied to all points in a slice, slice group, region, or frame.
One or more techniques of this disclosure may also be applied to attributes, e.g., in addition to or as an alternative to points.
In some examples, G-PCC encoder 200 and G-PCC decoder 300 may be configured to specify one or more regions in a point cloud. The G-PCC encoder 200 and the G-PCC decoder 300 may also associate motion parameters with each region. The G-PCC encoder 200 and the G-PCC decoder 300 may codec data in the bitstream representing motion parameters associated with the region. The G-PCC encoder 200 and the G-PCC decoder 300 may use motion parameters to compensate for the position of the points. The G-PCC encoder 200 and the G-PCC decoder 300 may use the compensated points as references/predictions for encoding and decoding the position of the points in the current frame. In some cases, classifying using regions (as compared to slices) may achieve better compression performance because G-PCC encoder 200 and G-PCC decoder 300 may codec points belonging to different regions together.
1. The G-PCC encoder 200 and the G-PCC decoder 300 may codec data representing one or more regions in a point cloud.
a.G-PCC encoder 200 and G-PCC decoder 300 may codec data representing a value N for the number of regions and representing parameters specifying each of the N regions.
i. In some examples, N may be limited to a range of values (e.g., N may be limited to less than a fixed value, such as 10).
b.G-PCC encoder 200 and G-PCC decoder 300 may codec parameters for each region in the bitstream. In some examples, the region may be specified using one or more of the following parameters:
i. Upper and lower bounds defining the x, y, and z coordinates of the region (or the coordinates of any other coordinate system used to codec the point cloud).
in some examples, one or more of the upper or lower bounds may not be specified; in this case, the G-PCC encoder 200 and the G-PCC decoder 300 may use default values suitable for the coordinates and the coordinate system as inferred values.
1. For example, in the spherical coordinate system (r, phi, laserId), if no bounds for phi are signaled, it may be inferred that the upper and lower bounds correspond to 360 degrees and 0 degrees, respectively.
2. A motion parameter may be associated with each region; motion compensation may be applied to one or more points belonging to the region to obtain compensated positions/points; the compensated position/point may be used as a reference for predicting a point in the current point cloud frame.
a. One or more methods of signaling motion parameters disclosed in the present disclosure may be applied to signaling motion parameters of each region. For example, G-PCC encoder 200 and G-PCC decoder 300 may codec motion parameters of each region in a parameter set (e.g., SPS, GPS) or other portions of the bitstream (e.g., slice header or separate syntax structures).
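A hedged Python sketch of the region-based signaling and compensation described above is shown below; the Region class, the field names, and the dense-matrix motion compensation are illustrative simplifications (a real implementation would entropy-code these fields in a parameter set or other syntax structure):

import numpy as np

class Region:
    def __init__(self, bounds, rotation, translation):
        # bounds: ((x_min, x_max), (y_min, y_max), (z_min, z_max)); an unspecified
        # bound would be inferred as a coordinate-system default (e.g., -inf/+inf).
        self.bounds = bounds
        self.rotation = np.asarray(rotation)        # 3x3 rotation matrix
        self.translation = np.asarray(translation)  # 3-element translation vector

    def contains(self, pts):
        m = np.ones(len(pts), dtype=bool)
        for axis, (lo, hi) in enumerate(self.bounds):
            m &= (pts[:, axis] >= lo) & (pts[:, axis] <= hi)
        return m

def compensate_reference(ref_pts, regions):
    # Apply each region's motion parameters to the reference points it contains;
    # the compensated points serve as the prediction reference for the current frame.
    out = ref_pts.astype(float)
    for reg in regions:
        m = reg.contains(ref_pts)
        out[m] = ref_pts[m] @ reg.rotation.T + reg.translation
    return out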
The G-PCC encoder 200 and the G-PCC decoder 300 may be configured to perform any of the various techniques of this disclosure in various combinations. For example, the motion parameters of the reference frame may be specified by region, while the current frame may be specified with one or more slice groups; the slice group may be associated with a region (explicitly or implicitly), and reference points from the region may be used to predict points of the slice group. In another example, points in a region may be encoded as slices or slice groups.
Fig. 12 is a diagram 460 illustrating an example derivation of thresholds using a histogram in accordance with the techniques of this disclosure. Graph 460 represents an example histogram of the heights (z-values) of collected point cloud data. The G-PCC encoder 200 may use the histogram to calculate the thresholds z_bottom 462 and z_top 464.
In an example implementation, G-PCC encoder 200 may downscale (subsample) the cloud into a histogram with hist_bin_size bins, where hist_bin_size may be defined as follows:
hist_bin_size=int((max_box_t-min_box_t)/hist_scale)
where max_box_t and min_box_t define the range of z-values in the cloud that will be used to obtain the thresholds, and hist_scale controls the bin width. max_box_t may be lower than the maximum z-value in the cloud, and min_box_t may be higher than the minimum z-value in the cloud.
Next, G-PCC encoder 200 may derive a histogram of the points whose z-values are in the range min_box_t to max_box_t (the following is example Python code, although other languages or hardware implementations may also be used):
n,bins=np.histogram(source_points_ori,hist_bin_size,(min_box_t,max_box_t))
In this example, np represents the NumPy library (numpy.org), and source_points_ori holds the z-values of the points that lie in the range min_box_t to max_box_t.
Thereafter, G-PCC encoder 200 may calculate the standard deviation (std) 466 of the histogram based on, for example, the following Python code (although other languages or hardware implementations may be used):
mids=0.5*(bins[1:]+bins[:-1])
probs=n/np.sum(n)
mean=np.sum(probs*mids)
std=np.sqrt(np.sum(probs*(mids-mean)**2))
Finally, in this example, the G-PCC encoder 200 may derive z_bottom 462 and z_top 464 as follows: G-PCC encoder 200 determines the bin index (top_idx_n; bin 470 in the example of fig. 12) of the histogram bin with the maximum count of points. G-PCC encoder 200 determines the thresholds z_top (z_top 464) and z_bottom (z_bottom 462) by shifting std-related values (e.g., 1×std 466 and 1.5×std 468 in the example of fig. 12) rightward and leftward from bin 470 (i.e., the bin with the largest count of points). The following Python code represents one example technique that may be used to derive the thresholds:
top_idx_n=int(np.where(n==n.max())[0][0])  # index of the first bin with the maximum count
z_top=min(bins[top_idx_n]+w_top*std,max_box_t)
z_bottom=max(bins[top_idx_n]-w_bottom*std,min_box_t)  # clamp to min_box_t assumed here; the source text reads max_box_t
Where w_top and w_bottom are predefined positive values.
In the example of fig. 12, for the 100 th frame of the collected dataset, (max_box_t, min_box_t) is set equal to (-500, -4000). In fig. 12, (w_top, w_bottom) is set equal to (1, 1.5).
Fig. 13 is a conceptual diagram illustrating marking points in a point cloud 470 as road points 474 and object points 472 according to the techniques of the present disclosure. An automobile equipped with a LIDAR system (not shown in fig. 13) typically located at a point 476 may collect data from the surrounding environment to construct a point cloud 470. A G-PCC encoder within an automobile, such as G-PCC encoder 200, may determine a threshold for classifying points of point cloud 470 as road points or object points. After determining the threshold (e.g., according to the technique of fig. 12), G-PCC encoder 200 may mark the points of point cloud 470 as ground/road points 474 or object points 472.
Fig. 13 shows a visualization of these sets of points, where object points 472 are dark shaded and road points 474 (also referred to as ground points) are light shaded. As can be seen in the example of fig. 13, light-shaded road points 474 are typically distributed on a flat plane (e.g., the ground or a road on the ground), while dark-shaded object points 472 typically define objects, such as fences, signs, buildings, or other objects near the location of the automobile when the point cloud 470 is generated.
Fig. 14 is a flowchart illustrating an example method of encoding a point cloud according to the techniques of this disclosure. The method of fig. 14 is explained with respect to the G-PCC encoder 200 of fig. 1 and 2. Other G-PCC encoding devices may be configured to perform this or similar methods.
Initially, G-PCC encoder 200 may obtain a point cloud to be encoded, such as current cloud 140 of fig. 3. The point cloud may include a set of points, each point having a geometric location (e.g., represented by (x, y, z) coordinates) and one or more attributes. G-PCC encoder 200 may then determine a height value of a point in the point cloud, for example, using the z-value of the geometric location of the point (500). G-PCC encoder 200 may then determine top and bottom thresholds (502) and use these thresholds to classify ground points and object points in the point cloud (504). For example, G-PCC encoder 200 may determine the threshold using the techniques discussed above with respect to fig. 11 and 12. G-PCC encoder 200 may then classify points between the top and bottom thresholds as ground points, while other points are classified as object points. G-PCC encoder 200 may also encode a data structure (e.g., SPS, GPS, GDH, etc.) that includes data representing the top and bottom thresholds. The data structure may conform to the examples of any of tables 1-3 above.
G-PCC encoder 200 may then calculate a global motion vector for the object point (506). For example, as shown in fig. 3, G-PCC encoder 200 may calculate a global motion vector 132 for the set of object points. The global motion vector may generally represent the motion vector that best produced the predicted cloud 134 (i.e., produced the predicted cloud that includes the points that most closely match the current cloud 140 relative to the reference cloud 130). After obtaining the global motion vector, G-PCC encoder 200 may generate predicted cloud 134 using global motion vector 132 with respect to reference cloud 130 (508).
G-PCC encoder 200 may then use predicted cloud 134 to determine a context for encoding occupancy of nodes of current cloud 140 by the determined object points (510). G-PCC encoder 200 may also use contextual entropy encoding of data representing the occupancy of object points to nodes (512). Specifically, for a given node of current cloud 140, G-PCC encoder 200 may determine whether the corresponding node (having the same size and location within predicted cloud 134) is occupied by at least one object point. If the corresponding node is occupied (i.e., includes at least one object point), G-PCC encoder 200 may determine a context for encoding a value indicating whether the current node is occupied or not to have a high likelihood of indicating that the current node of current cloud 140 is also occupied. If the corresponding node is unoccupied (i.e., does not include any object points), G-PCC encoder 200 may determine a context for encoding a value indicating whether the current node is occupied to have a high likelihood of indicating that the current node of current cloud 140 is unoccupied.
G-PCC encoder 200 may then encode the value using the determined context. If the current node is not occupied, G-PCC encoder 200 may proceed to the new node. On the other hand, if the current node is occupied, the G-PCC encoder 200 may divide the current node into eight child nodes and encode the occupancy data of each of the eight child nodes in the same manner.
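The context selection described in the preceding steps can be sketched as follows. This is a simplified Python illustration only, with the arithmetic coder and the context models reduced to placeholder objects and methods (encode_bit, has_point_in, etc. are assumptions):

def encode_occupancy(node, predicted_cloud, arith_encoder, ctx_pred_occupied, ctx_pred_empty):
    # Pick the context from the co-located node of the motion-compensated predicted
    # cloud: one context expects "occupied", the other expects "empty".
    colocated_occupied = predicted_cloud.has_point_in(node.bounds())
    ctx = ctx_pred_occupied if colocated_occupied else ctx_pred_empty
    occupied = node.has_points()
    arith_encoder.encode_bit(occupied, ctx)        # context-adaptive binary coding
    if occupied:
        for child in node.split_into_8():          # recurse into the eight child nodes
            encode_occupancy(child, predicted_cloud, arith_encoder,
                             ctx_pred_occupied, ctx_pred_empty)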
G-PCC encoder 200 may then separately encode data representing the occupancy of the ground point-to-node (514). For example, G-PCC encoder 200 may use a second different global motion vector, local motion vector, and/or intra prediction to encode data representing the occupancy of a ground point to node.
In this way, the method of fig. 14 represents an example of a method of encoding point cloud data, including determining a height value of a point in a point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
Fig. 15 is a flowchart illustrating an example method of decoding a point cloud in accordance with the techniques of this disclosure. The method of fig. 15 is explained as being performed by the G-PCC decoder 300 of fig. 1 and 4. However, in other examples, other decoding devices may be configured to perform the method or similar methods.
Initially, G-PCC decoder 300 may determine top and bottom thresholds (520). For example, G-PCC decoder 300 may decode a data structure (e.g., SPS, GPS, GDH, etc.) that includes data representing top and bottom thresholds. The data structure may conform to the examples of any of tables 1-3 above. G-PCC decoder 300 may also decode a global motion vector (522) for the object point, i.e., a point within a node having a height value outside of the range between the top and bottom thresholds. For example, G-PCC decoder 300 may use global motion vectors to decode data representing occupancy of nodes above a top threshold and/or below a bottom threshold, as described below.
G-PCC decoder 300 may use a global motion vector (e.g., global motion vector 152) with respect to reference cloud 150 to form a predicted cloud (e.g., predicted cloud 154 of fig. 5) (524). G-PCC decoder 300 may then use points within predicted cloud 154 to determine a context for decoding data representing occupancy of nodes in current cloud 160 (526). Specifically, for a given node of the current cloud 160, the G-PCC decoder 300 may determine whether the corresponding node (having the same size and location within the predicted cloud 154) is occupied by at least one object point. If the corresponding node is occupied (i.e., includes at least one object point), G-PCC decoder 300 may determine a context for decoding a value indicating whether the current node is occupied or not to have a high likelihood of indicating that the current node of current cloud 160 is also occupied. If the corresponding node is unoccupied (i.e., does not include any object points), G-PCC decoder 300 may determine a context for encoding a value indicating whether the current node is occupied to have a high likelihood of indicating that the current node of current cloud 160 is unoccupied.
G-PCC decoder 300 may then entropy decode the occupied data representing the node using the context (528). G-PCC decoder 300 may proceed to a new node if the decoded data indicates that the current node is unoccupied. On the other hand, if the current node is occupied, the G-PCC decoder 300 may divide the current node into eight child nodes and decode the occupancy data of each of the eight child nodes in the same manner.
G-PCC decoder 300 may also separately decode data representing occupancy of the node by the ground point, e.g., using different global motion vectors, local motion vectors, and/or intra-prediction (530).
Fig. 16 is a conceptual diagram illustrating a laser package 600 (such as a LIDAR sensor or other system including one or more lasers) scanning points in three-dimensional space. The laser package 600 may correspond to the LIDAR 380 of fig. 7. The data source 104 (fig. 1) may include a laser package 600.
As shown in fig. 16, a laser package 600 may be used to capture a point cloud, i.e., a sensor scans points in 3D space. However, it should be understood that some point clouds are not generated by actual LIDAR sensors, but may be encoded as if they were generated by actual LIDAR sensors. In the example of fig. 16, the laser package 600 includes a LIDAR head 602, the LIDAR head 602 including a plurality of lasers 604A-604E (collectively "lasers 604") arranged in a vertical plane at different angles relative to an origin. The laser package 600 may be rotated about a vertical axis 608. The laser package 600 may use the returned laser light to determine the distance and location of the points of the point cloud. The laser beams 606A-606E (collectively, "laser beams 606") emitted by the lasers 604 of the laser package 600 may be characterized by a set of parameters. The distances represented by arrows 610, 612 represent example laser correction values for lasers 604B, 604A, respectively.
Some lasers 604 may generally identify object points, while other lasers 604 may generally identify ground points. Using the techniques of this disclosure, these points may be classified as ground points or object points and encoded or decoded accordingly.
FIG. 17 is a conceptual diagram illustrating an example ranging system 900 that may be used with one or more techniques of this disclosure. In the example of fig. 17, ranging system 900 includes an illuminator 902 and a sensor 904. The illuminator 902 may emit light 906. In some examples, illuminator 902 can emit light 906 as one or more laser beams. The light 906 may be one or more wavelengths, such as infrared wavelengths or visible wavelengths. In other examples, light 906 is not a coherent laser. When light 906 encounters an object, such as object 908, light 906 produces return light 910. The return light 910 may include back-scattered light and/or reflected light. The return light 910 may pass through a lens 911, the lens 911 directing the return light 910 to create an image 912 of the object 908 on the sensor 904. The sensor 904 generates a signal 914 based on the image 912. Image 912 may include a set of points (e.g., represented by small dots in image 912 of fig. 17).
In some examples, the illuminator 902 and the sensor 904 may be mounted on a rotating structure such that the illuminator 902 and the sensor 904 capture a 360 degree view of the environment. In other examples, ranging system 900 may include one or more optical components (e.g., mirrors, collimators, diffraction gratings, etc.) that enable illuminator 902 and sensor 904 to detect objects within a particular range (e.g., up to 360 degrees). Although the example of fig. 17 shows only a single illuminator 902 and sensor 904, ranging system 900 may include multiple illuminator sets and sensor sets.
In some examples, illuminator 902 generates a structured light pattern. In such an example, ranging system 900 may include a plurality of sensors 904 on which respective images of the structured light pattern are formed. Ranging system 900 may use the differences between the images of the structured light pattern to determine a distance to object 908 from which the structured light pattern is backscattered. Structured light based ranging systems can have a high level of accuracy (e.g., accuracy in the sub-millimeter range) when the object 908 is relatively close to the sensor 904 (e.g., 0.2 meters to 2 meters). Such a high level of accuracy may be useful in facial recognition applications, such as unlocking mobile devices (e.g., mobile phones, tablet computers, etc.) and security applications.
In some examples, ranging system 900 is a time-of-flight (ToF) based system. In some examples where ranging system 900 is a ToF-based system, illuminator 902 generates pulses of light. In other words, the illuminator 902 can modulate the amplitude of the emitted light 906. In such an example, the sensor 904 detects return light 910 from the pulses of light 906 generated by the illuminator 902. Ranging system 900 may then determine a distance to object 908 from which light 906 is backscattered based on the delay between the time light 906 is emitted and the time it is detected and the known speed of light in air. In some examples, illuminator 902 can modulate the phase of the emitted light 906 instead of (or in addition to) modulating the amplitude of the emitted light 906. In such an example, the sensor 904 may detect the phase of the return light 910 from the object 908 and determine the distance to the point on the object 908 using the speed of light and the time difference between the time the illuminator 902 generated the light 906 of the particular phase and the time the sensor 904 detected the return light 910 of the particular phase.
In other examples, the point cloud may be generated without the use of the illuminator 902. For example, in some examples, the sensor 904 of the ranging system 900 may include two or more optical cameras. In such examples, ranging system 900 may use an optical camera to capture a stereoscopic image of an environment including object 908. Ranging system 900 (e.g., point cloud generator 920) may then calculate the difference between the locations in the stereoscopic image. Ranging system 900 may then use the differences to determine a distance to a location shown in the stereoscopic image. From these distances, the point cloud generator 920 may generate a point cloud.
The sensor 904 may also detect other properties of the object 908, such as color and reflectance information. In the example of fig. 17, a point cloud generator 920 may generate a point cloud based on the signals 918 generated by the sensor 904. Ranging system 900 and/or point cloud generator 920 may form part of data source 104 (fig. 1).
Fig. 18 is a conceptual diagram illustrating an example vehicle-based scenario in which one or more techniques of the present disclosure may be used. In the example of fig. 18, the vehicle 1000 includes a laser package 1002, such as a LIDAR system. The laser package 1002 may be implemented in the same manner as the laser package 600 (fig. 16). Although not shown in the example of fig. 18, the vehicle 1000 may also include data sources, such as the data source 104 (fig. 1) and G-PCC encoders, such as the G-PCC encoder 200 (fig. 1). In the example of fig. 18, a laser package 1002 emits a laser beam 1004, which laser beam 1004 reflects a pedestrian 1006 or other object in the road. The data source of the vehicle 1000 may generate a point cloud based on the signals generated by the laser package 1002. The G-PCC encoder of vehicle 1000 may encode the point cloud to generate a bit stream 1008, such as the geometric bit stream of fig. 2 and the attribute bit stream of fig. 2. The bit stream 1008 may include significantly fewer bits than the unencoded point cloud obtained by the G-PCC encoder. An output interface of the vehicle 1000 (e.g., the output interface 108 (fig. 1)) may send the bitstream 1008 to one or more other devices. Thus, the vehicle 1000 is able to send the bit stream 1008 to other devices faster than the unencoded point cloud data. Additionally, the bit stream 1008 may require less data storage capacity.
The techniques of this disclosure may further reduce the number of bits in the bitstream 1008. For example, encoding an object point separately from a ground point (e.g., using global motion information for the object point) may reduce the number of bits associated with the object point in the bitstream 1008.
In the example of fig. 18, a vehicle 1000 may send a bitstream 1008 to another vehicle 1010. The vehicle 1010 may include a G-PCC decoder, such as the G-PCC decoder 300 (fig. 1). The G-PCC decoder of the vehicle 1010 may decode the bitstream 1008 to reconstruct the point cloud. The vehicle 1010 may use the reconstructed point cloud for various purposes. For example, the vehicle 1010 may determine that the pedestrian 1006 is on a road in front of the vehicle 1000 based on the reconstructed point cloud and thus begin decelerating, e.g., even before the driver of the vehicle 1010 is aware that the pedestrian 1006 is on the road. Thus, in some examples, the vehicle 1010 may perform autonomous navigational operations, generate a notification or alert, or perform another action based on the reconstructed point cloud.
Additionally or alternatively, the vehicle 1000 may send the bitstream 1008 to the server system 1012. The server system 1012 may use the bitstream 1008 for various purposes. For example, the server system 1012 may store the bitstream 1008 for subsequent reconstruction of the point cloud. In this example, server system 1012 may use the point cloud along with other data (e.g., vehicle telemetry data generated by vehicle 1000) to train an autonomous driving system. In other examples, the server system 1012 may store the bitstream 1008 for subsequent reconstruction of the point cloud for a forensic collision investigation (e.g., if the vehicle 1000 collides with the pedestrian 1006).
Fig. 19 is a conceptual diagram illustrating an example augmented reality system in which one or more techniques of the present disclosure may be used. Augmented reality (XR) is a term used to cover a range of technologies including Augmented Reality (AR), mixed Reality (MR), and Virtual Reality (VR). In the example of fig. 19, a first user 1100 is located in a first location 1102. The user 1100 wears an XR headset 1104. Instead of XR headset 1104, user 1100 may use a mobile device (e.g., a mobile phone, tablet, etc.). XR headset 1104 includes a depth detection sensor, such as a LIDAR system, that detects the position of a point on object 1106 at location 1102. The data source of the XR headset 1104 may use the signals generated by the depth detection sensor to generate a point cloud representation of the object 1106 at the location 1102. XR headset 1104 may include a G-PCC encoder (e.g., G-PCC encoder 200 of fig. 1) configured to encode a point cloud to generate bitstream 1108.
The techniques of this disclosure may further reduce the number of bits in the bitstream 1108. For example, encoding the object points separately from the ground points (e.g., using common global motion information for the object points) may reduce the number of bits in the bitstream 1108 associated with the object points.
XR headset 1104 may send bit stream 1108 to XR headset 1110 worn by user 1112 at second location 1114 (e.g., via a network such as the internet). XR headset 1110 may decode bitstream 1108 to reconstruct the point cloud. The XR headset 1110 may use the point cloud to generate an XR visualization (e.g., AR, MR, VR visualization) that represents the object 1106 at the location 1102. Thus, in some examples, user 1112 at location 1114 may have a 3D immersive experience of location 1102, such as when XR headset 1110 generates a VR visualization. In some examples, XR headset 1110 may determine a location of the virtual object based on the reconstructed point cloud. For example, XR headset 1110 may determine that the environment (e.g., location 1102) includes a flat surface based on the reconstructed point cloud, and then determine that a virtual object (e.g., cartoon character) is to be located on the flat surface. The XR headset 1110 may generate an XR visualization in which the virtual object is in the determined location. For example, XR headset 1110 may display a cartoon character sitting on a flat surface.
Fig. 20 is a conceptual diagram illustrating an example mobile device system in which one or more techniques of this disclosure may be used. In the example of fig. 20, a mobile device 1200, such as a mobile phone or tablet computer, includes a depth detection sensor, such as a LIDAR system, that detects the location of a point on an object 1202 in the environment of the mobile device 1200. The data source of the mobile device 1200 may use the signals generated by the depth detection sensor to generate a point cloud representation of the object 1202. Mobile device 1200 may include a G-PCC encoder (e.g., G-PCC encoder 200 of fig. 1) configured to encode a point cloud to generate bitstream 1204. In the example of fig. 20, mobile device 1200 can send a bitstream to a remote device 1206 (such as a server system or other mobile device). The remote device 1206 may decode the bitstream 1204 to reconstruct the point cloud. The remote device 1206 may use the point cloud for various purposes. For example, the remote device 1206 may use the point cloud to generate an environment map of the mobile device 1200. For example, the remote device 1206 may generate a map of the interior of the building based on the reconstructed point cloud. In another example, the remote device 1206 may generate an image (e.g., a computer graphic) based on the point cloud. For example, the remote device 1206 may use points of the point cloud as vertices of the polygon and color attributes of the points as a basis for coloring the polygon. In some examples, the remote device 1206 may perform facial recognition using a point cloud.
The following clauses represent various examples of the techniques described in this disclosure:
clause 1: a method of encoding point cloud data, the method comprising: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
Clause 2: the method according to clause 1, wherein classifying the points comprises: determining a top threshold and a bottom threshold; classifying points with height values between a top threshold and a bottom threshold into a ground point set; and classifying points with height values above a top threshold or below a bottom threshold into a set of object points.
Clause 3: the method according to any one of clauses 1 and 2, wherein the top threshold comprises z_max i While the bottom threshold includes the ith value range { (x_min) i ,x_max i ),(y_min i ,y_max i ),(z_min i ,z_max i ) Z_min of } i
Clause 4: the method of clause 3, wherein the ith range of values comprises the ith range of values in the N ranges of values.
Clause 5: according to clause 3 and4, wherein x_min i And y_min i Having a value of minus infinity, and x_max i And y_max i With infinite values.
Clause 6: the method according to any of clauses 2-5, wherein encoding and decoding the ground points and object points further comprises: quantifying the ground points and the object points by a scaling factor; and quantizing the top and bottom thresholds by a scaling factor.
Clause 7: the method according to any of clauses 2-6, wherein encoding and decoding object points comprises: deriving a global set of motions for object points; and predicting object points using the global motion set.
Clause 8: the method of clause 7, wherein deriving the global motion set comprises deriving the global motion set from only object points.
Clause 9: the method according to any one of clauses 7 and 8, wherein the global motion set comprises a first global motion set, and wherein encoding and decoding the ground point comprises: deriving a second global set of motions for the ground points; and predicting the ground point using the second global motion set.
Clause 10: the method of clause 9, wherein deriving the second global motion set comprises deriving the second global motion set from only ground points.
Clause 11: the method of any of clauses 7-10, wherein deriving the global motion set comprises deriving a rotation matrix and a translation vector, and wherein encoding and decoding the object point comprises applying the rotation matrix and the translation vector to a reference point of the reference frame.
Clause 12: the method according to clause 11, wherein the encoding and decoding the object point further comprises: determining a local node motion vector for a node of the prediction tree, the node comprising a corresponding set of reference points for the reference frame; and applying the local node motion vector to the node.
Clause 13: the method of any of clauses 2-12, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold for a group of pictures (GOP) comprising a plurality of frames, the plurality of frames comprising the point cloud.
Clause 14: the method of any of clauses 2-12, wherein determining the top threshold and the bottom threshold comprises determining a top threshold and a bottom threshold corresponding to a Sequence Parameter Set (SPS) of a plurality of frames comprising a point cloud.
Clause 15: the method of any of clauses 13 and 14, wherein determining the top threshold and the bottom threshold comprises determining a top threshold and a bottom threshold of a sequential first frame of the plurality of frames.
Clause 16: the method of any of clauses 13 and 14, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold as weighted averages of the thresholds of the plurality of frames.
Clause 17: the method of any of clauses 2-16, further comprising encoding and decoding a Global Parameter Set (GPS) comprising data representing at least one of a top threshold or a bottom threshold.
Clause 18: the method of clause 17, wherein encoding the GPS includes encoding a value of a top threshold and a flag indicating whether the data is to be encoded for a bottom threshold.
Clause 19: the method according to any one of clauses 17 and 18, wherein encoding and decoding the data of at least one of the top threshold or the bottom threshold comprises: encoding and decoding a value of geom_global_threshold 0 representing a top threshold; and encoding and decoding a value of geom_global_threshold 1 representing the bottom threshold.
Clause 20: the method of any of clauses 17-19, wherein encoding and decoding the data representing at least one of the top threshold or the bottom threshold comprises encoding and decoding the data representing at least one of the top threshold or the bottom threshold using a corresponding unsigned integer 0 th order Exp-Golomb value.
Clause 21: the method of any of clauses 17-19, wherein encoding and decoding the data representing at least one of the top threshold or the bottom threshold comprises encoding and decoding the data representing at least one of the top threshold or the bottom threshold using a corresponding signed integer 0 th order Exp-Golomb value.
Clause 22: the method of any of clauses 17-19, wherein encoding data representing at least one of the top threshold or the bottom threshold comprises encoding data representing at least one of the top threshold or the bottom threshold using a corresponding signed fixed length value, the method further comprising encoding data representing a number of bits assigned to at least one of the top threshold or the bottom threshold.
Clause 23: the method of any of clauses 17-22, wherein encoding and decoding data representing at least one of a top threshold or a bottom threshold comprises: encoding and decoding data representing a midpoint between the top threshold and the bottom threshold; and encoding and decoding data representing distances from the midpoint to the top and bottom thresholds.
Clause 24: the method of any of clauses 17-23, further comprising encoding and decoding a geometric data unit header (GDH) comprising data that covers or refines the data of the GPS for at least one of the top threshold or the bottom threshold.
Clause 25: the method of any of clauses 2-24, wherein determining the top threshold and the bottom threshold comprises: determining a maximum histogram height value max_box_t; determining a minimum histogram height value min_box_t; determining a histogram scale value hist_scale; determining a histogram bin size value hist_bin_size according to int ((max_box_t-min_box_t)/hist_scale); generating a histogram of points having a height value in a range from min_box_t to max_box_t; calculating the standard deviation of the histogram; determining a bin in the histogram having the largest number of height values; and determining the top and bottom thresholds from offsets from bins having the largest number of height values, the offsets being defined according to respective multiples of the standard deviation.
Clause 26: a method of encoding point cloud data, the method comprising: determining a first classification associated with a first slice of a frame of point cloud data, the first slice including a first one or more points; determining that the first one or more points correspond to a first classification; encoding and decoding the first one or more points according to the determination that the first one or more points correspond to the first classification; determining a second classification associated with a second slice of the frame of point cloud data, the second slice including a second one or more points; determining that the second one or more points correspond to a second classification; and encoding the second one or more points based on a determination that the second one or more points correspond to the second classification.
Clause 27: a method comprising a combination of the method of any one of clauses 1-25 and the method of clause 26.
Clause 28: the method according to any one of clauses 26 and 27, further comprising: determining a third classification associated with a third slice of the frame of point cloud data, the third slice including a third one or more points; determining that the third one or more points correspond to a third classification; and encode the third one or more points based on a determination that the third one or more points correspond to the third classification.
Clause 29: the method according to any of clauses 26-28, wherein the first classification and the second classification include at least one of a road, a partition, a nearby vehicle, a building, a sign, a traffic light, or a pedestrian.
Clause 30: the method of any of clauses 26-29, further comprising encoding and decoding data representing a slice group of the first slice, wherein encoding and decoding the first one or more points comprises encoding and decoding the first one or more points along with other points included in one or more other slices corresponding to the slice group.
Clause 31: the method according to clause 30, further comprising: determining a third classification associated with a third slice of the frame of point cloud data, the third slice being one of the one or more other slices corresponding to the slice group; and encoding and decoding a third one or more points of a third slice along with the first one or more points.
Clause 32: the method of any of clauses 30 and 31, further comprising encoding and decoding an index value representing each slice of the corresponding slice group.
Clause 33: the method of any of clauses 26-32, wherein encoding the first one or more points comprises predicting at least one of the first one or more points from a third one or more points of a third slice.
Clause 34: the method of clause 33, wherein the frame comprises a first frame and the third slice forms a portion of a second frame different from the first frame.
Clause 35: the method according to any of clauses 26-34, further comprising determining a respective motion parameter for each of the first one or more points and the second one or more points.
Clause 36: a method of encoding point cloud data, the method comprising: determining one or more regions of a frame of point cloud data; and for each region: encoding and decoding data representing corresponding motion parameters of the region; and encoding and decoding the points of the region using the corresponding motion parameters of the region.
Clause 37: a method comprising a combination of the method of any one of clauses 1-35 and the method of clause 36.
Clause 38: the method according to any one of clauses 36 and 37, further comprising encoding and decoding data representing a plurality of regions included in the frame.
Clause 39: the method according to any of clauses 36-38, further comprising specifying codec parameters for each region of the frame.
Clause 40: the method of clause 39, wherein the parameter comprises at least one of an upper bound or a lower bound of one or more of an x-coordinate of the region, a y-coordinate of the region, or a z-coordinate of the region.
Clause 41: the method according to any one of clauses 39 and 40, further comprising determining a default value for one or more coordinates of the region.
Clause 42: the method according to any of clauses 36-41, wherein encoding and decoding the points of the region comprises applying motion compensation to the points of the region using the corresponding motion parameters.
Clause 43: a method of encoding point cloud data, the method comprising: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
Clause 44: the method according to clause 43, wherein classifying the point comprises: determining a top threshold and a bottom threshold; classifying points with height values between a top threshold and a bottom threshold into a ground point set; and classifying points with height values above a top threshold or below a bottom threshold into a set of object points.
Clause 45: the method according to clause 43, wherein the top threshold comprises z_max i While the bottom threshold includes the ith value range { (x_min) i ,x_max i ),(y_min i ,y_max i ),(z_min i ,z_max i ) Z_min of } i
Clause 46: the method of clause 45, wherein the ith range of values comprises the ith range of values in the N ranges of values.
Clause 47: the method according to clause 45, wherein x_min i And y_min i Having a value of minus infinity, and x_max i And y_max i With infinite values.
Clause 48: the method according to clause 44, wherein encoding and decoding the ground points and object points further comprises: quantifying the ground points and the object points by a scaling factor; and quantizing the top and bottom thresholds by a scaling factor.
Clause 49: the method according to clause 44, wherein encoding and decoding the object point comprises: deriving a global set of motions for object points; and predicting object points using the global motion set.
Clause 50: the method of clause 49, wherein deriving the global motion set comprises deriving the global motion set only from object points.
Clause 51: the method of clause 50, wherein the global motion set comprises a first global motion set, and wherein encoding and decoding the ground points comprises: deriving a second global set of motions for the ground points; and predicting the ground point using the second global motion set.
Clause 52: the method of clause 51, wherein deriving the second global motion set comprises deriving the second global motion set from only ground points.
Clause 53: the method of clause 49, wherein deriving the global motion set comprises deriving a rotation matrix and a translation vector, and wherein encoding and decoding the object points comprises applying the rotation matrix and the translation vector to reference points of the reference frame.
Clause 54: the method according to clause 53, wherein encoding and decoding the object point further comprises: determining a local node motion vector for a node of the prediction tree, the node comprising a corresponding set of reference points for the reference frame; and applying the local node motion vector to the node.
Clause 55: the method of clause 44, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold for a group of pictures (GOP) that includes a plurality of frames, the plurality of frames including the point cloud.
Clause 56: the method of clause 44, wherein determining the top and bottom thresholds comprises determining the top and bottom thresholds corresponding to a Sequence Parameter Set (SPS) of a plurality of frames including the point cloud.
Clause 57: the method of clause 56, wherein determining the top threshold and the bottom threshold comprises determining a top threshold and a bottom threshold of a sequential first frame of the plurality of frames.
Clause 58: the method of clause 56, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold as weighted averages of the thresholds of the plurality of frames.
Clause 59: the method of clause 44, further comprising encoding and decoding a Global Parameter Set (GPS) that includes data representing at least one of a top threshold or a bottom threshold.
Clause 60: the method of clause 59, wherein encoding the GPS includes encoding a value of a top threshold and a flag indicating whether the data is to be encoded for a bottom threshold.
Clause 61: the method of clause 59, wherein encoding and decoding the data of at least one of the top threshold or the bottom threshold comprises: encoding and decoding a value of geom_global_threshold 0 representing a top threshold; and encoding and decoding a value of geom_global_threshold 1 representing the bottom threshold.
Clause 62: the method of clause 59, wherein encoding the data representing at least one of the top threshold or the bottom threshold comprises encoding the data representing at least one of the top threshold or the bottom threshold using respective unsigned integer 0 th order Exp-Golomb values.
Clause 63: the method of clause 59, wherein encoding the data representing at least one of the top threshold or the bottom threshold comprises encoding the data representing at least one of the top threshold or the bottom threshold using respective unsigned integer 0 th order Exp-Golomb values.
Clause 64: the method of clause 59, wherein encoding the data representing at least one of the top threshold or the bottom threshold comprises encoding the data representing at least one of the top threshold or the bottom threshold using the corresponding signed fixed length value, the method further comprising encoding the data representing the number of bits assigned to at least one of the top threshold or the bottom threshold.
Clause 65: the method of clause 59, wherein encoding and decoding the data representing at least one of the top threshold or the bottom threshold comprises: encoding and decoding data representing a midpoint between the top threshold and the bottom threshold; and encoding and decoding data representing distances from the midpoint to the top and bottom thresholds.
Clause 66: the method of clause 59, further comprising encoding and decoding a geometric data unit header (GDH) comprising data that covers or refines the data of the GPS for at least one of the top threshold or the bottom threshold.
Clause 67: the method of clause 44, wherein determining the top threshold and the bottom threshold comprises: determining a maximum histogram height value max_box_t; determining a minimum histogram height value min_box_t; determining a histogram scale value hist_scale; determining a histogram bin size value hist_bin_size according to int ((max_box_t-min_box_t)/hist_scale); generating a histogram of points having a height value in a range from min_box_t to max_box_t; calculating the standard deviation of the histogram; determining a bin in the histogram having the largest number of height values; and determining the top and bottom thresholds from offsets from bins having the largest number of height values, the offsets being defined according to respective multiples of the standard deviation.
Clause 68: a method of encoding point cloud data, the method comprising: determining a first classification associated with a first slice of a frame of point cloud data, the first slice including a first one or more points; determining that the first one or more points correspond to a first classification; encoding and decoding the first one or more points according to the determination that the first one or more points correspond to the first classification; determining a second classification associated with a second slice of the frame of point cloud data, the second slice including a second one or more points; determining that the second one or more points correspond to a second classification; and encoding the second one or more points based on a determination that the second one or more points correspond to the second classification.
Clause 69: the method according to clause 68, further comprising: determining a third classification associated with a third slice of the frame of point cloud data, the third slice including a third one or more points; determining that the third one or more points correspond to a third classification; and encode the third one or more points based on a determination that the third one or more points correspond to the third classification.
Clause 70: the method of clause 68, wherein the first classification and the second classification comprise at least one of a road, a partition, a nearby vehicle, a building, a sign, a traffic light, or a pedestrian.
Clause 71: the method of clause 68, further comprising encoding and decoding data representing a slice group of the first slice, wherein encoding and decoding the first one or more points comprises encoding and decoding the first one or more points along with other points included in one or more other slices corresponding to the slice group.
Clause 72: the method according to clause 71, further comprising: determining a third classification associated with a third slice of the frame of point cloud data, the third slice being one of the one or more other slices corresponding to the slice group; and encoding and decoding a third one or more points of a third slice along with the first one or more points.
Clause 73: the method of clause 71, further comprising encoding and decoding an index value representing each slice of the corresponding slice group.
Clause 74: the method of clause 68, wherein encoding the first one or more points comprises predicting at least one of the first one or more points from a third one or more points of a third slice.
Clause 75: the method of clause 74, wherein the frame comprises a first frame and the third slice forms a portion of a second frame different from the first frame.
Clause 76: the method of clause 68, further comprising determining a respective motion parameter for each of the first one or more points and the second one or more points.
Clause 77: a method of encoding point cloud data, the method comprising: determining one or more regions of a frame of point cloud data; and for each region: encoding and decoding data representing corresponding motion parameters of the region; and encoding and decoding the points of the region using the corresponding motion parameters of the region.
Clause 78: the method according to clause 77, further comprising encoding and decoding data representing the plurality of regions included in the frame.
Clause 79: the method according to clause 77, further comprising specifying codec parameters for each region of the frame.
Clause 80: the method of clause 79, wherein the parameter comprises at least one of an upper bound or a lower bound of one or more of an x-coordinate of the region, a y-coordinate of the region, or a z-coordinate of the region.
Clause 81: the method of clause 79, further comprising determining a default value for one or more coordinates of the region.
Clause 82: the method according to clause 77, wherein encoding and decoding the points of the region includes applying motion compensation to the points of the region using the corresponding motion parameters.
Clause 83: the method according to any of clauses 1-82, wherein the encoding and decoding comprises decoding.
Clause 84: the method according to any of clauses 1-83, wherein the codec comprises coding.
Clause 85: an apparatus for decoding point cloud data, the apparatus comprising one or more components for performing the method of any of clauses 1-84.
Clause 86: the device of clause 85, further comprising a display configured to display the point cloud data.
Clause 87: the device of any of clauses 85-86, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.
Clause 88: the apparatus of clauses 85-87, further comprising a memory configured to store the point cloud data.
Clause 89: a computer readable storage medium having instructions stored thereon that, when executed, cause a processor to perform the method of any of clauses 1-84.
Clause 90: an apparatus for encoding point cloud data, the apparatus comprising: means for determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
Clause 91: the apparatus according to clause 90, wherein the means for classifying points comprises: means for determining a top threshold and a bottom threshold; means for classifying points having a height value between a top threshold and a bottom threshold into a ground point set; and means for classifying points having a height value above a top threshold or below a bottom threshold into a set of object points.
Clause 92: an apparatus for encoding point cloud data, the apparatus comprising: determining a first classification component associated with a first slice of a frame of point cloud data, the first slice comprising a first one or more points; means for determining that the first one or more points correspond to a first classification means; means for encoding and decoding the first one or more points based on a determination that the first one or more points correspond to the first classification; means for determining a second classification associated with a second slice of the frame of point cloud data, the second slice including a second one or more points; means for determining that the second one or more points correspond to a second classification; and means for encoding and decoding the second one or more points based on a determination that the second one or more points correspond to the second classification.
Clause 93: an apparatus for encoding point cloud data, the apparatus comprising: means for determining one or more regions of a frame of point cloud data; means for encoding and decoding data representing respective motion parameters for each region; and means for encoding and decoding the points of each region using the respective motion parameters of the region including the points.
Clause 94: a method of encoding point cloud data, the method comprising: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
Clause 95: the method according to clause 94, wherein encoding and decoding the object point comprises: deriving a global motion information set of the object points; and predicting object points using the global motion information set.
Clause 96: the method of clause 95, wherein deriving the global motion information set comprises deriving the global motion information set only for object points.
Clause 97: the method of clause 95, wherein the set of global motion information comprises a first set of global motion information, and wherein encoding and decoding the object point comprises: deriving a second global motion information set of ground points; and predicting the ground point using the second set of global motion information.
Clause 98: the method of clause 97, wherein deriving the second set of global motion information comprises deriving the second set of global motion information only for ground points.
Clause 99: the method of clause 95, wherein deriving the global motion information set comprises deriving a rotation matrix and a translation vector, and wherein encoding and decoding the object points comprises applying the rotation matrix and the translation vector to reference points of the reference frame.
Clause 100: the method according to clause 99, wherein encoding and decoding the object point further comprises: determining a local node motion vector for a node of the prediction tree, the node comprising a corresponding set of reference points for the reference frame; and applying the local node motion vector to the node.
Clause 101: the method according to clause 94, wherein classifying the points comprises: determining a top threshold and a bottom threshold; classifying points with height values between a top threshold and a bottom threshold into a ground point set; and classifying points with height values above a top threshold or below a bottom threshold into a set of object points.
Clause 102: the method according to clause 101, wherein the top threshold comprises z_max i While the bottom threshold includes the ith value range { (x_min) i ,x_max i ),(y_min i ,y_max i ),(z_min i ,z_max i ) Z_min of } i
Clause 103: the method of clause 102, wherein the ith range of values comprises the ith range of values in the N ranges of values.
Clause 104: the method according to clause 102, wherein x_min i And y_min i Having a value of minus infinity, and x_max i And y_max i With infinite values.
Clause 105: the method according to clause 101, wherein encoding and decoding the ground points and object points further comprises: quantifying the ground points and the object points by a scaling factor; and quantizing the top and bottom thresholds by a scaling factor.
Clause 106: the method of clause 101, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold for a group of pictures (GOP) comprising a plurality of frames, the plurality of frames comprising the point cloud.
Clause 107: the method of clause 101, wherein determining the top and bottom thresholds comprises determining the top and bottom thresholds corresponding to a Sequence Parameter Set (SPS) of a plurality of frames comprising a point cloud.
Clause 108: the method of clause 106, wherein determining the top threshold and the bottom threshold comprises determining a top threshold and a bottom threshold of a sequential first frame of the plurality of frames.
Clause 109: the method of clause 106, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold as weighted averages of the thresholds of the plurality of frames.
Clause 110: the method of clause 101, further comprising encoding and decoding a data structure comprising data representing at least one of a top threshold or a bottom threshold.
Clause 111: the method of clause 110, wherein encoding the data structure comprises encoding at least one of a Sequence Parameter Set (SPS), a Geometric Parameter Set (GPS), or a geometric data unit header (GDH).
Clause 112: the method of clause 110, wherein encoding the data structure comprises encoding a value of a top threshold and a flag indicating whether the data is to be encoded for a bottom threshold.
Clause 113: the method of clause 110, wherein encoding and decoding the data of at least one of the top threshold or the bottom threshold comprises: encoding and decoding a value of geom_global_threshold 0 representing a top threshold; and encoding and decoding a value of geom_global_threshold 1 representing the bottom threshold.
Clause 114: the method of clause 110, wherein encoding the data representing at least one of the top threshold or the bottom threshold comprises encoding the data representing at least one of the top threshold or the bottom threshold using respective unsigned integer 0 th order Exp-Golomb values.
Clause 115: the method of clause 110, wherein encoding the data representing at least one of the top threshold or the bottom threshold comprises encoding the data representing at least one of the top threshold or the bottom threshold using a corresponding signed integer 0 th order Exp-Golomb value.
Clause 116: the method of clause 110, wherein encoding data representing at least one of the top threshold or the bottom threshold comprises encoding data representing at least one of the top threshold or the bottom threshold using the corresponding signed fixed length value, the method further comprising encoding data representing a number of bits assigned to at least one of the top threshold or the bottom threshold.
Clause 117: the method of clause 110, wherein encoding and decoding the data representing at least one of the top threshold or the bottom threshold comprises: encoding and decoding data representing a midpoint between the top threshold and the bottom threshold; and encoding and decoding data representing distances from the midpoint to the top and bottom thresholds.
Clause 118: the method of clause 110, further comprising encoding and decoding a geometric data unit header (GDH) comprising data that covers or refines the data of the GPS for at least one of the top threshold or the bottom threshold.
Clause 119: the method of clause 101, wherein determining the top and bottom thresholds comprises: determining a maximum histogram height value max_box_t; determining a minimum histogram height value min_box_t; determining a histogram scale value hist_scale; determining a histogram bin size value hist_bin_size according to int ((max_box_t-min_box_t)/hist_scale); generating a histogram of points having a height value in a range from min_box_t to max_box_t; calculating the standard deviation of the histogram; determining a bin in the histogram having the largest number of height values; and determining the top and bottom thresholds from offsets from bins having the largest number of height values, the offsets being defined according to respective multiples of the standard deviation.
Clause 120: an apparatus for encoding point cloud data, the apparatus comprising: a memory configured to store data representing points of a point cloud; and one or more processors implemented in the circuitry configured to: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
Clause 121: the apparatus of clause 120, wherein to encode the object point, the one or more processors are configured to: deriving a global motion information set of the object points; and predicting object points using the global motion information set.
Clause 122: the apparatus of clause 121, wherein the one or more processors are configured to derive the global motion information set only for object points.
Clause 123: the apparatus of clause 121, wherein the set of global motion information comprises a first set of global motion information, and wherein to codec the ground point, the one or more processors are configured to: deriving a second global motion information set of ground points; and predicting the ground point using the second set of global motion information.
Clause 124: the device of clause 123, wherein the one or more processors are configured to derive the second set of global motion information only for ground points.
Clause 125: the apparatus of clause 121, wherein to derive the global motion information set, the one or more processors are configured to derive a rotation matrix and a translation vector, and wherein to encode the object point, the one or more processors are configured to apply the rotation matrix and the translation vector to a reference point of the reference frame.
Clause 126: the apparatus of clause 125, wherein to encode the object point, the one or more processors are further configured to: determining a local node motion vector of a node of the prediction tree, the node comprising a corresponding set of reference points of the reference frame; and applying the local node motion vector to the node.
Clause 127: the apparatus according to clause 120, wherein to categorize the points, the one or more processors are configured to: determining a top threshold and a bottom threshold; classifying points having a height value between a top threshold and a bottom threshold into a ground point set; and classifying points having a height value above a top threshold or below a bottom threshold into the set of object points.
Clause 128: the apparatus of clause 127, wherein the top threshold comprises z_max i While the bottom threshold includes the ith value range { (x_min) i ,x_max i ),(y_min i ,y_max i ),(z_min i ,z_max i ) Z_min of } i
Clause 129: the apparatus of clause 128, wherein the ith range of values comprises the ith range of values in the N ranges of values.
Clause 130: the apparatus according to clause 128, wherein x_min i And y_min i Having a value of minus infinity, and x_max i And y_max i With infinite values.
Clause 131: the apparatus of clause 127, wherein to classify the ground points and the object points, the one or more processors are further configured to: quantifying the ground points and the object points by a scaling factor; and quantizing the top and bottom thresholds by a scaling factor.
Clause 132: the apparatus of clause 127, wherein the one or more processors are further configured to codec a data structure comprising data representing at least one of a top threshold or a bottom threshold.
Clause 133: a method of encoding point cloud data, the method comprising: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
Clause 134: the method of clause 133, wherein encoding and decoding the object point comprises: deriving a global motion information set of the object points; and predicting object points using the global motion information set.
Clause 135: the method of clause 134, wherein deriving the global motion information set comprises deriving the global motion information set only for object points.
Clause 136: the method of any of clauses 134 and 135, wherein the set of global motion information comprises a first set of global motion information, and wherein encoding and decoding the ground point comprises: deriving a second global motion information set of ground points; and predicting the ground point using the second set of global motion information.
Clause 137: the method of clause 136, wherein deriving the second set of global motion information comprises deriving the second set of global motion information only for ground points.
Clause 138: the method of any of clauses 134-137, wherein deriving the global motion information set comprises deriving a rotation matrix and a translation vector, and wherein encoding and decoding the object point comprises applying the rotation matrix and the translation vector to a reference point of the reference frame.
Clause 139: the method according to clause 138, wherein encoding and decoding the object point further comprises: determining a local node motion vector for a node of the prediction tree, the node comprising a corresponding set of reference points for the reference frame; and applying the local node motion vector to the node.
Clause 140: the method according to any of clauses 139-139, wherein classifying the points comprises: determining a top threshold and a bottom threshold; classifying points with height values between a top threshold and a bottom threshold into a ground point set; and classifying points with height values above a top threshold or below a bottom threshold into a set of object points.
Clause 141: the method of clause 140, wherein the top threshold comprises z_max i While the bottom threshold includes the ith value range { (x_min) i ,x_max i ),(y_min i ,y_max i ),(z_min i ,z_max i ) Z_min of } i
Clause 142: the method of clause 141, wherein the ith range of values comprises the ith range of values in the N ranges of values.
Clause 143: the method according to any of clauses 141 and 142, wherein x_min i And y_min i Having a value of minus infinity, and x_max i And y_max i With infinite values.
Clause 144: the method according to any of clauses 134-143, wherein encoding and decoding the ground points and object points further comprises: quantifying the ground points and the object points by a scaling factor; and quantizing the top and bottom thresholds by a scaling factor.
Clause 145: the method of any of clauses 140-144, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold for a group of pictures (GOP) comprising a plurality of frames, the plurality of frames comprising the point cloud.
Clause 146: the method of any of clauses 140-145, wherein determining the top threshold and the bottom threshold comprises determining a top threshold and a bottom threshold corresponding to a Sequence Parameter Set (SPS) of a plurality of frames comprising a point cloud.
Clause 147: the method of any of clauses 140-144, wherein determining the top threshold and the bottom threshold comprises determining a top threshold and a bottom threshold of a sequential first frame of the plurality of frames.
Clause 148: the method of any of clauses 140-146, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold as weighted averages of the thresholds of the plurality of frames.
Clause 149: the method of any of clauses 140-148, further comprising encoding and decoding a data structure comprising data representing at least one of a top threshold or a bottom threshold.
Clause 150: the method of clause 149, wherein encoding the data structure comprises encoding at least one of a Sequence Parameter Set (SPS), a Geometric Parameter Set (GPS), or a geometric data unit header (GDH).
Clause 151: the method of any of clauses 149 and 150, wherein encoding the data structure comprises encoding a value of a top threshold and a flag indicating whether the data is to be encoded for a bottom threshold.
Clause 152: the method of any of clauses 149-151, wherein encoding and decoding the data of at least one of the top threshold or the bottom threshold comprises: encoding and decoding a value of geom_global_threshold 0 representing a top threshold; and encoding and decoding a value of geom_global_threshold 1 representing the bottom threshold.
Clause 153: the method of any of clauses 149-152, wherein encoding and decoding the data representing at least one of the top threshold or the bottom threshold comprises encoding and decoding the data representing at least one of the top threshold or the bottom threshold using a corresponding unsigned integer 0 th order Exp-Golomb value.
Clause 154: the method of any of clauses 149-152, wherein encoding and decoding the data representing at least one of the top threshold or the bottom threshold comprises encoding and decoding the data representing at least one of the top threshold or the bottom threshold using a corresponding signed integer 0 th order Exp-Golomb value.
Clause 155: the method of any of clauses 149-152, wherein encoding data representing at least one of the top threshold or the bottom threshold comprises encoding data representing at least one of the top threshold or the bottom threshold using a corresponding signed fixed length value, the method further comprising encoding data representing a number of bits assigned to at least one of the top threshold or the bottom threshold.
Clause 156: the method of any of clauses 149-155, wherein encoding and decoding data representing at least one of a top threshold or a bottom threshold comprises: encoding and decoding data representing a midpoint between the top threshold and the bottom threshold; and encoding and decoding data representing distances from the midpoint to the top and bottom thresholds.
Clause 157: the method of any of clauses 149-156, further comprising encoding a geometric data unit header (GDH) comprising data that covers or refines the data of the GPS for at least one of the top threshold or the bottom threshold.
Clause 158: the method of any of clauses 133-157, wherein determining the top threshold and the bottom threshold comprises: determining a maximum histogram height value max_box_t; determining a minimum histogram height value min_box_t; determining a histogram scale value hist_scale; determining a histogram bin size value hist_bin_size according to int ((max_box_t-min_box_t)/hist_scale); generating a histogram of points having a height value in a range from min_box_t to max_box_t; calculating the standard deviation of the histogram; determining a bin in the histogram having the largest number of height values; and determining the top and bottom thresholds from offsets from bins having the largest number of height values, the offsets being defined according to respective multiples of the standard deviation.
Clause 159: an apparatus for encoding point cloud data, the apparatus comprising: a memory configured to store data representing points of a point cloud; and one or more processors implemented in the circuitry configured to: determining a height value of a point in the point cloud; classifying the points into a ground point set or an object point set according to the height value; and encoding and decoding the ground points and the object points according to the classification.
Clause 160: the apparatus of clause 159, wherein to encode the object point, the one or more processors are configured to: deriving a global motion information set of the object points; and predicting object points using the global motion information set.
Clause 161: the apparatus of clause 160, wherein the one or more processors are configured to derive the global motion information set only for object points.
Clause 162: the apparatus of any one of clauses 160 and 161, wherein the global motion information set comprises a first global motion information set, and wherein to codec a ground point, the one or more processors are configured to: deriving a second global motion information set of ground points; and predicting the ground point using the second set of global motion information.
Clause 163: the apparatus of clause 162, wherein the one or more processors are configured to derive the second set of global motion information only for ground points.
Clause 164: the apparatus of any of clauses 163-163, wherein to derive the global motion information set, the one or more processors are configured to derive a rotation matrix and a translation vector, and wherein to encode and decode the object point, the one or more processors are configured to apply the rotation matrix and the translation vector to a reference point of the reference frame.
Clause 165: the apparatus of clause 164, wherein to encode the object point, the one or more processors are further configured to: determining a local node motion vector of a node of the prediction tree, the node comprising a corresponding set of reference points of the reference frame; and applying the local node motion vector to the node.
Clause 166: the apparatus of any of clauses 159-165, wherein to categorize the points, the one or more processors are configured to: determining a top threshold and a bottom threshold; classifying points having a height value between a top threshold and a bottom threshold into a ground point set; and classifying points having a height value above a top threshold or below a bottom threshold into the set of object points.
Clause 167: the apparatus of clause 166, wherein the top threshold comprises z_max i While the bottom threshold includes the ith value range { (x_min) i ,x_max i ),(y_min i ,y_max i ),(z_min i ,z_max i ) Z_min of } i
Clause 168: the apparatus of clause 167, wherein the ith range of values comprises the ith range of values in the N ranges of values.
Clause 169: the apparatus according to any one of clauses 167 and 168, wherein x_min i And y_min i Having a value of minus infinity, and x_max i And y_max i With infinite values.
Clause 170: the apparatus of any of clauses 169-169, wherein to classify a ground point and an object point, the one or more processors are further configured to: quantifying the ground points and the object points by a scaling factor; and quantizing the top and bottom thresholds by a scaling factor.
Clause 171: the apparatus of any of clauses 166-170, wherein the one or more processors are further configured to codec a data structure comprising data representing at least one of a top threshold or a bottom threshold.
It should be appreciated that certain acts or events of any of the techniques described herein can be performed in a different order, may be added, combined, or omitted entirely, depending on the example (e.g., not all of the described acts or events are necessary for the practice of the technique). Further, in some examples, acts or events may be performed concurrently (e.g., through multi-threaded processing, interrupt processing, or multiple processors) rather than sequentially.
In one or more examples, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium corresponding to a tangible medium, such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for use in implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Further, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the terms "processor" and "processing circuitry" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. Furthermore, in some aspects, the functionality described herein may be provided in dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Moreover, the techniques may be implemented entirely in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a group of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units including one or more processors as described above in combination with appropriate software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (47)

1. A method of encoding point cloud data, the method comprising:
determining a height value of a point in the point cloud;
classifying the points into a ground point set or an object point set according to the height value; and
and encoding and decoding the ground points and the object points according to the classification.
2. The method of claim 1, wherein encoding and decoding the object point comprises:
deriving a global motion information set of the object points; and
predicting the object point using the global motion information set.
3. The method of claim 2, wherein deriving the global motion information set comprises deriving the global motion information set only for the object points.
4. The method of claim 2, wherein the set of global motion information comprises a first set of global motion information, and wherein encoding and decoding the ground point comprises:
deriving a second global motion information set for the ground points; and
predicting the ground point using the second set of global motion information.
5. The method of claim 4, wherein deriving the second set of global motion information comprises deriving the second set of global motion information only for the ground points.
6. The method of claim 2, wherein deriving the global set of motion information comprises deriving a rotation matrix and a translation vector, and wherein encoding and decoding the object point comprises applying the rotation matrix and the translation vector to a reference point of a reference frame.
7. The method of claim 6, wherein encoding and decoding the object point further comprises:
determining local node motion vectors for nodes of a prediction tree, the nodes comprising respective sets of reference points of the reference frame; and
applying the local node motion vector to the node.
8. The method of claim 1, wherein classifying the points comprises:
determining a top threshold and a bottom threshold;
classifying points having a height value between the top threshold and the bottom threshold into the ground point set; and
classifying points having a height value above the top threshold or below the bottom threshold into the set of object points.
9. The method of claim 8, wherein the top threshold comprises z_max_i and the bottom threshold comprises z_min_i of an ith value range {(x_min_i, x_max_i), (y_min_i, y_max_i), (z_min_i, z_max_i)}.
10. The method of claim 9, wherein the ith range of values comprises an ith range of values in N ranges of values.
11. The method of claim 9, wherein x_min_i and y_min_i have values of negative infinity, and x_max_i and y_max_i have values of positive infinity.
12. The method of claim 8, wherein encoding and decoding the ground points and the object points further comprises:
quantizing the ground points and the object points by a scaling factor; and
quantizing the top threshold and the bottom threshold by the scaling factor.
13. The method of claim 8, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold for a group of pictures (GOP) comprising a plurality of frames, the plurality of frames comprising the point cloud.
14. The method of claim 8, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold for a Sequence Parameter Set (SPS) corresponding to a plurality of frames comprising the point cloud.
15. The method of claim 13, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold for a sequential first frame of the plurality of frames.
16. The method of claim 13, wherein determining the top threshold and the bottom threshold comprises determining the top threshold and the bottom threshold as weighted averages of thresholds of the plurality of frames.
17. The method of claim 8, further comprising encoding and decoding a data structure comprising data representing at least one of the top threshold or the bottom threshold.
18. The method of claim 17, wherein encoding the data structure comprises encoding at least one of a Sequence Parameter Set (SPS), a Geometric Parameter Set (GPS), or a geometric data unit header (GDH).
19. The method of claim 17, wherein encoding the data structure comprises encoding a value of the top threshold and a flag indicating whether data is to be encoded for the bottom threshold.
20. The method of claim 17, wherein encoding and decoding the data of at least one of the top threshold or the bottom threshold comprises:
encoding and decoding a value of geom_global_threshold0 representing the top threshold; and
encoding and decoding a value of geom_global_threshold1 representing the bottom threshold.
21. The method of claim 17, wherein encoding the data representative of at least one of the top threshold or the bottom threshold comprises encoding the data representative of at least one of the top threshold or the bottom threshold using a respective unsigned integer 0th order Exp-Golomb value.
22. The method of claim 17, wherein encoding the data representative of at least one of the top threshold or the bottom threshold comprises encoding the data representative of at least one of the top threshold or the bottom threshold using a respective signed integer 0th order Exp-Golomb value.
23. The method of claim 17, wherein encoding the data representative of at least one of the top threshold or the bottom threshold comprises encoding the data representative of at least one of the top threshold or the bottom threshold using a respective signed fixed length value, the method further comprising encoding data representative of a number of bits assigned to at least one of the top threshold or the bottom threshold.
24. The method of claim 17, wherein encoding and decoding the data representing at least one of the top threshold or the bottom threshold comprises:
encoding and decoding data representing a midpoint between the top threshold and the bottom threshold; and
encoding and decoding data representing distances from the midpoint to the top threshold and the bottom threshold.
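Claims 20 to 24 describe how the thresholds may be signaled, including 0th order Exp-Golomb coding and a midpoint-plus-distance parameterization. The syntax element names geom_global_threshold0 and geom_global_threshold1 come from the claims; everything else in this sketch (the bit-string representation and the rounding of the midpoint) is an illustrative assumption.

```python
def exp_golomb_unsigned(v):
    # 0th order unsigned Exp-Golomb code, ue(v), as a bit string.
    code = bin(v + 1)[2:]
    return "0" * (len(code) - 1) + code

def exp_golomb_signed(v):
    # 0th order signed Exp-Golomb code, se(v): map to an unsigned code number first.
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return exp_golomb_unsigned(code_num)

# geom_global_threshold0 / geom_global_threshold1 signaled directly (claim 20) ...
top, bottom = 12, -4
direct = exp_golomb_signed(top) + exp_golomb_signed(bottom)

# ... or as a midpoint and the distances to each threshold (claim 24).
mid = (top + bottom) // 2
midpoint_form = (exp_golomb_signed(mid)
                 + exp_golomb_unsigned(top - mid)
                 + exp_golomb_unsigned(mid - bottom))
```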
25. The method of claim 17, further comprising encoding a geometric data unit header (GDH) comprising data that overrides or refines data of a data structure for at least one of the top threshold or the bottom threshold.
26. The method of claim 8, wherein determining the top and bottom thresholds comprises:
determining a maximum histogram height value max_box_t;
determining a minimum histogram height value min_box_t;
determining a histogram scale value hist_scale;
determining a histogram bin size value hist_bin_size according to int ((max_box_t-min_box_t)/hist_scale);
generating a histogram of points having a height value in a range from min_box_t to max_box_t;
calculating a standard deviation of the histogram;
determining a bin having the largest number of height values in the histogram; and
determining the top threshold and the bottom threshold from an offset from the bin having the largest number of height values, the offset being defined according to a respective multiple of the standard deviation.
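The histogram-based derivation in claim 26 can be sketched as follows. The claim does not fix whether the standard deviation is taken over the heights or over the bin counts, nor the exact offset rule; the choices below (standard deviation of the in-range heights, a symmetric offset of k standard deviations around the peak bin center) are assumptions.

```python
import statistics

def derive_thresholds(heights, min_box_t, max_box_t, hist_scale, k=1.0):
    # hist_scale is assumed to be a positive integer number of histogram bins,
    # small enough that hist_bin_size below is nonzero.
    hist_bin_size = int((max_box_t - min_box_t) / hist_scale)
    counts = [0] * hist_scale
    in_range = [z for z in heights if min_box_t <= z < max_box_t]
    for z in in_range:
        b = min(int((z - min_box_t) / hist_bin_size), hist_scale - 1)
        counts[b] += 1
    std_dev = statistics.pstdev(in_range)
    peak_bin = max(range(hist_scale), key=lambda b: counts[b])
    peak_center = min_box_t + (peak_bin + 0.5) * hist_bin_size
    return peak_center + k * std_dev, peak_center - k * std_dev

top_t, bottom_t = derive_thresholds([0.1, 0.2, 0.15, 2.3, 0.18], 0.0, 4.0, 4)
```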
27. The method of claim 1, wherein encoding and decoding the ground points and the object points comprises encoding the ground points and the object points.
28. The method of claim 27, further comprising generating a bitstream comprising encoded data representing the ground point and the object point.
29. The method of claim 1, wherein encoding and decoding the ground points and the object points comprises decoding the ground points and the object points.
30. An apparatus for encoding point cloud data, the apparatus comprising:
a memory configured to store data representing points of a point cloud; and
one or more processors implemented in circuitry, the one or more processors configured to:
determine a height value of a point in the point cloud;
classify the points into a ground point set or an object point set according to the height value; and
encode and decode the ground points and the object points according to the classification.
31. The apparatus of claim 30, wherein to encode and decode the object points, the one or more processors are configured to:
derive a global motion information set for the object points; and
predict the object points using the global motion information set.
32. The apparatus of claim 31, wherein the one or more processors are configured to derive the global motion information set only for the object points.
33. The apparatus of claim 31, wherein the global motion information set comprises a first global motion information set, and wherein to encode and decode the ground points, the one or more processors are configured to:
derive a second global motion information set for the ground points; and
predict the ground points using the second global motion information set.
34. The apparatus of claim 33, wherein the one or more processors are configured to derive the second global motion information set only for the ground points.
35. The apparatus of claim 31, wherein to derive the global motion information set, the one or more processors are configured to derive a rotation matrix and a translation vector, and wherein to encode and decode the object points, the one or more processors are configured to apply the rotation matrix and the translation vector to reference points of a reference frame.
36. The apparatus of claim 35, wherein to encode and decode the object points, the one or more processors are further configured to:
determine local node motion vectors for nodes of a prediction tree, the nodes comprising respective sets of reference points of the reference frame; and
apply the local node motion vectors to the nodes.
37. The apparatus of claim 30, wherein to classify the points, the one or more processors are configured to:
determine a top threshold and a bottom threshold;
classify points having a height value between the top threshold and the bottom threshold into the ground point set; and
classify points having a height value above the top threshold or below the bottom threshold into the object point set.
38. The apparatus of claim 37, wherein the top threshold comprises z_max_i and the bottom threshold comprises z_min_i of an ith value range {(x_min_i, x_max_i), (y_min_i, y_max_i), (z_min_i, z_max_i)}.
39. The apparatus of claim 38, wherein the ith value range comprises an ith value range of N value ranges.
40. The apparatus of claim 38, wherein x_min_i and y_min_i have values of negative infinity, and x_max_i and y_max_i have values of positive infinity.
41. The apparatus of claim 37, wherein to encode and decode the ground points and the object points, the one or more processors are further configured to:
quantize the ground points and the object points by a scaling factor; and
quantize the top threshold and the bottom threshold by the scaling factor.
42. The apparatus of claim 37, wherein the one or more processors are further configured to encode and decode a data structure comprising data representing at least one of the top threshold or the bottom threshold.
43. The apparatus of claim 30, wherein to encode and decode the ground points and the object points, the one or more processors are configured to encode the ground points and the object points.
44. The apparatus of claim 43, wherein the one or more processors are further configured to generate a bitstream comprising encoded data representing the ground points and the object points.
45. The apparatus of claim 30, wherein to encode and decode the ground points and the object points, the one or more processors are configured to decode the ground points and the object points.
46. A computer-readable storage medium having stored thereon instructions that, when executed, cause a processor to:
determine a height value of a point in a point cloud;
classify the points into a ground point set or an object point set according to the height value; and
encode and decode the ground points and the object points according to the classification.
47. An apparatus for encoding point cloud data, the apparatus comprising:
means for determining a height value of a point in the point cloud;
means for classifying the points as a set of ground points or a set of object points according to the height value; and
and means for encoding and decoding the ground points and the object points according to the classification.
CN202180088307.XA 2020-12-29 2021-12-22 Global motion estimation using road and ground object markers for geometry-based point cloud compression Pending CN116648914A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US63/131,637 2020-12-29
US63/171,945 2021-04-07
US17/558,362 2021-12-21
US17/558,362 US11949909B2 (en) 2020-12-29 2021-12-21 Global motion estimation using road and ground object labels for geometry-based point cloud compression
PCT/US2021/064869 WO2022146827A2 (en) 2020-12-29 2021-12-22 Global motion estimation using road and ground object labels for geometry-based point cloud compression

Publications (1)

Publication Number Publication Date
CN116648914A true CN116648914A (en) 2023-08-25

Family

ID=87615709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180088307.XA Pending CN116648914A (en) 2020-12-29 2021-12-22 Global motion estimation using road and ground object markers for geometry-based point cloud compression

Country Status (1)

Country Link
CN (1) CN116648914A (en)

Similar Documents

Publication Publication Date Title
CN115315725A (en) Laser angle decoding of angular and azimuthal modes in geometry-based point cloud compression
CN115298698A (en) Decoding of laser angles for angle and azimuth modes in geometry-based point cloud compression
US20230105931A1 (en) Inter prediction coding with radius interpolation for predictive geometry-based point cloud compression
EP4272166A1 (en) Hybrid-tree coding for inter and intra prediction for geometry coding
CN116250010A (en) Laser index clipping in predictive geometric coding for point cloud compression
WO2022147100A1 (en) Inter prediction coding for geometry point cloud compression
CN115315724A (en) Angular pattern simplification for geometry-based point cloud compression
US20220215596A1 (en) Model-based prediction for geometry point cloud compression
WO2023059987A1 (en) Inter prediction coding with radius interpolation for predictive geometry-based point cloud compression
US11949909B2 (en) Global motion estimation using road and ground object labels for geometry-based point cloud compression
CN116648914A (en) Global motion estimation using road and ground object markers for geometry-based point cloud compression
US20230177739A1 (en) Local adaptive inter prediction for g-pcc
US20220210480A1 (en) Hybrid-tree coding for inter and intra prediction for geometry coding
US20230345044A1 (en) Residual prediction for geometry point cloud compression
WO2022146827A2 (en) Global motion estimation using road and ground object labels for geometry-based point cloud compression
US20230230290A1 (en) Prediction for geometry point cloud compression
US20230099908A1 (en) Coding point cloud data using direct mode for inter-prediction in g-pcc
US20230345045A1 (en) Inter prediction coding for geometry point cloud compression
US20240185470A1 (en) Decoding attribute values in geometry-based point cloud compression
US20230018907A1 (en) Occupancy coding using inter prediction in geometry point cloud compression
WO2023205318A1 (en) Improved residual prediction for geometry point cloud compression
WO2023102484A1 (en) Local adaptive inter prediction for g-pcc
CN116636204A (en) Mixed tree coding for inter and intra prediction for geometric coding
WO2022147008A1 (en) Model-based prediction for geometry point cloud compression
CN116711313A (en) Inter-prediction codec for geometric point cloud compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40093381

Country of ref document: HK