CN118056404A - Point cloud data transmitting method, point cloud data transmitting device, point cloud data receiving method and point cloud data receiving device - Google Patents

Point cloud data transmitting method, point cloud data transmitting device, point cloud data receiving method and point cloud data receiving device

Info

Publication number
CN118056404A
Authority
CN
China
Prior art keywords
point
point cloud
frame
frames
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280067455.8A
Other languages
Chinese (zh)
Inventor
李受娟
许惠桢
朴宥宣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LG Electronics Inc filed Critical LG Electronics Inc
Priority claimed from PCT/KR2022/015035 (published as WO2023059089A1)
Publication of CN118056404A

Abstract

The point cloud data transmission method according to an embodiment may include the steps of: encoding point cloud data; and transmitting a bitstream including the point cloud data. Further, the point cloud data transmitting apparatus according to an embodiment may include: an encoder configured to encode point cloud data; and a transmitter configured to transmit a bitstream including the point cloud data.

Description

Point cloud data transmitting method, point cloud data transmitting device, point cloud data receiving method and point cloud data receiving device
Technical Field
Embodiments relate to methods and apparatus for processing point cloud content.
Background
The point cloud content is content represented by a point cloud, which is a set of points belonging to a coordinate system representing a three-dimensional space. The point cloud content may express media configured in three dimensions and is used to provide various services such as Virtual Reality (VR), Augmented Reality (AR), Mixed Reality (MR), and self-driving services. However, tens of thousands to hundreds of thousands of points are required to represent the point cloud content. Therefore, a method of efficiently processing a large amount of point data is required.
Disclosure of Invention
Technical problem
Embodiments provide an apparatus and method for efficiently processing point cloud data. Embodiments also provide a point cloud data processing method and apparatus for addressing latency and encoding/decoding complexity.
The technical scope of the embodiments is not limited to the above technical purpose, but extends to other technical purposes that can be inferred by those skilled in the art based on the entire disclosure herein.
Technical Solution
To achieve these objects and other advantages and in accordance with the purpose of the disclosure, as embodied and broadly described herein, a method of transmitting point cloud data may include: encoding the point cloud data; and transmitting a bitstream containing the point cloud data. In another aspect of the disclosure, a method of receiving point cloud data may include: receiving a bit stream comprising point cloud data; and decoding the point cloud data.
Advantageous effects
The apparatus and method according to the embodiment can efficiently process point cloud data.
The apparatus and method according to the embodiments may provide a high quality point cloud service.
Apparatuses and methods according to embodiments may provide point cloud content for providing general services such as VR services and self-driving services.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure. For a better understanding of the various embodiments described below, reference should be made to the description of the embodiments below in conjunction with the accompanying drawings. The same reference numbers will be used throughout the drawings to refer to the same or like parts. In the drawings:
FIG. 1 illustrates an exemplary point cloud content providing system according to an embodiment.
Fig. 2 is a block diagram illustrating a point cloud content providing operation according to an embodiment.
Fig. 3 illustrates an exemplary process of capturing point cloud video according to an embodiment.
FIG. 4 illustrates an exemplary point cloud encoder according to an embodiment.
Fig. 5 shows an example of voxels according to an embodiment.
Fig. 6 shows an example of an octree and occupancy code according to an embodiment.
Fig. 7 shows an example of a neighbor node pattern according to an embodiment.
Fig. 8 illustrates an example of point configuration in each LOD according to an embodiment.
Fig. 9 illustrates an example of point configuration in each LOD according to an embodiment.
Fig. 10 illustrates a point cloud decoder according to an embodiment.
Fig. 11 illustrates a point cloud decoder according to an embodiment.
Fig. 12 illustrates a transmitting apparatus according to an embodiment.
Fig. 13 illustrates a receiving apparatus according to an embodiment.
Fig. 14 illustrates an exemplary structure operable in conjunction with a point cloud data transmission/reception method/apparatus according to an embodiment.
Fig. 15 illustrates inter prediction according to an embodiment.
Fig. 16 illustrates a frame group according to an embodiment.
Fig. 17 illustrates a method of forward referencing for inter prediction according to an embodiment.
Fig. 18 illustrates a method for backward referencing for inter prediction according to an embodiment.
Fig. 19 illustrates a method for bi-directional reference for inter prediction according to an embodiment.
Fig. 20 illustrates forward inter prediction according to an embodiment.
Fig. 21 illustrates forward inter prediction according to an embodiment.
Fig. 22 illustrates forward inter prediction according to an embodiment.
Fig. 23 illustrates bi-directional inter prediction according to an embodiment.
Fig. 24 illustrates bi-directional inter-prediction according to an embodiment.
Fig. 25 illustrates bi-directional inter prediction according to an embodiment.
Fig. 26 illustrates backward inter prediction according to an embodiment.
Fig. 27 illustrates backward inter prediction according to an embodiment.
Fig. 28 illustrates backward inter prediction according to an embodiment.
Fig. 29 illustrates a GOF according to an embodiment.
Fig. 30 illustrates accumulating reference frames according to an embodiment.
Fig. 31 illustrates a method of predicting a current point based on a cumulative reference frame according to an embodiment.
Fig. 32 illustrates an encoded bitstream according to an embodiment.
Fig. 33 illustrates an exemplary syntax of a sequence parameter set (seq_parameter_set) according to an embodiment.
Fig. 34 illustrates an exemplary syntax of a geometry_parameter_set according to an embodiment.
Fig. 35 illustrates an exemplary syntax of an attribute parameter set (attribute_parameter_set) according to an embodiment.
Fig. 36 to 38 illustrate a process of a method of predicting a current point according to an embodiment.
Fig. 39 illustrates a method of predicting a current point according to an embodiment.
Fig. 40 illustrates a method of predicting a current point according to an embodiment.
Fig. 41 shows an exemplary syntax of a sequence parameter set (seq_parameter_set).
FIG. 42 illustrates an exemplary syntax of a geometry_parameter_set according to an embodiment.
Fig. 43 shows an example of an attribute parameter set (attribute_parameter_set) according to an embodiment.
Fig. 44 is a flowchart illustrating an apparatus/method for transmitting point cloud data according to an embodiment.
Fig. 45 is a flowchart illustrating an apparatus/method for receiving point cloud data according to an embodiment.
Fig. 46 is a flowchart illustrating an apparatus/method for transmitting point cloud data according to an embodiment.
Fig. 47 is a flowchart illustrating an apparatus/method for receiving point cloud data according to an embodiment.
Fig. 48 is a flowchart illustrating a method of transmitting point cloud data according to an embodiment; and
Fig. 49 is a flowchart illustrating a method of receiving point cloud data according to an embodiment.
Detailed Description
Reference will now be made in detail to the preferred embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. The detailed description set forth below in connection with the appended drawings is intended as a description of exemplary embodiments of the present disclosure and is not intended to illustrate the only embodiments that may be implemented in accordance with the present disclosure. The following detailed description includes specific details to provide a thorough understanding of the present disclosure. It will be apparent, however, to one skilled in the art that the present disclosure may be practiced without these specific details.
Although most terms used in the present disclosure are selected from general-purpose terms widely used in the art, some terms are arbitrarily selected by the applicant and their meanings are explained in detail as needed in the following description. Accordingly, the present disclosure should be understood based on the intended meaning of the terms rather than their simple names or meanings.
FIG. 1 illustrates an exemplary point cloud content providing system according to an embodiment.
The point cloud content providing system shown in fig. 1 may include a transmitting apparatus 10000 and a receiving apparatus 10004. The transmitting apparatus 10000 and the receiving apparatus 10004 can communicate by wire or wirelessly to transmit and receive point cloud data.
The point cloud data transmission apparatus 10000 according to the embodiment can acquire and process point cloud video (or point cloud content) and transmit it. According to an embodiment, the transmitting device 10000 may comprise a fixed station, a Base Transceiver System (BTS), a network, an Artificial Intelligence (AI) device and/or system, a robot, an AR/VR/XR device and/or a server. According to an embodiment, the transmitting device 10000 may include devices, robots, vehicles, AR/VR/XR devices, portable devices, home appliances, Internet of Things (IoT) devices, and AI devices/servers configured to perform communications with base stations and/or other wireless devices using radio access technologies (e.g., 5G New RAT (NR), Long Term Evolution (LTE)).
The transmitting apparatus 10000 according to the embodiment includes a point cloud video acquirer 10001, a point cloud video encoder 10002, and/or a transmitter (or a communication module) 10003.
The point cloud video acquirer 10001 according to the embodiment acquires a point cloud video through a processing procedure such as capturing, synthesizing, or generating. Point cloud video is point cloud content represented by a point cloud, which is a collection of points located in 3D space, and may be referred to as point cloud video data. A point cloud video according to an embodiment may include one or more frames. One frame represents a still image/picture. Thus, a point cloud video may include a point cloud image/frame/picture, and may be referred to as a point cloud image, frame, or picture.
The point cloud video encoder 10002 according to the embodiment encodes the acquired point cloud video data. The point cloud video encoder 10002 may encode point cloud video data based on point cloud compression encoding. The point cloud compression encoding according to an embodiment may include geometry-based point cloud compression (G-PCC) encoding and/or video-based point cloud compression (V-PCC) encoding or next generation encoding. The point cloud compression encoding according to the embodiment is not limited to the above-described embodiment. The point cloud video encoder 10002 may output a bitstream that includes encoded point cloud video data. The bitstream may contain not only the encoded point cloud video data, but also signaling information related to the encoding of the point cloud video data.
The transmitter 10003 according to an embodiment transmits a bit stream containing encoded point cloud video data. The bit stream according to an embodiment is encapsulated in a file or segment (e.g., a streaming segment) and transmitted via various networks such as a broadcast network and/or a broadband network. Although not shown in the drawings, the transmission apparatus 10000 may include an encapsulator (or an encapsulation module) configured to perform an encapsulation operation. According to an embodiment, the encapsulator may be included in the transmitter 10003. Depending on the implementation, the file or segment may be sent to the receiving device 10004 via a network, or stored in a digital storage medium (e.g., USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc.). The transmitter 10003 according to the embodiment can communicate with the reception apparatus 10004 (or the receiver 10005) by wire/wireless via a network of 4G, 5G, 6G, or the like. In addition, the transmitter may perform necessary data processing operations according to a network system (e.g., a 4G, 5G, or 6G communication network system). The transmitting apparatus 10000 can transmit the encapsulated data in an on-demand manner.
The receiving apparatus 10004 according to an embodiment comprises a receiver 10005, a point cloud video decoder 10006 and/or a renderer 10007. According to an embodiment, the receiving device 10004 may include devices, robots, vehicles, AR/VR/XR devices, portable devices, home appliances, Internet of Things (IoT) devices, and AI devices/servers configured to perform communications with base stations and/or other wireless devices using a radio access technology (e.g., 5G New RAT (NR), Long Term Evolution (LTE)).
The receiver 10005 according to the embodiment receives a bit stream containing point cloud video data, or a file/segment in which the bit stream is encapsulated, from a network or a storage medium. The receiver 10005 can perform necessary data processing according to a network system (e.g., a communication network system of 4G, 5G, 6G, or the like). The receiver 10005 according to an embodiment may decapsulate the received file/segment and output a bitstream. According to an embodiment, the receiver 10005 may include a decapsulator (or decapsulation module) configured to perform decapsulation operations. The decapsulator may be implemented as an element (or component) separate from the receiver 10005.
The point cloud video decoder 10006 decodes a bit stream containing point cloud video data. The point cloud video decoder 10006 may decode point cloud video data according to the method by which the point cloud video data is encoded (e.g., as an inverse of the operation of the point cloud video encoder 10002). Thus, the point cloud video decoder 10006 may decode point cloud video data by performing point cloud decompression encoding (inverse of point cloud compression). The point cloud decompression coding includes G-PCC coding.
The renderer 10007 renders the decoded point cloud video data. The renderer 10007 may output point cloud content by rendering not only point cloud video data but also audio data. According to an implementation, the renderer 10007 may include a display configured to display point cloud content. According to an implementation, the display may be implemented as a separate device or component rather than being included in the renderer 10007.
The arrow indicated by a broken line in the figure indicates the transmission path of the feedback information acquired by the receiving device 10004. The feedback information is information reflecting interactivity with a user consuming the point cloud content, and includes information about the user (e.g., head orientation information, viewport information, etc.). Specifically, when the point cloud content is content for a service (e.g., a self-driving service or the like) that requires interaction with the user, feedback information may be provided to a content sender (e.g., the transmission apparatus 10000) and/or a service provider. The feedback information may be used in the receiving apparatus 10004 and the transmitting apparatus 10000 or may not be provided according to the embodiment.
The head orientation information according to the embodiment is information about the head position, orientation, angle, movement, etc. of the user. The receiving apparatus 10004 according to the embodiment may calculate viewport information based on the head orientation information. The viewport information may be information about an area of the point cloud video that the user is viewing. The viewpoint is the point through which the user views the point cloud video and may refer to the center point of the viewport region. That is, the viewport is a region centered at a viewpoint, and the size and shape of the region may be determined by a field of view (FOV). Thus, in addition to the head orientation information, the receiving device 10004 may also extract viewport information based on the vertical or horizontal FOV supported by the device. In addition, the receiving device 10004 performs gaze analysis or the like to check a manner in which the user consumes the point cloud, an area in which the user gazes in the point cloud video, gaze time, or the like. According to an embodiment, the receiving device 10004 may send feedback information including gaze analysis results to the sending device 10000. Feedback information according to embodiments may be obtained during rendering and/or display. The feedback information according to an embodiment may be acquired by one or more sensors included in the receiving device 10004. Depending on the implementation, the feedback information may be retrieved by the renderer 10007 or a separate external element (or device, component, etc.). The dashed line in fig. 1 represents a procedure of transmitting feedback information acquired by the renderer 10007. The point cloud content providing system may process (encode/decode) the point cloud data based on the feedback information. Accordingly, the point cloud video data decoder 10006 may perform decoding operations based on feedback information. The receiving device 10004 may send feedback information to the sending device 10000. The transmitting apparatus 10000 (or the point cloud video data encoder 10002) may perform an encoding operation based on feedback information. Accordingly, the point cloud content providing system can efficiently process necessary data (e.g., point cloud data corresponding to the head position of the user) based on the feedback information instead of processing (encoding/decoding) the entire point cloud data, and provide the point cloud content to the user.
According to an embodiment, the transmitting apparatus 10000 may be referred to as an encoder, a transmitting apparatus, a transmitter, etc., and the receiving apparatus 10004 may be referred to as a decoder, a receiving apparatus, a receiver, etc.
The point cloud data processed (through a series of processes of acquisition/encoding/transmission/decoding/rendering) in the point cloud content providing system of fig. 1 according to an embodiment may be referred to as point cloud content data or point cloud video data. According to an embodiment, the point cloud content data may be used as a concept covering metadata or signaling information related to the point cloud data.
The elements of the point cloud content providing system shown in fig. 1 may be implemented by hardware, software, a processor, and/or a combination thereof.
Fig. 2 is a block diagram illustrating a point cloud content providing operation according to an embodiment.
Fig. 2 is a block diagram illustrating an operation of the point cloud content providing system described in fig. 1. As described above, the point cloud content providing system may process point cloud data based on point cloud compression encoding (e.g., G-PCC).
The point cloud content providing system (e.g., the point cloud transmitting apparatus 10000 or the point cloud video acquirer 10001) according to the embodiment may acquire a point cloud video (20000). The point cloud video is represented by a point cloud belonging to a coordinate system for expressing the 3D space. Point cloud video according to an embodiment may include Ply (Polygon File Format, also known as the Stanford Triangle Format) files. When the point cloud video has one or more frames, the acquired point cloud video may include one or more Ply files. The Ply file contains point cloud data such as point geometry and/or attributes. The geometry includes the location of the points. The location of each point may be represented by parameters (e.g., X, Y, and Z-axis values) of a three-dimensional coordinate system (e.g., a coordinate system consisting of X, Y, and Z axes). The attributes include attributes of points (e.g., information about texture, color (YCbCr or RGB), reflectivity r, transparency, etc. of each point). The points have one or more attributes. For example, a point may have a color attribute only, or both color and reflectivity attributes. According to an embodiment, geometry may be referred to as location, geometry information, geometry data, etc., and attributes may be referred to as attributes, attribute information, attribute data, etc. The point cloud content providing system (e.g., the point cloud transmitting apparatus 10000 or the point cloud video acquirer 10001) may acquire point cloud data from information (e.g., depth information, color information, etc.) related to the point cloud video acquisition process.
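For illustration, the per-point layout described above (geometry plus attributes such as color and reflectivity) can be modeled with a minimal in-memory structure. The following Python sketch is an assumption made for explanatory purposes only; the class and field names are not defined by the Ply format or by G-PCC.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PointCloudFrame:
    """One frame of point cloud data: per-point geometry and attributes."""
    geometry: List[Tuple[float, float, float]] = field(default_factory=list)  # (x, y, z) per point
    colors: List[Tuple[int, int, int]] = field(default_factory=list)          # RGB per point
    reflectivity: List[float] = field(default_factory=list)                   # optional attribute

# Two points: each has a position, a color, and a reflectivity value.
frame = PointCloudFrame(
    geometry=[(0.0, 1.0, 2.0), (5.0, 9.0, 1.0)],
    colors=[(255, 0, 0), (0, 255, 0)],
    reflectivity=[0.3, 0.7],
)
```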
The point cloud content providing system (e.g., the transmitting apparatus 10000 or the point cloud video encoder 10002) according to the embodiment may encode the point cloud data (20001). The point cloud content providing system may encode the point cloud data based on point cloud compression encoding. As described above, the point cloud data may include the geometry and attributes of the points. Thus, the point cloud content providing system may perform geometric encoding that encodes geometry and output a geometric bitstream. The point cloud content providing system may perform attribute encoding that encodes attributes and output an attribute bit stream. According to an embodiment, the point cloud content providing system may perform attribute encoding based on geometric encoding. The geometric bit stream and the attribute bit stream according to the embodiment may be multiplexed and output as one bit stream. The bit stream according to an embodiment may also contain signaling information related to geometric coding and attribute coding.
The point cloud content providing system (e.g., the transmitting apparatus 10000 or the transmitter 10003) according to the embodiment may transmit encoded point cloud data (20002). As shown in fig. 1, the encoded point cloud data may be represented by a geometric bit stream and an attribute bit stream. In addition, the encoded point cloud data may be transmitted in the form of a bitstream together with signaling information related to encoding of the point cloud data (e.g., signaling information related to geometric encoding and attribute encoding). The point cloud content providing system may encapsulate and transmit a bitstream carrying encoded point cloud data in the form of a file or a fragment.
A point cloud content providing system (e.g., receiving device 10004 or receiver 10005) according to an embodiment may receive a bitstream containing encoded point cloud data. In addition, the point cloud content providing system (e.g., the receiving device 10004 or the receiver 10005) may demultiplex the bit stream.
The point cloud content providing system (e.g., receiving device 10004 or point cloud video decoder 10006) may decode encoded point cloud data (e.g., geometric bit stream, attribute bit stream) sent in a bit stream. The point cloud content providing system (e.g., the receiving device 10004 or the point cloud video decoder 10006) may decode the point cloud video data based on signaling information contained in the bitstream related to the encoding of the point cloud video data. The point cloud content providing system (e.g., the receiving device 10004 or the point cloud video decoder 10006) may decode a geometric bitstream to reconstruct the location (geometry) of the points. The point cloud content providing system may reconstruct the attributes of the points by decoding the attribute bit stream based on the reconstructed geometry. The point cloud content providing system (e.g., the receiving device 10004 or the point cloud video decoder 10006) may reconstruct the point cloud video based on location according to the reconstructed geometry and the decoded attributes.
The point cloud content providing system (e.g., the receiving device 10004 or the renderer 10007) according to an embodiment may render decoded point cloud data (20004). The point cloud content providing system (e.g., receiving device 10004 or renderer 10007) may use various rendering methods to render geometry and attributes decoded by the decoding process. Points in the point cloud content may be rendered as vertices having a particular thickness, cubes having a particular minimum size centered at corresponding vertex positions, or circles centered at corresponding vertex positions. All or part of the rendered point cloud content is provided to the user via a display (e.g., VR/AR display, general display, etc.).
The point cloud content providing system (e.g., receiving device 10004) according to the embodiment may obtain feedback information (20005). The point cloud content providing system may encode and/or decode the point cloud data based on the feedback information. The feedback information and operation of the point cloud content providing system according to the embodiment are the same as those described with reference to fig. 1, and thus detailed description thereof is omitted.
Fig. 3 illustrates an exemplary process of capturing point cloud video according to an embodiment.
Fig. 3 illustrates an exemplary point cloud video capturing process of the point cloud content providing system described with reference to fig. 1 to 2.
The point cloud content includes point cloud videos (images and/or videos) representing objects and/or environments located in various 3D spaces (e.g., 3D spaces representing real environments, 3D spaces representing virtual environments, etc.). Thus, the point cloud content providing system according to an embodiment may capture point cloud video using one or more cameras (e.g., an infrared camera capable of acquiring depth information, an RGB camera capable of extracting color information corresponding to the depth information, etc.), projectors (e.g., an infrared pattern projector that acquires depth information), liDAR, etc. The point cloud content providing system according to an embodiment may extract geometry composed of points in the 3D space from the depth information and extract attributes of the respective points from the color information to acquire the point cloud data. Images and/or videos according to embodiments may be captured based on at least one of an inward-facing technique and an outward-facing technique.
The left part of fig. 3 shows the inward facing technique. Inward facing techniques refer to techniques that capture an image of a center object with one or more cameras (or camera sensors) positioned around the center object. The inward facing techniques may be used to generate point cloud content that provides a 360 degree image of a key object to a user (e.g., VR/AR content that provides a 360 degree image of an object (e.g., a key object such as a character, player, object, or actor) to a user).
The right-hand part of fig. 3 shows the outward facing technique. Outward facing techniques refer to techniques that utilize one or more cameras (or camera sensors) positioned around a central object to capture an image of the environment of the central object rather than the central object. The outward facing technique may be used to generate point cloud content that provides an ambient environment that appears from the perspective of the user (e.g., content that may be provided to the user of the self-driving vehicle that represents an external environment).
As shown, the point cloud content may be generated based on capture operations of one or more cameras. In this case, the coordinate system may differ between cameras, so the point cloud content providing system may calibrate one or more cameras to set the global coordinate system prior to the capture operation. In addition, the point cloud content providing system may generate point cloud content by compositing any image and/or video with the image and/or video captured by the above-described capturing technique. The point cloud content providing system may not perform the capturing operation described in fig. 3 when generating the point cloud content representing the virtual space. A point cloud content providing system according to an embodiment may perform post-processing on captured images and/or video. In other words, the point cloud content providing system may remove unwanted areas (e.g., background), identify the space to which the captured image and/or video is connected, and perform an operation of filling the space hole when the space hole exists.
The point cloud content providing system may generate one piece of point cloud content by performing coordinate transformation on points of the point cloud video acquired from the respective cameras. The point cloud content providing system may perform coordinate transformation on the points based on the position coordinates of the respective cameras. Thus, the point cloud content providing system may generate content representing a wide range, or may generate point cloud content having a high density of points.
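The per-camera coordinate transformation mentioned above amounts to a rigid transform into a shared global frame. The sketch below assumes each camera pose is given as a 3x3 rotation matrix and a translation vector; the function name and pose convention are illustrative assumptions, since the actual calibration data depend on the capture setup.

```python
def to_global(points, rotation, translation):
    """Map points from a camera's local frame into the global frame: p_global = R * p_local + t."""
    out = []
    for x, y, z in points:
        gx = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z + translation[0]
        gy = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z + translation[1]
        gz = rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z + translation[2]
        out.append((gx, gy, gz))
    return out

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(to_global([(1.0, 2.0, 3.0)], identity, (10.0, 0.0, 0.0)))  # [(11.0, 2.0, 3.0)]
```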
FIG. 4 illustrates an exemplary point cloud encoder according to an embodiment.
Fig. 4 shows an example of the point cloud video encoder 10002 of fig. 1. The point cloud encoder reconstructs and encodes the point cloud data (e.g., the location and/or attributes of the points) to adjust the quality (e.g., lossless, lossy, or near lossless) of the point cloud content according to network conditions or applications. When the total size of the point cloud content is large (e.g., 60Gbps for 30fps point cloud content), the point cloud content providing system may not be able to stream the content in real time. Accordingly, the point cloud content providing system may reconstruct the point cloud content based on the maximum target bit rate to provide the point cloud content according to a network environment or the like.
As described with reference to fig. 1 and 2, the point cloud encoder may perform geometric encoding and attribute encoding. The geometric encoding is performed before the attribute encoding.
The point cloud encoder according to the embodiment includes a coordinate transformer (transform coordinates) 40000, a quantizer (quantize and remove points (voxelization)) 40001, an octree analyzer (analyze octree) 40002 and a surface approximation analyzer (analyze surface approximation) 40003, an arithmetic encoder (arithmetic coding) 40004, a geometry reconstructor (reconstruction geometry) 40005, a color transformer (transform color) 40006, an attribute transformer (transform attribute) 40007, RAHT transformer (RAHT) 40008, a LOD generator (generate LOD) 40009, a lifting transformer (lifting) 40010, a coefficient quantizer (quantized coefficient) 40011 and/or an arithmetic encoder (arithmetic coding) 40012.
The coordinate transformer 40000, quantizer 40001, octree analyzer 40002, surface approximation analyzer 40003, arithmetic encoder 40004, and geometry reconstructor 40005 may perform geometry encoding. Geometric coding according to an embodiment may include octree geometric coding, direct coding, triplet geometric coding, and entropy coding. Direct encoding and triplet geometry encoding are applied selectively or in combination. Geometric coding is not limited to the above examples.
As shown, a coordinate transformer 40000 according to an embodiment receives a position and transforms it into coordinates. For example, the position may be transformed into position information in a three-dimensional space (e.g., a three-dimensional space represented by an XYZ coordinate system). The positional information in the three-dimensional space according to the embodiment may be referred to as geometric information.
The quantizer 40001 according to the embodiment quantizes the geometry. For example, the quantizer 40001 may quantize the points based on the minimum position values (e.g., minimum values on each of the X, Y, and Z axes) of all the points. The quantizer 40001 performs the quantization operation as follows: the difference between the minimum position value and the position value of each point is multiplied by a preset quantization scale value, and then the nearest integer value is found by rounding the value obtained by the multiplication. Thus, one or more points may have the same quantized position (or position value). The quantizer 40001 according to the embodiment performs voxelization based on the quantized positions to reconstruct the quantized points. As in the case of pixels (the smallest unit containing 2D image/video information), points of point cloud content (or 3D point cloud video) according to an embodiment may be included in one or more voxels. As a compound of the words volume and pixel, the term voxel refers to a 3D cubic space generated when the 3D space is divided into units (unit=1.0) based on axes (e.g., X-axis, Y-axis, and Z-axis) representing the 3D space. The quantizer 40001 may match a set of points in 3D space with voxels. According to an embodiment, one voxel may comprise only one point. According to an embodiment, one voxel may comprise one or more points. To represent a voxel as a point, the position of the center of the voxel may be set based on the position of one or more points included in the voxel. In this case, the attributes of all positions included in one voxel may be combined and assigned to the voxel.
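The quantization and voxelization steps described above can be sketched as follows. The function names, the single scalar quantization scale, and the attribute-averaging rule are assumptions made for this illustration and are not the exact procedure of the quantizer 40001.

```python
from collections import defaultdict

def quantize_positions(points, scale):
    """Subtract the per-axis minimum, multiply by a preset scale, and round to the nearest integer."""
    mins = [min(p[a] for p in points) for a in range(3)]
    return [tuple(int(round((p[a] - mins[a]) * scale)) for a in range(3)) for p in points]

def voxelize(quantized_points, attributes):
    """Group points sharing the same quantized position (voxel) and average their attributes."""
    voxels = defaultdict(list)
    for pos, attr in zip(quantized_points, attributes):
        voxels[pos].append(attr)
    return {pos: sum(vals) / len(vals) for pos, vals in voxels.items()}

qpts = quantize_positions([(0.12, 0.40, 0.98), (0.11, 0.41, 0.97)], scale=10.0)
print(voxelize(qpts, [0.3, 0.5]))  # both points fall into voxel (0, 0, 0); averaged attribute = 0.4
```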
Octree analyzer 40002 according to an embodiment performs octree geometric encoding (or octree encoding) to present voxels in an octree structure. The octree structure represents points that match voxels based on the octree structure.
The surface approximation analyzer 40003 according to an embodiment may analyze and approximate octree. Octree analysis and approximation according to embodiments is a process of analyzing a region containing multiple points to efficiently provide octree and voxelization.
The arithmetic encoder 40004 according to an embodiment performs entropy encoding on the octree and/or the approximate octree. For example, the coding scheme includes arithmetic coding. As a result of the encoding, a geometric bitstream is generated.
The color transformer 40006, the attribute transformers 40007, RAHT transformer 40008, the LOD generator 40009, the lifting transformer 40010, the coefficient quantizer 40011, and/or the arithmetic encoder 40012 perform attribute encoding. As described above, a point may have one or more attributes. Attribute coding according to the embodiment is also applied to an attribute possessed by one point. However, when an attribute (e.g., color) includes one or more elements, attribute encoding is applied to each element independently. Attribute encoding according to an embodiment includes color transform encoding, attribute transform encoding, region Adaptive Hierarchical Transform (RAHT) encoding, interpolation-based hierarchical nearest neighbor prediction (predictive transform) encoding, and interpolation-based hierarchical nearest neighbor prediction (lifting transform) encoding with update/lifting steps. The RAHT codes, predictive transform codes, and lifting transform codes described above may be selectively used, or a combination of one or more coding schemes may be used, depending on the point cloud content. The attribute encoding according to the embodiment is not limited to the above example.
The color transformer 40006 according to an embodiment performs color transform coding that transforms color values (or textures) included in an attribute. For example, the color transformer 40006 may transform the format of the color information (e.g., from RGB to YCbCr). Alternatively, the operation of the color transformer 40006 according to the embodiment may be applied according to the color value included in the attribute.
The geometry reconstructor 40005 according to an embodiment reconstructs (decompresses) the octree and/or the approximate octree. The geometry reconstructor 40005 reconstructs the octree/voxel based on the result of the analysis of the point distribution. The reconstructed octree/voxel may be referred to as a reconstructed geometry (restored geometry).
The attribute transformer 40007 according to the embodiment performs attribute transformation to transform attributes based on the reconstructed geometry and/or the position where the geometry encoding is not performed. As described above, since the attribute depends on geometry, the attribute transformer 40007 can transform the attribute based on the reconstructed geometry information. For example, based on the position value of a point included in a voxel, the attribute transformer 40007 may transform the attribute of the point at that position. As described above, when the center position of the voxel is set based on the position of one or more points included in the voxel, the attribute transformer 40007 transforms the attribute of the one or more points. When performing triplet geometry encoding, attribute transformer 40007 may transform attributes based on the triplet geometry encoding.
The attribute transformer 40007 may perform attribute transformation by calculating an average of attributes or attribute values (e.g., color or reflectivity of each point) of neighboring points within a specific position/radius from a center position (or position value) of each voxel. The attribute transformer 40007 may apply weights according to distances from the center to various points when calculating the average. Thus, each voxel has a location and a calculated attribute (or attribute value).
The attribute transformer 40007 may search for neighboring points existing within a specific position/radius from the center position of each voxel based on a K-D tree or a Morton code. The K-D tree is a binary search tree and supports a data structure capable of managing points based on location so that Nearest Neighbor Search (NNS) can be performed quickly. The Morton code is generated by presenting the coordinates (e.g., (x, y, z)) representing the 3D position of each point as bit values and mixing the bits. For example, when the coordinates representing a point position are (5, 9, 1), the bit values of the coordinates are (0101, 1001, 0001). Mixing the bit values in the order of z, y, and x, starting from the most significant bit, yields 010001000111, which corresponds to the decimal number 1095. That is, the Morton code value of the point having the coordinates (5, 9, 1) is 1095. The attribute transformer 40007 may sort the points based on their Morton code values and perform NNS through a depth-first traversal process. After the attribute transformation operation, a K-D tree or Morton code is used when NNS is needed in another transformation process for attribute encoding.
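The Morton code construction in the (5, 9, 1) example can be reproduced with a short sketch. The function name and the fixed 4-bit width are assumptions for this illustration; the bits of z, y, and x are interleaved from the most significant bit down, which yields 1095 as in the text.

```python
def morton_code(x, y, z, bits=4):
    """Interleave the bits of z, y, and x (in that order at each bit position), MSB first."""
    code = 0
    for i in reversed(range(bits)):
        code = (code << 3) | (((z >> i) & 1) << 2) | (((y >> i) & 1) << 1) | ((x >> i) & 1)
    return code

print(morton_code(5, 9, 1))  # 1095, i.e. binary 010001000111
```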
As shown, the transformed attributes are input to RAHT transformer 40008 and/or LOD generator 40009.
The RAHT transformer 40008 according to an embodiment performs RAHT encoding for prediction attribute information based on reconstructed geometric information. For example, RAHT transformer 40008 may predict attribute information for nodes of a higher level in an octree based on attribute information associated with nodes of a lower level in the octree.
The LOD generator 40009 according to an embodiment generates a level of detail (LOD) to perform predictive transform coding. The LOD according to an embodiment indicates the level of detail of the point cloud content. A smaller LOD value indicates lower detail of the point cloud content, and a larger LOD value indicates higher detail. Points may be classified by LOD.
The lifting transformer 40010 according to the embodiment performs lifting transform coding that transforms the point cloud attributes based on weights. As described above, lifting transform coding may optionally be applied.
The coefficient quantizer 40011 according to the embodiment quantizes the attribute of the attribute code based on the coefficient.
The arithmetic encoder 40012 according to the embodiment encodes the quantized attribute based on arithmetic encoding.
Although not shown in the figures, the elements of the point cloud encoder of fig. 4 may be implemented by hardware, software, firmware, or a combination thereof, including one or more processors or integrated circuits configured to communicate with one or more memories included in the point cloud providing apparatus. The one or more processors may perform at least one of the operations and/or functions of the elements of the point cloud encoder of fig. 4 described above. Additionally, the one or more processors may operate or execute software programs and/or sets of instructions for performing the operations and/or functions of the elements of the point cloud encoder of fig. 4. The one or more memories according to embodiments may include high-speed random access memory, or include non-volatile memory (e.g., one or more disk storage devices, flash memory devices, or other non-volatile solid-state memory devices).
Fig. 5 shows an example of voxels according to an embodiment.
Fig. 5 shows voxels located in a 3D space represented by a coordinate system consisting of three axes (X-axis, Y-axis, and Z-axis). As described with reference to fig. 4, a point cloud encoder (e.g., quantizer 40001) may perform voxelization. A voxel refers to a 3D cubic space generated when a 3D space is divided into units (unit=1.0) based on axes (e.g., X-axis, Y-axis, and Z-axis) representing the 3D space. Fig. 5 shows an example of voxels generated by an octree structure, wherein a cubical axis-aligned bounding box defined by the two extreme points (0, 0, 0) and (2^d, 2^d, 2^d) is recursively subdivided. One voxel comprises at least one point. The spatial coordinates of a voxel may be estimated from its positional relationship with a group of voxels. As described above, voxels have properties (e.g., color or reflectivity) similar to pixels of a 2D image/video. Details of the voxels are the same as those described with reference to fig. 4, and thus description thereof is omitted.
Fig. 6 shows an example of an octree and occupancy code according to an embodiment.
As described with reference to fig. 1-4, a point cloud content providing system (point cloud video encoder 10002) or a point cloud encoder (e.g., octree analyzer 40002) performs octree geometric encoding (or octree encoding) based on octree structures to efficiently manage regions and/or locations of voxels.
The top of fig. 6 shows an octree structure. The 3D space of the point cloud content according to an embodiment is represented by axes (e.g., X-axis, Y-axis, and Z-axis) of a coordinate system. The octree structure is created by recursively subdividing a cubical axis-aligned bounding box defined by the two extreme points (0, 0, 0) and (2^d, 2^d, 2^d). Here, 2^d may be set to a value constituting the minimum bounding box around all points of the point cloud content (or point cloud video), and d represents the depth of the octree. The value of d is determined by the following equation, in which (x^int_n, y^int_n, z^int_n) denotes the position (or position value) of a quantized point:

$$d = \left\lceil \log_2\!\left(\operatorname{Max}\!\left(x^{int}_n, y^{int}_n, z^{int}_n,\; n = 1, \dots, N\right) + 1\right)\right\rceil$$
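Using the depth formula reconstructed above, a small sketch can compute d from the quantized positions. The helper name is an assumption for illustration.

```python
import math

def octree_depth(quantized_points):
    """Smallest d such that a cube with side 2^d encloses every quantized position."""
    max_coord = max(max(p) for p in quantized_points)
    return math.ceil(math.log2(max_coord + 1))

print(octree_depth([(5, 9, 1), (3, 2, 7)]))  # max coordinate 9 -> d = 4 (2^4 = 16 >= 10)
```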
As shown in the middle of the upper part of fig. 6, the entire 3D space may be divided into eight spaces according to partitions. Each divided space is represented by a cube having six faces. As shown in the upper right of fig. 6, each of the eight spaces is subdivided based on axes (e.g., X-axis, Y-axis, and Z-axis) of the coordinate system. Thus, each space is divided into eight smaller spaces. The smaller space divided is also represented by a cube having six faces. The segmentation scheme is applied until the leaf nodes of the octree become voxels.
The lower part of fig. 6 shows the octree occupancy code. The occupancy code of the octree is generated to indicate whether each of the eight divided spaces generated by dividing one space contains at least one point. Thus, a single occupancy code is represented by eight child nodes. Each child node represents the occupancy of a divided space, and each child node has a value of 1 bit. Thus, the occupancy code is represented as an 8-bit code. That is, when at least one point is included in the space corresponding to a child node, the node is assigned a value of 1. When a point is not included in the space corresponding to a child node (the space is empty), the node is assigned a value of 0. Since the occupancy code shown in fig. 6 is 00100001, it indicates that the spaces corresponding to the third and eighth child nodes among the eight child nodes each contain at least one point. As shown, each of the third and eighth child nodes has eight child nodes, and each is represented by an 8-bit occupancy code. The figure shows that the occupancy code of the third child node is 10000111 and that of the eighth child node is 01001111. A point cloud encoder (e.g., arithmetic encoder 40004) according to an embodiment may perform entropy encoding on the occupancy codes. To increase compression efficiency, the point cloud encoder may perform intra/inter encoding on the occupancy codes. A receiving device (e.g., receiving device 10004 or point cloud video decoder 10006) according to an embodiment reconstructs the octree based on the occupancy codes.
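The following sketch shows how an 8-bit occupancy code for a single octree node could be derived from the points it contains. The mapping from child index to bit position used here is an assumption for illustration; the actual child scan order is defined by the codec.

```python
def occupancy_code(points, origin, size):
    """Set one bit per child cube of the node at `origin` with side `size` that contains a point."""
    half = size / 2.0
    code = 0
    for px, py, pz in points:
        cx = int(px >= origin[0] + half)
        cy = int(py >= origin[1] + half)
        cz = int(pz >= origin[2] + half)
        child = (cx << 2) | (cy << 1) | cz   # child index 0..7 (assumed ordering)
        code |= 1 << (7 - child)             # child 0 maps to the most significant bit
    return format(code, "08b")

print(occupancy_code([(1, 1, 6), (7, 7, 7)], origin=(0, 0, 0), size=8))  # "01000001"
```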
A point cloud encoder (e.g., the point cloud encoder or octree analyzer 40002 of fig. 4) according to an embodiment may perform voxelization and octree encoding to store point locations. However, the dots are not always uniformly distributed in the 3D space, and thus there may be a specific area where fewer dots exist. Therefore, performing voxelization of the entire 3D space is inefficient. For example, when a specific region contains few points, voxelization does not need to be performed in the specific region.
Thus, for the specific region (or a node other than a leaf node of the octree) described above, the point cloud encoder according to the embodiment may skip voxelization and perform direct encoding to directly encode the points included in the specific region. The mode of directly encoding point coordinates according to the embodiment is referred to as the direct coding mode (DCM). The point cloud encoder according to an embodiment may also perform triplet geometry encoding based on a surface model, which reconstructs the point positions in a specific region (or node) on a voxel basis. Triplet geometry encoding represents an object as a series of triangular meshes. Thus, the point cloud decoder may generate a point cloud from the mesh surface. Direct encoding and triplet geometry encoding according to embodiments may be selectively performed. In addition, direct encoding and triplet geometry encoding according to an embodiment may be performed in combination with octree geometry encoding (or octree encoding).
In order to perform direct encoding, the option of using the direct mode to apply direct encoding should be enabled. The node to which direct encoding is to be applied should not be a leaf node, and the number of points within that node should be less than a threshold. In addition, the total number of points to which direct encoding is to be applied should not exceed a preset threshold. When the above conditions are satisfied, the point cloud encoder (or the arithmetic encoder 40004) according to the embodiment may perform entropy encoding on the point positions (or position values).
A point cloud encoder (e.g., surface approximation analyzer 40003) according to an embodiment may determine a particular level of the octree (a level less than the depth d of the octree), and from that level may use the surface model to perform triplet geometry encoding to reconstruct the points in the node region on a voxel basis (triplet mode). A point cloud encoder according to an embodiment may specify the level at which triplet geometry encoding is to be applied. For example, when the specified level is equal to the depth of the octree, the point cloud encoder does not operate in the triplet mode. In other words, the point cloud encoder according to an embodiment may operate in the triplet mode only when the specified level is less than the depth value of the octree. The 3D cubic region of a node at the specified level according to an embodiment is referred to as a block. One block may include one or more voxels. A block or voxel may correspond to a brick. Within each block, the geometry is represented as a surface. A surface according to embodiments may intersect each edge of a block at most once.
One block has 12 edges, so there are at most 12 intersection points in one block. Each intersection point is called a vertex. A vertex present along an edge is detected when there is at least one occupied voxel adjacent to the edge among all blocks sharing the edge. An occupied voxel according to an embodiment refers to a voxel containing points. The position of a vertex detected along an edge is the average position, along that edge, of all voxels adjacent to the edge among all blocks sharing the edge.
Once the vertex is detected, the point cloud encoder according to an embodiment may perform entropy encoding on the start point (x, y, z) of the edge, the direction vector (Δx, Δy, Δz) of the edge, and the vertex position value (relative position value within the edge). When triplet geometry coding is applied, a point cloud encoder (e.g., geometry reconstructor 40005) according to an embodiment can generate a restored geometry (reconstructed geometry) by performing triangle reconstruction, upsampling, and voxelization processes.
Vertices located at the edges of a block define a surface crossing the block. The surface according to an embodiment is a non-planar polygon. The triangle reconstruction process reconstructs a surface represented by triangles based on the start points of the edges, the direction vectors of the edges, and the position values of the vertices. The triangle reconstruction process is performed as follows: i) calculate the centroid of the vertices, ii) subtract the centroid from each vertex value, and iii) estimate the sum of squares of the values obtained by the subtraction.
① $\mu = \dfrac{1}{n}\sum_{i=1}^{n} x_i$  ② $\bar{x}_i = x_i - \mu$  ③ $\sigma^2 = \sum_{i=1}^{n} \bar{x}_i^2$

Here, $x_i$ denotes the position of the i-th vertex, $\mu$ denotes the centroid, and the sums are evaluated per axis.
The minimum value of the sum is estimated, and the projection process is performed according to the axis having the minimum value. For example, when element x is smallest, each vertex is projected on the x-axis relative to the center of the block and on the (y, z) plane. When the value obtained by projection on the (y, z) plane is (ai, bi), the value of θ is estimated by atan2 (bi, ai), and vertices are ordered based on the value of θ. The table below shows the vertex combinations that create triangles from the number of vertices. Vertices are ordered from 1 to n. The table below shows that for four vertices, two triangles can be constructed from the vertex combination. The first triangle may be composed of vertices 1, 2, and 3 among the ordered vertices, and the second triangle may be composed of vertices 3, 4, and 1 among the ordered vertices.
Triangles formed from vertices ordered by 1, …, n
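The vertex-ordering steps described above (centroid, per-axis sum of squared deviations, projection onto the plane of the two remaining axes, and atan2 ordering) can be sketched as follows. The helper name order_vertices and the sample coordinates are assumptions; for four ordered vertices, the two triangles (1, 2, 3) and (3, 4, 1) from the text are then formed.

```python
import math

def order_vertices(vertices):
    """Order vertices for triangle reconstruction as described in the text."""
    n = len(vertices)
    centroid = [sum(v[a] for v in vertices) / n for a in range(3)]
    centered = [[v[a] - centroid[a] for a in range(3)] for v in vertices]
    sums = [sum(c[a] ** 2 for c in centered) for a in range(3)]   # per-axis sum of squares
    drop = sums.index(min(sums))                                  # axis with the minimum value
    keep = [a for a in range(3) if a != drop]                     # project onto the other two axes
    angle = lambda c: math.atan2(c[keep[1]], c[keep[0]])          # theta = atan2(bi, ai)
    order = sorted(range(n), key=lambda i: angle(centered[i]))
    return [vertices[i] for i in order]

v = order_vertices([(0, 0, 0), (1, 0, 0.1), (1, 1, 0), (0, 1, 0.1)])
triangles = [(v[0], v[1], v[2]), (v[2], v[3], v[0])]   # (1, 2, 3) and (3, 4, 1) for n = 4
```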
An upsampling process is performed to add points in the middle along the sides of the triangle and voxelization is performed. The added points are generated based on the upsampling factor and the width of the block. The added points are called refinement vertices. A point cloud encoder according to an embodiment may voxel the refined vertices. In addition, the point cloud encoder may perform attribute encoding based on the voxelized position (or position value).
Fig. 7 shows an example of a neighbor node pattern according to an embodiment.
In order to increase compression efficiency of the point cloud video, the point cloud encoder according to the embodiment may perform entropy encoding based on context adaptive arithmetic encoding.
As described with reference to fig. 1-6, a point cloud content providing system or point cloud encoder (e.g., point cloud video encoder 10002, the point cloud encoder of fig. 4, or the arithmetic encoder 40004) may directly perform entropy encoding on the occupancy code. In addition, the point cloud content providing system or the point cloud encoder may perform entropy encoding (intra-frame encoding) based on the occupancy code of the current node and the occupancy of neighboring nodes, or perform entropy encoding (inter-frame encoding) based on the occupancy code of the previous frame. A frame according to an embodiment represents a set of point cloud videos generated at the same time. The compression efficiency of intra/inter encoding according to an embodiment may depend on the number of neighboring nodes referenced. Using more bits complicates the operation, but allows the coding to be biased toward one side, which may increase compression efficiency. For example, when a 3-bit context is given, encoding must distinguish 2^3 = 8 cases. The way the cases are divided for encoding affects the implementation complexity. Therefore, an appropriate balance between compression efficiency and complexity must be found.
Fig. 7 illustrates a process of obtaining an occupancy pattern based on occupancy of neighbor nodes. The point cloud encoder according to an embodiment determines occupancy of neighbor nodes of respective nodes of the octree and obtains values of neighbor patterns. The neighbor node pattern is used to infer the occupancy pattern of the node. The upper part of fig. 7 shows a cube corresponding to a node (a cube located in the middle) and six cubes sharing at least one face with the cube (neighbor nodes). The nodes shown in the graph are nodes of the same depth. The numbers shown in the figure represent weights (1, 2,4, 8, 16, and 32) associated with six nodes, respectively. Weights are assigned in turn according to the locations of neighboring nodes.
The lower part of fig. 7 shows the neighbor node pattern values. A neighbor node pattern value is the sum of the weights of the occupied neighbor nodes (neighbor nodes having points). Therefore, the neighbor node pattern value ranges from 0 to 63. When the neighbor node pattern value is 0, it indicates that there is no node having a point (no occupied node) among the neighbor nodes of the node. When the neighbor node pattern value is 63, it indicates that all neighbor nodes are occupied nodes. As shown, since the neighbor nodes assigned weights 1, 2, 4, and 8 are occupied nodes, the neighbor node pattern value is 15 (the sum of 1, 2, 4, and 8). The point cloud encoder may perform encoding according to the neighbor node pattern value (e.g., 64 types of encoding may be performed, since the neighbor node pattern value ranges from 0 to 63). According to an embodiment, the point cloud encoder may reduce encoding complexity by changing the neighbor node pattern values (e.g., based on a table that changes 64 values to 10 or 6).
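A minimal sketch of the neighbor node pattern computation described above: the pattern value is the sum of the weights (1, 2, 4, 8, 16, 32) assigned to the occupied face-sharing neighbors, reproducing the value 15 from the example. The dict-based interface is an assumption for illustration.

```python
def neighbor_pattern(occupancy_by_weight):
    """Sum the weights of the occupied neighbor nodes; the result lies in 0..63."""
    return sum(w for w, occupied in occupancy_by_weight.items() if occupied)

# Neighbors with weights 1, 2, 4, and 8 are occupied -> pattern value 15.
print(neighbor_pattern({1: True, 2: True, 4: True, 8: True, 16: False, 32: False}))
```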
Fig. 8 illustrates an example of point configuration in each LOD according to an embodiment.
As described with reference to fig. 1 to 7, the encoded geometry is reconstructed (decompressed) before performing the attribute encoding. When direct encoding is applied, the geometric reconstruction operation may include changing the placement of the directly encoded points (e.g., placing the directly encoded points in front of the point cloud data). When triplet geometry coding is applied, the geometry reconstruction process is performed by triangle reconstruction, upsampling and voxelization. Since the properties depend on geometry, property encoding is performed based on the reconstructed geometry.
A point cloud encoder (e.g., the LOD generator 40009) may classify (reorganize) the points by LOD. The figure shows point cloud content corresponding to the LODs. The leftmost picture in the figure represents the original point cloud content. The second picture from the left shows the distribution of points in the lowest LOD, and the rightmost picture shows the distribution of points in the highest LOD. That is, the points in the lowest LOD are sparsely distributed, while the points in the highest LOD are densely distributed. In other words, as the LOD increases in the direction indicated by the arrow at the bottom of the figure, the spacing (or distance) between the points becomes narrower.
Fig. 9 shows an example of a point configuration for each LOD according to an embodiment.
As described with reference to fig. 1-8, a point cloud content providing system or point cloud encoder (e.g., point cloud video encoder 10002, point cloud encoder of fig. 4, or LOD generator 40009) may generate LODs. LOD is generated by reorganizing points into a set of refinement levels according to a set LOD distance value (or Euclidean distance set). The LOD generation process is performed not only by the point cloud encoder but also by the point cloud decoder.
The upper part of fig. 9 shows examples of points (P0 to P9) of the point cloud content distributed in 3D space. In fig. 9, the original order indicates the order of the points P0 to P9 before LOD generation, and the LOD-based order indicates the order of the points after LOD generation. The points are reorganized by LOD, and a higher LOD contains the points belonging to the lower LODs. As shown in fig. 9, LOD0 contains P0, P5, P4, and P2. LOD1 contains the points of LOD0 plus P1, P6, and P3. LOD2 contains the points of LOD0, the points of LOD1, and P9, P8, and P7.
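As a rough illustration of the distance-based reorganization described above, the following sketch (all names and the exact subsampling rule are assumptions, not the normative LOD generation process) assigns each point to the first refinement level whose spacing threshold it satisfies, so that higher LODs include all points of the lower LODs.

#include <cmath>
#include <vector>

struct Point3 { double x, y, z; };

static double dist(const Point3& a, const Point3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Illustrative sketch: lodDist[l] is the minimum spacing required between
// points of LOD l; thresholds decrease as l grows, so higher LODs become
// denser and include all points already selected for lower LODs.
std::vector<int> assignLod(const std::vector<Point3>& pts,
                           const std::vector<double>& lodDist) {
    std::vector<int> lodOfPoint(pts.size(), (int)lodDist.size() - 1);
    std::vector<size_t> selected;                  // points already placed in lower LODs
    for (size_t l = 0; l < lodDist.size(); ++l) {
        for (size_t i = 0; i < pts.size(); ++i) {
            if (lodOfPoint[i] < (int)l) continue;  // already assigned to a lower LOD
            bool farEnough = true;
            for (size_t j : selected)
                if (dist(pts[i], pts[j]) < lodDist[l]) { farEnough = false; break; }
            if (farEnough) { lodOfPoint[i] = (int)l; selected.push_back(i); }
        }
    }
    return lodOfPoint;
}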
As described with reference to fig. 4, the point cloud encoder according to the embodiment may selectively or in combination perform predictive transform encoding, lifting transform encoding, and RAHT transform encoding.
The point cloud encoder according to the embodiment may generate a predictor for each point in order to perform predictive transform encoding, which sets a prediction attribute (or prediction attribute value) of each point. That is, N predictors may be generated for N points. The predictor according to the embodiment may calculate a weight (= 1/distance) based on the LOD value of each point, index information about the neighbor points present within a set distance for each LOD, and the distances to the neighbor points.
The prediction attribute (or attribute value) according to the embodiment is set as an average of values obtained by multiplying the attribute (or attribute value) (e.g., color, reflectance, etc.) of the neighbor point set in the predictor of each point by a weight (or weight value) calculated based on the distance to each neighbor point. The point cloud encoder (e.g., coefficient quantizer 40011) according to the embodiment may quantize and inverse quantize a residual (may be referred to as a residual attribute, a residual attribute value, or an attribute prediction residual) obtained by subtracting a prediction attribute (attribute value) from an attribute (attribute value) of each point. The quantization process is configured as shown in the following table.
TABLE 1
Attribute prediction residual quantization pseudocode
// floor() is from the standard math library (<math.h> / <cmath>).
int PCCQuantization(int value, int quantStep) {
    // rounds value/quantStep with a bias of 1/3 toward zero
    if (value >= 0) {
        return (int)floor(value / (double)quantStep + 1.0 / 3.0);
    } else {
        return -(int)floor(-value / (double)quantStep + 1.0 / 3.0);
    }
}
TABLE 2
Attribute prediction residual inverse quantization pseudo code
int PCCInverseQuantization(int value, int quantStep) {
    if (quantStep == 0) {
        return value;              // quantization step 0: pass the value through unchanged
    } else {
        return value * quantStep;  // scale the quantized residual back
    }
}
When predictors of respective points have neighbor points, a point cloud encoder (e.g., arithmetic encoder 40012) according to an embodiment may perform entropy encoding on quantized and inverse quantized residual values as described above. When the predictors of the respective points do not have neighbor points, the point cloud encoder (e.g., the arithmetic encoder 40012) according to the embodiment may perform entropy encoding on the attributes of the corresponding points without performing the above-described operations.
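The predictive transform flow described above can be illustrated, in a non-normative way, with the pseudocode of Tables 1 and 2. The sketch below assumes a scalar attribute, computes the prediction as the distance-weighted average (weight = 1/distance) of the registered neighbor attributes, and quantizes only the residual; all structure and function names other than PCCQuantization/PCCInverseQuantization are assumptions.

#include <cmath>
#include <vector>

// PCCQuantization / PCCInverseQuantization as given in Tables 1 and 2 above.
int PCCQuantization(int value, int quantStep);
int PCCInverseQuantization(int value, int quantStep);

struct Neighbor { int attr; double distance; };   // registered neighbor point

int predictAttribute(const std::vector<Neighbor>& neighbors) {
    double num = 0.0, den = 0.0;
    for (const Neighbor& n : neighbors) {
        double w = 1.0 / n.distance;     // weight = 1/distance
        num += w * n.attr;
        den += w;
    }
    return (int)std::lround(num / den);  // weighted average as the prediction
}

void encodeResidual(int attr, const std::vector<Neighbor>& neighbors, int quantStep) {
    if (neighbors.empty()) {
        // No neighbors: the attribute itself would be entropy-coded directly.
        return;
    }
    int pred     = predictAttribute(neighbors);
    int residual = attr - pred;                                   // attribute prediction residual
    int q        = PCCQuantization(residual, quantStep);          // Table 1
    int recon    = pred + PCCInverseQuantization(q, quantStep);   // Table 2, reconstructed attribute
    (void)recon;  // the quantized residual q is what gets entropy-coded
}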
A point cloud encoder (e.g., lifting transformer 40010) according to an embodiment may generate predictors of respective points, set calculated LODs and register neighbor points in the predictors, and set weights according to distances to the neighbor points to perform lifting transform encoding. The lifting transform coding according to the embodiment is similar to the predictive transform coding described above, but differs in that weights are cumulatively applied to the attribute values. The procedure of cumulatively applying weights to attribute values according to the embodiment is configured as follows.
1) An array Quantization Weight (QW) is created to store the weight value of each point. The initial value of all elements of the QW is 1.0. The QW value at the predictor index of each neighbor node registered in the predictor is multiplied by the weight of the predictor of the current point, and the values obtained by the multiplication are accumulated.
2) Lift prediction: a predicted attribute value is calculated by subtracting the value obtained by multiplying the attribute value of the point by the weight from the existing attribute value.
3) Temporary arrays called updateweight and update are created and initialized to zero.
4) The weights calculated by multiplying the weights of all predictors by the weights stored in the QW corresponding to the predictor indices are accumulated into the updateweight array at the indices of the neighbor nodes. A value obtained by multiplying the attribute value at the neighbor node index by the calculated weight is accumulated into the update array.
5) Lift update: for all predictors, the attribute values in the update array are divided by the weight values in the updateweight array at the predictor indices, and the existing attribute values are added to the values obtained by the division.
6) The predicted attributes are calculated for all predictors by multiplying the attribute values updated through the lift update process by the weights (stored in the QW) updated through the lift prediction process. A point cloud encoder (e.g., coefficient quantizer 40011) according to an embodiment quantizes the predicted attribute values. In addition, a point cloud encoder (e.g., arithmetic encoder 40012) performs entropy encoding on the quantized attribute values.
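Steps 1) to 5) above may be sketched as follows. The array names (QW, updateweight, update) follow the description, while the predictor structure and the exact accumulation performed in step 4) are illustrative assumptions rather than the normative process.

#include <vector>

struct Predictor {
    std::vector<int>    neighborIdx;     // indices of the registered neighbor points
    std::vector<double> neighborWeight;  // weights derived from the distance to each neighbor
};

// Illustrative sketch of the lifting weight accumulation and update.
// attrs: one scalar attribute value per point, updated in place.
void liftingUpdate(const std::vector<Predictor>& predictors, std::vector<double>& attrs) {
    size_t n = predictors.size();
    std::vector<double> QW(n, 1.0);            // step 1: quantization weights initialized to 1.0
    std::vector<double> updateweight(n, 0.0);  // step 3: temporary arrays initialized to zero
    std::vector<double> update(n, 0.0);

    // step 1 (continued): propagate the weight of the current point's predictor to its neighbors
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < predictors[i].neighborIdx.size(); ++k)
            QW[predictors[i].neighborIdx[k]] += QW[i] * predictors[i].neighborWeight[k];

    // step 4: accumulate weights and weighted attribute values at the neighbor indices
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < predictors[i].neighborIdx.size(); ++k) {
            size_t nb = predictors[i].neighborIdx[k];
            double w  = predictors[i].neighborWeight[k] * QW[i];
            updateweight[nb] += w;
            update[nb]       += w * attrs[i];
        }

    // step 5 (lift update): normalize and add to the existing attribute value
    for (size_t i = 0; i < n; ++i)
        if (updateweight[i] > 0.0)
            attrs[i] += update[i] / updateweight[i];
}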
A point cloud encoder (e.g., RAHT transformer 40008) according to an embodiment may perform RAHT transform coding, in which the attributes associated with the nodes of a lower level of the octree are used to predict the attributes of the nodes of the higher level. RAHT transform coding is an example of attribute intra coding through octree backward scanning. The point cloud encoder according to an embodiment scans the entire region starting from the voxels and repeats a merging process of merging the voxels into larger blocks at each step until the root node is reached. The merging process according to the embodiment is performed only on occupied nodes; it is not performed on empty nodes. Instead, the merging process is performed on the node immediately above an empty node.
The following equation represents the RAHT transform matrix. Here, g_{l,x,y,z} denotes the average attribute value of the voxels at level l. g_{l,x,y,z} may be calculated from g_{l+1,2x,y,z} and g_{l+1,2x+1,y,z}. The weights of g_{l,2x,y,z} and g_{l,2x+1,y,z} are w1 = w_{l,2x,y,z} and w2 = w_{l,2x+1,y,z}:

\[
\begin{bmatrix} g_{l-1,x,y,z} \\ h_{l-1,x,y,z} \end{bmatrix}
= T_{w1\,w2}
\begin{bmatrix} g_{l,2x,y,z} \\ g_{l,2x+1,y,z} \end{bmatrix},
\qquad
T_{w1\,w2} = \frac{1}{\sqrt{w1 + w2}}
\begin{bmatrix} \sqrt{w1} & \sqrt{w2} \\ -\sqrt{w2} & \sqrt{w1} \end{bmatrix}
\]

Here, g_{l-1,x,y,z} is a low-pass value and is used in the merging process at the next higher level. h_{l-1,x,y,z} denotes a high-pass coefficient. The high-pass coefficients at each step are quantized and entropy-encoded (e.g., encoded by the arithmetic encoder 40012). The weights are calculated as w_{l-1,x,y,z} = w_{l,2x,y,z} + w_{l,2x+1,y,z}. The root node is created from the last g_{1,0,0,0} and g_{1,0,0,1} as follows:

\[
\begin{bmatrix} gDC \\ h_{0,0,0,0} \end{bmatrix}
= T_{w1000\,w1001}
\begin{bmatrix} g_{1,0,0,0} \\ g_{1,0,0,1} \end{bmatrix}
\]
The value gDC is also quantized and entropy encoded like a high-pass coefficient.
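A single merging step of the transform above can be sketched numerically as follows; the function name and interface are illustrative, and the per-axis traversal of the octree is omitted.

#include <cmath>

// Illustrative sketch of one RAHT merge of two occupied sibling nodes.
// g1, g2 are the (low-pass) attribute values of the two nodes, w1, w2 their weights.
// gLow is carried to the next higher level with weight w1 + w2;
// hHigh is the high-pass coefficient that is quantized and entropy-coded.
void rahtMerge(double g1, double w1, double g2, double w2,
               double* gLow, double* hHigh, double* wOut) {
    double a = std::sqrt(w1), b = std::sqrt(w2);
    double s = std::sqrt(w1 + w2);
    *gLow  = (a * g1 + b * g2) / s;    // low-pass value
    *hHigh = (-b * g1 + a * g2) / s;   // high-pass coefficient
    *wOut  = w1 + w2;                  // merged weight for the next level
}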
Fig. 10 illustrates a point cloud decoder according to an embodiment.
The point cloud decoder illustrated in fig. 10 is an example of the point cloud video decoder 10006 described in fig. 1, and can perform the same or similar operations as those of the point cloud video decoder 10006 shown in fig. 1. As shown, the point cloud decoder may receive a geometric bitstream and an attribute bitstream contained in one or more bitstreams. The point cloud decoder includes a geometry decoder and an attribute decoder. The geometry decoder performs geometry decoding on the geometry bitstream and outputs decoded geometry. The attribute decoder performs attribute decoding based on the decoded geometry and attribute bit stream, and outputs the decoded attributes. The decoded geometry and the decoded properties are used to reconstruct the point cloud content (decoded point cloud).
Fig. 11 illustrates a point cloud decoder according to an embodiment.
The point cloud decoder illustrated in fig. 11 is an example of the point cloud decoder illustrated in fig. 10, and may perform a decoding operation, which is an inverse process of the encoding operation of the point cloud encoder illustrated in fig. 1 to 9.
As described with reference to fig. 1 and 10, the point cloud decoder may perform geometry decoding and attribute decoding. The geometry decoding is performed before the attribute decoding.
The point cloud decoder according to the embodiment includes an arithmetic decoder (arithmetic decoding) 11000, an octree synthesizer (synthetic octree) 11001, a surface approximation synthesizer (synthetic surface approximation) 11002, a geometry reconstructor (reconstruction geometry) 11003, a coordinate inverse transformer (inverse transform coordinates) 11004, an arithmetic decoder (arithmetic decoding) 11005, an inverse quantizer (inverse quantization) 11006, a RAHT transformer 11007, an LOD generator (generating LOD) 11008, an inverse lifter (inverse lifting) 11009, and/or a color inverse transformer (inverse transform color) 11010.
The arithmetic decoder 11000, the octree synthesizer 11001, the surface approximation synthesizer 11002, the geometry reconstructor 11003, and the inverse coordinate transformer 11004 may perform geometry decoding. Geometry decoding according to embodiments may include direct decoding and triplet (trisoup) geometry decoding, which are selectively applied. The geometry decoding is not limited to the above examples and is performed as the inverse process of the geometry encoding described with reference to fig. 1 to 9.
The arithmetic decoder 11000 according to the embodiment decodes the received geometric bitstream based on arithmetic coding. The operation of the arithmetic decoder 11000 corresponds to the inverse process of the arithmetic encoder 40004.
The octree synthesizer 11001 according to an embodiment may generate an octree by acquiring an occupied code (or information on geometry acquired as a result of decoding) from the decoded geometry bitstream. The occupancy code is configured as described in detail with reference to fig. 1 to 9.
When triplet geometry coding is applied, the surface approximation synthesizer 11002 according to an embodiment may synthesize a surface based on the decoded geometry and/or the generated octree.
The geometry reconstructor 11003 according to an embodiment may regenerate geometry based on the surface and/or decoded geometry. As described with reference to fig. 1 to 9, direct encoding and triplet geometry encoding are selectively applied. Therefore, the geometry reconstructor 11003 directly imports and adds position information about the point to which the direct encoding is applied. When triplet geometry coding is applied, the geometry reconstructor 11003 may reconstruct the geometry by performing the reconstruction operations of the geometry reconstructor 40005 (e.g., triangle reconstruction, upsampling, and voxelization). Details are the same as those described with reference to fig. 6, and thus description thereof is omitted. The reconstructed geometry may include a point cloud picture or frame that does not contain attributes.
The inverse coordinate transformer 11004 according to an embodiment may acquire a point position by geometrically transforming coordinates based on reconstruction.
The arithmetic decoder 11005, the inverse quantizer 11006, RAHT transformer 11007, the LOD generator 11008, the inverse booster 11009, and/or the inverse color transformer 11010 may perform the attribute decoding described with reference to fig. 10. Attribute decoding according to an embodiment includes Region Adaptive Hierarchical Transform (RAHT) decoding, interpolation-based hierarchical nearest neighbor prediction (predictive transform) decoding, and interpolation-based hierarchical nearest neighbor prediction (lifting transform) decoding with update/lifting steps. The three decoding schemes described above may be selectively used, or a combination of one or more decoding schemes may be used. The attribute decoding according to the embodiment is not limited to the above example.
The arithmetic decoder 11005 according to the embodiment decodes the attribute bit stream by arithmetic coding.
The inverse quantizer 11006 according to the embodiment inversely quantizes information on the decoded attribute bit stream or an attribute taken as a result of decoding, and outputs an inversely quantized attribute (or attribute value). Inverse quantization may be selectively applied based on the property encoding of the point cloud encoder.
According to an embodiment, RAHT transformer 11007, LOD generator 11008, and/or inverse booster 11009 may process the reconstructed geometric and inverse quantized properties. As described above, RAHT the transformer 11007, LOD generator 11008, and/or inverse booster 11009 may selectively perform decoding operations corresponding to the encoding of the point cloud encoder.
The color inverse transformer 11010 according to the embodiment performs inverse transform encoding to inverse transform color values (or textures) included in the decoded attributes. The operation of the inverse color transformer 11010 may be selectively performed based on the operation of the color transformer 40006 of the point cloud encoder.
Although not shown in the figures, the elements of the point cloud decoder of fig. 11 may be implemented by hardware, software, firmware, or a combination thereof, including one or more processors or integrated circuits configured to communicate with one or more memories included in the point cloud providing apparatus. The one or more processors may perform at least one or more of the operations and/or functions of the elements of the point cloud decoder of fig. 11 described above. Additionally, the one or more processors may operate or execute software programs and/or sets of instructions for performing the operations and/or functions of the elements of the point cloud decoder of fig. 11.
Fig. 12 illustrates a transmitting apparatus according to an embodiment.
The transmitting apparatus shown in fig. 12 is an example of the transmitting apparatus 10000 (or the point cloud encoder of fig. 4) of fig. 1. The transmitting device shown in fig. 12 may perform one or more operations and methods identical or similar to those of the point cloud encoder described with reference to fig. 1 to 9. The transmitting apparatus according to the embodiment may include a data input unit 12000, a quantization processor 12001, a voxelization processor 12002, an octree occupation code generator 12003, a surface model processor 12004, an intra/inter encoding processor 12005, an arithmetic encoder 12006, a metadata processor 12007, a color conversion processor 12008, an attribute conversion processor 12009, a prediction/lifting/RAHT conversion processor 12010, an arithmetic encoder 12011, and/or a transmitting processor 12012.
The data input unit 12000 according to the embodiment receives or acquires point cloud data. The data input unit 12000 may perform the same or similar operations and/or acquisition methods as those of the point cloud video acquirer 10001 (or the acquisition process 20000 described with reference to fig. 2).
The data input unit 12000, quantization processor 12001, voxelization processor 12002, octree occupation code generator 12003, surface model processor 12004, intra/inter coding processor 12005, and arithmetic coder 12006 perform geometric coding. The geometric coding according to the embodiment is the same as or similar to that described with reference to fig. 1 to 9, and thus a detailed description thereof will be omitted.
The quantization processor 12001 according to an embodiment quantizes the geometry (e.g., the position value of a point). The operation and/or quantization of the quantization processor 12001 is the same as or similar to the operation and/or quantization of the quantizer 40001 described with reference to fig. 4. Details are the same as those described with reference to fig. 1 to 9.
The voxelization processor 12002 voxelizes quantized position values of points according to an embodiment. The voxelization processor 12002 may perform operations and/or processes that are the same as or similar to the operations and/or voxelization process of the quantizer 40001 described with reference to fig. 4. Details are the same as those described with reference to fig. 1 to 9.
The octree occupancy code generator 12003 according to an embodiment performs octree encoding based on the octree structure to the voxel-ized positions of the points. The octree occupancy code generator 12003 may generate occupancy codes. The octree occupancy code generator 12003 may perform the same or similar operations and/or methods as those of the point cloud encoder (or the octree analyzer 40002) described with reference to fig. 4 and 6. Details are the same as those described with reference to fig. 1 to 9.
The surface model processor 12004 according to an embodiment may perform triplet geometry encoding based on the surface model to reconstruct point locations in a particular region (or node) based on voxels. The surface model processor 12004 may perform the same or similar operations and/or methods as those of the point cloud encoder (e.g., the surface approximation analyzer 40003) described with reference to fig. 4. Details are the same as those described with reference to fig. 1 to 9.
An intra/inter encoding processor 12005 according to an embodiment may perform intra/inter encoding of point cloud data. The intra/inter encoding processor 12005 may perform the same or similar encoding as the intra/inter encoding described with reference to fig. 7. Details are the same as those described with reference to fig. 7. According to an embodiment, an intra/inter encoding processor 12005 may be included in the arithmetic encoder 12006.
The arithmetic encoder 12006 according to an embodiment performs entropy encoding on octree and/or approximate octree of point cloud data. For example, the coding scheme includes arithmetic coding. The arithmetic encoder 12006 performs the same or similar operations and/or methods as the operations and/or methods of the arithmetic encoder 40004.
The metadata processor 12007 according to the embodiment processes metadata (e.g., set values) about the point cloud data and provides it to necessary processes such as geometric coding and/or attribute coding. In addition, the metadata processor 12007 according to an embodiment may generate and/or process signaling information related to geometric coding and/or attribute coding. Signaling information according to an embodiment may be encoded separately from geometric encoding and/or attribute encoding. Signaling information according to an embodiment may be interleaved.
The color transform processor 12008, the attribute transform processor 12009, the prediction/lifting/RAHT transform processor 12010, and the arithmetic encoder 12011 perform attribute encoding. The attribute codes according to the embodiment are the same as or similar to those described with reference to fig. 1 to 9, and thus detailed description thereof is omitted.
The color transform processor 12008 according to the embodiment performs color transform encoding to transform color values included in attributes. The color transform processor 12008 may perform color transform encoding based on the reconstructed geometry. The geometry of the reconstruction is the same as described with reference to fig. 1 to 9. In addition, it performs the same or similar operations and/or methods as the operations and/or methods of color converter 40006 described with reference to fig. 4. Detailed description thereof is omitted.
The attribute transformation processor 12009 according to an embodiment performs attribute transformation to transform attributes based on the reconstructed geometry and/or the location where geometry encoding is not performed. The attribute transformation processor 12009 performs the same or similar operations and/or methods as those of the attribute transformer 40007 described with reference to fig. 4. Detailed description thereof is omitted. The prediction/lifting/RAHT transform processor 12010 according to an embodiment may encode the properties of the transform by any one or combination of RAHT encoding, predictive transform encoding, and lifting transform encoding. The prediction/lifting/RAHT transform processor 12010 performs at least one operation that is the same as or similar to the operation of the RAHT transformer 40008, LOD generator 40009, and lifting transformer 40010 described with reference to fig. 4. In addition, the predictive transform coding, the lifting transform coding, and RAHT transform coding are the same as those described with reference to fig. 1 to 9, and thus detailed descriptions thereof are omitted.
The arithmetic encoder 12011 according to the embodiment may encode the attributes based on arithmetic encoding. The arithmetic encoder 12011 performs the same or similar operations and/or methods as the operations and/or methods of the arithmetic encoder 40012.
The transmission processor 12012 according to an embodiment may transmit individual bitstreams containing encoded geometric and/or encoded attribute and metadata information, or one bitstream configured with encoded geometric and/or encoded attribute and metadata information. When the encoded geometry and/or encoded attribute and metadata information according to an embodiment is configured as one bitstream, the bitstream may include one or more sub-bitstreams. The bitstream according to an embodiment may contain signaling information including a Sequence Parameter Set (SPS) for sequence level signaling, a Geometry Parameter Set (GPS) for geometry information coding signaling, an Attribute Parameter Set (APS) for attribute information coding signaling, and a Tile Parameter Set (TPS) for tile level signaling, and slice data. The slice data may include information about one or more slices. A slice according to an embodiment may include one geometry bitstream (Geom0^0) and one or more attribute bitstreams (Attr0^0 and Attr1^0).
Slice refers to a series of syntax elements representing all or part of an encoded point cloud frame.
TPS according to an embodiment may include information about each tile of the one or more tiles (e.g., coordinate information and height/size information about the bounding box). The geometric bitstream may contain a header and a payload. The header of the geometric bitstream according to an embodiment may contain a parameter set identifier (geom_parameter_set_id), a tile identifier (geom_tile_id), and a slice identifier (geom_slice_id) included in the GPS, and information about the data contained in the payload. As described above, the metadata processor 12007 according to an embodiment may generate and/or process signaling information and transmit it to the transmission processor 12012. According to an embodiment, the element performing geometric encoding and the element performing attribute encoding may share data/information with each other as indicated by a dotted line. The transmission processor 12012 according to an embodiment may perform the same or similar operations and/or transmission methods as those of the transmitter 10003. Details are the same as those described with reference to fig. 1 and 2, and thus description thereof is omitted.
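For illustration only, the composition of such a bitstream may be pictured with the following sketch; every struct and field name here is an assumption and not the normative syntax.

#include <vector>

// Illustrative sketch of the bitstream composition described above;
// all struct and field names are assumptions, not the normative G-PCC syntax.
struct SequenceParameterSet  { /* sequence-level signaling */ };
struct GeometryParameterSet  { /* geometry coding signaling */ };
struct AttributeParameterSet { /* attribute coding signaling */ };
struct TileParameterSet      { /* tile-level signaling, e.g., bounding-box position and size */ };

struct GeometrySliceHeader {
    int geom_parameter_set_id;  // identifies the parameter set in use
    int geom_tile_id;           // tile to which the slice belongs
    int geom_slice_id;          // slice identifier
};

struct Slice {
    GeometrySliceHeader geomHeader;
    std::vector<unsigned char>              geometryPayload;    // one geometry sub-bitstream
    std::vector<std::vector<unsigned char>> attributePayloads;  // one or more attribute sub-bitstreams
};

struct PointCloudBitstream {
    SequenceParameterSet               sps;
    GeometryParameterSet               gps;
    std::vector<AttributeParameterSet> aps;
    TileParameterSet                   tps;
    std::vector<Slice>                 slices;                  // slice data
};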
Fig. 13 illustrates a receiving apparatus according to an embodiment.
The receiving apparatus illustrated in fig. 13 is an example of the receiving apparatus 10004 of fig. 1 (or the point cloud decoder of fig. 10 and 11). The receiving apparatus shown in fig. 13 may perform one or more operations and methods identical or similar to those of the point cloud decoder described with reference to fig. 1 to 11.
The receiving apparatus according to the embodiment may include a receiver 13000, a receiving processor 13001, an arithmetic decoder 13002, an occupancy code-based octree reconstruction processor 13003, a surface model processor (triangle reconstruction, upsampling, voxelization) 13004, an inverse quantization processor 13005, a metadata parser 13006, an arithmetic decoder 13007, an inverse quantization processor 13008, a prediction/lifting/RAHT inverse transformation processor 13009, a color inverse transformation processor 13010, and/or a renderer 13011. Each decoding element according to an embodiment may perform an inverse of the operation of the corresponding encoding element according to an embodiment.
The receiver 13000 according to an embodiment receives point cloud data. The receiver 13000 can perform the same or similar operations and/or reception methods as the receiver 10005 of fig. 1. Detailed description thereof is omitted.
The receive processor 13001 according to an embodiment may obtain a geometric bitstream and/or an attribute bitstream from the received data. A receive processor 13001 may be included in the receiver 13000.
The arithmetic decoder 13002, the octree reconstruction processor 13003 based on the occupancy code, the surface model processor 13004, and the inverse quantization processor 13005 may perform geometric decoding. The geometric decoding according to the embodiment is the same as or similar to that described with reference to fig. 1 to 10, and thus a detailed description thereof is omitted.
The arithmetic decoder 13002 according to an embodiment may decode a geometric bitstream based on arithmetic coding. The arithmetic decoder 13002 performs the same or similar operations and/or encodings as the operations and/or encodings of the arithmetic decoder 11000.
The octree reconstruction processor 13003 based on the occupancy code according to the embodiment may reconstruct the octree by acquiring the occupancy code from the decoded geometry bitstream (or information on geometry taken as a result of decoding). The octree reconstruction processor 13003 performs the same or similar operations and/or methods as the octree synthesizer 11001 and/or octree generation method based on the occupancy code. When triplet geometry coding is applied, the surface model processor 13004 according to an embodiment may perform triplet geometry decoding and related geometry reconstruction (e.g., triangle reconstruction, upsampling, voxelization) based on the surface model method. The surface model processor 13004 performs the same or similar operations as the surface approximation synthesizer 11002 and/or the geometry reconstructor 11003.
The inverse quantization processor 13005 according to an embodiment inversely quantizes the decoded geometry.
The metadata parser 13006 according to an embodiment may parse metadata (e.g., a set value) included in the received point cloud data. The metadata parser 13006 may pass the metadata to geometry decoding and/or attribute decoding. The metadata is the same as that described with reference to fig. 12, and thus a detailed description thereof is omitted.
The arithmetic decoder 13007, the inverse quantization processor 13008, the prediction/lifting/RAHT inverse transform processor 13009, and the color inverse transform processor 13010 perform attribute decoding. The attribute decoding is the same as or similar to the attribute decoding described with reference to fig. 1 to 10, and thus a detailed description thereof is omitted.
The arithmetic decoder 13007 according to an embodiment may decode an attribute bitstream through arithmetic coding. The arithmetic decoder 13007 may decode the attribute bitstream based on the reconstructed geometry. The arithmetic decoder 13007 performs the same or similar operations and/or encodings as those of the arithmetic decoder 11005.
The inverse quantization processor 13008 according to an embodiment inversely quantizes the decoded attribute bitstream. The inverse quantization processor 13008 performs the same or similar operations and/or methods as the operations and/or inverse quantization methods of the inverse quantizer 11006.
The prediction/lifting/RAHT inverse transform processor 13009 according to an embodiment may process the reconstructed geometric and inverse quantized properties. The prediction/lifting/RAHT inverse transform processor 13009 performs one or more operations and/or decodes that are the same as or similar to the operations and/or decodes of the RAHT transformer 11007, the LOD generator 11008, and/or the inverse lifting means 11009. The color inverse transform processor 13010 according to the embodiment performs inverse transform encoding to inverse transform color values (or textures) included in the decoded attribute. The color inverse transform processor 13010 performs the same or similar operations and/or inverse transform encodings as the operations and/or inverse transform encodings of the color inverse transformer 11010. The renderer 13011 according to an embodiment may render the point cloud data.
Fig. 14 illustrates an exemplary structure operable in conjunction with a point cloud data transmission/reception method/apparatus according to an embodiment.
The structure of fig. 14 represents a configuration in which at least one of a server 1460, a robot 1410, a self-driving vehicle 1420, an XR device 1430, a smart phone 1440, a home appliance 1450, and/or a Head Mounted Display (HMD) 1470 is connected to the cloud network 1400. The robot 1410, the self-driving vehicle 1420, the XR device 1430, the smart phone 1440, and the home appliance 1450 are referred to as devices. Further, the XR device 1430 may correspond to a point cloud data (PCC) device or may be operatively connected to a PCC device according to an embodiment.
Cloud network 1400 may represent a network that forms part of or resides in a cloud computing infrastructure. Here, the cloud network 1400 may be configured using a 3G network, a 4G or Long Term Evolution (LTE) network, or a 5G network.
The server 1460 may be connected to at least one of the robot 1410, the self-driving vehicle 1420, the XR device 1430, the smart phone 1440, the home appliance 1450, and/or the HMD 1470 via the cloud network 1400, and may assist in at least a portion of the processing of the connected devices 1410 to 1470.
HMD 1470 represents one of the implementation types of XR devices and/or PCC devices according to an embodiment. The HMD type device according to an embodiment includes a communication unit, a control unit, a memory, an I/O unit, a sensor unit, and a power supply unit.
Hereinafter, various embodiments of the devices 1410 to 1450 to which the above-described technology is applied will be described. The devices 1410 to 1450 shown in fig. 14 may be operatively connected/coupled to the point cloud data transmitting/receiving device according to the above-described embodiment.
<PCC+XR>
XR/PCC device 1430 may employ PCC technology and/or XR (ar+vr) technology and may be implemented as an HMD, head-up display (HUD) disposed in a vehicle, television, mobile phone, smart phone, computer, wearable device, home appliance, digital signage, vehicle, stationary robot, or mobile robot.
XR/PCC device 1430 may analyze 3D point cloud data or image data acquired by various sensors or from external devices and generate location data and attribute data regarding the 3D points. Thus, XR/PCC device 1430 may obtain information about surrounding space or real objects and render and output XR objects. For example, XR/PCC device 1430 may match an XR object that includes ancillary information about the identified object with the identified object and output the matched XR object.
< PCC+XR+Mobile Phone >
XR/PCC apparatus 1430 may be implemented as mobile phone 1440 by applying PCC technology.
The mobile phone 1440 may decode and display the point cloud content based on PCC technology.
< PCC+self-steering+XR >
The autonomous vehicle 1420 may be implemented as a mobile robot, vehicle, unmanned aerial vehicle, or the like by applying PCC technology and XR technology.
The autonomous vehicle 1420 to which XR/PCC technology is applied may represent an autonomous vehicle provided with means for providing XR images, or an autonomous vehicle that is a control/interaction target in XR images. Specifically, as a control/interaction target in the XR image, the autonomous vehicle 1420 may be distinguished from and operatively connected to the XR device 1430.
The autonomous vehicle 1420 having means for providing an XR/PCC image may acquire sensor information from a sensor comprising a camera, and output the generated XR/PCC image based on the acquired sensor information. For example, the self-driving vehicle 1420 may have a HUD and output XR/PCC images thereto, providing passengers with XR/PCC objects corresponding to real objects or objects presented on a screen.
When the XR/PCC object is output to the HUD, at least a portion of the XR/PCC object may be output to overlap with the real object pointed at by the occupant's eyes. On the other hand, when the XR/PCC object is output on a display provided in the self-driving vehicle, at least a portion of the XR/PCC object may be output to overlap with the object on the screen. For example, the self-driving vehicle 1420 may output XR/PCC objects corresponding to objects such as roads, another vehicle, traffic lights, traffic signs, two-wheelers, pedestrians, and buildings.
Virtual Reality (VR), augmented Reality (AR), mixed Reality (MR) and/or Point Cloud Compression (PCC) techniques according to embodiments are applicable to a variety of devices.
In other words, VR technology is a display technology that provides only CG images of real-world objects, backgrounds, and the like. AR technology, on the other hand, refers to a technology that displays a virtually created CG image on an image of a real object. MR technology is similar to the AR technology described above in that virtual objects to be displayed are mixed and combined with the real world. However, MR technology differs from AR technology in that AR technology makes a clear distinction between a real object and a virtual object created as a CG image and uses the virtual object as a supplementary object to the real object, whereas MR technology treats the virtual object as an object having characteristics equivalent to those of a real object. More specifically, an example of an MR technology application is a holographic service.
More recently, VR, AR, and MR technologies are often referred to collectively as extended reality (XR) technologies rather than being clearly distinguished from one another. Thus, embodiments of the present disclosure are applicable to any of the VR, AR, MR, and XR technologies. Encoding/decoding based on PCC, V-PCC, and G-PCC techniques is applicable to such technologies.
The PCC method/apparatus according to an embodiment may be applied to a vehicle that provides a self-driving service.
The vehicle providing the self-driving service is connected to the PCC device for wired/wireless communication.
When a point cloud data (PCC) transmitting/receiving device according to an embodiment is connected to a vehicle for wired/wireless communication, the device may receive/process content data related to an AR/VR/PCC service (which may be provided together with a self-driving service) and transmit it to the vehicle. In the case where the PCC transmission/reception apparatus is mounted on a vehicle, the PCC transmission/reception apparatus may receive/process content data related to the AR/VR/PCC service according to a user input signal input through the user interface apparatus and provide it to a user. A vehicle or user interface device according to an embodiment may receive a user input signal. The user input signal according to an embodiment may include a signal indicating a self-driving service.
As described with reference to fig. 1 to 14, the point cloud data is composed of a point group, and each point may have geometric data (geometric information) and attribute data (attribute information). The geometric data is the three-dimensional position (e.g., coordinate values of x, y, and z axes) of each point. That is, the positions of the respective points are indicated by parameters in a coordinate system representing a three-dimensional space, for example, parameters (x, y, z) representing three axes of the space x, y and z. Attribute information may include color (RGB, YUV, etc.), reflectivity, normal, and transparency. The attribute information may be represented in scalar or vector form.
According to embodiments, point cloud data may be classified, according to the type and acquisition method of the data, into category 1 of static point cloud data, category 2 of dynamic point cloud data, and category 3 of point cloud data acquired through dynamic movement. Category 1 consists of a point cloud of a single frame with a high density of points for an object or a space. Category 3 data may be divided into frame-based data having a plurality of frames acquired while moving, and fused data of a single frame obtained by matching a point cloud acquired by a lidar sensor over a large space with a color image acquired as a 2D image.
According to embodiments, inter-prediction (encoding/decoding) may be used to efficiently compress three-dimensional point cloud data having a plurality of frames over time, such as frame-based point cloud data with a plurality of frames. Inter-prediction encoding/decoding may be applied to the geometry information and/or the attribute information. Inter prediction may be referred to as inter-picture prediction or inter-frame prediction, and intra prediction may be referred to as intra-picture prediction or intra-frame prediction.
According to the embodiment, the point cloud data transmitting/receiving apparatus/method can perform multi-directional prediction between a plurality of frames. The point cloud data transmitting/receiving apparatus/method can distinguish between the encoding order and the display order of frames, and can predict the point cloud data according to a predetermined encoding order. The point cloud data transmitting/receiving apparatus/method according to the embodiment may perform inter prediction in a prediction tree structure based on a reference between a plurality of frames.
Further, according to an embodiment, the point cloud data transmitting/receiving apparatus/method may perform inter prediction by generating an accumulated reference frame. The accumulated reference frame may be an accumulation of multiple reference frames.
The point cloud data transmitting/receiving apparatus/method according to the embodiment may define a prediction unit to apply a prediction technique between a plurality of frames as a method for improving compression efficiency of point cloud data having one or more frames. A prediction unit according to an embodiment may be referred to by various terms such as a unit, a first unit, a region, a first region, a frame, a region, or a unit.
The point cloud data transmitting/receiving device/method according to the embodiment may compress/reconstruct data composed of a point cloud. In particular, in order to efficiently compress a point cloud having one or more frames, motion estimation and data prediction may be performed in consideration of characteristics of the point cloud captured by a lidar sensor and distribution of data contained in a prediction unit.
Fig. 15 illustrates inter prediction according to an embodiment.
Referring to fig. 15, the point cloud data includes a plurality of frames. The plurality of frames may be referred to as GOFs. A frame encoded or decoded by the transmitting/receiving apparatus/method according to the embodiment may be referred to as a current frame, and a frame referred to encode or decode the current frame may be referred to as a reference frame.
When generating a prediction tree structure for predicting point cloud data, the transmitting/receiving apparatus/method according to the embodiments may perform inter prediction. In this prediction tree structure, the frame encoded immediately before the current frame is determined as the reference frame, and a point 1504 in the reference frame is searched for whose laser ID is the same as, and whose azimuth is most similar to, the point 1505 decoded immediately before the current point in the current frame. Then, among the points whose azimuth is larger than that of the point 1504, the closest point 1502 or the second closest point 1503 may be determined as the predictor of the current point 1501. Whether the closest point or the second closest point is used as the predictor may be signaled by a flag, so that the point information to be considered for the current point position in inter prediction is determined and the information about the corresponding predictor is transferred to the receiver.
A predictor according to an embodiment may be referred to as a candidate point for predicting the current point.
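The predictor search described with reference to fig. 15 can be sketched as follows; the data structure, the similarity measure (azimuth difference only, as a simplification), and all names are illustrative assumptions.

#include <cmath>
#include <vector>

struct SphPoint { int laserId; double azimuth; double radius; };

// Illustrative sketch: given the point decoded immediately before the current
// point (prevPoint), find in the reference frame the point with the same laser
// ID and the most similar azimuth, then return the closest and second-closest
// points among those with a larger azimuth as the two inter-prediction candidates.
void findInterPredictors(const std::vector<SphPoint>& refFrame,
                         const SphPoint& prevPoint,
                         int* bestIdx, int* secondIdx) {
    // 1) point in the reference frame matching the laser ID with the closest azimuth
    int anchor = -1; double bestAzDiff = 1e30;
    for (size_t i = 0; i < refFrame.size(); ++i) {
        if (refFrame[i].laserId != prevPoint.laserId) continue;
        double d = std::fabs(refFrame[i].azimuth - prevPoint.azimuth);
        if (d < bestAzDiff) { bestAzDiff = d; anchor = (int)i; }
    }
    // 2) among points with a larger azimuth, keep the closest and second-closest
    *bestIdx = *secondIdx = -1;
    if (anchor < 0) return;
    double best = 1e30, second = 1e30;
    for (size_t i = 0; i < refFrame.size(); ++i) {
        if (refFrame[i].laserId != prevPoint.laserId) continue;
        if (refFrame[i].azimuth <= refFrame[anchor].azimuth) continue;
        double d = refFrame[i].azimuth - refFrame[anchor].azimuth;
        if (d < best)        { second = best; *secondIdx = *bestIdx; best = d; *bestIdx = (int)i; }
        else if (d < second) { second = d; *secondIdx = (int)i; }
    }
    // A flag signaled in the bitstream would select whether bestIdx or secondIdx is used.
}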
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or a combination thereof. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction described with reference to fig. 15.
Fig. 16 shows a frame group (GoF) according to an embodiment.
The GoF 1600 according to an embodiment represents a group of frames, which may also be referred to as a GoP. The frames comprising GoF 1600 include intra frames (I-frames) 1610, predicted frames (P-frames) 1620, and/or bi-directional frames (B frames) (not shown). I-frame 1610 may represent a frame that does not refer to any other frame. I-frame 1610 is the first frame in GoF 1600, so it has no previous frame and does not reference any other frame. The P-frame 1620 represents a frame predicted by referring to a previous frame such as the I-frame 1610 or another frame such as the P-frame 1620. B-frames may represent frames predicted by referencing I-frames 1610 or P-frames 1620 in two directions.
Referring to fig. 16, a prediction direction 1630 is shown. The transmitting/receiving apparatus/method according to the embodiment may predict the current frame based on information about an immediately previously decoded or encoded frame. However, when prediction is performed based only on information about immediately preceding frames, prediction can be performed in only a single direction even if there is a large amount of redundant information between frames, which may not allow multi-directional prediction and may not allow information about one or more reference frames to be utilized. Therefore, transmission efficiency may be reduced.
The transmitting/receiving apparatus/method according to the embodiment may perform multi-directional prediction between frames when predicting point cloud data. The transmitting/receiving apparatus/method according to the embodiment may not just refer to a frame encoded immediately before the current frame, but separate the encoding order from the display order to predict the point cloud data of the current frame using one or more reference frames regardless of the display order.
The transmitting/receiving apparatus/method according to the embodiments may predict point cloud data with reference to a plurality of frames (or multi-frames) in inter prediction of the predicted geometry.
Further, the transmitting/receiving apparatus/method according to the embodiment may separate the encoding order from the display order in inter-prediction of the predicted geometry, and may perform encoding or decoding according to the encoding order.
Further, the transmitting/receiving apparatus/method according to the embodiments can distinguish bidirectional frames from unidirectional frames in inter prediction of the predicted geometry.
Furthermore, when predicting the predicted geometry in a multi-directional frame, the transmitting/receiving apparatus/method according to the embodiments may select, for each node, an advantageous prediction direction among bi-directional, unidirectional, and backward (reverse) prediction.
Further, the transmitting/receiving apparatus/method according to the embodiment may determine and signal whether inter prediction or intra prediction is performed on each node when predicting the predicted geometry.
Further, the transmitting/receiving apparatus/method according to the embodiment may reorder frames decoded in the encoding order in the display order. When prediction is performed based on one or more reference frames, frame information to be referred to may be stored until prediction of a frame including a corresponding node ends.
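A minimal sketch of such reordering, assuming each decoded frame carries its display-order index, is shown below; the structure and names are illustrative.

#include <algorithm>
#include <vector>

struct DecodedFrame {
    int displayOrder;   // position of the frame in display order
    int codingOrder;    // position of the frame in coding (decoding) order
    // ... decoded geometry/attribute data ...
};

// Illustrative sketch: frames are decoded in coding order and then reordered
// for output in display order. Reference frames would be retained in a buffer
// until every frame that refers to them has been predicted.
void reorderForDisplay(std::vector<DecodedFrame>& decodedInCodingOrder) {
    std::sort(decodedInCodingOrder.begin(), decodedInCodingOrder.end(),
              [](const DecodedFrame& a, const DecodedFrame& b) {
                  return a.displayOrder < b.displayOrder;
              });
}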
The transmitting/receiving apparatus/method according to the embodiments may generate a single piece of accumulated point cloud data from the nearby reference frames selected for inter prediction.
Further, in inter prediction, the transmitting/receiving apparatus/method according to the embodiments may select the most appropriate data from the accumulated reference frame data to predict the current point (or node) in the current frame, and may replace the value of the current point with it.
Further, the transmitting/receiving apparatus/method according to the embodiment may select one or more data most relevant from the accumulated reference frame data to generate a piece of node information and use it as a predicted value for a current node of the current frame.
Further, the transmitting/receiving apparatus/method according to the embodiment may select one or more data most relevant from the accumulated reference frame data and group them to select a prediction direction.
When the transmitting/receiving apparatus/method according to the embodiments uses a plurality of reference frames, the way the reference frames are applied may differ depending on whether the display order, frame identifier, or frame index of a reference frame is smaller or larger than that of the current frame.
When the display order, frame identifier, or frame index of the reference frame has a value smaller than that of the current frame, the transmitting/receiving apparatus/method according to the embodiment may store one or more reference frames in a buffer.
When the display order, frame identifier, or frame index of the reference frame has a value greater than that of the current frame, the encoding order may be defined separately from the display order. The coding order of the frames according to the embodiments may vary from one GOF to another, or may be fixed to a specific order.
When predicting a current frame using one or more reference frames, the transmitting and receiving apparatus according to the embodiments may signal whether the reference frames are all positioned before the current frame in display order, are split bi-directionally before and after the current frame in display order, or are all positioned after the current frame. Furthermore, whether only a single reference frame or multiple reference frames are used may be signaled.
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or a combination thereof. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction described with respect to fig. 16.
Fig. 17 illustrates a method of forward referencing for inter prediction according to an embodiment. Fig. 17 illustrates types of a plurality of frames, reference directions between frames, display order of frames, and encoding order of frames according to an embodiment.
Referring to fig. 17, an m-frame 1730 may represent a frame referring to a plurality of frames. For example, m-frame 1730 located in the third position in display order may refer to I-frame 1710 located in the first position in display order and P-frame 1720 located in the second position.
Forward prediction: in referring to a plurality of frames, if a reference frame is selected from previously encoded frames, the display order and the encoding order may be the same, as shown in fig. 17. The highly correlated reference frame may be next to the current frame in no display order. Forward prediction may refer to a prediction method for referencing at least one frame preceding a current frame in display order.
Furthermore, when the I-frame allows only intra prediction and the P-frame uses only a single reference frame in the forward direction, the m-frame may utilize information about one or more reference frames regardless of the reference direction.
Referring to fig. 17, the display order and the encoding order of the plurality of frames are the same. In addition, in addition to the I-frame 1710, a frame (P-frame or m-frame) may refer to a previous frame in display order or encoding order. For example, the frame located at the eighth position in the display order may refer to the frame located at the seventh position in the display order and the frame located at the fourth position. That is, in order to predict the current point in the frame at the eighth position in the display order, points in the frames at the sixth position and the seventh position in the display order may be referred to.
Fig. 18 illustrates a method for backward (or backward) referencing of inter prediction according to an embodiment. In other words, the figure illustrates an embodiment of prediction using a reference frame located at the rear side.
Referring to fig. 18, the position of the reference frame is identified by the reference direction. The transmitting/receiving apparatus/method according to an embodiment may encode the I-frame 1810 in the GOF, predict the P-frame 1830, which uses only a single forward reference frame, by referring to the I-frame 1810, and then predict the m-frames 1820 between the I-frame and the P-frame based on information about the P-frame 1830 and the first-encoded m-frame. In this case, the display order and the encoding order may be different, and the reference direction may depend on the designated positions of the m-frames and the P-frame.
Referring to fig. 18, the transmitting/receiving apparatus/method according to the embodiment may predict a current frame by referring to frames positioned further backward in display order. For example, a frame at a second position in the display order may reference a frame at a fourth position in the display order to predict the current point. In other words, a current point in a frame at a second position in the display order may be predicted with reference to a point in a frame at a fourth position in the display order.
Fig. 19 illustrates a method for bi-directional reference for inter prediction according to an embodiment. Referring to fig. 19, an m-frame 1920 refers to frames located on the front side and the rear side in the display order.
When the m-frame 1920 allows bi-directional reference, a transmitting and receiving device according to an embodiment may first encode the I-frame 1910 and the P-frame 1930 and then predict the m-frame 1920 between the I-frame 1910 and the P-frame 1930 based on bi-directional information. At this time, if there is a previously encoded m-frame, it may also be added and used as a reference frame.
Referring to fig. 19, a transmitting/receiving apparatus/method according to an embodiment may predict a current frame by referring to frames located at a front side and a rear side in display order. For example, a frame located in the second position in the display order may predict the current point by referring to frames located in the first and fourth positions in the display order. In other words, to predict the current point in the frame at the second position of the display order, the point in the frame at the first position of the display order and the point in the frame at the fourth position of the display order may be referenced.
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or a combination thereof. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction described with reference to fig. 17 to 19.
According to the above three reference directions, when a reference frame is referred to by one or more frames, information about the reference frame may be stored in a separate buffer until its use is completed. The information about the reference frame may be deleted from the buffer after the buffer usage is completed.
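The buffer management described above can be sketched as follows; the reference-counting policy and all names are assumptions for illustration.

#include <map>

struct Frame;  // decoded point cloud frame (geometry and/or attributes)

// Illustrative sketch of a reference frame buffer: a frame is kept as long as
// at least one not-yet-coded frame still refers to it, and released afterwards.
class ReferenceFrameBuffer {
public:
    void insert(int frameIdx, Frame* frame, int remainingUses) {
        frames_[frameIdx]   = frame;
        useCount_[frameIdx] = remainingUses;
    }
    Frame* get(int frameIdx) { return frames_.count(frameIdx) ? frames_[frameIdx] : nullptr; }
    void release(int frameIdx) {            // called after each use as a reference
        if (--useCount_[frameIdx] <= 0) {   // no more frames refer to it
            frames_.erase(frameIdx);
            useCount_.erase(frameIdx);
        }
    }
private:
    std::map<int, Frame*> frames_;
    std::map<int, int>    useCount_;
};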
According to embodiments, the method of referring to a plurality of frames may be applied to geometry prediction, attribute prediction, and the like. In inter prediction of geometry and attributes, separate reference frame information may be used for each, or the same reference frame may be used.
Although references between frames according to embodiments have been described according to forward, backward and bi-directional methods, combinations of methods may be applied in the same order.
Hereinafter, a method of predicting a node based on a plurality of reference frames according to an embodiment will be described. A node according to an embodiment may be referred to as a point.
The following prediction method may be applied to inter prediction of point cloud data according to an embodiment based on a plurality of reference frames. The prediction method according to the embodiment may be applied to inter prediction of predictive geometry.
Fig. 20 illustrates forward inter prediction according to an embodiment.
Forward prediction
Referring to fig. 20, a reference frame closest to the current frame in display order is referred to as reference frame 1, and a next closest reference frame is referred to as reference frame 2.
In order to predict a point (or node) 2014 to be predicted in the current frame, the transmitting/receiving apparatus/method according to the embodiment searches for a point having the same laser ID and a similar azimuth to a previously decoded point (prev.point) 2002 in the reference frame 1, determines the closest point among points in the reference frame 1 having a radius greater than that of the previously decoded point 2002 as a first predictor P1, 2006, and determines the next closest point as a second predictor P1',2008. Then, a point in the reference frame 2 having the same laser ID, a similar azimuth angle, and a radius most similar to that of P1, 2006 is determined as a third predictor P2, 2010, and the closest point having the same laser ID, a similar azimuth angle, and a radius greater than that of P2, 2010 is determined as a fourth predictor P2',2012.
The transmitting/receiving apparatus/method according to the embodiment may set a predictor most similar to the current point 2014 among the four predictors as a predicted value, or may find an optimal combination most similar to the current point 2014 among a combination of P1, 2006 and P2, 2010, a combination of P1, 2006 and P2',2012, a combination of P1',2008 and P2, 2010, and a combination of P1',2008 and P2',2012, and use it as a predicted value of the current point 2014. Alternatively, the transmitting/receiving apparatus/method according to the embodiment may calculate representative values (e.g., averages) of predictors found in respective reference frames, and then compare the respective representative values with a current point (or current node), or calculate the predicted values according to a combination of the representative values. In predicting the current point information by combining information about two or more predictors, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2014 in the current frame, the transmitting/receiving apparatus/method according to an embodiment may find a second point 2004 having the same laser ID and a similar azimuth in the reference frame 1 based on the previously decoded first point 2002 in the current frame. Then, the closest point (third point 2006) among the points in the reference frame 1 having a radius larger than the radius of the first point 2002 may be determined as the first predictor P1, 2006 or the first candidate point P1, 2006, and the next closest point (fourth point 2008) may be determined as the second predictor P1',2008. Then, in the reference frame 2, a fifth point 2010 having the same laser ID, a similar azimuth angle, and a closest radius as the first predictor 2006 may be determined as the third predictor P2, 2010, and a closest point 2012 among points having the same laser ID and a similar azimuth angle and a radius greater than the radius of the fifth point 2010 may be determined as the fourth predictor P2',2012.
The transmitting/receiving apparatus/method according to the embodiment may independently compare four predictors or candidate points with the current point and determine the closest predictor as a predicted value, or may determine a predicted value of the current point 2014 based on a combination of two or more of the four predictors. For example, an average of predictors found in each reference frame may be calculated and compared to the current point 2014, or the current point 2014 may be predicted based on a combination of averages.
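The forward candidate search of fig. 20 may be illustrated with the following minimal Python sketch. It is an assumption-laden simplification: each point is a (laser_id, azimuth, radius) tuple, "similar azimuth" is approximated by the fixed tolerance AZIMUTH_TOL, and the candidate names follow the description above (P1, P1', P2, P2').

    import math

    AZIMUTH_TOL = 0.01  # assumed tolerance for "similar azimuth"

    def find_forward_candidates(prev_point, ref_frame1, ref_frame2):
        """Return up to four candidate predictors for the current point (fig. 20)."""
        laser_id, azimuth, radius = prev_point

        def matches(p, base_az):
            return p[0] == laser_id and abs(p[1] - base_az) <= AZIMUTH_TOL

        # P1 / P1': closest and next-closest points in reference frame 1 whose
        # radius is greater than that of the previously decoded point
        larger = sorted((p for p in ref_frame1 if matches(p, azimuth) and p[2] > radius),
                        key=lambda p: p[2] - radius)
        p1 = larger[0] if larger else None
        p1_prime = larger[1] if len(larger) > 1 else None

        # P2: point in reference frame 2 with the radius most similar to that of P1
        p2 = p2_prime = None
        if p1 is not None:
            same_laser = [p for p in ref_frame2 if matches(p, p1[1])]
            if same_laser:
                p2 = min(same_laser, key=lambda p: abs(p[2] - p1[2]))
                # P2': closest point with a radius greater than that of P2
                bigger = [p for p in same_laser if p[2] > p2[2]]
                if bigger:
                    p2_prime = min(bigger, key=lambda p: p[2] - p2[2])
        return [c for c in (p1, p1_prime, p2, p2_prime) if c is not None]

    def choose_prediction(candidates, current_point):
        """Pick the single candidate, or the mean of a pair, closest to the current point."""
        def dist(a, b):
            return math.dist(a[1:], b[1:])  # compare azimuth and radius only (assumption)
        options = [(c, dist(c, current_point)) for c in candidates]
        for i in range(len(candidates)):
            for j in range(i + 1, len(candidates)):
                mean = tuple((x + y) / 2 for x, y in zip(candidates[i], candidates[j]))
                options.append((mean, dist(mean, current_point)))
        return min(options, key=lambda o: o[1])[0] if options else None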
Fig. 21 illustrates forward inter prediction according to an embodiment.
Referring to fig. 21, a reference frame closest to the current frame in display order is referred to as reference frame 1, and a next closest reference frame is referred to as reference frame 2.
In order to predict a point (or node) 2112 to be predicted in the current frame, the transmitting/receiving apparatus/method according to the embodiment searches for a point 2104 having the same laser ID and a similar azimuth angle in the reference frame 1 based on a previous decoding point (prev.point) 2102, which is a point decoded before a point to be predicted in the current point, determines a closest point among points having a radius larger than that of the previous decoding point 2102 as a first predictor P1, 2106, and then determines a closest point among points having the same laser ID, a similar azimuth angle, and a large radius as a second predictor P2, 2110 based on a point 2108 having the same laser ID, a similar azimuth angle, and a radius most similar to the point 2106 in the reference frame 2.
The transmitting/receiving apparatus/method according to the embodiment may compare each of P1, 2106 and P2, 2110 with the current point 2112 in the current frame to select the closest value, or may find the best case among a total of three conditions including a combination of P1, 2106 and P2, 2110 and use it as a predicted value of the current point. In predicting the current point information by combining two or more predictor information, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2112 in the current frame, the transmitting/receiving apparatus/method according to the embodiment may find the second point 2104 having the same laser ID and similar azimuth in the reference frame 1 based on the previously decoded first point 2102 in the current frame. Then, the closest point (third point 2106) among the points in the reference frame 1 having a radius larger than the radius of the first point 2102 may be determined as the first predictor P1, 2106 or the first candidate point P1, 2106. Then, in the reference frame 2, a fourth point 2108 having the same laser ID as the first predictor 2106, a similar azimuth angle, and the closest radius may be searched, and the closest point 2110 among points having the same laser ID as the fourth point 2108, a similar azimuth angle, and a radius greater than the radius of the fourth point 2108 may be determined as the second predictor P2, 2110.
The transmitting/receiving apparatus/method according to the embodiment may independently compare each of the first predictor 2106 and the second predictor 2110 with the current point 2112 and determine the closest predictor as a predicted value, or may determine the predicted value of the current point 2112 based on three methods including a combination of predictors.
Fig. 22 illustrates forward inter prediction according to an embodiment.
Referring to fig. 22, a reference frame closest to the current frame in display order is referred to as reference frame 1, and a next closest reference frame is referred to as reference frame 2.
In order to predict a point (or node) 2214 to be predicted in the current frame, the transmitting/receiving apparatus/method according to the embodiment searches for a point 2204 having the same laser ID and a similar azimuth in the reference frame 1 based on a previous decoding point (prev.point) 2202, which is a point decoded before the point to be predicted in the current point, determines the closest point among points having a radius larger than that of the previous decoding point 2202 as a first predictor P1, 2206, and determines the next closest point as a second predictor P1',2208. Then, in reference frame 2, the transmitting/receiving apparatus/method determines a point having the same laser ID, similar azimuth angle, and most similar radius as P1',2208 as the third predictor P2, 2210, and determines the closest point among points having the same laser ID, similar azimuth angle, and radius greater than P2, 2210 as the fourth predictor P2',2212.
The transmitting and receiving apparatus according to the embodiment may compare P2, 2210 and P1',2208 with the current point and determine a predictor having a value closest to the current point as a predicted value and signal it, or may calculate the predicted value based on a combination of P1',2208 and P2, 2210. Further, they may determine one or both of P2',2212 and P1, 2206 as candidate points as desired. Further, in predicting the current point information by combining two or more predictors, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2214 in the current frame, the transmitting/receiving apparatus/method according to the embodiment may find a second point 2204 having the same laser ID and a similar azimuth in the reference frame 1 based on the previously decoded first point 2202 in the current frame. Then, the closest point (third point 2206) among the points in the reference frame 1 having the radius larger than the radius of the first point 2202 may be determined as the first predictor P1, 2206 or the first candidate point P1, 2206, and the next closest point (fourth point 2208) may be determined as the second predictor P1',2208. Then, in the reference frame 2, a fifth point 2210 having the same laser ID, a similar azimuth angle, and a closest radius to the second predictor 2208 may be determined as the third predictor P2, 2210, and a closest point 2212 among points having the same laser ID and a similar azimuth angle and having a radius greater than the radius of the fifth point 2210 may be determined as the fourth predictor P2',2212.
The transmitting/receiving apparatus/method according to the embodiment may independently compare four predictors or candidate points with the current point and determine the closest predictor as a predicted value, or may determine a predicted value of the current point 2214 based on a combination of two or more of the four predictors. For example, an average of predictors found in each reference frame may be calculated and compared to the current point 2214, or the current point 2214 may be predicted based on a combination of the averages.
Fig. 23 illustrates bi-directional inter prediction according to an embodiment.
Referring to fig. 23, a reference frame before a current frame in display order may be referred to as reference frame 1, and a reference frame after the current frame in display order may be referred to as reference frame 2. The reference frame 2 is located after the current frame in display order.
In order to predict a point (or node) 2312 to be predicted in the current frame, the transmitting and receiving apparatus according to the embodiment searches for a point 2304 having the same laser ID and a similar azimuth in the reference frame 1 based on a previous decoding point (prev.point) 2302, which is a point decoded before the point to be predicted in the current frame, determines the closest point among points having a radius greater than the radius of the previous decoding point 2302 as a first predictor P1, 2306, and determines the next closest point as a second predictor P1',2308. Then, in the reference frame 2, the transmitting and receiving apparatus determines points having the same laser ID, similar azimuth angle, and most similar radius as P1',2308 as third predictors P2, 2310. The transmitting and receiving apparatus may independently compare each of the two predicted values P1',2308 and P2, 2310 with the current point and set a predictor most similar to the current point as a predicted value. Alternatively, they may determine the best case of the two predictors and the combination of P1',2308 and P2, 2310 as predictors. Further, in predicting the current point information by combining two or more predictors, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2312 in the current frame, the transmitting/receiving apparatus/method according to the embodiment may find the second point 2304 having the same laser ID and a similar azimuth in the reference frame 1 based on the previously decoded first point 2302 in the current frame. Then, the closest point (third point 2306) among the points in the reference frame 1 having a radius larger than the radius of the first point 2302 may be determined as the first predictor P1, 2306 or the first candidate point P1, 2306, and the next closest point (fourth point 2308) may be determined as the second predictor P1',2308. Then, in the reference frame 2, a fifth point 2310 having the same laser ID, a similar azimuth, and the radius closest to that of the second predictor 2308 can be found and determined as the third predictor P2, 2310.
The transmitting/receiving apparatus/method according to the embodiment may independently compare the three predictors or candidate points with the current point and determine the closest predictor as a predicted value, or may determine a predicted value of the current point 2312 based on a combination of two or more of the three predictors. For example, an average of the predictors found in each reference frame may be calculated and compared with the current point 2312, or the current point 2312 may be predicted based on a combination of the averages.
Fig. 24 illustrates bi-directional inter-prediction according to an embodiment.
Referring to fig. 24, a reference frame before a current frame in display order may be referred to as reference frame 1, and a reference frame after the current frame in display order may be referred to as reference frame 2. The reference frame 2 is located after the current frame in display order.
In order to predict a point (or node) 2416 to be predicted in the current frame, the transmitting and receiving apparatus according to an embodiment searches for a point 2404 having the same laser ID and a similar azimuth in the reference frame 1 based on a previous decoding point (prev.point) 2402, the previous decoding point (prev.point) 2402 being a point decoded before the point to be predicted in the current frame, determines the closest point among points having a radius greater than the radius of the previous decoding point 2402 as the first predictor P1, 2408, and determines the next closest point as the second predictor P1',2410. Then, based on the previously decoded point (prev.point) 2402, a point 2406 having the same laser ID, a similar azimuth, and the most similar radius is found in the reference frame 2, and a point showing the smallest difference from the point 2406 among points having a radius larger than the radius of the point 2406 is determined as P2, 2412. P2',2414 is determined as the closest point among the points having a radius greater than the radius of P2, 2412.
In predicting the current point (or node) 2416 of the current frame, the transmitting and receiving apparatus according to an embodiment may compare the four predictors P1, P1', P2, and P2' alone or select the best candidate from a combination of two predictors. In predicting the current point information by combining two or more predictor information, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2416 in the current frame, the transmitting/receiving apparatus/method according to an embodiment may find the second point 2404 with the same laser ID and similar azimuth in the reference frame 1 based on the previously decoded first point 2402 in the current frame. Then, the closest point (third point 2408) among the points in the reference frame 1 having a radius larger than the radius of the first point 2402 may be determined as the first predictor P1, 2408 or the first candidate point P1, 2408, and the next closest point (fourth point 2410) may be determined as the second predictor P1',2410. Then, in the reference frame 2, a fifth point 2406 having the same laser ID, a similar azimuth angle, and the most similar radius may be found based on the first point 2402, and a point exhibiting the smallest difference from the fifth point 2406 among points having a radius larger than the radius of the fifth point 2406 may be determined as the third predictor P2, 2412, and the point exhibiting the second smallest difference may be determined as the fourth predictor P2',2414.
The transmitting/receiving apparatus/method according to the embodiment may independently compare four predictors or candidate points with the current point and determine the closest predictor as a predicted value, or may determine a predicted value of the current point 2416 based on a combination of two or more of the four predictors. For example, an average of predictors found in each reference frame may be calculated and compared to current point 2416, or current point 2416 may be predicted based on a combination of averages.
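The display-order-based weighting repeatedly mentioned above may, for example, be realized as in the following hypothetical sketch, in which a predictor from a temporally closer reference frame receives a larger weight. The inverse-distance weighting is one possible choice and is an assumption of this sketch, not a defined rule.

    def weighted_bidirectional_prediction(pred_fwd, pred_bwd, dist_fwd, dist_bwd):
        """Combine a forward and a backward predictor.

        pred_fwd / pred_bwd : predictor values (e.g., (laser_id, azimuth, radius))
        dist_fwd / dist_bwd : absolute display-order distance of each reference
                              frame from the current frame (assumed >= 1)
        """
        # inverse-distance weights: the closer reference frame contributes more
        w_fwd = dist_bwd / (dist_fwd + dist_bwd)
        w_bwd = dist_fwd / (dist_fwd + dist_bwd)
        return tuple(w_fwd * f + w_bwd * b for f, b in zip(pred_fwd, pred_bwd))

    # Example: reference frame 1 is one frame before and reference frame 2 is
    # two frames after the current frame in display order.
    p = weighted_bidirectional_prediction((3, 1.57, 10.0), (3, 1.58, 13.0), 1, 2)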
Fig. 25 illustrates bi-directional inter prediction according to an embodiment.
Referring to fig. 25, a reference frame before a current frame in display order may be referred to as reference frame 1, and a reference frame after the current frame in display order may be referred to as reference frame 2. The reference frame 2 is located after the current frame in display order.
In order to predict a point (or node) 2514 to be predicted in the current frame, the transmitting and receiving apparatus according to an embodiment may search for a point 2504 having the same laser ID and a similar azimuth in the reference frame 1 based on a previous decoding point (prev.point) 2502, the previous decoding point (prev.point) 2502 being a point decoded before the point to be predicted in the current frame, determining the closest point among points having a radius greater than that of the previous decoding point 2502 as the first predictor P1, 2506, and the next closest point as the second predictor P1',2508. Then, in reference frame 2, points having the same laser ID and a similar azimuth angle as the previous decoding point (prev.point) 2502 and P1, 2506 and having a radius larger than that of P1, 2506 and the previous decoding point 2502 are sequentially selected and designated as P2, 2510 and P2',2512, respectively.
In predicting the current point (or node) 2514 of the current frame, the transmitting and receiving apparatus according to an embodiment may compare the four predictors P1, P1', P2, and P2' individually or select the best candidate from a combination of the two predictors. In predicting the current point information by combining two or more predictor information, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2514 in the current frame, the transmitting/receiving apparatus/method according to an embodiment may find a second point 2504 having the same laser ID and similar azimuth in the reference frame 1 based on the previously decoded first point 2502 in the current frame. Then, the closest point (third point 2506) among the points having a radius larger than that of the first point 2502 in the reference frame 1 may be determined as the first predictor P1, 2506 or the first candidate point P1, 2506, and the next closest point (fourth point 2508) may be determined as the second predictor P1',2508. Then, in the reference frame 2, points having the same laser ID and a similar azimuth angle based on the first point 2502 and having a radius larger than that of the third point 2506 and the first point 2502 may be sequentially selected and determined as the third predictor P2, 2510 and the fourth predictor P2',2512.
Fig. 26 illustrates backward inter prediction according to an embodiment.
Referring to fig. 26, among frames later than the current frame in display order, a reference frame closer to the current frame may be referred to as reference frame 1, and a reference frame later than the current frame and reference frame 1 in display order may be referred to as reference frame 2. Reference frame 2 follows reference frame 1 in display order.
In order to predict a point (or node) 2614 to be predicted in the current frame, the transmitting and receiving apparatus according to the embodiment may search for a point 2604 having the same laser ID and a similar azimuth in the reference frame 1 based on a previously decoded point (prev.point) 2602, which is a point decoded before the point to be predicted in the current frame. The closest point among points having a radius larger than the radius of the previously decoded point 2602 is determined as a first predictor P1, 2606, and the next closest point is determined as a second predictor P1',2608. Then, in the reference frame 2, points having laser IDs and azimuth angles similar to those of P1, 2606 and P1',2608 and having radius values most similar to those of P1, 2606 and P1',2608 may be searched for and determined as P2, 2610 and P2',2612, respectively.
In predicting the current point (or node) 2614 of the current frame, the transmitting and receiving apparatus according to the embodiment may compare the four predictors P1, P1', P2, and P2' alone or select the best candidate from a combination of the two predictors. In predicting the current point information by combining two or more predictor information, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2614 in the current frame, the transmitting/receiving apparatus/method according to the embodiment may find the second point 2604 with the same laser ID and similar azimuth in the reference frame 1 based on the previously decoded first point 2602 in the current frame. Then, the closest point (third point 2606) among the points in the reference frame 1 having a radius larger than the radius of the first point 2602 may be determined as the first predictor P1, 2606 or the first candidate point P1, 2606, and the next closest point (fourth point 2608) may be determined as the second predictor P1',2608. Then, in the reference frame 2, points having the same laser ID and a similar azimuth angle and radii closest to those of the third point 2606 and the fourth point 2608 can be found based on the third point 2606 and the fourth point 2608, and determined as the third predictor P2, 2610 and the fourth predictor P2',2612.
Fig. 27 illustrates backward inter prediction according to an embodiment.
Referring to fig. 27, among frames later than the current frame in display order, a reference frame closer to the current frame may be referred to as reference frame 1, and a reference frame later than the current frame and reference frame 1 in display order may be referred to as reference frame 2. Reference frame 2 follows reference frame 1 in display order.
In order to predict a point (or node) 2714 to be predicted in the current frame, the transmitting and receiving apparatus according to the embodiment may search for a point 2704 having the same laser ID and a similar azimuth in the reference frame 1 based on a previous decoding point (prev.point) 2702, which is a point decoded before the point to be predicted in the current frame, determine the closest point among points having a radius greater than the radius of the previous decoding point 2702 as a first predictor P1, 2706, and determine the next closest point as a second predictor P1',2708. Then, in the reference frame 2, points having the laser ID and the azimuth angle similar to P1',2708 and the radius most similar thereto may be determined as P2, 2712, and points having a radius smaller than P2, 2712 and larger than the previously decoded point 2702 may be determined as P2',2710.
In predicting the current point (or node) 2714 of the current frame, the transmitting and receiving apparatus according to the embodiment may compare the four predictors P1, P1', P2, and P2' alone or select the best candidate from a combination of two predictors. In predicting the current point information by combining two or more predictor information, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2714 in the current frame, the transmitting/receiving apparatus/method according to the embodiment may find the second point 2704 having the same laser ID and a similar azimuth in the reference frame 1 based on the previously decoded first point 2702 in the current frame. Then, the closest point (third point 2706) among points having a radius larger than that of the first point 2702 in the reference frame 1 may be determined as the first predictor P1, 2706 or the first candidate point P1, 2706, and the next closest point (fourth point 2708) may be determined as the second predictor P1',2708. Then, in the reference frame 2, a fifth point 2712 having the same laser ID and a similar azimuth angle as the fourth point 2708 and the most similar radius may be determined as the third predictor P2, 2712, and a point having a radius smaller than that of the fifth point 2712 and larger than that of the first point 2702 may be determined as the fourth predictor P2',2710.
Fig. 28 illustrates backward inter prediction according to an embodiment.
Referring to fig. 28, among frames later than the current frame in display order, a reference frame closer to the current frame may be referred to as reference frame 1, and a reference frame later than the current frame and reference frame 1 in display order may be referred to as reference frame 2. Reference frame 2 follows reference frame 1 in display order.
In order to predict a point (or node) 2814 to be predicted in the current frame, the transmitting and receiving apparatus according to an embodiment may search for a point 2804 having the same laser ID and a similar azimuth in the reference frame 1 based on a previous decoding point (prev.point) 2802, which is a point decoded before the point to be predicted in the current frame, determine the closest point among points having a radius greater than the radius of the previous decoding point 2802 as a first predictor P1, 2806, and determine the next closest point as a second predictor P1',2808. Then, in reference frame 2, a point having a laser ID and azimuth similar to those of P1',2808 and a radius most similar to that of P1',2808 may be determined as P2, 2810.
In predicting the current point (or node) 2814 of the current frame, the transmitting and receiving apparatus according to an embodiment may compare the three predictors P1, P1' and P2 alone or select the best candidate from a combination of the two predictors. In predicting the current point information by combining two or more predictor information, a predicted value may be calculated by applying an arithmetic mean, a geometric mean, a weight based on a difference in display order, or the like.
To predict the current point 2814 in the current frame, the transmitting/receiving apparatus/method according to an embodiment may find a second point 2804 having the same laser ID and similar azimuth in the reference frame 1 based on the previously decoded first point 2802 in the current frame. Then, the closest point (third point 2806) among the points having a radius larger than that of the first point 2802 in the reference frame 1 may be determined as the first predictor P1, 2806 or the first candidate point P1, 2806, and the next closest point (fourth point 2808) may be determined as the second predictor P1',2808. Then, in the reference frame 2, a fifth point 2810 having the same laser ID and a similar azimuth angle as the fourth point 2808 and a radius most similar to the radius of the fourth point 2808 may be found and determined as the third predictor P2, 2810.
In some embodiments, when points have the same laser ID and similar azimuth angles, the points are compared based only on the difference in radius. However, the azimuth or laser ID may be used for the comparison instead. Further, points may be compared according to one or more of the laser ID, azimuth, and radius. The three reference directions may vary from frame to frame or from point to point (or node to node). When any point (or node) references a plurality of frames in the case of a point-by-point change in direction, the reference frame information may be maintained until the prediction of the frame or prediction tree to which the point belongs is completed, or the reference frame information may be deleted when the prediction of the point (or node) referencing the plurality of frames is completed.
According to an embodiment, the transmitting and receiving apparatuses may compare points between frames based on at least one of a laser ID, an azimuth angle, and a radius. For example, the reference point or predictor may be determined by comparing points in the reference frame based on at least one of the laser ID, azimuth, and radius of the point decoded before the current point. The transmitting and receiving apparatus according to the embodiment may determine a plurality of predictors, and may predict the current point based on a combination of the plurality of predictors.
In order to select the best case among the found points (or predicted values), the transmitting and receiving apparatus according to the embodiment may exclude some points from the search if the found information is greater than a certain threshold. For example, when the difference between some or all of P1, P1', P2, and P2' according to the embodiment and the current point is greater than a threshold value, the corresponding point may be excluded from the best case search. The difference from the current point may be a difference in azimuth or radius, may be a euclidean distance or other distance metric, or may be a difference in attribute information.
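The threshold-based exclusion described above may be sketched as follows. The default distance measure (radius difference) and the threshold value are assumptions for illustration; as noted above, an azimuth difference, a Euclidean distance, or an attribute difference may be used instead.

    def filter_candidates(candidates, current_point, threshold, distance_fn=None):
        """Drop candidate predictors whose difference from the current point exceeds a threshold."""
        if distance_fn is None:
            # assumed default: absolute radius difference (radius stored at index 2)
            distance_fn = lambda a, b: abs(a[2] - b[2])
        return [c for c in candidates if distance_fn(c, current_point) <= threshold]

    # Example: keep only candidates within a radius difference of 2.0 from the current point
    kept = filter_candidates([(3, 1.5, 10.0), (3, 1.5, 30.0)], (3, 1.5, 11.0), 2.0)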
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmitting/receiving apparatus/method according to the embodiment may perform forward, bi-directional, or backward inter prediction described with reference to fig. 20 to 28.
In other words, the transmitting/receiving apparatus/method according to the embodiment may predict a current point in a current frame by referring to points belonging to a plurality of frames located before or after the current frame in display order.
Further, the transmitting/receiving apparatus/method according to the embodiment can select a predictor or a candidate point to be referred to in a reference frame by comparing not only the laser ID, the azimuth angle, and the radius but also X, Y, and Z values according to characteristics of a coordinate system.
The transmitting/receiving apparatus/method according to the embodiment may select a predictor or a candidate point in a reference frame based on a difference in distance thereof from a current point according to a specific threshold. For example, when the feature difference between a point and the current point is greater than a threshold, the point may not be selected as a predictor or candidate point.
According to the embodiment, by predicting point cloud data with reference to a plurality of frames, information about a current point in a current frame can be predicted more accurately. Thus, the difference between the predicted information and the actual information may be reduced, resulting in reduced residuals, reduced amounts of data to be transmitted and received, increased transmission and reception efficiency, and reduced encoding or decoding latency.
Fig. 29 illustrates a GOF according to an embodiment.
The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction (inter prediction) based on the accumulated reference frames. One or more reference frames may be selected to improve the accuracy of inter-prediction when predicting a current node in a current frame.
Referring to fig. 29, a scenario is shown in which two frames are selected as reference frames 2930 from among seven reference candidates 2910 to predict a current frame 2920 among eight frames belonging to GoF. In order to use the reference frame, the reference frame should be decoded before the current frame and its decoding order may be signaled. In addition, information about the display order of the reference frames or information about the sequence of frames in the GOF may be signaled.
Fig. 30 illustrates accumulating reference frames according to an embodiment.
The reference frames 2930 selected in fig. 29 may be combined into an accumulated reference frame, which is generated in the form of accumulated point cloud data and then sorted for comparison with the current frame, as shown in fig. 30.
The transmitting/receiving apparatus/method according to the embodiment may generate the accumulated reference frame 3010 by accumulating the selected reference frames, and may generate the ordered accumulated reference frame 3020 by ordering points in the accumulated reference frame. The transmitting/receiving apparatus/method according to the embodiment may compare a current point in the current frame with a point in the accumulated reference frame 3020 to select a candidate point for predicting the current point.
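A minimal sketch of generating the accumulated reference frame is shown below: the selected reference frames are concatenated into a single point cloud and the points are sorted. The sort key (laser ID, azimuth, radius) is an assumption; the actual ordering should match the ordering applied to the current frame.

    def build_accumulated_reference(reference_frames):
        """Concatenate the selected reference frames and sort the resulting point cloud.

        reference_frames : list of frames, each frame being a list of
                           (laser_id, azimuth, radius) points
        """
        accumulated = [p for frame in reference_frames for p in frame]
        # assumed sort key; must mirror the sorting used for the current frame
        accumulated.sort(key=lambda p: (p[0], p[1], p[2]))
        return accumulated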
Fig. 31 illustrates a method of predicting a current point based on a cumulative reference frame according to an embodiment.
The transmitting/receiving apparatus/method according to the embodiment aggregates the selected reference frames into a single point cloud and sorts the points in the same manner as the sorting method for the current frame. After sorting, a point 3104 is searched in the point cloud of the accumulated reference frame, which point 3104 has the most similar laser ID, azimuth and radius to the decoding point 3102 immediately preceding the current point (or node) 3114 in the current frame, as shown in fig. 31. In this case, one or more points may be similar to decoding point 3102, but they are considered redundant points. Accordingly, 1) the point 3106 having a larger azimuth and at a position closest to the decoding point 3102 may be selected as a candidate. 2) Based on point 1 (point 3106), a point 3108 with the same laser ID, similar azimuth, and larger radius may be selected as a candidate. 3) Based on the decoding point 3102, a point 3110 having a smaller azimuth and being at the nearest position may be selected as a candidate. 4) Based on point 3 (point 3110), a point 3112 with the same laser ID, similar azimuth, and larger radius may be selected as a candidate. Therefore, the transmitting/receiving apparatus/method according to the embodiment can find a total of four candidate points to predict the current point in the current frame.
To predict the current point 3114 in the current frame, the transmitting/receiving apparatus/method according to the embodiment may find a second point 3104 with the same laser ID and similar azimuth and radius in the accumulated reference frame based on the previously decoded first point 3102 in the current frame. Then, a third point 3106 having a larger azimuth and located closest to the first point 3102 may be selected as a candidate point in the accumulated reference frame. Further, a fourth point 3108 having the same laser ID and similar azimuth angle as the third point 3106, and a radius larger than the third point 3106, may be selected as the candidate point. Further, a fifth point 3110 having a smaller azimuth and located at the nearest position may be selected as the candidate point based on the first point 3102. Further, based on the fifth point 3110, a sixth point 3112 having the same laser ID, similar azimuth, and larger radius may be selected as the candidate point.
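The four-candidate search in the accumulated reference frame may be illustrated with the following simplified sketch. The azimuth tolerance, the tuple layout, and the restriction to points with the same laser ID are assumptions; the candidates returned correspond to the points 3106, 3108, 3110, and 3112 described above.

    AZ_TOL = 0.01  # assumed tolerance for "similar azimuth"

    def find_candidates_in_accumulated(acc_cloud, prev_point):
        laser_id, az, radius = prev_point
        same_laser = [p for p in acc_cloud if p[0] == laser_id]

        # candidate 1: closest point with a larger azimuth than the previously decoded point
        larger_az = [p for p in same_laser if p[1] > az]
        c1 = min(larger_az, key=lambda p: p[1] - az) if larger_az else None

        # candidate 2: same laser ID, azimuth similar to candidate 1, larger radius
        c2 = None
        if c1 is not None:
            cand = [p for p in same_laser if abs(p[1] - c1[1]) <= AZ_TOL and p[2] > c1[2]]
            c2 = min(cand, key=lambda p: p[2] - c1[2]) if cand else None

        # candidate 3: closest point with a smaller azimuth than the previously decoded point
        smaller_az = [p for p in same_laser if p[1] < az]
        c3 = min(smaller_az, key=lambda p: az - p[1]) if smaller_az else None

        # candidate 4: same laser ID, azimuth similar to candidate 3, larger radius
        c4 = None
        if c3 is not None:
            cand = [p for p in same_laser if abs(p[1] - c3[1]) <= AZ_TOL and p[2] > c3[2]]
            c4 = min(cand, key=lambda p: p[2] - c3[2]) if cand else None

        return c1, c2, c3, c4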
Method for selecting a predictor according to an embodiment
Method 1. The four selected points are used to determine whether the position of the current point 3114 is near P1 3106, P2 3108, P3 3110 or P4 3112 in the accumulated reference frame based on azimuth. Then, the point group is found by determining whether the position of the current point is close to P1 3106, P2 3108, P3 3110 or P4 3112 based on the radius.
Since P1 3106 and P2 3108 have the same (or similar) azimuth as current point 3114, one representative azimuth may be selected. Similarly, a representative azimuth angle may be selected for P3 3110 and P4 3112. Thus, by comparing the representative azimuth with the azimuth of the current point 3114, a set of most similar points (either a larger azimuth set or a smaller azimuth set based on the decoding point) may be selected. Depending on whether the closest one or average of the two points in the selected azimuth group is used as the predicted value for the current node 3114, three modes may be signaled as follows.
The transmitting/receiving apparatus/method according to the embodiment may group candidate points based on their azimuth angles and select a similar point group by comparing with the current point based on the representative azimuth angle. The closest point of the points in the selected point group is used as the predicted value of the current point, or the average of the points may be used as the predicted value of the current point.
Method 2. Based on the selected combination of four points, it may be determined whether it is advantageous to predict the position of the current point 3114 in the backward direction (corresponding to the combination of P3 and P4), in the forward direction (corresponding to the combination of P1 and P2), bi-directionally (corresponding to the combination of P1 and P3), or in a more distant bi-directionally direction (corresponding to the combination of P2 and P4), and then signaling corresponding to this determination may be provided.
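Both selection methods may be sketched as follows, assuming all four candidates exist and using a (laser_id, azimuth, radius) tuple layout. Taking the first point of each group as its representative azimuth, the pair averages in Method 2, and the mode numbering are all assumptions introduced for this sketch.

    def _dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def _avg(a, b):
        return tuple((x + y) / 2 for x, y in zip(a, b))

    def select_method1(c1, c2, c3, c4, current_point):
        """Method 1: pick the azimuth group whose representative azimuth is closer to the
        current point, then signal whether its first point, its second point, or their
        average is used (modes 0, 1, 2)."""
        group = (c1, c2) if abs(c1[1] - current_point[1]) <= abs(c3[1] - current_point[1]) else (c3, c4)
        options = [(0, group[0]), (1, group[1]), (2, _avg(group[0], group[1]))]
        return min(options, key=lambda o: _dist(o[1], current_point))

    def select_method2(c1, c2, c3, c4, current_point):
        """Method 2: choose among backward (P3, P4), forward (P1, P2), bi-directional
        (P1, P3) and more distant bi-directional (P2, P4) pairings (modes 0..3)."""
        pairs = [(c3, c4), (c1, c2), (c1, c3), (c2, c4)]
        options = [(i, _avg(a, b)) for i, (a, b) in enumerate(pairs)]
        return min(options, key=lambda o: _dist(o[1], current_point))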
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmission/reception apparatus/method according to the embodiment may perform inter prediction based on the accumulated reference frames and signaling related to the prediction method described with reference to fig. 29 to 31.
According to the embodiment, by predicting point cloud data with reference to an accumulated frame (which is an accumulation of a plurality of frames), information about a current point in a current frame can be predicted more accurately. Thus, the difference between the predicted information and the actual information may be reduced, resulting in reduced residuals, reduced amounts of data to be transmitted and received, increased transmission and reception efficiency, and reduced encoding or decoding latency.
Fig. 32 illustrates an encoded bitstream according to an embodiment.
The bit stream according to the embodiment may be transmitted based on the transmission apparatus 10000 of fig. 1, the transmission method of fig. 2, the encoder of fig. 4, the transmission apparatus of fig. 12, the apparatus of fig. 14, the transmission method/apparatus of fig. 16, and/or the transmission methods/apparatuses of fig. 30 and 32. Further, the bit stream according to the embodiment may be received based on the receiving apparatus 20000 of fig. 1, the receiving method of fig. 2, the decoder of fig. 11, the receiving apparatus of fig. 13, the apparatus of fig. 14, and/or the receiving methods/apparatuses of fig. 31 and 33.
Relevant information may be signaled to add/execute an implementation. Parameters according to an embodiment (which may be referred to as metadata, signaling information, etc.) may be generated in the processing of the transmitter according to an embodiment described below and transmitted to the receiver according to an embodiment for use in the reconstruction process. For example, the parameters according to the embodiments may be generated in a metadata processor (or metadata generator) of the transmitting apparatus according to the embodiments described below and acquired by a metadata parser of the receiving apparatus according to the embodiments described below.
The transmitting and receiving apparatus according to the embodiment may define information for predicting the predictive geometric node using a plurality of frames or accumulated reference frames. The sequence parameter set according to an embodiment may indicate whether predictive geometric nodes are predicted based on a plurality of frames or based on accumulated reference frames, and may carry all or part of relevant necessary information depending on the implementation method. Also, the corresponding information may be carried in a set of geometrical parameters, a slice header, an SEI message, a data unit header, etc.
Further, the transmitting and receiving apparatus according to the embodiment may define multi-frame prediction related information or accumulated frame prediction related information in respective or separate locations according to applications, systems, or the like to provide different application ranges and methods. If the information containing similar functions is signaled in a higher layer, it can be applied even if the signaling is omitted in the parameter set of a lower layer. In addition, when the syntax element defined below is applicable not only to the current point cloud data stream but also to a plurality of point cloud data streams, the information may be carried in a higher level parameter set or the like.
The bitstream according to an embodiment may contain information on a prediction method in which point cloud data is predicted with reference to a plurality of frames or accumulated frames.
The transmitting and receiving apparatus according to the embodiment may signal the related information to perform prediction based on a plurality of frames. Parameters according to an embodiment (which may be referred to as metadata, signaling information, etc.) may be generated in a process of a transmitting apparatus according to an embodiment described below, and may be transmitted to a receiving apparatus according to an embodiment to be used in a process of reconstructing point cloud data. For example, the parameters may be generated by the metadata processor (or metadata generator) 12007 of fig. 12 and obtained by the metadata parser 13006 of fig. 13.
Fig. 32 illustrates a configuration of an encoding point cloud.
The abbreviations shown in fig. 32 are represented as follows:
SPS: sequence parameter set
GPS: geometric parameter set
APS: attribute parameter set
TPS: image block parameter set
Geom: geometric bitstream = geometric slice header + geometric slice data
Attr: attribute bit stream = attribute slice header + attribute slice data
A slice according to an embodiment may be referred to as a data unit. The slice header may be referred to as a data unit header. In addition, slices may be represented by other terms having similar meanings, such as blocks, boxes, and regions.
A bit stream according to an embodiment may provide tiles or slices to allow a point cloud to be divided into regions for processing. When the point cloud is divided into a plurality of areas, the respective areas may have different importance. The transmitting and receiving apparatus according to the embodiments may provide different filters or different filter units to be applied based on importance, thereby providing a method of using a more complex filtering method with higher quality results for important areas. Furthermore, by allowing different filtering to be applied to each region (region divided into tiles or slices) according to the processing capability of the receiving apparatus, rather than using a complicated filtering method for the entire point cloud, it is possible to ensure a better image quality for regions important to the user and to ensure an appropriate delay for the system. When the point cloud is divided into tiles, different filters or different filter units may be applied to each tile. When dividing the point cloud into slices, different filters or different filter units may be applied to the respective slices.
Fig. 33 illustrates an exemplary syntax of a sequence parameter set (seq_parameter_set) according to an embodiment.
In the bitstream according to an embodiment, information about point (or node) prediction based on a plurality of reference frames may be included in a Sequence Parameter Set (SPS). That is, information about prediction based on a plurality of reference frames may be signaled through SPS.
Sps_ interEnable: a flag indicating whether the sequence allows inter prediction. If the value is true, it may be indicated that some frames in the sequence allow inter-prediction (or inter-prediction); if the value is false, it may indicate that all frames in the sequence allow intra-prediction only (or intra-prediction).
NumGroupOfFrame: when sps_ interEnable is true, numGroupOfFrame indicates the periodicity of the random access points corresponding to inter-predicted frames. For example, numGroupOfFrame, equal to 8, indicates that the first frame is predicted using intra prediction and 7 subsequent frames are predicted using inter prediction. Frames following the eighth frame may be intra-predicted again. The value may vary from sequence to sequence.
MultiFrameEnableFlag: indicating whether the sequence allows prediction by multiple reference frames when inter-prediction is allowed. When the value is true, it indicates that a portion of the frame corresponding to inter-prediction is predicted based on the plurality of reference frames. When the value is false, it indicates that prediction is performed on a single frame basis.
CodingOrder [ numGroupOfFrame ]: when multiFrameEnableFlag is true, codingOrder [ numGroupOfFrame ] can signal the coding order in inter-prediction GOF by GOF, as the display order of the frames can be different from the order of the frames to be referenced. CodingOrder can be declared in the SPS and applied to any sequence. If CodingOrder is omitted, the sequence may be signaled on a frame-by-frame basis.
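As a non-normative illustration of the SPS fields described above, the following sketch groups them in a simple data structure. The field names follow the description, but the container itself and the example values are assumptions rather than a defined syntax structure.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class MultiFramePredictionSPS:
        sps_interEnable: bool = False        # inter prediction allowed in the sequence
        numGroupOfFrame: int = 8             # random access period (GoF size) when inter is enabled
        multiFrameEnableFlag: bool = False   # prediction from multiple reference frames allowed
        codingOrder: List[int] = field(default_factory=list)  # per-GoF coding order, may be omitted

    # Example: a GoF of 8 frames coded out of display order (hypothetical order)
    sps = MultiFramePredictionSPS(True, 8, True, [0, 4, 2, 1, 3, 6, 5, 7])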
Fig. 34 illustrates an exemplary syntax of a geometry_parameter_set according to an embodiment.
In the bitstream according to an embodiment, information regarding prediction based on points (or nodes) of a plurality of reference frames may be included in a Geometric Parameter Set (GPS). That is, information on prediction based on a plurality of reference frames may be signaled through GPS.
Gps_ interEnable: a flag indicating whether the frame allows inter prediction. If the value is true, indicating that some frames in the sequence allow inter prediction; if the value is false, it means that all frames in the sequence allow only intra prediction.
Gps_ multiFrameEnableFlag: indicating whether prediction is performed based on one or more reference frames when the frame is an inter-predicted frame (when gps _ interEnable is true).
Gps_ nextDisplayIndex: a display index or frame index indicating a frame to be encoded after the frame. This can be inherited by matching with the coding order in the SPS. It may be calculated as the difference between the frame index and the coding order, i.e. the residual, or used as a substitute when the coding order is omitted in the SPS. gps_ nextDisplayIndex may be signaled as a difference of a position of a frame from an index of a current frame or a display index, or it may be a difference of a position of a current frame from one GoF to another, or may be an index allocated in a GoF level.
Gps_ numRefFrame: the number of frames referenced in predicting the current frame may be indicated.
Gps_ refIndex [ numRefFrame ]: an index of a frame to be referred to in predicting a current frame may be indicated. The index may be a difference of the display index from the current frame, or the position difference may be indicated at the GoF level, or a sequence of frames referenced in other frames may be signaled.
Fig. 35 illustrates an exemplary syntax of an attribute parameter set (attribute_parameter_set) according to an embodiment.
In the bitstream according to an embodiment, information on point (or node) predictions based on a plurality of reference frames may be included in an Attribute Parameter Set (APS). That is, information on prediction based on a plurality of reference frames may be signaled through APS.
Aps_ interEnable: a flag indicating whether the frame allows inter prediction. If the value is true, indicating that some frames in the sequence allow inter prediction; if the value is false, it indicates that all frames in the sequence allow only intra prediction. aps_ interEnable may inherit information from gps_ interEnable or may be managed separately.
Aps_ multiFrameEnableFlag: indicating whether a frame is predicted based on one or more reference frames when the frame is an inter-predicted frame (when aps _ interEnable is true). aps multiFrameEnableFlag may be inherited from GPS or may be managed and signaled as separate information.
Aps_ nextDisplayIndex: a display index or frame index indicating a frame to be encoded after the frame. This can be inherited by matching with the coding order in the SPS. It may be calculated as the difference between the frame index and the coding order, i.e. the residual, or used as a substitute when the coding order is omitted in the SPS. aps nextDisplayIndex may be signaled as the difference of the frame position from the index of the current frame or the index of the display index or the display index, or it may be the difference from the position of the current frame from GoF to GoF, or it may be the index allocated in the GoF level. Alternatively, the information used in gps_ nextDisplayIndex may be inherited and used.
Aps_ numRefFrame: the number of frames referred to when predicting attribute information about the current frame may be indicated. This information may be the same as ni GPS gps_ numRefFrames and may be managed separately.
Aps_ refIndex [ numRefFrame ]: an index of a frame to be referred to when predicting attribute information about a current frame may be indicated. The index may be a difference of the display index from the current frame, or may indicate a position difference in the GoF level, or may signal a sequence of frames referenced in other frames. This information may be the same as that in GPS refIndex [ numRefFrame ], and may be managed separately.
To apply an embodiment, individual points (or nodes) in the predictive geometry may be signaled with the following information.
The transmitting/receiving apparatus/method according to the embodiment may apply a None mode, a single point (singlePoint) mode, a multi-point (multiPoint) mode, or a multi-point average (MultiPointAverage) mode as the mode for predicting the current point of each node. The None mode represents the four modes used in conventional intra prediction. The singlePoint mode is a mode in which the point candidates found in the reference frames are evaluated individually and one of them is determined as the predictor. The multiPoint mode is a mode in which the predictor is determined based on a combination of points found in the reference frames (e.g., an average, weighted average, or geometric mean of the points). The MultiPointAverage mode may indicate a prediction mode based on a combination of all of the points or a combination of some of the points.
The transmission/reception apparatus/method according to the embodiment may exclude points found in the reference frames from the candidate points for predicting the current point when the distance between those points and the current point is greater than a predetermined threshold, before applying the respective modes. If the number of point candidates remaining after excluding the points exceeding the threshold is less than or equal to 1, the prediction method may be determined by considering only the None or singlePoint modes in the inter-prediction structure described below. Further, if the number of remaining point candidates is less than or equal to 2, calculation may be performed only for the singlePoint and multiPoint modes, with the MultiPointAverage mode and the None mode being excluded in the following inter-prediction structure. Furthermore, if all point candidates are included, the best case may be found among the multiPoint and MultiPointAverage modes, excluding the None mode, and signaled.
Each prediction mode includes four sub-modes, and up to 12 modes may be evaluated for each node (or point). Residuals between the predicted values of at least eight sub-modes and the current point may be computed, and the mode with the smallest residual may be signaled. The decoder may then receive the signaled information and perform reconstruction based on the received information.
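A simplified, hypothetical mode-decision loop is sketched below: it evaluates each remaining candidate individually (singlePoint cases) and the arithmetic mean of each candidate pair (multiPoint cases), and returns the selected mode, the prediction, and the minimum residual. The residual metric and the mode numbering are assumptions; the actual sub-mode enumeration may differ.

    def select_prediction_mode(candidates, current_point):
        """Return (predMode, pred, minResidual) over singlePoint and multiPoint cases.

        candidates    : list of up to four candidate points (coordinate tuples)
        current_point : the point to be predicted
        """
        def residual(pred):
            return sum(abs(p - c) for p, c in zip(pred, current_point))

        best = (None, None, float("inf"))  # (predMode, pred, minResidual)
        mode = 0

        # singlePoint cases: each remaining candidate taken as the predictor
        for cand in candidates:
            r = residual(cand)
            if r < best[2]:
                best = (mode, cand, r)
            mode += 1

        # multiPoint cases: arithmetic mean of each pair of candidates
        for i in range(len(candidates)):
            for j in range(i + 1, len(candidates)):
                pred = tuple((a + b) / 2 for a, b in zip(candidates[i], candidates[j]))
                r = residual(pred)
                if r < best[2]:
                    best = (mode, pred, r)
                mode += 1

        return best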
Fig. 36 to 38 illustrate a process of a method of predicting a current point according to an embodiment.
Fig. 36 to 38 illustrate internal processes for predicting a current node using a single point mode when a transmitting/receiving apparatus/method according to an embodiment uses a predictive table. When a current point is predicted using a plurality of frames and four candidate points are used in the prediction, each case represents a candidate point. predMode may be returned for the point with the smallest residual when these points replace the current node, in which case pred may be returned as point information, and minResidual may be returned for the residual when this mode is applied. Thus, it may be entropy encoded by the transmitter and decoded by the receiver for prediction.
In order to predict the current point, the transmitting/receiving apparatus/method according to the embodiment may independently compare each selected candidate point in the reference frame with the current point and predict the current point using the candidate point having the smallest difference from the current point.
Fig. 39 illustrates a method of predicting a current point according to an embodiment.
Fig. 39 illustrates an internal process when a transmitting/receiving apparatus/method according to an embodiment uses a predictive table for predicting a current node using a multi-point mode. When a plurality of frames are used to predict the current point and four candidate points are used in the prediction, each case represents a combination or average of the candidate points. predMode may be returned for the point with the smallest residual when the given value replaces the current node when the corresponding mode is applied, in which case pred may be returned as point information, and minResidual may be returned for the residual when the mode is applied. Thus, it may be entropy encoded by the transmitter and decoded by the receiver for prediction.
The transmitting/receiving apparatus/method according to the embodiment may compare a value calculated based on a combination of candidate points selected for predicting a current point in a reference frame with the current point, and may predict the current point using a value having a minimum difference. Information about the candidate points and mode information for combining the candidate points, residual information, etc. may be signaled and used to predict the current point.
Fig. 40 illustrates a method of predicting a current point according to an embodiment.
Fig. 40 illustrates an internal process when a transmitting/receiving apparatus/method according to an embodiment uses a predictive table for predicting a current node based on an average of a plurality of points. When multiple frames are used to predict the current point and four candidate points are used in the prediction, each case may represent a combination or average of the candidate points. Cases 2 and 4 are the average or geometric mean calculated for the remaining points of the four points except the furthest point. predMode may be returned for the point with the smallest residual when the current node is replaced by one of the predicted values in the corresponding case, pred may be returned in this case as point information, and minResidual may be returned for the residual when this mode is applied. Thus, it may be entropy encoded by the transmitter and decoded by the receiver for prediction.
The transmitting/receiving apparatus/method according to the embodiment may compare a value calculated based on a combination of candidate points selected in the reference frame for predicting the current point with the current point, and may predict the current point using a value having the smallest difference. Information about the candidate points and mode information for combining the candidate points, residual information, etc. may be signaled and used to predict the current point.
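For illustration only, the combination and average modes described above can be sketched as follows. The candidate set, the component-wise averaging, and the choice to measure "furthest" relative to the first candidate are assumptions made for this sketch; the actual modes and their ordering are defined by the embodiments.

```python
# Illustrative sketch (hypothetical names) of combination/average prediction candidates.
def build_combination_candidates(points):
    # points: the (up to four) candidate points, each given as a list of components
    def mean(pts):
        return [sum(components) / len(pts) for components in zip(*pts)]

    def drop_furthest(pts, anchor):
        distance = lambda p: sum((a - b) ** 2 for a, b in zip(p, anchor))
        return sorted(pts, key=distance)[:-1]         # keep all but the furthest point

    return [
        mean(points),                                 # e.g. average of all candidates
        mean(drop_furthest(points, points[0])),       # average excluding the furthest point
        points[0],                                    # single-point fallbacks
        points[1],
    ]

def select_multi_point_mode(current_point, points):
    candidates = build_combination_candidates(points)
    mode, pred = min(
        enumerate(candidates),
        key=lambda mc: sum(abs(c - p) for c, p in zip(current_point, mc[1])),
    )
    residual = [c - p for c, p in zip(current_point, pred)]
    return mode, pred, residual
```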
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction based on the reference frame and signaling related to the prediction method described with reference to fig. 33 to 39.
Fig. 41 shows an exemplary syntax of a sequence parameter set (seq_parameter_set).
Node (or point) prediction information using the accumulated reference frames may be added to and signaled in the SPS.
sps_interEnable: a flag indicating whether the sequence allows inter prediction. When the value is true, it may indicate that some frames in the sequence allow inter prediction; when the value is false, it may indicate that all frames in the sequence allow intra prediction only.
numGroupOfFrame: when sps_interEnable is true, numGroupOfFrame may indicate the periodicity of the random access points corresponding to intra-predicted frames. For example, when numGroupOfFrame is equal to 8, the first frame is predicted using intra prediction, and inter prediction is used for the 7 subsequent frames. Then, intra prediction may be performed again on the next frame. The value may vary from sequence to sequence.
cumulFrameEnableFlag: when inter prediction is allowed, this flag may indicate whether the sequence should cumulatively use reference frames for inter prediction.
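A minimal sketch of the random-access periodicity implied by numGroupOfFrame is shown below; the field and function names are hypothetical placeholders, not the signaled syntax.

```python
# Illustrative sketch (assumed field names) of the GoF periodicity driven by the SPS flags.
def frame_uses_intra(frame_index, sps_inter_enable, num_group_of_frame):
    if not sps_inter_enable:
        return True                                  # inter prediction disabled: every frame is intra
    return frame_index % num_group_of_frame == 0     # e.g. 8 -> frames 0, 8, 16, ... are intra

# Example: numGroupOfFrame == 8 -> frame 0 intra, frames 1..7 inter, frame 8 intra again.
assert frame_uses_intra(0, True, 8) and not frame_uses_intra(3, True, 8)
```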
FIG. 42 illustrates an exemplary syntax of a geometry_parameter_set according to an embodiment.
Node (or point) prediction information using the accumulated reference frames may be added to and signaled in the GPS.
gps_interEnable: a flag indicating whether the frame allows inter prediction. When the value is true, it may indicate that some frames in the sequence allow inter prediction; when the value is false, it may indicate that all frames in the sequence allow intra prediction only.
gps_cumulFrameEnableFlag: indicates whether prediction is performed based on the accumulated reference frame when the frame is an inter-predicted frame (when gps_interEnable is true).
gps_numRefFrame: may indicate the number of frames included in the accumulated reference frame when predicting the current frame.
refIndex[numRefFrame]: may indicate the indices of the frames needed to create the accumulated reference frame when predicting the current frame. An index may be the difference in display index from the current frame, a position difference at the GoF level, or an order of the frames referenced among other frames.
predictionDirection: mode information that may indicate, after the accumulated reference frame is created, the direction of the information to be selected from among the candidate points chosen to predict each node in the current frame.
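As an illustrative sketch only, the refIndex semantics described above (interpreting each entry as a display-index difference from the current frame) could be resolved as follows; the names are hypothetical.

```python
# Illustrative sketch (assumed semantics): resolve refIndex[] entries, read as display-index
# differences from the current frame, into the frames accumulated for prediction.
def resolve_reference_frames(current_frame_index, ref_index_deltas):
    return [current_frame_index - delta for delta in ref_index_deltas]

# Example: gps_numRefFrame == 2, refIndex == [1, 2] -> frame 10 accumulates frames 9 and 8.
assert resolve_reference_frames(10, [1, 2]) == [9, 8]
```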
Fig. 43 shows an example of an attribute parameter set (attribute_parameter_set) according to an embodiment.
Node (or point) prediction information using the accumulated reference frames may be added to and signaled in the APS.
aps_interEnable: a flag indicating whether the frame allows inter prediction. When the value is true, it may indicate that some frames in the sequence allow inter prediction; when the value is false, it may indicate that all frames in the sequence allow intra prediction only. aps_interEnable may inherit its information from gps_interEnable or may be managed separately.
aps_cumulFrameEnableFlag: when the frame is an inter-predicted frame (when aps_interEnable is true), indicates whether prediction is performed based on the accumulated reference frame when predicting the point cloud attribute information.
aps_numRefFrame: indicates the number of frames included in the accumulated reference frame when predicting the current frame.
refIndex[numRefFrame]: may indicate the indices of the frames needed to create the accumulated reference frame when predicting the current frame. An index may be the difference in display index from the current frame, a position difference at the GoF level, or an order of the frames referenced among other frames. This information may be inherited from the GPS or may be managed as a separate index.
predictionDirection: mode information that may indicate, after the accumulated reference frame is created, the direction of the information to be selected from among the candidate points chosen to predict each node in the current frame. This information may be inherited from the GPS or may be managed as a separate index.
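The inheritance behavior described above, in which APS values fall back to the corresponding GPS values when not managed separately, may be sketched as follows; the container and field names are hypothetical and only the fallback logic is shown.

```python
# Illustrative sketch (hypothetical containers): an APS field falls back to the GPS value
# when it is not managed as a separate index.
def effective_aps_value(aps, gps, field):
    value = aps.get(field)
    return value if value is not None else gps.get(field)

gps = {"interEnable": True, "refIndex": [1, 2]}
aps = {"interEnable": None, "refIndex": None}        # nothing signaled separately in the APS
assert effective_aps_value(aps, gps, "refIndex") == [1, 2]
```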
Fig. 44 is a flowchart illustrating an apparatus/method for transmitting point cloud data according to an embodiment.
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction based on a plurality of reference frames and signaling related to a prediction method.
Fig. 44 may be understood as a flowchart illustrating a process of processing point cloud data and a point cloud data transmitting/receiving apparatus/method according to components performing respective steps in the process.
When point cloud data is input, a transmitting apparatus/method according to an embodiment may cluster and rank the data to facilitate compression 4410 using predictive geometry.
After processing, it is determined whether to perform intra prediction 4421 on the point cloud data, and geometric intra prediction 4426 may be performed for intra-predicted frames. When the prediction is not intra prediction, it is determined whether to reference only one reference frame or multiple reference frames for inter prediction 4422. When predicting using a single reference frame, the point closest to the point processed before the current point may be found within the single reference frame based on an internal criterion 4423. When prediction is performed using a plurality of reference frames, the point 4424 closest to the current point is found in each reference frame, based on the point processed before the current point, according to the internal processing. Inter prediction 4425 may be performed based on the point candidates to be used for prediction, and the points for which intra prediction and inter prediction are completed may be input into a reference frame buffer in coding order for prediction of the next frame. When the prediction mode is determined for all nodes, the residuals, reference frame information, and the like generated when the prediction mode is applied may be entropy-encoded to generate an output bitstream.
The component blocks illustrated in fig. 44 may include processors configured to perform operations for processing point cloud data and instructions for operating the processors, and may be referred to as units, processors, modules, and the like. Fig. 44 may represent a processing apparatus including a combination of components such as a processor or a module for processing point cloud data, or may represent a data processing method illustrating data processing operations performed by the respective components.
Referring to fig. 44, a point cloud data transmitting/receiving apparatus/method according to an embodiment may cluster or sort the point cloud data 4410 and determine whether to perform intra-prediction 4421. If the prediction is not intra prediction, the transmitting/receiving apparatus/method may determine whether to refer to the plurality of frames 4422. When referring to a plurality of frames, a candidate point 4424 closest to the current point may be searched based on the plurality of frames, and the current point 4425 may be predicted according to a prediction mode using the candidate point.
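A schematic, self-contained sketch of the flow described for fig. 44 is given below. Every step is a simplified stand-in assumed for illustration (points are numeric tuples, the clustering step is replaced by a plain sort, and the intra predictor is simply the previous point); it is not the predictive-geometry encoder itself.

```python
# Schematic sketch of the fig. 44 flow; every step is a simplified stand-in, not the
# actual predictive-geometry encoder.
def closest(point, frame):
    return min(frame, key=lambda q: sum((a - b) ** 2 for a, b in zip(point, q)))

def encode_frame(points, is_intra, use_multi_ref, ref_buffer):
    coded = []
    prev = points[0]
    for point in sorted(points):                     # 4410: clustering/sorting (plain sort here)
        if is_intra:
            pred = prev                              # 4426: trivial intra stand-in (previous point)
        elif use_multi_ref:
            candidates = [closest(prev, f) for f in ref_buffer]                              # 4424
            pred = min(candidates, key=lambda c: sum(abs(a - b) for a, b in zip(point, c)))  # 4425
        else:
            pred = closest(prev, ref_buffer[-1])     # 4423: single reference frame
        coded.append([a - b for a, b in zip(point, pred)])   # residual to be entropy-coded
        prev = point
    ref_buffer.append(points)                        # completed points feed the next frame's buffer
    return coded
```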
Fig. 45 is a flowchart illustrating an apparatus/method for receiving point cloud data according to an embodiment.
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction based on a plurality of reference frames and signaling related to a prediction method.
Fig. 45 may be understood as a flowchart illustrating a process of processing point cloud data and a point cloud data transmitting/receiving apparatus/method according to components performing respective steps in the process.
The reception apparatus/method according to the embodiment may perform entropy decoding 4510 on the received bitstream. When a frame is an intra-predicted frame, the conventional intra prediction method 4526 may be applied, and the geometric information related to the point cloud data may then be updated. When the prediction is not intra prediction, it is determined whether to perform the prediction 4522 based on a single reference frame. When referring to a single frame, the closest point 4523 is found in the reference frame based on a predetermined criterion according to the prediction mode and the reference frame information. The current node is then predicted according to the prediction mode received from the transmitting apparatus according to the embodiment, and the prediction value and the residual are added and input to the current node. This operation is repeated until all nodes up to the leaf nodes of the predictive geometry tree are reconstructed. When referring to multiple frames, the prediction mode and the reference frames may be retrieved from the reference frame buffer, and the closest point may be found in the reference frames based on the predetermined criteria 4524. Prediction is then performed according to the prediction mode received from the transmitting apparatus according to the embodiment, and the prediction value and the residual are added and input to the current node. When the points have been reconstructed after the prediction of all nodes is completed according to the prediction modes, the reconstructed points may be transmitted to the attribute prediction module for the prediction of attribute information.
The component blocks illustrated in fig. 45 may include processors configured to perform operations for processing point cloud data and instructions for operating the processors, and may be referred to as units, processors, modules, and the like. Fig. 45 may represent a processing apparatus including a combination of components such as a processor or a module for processing point cloud data, or may represent a data processing method illustrating data processing operations performed by the respective components.
Referring to fig. 45, the point cloud data transmitting/receiving apparatus/method according to the embodiment may decode the point cloud data 4510 and determine whether to perform intra prediction 4521. If the prediction is not intra prediction, the transmitting/receiving apparatus/method may determine whether to refer to the plurality of frames 4522. When referring to a plurality of frames, a candidate point 4524 closest to the current point may be searched for based on the plurality of frames, and the current point 4525 may be predicted according to a prediction mode based on the candidate point.
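A minimal sketch of the decoder-side reconstruction described for fig. 45 is shown below: the signaled residual is added to the prediction derived from the reference frames. The names are hypothetical, and the candidate actually chosen by the signaled prediction mode is simplified to the first candidate.

```python
# Minimal sketch of the fig. 45 reconstruction: prediction value + residual -> current node.
def closest(point, frame):
    return min(frame, key=lambda q: sum((a - b) ** 2 for a, b in zip(point, q)))

def reconstruct_frame(decoded_residuals, ref_frames, start_point):
    prev, reconstructed = start_point, []
    for residual in decoded_residuals:                # 4510: entropy-decoded residuals
        candidates = [closest(prev, f) for f in ref_frames]      # 4524: search the reference frames
        pred = candidates[0]                          # 4525: stand-in for the signaled predMode choice
        point = tuple(p + r for p, r in zip(pred, residual))     # prediction + residual
        reconstructed.append(point)
        prev = point
    return reconstructed                              # forwarded to attribute prediction
```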
Fig. 46 is a flowchart illustrating an apparatus/method for transmitting point cloud data according to an embodiment.
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction based on a plurality of reference frames or accumulated reference frames and signaling related to a prediction method.
Fig. 46 may be understood as a flowchart showing a process of processing point cloud data and a point cloud data transmitting/receiving apparatus/method according to components performing respective steps in the process.
When point cloud data is input, the transmitting/receiving device/method according to an embodiment may cluster and sort the data to facilitate compression 4610 using predictive geometry. After the processing, it is determined whether to perform intra prediction 4621 on the point cloud data, and geometric intra prediction 4626 may be performed for intra-predicted frames. When the prediction is not intra prediction, it is determined whether to accumulate reference frames 4622 for inter prediction. When only one reference frame is used for prediction, the point 4623 closest to the point processed before the current point may be found within the single reference frame based on an internal criterion. When prediction is performed using accumulated reference frames, information about the frames to be referenced is specified, and the specified reference frames are accumulated into one point cloud and then sorted 4624. Based on the point processed before the current point, the point closest to the current point is found in the accumulated reference frame according to the internal processing. Starting from that closest point, the two nearest points on either side along the azimuth direction are found. Based on these two points, the closest point among the points with a larger radius is found to obtain the candidate points for predicting the current node. From the candidate points, the best point set or prediction direction for predicting the current node is selected, and inter prediction 4625 is performed. The points for which intra prediction and inter prediction are completed are input to the reference frame buffer in coding order for prediction of the next frame. When the prediction mode is determined for all nodes, the residuals, reference frame information, and the like generated when the prediction mode is applied may be entropy-encoded to generate an output bitstream.
The component blocks illustrated in fig. 46 may include processors configured to perform operations for processing point cloud data and instructions for operating the processors, and may be referred to as units, processors, modules, and the like. Fig. 46 may represent a processing apparatus including a combination of components such as a processor or a module for processing point cloud data, or may represent a data processing method illustrating data processing operations performed by the respective components.
Referring to fig. 46, a point cloud data transmitting/receiving apparatus/method according to an embodiment may cluster or sort the point cloud data 4610 and determine whether to perform intra-prediction 4621. If the prediction is not intra prediction, the transmitting/receiving device/method may determine whether to use the accumulated reference frame 4622. When referring to the accumulated reference frames, the accumulated reference frames may be created based on the plurality of frames and the points may be ordered 4624. Then, a point (or predictor) may be used to predict a candidate point 4625 closest to the current point and the current point searched in the accumulated reference frame according to the prediction mode.
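The accumulation and candidate search described for fig. 46 may be sketched as follows. The point layout (laser ID, azimuth, radius), the sorting key, and the assumption that points sharing the laser ID exist in the accumulated cloud are illustrative assumptions, not the disclosed implementation.

```python
# Illustrative sketch: accumulate reference frames into one sorted cloud and pick candidates.
# Points are assumed to be (laser_id, azimuth, radius) tuples.
def build_accumulated_reference(frames):
    merged = [p for frame in frames for p in frame]            # 4624: accumulate into one point cloud
    return sorted(merged)                                      # ...and sort (laser_id, azimuth, radius)

def candidate_points(prev_point, accumulated):
    same_laser = [p for p in accumulated if p[0] == prev_point[0]]
    nearest = min(same_laser, key=lambda p: abs(p[1] - prev_point[1]))  # nearest in azimuth
    left = [p for p in same_laser if p[1] < nearest[1]]
    right = [p for p in same_laser if p[1] > nearest[1]]
    candidates = [nearest]
    if left:
        candidates.append(max(left, key=lambda p: p[1]))       # nearest neighbour on each side
    if right:
        candidates.append(min(right, key=lambda p: p[1]))      # of the azimuth direction
    larger_radius = [p for p in same_laser if p[2] > nearest[2]]
    if larger_radius:
        candidates.append(min(larger_radius, key=lambda p: p[2]))  # closest point with larger radius
    return candidates                                          # candidates for predicting the node
```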
According to the embodiment, by predicting point cloud data with reference to a plurality of frames, information about a current point in a current frame can be predicted more accurately. Thus, the difference between the predicted information and the actual information may be reduced, resulting in reduced residuals, reduced amounts of data to be transmitted and received, increased transmission and reception efficiency, and reduced encoding or decoding latency.
Fig. 47 is a flowchart illustrating an apparatus/method for receiving point cloud data according to an embodiment.
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction based on a plurality of reference frames or accumulated reference frames and signaling related to a prediction method.
Fig. 47 may be understood as a flowchart showing a process of processing point cloud data and a point cloud data transmitting/receiving apparatus/method according to components performing respective steps in the process.
The transmitting/receiving apparatus/method according to the embodiment may perform entropy decoding 4710 on the received bitstream. When a frame is an intra-predicted frame, the conventional intra prediction method may be applied, and the geometric information related to the point cloud may then be updated. Otherwise, it is determined whether to perform prediction 4722 based on a single reference frame. When referring to a single frame, the closest point 4723 is found in the reference frame based on a predetermined criterion according to the prediction mode and the reference frame information, the current node is predicted according to the prediction mode received from the transmitting apparatus according to the embodiment, and the prediction value and the residual are added and input to the current node. This operation is repeated until all nodes up to the leaf nodes of the predictive geometry tree are reconstructed. When prediction is performed based on the accumulated reference frame 4722, the prediction mode and the reference frames may be fetched from the reference frame buffer, and the accumulated reference frame may be generated from the reference frames. The closest point 4724 may then be found in the accumulated reference frame based on predetermined criteria. Prediction is then performed according to the prediction mode received from the transmitting apparatus according to the embodiment. The prediction value and the residual received from the transmitter are added and input to the current node. When the points have been reconstructed after the prediction of all nodes is completed according to the prediction modes, the reconstructed points may be transmitted to the attribute prediction module for the prediction of attribute information.
The component blocks shown in fig. 47 may include processors configured to perform operations for processing point cloud data and instructions for operating the processors, and may be referred to as units, processors, modules, and the like. Fig. 47 may represent a processing apparatus including a combination of components such as a processor or a module for processing point cloud data, or may represent a data processing method illustrating data processing operations performed by the respective components.
Referring to fig. 47, the point cloud data transmitting/receiving apparatus/method according to the embodiment may decode the point cloud data 4710 and determine whether to perform intra prediction 4721. If the prediction is not intra prediction, the transmitting/receiving device/method may determine whether to use the accumulated reference frame 4722. When the accumulated reference frame is used, the accumulated reference frame is generated based on a plurality of frames, and the candidate point 4724 closest to the current point may be searched for in the accumulated reference frame. Then, the current point 4725 may be predicted according to a prediction mode based on the found candidate point.
Fig. 48 is a flowchart illustrating a method of transmitting point cloud data according to an embodiment.
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction based on a plurality of reference frames or accumulated reference frames and signaling related to a prediction method.
The transmission method according to the embodiment includes an operation S4800 of encoding point cloud data and an operation S4810 of transmitting a bitstream including the point cloud data.
As described with reference to fig. 15, the point cloud data may include a plurality of frames.
Operation S4800 encoding the point cloud data may include predicting a first point belonging to a first frame. The first frame may represent the current frame and the first point may represent the current point. In other words, in the transmission method according to the embodiment, the operation S4800 of encoding the point cloud data may include predicting a current point belonging to the current frame. In this case, predicting may include predicting the current point based on points belonging to one or more frames.
A method for predicting a current point belonging to a current frame is described with reference to fig. 20 to 28 and fig. 30 and 31.
One or more frames referenced to predict the current point may be before or after the current frame. Further, one of the one or more frames may precede the current frame and the other frame may follow the current frame. Forward prediction, backward prediction, or bi-prediction may be performed depending on the order of one or more frames referenced by the current frame. The order may be a display order or an encoding order.
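As a small illustrative sketch, the prediction direction could be classified from the positions of the referenced frames relative to the current frame as shown below. The indices are assumed to be in display order, and the naming follows the common convention that "forward" refers to referencing frames preceding the current frame; the embodiments may define the directions differently.

```python
# Illustrative sketch (display-order indices assumed): classify the prediction direction.
def prediction_direction(current_index, reference_indices):
    before = any(i < current_index for i in reference_indices)
    after = any(i > current_index for i in reference_indices)
    if before and after:
        return "bidirectional"
    return "forward" if before else "backward"

assert prediction_direction(5, [3, 4]) == "forward"
assert prediction_direction(5, [4, 6]) == "bidirectional"
```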
Fig. 20 to 28 illustrate forward prediction, backward prediction, or bi-prediction.
Referring to fig. 20 to 28, predicting includes searching for a third point belonging to one frame among the one or more frames based on a second point processed before the current point in the current frame. Here, the second point may represent a point decoded before the current point, and the third point may represent a point similar to the second point. Based on the laser ID, azimuth or radius, it may be determined whether the third point is similar to the second point. In other words, the third point may be the point whose laser ID is the same as the second point and whose azimuth or radius is closest to the second point.
Further, the predicting may include selecting a plurality of candidate points based on the second point or the third point, the candidate points being used to predict the current point. The candidate points may be referred to as predictors, point candidates, etc., and may represent points used to calculate a predicted value of the current point. The candidate points may be selected based on laser ID, azimuth or radius. The current point may be predicted based on a value of one of the candidate points or a value of a combination of the plurality of candidate points. The candidate points may be combined in various ways, for example by averaging or weighting. In other words, the current point may be predicted based on at least two candidate points.
For example, the one or more frames may include a second frame and a third frame. In the prediction, a fourth point found in the second frame by a search based on the second point or the third point may be determined as the first candidate point, and a fifth point found in the third frame by a search based on the fourth point may be determined as the second candidate point. Then, the current point is predicted based on the first candidate point and the second candidate point. In this case, the first point (current point) to the fifth point contain at least one of identification information (laser ID), azimuth information, or radius information.
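The chained search described above, in which a fourth point is found in the second frame and a fifth point is found in the third frame before the two candidates are combined into a prediction, may be sketched as follows. The point layout and the weighted-average combination are assumptions for illustration; the embodiments may combine the candidate points in other ways.

```python
# Illustrative sketch of the chained candidate search across two reference frames.
# Points are assumed to be (laser_id, azimuth, radius) tuples.
def closest_same_laser(anchor, frame):
    same_laser = [p for p in frame if p[0] == anchor[0]]
    return min(same_laser, key=lambda p: (abs(p[1] - anchor[1]), abs(p[2] - anchor[2])))

def predict_current_point(third_point, second_frame, third_frame, weights=(0.5, 0.5)):
    first_candidate = closest_same_laser(third_point, second_frame)      # the "fourth point"
    second_candidate = closest_same_laser(first_candidate, third_frame)  # the "fifth point"
    w1, w2 = weights                                                     # example weighted combination
    return (first_candidate[0],) + tuple(
        w1 * a + w2 * b for a, b in zip(first_candidate[1:], second_candidate[1:])
    )
```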
In processing the point cloud data, the bitstream may contain information indicating whether to perform prediction based on the plurality of frames, and may contain information configured to identify one or more frames referenced to predict the current point. Accordingly, the receiving apparatus/method according to the embodiment may reconstruct the current point based on the information indicating whether to perform the multi-frame based prediction and the identification information about the reference frame.
Signaling information contained in a bit stream according to an embodiment is described with reference to fig. 32 to 35 and 41 to 43.
The transmission method according to the embodiment may be performed by the components of the transmission apparatus/method shown in fig. 44 and 46. According to an embodiment, the prediction section 4420 in fig. 44 or the prediction section 4620 in fig. 46 may perform multi-frame-based prediction. These components may be referred to as a prediction unit, a prediction module, a prediction processor, or the like, and may be connected or combined with the other figures illustrating a transmission apparatus/method according to an embodiment.
The transmitting apparatus according to an embodiment may include: an encoder configured to encode point cloud data comprising a plurality of frames; and a transmitter configured to transmit a bitstream containing the point cloud data. The transmission apparatus according to the embodiment may be represented by an apparatus including components such as units, modules, or processors that perform the processing procedures of the transmission method described above.
Fig. 49 is a flowchart illustrating a method of receiving point cloud data according to an embodiment.
The transmitting/receiving apparatus/method according to the embodiment may correspond to the transmitting/receiving apparatus of fig. 1, the transmitting/receiving apparatus of fig. 2, the point cloud encoder of fig. 4, the point cloud decoders of fig. 10 and 11, the transmitting apparatus of fig. 12, the receiving apparatus of fig. 13, the apparatus of fig. 14, and the transmitting apparatus/method of fig. 44, the receiving apparatus/method of fig. 45, the transmitting apparatus/method of fig. 46, the receiving apparatus/method of fig. 47, the transmitting method of fig. 48, or the receiving method of fig. 49, or may be combined therewith. The transmitting/receiving apparatus/method according to the embodiment may perform inter prediction based on a plurality of reference frames or accumulated reference frames and signaling related to a prediction method.
The receiving method according to the embodiment includes an operation S4900 of receiving a bit stream containing point cloud data and an operation S4910 of decoding the point cloud data.
As described with reference to fig. 15, the point cloud data may include a plurality of frames.
Operation S4910 of decoding the point cloud data may include predicting a first point belonging to a first frame. The first frame may represent the current frame and the first point may represent the current point. In other words, in the receiving method according to the embodiment, operation S4910 of decoding the point cloud data may include predicting a current point belonging to the current frame. In this case, predicting may include predicting the current point based on points belonging to one or more frames.
A method for predicting a current point belonging to a current frame is described with reference to fig. 20 to 28 and fig. 30 and 31.
One or more frames referenced to predict the current point may be before or after the current frame. Further, one of the one or more frames may precede the current frame and the other frame may follow the current frame. Forward prediction, backward prediction, or bi-directional prediction may be performed according to the order of one or more frames referenced by the current frame. The order may be a display order or an encoding order.
Fig. 20 to 28 illustrate forward prediction, backward prediction, or bi-prediction.
Referring to fig. 20 to 28, predicting includes searching for a third point belonging to one frame among the one or more frames based on a second point processed before the current point in the current frame. Here, the second point may represent a point decoded before the current point, and the third point may represent a point similar to the second point. Based on the laser ID, azimuth or radius, it may be determined whether the third point is similar to the second point. In other words, the third point may be the point whose laser ID is the same as the second point and whose azimuth or radius is closest to the second point.
Further, the predicting may include selecting a plurality of candidate points based on the second point or the third point, the candidate points being used to predict the current point. The candidate points may be referred to as predictors, point candidates, etc., and may represent points used to calculate a predicted value of the current point. The candidate points may be selected based on laser ID, azimuth or radius. The current point may be predicted based on a value of one of the candidate points or a value of a combination of the plurality of candidate points. The candidate points may be combined in various ways, for example by averaging or weighting. In other words, the current point may be predicted based on at least two candidate points.
For example, the one or more frames may include a second frame and a third frame. In the prediction, a fourth point found in the second frame by a search based on the second point or the third point may be determined as the first candidate point, and a fifth point found in the third frame by a search based on the fourth point may be determined as the second candidate point. Then, the current point is predicted based on the first candidate point and the second candidate point. In this case, the first point (current point) to the fifth point contain at least one of identification information (laser ID), azimuth information, or radius information.
In processing the point cloud data, the bitstream may contain information indicating whether to perform prediction based on the plurality of frames, and may contain information configured to identify one or more frames referenced to predict the current point. Accordingly, the receiving apparatus/method according to the embodiment may reconstruct the current point based on the information indicating whether to perform the multi-frame based prediction and the identification information about the reference frame.
Signaling information contained in a bit stream according to an embodiment is described with reference to fig. 32 to 35 and 41 to 43.
The receiving method according to the embodiment may be performed by the components of the receiving apparatus/method shown in fig. 45 and 47. According to an embodiment, the prediction section 4520 in fig. 45 or the prediction section 4720 in fig. 47 may perform multi-frame-based prediction. These may be referred to as prediction units, prediction modules, prediction processors, or the like, and may be connected or combined with other figures showing a receiving device/method according to an embodiment.
The receiving apparatus according to an embodiment may include: a receiver configured to receive point cloud data comprising a plurality of frames; and a decoder configured to decode a bitstream containing the point cloud data. The reception apparatus according to the embodiment may be represented by an apparatus including components such as units, modules, or processors that perform the processing procedures of the reception method described above.
The transmitting/receiving apparatus/method according to the embodiment proposes a method of considering a plurality of reference frames for inter-frame motion prediction in a predictive geometric method for point cloud content. By using multiple frames, the accuracy of prediction can be improved. Furthermore, by predicting nodes from a pre-predicted frame, the encoding time required in the tree generation process can be reduced.
The present disclosure proposes a method for considering accumulated reference frames of inter-frame motion in a predictive geometrical method of point cloud content. By using multiple frames, the accuracy of prediction can be improved. Furthermore, by predicting nodes from a pre-predicted frame, the encoding time required in the tree generation process can be reduced.
The processing of the transmitting and receiving apparatus according to the above-described embodiment can be described in conjunction with the following point cloud compression processing. Further, the operations according to the embodiments described in the present specification may be performed by a transmitting/receiving apparatus including a memory and/or a processor according to the embodiments. The memory may store a program for processing/controlling operations according to the embodiments, and the processor may control various operations described in the present specification. The processor may be referred to as a controller or the like. In an embodiment, the operations may be performed by firmware, software, and/or combinations thereof. The firmware, software, and/or combinations thereof may be stored in a processor or memory.
Embodiments have been described in terms of methods and/or apparatus, which may be applied to complement each other.
Although the drawings are described separately for simplicity, new embodiments may be designed by combining the embodiments shown in the various drawings. A recording medium readable by a computer and recorded with a program for executing the above-described embodiments is also designed according to the needs of those skilled in the art and falls within the scope of the appended claims and equivalents thereof. The apparatus and method according to the embodiments may not be limited to the configurations and methods of the above-described embodiments. Various modifications may be made to the embodiments by selectively combining all or some of the embodiments. Although the preferred embodiments have been described with reference to the accompanying drawings, those skilled in the art will appreciate that various modifications and changes can be made to the embodiments without departing from the spirit or scope of the present disclosure as described in the appended claims. These modifications should not be construed separately from the technical ideas or viewpoints of the embodiments.
The various elements of the apparatus of the embodiments may be implemented in hardware, software, firmware, or a combination thereof. The various elements of the embodiments may be implemented by a single chip (e.g., a single hardware circuit). According to an embodiment, the components according to the embodiment may be implemented as separate chips, respectively. According to an embodiment, at least one or more components of an apparatus according to an embodiment may include one or more processors capable of executing one or more programs. One or more programs may perform any one or more operations/methods according to embodiments or include instructions for performing the same.
Executable instructions for performing the methods/operations of an apparatus according to embodiments may be stored in a non-transitory CRM or other computer program product configured to be executed by one or more processors, or may be stored in a transitory CRM or other computer program product configured to be executed by one or more processors.
In addition, the memory according to the embodiment may be used to cover not only volatile memory (e.g., RAM) but also the concept of nonvolatile memory, flash memory, and PROM. In addition, it may be implemented in the form of a carrier wave (e.g., transmission via the internet). In addition, the processor-readable recording medium may be distributed to computer systems connected via a network such that the processor-readable code is stored and executed in a distributed manner.
In this specification, the terms "/" and "," should be interpreted as indicating "and/or". For example, the expression "A/B" may mean "A and/or B". Further, "A, B" may mean "A and/or B". Further, "A/B/C" may mean "at least one of A, B, and/or C". In addition, "A, B, C" may mean "at least one of A, B, and/or C". Furthermore, in the present specification, the term "or" should be interpreted as indicating "and/or". For example, the expression "A or B" may mean 1) only A, 2) only B, or 3) both A and B. In other words, the term "or" as used in this document should be interpreted as indicating "additionally or alternatively".
Terms such as first and second may be used to describe various elements of the embodiments. However, the various components according to the embodiments should not be limited by the above terms. These terms are only used to distinguish one element from another element. For example, the first user input signal may be referred to as a second user input signal. Similarly, the second user input signal may be referred to as the first user input signal. The use of these terms should be construed without departing from the scope of the various embodiments. The first user input signal and the second user input signal are both user input signals, but do not mean the same user input signal unless the context clearly dictates otherwise.
The terminology used to describe the embodiments is used for the purpose of describing particular embodiments and is not intended to be limiting of the embodiments. As used in the description of the embodiments and in the claims, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. The expression "and/or" is used to include all possible combinations of terms. Terms such as "comprises" or "comprising" are intended to indicate the presence of figures, numbers, steps, elements, and/or components, and should be understood as not excluding the possibility that additional figures, numbers, steps, elements, and/or components are present. As used herein, conditional expressions such as "if" and "when" are not limited to an optional case, and are intended to be interpreted as performing the related operation when a particular condition is satisfied, or as interpreting the related definition in accordance with the particular condition.
The operations according to the embodiments described in the present specification may be performed by a transmitting/receiving apparatus including a memory and/or a processor according to the embodiments. The memory may store programs for processing/controlling operations according to embodiments, and the processor may control various operations described in the present specification. The processor may be referred to as a controller or the like. In an embodiment, the operations may be performed by firmware, software, and/or combinations thereof. The firmware, software, and/or combinations thereof may be stored in a processor or memory.
The operations according to the above-described embodiments may be performed by the transmitting apparatus and/or the receiving apparatus according to the embodiments. The transmitting/receiving device includes a transmitter/receiver configured to transmit and receive media data, a memory configured to store instructions (program code, algorithms, flowcharts, and/or data) for a process according to an embodiment, and a processor configured to control operation of the transmitting/receiving device.
A processor may be referred to as a controller or the like and may correspond to, for example, hardware, software, and/or combinations thereof. Operations according to the above embodiments may be performed by a processor. Further, the processor may be implemented as an encoder/decoder for the operation of the above-described embodiments.
Mode for the invention
As described above, the related details have been described in the best mode for carrying out the embodiments.
Industrial applicability
As described above, the embodiments may be applied in whole or in part to the point cloud data transmitting/receiving device and system. It will be apparent to those skilled in the art that various changes or modifications may be made to the embodiments within the scope of the embodiments. Accordingly, it is intended that the embodiments cover modifications and variations of this disclosure provided they come within the scope of the appended claims and their equivalents.

Claims (19)

1. A method of transmitting point cloud data, the method comprising the steps of:
encoding point cloud data comprising a plurality of frames; and
transmitting a bitstream containing the point cloud data.
2. The method of claim 1, wherein the encoding of the point cloud data comprises:
predicting a first point belonging to a first frame,
wherein the predicting comprises:
predicting the first point based on points belonging to one or more frames of the plurality of frames.
3. The method according to claim 2, wherein:
The one or more frames precede the first frame in sequence; or
The one or more frames are sequentially subsequent to the first frame; or
One of the one or more frames precedes the first frame and another of the one or more frames follows the first frame.
4. A method according to claim 3, wherein the predicting comprises:
Searching for a third point belonging to one of the one or more frames based on a second point processed before the first point in the first frame.
5. The method of claim 4, wherein the predicting comprises:
selecting a plurality of candidate points based on the third point, the candidate points being used for prediction of the first point.
6. The method of claim 5, wherein the predicting comprises:
predicting the first point based on one of the candidate points.
7. The method of claim 5, wherein the predicting comprises:
predicting the first point based on at least two of the candidate points.
8. The method of claim 6 or 7, wherein the bitstream contains information indicating whether to perform the prediction based on the plurality of frames.
9. The method of claim 8, wherein the bitstream contains information configured to identify the one or more frames.
10. The method of claim 9, wherein the one or more frames comprise a second frame and a third frame,
Wherein the predicting comprises:
determining a fourth point found in the second frame by a search based on the second point or the third point as a first candidate point;
Determining a fifth point found in the third frame by the search based on the fourth point as a second candidate point; and
Predicting the first point based on the first candidate point and the second candidate point,
Wherein the first to fifth points contain at least one of identification information, azimuth information, or radius information.
11. An apparatus for transmitting point cloud data, the apparatus comprising:
an encoder configured to encode point cloud data comprising a plurality of frames; and
A transmitter configured to transmit a bit stream containing the point cloud data.
12. A method of receiving point cloud data, the method comprising:
Receiving a bit stream containing point cloud data, the point cloud data comprising a plurality of frames; and
decoding the point cloud data.
13. The method of claim 12, wherein the decoding comprises:
predicting a first point belonging to a first frame,
wherein the predicting comprises:
predicting the first point based on points belonging to one or more frames of the plurality of frames.
14. The method according to claim 13, wherein:
The one or more frames precede the first frame in sequence; or
The one or more frames are sequentially subsequent to the first frame; or
One of the one or more frames precedes the first frame and another of the one or more frames follows the first frame.
15. The method of claim 14, wherein the predicting comprises:
Searching for a third point belonging to one of the one or more frames based on a second point processed before the first point in the first frame.
16. The method of claim 15, wherein the predicting comprises:
selecting a plurality of candidate points based on the third point, the candidate points being used for prediction of the first point.
17. The method of claim 16, wherein the predicting comprises:
predicting the first point based on one of the candidate points.
18. The method of claim 16, wherein the predicting comprises:
predicting the first point based on at least two of the candidate points.
19. An apparatus for receiving point cloud data, the apparatus comprising:
A receiver configured to receive a bit stream containing point cloud data, the point cloud data comprising a plurality of frames; and
A decoder configured to decode the point cloud data.
CN202280067455.8A 2021-10-06 2022-10-06 Point cloud data transmitting method, point cloud data transmitting device, point cloud data receiving method and point cloud data receiving device Pending CN118056404A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2021-0132230 2021-10-06
KR10-2021-0143216 2021-10-26
KR20210143216 2021-10-26
PCT/KR2022/015035 WO2023059089A1 (en) 2021-10-06 2022-10-06 Point cloud data transmission method, point cloud data transmission device, point cloud data reception method, and point cloud data reception device

Publications (1)

Publication Number Publication Date
CN118056404A true CN118056404A (en) 2024-05-17

Family

ID=91052260

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280067455.8A Pending CN118056404A (en) 2021-10-06 2022-10-06 Point cloud data transmitting method, point cloud data transmitting device, point cloud data receiving method and point cloud data receiving device

Country Status (1)

Country Link
CN (1) CN118056404A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination