WO2024011417A1 - Encoding and decoding method, decoder, encoder and computer-readable storage medium - Google Patents

Info

Publication number
WO2024011417A1
WO2024011417A1 PCT/CN2022/105253 CN2022105253W WO2024011417A1 WO 2024011417 A1 WO2024011417 A1 WO 2024011417A1 CN 2022105253 W CN2022105253 W CN 2022105253W WO 2024011417 A1 WO2024011417 A1 WO 2024011417A1
Authority
WO
WIPO (PCT)
Prior art keywords
occupancy
upsampled
voxels
group
groups
Prior art date
Application number
PCT/CN2022/105253
Other languages
English (en)
French (fr)
Inventor
马展
王剑强
魏红莲
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Priority to PCT/CN2022/105253 (WO2024011417A1)
Priority to TW112125920A (TW202406343A)
Publication of WO2024011417A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding

Definitions

  • The present application relates to point cloud compression encoding and decoding technology, and in particular to an encoding and decoding method, a decoder, an encoder and a computer-readable storage medium.
  • A point cloud is a collection of points that stores the geometric position and related attribute information of each point in order to accurately describe objects in space.
  • The amount of point cloud data is huge: one frame of a point cloud can contain millions of points, which makes effective storage and transmission of point clouds difficult. Compression technology is therefore used to reduce redundant information in point cloud storage and facilitate subsequent processing.
  • Point cloud geometric compression technology based on neural networks is mainly divided into geometric lossy compression and geometric lossless compression.
  • For geometric lossless compression, the encoder side often needs to use surrounding context, such as parent nodes and neighbor nodes, as input.
  • The occupancy probability of each voxel in the geometric data of the point cloud is output, and an entropy encoder then uses it to convert the voxel occupancy symbol corresponding to the occupancy probability of each voxel into a code stream.
  • On the decoder side, the occupancy probability of each voxel is predicted according to the same process. Based on the predicted occupancy probability, an entropy decoder decodes the voxel occupancy symbols from the code stream and reconstructs the geometric data of the point cloud.
  • Embodiments of the present application provide a coding and decoding method, a decoder, an encoder and a computer-readable storage medium, which can improve coding and decoding efficiency and compression performance, thereby improving coding and decoding performance.
  • An embodiment of the present application provides a decoding method, including:
  • predicting the occupancy probabilities of the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and decoding the occupancy indication information according to the m groups of occupancy probabilities to determine the respective occupancy symbols of the m groups;
  • An embodiment of the present application provides an encoding method, including:
  • predicting the occupancy probabilities of the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and determining the respective occupancy symbols of the m groups according to the m groups of occupancy probabilities;
  • This embodiment of the present application provides a decoder, including:
  • the parsing part is configured to parse the code stream and determine the occupancy indication information corresponding to the first scale point cloud;
  • the first grouping part is configured to determine the second-scale point cloud, perform upsampling and division processing on the second-scale point cloud, and determine m groups of upsampled voxels; wherein the second-scale point cloud is the previously decoded point cloud data corresponding to the first-scale point cloud, and m is an integer greater than or equal to 1;
  • the first prediction part is configured to predict the occupancy probabilities of the m groups of upsampled voxels in sequence, determine m groups of occupancy probabilities, and decode the occupancy indication information according to the m groups of occupancy probabilities to determine the respective occupancy symbols of the m groups;
  • the decoding part is configured to determine the reconstructed geometric data corresponding to the first scale point cloud based on the m groups of occupancy symbols.
  • An embodiment of the present application provides an encoder, including:
  • the downsampling part is configured to perform voxel downsampling on the first-scale point cloud and determine the second-scale point cloud;
  • the second grouping part is configured to perform upsampling and dividing processing on the second scale point cloud, and determine m groups of upsampled voxels; m is an integer greater than or equal to 1;
  • the second prediction part is configured to predict the occupancy probability of the m groups of upsampled voxels in sequence, determine the m group of occupancy probabilities, and determine the respective occupancy symbols of the m groups according to the m group of occupancy probabilities;
  • the encoding part is configured to encode the respective occupancy symbols of the m groups, determine the occupancy indication information corresponding to the first scale point cloud, and write the occupancy indication information into the code stream.
  • This embodiment of the present application provides a code stream, including:
  • the code stream is generated by bit encoding based on the information to be encoded; wherein the information to be encoded at least includes: occupancy indication information corresponding to the first scale point cloud.
  • This embodiment of the present application provides a decoder, including:
  • a first memory configured to store executable instructions
  • the first processor is configured to implement the decoding method as described in any one of the above when executing executable instructions stored in the first memory.
  • An embodiment of the present application provides an encoder, including:
  • a second memory configured to store executable instructions
  • the second processor is configured to implement the encoding method as described in any one of the above when executing the executable instructions stored in the second memory.
  • Embodiments of the present application provide a computer-readable storage medium that stores executable instructions which, when executed, cause the first processor to implement the above decoding method, or cause the second processor to implement the above encoding method.
  • Embodiments of the present application provide a computer program product, including a computer program or instructions. When the computer program or instructions are executed by a first processor, the decoding method provided by the embodiments of the present application is implemented; or, when the computer program or instructions are executed by a second processor, the encoding method provided by the embodiments of the present application is implemented.
  • Embodiments of the present application provide a coding and decoding method, a decoder, an encoder, and a computer-readable storage medium.
  • On the decoder side, the occupancy indication information corresponding to the first-scale point cloud, that is, the information representing the first-scale point cloud, is obtained by parsing the code stream.
  • The previously decoded second-scale point cloud corresponding to the first-scale point cloud is upsampled and divided, so that the second-scale point cloud is upsampled to the first scale.
  • Predicting the occupancy probabilities group by group accelerates occupancy probability prediction and improves the processing efficiency of the prediction process.
  • The m groups of occupancy probabilities are used to decode the occupancy indication information corresponding to the first-scale point cloud, yielding m groups of decoded occupancy symbols from which the geometric data of the first-scale point cloud is reconstructed to complete its decoding. This improves decoding efficiency and decoding performance.
  • Figure 1 is a flow chart of G-PCC encoding
  • Figure 2 is a flow chart of G-PCC decoding
  • Figure 3 is an optional flow diagram of the encoding method provided by the embodiment of the present application.
  • Figure 4 is a schematic diagram of an optional process of voxel downsampling provided by an embodiment of the present application
  • Figure 5A is an optional schematic diagram of the occupancy symbols of voxels in the scale p point cloud provided by the embodiment of the present application;
  • Figure 5B is an optional schematic diagram of the occupancy symbols of voxels in the scale p-1 point cloud provided by the embodiment of the present application;
  • Figure 6 is an optional flow diagram of the encoding method provided by the embodiment of the present application.
  • Figure 7 is an optional flow diagram of the encoding method provided by the embodiment of the present application.
  • Figure 8 is a schematic diagram of an optional process of upsampling and dividing provided by the embodiment of the present application.
  • Figure 9 is an optional flow diagram of the encoding method provided by the embodiment of the present application.
  • Figure 10 is an optional schematic diagram of the group sequential occupancy probability prediction process provided by the embodiment of the present application.
  • Figure 11 is an optional effect diagram of the occupancy probability of voxels in the point cloud provided by the embodiment of the present application.
  • Figure 12 is an optional flow diagram of the decoding method provided by the embodiment of the present application.
  • Figure 13 is an optional flow diagram of the decoding method provided by the embodiment of the present application.
  • Figure 14 is an optional process schematic diagram of applying the encoding and decoding method provided by the embodiment of the present application to an actual scenario
  • Figure 15 is an optional structural schematic diagram of a decoder provided by an embodiment of the present application.
  • Figure 16 is an optional structural schematic diagram of an encoder provided by an embodiment of the present application.
  • Figure 17 is an optional structural schematic diagram of a decoder provided by an embodiment of the present application.
  • Figure 18 is an optional structural schematic diagram of an encoder provided by an embodiment of the present application.
  • The terms "first/second/third" are only used to distinguish similar objects and do not represent a specific ordering of objects. It is understandable that, where permitted, the specific order or sequence of "first/second/third" may be interchanged so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.
  • Voxel is the abbreviation of volume element and is the smallest unit of digital data in three-dimensional space division. With voxels, a 3D space can be meshed and each mesh can be given characteristics. For example, a voxel may be a fixed-sized cube in three-dimensional space. Voxels can be widely used in fields such as three-dimensional imaging, scientific data and medical imaging.
  • Point cloud compression algorithms include Geometry-based Point Cloud Compression (G-PCC). Geometric compression in G-PCC is mainly implemented through octree models and/or triangular surface models.
  • Point cloud geometric compression technology based on neural networks can be roughly divided into geometric lossy compression and lossless compression.
  • The lossless compression algorithm mainly focuses on the design of the prediction model for voxel occupancy probability.
  • The data representation of voxels usually uses octree models, volume models, sparse tensor representations, and so on.
  • In geometric lossless compression, the surrounding context is often used as the input. After processing by neural network layers (such as convolutional or fully connected layers), the occupancy probability of each voxel in the geometric data of the point cloud is output, and an entropy encoder then converts the voxel occupancy symbol corresponding to each occupancy probability into a code stream.
  • On the decoder side, an entropy decoder decodes the voxel occupancy symbols from the code stream and reconstructs the geometric data of the point cloud.
  • Predicting the occupancy probability of each voxel in the point cloud and then encoding and decoding based on that probability often has performance or complexity flaws. For example, the surrounding context at the location of the voxel to be predicted may be insufficient, which leads to inaccurate prediction and in turn to poor coding and compression performance. Alternatively, the occupancy probability prediction for each voxel may overuse context information, which results in redundant code rates, slow encoding and decoding, and low encoding and decoding efficiency. In either case, the encoding and decoding performance is reduced.
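The link between prediction accuracy and code rate can be made concrete with a small sketch. It relies on the standard property that an ideal entropy coder spends about -log2(q) bits on a symbol to which the model assigned probability q. The function name `ideal_bits` and the sample values are illustrative assumptions, not part of the application.

```python
import math

def ideal_bits(symbols, probs_occupied):
    """Ideal code length in bits for binary occupancy symbols.

    symbols: iterable of 0/1 occupancy symbols.
    probs_occupied: predicted probability that each voxel is occupied.
    """
    total = 0.0
    for s, p in zip(symbols, probs_occupied):
        # Probability the model assigned to the symbol that actually occurred.
        q = p if s == 1 else 1.0 - p
        total += -math.log2(q)
    return total

# Hypothetical symbols and predictions, for illustration only.
symbols = [1, 0, 1, 1]
sharp = ideal_bits(symbols, [0.9, 0.1, 0.9, 0.9])  # confident and correct
vague = ideal_bits(symbols, [0.5, 0.5, 0.5, 0.5])  # uninformative model
```

With the uninformative model every symbol costs one bit, while the confident and correct model needs well under one bit per symbol, which is why a more accurate occupancy probability prediction lowers the code rate.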
  • Embodiments of the present application provide a coding and decoding method, a decoder, an encoder and a computer-readable storage medium, which can improve coding and decoding efficiency and coding and decoding performance.
  • A flow chart of G-PCC encoding and a flow chart of G-PCC decoding are first provided. It should be noted that these flow charts are described only to explain the technical solutions of the embodiments of the present application more clearly, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application.
  • The point cloud to be encoded can be a point cloud in a video, but is not limited to this.
  • The point cloud data is first divided into multiple slices through slice division, and each slice is encoded independently. In each slice, the geometric information and attribute information of the point cloud are encoded separately.
  • In geometric encoding, the geometric information is first coordinate-transformed so that the whole point cloud is contained in a bounding box, and is then quantized.
  • The quantization mainly plays the role of scaling. Because of quantization rounding, some points end up with the same geometric information, and a parameter decides whether duplicate points are removed.
  • The process of quantizing and removing duplicate points is also called voxelization. The bounding box is then divided as an octree.
  • In octree-based geometric encoding, the bounding box is divided into eight equal sub-cubes, and the non-empty sub-cubes (those containing points of the point cloud) continue to be divided into eight equal parts until the leaf nodes obtained are 1x1x1 unit cubes, at which point the division stops. The points in the leaf nodes are arithmetic encoded to generate a binary geometric bit stream, that is, a geometric code stream.
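The octree division described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `octree_occupancy`, the breadth-first traversal, and the serialization of each node as one 8-bit occupancy code with this particular child ordering are choices made here for illustration, and the arithmetic coding step is omitted.

```python
def octree_occupancy(points, origin, size):
    """Breadth-first octree division of a cubic bounding box.

    points: set of integer (x, y, z) voxel coordinates inside the box.
    origin: minimum corner of the box; size: edge length (a power of two).
    Returns one 8-bit occupancy code per divided node (assumed serialization).
    """
    codes = []
    queue = [(origin, size, points)]
    while queue:
        (ox, oy, oz), s, pts = queue.pop(0)
        if s == 1:  # 1x1x1 leaf node: division stops
            continue
        h = s // 2
        children = [set() for _ in range(8)]
        for (x, y, z) in pts:
            idx = ((x - ox) // h) * 4 + ((y - oy) // h) * 2 + ((z - oz) // h)
            children[idx].add((x, y, z))
        code = 0
        for idx, child in enumerate(children):
            if child:  # only non-empty sub-cubes are divided further
                code |= 1 << idx
                child_origin = (ox + (idx >> 2) * h,
                                oy + ((idx >> 1) & 1) * h,
                                oz + (idx & 1) * h)
                queue.append((child_origin, h, child))
        codes.append(code)
    return codes

# Two hypothetical points in a 4x4x4 bounding box.
codes = octree_occupancy({(0, 0, 0), (3, 3, 3)}, (0, 0, 0), 4)
```

In this toy case the root node and its two non-empty children each produce one code; empty sub-cubes are never visited, which is the source of the octree's compactness.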
  • In trisoup-based geometric encoding, octree division is also required first, but unlike octree-based geometric information encoding, trisoup does not need to divide the point cloud step by step.
  • The vertices are also used in the geometric reconstruction process, and the reconstructed geometric information is used when encoding the attributes of the point cloud.
  • In attribute encoding, color conversion is performed to convert the color information (i.e., attribute information) from the RGB color space to the YUV color space. The point cloud is then recolored using the reconstructed geometric information so that the unencoded attribute information corresponds to the reconstructed geometric information.
  • Attribute encoding then proceeds by one of two methods: Level of Detail (LOD)-based lifting or the Region Adaptive Hierarchical Transform (RAHT).
  • Both methods convert the color information from the spatial domain to the frequency domain, obtain high-frequency and low-frequency coefficients through the transformation, and finally quantize the coefficients (i.e., produce quantized coefficients).
  • Finally, after the geometric encoding data produced by octree division and surface fitting and the attribute encoding data produced by coefficient quantization are slice-synthesized, the vertex coordinates of each block are encoded in sequence (i.e., arithmetic encoded) to generate a binary attribute bit stream, that is, an attribute code stream.
  • the flow chart of G-PCC decoding shown in Figure 2 is applied to the decoder.
  • The decoder obtains the binary code stream and independently decodes the geometric bit stream (i.e., the geometric code stream) and the attribute bit stream in the binary code stream.
  • When decoding the geometric bit stream, the geometric information of the point cloud is obtained through arithmetic decoding, octree synthesis, surface fitting, geometry reconstruction and inverse coordinate transformation.
  • When decoding the attribute bit stream, the attribute information of the point cloud is obtained through arithmetic decoding, inverse quantization, LOD-based inverse lifting or RAHT-based inverse transformation, and inverse color conversion. The three-dimensional image model of the point cloud data to be encoded is restored based on the geometric information and attribute information.
  • the encoding method of the embodiment of the present application can be applied to the geometric information encoding process of G-PCC as shown in Figure 1.
  • In geometric encoding, voxel downsampling is performed on the voxelized first-scale point cloud to determine the second-scale point cloud; upsampling and division processing are performed on the second-scale point cloud to determine m groups of upsampled voxels; and the occupancy probabilities of the m groups of upsampled voxels are predicted in sequence to determine m groups of occupancy probabilities.
  • The respective occupancy symbols of the m groups, determined from the m groups of occupancy probabilities, serve as the data to be entropy encoded, thereby replacing the octree division and surface fitting processes in Figure 1.
  • The coding process in this embodiment of the present application may adopt the arithmetic coding method in Figure 1, such as entropy coding.
  • The respective occupancy symbols of the m groups are encoded through the arithmetic coding process to determine the occupancy indication information corresponding to the first-scale point cloud, which is written into the geometric bit stream (code stream).
  • the decoding method of the embodiment of the present application can be applied to the geometric information decoding process of G-PCC as shown in Figure 2.
  • By parsing the code stream, the occupancy indication information corresponding to the first-scale point cloud is determined.
  • The second-scale point cloud is determined, and upsampling and division processing are performed on it to determine m groups of upsampled voxels, where the second-scale point cloud is the previously decoded point cloud data corresponding to the first-scale point cloud. The occupancy probabilities of the m groups of upsampled voxels are predicted in sequence to determine m groups of occupancy probabilities, and the occupancy indication information is decoded using the m groups of occupancy probabilities to determine the respective occupancy symbols of the m groups.
  • The decoding process in this embodiment of the present application may adopt the arithmetic decoding method in Figure 2, such as entropy decoding. After arithmetic decoding, the octree synthesis and surface fitting processing in Figure 2 is not needed.
  • The reconstructed geometric data corresponding to the first-scale point cloud is determined directly from the m groups of occupancy symbols.
  • the encoding method and decoding method in the embodiment of the present application can also be used in other point cloud encoding and decoding processes besides G-PCC.
  • Figure 3 is an optional flow diagram of the encoding method provided by the embodiment of the present application, which will be described in conjunction with the steps shown in Figure 3. Since the encoding method provided by the embodiment of the present application is applied to the process of generating a geometric bit stream in Figure 1, the first-scale point cloud, the second-scale point cloud, etc. in the following embodiments specifically refer to the geometric information of the first-scale point cloud, the geometric information of the second-scale point cloud, etc.
  • Before the encoder performs voxel downsampling on the first-scale point cloud, it needs to complete the voxelization of the first-scale point cloud so as to represent the first-scale point cloud in the form of a voxel grid.
  • A point in the point cloud can correspond to an occupied voxel (i.e., a non-empty voxel), and an unoccupied voxel (i.e., an empty voxel) represents a position in the voxel grid that contains no point.
  • Occupied voxels may be marked as 1 and unoccupied voxels may be marked as 0.
  • The voxelized point cloud can represent the geometric data of the point cloud through the occupancy symbols of the voxels at each position in the voxel grid.
  • the encoder performs voxel downsampling on the voxelized first-scale point cloud to obtain a second-scale point cloud.
  • The encoder can implement voxel downsampling through pooling, for example by using a maximum pooling layer with a stride of 2×2×2 to merge 8 voxels of the first-scale point cloud into 1 voxel of the second-scale point cloud. Each downsampling reduces the size of the point cloud in each of the three dimensions to half of the original size.
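The pooling step can be sketched on a sparse voxel representation. For a binary occupancy grid, maximum pooling with a 2×2×2 window and stride marks a coarse voxel as occupied when any of its 8 fine voxels is occupied, which is what the integer division below computes. Storing the grid as a set of occupied coordinates, the function name `downsample`, and the sample coordinates are assumptions made for illustration.

```python
def downsample(occupied):
    """Stride-2 max pooling on a sparse binary voxel grid.

    occupied: set of integer (x, y, z) coordinates of occupied voxels.
    A coarse voxel is occupied if any of its 8 fine voxels is occupied.
    """
    return {(x // 2, y // 2, z // 2) for (x, y, z) in occupied}

# Hypothetical 4x4x2 grid at scale p (coordinates made up for illustration).
scale_p = {(0, 0, 0), (1, 1, 1), (2, 0, 0), (3, 3, 1)}
scale_p_minus_1 = downsample(scale_p)
```

In this made-up grid the 4×4×2 input collapses to a 2×2×1 grid in which 3 of the 4 coarse voxels are occupied.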
  • The first-scale point cloud can be called a high-scale point cloud relative to the second-scale point cloud, and the second-scale point cloud can be called a low-scale point cloud relative to the first-scale point cloud.
  • As shown in Figure 4, a scale p point cloud including 4×4×2 voxels is obtained.
  • After voxel downsampling, the second-scale point cloud obtained is the scale p-1 point cloud, which contains 2×2×1 voxels.
  • The occupied voxels in the point cloud are represented by solid cubes, which mark the locations of the points in the point cloud.
  • Unoccupied voxels in the point cloud are represented by empty cubes, which mark locations where there are no points in the point cloud.
  • The occupancy symbols corresponding to the scale p point cloud are shown in Figure 5A.
  • An occupancy symbol of 1 means that the voxel is occupied, and an occupancy symbol of 0 means that the voxel is unoccupied.
  • Of the 4 voxels of the scale p-1 point cloud, 3 are occupied and 1 is unoccupied.
  • The corresponding occupancy symbols are shown in Figure 5B.
  • the point cloud in Figure 4 is only exemplary, and the actual point cloud may include more voxels.
  • The encoder performs upsampling processing on the second-scale point cloud, upsampling it to the first scale. Since the second scale is lower than the first scale, upsampling the second-scale point cloud yields multiple upsampled voxels for each second-scale voxel. Whether each of these upsampled voxels contains a point of the point cloud, that is, whether it is occupied, needs to be predicted through the subsequent occupancy probability prediction process.
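A minimal sketch of the upsampling step, assuming a sparse set of occupied second-scale voxels and a factor of 2 per dimension, so that each second-scale voxel yields 8 candidate first-scale voxels whose occupancy is still unknown. The function name `upsample` and the sample coordinates are illustrative assumptions.

```python
def upsample(coarse_occupied):
    """Map each occupied coarse voxel to its 8 candidate fine voxels.

    The candidates are positions only; whether each one is occupied
    is decided later by occupancy probability prediction.
    """
    candidates = {}
    for (x, y, z) in sorted(coarse_occupied):
        candidates[(x, y, z)] = [
            (2 * x + dx, 2 * y + dy, 2 * z + dz)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
        ]
    return candidates

# Three hypothetical occupied voxels at the second scale.
cands = upsample({(0, 0, 0), (1, 0, 0), (1, 1, 0)})
all_fine = [v for children in cands.values() for v in children]
```

Three occupied second-scale voxels give 24 distinct candidates at the first scale.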
  • The encoder divides the upsampled voxels obtained through upsampling, determines m groups of upsampled voxels, and performs the subsequent prediction processing group by group.
  • m is an integer greater than or equal to 1.
  • The upsampled voxels of the first scale obtained by upsampling are numbered, and the division processing is implemented by grouping them according to their numbers, which yields m groups of upsampled voxels.
  • S102 can be implemented by executing S1021-S1022, as follows:
  • S1022 can be implemented through S201-S202, as follows:
  • The encoder can determine the numbering range based on the number of upsampled voxels that one voxel of the second-scale point cloud corresponds to after upsampling. Using the numbering range, among the n upsampled voxels obtained by upsampling the second-scale point cloud, the multiple upsampled voxels corresponding to each second-scale voxel are numbered in the same manner to obtain the number of each upsampled voxel.
  • The encoder can place upsampled voxels with the same number in one group, according to the number of each upsampled voxel, and thereby determine m groups of upsampled voxels.
  • The encoder may also determine at least two consecutive numbers among the numbers used to mark the upsampled voxels and, based on the number of each upsampled voxel, treat the upsampled voxels among the n upsampled voxels that correspond to the at least two consecutive numbers as one group, thereby determining m groups of upsampled voxels.
  • Upsampling yields the upsampled voxels, that is, n upsampled voxels of the first scale. One voxel of the scale p-1 point cloud corresponds to 8 voxels of scale p after upsampling, so the numbering range can be determined to be 1-8.
  • The first grouping method: place upsampled voxels with the same number in one group. For example, the upsampled voxels numbered 1 form one group, the upsampled voxels numbered 2 form another group, and so on, giving 8 groups of upsampled voxels as the m groups of upsampled voxels.
  • The second grouping method: determine numbers 1 and 2 as consecutive numbers, numbers 3 and 4 as consecutive numbers, and numbers 5 to 8 as consecutive numbers. The upsampled voxels numbered 1 and 2 form one group, those numbered 3 and 4 form one group, and those numbered 5 to 8 form one group, giving 3 groups of upsampled voxels as the m groups of upsampled voxels.
  • The third grouping method: treat all n upsampled voxels as one group, which is equivalent to determining the numbers 1-8 as consecutive numbers, giving 1 group of upsampled voxels as the m groups of upsampled voxels.
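The three grouping methods can be expressed with one helper that takes a list of consecutive-number ranges. This is a sketch under assumptions: the text does not say which child position receives which number, so the order used in `number_candidates` is arbitrary, and both function names and the sample coordinates are illustrative.

```python
def number_candidates(coarse_occupied):
    """Return {fine_voxel: number in 1..8}.

    The 8 children of every coarse voxel are numbered in the same manner;
    the specific child order used here is an assumption.
    """
    numbers = {}
    for (x, y, z) in coarse_occupied:
        n = 1
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    numbers[(2 * x + dx, 2 * y + dy, 2 * z + dz)] = n
                    n += 1
    return numbers

def group_by_ranges(numbers, ranges):
    """Group voxels by inclusive (first, last) number ranges, one per group."""
    groups = [[] for _ in ranges]
    for voxel, n in sorted(numbers.items()):
        for g, (first, last) in enumerate(ranges):
            if first <= n <= last:
                groups[g].append(voxel)
                break
    return groups

# Three hypothetical occupied voxels at scale p-1 give n = 24 candidates.
numbers = number_candidates({(0, 0, 0), (1, 0, 0), (1, 1, 0)})
eight_groups = group_by_ranges(numbers, [(i, i) for i in range(1, 9)])  # method 1
three_groups = group_by_ranges(numbers, [(1, 2), (3, 4), (5, 8)])       # method 2
one_group = group_by_ranges(numbers, [(1, 8)])                          # method 3
```

More groups give later groups more decoded context to condition on, while fewer groups mean fewer sequential prediction passes; the choice of m trades one against the other.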
  • The encoder can predict the occupancy probabilities of the m groups of upsampled voxels sequentially, in a progressive manner.
  • The prediction results of one or more previously predicted groups of upsampled voxels (such as the occupancy symbols corresponding to their occupancy probabilities) are used to predict the occupancy probability of the current group of upsampled voxels and determine the occupancy probabilities corresponding to the current group. Once the m groups of occupancy probabilities are determined, the respective occupancy symbols of the m groups are determined from them.
  • S103 can be implemented by executing S1031-S1034, as follows:
  • The first group of upsampled voxels has no prediction results from previously predicted groups. The encoder can therefore directly perform occupancy probability prediction on the first group of upsampled voxels among the m groups and determine the first group of occupancy probabilities. The preset occupancy symbols are then used to represent the first group of occupancy probabilities, that is, each probability is identified by an occupancy symbol that distinguishes whether the corresponding voxel is occupied, which determines the first group of occupancy symbols.
  • When i is greater than 1 and less than or equal to m, that is, for the second to m-th groups of upsampled voxels, the encoder predicts the occupancy probability of the i-th group of upsampled voxels using at least k groups of occupancy symbols, which correspond to at least k groups of upsampled voxels that have already completed probability prediction and occupancy symbol representation, and thereby determines the i-th group of occupancy probabilities.
  • The encoder can perform feature extraction on the i-th group of upsampled voxels to determine the i-th group of voxel features; take the (i-k)-th to (i-1)-th groups of occupancy symbols, which have completed probability prediction and occupancy symbol representation, as the at least k groups of occupancy symbols; and, based on the at least k groups of occupancy symbols combined with the i-th group of voxel features, predict the occupancy probability of each upsampled voxel in the i-th group. The occupancy probabilities of the upsampled voxels in the i-th group form the i-th group of occupancy probabilities.
  • The at least k groups of occupancy symbols may be the occupancy symbols of all upsampled voxel groups that completed probability prediction and occupancy symbol representation before the i-th group, or those of the k most recent such groups before the i-th group. The specific selection depends on the actual situation and is not limited in the embodiments of this application.
  • Based on the obtained i-th group of occupancy probabilities, the encoder uses the first occupancy symbol of the preset occupancy symbols to represent each occupancy probability that is greater than or equal to the preset probability threshold, and uses the second occupancy symbol to represent each occupancy probability that is less than the preset probability threshold, thereby determining the i-th group of occupancy symbols.
  • The preset probability threshold may be 90%, and the preset occupancy symbols may include 0 and 1. If the occupancy probability of an upsampled voxel in the i-th group of occupancy probabilities is greater than or equal to the preset probability threshold, the upsampled voxel has a high probability of containing a point of the point cloud, so 1 is used to represent the occupancy probability and the preset occupancy symbol 1 is assigned to that upsampled voxel.
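The mapping from occupancy probabilities to occupancy symbols is a simple threshold test. The sketch below uses the example values from the text (threshold 90%, symbols 1 and 0); the function name `to_symbols` and the sample probabilities are illustrative assumptions.

```python
def to_symbols(probs, threshold=0.9):
    """Represent each occupancy probability with a preset occupancy symbol.

    Probabilities >= threshold map to the first symbol (1, occupied);
    probabilities below it map to the second symbol (0, unoccupied).
    """
    return [1 if p >= threshold else 0 for p in probs]

# Hypothetical occupancy probabilities for one group of upsampled voxels.
group_symbols = to_symbols([0.95, 0.90, 0.30])
```

Note that the comparison is inclusive, so a probability exactly equal to the threshold is represented by 1.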
  • the process in S1031 of using the preset occupancy symbols to represent the first group of occupancy probabilities and determining the first group of occupancy symbols is consistent with the process in S1033 of using the preset occupancy symbols to represent the i-th group of occupancy probabilities and determining the i-th group of occupancy symbols.
  • the encoder uses at least k groups of occupancy symbols that include the i-th group of occupancy symbols (for example, the occupancy symbols obtained above from the 1st group to the i-th group) to continue occupancy probability prediction and occupancy symbol representation on the i+1th group of upsampled voxels, obtaining the i+1th group of occupancy symbols corresponding to the i+1th group of upsampled voxels. Occupancy probability prediction and occupancy symbol representation continue in this manner until the respective occupancy symbols of the m groups are determined.
  • the above-mentioned processes of S1021-S1022 and S1031-S1034 can be implemented through a probabilistic prediction model.
  • the probabilistic prediction model is a trained deep learning network that performs feature extraction, upsampling, and probabilistic prediction processing.
  • the sequence of occupancy symbols shown in Figure 5B can be input into the probability prediction model to perform feature extraction, upsampling, and occupancy probability prediction.
  • the scale p-1 point cloud represents the second scale point cloud.
  • each voxel at scale p-1 corresponds to 8 upsampled voxels at scale p.
  • the encoder uses the numbers 1-8 to number, one by one, the 8 upsampled voxels obtained by decomposing each voxel at scale p-1, thereby obtaining the number corresponding to each of the 24 upsampled voxels.
  • the probabilistic prediction model groups upsampled voxels with the same number as a group to obtain 8 groups of upsampled voxels, which enter the occupancy probability prediction process.
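The numbering and grouping steps above can be sketched as follows. This is an illustration under stated assumptions: the helper names `upsample` and `group_by_number`, the coordinate convention (child = 2 × parent + offset), and the order in which the 8 children receive the numbers 1-8 are all hypothetical, since the text does not fix them.

```python
from collections import defaultdict
from itertools import product


def upsample(parents):
    """Split each parent voxel (x, y, z) into 8 children, numbered 1..8."""
    numbered = []
    for (x, y, z) in parents:
        offsets = product((0, 1), repeat=3)  # assumed child ordering
        for number, (dx, dy, dz) in enumerate(offsets, start=1):
            numbered.append((number, (2 * x + dx, 2 * y + dy, 2 * z + dz)))
    return numbered


def group_by_number(numbered):
    """Put upsampled voxels that share the same number into one group."""
    groups = defaultdict(list)
    for number, voxel in numbered:
        groups[number].append(voxel)
    return [groups[n] for n in sorted(groups)]


# Three occupied voxels at scale p-1 give 24 upsampled voxels at scale p.
parents = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
numbered = upsample(parents)
groups = group_by_number(numbered)
print(len(numbered), len(groups), [len(g) for g in groups])
```

Each of the 8 groups then holds one child per parent voxel, so all voxels in a group can be predicted together in one stage.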
  • the probability prediction model uses the CNN network to predict the first group of upsampled voxels, and obtains the occupancy probability corresponding to each upsampled voxel in the first group of upsampled voxels.
  • the probabilistic prediction model performs occupancy symbol representation based on the occupancy probability corresponding to each upsampled voxel in the first group of upsampled voxels, and obtains the occupancy symbol corresponding to each upsampled voxel in the first group of upsampled voxels.
  • occupancy probability prediction and occupancy symbol representation are performed on the second group of upsampled voxels to obtain the occupancy symbol corresponding to each upsampled voxel in the second group of upsampled voxels.
  • the occupancy symbols corresponding to each upsampled voxel in the first group of upsampled voxels obtained in the first stage are combined with the occupancy symbols corresponding to each upsampled voxel in the second group of upsampled voxels obtained in the second stage.
  • the occupancy probability prediction and occupancy symbol representation are performed for the third group of upsampled voxels, and the occupancy symbols corresponding to each upsampled voxel in the third group of upsampled voxels are obtained.
  • the occupancy symbol corresponding to each upsampled voxel in the 8th group of upsampled voxels is obtained.
  • the occupancy symbol corresponding to each upsampled voxel in the obtained eight groups of upsampled voxels can be used to obtain the geometric information of the first scale point cloud, that is, the scale p point cloud.
  • the voxels in the high-scale point cloud are recovered; then, through grouping and occupancy probability prediction, the probability that each high-scale voxel contains a point of the point cloud is predicted, and each high-scale voxel is marked with a preset occupancy symbol based on its occupancy probability, thereby predicting the geometric data of the high-scale point cloud.
  • the occupancy probability corresponding to each of the 24 upsampled voxels predicted through the above eight stages can be shown in Figure 11.
  • Figure 11 shows, for the 24 upsampled voxels of scale p, the occupancy probability corresponding to each upsampled voxel on the side facing the paper.
  • the predicted probabilities in Figure 11 are only for convenience of explanation and should not be understood as the results of actual calculations. The closer the predicted occupancy probability is to 1 for voxels that are actually occupied, and the closer it is to 0 for voxels that are actually unoccupied, the more accurate the prediction. The more accurate the prediction, the less coded data is produced when encoding the occupancy symbols determined from the occupancy probabilities of voxels in the high-scale point cloud. That is, the more accurate the probability prediction, the better the compression performance for point cloud geometric information.
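The link between prediction accuracy and coded size can be made concrete with the ideal code length of an entropy coder, which spends about -log2(q) bits on a symbol to which the model assigned probability q. The sketch below is illustrative only; the function name `ideal_bits` and all probability values are hypothetical.

```python
import math


def ideal_bits(symbols, probs):
    """Sum of -log2(q), where q is the probability the model gave to the
    symbol that actually occurred (p for symbol 1, 1 - p for symbol 0)."""
    total = 0.0
    for s, p in zip(symbols, probs):
        q = p if s == 1 else 1.0 - p
        total += -math.log2(q)
    return total


symbols = [1, 0, 1, 0]              # actual occupancy of four voxels
sharp = [0.95, 0.05, 0.95, 0.05]    # confident, correct predictions
vague = [0.60, 0.40, 0.60, 0.40]    # correct but uncertain predictions
uniform = [0.5, 0.5, 0.5, 0.5]      # no information: 1 bit per symbol

print(ideal_bits(symbols, sharp))
print(ideal_bits(symbols, vague))
print(ideal_bits(symbols, uniform))
```

Probabilities closer to 1 for occupied voxels and closer to 0 for unoccupied voxels yield fewer bits for the same symbols.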
  • the encoder can perform entropy coding on each of the m groups of occupancy symbols, and take the resulting m groups of encoded occupancy indication information as the occupancy indication information corresponding to the first scale point cloud.
  • whether a voxel in a high-scale point cloud is occupied can be represented by an occupancy symbol, and the occupancy symbol can be directly set to 0 for voxels that have not been predicted for occupancy probability.
  • the encoder encodes the respective occupancy symbols of the m groups, for example as a sequence of m groups of occupancy symbols, to obtain the first encoded data corresponding to the first scale point cloud, that is, the occupancy indication information corresponding to the first scale point cloud, thereby achieving lossless compression of the geometric data of the high-scale point cloud.
  • the entropy encoding may use a Context-based Adaptive Binary Arithmetic Coding (CABAC) algorithm, but is not limited thereto.
  • the encoder writes the occupancy indication information corresponding to the first-scale point cloud into the code stream and sends it to the decoder.
  • the decoder extracts the occupancy indication information corresponding to the first-scale point cloud and feeds it, together with the geometric data of the previously decoded low-scale point cloud (such as the second-scale point cloud), into the entropy decoder; the geometric data of the high-scale point cloud can then be reconstructed losslessly, that is, the reconstructed geometric data corresponding to the first-scale point cloud is obtained.
  • compared with a method that predicts and codes the occupancy probability voxel by voxel based on the parent node and neighbor nodes of each voxel, the group prediction and coding method in the embodiment of the present application, on the one hand, greatly reduces the coding complexity and improves the coding speed and compression performance; on the other hand, as coding proceeds, the occupancy symbols of all upsampled voxel groups that have previously completed occupancy probability prediction can be used as context information and jointly input into the convolutional network to help predict the occupancy probability of the current upsampled voxel group, thereby improving the accuracy of the occupancy probability prediction and, in turn, the coding performance.
  • the encoder can encode the respective occupancy symbols of the m groups in parallel, thereby further increasing the encoding speed and improving encoding efficiency and encoding performance.
  • the second scale point cloud is obtained by downsampling the first scale point cloud; upsampling and division processing are then performed on the second scale point cloud, so that the second scale point cloud is upsampled to the first scale and the upsampled voxels are divided into m groups, and the occupancy probability is predicted for the groups in sequence to obtain m groups of occupancy probabilities. In this way, group prediction accelerates occupancy probability prediction and improves the processing efficiency and coding efficiency of the prediction process.
  • m groups of occupancy symbols corresponding to m groups of occupancy probabilities are encoded, and the occupancy indication information corresponding to the first-scale point cloud is determined and written into the code stream, thereby completing the encoding of the first-scale point cloud and improving coding efficiency and coding performance.
  • FIG. 4 and FIG. 10 illustrate an example process of performing downsampling, upsampling and division processing, and occupancy probability prediction on a first-scale point cloud, such as a scale p point cloud.
  • the first-scale point cloud can be downsampled more times, such as 2 times, 3 times, 4 times or more, to obtain multiple low-scale point cloud geometric data.
  • the point cloud geometric data at a low scale is subjected to upsampling and division processing, occupancy probability prediction, and occupancy symbol representation and coding processing to obtain occupancy indication information corresponding to multiple scales.
  • the encoding method provided by the embodiment of the present application also includes: performing voxel downsampling at least S times based on the second scale point cloud, and determining the third scale point cloud to the S+2th scale point cloud, where S is an integer greater than or equal to 1; performing voxel upsampling and division processing on the third scale point cloud to the S+2th scale point cloud respectively, and performing occupancy probability prediction and coding on the divided groups in sequence to determine the occupancy indication information of the second scale point cloud to the occupancy indication information of the S+1th scale point cloud.
  • the two point clouds before and after each voxel downsampling can be used as the high-scale point cloud and the low-scale point cloud, respectively; upsampling and division processing, occupancy probability prediction, and entropy encoding are performed in a similar manner to obtain the occupancy indication information of the second scale point cloud to the occupancy indication information of the S+1th scale point cloud; the encoder writes the occupancy indication information of the first scale point cloud, and the occupancy indication information of the second scale point cloud to the occupancy indication information of the S+1th scale point cloud, into the code stream and sends it to the decoder.
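The construction of the scale pyramid can be sketched as below. This assumes that one voxel downsampling halves each integer coordinate, which is consistent with each low-scale voxel corresponding to 8 upsampled voxels but is not spelled out in the text; `voxel_downsample` and `build_scales` are hypothetical names.

```python
def voxel_downsample(voxels):
    """Halve each integer coordinate; voxels that collapse onto the same
    coarse position merge into a single occupied voxel."""
    return sorted({(x // 2, y // 2, z // 2) for (x, y, z) in voxels})


def build_scales(first_scale, s):
    """Return [scale 1, scale 2, ..., scale S+2].

    One downsampling yields the second scale; S further downsamplings
    yield the third to the (S+2)-th scale.
    """
    scales = [sorted(set(first_scale))]
    for _ in range(s + 1):
        scales.append(voxel_downsample(scales[-1]))
    return scales


first_scale = [(0, 0, 0), (1, 1, 1), (4, 5, 6), (7, 7, 7)]
scales = build_scales(first_scale, s=1)
for index, cloud in enumerate(scales, start=1):
    print("scale", index, cloud)
```

Each adjacent pair in `scales` then plays the roles of high-scale and low-scale point cloud for one round of grouped prediction and entropy coding.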
  • Figure 12 is an optional flow diagram of the decoding method provided by the embodiment of the present application, which will be described in conjunction with the steps shown in Figure 12. Since the decoding method provided by the embodiment of the present application is applied to the process of decoding the geometric bit stream in Figure 2, the first-scale point cloud, the second-scale point cloud, etc. in the following embodiments specifically refer to the geometric information of the first-scale point cloud, the geometric information of the second-scale point cloud, etc.
  • the decoder parses the received code stream to obtain the first encoded data corresponding to the first scale point cloud, that is, the occupancy indication information corresponding to the first scale point cloud.
  • S402. Determine the second scale point cloud, perform upsampling and division processing on the second scale point cloud, and determine m groups of upsampled voxels; wherein the second scale point cloud is the previously decoded point cloud data corresponding to the first scale point cloud; m is an integer greater than or equal to 1.
  • the code stream contains occupancy indication information of multiple scale point clouds, and when decoding, the decoder performs decoding in order from low scale to high scale.
  • the decoder determines that the point cloud data decoded immediately before the first-scale point cloud is the second-scale point cloud; the second-scale point cloud is then used as the decoded low-scale point cloud geometric data to decode, predict, and reconstruct the first-scale point cloud.
  • S402 can be implemented by executing S4021-S4022, as follows:
  • S4022 can be implemented by executing S501-S502, as follows:
  • the decoder may divide the upsampled voxels with the same number into a group according to the number of each upsampled voxel among the n upsampled voxels, and determine m groups of upsampled voxels.
  • at least two consecutive numbers can also be determined; according to the number of each upsampling voxel, the upsampling voxels corresponding to the at least two consecutive numbers in the n upsampling voxels are taken as a group to determine m groups of upsampling voxels.
  • the specific selection is made according to the actual situation, and is not limited by the embodiments of this application.
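The second grouping option, in which upsampled voxels with consecutive numbers form one group, can be sketched as follows. The function name `group_by_consecutive_numbers`, the voxel labels, and the particular ranges (1-2, 3-4, 5-8) are hypothetical; the text does not state which consecutive numbers are combined.

```python
def group_by_consecutive_numbers(numbered, ranges):
    """Group (number, voxel) pairs by inclusive consecutive-number ranges.

    ranges example: [(1, 2), (3, 4), (5, 8)] gives m = 3 groups.
    A voxel whose number falls in no range is left out.
    """
    groups = [[] for _ in ranges]
    for number, voxel in numbered:
        for g, (first, last) in enumerate(ranges):
            if first <= number <= last:
                groups[g].append(voxel)
                break
    return groups


# Two hypothetical parent voxels, each split into children numbered 1..8.
numbered = [(n, f"parent{p}-child{n}") for p in range(2) for n in range(1, 9)]
groups = group_by_consecutive_numbers(numbered, [(1, 2), (3, 4), (5, 8)])
print([len(g) for g in groups])
```

Fewer, larger groups mean fewer sequential prediction stages, at the cost of less context for the voxels inside each group.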
  • the process of upsampling and dividing in the decoder is the same as the process of upsampling and dividing in the encoder.
  • the decoder performs occupancy probability prediction on the obtained m groups of upsampled voxels in a progressive manner, and determines m groups of occupancy probabilities.
  • the prediction results of one or more previously predicted groups of upsampled voxels (such as the occupancy symbols corresponding to the occupancy probabilities) are used to predict the occupancy probability of the current group of upsampled voxels and determine the occupancy probability corresponding to the current group of upsampled voxels; this continues until the m groups of occupancy probabilities are determined, and the respective occupancy symbols of the m groups are determined based on the m groups of occupancy probabilities.
  • S403 can be implemented by executing S4031-S4033, as follows:
  • the occupancy probability is predicted for the first group of upsampled voxels.
  • the process of determining the occupancy probability of the first group is consistent with the corresponding process description in S1031 in the encoding method, and will not be described again here.
  • the decoder decodes the occupancy indication information according to the first group of occupancy probabilities and determines the first group of occupancy symbols.
  • when i is greater than 1 and less than or equal to m, that is, for the 2nd to m-th groups of upsampled voxels among the m groups of upsampled voxels, the decoder uses the at least k groups of occupancy symbols corresponding to the at least k decoded groups of upsampled voxels to predict the occupancy probability of the i-th group of upsampled voxels and determine the i-th group of occupancy probabilities.
  • the process in which the decoder uses the at least k groups of occupancy symbols corresponding to the at least k decoded groups of upsampled voxels to predict the occupancy probability of the i-th group of upsampled voxels and determine the i-th group of occupancy probabilities is consistent with the description of the corresponding probability prediction process in S1032, and will not be repeated here.
  • the decoder can perform feature extraction on the i-th group of upsampled voxels to determine the i-th group of voxel features; determine the decoded i-k-th group of occupancy symbols to the i-1-th group of occupancy symbols as the at least k groups of occupancy symbols; and, based on the at least k groups of occupancy symbols combined with the i-th group of voxel features, predict the occupancy probability for each upsampled voxel in the i-th group of upsampled voxels, and determine the occupancy probabilities corresponding to the upsampled voxels in the i-th group as the i-th group of occupancy probabilities.
  • the decoder can perform entropy decoding on each piece of occupancy indication information in the i-th group of occupancy indication information according to the i-th group of occupancy probabilities, and determine the occupancy symbol corresponding to each piece of occupancy indication information in the i-th group, thereby determining the i-th group of occupancy symbols; where the occupancy symbols include a first occupancy symbol and a second occupancy symbol; the first occupancy symbol represents that the corresponding voxel is occupied; the second occupancy symbol represents that the corresponding voxel is not occupied.
  • the first occupancy symbol may be 1, and the second occupancy symbol may be 0.
  • the decoder can continue decoding the next group of symbols based on the i-th group of occupied symbols in the same manner as above until the respective occupied symbols of the m groups are determined.
  • the decoder can use at least k groups of occupancy symbols including the i-th group of occupancy symbols to predict the occupancy probability of the i+1th group of upsampled voxels, and decode the i+1th group of occupancy indication information based on the i+1th group of occupancy probabilities to determine the i+1th group of occupancy symbols, until the decoding of the occupancy indication information is completed and the respective occupancy symbols of the m groups are determined.
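The group-by-group decoding loop described above can be sketched as control flow with the model and the entropy decoder injected as callables. Everything named here is hypothetical: `decode_groups`, `toy_predict`, and `toy_entropy_decode` are placeholders, and the toy entropy decoder simply passes symbols through instead of performing arithmetic decoding.

```python
def decode_groups(groups, payloads, predict, entropy_decode, k):
    """Decode m groups in order; each group is predicted from the
    occupancy symbols of at most the k most recently decoded groups."""
    decoded = []
    for group, payload in zip(groups, payloads):
        context = decoded[-k:]            # empty for the first group
        probs = predict(group, context)   # one probability per voxel
        decoded.append(entropy_decode(payload, probs))
    return decoded


seen_context_sizes = []


def toy_predict(group, context):
    """Stand-in for the CNN: uses the occupied fraction of the context."""
    seen_context_sizes.append(len(context))
    if not context:
        return [0.5] * len(group)
    flat = [s for symbols in context for s in symbols]
    rate = sum(flat) / len(flat)
    return [min(max(rate, 0.01), 0.99)] * len(group)


def toy_entropy_decode(payload, probs):
    """Placeholder: a real decoder would drive an arithmetic decoder with
    probs; here the payload already holds the symbols."""
    assert len(payload) == len(probs)
    return list(payload)


groups = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"], ["d1", "d2"]]
payloads = [[1, 0], [1, 1], [0, 0], [1, 0]]
decoded = decode_groups(groups, payloads, toy_predict, toy_entropy_decode, k=2)
print(decoded, seen_context_sizes)
```

Because the encoder runs the same prediction on the same context, both sides derive identical probabilities, which is what allows entropy decoding to recover the symbols.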
  • when the decoder completes the occupancy probability prediction and decoding for the m groups of upsampled voxels and obtains the m groups of occupancy symbols, it can determine the reconstructed geometric data corresponding to the first scale point cloud based on the m groups of occupancy symbols, thereby reconstructing the geometric information of the first-scale point cloud.
  • by performing upsampling and division processing on the previously decoded second scale point cloud corresponding to the first scale point cloud, the second scale point cloud is upsampled to the first scale and the upsampled voxels are divided into m groups; the occupancy probability is predicted for the groups in sequence to obtain m groups of occupancy probabilities. Group prediction accelerates occupancy probability prediction and improves the processing efficiency of the prediction process. The m groups of occupancy probabilities are then used to decode the occupancy indication information corresponding to the first scale point cloud, yielding m groups of decoded occupancy symbols, and the geometric data of the first scale point cloud is reconstructed to complete the decoding of the first scale point cloud, which improves decoding efficiency and decoding performance.
  • compared with a method that predicts and decodes the occupancy probability voxel by voxel based on the parent node and neighbor nodes of each voxel, the group prediction and coding method in the embodiment of this application, on the one hand, greatly reduces the decoding complexity and improves the decoding speed; on the other hand, as decoding proceeds, the occupancy symbols of all previously decoded upsampled voxel groups can be used as context information and jointly input into the convolutional network to help predict the occupancy probability of the current upsampled voxel group, thereby improving the accuracy of the occupancy probability prediction and, in turn, the decoding performance.
  • embodiments of the present application may also perform the process of S601-S605 for decoding to obtain the second scale point cloud, as follows:
  • the decoder can obtain the occupancy indication information corresponding to the T-th scale point cloud by parsing the code stream, that is, the T-th occupancy indication information; and determine the previously decoded low-scale point cloud of the T-1th scale point cloud, that is, the T-th scale point cloud, so that the T-th scale point cloud is used to decode the T-1th occupancy indication information.
  • j = T, T-1, ..., 3; T is an integer greater than or equal to 3;
  • the decoder can obtain the occupancy indication information corresponding to the second scale point cloud by parsing the code stream, that is, the second occupancy indication information; and determine the decoded third scale point cloud before the second scale point cloud.
  • the decoded third-scale point cloud is used to decode the second occupancy indication information corresponding to the second-scale point cloud.
  • the decoder can also start decoding from the third occupancy indication information corresponding to the third scale point cloud and continue until the second scale point cloud is decoded: it obtains the third occupancy indication information from the code stream and determines the decoded fourth scale point cloud; it first uses the fourth scale point cloud to decode the third occupancy indication information and obtain the decoded third scale point cloud; it then uses the third scale point cloud to decode the second occupancy indication information parsed from the code stream and obtain the second scale point cloud.
  • S602. Perform voxel upsampling and division processing based on the j-th scale point cloud, perform occupancy probability prediction on the divided groups in sequence, and determine the m groups of occupancy probabilities corresponding to the m groups in the j-1th scale point cloud.
  • the decoder performs voxel upsampling and division processing based on the decoded j-th scale point cloud through a process similar to the above-mentioned S402-S403, and sequentially performs occupancy probability prediction on the divided groups to determine the m groups of occupancy probabilities corresponding to the m groups in the j-1th scale point cloud.
  • the decoder performs entropy decoding on the j-1th occupancy indication information based on the predicted m groups of occupancy probabilities corresponding to the m groups in the j-1th scale point cloud, and determines the m groups of occupancy symbols corresponding to the j-1th occupancy indication information. Then, the geometric data is reconstructed based on the m groups of occupancy symbols corresponding to the j-1th occupancy indication information, and the j-1th scale point cloud is reconstructed.
  • when decoding adjacent scales, the decoder can use the known geometric data of the previously decoded low-scale point cloud to decode the occupancy indication information of the high-scale point cloud to be decoded and reconstruct the high-scale point cloud, until the second scale point cloud is reconstructed.
  • the decoding method in the embodiment of the present application can be applied to scalable encoding and decoding; that is, for the multiple pieces of occupancy indication information of the multiple scale point clouds sent by the encoder side, the decoder can, according to the actual accuracy requirements, decode and reconstruct a point cloud of any scale, following the decoding order from low scale to high scale.
  • the encoder writes the occupancy indication information of the first scale point cloud, and the occupancy indication information of the second scale point cloud to the occupancy indication information of the S+1th scale point cloud, into the code stream and sends it; the decoder can use the decoding method provided by the embodiment of the present application to decode from the S+1th scale point cloud to the third scale point cloud, end the decoding after reconstructing the geometric data of the third scale point cloud, and no longer decode the second-scale occupancy indication information or the occupancy indication information corresponding to the first-scale point cloud. The specific selection is made according to the actual situation, and is not limited by the embodiments of this application.
  • the decoding method provided by the embodiments of the present application can be repeatedly applied between multiple adjacent scales, and the decoding between each group of adjacent scales is independent of each other. Therefore, scale-scalable decoding can be flexibly implemented.
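Scale-scalable decoding can be sketched as a loop that walks from the lowest decoded scale toward higher scales and stops at a requested target. The sketch is a simplification under stated assumptions: `scalable_decode` and `toy_decode_layer` are hypothetical names, a larger scale index denotes a coarser point cloud (as in the text), and each layer stores raw occupancy flags rather than entropy-coded data.

```python
from itertools import product


def scalable_decode(base_scale, base_cloud, layers, target_scale, decode_layer):
    """Use the scale-j cloud to decode the scale-(j-1) layer, repeating
    until target_scale is reached. Layers above the target are not read."""
    cloud, j = base_cloud, base_scale
    while j > target_scale:
        cloud = decode_layer(cloud, layers[j - 1])
        j -= 1
    return j, cloud


def toy_decode_layer(low_cloud, flags):
    """Upsample every voxel into 8 children and keep those flagged 1."""
    children = [(2 * x + dx, 2 * y + dy, 2 * z + dz)
                for (x, y, z) in low_cloud
                for dx, dy, dz in product((0, 1), repeat=3)]
    return [c for c, f in zip(children, flags) if f == 1]


layers = {
    3: [1, 0, 0, 0, 0, 0, 0, 1],   # occupancy flags for scale 3
    2: [1] + [0] * 15,             # occupancy flags for scale 2
}
base_cloud = [(0, 0, 0)]           # already known scale-4 point cloud

print(scalable_decode(4, base_cloud, layers, 3, toy_decode_layer))
print(scalable_decode(4, base_cloud, layers, 2, toy_decode_layer))
```

Stopping at scale 3 leaves the scale-2 layer untouched, which mirrors ending decoding early when a coarser reconstruction is sufficient.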
  • each decoding process of the above-mentioned decoder takes the decoded low-scale point cloud as known information and decodes the occupancy indication information of the high-scale point cloud.
  • its known information may be a preset number of uncoded point cloud information sent by the encoder side.
  • the encoder can send a preset number of point cloud information, such as the coordinates of 100 points in the point cloud, as the first known information, directly to the decoder in an unencoded manner, so that the decoder does not need to process the first known information.
  • the point cloud image A is first voxelized to obtain the scale p+1 point cloud.
  • the scale p+1 point cloud is voxelized to obtain the scale p point cloud.
  • the point cloud of scale p is upsampled through the CNN network, and the n upsampled voxels of scale p+1 obtained by upsampling are numbered and divided into groups, for example into 8 groups. From group 1 to group 8, the occupancy probability of each group of upsampled voxels is predicted in sequence through the CNN network.
  • the process of occupancy probability prediction can be referred to Figure 10.
  • the first group of upsampled voxels among the eight groups of upsampled voxels is processed by the CNN, which outputs the occupancy probability of the first group of upsampled voxels; the first group of occupancy symbols corresponding to the first group of upsampled voxels is then determined according to this occupancy probability.
  • the input data of each prediction process includes, in addition to the current group of upsampled voxels, the occupancy symbols of all other upsampled voxel groups obtained in the previous prediction stages; these occupancy symbols serve as the context information of the current group of upsampled voxels and are jointly input into the CNN to help predict the occupancy probability of the current group of upsampled voxels, which is then converted into occupancy symbols.
  • the encoder obtains the occupancy symbols of each group of upsampled voxels based on the predicted occupancy probability of each group of upsampled voxels, and performs arithmetic coding based on the occupancy symbols of each group of upsampled voxels to obtain the scale p+1 encoded data of the scale p+1 point cloud, which is written into the code stream.
  • the scale p+1 coded data contains 8 sets of codes corresponding to 8 sets of occupied symbols.
  • the encoder performs the next downsampling based on the scale p point cloud to obtain the scale p-1 point cloud and, through the same process as above, performs prediction and coding based on the scale p-1 point cloud to obtain the scale p coded data of the scale p point cloud, which is written into the code stream. By analogy, the encoder obtains a code stream containing the scale p-1 coded data of the scale p-1 point cloud, the scale p coded data, and the scale p+1 coded data, and sends the code stream to the decoder.
  • decoding is performed from low scale to high scale according to the decoding method provided by the embodiment of the present application.
  • the scale p-1 encoded data is decoded and a scale p-1 point cloud is reconstructed.
  • the scale p-1 point cloud is upsampled to obtain n upsampled voxels of scale p.
  • the n upsampled voxels of scale p are numbered and grouped with the same number, and the occupancy probability prediction is performed in groups to obtain the occupancy probability corresponding to each group of upsampled voxels of scale p.
  • based on the occupancy probability corresponding to each group of upsampled voxels of scale p, arithmetic decoding is performed on the scale p coded data parsed from the code stream to obtain the occupancy symbol corresponding to each of the n upsampled voxels of scale p, thereby reconstructing the geometric data of the scale p point cloud and completing the decoding of the scale p coded data. Then, based on the decoded point cloud of scale p, the same process is used to decode the scale p+1 coded data and reconstruct the geometric data of the scale p+1 point cloud.
  • the embodiment of the present application has good compression performance and parallelism, while maintaining a faster encoding and decoding speed and improving encoding and decoding efficiency.
  • the applicant conducted a comparative test between the encoding and decoding method of the embodiment of the present application and the traditional G-PCC method. The results are shown in Table 1, as follows:
    Table 1
    Metric            G-PCC   1-stage   3-stage   8-stage
    Code rate (bpp)   1.029   1.086     0.703     0.633
    Gain              -       +5.5%     -31.7%    -38.5%
    Encoding time     5.8     0.7       1.1       1.9
    Decoding time     3.3     0.7       1.0       1.8
  • 1-stage indicates that a single group is used in the group division processing, and occupancy probability prediction and encoding and decoding are performed on this 1 group of upsampled voxels; 3-stage indicates that 3 groups are used in the group division processing, and occupancy probability prediction and encoding and decoding are performed on the 3 groups of upsampled voxels in sequence; 8-stage indicates that 8 groups are used in the group division processing, and occupancy probability prediction and encoding and decoding are performed on the 8 groups of upsampled voxels in sequence. It can be seen that, under all three grouping settings, the encoding time and decoding time of the embodiment of the present application are shorter than those of the traditional G-PCC method.
  • the bit rates (bits per pixel, bpp) under 3-stage and 8-stage are also lower than the bit rate of traditional G-PCC, giving gains of -31.7% and -38.5% under 3-stage and 8-stage, respectively.
  • the value of the gain is inversely related to performance: the smaller the gain, for example the more negative its value, the better the compression performance. These data illustrate the improvement in encoding and decoding performance and the saving in transmission bandwidth.
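The gain row of Table 1 can be reproduced from the code-rate row if the gain is read as the relative bit-rate change with respect to G-PCC. That reading is an assumption, although it matches all three reported values; the name `gain_percent` is hypothetical.

```python
def gain_percent(rate, anchor):
    """Relative bit-rate change versus the anchor, in percent.
    Negative values mean fewer bits than the anchor."""
    return (rate - anchor) / anchor * 100.0


anchor_bpp = 1.029  # G-PCC code rate from Table 1
rates = {"1-stage": 1.086, "3-stage": 0.703, "8-stage": 0.633}

gains = {name: round(gain_percent(rate, anchor_bpp), 1)
         for name, rate in rates.items()}
print(gains)
```

Under this reading, 1-stage spends about 5.5% more bits than G-PCC, while 3-stage and 8-stage spend about 31.7% and 38.5% fewer.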
  • This embodiment of the present application provides a decoder 1, as shown in Figure 15, including:
  • the parsing part 11 is configured to parse the code stream and determine the occupancy indication information corresponding to the first scale point cloud;
  • the first grouping part 12 is configured to determine the second scale point cloud, perform upsampling and dividing processing on the second scale point cloud, and determine m groups of upsampled voxels; wherein the second scale point cloud is The previous decoded point cloud data corresponding to the first scale point cloud;
  • the first prediction part 13 is configured to predict the occupancy probabilities of the m groups of upsampled voxels in sequence, determine the m groups of occupancy probabilities, and decode the occupancy indication information according to the m groups of occupancy probabilities to determine the respective occupancy symbols of the m groups; m is an integer greater than or equal to 1;
  • the decoding part 14 is configured to determine the reconstructed geometric data corresponding to the first scale point cloud based on the m groups of occupied symbols.
  • the first grouping part 12 is also configured to perform voxel upsampling on the second scale point cloud and determine n upsampled voxels of the first scale, where n is an integer greater than 1; and to number each of the n upsampled voxels and group them based on the number corresponding to each upsampled voxel, to determine the m groups of upsampled voxels.
  • the first prediction part 13 is also configured to perform occupancy probability prediction on the first group of upsampled voxels among the m groups of upsampled voxels, determine the first group of occupancy probabilities, and decode the occupancy indication information according to the first group of occupancy probabilities to determine the first group of occupancy symbols; when i is greater than 1 and less than or equal to m, use at least k groups of decoded occupancy symbols to perform occupancy probability prediction on the i-th group of upsampled voxels, determine the i-th group of occupancy probabilities, and decode the i-th group of occupancy indication information in the occupancy indication information according to the i-th group of occupancy probabilities to determine the i-th group of occupancy symbols, where i is an integer and k is an integer greater than or equal to 1 and less than i; and, based on the i-th group of occupancy symbols, continue decoding the next group until the respective occupancy symbols of the m groups are determined.
  • the first grouping part 12 is further configured to number each of the n upsampled voxels and determine the number corresponding to each upsampled voxel; and, according to the number of each upsampled voxel, group the n upsampled voxels to determine the m groups of upsampled voxels.
  • the first grouping part 12 is further configured to, according to the number of each upsampled voxel, place the upsampled voxels with the same number among the n upsampled voxels into one group, to determine the m groups of upsampled voxels.
  • the first grouping part 12 is further configured to determine at least two consecutive numbers; and, according to the number of each upsampled voxel, take the upsampled voxels among the n upsampled voxels that correspond to the at least two consecutive numbers as one group, to determine the m groups of upsampled voxels.
  • the first prediction part 13 is also configured to perform feature extraction on the i-th group of upsampled voxels and determine the i-th group of voxel features; determine the decoded (i-k)-th to (i-1)-th groups of occupancy symbols as the at least k groups of occupancy symbols; and, according to the at least k groups of occupancy symbols combined with the i-th group of voxel features, perform occupancy probability prediction on each upsampled voxel in the i-th group of upsampled voxels, and determine the predicted occupancy probability corresponding to each upsampled voxel in the i-th group as the i-th group of occupancy probabilities.
  • the parsing part 11, the first grouping part 12, the first prediction part 13 and the decoding part 14 are also configured to parse the code stream and determine the T-th point cloud before determining the second scale point cloud.
  • the decoding part 14 is further configured to perform entropy decoding on each piece of occupancy indication information in the i-th group of occupancy indication information according to the i-th group of occupancy probabilities, and determine the occupancy symbol corresponding to each piece of occupancy indication information, thereby determining the i-th group of occupancy symbols; wherein the occupancy symbols include a first occupancy symbol and a second occupancy symbol; the first occupancy symbol indicates that the corresponding voxel is occupied, and the second occupancy symbol indicates that the corresponding voxel is not occupied.
  • the decoding part 14 is further configured to use at least k groups of occupancy symbols that include the i-th group of occupancy symbols to perform occupancy probability prediction on the (i+1)-th group of upsampled voxels, and decode the (i+1)-th group of occupancy indication information in the occupancy indication information based on the (i+1)-th group of occupancy probabilities to determine the (i+1)-th group of occupancy symbols, until decoding of the occupancy indication information is completed and the respective occupancy symbols of the m groups are determined.
  • the embodiment of the present application provides an encoder 2, as shown in Figure 16, including:
  • the downsampling part 21 is configured to perform voxel downsampling on the first scale point cloud and determine the second scale point cloud;
  • the second grouping part 22 is configured to perform upsampling and dividing processing on the second scale point cloud, and determine m groups of upsampled voxels; m is an integer greater than or equal to 1;
  • the second prediction part 23 is configured to perform occupancy probability prediction on the m groups of upsampled voxels in sequence, determine m groups of occupancy probabilities, and determine the respective occupancy symbols of the m groups according to the m groups of occupancy probabilities;
  • the encoding part 24 is configured to encode the respective occupancy symbols of the m groups, determine the occupancy indication information corresponding to the first scale point cloud, and write the occupancy indication information into the code stream.
  • the second grouping part 22 is also configured to perform voxel upsampling on the second scale point cloud and determine n upsampled voxels of the first scale, where n is an integer greater than 1; and to number each of the n upsampled voxels and group them based on the number corresponding to each upsampled voxel, to determine the m groups of upsampled voxels.
  • the second prediction part 23 is also configured to perform occupancy probability prediction on the first group of upsampled voxels among the m groups of upsampled voxels, determine the first group of occupancy probabilities, and represent the first group of occupancy probabilities with preset occupancy symbols to determine the first group of occupancy symbols; when i is greater than 1 and less than or equal to m, use at least k groups of occupancy symbols, corresponding to at least k groups of upsampled voxels for which probability prediction and occupancy symbol representation have been completed, to perform occupancy probability prediction on the i-th group of upsampled voxels and determine the i-th group of occupancy probabilities, where i is an integer and k is an integer greater than or equal to 1 and less than i; represent the i-th group of occupancy probabilities with the preset occupancy symbols to determine the i-th group of occupancy symbols; and, using at least k groups of occupancy symbols that include the i-th group of occupancy symbols, continue occupancy probability prediction and occupancy symbol representation for the (i+1)-th group of upsampled voxels until the respective occupancy symbols of the m groups are determined.
  • the second grouping part 22 is further configured to number each of the n upsampled voxels and determine the number corresponding to each upsampled voxel; and, according to the number of each upsampled voxel, group the n upsampled voxels to determine the m groups of upsampled voxels.
  • the second grouping part 22 is further configured to, according to the number of each upsampled voxel, place the upsampled voxels with the same number among the n upsampled voxels into one group, to determine the m groups of upsampled voxels.
  • the second grouping part 22 is further configured to determine at least two consecutive numbers; and, according to the number of each upsampled voxel, take the upsampled voxels among the n upsampled voxels that correspond to the at least two consecutive numbers as one group, to determine the m groups of upsampled voxels.
  • the second prediction part 23 is also configured to perform feature extraction on the i-th group of upsampled voxels and determine the i-th group of voxel features; determine the (i-k)-th to (i-1)-th groups of occupancy symbols, for which probability prediction and occupancy symbol representation have been completed, as the at least k groups of occupancy symbols; and, according to the at least k groups of occupancy symbols combined with the i-th group of voxel features, perform occupancy probability prediction on each upsampled voxel in the i-th group of upsampled voxels, and determine the occupancy probability corresponding to each upsampled voxel in the i-th group as the i-th group of occupancy probabilities.
  • the second prediction part 23 is also configured to use the first occupancy symbol of the preset occupancy symbols to represent each occupancy probability in the i-th group of occupancy probabilities that is greater than or equal to the preset probability threshold, and to use the second occupancy symbol of the preset occupancy symbols to represent each occupancy probability in the i-th group that is less than the preset probability threshold, thereby determining the i-th group of occupancy symbols.
  • the encoding part 24 is also configured to perform entropy encoding on each group of occupancy symbols among the respective occupancy symbols of the m groups, and determine the m groups of occupancy indication information obtained by encoding as the occupancy indication information corresponding to the first scale point cloud.
  • the downsampling part 21, the second grouping part 22, the second prediction part 23 and the encoding part 24 are also configured to perform at least S rounds of voxel downsampling based on the second scale point cloud, determining the third scale point cloud to the (S+2)-th scale point cloud, where S is an integer greater than or equal to 1; to perform voxel upsampling and division processing on the third scale point cloud to the (S+2)-th scale point cloud respectively, and perform occupancy probability prediction and encoding on the resulting groups in sequence, determining the occupancy indication information of the second scale point cloud to the occupancy indication information of the (S+1)-th scale point cloud; and to write the occupancy indication information of the first scale point cloud and the occupancy indication information of the second scale point cloud to the (S+1)-th scale point cloud into the code stream.
  • the embodiment of the present application also provides a decoder.
  • Figure 17 is an optional structural schematic diagram of the decoder 3 provided by the embodiment of the present application.
  • the decoder 3 includes: a first memory 32 and a first processor 33 .
  • the first memory 32 and the first processor 33 are connected through the first communication bus 34; the first memory 32 is configured to store executable instructions; the first processor 33 is configured to implement the decoding method provided by the embodiment of this application when executing the executable instructions stored in the first memory 32.
  • the embodiment of the present application also provides an encoder.
  • Figure 18 is an optional structural schematic diagram of the encoder 4 provided by the embodiment of the present application.
  • the encoder 4 includes: a second memory 42 and a second processor 43 .
  • the second memory 42 and the second processor 43 are connected through the second communication bus 44; the second memory 42 is configured to store executable instructions; the second processor 43 is configured to implement the encoding method provided by the embodiment of this application when executing the executable instructions stored in the second memory 42.
  • Embodiments of the present application provide a computer-readable storage medium storing executable instructions which, when executed by a first processor, cause the first processor to perform any of the methods described above.
  • the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; it may also be any of various devices that include one or any combination of the above memories.
  • executable instructions may take the form of a program, software, software module, script, or code, written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • executable instructions may, but do not necessarily, correspond to files in a file system, and may be stored as part of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files storing one or more modules, subroutines, or portions of code).
  • executable instructions may be deployed to execute on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, magnetic disk storage and optical storage) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • the embodiment of this application performs upsampling and division processing on the second scale point cloud: it upsamples the second scale point cloud to the first scale, divides the upsampled voxels into m groups, and performs occupancy probability prediction on the groups in sequence to obtain m groups of occupancy probabilities. Group-wise prediction speeds up occupancy probability prediction and improves the processing efficiency of the prediction process and the coding efficiency.
  • In the encoder, the m groups of occupancy symbols corresponding to the m groups of occupancy probabilities are encoded, and the occupancy indication information corresponding to the first scale point cloud is determined and written into the code stream; in the decoder, the m groups of occupancy probabilities are used to decode the occupancy indication information corresponding to the first scale point cloud. This improves encoding and decoding efficiency and performance.
  • Compared with predicting and encoding the occupancy probability voxel by voxel based on the parent node and neighbor nodes of each voxel, the group-wise prediction and coding in the embodiment of the present application, on the one hand, greatly reduces encoding and decoding complexity and improves compression performance and encoding and decoding speed. On the other hand, as encoding and decoding proceed, the occupancy symbols of all upsampled voxel groups whose occupancy probability prediction has already been completed can be used as context information and fed jointly into the convolutional network to help predict the occupancy probability of the current upsampled voxel group. This improves the accuracy of occupancy probability prediction and therefore the coding performance. Furthermore, the encoder can encode the respective occupancy symbols of the m groups in parallel, further accelerating encoding and improving encoding efficiency and performance.
  • the encoding and decoding methods provided by the embodiments of the present application can also be applied repeatedly between multiple adjacent scales, and the encoding and decoding between each pair of adjacent scales are independent of each other. Scale-scalable encoding and decoding can therefore be implemented flexibly.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Image Processing (AREA)

Abstract

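Embodiments of the present application provide an encoding and decoding method, a decoder, an encoder, and a computer-readable storage medium, which can improve encoding and decoding efficiency and compression performance, thereby improving encoding and decoding performance. The method includes: parsing a code stream and determining occupancy indication information corresponding to a first scale point cloud; determining a second scale point cloud, and performing upsampling and division processing on the second scale point cloud to determine m groups of upsampled voxels, where the second scale point cloud is the previously decoded point cloud data corresponding to the first scale point cloud and m is an integer greater than or equal to 1; performing occupancy probability prediction on the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and decoding the occupancy indication information according to the m groups of occupancy probabilities to determine the respective occupancy symbols of the m groups; and determining, based on the m groups of occupancy symbols, reconstructed geometric data corresponding to the first scale point cloud.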

Description

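Encoding and decoding method, decoder, encoder, and computer-readable storage medium
Technical Field
The present application relates to point cloud compression encoding and decoding technology, and in particular to an encoding and decoding method, a decoder, an encoder, and a computer-readable storage medium.
Background
A point cloud is a set of points that can store the geometric position and associated attribute information of each point, and can therefore describe objects in space accurately and in three dimensions. Point cloud data is voluminous: a single frame can contain millions of points, which makes storing and transmitting point clouds efficiently very difficult. Compression techniques are therefore used to reduce redundant information in point cloud storage and to facilitate subsequent processing.
At present, neural-network-based point cloud geometry compression is mainly divided into lossy geometry compression and lossless geometry compression. For lossless geometry compression, the encoder usually takes surrounding context such as parent nodes and neighbor nodes as input, processes it through neural network layers (such as convolutional or fully connected layers), and outputs the occupancy probability of each voxel in the geometric data of the point cloud; an entropy encoder then converts the voxel occupancy symbol corresponding to each voxel's occupancy probability into a code stream. Correspondingly, the decoder predicts the occupancy probability of each voxel by the same process and, according to the predicted occupancy probabilities, uses an entropy decoder to decode the voxel occupancy symbols from the code stream and reconstruct the geometric data of the point cloud.
It can be seen that predicting the occupancy probability of every voxel in the point cloud and then encoding and decoding based on those probabilities makes encoding and decoding take too long, which reduces encoding and decoding efficiency and therefore encoding and decoding performance.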
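Summary
Embodiments of the present application provide an encoding and decoding method, a decoder, an encoder, and a computer-readable storage medium, which can improve encoding and decoding efficiency and compression performance, thereby improving encoding and decoding performance.
The technical solution of the present application is implemented as follows:
An embodiment of the present application provides a decoding method, including:
parsing a code stream and determining occupancy indication information corresponding to a first scale point cloud;
determining a second scale point cloud, and performing upsampling and division processing on the second scale point cloud to determine m groups of upsampled voxels; wherein the second scale point cloud is the previously decoded point cloud data corresponding to the first scale point cloud; m is an integer greater than or equal to 1;
performing occupancy probability prediction on the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and decoding the occupancy indication information according to the m groups of occupancy probabilities to determine the respective occupancy symbols of the m groups;
determining, based on the m groups of occupancy symbols, reconstructed geometric data corresponding to the first scale point cloud.
An embodiment of the present application provides an encoding method, including:
performing voxel downsampling on a first scale point cloud to determine a second scale point cloud;
performing upsampling and division processing on the second scale point cloud to determine m groups of upsampled voxels; m is an integer greater than or equal to 1;
performing occupancy probability prediction on the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and determining the respective occupancy symbols of the m groups according to the m groups of occupancy probabilities;
encoding the respective occupancy symbols of the m groups, determining the occupancy indication information corresponding to the first scale point cloud, and writing the occupancy indication information into a code stream.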
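An embodiment of the present application provides a decoder, including:
a parsing part, configured to parse a code stream and determine occupancy indication information corresponding to a first scale point cloud;
a first grouping part, configured to determine a second scale point cloud, and perform upsampling and division processing on the second scale point cloud to determine m groups of upsampled voxels; wherein the second scale point cloud is the previously decoded point cloud data corresponding to the first scale point cloud; m is an integer greater than or equal to 1;
a first prediction part, configured to perform occupancy probability prediction on the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and decode the occupancy indication information according to the m groups of occupancy probabilities to determine the respective occupancy symbols of the m groups;
a decoding part, configured to determine, based on the m groups of occupancy symbols, reconstructed geometric data corresponding to the first scale point cloud.
An embodiment of the present application provides an encoder, including:
a downsampling part, configured to perform voxel downsampling on a first scale point cloud to determine a second scale point cloud;
a second grouping part, configured to perform upsampling and division processing on the second scale point cloud to determine m groups of upsampled voxels; m is an integer greater than or equal to 1;
a second prediction part, configured to perform occupancy probability prediction on the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and determine the respective occupancy symbols of the m groups according to the m groups of occupancy probabilities;
an encoding part, configured to encode the respective occupancy symbols of the m groups, determine the occupancy indication information corresponding to the first scale point cloud, and write the occupancy indication information into a code stream.
An embodiment of the present application provides a code stream, wherein:
the code stream is generated by bit-encoding information to be encoded; the information to be encoded includes at least the occupancy indication information corresponding to the first scale point cloud.
An embodiment of the present application provides a decoder, including:
a first memory, configured to store executable instructions;
a first processor, configured to implement the decoding method according to any of the above when executing the executable instructions stored in the first memory.
An embodiment of the present application provides an encoder, including:
a second memory, configured to store executable instructions;
a second processor, configured to implement the encoding method according to any of the above when executing the executable instructions stored in the second memory.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by the first processor, implement the above decoding method, or, when executed by the second processor, implement the above encoding method.
An embodiment of the present application provides a computer program product, including a computer program or instructions; when executed by the first processor, the computer program or instructions implement the decoding method provided by the embodiments of the present application; or, when executed by the second processor, the computer program or instructions implement the encoding method provided by the embodiments of the present application.
Embodiments of the present application provide an encoding and decoding method, a decoder, an encoder, and a computer-readable storage medium. When the occupancy indication information corresponding to the first scale point cloud, that is, the coded information representing the occupancy of each voxel in the first scale point cloud, is parsed from the code stream, the previously decoded second scale point cloud corresponding to the first scale point cloud can be upsampled and divided: the second scale point cloud is upsampled to the first scale, the upsampled voxels are divided into m groups, and occupancy probability prediction is performed on the groups in sequence to obtain m groups of occupancy probabilities. Group-wise prediction speeds up occupancy probability prediction and improves the processing efficiency of the prediction process. The m groups of occupancy probabilities are then used to decode the occupancy indication information corresponding to the first scale point cloud, yielding m decoded groups of occupancy symbols from which the geometric data of the first scale point cloud is reconstructed. This completes the decoding of the first scale point cloud and improves decoding efficiency and performance.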
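Brief Description of the Drawings
Figure 1 is a flow block diagram of G-PCC encoding;
Figure 2 is a flow block diagram of G-PCC decoding;
Figure 3 is an optional schematic flowchart of the encoding method provided by an embodiment of the present application;
Figure 4 is an optional schematic diagram of the voxel downsampling process provided by an embodiment of the present application;
Figure 5A is an optional schematic diagram of the occupancy symbols of voxels in the scale p point cloud provided by an embodiment of the present application;
Figure 5B is an optional schematic diagram of the occupancy symbols of voxels in the scale p-1 point cloud provided by an embodiment of the present application;
Figure 6 is an optional schematic flowchart of the encoding method provided by an embodiment of the present application;
Figure 7 is an optional schematic flowchart of the encoding method provided by an embodiment of the present application;
Figure 8 is an optional schematic diagram of the upsampling and division processing provided by an embodiment of the present application;
Figure 9 is an optional schematic flowchart of the encoding method provided by an embodiment of the present application;
Figure 10 is an optional schematic diagram of the group-by-group sequential occupancy probability prediction process provided by an embodiment of the present application;
Figure 11 is an optional schematic diagram of the occupancy probabilities of voxels in a point cloud provided by an embodiment of the present application;
Figure 12 is an optional schematic flowchart of the decoding method provided by an embodiment of the present application;
Figure 13 is an optional schematic flowchart of the decoding method provided by an embodiment of the present application;
Figure 14 is an optional schematic diagram of the encoding and decoding method provided by an embodiment of the present application applied in a practical scenario;
Figure 15 is an optional schematic structural diagram of the decoder provided by an embodiment of the present application;
Figure 16 is an optional schematic structural diagram of the encoder provided by an embodiment of the present application;
Figure 17 is an optional schematic structural diagram of the decoder provided by an embodiment of the present application;
Figure 18 is an optional schematic structural diagram of the encoder provided by an embodiment of the present application.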
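Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings. The described embodiments should not be regarded as limiting the application; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the application.
In the following description, "some embodiments" describes a subset of all possible embodiments; it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another where no conflict arises.
In the following description, the terms "first\second\third" merely distinguish similar objects and do not imply a particular ordering of the objects; where permitted, "first\second\third" may be interchanged in a specific order or sequence, so that the embodiments described here can be implemented in an order other than that illustrated or described here.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field of the present application. The terms used herein serve only to describe the embodiments of the present application and are not intended to limit the application.
Before the embodiments of the present application are described in further detail, the nouns and terms involved in the embodiments are explained; the following explanations apply to them.
Voxel: short for volume element, a voxel is the smallest unit of digital data in a partition of three-dimensional space. Voxels allow 3D space to be divided into a grid, with features assigned to each grid cell. For example, a voxel may be a fixed-size cube in three-dimensional space. Voxels are widely used in fields such as 3D imaging, scientific data, and medical imaging.
Point cloud compression algorithms include Geometry-based Point Cloud Compression (G-PCC). Geometry compression in G-PCC is implemented mainly through an octree model and/or a triangular surface model.
With the development of artificial intelligence, neural networks have been applied to geometry-based point cloud compression. Neural-network-based point cloud geometry compression can be roughly divided into lossy and lossless geometry compression. Lossless compression algorithms mainly revolve around the design of a prediction model for voxel occupancy probability, in which voxel data is usually represented with an octree model, a volumetric model, a sparse tensor representation, or the like. For lossless geometry compression, the encoder usually takes surrounding context such as parent nodes and neighbor nodes as input, processes it through neural network layers (such as convolutional or fully connected layers), and outputs the occupancy probability of each voxel in the geometric data of the point cloud; an entropy encoder then converts the voxel occupancy symbol corresponding to each voxel's occupancy probability into a code stream. Correspondingly, the decoder predicts the occupancy probability of each voxel by the same process and, according to the predicted occupancy probabilities, uses an entropy decoder to decode the voxel occupancy symbols from the code stream and reconstruct the geometric data of the point cloud.
It can be seen that predicting the occupancy probability of every voxel in the point cloud and then encoding and decoding based on those probabilities tends to suffer from shortcomings in performance or complexity. For example, the context surrounding the voxel to be predicted may be insufficient, which makes the prediction inaccurate and the compression performance poor; or predicting the occupancy probability of every voxel may overuse context information and introduce bit rate redundancy, making encoding and decoding slow and inefficient. Either way, encoding and decoding performance is reduced.
Embodiments of the present application provide an encoding and decoding method, a decoder, an encoder, and a computer-readable storage medium, which can improve encoding and decoding efficiency and performance. To aid understanding of the technical solutions provided by the embodiments, a flow block diagram of G-PCC encoding and a flow block diagram of G-PCC decoding are first presented. It should be noted that these two diagrams serve only to explain the technical solutions of the embodiments more clearly and do not limit them. A person skilled in the art will appreciate that, as point cloud compression technology evolves and new service scenarios emerge, the technical solutions provided by the embodiments apply equally to point cloud codec architectures similar to G-PCC. The point cloud compressed in the embodiments may be a point cloud in a video, but is not limited to this.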
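In the point cloud G-PCC encoder framework, the point cloud of the input 3D image model is divided into slices, and each slice is encoded independently.
The G-PCC encoding flow shown in Figure 1 is applied in the encoder. The point cloud data to be encoded is first divided into multiple slices. Within each slice, the geometry information and the attribute information of the point cloud are encoded separately. In geometry encoding, the geometry information undergoes a coordinate transformation so that the whole point cloud is contained in a bounding box, and is then quantized. Quantization mainly serves as scaling; because quantization rounds coordinates, some points end up with identical geometry information, and a parameter decides whether duplicate points are removed. Quantization together with duplicate removal is also called voxelization. The bounding box is then divided using an octree. In octree-based geometry encoding, the bounding box is divided into 8 equal sub-cubes, and each non-empty sub-cube (one that contains points of the point cloud) is again divided into 8, until the resulting leaf nodes are 1x1x1 unit cubes; the points in the leaf nodes are arithmetically coded to generate a binary geometry bitstream, that is, the geometry code stream. In geometry encoding based on a triangle soup (trisoup), octree division is also performed first, but unlike octree-based geometry encoding, trisoup does not divide the point cloud down to 1x1x1 unit cubes. Division stops when the side length of the sub-block (block) is W. Based on the surface formed by the distribution of the points in each block, up to twelve intersections (vertices) between that surface and the twelve edges of the block are obtained, and the vertices are arithmetically coded (with surface fitting based on the intersections) to generate the binary geometry bitstream, that is, the geometry code stream. The vertices are also used in geometry reconstruction, and the reconstructed geometry information is used when encoding the attributes of the point cloud.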
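The octree division described above can be illustrated with a minimal Python sketch. It assumes integer voxel coordinates inside a cube of side `2**depth` and emits one 8-bit child-occupancy code per non-empty node in breadth-first order. The function name and the bit ordering of the eight children are assumptions made for illustration; they are not taken from the G-PCC specification.

```python
def octree_occupancy_codes(voxels, depth):
    """Breadth-first 8-bit child-occupancy codes for a cube of side 2**depth.

    Illustrative only: bit `idx` of a code is set when child `idx` is non-empty,
    where idx = (x_bit << 2) | (y_bit << 1) | z_bit (an assumed ordering).
    """
    codes = []
    nodes = [set(voxels)]  # each node is the set of voxels it contains
    for level in range(depth):
        shift = depth - level - 1  # coordinate bit that selects the child at this level
        next_nodes = []
        for pts in nodes:
            children = {}
            for (x, y, z) in pts:
                idx = (((x >> shift) & 1) << 2) | (((y >> shift) & 1) << 1) | ((z >> shift) & 1)
                children.setdefault(idx, set()).add((x, y, z))
            code = 0
            for idx in children:
                code |= 1 << idx
            codes.append(code)
            # only non-empty children are subdivided further
            next_nodes.extend(children[idx] for idx in sorted(children))
        nodes = next_nodes
    return codes


codes = octree_occupancy_codes({(0, 0, 0), (3, 3, 3)}, depth=2)
```

In a real codec these codes would be passed to an arithmetic coder; the sketch stops at producing them.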
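In attribute encoding, color conversion is performed to convert the color information (that is, the attribute information) from the RGB color space to the YUV color space. The point cloud is then recolored using the reconstructed geometry information, so that the not-yet-encoded attribute information corresponds to the reconstructed geometry information. Color information is encoded with one of two main transforms: a distance-based lifting transform that relies on Level of Detail (LOD) division, or a direct Region Adaptive Hierarchical Transform (RAHT). Both convert the color information from the spatial domain to the frequency domain and yield high-frequency and low-frequency coefficients, which are then quantized (giving quantized coefficients). Finally, the geometry-encoded data produced by octree division and surface fitting and the attribute-encoded data produced from the quantized coefficients are combined per slice, and the vertex coordinates of each block are encoded in turn (that is, arithmetically coded) to generate the binary attribute bitstream, that is, the attribute code stream.
The G-PCC decoding flow shown in Figure 2 is applied in the decoder. The decoder obtains the binary code stream and decodes the geometry bitstream (the geometry code stream) and the attribute bitstream in it independently. The geometry bitstream is decoded through arithmetic decoding, octree synthesis, surface fitting, geometry reconstruction, and inverse coordinate transformation to obtain the geometry information of the point cloud. The attribute bitstream is decoded through arithmetic decoding, inverse quantization, LOD-based inverse lifting or RAHT-based inverse transform, and inverse color conversion to obtain the attribute information of the point cloud. The 3D image model of the point cloud data to be encoded is restored from the geometry information and the attribute information.
The encoding method of the embodiments of the present application can be applied in the G-PCC geometry encoding flow shown in Figure 1. After voxelization, voxel downsampling is performed on the voxelized first scale point cloud to determine the second scale point cloud; upsampling and division processing is performed on the second scale point cloud to determine m groups of upsampled voxels; occupancy probability prediction is performed on the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and the respective occupancy symbols of the m groups are determined according to the m groups of occupancy probabilities as the data to be entropy coded. This replaces the octree division and surface fitting steps in Figure 1. The encoding process of the embodiments may use the arithmetic coding method in Figure 1, such as entropy coding. The respective occupancy symbols of the m groups are encoded by arithmetic coding to determine the occupancy indication information corresponding to the first scale point cloud, which is written into the geometry bitstream (code stream).
The decoding method of the embodiments of the present application can be applied in the G-PCC geometry decoding flow shown in Figure 2. The geometry bitstream is parsed to determine the occupancy indication information corresponding to the first scale point cloud; the second scale point cloud is determined, and upsampling and division processing is performed on it to determine m groups of upsampled voxels, where the second scale point cloud is the previously decoded point cloud data corresponding to the first scale point cloud; occupancy probability prediction is performed on the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and the occupancy indication information is decoded using the m groups of occupancy probabilities to determine the respective occupancy symbols of the m groups. The decoding process of the embodiments may use the arithmetic decoding method in Figure 2, such as entropy decoding. After arithmetic decoding, the octree synthesis and surface fitting steps in Figure 2 are not needed; geometry reconstruction determines the reconstructed geometric data corresponding to the first scale point cloud directly from the m groups of occupancy symbols.
It should be noted that the encoding method and decoding method of the embodiments of the present application can also be used in point cloud encoding and decoding flows other than G-PCC.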
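The encoding method applied to the encoder, as provided by the embodiments of the present application, is described below.
Refer to Figure 3, an optional schematic flowchart of the encoding method provided by an embodiment of the present application; the description follows the steps shown in Figure 3. Because the encoding method is applied in the process of generating the geometry bitstream in Figure 1, the first scale point cloud, second scale point cloud, and so on in the following embodiments refer specifically to the geometry information of the first scale point cloud, the geometry information of the second scale point cloud, and so on.
S101. Perform voxel downsampling on the first scale point cloud to determine the second scale point cloud.
In the embodiments of the present application, before the encoder performs voxel downsampling on the first scale point cloud, the first scale point cloud must be voxelized so that it is represented as a voxel grid.
In the embodiments of the present application, during voxelization a point in the point cloud may correspond to an occupied voxel (a non-empty voxel), while an unoccupied voxel (an empty voxel) indicates that no point of the point cloud falls at that voxel position. In some embodiments, occupied voxels may be marked 1 and unoccupied voxels marked 0. The geometric data of the voxelized point cloud can thus be represented by the occupancy symbols of the voxels at each position in the voxel grid.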
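As an illustration of the voxelization step, the following minimal Python sketch quantizes points to integer voxel coordinates, removes duplicates by storing them in a set, and reports occupancy symbols as 1 or 0. The function names, the `voxel_size` parameter, and the sample points are hypothetical and chosen only for this example.

```python
def voxelize(points, voxel_size=1.0):
    """Quantize 3D points to integer voxel coordinates; the set drops duplicate points."""
    occupied = set()
    for x, y, z in points:
        occupied.add((int(x // voxel_size), int(y // voxel_size), int(z // voxel_size)))
    return occupied


def occupancy_symbol(occupied, coord):
    """Return 1 for an occupied (non-empty) voxel and 0 for an unoccupied (empty) one."""
    return 1 if coord in occupied else 0


# Two of the three sample points fall into the same voxel (0, 0, 0).
points = [(0.2, 0.7, 0.1), (0.4, 0.9, 0.3), (2.5, 1.1, 0.0)]
voxels = voxelize(points)
```

A sparse set of occupied coordinates is used here instead of a dense grid; this is a design choice for brevity, not something the source prescribes.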
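In the embodiments of the present application, the encoder performs voxel downsampling on the voxelized first scale point cloud to obtain the second scale point cloud. In some embodiments, the encoder may implement voxel downsampling by pooling, for example using a max pooling layer with a stride of 2×2×2, which merges 8 voxels of the first scale point cloud into 1 voxel of the second scale point cloud; each downsampling halves the size of the point cloud in all three dimensions. The first scale point cloud can thus be called the high-scale point cloud relative to the second scale point cloud, and the second scale point cloud the low-scale point cloud relative to the first.
Refer to Figure 4. The example first scale point cloud in the figure is a scale p point cloud containing 4×4×2 voxels; after one voxel downsampling, the resulting second scale point cloud, the scale p-1 point cloud, contains 2×2×1 voxels. Occupied voxels are drawn as solid cubes and mark positions where points of the point cloud lie; unoccupied voxels are drawn as empty cubes and mark positions where no point lies. As shown in Figure 4, 6 voxels on the side of the scale p point cloud facing the page are occupied; the corresponding occupancy symbols are shown in Figure 5A, where an occupancy symbol of 1 means the voxel is occupied and 0 means it is not. After voxel downsampling, 3 of the 4 corresponding voxels in the scale p-1 point cloud are occupied and 1 is unoccupied; the corresponding occupancy symbols are shown in Figure 5B. The point cloud in Figure 4 is only an example; an actual point cloud can contain many more voxels.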
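The 2×2×2 max pooling described above can be sketched on a sparse set of occupied voxel coordinates: a low-scale voxel is occupied if any of its 8 children is occupied. This is a minimal illustration, not the pooling layer of the source; the six occupied coordinates below are made up so that 3 of the 4 low-scale voxels end up occupied, and they are not read from Figure 4.

```python
def downsample(occupied):
    """2x2x2 max pooling on a sparse voxel set: halve every coordinate.

    A parent voxel is occupied as soon as at least one of its 8 children is.
    """
    return {(x >> 1, y >> 1, z >> 1) for (x, y, z) in occupied}


# Illustrative scale-p occupancy inside a 4x4x2 grid (6 occupied voxels).
scale_p = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (2, 0, 0), (3, 1, 0), (2, 3, 0)}
scale_p_minus_1 = downsample(scale_p)  # voxels of the 2x2x1 low-scale grid
```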
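S102. Perform upsampling and division processing on the second scale point cloud to determine m groups of upsampled voxels.
In the embodiments of the present application, the encoder upsamples the second scale point cloud to the first scale. Because the second scale is lower than the first scale, upsampling yields multiple upsampled voxels for each voxel of the second scale. Whether each of these upsampled voxels contains a point of the point cloud, that is, whether it is occupied, must be predicted by the subsequent occupancy probability prediction. The encoder divides the upsampled voxels into m groups so that the subsequent prediction is performed group by group, where m is an integer greater than or equal to 1.
In the embodiments of the present application, after upsampling the second scale point cloud, the encoder numbers and groups the resulting upsampled voxels of the first scale, implementing the division according to the numbers and obtaining m groups of upsampled voxels.
In some embodiments, based on Figure 3 and as shown in Figure 6, S102 can be implemented by performing S1021-S1022, as follows:
S1021. Perform voxel upsampling on the second scale point cloud to determine n upsampled voxels of the first scale; n is an integer greater than 1.
S1022. Number each of the n upsampled voxels, and group them based on the number corresponding to each upsampled voxel to determine m groups of upsampled voxels.
In some embodiments, based on Figure 6 and as shown in Figure 7, S1022 can be implemented through S201-S202, as follows:
S201. Number each of the n upsampled voxels to determine the number corresponding to each upsampled voxel.
In the embodiments of the present application, the encoder may determine a number range from the number of voxels that one voxel of the second scale point cloud corresponds to after upsampling. Using this number range, among the n upsampled voxels obtained by upsampling the second scale point cloud, the upsampled voxels corresponding to each voxel of the second scale point cloud are numbered in the same way, giving the number corresponding to each upsampled voxel.
S202. Group the n upsampled voxels according to the number of each upsampled voxel to determine m groups of upsampled voxels.
In some embodiments, the encoder may place the upsampled voxels with the same number among the n upsampled voxels into one group, according to the number of each upsampled voxel, to determine the m groups of upsampled voxels.
In some embodiments, the encoder may also determine at least two consecutive numbers among the numbers used to label the upsampled voxels, and, according to the number of each upsampled voxel, take the upsampled voxels among the n upsampled voxels that correspond to the at least two consecutive numbers as one group, to determine the m groups of upsampled voxels.
In some embodiments, as shown in Figure 8, 2×2×2 upsampling of the scale p-1 point cloud (the second scale point cloud) yields n (n=24) upsampled voxels of scale p, that is, the n upsampled voxels of the first scale. One voxel of the scale p-1 point cloud corresponds to 8 voxels of scale p after upsampling, so the number range is 1-8. For one voxel of the scale p-1 point cloud, its 8 corresponding upsampled voxels in the scale p point cloud are numbered 1-8. For the three voxels of the scale p-1 point cloud, the 8 upsampled voxels corresponding to each are numbered in the same way, giving 3 upsampled voxels numbered 1, 3 upsampled voxels numbered 2, and so on up to 3 upsampled voxels numbered 8. The encoder groups the upsampled voxels according to their numbers; Figure 8 shows 3 optional grouping modes:
First grouping mode: upsampled voxels with the same number are placed in one group. For example, all upsampled voxels numbered 1 form one group, all upsampled voxels numbered 2 form one group, and so on, giving 8 groups of upsampled voxels as the m groups.
Second grouping mode: numbers 1 and 2 are determined to be consecutive numbers, numbers 3 and 4 are determined to be consecutive numbers, and numbers 5 to 8 are determined to be consecutive numbers. The upsampled voxels numbered 1 and 2 form one group, those numbered 3 and 4 form one group, and those numbered 5 to 8 form one group, giving 3 groups of upsampled voxels as the m groups.
Third grouping mode: all n upsampled voxels form a single group, which is equivalent to treating numbers 1-8 as consecutive numbers, giving 1 group of upsampled voxels as the m groups.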
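The numbering and the three grouping modes above can be sketched in Python as follows. The helper names are hypothetical, and the order in which the 8 children of a voxel receive numbers 1-8 is an assumption, since the source only fixes that every low-scale voxel is numbered in the same way.

```python
from itertools import product

# The 8 child offsets of a 2x2x2 upsampling; position in this list + 1 is the number.
OFFSETS = list(product((0, 1), repeat=3))


def upsample_and_number(parents):
    """Map every upsampled voxel coordinate to its number in 1..8."""
    numbered = {}
    for (x, y, z) in sorted(parents):
        for number, (dx, dy, dz) in enumerate(OFFSETS, start=1):
            numbered[(2 * x + dx, 2 * y + dy, 2 * z + dz)] = number
    return numbered


def group_by_number(numbered, number_ranges):
    """Group voxels by inclusive (lo, hi) ranges of consecutive numbers, one range per group."""
    groups = [[] for _ in number_ranges]
    for coord, number in sorted(numbered.items()):
        for g, (lo, hi) in enumerate(number_ranges):
            if lo <= number <= hi:
                groups[g].append(coord)
                break
    return groups


parents = {(0, 0, 0), (1, 0, 0), (1, 1, 0)}       # 3 occupied low-scale voxels
numbered = upsample_and_number(parents)            # n = 24 upsampled voxels
eight_groups = group_by_number(numbered, [(k, k) for k in range(1, 9)])  # first mode
three_groups = group_by_number(numbered, [(1, 2), (3, 4), (5, 8)])       # second mode
one_group = group_by_number(numbered, [(1, 8)])                           # third mode
```

Expressing all three modes as lists of number ranges keeps one grouping routine; the first mode is simply the case where every range contains a single number.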
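S103. Perform occupancy probability prediction on the m groups of upsampled voxels in sequence to determine m groups of occupancy probabilities, and determine the respective occupancy symbols of the m groups according to the m groups of occupancy probabilities.
In the embodiments of the present application, based on the determined m groups of upsampled voxels, the encoder may perform occupancy probability prediction on the m groups progressively, one after another. When predicting the occupancy probabilities of the current group of upsampled voxels, it uses the prediction results of one or more previously predicted groups (such as the occupancy symbols corresponding to their occupancy probabilities) to determine the occupancy probabilities of the current group. This continues until the m groups of occupancy probabilities are determined, and the respective occupancy symbols of the m groups are determined according to them.
In some embodiments, based on any of Figure 3, Figure 6, or Figure 7, and as shown in Figure 9, S103 can be implemented by performing S1031-S1034, as follows:
S1031. Among the m groups of upsampled voxels, perform occupancy probability prediction on the first group of upsampled voxels to determine the first group of occupancy probabilities, and represent the first group of occupancy probabilities with preset occupancy symbols to determine the first group of occupancy symbols.
In the embodiments of the present application, no prediction results from other groups are available before the first group of upsampled voxels, so the encoder can perform occupancy probability prediction on the first of the m groups directly to determine the first group of occupancy probabilities. It then represents the first group of occupancy probabilities with the preset occupancy symbols, that is, labels them with occupancy symbols, to distinguish whether the voxel corresponding to each probability is occupied, and so determines the first group of occupancy symbols.
S1032. When i is greater than 1 and less than or equal to m, use at least k groups of occupancy symbols, corresponding to at least k groups of upsampled voxels for which probability prediction and occupancy symbol representation have been completed, to perform occupancy probability prediction on the i-th group of upsampled voxels and determine the i-th group of occupancy probabilities; where i is an integer and k is an integer greater than or equal to 1 and less than i.
In the embodiments of the present application, when i is greater than 1 and less than or equal to m, that is, for the second to m-th groups of upsampled voxels, the encoder predicts the occupancy probabilities of the i-th group using at least k groups of occupancy symbols corresponding to at least k groups of upsampled voxels for which probability prediction and occupancy symbol representation have been completed, and determines the i-th group of occupancy probabilities.
In some embodiments, the encoder may perform feature extraction on the i-th group of upsampled voxels to determine the i-th group of voxel features; determine the (i-k)-th to (i-1)-th groups of occupancy symbols, for which probability prediction and occupancy symbol representation have been completed, as the at least k groups of occupancy symbols; and, according to the at least k groups of occupancy symbols combined with the i-th group of voxel features, perform occupancy probability prediction on each upsampled voxel in the i-th group, determining the occupancy probability corresponding to each upsampled voxel in the i-th group as the i-th group of occupancy probabilities.
In some embodiments, the encoder may use, as the at least k groups of occupancy symbols, the occupancy symbols of all upsampled voxel groups before the i-th group for which probability prediction and occupancy symbol representation have been completed; or it may use the most recent at least k such groups before the i-th group. The choice depends on the actual situation and is not limited by the embodiments of the present application.
S1033. Represent the i-th group of occupancy probabilities with the preset occupancy symbols to determine the i-th group of occupancy symbols.
In the embodiments of the present application, based on the obtained i-th group of occupancy probabilities, the encoder uses the first occupancy symbol of the preset occupancy symbols to represent each occupancy probability in the i-th group that is greater than or equal to the preset probability threshold, and uses the second occupancy symbol of the preset occupancy symbols to represent each occupancy probability in the i-th group that is less than the preset probability threshold, thereby determining the i-th group of occupancy symbols.
For example, the preset probability threshold may be 90%, and the preset occupancy symbols may be 0 and 1. If the occupancy probability of an upsampled voxel in the i-th group is greater than or equal to the preset probability threshold, the voxel is likely to contain a point of the point cloud; the occupancy probability is then represented by 1, and the preset occupancy symbol 1 is associated with the upsampled voxel corresponding to that occupancy probability.
It should be noted that the process in S1031 of representing the first group of occupancy probabilities with the preset occupancy symbols to determine the first group of occupancy symbols is the same as the method in S1033 of representing the i-th group of occupancy probabilities with the preset occupancy symbols to determine the i-th group of occupancy symbols.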
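The occupancy symbol representation of S1031 and S1033 amounts to thresholding each predicted probability. A minimal Python sketch follows, using the 90% threshold and the symbols 0 and 1 from the example above; the function name and the sample probabilities are hypothetical.

```python
def to_occupancy_symbols(probabilities, threshold=0.9, first_symbol=1, second_symbol=0):
    """Represent each probability by the first symbol if it is >= threshold, else the second."""
    return [first_symbol if p >= threshold else second_symbol for p in probabilities]


# Made-up occupancy probabilities for one group of four upsampled voxels.
group_probabilities = [0.95, 0.9, 0.42, 0.05]
group_symbols = to_occupancy_symbols(group_probabilities)
```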
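S1034. Using at least k groups of occupancy symbols that include the i-th group of occupancy symbols, continue occupancy probability prediction and occupancy symbol representation for the (i+1)-th group of upsampled voxels, until the respective occupancy symbols of the m groups are determined.
In the embodiments of the present application, during occupancy probability prediction and occupancy symbol representation, the encoder uses at least k groups of occupancy symbols that include the i-th group, for example the first to i-th groups of occupancy symbols obtained above, to continue occupancy probability prediction and occupancy symbol representation for the (i+1)-th group of upsampled voxels and obtain the (i+1)-th group of occupancy symbols. It continues in this progressive manner until the respective occupancy symbols of the m groups are determined.
In some embodiments, the processes of S1021-S1022 and S1031-S1034 above can be implemented by a probability prediction model. Here, the probability prediction model is a trained deep learning network that performs feature extraction, upsampling, and probability prediction. For example, taking the second scale point cloud (scale p-1 point cloud) shown in Figure 4, the sequence of occupancy symbols shown in Figure 5B can be fed into the probability prediction model for feature extraction, upsampling, and occupancy probability prediction. For an unoccupied voxel of the low-scale point cloud (the second scale point cloud), such as a voxel marked 0 in Figure 5B, the upsampled voxels of the high scale (the first scale) obtained from it are also unoccupied and need no probability prediction. The high-scale voxels obtained by upsampling an occupied voxel of the low-scale point cloud do need prediction. Taking 2×2×2 upsampling as an example, one scale p-1 voxel in Figure 5B is decomposed into 8 upsampled voxels of scale p; whether these 8 upsampled voxels are occupied is uncertain, and occupancy probability prediction is needed for all of them. The occupancy probability prediction process may be as shown in Figure 10, as follows:
In Figure 10, the scale p-1 point cloud represents the second scale point cloud. The probability prediction model upsamples the scale p-1 point cloud through a Convolutional Neural Network (CNN) to obtain n (n=24) upsampled voxels of the first scale, that is, scale p, where each scale p-1 voxel corresponds to 8 upsampled voxels of scale p. Among the 24 upsampled voxels, the encoder uses numbers 1-8 to number, one by one, the 8 upsampled voxels decomposed from each scale p-1 voxel, obtaining the number corresponding to each of the 24 upsampled voxels. The probability prediction model takes the upsampled voxels with the same number as one group, obtaining 8 groups of upsampled voxels, and proceeds to occupancy probability prediction.
In stage 1 of occupancy probability prediction, the probability prediction model uses the CNN to predict the first group of upsampled voxels, obtaining the occupancy probability corresponding to each upsampled voxel in the first group. The model then performs occupancy symbol representation according to these occupancy probabilities, obtaining the occupancy symbol corresponding to each upsampled voxel in the first group.
As shown in Figure 10, in stage 2 of occupancy probability prediction, the occupancy symbols of the first group obtained in stage 1 are used to perform occupancy probability prediction and occupancy symbol representation for the second group of upsampled voxels, obtaining the occupancy symbol corresponding to each upsampled voxel in the second group. In stage 3, the occupancy symbols of the first group obtained in stage 1 and those of the second group obtained in stage 2 are used to perform occupancy probability prediction and occupancy symbol representation for the third group, obtaining the occupancy symbol corresponding to each upsampled voxel in the third group. This continues until the occupancy symbol corresponding to each upsampled voxel in the eighth group is obtained. The occupancy symbols of the upsampled voxels in the 8 groups then give the geometry information of the first scale point cloud, that is, the scale p point cloud. In this way, the voxels of the high-scale point cloud are recovered by upsampling from the low scale to the high scale; grouping and occupancy probability prediction then predict the probability that each high-scale voxel contains a point of the point cloud, and each high-scale voxel is labeled with a preset occupancy symbol according to its occupancy probability, which predicts the geometric data of the high-scale point cloud.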
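The staged prediction of S1031-S1034 can be sketched as a loop in which each group is predicted from the occupancy symbols of previously finished groups. In the Python sketch below, `toy_predictor` is a hypothetical stand-in for the trained CNN of the source: it merely averages the symbols of already-labeled face neighbors. The function names, the `k` handling, and the 0.5 threshold used in the example are assumptions for illustration.

```python
def toy_predictor(coord, context):
    """Hypothetical predictor: Laplace-smoothed share of occupied, already-labeled face neighbors."""
    x, y, z = coord
    neighbors = [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                 (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]
    known = [context[n] for n in neighbors if n in context]
    if not known:
        return 0.5  # first group: no finished group is available as context
    return (sum(known) + 1) / (len(known) + 2)


def predict_groups(groups, predictor, k=None, threshold=0.9):
    """Predict groups in order; group i sees the symbols of the last k finished groups.

    k=None uses all finished groups. Returns per-group probabilities and symbols.
    """
    finished, all_probs = [], []
    for group in groups:
        recent = finished if k is None else finished[-k:]
        context = {}
        for symbols in recent:
            context.update(symbols)
        probs = {c: predictor(c, context) for c in group}
        all_probs.append(probs)
        finished.append({c: 1 if p >= threshold else 0 for c, p in probs.items()})
    return all_probs, finished


groups = [[(0, 0, 0), (2, 0, 0)], [(1, 0, 0)]]
probs, symbols = predict_groups(groups, toy_predictor, k=1, threshold=0.5)
```

All voxels within one group are predicted from the same context, so they can be evaluated together; only the groups themselves are processed sequentially.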
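In some embodiments, the occupancy probabilities of the 24 upsampled voxels predicted through the 8 stages above may be as shown in Figure 11, which shows the occupancy probabilities of the upsampled voxels on the side facing the page among the 24 upsampled voxels of scale p. The predicted probabilities in Figure 11 are for illustration only and should not be understood as the result of an actual computation. The closer the predicted occupancy probability is to 1 for voxels that are actually occupied, and the closer it is to 0 for voxels that are actually unoccupied, the more accurate the prediction. The more accurate the prediction, the less coded data is produced when encoding the occupancy symbols determined from the occupancy probabilities of the voxels in the high-scale point cloud; that is, the more accurate the probability prediction, the better the compression of the point cloud geometry information.
S104. Encode the respective occupancy symbols of the m groups, determine the occupancy indication information corresponding to the first scale point cloud, and write the occupancy indication information into the code stream.
In the embodiments of the present application, the encoder may perform entropy encoding on each group of occupancy symbols among the respective occupancy symbols of the m groups, and determine the m groups of occupancy indication information obtained by encoding as the occupancy indication information corresponding to the first scale point cloud.
In the embodiments of the present application, whether a voxel in the high-scale point cloud is occupied can be represented by an occupancy symbol, and the occupancy symbol of a voxel for which no occupancy probability prediction was performed can be set directly to 0. The encoder encodes the respective occupancy symbols of the m groups, for example the sequence of the m groups of occupancy symbols, to obtain the first coded data corresponding to the first scale point cloud, that is, the occupancy indication information corresponding to the first scale point cloud, thereby achieving lossless compression of the geometric data of the high-scale point cloud.
In some embodiments, entropy coding may use the Context-based Adaptive Binary Arithmetic Coding (CABAC) algorithm, but is not limited to it. According to the principle of entropy coding, the more accurate the prediction of the occupancy probability, the smaller the information entropy and the greater the saving in actual bit rate and bandwidth. The encoder writes the occupancy indication information corresponding to the first scale point cloud into the code stream and sends it to the decoder. The decoder extracts the occupancy indication information corresponding to the first scale point cloud and, together with the geometric data of a previously decoded low-scale point cloud (such as the second scale point cloud), feeds it to the entropy decoder, which can reconstruct the geometric data of the high-scale point cloud losslessly, that is, the reconstructed geometric data corresponding to the first scale point cloud.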
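The link between prediction accuracy and bit rate can be made concrete with the ideal code length of an entropy coder, which is the sum of $-\log_2$ of the probability assigned to each actual symbol. The Python sketch below is an illustration of that general principle with made-up symbols and probabilities; it is not a CABAC implementation and does not reproduce any result from the source.

```python
import math


def ideal_bits(symbols, probabilities_of_one):
    """Ideal code length in bits: sum of -log2 of the probability given to each actual symbol."""
    total = 0.0
    for s, p in zip(symbols, probabilities_of_one):
        q = p if s == 1 else 1.0 - p  # probability the model assigned to what actually happened
        total += -math.log2(q)
    return total


symbols = [1, 0, 1, 1]
sharp = [0.9, 0.1, 0.9, 0.9]   # confident and correct predictions
vague = [0.5, 0.5, 0.5, 0.5]   # uninformative predictions

bits_sharp = ideal_bits(symbols, sharp)
bits_vague = ideal_bits(symbols, vague)
```

With uninformative predictions each symbol costs exactly one bit, while confident correct predictions cost a fraction of a bit each, which is the effect the paragraph above describes.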
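It can be understood that, compared with predicting and encoding the occupancy probability voxel by voxel based on the parent node and neighbor nodes of each voxel, the group-wise prediction and encoding in the embodiments of the present application, on the one hand, greatly reduces encoding complexity and improves encoding speed and compression performance. On the other hand, as encoding proceeds, the occupancy symbols of all upsampled voxel groups whose occupancy probability prediction has already been completed can be used as context information and fed jointly into the convolutional network to help predict the occupancy probability of the current upsampled voxel group. This improves the accuracy of occupancy probability prediction and therefore the coding performance.
In some embodiments, the encoder may encode the respective occupancy symbols of the m groups in parallel, further accelerating encoding and improving encoding efficiency and performance.
It can be understood that, in the embodiments of the present application, the first scale point cloud is downsampled to obtain the second scale point cloud, and the second scale point cloud undergoes upsampling and division processing: it is upsampled to the first scale, the upsampled voxels are divided into m groups, and occupancy probability prediction is performed on the groups in sequence to obtain m groups of occupancy probabilities. Group-wise prediction speeds up occupancy probability prediction and improves the processing efficiency of the prediction process and the coding efficiency. The m groups of occupancy symbols corresponding to the m groups of occupancy probabilities are then encoded, and the occupancy indication information corresponding to the first scale point cloud is determined and written into the code stream. This completes the encoding of the first scale point cloud and improves encoding efficiency and performance.
In some embodiments, Figure 4 and Figure 10 illustrate a single round of downsampling, upsampling and division processing, and occupancy probability prediction for the first scale point cloud, such as the scale p point cloud. In actual compression, the first scale point cloud can be downsampled more times, for example 2, 3, 4, or more, to obtain point cloud geometric data at multiple low scales; upsampling and division processing, occupancy probability prediction, occupancy symbol representation, and encoding are then performed based on the geometric data at these low scales to obtain the occupancy indication information corresponding to multiple scales.
In some embodiments, the encoding method provided by the embodiments of the present application further includes: performing at least S rounds of voxel downsampling based on the second scale point cloud to determine the third scale point cloud to the (S+2)-th scale point cloud, where S is an integer greater than or equal to 1; and performing voxel upsampling and division processing on the third scale point cloud to the (S+2)-th scale point cloud respectively, and performing occupancy probability prediction and encoding on the resulting groups in sequence, to determine the occupancy indication information of the second scale point cloud to the occupancy indication information of the (S+1)-th scale point cloud.
In the above embodiment, when the encoder performs two or more rounds of voxel downsampling on the first scale point cloud, it can treat the two point clouds before and after each round as the high-scale point cloud and the low-scale point cloud, respectively, and perform division processing, occupancy probability prediction, and entropy encoding in a similar way, thereby obtaining the occupancy indication information of the second scale point cloud to that of the (S+1)-th scale point cloud. The encoder writes the occupancy indication information of the first scale point cloud and the occupancy indication information of the second scale point cloud to the (S+1)-th scale point cloud into the code stream and sends it to the decoder.
It can be understood that the encoding method provided by the embodiments of the present application can be applied repeatedly between multiple adjacent scales, and the encoding between each pair of adjacent scales is independent of the others. Scale-scalable encoding can therefore be implemented flexibly.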
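The multi-scale arrangement can be sketched as a pyramid of sparse voxel sets. With one downsampling from the first to the second scale followed by S further rounds, there are S+2 scales and S+1 adjacent (high, low) pairs, each of which can be coded independently. The function names and sample coordinates in this Python sketch are hypothetical.

```python
def build_pyramid(occupied, num_downsamples):
    """Return [scale 1, scale 2, ...]: each entry is the 2x2x2 downsampling of the previous one."""
    scales = [set(occupied)]
    for _ in range(num_downsamples):
        scales.append({(x >> 1, y >> 1, z >> 1) for (x, y, z) in scales[-1]})
    return scales


def adjacent_pairs(scales):
    """(high-scale, low-scale) pairs; each pair is one independent coding unit."""
    return [(scales[i], scales[i + 1]) for i in range(len(scales) - 1)]


S = 1
first_scale = {(0, 0, 0), (5, 2, 7), (6, 6, 1)}
scales = build_pyramid(first_scale, num_downsamples=S + 1)  # S + 2 scales in total
pairs = adjacent_pairs(scales)                               # S + 1 independent pairs
```

Because a decoder reconstructs from low scale to high scale, it can stop after any pair and still hold a valid lower-resolution point cloud, which is what makes the scheme scale-scalable.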
下面说明本申请实施例提供的应用于解码器的解码方法。
参见图12,图12是本申请实施例提供的解码方法的一个可选的流程示意图,将结合图12示出的步骤进行说明。由于本申请实施例提供的解码方法应用于图2对几何比特流进行解码的过程,以下实施例中的第一尺度点云、第二尺度点云等,具体是指第一尺度点云的几何信息、第二尺度点云的几何信息等。
S401、解析码流,确定第一尺度点云对应的占用指示信息。
本申请实施例中,解码器对接收到的码流进行解析,得到第一尺度点云对应的第一编码数据,也即第一尺度点云对应的占用指示信息。
S402、确定第二尺度点云,并对第二尺度点云进行上采样与划分处理,确定m组上采样体素;其中,第二尺度点云为第一尺度点云对应的前一个已解码的点云数据;m为大于或等于1的整数。
本申请实施例中,码流中包含多个尺度点云的占用指示信息,解码器在解码时,以从低尺度到高尺度的顺序进行解码。这里,解码器在对第一尺度点云对应的占用指示信息进行解码时,确定第一尺度点云之前,前一个已解码的点云数据为第二尺度点云;从而以第二尺度点云作为已解码的低尺度点云几何数据,对第一尺度点云进行解码预测与重建。
在一些实施例中,S402可以通过执行S4021-S4022来实现,如下:
S4021、对第二尺度点云进行体素上采样,确定第一尺度的n个上采样体素;n为大于1的整数。
S4022、对n个上采样体素中的每个上采样体素进行编号,并基于每个上采样体素对应的编号进行分组,确定m组上采样体素。
这里,S4021与S4022的过程与编码方法中的S1021与S1022的过程描述一致,此处不再赘述。
在一些实施例中,S4022可以通过执行S501-S502来实现,如下:
S501、对n个上采样体素中的每个上采样体素进行编号,确定每个上采样体素对应的编号。
S502、根据每个上采样体素的编号,对n个上采样体素进行分组,确定m组上采样体素。
在一些实施例中,解码器可以在n个上采样体素中,根据每个上采样体素的编号,将编号相同的上采样体素划分为一组,确定m组上采样体素。或者,也可以确定至少两个连续编号;根据每个上采样体素的编号,将至少两个连续编号在n个上采样体素中对应的上采样体素作为一组,确定m组上采样体素。具体的根据实际情况进行选择,本申请实施例不作限定。
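The two grouping options above (same number, or a run of consecutive numbers) can be sketched as follows. This is only an illustration under assumptions: it supposes that each voxel is upsampled into 8 children and that a child's number is its position inside the parent (0 to 7); the function names and the `span` parameter are hypothetical and are not defined by this application.

```python
from collections import defaultdict

def upsample_with_numbers(voxels):
    """Split each parent voxel into 8 children and number them 0..7 (assumed scheme)."""
    children = []
    for (x, y, z) in voxels:
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    number = dx * 4 + dy * 2 + dz
                    children.append(((2 * x + dx, 2 * y + dy, 2 * z + dz), number))
    return children

def group_by_number(children, span=1):
    """Group children whose numbers fall in the same run of `span` consecutive numbers.

    span=1 puts identical numbers together (8 groups);
    span=2 merges numbers {0,1}, {2,3}, ... (4 groups).
    """
    groups = defaultdict(list)
    for coord, number in children:
        groups[number // span].append(coord)
    return [groups[g] for g in sorted(groups)]

parents = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
children = upsample_with_numbers(parents)             # n = 24 upsampled voxels
same_number_groups = group_by_number(children, span=1)
paired_groups = group_by_number(children, span=2)
```

Because the numbering depends only on the already known lower-scale voxels, an encoder and a decoder running the same routine obtain identical groups without any extra signalling.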
这里,S501与S502的过程与编码方法中的S201与S202的过程描述一致,此处不再赘述。
可以看出,本申请实施例中,解码器中进行上采样与划分处理的过程与编码器进行上采样与划分处理的过程是相同的。
S403、对m组上采样体素依次进行占用概率预测,确定m组占用概率,并根据m组占用概率对占用指示信息进行解码,确定m组各自的占用符号。
本申请实施例中,解码器对于得到的m组上采样体素,以递进的方式,分组依次进行占用概率预测,确定m组占用概率。在预测当前组上采样体素的占用概率时,利用之前已预测的一组或多组上采样体素的预测结果(如占用概率对应的占用符号),对当前组上采样体素的占用概率进行预测,确定当前组上采样体素对应的占用概率;直至确定m组占用概率,根据m组占用概率确定m组各自的占用符号。
在一些实施例中,基于图12,如图13所示,S403可以通过执行S4031-S4033来实现,如下:
S4031、在m组上采样体素中,对第1组上采样体素进行占用概率预测,确定第一组占用概率,并根据第一组占用概率对占用指示信息进行解码,确定第一组占用符号。
S4031中,对第1组上采样体素进行占用概率预测,确定第一组占用概率的过程与编码方法中的S1031中相应的过程描述一致,此处不再赘述。解码器根据第一组占用概率对占用指示信息进行解码,确定第一组占用符号。
S4032、在i大于1且小于或等于m时,利用已解码的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率,并根据第i组占用概率,对占用指示信息中的第i组占用指示信息进行解码,确定第i组占用符号;其中,i为整数,k为大于或等于1且小于i的整数。
S4032中,在i大于1且小于或等于m时,也即对于m组上采样体素中的第2组至第m组上采样体素,解码器在对第2组至第m组上采样体素中的第i组上采样体素进行占用概率预测时,利用已解码的至少k组上采样体素对应的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率。
这里,解码器利用已解码的至少k组上采样体素对应的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率的过程与S1032中相应的概率预测过程描述一致,此处不再赘述。
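In some embodiments, the decoder may perform feature extraction on the i-th group of upsampled voxels to determine the i-th group of voxel features; determine the already decoded (i-k)-th to (i-1)-th groups of occupancy symbols as the at least k groups of occupancy symbols; and, based on the at least k groups of occupancy symbols combined with the i-th group of voxel features, predict the occupancy probability of each upsampled voxel in the i-th group, taking the predicted occupancy probabilities of the upsampled voxels in the i-th group as the i-th group of occupancy probabilities.
In some embodiments, the decoder may, according to the i-th group of occupancy probabilities, entropy-decode each piece of occupancy indication information in the i-th group of occupancy indication information, determine the occupancy symbol corresponding to each piece, and thereby determine the i-th group of occupancy symbols. The occupancy symbols include a first occupancy symbol and a second occupancy symbol; the first indicates that the corresponding voxel is occupied, and the second indicates that it is not. For example, the first occupancy symbol may be 1 and the second may be 0. The specific choice depends on the actual situation and is not limited by the embodiments of this application.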
S4033、基于第i组占用符号继续下一组解码,直至确定m组各自的占用符号。
本申请实施例中,解码器可以基于第i组占用符号,以上述同样的方式继续下一组解码,直至确定m组各自的占用符号。解码器可以利用包含第i组占用符号的至少k组占用符号,对第i+1组上采样体素进行占用概率预测,并根据第i+1组占用概率对占用指示信息中的第i+1组占用指示信息进行解码,确定第i+1组占用符号,直至对占用指示信息解码完成,确定m组各自的占用符号。
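The group-by-group loop of S4031 to S4033 can be sketched as below. This is a toy illustration, not the method of this application: `predict_probs` is a hypothetical stand-in for the convolutional network, and the entropy coder is replaced by a trivial scheme that stores whether the guess "probability >= 0.5" was wrong. The point it shows is that the decoder reproduces the encoder's probabilities because it conditions only on the k most recent groups of symbols it has already decoded.

```python
def predict_probs(group_size, context):
    """Stand-in for the CNN: one occupancy probability per voxel in the group.

    Uses the fraction of occupied voxels in the context groups, or 0.5 without context.
    """
    flat = [s for grp in context for s in grp]
    p = sum(flat) / len(flat) if flat else 0.5
    return [p] * group_size

def toy_encode(symbols, probs):
    # Toy replacement for entropy coding: 1 where the guess (p >= 0.5) was wrong.
    return [s ^ int(p >= 0.5) for s, p in zip(symbols, probs)]

def toy_decode(payload, probs):
    # Inverse of toy_encode given the same probabilities.
    return [r ^ int(p >= 0.5) for r, p in zip(payload, probs)]

def encode_groups(groups, k):
    payloads = []
    for i, symbols in enumerate(groups):
        context = groups[max(0, i - k):i]      # up to k previous groups of symbols
        probs = predict_probs(len(symbols), context)
        payloads.append(toy_encode(symbols, probs))
    return payloads

def decode_groups(payloads, k):
    decoded = []
    for i, payload in enumerate(payloads):
        context = decoded[max(0, i - k):i]     # only symbols decoded so far
        probs = predict_probs(len(payload), context)
        decoded.append(toy_decode(payload, probs))
    return decoded

true_groups = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
payloads = encode_groups(true_groups, k=2)
recovered = decode_groups(payloads, k=2)
```

In a real codec the toy residual scheme would be replaced by an arithmetic coder driven by the same probabilities; the control flow and the context handling stay the same.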
S404、基于m组占用符号,确定第一尺度点云对应的重建几何数据。
本申请实施例中,解码器在完成m组上采样体素的占用概率预测与解码,得到m组占用符号时,可以基于m组占用符号,确定第一尺度点云对应的重建几何数据,重建第一尺度点云的几何信息。
可以理解的是,本申请实施例中,通过对第一尺度点云对应的前一个已解码的第二尺度点云,进行上采样与划分处理,将第二尺度点云上采样至第一尺度,并将上采样后的体素划分为m组,分组依次进行占用概率预测,得到m组占用概率。通过分组预测,加快了占用概率的预测速度,提高了预测过程的处理效率。如此,利用m组占用概率对第一尺度点云对应的占用指示信息进行解码,得到已解码的m组占用符号,重建第一尺度点云的几何数据,完成对第一尺度点云的解码,提高了解码效率与解码性能。
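Further, compared with predicting and decoding the occupancy probability voxel by voxel from each voxel's parent node and neighbor nodes, the grouped prediction and decoding used in the embodiments of this application has two benefits. First, it greatly reduces decoding complexity and improves decoding speed. Second, as decoding proceeds, the occupancy symbols of all previously decoded upsampled voxel groups can serve as context information and be fed into the convolutional network together, helping to predict the occupancy probability of the current upsampled voxel group. This improves the accuracy of the occupancy probability prediction and, in turn, the decoding accuracy, that is, the decoding performance.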
在一些实施例中,在确定第二尺度点云之前,本申请实施例还可以执行S601-S605的过程进行解码,得到第二尺度点云,如下:
S601、解析码流,确定第T尺度点云,以及第T-1尺度点云对应的第T-1占用指示信息。
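In S601, the decoder may parse the bitstream to obtain the occupancy indication information corresponding to the (T-1)-th scale point cloud, that is, the (T-1)-th occupancy indication information, and determine the previously decoded lower-scale point cloud of the (T-1)-th scale point cloud, that is, the T-th scale point cloud, so that the T-th scale point cloud can be used to decode the (T-1)-th occupancy indication information. Here, T is an integer greater than or equal to 3, and the index j used in the following steps takes the values j = T, T-1, …, 3.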
示例性地,解码器可以通过解析码流,得到第二尺度点云对应的占用指示信息,即第二占用指示信息;并确定第二尺度点云之前,已解码的第三尺度点云。从而利用已解码的第三尺度点云对第二尺度点云对应的第二占用指示信息进行解码。或者,解码器也可以从第三尺度点云对应的第三占用指示信息开始解码直至解码出第二尺度点云:从码流中获取第三占用指示信息,确定已解码的第四尺度点云,利用第四尺度点云先对第三占用指示信息进行解码,得到已解码的第三尺度点云;再利用第三尺度点云对码流中解析得到的第二占用指示信息进行解码,得到第二尺度点云。
这里,对于上述过程中的每次解码过程,可以从j=T开始,按照从大到小的顺序对j的每一个取值执行以下处理:
S602、基于第j尺度点云进行体素上采样与划分处理,并对划分后的分组依次进行占用概率预测,确定第j-1尺度点云中m个分组对应的m组占用概率。
S602中,解码器通过与上述S402-S403中类似的过程,基于已解码的第j尺度点云进行体素上采样与划分处理,并对划分后的分组依次进行占用概率预测,确定第j-1尺度点云中m个分组对应的m组占用概率。
S603、根据第j-1尺度点云中m个分组对应的m组占用概率,对第j-1占用指示信息进行熵解码,确定第j-1占用指示信息对应的m组占用符号。
S604、基于第j-1占用指示信息对应的m组占用符号,重建第j-1尺度点云。
S603-S604中,解码器根据预测得到的第j-1尺度点云中m个分组对应的m组占用概率,对第j-1占用指示信息进行熵解码,确定第j-1占用指示信息对应的m组占用符号。进而基于第j-1占用指示信息对应的m组占用符号进行几何数据重建,重建第j-1尺度点云。
S605、继续根据j的下一个取值进行下一次处理,直至j=3时,重建得到第二尺度点云。
S605中,解码器根据从大到小的顺序,以上述同样的上采样与划分、分组依次占用概率预测、以及解码过程,对j的下一个取值进行下一次处理,重建下一个尺度(更高尺度)的点云,直至j=3时,重建得到第二尺度点云。
如此,解码器对相邻尺度进行解码时,可以利用前一个已解码的低尺度点云的已知几何数据,对待解码的高尺度点云的占用指示信息进行解码,重建高尺度点云,直至重建得到第二尺度点云。
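It should be noted that the decoding method of the embodiments of this application can be applied to scalable coding: given the occupancy indication information of multiple scales sent by the encoder, the decoder can decode from low scale to high scale and stop at whichever scale meets the required decoding precision. For example, the encoder writes into the bitstream and sends the occupancy indication information of the first scale point cloud and of the second to the (S+1)-th scale point clouds. According to a preset precision requirement, the decoder may use the decoding method provided by the embodiments of this application to decode from the (S+1)-th scale point cloud up to the third scale point cloud, reconstruct the geometry data of the third scale point cloud, and then stop, without decoding the occupancy indication information of the second scale or of the first scale point cloud. The specific choice depends on the actual situation and is not limited by the embodiments of this application.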
可以理解的是,本申请实施例提供的解码方法可以重复应用于多个相邻尺度之间,且每组相邻尺度间的解码相互独立不依赖,因此可以灵活地实现尺度可伸缩的解码。
需要说明的是,上述解码器的每次解码过程都是将已解码的低尺度点云作为已知信息,对高尺度点云的占用指示信息进行解码的。对于解码器的首个解码过程,其已知信息可以是编码器侧发送的未编码的预设数量个点云信息。编码器可以将预设数量个点云信息,如点云中100个点的坐标作为首个已知信息,以未编码的方式直接发送至解码端,以使解码器无需对首个已知信息进行解码,直接利用编码器发送的预设数量个点的位置信息,重建出相应尺度的点云,以继续之后的解码过程。
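The multi-scale round trip described above, including stopping early at an intermediate scale, can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from this application: downsampling halves integer voxel coordinates, upsampling produces 8 candidate children per voxel, the coarsest point cloud is passed to the decoder uncoded, and the occupancy symbols are kept as plain lists instead of being entropy coded. All function names are hypothetical.

```python
def downsample(voxels):
    """Voxel downsampling: halve each coordinate and merge duplicates."""
    return sorted({(x // 2, y // 2, z // 2) for (x, y, z) in voxels})

def upsample(voxels):
    """Voxel upsampling: each voxel yields its 8 candidate children in a fixed order."""
    return [(2 * x + dx, 2 * y + dy, 2 * z + dz)
            for (x, y, z) in voxels
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

def encode_scales(first_scale, num_downsamples):
    """Return the coarsest cloud plus one list of occupancy symbols per finer scale."""
    clouds = [sorted(set(first_scale))]          # clouds[0] is the first (finest) scale
    for _ in range(num_downsamples):
        clouds.append(downsample(clouds[-1]))
    layers = []  # layers[s] rebuilds clouds[s] from clouds[s + 1]
    for high, low in zip(clouds[:-1], clouds[1:]):
        occupied = set(high)
        layers.append([int(c in occupied) for c in upsample(low)])
    return clouds[-1], layers

def decode_to_scale(coarsest, layers, target_index):
    """Rebuild from the coarsest cloud up to clouds[target_index], then stop."""
    cloud = coarsest
    for s in range(len(layers) - 1, target_index - 1, -1):
        candidates = upsample(cloud)
        cloud = sorted(c for c, sym in zip(candidates, layers[s]) if sym == 1)
    return cloud

first_scale = [(0, 0, 0), (3, 1, 2), (5, 5, 5), (6, 7, 4)]
coarsest, layers = encode_scales(first_scale, num_downsamples=3)
full = decode_to_scale(coarsest, layers, target_index=0)   # first scale, lossless
third = decode_to_scale(coarsest, layers, target_index=2)  # stop early at third scale
```

Each layer depends only on the point cloud one scale below it, which is why decoding can stop after any layer and still yield a valid, lower-resolution reconstruction.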
下面,结合图14,说明本申请实施例提供的编解码方法在实际场景中的应用。
如图14所示,首先对点云图像A进行体素化得到尺度p+1点云,在编码器的下采样过程中,对尺度p+1点云进行体素下采样,得到尺度p点云。在编码器的上采样过程中,通过CNN网络对尺度p点云进行上采样,并对上采样得到尺度p+1的n个上采样体素进行编号与分组划分,示例性地,分为8组。从第1组到第8组,通过CNN网络依次预测每组上采样体素的占据概率(占用概率)。这里,占用概率预测的过程可参考图10。
示例性地,在第1阶段预测中,8组上采样体素中的第1组上采样体素经过CNN处理,输出第1组上采样体素的占据概率,然后根据第1组上采样体素占据概率,确定第1组上采样体素对应的第1组占据符号。随着编码的进行,在第2阶段预测直至第8阶段的预测过程中,每次预测过程的输入数据除了当前组上采样体素,还包括所有之前预测阶段得到的其他上采样体素组的占用符号,以将其作为当前组上采样体素的上下文信息,共同输入到CNN中,帮助预测当前组上采样体素的占据概率,并将当前组上采样体素的占据概率转化为占用符号。这样,编码器根据预测得到的每组上采样体素的占据概率,得到每组上采样体素的占用符号,基于每组上采样体素的占用符号进行算术编码,得到尺度p+1点云的尺度p+1编码数据写入码流。其中,尺度p+1编码数据包含8组占用符号对应的8组编码。编码器基于尺度p点云进行下一次下采样,得到尺度p-1点云,并通过上述相同的过程,基于尺度p-1点云进行预测与编码,得到尺度p点云的尺度p编码数据写入码流。以此类推,编码器得到包含尺度p-1点云的尺度p-1编码数据、尺度p编码数据以及尺度p+1编码数据的码流,并将码流发送至解码器。
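In the decoder, decoding proceeds from low scale to high scale according to the decoding method provided by the embodiments of this application. For example, the scale p-1 encoded data is decoded and the scale p-1 point cloud is reconstructed. In the upsampling process of the decoder, the scale p-1 point cloud is upsampled to obtain n upsampled voxels at scale p. These n upsampled voxels are numbered and grouped in the same way as in the encoder, and occupancy probability prediction is performed group by group to obtain the occupancy probabilities of each group of upsampled voxels at scale p. These probabilities are used to arithmetically decode the scale p encoded data parsed from the bitstream, yielding the occupancy symbol of each of the n upsampled voxels at scale p, from which the geometry data of the scale p point cloud is reconstructed; this completes the decoding of the scale p encoded data. Then, based on the decoded scale p point cloud, the scale p+1 encoded data is decoded by the same process and the geometry data of the scale p+1 point cloud is reconstructed.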
可以理解的是,本申请实施例相比传统的G-PCC方法,具有良好的压缩性能与并行性,同时能够保持较快的编解码速度,提高编解码效率。申请人将本申请实施例的编解码方法与传统的G-PCC方法进行了对比测试,结果如表1所示,如下:
Table 1
Metric            G-PCC    1-stage   3-stage   8-stage
Bitrate (bpp)     1.029    1.086     0.703     0.633
Gain              -        +5.5%     -31.7%    -38.5%
Encoding time     5.8      0.7       1.1       1.9
Decoding time     3.3      0.7       1.0       1.8
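In Table 1, 1-stage means that the grouping step uses a single group and that occupancy probability prediction and coding are performed on that one group of upsampled voxels; 3-stage means that 3 groups are used and that prediction and coding are performed on the 3 groups in turn; 8-stage means that 8 groups are used and that prediction and coding are performed on the 8 groups in turn. Under all three grouping modes, the encoding and decoding times of the embodiments of this application are far lower than those of the conventional G-PCC method. In addition, the bitrates (bits per point, bpp) of 3-stage and 8-stage are lower than that of conventional G-PCC, giving gains of -31.7% and -38.5% respectively. The gain here is the relative change in bitrate with respect to G-PCC, so a smaller value, that is, a more negative one, indicates better compression performance; the 1-stage mode, at +5.5%, spends slightly more bits than G-PCC. These figures indicate improved codec performance and savings in transmission bandwidth.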
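The gain row of Table 1 can be reproduced from the bitrate row, assuming the gain is the relative bitrate change with respect to G-PCC. The helper `relative_gain` below is a hypothetical name used only for this check.

```python
def relative_gain(bpp, baseline_bpp):
    """Relative bitrate change versus the baseline, in percent (negative = fewer bits)."""
    return (bpp - baseline_bpp) / baseline_bpp * 100.0

gpcc_bpp = 1.029
bitrates = {"1-stage": 1.086, "3-stage": 0.703, "8-stage": 0.633}
gains = {name: round(relative_gain(bpp, gpcc_bpp), 1) for name, bpp in bitrates.items()}
```

Rounded to one decimal place, the results match the +5.5%, -31.7%, and -38.5% listed in the table.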
本申请实施例提供一种解码器1,如图15所示,包括:
解析部分11,配置为解析码流,确定第一尺度点云对应的占用指示信息;
第一分组部分12,配置为确定第二尺度点云,并对所述第二尺度点云进行上采样与划分处理,确定m组上采样体素;其中,所述第二尺度点云为所述第一尺度点云对应的前一个已解码的点云数据;
第一预测部分13,配置为对所述m组上采样体素依次进行占用概率预测,确定m组占用概率,并根据所述m组占用概率对所述占用指示信息进行解码,确定所述m组各自的占用符号;m为大于或等于1的整数;
解码部分14,配置为基于所述m组占用符号,确定所述第一尺度点云对应的重建几何数据。
在一些实施例中,所述第一分组部分12,还配置为对所述第二尺度点云进行体素上采样,确定第一尺度的n个上采样体素;n为大于1的整数;对所述n个上采样体素中的每个上采样体素进行编号,并基于每个上采样体素对应的编号进行分组,确定所述m组上采样体素。
在一些实施例中,所述第一预测部分13,还配置为在所述m组上采样体素中,对第1组上采样体素进行占用概率预测,确定第一组占用概率,并根据所述第一组占用概率对所述占用指示信息进行解码,确定第一组占用符号;在i大于1且小于或等于m时,利用已解码的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率,并根据所述第i组占用概率,对所述占用指示信息中的第i组占用指示信息进行解码,确定第i组占用符号;其中,i为整数,k为大于或等于1且小于i的整数;基于所述第i组占用符号继续下一组解码,直至确定所述m组各自的占用符号。
在一些实施例中,所述第一分组部分12,还配置为对所述n个上采样体素中的每个上采样体素进行编号,确定所述每个上采样体素对应的编号;根据所述每个上采样体素的编号,对所述n个上采样体素进行分组,确定所述m组上采样体素。
在一些实施例中,所述第一分组部分12,还配置为在所述n个上采样体素中,根据所述每个上采样体素的编号,将编号相同的上采样体素划分为一组,确定所述m组上采样体素。
在一些实施例中,所述第一分组部分12,还配置为确定至少两个连续编号;根据所述每个上采样体素的编号,将所述至少两个连续编号在所述n个上采样体素中对应的上采样体素作为一组,确定所述m组上采样体素。
在一些实施例中,所述第一预测部分13,还配置为对所述第i组上采样体素进行特征提取,确定第i组体素特征;确定已解码的第i-k组占用符号至第i-1组占用符号为所述至少k组占用符号;根据所述至少k组占用符号,结合所述第i组体素特征,对所述第i组上采样体素中每个上采样体素进行占用概率预测,确定预测得到的所述第i组上采样体素中每个上采样体素对应的占用概率为所述第i组占用概率。
在一些实施例中,所述解析部分11、第一分组部分12、第一预测部分13与解码部分14,还配置为所述确定第二尺度点云之前,解析所述码流,确定第T尺度点云,以及第T-1尺度点云对应的第T-1占用指示信息;其中,j=T,T-1,…,3;T为大于或等于3的整数;从j=T开始,按照从大到小的顺序对j的每一个取值执行以下处理:基于第j尺度点云进行体素上采样与划分处理,并对划分后的分组依次进行占用概率预测,确定第j-1尺度点云中m个分组对应的m组占用概率;根据所述第j-1尺度点云中m个分组对应的m组占用概率,对所述第j-1占用指示信息进行熵解码,确定所述第j-1占用指示信息对应的m组占用符号;基于所述第j-1占用指示信息对应的m组占用符号,重建第j-1尺度点云;继续根据j的下一个取值进行下一次处理,直至j=3时,重建得到所述第二尺度点云。
在一些实施例中,所述解码部分14,还配置为根据所述第i组占用概率,对所述第i组占用指示信息中每个占用指示信息进行熵解码,确定所述第i组占用指示信息中每个占用指示信息对应的占用符号,从而确定所述第i组占用符号;其中,所述占用符号包括:第一占用符与第二占用符;所述第一占用符表征对应体素被占据;所述第二占用符表征对应体素未被占据。
在一些实施例中,所述解码部分14,还配置为所述基于所述第i组占用符号继续下一组解码,直至确定所述m组各自的占用符号,包括:利用包含所述第i组占用符号的至少k组占用符号,对第i+1组上采样体素进行占用概率预测,并根据第i+1组占用概率对所述占用指示信息中的第i+1组占用指示信息进行解码,确定第i+1组占用符号,直至对所述占用指示信息解码完成,确定所述m组各自的占用符号。
在一些实施例中,本申请实施例提供一种编码器2,如图16所示,包括:
下采样部分21,配置为对第一尺度点云进行体素下采样,确定第二尺度点云;
第二分组部分22,配置为对所述第二尺度点云进行上采样与划分处理,确定m组上采样体素;m为大于或等于1的整数;
第二预测部分23,配置为对所述m组上采样体素依次进行占用概率预测,确定m组占用概率,根据所述m组占用概率确定所述m组各自的占用符号;
编码部分24,配置为对所述m组各自的占用符号进行编码,确定所述第一尺度点云对应的占用指示信息,并将所述占用指示信息写入码流。
在一些实施例中,所述第二分组部分22,还配置为对第二尺度点云进行体素上采样,确定第一尺度的n个上采样体素;n为大于1的整数;对所述n个上采样体素中的每个上采样体素进行编号,并基于每个上采样体素对应的编号进行分组,确定m组上采样体素。
在一些实施例中,所述第二预测部分23,还配置为在所述m组上采样体素中,对第1组上采样体素进行占用概率预测,确定第一组占用概率,并利用预设占用符表示所述第一组占用概率,确定第一组占用符号;在i大于1且小于或等于m时,利用已完成概率预测与占用符表示的至少k组上采样体素对应的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率;其中,i为整数,k为大于或等于1且小于i的整数;利用所述预设占用符表示所述第i组占用概率,确定第i组占用符号;利用包含所述第i组占用符号的至少k组占用符号,继续对第i+1组上采样体素进行占用概率预测与占用符表示,直至确定所述m组各自的占用符号。
在一些实施例中,所述第二分组部分22,还配置为对所述n个上采样体素中的每个上采样体素进行编号,确定所述每个上采样体素对应的编号;
根据所述每个上采样体素的编号,对所述n个上采样体素进行分组,确定所述m组上采样体素。
在一些实施例中,所述第二分组部分22,还配置为在所述n个上采样体素中,根据所述每个上采样体素的编号,将编号相同的上采样体素划分为一组,确定所述m组上采样体素。
在一些实施例中,所述第二分组部分22,还配置为确定至少两个连续编号;根据所述每个上采样体素的编号,将所述至少两个连续编号在所述n个上采样体素中对应的上采样体素作为一组,确定所述m组上采样体素。
在一些实施例中,所述第二预测部分23,还配置为对所述第i组上采样体素进行特征提取,确定第i组体素特征;确定已完成概率预测与占用符表示的第i-k组占用符号至第i-1组占用符号为所述至少k组占用符号;根据所述至少k组占用符号,结合所述第i组体素特征,对所述第i组上采样体素中每个上采样体素进行占用概率预测,确定所述第i组上采样体素中每个上采样体素对应的占用概率为所述第i组占用概率。
在一些实施例中,所述第二预测部分23,还配置为利用所述预设占用符中的第一占用符号,表示所述第i组占用概率中大于或等于预设概率阈值的占用概率;并利用所述预设占用符中的第二占用符号,表示所述第i组占用概率中小于所述预设概率阈值的占用概率,确定所述第i组占用符号。
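The threshold rule above can be sketched in a few lines. The threshold value 0.5 and the symbol values 1 and 0 are assumptions chosen for illustration; the application only requires a preset probability threshold and two preset occupancy symbols.

```python
def to_occupancy_symbols(probs, threshold=0.5, occupied=1, empty=0):
    """Map each probability to the first symbol (>= threshold) or the second (< threshold)."""
    return [occupied if p >= threshold else empty for p in probs]

group_probs = [0.92, 0.08, 0.5, 0.49]   # made-up probabilities for one group
group_symbols = to_occupancy_symbols(group_probs)
```

A probability exactly equal to the threshold maps to the first occupancy symbol, consistent with the "greater than or equal to" wording above.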
在一些实施例中,所述编码部分24,还配置为对所述m组各自的占用符号中每组占用符进行熵编码,确定编码得到的m组占用指示信息为所述第一尺度点云对应的占用指示信息。
在一些实施例中,所述下采样部分21、第二分组部分22、第二预测部分23与编码部分24,还配置为基于所述第二尺度点云进行至少S次体素下采样,确定第三尺度点云至第S+2尺度点云;S为大于或等于1的整数;分别对所述第三尺度点云至所述第S+2尺度点云进行体素上采样与划分处理、并分别对划分后的分组依次进行占用概率预测与编码,确定所述第二尺度点云的占用指示信息至第S+1尺度点云的占用指示信息;将所述第一尺度点云的占用指示信息、所述第二尺度点云的占用指示信息至所述第S+1尺度点云的占用指示信息写入码流。
需要说明的是,以上装置实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本申请装置实施例中未披露的技术细节,请参照本申请方法实施例的描述而理解。
在一些实施例中,本申请实施例还提供一种解码器,图17为本申请实施例提供的解码器3的一种可选的结构示意图。如图17所示,解码器3包括:第一存储器32与第一处理器33。其中,第一存储器32和第一处理器33通过第一通信总线34连接;第一存储器32,用于存储可执行指令;第一处理器33,用于执行第一存储器32中存储的可执行指令时,实现本申请实施例提供的解码方法。
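In some embodiments, the embodiments of this application further provide an encoder; Fig. 18 is an optional structural diagram of the encoder 4 provided by the embodiments of this application. As shown in Fig. 18, the encoder 4 includes a second memory 42 and a second processor 43, which are connected by a second communication bus 44. The second memory 42 is used to store executable instructions; the second processor 43 is used to implement the encoding method provided by the embodiments of this application when executing the executable instructions stored in the second memory 42.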
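The embodiments of this application provide a computer-readable storage medium storing executable instructions. When the executable instructions are executed by a first processor, they cause the first processor to perform any of the decoding methods provided by the embodiments of this application; or, when the executable instructions are executed by a second processor, they cause the second processor to perform any of the encoding methods provided by the embodiments of this application.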
在一些实施例中,计算机可读存储介质可以是FRAM、ROM、PROM、EPROM、EEPROM、闪存、磁表面存储器、光盘、或CD-ROM等存储器;也可以是包括上述存储器之一或任意组合的各种设备。
在一些实施例中,可执行指令可以采用程序、软件、软件模块、脚本或代码的形式,按任意形式的编程语言(包括编译或解释语言,或者声明性或过程性语言)来编写,并且其可按任意形式部署,包括被部署为独立的程序或者被部署为模块、组件、子例程或者适合在计算环境中使用的其它单元。
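As an example, executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example in one or more scripts within a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (for example, files storing one or more modules, subprograms, or code portions).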
作为示例,可执行指令可被部署为在一个计算设备上执行,或者在位于一个地点的多个计算设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算设备上执行。
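Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.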
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
以上所述,仅为本申请的较佳实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和范围之内所作的任何修改、等同替换和改进等,均包含在本申请的保护范围之内。
工业实用性
本申请实施例通过对第二尺度点云进行上采样与划分处理,将第二尺度点云上采样至第一尺度,并将上采样后的体素划分为m组,分组依次进行占用概率预测,得到m组占用概率。这样,通过分组预测,加快了占用概率的预测速度,提高了预测过程的处理效率与编码效率。进而,在编码器中利用m组占用概率对应的m组占用符号进行编码,确定第一尺度点云对应的占用指示信息写入码流;或者,在解码器中利用m组占用概率对第一尺度点云对应的占用指示信息进行解码,如此,提高了编解码效率与编解码性能。
进一步地,本申请实施例中分组预测与编解码的方式,相较于基于每个体素的父节点和邻居节点,逐个体素进行占用概率预测与编解码的方式,一方面极大减小了编解码复杂度,提高了压缩性能与编解码速度;另一方面,随着编解码的进行,所有之前已经完成占用概率预测的上采样体素组的占用符号都可以作为上下文信息,共同输入到卷积网络中,帮助预测当前上采样体素组的占据概率,从而提高了占用预测概率预测的准确性,进而提高了编解码准确性,也即提高了编码性能。进一步地,编码器可以以并行的方式,对m组各自的占用符号进行编码,从而进一步加快编码速度,提高编码效率和编码性能。
进一步地,本申请实施例提供的编解码方法还可以重复应用于多个相邻尺度之间,且每组相邻尺度间的编解码相互独立不依赖,因此可以灵活地实现尺度可伸缩的编解码。

Claims (26)

  1. 一种解码方法,包括:
    解析码流,确定第一尺度点云对应的占用指示信息;
    确定第二尺度点云,并对所述第二尺度点云进行上采样与划分处理,确定m组上采样体素;其中,所述第二尺度点云为所述第一尺度点云对应的前一个已解码的点云数据;m为大于或等于1的整数;
    对所述m组上采样体素依次进行占用概率预测,确定m组占用概率,并根据所述m组占用概率对所述占用指示信息进行解码,确定所述m组各自的占用符号;
    基于所述m组占用符号,确定所述第一尺度点云对应的重建几何数据。
  2. 根据权利要求1所述的方法,其中,所述对所述第二尺度点云进行上采样与划分处理,确定m组上采样体素,包括:
    对所述第二尺度点云进行体素上采样,确定第一尺度的n个上采样体素;n为大于1的整数;
    对所述n个上采样体素中的每个上采样体素进行编号,并基于每个上采样体素对应的编号进行分组,确定所述m组上采样体素。
  3. 根据权利要求1或2所述的方法,其中,所述对所述m组上采样体素依次进行占用概率预测,确定m组占用概率,并根据所述m组占用概率对所述占用指示信息进行解码,确定所述m组各自的占用符号,包括:
    在所述m组上采样体素中,对第1组上采样体素进行占用概率预测,确定第一组占用概率,并根据所述第一组占用概率对所述占用指示信息进行解码,确定第一组占用符号;
    在i大于1且小于或等于m时,利用已解码的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率,并根据所述第i组占用概率,对所述占用指示信息中的第i组占用指示信息进行解码,确定第i组占用符号;其中,i为整数,k为大于或等于1且小于i的整数;
    基于所述第i组占用符号继续下一组解码,直至确定所述m组各自的占用符号。
  4. 根据权利要求2所述的方法,其中,所述对所述n个上采样体素中的每个上采样体素进行编号,并基于每个上采样体素对应的编号进行分组,确定所述m组上采样体素,包括:
    对所述n个上采样体素中的每个上采样体素进行编号,确定所述每个上采样体素对应的编号;
    根据所述每个上采样体素的编号,对所述n个上采样体素进行分组,确定所述m组上采样体素。
  5. 根据权利要求4所述的方法,其中,所述根据所述每个上采样体素的编号,对所述n个上采样体素进行分组,确定所述m组上采样体素,包括:
    在所述n个上采样体素中,根据所述每个上采样体素的编号,将编号相同的上采样体素划分为一组,确定所述m组上采样体素。
  6. 根据权利要求4所述的方法,其中,所述根据所述每个上采样体素的编号,对所述n个上采样体素进行分组,确定所述m组上采样体素,包括:
    确定至少两个连续编号;
    根据所述每个上采样体素的编号,将所述至少两个连续编号在所述n个上采样体素中对应的上采样体素作为一组,确定所述m组上采样体素。
  7. 根据权利要求3所述的方法,其中,所述利用已解码的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率,包括:
    对所述第i组上采样体素进行特征提取,确定第i组体素特征;
    确定已解码的第i-k组占用符号至第i-1组占用符号为所述至少k组占用符号;
    根据所述至少k组占用符号,结合所述第i组体素特征,对所述第i组上采样体素中每个上采样体素进行占用概率预测,确定预测得到的所述第i组上采样体素中每个上采样体素对应的占用概率为所述第i组占用概率。
  8. 根据权利要求1、2、4-7任一项所述的方法,其中,所述确定第二尺度点云之前,所述方法还包括:
    解析所述码流,确定第T尺度点云,以及第T-1尺度点云对应的第T-1占用指示信息;其中,j=T,T-1,…,3;T为大于或等于3的整数;
    从j=T开始,按照从大到小的顺序对j的每一个取值执行以下处理:
    基于第j尺度点云进行体素上采样与划分处理,并对划分后的分组依次进行占用概率预测,确定第j-1尺度点云中m个分组对应的m组占用概率;
    根据所述第j-1尺度点云中m个分组对应的m组占用概率,对所述第j-1占用指示信息进行熵解码,确定所述第j-1占用指示信息对应的m组占用符号;
    基于所述第j-1占用指示信息对应的m组占用符号,重建第j-1尺度点云;
    继续根据j的下一个取值进行下一次处理,直至j=3时,重建得到所述第二尺度点云。
  9. 根据权利要求3所述的方法,其中,所述根据所述第i组占用概率,对所述占用指示信息中的第i组占用指示信息进行解码,确定第i组占用符号,包括:
    根据所述第i组占用概率,对所述第i组占用指示信息中每个占用指示信息进行熵解码,确定所述第i组占用指示信息中每个占用指示信息对应的占用符号,从而确定所述第i组占用符号;其中,
    所述占用符号包括:第一占用符与第二占用符;所述第一占用符表征对应体素被占据;所述第二占用符表征对应体素未被占据。
  10. 根据权利要求3所述的方法,其中,所述基于所述第i组占用符号继续下一组解码,直至确定所述m组各自的占用符号,包括:
    利用包含所述第i组占用符号的至少k组占用符号,对第i+1组上采样体素进行占用概率预测,并根据第i+1组占用概率对所述占用指示信息中的第i+1组占用指示信息进行解码,确定第i+1组占用符号,直至对所述占用指示信息解码完成,确定所述m组各自的占用符号。
  11. 一种编码方法,包括:
    对第一尺度点云进行体素下采样,确定第二尺度点云;
    对所述第二尺度点云进行上采样与划分处理,确定m组上采样体素;m为大于或等于1的整数;
    对所述m组上采样体素依次进行占用概率预测,确定m组占用概率,根据所述m组占用概率确定所述m组各自的占用符号;
    对所述m组各自的占用符号进行编码,确定所述第一尺度点云对应的占用指示信息,并将所述占用指示信息写入码流。
  12. 根据权利要求11所述的方法,其中,所述对所述第二尺度点云进行上采样与划分处理,确定m组上采样体素,包括:
    对第二尺度点云进行体素上采样,确定第一尺度的n个上采样体素;n为大于1的整数;
    对所述n个上采样体素中的每个上采样体素进行编号,并基于每个上采样体素对应的编号进行分组,确定m组上采样体素。
  13. 根据权利要求11所述的方法,其中,所述对所述m组上采样体素依次进行占用概率预测,确定m组占用概率,根据所述m组占用概率确定所述m组各自的占用符号,包括:
    在所述m组上采样体素中,对第1组上采样体素进行占用概率预测,确定第一组占用概率,并利用预设占用符表示所述第一组占用概率,确定第一组占用符号;
    在i大于1且小于或等于m时,利用已完成概率预测与占用符表示的至少k组上采样体素对应的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率;其中,i为整数,k为大于或等于1且小于i的整数;
    利用所述预设占用符表示所述第i组占用概率,确定第i组占用符号;
    利用包含所述第i组占用符号的至少k组占用符号,继续对第i+1组上采样体素进行占用概率预测与占用符表示,直至确定所述m组各自的占用符号。
  14. 根据权利要求12所述的方法,其中,所述对所述n个上采样体素中的每个上采样体素进行编号,并基于每个上采样体素对应的编号进行分组,确定m组上采样体素,包括:
    对所述n个上采样体素中的每个上采样体素进行编号,确定所述每个上采样体素对应的编号;
    根据所述每个上采样体素的编号,对所述n个上采样体素进行分组,确定所述m组上采样体素。
  15. 根据权利要求14所述的方法,其中,所述根据所述每个上采样体素的编号,对所述n个上采样体素进行分组,确定所述m组上采样体素,包括:
    在所述n个上采样体素中,根据所述每个上采样体素的编号,将编号相同的上采样体素划分为一组,确定所述m组上采样体素。
  16. 根据权利要求14所述的方法,其中,所述根据所述每个上采样体素的编号,对所述n个上采样体素进行分组,确定所述m组上采样体素,包括:
    确定至少两个连续编号;
    根据所述每个上采样体素的编号,将所述至少两个连续编号在所述n个上采样体素中对应的上采样体素作为一组,确定所述m组上采样体素。
  17. 根据权利要求13所述的方法,其中,所述利用已完成概率预测与占用符表示的至少k组上采样体素对应的至少k组占用符号,对第i组上采样体素进行占用概率预测,确定第i组占用概率,包括:
    对所述第i组上采样体素进行特征提取,确定第i组体素特征;
    确定已完成概率预测与占用符表示的第i-k组占用符号至第i-1组占用符号为所述至少k组占用符号;
    根据所述至少k组占用符号,结合所述第i组体素特征,对所述第i组上采样体素中每个上采样体素进行占用概率预测,确定所述第i组上采样体素中每个上采样体素对应的占用概率为所述第i组占用概率。
  18. 根据权利要求13所述的方法,其中,所述利用所述预设占用符表示所述第i组占用概率,确定第i组占用符号,包括:
    利用所述预设占用符中的第一占用符号,表示所述第i组占用概率中大于或等于预设概率阈值的占用概率;并利用所述预设占用符中的第二占用符号,表示所述第i组占用概率中小于所述预设概率阈值的占用概率,确定所述第i组占用符号。
  19. 根据权利要求11-18任一项所述的方法,其中,所述对所述m组各自的占用符号进行编码,确定所述第一尺度点云对应的占用指示信息,包括:
    对所述m组各自的占用符号中每组占用符进行熵编码,确定编码得到的m组占用指示信息为所述第一尺度点云对应的占用指示信息。
  20. 根据权利要求11-18任一项所述的方法,其中,所述方法还包括:
    基于所述第二尺度点云进行至少S次体素下采样,确定第三尺度点云至第S+2尺度点云;S为大于或等于1的整数;
    分别对所述第三尺度点云至所述第S+2尺度点云进行体素上采样与划分处理、并分别对划分后的分组依次进行占用概率预测与编码,确定所述第二尺度点云的占用指示信息至第S+1尺度点云的占用指示信息;
    将所述第一尺度点云的占用指示信息、所述第二尺度点云的占用指示信息至所述第S+1尺度点云的占用指示信息写入码流。
  21. 一种解码器,包括:
    解析部分,配置为解析码流,确定第一尺度点云对应的占用指示信息;
    第一分组部分,配置为确定第二尺度点云,并对所述第二尺度点云进行上采样与划分处理,确定m组上采样体素;其中,所述第二尺度点云为所述第一尺度点云对应的前一个已解码的点云数据;m为大于或等于1的整数;
    第一预测部分,配置为对所述m组上采样体素依次进行占用概率预测,确定m组占用概率,并根据所述m组占用概率对所述占用指示信息进行解码,确定所述m组各自的占用符号;
    解码部分,配置为基于所述m组占用符号,确定所述第一尺度点云对应的重建几何数据。
  22. 一种编码器,包括:
    下采样部分,配置为对第一尺度点云进行体素下采样,确定第二尺度点云;
    第二分组部分,配置为对所述第二尺度点云进行上采样与划分处理,确定m组上采样体素;m为大于或等于1的整数;
    第二预测部分,配置为对所述m组上采样体素依次进行占用概率预测,确定m组占用概率,根据所述m组占用概率确定所述m组各自的占用符号;
    编码部分,配置为对所述m组各自的占用符号进行编码,确定所述第一尺度点云对应的占用指示信息,并将所述占用指示信息写入码流。
  23. 一种码流,包括:
    所述码流是根据待编码信息进行比特编码生成的;其中,所述待编码信息至少包括:第一尺度点云对应的占用指示信息。
  24. 一种解码器,包括:
    第一存储器,配置为存储可执行指令;
    第一处理器,配置为执行所述第一存储器中存储的可执行指令时,实现权利要求1至10任一项所述的方法。
  25. 一种编码器,包括:
    第二存储器,配置为存储可执行指令;
    第二处理器,配置为执行所述第二存储器中存储的可执行指令时,实现权利要求11至20任一项所述的方法。
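  26. A computer-readable storage medium storing executable instructions which, when executed by a first processor, implement the method of any one of claims 1 to 10, or, when executed by a second processor, implement the method of any one of claims 11 to 20.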
PCT/CN2022/105253 2022-07-12 2022-07-12 编解码方法、解码器、编码器及计算机可读存储介质 WO2024011417A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/105253 WO2024011417A1 (zh) 2022-07-12 2022-07-12 编解码方法、解码器、编码器及计算机可读存储介质
TW112125920A TW202406343A (zh) 2022-07-12 2023-07-11 編解碼方法、解碼器、編碼器、碼流及電腦可讀儲存媒介

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/105253 WO2024011417A1 (zh) 2022-07-12 2022-07-12 编解码方法、解码器、编码器及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2024011417A1 true WO2024011417A1 (zh) 2024-01-18

Family

ID=89535224

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105253 WO2024011417A1 (zh) 2022-07-12 2022-07-12 编解码方法、解码器、编码器及计算机可读存储介质

Country Status (2)

Country Link
TW (1) TW202406343A (zh)
WO (1) WO2024011417A1 (zh)

Also Published As

Publication number Publication date
TW202406343A (zh) 2024-02-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22950548

Country of ref document: EP

Kind code of ref document: A1