CN113727105A - Depth map compression method, device, system and storage medium - Google Patents

Depth map compression method, device, system and storage medium

Info

Publication number
CN113727105A
Authority
CN
China
Prior art keywords
bitmap
depth map
low
determining
pixel points
Legal status: Granted
Application number
CN202111046709.1A
Other languages
Chinese (zh)
Other versions
CN113727105B (en)
Inventor
黄缚鹏
李翔宇
范文新
燕忠亮
Current Assignee
Tianjin Yifuzhen Internet Hospital Co ltd
Beijing Yibai Technology Co ltd
Original Assignee
Tianjin Yifuzhen Internet Hospital Co ltd
Beijing Yibai Technology Co ltd
Application filed by Tianjin Yifuzhen Internet Hospital Co ltd and Beijing Yibai Technology Co ltd
Priority to CN202111046709.1A
Publication of CN113727105A
Application granted
Publication of CN113727105B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H04N19/136 Incoming video signal characteristics or properties

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a depth map compression method, device, system and storage medium, which enable a depth map to be compressed directly by a two-dimensional coding device so that the compressed depth map is convenient for network distribution. The method comprises the following steps: acquiring a depth map to be compressed; acquiring all pixel points in the depth map; determining the concentration region of the information entropy of all the pixel points; splitting the depth map according to the concentration region, wherein all pixel points contained in the concentration region are split into a main map and all pixel points outside the concentration region are split into a sub map; and inputting the information corresponding to the main map and the sub map respectively into a two-dimensional coding device for coding, so that the two-dimensional coding device outputs a video code stream corresponding to the depth map. With the scheme provided by the application, the depth map is split on the basis of the concentration region of the information entropy, so that the several split maps can be compressed directly by a two-dimensional coding device, which facilitates network distribution.

Description

Depth map compression method, device, system and storage medium
Technical Field
The present application relates to a depth map compression method, device, system and storage medium.
Background
Depth images, also known as range images, are images whose pixel values are the distances (depths) from the image grabber to points in the scene; they directly reflect the geometry of the visible surfaces of the scene. In the prior art, a depth map is usually encoded into the png format to compress it; although this encoding reduces the storage size, a png file is not suitable for network distribution.
Video coding has developed to the point where a large number of two-dimensional coding devices exist today. These devices can provide coding speeds of hundreds of frames per second, and existing transmission protocols provide various excellent load-balancing methods for transmitting video over networks. If a depth map could be coded by a two-dimensional coding device into a video stream that is transparent to users, network distribution would become convenient.
However, current depth maps are difficult to compress directly with a two-dimensional coding device. For example, the depth map provided by the Microsoft camera has a bit depth of 16 bits, while the highest coding bit depth of current video protocols is 12 bits, and at most bit depths no lossless compression is provided, whereas depth map coding generally requires geometric losslessness. It is therefore necessary to provide a depth map compression method that allows the depth map to be compressed directly by a two-dimensional coding device, so that the compressed depth map is convenient for network distribution.
Disclosure of Invention
The application provides a depth map compression method, device, system and storage medium, which enable a depth map to be compressed directly by a two-dimensional coding device.
The application provides a depth map compression method, which comprises the following steps:
acquiring a depth map to be compressed;
acquiring all pixel points in the depth map;
determining a concentrated region of the information entropies of all the pixel points;
splitting the depth map according to the concentrated region, wherein all pixel points contained in the concentrated region are split into main maps, and all pixel points contained outside the concentrated region are split into auxiliary maps;
and respectively inputting the information corresponding to the main graph and the auxiliary graph into two-dimensional coding equipment for coding so as to enable the two-dimensional coding equipment to output a video code stream corresponding to the depth graph.
The beneficial effect of this application lies in: the concentration region of the information entropy of all the pixel points can be determined, the pixel points contained in the concentration region can be split into a main map, and the pixel points outside the concentration region can be split into a sub map. The depth map is thus split on the basis of the concentration region of the information entropy, so that the several split maps can be compressed directly by a two-dimensional coding device, which facilitates network distribution.
In one embodiment, determining the concentration region of the information entropy of all the pixel points comprises:
determining the values of all pixel points;
converting the values of all the pixel points into binary numbers;
carrying out bit-plane statistics on the value of each pixel point in the depth map according to the binary number;
and determining the concentration area of the information entropy of all the pixel points according to the bit plane statistical result.
In one embodiment, the determining the concentration region of the information entropy of all the pixel points according to the bit plane statistics result includes:
generating a corresponding integral graph according to the bit plane statistical result;
and determining a bit plane corresponding to the maximum value of the integral image item as a concentrated region of the information entropy.
In one embodiment, inputting information corresponding to the main map and the secondary map into a two-dimensional coding device for coding respectively comprises:
determining a high bitmap and a low bitmap in the primary map and the secondary map;
generating a complementary graph corresponding to the low bitmap;
comparing compression rates after encoding the low bitmap and the complement;
if the compression ratio after the low bitmap is coded is higher than that after the complement image is coded, inputting the low bitmap and the high bitmap into two-dimensional coding equipment for coding;
and if the compression rate after the low bitmap is coded is lower than the compression rate after the complement image is coded, inputting the complement image corresponding to the low bitmap and the high bitmap into two-dimensional coding equipment for coding.
The present application further provides a depth map compression apparatus, including:
the first acquisition module is used for acquiring a depth map to be compressed;
the second acquisition module is used for acquiring all pixel points in the depth map;
the determining module is used for determining the concentrated region of the information entropy of all the pixel points;
the splitting module is used for splitting the depth map according to the concentrated region, splitting all pixel points contained in the concentrated region into main maps, and splitting all pixel points contained outside the concentrated region into auxiliary maps;
and the coding module is used for respectively inputting the information corresponding to the main graph and the auxiliary graph into two-dimensional coding equipment for coding so as to enable the two-dimensional coding equipment to output the video code stream corresponding to the depth graph.
In one embodiment, the determining module includes:
the first determining submodule is used for determining the values of all the pixel points;
the conversion submodule is used for converting the values of all the pixel points into binary numbers;
the statistic submodule is used for carrying out bit-plane statistics on the value of each pixel point in the depth map according to the binary number;
and the second determining submodule is used for determining the concentrated region of the information entropy of all the pixel points according to the bit plane statistical result.
In one embodiment, the second determining submodule is specifically configured to:
generating a corresponding integral graph according to the bit plane statistical result;
and determining a bit plane corresponding to the maximum value of the integral image item as a concentrated region of the information entropy.
In one embodiment, an encoding module comprises:
a third determining sub-module, configured to determine a high bitmap and a low bitmap in the primary map and the secondary map;
the generating submodule is used for generating a complementary graph corresponding to the low bitmap;
a comparison sub-module for comparing the compression rates of the low bitmap and the complement after encoding;
the first input sub-module is used for inputting the low bitmap and the high bitmap into two-dimensional coding equipment for coding if the compression rate after the low bitmap is coded is higher than the compression rate after the complement picture is coded;
and the second input sub-module is used for inputting the complementary image corresponding to the low bitmap and the high bitmap into the two-dimensional coding equipment for coding if the compression rate after the low bitmap is coded is lower than the compression rate after the complementary image is coded.
The present application further provides a depth map compression system, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to implement the depth map compression method of any of the above embodiments.
The present application further provides a computer-readable storage medium, wherein when executed by a processor corresponding to the depth map compression system, instructions in the storage medium enable the depth map compression system to implement the depth map compression method described in any of the above embodiments.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present application is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiment(s) of the application and together with the description serve to explain the application and not limit the application. In the drawings:
fig. 1 is a flowchart illustrating a depth map compression method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a depth map compression method according to another embodiment of the present application;
FIG. 3 is a flowchart illustrating a depth map compression method according to another embodiment of the present application;
FIG. 4 is a flow chart illustrating the selection of a low bitmap positive map or a complement map for encoding according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an embodiment of the present application, in which a circumscribed cube is divided into 8 cubes a along a centerline of each face;
FIG. 6 is a schematic diagram of a cube a labeled 1 cut along the midline of each face into 8 smaller cubes b according to an embodiment of the present application;
FIG. 7 is a diagram of the m-level tree structure obtained by converting the cut-completed circumscribed cube;
FIG. 8 is a block diagram of a depth map compression apparatus according to an embodiment of the present application;
fig. 9 is a schematic hardware structure diagram of a depth map compression system according to an embodiment of the present application.
Detailed Description
The preferred embodiments of the present application will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein only to illustrate and explain the present application and not to limit the present application.
Fig. 1 is a flowchart of a depth map compression method according to an embodiment of the present application, and as shown in fig. 1, the method can be implemented as the following steps S11-S15:
in step S11, a depth map to be compressed is acquired;
in step S12, all pixel points in the depth map are obtained;
in step S13, a concentration region of the information entropies of all the pixel points is determined;
in step S14, splitting the depth map according to the concentration region, wherein all pixel points included in the concentration region are split into main maps, and all pixel points included outside the concentration region are split into sub maps;
in step S15, the information corresponding to the main map and the sub map is input to the two-dimensional encoding device for encoding, so that the two-dimensional encoding device outputs a video code stream corresponding to the depth map.
At present, consumer-grade depth cameras are gradually becoming popular in daily life, but the depth images they capture are often stored in png format. This application aims to optimize the relatively dense depth maps captured by consumer-grade cameras such as those produced by Microsoft. Specifically:
Taking the Microsoft Kinect depth camera as an example for depth map shooting experiments: Microsoft provides a way to store the shooting results in the mkv format (a video container), and the mkv file includes two streams, a color stream and a depth stream. 904 images taken by the Microsoft Kinect depth camera were made into a data set, as shown in Table 1 below:
Table 1 (frame count: 904)
As can be seen from Table 1, the depth map is stored in the mkv file as uncompressed raw data and therefore occupies a huge amount of space.
Assume that the depth maps to be compressed are the 904 frames in the mkv file. Because these depth maps were captured continuously, there is some continuity within the 904 frames: except at edges, they always look smooth. For three-dimensional objects, a smooth object surface also satisfies the continuously-differentiable assumption, so intra-frame prediction is feasible. Video also has continuity in time, which is why inter-frame prediction and bi-directional prediction coding methods exist. Since the depth map arises from the motion and displacement of a three-dimensional object over time and exhibits continuity, inter-frame prediction also applies to depth map coding.
In the application, a depth map to be compressed is obtained; acquiring all pixel points in the depth map; determining a concentrated region of the information entropies of all the pixel points; splitting the depth map according to the concentrated region, wherein all pixel points contained in the concentrated region are split into main maps, and all pixel points contained outside the concentrated region are split into auxiliary maps; and respectively inputting the information corresponding to the main graph and the auxiliary graph into two-dimensional coding equipment for coding so that the two-dimensional coding equipment outputs a video code stream corresponding to the depth graph.
The determining of the concentration areas of the information entropies of all the pixel points comprises the following steps: determining the values of all pixel points; converting the values of all the pixel points into binary numbers; carrying out bit-plane statistics on the value of each pixel point in the depth map according to the binary number; and determining the concentration area of the information entropy of all the pixel points according to the bit plane statistical result. And determining the concentration area of the information entropy of all the pixel points according to the bit plane statistical result, which comprises the following steps: generating a corresponding integral graph according to the bit plane statistical result; and determining a bit plane corresponding to the maximum value of the integral image item as a concentrated region of the information entropy.
For example, the bit depth of an image shot by the Microsoft Kinect depth camera is 16 bits, while a two-dimensional coding scheme can generally achieve lossless coding only at 8 bits; that is, the maximum bit depth allowed by the two-dimensional coding device is 8 bits. Therefore, in the present application, a 16-bit image to be compressed can be cut into a high 8-bit image and a low 8-bit image. Specifically, for one frame of depth map shot by the Microsoft Kinect depth camera, the bit-plane statistics obtained are shown in Table 2 below:
TABLE 2
Specifically, bit-plane statistics converts the value of each pixel into a binary number. In the depth map the pixel value essentially represents the depth of the pictured object, with a value range of 0-65535, so the converted binary numbers fall within the range 0000000000000000-1111111111111111.
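As an illustration of the bit-plane statistics just described, the following minimal Python sketch (our own illustration, not code from the patent; the NumPy usage, function name and toy data are assumptions) counts, for each of the 16 bit positions, how many pixels have that bit set:

```python
import numpy as np

def bit_plane_statistics(depth, bits=16):
    """For each bit position b, count how many pixels have bit b set."""
    flat = depth.astype(np.uint16).ravel()
    return [int(np.count_nonzero(flat & (1 << b))) for b in range(bits)]

# Toy 16-bit depth frame standing in for a Kinect capture.
depth = np.array([[870, 987, 6793], [31723, 0, 1024]], dtype=np.uint16)
for b, count in enumerate(bit_plane_statistics(depth)):
    print(f"bit {b}: {count} pixels set")
```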
The bit-plane statistics give the distribution of all depth values of the whole image over each bit. We then need to find the 8-bit interval where the data are most concentrated, i.e., the concentration region of entropy coding, and use this interval as the value basis of the main map. The corresponding integral map generated from the bit-plane statistics is shown in Table 3 below:
TABLE 3
In Table 3, Stat_i is the statistical result at bit position i; Sigma(i) = Σ Stat_k for k = 0..i, i.e., the sum of the statistics over bits 0 to i; and the integral-map term is S_i = Sigma(i) - Sigma(i-7).
After the integral map is determined, we only need to traverse Table 3 above and take the maximum of the integral-map term. Taking Table 3 as an example, we can conclude that the information entropy is concentrated mainly in bits 8-15; therefore the bit-plane region of bits 8-15 corresponds to the main map, i.e., the main map is the high bitmap. The bit-plane interval of the sub map is bits 0-7, i.e., the sub map is the low bitmap.
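The integral-map search and the resulting split can be sketched as follows (a hedged illustration: the helper name is ours, and the final split assumes the winning window is bits 8-15, as in the example above):

```python
import numpy as np

def best_bit_window(stats, window=8):
    """Find the 8-bit interval holding the most set bits. sigma[i] is the
    running sum Stat_0 + ... + Stat_i, so the window sum over bits
    i-7..i (the integral-map term) is sigma[i] - sigma[i - 8]."""
    sigma = np.cumsum(stats)
    sums = [sigma[i] - (sigma[i - window] if i >= window else 0)
            for i in range(window - 1, len(stats))]
    hi = int(np.argmax(sums)) + window - 1
    return hi - window + 1, hi  # inclusive bit range, e.g. (8, 15)

depth = np.array([[870, 987], [6793, 31723]], dtype=np.uint16)
stats = [int(np.count_nonzero(depth & (1 << b))) for b in range(16)]
lo, hi = best_bit_window(stats)
# With the window at bits 8-15 (the example above), the split is:
main = (depth >> 8).astype(np.uint8)   # high bitmap -> main map
sub = (depth & 0xFF).astype(np.uint8)  # low bitmap  -> sub map
```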
In addition, because the shooting range of the camera is limited, analysis of all the depth maps leads to the conclusion that the high eight bits form the main map. Once the main map is determined, it is clear that the entropy of the whole depth map is concentrated in the main map, while the sub map carries only a small amount of information. This distribution is particularly suitable for two-dimensional coding: the split maps can be input into a two-dimensional coding device so that the device compresses them into a video stream format.
The application also includes: acquiring a low bitmap in the main graph and the secondary graph; generating a complementary graph corresponding to the low bitmap; comparing the compression rates of the low bitmap and the complement image after encoding; if the compression rate after encoding the low bitmap is higher than that after encoding the complement image, the low bitmap is used as the input of the two-dimensional encoding equipment; and if the compression rate after the low bitmap is coded is lower than that after the complement image is coded, taking the complement image corresponding to the low bitmap as the input of the two-dimensional coding equipment.
Assuming the compression rate after encoding the low bitmap is lower than that after encoding its complement map, during encoding the high bitmap and the complement map of the low bitmap are input into the two-dimensional encoding device. The device first encodes the high bitmap, then the complement map of the low bitmap, then encodes color and sound, and finally multiplexes the data; if encoding is not finished, it continues.
When the encoded video code stream is sent to the terminal device, the terminal decoder can decode the main map and the sub map separately and then combine them, restoring the video code stream to the depth map.
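The terminal-side recombination can be sketched as follows (an assumption-laden illustration: the patent does not spell out how the complement choice is signaled, so the boolean flag here is hypothetical):

```python
import numpy as np

def restore_depth(main, sub, sub_is_complement):
    """Merge the decoded main map (high 8 bits) and sub map (low 8 bits);
    if the encoder sent the complement of the low bitmap, invert it back."""
    low = (255 - sub).astype(np.uint8) if sub_is_complement else sub
    return (main.astype(np.uint16) << 8) | low

depth = restore_depth(np.array([[3]], dtype=np.uint8),
                      np.array([[102]], dtype=np.uint8),
                      sub_is_complement=False)   # -> [[870]]
```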
The beneficial effect of this application lies in: the concentration region of the information entropy of all the pixel points can be determined, the pixel points contained in the concentration region can be split into a main map, and the pixel points outside the concentration region can be split into a sub map. The depth map is thus split on the basis of the concentration region of the information entropy, so that the several split maps can be compressed directly by a two-dimensional coding device, which facilitates network distribution.
In one embodiment, the above step S13 may be implemented as the following steps A1-A4:
in step a1, the values of all the pixel points are determined;
in step a2, the values of all the pixel points are converted into binary numbers;
in step a3, performing bit-plane statistics on the values of each pixel point in the depth map according to binary numbers;
in step a4, a concentration region of the information entropies of all the pixel points is determined according to the bit plane statistics result.
In one embodiment, as shown in FIG. 2, the above step A4 can be implemented as the following steps S21-S22:
in step S21, generating a corresponding integrogram according to the bit plane statistical result;
in step S22, the bit plane corresponding to the maximum value of the integral term is determined as the region of concentration of the information entropy.
In one embodiment, as shown in FIG. 3, the method may also be implemented as the following steps S31-S35:
in step S31, determining a high bit map and a low bit map in the primary map and the secondary map;
in step S32, a complement corresponding to the low bitmap is generated;
in step S33, the compression rates after encoding the low bitmap and the complement map are compared;
in step S34, if the compression rate after encoding the low bitmap is higher than the compression rate after encoding the complement, the low bitmap is taken as an input of the two-dimensional encoding apparatus;
in step S35, if the compression rate after encoding the low bitmap is lower than the compression rate after encoding the complement map, the complement map corresponding to the low bitmap is used as the input of the two-dimensional encoding apparatus.
In this embodiment, as shown in fig. 4, a low eight-bit positive map is generated, and then the complement map corresponding to it is generated. Both the low eight-bit positive map and the low eight-bit complement map are compression-encoded, and the compression rates after encoding are compared. If the compression rate after encoding the low bitmap is higher than that after encoding the complement map, the positive map of the low bitmap encodes better, and the low bitmap is taken as the input of the two-dimensional encoding device; if the compression rate after encoding the low bitmap is lower than that after encoding the complement map, the complement map encodes better, and the complement map corresponding to the low bitmap is taken as the input of the two-dimensional encoding device.
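A sketch of this selection step (hedged: zlib stands in for the real two-dimensional video encoder, whose API the patent does not name, and the helper name is ours):

```python
import zlib
import numpy as np

def choose_low_map(low):
    """Encode the positive low bitmap and its complement with a stand-in
    lossless coder and keep whichever compresses to fewer bytes."""
    complement = (255 - low).astype(np.uint8)
    positive_size = len(zlib.compress(low.tobytes()))
    complement_size = len(zlib.compress(complement.tobytes()))
    if complement_size < positive_size:
        return complement, True   # complement map wins; flag it for the decoder
    return low, False             # positive map wins

low = np.random.default_rng(0).integers(0, 256, (8, 8), dtype=np.uint8)
chosen, is_complement = choose_low_map(low)
```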
Depth maps have many morphologies and can be roughly classified into sparse and dense. The present application is directed at relatively dense depth maps, i.e., maps dense in the x, y coordinates, with no restriction in the z direction. Depth maps shot by the Microsoft camera conform to this dense-x,y characteristic: except at edges, every spatial point can find neighboring points within its eight-connected neighborhood. The data range on the z-axis is 0-65535, that is, it must be represented by a 16-bit binary integer. When a depth map is sparse, intra-frame prediction is not ideal because of the low continuity inside the image; in that case the depth map may be encoded in a mode better suited to sparse depth maps, specifically as follows:
in one embodiment, the method may also be implemented as the steps of:
and when the density of the feature points in the depth map is less than the preset density, converting the depth map to be compressed into a preset data structure.
In one embodiment, converting the depth map to be compressed into a preset data structure may be implemented as the following steps B1-B4:
in step B1, a depth map to be processed is acquired;
in step B2, a three-dimensional target object in the depth map is determined;
in step B3, constructing a circumscribed cube for wrapping the three-dimensional target object;
in step B4, the circumscribed cube is converted into a preset data structure according to the voxel distribution within it, so that the depth map to be processed is represented by the preset data structure, where the preset data structure is a data structure containing only 0 and 1.
Among depth images, perfect images are rare; most are of average quality, with sparse pixel points and often very large spatial position noise. Traditional depth maps tend to be losslessly compressed into png format, which is not very efficient and leaves much room for improvement. In view of this, the present application tries to compress such lossless depth images of average quality, with sparse pixel points and large spatial position noise, into a data structure containing only the binary digits 0 and 1, so as to improve the compression rate. Specifically, a depth map to be processed is obtained and the three-dimensional target object in the depth map is determined. When determining the three-dimensional target object in the depth map, the resolution information of the depth map is determined; the coordinate value of each pixel point is calculated from the resolution information; the depth value at the position of each pixel point is acquired; each pixel point is converted into a voxel with a depth-direction coordinate according to its coordinate value and depth value; and when the voxel conversion is finished, the object formed by the voxels is determined to be the three-dimensional target object.
The step of determining the resolution information of the depth map may be implemented as the following steps:
determining the width of an image as w pixels and the height as h pixels;
the step of calculating the coordinate value of each pixel point according to the resolution information of the depth map may be specifically implemented as the following steps:
calculating the two-dimensional coordinate value of each pixel point according to the following formula:
x = i % w, y = [i / w];
where i is the index of the i-th pixel point in row-major order; x is the abscissa of the i-th pixel point; y is its ordinate; % denotes the remainder; and [ ] means the fractional part is discarded (rounding down).
Secondly, the above-mentioned converting the pixel point into a voxel having a depth direction coordinate according to the coordinate value and the depth value of the pixel point may be implemented as the following steps:
After the two-dimensional coordinate value of each pixel point is calculated, the depth value of each pixel point is determined.
Specifically, in the depth map, assume the depth ranges from 0 to 65535: 0 means there is no depth data at that pixel position, while a non-zero value means depth data exists there. Therefore, on the basis of the position coordinates (x, y), the depth is represented by adding a coordinate z, so that a point can be represented as (x, y, z). Taking the depth value of a pixel point as its third coordinate yields the three-dimensional coordinate value (x, y, z) corresponding to each pixel point;
and each pixel point is moved to the position given by its three-dimensional coordinate value; a pixel point placed at its three-dimensional coordinate value is a voxel. The object formed by the set of all such points in the depth map is the three-dimensional target object.
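The pixel-to-voxel conversion just described can be sketched as follows (an illustrative helper of our own, assuming a row-major list of 16-bit depth samples):

```python
def pixels_to_voxels(depth_values, w):
    """Convert row-major depth samples into voxels (x, y, z) using
    x = i % w and y = i // w; z = 0 means 'no depth data here'."""
    return [(i % w, i // w, z) for i, z in enumerate(depth_values) if z != 0]

# A 2x3 image with one empty sample at index 1.
voxels = pixels_to_voxels([870, 0, 987, 6793, 31723, 1024], w=3)
# -> [(0, 0, 870), (2, 0, 987), (0, 1, 6793), (1, 1, 31723), (2, 1, 1024)]
```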
After the three-dimensional target object is determined, a circumscribed cube is constructed for wrapping the three-dimensional target object.
Then a cutting operation is performed on the circumscribed cube to cut it into a plurality of first cubes of equal volume; an analysis operation is performed on the first cubes to determine whether a voxel exists in each one; a first cube with no voxel is labeled 0; a first cube containing a voxel is labeled 1; the cutting and analysis operations continue on the first cubes labeled 1 until the number of cutting operations reaches m, at which point the cutting of the circumscribed cube is complete, where m is a positive integer. The cut-completed circumscribed cube is then encoded into an m-level tree structure containing 0s and 1s.
The following describes the external cube cutting process and the process of converting into the tree structure in detail with reference to the accompanying drawings:
step C1: as shown in fig. 5, a circumscribed cube is divided into 8 cubes a along the center line of each face, the 8 cubes a are traversed sequentially, if a point exists in the cube a, the cube a is marked as 1, and the cube a without the point inside is marked as 0;
step C2: as shown in fig. 6, step C1 above is repeated on each cube a marked 1: the cube a marked 1 is cut along the center line of each face into 8 smaller cubes b, and the 8 cubes b are traversed in turn; if a point exists in a cube b, it is marked 1, and a cube b with no point inside is marked 0. Step C1 is repeated m times, until no further downward division is possible;
step C3: finally, data containing only 0 and 1 is obtained and recorded as a tree of m layers, as shown in fig. 7. Every point at which a depth value exists can be found in the tree; that is, we encode the depth image as a tree containing only 0 and 1. It should be understood that since step C1 above can be performed m times, the tree structure is an m-level tree structure.
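A minimal recursive sketch of this cutting-and-labeling process (illustrative Python; representing the tree as nested lists of 0s and 1s is our choice, not the patent's):

```python
def build_tree(voxels, origin, size):
    """Cut a cube of side `size` (a power of two) into 8 sub-cubes and
    recurse into the occupied ones. 0 = empty cube, 1 = occupied unit
    cube, and a list of 8 entries = an occupied cube that was cut again."""
    if not voxels:
        return 0
    if size == 1:
        return 1
    half = size // 2
    children = []
    for dx in (0, half):
        for dy in (0, half):
            for dz in (0, half):
                ox, oy, oz = origin[0] + dx, origin[1] + dy, origin[2] + dz
                inside = [(x, y, z) for (x, y, z) in voxels
                          if ox <= x < ox + half and oy <= y < oy + half
                          and oz <= z < oz + half]
                children.append(build_tree(inside, (ox, oy, oz), half))
    return children

tree = build_tree([(0, 0, 0), (3, 2, 1)], origin=(0, 0, 0), size=4)
```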
In addition, since the circumscribed cube needs to be cut along center lines, when constructing the circumscribed cube for wrapping the three-dimensional target object, the smallest cube wrapping the three-dimensional target object can be constructed first; if the edge length of this smallest cube is L and L ≤ 2^n, the edge length is enlarged to 2^n, forming a circumscribed cube of edge length 2^n that wraps the target object corresponding to the depth map, where n is a positive integer. For example, if n is 1, it is guaranteed that after performing the cutting operation of step C1 once, the side length of the cut cubes is the positive integer 1; if n is 2, the same is guaranteed after performing the cutting operation of step C1 twice. Therefore, when m ≤ n, no floating-point data appears anywhere in the calculation. In this scheme, the edge length of the circumscribed cube is made equal to 2^n by suitable enlargement, which increases the probability that the edge lengths of the cut cubes are integers, reduces the amount of calculation, and increases the calculation speed.
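Choosing the enlarged edge length 2^n can be sketched like this (a hypothetical helper; voxel coordinates are assumed to be non-negative integers):

```python
def circumscribed_side(voxels):
    """Smallest power-of-two edge length 2**n wrapping all voxels, so
    that up to n halving cuts never produce fractional edge lengths."""
    extent = max(max(x, y, z) for x, y, z in voxels) + 1
    side = 1
    while side < extent:
        side *= 2
    return side

side = circumscribed_side([(0, 0, 0), (3, 2, 5)])  # extent 6 -> side 8
```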
In addition, in the present application, when determining a three-dimensional target object in a depth map, the three-dimensional target object may also be determined by:
determining the resolution information of the depth map; calculating the coordinate value of each pixel point according to the resolution information; acquiring the depth value at the position of each pixel point; converting each pixel point into a voxel with a depth-direction coordinate according to its coordinate value and depth value; when the voxel conversion is finished, generating a first table recording the voxel distribution along a given coordinate axis; judging from the first table whether gaps exist between voxels on that axis; when gaps exist, eliminating the gaps between voxels; and determining the object formed by the gap-eliminated voxels as the three-dimensional target object, and generating a second table having a mapping relation with the first table, the second table recording the distribution of the voxels after gap elimination. For example, take the z-axis, which represents the depth direction of the depth map, and suppose only 4 positions on the z-axis carry depth values, namely 87, 987, 6793 and 31723. Entropy coding these four depth values directly would require a large amount of calculation; therefore, in this application, a first table recording the distribution of voxels on the z-axis is generated, as shown in Table 4 below:
TABLE 4
As can be seen from Table 4, there are gaps between the voxels on the z-axis, so the gaps must be eliminated: specifically, 87 is moved to the position with z-coordinate 0, 987 to z-coordinate 1, 6793 to z-coordinate 2, and 31723 to z-coordinate 3. Table 4 above is then mapped to generate a second table having a mapping relation with the first table, recording the distribution of the voxels after gap elimination, as shown in Table 5 below:
TABLE 5
It should be understood that, when the compressed depth map is restored to the original depth map, the restoration may be performed by referring to the mapping relationship of table 4 and table 5.
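The two tables and their mapping relationship can be sketched as follows (an illustrative helper; only the mapping logic of Tables 4 and 5 is reproduced, not their layout):

```python
def eliminate_gaps(z_values):
    """Pack the distinct z values onto 0..k-1 (as in Table 5) and keep the
    sorted originals (as in Table 4) so the decoder can restore them."""
    distinct = sorted(set(z_values))              # e.g. 87, 987, 6793, 31723
    forward = {z: i for i, z in enumerate(distinct)}
    packed = [forward[z] for z in z_values]       # e.g. 0, 1, 2, 3
    return packed, distinct

packed, restore = eliminate_gaps([87, 987, 6793, 31723])
original = [restore[i] for i in packed]           # round-trips to the input
```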
The beneficial effect of this embodiment lies in: when the depth map does not meet the preset condition, the circumscribed cube can be converted, according to the voxel distribution within it, into a preset data structure containing only 0 and 1 to represent the depth map to be processed. Since the compressed depth map data obtained by encoding this preset data structure contains only 0 and 1, the compression rate is improved compared with the traditional approach of compressing the depth map into a picture of another format.
In one embodiment, the above step B2 may be implemented as the following steps D1-D8:
in step D1, resolution information of the depth map is determined;
in step D2, calculating coordinate values of each pixel point according to the resolution information of the depth map;
in step D3, obtaining a depth value of the position of each pixel point;
in step D4, converting the pixel point into a voxel having a depth direction coordinate according to the coordinate value and the depth value of the pixel point;
in step D5, when the voxel conversion is completed, a first table for recording the distribution of voxels on the same coordinate axis is generated;
in step D6, it is determined whether or not a gap exists between voxels on the same coordinate axis based on the first table;
in step D7, when a gap exists, the coordinate values on that axis of the voxels after the gap are reduced, so as to eliminate the gap between the voxels;
in step D8, the object formed of the voxels from which the gap is eliminated is determined to be a three-dimensional target object.
In one embodiment, the above step B4 may be implemented as the following steps E1-E6:
in step E1, performing a cutting operation on the circumscribed cube to cut the circumscribed cube into a plurality of first cubes having the same volume;
in step E2, an analysis operation is performed on the first cubes to determine whether a voxel is present in each first cube;
in step E3, the first cube with no voxels present is labeled 0;
in step E4, the first cube in which a voxel is present is labeled 1;
in step E5, continuing to perform the cutting and analyzing operation on the first cube marked as 1 until the number of cutting operations reaches m times, and determining that the cutting of the circumscribed cube is completed, where m is a positive integer;
in step E6, the cut-completed circumscribed cube is encoded into an m-level tree structure containing 0 s and 1 s.
The above process of converting the depth map into an m-level tree is a division process: if the number of voxels is p and the cutting operation is performed m times in total, the operation scale of the whole process is p × m.
For example, if p is 279664, the number of processing operations of the whole calculation process is shown in Table 6 below:
TABLE 6
As can be seen, the above method finally requires 2237312 (279664 × 8) operations, a huge operation scale. Therefore, given concerns about octree performance and the high real-time requirement of volumetric video, we change our approach and consider the problem from the convergence of a geometric progression:
if there are the following numbers: p1, p1 r, p1 r2,...p1*r(n-1),..; the sum of which is: sn = ∑ p1 × r(n-1)(ii) a The summation formula is: sn = p1 (r)n-1)/(r-1); if 0<r<Sn converges at 1.
Based on this idea, if the processing of voxels is changed from a division process to a merging process, i.e., from top-down division to bottom-up merging, an attenuation factor r as in a geometric convergence process arises; as the merging continues, the per-layer amount of computation decays by this factor, so the data compression is completed at an accelerated speed.
For the above reasons, we provide a scheme for compressing voxels through a merging process, which is as follows:
in one embodiment, the above step B4 may also be implemented as the following steps F1-F3:
in step F1, determining each voxel within the circumscribed cube as a box of unit side length;
in step F2, performing a heap-building operation based on the boxes of unit side length to obtain an octree heap structure of the point cloud data;
in step F3, entropy encoding is performed on the octree heap structure to obtain the encoded point cloud data. Step F2 can be realized by the following step a1:
in step a1, taking the boxes of unit side length as the first layer and the first layer as the initial layer, merging proceeds layer by layer upwards; merging stops when the side length of the boxes at the n-th layer equals the side length of the circumscribed cube, yielding the octree heap structure of the point cloud data, where n is a positive integer.
Specifically, step a1 can be further realized as steps S111-S112:
In step S111: for any 2^k × 2^k × 2^k box, the upper-layer box corresponding to that box is calculated, where the side length of the upper-layer box is 2^(k+1) × 2^(k+1) × 2^(k+1).
In step S112: the value of k is set to 0, 1, ..., n in turn and step S111 is executed repeatedly; when the side length of the upper-layer box equals the side length of the circumscribed cube, step S111 stops, yielding the octree heap structure of the point cloud data; where n is an integer.
Step S111 may further be implemented as: for any 2^k × 2^k × 2^k box, if the box has no corresponding upper-layer box, the upper-layer box corresponding to it is established; if the box already has a corresponding upper-layer box, the box is merged into it.
Specifically: the unit side length can be regarded as 1, so determining each voxel in the circumscribed cube as a box of unit side length means regarding each voxel as a 1×1×1 box. The algorithm is then as follows:
The first step: map each current 2^k × 2^k × 2^k box to the position of its parent box, the 2^(k+1) × 2^(k+1) × 2^(k+1) box whose side length is twice as long. In this case, at most eight boxes may share the same parent box.
The second step: let k = k + 1 and repeat the first step until the side length of the current box equals that of the circumscribed cube.
This process conforms to the idea of a heap, with the top of the heap being produced at the end.
It should be noted that k = k + 1 is a programming idiom rather than a mathematical equation: it means that k is incremented with a step size of 1 each time the first step above is performed.
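A compact sketch of this bottom-up merging (illustrative Python; identifying each box by its integer grid coordinates at its layer is our representation, not the patent's):

```python
def build_octree_heap(voxels, cube_side):
    """Treat each voxel as a 1x1x1 box, then repeatedly map every box to
    its parent (coordinates halved) until one box spans the whole cube.
    Returns the set of non-empty boxes at each layer, bottom layer first."""
    layers = [set(voxels)]          # layer 0: the unit boxes
    side = 1
    while side < cube_side:
        parents = {(x // 2, y // 2, z // 2) for (x, y, z) in layers[-1]}
        layers.append(parents)      # at most 8 children share one parent
        side *= 2
    return layers

layers = build_octree_heap([(0, 0, 0), (3, 2, 1), (3, 3, 3)], cube_side=4)
for k, boxes in enumerate(layers):
    print(f"layer {k}: {len(boxes)} non-empty boxes")
```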
The number of calculations for the entire merging process is as follows:
Table 7:
From the above table it can be seen that, although the attenuation factor is not constant, an approximate value can be obtained: r ≈ 0.24.
Compared with the division process described above, there are 369861 sub-boxes in total, so the merging process is performed 369861 times in total. That is, the plain division process requires 2237312 operations while the merging process requires 369861. In terms of the number of operations, the merging scheme performs better than the division scheme.
Table 7 reflects the attenuation factor in the case of relatively dense voxels. In the case of sparse voxels, the attenuation may not reach the effect of Table 7; the attenuation rate in the first few layers is small or even absent. Specifically, in the sparse-voxel case, the attenuation pattern can be as shown in Table 8 below:
TABLE 8
As can be seen from Table 8 above, the attenuation effect is poor until layer 5. This is because the distribution of voxels over the integer space is very sparse, so the stacking effect is poor in the bottom layers.
For the above case, step F2 can also be implemented through the following steps b1-b3:
in step b1, based on the boxes with the unit side length, selecting partial sample voxels as detection data, taking the first layer as an initial layer, and merging the voxels layer by layer upwards to obtain the number of boxes in each layer;
in step b2, starting from the second layer, for any layer of boxes, the ratio of the number of boxes in the current layer to the number of boxes in the layer below it is calculated; this ratio is the detection attenuation factor of the current layer;
Step b1 may be embodied as the following steps S211-S212:
In step S211: for any 2^k × 2^k × 2^k box among the selected sample voxels, the upper-layer box corresponding to that box is calculated, where the side length of the upper-layer box is 2^(k+1) × 2^(k+1) × 2^(k+1).
In step S212: the value of k is set to 0, 1, ..., n in turn, step S211 is executed repeatedly, and the number of boxes in each layer is counted.
In step b3, when the detection attenuation factor is lower than a set threshold, taking the current layer corresponding to the current detection attenuation factor as an initial layer, merging layer by layer upwards, and stopping merging until the side length of the box of the nth layer is equal to that of the external cube, so as to obtain an eight-fork stack structure of the point cloud data; wherein n is a positive integer.
Step b3 may be embodied as the following steps S311-S313:
in step S311, the side length of the mth box is calculated according to the side length of the first box; wherein the mth layer is the initial layer;
in step S312, for any one of 2k*2k*2kCalculating the upper layer box corresponding to the box; wherein the side length of the upper layer of boxes is 2k+1*2k+1*2k+1
Step S312 may be specifically implemented as: for any one of 2k*2k*2kIf the box does not have a corresponding upper layer box, establishing the upper layer box corresponding to the box; and if the box has a corresponding upper layer box, merging the box into the corresponding upper layer box.
In step S313, the values of k are sequentially m and m +1 … n, and step S312 is repeatedly executed until the side length of the box on the upper layer is equal to the side length of the external cube, and then step S312 is stopped to be executed, so as to obtain an eight-fork stack structure of the point cloud data; wherein m is the number of initial layers and n is an integer.
The number of non-empty boxes in each layer is counted; the target layer at which the attenuation rate of the boxes reaches a preset rate is determined from the statistics; and the heap-building operation is carried out from the target layer according to the positions of each layer of boxes and the corresponding next layer of boxes. For the boxes above the target layer, the position of each layer of boxes above the target layer is acquired and recorded by means of partitioning.
Specifically, for the sparse-voxel case, a detection attenuation factor r1 may be determined such that r1 = Cur_B_Count / Point_Count; if r1 < T, the heap is built starting from this layer. Here Cur_B_Count is the number of merged parent boxes at the layer and Point_Count is the number of original voxels. T is an empirical threshold, usually set to 0.6; it is clear from the table above that when the attenuation factor of layer 8 is < 0.6, the heap attenuates rapidly from layer 8 onward.
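A sketch of this probe (hedged: the description probes only a sample of the voxels, while for brevity this illustration merges all of them; the variable names mirror Cur_B_Count and Point_Count above):

```python
def heap_start_layer(voxels, cube_side, T=0.6):
    """Merge layer by layer and return the first layer whose detection
    attenuation factor r1 = Cur_B_Count / Point_Count drops below T."""
    point_count = len(voxels)
    boxes, layer, side = set(voxels), 0, 1
    while side < cube_side:
        boxes = {(x // 2, y // 2, z // 2) for (x, y, z) in boxes}
        layer, side = layer + 1, side * 2
        if len(boxes) / point_count < T:   # Cur_B_Count / Point_Count
            return layer                   # build the heap from here upward
    return layer                           # never dropped below T: top layer

start = heap_start_layer([(0, 0, 0), (1, 1, 1), (6, 6, 6)], cube_side=8)
```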
Introducing the concept of the heap yields an unrivaled speed advantage on dense point clouds, and good speed is still maintained on sparse point clouds. This is another sound encoding method alongside the scheme of compression encoding according to a tree structure.
Fig. 8 is a block diagram of a depth map compression apparatus according to an embodiment of the present application, as shown in fig. 8, the apparatus includes the following modules:
a first obtaining module 81, configured to obtain a depth map to be compressed;
a second obtaining module 82, configured to obtain all pixel points in the depth map;
a determining module 83, configured to determine a concentrated region of the information entropies of all the pixel points;
the splitting module 84 is configured to split the depth map according to the concentration region, wherein all pixel points included in the concentration region are split into main maps, and all pixel points included outside the concentration region are split into sub-maps;
and the encoding module 85 is configured to input information corresponding to the main map and the sub map into the two-dimensional encoding device respectively for encoding, so that the two-dimensional encoding device outputs a video code stream corresponding to the depth map.
In one embodiment, the determining module includes:
the first determining submodule is used for determining the values of all the pixel points;
the conversion submodule is used for converting the values of all the pixel points into binary numbers;
the statistic submodule is used for carrying out bit-plane statistics on the value of each pixel point in the depth map according to the binary number;
and the second determining submodule is used for determining the concentrated region of the information entropy of all the pixel points according to the bit plane statistical result.
In one embodiment, the second determining submodule is specifically configured to:
generating a corresponding integral graph according to the bit plane statistical result;
and determining a bit plane corresponding to the maximum value of the integral image item as a concentrated region of the information entropy.
In one embodiment, an encoding module comprises:
a third determining sub-module, configured to determine a high bitmap and a low bitmap in the primary map and the secondary map;
the generating submodule is used for generating a complementary graph corresponding to the low bitmap;
a comparison sub-module for comparing the compression rates of the low bitmap and the complement after encoding;
the first input sub-module is used for inputting the low bitmap and the high bitmap into two-dimensional coding equipment for coding if the compression rate after the low bitmap is coded is higher than the compression rate after the complement picture is coded;
and the second input sub-module is used for inputting the complementary image corresponding to the low bitmap and the high bitmap into the two-dimensional coding equipment for coding if the compression rate after the low bitmap is coded is lower than the compression rate after the complementary image is coded.
Fig. 9 is a schematic hardware structure diagram of a depth map compression system provided in the present application, including:
at least one processor 920; and
a memory 904 communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to implement the depth map compression method of any of the above embodiments.
Referring to fig. 9, the depth map compression system 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output interface 912, sensor component 914, and communication component 916.
The processing component 902 generally controls the overall operation of the depth map compression system 900, and the processing component 902 may include one or more processors 920 to execute instructions to perform all or part of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation in the depth map compression system 900. Examples of such data include instructions for any application or method operating on the depth map compression system 900, such as text, pictures, video, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 906 provides power to the various components of the depth map compression system 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power supplies for the depth map compression system 900.
The multimedia component 908 includes a screen that provides an output interface between the depth map compression system 900 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 may also include a front facing camera and/or a rear facing camera. When the depth map compression system 900 is in an operating mode, such as a capture mode or a video mode, the front-facing camera and/or the back-facing camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a Microphone (MIC) configured to receive external audio signals when the depth map compression system 900 is in an operating mode, such as a call mode, a recording mode, and a speech recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
Input/output interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing state estimates of various aspects of the depth map compression system 900. For example, the sensor component 914 may include a sound sensor. It may also detect the open/closed state of the depth map compression system 900 and the relative positioning of components, such as the display and keypad of the system. In addition, it may detect a change in position of the depth map compression system 900 or one of its components, the presence or absence of user contact with the system, the orientation or acceleration/deceleration of the system, and changes in its temperature. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. It may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to enable wired or wireless communication between the depth map compression system 900 and other devices or cloud platforms. The depth map compression system 900 may access wireless networks based on communication standards such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the depth map compression system 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the depth map compression methods described above.
The present application further provides a computer-readable storage medium. When the instructions in the storage medium are executed by a processor corresponding to the depth map compression system, they enable the depth map compression system to implement the depth map compression method described in any of the above embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A depth map compression method, comprising:
acquiring a depth map to be compressed;
acquiring all pixel points in the depth map;
determining a concentrated region of the information entropies of all the pixel points;
splitting the depth map according to the concentrated region, wherein all pixel points contained in the concentrated region are split into a main map, and all pixel points outside the concentrated region are split into an auxiliary map;
and respectively inputting the information corresponding to the main map and the auxiliary map into a two-dimensional coding device for coding, so that the two-dimensional coding device outputs a video code stream corresponding to the depth map.
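By way of illustration only, the following Python sketch shows one plausible reading of the claimed pipeline, in which the concentrated region is interpreted as a bit plane and the main and auxiliary maps carry the high-order and low-order bits of a 16-bit depth map so that each part fits an ordinary 8-bit two-dimensional encoder. The function names, the 16-bit format, and this bit-plane reading are assumptions, not part of the claim.
```python
# Illustrative sketch only; names and the bit-plane split are assumptions.
import numpy as np

def split_depth_map(depth: np.ndarray, plane: int):
    """Split a 16-bit depth map at the entropy-concentration bit plane."""
    main = (depth >> plane).astype(np.uint8)             # high-order bits -> main map
    aux = (depth & ((1 << plane) - 1)).astype(np.uint8)  # low-order bits -> auxiliary map
    return main, aux

depth = np.random.randint(0, 2**16, size=(480, 640), dtype=np.uint16)
main_map, aux_map = split_depth_map(depth, plane=8)
# main_map and aux_map can each be handed to an 8-bit 2D video encoder.
```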
2. The method of claim 1, wherein the determining of the concentrated region of the information entropies of all the pixel points comprises:
determining the values of all the pixel points;
converting the values of all the pixel points into binary numbers;
carrying out bit-plane statistics on the value of each pixel point in the depth map according to the binary numbers;
and determining the concentrated region of the information entropies of all the pixel points according to the bit-plane statistical result.
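As an illustrative reading of this statistic, the sketch below counts set bits per bit plane and converts the frequencies into a per-plane binary entropy. The claim does not specify the exact statistic, so the choice of a set-bit frequency and binary entropy is an assumption.
```python
# Assumed statistic: per-plane set-bit frequency turned into binary entropy.
import numpy as np

def bit_plane_entropy(depth: np.ndarray, bits: int = 16) -> np.ndarray:
    """Binary entropy of each bit plane of a depth map."""
    entropy = np.empty(bits)
    for b in range(bits):
        p = float(((depth >> b) & 1).mean())   # fraction of pixels with bit b set
        p = min(max(p, 1e-12), 1.0 - 1e-12)    # guard against log2(0)
        entropy[b] = -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))
    return entropy
```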
3. The method of claim 2, wherein the determining of the concentrated region of the information entropies of all the pixel points according to the bit-plane statistical result comprises:
generating a corresponding integral graph according to the bit-plane statistical result;
and determining the bit plane corresponding to the maximum item of the integral graph as the concentrated region of the information entropy.
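A minimal sketch of one way to realize this step, assuming the "integral graph" is read as a sliding-window sum over the per-plane statistics and that the plane at its maximum item marks the concentration region; the window width of 8 is an arbitrary assumption.
```python
# Assumption: the "integral graph" is a sliding-window sum over per-plane stats.
import numpy as np

def concentration_plane(plane_stats: np.ndarray, window: int = 8) -> int:
    """Return the first plane of the window whose summed statistic is largest."""
    sums = np.convolve(plane_stats, np.ones(window), mode="valid")  # windowed sums
    return int(np.argmax(sums))                                     # max item
```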
4. The method of claim 1, wherein the inputting of the information corresponding to the main map and the auxiliary map into a two-dimensional coding device for coding comprises:
determining a high bitmap and a low bitmap in the main map and the auxiliary map;
generating a complement map corresponding to the low bitmap;
comparing the compression rates of the low bitmap and the complement map after encoding;
if the compression rate of the encoded low bitmap is higher than that of the encoded complement map, inputting the low bitmap and the high bitmap into the two-dimensional coding device for coding;
and if the compression rate of the encoded low bitmap is lower than that of the encoded complement map, inputting the complement map corresponding to the low bitmap and the high bitmap into the two-dimensional coding device for coding.
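The comparison in this claim can be sketched as follows, with zlib standing in for the two-dimensional coding device purely so the example is self-contained; that substitution and the function name are assumptions.
```python
# zlib stands in for the two-dimensional coding device (an assumption).
import zlib
import numpy as np

def choose_low_map(low_bitmap: np.ndarray) -> np.ndarray:
    """Keep the low bitmap or its complement map, whichever compresses better."""
    complement = np.invert(low_bitmap)                  # bitwise complement map
    size_low = len(zlib.compress(low_bitmap.tobytes()))
    size_comp = len(zlib.compress(complement.tobytes()))
    # A higher compression rate corresponds to a smaller encoded size.
    return low_bitmap if size_low <= size_comp else complement
```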
5. A depth map compression apparatus, comprising:
an acquisition module, configured to acquire a depth map to be compressed and all pixel points in the depth map;
a determining module, configured to determine a concentrated region of the information entropies of all the pixel points;
a splitting module, configured to split the depth map according to the concentrated region, wherein all pixel points contained in the concentrated region are split into a main map, and all pixel points outside the concentrated region are split into an auxiliary map;
and an encoding module, configured to respectively input the information corresponding to the main map and the auxiliary map into a two-dimensional coding device for coding, so that the two-dimensional coding device outputs a video code stream corresponding to the depth map.
6. The apparatus of claim 5, wherein the determining module comprises:
a first determining submodule, configured to determine the values of all the pixel points;
a conversion submodule, configured to convert the values of all the pixel points into binary numbers;
a statistics submodule, configured to carry out bit-plane statistics on the value of each pixel point in the depth map according to the binary numbers;
and a second determining submodule, configured to determine the concentrated region of the information entropies of all the pixel points according to the bit-plane statistical result.
7. The apparatus of claim 6, wherein the second determining submodule is specifically configured to:
generate a corresponding integral graph according to the bit-plane statistical result;
and determine the bit plane corresponding to the maximum item of the integral graph as the concentrated region of the information entropy.
8. The apparatus of claim 5, wherein the encoding module comprises:
a third determining submodule, configured to determine a high bitmap and a low bitmap in the main map and the auxiliary map;
a generating submodule, configured to generate a complement map corresponding to the low bitmap;
a comparison submodule, configured to compare the compression rates of the low bitmap and the complement map after encoding;
a first input submodule, configured to input the low bitmap and the high bitmap into the two-dimensional coding device for coding if the compression rate of the encoded low bitmap is higher than that of the encoded complement map;
and a second input submodule, configured to input the complement map corresponding to the low bitmap and the high bitmap into the two-dimensional coding device for coding if the compression rate of the encoded low bitmap is lower than that of the encoded complement map.
9. A depth map compression system, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to implement the depth map compression method of any one of claims 1-4.
10. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor corresponding to a depth map compression system, enable the depth map compression system to implement the depth map compression method of any one of claims 1-4.
CN202111046709.1A 2021-09-08 2021-09-08 Depth map compression method, device, system and storage medium Active CN113727105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111046709.1A CN113727105B (en) 2021-09-08 2021-09-08 Depth map compression method, device, system and storage medium

Publications (2)

Publication Number Publication Date
CN113727105A 2021-11-30
CN113727105B CN113727105B (en) 2022-04-26

Family

ID=78682458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111046709.1A Active CN113727105B (en) 2021-09-08 2021-09-08 Depth map compression method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN113727105B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1234944A * 1997-01-27 1999-11-10 Koninklijke Philips Electronics N.V. Embedding supplemental data in an encoded signal
CN102577350A * 2009-09-03 2012-07-11 Panasonic Corporation Image processing device and image processing method
WO2012060172A1 * 2010-11-04 2012-05-10 Sharp Corporation Video encoding device, video decoding device, video transmission system, control methods for the encoding and decoding devices, control programs for the encoding and decoding devices, and recording medium
US20120183057A1 * 2011-01-14 2012-07-19 Samsung Electronics Co., Ltd. System, apparatus, and method for encoding and decoding depth image
CN103517065A * 2013-09-09 2014-01-15 Ningbo University Method for objectively evaluating quality of degraded reference three-dimensional picture
CN103763564A * 2014-01-09 2014-04-30 Taiyuan University of Science and Technology Depth image coding method based on edge lossless compression
CN108513131A * 2018-03-28 2018-09-07 Zhejiang University of Technology Region-of-interest method for free-viewpoint video depth map encoding
CN112106373A * 2018-03-28 2020-12-18 Electronics and Telecommunications Research Institute Method and apparatus for image encoding/decoding and recording medium storing bitstream
CN111107377A * 2018-10-26 2020-05-05 Yaoke Intelligent Technology (Shanghai) Co., Ltd. Depth image compression method, device, equipment and storage medium
CN109640067A * 2018-12-10 2019-04-16 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image transmission method, device, system, electronic device, and readable storage medium
FR3093884A1 * 2019-03-15 2020-09-18 Orange Methods and devices for encoding and decoding a multi-view video sequence
CN111836051A * 2019-04-15 2020-10-27 Sangfor Technologies Inc. Desktop image coding and decoding methods and related devices
CN110874851A * 2019-10-25 2020-03-10 Shenzhen Orbbec Co., Ltd. Method, device, system and readable storage medium for reconstructing three-dimensional model of human body
CN112055223A * 2020-08-21 2020-12-08 Zhejiang Dahua Technology Co., Ltd. Image coding and decoding method and coder-decoder

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Gonçalo Carmo et al., "Binary tree decomposition depth coding for 3D video applications", 2011 IEEE International Conference on Multimedia and Expo *
Sun Hongzhi et al., "A polynomial-based image compression method", Seismological Research of Northeast China *
Huang Zhenwei, "Research on key technologies for real-time ultra-high-definition glasses-free 3D display", China Masters' Theses Full-text Database *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114772159A * 2022-03-31 2022-07-22 Guoneng Yulin Energy Co., Ltd. Coal flow control method, system and storage medium
CN115529357A * 2022-11-24 2022-12-27 Wuhan Huiqiang New Energy Material Technology Co., Ltd. Update anomaly matching method based on MES interconnected production data

Also Published As

Publication number Publication date
CN113727105B (en) 2022-04-26

Similar Documents

Publication Publication Date Title
CN113727105B (en) Depth map compression method, device, system and storage medium
Nguyen et al. Learning-based lossless compression of 3d point cloud geometry
EP3389276B1 (en) Hash-based encoder decisions for video coding
US20020126755A1 (en) System and process for broadcast and communication with very low bit-rate bi-level or sketch video
KR20230058133A (en) Video motion estimation method and apparatus, device, computer-readable storage medium, and computer program product
JP2018505571A (en) Image compression method / apparatus and server
Lin et al. A depth information based fast mode decision algorithm for color plus depth-map 3D videos
CN114079779B (en) Image processing method, intelligent terminal and storage medium
WO2022068716A1 (en) Entropy encoding/decoding method and device
JP2009501479A (en) Image coder for texture areas
KR20110115357A (en) Method and apparatus for generating of animation message
KR101812103B1 (en) Method and program for setting thumbnail image
US20240105193A1 (en) Feature Data Encoding and Decoding Method and Apparatus
US20240195968A1 (en) Method for video processing, electronic device, and storage medium
CN113487690B (en) Depth map processing method, system and storage medium
US20230388490A1 (en) Encoding method, decoding method, and device
CN112183227B (en) Intelligent face region coding method and device
WO2022141222A1 (en) Virtual viewport generation method and apparatus, rendering and decoding methods and apparatuses, device and storage medium
CN113487691B (en) Point cloud coding method, system and storage medium
Zhang et al. An early CU partition mode decision algorithm in VVC based on variogram for virtual reality 360 degree videos
KR101930389B1 (en) Video File Compression Method, Device and Computer Program Thereof
US20030026338A1 (en) Automated mask selection in object-based video encoding
Perra On the quality evaluation of lossy compressed light fields
Gnutti et al. LiDAR Depth Map Guided Image Compression Model
CN113259663B (en) Image block division method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Huang Fupeng
Inventor after: Qu Tan
Inventor after: Li Xiangyu
Inventor after: Fan Wenxin
Inventor after: Yan Zhongliang
Inventor before: Huang Fupeng
Inventor before: Li Xiangyu
Inventor before: Fan Wenxin
Inventor before: Yan Zhongliang