CN114827606A - Quick decision-making method for coding unit division

Info

Publication number: CN114827606A
Application number: CN202210584047.1A
Authority: CN
Inventors: 杨威; 杨金锋; 景晓军; 袁航; 江巧捷; 邓达豪
Original assignee: Guangdong Southern Planning & Designing Institute Of Telecom Consultation Co ltd
Current assignee: Guangdong Southern Planning & Designing Institute Of Telecom Consultation Co ltd
Priority date: 2022-05-26
Filing date: 2022-05-26
Publication date: 2022-07-29

Abstract

The invention belongs to the technical field of video coding and discloses a fast decision method for coding unit partitioning, comprising the following steps: before the current Coding Unit (CU) starts to traverse all partition modes, checking its size; if the width and height of the current CU are equal, obtaining the Local Binary Pattern (LBP) feature of the current CU; if the LBP feature value is smaller than a threshold TH1, the texture of the current CU is simple, and multi-type tree partitioning is skipped; if the LBP feature value is greater than or equal to the threshold TH1, the texture of the current CU is complex, the variance of the pixel variances of the sub-blocks of the current CU is further obtained, and the partition mode corresponding to the largest variance is selected as the optimal partition mode. The invention can reduce the complexity of the encoding process and the corresponding encoding time.

Description

Quick decision-making method for coding unit division

Technical Field

The invention belongs to the technical field of video coding processing, and particularly relates to a quick decision method for coding unit division.

Background

With the development of the video market, ultra-high-definition (UHD) video and virtual reality (VR) video are becoming increasingly popular because they provide more realistic perceptual quality. Correspondingly, UHD and VR videos have high resolution and a large dynamic range of brightness, which dramatically increases the data volume, and the HEVC (High Efficiency Video Coding) standard cannot provide the compression capability required by the future market. The Joint Video Exploration Team (JVET) is developing the next-generation standard VVC (Versatile Video Coding). To further improve the coding performance of intra prediction, a number of novel coding techniques have been proposed in VVC, such as wide-angle intra prediction for non-square blocks, multi-reference line intra prediction, and intra sub-partitioning. These techniques significantly enhance encoder performance compared with HEVC, but at the same time the complexity of the encoder increases dramatically. In intra mode, the coding complexity of VVC is 18 times higher than that of HEVC on average, so it is necessary to study fast coding algorithms for video intra mode.

In HEVC, coding unit partitioning adopts a quadtree structure, and the decision of Coding Unit (CU) size occupies most of the coding time. In VVC, coding unit partitioning adopts a quadtree plus Multi-type Tree (QTMT) structure, which can partition a CU into squares and rectangles, so that VVC CUs can adapt to more texture patterns of video content, at the cost of more computation. In the video coding standard, the RD costs of all possible CUs are first checked by brute-force Rate-Distortion Optimization (RDO), and the combination of CUs with the smallest RD cost is then selected as the CTU partitioning result. If the partitioning result of the CTU can be determined quickly, the full brute-force search is avoided, the encoding time is greatly shortened, and the complexity of the encoder is reduced. On the other hand, designing a fast CTU partition decision algorithm is challenging: if only time reduction is pursued, the partition algorithm is not accurate enough, too many wrong decisions are made, and coding performance is reduced. Therefore, fast CTU partition decision algorithms for video intra coding need to be further explored to balance coding time against coding performance.
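The brute-force selection described above can be illustrated with a minimal sketch. It assumes the commonly used Lagrangian form of the RD cost, J = D + λ·R, which the text does not spell out; the candidate names, distortion and rate figures, and the λ values are hypothetical and serve only to show how the minimum-cost candidate is picked.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def best_candidate(candidates, lam):
    """Return the candidate with the smallest RD cost.

    `candidates` is a list of dicts with hypothetical keys
    'name', 'distortion' and 'rate_bits'.
    """
    return min(
        candidates,
        key=lambda c: rd_cost(c["distortion"], c["rate_bits"], lam),
    )


# Hypothetical numbers, for illustration only.
CANDIDATES = [
    {"name": "no_split", "distortion": 1000, "rate_bits": 10},
    {"name": "quadtree", "distortion": 400, "rate_bits": 60},
]
```

With a small λ the lower-distortion split wins; with a large λ the cheaper-to-signal unsplit CU wins. A fast decision method aims to avoid evaluating most of these candidates at all.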

A survey of domestic and foreign literature shows that fast CTU partition decision algorithms for intra coding fall mainly into two categories: heuristic methods and data-driven methods. Heuristic methods extract intermediate features from the encoding process, for example texture homogeneity and spatial correlation, build statistical models, use the models to determine the CTU partition decision as early as possible, and terminate the RDO search early so that unnecessary searches in the CTU partitioning process are skipped and encoding time is shortened. Data-driven methods regard the CU size decision in intra mode as a multi-class classification problem and use a deep learning model to learn the CTU partition pattern automatically from sufficient data; this avoids the heavy dependence of heuristic methods on manual feature extraction, but the complex network computation adds to the computational burden of the encoder.

Disclosure of Invention

Embodiments of the present invention provide a method for fast deciding coding unit partition, which can reduce the complexity of the coding process and the coding time consumed by the coding process.

The embodiment of the invention is realized as follows:

a fast decision method for coding unit division comprises the following steps:

before the current Coding Unit (CU) starts to traverse all partition modes, checking its size;

if the width and the height of the current Coding Unit (CU) are not equal, selecting an optimal partition mode by using the original rate-distortion optimization traversal method;

if the width and the height of the current Coding Unit (CU) are equal, acquiring a Local Binary Pattern (LBP) characteristic of the current Coding Unit (CU);

if the Local Binary Pattern (LBP) characteristic value is smaller than a threshold TH1, the texture characteristic of the current Coding Unit (CU) is simple, and the division of the multi-type tree is skipped;

if the characteristic value of the Local Binary Pattern (LBP) is larger than or equal to the threshold TH1, the texture characteristics of the current Coding Unit (CU) are complex, the variance of the pixel variance of each sub-block of the current Coding Unit (CU) is further obtained, and the division mode corresponding to the maximum variance is selected as the optimal division mode; the threshold TH1 is a preset value, and the division mode is a horizontal binary tree division mode, a horizontal ternary tree division mode, a vertical binary tree division mode or a vertical ternary tree division mode;

specifically, whether the complex texture of the current Coding Unit (CU) is in the horizontal direction or the vertical direction is determined according to the number of 1s in the Local Binary Pattern (LBP) feature value; if the complex texture is in the horizontal direction, the horizontal binary tree partition mode or the horizontal ternary tree partition mode is selected, and if the complex texture is in the vertical direction, the vertical binary tree partition mode or the vertical ternary tree partition mode is selected.

In VVC, pictures are coded in units of blocks. When an image is encoded, large CUs tend to be selected in uniform regions and small CUs in texture-rich regions. Where the texture is simple or consistent, the pixel values can be predicted accurately by a single intra prediction mode. In the original video encoder, the optimal CTU partition is determined by computing RD costs from top to bottom and then searching and comparing from bottom to top; the invention changes this into a top-down-only process. For the heuristic algorithm, whether to perform the next partition is decided by computing the texture complexity of the coding unit; if no partition is performed, the CTU partitioning process can be terminated early, and if a partition is performed, the partition mode to use is further determined. For the data-driven algorithm, the probability of CU partitioning is predicted before formal encoding; if the probability is greater than a threshold the CU is partitioned, and if it is smaller than the threshold the CU is not partitioned and the CTU partitioning process is terminated early. The complexity of the encoding process and the corresponding encoding time are thereby reduced. The overall CTU partitioning process is treated as a multi-layer classification problem, in which the selection of the partition mode at each depth is one classification problem.

Drawings

FIG. 1 is an exemplary diagram of the CTU partitioning in VVC;

FIG. 2 is a flow chart of a coding unit partition mode decision model of the present invention;

FIG. 3 is a diagram illustrating a multi-type tree partitioning scheme employed in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Compared with the previous generation coding technology H.265/HEVC, the H.266/VVC coding frame still consists of a plurality of modules such as block partitioning, prediction, transformation quantization, loop filtering, entropy coding and the like, but the key technology for improving the video coding effect is added in each module, and the method specifically comprises the following steps:

1. block partitioning

For the partitioning of coding units, in the HEVC standard a CTU chooses at each CU depth either not to partition or to split into smaller square CUs through a quadtree, and the coding units have four sizes, 64 × 64, 32 × 32, 16 × 16, and 8 × 8, corresponding to depth values 0, 1, 2, and 3, respectively. In the VVC standard, the CTU adopts the QTMT partition structure. The CTU may first choose not to split or to split into smaller CUs with a quadtree, and then at each CU depth may split further with a quadtree or a multi-type tree, where the multi-type tree includes a binary tree and a ternary tree, each with horizontal and vertical modes. Once a CU is partitioned using a multi-type tree, it cannot subsequently be partitioned using a quadtree. At each stage there are up to six partition modes to choose from (i.e., no split, quadtree, horizontal binary tree, vertical binary tree, horizontal ternary tree, and vertical ternary tree), until the minimum CU width or height of 4 is reached. The default CTU size is 128 × 128 and the smallest CU is 4 × 4; CUs in this partition structure may be square or rectangular and can flexibly adapt to more textures. FIG. 1 shows an example of CTU partitioning in VVC.
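As an illustration of the six partition modes listed above, the following sketch returns the sub-CU sizes each mode produces. It assumes that binary splits halve the block and that ternary splits use the 1:2:1 ratio of VVC, a detail not stated in this text; the enum and function names are hypothetical.

```python
from enum import Enum


class Split(Enum):
    NONE = "no split"
    QT = "quadtree"
    BT_H = "horizontal binary tree"
    BT_V = "vertical binary tree"
    TT_H = "horizontal ternary tree"
    TT_V = "vertical ternary tree"


def child_sizes(width, height, split):
    """Return the (width, height) of every sub-CU produced by `split`."""
    if split is Split.NONE:
        return [(width, height)]
    if split is Split.QT:
        return [(width // 2, height // 2)] * 4
    if split is Split.BT_H:  # horizontal cut: two stacked halves
        return [(width, height // 2)] * 2
    if split is Split.BT_V:  # vertical cut: two side-by-side halves
        return [(width // 2, height)] * 2
    if split is Split.TT_H:  # assumed 1:2:1 ratio along the height
        q = height // 4
        return [(width, q), (width, 2 * q), (width, q)]
    if split is Split.TT_V:  # assumed 1:2:1 ratio along the width
        q = width // 4
        return [(q, height), (2 * q, height), (q, height)]
    raise ValueError(f"unknown split mode: {split}")
```

For a 32 × 32 CU, for example, the horizontal ternary split yields 32 × 8, 32 × 16, and 32 × 8 sub-CUs, which shows how rectangular CUs arise from square ones.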

In the video coding standard, the overall CTU partitioning process consists of a top-down pass and a bottom-up pass. First, the RD cost is checked, from the CTU downward, for every CU size obtainable by all partition modes. Then, from the bottom up, the RD cost of each CU is compared with the sum of the RD costs of its sub-CUs, and the smaller value is kept as the current optimal cost, until the top layer is reached and the whole partition tree has been traversed. The CU combination corresponding to the resulting minimum RD cost of the CTU is taken as the final partitioning result. Because VVC has more CU partition modes and finer partitioning, its coding complexity is much higher than that of HEVC. For each CTU, 85 CUs need to be checked during encoding in HEVC, while in VVC this number increases to 5781. In fact, only a small part of the checked CUs appear in the final partitioning result: at least 1 and at most 64 CUs in HEVC, and at least 1 and at most 1024 CUs in VVC. If the partitioning result of the entire CTU can be accurately predicted in advance, rate-distortion cost calculations for at most 84 CUs (i.e., 85-1) and at least 21 CUs (i.e., 85-64) can be avoided in HEVC, and pre-coding of at most 5780 CUs (i.e., 5781-1) and at least 4757 CUs (i.e., 5781-1024) can be avoided in VVC. Therefore, fast coding in video intra mode should start from reducing the redundancy of the CTU partition calculation process.
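The CU counts quoted above can be checked with a few lines of arithmetic. The sketch below derives the HEVC figure from the four quadtree depths (1 + 4 + 16 + 64) and takes the VVC total of 5781 directly from the text; the variable names are illustrative only.

```python
def quadtree_cu_count(depths):
    """Number of CUs visited by a full quadtree search over `depths` levels."""
    return sum(4 ** d for d in range(depths))


# HEVC: depths 0..3 (64x64 down to 8x8).
HEVC_CHECKED = quadtree_cu_count(4)   # 1 + 4 + 16 + 64
HEVC_MAX_FINAL = 4 ** 3               # all 8x8 CUs in a 64x64 CTU

# VVC: total taken from the text; the maximum final count follows from
# tiling a 128x128 CTU entirely with 4x4 CUs.
VVC_CHECKED = 5781
VVC_MAX_FINAL = (128 // 4) ** 2

# (most avoidable, least avoidable) RD cost evaluations per CTU.
HEVC_AVOIDABLE = (HEVC_CHECKED - 1, HEVC_CHECKED - HEVC_MAX_FINAL)
VVC_AVOIDABLE = (VVC_CHECKED - 1, VVC_CHECKED - VVC_MAX_FINAL)
```

Even in the least favorable case, most of the CUs checked in VVC never appear in the final partition, which is the redundancy the method targets.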

Unlike HEVC intra mode, which uses the same CU partition for all color channels, VVC intra mode adopts independent partitioning strategies for the CTUs of the luma and chroma components. Since the luma component carries much more detail than the chroma components, the partitioning of luma CUs should be finer in order to maintain coding performance, and it therefore takes up most of the encoding time. The present invention is directed to CTU partitioning decisions for the luma component. In video coding, CTU partitioning proceeds layer by layer and a CU has multiple partition modes at each depth, so the entire CTU has a very large number of possible partitioning results; if the CTU partitioning problem is treated directly as a single multi-class classification problem, the number of classes makes it difficult to handle. The invention instead treats the overall CTU partitioning process as a multi-layer binary classification problem.

CU size is highly correlated with the texture of the video content: areas with simple, uniform texture tend to be covered by larger CUs, while areas with complex texture are divided into smaller CUs. One fast partitioning strategy is therefore early termination of CTU partitioning: if the current CU is detected to be a flat area, it does not need to be divided to the next depth, and partitioning is terminated early. However, flat areas are not common in VVC, so this alone does not significantly reduce the computational burden, since a large number of CUs still need to execute all partition modes recursively. More complexity can be removed if several partition modes can also be skipped. Because the partition modes are divided into horizontal and vertical directions, the horizontal and vertical complexity of a non-flat CU can be further calculated and only the partition modes in the corresponding direction selected, avoiding the calculation of the RD (rate-distortion) cost of the CU under unnecessary partition modes.

The choice of features and of the partition threshold at each layer is key to the coding result. The invention selects the Local Binary Pattern (LBP), an operator that describes the local texture of an image. A 3 × 3 window is centered on a pixel, and the value of that center pixel serves as the reference. The gray values of the 8 surrounding pixels are then compared with it: if a surrounding pixel value minus the center pixel value is greater than a certain threshold, that position is marked 1; otherwise it is marked 0. The 8 positions in the window thus produce an 8-bit unsigned number, which is the LBP value of the window. The number of 1s in the LBP value is used as an index and compared with a threshold to judge whether the CU is a complex-texture area. Whether the complex texture of the CU lies in the horizontal or the vertical direction is then judged from the number of 1s at different positions of the LBP value, and accordingly either the horizontal binary tree and horizontal ternary tree partitions or the vertical binary tree and vertical ternary tree partitions are selected. Note that once a CU has been partitioned by a multi-type tree, it cannot be further partitioned using a quadtree.
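The LBP computation described above can be sketched as follows. Several details are assumptions not fixed by the text: the clockwise bit ordering starting at the top-left neighbor, the name and default of the `margin` parameter (the "certain threshold" on the pixel difference), and the aggregation of per-window results into one CU-level feature by averaging the number of 1s over all interior pixels.

```python
# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
# The bit ordering is an assumption; the text does not specify one.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(block, r, c, margin=0):
    """8-bit LBP value of the 3x3 window centred at (r, c)."""
    center = block[r][c]
    code = 0
    for dr, dc in OFFSETS:
        # Mark 1 when the neighbour exceeds the centre by more than `margin`.
        bit = 1 if block[r + dr][c + dc] - center > margin else 0
        code = (code << 1) | bit
    return code


def count_ones(code):
    """Number of 1 bits in an LBP value."""
    return bin(code).count("1")


def cu_lbp_feature(block, margin=0):
    """Mean number of 1s over all interior pixels (assumed aggregation).

    `block` must be at least 3x3 so that it has interior pixels.
    """
    h, w = len(block), len(block[0])
    counts = [count_ones(lbp_code(block, r, c, margin))
              for r in range(1, h - 1) for c in range(1, w - 1)]
    return sum(counts) / len(counts)
```

A flat block gives a feature of 0, while a block with a strong edge gives a larger value; the CU-level feature would then be compared against TH1.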

2. Prediction

The prediction module in video coding eliminates data redundancy in the temporal and spatial domains by exploiting the strong correlation between pixels within a video frame or between adjacent frames, and is divided into intra-frame prediction and inter-frame prediction. Intra-frame prediction exploits the spatial redundancy of an image block (i.e., the strong similarity between neighboring pixels within a frame) and predicts the pixel values of the current block from neighboring blocks of the current image that have already been coded. The new-generation H.266/VVC standard adds many intra prediction techniques, such as more angular prediction directions, Wide-Angle Intra Prediction (WAIP) for non-square blocks, Position-Dependent Intra Prediction Combination (PDPC), Intra Sub-Partitions (ISP), Cross-Component Linear Model prediction (CCLM), Multi-reference Line Intra Prediction (MLIP), Matrix weighted Intra Prediction (MIP), and so on.

3. Transform quantization

The Transform is generally referred to as Discrete Transform (DT) which performs Discrete Transform on predicted residual values, and since an image has a flat region with a high probability of having simple texture, after the Transform process, data of the region which is relatively dispersed in a spatial domain is relatively concentrated in a certain region of the Transform domain, thereby effectively reducing data redundancy of the video image. In general, the transformed DT coefficients are often in a large continuous range. In order to reduce the value range of the DT coefficient, the continuous transform coefficients are quantized to further reduce the data size, and only a certain amount of data accuracy is lost. In the H.266/VVC, in order to increase the calculation speed of transformation and quantization, transformation and quantization are still simultaneously performed at the time of encoding. Besides the existing DCT2 transformation core in H.265/HEVC, DST7 and DCT8 transformation cores are also newly added, and the optimal transformation effect is achieved by selecting the appropriate transformation core according to different prediction modes.

4. Entropy coding

Entropy coding is a common lossless data coding compression technique in information theory, which replaces image data information with a binary stream. Entropy coding is combined with transformation and quantization, which can significantly reduce video image data. The entropy coding changes data (such as motion vector information, transformation quantization coefficient, etc.) bearing video image information into a binary data stream which can be stored or transmitted, and the original video is a compressed code stream after the entropy coding processing.

5. Filtering and compensation

Since H.266/VVC performs prediction after dividing the picture into CU blocks, video coded with H.266/VVC exhibits distortion such as blocking artifacts, ringing artifacts, and degraded image quality. To reduce the poor visual experience caused by this distortion, H.266/VVC adopts a De-Blocking Filter (DBF) to reduce blocking artifacts; in the DBF, the filter strength is determined with reference to the level of the reconstructed luminance component, which improves the filtering effect. To weaken ringing, H.266/VVC continues to adopt sample adaptive offset (SAO) filtering. In addition, a block-based Adaptive Loop Filter (ALF) is applied to improve the subjective quality of the image and the coding efficiency of H.266/VVC.

The following detailed description of specific implementations of the present invention is provided in conjunction with specific embodiments:

the invention discloses a quick decision-making method for coding unit division, which comprises the following steps:

if the Local Binary Pattern (LBP) feature value is greater than or equal to the threshold TH1, the texture of the current Coding Unit (CU) is complex, the variance of the pixel variances of the sub-blocks of the current CU is further obtained, and the partition mode corresponding to the maximum variance is selected as the optimal partition mode; the threshold TH1 is a preset value that can be assigned according to the situation, i.e., tuned as a parameter based on experience and the processing conditions; the partition mode is a horizontal binary tree, horizontal ternary tree, vertical binary tree, or vertical ternary tree partition mode;

In VVC, pictures are coded in units of blocks. When an image is encoded, large CUs tend to be selected in uniform regions and small CUs in texture-rich regions. Where the texture is simple or consistent, the pixel values can be predicted accurately by a single intra prediction mode. In the original video encoder, the optimal CTU partition is determined by computing RD costs from top to bottom and then searching and comparing from bottom to top; the invention changes this into a top-down-only process. For the heuristic algorithm, whether to perform the next partition is decided by computing the texture complexity of the coding unit; if no partition is performed, the CTU partitioning process can be terminated early, and if a partition is performed, the partition mode to use is further determined. For the data-driven algorithm, the probability of CU partitioning is predicted before formal encoding; if the probability is greater than a threshold the CU is partitioned, and if it is smaller than the threshold the CU is not partitioned and the CTU partitioning process is terminated early. The complexity of the encoding process and the corresponding encoding time are thereby reduced. The overall CTU partitioning process is treated as a multi-layer classification problem, in which the selection of the partition mode at each depth is one classification problem.

For the first part (block partitioning), the LBP (Local Binary Pattern) feature is selected as the basis for judging whether to partition. If partitioning is performed, the variance of the pixel variances of the resulting sub-blocks is calculated to further judge the partition direction, so that some unnecessary CU partitions can be skipped, the number of rate-distortion cost calculations is reduced, and the complexity of video intra coding is effectively reduced. The evaluation indexes of the actual effect are the percentage of encoding time saved relative to the original encoder (representing encoding complexity), and the bit rate required for encoding, BD-BR, together with the peak signal-to-noise ratio, BD-PSNR (representing encoding performance); the goal is to save as much time as possible while keeping the BD-BR and BD-PSNR changes small. The proposed fast partition mode decision method based on LBP and variance features realizes fast decision of the coding unit partition mode, mainly on the basis that the partitioning of a coding unit is closely related to the distribution of its texture features. The framework is shown in FIG. 2, a flow chart of the coding unit partition mode decision model of the invention.

The architecture is mainly divided into two parts: judging whether the current block needs to be partitioned, and, if it is to be partitioned by the multi-type tree, deciding the optimal partition mode. The strategy is to skip unnecessary quadtree and multi-type tree partitions in H.266/VVC or to decide them in advance. When the CU to be encoded is detected to be a simple flat area, the nested multi-type tree partitioning can be terminated early. However, since flat areas are not common, this alone does not reduce the amount of computation significantly, because a large number of CUs still need to execute all partition modes recursively. Therefore, if the partition mode of the current CU can be decided in advance from the texture complexity of the image block, shortening the rate-distortion optimization process, more complexity can be removed and coding efficiency improved. Before the current CU starts to traverse all modes, its size is first checked. If its width and height are unequal, the original rate-distortion optimization traversal is used to select the optimal partition mode. If its width and height are equal, the LBP feature of the current CU is calculated: if the value is smaller than the threshold TH1, the texture of the coding unit is simple and multi-type tree partitioning can be skipped; otherwise, the variance of each sub-block is calculated and the partition mode corresponding to the largest variance is selected as the optimal partition mode. The main objective is to skip unnecessary rate-distortion calculations and thereby shorten the encoding time.

In the present invention, the vertical binary tree, horizontal binary tree, vertical ternary tree, and horizontal ternary tree partition modes are as shown in FIG. 3.
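The decision flow described above can be sketched as follows, under stated assumptions. The LBP feature is passed in as a precomputed number; the mode labels, the return strings, and the 1:2:1 ternary split positions are illustrative choices, and the value of TH1 is a tuning parameter that the text leaves open. For each candidate mode the sketch computes the pixel variance of every sub-block and then the variance of those variances, and keeps the mode with the largest result.

```python
from statistics import pvariance

MODES = ("BT_H", "TT_H", "BT_V", "TT_V")  # hypothetical mode labels


def split_blocks(block, mode):
    """Cut a 2-D list of pixels into the sub-blocks of one split mode."""
    h, w = len(block), len(block[0])
    if mode == "BT_H":
        cuts = [0, h // 2, h]
        return [block[a:b] for a, b in zip(cuts, cuts[1:])]
    if mode == "TT_H":  # assumed 1:2:1 ratio
        cuts = [0, h // 4, 3 * h // 4, h]
        return [block[a:b] for a, b in zip(cuts, cuts[1:])]
    if mode == "BT_V":
        cuts = [0, w // 2, w]
    elif mode == "TT_V":  # assumed 1:2:1 ratio
        cuts = [0, w // 4, 3 * w // 4, w]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [[row[a:b] for row in block] for a, b in zip(cuts, cuts[1:])]


def pixel_variance(sub_block):
    """Population variance of all pixels in one sub-block."""
    return pvariance([p for row in sub_block for p in row])


def variance_of_variances(block, mode):
    """Variance of the sub-block pixel variances for one split mode."""
    return pvariance([pixel_variance(s) for s in split_blocks(block, mode)])


def decide(block, lbp_feature, th1):
    """Return 'FULL_RDO', 'SKIP_MTT', or the selected split mode."""
    h, w = len(block), len(block[0])
    if h != w:
        return "FULL_RDO"   # non-square CU: original RDO traversal
    if lbp_feature < th1:
        return "SKIP_MTT"   # simple texture: skip multi-type tree splits
    return max(MODES, key=lambda m: variance_of_variances(block, m))
```

For a square block whose top half is flat and whose bottom half is textured, the horizontal binary split separates the two regions most cleanly and therefore yields the largest variance of variances.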

PSNR is a full-reference image quality index, a widely used objective measure based on the mean square error between the current image and the reference image. Because it evaluates image quality from error sensitivity, it does not consider the visual characteristics of the human eye (the eye is more sensitive to contrast differences at lower spatial frequencies, more sensitive to differences in luminance contrast, and its perception of a region is influenced by the surrounding neighboring regions), so the evaluation results often do not coincide with subjective perception. BDPSNR, BDBR (%) and DT (%) are used as the performance indexes for evaluating the algorithm. BDBR (%) represents the bit-rate change between the two methods at the same objective quality; an optimization algorithm generally causes BDBR to increase, and the larger the BDBR, the worse the coding effect.
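For reference, PSNR as described above can be computed from the mean square error with the standard definition PSNR = 10·log10(MAX² / MSE). The sketch below assumes 8-bit samples (MAX = 255) and images given as equally sized 2-D lists; the function name is illustrative.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (a - b) ** 2
        for ref_row, dis_row in zip(reference, distorted)
        for a, b in zip(ref_row, dis_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

Because the measure depends only on pixel-wise error, two images with the same PSNR can look quite different to a viewer, which is the limitation noted above.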

The BDBR values of all sequences are averaged to reflect the overall coding quality. BDPSNR is the difference in PSNR-Y between the two methods at the same bit rate, i.e., the change in objective image quality of the optimized algorithm relative to the original algorithm. A positive BDPSNR means the optimization algorithm improves coding performance; a negative BDPSNR means coding performance is reduced. DT (%) is the encoding time saved by the proposed algorithm compared with the original VTM algorithm, and is calculated as:

DT = (Tr - T1) / Tr × 100%

where T1 denotes the encoding time of the proposed algorithm (or modified algorithm) and Tr denotes the encoding time of the reference software VTM 6.0.
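A minimal sketch of the time-saving metric, assuming the usual definition of DT as the reduction in encoding time relative to the reference encoder, expressed as a percentage. The function and parameter names are illustrative, and the timings in the usage note are hypothetical.

```python
def time_saving_percent(t_ref, t_proposed):
    """Encoding time saved relative to the reference encoder, in percent.

    t_ref      -- encoding time of the reference software (Tr)
    t_proposed -- encoding time of the proposed algorithm (T1)
    """
    if t_ref <= 0:
        raise ValueError("reference time must be positive")
    return (t_ref - t_proposed) / t_ref * 100.0
```

For example, a hypothetical sequence that takes 200 s with the reference encoder and 150 s with the proposed algorithm gives a saving of 25%.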

In summary, the key point of the present invention is that the LBP feature is used as the basis for judging whether the CTU is partitioned: when complex texture is determined, the next partition is performed, and the variance of the sub-coding units is used as the basis for judging which multi-type tree mode to use, namely horizontal binary tree, horizontal ternary tree, vertical binary tree, or vertical ternary tree. Otherwise, the partitioning process is terminated early, reducing the traversal and the corresponding number of rate-distortion calculations and thus the complexity of the encoder.

The above description is only exemplary of the present invention and should not be taken as limiting the invention, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A fast decision method for coding unit partitioning, comprising the steps of:

if the widths and the heights of the partition modes are not equal, selecting an optimal partition mode by using an original rate distortion optimization traversal method;