CN111107368A - Fast method for split tree decision

Fast method for split tree decision

Info

Publication number
CN111107368A
Authority
CN
China
Prior art keywords
block
eqt
current block
video
partitioning
Prior art date
Legal status
Granted
Application number
CN201911033440.6A
Other languages
Chinese (zh)
Other versions
CN111107368B (en)
Inventor
张莉
张凯
刘鸿彬
王悦
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd, ByteDance Inc filed Critical Beijing ByteDance Network Technology Co Ltd
Publication of CN111107368A publication Critical patent/CN111107368A/en
Application granted granted Critical
Publication of CN111107368B publication Critical patent/CN111107368B/en

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70: characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/96: tree coding, e.g. quad-tree coding
    • H04N19/119: adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/124: quantisation
    • H04N19/147: data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176: using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/184: using adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N19/186: using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method for video processing, including a fast method for partition tree decision, includes, for a conversion between a current block of video and a bitstream representation of the video, determining whether an Extended Quadtree (EQT) partitioning process applies to the current block based on a rule, and performing the conversion based on the determination. The EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block, and the rule specifies a maximum depth of the EQT partitioning process based on an attribute associated with the current block.

Description

Fast method for split tree decision
Cross Reference to Related Applications
Under applicable patent law and/or rules pursuant to the Paris Convention, this application claims the priority and benefit of International Patent Application PCT/CN2018/111990, filed on October 26, 2018, and International Patent Application PCT/CN2018/119316, filed on December 5, 2018. The entire disclosures of the above applications are incorporated by reference as part of the disclosure of this patent document.
Technical Field
This document relates to video and image coding techniques.
Background
Currently, efforts are being made to improve the performance of current video codec techniques to provide better compression ratios or to provide video encoding and decoding schemes that allow for lower complexity or parallel implementation. Industry experts have recently proposed several new video coding tools, which are currently being tested to determine their effectiveness.
Disclosure of Invention
Some disclosed embodiments relate to encoding and decoding images and video pictures using a rule-based extended quadtree partitioning process. In one advantageous aspect, certain aspects of the rules are predefined, allowing encoder and decoder embodiments to generate the partition tree and perform decoding using fewer computational resources than conventional image and video encoding techniques.
In one example aspect, a method for video processing is disclosed. The method includes, for a conversion between a current block of the video and a bitstream representation of the video, determining whether an Extended Quad Tree (EQT) partitioning process is applicable to the current block based on a rule, and performing the conversion based on the determination. The EQT segmentation process includes segmenting a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block, and the rule specifies a maximum depth of the EQT segmentation process based on an attribute associated with the current block.
In another example aspect, a method of visual media processing includes performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein a size of at least one of the sub-blocks is different from half a width of the given block multiplied by half a height of the given block, and wherein the rule specifies that, where the rule is for partitioning the current block, each sub-block is further divided into a Binary Tree (BT) partition or another EQT partition, and both the BT partition and the other EQT partition have depths that satisfy a predefined relationship.
In another example aspect, a visual media processing method is disclosed. The method includes performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half a width of the given block multiplied by half a height of the given block, and wherein the rule allows the EQT partitioning process for the current block based on the width or height of the current block.
In yet another aspect, another method of visual media processing is disclosed. The method includes performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block, and wherein the rule allows the EQT partitioning process for the current block based on the position of the current block.
In yet another aspect, another method of visual media processing is disclosed. The method includes performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein a size of at least one of the sub-blocks is different from half a width of the given block multiplied by half a height of the given block, and wherein the rule allows a maximum depth of the EQT partitioning process to depend on a distance between a current picture of the current block and a reference picture of the current block or a quantization parameter of the current block or a temporal layer identifier of the current picture.
These and other aspects are described in more detail throughout this document.
Drawings
Fig. 1 shows a block diagram of an example implementation of video encoding and decoding.
Fig. 2 shows an example of MacroBlock (MB) partitioning according to the H.264/Advanced Video Coding (AVC) standard.
Fig. 3 shows an example of a pattern for dividing a Coding Block (CB) into Prediction Blocks (PB) subject to certain size constraints. For example, intra pictures are only allowed to use M × M and M/2 × M/2 sizes.
Fig. 4 shows an example of the subdivision of a Coding Tree Block (CTB) into CBs and Transform Blocks (TBs). In the figure, a solid line indicates a CB boundary, and a dotted line indicates a TB boundary. The left side is the CTB with its partition and the right side is the corresponding quadtree.
Fig. 5 is an exemplary illustration of a Quadtree plus Binary Tree (QTBT) structure.
Fig. 6 illustrates various examples of block segmentation.
Fig. 7A-7K show examples of block segmentation.
Fig. 8A-8D illustrate examples of block segmentation.
Fig. 9A-9B show examples of Generalized Triple Tree (GTT) partitioning.
FIG. 10 illustrates an example of the syntax and semantics of the multi-functional boundary segmentation.
Figs. 11A-11B illustrate examples of allowed EQT patterns that may be further split into EQT or BT.
Fig. 12 shows an example of the binarization of partitioning.
Fig. 13A and 13B show examples of horizontal and vertical EQTs.
FIG. 14 illustrates an example hardware platform for implementing some disclosed methods.
FIG. 15 illustrates another example hardware platform for implementing some disclosed methods.
FIG. 16 is a flow diagram of an example method of visual media processing.
Fig. 17 shows an example of 7 specific locations used in the early termination of EQT segmentation.
FIG. 18 is a block diagram of an example video processing system in which the disclosed techniques may be implemented.
Fig. 19 is a flowchart representation of a method for video processing according to the present disclosure.
Detailed Description
This document provides several techniques that may be embodied in digital video or images (collectively referred to as visual media), encoders, and decoders. For clarity of understanding, section headings are used in this document and do not limit the scope of the techniques and embodiments disclosed in each section to that section only.
1. Brief summary
This document relates to image/video coding, and in particular to the partition structure, i.e., how one Coding Tree Unit (CTU) is divided into Coding Units (CUs), and how to speed up the encoder's selection of the optimal partition structure. It can be applied to existing video coding standards, such as HEVC (High Efficiency Video Coding), or to a standard to be finalized (Versatile Video Coding, VVC). It may also be applicable to future video coding standards or video codecs.
2. Introduction to video encoding and decoding techniques
Video coding standards have evolved largely through the development of the well-known ITU-T and ISO/IEC standards. ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure, in which temporal prediction plus transform coding is employed. An example of a typical HEVC encoder framework is depicted in Fig. 1.
2.1 Partition tree structure in H.264/AVC
In H.264/AVC, the core of the coding layer is the macroblock, containing a 16 × 16 block of luma samples and, in the usual case of 4:2:0 color sampling, two corresponding 8 × 8 blocks of chroma samples.
Intra-coded blocks use spatial prediction to exploit spatial correlation among pixels. Two partitions are defined: 16 × 16 and 4 × 4.
Inter-coded blocks use temporal prediction, instead of spatial prediction, by estimating motion between pictures. Motion can be estimated independently for either a 16 × 16 macroblock or any of its sub-macroblock partitions: 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, 4 × 4 (see Fig. 2). Only one Motion Vector (MV) is allowed per sub-macroblock partition.
FIG. 2 shows an example of MB partitioning in H.264/AVC.
2.2 Partition tree structure in HEVC
In HEVC, various local characteristics are accommodated by dividing CTUs into CUs using a quadtree structure, denoted as a coding tree. The decision whether to encode a picture region using inter-picture (temporal) prediction or intra-picture (spatial) prediction is made at the CU level. Each CU may be further divided into one, two, or four PUs according to a PU (Prediction Unit) division type. Within a PU, the same prediction process is applied and the relevant information is sent to the decoder on a PU basis. After a residual block is obtained by applying a prediction process based on the PU partition type, the CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to the coding tree of the CU. An important feature of the HEVC structure is that it has multiple partitioning concepts, including CU, PU and TU.
In the following, various features involved in hybrid video coding using HEVC are emphasized as follows.
1) Coding tree unit and Coding Tree Block (CTB) structure: the analogous structure in HEVC is the Coding Tree Unit (CTU), which has a size selected by the encoder and can be larger than a traditional macroblock. The CTU consists of a luma CTB and the corresponding chroma CTBs and syntax elements. The size L × L of a luma CTB can be chosen as L = 16, 32, or 64 samples, with the larger sizes generally enabling better compression. HEVC then supports the partitioning of the CTBs into smaller blocks using a tree structure and quadtree-like signaling.
2) Coding Unit (CU) and Coding Block (CB): the quadtree syntax of a CTU specifies the size and positions of its luma and chroma CBs. The root of the quadtree is associated with the CTU. Therefore, the size of the luma CTB is the largest supported size of a luma CB. The splitting of a CTU into luma CBs and chroma CBs is signaled jointly. One luma CB typically forms a Coding Unit (CU) together with two chroma CBs and associated syntax. A CTB may contain only one CU or may be partitioned to form multiple CUs, and each CU has an associated partitioning into Prediction Units (PUs) and a tree of Transform Units (TUs).
3) Prediction Unit (PU) and Prediction Block (PB): the decision whether to encode a picture region using inter-picture prediction or intra-picture prediction is made at the CU level. The root of the PU partitioning structure is at the CU level. Depending on the basic prediction-type decision, the luma CB and chroma CBs may then be further split in size into luma and chroma Prediction Blocks (PBs). HEVC supports variable PB sizes from 64 × 64 down to 4 × 4 samples.
4) TU and transform block: the prediction residual is encoded using a block transform. The root of the TU tree structure is at the CU level. The luma CB residual may be the same as the luma Transform Block (TB) or may be further divided into smaller luma TBs. The same is true for chroma TB. Integer basis functions similar to the Discrete Cosine Transform (DCT) are defined for square TBs of sizes 4 × 4, 8 × 8, 16 × 16, and 32 × 32. For the 4 × 4 transform of the luma intra picture prediction residual, an integer transform derived from a form of Discrete Sine Transform (DST) is optionally specified.
Fig. 3 shows an example of a pattern for partitioning a Coding Block (CB) into Prediction Blocks (PB) subject to certain size constraints. For example, intra pictures are only allowed to use M × M and M/2 × M/2 sizes.
Fig. 4 shows an example of the subdivision of a Coding Tree Block (CTB) into CBs and Transform Blocks (TBs). In the figure, a solid line indicates a CB boundary, and a dotted line indicates a TB boundary. The left side is the CTB with the partition and the right side is the corresponding quadtree.
2.3 Quadtree plus binary tree block structure with larger CTUs in JEM
In order to explore future video coding techniques beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, JVET has adopted many new methods and put them into a reference software named the Joint Exploration Model (JEM).
2.3.1 QTBT block partitioning structure
Unlike HEVC, the QTBT structure removes the concept of multiple partition types, i.e., it removes the separation of the CU, PU, and TU concepts, and it supports more flexibility for CU partition shapes. In the QTBT block structure, a CU may be square or rectangular. As shown in Fig. 5, a Coding Tree Unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure. Two splitting types, symmetric horizontal splitting and symmetric vertical splitting, are provided in the binary tree splitting. The binary tree leaf nodes are called Coding Units (CUs), and that segmentation is used for the prediction and transform processing without any further partitioning. This means that in the QTBT coding block structure, the CU, PU, and TU have the same block size. In JEM, a CU sometimes consists of Coding Blocks (CBs) of different color components; e.g., in the case of P- and B-slices of the 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs. A CU sometimes consists of a CB of a single component; e.g., in the case of I-slices, a CU contains only one luma CB, or only two chroma CBs.
The following parameters are defined for the QTBT segmentation scheme.
-CTU size: root node size of quadtree, same concept as in HEVC
-MinQTSize: minimum allowed quadtree leaf node size
-MaxBTSize: maximum allowed binary tree root node size
-MaxBTDepth: maximum allowed binary tree depth
-MinBTSize: minimum allowed binary tree leaf node size
In one example of the QTBT partitioning structure, the CTU size is set to 128 × 128 luma samples with two corresponding 64 × 64 blocks of chroma samples, MinQTSize is set to 16 × 16, MaxBTSize is set to 64 × 64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. Quadtree partitioning is first applied to the CTU to generate quadtree leaf nodes. The quadtree leaf nodes may have sizes from 16 × 16 (i.e., MinQTSize) to 128 × 128 (i.e., the CTU size). If the quadtree leaf node is 128 × 128, it will not be further split by the binary tree, since its size exceeds MaxBTSize (i.e., 64 × 64). Otherwise, the quadtree leaf node may be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node of the binary tree, and its binary tree depth is 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered. When the binary tree node has a width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered. Similarly, when the binary tree node has a height equal to MinBTSize, no further vertical splitting is considered. The binary tree leaf nodes are further processed by the prediction and transform processes without any further partitioning. In JEM, the maximum CTU size is 256 × 256 luma samples.
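To make the interplay of these parameters concrete, the following C++ sketch mirrors the constraints described above; the structure and function names are assumptions for illustration, not part of any codec specification.

```cpp
#include <algorithm>

// Illustrative QTBT split-eligibility checks using the example parameter
// values from the text. A real encoder interleaves these checks with
// mode decision and syntax signaling.
struct QtbtParams {
    int minQtSize  = 16;  // MinQTSize: minimum quadtree leaf node size
    int maxBtSize  = 64;  // MaxBTSize: maximum binary tree root node size
    int maxBtDepth = 4;   // MaxBTDepth: maximum binary tree depth
    int minBtSize  = 4;   // MinBTSize: minimum binary tree leaf node size
};

// A (square) quadtree node may be split further while above MinQTSize.
bool canSplitQt(int size, const QtbtParams& p) {
    return size > p.minQtSize;
}

// A node may enter or continue binary tree splitting only if it does
// not exceed MaxBTSize and its binary tree depth is below MaxBTDepth.
bool canEnterBt(int w, int h, int btDepth, const QtbtParams& p) {
    return std::max(w, h) <= p.maxBtSize && btDepth < p.maxBtDepth;
}

// Per the text: a node whose width equals MinBTSize is not split
// horizontally, and one whose height equals MinBTSize is not split
// vertically.
bool canSplitBtHorizontal(int w, int h, int d, const QtbtParams& p) {
    return canEnterBt(w, h, d, p) && w > p.minBtSize;
}
bool canSplitBtVertical(int w, int h, int d, const QtbtParams& p) {
    return canEnterBt(w, h, d, p) && h > p.minBtSize;
}
```

With these example values, a 128 × 128 quadtree leaf fails canEnterBt() and therefore can only be split further by the quadtree, matching the MaxBTSize behavior described above.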
In each partition (i.e., non-leaf) node of the binary tree, for example, as shown in fig. 5, a flag is signaled to indicate which partition type (i.e., horizontal or vertical) to use, where 0 indicates horizontal partitioning and 1 indicates vertical partitioning. For a quadtree partition, there is no need to indicate the partition type, since the quadtree partition always partitions a block horizontally and vertically to generate 4 sub-blocks having equal sizes.
In addition, the QTBT scheme supports the ability for luma and chroma to have separate QTBT structures. Currently, for P- and B-slices, the luma and chroma CTBs in one CTU share the same QTBT structure. However, for I-slices, the luma CTB is partitioned into luma CUs by one QTBT structure, and the chroma CTBs are partitioned into chroma CUs by another QTBT structure. This means that a CU in an I-slice consists of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P- or B-slice consists of coding blocks of all three color components.
In HEVC, inter prediction of small blocks is restricted to reduce memory access for motion compensation, so that bi-prediction is not supported for 4 × 8 blocks and 8 × 4 blocks, and inter prediction is not supported for 4 × 4 blocks. In the QTBT of JEM, these restrictions are removed.
2.4 Ternary trees in VVC (Versatile Video Coding)
In some cases, tree types other than quadtree and binary tree are supported. In an embodiment, two further Ternary Tree (TT) partitions, i.e., the horizontal and vertical center-side ternary trees, are introduced, as shown in items (d) and (e) of Fig. 6.
Fig. 6 shows examples of block partitioning modes: (a) quadtree splitting, (b) vertical binary tree splitting, (c) horizontal binary tree splitting, (d) vertical center-side ternary tree splitting, and (e) horizontal center-side ternary tree splitting.
There are two levels of trees: the region tree (quadtree) and the prediction tree (binary or ternary). A CTU is first partitioned by a Region Tree (RT). An RT leaf may be further split with a Prediction Tree (PT). A PT leaf may also be further split with PT until the maximum PT depth is reached. A PT leaf is the basic coding unit; for convenience, it is still called a CU. A CU cannot be further split. Both prediction and transform are applied to the CU in the same way as in JEM. The whole partition structure is named the "multi-type tree".
2.5 Extended quadtree
An Extended Quadtree (EQT) partitioning structure corresponds to a block partitioning process that includes an extended quadtree partitioning process for a block of video data, wherein the extended quadtree partitioning structure represents partitioning the block of video data into final sub-blocks, and when the extended quadtree partitioning process decides to apply extended quadtree partitioning to a given block, the given block is always divided into four sub-blocks. The final sub-blocks are decoded based on the video bitstream, and the block of video data is decoded based on the final sub-blocks decoded according to the derived EQT structure. EQT is set forth in the above-identified patent applications, which are incorporated herein by reference.
The EQT segmentation process may be recursively applied to a given block to generate EQT leaf nodes. Alternatively, when the EQT is applied to a block, each sub-block may be further divided into BT, and/or QT, and/or TT, and/or EQT, and/or other kinds of partition trees for each sub-block resulting from the EQT.
In one example, EQT and QT may share the same depth increment process and the same leaf node size constraint. In this case, the partitioning of a node may be terminated implicitly when the size of the node reaches the minimum allowed quadtree leaf node size or the EQT depth associated with the node reaches the maximum allowed quadtree depth.
In some embodiments, EQT and QT may use different depth increment processes and/or leaf node size constraints. In this case, the partitioning of a node by EQT is implicitly terminated when the node reaches its minimum allowed EQT leaf node size or the EQT depth associated with the node reaches its maximum allowed EQT depth. Further, in one example, the EQT depth and/or the minimum allowed EQT leaf node size may be signaled in a Sequence Parameter Set (SPS), and/or a Picture Parameter Set (PPS), and/or a slice header, and/or a CTU, and/or a region, and/or a slice, and/or a CU.
Instead of using the current quadtree partitioning applied to square blocks, for a block of size M × N (M and N are non-zero positive integer values, equal or unequal), in EQT the block may be equally divided into four partitions, such as M/4 × N or M × N/4 (examples are depicted in Figs. 7A and 7B), or equally divided into four partitions with partition sizes depending on the maximum and minimum values of M and N. In one example, a 4 × 32 block may be divided into four 4 × 8 sub-blocks, and a 32 × 4 block may be divided into four 8 × 4 sub-blocks.
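As a concrete sketch of this equal four-way split, the following hypothetical helper chooses the split direction from the larger of M and N, reproducing the 4 × 32 and 32 × 4 examples above; the actual rule in a codec may differ from this assumption.

```cpp
#include <cassert>
#include <utility>

// Sub-block size for an equal four-way EQT split of an M x N block:
// M/4 x N when the width dominates, M x N/4 otherwise (assumed rule).
std::pair<int, int> eqtEqualSplitSize(int m, int n) {
    if (m >= n)
        return {m / 4, n};  // e.g., 32 x 4 -> four 8 x 4 sub-blocks
    return {m, n / 4};      // e.g., 4 x 32 -> four 4 x 8 sub-blocks
}

int main() {
    assert(eqtEqualSplitSize(32, 4) == std::make_pair(8, 4));
    assert(eqtEqualSplitSize(4, 32) == std::make_pair(4, 8));
    return 0;
}
```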
Instead of using the current quadtree partitioning applied to square blocks, for a block of size M × N (M and N are non-zero positive integer values, equal or unequal), in EQT one block may be divided unequally into four partitions, such as two partitions of size (M × w0/w) × (N × h0/h) and two other partitions of size (M × (w - w0)/w) × (N × (h - h0)/h).
For example, w0 and w may be equal to 1 and 2, respectively, i.e., the width is reduced by half, while the height may use a ratio other than 2:1 to obtain the sub-blocks. Examples of this case are depicted in Figs. 7C and 7E. Alternatively, h0 and h may be equal to 1 and 2, respectively, i.e., the height is reduced by half, while the width may use a ratio other than 2:1. Examples of this case are depicted in Figs. 7D and 7F.
Fig. 7G and 7H show two alternative examples of quadtree splitting.
Fig. 7I shows a more general case of quadtree splitting with different split shapes.
Fig. 7J and 7K show a general example of fig. 7A and 7B.
Fig. 7C shows sub-blocks with width fixed at M/2 and height equal to N/4 or 3N/4, with the top two partitions smaller; Fig. 7D shows sub-blocks with height fixed at N/2 and width equal to M/4 or 3M/4, with the left two partitions smaller.
Fig. 7E shows sub-blocks with width fixed at M/2 and height equal to 3N/4 or N/4, with the bottom two partitions smaller. Fig. 7F shows sub-blocks with height fixed at N/2 and width equal to 3M/4 or M/4, with the right two partitions smaller. The following example dimensions are shown. Fig. 7G: M × N/4 and M/2 × N/2; Fig. 7H: N × M/4 and N/2 × M/2; Fig. 7I: M1 × N1, (M - M1) × N1, M1 × (N - N1), and (M - M1) × (N - N1); Fig. 7J: M × N1, M × N2, M × N3, and M × N4, where N1 + N2 + N3 + N4 = N; Fig. 7K: M1 × N, M2 × N, M3 × N, and M4 × N, where M1 + M2 + M3 + M4 = M.
A Flexible Tree (FT) partitioning structure corresponds to a block partitioning process that includes an FT partitioning process for a block of video data, wherein the FT partitioning structure represents partitioning the block of video data into final sub-blocks, and when the FT partitioning process decides to apply FT partitioning to a given block, the given block is divided into K sub-blocks, where K may be greater than 4. The final sub-blocks are decoded based on the video bitstream, and the block of video data is decoded based on the final sub-blocks decoded according to the derived FT structure.
The FT segmentation process may be recursively applied to a given block to generate FT-leaf nodes. When a node reaches the minimum allowed FT leaf node size or the FT depth associated with the node reaches the maximum allowed FT depth, the segmentation of the node is implicitly terminated.
In some embodiments, when FT is applied to a block, each sub-block may be further divided into BT, and/or QT, and/or EQT, and/or TT, and/or other kinds of partition trees for each sub-block resulting from FT.
Furthermore, in some embodiments, the FT depth or minimum allowed FT leaf node size or minimum allowed partition size of the FT may be signaled in a Sequence Parameter Set (SPS), and/or a Picture Parameter Set (PPS), and/or a slice header, and/or a CTU, and/or a region, and/or a slice, and/or a CU.
Similar to the proposed EQT, all sub-blocks resulting from FT partitioning may have the same size; alternatively, the size of the different sub-blocks may be different.
In one example, K is equal to 6 or 8. Some examples are depicted in Figs. 8A-8D, which show examples of FT partitioning (K = 6 in Figs. 8C and 8D, or K = 8 in Figs. 8A and 8B).
For TT, the constraint that the split be in either the horizontal or the vertical direction may be removed.
In one example, a Generalized TT (GTT) partition mode may be defined as a split in both the horizontal and vertical directions. Examples are shown in Figs. 9A and 9B.
The proposed method can be applied under certain conditions. In other words, when the condition(s) is not satisfied, there is no need to signal the segmentation type.
In some embodiments, the proposed method can be used to replace existing split tree types. Alternatively, and in addition, the proposed method can only be used as a substitute under certain conditions.
In one example, the condition may include a picture and/or a slice type; and/or block size; and/or a coding mode; and/or whether a block is located at a picture/slice boundary.
In one example, the proposed EQT may be treated in the same way as QT. In this case, when the signaled partition tree type is QT, a further flag/indication of the detailed quadtree partition mode may be signaled. Alternatively, EQT may be treated as an additional partition mode.
In one example, the signaling of the partitioning methods of EQT or FT or GTT may be conditional; e.g., one or some of the EQT/FT/GTT partitioning methods may not be used in some cases, and the bits corresponding to those partitioning methods are then not signaled.
2.6 Border processing
In some embodiments, a boundary handling method for Versatile Video Coding (VVC) is proposed. A similar method is also used in AVS-3.0.
Since the constrained quadtree boundary partitioning solution in VVC is not optimized, it has been proposed that the boundary partitioning method use the regular block partitioning syntax, to maintain the continuity of the CABAC engine and to match the picture boundary.
The versatile boundary partition follows these rules (at both encoder and decoder):
For blocks located at a boundary, the exact same partition syntax as for a normal (non-boundary) block is used, so the syntax need not be changed (e.g., as in VTM-1.0, Fig. 10).
If the partition mode is not parsed for a boundary CU, Forced Boundary Partitioning (FBP) is used to match the picture boundary. After forced boundary partitioning (non-single boundary partitioning), no further partitioning is performed. The forced boundary partitioning is described as follows:
If the size of the block is larger than the maximum allowed BT size, the FBP is performed using forced QT at the current forced partition level;
Otherwise, if the bottom-right sample of the current CU is below the bottom picture boundary and does not extend beyond the right boundary, the FBP is performed using forced horizontal BT at the current forced partition level;
Otherwise, if the bottom-right sample of the current CU is to the right of the right picture boundary and not below the bottom boundary, the FBP is performed using forced vertical BT at the current forced partition level;
Otherwise, if the bottom-right sample of the current CU is to the right of the right picture boundary and below the bottom boundary, the FBP is performed using forced QT at the current forced partition level.
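The cascade of conditions above can be rendered as a small decision routine. The following sketch is illustrative only; the type names and the recursion that drives repeated forced splits are assumed.

```cpp
// Hypothetical rendering of the forced boundary partition (FBP) rules.
// (x, y) is the top-left position of the CU, (w, h) its size, and
// (picW, picH) the picture size; maxBtSize is the maximum allowed BT
// size at the current forced partition level.
enum class ForcedSplit { QT, HorizontalBT, VerticalBT };

ForcedSplit forcedBoundarySplit(int x, int y, int w, int h,
                                int picW, int picH, int maxBtSize) {
    const bool belowBottom = (y + h) > picH;
    const bool beyondRight = (x + w) > picW;
    if (w > maxBtSize || h > maxBtSize)
        return ForcedSplit::QT;            // block exceeds max BT size
    if (belowBottom && !beyondRight)
        return ForcedSplit::HorizontalBT;  // crosses only the bottom boundary
    if (beyondRight && !belowBottom)
        return ForcedSplit::VerticalBT;    // crosses only the right boundary
    return ForcedSplit::QT;                // crosses the bottom-right corner
}
```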
3. Problems and disadvantages of the current embodiments
There may be some redundancy between the partitions produced by EQT and by QT/BT/TT. For example, a block of size M × N may be split by vertical BT three times (first into two M/2 × N partitions, then with a further vertical BT split applied to each M/2 × N partition) to obtain four M/4 × N partitions. Alternatively, to obtain four M/4 × N partitions, the block may use EQT directly, as shown in Fig. 7B. How to signal the EQT efficiently remains a problem.
4. Example techniques and embodiments
To address the above and other problems, several approaches have been proposed to address the case of EQT. Embodiments may include image or video encoders and decoders.
The techniques listed below should be considered as an explanation of the general concept. These examples should not be construed in a narrow manner. Furthermore, these embodiments may be combined in any manner.
Example 1: when the EQT is applied to a block, for each sub-block resulting from the EQT, each sub-block may be further divided into BT and/or EQT, and BT and EQT may share the same, by DBTMaxThe maximum depth value represented (e.g., MaxBTDepth in section 2.3.1).
In one example, only two of the EQTs depicted in Figs. 7A-7K may be allowed. The two allowed EQT patterns are depicted in Figs. 11A and 11B, which show examples of allowed EQT patterns whose sub-blocks may be further split into EQT or BT. For example, one allowed EQT pattern may include a top partition of full width and quarter height of the block, followed by two side-by-side partitions of half width and half height of the block, followed by a bottom partition of full width and quarter height of the block (e.g., Fig. 11A). The other allowed EQT pattern includes a left partition of full height and quarter width of the block, followed by two partitions of half width and half height stacked vertically on top of each other, followed by a right partition of full height and quarter width of the block (e.g., Fig. 11B). It should be understood that, in one aspect, each partition has an equal area.
Similarly, when BT is applied to a certain block, for each sub-block resulting from BT, each sub-block may be further divided into BT and/or EQT, and BT and EQT may share the same maximum depth value.
EQT and BT may use different depth increment processes. For example, each block may be assigned a depth value denoted DBT (DBT may start at 0). If a block whose depth value equals DBT is split by EQT, the depth value of each resulting sub-block is set to DBT + 2.
Whenever the associated depth of a block is less than DBTMax, it may be further split into EQT or BT.
In some embodiments, the EQT allowed maximum depth value may be set to the sum of the QT allowed maximum depth value and the BT allowed maximum depth value.
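A minimal sketch of the shared depth bookkeeping in Example 1 follows; the names Node, dBt, and kDBtMax are illustrative assumptions. The key point is that one EQT split advances the shared depth by 2, i.e., as much as two BT splits.

```cpp
// Shared BT/EQT depth bookkeeping from Example 1 (illustrative only).
struct Node { int dBt = 0; };  // DBT starts at 0

constexpr int kDBtMax = 4;     // DBTMax, e.g., MaxBTDepth of section 2.3.1

// A BT split produces two sub-blocks, each one level deeper.
Node btChild(const Node& n)  { return {n.dBt + 1}; }

// An EQT split produces four sub-blocks at depth DBT + 2, so one EQT
// split consumes as much of the shared depth budget as two BT splits.
Node eqtChild(const Node& n) { return {n.dBt + 2}; }

bool canSplitFurther(const Node& n) { return n.dBt < kDBtMax; }
```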
Example 2: when an EQT is allowed to encode a slice/picture/sequence, it may share the same maximum allowed binary tree root node size (e.g., MaxBTSize in section 2.3.1) used to encode the same video data unit. Alternatively, EQT may use a different maximum allowed root node size than BT.
In one example, the maximum EQT size is set to M × N, e.g., M = N = 64 or M = N = 32. In some embodiments, the maximum allowed root node size for EQT may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
Example 3: before signaling the direction (e.g., horizontal or vertical) of BT or EQT, one flag is signaled first to indicate whether it is BT or EQT, and another flag may be further signaled to indicate that it uses the horizontal or vertical partition direction.
In one example, the binarization of the partitioning is shown in Fig. 12. Table 1 shows an example of the binary value for each bin index. It should be noted that exchanging all "0"s and "1"s in the table gives an equivalent binarization.
Table 1: examples of partitioning patterns
(Table 1 is provided as an image in the original document.)
In some embodiments, the direction of BT or EQT is defined as parallel or perpendicular to the current partitioning direction.
In some embodiments, a flag may be signaled first to indicate whether QT, EQT, or neither (non-EQT and non-QT) is used. If non-EQT and non-QT is selected, the BT partition information may be further signaled.
Example 4: the flag indicating whether EQT or BT is used may be context encoded, and the context depends on the depth information of both the current block and its neighboring blocks.
In one example, the neighboring blocks may be defined as blocks above and to the left with respect to the current block.
In one example, both the quadtree depth and the BT/EQT depth may be used for encoding of the flags.
Based on the depth information of each block, a variable Dctx can be derived for each block; for example, it is set to (2 × QT depth + BT/EQT depth). In some embodiments, (2 × QT depth + BT/EQT depth) may be further quantized before being used for context selection.
The flag may be coded with three contexts. In one example, the context index is defined as: ((Dctx of the above block > Dctx of the current block) ? 1 : 0) + ((Dctx of the left block > Dctx of the current block) ? 1 : 0). In some embodiments, when a neighboring block is not available, its associated Dctx is set to 0.
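Expressed as code, the context selection of Example 4 might look like the following sketch; the optional quantization of Dctx is omitted, and unavailable neighbors default to 0 as stated above.

```cpp
// Context derivation for the BT-vs-EQT flag of Example 4 (sketch).
int dCtx(int qtDepth, int btEqtDepth) {
    return 2 * qtDepth + btEqtDepth;
}

// Returns one of three contexts (0, 1, or 2). Unavailable neighbors
// contribute a Dctx of 0, as stated in the text.
int eqtFlagContext(int dCur, int dAbove, bool aboveAvail,
                   int dLeft, bool leftAvail) {
    if (!aboveAvail) dAbove = 0;
    if (!leftAvail)  dLeft  = 0;
    return (dAbove > dCur ? 1 : 0) + (dLeft > dCur ? 1 : 0);
}
```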
Example 5: in some embodiments, whether and how EQT partitioning is applied may depend on the width and height (denoted as W and H) of the block to be partitioned.
In one example, EQT partitioning is not allowed when W >= T1 and H >= T2, where T1 and T2 are predefined integers, e.g., T1 = T2 = 128 or T1 = T2 = 64. Alternatively, T1 and/or T2 may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, EQT partitioning is not allowed when W >= T1 or H >= T2, where T1 and T2 are predefined integers, e.g., T1 = T2 = 128 or T1 = T2 = 64. Alternatively, T1 and/or T2 may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, EQT partitioning is not allowed when W <= T1 and H <= T2, where T1 and T2 are predefined integers, e.g., T1 = T2 = 8 or T1 = T2 = 16. Alternatively, T1 and/or T2 may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, EQT partitioning is not allowed when W <= T1 or H <= T2, where T1 and T2 are predefined integers, e.g., T1 = T2 = 8 or T1 = T2 = 16. Alternatively, T1 and/or T2 may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, the horizontal EQT shown in Fig. 11A is not allowed when W >= T, where T is a predefined integer, e.g., T = 128 or T = 64. Alternatively, T may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, the horizontal EQT shown in Fig. 11A is not allowed when H >= T, where T is a predefined integer, e.g., T = 128 or T = 64. Alternatively, T may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, the horizontal EQT shown in Fig. 11A is not allowed when W <= T, where T is a predefined integer, e.g., T = 8 or T = 16. Alternatively, T may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, the horizontal EQT shown in Fig. 11A is not allowed when H <= T, where T is a predefined integer, e.g., T = 8 or T = 16. Alternatively, T may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, the vertical EQT shown in Fig. 11B is not allowed when W >= T, where T is a predefined integer, e.g., T = 128 or T = 64. Alternatively, T may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, the vertical EQT shown in Fig. 11B is not allowed when H >= T, where T is a predefined integer, e.g., T = 128 or T = 64. Alternatively, T may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, the vertical EQT shown in Fig. 11B is not allowed when W <= T, where T is a predefined integer, e.g., T = 8 or T = 16. Alternatively, T may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, the vertical EQT shown in Fig. 11B is not allowed when H <= T, where T is a predefined integer, e.g., T = 8 or T = 16. Alternatively, T may be signaled from the encoder to the decoder in the VPS/SPS/PPS/picture header/slice header/CTU.
In one example, when the width or height of any of the four sub-blocks resulting from EQT partitioning is equal to K and the K × K transform is not supported/defined in the codec, EQT partitioning is not allowed.
In some embodiments, whether and how EQT partitioning is applied may depend on the width and/or height (denoted as W and H) of the sub-blocks resulting from EQT partitioning of a block.
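One hypothetical instantiation of these size rules, combining an upper-bound variant with T1 = T2 = 64 and a lower-bound variant with T1 = T2 = 8 from the example values above, is sketched below.

```cpp
// Hypothetical size gate for EQT (Example 5). Thresholds are example
// values from the text; they may instead be signaled in the
// VPS/SPS/PPS/picture header/slice header/CTU.
bool eqtAllowedBySize(int w, int h) {
    constexpr int kMaxT = 64;  // e.g., T1 = T2 = 64
    constexpr int kMinT = 8;   // e.g., T1 = T2 = 8
    if (w >= kMaxT && h >= kMaxT) return false;  // block too large
    if (w <= kMinT && h <= kMinT) return false;  // block too small
    return true;
}
```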
Example 6: in some embodiments, whether and how EQT partitioning is applied may depend on the location of the block to be partitioned.
In one example, whether and how EQT partitioning is applied may depend on whether the current block to be partitioned is located at an edge of a picture. In one example, assume that (x, y) is the coordinate of the top-left position of the current block, (PW, PH) is the width and height of the picture, and (W, H) is the width and height of the block with the current QT depth and BT/EQT depth. When y + H > PH, the current block is located at the bottom edge; when x + W > PW, the current block is located at the right edge; when y + H > PH and x + W > PW, the current block is located at the bottom right corner edge.
In one example, EQT partitioning is not allowed when the current block is at the bottom edge.
In one example, when the current block is located at the right edge, EQT partitioning is not allowed.
In one example, EQT partitioning is not allowed when the current block is located at the bottom right corner edge.
In one example, when the current block is at the bottom edge, the horizontal EQT as shown in fig. 11A is not allowed.
In one example, when the current block is located at the right edge, the horizontal EQT as shown in fig. 11A is not allowed.
In one example, when the current block is located at the lower right corner edge, the horizontal EQT as shown in fig. 11A is not allowed.
In one example, when the current block is at the bottom edge, the vertical EQT as shown in fig. 11B is not allowed.
In one example, when the current block is located at the right edge, the vertical EQT as shown in fig. 11B is not allowed.
In one example, when the current block is located at the lower right corner edge, the vertical EQT as shown in fig. 11B is not allowed.
In one example, when the current block is at the bottom edge, horizontal EQT and horizontal BT may be allowed.
In one example, when the current block is located at the right edge, vertical EQT and vertical BT may be allowed.
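The edge classification underlying these rules follows directly from the coordinates defined above; the sketch below applies one of the example rules (disallowing EQT at the bottom edge), with the other variants noted in comments.

```cpp
// Edge classification for Example 6 (sketch). (x, y) is the top-left
// position of the current block, (w, h) its size at the current QT
// depth and BT/EQT depth, and (picW, picH) the picture size.
struct EdgeFlags { bool bottom = false, right = false, corner = false; };

EdgeFlags classifyEdge(int x, int y, int w, int h, int picW, int picH) {
    EdgeFlags e;
    e.bottom = (y + h) > picH;
    e.right  = (x + w) > picW;
    e.corner = e.bottom && e.right;
    return e;
}

// One example rule from the list: disallow EQT at the bottom edge.
// Other variants gate only the right-edge or corner cases, or only the
// horizontal/vertical EQT of Figs. 11A/11B.
bool eqtAllowedByPosition(const EdgeFlags& e) {
    return !e.bottom;
}
```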
Example 7: the following may apply when the use of one or several EQTs is not allowed.
In one example, the parsing process depends on whether one or several EQTs are not allowed. If one or several EQTs are not allowed, the corresponding syntax elements related to the EQTs are not signaled.
In another example, the parsing process does not depend on whether one or several EQTs are not allowed. Whether or not the use of one or several EQTs is allowed, the corresponding syntax elements related to the EQTs are signaled.
In one example, if one or several EQTs are not allowed, a conforming encoder does not signal those EQTs.
In one example, when a conforming decoder parses an EQT partition but such an EQT is not allowed, it may interpret the EQT as some other type of partition, such as QT, BT, or no partition.
Example 8: the maximum EQT depth may depend on attributes associated with the current block and/or the current picture. In one example, the maximum EQT depth may depend on a distance between the current Picture and the reference Picture, e.g., a Picture Order Count (POC) difference.
In one example, the maximum EQT depth may depend on the temporal layer identifier of the current picture.
In one example, the maximum EQT depth may depend on whether the current picture is referenced by other pictures.
In one example, the maximum EQT depth may depend on the quantization parameter(s).
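A hypothetical mapping from these attributes to a maximum EQT depth is sketched below; the base depth and all offsets are invented for illustration, since the text only identifies the attributes that the depth may depend on.

```cpp
// Hypothetical derivation of the maximum EQT depth (Example 8).
int maxEqtDepth(int pocDistance, int temporalLayerId, int qp,
                bool referencedByOtherPictures) {
    int depth = 2;                                  // assumed base depth
    if (pocDistance <= 1)           depth += 1;     // nearby reference picture
    if (temporalLayerId == 0)       depth += 1;     // lowest temporal layer
    if (!referencedByOtherPictures) depth -= 1;     // non-reference picture
    if (qp > 37)                    depth -= 1;     // coarse quantization
    return depth < 0 ? 0 : depth;
}
```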
Example 9: Rate-Distortion Optimization (RDO) is a method of improving video quality in video compression. The method trades off the amount of distortion (loss of video quality) against the amount of data required to encode the video. Rate-distortion cost estimation is very useful for many H.264/Advanced Video Coding (AVC) applications, including rate-distortion optimized (RDO) mode decision and rate control. In some embodiments, at a certain BT/EQT depth, when the best modes of the current block and its neighboring blocks are all skip mode, there is no need to further check the rate-distortion cost calculation for further partitioning.
In one example, when the best mode of the current block and its neighboring blocks is skip mode or Merge mode, there is no need to further check the rate-distortion cost calculation for further partitioning.
In one example, if the best mode of the parent block is the skip mode, then no further check of the rate-distortion cost calculation is needed for further partitioning.
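In an encoder's RDO loop, these early terminations amount to a guard evaluated before recursing into further partitions, as in this sketch; the mode inputs are hypothetical, and each branch corresponds to one of the variants in the text.

```cpp
// Early-termination guard for further partitioning at a given BT/EQT
// depth (Example 9).
enum class Mode { Skip, Merge, Inter, Intra };

bool skipFurtherPartitionRdo(Mode cur, Mode above, Mode left, Mode parent) {
    // Variant 1: the current block and its neighbors are all skip mode.
    if (cur == Mode::Skip && above == Mode::Skip && left == Mode::Skip)
        return true;
    // Variant 2: skip or Merge mode everywhere.
    auto skipOrMerge = [](Mode m) { return m == Mode::Skip || m == Mode::Merge; };
    if (skipOrMerge(cur) && skipOrMerge(above) && skipOrMerge(left))
        return true;
    // Variant 3: the parent block's best mode is skip.
    return parent == Mode::Skip;
}
```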
Example 10: in some embodiments, the average EQT depth of EQT divided blocks of previously encoded pictures/slices is recorded. When encoding the current video unit, there is no need to further examine the rate-distortion cost calculation for the EQT depth, which is larger than the recorded average depth.
In one example, the average EQT depth value may be recorded for each temporal layer. In this case, for each video data unit to be coded, only the recorded average of the same temporal layer is used.
In one example, only the average EQT depth value of the first temporal layer is recorded. In this case, for each video data unit to be coded, the recorded average of the first temporal layer is always used.
Example 11: in some embodiments, the average size of eqtt divided blocks of previously encoded pictures/slices is recorded. When encoding the current video unit, there is no need to further check the rate-distortion cost calculation for smaller block sizes compared to the recorded block sizes.
In one example, the average EQT block size may be recorded for each temporal layer. In this case, for each video data unit to be coded, only the recorded average of the same temporal layer is used.
In one example, only the average EQT block size of the first temporal layer is recorded. In this case, for each video data unit to be coded, the recorded average of the first temporal layer is always used.
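Examples 10 and 11 reduce to keeping running per-temporal-layer averages and comparing candidates against them; the following is a minimal sketch with an assumed data layout.

```cpp
#include <array>

// Per-temporal-layer running statistics of EQT-split blocks
// (Examples 10 and 11). Layout and update policy are assumptions.
struct EqtLayerStats {
    double sumDepth = 0.0, sumSize = 0.0;
    long   count = 0;
    void add(int eqtDepth, int blockSize) {
        sumDepth += eqtDepth; sumSize += blockSize; ++count;
    }
    double avgDepth() const { return count ? sumDepth / count : 0.0; }
    double avgSize()  const { return count ? sumSize  / count : 0.0; }
};

std::array<EqtLayerStats, 8> gEqtStats;  // one entry per temporal layer

// Skip the RD check when the candidate EQT depth exceeds the recorded
// average (Example 10) or the block is smaller than the recorded
// average size (Example 11) for the same temporal layer.
bool skipEqtRdCheck(int temporalLayer, int candDepth, int blockSize) {
    const EqtLayerStats& s = gEqtStats[temporalLayer];
    if (s.count == 0) return false;  // no statistics recorded yet
    return candDepth > s.avgDepth() || blockSize < s.avgSize();
}
```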
Example 12: in some embodiments, whether to check the EQT partition that has not been checked may depend on the depth of the QT/BT/EQT partition that has been checked for the current block.
In some embodiments, whether to skip checking the EQT partition may depend on the average QT/BT partition depth of the current block;
estimating the average split depth after checking the BT horizontal split;
estimating the average split depth after checking the BT vertical split;
estimating the average split depth after checking both the BT horizontal and vertical splits.
in one example, for estimation, block depths are collected at some specific locations coordinated with EQT partitioning characteristics and then averaged. One example with 7 specific locations is depicted in fig. 17. In fig. 17, all horizontal lines represent quarter divisions of a block in the horizontal direction, and vertical lines depict quarter divisions of the block in the vertical direction.
In one example, the threshold is calculated as a function of the estimated average depth of division and the current depth of division.
Additionally, in some embodiments, a table of thresholds is applied to find the threshold. The estimated average partition depth and the current partition depth are used as keys to retrieve corresponding thresholds stored in the table.
When the average partition depth is less than the threshold, there is no need to further check the EQT partition.
In one example, the above method may be applicable to certain slice/picture types, such as I-slice or I-picture.
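The estimation-and-threshold test of Example 12 could be sketched as follows; the depth accessor, the sampling positions, and the threshold table values are all assumptions for illustration.

```cpp
#include <utility>
#include <vector>

// Average split depth at sampled positions (Example 12). The encoder
// queries the already-decided QT/BT depths at positions aligned with
// the EQT quarter-split boundaries (7 positions in Fig. 17); depthAt
// is a hypothetical accessor into the current partitioning.
double estimateAvgSplitDepth(const std::vector<std::pair<int, int>>& positions,
                             int (*depthAt)(int x, int y)) {
    if (positions.empty()) return 0.0;
    long sum = 0;
    for (const auto& p : positions) sum += depthAt(p.first, p.second);
    return static_cast<double>(sum) / positions.size();
}

// Threshold lookup keyed by the estimated average depth and the current
// split depth; the zero-initialized table stands in for values that
// would be tuned offline.
double thresholdFor(int avgDepthKey, int curDepth) {
    static const double table[8][8] = {};
    return table[avgDepthKey & 7][curDepth & 7];
}

// Skip checking the EQT partition when the average depth is below the
// threshold, as stated above.
bool skipEqtCheck(double avgDepth, int avgDepthKey, int curDepth) {
    return avgDepth < thresholdFor(avgDepthKey, curDepth);
}
```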
5. Examples of the embodiments
Syntax changes over the existing design are shown in bold.
(The modified syntax tables are provided as images in the original document.)
Examples of semantics
eqt_split_flag
A flag indicating whether EQT is enabled or disabled for a block.
eqt_split_dir
A flag indicating whether horizontal EQT or vertical EQT is used. Figs. 13A and 13B show examples of horizontal EQT splitting and vertical EQT splitting, respectively.
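Taken together, the two syntax elements could be parsed as in the following hypothetical decoder fragment; the bit reader and the mapping of eqt_split_dir values to directions are assumptions, and a real codec would context-code these flags with CABAC. The fragment follows the variant of Example 7 in which disallowed EQTs are not signaled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bit reader standing in for an entropy decoder.
struct BitReader {
    std::vector<bool> bits;
    std::size_t pos = 0;
    bool readFlag() { return pos < bits.size() && bits[pos++]; }
};

enum class EqtSplit { None, Horizontal, Vertical };

// Parse eqt_split_flag and, when set, eqt_split_dir.
EqtSplit parseEqtSplit(BitReader& br, bool eqtAllowed) {
    if (!eqtAllowed)    return EqtSplit::None;  // nothing is signaled
    if (!br.readFlag()) return EqtSplit::None;  // eqt_split_flag == 0
    return br.readFlag() ? EqtSplit::Vertical   // eqt_split_dir
                         : EqtSplit::Horizontal;
}
```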
Fig. 14 is a block diagram illustrating an example of the architecture of a computer system or other control device 1400, which may be used to implement various portions of the techniques of this disclosure. In Fig. 14, the computer system 1400 includes one or more processors 1405 and memory 1410 connected via an interconnect 1425. The interconnect 1425 may represent any one or more separate physical buses, point-to-point connections, or both, connected by appropriate bridges, adapters, or controllers. The interconnect 1425 may therefore include, for example, a system bus, a Peripheral Component Interconnect (PCI) bus, an ISA (Industry Standard Architecture) bus, a Small Computer System Interface (SCSI) bus, a Universal Serial Bus (USB), an IIC (I2C) bus, or an IEEE (Institute of Electrical and Electronics Engineers) standard 1394 bus (sometimes referred to as "Firewire").
Processor(s) 1405 may include a Central Processing Unit (CPU) to control, for example, the overall operation of the host computer. In certain embodiments, processor(s) 1405 accomplish this by running software or firmware stored in memory 1410. The Processor(s) 1405 may be or may include one or more Programmable general purpose or special purpose microprocessors, Digital Signal Processors (DSPs), Programmable controllers, Application Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), or the like, or a combination of such devices.
The memory 1410 may be or include the main memory of a computer system. Memory 1410 represents any suitable form of Random Access Memory (RAM), Read-Only Memory (ROM), flash Memory, etc., or a combination of these devices. When used, memory 1410 may contain, among other things, a set of machine instructions that, when executed by processor 1405, cause processor 1405 to perform operations to implement embodiments of the techniques of this disclosure.
A network adapter 1415 (optional) is also connected to the processor(s) 1405 via the interconnect 1425. The network adapter 1415 provides the computer system 1400 with the ability to communicate with remote devices (such as storage clients and/or other storage servers) and may be, for example, an Ethernet adapter or a Fibre Channel adapter.
Fig. 15 shows a block diagram of an exemplary embodiment of a device 1500 that may be used to implement various portions of the techniques of this disclosure. The mobile device 1500 may be a laptop, smartphone, tablet, camcorder, or other type of device capable of processing video. The mobile device 1500 includes a processor or controller 1501 that processes data, and a memory 1502 in communication with the processor 1501 for storing and/or buffering data. For example, the processor 1501 may include a Central Processing Unit (CPU) or a MicroController Unit (MCU). In some embodiments, the processor 1501 may include a Field-Programmable Gate Array (FPGA). In some embodiments, the mobile device 1500 includes or communicates with a Graphics Processing Unit (GPU), a Video Processing Unit (VPU), and/or a wireless communication unit for various visual and/or communication data processing functions of the smartphone device. For example, the memory 1502 may include and store processor-executable code that, when executed by the processor 1501, configures the mobile device 1500 to perform various operations, such as receiving information, commands, and/or data, processing information and data, and transmitting or providing processed information/data to another device, such as an actuator or external display. To support various functions of the mobile device 1500, the memory 1502 may store information and data, such as instructions, software, values, images, and other data that are processed or referenced by the processor 1501. For example, various types of Random Access Memory (RAM) devices, Read-Only Memory (ROM) devices, flash memory devices, and other suitable storage media may be used to implement the storage functions of the memory 1502. In some implementations, the mobile device 1500 includes an input/output (I/O) unit 1503 to interface the processor 1501 and/or the memory 1502 to other modules, units, or devices. For example, the I/O unit 1503 may interface the processor 1501 and the memory 1502 with various types of wireless interfaces compatible with typical data communication standards, e.g., between one or more computers in the cloud and the user device. In some implementations, the mobile device 1500 can interface with other devices via the I/O unit 1503 using a wired connection. The mobile device 1500 may also interface with other external interfaces, such as a data storage and/or visual or audio display device 1504, to retrieve and transmit data and information which may be processed by the processor, stored in the memory, or presented on the display device 1504 or an output unit of an external device. For example, the display device 1504 may display video frames in accordance with the disclosed techniques.
Fig. 16 is a flowchart of a method 1600 of visual media processing. The method 1600 includes performing (1602) a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein a size of at least one of the sub-blocks is different from half a width of the given block multiplied by half a height of the given block, and wherein the rule specifies that, when the rule is used for partitioning the current block, each sub-block may be further split using a Binary Tree (BT) partitioning or another EQT partitioning, and both the BT partitioning and the other EQT partitioning have depths that satisfy a predefined relationship.
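As a non-normative illustration, the geometry of such an EQT split can be sketched in a few lines of Python; the function name, the coordinate convention, and the use of integer division are assumptions made here for readability, not part of the disclosure:

    def eqt_split(x, y, w, h, horizontal=True):
        # Split a w x h block anchored at (x, y) into exactly four
        # sub-blocks; note that none of them is (w/2) x (h/2), which is
        # what distinguishes an EQT split from a conventional quadtree split.
        if horizontal:
            # M x N/4 top part, two side-by-side M/2 x N/2 middle parts,
            # M x N/4 bottom part (cf. clause 5 below).
            return [(x, y, w, h // 4),
                    (x, y + h // 4, w // 2, h // 2),
                    (x + w // 2, y + h // 4, w // 2, h // 2),
                    (x, y + 3 * h // 4, w, h // 4)]
        # M/4 x N left part, two stacked M/2 x N/2 middle parts,
        # M/4 x N right part.
        return [(x, y, w // 4, h),
                (x + w // 4, y, w // 2, h // 2),
                (x + w // 4, y + h // 2, w // 2, h // 2),
                (x + 3 * w // 4, y, w // 4, h)]

For example, eqt_split(0, 0, 16, 16) yields the sub-blocks (0, 0, 16, 4), (0, 4, 8, 8), (8, 4, 8, 8), and (0, 12, 16, 4), which tile the block completely while none of them equals 8 x 8.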
Another method of visual media processing includes performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block, and wherein the rule allows the EQT partitioning process to be performed on the current block based on the width or height of the current block.
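A minimal sketch of such a size-based gating rule, assuming illustrative values for the thresholds T1 and T2 of clauses 21 to 27 below (the disclosure leaves them either predefined or signaled in the bitstream):

    def eqt_allowed_by_size(width, height, t1=64, t2=64):
        # One variant: EQT is not allowed when the block is large in either
        # dimension; other variants disallow EQT for small blocks
        # (width <= T1 or height <= T2) or compare width against height.
        return width < t1 and height < t2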
Another method of visual media processing includes performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block, and wherein the rule allows the EQT partitioning process for the current block based on the position of the current block.
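A hedged sketch of one possible position rule, assuming the video region is described by its width and height and that blocks touching its bottom or right boundary are handled by forced boundary partitioning instead:

    def eqt_allowed_by_position(x, y, width, height, region_width, region_height):
        # Disallow EQT for blocks that reach the bottom or right boundary
        # of the video region (e.g., a picture or slice); corner blocks
        # fail both tests (cf. clauses 31 to 36 below).
        at_bottom = y + height >= region_height
        at_right = x + width >= region_width
        return not (at_bottom or at_right)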
Another method of visual media processing includes performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process includes partitioning a given block into exactly four sub-blocks, wherein a size of at least one of the sub-blocks is different from half a width of the given block multiplied by half a height of the given block, and wherein the rule allows a maximum depth of the EQT partitioning process to depend on a distance between a current picture of the current block and a reference picture of the current block or a quantization parameter of the current block or a temporal layer identifier of the current picture.
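One way such a dependency could look in an encoder is sketched below; the base depth, offsets, and thresholds are invented for illustration only, since the disclosure states the dependency but not its exact form:

    def max_eqt_depth(poc_distance, qp, temporal_id, base_depth=3):
        # Adapt the maximum EQT depth to the attributes of the block.
        depth = base_depth
        if poc_distance > 4:
            depth -= 1  # distant reference picture: limit partitioning effort
        if qp > 35:
            depth -= 1  # coarse quantization: fine partitions rarely pay off
        if temporal_id > 0:
            depth -= 1  # higher temporal layer: cheaper, shallower decisions
        return max(depth, 0)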
In the disclosed embodiments, the bitstream representation of the current block of video may include bits of a bitstream (a compressed representation of the video), which may be non-contiguous and may depend on header information, as is known in the art of video compression. Furthermore, the current block may include samples representing one or more of luminance and chrominance components, or rotated variants thereof (e.g., YCrCb or YUV, etc.).
The following list of clauses describes some of the disclosed embodiments and techniques.
1. A method of visual media processing, comprising: performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process comprises partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block; and wherein the rule specifies that, in case the rule is used for partitioning the current block, each sub-block is further divided into a Binary Tree (BT) partition or another EQT partition, and both the BT partition and the other EQT partition have depths that satisfy a predefined relationship.
2. The method of clause 1, wherein the converting comprises generating the current block from a bitstream representation.
3. The method of clause 1, wherein the converting comprises generating a bitstream representation from the current block.
4. The method of any of clauses 1-3, wherein the EQT partitioning process partitions the current block according to one of only two possible partitions.
5. The method of clause 4, wherein the current block includes M × N pixels, wherein M and N are integers, and wherein the two possible partitions include a first partition including an M × N/4 top portion, followed by two side-by-side M/2 × N/2 middle portions, followed by an M × N/4 bottom portion, or a second partition including an M/4 × N left side portion, two M/2 × N/2 middle portions, and one M/4 × N right side portion.
6. The method of clause 1, wherein the predefined relationship specifies that the BT partitioning depth and the EQT partitioning depth have different values, or the predefined relationship specifies that the depth of the EQT partitioning is equal to the sum of the depths of the BT partitioning and the Quadtree (QT) partitioning.
7. The method of clause 1, wherein the predefined relationship specifies that the BT partitioning depth and the EQT partitioning depth have the same value.
8. The method of clause 1, wherein the rule specifies that, where the current block is partitioned using BT, each resulting partition may be further partitioned using one of BT partitioning or EQT partitioning.
9. The method of clause 1, wherein the rule specifies that, in the case of partitioning the current block using EQT, the depth value of each resulting sub-block is two more than the depth value of the current block.
10. The method according to any of clauses 1 to 9, wherein the rule further specifies that the same allowed root node size as used for binary tree splitting is used for all blocks in a picture, a slice, or a sequence of pictures.
11. The method according to any of clauses 1 to 9, wherein the rule further specifies that an allowed root node size different from the one used for binary tree splitting is used for all blocks in a picture, a slice, or a sequence of pictures.
12. The method according to any of clauses 1 to 9, wherein the bitstream representation is configured to indicate a maximum allowed root node size for the EQT splitting process at a video level, a sequence level, a picture header level, a slice group header level, a slice level, or a coding tree unit level.
13. The method according to any one of clauses 1 to 11, wherein the bitstream representation is configured to include a first field indicating whether the current block is partitioned using the EQT partitioning or the BT partitioning, and a second field indicating a partitioning direction for the current block between a horizontal direction and a vertical direction.
14. The method according to any of clauses 11 to 13, wherein the partitioning direction is relative to the partitioning direction of a previous block.
15. The method according to any of clauses 11 to 14, wherein the first field or the second field is context-coded depending on depth information of one or more neighboring blocks or depth information of the current block.
16. The method of clause 15, wherein the neighboring block is an upper block or a left block with respect to the current block.
17. The method according to any one of clauses 15 and 16, wherein the quantized value of the depth information of the one or more neighboring blocks or the depth information of the current block is used for context coding.
18. A method of visual media processing, comprising: performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process comprises partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block; and wherein the rule allows the EQT partitioning process for the current block based on the width or height of the current block.
19. The method of clause 18, wherein the converting comprises generating the current block from a bitstream representation.
20. The method of clause 18, wherein the converting comprises generating a bitstream representation from the current block.
21. The method of any of clauses 18 to 20, wherein the rule does not allow EQT partitioning when the width is greater than or equal to T1 or the height is greater than or equal to T2, wherein T1 and T2 are integers.
22. The method of clause 21, wherein T1 and T2 are predefined.
23. The method of clause 21, wherein the bitstream representation is configured to carry an indication of T1 and T2.
24. The method of clause 23, wherein the indications of T1 and T2 are indicated at a video level or sequence level or picture level or slice header level or slice group header level or slice level or coding tree unit level.
25. The method of any of clauses 18 to 20, wherein the rule does not allow EQT partitioning when the width is less than or equal to T1 or the height is less than or equal to T2, wherein T1 and T2 are integers.
26. The method of any of clauses 18 to 20, wherein the rule does not allow EQT partitioning when the width is greater than or equal to the height.
27. The method of any of clauses 18 to 20, wherein the rule does not allow EQT partitioning when the width is less than the height.
28. A method of visual media processing, comprising: performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process comprises partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block; and wherein the rule allows the EQT partitioning process to be performed on the current block based on the position of the current block.
29. The method of clause 28, wherein the converting comprises generating the current block from a bitstream representation.
30. The method of clause 28, wherein the converting comprises generating a bitstream representation from the current block.
31. The method according to any of clauses 28 to 30, wherein the rule does not allow EQT partitioning for a current block located at the bottom edge of the video region.
32. The method according to any of clauses 28 to 30, wherein the rule does not allow EQT partitioning for a current block located at the right edge of the video region.
33. The method according to any of clauses 28 to 30, wherein the rule does not allow EQT partitioning for a current block that is a corner block of the video region.
34. The method of clause 33, wherein the corner corresponds to the bottom right corner of the video region.
35. The method of clause 28, wherein the rule allows the use of horizontal EQT partitioning or horizontal binary tree partitioning for a current block located at the bottom edge of the video region.
36. The method of clause 28, wherein the rule allows using vertical EQT partitioning or vertical binary tree partitioning for a current block located at the right edge of the video region.
37. The method of any of clauses 1 to 34, wherein, in the event that the rule does not allow the EQT partitioning process for the current block, the corresponding syntax element is omitted from the bitstream representation.
38. The method of any of clauses 1 to 33, wherein, in the event that the rule does not allow the EQT partitioning process for the current block, a corresponding syntax element having a default value is included in the bitstream representation.
39. A method of visual media processing, comprising: performing a conversion between a current block of visual media data and a corresponding bitstream representation of the block using a rule for using an Extended Quadtree (EQT) partitioning process, wherein the EQT partitioning process comprises partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block; and wherein the rule specifies either (1) that a maximum depth of the EQT partitioning process depends on a distance between a current picture of the current block and a reference picture of the current block, or on a quantization parameter of the current block, or on a temporal layer identifier of the current picture, or (2) that a split depth of the current block of the visual media data is used when deciding whether to check additional EQT splits.
40. The method of clause 39, wherein the converting comprises generating the current block from a bitstream representation.
41. The method of clause 39, wherein the converting comprises generating a bitstream representation from the current block.
42. The method according to any of clauses 1 to 40, wherein the rule specifies that the EQT partitioning process is disabled in case the current block and the neighboring blocks are encoded using the skip mode, or in case the coded depth of the current block is higher than the average coded depth of previously encoded blocks.
43. The method of clause 42, wherein the average coded depth is calculated on a previously coded picture or slice in which the current block is located.
44. The method of clause 42, wherein the average coded depth is calculated for the temporal layer in which the current block is located.
45. The method of clause 39, wherein the rule specifies that the checking of the EQT partition is skipped if the average split depth of the current block satisfies a condition.
46. The method of clause 45, wherein the average split depth is estimated after checking the binary tree horizontal split.
47. The method of any of clauses 45-46, wherein the average split depth is estimated after checking the binary tree vertical split.
48. The method according to any of clauses 45-47, wherein the average split depth is determined from the split depths at a number of specific pixel locations in the current block.
49. The method of clause 48, wherein the number is equal to 7.
50. The method of any of clauses 45-49, wherein the condition comprises comparing the average split depth to a threshold.
51. A video processing apparatus comprising a processor configured to implement the method according to any one or more of clauses 1-50.
52. The apparatus of clause 51, wherein the apparatus is a video encoder.
53. The apparatus of clause 51, wherein the apparatus is a video decoder.
54. A computer readable medium comprising a program comprising code for a processor to perform the method according to any one or more of clauses 1 to 50.
With respect to the above-listed clauses and the list of techniques in Section 4, the partitioning technique may be specified using a parameter set (a picture or video parameter set) or may be pre-specified based on rules. Thus, the number of bits required to signal the partitioning of blocks may be reduced. Similarly, due to the various rules specified in this document, the partitioning decision may also be simplified, allowing for a lower-complexity implementation of the encoder or decoder.
Furthermore, the position dependency of the partitioning rule may be based on the video region in which the current block exists (e.g., clause 26). The video region may be the current block itself or a larger portion of the video, such as the slice or picture in which the current block is located.
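For instance, the early-termination idea of clauses 45 to 50 above could be prototyped as follows; the sampling positions, the comparison direction, and the threshold are placeholders chosen for this sketch:

    def skip_eqt_check(depth_map, positions, current_depth, threshold=0):
        # Estimate the average split depth from a few specific pixel
        # positions of the current block (clause 49 mentions seven of them),
        # after the binary tree splits have been checked, and skip the
        # rate-distortion check of the EQT partition when the estimate does
        # not exceed the current depth by more than a threshold.
        avg_depth = sum(depth_map[p] for p in positions) / len(positions)
        return avg_depth - current_depth <= threshold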
Fig. 18 is a block diagram illustrating an example video processing system 1800 in which various techniques disclosed herein may be implemented. Various embodiments may include some or all of the components of the system 1800. The system 1800 can include an input 1802 for receiving video content. The video content may be received in a raw or uncompressed format, e.g., 8- or 10-bit multi-component pixel values, or may be in a compressed or encoded format. The input 1802 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as Ethernet, Passive Optical Network (PON), etc., and wireless interfaces such as Wi-Fi or cellular interfaces.
The system 1800 may include an encoding component 1804 that may implement the various coding or encoding methods described in this document. The encoding component 1804 may reduce the average bit rate of the video from the input 1802 to the output of the encoding component 1804 to produce an encoded representation of the video. The encoding techniques are therefore sometimes referred to as video compression or video transcoding techniques. The output of the encoding component 1804 may be stored, or transmitted via a connected communication, as represented by the component 1806. The component 1808 may use a stored or communicated bitstream (or encoded) representation of the video received at the input 1802 for generating pixel values or displayable video that is sent to a display interface 1810. The process of generating user-viewable video from the bitstream representation is sometimes referred to as video decompression. Furthermore, while certain video processing operations are referred to as "encoding" operations or tools, it should be understood that the encoding tools or operations are used at an encoder, and the corresponding decoding tools or operations that reverse the results of the encoding will be performed by a decoder.
Examples of a peripheral bus interface or a display interface may include a Universal Serial Bus (USB), a High Definition Multimedia Interface (HDMI), DisplayPort, and so on. Examples of storage interfaces include SATA (Serial Advanced Technology Attachment), PCI, IDE interfaces, and the like. The techniques described in this document may be embodied in various electronic devices, such as mobile phones, laptops, smartphones, or other devices capable of performing digital data processing and/or video display.
Fig. 19 is a flowchart representation of a method 1900 for video processing according to the present disclosure. The method 1900 includes, at operation 1902, determining, based on a rule, whether an Extended Quadtree (EQT) partitioning process is applicable to a current block of a video for a conversion between the current block and a bitstream representation of the video. The EQT partitioning process includes partitioning a given block into exactly four sub-blocks, where at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block. The rule specifies a maximum depth of the EQT partitioning process based on an attribute associated with the current block. The method 1900 includes, at operation 1904, performing the conversion based on the determination. In some embodiments, the converting includes generating the current block from the bitstream representation; in other embodiments, the converting includes generating the bitstream representation from the current block.
In some embodiments, the attribute comprises a distance between a current picture of the current block and a reference picture of the current block. In some embodiments, the attribute comprises a difference in Picture Order Count (POC). In some embodiments, the attribute includes a temporal layer identifier of a current picture of the current block. In some embodiments, the attribute includes whether the current picture of the current block is referenced by other pictures of the video. In some embodiments, the attribute comprises a quantization parameter of the current block.
In some embodiments, the rule further specifies that, at a predefined coding depth of the current block, subsequent segmentation processes are disabled in case the current block and neighboring blocks are coded using skip mode. In some embodiments, the rule further specifies that subsequent segmentation processes are disabled if the current block and neighboring blocks are encoded using Merge mode. In some embodiments, the rule further specifies that subsequent partitioning processes are disabled if the parent block of the current block is encoded using skip mode.
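A sketch of the skip-mode rule from the preceding paragraph, with the predefined depth and the neighbor set (e.g., the above and left blocks) chosen arbitrarily:

    def further_split_allowed(current_is_skip, neighbors_are_skip, coding_depth, predefined_depth=2):
        # Disable any subsequent partitioning once the block has reached the
        # predefined coding depth and both it and its neighbors are coded in
        # skip mode, which usually indicates easy-to-predict content. The
        # Merge-mode and parent-block variants have the same structure.
        if coding_depth >= predefined_depth and current_is_skip and all(neighbors_are_skip):
            return False
        return True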
In some embodiments, the rule further specifies that the EQT partitioning process is disabled if the coded depth of the current block is greater than the average coded depth of previously coded blocks. The average coded depth of the previously coded blocks may be calculated based on a binary tree splitting process, a quadtree splitting process, or an EQT splitting process. In some embodiments, a value representing the relationship between the coded depth of the current block and the average coded depth of previously coded blocks is compared against a table of threshold values to determine whether to disable the EQT partitioning process. In some embodiments, the average coded depth is calculated over previously coded pictures, slice groups, or slices. In some embodiments, the average coded depth is calculated for the temporal layer in which the current block is located. In some embodiments, the average coded depth is calculated only for the first temporal layer of the video. In some embodiments, the average coded depth is calculated based on a binary tree horizontal partition. In some embodiments, the average coded depth is calculated based on a binary tree vertical partition. In some embodiments, the average coded depth is determined from the coded depths at a number of particular locations in the current block. In some embodiments, the number of particular locations is equal to 7.
In some embodiments, the rule further specifies that the EQT partitioning process is disabled if the size of the current block is less than the average size of previously encoded blocks. In some embodiments, the average size is calculated for blocks of a previously encoded picture, slice group, or slice. In some embodiments, the average size is calculated for blocks of the temporal layer in which the current block is located. In some embodiments, the average size is calculated for blocks of only the first temporal layer of the video.
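The block-size heuristic of the preceding paragraph could be realized as follows; how the running average is maintained (per picture, slice group, or temporal layer) is assumed rather than specified here:

    def eqt_allowed_by_average_size(width, height, average_block_size):
        # Disable the EQT check once the current block is already smaller
        # (in samples) than the average size of previously coded blocks in
        # the chosen scope.
        return width * height >= average_block_size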
Some embodiments of the disclosed technology include making a decision or determination to enable a video processing tool or mode. In one example, when a video processing tool or mode is enabled, the encoder will use or implement the tool or mode in the processing of the video blocks, but the resulting bitstream is not necessarily modified based on the use of the tool or mode. That is, the conversion from a video block to a bitstream representation of the video will use the video processing tool or mode when the video processing tool or mode is enabled based on the decision or determination. In another example, when a video processing tool or mode is enabled, the decoder will process the bitstream knowing that the bitstream has been modified based on the video processing tool or mode. That is, the conversion from a bitstream representation of the video to video blocks will be performed using a video processing tool or mode that is enabled based on the decision or determination.
Some embodiments of the disclosed technology include making a decision or determination to disable a video processing tool or mode. In one example, when a video processing tool or mode is disabled, the encoder will not use that tool or mode in the conversion of video blocks to a bitstream representation of the video. In another example, when a video processing tool or mode is disabled, the decoder will process the bitstream knowing that the bitstream has not been modified using a video processing tool or mode that is enabled based on the decision or determination.
The disclosed and other embodiments, modules, and functional operations described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not require such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of specific embodiments that are specific to particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only a few embodiments and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (27)

1. A method for video processing, comprising:
for a conversion between a current block of a video and a bitstream representation of the video, determining whether an Extended Quadtree (EQT) partitioning process applies to the current block based on a rule, wherein the EQT partitioning process comprises partitioning a given block into exactly four sub-blocks, wherein at least one sub-block has a size that is different from half the width of the given block multiplied by half the height of the given block, and wherein the rule specifies a maximum depth of the EQT partitioning process based on an attribute associated with the current block; and
performing the conversion based on the determination.
2. The method of claim 1, wherein the converting comprises generating the current block from the bitstream representation.
3. The method of claim 1, wherein the converting comprises generating the bitstream representation from the current block.
4. The method of any of claims 1 to 3, wherein the attribute comprises a distance between a current picture of the current block and a reference picture of the current block.
5. The method of any of claims 1-4, wherein the attribute comprises a difference in Picture Order Count (POC).
6. The method of any of claims 1 to 3, wherein the attribute comprises a temporal layer identifier of a current picture of the current block.
7. The method of any of claims 1 to 3, wherein the attribute comprises whether a current picture of the current block is referenced by other pictures of the video.
8. The method of any of claims 1 to 3, wherein the attribute comprises a quantization parameter of the current block.
9. The method of any of claims 1 to 8, wherein the rule further specifies that, at a predefined coding depth of the current block, subsequent segmentation processes are disabled in case the current block and neighboring blocks are coded using skip mode.
10. The method of any of claims 1 to 8, wherein the rule further specifies that subsequent segmentation processes are disabled if the current and neighboring blocks are encoded using Merge mode.
11. The method of any of claims 1-8, wherein the rule further specifies that subsequent partitioning processes are disabled if a parent block of the current block is encoded using skip mode.
12. The method of any of claims 1 to 8 wherein the rule further specifies that the EQT partitioning process is disabled if the coded depth of the current block is greater than an average coded depth of previously coded blocks.
13. The method of claim 12, wherein the average coded depth of the previously encoded block is calculated based on a binary tree splitting process, a quadtree splitting process, or an EQT splitting process performed on the previously encoded block.
14. The method of claim 12 or 13, wherein a value representing the relationship between the coded depth of the current block and the average coded depth of the previously coded block is compared to a table of threshold values to determine whether to disable the EQT partitioning process.
15. The method of any of claims 12 to 14, wherein the average coded depth is calculated over previously coded pictures, slice groups, or slices.
16. The method of any of claims 12 to 15, wherein the average coded depth is calculated for a temporal layer in which the current block is located.
17. The method of any of claims 12-15, wherein the average coded depth is calculated only for a first temporal layer of the video.
18. The method of any of claims 12 to 17, wherein the average coded depth is calculated based on a binary tree horizontal partition.
19. The method of any of claims 12 to 18, wherein the average coded depth is calculated based on a binary tree vertical partition.
20. The method of any of claims 12 to 19, wherein the average coded depth is determined from the coded depths at a number of specific locations in the current block.
21. The method of claim 20, wherein the number of specific locations is equal to 7.
22. The method of any of claims 1 to 20 wherein the rule further specifies that the EQT partitioning process is disabled if the size of the current block is less than the average size of previously encoded blocks.
23. The method of claim 22, wherein the average size is calculated for blocks of a previously encoded picture, slice group, or slice.
24. The method of claim 22 or 23, wherein the average size is calculated for a block of a temporal layer in which the current block is located.
25. The method of any one of claims 22 to 24, wherein the average size is calculated for blocks of only the first temporal layer of the video.
26. A video processing apparatus comprising a processor configured to implement the method of any of claims 1 to 25.
27. A computer readable medium comprising a program comprising code for a processor to perform the method according to any one of claims 1 to 25.
CN201911033440.6A 2018-10-26 2019-10-28 Fast method for segmentation tree decision Active CN111107368B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CNPCT/CN2018/111990 2018-10-26
CN2018111990 2018-10-26
CNPCT/CN2018/119316 2018-12-05
CN2018119316 2018-12-05

Publications (2)

Publication Number Publication Date
CN111107368A true CN111107368A (en) 2020-05-05
CN111107368B CN111107368B (en) 2024-05-14

Family

ID=68470574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911033440.6A Active CN111107368B (en) 2018-10-26 2019-10-28 Fast method for segmentation tree decision

Country Status (2)

Country Link
CN (1) CN111107368B (en)
WO (1) WO2020084604A1 (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130259126A1 (en) * 2010-10-29 2013-10-03 Sk Telecom Co., Ltd. Method and apparatus for video encoding/decoding of encoding/decoding block filter information on the basis of a quadtree
US20160156932A1 (en) * 2013-07-18 2016-06-02 Samsung Electronics Co., Ltd. Intra scene prediction method of depth image for interlayer video decoding and encoding apparatus and method
CN104935940A (en) * 2014-03-17 2015-09-23 联发科技股份有限公司 Method of signaling for depth-based block partitioning
WO2015192314A1 (en) * 2014-06-17 2015-12-23 Mediatek Singapore Pte. Ltd. A simplified method for depth based block partitioning
CN104519362A (en) * 2014-12-23 2015-04-15 电子科技大学 Video coding method for predicting depth similarity of adjacent frames
CN107431815A (en) * 2015-03-13 2017-12-01 Lg 电子株式会社 Handle the method and its equipment of vision signal
WO2017219342A1 (en) * 2016-06-24 2017-12-28 Mediatek Inc. Methods of signaling quantization parameter for quad-tree plus binary tree structure
CN107948661A (en) * 2016-10-12 2018-04-20 联发科技股份有限公司 Method for processing video frequency and device
WO2018088805A1 (en) * 2016-11-08 2018-05-17 주식회사 케이티 Video signal processing method and apparatus
US20180139444A1 (en) * 2016-11-16 2018-05-17 Mediatek Inc. Method and Apparatus of Video Coding Using Flexible Quadtree and Binary Tree Block Partitions
CN108462873A (en) * 2017-02-21 2018-08-28 联发科技股份有限公司 The method and apparatus that the Candidate Set of block determines is split for quaternary tree plus binary tree
EP3383045A1 (en) * 2017-03-27 2018-10-03 Thomson Licensing Multiple splits prioritizing for fast encoding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KIRAN MISRA: "Description of SDR and HDR video coding technology proposal by Sharp and Foxconn", 《JOINT VIDEO EXPLORATION TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 10TH MEETING: SAN DIEGO, US, 10–20 APR. 2018》, pages 3 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396016A (en) * 2020-11-26 2021-02-23 武汉宏数信息技术有限责任公司 Face recognition system based on big data technology
WO2022213966A1 (en) * 2021-04-05 2022-10-13 Beijing Bytedance Network Technology Co., Ltd. Neighbor Based Partitioning Constraints
CN113518220A (en) * 2021-04-15 2021-10-19 中山大学 Intra-frame division method, device and medium based on oriented filtering and edge detection
CN113518220B (en) * 2021-04-15 2023-07-25 中山大学 Intra-frame division method, device and medium based on guide filtering and edge detection

Also Published As

Publication number Publication date
WO2020084604A1 (en) 2020-04-30
CN111107368B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
TWI723433B (en) Improved border partition
CN110868594B (en) Redundancy removal for segmentation methods
US11388476B2 (en) Block partitioning method and apparatus
WO2020182207A1 (en) Partitions on sub-block transform mode
CN110839160B (en) Forced boundary partitioning for extended quadtree partitioning
CN111107368B (en) Fast method for segmentation tree decision
CN113366855A (en) Condition-based asymmetric quadtree partitioning
CN113557746A (en) Composite ternary tree in video coding and decoding
TWI841584B (en) Border handling for extended quadtree partitions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant