CN113841406A - Method and apparatus for video coding and decoding using triangle partitioning

Info

Publication number: CN113841406A
Application number: CN202080037890.7A
Authority: CN (China)
Prior art keywords: motion vector, list, candidates, determining, uni
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 王祥林, 陈漪纹, 修晓宇, 马宗全, 朱弘正, 叶水明
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd

Classifications

    • H — ELECTRICITY; H04 — ELECTRIC COMMUNICATION TECHNIQUE; H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 — characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N 19/119 — Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N 19/139 — Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N 19/172 — the coding unit being an image region, e.g. a picture, frame or field
    • H04N 19/513 — Processing of motion vectors
    • H04N 19/70 — characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N 19/91 — Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • H04N 19/96 — Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Methods and apparatus are provided for video coding and decoding. The method comprises the following steps: partitioning a video picture into a plurality of Coding Units (CUs), wherein at least one CU is further partitioned into two Prediction Units (PUs), the two PUs comprising at least one geometrically shaped PU; obtaining a first merge list comprising a plurality of candidates, each candidate comprising one or more motion vectors; obtaining a uni-directional prediction motion vector for each PU by selecting one or more motion vectors from the first merge list; and pruning the uni-directional prediction motion vectors.

Description

Method and apparatus for video coding and decoding using triangle partitioning
Cross Reference to Related Applications
The present application claims priority from U.S. provisional application No. 62/850534 entitled "Video Coding Using Triangle Partition" filed on May 20, 2019 and U.S. provisional application No. 62/851630 entitled "Video Coding Using Triangle Partition" filed on May 22, 2019, both of which are incorporated by reference in their entirety for all purposes.
Technical Field
The present application relates generally to video coding and video compression, and in particular, but not exclusively, to a method and apparatus for motion compensated prediction using a triangle prediction unit (i.e., a special case of a geometrically partitioned prediction unit) in video coding.
Background
Digital video is supported by a variety of electronic devices, such as digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video game machines, smart phones, video teleconferencing devices, video streaming devices, and the like. Electronic devices transmit, receive, encode, decode, and/or store digital video data by implementing video compression/decompression. Digital video devices implement video codec techniques such as those described in standards defined by Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards.
Video codecs generally use prediction methods (e.g., inter prediction, intra prediction) that exploit redundancy present in a video image or sequence. An important goal of video codec techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality. As evolving video services become available, coding techniques with better coding and decoding efficiency are needed. The block partitioning scheme in each standard is also evolving.
Video compression typically includes performing spatial (intra) prediction and/or temporal (inter) prediction to reduce or remove redundancy inherent in the video data. For block-based video coding, a video frame is partitioned into one or more slices, each slice having a plurality of video blocks, which may also be referred to as Coding Tree Units (CTUs). Each CTU may contain one Codec Unit (CU) or be recursively split into smaller CUs until a predefined minimum CU size is reached. Each CU (also known as a leaf CU) contains one or more Transform Units (TUs), and each CU also contains one or more Prediction Units (PUs). Each CU may be coded in intra mode, inter mode, or IBC mode. Video blocks in an intra-coded (I) slice of a video frame are encoded using spatial prediction with respect to reference samples in neighboring blocks within the same video frame. Video blocks in an inter-coded (P or B) slice of a video frame may use spatial prediction with respect to reference samples in neighboring blocks within the same video frame, or temporal prediction with respect to reference samples in other previous and/or future reference video frames.
A prediction block for a current video block to be coded is generated based on spatial or temporal prediction of a previously coded reference block (e.g., a neighboring block). The process of finding the reference block may be accomplished by a block matching algorithm. Residual data representing pixel differences between the current block to be coded and the prediction block is called a residual block or prediction error. The inter-coded block is coded according to a motion vector and a residual block, the motion vector pointing to a reference block forming a prediction block in a reference frame. The process of determining motion vectors is commonly referred to as motion estimation. The intra coded block is coded according to an intra prediction mode and a residual block. For further compression, the residual block is transformed from the pixel domain to a transform domain (e.g., frequency domain), thereby generating residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce one-dimensional vectors of transform coefficients, and then entropy encoded into a video bitstream to achieve even more compression.
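For illustration only, the residual-and-quantization chain described above can be sketched as follows; real codecs apply an integer transform between the residual and quantization steps, and the helper names here are assumptions, not from any standard.

```python
# Toy illustration of the residual -> quantize -> reconstruct chain described
# above (real codecs apply an integer transform before quantization).
def residual(current_block, prediction_block):
    return [c - p for c, p in zip(current_block, prediction_block)]

def quantize(coeffs, step):
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    return [l * step for l in levels]

cur, pred = [120, 118, 117, 110], [118, 118, 114, 112]
res = residual(cur, pred)                # [2, 0, 3, -2]
rec = dequantize(quantize(res, 2), 2)    # [2, 0, 4, -2]: quantization is lossy
assert [p + r for p, r in zip(pred, rec)] == [120, 118, 118, 110]
```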
The encoded video bitstream is then stored in a computer readable storage medium (e.g., flash memory) to be accessed by another electronic device having digital video capabilities or transmitted directly to the electronic device, either wired or wirelessly. The electronic device then performs video decompression (which is a process inverse to the video compression described above) by, for example, parsing the encoded video bitstream to obtain syntax elements from the bitstream, and reconstructing the digital video data from the encoded video bitstream to its original format based at least in part on the syntax elements obtained from the bitstream, and presents the reconstructed digital video data on a display of the electronic device.
As digital video quality moves from high definition to 4K × 2K or even 8K × 4K, the amount of video data to be encoded/decoded grows exponentially. How to encode/decode video data more efficiently while maintaining the image quality of the decoded video data is therefore a persistent challenge.
At the Joint Video Experts Team (JVET) meeting, JVET defined the first draft of Versatile Video Coding (VVC) and the VVC Test Model 1 (VTM1) encoding method. A quadtree with nested multi-type trees, using binary and ternary split codec block structures, was determined as the initial new codec feature of VVC. Since then, the reference software VTM, which implements the encoding method and the draft VVC decoding process, has been developed during JVET meetings.
Disclosure of Invention
In general, this disclosure describes examples of techniques related to motion compensated prediction using geometrically shaped prediction units in video coding.
According to a first aspect of the present disclosure, there is provided a method for video coding and decoding using geometric partitioning, comprising: partitioning a video picture into a plurality of Coding Units (CUs), wherein at least one CU is further partitioned into two Prediction Units (PUs), the two PUs comprising at least one geometrically shaped PU; constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates is a motion vector comprising a list 0 motion vector or a list 1 motion vector or both; locating a first candidate for the first PU according to a first merge candidate index; locating a second candidate for the second PU according to a second merge candidate index; obtaining a first uni-directional prediction motion vector MV0 for the first PU by selecting the list X1 motion vector of the first candidate, wherein X1 corresponds to the first merge candidate index and takes a value of 0 or 1; obtaining a second uni-directional prediction motion vector MV1 for the second PU by selecting the list X2 motion vector of the second candidate, wherein X2 corresponds to the second merge candidate index and takes a value of 0 or 1; and in response to determining that MV0 and MV1 are the same, pruning the first uni-directional prediction motion vector MV0 and the second uni-directional prediction motion vector MV1.
According to a second aspect of the present disclosure, there is provided an apparatus for video coding and decoding using geometric partitioning, comprising: one or more processors; and a memory configured to store instructions executable by the one or more processors; wherein the one or more processors, when executing the instructions, are configured to: partition a video picture into a plurality of Coding Units (CUs), wherein at least one CU is further partitioned into two Prediction Units (PUs), the two PUs comprising at least one geometrically shaped PU; construct a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates is a motion vector comprising a list 0 motion vector or a list 1 motion vector or both; locate a first candidate for the first PU according to a first merge candidate index; locate a second candidate for the second PU according to a second merge candidate index; obtain a first uni-directional prediction motion vector MV0 for the first PU by selecting the list X1 motion vector of the first candidate, wherein X1 corresponds to the first merge candidate index and takes a value of 0 or 1; obtain a second uni-directional prediction motion vector MV1 for the second PU by selecting the list X2 motion vector of the second candidate, wherein X2 corresponds to the second merge candidate index and takes a value of 0 or 1; and in response to determining that MV0 and MV1 are the same, prune the first uni-directional prediction motion vector MV0 and the second uni-directional prediction motion vector MV1.
According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium for video coding with geometric partitioning, storing computer-executable instructions that, when executed by one or more computer processors, cause the one or more computer processors to perform acts comprising: partitioning a video picture into a plurality of Coding Units (CUs), wherein at least one CU is further partitioned into two Prediction Units (PUs), the two PUs comprising at least one geometrically shaped PU; constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates is a motion vector comprising a list 0 motion vector or a list 1 motion vector or both; locating a first candidate for the first PU according to a first merge candidate index; locating a second candidate for the second PU according to a second merge candidate index; obtaining a first uni-directional prediction motion vector MV0 for the first PU by selecting the list X1 motion vector of the first candidate, wherein X1 corresponds to the first merge candidate index and takes a value of 0 or 1; obtaining a second uni-directional prediction motion vector MV1 for the second PU by selecting the list X2 motion vector of the second candidate, wherein X2 corresponds to the second merge candidate index and takes a value of 0 or 1; and in response to determining that MV0 and MV1 are the same, pruning the first uni-directional prediction motion vector MV0 and the second uni-directional prediction motion vector MV1.
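For illustration, the derivation and pruning recited in the above aspects may be sketched as follows. This is a minimal sketch, not the normative process; the MergeCandidate type and function names are assumptions introduced here, and the parity-based choice of X is explained further in the detailed description below.

```python
# Minimal sketch of the claimed derivation: each merge candidate may carry a
# list 0 and/or list 1 motion vector; the parity of the merge candidate index
# selects the list, falling back to the other list when the preferred one is
# absent. Identical MV0/MV1 are then pruned.
from dataclasses import dataclass
from typing import Optional, Tuple

MV = Tuple[int, int]  # (horizontal, vertical) motion vector components

@dataclass
class MergeCandidate:
    mv_list0: Optional[MV]  # list 0 motion vector, if present
    mv_list1: Optional[MV]  # list 1 motion vector, if present

def derive_uni_prediction_mv(merge_list, merge_index):
    """Select the list X motion vector of the located candidate, where X is
    the parity of the merge candidate index, falling back to list (1 - X)."""
    cand = merge_list[merge_index]
    x = merge_index & 1
    mv = cand.mv_list1 if x == 1 else cand.mv_list0
    if mv is None:  # list X motion vector absent: use the list (1 - X) one
        mv = cand.mv_list0 if x == 1 else cand.mv_list1
    return mv

def triangle_mvs_with_pruning(merge_list, idx0, idx1):
    mv0 = derive_uni_prediction_mv(merge_list, idx0)
    mv1 = derive_uni_prediction_mv(merge_list, idx1)
    pruned = (mv0 == mv1)  # identical MVs are pruned (treated as redundant)
    return mv0, mv1, pruned

# Example: candidate 1 has no list 1 MV, so its list 0 MV is used instead.
merge_list = [MergeCandidate((1, 2), (3, 4)), MergeCandidate((5, 6), None)]
assert triangle_mvs_with_pruning(merge_list, 0, 1) == ((1, 2), (5, 6), False)
```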
Drawings
A more particular description of examples of the disclosure will be rendered by reference to specific examples thereof which are illustrated in the appended drawings. In view of the fact that these drawings depict only some examples and are therefore not to be considered limiting in scope, these examples will be described and explained with additional specificity and detail through the use of the accompanying drawings.
Fig. 1 is a block diagram illustrating an example video encoder in accordance with some embodiments of the present disclosure.
Fig. 2 is a block diagram illustrating an example video decoder in accordance with some embodiments of the present disclosure.
Fig. 3 is a schematic diagram illustrating a quadtree plus binary tree (QTBT) structure, according to some embodiments of the present disclosure.
Fig. 4 is a schematic diagram illustrating an example of pictures divided into CTUs according to some embodiments of the present disclosure.
Fig. 5 is a schematic diagram illustrating a multi-type tree splitting pattern, according to some embodiments of the present disclosure.
Fig. 6 is a schematic diagram illustrating locations of neighboring blocks according to some embodiments of the present disclosure.
Fig. 7 is a schematic diagram illustrating motion vector scaling for temporal merging candidates according to some embodiments of the present disclosure.
Fig. 8 is a schematic diagram illustrating candidate locations for temporal merging candidates according to some embodiments of the present disclosure.
Fig. 9 is a schematic diagram illustrating splitting a CU into triangle prediction units, according to some embodiments of the present disclosure.
Fig. 10 is a schematic diagram illustrating one example of uni-directional predictive Motion Vector (MV) selection for a triangle partition mode according to some embodiments of the present disclosure.
Fig. 11 is a schematic diagram illustrating one example of Motion Vector (MV) filling in triangle prediction mode according to some embodiments of the present disclosure.
Fig. 12A is a schematic diagram illustrating one example of simplified motion vector padding for triangle prediction modes, according to some embodiments of the present disclosure.
Fig. 12B is a schematic diagram illustrating another example of simplified motion vector padding for triangle prediction modes, according to some embodiments of the present disclosure.
Fig. 12C is a schematic diagram illustrating a third example of simplified motion vector padding for triangle prediction modes, according to some embodiments of the present disclosure.
Fig. 12D is a schematic diagram illustrating a fourth example of simplified motion vector padding for triangle prediction modes, according to some embodiments of the present disclosure.
Fig. 13A is a schematic diagram illustrating another example of simplified motion vector padding for triangle prediction modes, according to some embodiments of the present disclosure.
Fig. 13B is a schematic diagram illustrating another example of simplified motion vector padding for triangle prediction modes, according to some embodiments of the present disclosure.
Fig. 14 is a block diagram illustrating an example apparatus for video codec according to some embodiments of the present disclosure.
Fig. 15 is a flow diagram illustrating an example process for video coding for motion compensated prediction using geometric prediction units, according to some embodiments of the present disclosure.
Detailed Description
Reference will now be made in detail to specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth to provide an understanding of the subject matter presented herein. It will be apparent to those of ordinary skill in the art that various alternatives may be used. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
Reference throughout this specification to "one embodiment," "an example," "some embodiments," "some examples," or similar language means that a particular feature, structure, or characteristic described is included in at least one embodiment or example. Features, structures, elements, or characteristics described in connection with one or some embodiments may be applicable to other embodiments as well, unless expressly stated otherwise.
Throughout this disclosure, the terms "first," "second," "third," and the like are used merely as nomenclature for referring to related elements (e.g., devices, components, ingredients, steps, etc.), and do not imply any spatial or temporal order unless explicitly stated otherwise. For example, "first device" and "second device" may refer to two separately formed devices, or two parts, components, or operating states of the same device, and may be arbitrarily named.
As used herein, the term "if" or "when" may be understood to mean "once" or "in response to", depending on the context. These terms, if they appear in the claims, may not indicate that the associated limitation or feature is conditional or optional.
The terms "module," "sub-module," "circuit," "sub-circuit," "circuitry," "sub-circuitry," "unit" or "sub-unit" may include a memory (shared, dedicated, or group) that stores code or instructions that may be executed by one or more processors. A module may comprise one or more circuits, with or without stored code or instructions. A module or circuit may include one or more components connected directly or indirectly. These components may or may not be physically attached to or located adjacent to each other.
A unit or module may be implemented entirely by software, entirely by hardware, or by a combination of hardware and software. In a purely software embodiment, for example, a unit or module may comprise functionally related code blocks or software components linked together, directly or indirectly, to perform a specified function.
Fig. 1 shows a block diagram illustrating an exemplary block-based hybrid video encoder 100, which encoder 100 may be used in connection with many video codec standards that use block-based processing. In encoder 100, a video frame is partitioned into multiple video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction method or an intra prediction method. In inter-frame prediction, one or more prediction values are formed by motion estimation and motion compensation based on pixels from previously reconstructed frames. In intra prediction, a prediction value is formed based on reconstructed pixels in a current frame. Through the mode decision, the best predictor can be selected to predict the current block.
The prediction residual, representing the difference between the current video block and its prediction value, is sent to transform circuitry 102. The transform coefficients are then sent from transform circuitry 102 to quantization circuitry 104 for entropy reduction. The quantized coefficients are then fed to entropy codec circuitry 106 to generate a compressed video bitstream. As shown in fig. 1, prediction related information 110 (such as video block partitioning information, motion vectors, reference picture indices, and intra prediction modes) from inter prediction circuitry and/or intra prediction circuitry 112 is also fed through entropy coding circuitry 106 and saved into compressed video bitstream 114.
In the encoder 100, decoder-related circuitry is also required to reconstruct the pixels for prediction purposes. First, the prediction residual is reconstructed by inverse quantization 116 and inverse transform circuitry 118. This reconstructed prediction residual is combined with the block prediction value 120 to generate an unfiltered reconstructed pixel for the current video block.
Spatial prediction (or "intra prediction") uses pixels of samples (called reference samples) from already coded neighboring blocks in the same video frame as the current video block to predict the current video block.
Temporal prediction (also referred to as "inter prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in video signals. The temporal prediction signal for a given Codec Unit (CU) or codec block is typically signaled by one or more Motion Vectors (MVs) indicating the amount and direction of motion between the current CU and its temporal reference. In addition, if multiple reference pictures are supported, a reference picture index is additionally sent identifying from which reference picture in the reference picture store the temporal prediction signal came.
After spatial and/or temporal prediction is performed, intra/inter mode decision circuitry 121 in encoder 100 selects the best prediction mode, e.g., based on a rate-distortion optimization method. The block predictor 120 is then subtracted from the current video block; and the resulting prediction residuals are de-correlated using transform circuitry 102 and quantization circuitry 104. The resulting quantized residual coefficients are inverse quantized by inverse quantization circuitry 116 and inverse transformed by inverse transform circuitry 118 to form a reconstructed residual, which is then added back to the prediction block to form the reconstructed signal for the CU. Further in-loop filtering 115, such as a deblocking filter, Sample Adaptive Offset (SAO), and/or adaptive in-loop filter (ALF), may be applied to the reconstructed CU before the reconstructed CU is placed into a reference picture store of a picture buffer 117 and used to encode future video blocks. To form the output video bitstream 114, the codec mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy codec unit 106 to be further compressed and packed to form the bitstream.
For example, deblocking filters are available in AVC, HEVC, and the current version of VVC. In HEVC, an additional in-loop filter called SAO (sample adaptive offset) is defined to further improve coding efficiency. In the current version of the VVC standard, yet another in-loop filter called ALF (adaptive loop filter) is being actively studied, and it is highly likely to be included in the final standard.
These in-loop filter operations are optional. Performing these operations helps to improve codec efficiency and visual quality. They may also be turned off in accordance with decisions made by the encoder 100 to save computational complexity.
It should be noted that intra prediction is typically based on unfiltered reconstructed pixels, whereas if the encoder 100 turns on these filter options, inter prediction is based on filtered reconstructed pixels.
Fig. 2 is a block diagram illustrating an exemplary block-based video decoder 200, the decoder 200 may be used in conjunction with many video codec standards. The decoder 200 is similar to the reconstruction related parts present in the encoder 100 of fig. 1. In the decoder 200, the incoming video bitstream 201 is first decoded by entropy decoding 202 to derive quantized coefficient levels and prediction related information. The quantized coefficient levels are then processed by inverse quantization 204 and inverse transformation 206 to obtain the reconstructed prediction residual. The block prediction value mechanism implemented in the intra/inter mode selector 212 is configured to perform either intra prediction 208 or motion compensation 210 based on the decoded prediction information. A set of unfiltered reconstructed pixels is obtained by adding the reconstructed prediction residual from the inverse transform 206 and the prediction output generated by the block prediction mechanism using an adder 214.
The reconstructed block may further pass through an in-loop filter 209 before being stored in a picture buffer 213, which serves as a reference picture store. The reconstructed video in the picture buffer 213 may be sent to drive a display device and used to predict future video blocks. With in-loop filter 209 on, a filtering operation is performed on these reconstructed pixels to derive a final reconstructed video output 222.
The video codec/decoding standards mentioned above (such as VVC, JEM, HEVC, MPEG-4 part 10) are conceptually similar. For example, they all use block-based processing. Some standard block partitioning schemes are detailed below.
High efficiency video coding and decoding (HEVC)
HEVC is a hybrid block-based motion compensation transform coding and decoding architecture. The basic unit used for compression is called a Codec Tree Unit (CTU). For the 4:2:0 chroma format, the maximum CTU size is defined as up to 64 by 64 luma pixels and two 32 by 32 chroma pixel blocks. Each CTU may contain one Codec Unit (CU) or be recursively split into four smaller CUs until a predefined minimum CU size is reached. Each CU (also known as a leaf CU) includes one or more Prediction Units (PUs) and a Transform Unit (TU) tree.
In general, except for monochrome content, a CTU may include one luma Codec Tree Block (CTB) and two corresponding chroma CTBs; a CU may comprise one luma Codec Block (CB) and two corresponding chroma CBs; a PU may include one luma Prediction Block (PB) and two corresponding chroma PBs; and a TU may include one luma Transform Block (TB) and two corresponding chroma TBs. However, exceptions may occur because the minimum TB size is 4 × 4 for both luma and chroma (i.e., 2 × 2 chroma TBs are not supported for the 4:2:0 color format), and each intra chroma CB always has only one intra chroma PB, regardless of the number of intra luma PBs in the corresponding intra luma CB.
For an intra CU, luma CB can be predicted by one or four luma PB, and each of two chroma CBs is always predicted by one chroma PB, where each luma PB has one intra luma prediction mode and two chroma PBs share one intra chroma prediction mode. Also, for intra CU, TB size cannot be larger than PB size. In each PB, intra prediction is applied to predict the samples of each TB within the PB from the neighboring reconstructed samples of the TB. For each PB, in addition to 33 directional intra prediction modes, DC mode and planar mode are supported to predict flat and fade areas, respectively.
For each inter PU, one of three prediction modes may be selected: inter, skip, and merge. In general, a Motion Vector Competition (MVC) scheme is introduced to select a motion candidate from a given candidate set comprising spatial and temporal motion candidates. Motion estimation with multiple references allows the best reference to be found in two possible reconstructed reference picture lists (i.e., list 0 and list 1). For inter mode (referred to as AMVP mode, where AMVP stands for advanced motion vector prediction), an inter prediction indicator (list 0, list 1, or bi-prediction), a reference index, a motion candidate index, a Motion Vector Difference (MVD), and a prediction residual are transmitted. For skip mode and merge mode, only a merge index is transmitted, and the current PU inherits the inter prediction indicator, reference index, and motion vector from the neighboring PU referred to by the coded merge index. For a skip-coded CU, the residual signal is also omitted.
Joint exploration test model (JEM)
A joint exploration test model (JEM) is built on top of the HEVC test model. The basic encoding and decoding flow of HEVC remains unchanged in JEM; however, the design elements of the most important modules (including the modules for block structure, intra and inter prediction, residual transformation, loop filter and entropy coding) are modified somewhat and additional coding tools are added. The JEM includes the following new codec features.
In HEVC, the CTU is split into CUs by using a quadtree structure represented as a coding tree to accommodate various local characteristics. The decision whether to codec a picture region using inter-picture (temporal) prediction or intra-picture (spatial) prediction is made at the CU level. Each CU may be further split into one, two, or four PUs depending on the PU split type. Within one PU, the same prediction process is applied and the relevant information is sent to the decoder on a PU basis. After obtaining the residual block by applying a prediction process based on the PU split type, the CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to the coding tree used for the CU. One of the key features of the HEVC structure is that it has multiple partitioning concepts, including CU, PU and TU.
Fig. 3 is a schematic diagram illustrating a quadtree plus binary tree (QTBT) structure, according to some embodiments of the present disclosure.
The QTBT structure removes the concept of multiple partition types, i.e., it removes the separation of the CU, PU, and TU concepts and supports more flexibility for CU partition shapes. In a QTBT block structure, a CU may have a square or rectangular shape. As shown in fig. 3, a Codec Tree Unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes may be further partitioned by a binary tree structure. There are two split types in binary tree splitting: symmetric horizontal splitting and symmetric vertical splitting. The binary tree leaf nodes are called Codec Units (CUs), and this partitioning is used for prediction and transform processing without any further partitioning. This means that CU, PU, and TU have the same block size in the QTBT codec block structure. In JEM, a CU sometimes consists of codec blocks (CBs) of different color components, e.g., in the case of P- and B-slices of the 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs; and a CU sometimes consists of CBs of a single component, e.g., in the case of I-slices, one CU contains only one luma CB or only two chroma CBs.
For the QTBT segmentation scheme, the following parameters are defined.
-CTU size: the root node size of the quadtree, the same as the concept in HEVC;
-MinQTSize: allowed minimum quadtree leaf node size;
-MaxBTSize: the allowed maximum binary tree root node size;
-MaxBTDepth: maximum allowed binary tree depth;
-MinBTSize: allowed minimum binary tree leaf node size.
In one example of the QTBT segmentation structure, the CTU size is set to 128 × 128 luma samples with two corresponding 64 × 64 chroma sample blocks (with 4:2:0 chroma format), MinQTSize is set to 16 × 16, MaxBTSize is set to 64 × 64, MinBTSize (for both width and height) is set to 4 × 4, and MaxBTDepth is set to 4. Quadtree partitioning is first applied to CTUs to generate quadtree leaf nodes. The quad tree leaf nodes may have sizes from 16 × 16 (i.e., MinQTSize) to 128 × 128 (i.e., CTU size). If the quad tree leaf node is 128 x 128, it will not be further split through the binary tree because the size exceeds MaxBTSize (i.e., 64 x 64). Otherwise, the leaf nodes of the quadtree may be further partitioned by the binary tree. Thus, a leaf node of the quadtree is also the root node for the binary tree, and its binary tree depth is 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splits are considered. When the binary tree nodes have a width equal to MinBTSize (i.e., 4), further horizontal splits are not considered. Similarly, when the binary tree nodes have a height equal to MinBTSize, no further vertical splitting is considered. The leaf nodes of the binary tree are further processed by the prediction and transformation process without any further partitioning. In JEM, the maximum CTU size is 256 × 256 luma samples.
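For illustration, the partitioning constraints above may be sketched as follows, using the example parameter values; the function names are assumptions, and this is not JEM reference-software code.

```python
# Illustrative check of the QTBT constraints described above, using the example
# parameters: CTU 128x128, MinQTSize 16, MaxBTSize 64, MinBTSize 4, MaxBTDepth 4.
MIN_QT_SIZE, MAX_BT_SIZE, MIN_BT_SIZE, MAX_BT_DEPTH = 16, 64, 4, 4

def can_split_quadtree(width, height):
    # Quadtree splitting halves both dimensions; it stops at MinQTSize.
    return width == height and width > MIN_QT_SIZE

def allowed_binary_splits(width, height, bt_depth):
    """Binary-tree splits still considered for a node, per the rules above."""
    if max(width, height) > MAX_BT_SIZE or bt_depth >= MAX_BT_DEPTH:
        return []  # node too large for the binary tree, or MaxBTDepth reached
    splits = []
    if width > MIN_BT_SIZE:
        splits.append("horizontal")  # width == MinBTSize stops horizontal splits
    if height > MIN_BT_SIZE:
        splits.append("vertical")    # height == MinBTSize stops vertical splits
    return splits

assert allowed_binary_splits(128, 128, 0) == []  # exceeds MaxBTSize (64)
assert allowed_binary_splits(64, 64, 4) == []    # MaxBTDepth (4) reached
assert allowed_binary_splits(64, 64, 0) == ["horizontal", "vertical"]
```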
An example of block segmentation by using the QTBT scheme and the corresponding tree representation are illustrated in fig. 3. The solid lines indicate quad tree splits and the dashed lines indicate binary tree splits. As shown in fig. 3, a coding and decoding tree unit (CTU) 300 is first partitioned by a quadtree structure, and three leaf nodes among four quadtree leaf nodes 302, 304, 306, 308 are further partitioned by the quadtree structure or the binary tree structure. For example, the quadtree leaf nodes 306 are further partitioned by quadtree splitting; the quadtree leaf node 304 is further partitioned into two leaf nodes 304a, 304b by a binary tree split; and the quadtree leaf nodes 302 are also further partitioned by binary tree splitting. In each split (i.e., non-leaf) node of the binary tree, a flag is signaled to indicate which type of split (i.e., horizontal or vertical) is used, where 0 indicates horizontal split and 1 indicates vertical split. For example, for a quad tree leaf node 304, a 0 is signaled to indicate a horizontal split, and for a quad tree leaf node 302, a 1 is signaled to indicate a vertical split. For quadtree splitting, there is no need to indicate the split type, since quadtree splitting always splits a block both horizontally and vertically to produce 4 sub-blocks of equal size.
Furthermore, the QTBT scheme supports the ability for luminance and chrominance to have separate QTBT structures. Currently, luminance CTB and chrominance CTB in one CTU share the same QTBT structure for P-and B-stripes. However, for an I-slice, luminance CTB is partitioned into CUs by a QTBT structure, and chrominance CTB is partitioned into chrominance CUs by another QTBT structure. This means that a CU in an I-slice consists of either the codec block for the luma component or the codec blocks for the two chroma components, while a CU in a P-slice or B-slice consists of the codec blocks for all three color components.
General purpose video codec (VVC)
At the Joint Video Experts Team (JVET) meeting, JVET defined the first draft of Versatile Video Coding (VVC) and the VVC Test Model 1 (VTM1) encoding method. A quadtree with nested multi-type trees, using binary and ternary split codec block structures, was determined as the initial new codec feature of VVC.
In VVC, the picture partitioning structure divides the input video into blocks called Codec Tree Units (CTUs). A CTU is split into Codec Units (CUs) using a quadtree with a nested multi-type tree structure, with the leaf Codec Units (CUs) defining regions that share the same prediction mode (e.g., intra or inter). Here, the term "unit" defines a region of the image covering all components; the term "block" is used to define a region covering a particular component (e.g., luma), and the blocks of different components may differ in spatial location when considering a chroma sampling format such as 4:2:0.
Segmenting images into CTUs
Fig. 4 is a schematic diagram illustrating an example of pictures divided into CTUs according to some embodiments of the present disclosure.
In VVC, pictures are divided into CTU sequences, and the CTU concept is the same as in HEVC. For a picture with three sample arrays, a CTU consists of a block of N × N luma samples and two corresponding blocks of chroma samples. Fig. 4 shows an example of a picture 400 divided into CTUs 402.
The maximum allowable size of a luminance block in the CTU is designated as 128 × 128 (although the maximum size of a luminance transform block is 64 × 64).
Partitioning of CTUs using tree structures
Fig. 5 is a schematic diagram illustrating a multi-type tree splitting pattern, according to some embodiments of the present disclosure.
In HEVC, the CTU is split into CUs by using a quadtree structure represented as a coding tree to accommodate various local characteristics. The decision whether to use inter-picture (temporal) prediction or intra-picture (spatial) prediction to encode a picture region is made at the leaf-CU level. Each leaf-CU may be further split into one, two, or four PUs depending on the PU split type. Within one PU, the same prediction process is applied and the relevant information is sent to the decoder on a PU basis. After obtaining the residual block by applying a prediction process based on the PU split type, the leaf-CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to the coding tree used for the CU. One of the key features of the HEVC structure is that it has multiple partitioning concepts, including CU, PU and TU.
In VVC, a quadtree with nested multi-type trees using a binary and ternary split structure replaces the concept of multiple partition unit types, i.e., it removes the separation of the CU, PU, and TU concepts (except that a CU that is too large for the maximum transform length still requires separate CU, PU, and TU) and supports more flexibility for CU partition shapes. In the coding tree structure, a CU may have a square or rectangular shape. A Codec Tree Unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes may then be further partitioned by a multi-type tree structure. As shown in fig. 5, there are four split types in the multi-type tree structure: a vertical binary split 502 (SPLIT_BT_VER), a horizontal binary split 504 (SPLIT_BT_HOR), a vertical ternary split 506 (SPLIT_TT_VER), and a horizontal ternary split 508 (SPLIT_TT_HOR). The leaf nodes of the multi-type tree are called Codec Units (CUs), and unless a CU is too large for the maximum transform length, this partitioning is used for the prediction and transform process without any further partitioning. This means that in most cases, a CU, a PU, and a TU have the same block size in the quadtree with nested multi-type tree codec block structure. An exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CU. In VTM1, a CU consists of codec blocks (CBs) of different color components; e.g., one CU contains one luma CB and two chroma CBs (unless the video is monochrome, i.e., has only one color component).
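For illustration, the four multi-type tree split modes of fig. 5 can be expressed geometrically as in the following sketch; the 1:2:1 ternary ratio reflects the VVC design, and the function name and rectangle layout are assumptions.

```python
# Sketch of the four VVC multi-type tree splits from fig. 5, expressed as the
# child rectangles (x, y, width, height) they produce. Ternary splits use the
# 1:2:1 ratio along the split direction.
def mtt_split(x, y, w, h, mode):
    if mode == "SPLIT_BT_VER":    # vertical binary: two halves side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "SPLIT_BT_HOR":    # horizontal binary: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "SPLIT_TT_VER":    # vertical ternary: quarter, half, quarter
        return [(x, y, w // 4, h),
                (x + w // 4, y, w // 2, h),
                (x + 3 * w // 4, y, w // 4, h)]
    if mode == "SPLIT_TT_HOR":    # horizontal ternary: quarter, half, quarter
        return [(x, y, w, h // 4),
                (x, y + h // 4, w, h // 2),
                (x, y + 3 * h // 4, w, h // 4)]
    raise ValueError(mode)

# A 32x16 node under a vertical ternary split yields 8x16, 16x16, 8x16 children:
assert mtt_split(0, 0, 32, 16, "SPLIT_TT_VER") == [
    (0, 0, 8, 16), (8, 0, 16, 16), (24, 0, 8, 16)]
```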
Partitioning a CU into multiple prediction units
In VVC, for each CU partitioned based on the structure described above, prediction of block content may be performed on the entire CU block or in a subblock manner as explained in the following paragraphs. Such a predicted operation unit is called a prediction unit (or PU).
In the case of intra prediction (or intra prediction), the size of a PU is typically equal to the size of a CU. In other words, prediction is performed on the entire CU block. For inter prediction (inter prediction), the size of a PU may be equal to or smaller than the size of a CU. In other words, there are cases where a CU can be split into multiple PUs for prediction.
Some examples of making the PU size smaller than the CU size include an affine prediction mode, an advanced temporal level motion vector prediction (ATMVP) mode, and a triangle prediction mode, among others.
In affine prediction mode, a CU may be split into multiple 4 × 4 PUs for prediction. Motion vectors may be derived for each 4 × 4 PU, and motion compensation may be performed on the 4 × 4 PU accordingly. In ATMVP mode, a CU may be split into one or more 8 × 8 PUs for prediction. Motion vectors are derived for each 8 × 8 PU, and motion compensation may be performed on the 8 × 8 PU accordingly. In the triangle prediction mode, a CU may be split into two triangle-shaped prediction units. Motion vectors are derived for each PU and motion compensation is performed accordingly. The triangle prediction mode is supported for inter prediction. More details of the triangle prediction mode are described below.
Conventional merge mode motion vector candidate list
According to the current VVC, in a conventional merge mode in which an entire CU is predicted without being split into more than one PU, a motion vector candidate list or a merge candidate list is constructed using a different process from that for the triangle prediction mode.
First, spatial motion vector candidates are selected based on motion vectors from neighboring blocks, as indicated in fig. 6, which is a schematic diagram illustrating the locations of spatial merge candidates according to some embodiments of the present disclosure. In the derivation of the spatial merge candidates for the current block 602, up to four merge candidates are selected among the candidates located at the positions depicted in fig. 6. These candidates are selected in a certain order. An exemplary derivation order is A1 → B1 → B0 → A0 → (B2). Position B2 is considered only when any PU at positions A1, B1, B0, A0 is not available or is intra-coded. It should be noted that other orders may also be used. For example, in a later stage of VVC, the order was changed to B1 → A1 → B0 → A0 → (B2).
Next, a temporal merging candidate is derived. In the derivation of the temporal merging candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture, within a given reference picture list, that has the smallest Picture Order Count (POC) difference from the current picture. The reference picture list to be used to derive the co-located PU is explicitly signaled in the slice header. The scaled motion vector for the temporal merging candidate is obtained as shown by the dashed line in fig. 7, which illustrates motion vector scaling for the temporal merging candidate according to some embodiments of the present disclosure. The scaled motion vector for the temporal merging candidate is scaled from the motion vector of the co-located PU col_PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture curr_ref of the current picture and the current picture curr_pic, and td is defined as the POC difference between the reference picture col_ref of the co-located picture and the co-located picture col_pic. The reference picture index of the temporal merging candidate is set equal to zero. The actual implementation of the scaling process is described in the HEVC draft specification. For a B slice, two motion vectors (one for reference picture list 0 and the other for reference picture list 1) are obtained and combined to form a bi-predictive merge candidate.
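For illustration, the POC-distance scaling just described amounts to multiplying the co-located motion vector by tb/td. The sketch below is a simplified floating-point version; the HEVC draft specifies an equivalent fixed-point computation with clipping.

```python
# Simplified temporal motion vector scaling: the co-located PU's motion vector
# is scaled by the ratio of POC distances tb/td, as described above.
def scale_temporal_mv(mv_col, poc_curr, poc_curr_ref, poc_col, poc_col_ref):
    tb = poc_curr - poc_curr_ref  # POC distance: current picture -> its reference
    td = poc_col - poc_col_ref    # POC distance: co-located picture -> its reference
    if td == 0:
        return mv_col             # degenerate case: no scaling possible
    scale = tb / td
    return (round(mv_col[0] * scale), round(mv_col[1] * scale))

# Example: tb = 2 and td = 4 halve the co-located motion vector.
assert scale_temporal_mv((8, -4), 10, 8, 6, 2) == (4, -2)
```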
Fig. 8 is a schematic diagram illustrating candidate locations for temporal merging candidates according to some embodiments of the present disclosure.
As depicted in fig. 8, the location of the co-located PU is selected between two candidate locations C3 and H. If the PU at position H is not available, either intra-coded or outside the current CTU, position C3 is used to derive the temporal merging candidate. Otherwise, position H is used to derive a temporal merging candidate.
After inserting both spatial and temporal motion vectors into the merge candidate list as described above, history-based merge candidates are added. So-called history-based merge candidates include those motion vectors from previously coded CUs, which are kept in a separate motion vector list and managed based on some rules.
After the history-based candidates are inserted, the pairwise average motion vector candidates are further added to the list if the merge candidate list is not full. As its name implies, this type of candidate is constructed by averaging the candidates already in the current list. More specifically, based on some order or rule, two candidates in the merge candidate list are fetched at a time and the average motion vectors of the two candidates are appended to the current list.
After inserting the pairwise average motion vector, if the merge candidate list is still not full, zero motion vectors will be added to fill the list.
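For illustration, the list-completion steps described in the last two paragraphs may be sketched as follows; the target list size, the pair order, and the averaging rounding are illustrative assumptions rather than the normative rules.

```python
# Sketch of merge-list completion: append pairwise-average candidates, then
# zero motion vectors, until the list reaches its target size.
from itertools import combinations

MAX_MERGE_CANDS = 6  # illustrative target list size

def complete_merge_list(cands, max_size=MAX_MERGE_CANDS):
    """Append pairwise-average candidates, then zero MVs, until the list is full."""
    out = list(cands)
    for a, b in combinations(cands, 2):  # the pair order here is illustrative
        if len(out) >= max_size:
            break
        # Average the two motion vectors (the exact rounding follows the spec).
        out.append(((a[0] + b[0]) // 2, (a[1] + b[1]) // 2))
    while len(out) < max_size:
        out.append((0, 0))  # zero motion vectors fill the remaining slots
    return out

assert complete_merge_list([(4, 2), (8, -2)]) == [
    (4, 2), (8, -2), (6, 0), (0, 0), (0, 0), (0, 0)]
```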
Triangle prediction mode (or triangle division mode)
Fig. 9 is a schematic diagram illustrating splitting a CU into triangle prediction units, according to some embodiments of the present disclosure.
The concept of the triangle prediction mode is to introduce triangular partitioning for motion compensated prediction. The triangle prediction mode may also be called the triangle prediction unit mode or the triangle partition mode. As shown in fig. 9, a CU 902 or CU 904 is split into two triangle prediction units in either the diagonal or the anti-diagonal direction (i.e., split from the top-left corner to the bottom-right corner as shown by CU 902, or from the top-right corner to the bottom-left corner as shown by CU 904): partition 0 and partition 1. Each triangle prediction unit in the CU is inter-predicted using its own uni-directional prediction motion vector and reference frame index, which are derived directly and/or indirectly from the candidates in the conventional merge candidate list. After the triangle prediction units are predicted, a weighting process is performed on the diagonal edge. The transform and quantization process is then applied to the entire CU. Note that this mode only applies to skip mode and merge mode in the current VVC. Although the CUs in fig. 9 are shown as square blocks, the triangle prediction mode may also be applied to non-square (i.e., rectangular) CUs.
One-way predictive motion vector derivation
Fig. 10 is a schematic diagram illustrating uni-directional predictive motion vector selection for a triangle partition mode, according to some embodiments of the present disclosure.
In some examples, the uni-directional predictive motion vector for each triangle partition is derived directly from the merge candidate list formed for the conventional merge mode as explained in the previous section of the "conventional merge mode motion vector candidate list". Given a merge candidate index, a candidate may be located from the merge candidate list. Then, for this candidate, its list X motion vector (X is equal to the parity value (p) of the merge candidate index value) is used as the uni-directional prediction motion vector for the triangle partition mode. These motion vectors are marked with an "x" in fig. 10. In the case where the corresponding list X (or list p) motion vector does not exist, the list (1-X) (or list (1-p)) motion vector of the same candidate is used as a uni-directional predictive motion vector for the triangle division mode.
For each of the triangular PUs, a predictor is derived based on its motion vector. Notably, the derived predictors cover a larger area than the actual triangle PU, such that there is an overlapping area of the two predictors along the shared diagonal edge of the two triangle PUs. A weighting process is applied to the diagonal edge region between the two predicted values to derive the final prediction for the CU. The current weighting factors for luma samples and chroma samples are {7/8, 6/8, 5/8, 4/8, 3/8, 2/8, 1/8} and {6/8, 4/8, 2/8}, respectively.
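For illustration, the diagonal weighting may be sketched as below; the mapping from a sample's distance to the diagonal onto the luma weight set {7/8, ..., 1/8} is an assumption for illustration, not the normative derivation.

```python
# Illustrative blending of the two triangle predictors along the diagonal edge.
# w is the weight applied to predictor p0; (1 - w) applies to p1. The mapping
# from signed distance-to-diagonal to the luma weight set is a sketch.
LUMA_WEIGHTS = {3: 7/8, 2: 6/8, 1: 5/8, 0: 4/8, -1: 3/8, -2: 2/8, -3: 1/8}

def blend_sample(p0, p1, dist_to_diagonal):
    # Samples far inside partition 0 keep p0; far inside partition 1 keep p1.
    if dist_to_diagonal > 3:
        return p0
    if dist_to_diagonal < -3:
        return p1
    w = LUMA_WEIGHTS[dist_to_diagonal]
    return round(w * p0 + (1 - w) * p1)

assert blend_sample(100, 60, 0) == 80  # on the diagonal: equal 4/8 weights
```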
Triangle prediction mode syntax and signaling
Here, the triangle prediction mode is signaled using a triangle prediction flag. When a CU is coded in skip mode or merge mode, a triangle prediction flag is signaled. For a given CU, if the value of the triangle prediction flag is 1, it means that the corresponding CU is coded using the triangle prediction mode. Otherwise, the CU is coded using a prediction mode other than the triangle prediction mode.
For example, a triangle prediction flag is conditionally signaled in skip mode or merge mode. First, a triangle prediction tool enable/disable flag is signaled in the sequence parameter set (or SPS). The triangle prediction flag is signaled at the CU level only if the triangle prediction tool enable/disable flag is true. Second, triangle prediction tools are allowed only in B-slices. Therefore, only in B slices, the triangle prediction flag will be signaled at the CU level. Third, the triangle prediction mode is signaled only for CUs with a size equal to or larger than a certain threshold (e.g., 64). If the CU has a size less than the threshold, the triangle prediction flag is not signaled. Fourth, the triangle prediction mode may be allowed to be used for a CU only when the CU is not coded in the normal merge mode, or MMVD or subblock merge mode, or CIIP mode. For CUs that satisfy these conditions, a triangle division mode is applied.
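Collecting the four conditions above, whether the triangle prediction flag is signaled for a given CU may be sketched as follows; the function and parameter names are illustrative assumptions.

```python
# Sketch of the triangle-prediction-flag signalling conditions listed above.
def triangle_flag_is_signaled(sps_triangle_enabled, is_b_slice, cu_size,
                              coded_as_regular_merge, coded_as_mmvd,
                              coded_as_subblock_merge, coded_as_ciip,
                              size_threshold=64):
    if not sps_triangle_enabled:   # SPS-level tool enable/disable flag
        return False
    if not is_b_slice:             # the tool is allowed only in B slices
        return False
    if cu_size < size_threshold:   # CU must reach the size threshold (e.g., 64)
        return False
    # The CU must not be coded in regular merge, MMVD, sub-block merge, or CIIP:
    return not (coded_as_regular_merge or coded_as_mmvd or
                coded_as_subblock_merge or coded_as_ciip)
```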
If the triangle prediction mode is used, a triangle partition orientation flag is signaled to indicate whether the partition is oriented from the top left corner to the bottom right corner or from the top right corner to the bottom left corner.
When signaled, the triangle prediction flag is signaled using a Context Adaptive Binary Arithmetic Coding (CABAC) entropy codec with certain contexts. These contexts are formed based on the triangle prediction flag values of the top and left blocks of the current CU.
To codec (i.e., encode or decode) the triangle prediction flag for the current block (or current CU), triangle prediction flags from the top and left side blocks (or CUs) are derived and their values are added. This results in three possible contexts corresponding to the following cases:
1) both the left block and the top block have a triangle prediction flag of 0;
2) both the left block and the top block have a triangle prediction flag of 1;
3) all other cases.
Separate probabilities are maintained for each of the three contexts. Once the context value for the current block is determined, the triangle prediction flag of the current block is coded using a CABAC probability model corresponding to the context value.
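A sketch of this context derivation, under the assumption (not stated above) that a neighbor's flag is read as 0 when the neighbor block is unavailable; the context index values are illustrative:

```python
def triangle_flag_context(left_flag, top_flag):
    """Map the neighbors' triangle prediction flags to one of 3 contexts."""
    total = left_flag + top_flag
    if total == 0:
        return 0  # case 1: both neighbors have flag 0
    if total == 2:
        return 1  # case 2: both neighbors have flag 1
    return 2      # case 3: all other cases
```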
If the triangle prediction flag is true, a triangle partition orientation flag is signaled to indicate whether the partition is oriented from the top left corner to the bottom right corner or from the top right corner to the bottom left corner.
In case the triangle prediction mode is used for a CU, two merge index values are signaled to indicate the merge index values of the first and second uni-directional prediction merge candidates for triangle prediction, respectively. These two merge index values are used to locate two merge candidates from the uni-directional prediction motion vector candidate list described above for the first and second partitions, respectively. For triangle prediction, the two merge index values are required to be different, so that the two predictors of the two triangle partitions can differ from each other. As a result, the first merge index value is signaled directly. For the second merge index value, if it is less than the first merge index value, its value is signaled directly; otherwise, its value is reduced by 1 before being signaled to the decoder. At the decoder side, the first merge index is decoded and used directly. To decode the second merge index value, a value denoted "idx" is first decoded from the CABAC engine. If idx is less than the first merge index value, the second merge index value is equal to idx; otherwise, the second merge index value is equal to (idx + 1).
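The index coding just described can be captured in two small helpers; a sketch under the stated constraint that the two merge indices differ (function names are illustrative):

```python
def encode_second_merge_index(idx0, idx1):
    """Value written to the bitstream for the second index (idx1 != idx0)."""
    return idx1 if idx1 < idx0 else idx1 - 1

def decode_second_merge_index(idx0, idx):
    """Reconstruct the second merge index from the parsed value idx."""
    return idx if idx < idx0 else idx + 1
```

For example, with a first merge index of 2, a second merge index of 4 is signaled as 3 and decoded back to 4, since the parsed value 3 is not less than 2.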
Block motion vector filling in triangle prediction mode
According to the VVC standard draft, if a CU is coded in triangle prediction mode, different motion vectors are used to fill (i.e., are stored in) the motion vector buffers of the 4 × 4 sub-blocks within the CU, depending on the sub-block position. This motion vector filling serves the following purpose: motion vector prediction when coding other CUs that may be spatially or temporally adjacent to the current CU. More specifically, the sub-blocks within the first triangle partition (i.e., partition 0) are filled with the uni-directional prediction motion vector of the first triangle partition, denoted as MV0; the sub-blocks within the second triangle partition (i.e., partition 1) are filled with the uni-directional prediction motion vector of the second triangle partition, denoted as MV1; and those 4 × 4 sub-blocks located on the diagonal partition boundary are filled with a motion vector formed from both MV0 and MV1. Depending on MV0 and MV1, the formed motion vector, denoted as MV01, may be uni-directionally predicted or bi-directionally predicted. If MV0 and MV1 are from different reference lists, the two uni-directionally predicted motion vectors are directly combined to form a bi-directionally predicted motion vector. If they refer to the same reference list, the reference picture of MV1 is checked to see if it exists in the other reference list. If so, MV1 is converted to refer to the same reference picture but in the other reference list, and is then combined with MV0 to form a bi-directionally predicted motion vector. If the reference picture of MV1 does not exist in the other reference list, the reference picture of MV0 is checked to see if it exists in the other reference list. If so, MV0 is converted to refer to the same reference picture but in the other reference list, and is then combined with MV1 to form a bi-directionally predicted motion vector. If the reference picture of MV0 does not exist in the other reference list either, MV0 is used directly as the formed motion vector, and in this case the formed motion vector is a uni-directionally predicted motion vector. An example is shown in fig. 11. In this example, a CU of 32 × 32 size is coded in the triangle prediction mode. In this case, those 4 × 4 blocks within partition 0 are filled with the uni-directional prediction motion vector of partition 0; those 4 × 4 blocks within partition 1 are filled with the uni-directional prediction motion vector of partition 1; and those 4 × 4 blocks located on the diagonal boundary line (marked as squares with solid border lines) are filled with the formed motion vector MV01 described above. It is noted that in the above process, the motion vector used to fill a 4 × 4 block may be the same as the motion vector used to form the inter prediction for that 4 × 4 block, or may be different from it. Although the sub-blocks disclosed in this disclosure have a size of 4 × 4, the sub-block size may be adapted to 2 × 2, 8 × 8, or other sizes, and the disclosed methods may be adapted accordingly.
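The formation of MV01 can be sketched as follows. The data model is hypothetical: each motion vector is assumed to carry its reference list (`ref_list`) and the POC of its reference picture (`ref_poc`), `ref_pocs[l]` is assumed to list the POCs available in reference list l, and `converted_to` is an assumed helper that re-expresses a motion vector against the same picture in the other list:

```python
def form_mv01(mv0, mv1, ref_pocs):
    """Form the motion vector stored for sub-blocks on the diagonal boundary.

    Returns a 2-tuple (bi-directional) or a 1-tuple (uni-directional).
    """
    if mv0.ref_list != mv1.ref_list:
        return (mv0, mv1)                      # different lists: combine directly
    other = 1 - mv0.ref_list
    if mv1.ref_poc in ref_pocs[other]:
        return (mv0, mv1.converted_to(other))  # MV1 re-expressed in other list
    if mv0.ref_poc in ref_pocs[other]:
        return (mv0.converted_to(other), mv1)  # MV0 re-expressed in other list
    return (mv0,)                              # neither convertible: MV0 alone
```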
Uni-directional prediction motion vector derivation with limited motion vector pruning
In some examples, the uni-directional prediction motion vector for each triangle partition is derived directly from the merge candidate list formed for the conventional merge mode, as explained in the section "conventional merge mode motion vector candidate list". This method is simple. However, as shown in fig. 10, the number of selectable motion vectors may be limited for triangle prediction. For example, when the motion vectors labeled with "x" in the figure exist, the corresponding uni-directional prediction motion vectors of the same merge candidates but from the other reference list (i.e., those not labeled with "x" in the figure) have no chance of being used for triangle prediction. Meanwhile, it often happens that some of the motion vectors marked with "x" are identical to each other, which may further limit the motion vector diversity and sacrifice coding efficiency.
Another problem with triangle prediction relates to its current block motion vector filling method, as described in the section "block motion vector filling in triangle prediction mode". The corresponding operations performed when filling in motion vectors are not simple, and a more implementation-friendly approach may be preferred.
According to some examples of the present disclosure, given two merge index values in the triangle prediction mode, two uni-directional prediction motion vectors may be located based on the process described in the section "uni-directional predicted motion vector derivation". In addition, a motion vector pruning operation may be performed: in case the two uni-directional prediction motion vectors derived for partition 0 and partition 1, respectively, are identical, their corresponding uni-directional prediction motion vectors from the other reference list (if present) may be used instead.
The disclosed examples mentioned above may be implemented in different ways. Suppose that, for triangle partition 0 and partition 1, the two uni-directional prediction motion vectors located based on the process described in the section "uni-directional predicted motion vector derivation" are MV0 and MV1, respectively. In one example, if MV1 is the same as MV0, the motion vector that shares the same merge index as MV1 but comes from the other reference list (if it exists) is used instead for partition 1; if it does not exist, MV1 is still used. In another example, if MV1 is the same as MV0, the motion vector that shares the same merge index as MV1 but comes from the other reference list (if it exists) is used instead; if it does not exist, or if it is also the same as MV0, MV1 is still used, and in this case the motion vector that shares the same merge index as MV0 but comes from the other reference list (if it exists) is used for partition 0; if it does not exist, MV0 is still used for partition 0.
In the above description, the checking and processing order of partitions 0 and 1, and of their respective motion vectors MV0 and MV1, is relative. Thus, their checking and processing order may be interchanged in the description, and the resulting method is still covered by the same spirit of the present disclosure. For example, instead of keeping MV0 and performing the pruning operation on MV1 first, as described in the examples above, one may keep MV1 and perform the pruning operation on MV0 first.
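The second pruning variant above can be sketched as follows, reusing the parity-based derivation sketched earlier and two hypothetical helpers: `alt_mv`, which returns the same candidate's motion vector from the other reference list (or None if it does not exist), and `same_mv`, which may be any of the sameness tests discussed in the next paragraph:

```python
def prune_uni_pred_mvs(merge_list, idx0, idx1):
    """Limited pruning of the two uni-directional prediction MVs."""
    mv0 = uni_pred_mv_for_partition(merge_list, idx0)
    mv1 = uni_pred_mv_for_partition(merge_list, idx1)
    if same_mv(mv0, mv1):
        alt1 = alt_mv(merge_list, idx1)      # other-list MV for partition 1
        if alt1 is not None and not same_mv(alt1, mv0):
            mv1 = alt1                       # replace partition 1's MV
        else:
            alt0 = alt_mv(merge_list, idx0)  # otherwise try partition 0's MV
            if alt0 is not None:
                mv0 = alt0
    return mv0, mv1
```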
Based on the disclosed examples, different methods may be used to determine whether two uni-directional prediction motion vectors are the same. In one example, two uni-directional prediction motion vectors are considered the same when the codec device determines that the two vectors have the same X component and Y component and that their reference pictures have the same POC (i.e., picture order count). The X and Y components of a motion vector represent the relative horizontal and vertical offset values, respectively, from the current block to its corresponding reference block. In another example, two uni-directional prediction motion vectors are considered the same when the codec device determines that the two vectors have the same X and Y components, the same reference list, and the same reference picture index. In yet another example, two uni-directional prediction motion vectors are considered the same when the codec device determines that the two vectors have the same X and Y components, regardless of their reference lists or reference picture indices. Here, the codec device may be an electronic device having a chip for encoding video data.
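The three sameness tests just described can be written as one parameterized check; the field names (`x`, `y`, `ref_poc`, `ref_list`, `ref_idx`) are assumptions about the motion vector representation, not a normative structure:

```python
def same_mv(a, b, mode="poc"):
    """Compare two uni-directional prediction motion vectors."""
    if a.x != b.x or a.y != b.y:
        return False                  # components must match in every mode
    if mode == "poc":                 # first example: same reference POC
        return a.ref_poc == b.ref_poc
    if mode == "list_and_index":      # second example: same list and index
        return a.ref_list == b.ref_list and a.ref_idx == b.ref_idx
    return True                       # third example: components only
```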
With the disclosed methods explained above, more motion vectors can be selected and used for triangle prediction without additional signaling overhead, which improves coding efficiency, while the complexity of the associated motion vector pruning operation remains limited.
Simplified block motion vector filling
According to some examples of the present disclosure, the block motion vector filling operation does not necessarily follow the procedure described in the section "block motion vector filling in triangle prediction mode". Some simplified schemes may be used instead. In the following description, the motion vectors for triangle partitions 0 and 1 are denoted as MV0 and MV1, respectively; and the motion vector formed from both MV0 and MV1 based on the procedure described in the section "block motion vector filling in triangle prediction mode" is denoted as MV01. As explained earlier, MV01 may be a bi-directionally predicted or a uni-directionally predicted motion vector.
In one example of the disclosure, instead of filling the 4 × 4 blocks with different motion vectors, the formed motion vector MV01 is used to fill each 4 × 4 block in the current CU.
In another example of the present disclosure, instead of filling the 4 × 4 blocks with different motion vectors, each 4 × 4 block in the current CU is filled with the uni-directional prediction motion vector associated with the triangle partition located at the bottom of the CU. An example is shown in fig. 9, where partition 1 is the triangle partition at the bottom and its motion vector MV1 is used to fill each 4 × 4 block in the CU.
In yet another example of the present disclosure, the uni-directional prediction motion vector associated with the triangle partition located at the bottom of the CU is used to fill each 4 × 4 block in the current CU, except for the two 4 × 4 blocks located at the two corners on the diagonal partition boundary. These two 4 × 4 blocks are filled with the formed motion vector MV01. This is illustrated in figs. 12A and 12B, where only the two 4 × 4 blocks with solid border lines are filled with the formed motion vector MV01. More specifically, as shown in fig. 12A, when the current CU is split from the top left corner to the bottom right corner, the top-left 4 × 4 block and the bottom-right 4 × 4 block of the CU are filled with the formed motion vector MV01. When the current CU is split from the top right corner to the bottom left corner, as shown in fig. 12B, the top-right 4 × 4 block and the bottom-left 4 × 4 block are filled with the formed motion vector MV01.
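A sketch of this particular fill scheme (figs. 12A and 12B), assuming a CU of w × h pixels on a 4 × 4 sub-block grid; `mv_bottom` is the bottom partition's uni-directional prediction motion vector, `mv01` the formed motion vector, and `split_tl_br` is True for a top-left-to-bottom-right split:

```python
def fill_block_mvs(w, h, split_tl_br, mv_bottom, mv01):
    """Fill the 4x4 motion vector buffer for one triangle-mode CU."""
    cols, rows = w // 4, h // 4
    buf = [[mv_bottom] * cols for _ in range(rows)]  # default: bottom MV
    if split_tl_br:
        corners = [(0, 0), (rows - 1, cols - 1)]     # fig. 12A corner blocks
    else:
        corners = [(0, cols - 1), (rows - 1, 0)]     # fig. 12B corner blocks
    for r, c in corners:
        buf[r][c] = mv01                             # diagonal corner blocks
    return buf
```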
In another example of the present disclosure, the block motion vector filling operation still follows the process described in the section "block motion vector filling in triangle prediction mode", except for those 4 × 4 blocks located on the diagonal boundary line (marked with solid boundary lines in fig. 11). The two 4 × 4 blocks located at the two corners on the diagonal partition boundary are filled with the formed motion vector MV01. The other 4 × 4 blocks located on the diagonal boundary line are filled with the uni-directional prediction motion vector associated with the triangle partition located at the bottom of the CU.
In another example of the present disclosure, the block motion vector filling operation still follows the process described in the section "block motion vector filling in triangle prediction mode", except for those 4 × 4 blocks located on the diagonal boundary line (marked with solid boundary lines in fig. 11). The two 4 × 4 blocks located at the two corners on the diagonal partition boundary are filled with the formed motion vector MV01. The other 4 × 4 blocks located on the diagonal boundary line are filled with the uni-directional prediction motion vector associated with the triangle partition located in the upper part of the CU.
In another example of the present disclosure, the current CU is partitioned into four quarter-sized regions. The blocks in each region are filled with the same motion vector, while the blocks in different regions may be filled with different motion vectors. More specifically, the blocks located in the quarter-sized regions containing the diagonal partition boundary are filled with MV01, and the blocks in the quarter-sized region within each triangle partition are filled with the uni-directional prediction motion vector of that partition. Fig. 13A shows an example. In this figure, the 4 × 4 blocks in the two quarter-sized regions containing the diagonal partition boundary (marked with solid boundary lines) are filled with MV01, while the 4 × 4 blocks in the other two quarter-sized regions are filled with MV0 or MV1, depending on which triangle partition they are in. In the case of fig. 13A, the 4 × 4 blocks in the top-right quarter-sized region are filled with MV0, and the 4 × 4 blocks in the bottom-left quarter-sized region are filled with MV1. In the case of fig. 13B, the 4 × 4 blocks in the top-left quarter-sized region are filled with MV0, and the 4 × 4 blocks in the bottom-right quarter-sized region are filled with MV1.
In another example of the present disclosure, each block in the current CU is filled with the motion vector MV01, except for two 4 × 4 corner blocks located at the two corners of partition 0 and partition 1, respectively. These two corner blocks are not located on the diagonal partition boundary. Figs. 12C and 12D show an example in which the two corner blocks are indicated by solid boundary lines. According to this embodiment of the present disclosure, these two corner blocks are filled with the corresponding uni-directional prediction motion vectors of their triangle partitions. More specifically, as shown in fig. 12C, when the current CU is split from the top left corner to the bottom right corner, the top-right 4 × 4 block and the bottom-left 4 × 4 block are filled with MV0 and MV1, respectively. When the current CU is split from the top right corner to the bottom left corner, as shown in fig. 12D, the top-left 4 × 4 block and the bottom-right 4 × 4 block are filled with MV0 and MV1, respectively. Although the examples in figs. 11-13 use sub-blocks of size 4 × 4, the methods may be adapted to different sub-block sizes, such as 2 × 2, 8 × 8, or other sizes.
In yet another example of the present disclosure, in case the current CU has a width equal to 4 pixels or a height equal to 4 pixels, if it is coded in the triangle prediction mode, MV01 is used to fill each block in the current CU. It is noted that this example may be used in conjunction with each of the examples described above.
In the above process, although a first merge list including 5 merge candidates is used for illustration in all examples of the present disclosure, in practice the size of the first merge list may be defined differently, for example, as 6, 4, or some other value. All methods described in this disclosure are equally applicable to the case where the first merge list has a size different from 5.
Although the methods of forming the uni-directional prediction merge list are described in this disclosure with respect to triangle prediction modes, these methods are applicable to other prediction modes of similar kind. For example, in a more general geometric partitioning prediction mode, where a CU is partitioned into two PUs along a line that is not completely diagonal, the two PUs may have a geometric shape such as a triangle, wedge, or trapezoid shape. In this case, the prediction for each PU may be formed in a manner similar to that in the triangle prediction mode, and the methods described herein are equally applicable.
Fig. 14 is a block diagram illustrating an apparatus for video codec according to some embodiments of the present disclosure. The apparatus 1400 may be a terminal, such as a mobile phone, a tablet computer, a digital broadcast terminal, a tablet device, or a personal digital assistant.
As shown in fig. 14, the apparatus 1400 may include one or more of the following components: a processing component 1402, a memory 1404, a power component 1406, a multimedia component 1408, an audio component 1410, an input/output (I/O) interface 1412, a sensor component 1414, and a communication component 1416.
The processing component 1402 generally controls the overall operation of the device 1400, such as operations related to display, telephone calls, data communications, camera operations, and recording operations. Processing component 1402 may include one or more processors 1420 to execute instructions to perform all or a portion of the steps of the above-described methods. Additionally, processing component 1402 can include one or more modules to facilitate interaction between processing component 1402 and other components. For example, the processing component 1402 can include a multimedia module to facilitate interaction between the multimedia component 1408 and the processing component 1402.
The memory 1404 is configured to store different types of data to support the operation of the apparatus 1400. Examples of such data include instructions for any application or method operating on the device 1400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1404 may be implemented by any type or combination of volatile or non-volatile storage devices, and the memory 1404 may be Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or a compact disk.
The power supply component 1406 provides power to the various components of the device 1400. The power components 1406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1400.
The multimedia component 1408 includes a screen that provides an output interface between the device 1400 and a user. In some examples, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen that receives an input signal from a user. The touch panel may include one or more touch sensors for sensing touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some examples, the multimedia component 1408 may include a front camera and/or a rear camera. The front camera and/or the back camera may receive external multimedia data when the device 1400 is in an operating mode, such as a shooting mode or a video mode.
The audio component 1410 is configured to output and/or input audio signals. For example, audio component 1410 includes a Microphone (MIC). The microphone is configured to receive external audio signals when the apparatus 1400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1404 or transmitted via the communication component 1416. In some examples, audio component 1410 further includes a speaker for outputting audio signals.
I/O interface 1412 provides an interface between processing component 1402 and peripheral interface modules. The peripheral interface module can be a keyboard, a click wheel, a button and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 1414 includes one or more sensors for providing various aspects of state assessment for the apparatus 1400. For example, the sensor component 1414 may detect an on/off state of the apparatus 1400 and the relative position of the component. For example, the components are the display and keypad of the device 1400. The sensor components 1414 may also detect changes in position of the apparatus 1400 or components of the apparatus 1400, presence or absence of user contact on the apparatus 1400, orientation or acceleration/deceleration of the apparatus 1400, and changes in temperature of the apparatus 1400. The sensor component 1414 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 1414 may further include an optical sensor, such as a CMOS or CCD image sensor used in imaging applications. In some examples, sensor assembly 1414 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1416 is configured to facilitate wired or wireless communication between the apparatus 1400 and other devices. The apparatus 1400 may access a wireless network based on a communication standard, such as WiFi, 4G, or a combination thereof. In one example, the communication component 1416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one example, the communication component 1416 can further include a Near Field Communication (NFC) module for facilitating short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In one example, the apparatus 1400 may be implemented by one or more of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components to perform the above-described methods.
The non-transitory computer-readable storage medium may be, for example, a Hard Disk Drive (HDD), a Solid State Drive (SSD), flash memory, a hybrid drive or Solid State Hybrid Drive (SSHD), Read Only Memory (ROM), compact disc read only memory (CD-ROM), magnetic tape, a floppy disk, or the like.
Fig. 15 is a flow diagram illustrating an example process for video coding using geometric partitioning for motion compensated prediction, according to some embodiments of the present disclosure.
In step 1501, processor 1420 partitions the video picture into multiple Coding Units (CUs), where at least one CU is further partitioned into two Prediction Units (PUs). The two PUs may include at least one geometrically shaped PU. For example, the geometrically shaped PUs may include a pair of triangular PUs, a pair of wedge-shaped PUs, or other geometrically shaped PUs.
In step 1502, processor 1420 constructs a first merge list comprising a plurality of candidates, each candidate comprising one or more motion vectors: a list 0 motion vector, a list 1 motion vector, or both. For example, the processor 1420 may construct the first merge list based on a merge list construction process for conventional merge prediction. The processor 1420 may also obtain the first merge list from other electronic devices or storage.
In step 1503, the processor 1420 locates a first candidate for the first PU from the first merge candidate index.
In step 1504, processor 1420 locates a second candidate for the second PU from the second merge candidate index.
In step 1505, processor 1420 obtains a first uni-directional prediction motion vector MV0 for the first PU by selecting the list X1 motion vector of the first candidate, where X1 corresponds to the first merge candidate index and takes a value of 0 or 1.
In step 1506, processor 1420 obtains a second uni-directional prediction motion vector MV1 for the second PU by selecting the list X2 motion vector of the second candidate, where X2 corresponds to the second merge candidate index and takes a value of 0 or 1.
In step 1507, in response to determining that MV0 and MV1 are the same, processor 1420 prunes the first uni-directional prediction motion vector MV0 and the second uni-directional prediction motion vector MV1.
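Tying the steps of fig. 15 together, and reusing the hypothetical helpers sketched earlier (`uni_pred_mv_for_partition`, `same_mv`, `prune_uni_pred_mvs`), the overall derivation may be sketched as:

```python
def triangle_mvs_for_cu(merge_list, idx0, idx1):
    """Steps 1503-1507: derive and, if needed, prune the two partition MVs."""
    mv0 = uni_pred_mv_for_partition(merge_list, idx0)  # steps 1503/1505
    mv1 = uni_pred_mv_for_partition(merge_list, idx1)  # steps 1504/1506
    if same_mv(mv0, mv1):                              # step 1507
        mv0, mv1 = prune_uni_pred_mvs(merge_list, idx0, idx1)
    return mv0, mv1
```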
In some examples, an apparatus for video coding is provided. The apparatus includes a processor 1420; and a memory 1404 configured to store instructions executable by the processor; wherein the processor, when executing the instructions, is configured to perform the method as shown in figure 15.
In some other examples, a non-transitory computer-readable storage medium 1404 is provided in which instructions are stored. When executed by the processor 1420, the instructions cause the processor to perform the method as shown in fig. 15.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative embodiments will become apparent to those skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The examples were chosen and described in order to explain the principles of the disclosure and to enable others of ordinary skill in the art to understand the disclosure for various embodiments and with the best mode of practicing various embodiments with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure is not to be limited to the specific examples of the embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the disclosure.

Claims (20)

1. A method for video coding using geometric partitioning, comprising:
partitioning a video picture into a plurality of Coding Units (CUs), wherein at least one CU is further partitioned into two Prediction Units (PUs): a first PU and a second PU, the two PUs comprising at least one geometrically shaped PU;
constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates comprises one or more motion vectors: a list 0 motion vector or a list 1 motion vector or both;
locating a first candidate for the first PU according to a first merge candidate index;
locating a second candidate for the second PU according to a second merge candidate index;
obtaining a first uni-directional prediction motion vector MV0 for the first PU by selecting the list X1 motion vector of the first candidate, wherein X1 corresponds to the first merge candidate index and takes a value of 0 or 1;
obtaining a second uni-directional prediction motion vector MV1 for the second PU by selecting the list X2 motion vector of the second candidate, wherein X2 corresponds to the second merge candidate index and takes a value of 0 or 1; and
in response to determining that the MV0 and the MV1 are the same, pruning the first uni-directional prediction motion vector MV0 and the second uni-directional prediction motion vector MV1.
2. The method for video coding using geometric partitioning according to claim 1, wherein pruning the MV0 and the MV1 further comprises:
in response to determining that the list (1-X2) motion vector of the second candidate exists, obtaining an updated second uni-directional prediction motion vector for the second PU by selecting the list (1-X2) motion vector of the second candidate.
3. The method for video coding using geometric partitioning according to claim 1, wherein pruning the MV0 and the MV1 further comprises:
in response to determining that the list (1-X2) motion vector of the second candidate does not exist and in response to determining that the list (1-X1) motion vector of the first candidate exists, obtaining an updated first uni-directional prediction motion vector for the first PU by selecting the list (1-X1) motion vector of the first candidate.
4. The method for video coding using geometric partitioning according to claim 1, wherein determining that the MV0 and the MV1 are the same further comprises:
determining that the X component and the Y component of the MV0 are the same as the X component and the Y component of the MV1.
5. The method for video coding using geometric partitioning of claim 4, further comprising:
determining that a first Picture Order Count (POC) of a first reference picture associated with the MV0 is the same as a second POC of a second reference picture associated with the MV1.
6. The method for video coding using geometric partitioning of claim 4, further comprising:
determining that a first reference list of the MV0 is the same as a second reference list of the MV1; and
determining that a first reference picture index of the MV0 is the same as a second reference picture index of the MV1.
7. The method for video coding using geometric partitioning as defined in claim 1, wherein X1 corresponds to the first merge candidate index and takes a value of 0 or 1, and X2 corresponds to the second merge candidate index and takes a value of 0 or 1, the method further comprising:
assigning p1 to X1 if the list p1 motion vector of the first candidate exists, wherein p1 is a parity value of the first merge candidate index;
assigning (1-p1) to X1 if the list p1 motion vector of the first candidate does not exist;
assigning p2 to X2 if the list p2 motion vector of the second candidate exists, wherein p2 is a parity value of the second merge candidate index; and
assigning (1-p2) to X2 if the list p2 motion vector of the second candidate does not exist.
8. An apparatus for video coding using geometric partitioning, comprising:
one or more processors; and
a memory configured to store instructions executable by the one or more processors; wherein the one or more processors, when executing the instructions, are configured to:
partitioning a video picture into a plurality of Coding Units (CUs), wherein at least one CU is further partitioned into two Prediction Units (PUs): a first PU and a second PU, the two PUs comprising at least one geometrically shaped PU;
constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates comprises one or more motion vectors: a list 0 motion vector or a list 1 motion vector or both;
locating a first candidate for the first PU according to a first merge candidate index;
locating a second candidate for the second PU according to a second merge candidate index;
obtaining a first uni-directional prediction motion vector MV0 for the first PU by selecting the list X1 motion vector of the first candidate, wherein X1 corresponds to the first merge candidate index and takes a value of 0 or 1;
obtaining a second uni-directional prediction motion vector MV1 for the second PU by selecting the list X2 motion vector of the second candidate, wherein X2 corresponds to the second merge candidate index and takes a value of 0 or 1; and
in response to determining that the MV0 and the MV1 are the same, pruning the first uni-directional prediction motion vector MV0 and the second uni-directional prediction motion vector MV1.
9. The apparatus for video coding using geometric partitioning according to claim 8, wherein the one or more processors are further configured to:
responsive to determining that the list (1-X2) motion vector of the second candidate exists, obtain an updated second uni-directional prediction motion vector for the second PU by selecting the list (1-X2) motion vector of the second candidate.
10. The apparatus for video coding using geometric partitioning according to claim 8, wherein the one or more processors are further configured to:
responsive to determining that the list (1-X2) motion vector of the second candidate does not exist and responsive to determining that the list (1-X1) motion vector of the first candidate exists, obtain an updated first uni-directional prediction motion vector for the first PU by selecting the list (1-X1) motion vector of the first candidate.
11. The apparatus for video coding using geometric partitioning according to claim 8, wherein the one or more processors are further configured to:
determine that the X component and the Y component of the MV0 are the same as the X component and the Y component of the MV1.
12. The apparatus for video coding using geometric partitioning according to claim 11, wherein the one or more processors are further configured to:
determine that a first Picture Order Count (POC) of a first reference picture associated with the MV0 is the same as a second POC of a second reference picture associated with the MV1.
13. The apparatus for video coding using geometric partitioning according to claim 11, wherein the one or more processors are further configured to:
determine that a first reference list of the MV0 is the same as a second reference list of the MV1; and
determine that a first reference picture index of the MV0 is the same as a second reference picture index of the MV1.
14. The apparatus for video coding using geometric partitioning according to claim 8, wherein the one or more processors are further configured to:
assign p1 to X1 if the list p1 motion vector of the first candidate exists, wherein p1 is a parity value of the first merge candidate index;
assign (1-p1) to X1 if the list p1 motion vector of the first candidate does not exist;
assign p2 to X2 if the list p2 motion vector of the second candidate exists, wherein p2 is a parity value of the second merge candidate index; and
assign (1-p2) to X2 if the list p2 motion vector of the second candidate does not exist.
15. A non-transitory computer-readable storage medium for video coding with geometric partitioning, the non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more computer processors, cause the one or more computer processors to perform acts comprising:
partitioning a video picture into a plurality of Coding Units (CUs), wherein at least one CU is further partitioned into two Prediction Units (PUs): a first PU and a second PU, the two PUs comprising at least one geometrically shaped PU;
constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates comprises one or more motion vectors: a list 0 motion vector or a list 1 motion vector or both;
locating a first candidate for the first PU according to a first merge candidate index;
locating a second candidate for the second PU according to a second merge candidate index;
obtaining a first uni-directional prediction motion vector MV0 for the first PU by selecting the list X1 motion vector of the first candidate, wherein X1 corresponds to the first merge candidate index and takes a value of 0 or 1;
obtaining a second uni-directional prediction motion vector MV1 for the second PU by selecting the list X2 motion vector of the second candidate, wherein X2 corresponds to the second merge candidate index and takes a value of 0 or 1; and
in response to determining that the MV0 and the MV1 are the same, pruning the first uni-directional prediction motion vector MV0 and the second uni-directional prediction motion vector MV1.
16. The non-transitory computer-readable storage medium for video coding using geometric partitioning of claim 15, wherein pruning the MV0 and the MV1 further causes the one or more computer processors to perform:
in response to determining that the list (1-X2) motion vector of the second candidate exists, obtaining an updated second uni-directional prediction motion vector for the second PU by selecting the list (1-X2) motion vector of the second candidate.
17. The non-transitory computer-readable storage medium for video coding using geometric partitioning of claim 15, wherein pruning the MV0 and the MV1 further causes the one or more computer processors to perform:
in response to determining that the list (1-X2) motion vector of the second candidate does not exist and in response to determining that the list (1-X1) motion vector of the first candidate exists, obtaining an updated first uni-directional prediction motion vector for the first PU by selecting the list (1-X1) motion vector of the first candidate.
18. The non-transitory computer-readable storage medium for video coding using geometric partitioning of claim 15, wherein determining that the MV0 and the MV1 are the same further causes the one or more computer processors to perform:
determining that the X component and the Y component of the MV0 are the same as the X component and the Y component of the MV1.
19. The non-transitory computer-readable storage medium for video codec utilizing geometric partitioning of claim 18, wherein the acts further comprise:
determining that a first Picture Order Count (POC) of a first reference picture associated with the MV0 is the same as a second POC of a second reference picture associated with the MV1.
20. The non-transitory computer-readable storage medium for video codec utilizing geometric partitioning of claim 18, wherein the acts further comprise:
determining that a first reference list of the MV0 is the same as a second reference list of the MV1; and
determining that a first reference picture index of the MV0 is the same as a second reference picture index of the MV1.
CN202080037890.7A 2019-05-20 2020-05-20 Method and apparatus for video coding and decoding using triangle partitioning Pending CN113841406A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201962850534P 2019-05-20 2019-05-20
US62/850534 2019-05-20
US201962851630P 2019-05-22 2019-05-22
US62/851630 2019-05-22
PCT/US2020/033882 WO2020236991A1 (en) 2019-05-20 2020-05-20 Methods and apparatuses for video coding using triangle partition

Publications (1)

Publication Number Publication Date
CN113841406A true CN113841406A (en) 2021-12-24

Family

ID=73458228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080037890.7A Pending CN113841406A (en) 2019-05-20 2020-05-20 Method and apparatus for video coding and decoding using triangle partitioning

Country Status (2)

Country Link
CN (1) CN113841406A (en)
WO (1) WO2020236991A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014165555A1 (en) * 2013-04-02 2014-10-09 Vid Scale, Inc. Enhanced temporal motion vector prediction for scalable video coding
JP2019519972A (en) * 2016-05-05 2019-07-11 ヴィド スケール インコーポレイテッド Control point based intra direction representation for intra coding

Also Published As

Publication number Publication date
WO2020236991A1 (en) 2020-11-26

Similar Documents

Publication Publication Date Title
CN113824959B (en) Method, apparatus and storage medium for video encoding
CN114071135B (en) Method and apparatus for signaling merge mode in video coding
CN113545050A (en) Video coding and decoding using triangle prediction
US20220239902A1 (en) Methods and apparatuses for video coding using triangle partition
US20220070445A1 (en) Methods and apparatuses for video coding with triangle prediction
US20220014780A1 (en) Methods and apparatus of video coding for triangle prediction
CN115699735A (en) Video coding and decoding method and device using geometric partition
CN113841406A (en) Method and apparatus for video coding and decoding using triangle partitioning
CN113994672B (en) Method and apparatus for video encoding and decoding using triangle prediction
CN114080807A (en) Method and device for video coding and decoding by utilizing triangular partition
CN114982230A (en) Method and apparatus for video coding and decoding using triangle partitions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination