WO2024083115A1

WO2024083115A1 - Method and apparatus for blending intra and inter prediction in video coding system

Info

Publication number: WO2024083115A1
Application number: PCT/CN2023/124950
Authority: WO
Inventors: Man-Shu CHIANG; Yu-Ling Hsiao; Chih-Wei Hsu
Original assignee: Mediatek Inc.
Priority date: 2022-10-18
Filing date: 2023-10-17
Publication date: 2024-04-25

Abstract

A video coding system utilising flexible partition. According to this method, pixel data associated with a current block are received at an encoder side or coded data associated with the current block to be decoded are received at a decoder side. The current block is partitioned into a first region and a second region according to a partition line, wherein the partition line comprises at least two partitioning candidates and at least one of said at least two partitioning candidates corresponds to predefined splitting modes. The first region is encoded or decoded using a first coding mode. The second region is encoded or decoded using a second coding mode.

Description

METHOD AND APPARATUS FOR BLENDING INTRA AND INTER PREDICTION IN VIDEO CODING SYSTEM

CROSS REFERENCE TO RELATED APPLICATIONS

The present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/379,921, filed on October 18, 2022. The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to video coding system. In particular, the present invention relates to partitioning a block using a flexible partitioning line.

BACKGROUND AND RELATED ART

Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) . The standard has been published as an ISO standard: ISO/IEC 23090-3: 2021, Information technology -Coded representation of immersive media -Part 3: Versatile video coding, published Feb. 2021. VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.

Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing. For Intra Prediction 110, the prediction data is derived based on previously encoded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture (s) and motion data. Switch 114 selects Intra Prediction 110 or Inter-Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area. The side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130, are provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.

As shown in Fig. 1A, incoming video data undergoes a series of processing in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to a series of processing. Accordingly, in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, deblocking filter (DF) , Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H. 264 or VVC.

The decoder, as shown in Fig. 1B, can use similar or portion of the same functional blocks as the encoder except for Transform 118 and Quantization 120 since the decoder only needs Inverse Quantization 124 and Inverse Transform 126. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information) . The Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.

According to VVC, an input picture is partitioned into non-overlapped square block regions referred as CTUs (Coding Tree Units) , similar to HEVC. Each CTU can be partitioned into one or multiple smaller size coding units (CUs) . The resulting CU partitions can be in square or rectangular shapes. Also, VVC divides a CTU into prediction units (PUs) as a unit to apply prediction process, such as Inter prediction, Intra prediction, etc.

The VVC standard incorporates various new coding tools to further improve the coding efficiency over the HEVC standard. Among various new coding tools, some coding tools relevant to the present invention are reviewed as follows.

In VVC, a coding tool named Geometric Partitioning Mode (GPM) has been developed, where GPM partitions a block into two regions and allows separate prediction modes for the two regions. In the present invention, a technique to use a flexible partition line to partition a block is disclosed to improve coding efficiency.

BRIEF SUMMARY OF THE INVENTION

A method and apparatus for video coding are disclosed. According to this method, pixel data associated with a current block are received at an encoder side or coded data associated with the current block to be decoded are received at a decoder side. The current block is partitioned into a first region and a second region according to a partition line, wherein the partition line comprises at least two partitioning candidates and at least one of said at least two partitioning candidates corresponds to predefined splitting modes. The first region is encoded or decoded using a first coding mode. The second region is encoded or decoded using a second coding mode.

In one embodiment, one of the pre-defined splitting modes corresponds to horizontal, vertical, diagonal, or inverse diagonal splitting.

In one embodiment, the partition line is determined by applying a max operation or a min operation on weighting values from said at least two partitioning candidates.

In one embodiment, the partition line is generated by two existing partition candidates.

In one embodiment, said partitioning the current block according to the partition line is allowed when first angle index and second angle index associated with said at least two partitioning candidates are different. In one embodiment, said partitioning the current block according to the partition line is allowed when difference between the first angle index and the second angle index is smaller than a threshold.

In one embodiment, said partitioning the current block according to the partition line is allowed when width, height, and/or area of the first region or the second region is larger than a threshold.

In one embodiment, signalling of the partition line comprises signalling an orientation to indicate starting splitting of the partition line. In one embodiment, the orientation corresponds to vertical orientation or horizontal orientation. In another embodiment, signalling of the partition line comprises signalling a region position to indicate positions of the first region and the second region. In yet another embodiment, signalling of the partition line comprises signalling a direction to indicate turning direction of an intersection point of the partitioning candidate sections. In yet another embodiment, signalling of the partition line comprises signalling an intersection position. Furthermore, said signalling of the partition line may further comprise signalling a corner covered by the first region .

In one embodiment, said at least two partitioning candidates correspond to only two predefined splitting modes including horizontal splitting and vertical splitting.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1A illustrates an exemplary adaptive Inter/Intra video encoding system incorporating loop processing.

Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.

Fig. 2 illustrates the neighbouring blocks used for deriving spatial merge candidates for VVC.

Fig. 3 illustrates the possible candidate pairs considered for redundancy check in VVC.

Fig. 4 illustrates an example of temporal candidate derivation, where a scaled motion vector is derived according to POC (Picture Order Count) distances.

Fig. 5 illustrates the position for the temporal candidate selected between candidates C₀ and C₁.

Fig. 6 illustrates the distance offsets from a starting MV in the horizontal and vertical directions according to Merge Mode with MVD (MMVD) .

Fig. 7A illustrates an example of the affine motion field of a block described by motion information of two control point (4-parameter) .

Fig. 7B illustrates an example of the affine motion field of a block described by motion information of three control point motion vectors (6-parameter) .

Fig. 8 illustrates an example of block based affine transform prediction, where the motion vector of each 4×4 luma subblock is derived from the control-point MVs.

Fig. 9 illustrates an example of derivation for inherited affine candidates based on control-point MVs of a neighbouring block.

Fig. 10 illustrates an example of affine candidate construction by combining the translational motion information of each control point from spatial neighbours and temporal.

Fig. 11 illustrates an example of affine motion information storage for motion information inheritance.

Fig. 12 illustrates an example of the weight value derivation for Combined Inter and Intra Prediction (CIIP) according to the coding modes of the top and left neighbouring blocks.

Fig. 13 illustrates an example of the of 64 partitions used in the VVC standard, where the partitions are grouped according to their angles and dashed lines indicate redundant partitions.

Fig. 14 illustrates an example of uni-prediction MV selection for the geometric partitioning mode.

Fig. 15 illustrates an example of blending weight ω₀ using the geometric partitioning mode.

Fig. 16 illustrates an example of GPM blending process according to a discrete ramp function for the blending area around the boundary.

Fig. 17 illustrates an example of GPM blending process used for GPM blending in ECM 4.0.

Fig. 18 shows the intra prediction modes as adopted by the VVC video coding standard.

Fig. 19A illustrates an example of selected template for a current block, where the template comprises T rows above the current block and T columns left the current block.

Fig. 19B illustrates an example for T=3 and the HoGs (Histogram of Gradient) are calculated for pixels in the middle row and pixels in the middle column.

Fig. 19C illustrates an example of the amplitudes (ampl) for the angular intra prediction modes.

Fig. 20 illustrates an example of the blending process, where two angular intra modes (M1 and M2) are selected according to the indices with two tallest bars of histogram bars.

Fig. 21 illustrates an example of template-based intra mode derivation (TIMD) mode, where TIMD implicitly derives the intra prediction mode of a CU using a neighbouring template at both the encoder and decoder.

Figs. 22A-C illustrate examples of available IPM candidates: the parallel angular mode against the GPM block boundary (Parallel mode, Fig. 22A) , the perpendicular angular mode against the GPM block boundary (Perpendicular mode, Fig. 22B) , and the Planar mode (Fig. 22C) , respectively.

Fig. 22D illustrates an example of GPM with intra and intra prediction, where intra prediction is restricted to reduce the signalling overhead for IPMs and hardware decoder cost.

Fig. 23A illustrates the syntax coding for Spatial GPM (SGPM) before using a simplified method.

Fig. 23B illustrates an example of simplified syntax coding for Spatial GPM (SGPM) .

Fig. 24 illustrates an example of template for Spatial GPM (SGPM) .

Fig. 25 illustrates an example of the edge on the template being extended from the partitioning boundary of the current CU, but GPM blending process is not used in the template area across the edge.

Fig. 26A illustrates examples of flexible partitioning formed by 2 partitioning candidates according to the present invention.

Fig. 26B illustrates an example of signalling for flexible partitioning according to an embodiment of the present invention.

Fig. 27 illustrates an example of signalling to indicate the selected flexible partitioning including the syntaxes, the intersection position (xx, yy) and/or the corner covered by the first prediction region.

Fig. 28 illustrates an example of the corner covered by the first prediction region corresponding to the top-left corner, top-right corner, bottom-left corner and bottom-right corner of the current block.

Fig. 29 illustrates an example of adaptive blending with individual blending sizes for the two blending regions according to one embodiment of the present invention.

Fig. 30 illustrates an example of a partition line of flexible partition design by applying the max operation or min operation on the weighting values from the two predefined partitioning (splitting) modes.

Fig. 31 illustrates an example of determining costs associated with individual blending sizes based on a template and extended blending regions according to one embodiment of the present invention.

Fig. 32 illustrates an example of neighbouring mode information used for the candidate list.

Fig. 33 illustrates a flowchart of an exemplary video coding system that utilizes flexible partition with two partitioning candidates according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.

Inter Prediction Overview

According to JVET-T2002 Section 3.4. (Jianle Chen, et. al., “Algorithm description for Versatile Video Coding and Test Model 11 (VTM 11) ” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 20th Meeting, by teleconference, 7 –16 October 2020, Document: JVET-T2002) , for each inter-predicted CU, motion parameters consist of motion vectors, reference picture indices and reference picture list usage index, and additional information needed for the new coding feature of VVC to be used for inter-predicted sample generation. The motion parameter can be signalled in an explicit or implicit manner. When a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta or reference picture index. A merge mode is specified whereby the motion parameters for the current CU, which are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules introduced in VVC. The merge mode can be applied to any inter-predicted CU, not only for skip mode. The alternative to the merge mode is the explicit transmission of motion parameters, where motion vector, corresponding reference picture index for each reference picture list and reference picture list usage flag and other needed information are signalled explicitly per each CU.

Beyond the inter coding features in HEVC, VVC includes a number of new and refined inter prediction coding tools listed as follows:

– Extended merge prediction

– Merge mode with MVD (MMVD)

– Symmetric MVD (SMVD) signalling

– Affine motion compensated prediction

– Subblock-based temporal motion vector prediction (SbTMVP)

– Adaptive motion vector resolution (AMVR)

– Motion field storage: 1/16^th luma sample MV storage and 8x8 motion field compression

– Bi-prediction with CU-level weight (BCW)

– Bi-directional optical flow (BDOF)

– Decoder side motion vector refinement (DMVR)

– Geometric partitioning mode (GPM)

– Combined inter and intra prediction (CIIP)

The following description provides the details of those inter prediction methods specified in VVC.

Extended Merge Prediction

In VVC, the merge candidate list is constructed by including the following five types of candidates in order:

1) Spatial MVP from spatial neighbour CUs

2) Temporal MVP from collocated CUs

3) History-based MVP from an FIFO table

4) Pairwise average MVP

5) Zero MVs.

The size of merge list is signalled in sequence parameter set (SPS) header and the maximum allowed size of merge list is 6. For each CU coded in the merge mode, an index of best merge candidate is encoded using truncated unary binarization (TU) . The first bin of the merge index is coded with context and bypass coding is used for remaining bins.

The derivation process of each category of the merge candidates is provided in this session. As done in HEVC, VVC also supports parallel derivation of the merge candidate lists (or called as merging candidate lists) for all CUs within a certain size of area.

Spatial Candidate Derivation

The derivation of spatial merge candidates in VVC is the same as that in HEVC except that the positions of first two merge candidates are swapped. A maximum of four merge candidates (B₀, A₀, B₁ and A₁) for current CU 210 are selected among candidates located in the positions depicted in Fig. 2.The order of derivation is B₀, A₀, B₁, A₁ and B₂. Position B₂ is considered only when one or more neighbouring CU of positions B₀, A₀, B₁, A₁ are not available (e.g. belonging to another slice or tile) or is intra coded. After candidate at position A₁ is added, the addition of the remaining candidates is subject to a redundancy check which ensures that candidates with the same motion information are excluded from the list so that coding efficiency is improved. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with an arrow in Fig. 3 are considered and a candidate is only added to the list if the corresponding candidate used for redundancy check does not have the same motion information.

Temporal Candidates Derivation

In this step, only one candidate is added to the list. Particularly, in the derivation of this temporal merge candidate for a current CU 410, a scaled motion vector is derived based on the co-located CU 420 belonging to the collocated reference picture as shown in Fig. 4. The reference picture list and the reference index to be used for the derivation of the co-located CU is explicitly signalled in the slice header. The scaled motion vector 430 for the temporal merge candidate is obtained as illustrated by the dotted line in Fig. 4, which is scaled from the motion vector 440 of the co-located CU using the POC (Picture Order Count) distances, tb and td, where tb is defined to be the POC difference between the reference picture of the current picture and the current picture and td is defined to be the POC difference between the reference picture of the co-located picture and the co-located picture. The reference picture index of temporal merge candidate is set equal to zero.

The position for the temporal candidate is selected between candidates C₀ and C₁, as depicted in Fig. 5. If CU at position C₀ is not available, is intra coded, or is outside of the current row of CTUs, position C₁ is used. Otherwise, position C₀ is used in the derivation of the temporal merge candidate.

History-Based Merge Candidates Derivation

The history-based MVP (HMVP) merge candidates are added to the merge list after the spatial MVP and TMVP. In this method, the motion information of a previously coded block is stored in a table and used as MVP for the current CU. The table with multiple HMVP candidates is maintained during the encoding/decoding process. The table is reset (emptied) when a new CTU row is encountered. Whenever there is a non-subblock inter-coded CU, the associated motion information is added to the last entry of the table as a new HMVP candidate.

The HMVP table size S is set to be 6, which indicates up to 5 History-based MVP (HMVP) candidates may be added to the table. When inserting a new motion candidate to the table, a constrained first-in-first-out (FIFO) rule is utilized where redundancy check is firstly applied to find whether there is an identical HMVP in the table. If found, the identical HMVP is removed from the table and all the HMVP candidates afterwards are moved forward, and the identical HMVP is inserted to the last entry of the table.

HMVP candidates could be used in the merge candidate list construction process. The latest several HMVP candidates in the table are checked in order and inserted to the candidate list after the TMVP candidate. Redundancy check is applied on the HMVP candidates to the spatial or temporal merge candidate.

To reduce the number of redundancy, check operations, the following simplifications are introduced:

1. The last two entries in the table are checked for redundancy with respect to A₁ and B₁ spatial candidates, respectively.

2. Once the total number of available merge candidates reaches the maximally allowed merge candidates minus 1, the merge candidate list construction process from HMVP is terminated.

Pair-Wise Average Merge Candidates Derivation

Pairwise average candidates are generated by averaging predefined pairs of candidates in the existing merge candidate list, using the first two merge candidates. The first merge candidate is defined as p0Cand and the second merge candidate can be defined as p1Cand, respectively. The averaged motion vectors are calculated according to the availability of the motion vector of p0Cand and p1Cand separately for each reference list. If both motion vectors are available in one list, these two motion vectors are averaged even when they point to different reference pictures, and its reference picture is set to the one of p0Cand; if only one motion vector is available, use the one directly; and if no motion vector is available, keep this list invalid. Also, if the half-pel interpolation filter indices of p0Cand and p1Cand are different, it is set to 0.

When the merge list is not full after pair-wise average merge candidates are added, the zero MVPs are inserted in the end until the maximum merge candidate number is encountered.

Merge Mode with MVD (MMVD)

In addition to the merge mode, where the implicitly derived motion information is directly used for prediction samples generation of the current CU, the merge mode with motion vector differences (MMVD) is introduced in VVC. A MMVD flag is signalled right after sending a regular merge flag to specify whether MMVD mode is used for a CU.

In MMVD, after a merge candidate is selected (referred as a base merge candidate in this disclosure) , it is further refined by the signalled MVDs information. The further information includes a merge candidate flag, an index to specify motion magnitude, and an index for indication of motion direction. In MMVD mode, one for the first two candidates in the merge list is selected to be used as MV basis. The MMVD candidate flag is signalled to specify which one is used between the first and second merge candidates.

Distance index specifies motion magnitude information and indicates the pre-defined offset from the starting points (612 and 622) for a L0 reference block 610 and L1 reference block 620. As shown in Fig. 6 an offset is added to either horizontal component or vertical component of the starting MV, where small circles in different styles correspond to different offsets from the centre. The relation of distance index and pre-defined offset is specified in Table 1.

Table 1 -The relation of distance index and pre-defined offset

Direction index represents the direction of the MVD relative to the starting point. The direction index can represent the four directions as shown in Table 2. It is noted that the meaning of MVD sign could be variant according to the information of starting MVs. When the starting MVs are an un-prediction MV or bi-prediction MVs with both lists pointing to the same side of the current picture (i.e. POCs of two references both larger than the POC of the current picture, or both smaller than the POC of the current picture) , the sign in Table 2 specifies the sign of the MV offset added to the starting MV. When the starting MVs are bi-prediction MVs with the two MVs pointing to the different sides of the current picture (i.e. the POC of one reference larger than the POC of the current picture, and the POC of the other reference smaller than the POC of the current picture) , and the difference of POC in list 0 is greater than the one in list 1, the sign in Table 2 specifies the sign of MV offset added to the list0 MV component of the starting MV and the sign for the list1 MV has an opposite value. Otherwise, if the difference of POC in list 1 is greater than list 0, the sign in Table 2 specifies the sign of the MV offset added to the list1 MV component of starting MV and the sign for the list0 MV has an opposite value.

The MVD is scaled according to the difference of POCs in each direction. If the differences of POCs in both lists are the same, no scaling is needed. Otherwise, if the difference of POC in list 0 is larger than the one in list 1, the MVD for list 1 is scaled, by defining the POC difference of L0 as td and POC difference of L1 as tb, described in Fig. 4. If the POC difference of L1 is greater than L0, the MVD for list 0 is scaled in the same way. If the starting MV is uni-predicted, the MVD is added to the available MV.

Table 2 -Sign of MV offset specified by direction index

Affine Motion Compensated Prediction

In HEVC, only translation motion model is applied for motion compensation prediction (MCP) . While in the real world, there are many kinds of motion, e.g. zoom in/out, rotation, perspective motions and the other irregular motions. In VVC, a block-based affine transform motion compensation prediction is applied. As shown Figs. 7A-B, the affine motion field of the block 710 is described by motion information of two control point (4-parameter) in Fig. 7A or three control point motion vectors (6-parameter) in Fig. 7B.

For 4-parameter affine motion model, motion vector at sample location (x, y) in a block is derived as:

For 6-parameter affine motion model, motion vector at sample location (x, y) in a block is derived as:

Where (mv_0x, mv_0y) is motion vector of the top-left corner control point, (mv_1x, mv_1y) is motion vector of the top-right corner control point, and (mv_2x, mv_2y) is motion vector of the bottom-left corner control point.

In order to simplify the motion compensation prediction, block based affine transform prediction is applied. To derive motion vector of each 4×4 luma subblock, the motion vector of the centre sample of each subblock, as shown in Fig. 8, is calculated according to above equations, and rounded to 1/16 fraction accuracy. Then, the motion compensation interpolation filters are applied to generate the prediction of each subblock with the derived motion vector. The subblock size of chroma-components is also set to be 4×4. The MV of a 4×4 chroma subblock is calculated as the average of the MVs of the top-left and bottom-right luma subblocks in the collocated 8x8 luma region.

As is for translational-motion inter prediction, there are also two affine motion inter prediction modes: affine merge mode and affine AMVP mode.

Affine Merge Prediction

AF_MERGE mode can be applied for CUs with both width and height larger than or equal to 8. In this mode, the CPMVs (Control Point MVs) of the current CU is generated based on the motion information of the spatial neighbouring CUs. There can be up to five CPMVP (CPMV Prediction) candidates and an index is signalled to indicate the one to be used for the current CU. The following three types of CPVM candidate are used to form the affine merge candidate list:

– Inherited affine merge candidates that are extrapolated from the CPMVs of the neighbour CUs

– Constructed affine merge candidates CPMVPs that are derived using the translational MVs of the neighbour CUs

– Zero MVs

In VVC, there are two inherited affine candidates at most, which are derived from the affine motion model of the neighbouring blocks, one from left neighbouring CUs and one from above neighbouring CUs. The candidate blocks are the same as those shown in Fig. 2. For the left predictor, the scan order is A₀->A₁, and for the above predictor, the scan order is B0->B₁->B₂. Only the first inherited candidate from each side is selected. No pruning check is performed between two inherited candidates. When a neighbouring affine CU is identified, its control point motion vectors are used to derived the CPMVP candidate in the affine merge list of the current CU. As shown in Fig. 9, if the neighbouring left bottom block A of the current block 910 is coded in affine mode, the motion vectors v₂ , v₃ and v₄ of the top left corner, above right corner and left bottom corner of the CU 920 containing block A are attained. When block A is coded with 4-parameter affine model, the two CPMVs of the current CU (i.e., v₀ and v₁) are calculated according to v₂, and v₃. In case that block A is coded with 6-parameter affine model, the three CPMVs of the current CU are calculated according to v₂ , v₃ and v₄.

Constructed affine candidate means the candidate is constructed by combining the neighbouring translational motion information of each control point. The motion information for the control points is derived from the specified spatial neighbours and temporal neighbour for a current block 1010 as shown in Fig. 10. CPMV_k (k=1, 2, 3, 4) represents the k-th control point. For CPMV₁, the B2->B3->A2 blocks are checked and the MV of the first available block is used. For CPMV₂, the B1->B0 blocks are checked and for CPMV₃, the A1->A0 blocks are checked. For TMVP is used as CPMV₄ if it’s available.

After MVs of four control points are attained, affine merge candidates are constructed based on the motion information. The following combinations of control point MVs are used to construct in order:

{CPMV₁, CPMV₂, CPMV₃} , {CPMV₁, CPMV₂, CPMV₄} , {CPMV₁, CPMV₃, CPMV₄} , {CPMV₂, CPMV₃, CPMV₄} , {CPMV₁, CPMV₂} , {CPMV₁, CPMV₃}

The combination of 3 CPMVs constructs a 6-parameter affine merge candidate and the combination of 2 CPMVs constructs a 4-parameter affine merge candidate. To avoid motion scaling process, if the reference indices of control points are different, the related combination of control point MVs is discarded.

After inherited affine merge candidates and constructed affine merge candidate are checked, if the list is still not full, zero MVs are inserted to the end of the list.

Affine AMVP Prediction

Affine AMVP mode can be applied for CUs with both width and height larger than or equal to 16. An affine flag in the CU level is signalled in the bitstream to indicate whether affine AMVP mode is used and then another flag is signalled to indicate whether 4-parameter affine or 6-parameter affine is used. In this mode, the difference of the CPMVs of current CU and their predictors CPMVPs is signalled in the bitstream. The affine AVMP candidate list size is 2 and it is generated by using the following four types of CPVM candidate in order:

– Inherited affine AMVP candidates that extrapolated from the CPMVs of the neighbour CUs

– Constructed affine AMVP candidates CPMVPs that are derived using the translational MVs of the neighbour CUs

– Translational MVs from neighbouring CUs

– Zero MVs

The checking order of inherited affine AMVP candidates is the same as the checking order of inherited affine merge candidates. The only difference is that, for AVMP candidate, only the affine CU that has the same reference picture as current block is considered. No pruning process is applied when inserting an inherited affine motion predictor into the candidate list.

Constructed AMVP candidate is derived from the specified spatial neighbours shown in Fig. 10. The same checking order is used as that in the affine merge candidate construction. In addition, the reference picture index of the neighbouring block is also checked. In the checking order, the first block that is inter coded and has the same reference picture as in current CUs is used. When the current CU is coded with the 4-parameter affine mode, and mv₀ and mv₁are both available, they are added as one candidate in the affine AMVP list. When the current CU is coded with 6-parameter affine mode, and all three CPMVs are available, they are added as one candidate in the affine AMVP list. Otherwise, the constructed AMVP candidate is set as unavailable.

If the number of affine AMVP list candidates is still less than 2 after valid inherited affine AMVP candidates and constructed AMVP candidate are inserted, mv₀, mv₁ and mv₂will be added as the translational MVs in order to predict all control point MVs of the current CU, when available. Finally, zero MVs are used to fill the affine AMVP list if it is still not full.

Affine Motion Information Storage

In VVC, the CPMVs of affine CUs are stored in a separate buffer. The stored CPMVs are only used to generate the inherited CPMVPs in the affine merge mode and affine AMVP mode for the lately coded CUs. The subblock MVs derived from CPMVs are used for motion compensation, MV derivation of merge/AMVP list of translational MVs and de-blocking.

To avoid the picture line buffer for the additional CPMVs, affine motion data inheritance from the CUs of the above CTU is treated differently for the inheritance from the normal neighbouring CUs. If the candidate CU for affine motion data inheritance is in the above CTU line, the bottom-left and bottom-right subblock MVs in the line buffer instead of the CPMVs are used for the affine MVP derivation. In this way, the CPMVs are only stored in a local buffer. If the candidate CU is 6-parameter affine coded, the affine model is degraded to 4-parameter model. As shown in Fig. 11, along the top CTU boundary, the bottom-left and bottom right subblock motion vectors of a CU are used for affine inheritance of the CUs in bottom CTUs. In Fig. 11, line 1110 and line 1112 indicate the x and y coordinates of the picture with the origin (0, 0) at the upper left corner. Legend 1120 shows the meaning of various motion vectors, where arrow 1122 represents the CPMVs for affine inheritance in the local buff, arrow 1124 represents sub-block vectors for MC/merge/skip/AMVP/deblocking/TMVPs in the local buffer and for affine inheritance in the line buffer, and arrow 1126 represents sub-block vectors for MC/merge/skip/AMVP/deblocking/TMVPs.

Adaptive Motion Vector Resolution (AMVR)

In HEVC, motion vector differences (MVDs) (between the motion vector and predicted motion vector of a CU) are signalled in units of quarter-luma-sample when use_integer_mv_flag is equal to 0 in the slice header. In VVC, a CU-level adaptive motion vector resolution (AMVR) scheme is introduced. AMVR allows MVD of the CU to be coded in different precisions. Dependent on the mode (normal AMVP mode or affine AVMP mode) for the current CU, the MVDs of the current CU can be adaptively selected as follows:

– Normal AMVP mode: quarter-luma-sample, half-luma-sample, integer-luma-sample or four-luma-sample.

– Affine AMVP mode: quarter-luma-sample, integer-luma-sample or 1/16 luma-sample.

The CU-level MVD resolution indication is conditionally signalled if the current CU has at least one non-zero MVD component. If all MVD components (that is, both horizontal and vertical MVDs for reference list L0 and reference list L1) are zero, quarter-luma-sample MVD resolution is inferred.

For a CU that has at least one non-zero MVD component, a first flag is signalled to indicate whether quarter-luma-sample MVD precision is used for the CU. If the first flag is 0, no further signalling is needed and quarter-luma-sample MVD precision is used for the current CU. Otherwise, a second flag is signalled to indicate half-luma-sample or other MVD precisions (integer or four-luma sample) is used for a normal AMVP CU. In the case of half-luma-sample, a 6-tap interpolation filter instead of the default 8-tap interpolation filter is used for the half-luma sample position. Otherwise, a third flag is signalled to indicate whether integer-luma-sample or four-luma-sample MVD precision is used for the normal AMVP CU. In the case of affine AMVP CU, the second flag is used to indicate whether integer-luma-sample or 1/16 luma-sample MVD precision is used. In order to ensure the reconstructed MV has the intended precision (quarter-luma-sample, half-luma-sample, integer-luma-sample or four-luma-sample) , the motion vector predictors for the CU will be rounded to the same precision as that of the MVD before being added together with the MVD. The motion vector predictors are rounded toward zero (that is, a negative motion vector predictor is rounded toward positive infinity and a positive motion vector predictor is rounded toward negative infinity) .

The encoder determines the motion vector resolution for the current CU using RD check. To avoid always performing the CU-level RD check four times for each MVD resolution, the RD check of MVD precisions other than quarter-luma-sample is only invoked conditionally in VTM11. For the normal AVMP mode, the RD cost of quarter-luma-sample MVD precision and integer-luma sample MV precision is computed first. Then, the RD cost of integer-luma-sample MVD precision is compared to that of quarter-luma-sample MVD precision to decide whether it is necessary to further check the RD cost of four-luma-sample MVD precision. When the RD cost for the quarter-luma-sample MVD precision is much smaller than that of the integer-luma-sample MVD precision, the RD check of four-luma-sample MVD precision is skipped. Then, the check of half-luma-sample MVD precision is skipped if the RD cost of integer-luma-sample MVD precision is significantly larger than the best RD cost of previously tested MVD precisions. For the affine AMVP mode, if the affine inter mode is not selected after checking rate-distortion costs of affine merge/skip mode, merge/skip mode, quarter-luma-sample MVD precision normal AMVP mode and quarter-luma-sample MVD precision affine AMVP mode, then 1/16 luma-sample MV precision and 1-pel MV precision affine inter modes are not checked. Furthermore, affine parameters obtained in quarter-luma-sample MV precision affine inter mode are used as starting search point in 1/16 luma-sample and quarter-luma-sample MV precision affine inter modes.

Combined Inter and Intra Prediction (CIIP)

In VVC, when a CU is coded in merge mode, if the CU contains at least 64 luma samples (that is, CU width times CU height is equal to or larger than 64) , and if both CU width and CU height are less than 128 luma samples, an additional flag is signalled to indicate if the combined inter/intra prediction (CIIP) mode is applied to the current CU. As its name indicates, the CIIP prediction combines an inter prediction signal with an intra prediction signal. The inter prediction signal in the CIIP mode P_inter is derived using the same inter prediction process applied to regular merge mode; and the intra prediction signal P_intra is derived following the regular intra prediction process with the planar mode. Then, the intra and inter prediction signals are combined using weighted averaging, where the weight value wt is calculated depending on the coding modes of the top and left neighbouring blocks (as shown in Fig. 12) of current CU 1210 as follows:

– If the top neighbour is available and intra coded, then set isIntraTop to 1, otherwise set isIntraTop to 0;

– If the left neighbour is available and intra coded, then set isIntraLeft to 1, otherwise set isIntraLeft to 0;

– If (isIntraLeft + isIntraTop) is equal to 2, then wt is set to 3;

– Otherwise, if (isIntraLeft + isIntraTop) is equal to 1, then wt is set to 2;

– Otherwise, set wt to 1.

The CIIP prediction is formed as follows:
P_CIIP= ( (4-wt) *P_inter+wt*P_intra+2) ＞＞2 (3)

Geometric Partitioning Mode (GPM)

In VVC, a Geometric Partitioning Mode (GPM) is supported for inter prediction as described in JVET-W2002 (Adrian Browne, et al., Algorithm description for Versatile Video Coding and Test Model 14 (VTM 14) , ITU-T/ISO/IEC Joint Video Exploration Team (JVET) , 23rd Meeting, by teleconference, 7–16 July 2021, document: document JVET-M2002) . The geometric partitioning mode is signalled using a CU-level flag as one kind of merge mode, with other merge modes including the regular merge mode, the MMVD mode, the CIIP mode and the subblock merge mode. A total of 64 partitions are supported by geometric partitioning mode for each possible CU size, w×h=2^m×2ⁿ with m, n ∈ {3…6} excluding 8x64 and 64x8. The GPM mode can be applied to skip or merge CUs having a size within the above limit and having at least two regular merge modes.

When this mode is used, a CU is split into two parts by a geometrically located straight line in certain angles. In VVC, there are a total of 20 angles and 4 offset distances used for GPM, which has been reduced from 24 angles in an earlier draft. The location of the splitting line is mathematically derived from the angle and offset parameters of a specific partition. In VVC, there are a total of 64 partitions as shown in Fig. 13, where the partitions are grouped according to their angles and dashed lines indicate redundant partitions. Each part of a geometric partition in the CU is inter-predicted using its own motion; only uni-prediction is allowed for each partition, that is, each part has one motion vector and one reference index. In Fig. 13, each line corresponds to the boundary of one partition. The partitions are grouped according to its angle. For example, partition group 1310 consists of three vertical GPM partitions (i.e., 90°) . Partition group 1320 consists of four slant GPM partitions with a small angle from the vertical direction. Also, partition group 1330 consists of three vertical GPM partitions (i.e., 270°) similar to those of group 1310, but with an opposite direction. The uni-prediction motion constraint is applied to ensure that only two motion compensated prediction are needed for each CU, same as the conventional bi-prediction. The uni-prediction motion for each partition is derived using the process described later.

If geometric partitioning mode is used for the current CU, then a geometric partition index indicating the selected partition mode of the geometric partition (angle and offset) , and two merge indices (one for each partition) are further signalled. The number of maximum GPM candidate size is signalled explicitly in SPS (Sequence Parameter Set) and specifies syntax binarization for GPM merge indices. After predicting each of part of the geometric partition, the sample values along the geometric partition edge are adjusted using a blending processing with adaptive weights using the process described later. This is the prediction signal for the whole CU, and transform and quantization process will be applied to the whole CU as in other prediction modes. Finally, the motion field of a CU predicted using the geometric partition modes is stored using the process described later.

Uni-Prediction Candidate List Construction

The uni-prediction candidate list is derived directly from the merge candidate list constructed according to the extended merge prediction process. Denote n as the index of the uni-prediction motion in the geometric uni-prediction candidate list. The LX motion vector of the n-th extended merge candidate (X = 0 or 1, i.e., LX = L0 or L1) , with X equal to the parity of n, is used as the n-th uni-prediction motion vector for geometric partitioning mode. These motion vectors are marked with “x” in Fig. 14. In case that a corresponding LX motion vector of the n-the extended merge candidate does not exist, the L (1 -X) motion vector of the same candidate is used instead as the uni-prediction motion vector for geometric partitioning mode.

Blending Along the Geometric Partitioning Edge

After predicting each part of a geometric partition using its own motion, blending is applied to the two prediction signals to derive samples around geometric partition edge. The blending weight for each position of the CU are derived based on the distance between individual position and the partition edge.

The two integer blending matrices (W₀ and W₁) are utilized for the GPM blending process. The weights in the GPM blending matrices contain the value range of [0, 8] and are derived based on the displacement from a sample position to the GPM partition boundary 1540 as shown in Fig. 15.

Specifically, the weights are given by a discrete ramp function with the displacement and two thresholds as shown in Fig. 16, where the two end points (i.e., -τ and τ) of the ramp correspond to lines 1542 and 1544 in Fig 15.

Here, the threshold τ defines the width of the GPM blending area and is selected as the fixed value in VVC. In other words, as JVET-Z0137 (Han Gao, et. al., “Non-EE2: Adaptive Blending for GPM” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 26th Meeting, by teleconference, 20–29 April 2022, JVET-Z0137) , the blending strength or blending area width θ is fixed for all different contents.

The weighing values in the blending mask can be given by a ramp function:

With a fixed θ=2 pel in the current ECM (VVC) design, this ramp function can be quantized as:
ω_m, n=Clip3 (0, 8, (d (m, n) +32+4) ＞＞3) (5)

The distance for a position (x, y) to the partition edge are derived as:

where i, j are the indices for angle and offset of a geometric partition, which depend on the signalled geometric partition index. The sign of ρ_x, j and ρ_y, j depend on angle index i.

Fig. 17 illustrates an example of GPM blending according to ECM 4.0 (Muhammed Coban, et. al., “Algorithm description of Enhanced Compression Model 4 (ECM 4) ” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 26th Meeting, by teleconference, 20–29 April 2022, JVET-Y2025) . In Fig. 17, the size of the blending region on each side of the partition boundary is indicated by θ. The weights for each part of a geometric partition are derived as following:
wIdxL (x, y) =partIdx ? 32+d (x, y) : 32-d (x, y) (10)

w₁ (x, y) =1-w₀ (x, y) (12)

The partIdx depends on the angle index i. One example of weigh w₀ is illustrated in Fig. 15, where the angle1510 and offset ρ_i 1520 are indicated for GPM index i and point 1530 corresponds to the centre of the block. Line 1540 corresponds to the GPM partitioning boundary

Motion Field Storage for Geometric Partitioning Mode

Mv1 from the first part of the geometric partition, Mv2 from the second part of the geometric partition and a combined MV of Mv1 and Mv2 are stored in the motion filed of a geometric partitioning mode coded CU.

The stored motion vector type for each individual position in the motion filed are determined as:
sType = abs (motionIdx) < 32 ? 2∶ (motionIdx≤0 ? (1 -partIdx) : partIdx) (13)

where motionIdx is equal to d (4x+2, 4y+2) , which is recalculated from equation (6) . The partIdx depends on the angle index i.

If sType is equal to 0 or 1, Mv0 or Mv1 are stored in the corresponding motion field, otherwise if sType is equal to 2, a combined MV from Mv0 and Mv2 are stored. The combined Mv are generated using the following process:

1) If Mv1 and Mv2 are from different reference picture lists (one from L0 and the other from L1) , then Mv1 and Mv2 are simply combined to form the bi-prediction motion vectors.

2) Otherwise, if Mv1 and Mv2 are from the same list, only uni-prediction motion Mv2 is stored.

Intra Mode Coding with 67 Intra Prediction Modes

To capture the arbitrary edge directions presented in natural video, the number of directional intra modes in VVC is extended from 33, as used in HEVC, to 65. The new directional modes not in HEVC are depicted as dotted arrows in Fig. 18, and the planar and DC modes remain the same. These denser directional intra prediction modes apply for all block sizes and for both luma and chroma intra predictions.

In VVC, several conventional angular intra prediction modes are adaptively replaced with wide-angle intra prediction modes for the non-square blocks.

In HEVC, every intra-coded block has a square shape and the length of each of its side is a power of 2. Thus, no division operations are required to generate an intra-predictor using DC mode. In VVC, blocks can have a rectangular shape that necessitates the use of a division operation per block in the general case. To avoid division operations for DC prediction, only the longer side is used to compute the average for non-square blocks.

To keep the complexity of the most probable mode (MPM) list generation low, an intra mode coding method with 6 MPMs is used by considering two available neighbouring intra modes. The following three aspects are considered to construct the MPM list:

– Default intra modes

– Neighbouring intra modes

– Derived intra modes.

A unified 6-MPM list is used for intra blocks irrespective of whether MRL and ISP coding tools are applied or not. The MPM list is constructed based on intra modes of the left and above neighbouring block. Suppose the mode of the left is denoted as Left and the mode of the above block is denoted as Above, the unified MPM list is constructed as follows:

– When a neighbouring block is not available, its intra mode is set to Planar by default.

– If both modes Left and Above are non-angular modes:

– MPM list → {Planar, DC, V, H, V -4, V + 4}

– If one of modes Left and Above is angular mode, and the other is non-angular:

– Set a mode Max as the larger mode in Left and Above

– MPM list → {Planar, Max, DC, Max -1, Max + 1, Max -2}

– If Left and Above are both angular and they are different:

– Set a mode Max as the larger mode in Left and Above

– if the difference of mode Left and Above is in the range of 2 to 62, inclusive

● MPM list → {Planar, Left, Above, DC, Max -1, Max + 1}

– Otherwise

● MPM list → {Planar, Left, Above, DC, Max -2, Max + 2}

– If Left and Above are both angular and they are the same:

– MPM list → {Planar, Left, Left -1, Left + 1, DC, Left -2}

Besides, the first bin of the MPM index codeword is CABAC context coded. In total three contexts are used, corresponding to whether the current intra block is MRL enabled, ISP enabled, or a normal intra block.

During 6 MPM list generation process, pruning is used to remove duplicated modes so that only unique modes can be included into the MPM list. For entropy coding of the 61 non-MPM modes, a Truncated Binary Code (TBC) is used.

Decoder Side Intra Mode Derivation (DIMD)

When DIMD is applied, two intra modes are derived from the reconstructed neighbour samples, and those two predictors are combined with the planar mode predictor with the weights derived from the gradients. The DIMD mode is used as an alternative prediction mode and is always checked in the high-complexity RDO mode.

To implicitly derive the intra prediction modes of a blocks, a texture gradient analysis is performed at both the encoder and decoder sides. This process starts with an empty Histogram of Gradient (HoG) with 65 entries, corresponding to the 65 angular modes. Amplitudes of these entries are determined during the texture gradient analysis.

In the first step, DIMD picks a template of T=3 columns and lines from respectively left side and above side of the current block. This area is used as the reference for the gradient based intra prediction modes derivation.

In the second step, the horizontal and vertical Sobel filters are applied on all 3×3 window positions, cantered on the pixels of the middle line of the template. At each window position, Sobel filters calculate the intensity of pure horizontal and vertical directions as G_x and G_y, respectively. Then, the texture angle of the window is calculated as:
angle=arctan (G_x/G_y) , (14)

which can be converted into one of 65 angular intra prediction modes. Once the intra prediction mode index of current window is derived as idx, the amplitude of its entry in the HoG [idx] is updated by addition of:
ampl = |G_x|+|G_y| (15)

Figs. 20A-C show an example of HoG, calculated after applying the above operations on all pixel positions in the template. Fig. 19A illustrates an example of selected template 1920 for a current block 1910. Template 1920 comprises T lines above the current block and T columns to the left of the current block. For intra prediction of the current block, the area 1930 at the above and left of the current block corresponds to a reconstructed area and the area 1940 below and at the right of the block corresponds to an unavailable area. Fig. 19B illustrates an example for T=3 and the HoGs are calculated for pixels 1960 in the middle line and pixels 1962 in the middle column. For example, for pixel 1952, a 3x3 window 1950 is used. Fig. 19C illustrates an example of the amplitudes (ampl) calculated based on equation (15) for the angular intra prediction modes as determined from equation (14) .

Once HoG is computed, the indices with two tallest histogram bars are selected as the two implicitly derived intra prediction modes for the block and are further combined with the Planar mode as the prediction of DIMD mode. The prediction fusion is applied as a weighted average of the above three predictors. To this aim, the weight of planar is fixed to 21/64 (～1/3) . The remaining weight of 43/64 (～2/3) is then shared between the two HoG IPMs, proportionally to the amplitude of their HoG bars. Fig. 20 illustrates an example of the blending process. As shown in Fig. 20, two intra modes (M1 2012 and M2 2014) are selected according to the indices with two tallest bars of histogram bars 2010. The three predictors (2040, 2042 and 2044) are used to form the blended prediction. The three predictors correspond to applying the M1, M2 and planar intra modes (2020, 2022 and 2024 respectively) to the reference pixels 2030 to form the respective predictors. The three predictors are weighted by respective weighting factors (ω₁, ω₂ and ω₃) 2050. The weighted predictors are summed using adder 2052 to generated the blended predictor 2060.

Besides, the two implicitly derived intra modes are included into the MPM list so that the DIMD process is performed before the MPM list is constructed. The primary derived intra mode of a DIMD block is stored with a block and is used for MPM list construction of the neighbouring blocks.

Template-based Intra Mode Derivation (TIMD)

Template-based intra mode derivation (TIMD) mode implicitly derives the intra prediction mode of a CU using a neighbouring template at both the encoder and decoder, instead of signalling the intra prediction mode to the decoder. As shown in Fig. 21, the prediction samples of the template (2112 and 2114) for the current block 2110 are generated using the reference samples (2120 and 2122) of the template for each candidate mode. A cost is calculated as the SATD (Sum of Absolute Transformed Differences) between the prediction samples and the reconstruction samples of the template. The intra prediction mode with the minimum cost is selected as the DIMD mode and used for intra prediction of the CU. The candidate modes may be 67 intra prediction modes as in VVC or extended to 131 intra prediction modes. In general, MPMs can provide a clue to indicate the directional information of a CU. Thus, to reduce the intra mode search space and utilize the characteristics of a CU, the intra prediction mode can be implicitly derived from the MPM list.

For each intra prediction mode in MPMs, the SATD between the prediction and reconstruction samples of the template is calculated. First two intra prediction modes with the minimum SATD are selected as the TIMD modes. These two TIMD modes are fused with weights after applying PDPC process, and such weighted intra prediction is used to code the current CU. Position dependent intra prediction combination (PDPC) is included in the derivation of the TIMD modes.

The costs of the two selected modes are compared with a threshold, in the test, the cost factor of 2 is applied as follows:

costMode2 < 2*costMode1.

If this condition is true, the fusion is applied, otherwise only mode1 is used. Weights of the modes are computed from their SATD costs as follows:

weight1 = costMode2/ (costMode1+ costMode2)

weight2 = 1 -weight1.

Multi-Hypothesis Prediction (MHP)

In the multi-hypothesis inter prediction mode (JVET-M0425) , one or more additional motion- compensated prediction signals are signalled, in addition to the conventional bi-prediction signal. The resulting overall prediction signal is obtained by sample-wise weighted superposition. With the bi-prediction signal p_bi and the first additional inter prediction signal/hypothesis h₃, the resulting prediction signal p₃ is obtained as follows:
p₃= (1-α) p_bi+αh₃ (16)

The weighting factor α is specified by the new syntax element add_hyp_weight_idx, according to the following mapping (Table 3) :

Table 3. Mapping α to add_hyp_weight_idx

Analogously to above, more than one additional prediction signal can be used. The resulting overall prediction signal is accumulated iteratively with each additional prediction signal.
p_n+1= (1-α_n+1) p_n+α_n+1h_n+1 (17)

The resulting overall prediction signal is obtained as the last p_n (i.e., the p_n having the largest index n) . For example, up to two additional prediction signals can be used (i.e., n is limited to 2) .

The motion parameters of each additional prediction hypothesis can be signalled either explicitly by specifying the reference index, the motion vector predictor index, and the motion vector difference, or implicitly by specifying a merge index. A separate multi-hypothesis merge flag distinguishes between these two signalling modes.

For inter AMVP mode, MHP is only applied if non-equal weight in BCW is selected in bi-prediction mode. Details of MHP for VVC can be found in JVET-W2025 (Muhammed Coban, et. al., “Algorithm description of Enhanced Compression Model 2 (ECM 2) ” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 23rd Meeting, by teleconference, 7–16 July 2021, Document: JVET-W2025) .

GPM Extension

Several variations of GPM mode (JVET-W0097 (Zhipin Deng, et. al., “AEE2-related: Combination of EE2-3.3, EE2-3.4 and EE2-3.5” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 23rd Meeting, by teleconference, 7–16 July 2021, Document: JVET-W0097) and JVET-Y0065 (Yoshitaka Kidani, et. al., “EE2-3.1: GPM with inter and intra prediction (JVET-X0166) ” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 25th Meeting, by teleconference, 12–21 January 2022, Document: JVET-Y0065) ) have been proposed to improve the coding efficiency of the GPM mode in the VVC. The methods were included in the exploration experiment (EE2) for further evaluations, the main technical aspects of which are described as follows:

EE2-3.3 on GPM with MMVD (GPM-MMVD) : 1) additional MVDs are added to the existing GPM merge candidates; 2) the MVDs are signalled in the same manner as the MMVD in the VVC, i.e., one distance index plus one direction index; 3) two flags are signalled to separately control whether the MMVD is applied to each GPM partition or not.

EE2-3.4-3.5 on GPM with template matching (GPM-TM) : 1) template matching is extended to the GPM mode by refining the GPM MVs based on the left and above neighbouring samples of the current CU; 2) the template samples are selected dependent on the GPM split direction; 3) one single flag is signalled to jointly control whether the template matching is applied to the MVs of two GPM partitions or not.

JVET-W0097 proposes a combination of EE2-3.3, EE2-3.4 and EE2-3.5 to further improve the coding efficiency of the GPM mode. Specifically, in the proposed combination, the existing designs in EE2-3.3, EE2-3.4 and EE2-3.5 are kept unchanged while the following modifications are further applied for the harmonization of the two coding tools:

1) The GPM-MMVD and GPM-TM are exclusively enabled to one GPM CU. This is done by firstly signalling the GPM-MMVD syntax. When both two GPM-MMVD control flags are equal to false (i.e., the GPM-MMVD are disabled for two GPM partitions) , the GPM-TM flag is signalled to indicate whether the template matching is applied to the two GPM partitions. Otherwise (at least one GPM-MMVD flag is equal to true) , the value of the GPM-TM flag is inferred to be false.

2) The GPM merge candidate list generation methods in EE2-3.3 and EE2-3.4-3.5 are directly combined in a manner that the MV pruning scheme in EE2-3.4-3.5 (where the MV pruning threshold is adapted based on the current CU size) is applied to replace the default MV pruning scheme applied in EE2-3.3; additionally, as in EE2-3.4-3.5, multiple zero MVs are added until the GPM candidate list is fully filled.

In JVET-Y0065, in GPM with inter and intra prediction (or named GPM intra) , the final prediction samples are generated by weighting inter predicted samples and intra predicted samples for each GPM-separated region. The inter predicted samples are derived by the same scheme as the GPM in the current ECM whereas the intra predicted samples are derived by an intra prediction mode (IPM) candidate list and an index signalled from the encoder. The IPM candidate list size is pre-defined as 3. The available IPM candidates are the parallel angular mode against the GPM block boundary (Parallel mode) , the perpendicular angular mode against the GPM block boundary (Perpendicular mode) , and the Planar mode as shown Figs. 22A-C, respectively. Furthermore, GPM with intra and intra prediction as shown Fig. 22D is restricted in the proposed method to reduce the signalling overhead for IPMs and avoid an increase in the size of the intra prediction circuit on the hardware decoder. In addition, a direct motion vector and IPM storage on the GPM-blending area is introduced to further improve the coding performance.

Spatial GPM

Similar to inter GPM, Spatial GPM (SGPM) consists of one partition mode and two associated intra prediction modes. If these modes are directly signalled in the bit-stream, as shown in Fig. 23A, it would yield significant overhead bits. To express the necessary partition and prediction information more efficiently in the bit-stream, a candidate list is employed and only the candidate index is signalled in the bit-stream. Each candidate in the list can derive a combination of one partition mode and two intra prediction modes, as shown in Fig. 23B.

A template is used to generate this candidate list. The shape of the template is shown in Fig. 24. For each possible combination of one partition mode and two intra prediction modes, a prediction is generated for the template with the partitioning weight extended to the template, as shown in Fig. 24. These combinations are ranked in ascending order of their SATD between the prediction and reconstruction of the template. The length of the candidate list is set equal to 16, and these candidates are regarded as the most probable SGPM combinations of the current block. Both encoder and decoder construct the same candidate list based upon the template.

To reduce the complexity in building the candidate list, both the number of possible partition modes and the number of possible intra prediction modes are pruned. In the following test, 26 out of 64 partition modes are used, and only the MPMs out of 67 intra prediction modes are used.

Recently, more schemes to speed up the encoding time of SGPM and improve the gain of SGPM have been disclosed and some details can be found in JVET-AA0118 (Fan Wang, et. al., “EE2-1.4: Spatial GPM” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 27th Meeting, by teleconference, 13–22 July 2022, Document: JVET-AA0118) .

Encoding fast algorithm

In JVET-Z0124 (Fan Wang, et. al., “Non-EE2: Spatial GPM” , Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 26th Meeting, by teleconference, 20–29 April 2022, Document: JVET-Z0124) , full RDO is processed for every candidate from the candidate list of size 16. In the EE2-1.4 test, the SAD/SATD cost is used to filter the candidates before full RDO. In particular, if the SAD/SATD cost of a candidate is larger than a threshold, this candidate will not go to full RDO. The threshold is the best ever SAD/SATD cost of the current block multiplied by a ratio. The maximum number of full RDO for SGPM is limited to 8 for each block.

Simplification of candidate list derivation

In JVET-Z0124, when deriving the candidate list, for each possible combination of one partition mode and two intra prediction modes, a prediction is generated for the template with the partitioning weight extended to the template, and SATD between the prediction and reconstruction of the template was used as the criterion for ranking. In the EE2-1.4 test, the GPM blending process is not used in the template, and SAD is used as the criterion for ranking instead. The weights in the template are either 1 or 0. For each MPM, the two SADs of the two parts for each partition mode are calculated and saved. To get the SAD of one combination, only one addition of two corresponding SADs is needed.

Template matching based reordering for GPM split modes

In template matching based reordering for GPM split modes, given the motion information of the current GPM block, the respective TM cost values of GPM split modes are computed. Then, all GPM split modes are reordered in an ascending ordering based on the TM cost values. Instead of sending GPM split mode, an index using Golomb-Rice code to indicate where the exact GPM split mode is located in the reordering list is signalled.

The reordering method for GPM split modes is a two-step process performed after the respective reference templates of the two GPM partitions in a coding unit are generated, as follows:

● extending GPM partition edge into the reference templates of the two GPM partitions, resulting in 64 reference templates and computing the respective TM cost for each of the 64 reference templates;

● reordering GPM split modes based on their TM cost values in ascending order and marking the best 32 as available split modes.

The edge 2520 on the template (2530 and 2532) is extended from that of the current CU 2510, as shown in Fig. 25 illustrates, but GPM blending process is not used in the template area across the edge. After ascending reordering using TM cost, an index is signalled.

In order to improve video coding efficiency, blending one or more hypotheses of predictions with the existing one or more hypotheses of predictions is used for achieving a better accuracy of prediction. In one embodiment, a hypothesis of prediction means prediction from motion with a pre-defined direction (either list0 or list1) . In another embodiment, a hypothesis of prediction means prediction generated from a motion candidate (e.g. a merging candidate or an AMVP candidate) . In another embodiment, a hypothesis of prediction means prediction from motion with a pre-defined direction (either list0 or list1) or bi-prediction. For example, each hypothesis of prediction refers to prediction from a pre-defined direction instead of prediction from bi-direction. For another example, one hypothesis of prediction refers to prediction from a pre-defined direction instead of prediction from bi-direction. In another embodiment, a hypothesis of prediction means prediction from an intra candidate or a motion candidate. In another embodiment, a hypothesis of prediction means prediction from an intra candidate. In another embodiment, the blending-prediction tools refer to (but not limited to) any one or more tools listed as follows or any combination of the listed tools.

- The blending-prediction tools include bi-prediction motion candidates, which can be merge candidates and/or AMVP candidates which mean the motion parameters, such as motion vectors difference and/or reference index, are signalled.

- The blending-prediction tools include GPM, one or more variations in GPM extension, and/or spatial GPM.

Since more than one hypothesis of predictions is used for the current block, a blending process is required for forming the final prediction of the current block.

In the current invention, an adaptive blending process and/or flexible partition design is proposed to improve the weighting scheme used in blending predictions. In the following, take GPM to be the blending tool as an example. In the following, the proposed adaptive blending process can also be applied to one or more mentioned blending-prediction tools and/or any combination of mentioned blending-prediction tools.

First, a partition line (e.g. GPM partition boundary) is defined to divide the current block into two prediction regions (shown in the Fig. 26A) . In one embodiment, the regular partition line (or called splitting mode, partitioning candidates) can be a straight line (i.e., called a straightforward partition line) represented by an angle and a distance. In another embodiment, the partition line is from a flexible partition design. The concept of the flexible partition design is to form the flexible partition line by one or more line sections (for example, generated using the regular partition line) . For example, the flexible partition line is formed by 2-line sections. Fig. 26A illustrates examples of flexible partitioning formed by 2 line sections according to the present invention, where 32 flexible partitions, each with two prediction regions, are shown. In one sub-embodiment, each line section refers to one straightforward partition line. The following embodiments specify the supported line (or partitioning) combinations to form the flexible partition design. In another sub-embodiment, one line section refers to any one of horizontal, vertical, diagonal, or inverse diagonal splitting (or any one of the subset of mentioned splitting directions) with any one or more pre-defined distances. In another sub-embodiment, the intersection angle of the two line sections can be any angle formed by the two candidate line sections. In another sub-embodiment, the intersection angle of the two line sections should be larger than k *90 degrees (if k is a positive integer) or smaller than k *90 degrees (if k is a negative integer) . In another sub-embodiment, the intersection angle of the two line sections should be larger than a pre-defined threshold. In another sub-embodiment, the intersection angle of the two line sections should be smaller than a pre-defined threshold. In another sub-embodiment, any pre-defined set of the partition information (including information of partitioning angle and/or partitioning distance) for the two line sections is used to determine the supported line combinations of the flexible partition design. For example, the mentioned restriction or determining rule can be a subset or extension from the following. (The partition index can use the GPM partition index by signalling the index to indicate one candidate straightforward partition line for GPM. )

– (1) Comparison of first partition index and second partition index

For example, first partition index is smaller than second partition index. For another example, first partition index is larger than second partition index.

– (2) Comparison of first and second angles

For example, first and second angles are different. For another example, the difference between the first and second angle indices is larger than a pre-defined threshold. For another example, the difference between the first and second angle indices is smaller than a pre-defined threshold. For another example, the first and second angle indices are different. For another example, the difference between the first and second angles is larger than a pre-defined threshold. For another example, the difference between the first and second angles is smaller than a pre-defined threshold. For another example, the first and second angles are different. The pre-defined threshold is any pre-defined integer such as 0, 1, 2, 4, 8, or any number specified in the standard or indicated by the signalled syntax element at any block, CTU, SPS, PPS, tile, slice, picture, or sequence level.

– (3) Comparison of weights of the flexible partition design and regular partition design

For example, weights of new GPM partitions are different from weights of existing GPM partitions.

– (4) Checking if any redundancy in candidates of the flexible partition design

For example, weights of any two new GPM partitions are different from each other.

– (5) Checking if all or any subset of candidates in the flexible partition design are available/valid for all block sizes of the target blending mode.

For example, the above checks should be satisfied by any GPM block sizes

– (6) Checking if all or any subset of candidates in the flexible partition design are available/valid for small/narrow block sizes of the target blending mode.

For example, the above checks should be further satisfied by GPM block sizes equal to 4x4, 4x8, 8x4, 4x16, or 16x4

– (7) Checking width, height, or area of regions

For example, region 1 (for the first prediction) or region 2 (for the second prediction) is not too small (such as in terms of width, height, or area of region 1 or region 2) . In other words, width, height, or area of region 1 or region 2 is above a threshold. The pre-defined threshold is any pre-defined integer such as 0, 1, 2, 4, 8, or any number specified in the standard or indicated by the signalled syntax element at any block, CTU, SPS, PPS, tile, slice, picture, or sequence level.

In another sub-embodiment, the intersection angle of the two line sections should be smaller than k *90 degrees (if k is a positive integer) or larger than k *90 degrees (if k is a negative integer) . In another sub-embodiment, the intersection angle of the two line sections should be k *90 degrees where k is an integer. That is, the two line sections should be perpendicular to each other. Then the flexible partition design refers to a L-shape partition. In another sub-embodiment, at least one of the two line sections should be generated using vertical or horizontal splitting (or any one of the subset of mentioned splitting) . In another sub-embodiment, each of the two line sections should be generated using vertical or horizontal splitting (for example, each of the two line sections should be vertical or horizontal splitting) (or any one of the subset of mentioned splitting) . For example, one line section is generated using vertical splitting and the other line section is generated using horizontal splitting. For another example, two of the line sections being vertical splitting is not allowed. For another example, two of the line sections being horizontal splitting is not allowed.

In another embodiment, the selection of using a straightforward partition line or a flexible partition design depends on explicit signalling. For example, the explicit signalling refers to the syntax element at any block, CTU, SPS, PPS, tile, slice, picture, or sequence level. For example, the explicit signalling can be context or bypass coding.

In one sub-embodiment, a flag at the block level is signalled to indicate whether to use a flexible partition line for the current block. If the flag indicates using a flexible partition line, one or more indices are signalled to indicate the selected flexible partition line from a pre-defined flexible candidate set. Otherwise (e.g. the flag indicates to use a straightforward partition line) , an index is signalled to indicate a selected straightforward partition line from a pre-defined straightforward candidate set (e.g. existing GPM partition modes) .

- For example, the size of flexible candidate set is fixed as a predefined N, where N = 1, 2, 3, 4, 8, or any positive value. When N = 1, the index for indicating the selected flexible partition line is inferred as the one candidate flexible reference line without signalling. The following is an example of N = 32. (If N is smaller, any subset of the example can be used; if N is larger, any extension of the example can be used. )

- For another example, the size of flexible candidate set varies with block width, block height, block shape, or block area. The blocks width/height/area being larger (or smaller) or the block shape being narrow (or wider) may have more candidates. In an inverse example, the blocks width/height/area being larger (or smaller) or the block shape being narrow (or wider) may have less candidates. The block shape can be calculated using the ratio of block width and block height. For example, a wider block refers to the block with the block width larger than k*block height, where k is any positive number such as 1, 2, 4, or any positive integer. For another example, a narrow block refers to the block with the block height larger than k*block width where k is any positive number such as 1, 2, 4, or any positive integer. In another embodiment, when the flexible partition design is not allowed for the current block, the size of flexible candidate set for the current block is 0. In another embodiment, when the size of flexible candidate set is 0, the flexible partition design is not allowed for the current block and/or only the regular partition design can be used for the current block.

- For another example, the candidates in the flexible candidate set are decided according to block width, block height, block shape, or block area. For example, when the block shape is narrow or wide, the candidates in the flexible candidate set are different and/or belong to a subset of the flexible candidate set for the non-narrow/non-wide/square blocks. For another example, when the block width/height/area is small, the candidates in the flexible candidate set are different and/or belong to a subset of the flexible candidate set for the larger blocks.

- For another example, the candidates in the flexible candidate set are the same for all block sizes which are supported by flexible partition design. In another embodiment, the candidates in the flexible candidate set are uniform/aligned for all target blending-prediction tools. For example, the target blending-prediction tools include GPM and/or any or pre-defined subset of mentioned GPM variations.

In another sub-embodiment, flexible partition designed is only available for the blocks with block area larger than a pre-defined threshold, the block width or height or area larger than a pre-defined threshold, the block shape being not narrow and not wide, and/or the block ratio (longer side/shorter side) larger than a pre-defined threshold. When the flexible partition design is not available, the explicit signalling for flexible partition design is not signalled. (In an alternative way, “larger than” in this method can be replaced with “smaller than. ” )

In another sub-embodiment, flexible partition lines and straightforward (regular) partition lines are inserted into the same candidate partition set. An index is signalled to indicate the selected candidate from the candidate partition set. For example, the codeword length for each candidate is equal length.

In another embodiment, the selection of using a straightforward partition line or a flexible partition design is decided implicitly. For example, in some pre-defined cases, only candidate straightforward partition lines are available for the current block and in other pre-defined cases, only candidate flexible partition lines are available for the current block. For another example, in some pre-defined cases, part of candidate straightforward partition lines are replaced with candidate flexible partition lines. (Then for a block already supporting regular partition lines, no additional signalling is required for supporting flexible partition lines. ) In another embodiment, the pre-defined cases may specify according to the block width, height, area, and/or block shape. For example, for a wide block, some regular partitions with horizontal-oriented splitting are replaced with flexible partition lines. For another example, for a narrow block, some regular partitions with vertical-oriented splitting are replaced with flexible partition lines.

In one sub-embodiment, the selection rule depends on block width, block height, block shape, or block area. For example, the pre-defined cases include that the block width, height, and/or area is larger than a pre-defined threshold. For another example, the pre-defined cases include that the block width, height, and/or area is smaller than a pre-defined threshold.

In another embodiment, when deciding to use a flexible partition line, the signalling to indicate the selected flexible partition line includes the following syntax: the direction, orientation (horizontal or vertical) , and/or region position. The orientation (e.g. 0 or 1) means the starting splitting (e.g. vertical or horizontal) of flexible partition. The region position (e.g. 0 or 1) means the position of the first or second prediction region. The direction (e.g. 0 or 1) means turning direction (e.g. turning left or right, turning above or bottom) at the intersection point of the two line sections. Fig. 26B illustrates an example of signalling for flexible partitioning according to an embodiment of the present invention. In Fig. 26B, partitions 2610, 2612, 2614 and 2616 correspond to 4 partitions, each including one vertical splitting (i.e., Horizontal = 0) . The signalling for partitions 2610, 2612, 2614 and 2616 corresponds to (Horizontal 0, Position 1, Direction 1) , (Horizontal 0, Position 0, Direction 0) , (Horizontal 0, Position 0, Direction 1) , and (Horizontal 0, Position 1, Direction 0) respectively. In Fig. 26B, partitions 2620, 2622, 2624 and 2626 correspond to 4 partitions, each including one horizontal splitting (i.e., Horizontal = 1) . The signalling for partitions 2620, 2622, 2624 and 2626 corresponds to (Horizontal 1, Position 0, Direction 0) , (Horizontal 1, Position 1, Direction 1) , (Horizontal 1, Position 1, Direction 0) , and (Horizontal 1, Position 0, Direction 1) respectively.

In one sub-embodiment, the signalling to indicate the selected flexible partition line can further include the following syntax: the staring position of the flexible partition line and/or the ending position of the flexible partition line. Take the orientation being vertical as an example, the staring position k (where k = 1, 2, or 3) means the starting position is at (k/M) *W, where W is the block width and M = 4. Take the orientation being horizontal as an example, the staring position k (where k = 1, 2, or 3) means the starting position is at (k/M) *H, where H is the block height and M = 4. Similar examples can be used for the ending position. For example, if k can be 0, 1, …, or 7 and M =8. Some values of k may be forbidden.

In another embodiment, when deciding to use a flexible partition line, the signalling to indicate the selected flexible partition line includes the following syntaxes: the intersection position (xx, yy) 2710 and/or the corner covered by the first prediction region 2720 as shown in Fig. 27. If the flexible partition line is an L shape partition line 2730 as shown in Fig. 27.

For example, yy equal to k (where k = 1, 2, or 3) means that the y axis of intersection position is (k/M) *H, where H is the block height and M = 4. xx equal to k (where k = 1, 2, or 3) means that the x axis of intersection position is (k/M) *W, where W is the block width and M = 4.

In one sub-embodiment, xx and yy are indicated by separate indices. The signalling of xx (or yy) may depend on yy (or xx) .

In another sub-embodiment, xx and yy are indicated by a joint index. If total number of combinations of (xx, yy) are equal to 8, the joint index is ranging from 0 to 7.

In another sub-embodiment, (xx, yy) = (2, 2) is not available. Or any one or more combinations are forbidden to reduce the syntax overhead.

In another sub-embodiment, the corner can be the top-left corner, top right corner, bottom left corner and bottom-right corner of the current block or any subset. Fig. 28 illustrates the corner being the top-left corner 2810, top right corner 2820, bottom left corner 2830, bottom-right corner 2840 of the current block.

If total number of candidate corners is 4, the index of corner is ranging from 0 to 3. For example, the codeword length for each candidate corner can be equal length.

First, a partition line (e.g. GPM partition boundary) is defined to divide the current block 2910 into two prediction regions (shown in Fig. 29) . The region near the partition line (e.g. theta1 line 2932 to partition line 2920 and partition line 2920 to -theta2 line 2930) is defined as the blending region. Inside the blending region, multiple (e.g. 2, first_hyp_pred and second_hyp_pred) hypotheses of predictions are combined with weighting (referring to W0 [x] [y] ) . Outside the blending region: for those samples located at the first prediction region, the weight for second hypothesis of prediction is zero and the weight for first hypothesis of prediction is N; for those samples located at the second prediction region, the weight for first hypothesis of prediction is zero and the weight for second hypothesis of prediction is N.

In the following, an example of the weighting for using the flexible partition design is shown in Fig. 30. The partition indices equal to 20 and 24 refer to two straightforward partition lines and with the two straightforward partition lines (block 3010 and block 3020 respectively) being the two line sections, a partition line of flexible partition design is done by applying the max operation (3040) or min operation (3030) on the weighting values from the two straightforward partitions, w0 (x, y) and w1(x, y) as shown in Fig. 30, where x, y mean the weight for the sample located at (x, y) for the current block. When min operation is applied, min (w0 (x, y) , w1 (x, y) ) is the weight value for the sample located at (x, y) for the current block. When max operation is applied, max (w0 (x, y) , w1 (x, y) ) is the weight value for the sample located at (x, y) for the current block. Comparing the resulting flexible partition designs from max or min operation, the intersection angle from max is inverse to the intersection angle from min.

Several embodiments are proposed:

● In one embodiment, Final_pred [x] [y] = (first_hyp_pred [x] [y] *W0 [x] [y] +second_hyp_pred [x] [y] * (N –W0 [x] [y] ) + offset1) >> shift1

○ (x, y) is a sample position in the current block.

○ For the sample located at (x, y) in the current block, W0 [x] [y] is the weight for first_hyp_pred and (N –W0 [x] [y] ) is the weight for second_hyp_pred.

● In another embodiment, N is pre-defined as a fix positive integer (e.g. 8, 16, 32, or 64) or specified by a block-level, SPS-level, PPS-level, APS-level, PH-level, and/or SH-level syntax.

● In another embodiment, offset1and shift1 are decided according to N and/or BitDepth. For the example of N = 8,

○ shift1 = Max (5, 17 -BitDepth)

○ offset1 = 1 << (shift1 -1)

● In another embodiment, if sample (x, y) is at second prediction region (i.e., distance (x, y) is smaller than or equal to –theta2) , W0 [x] [y] is defined as 0. (or W0 [x] [y] is defined following the derivation for the blending region which results in a value equal to 0 or approaching to 0. )

● In another embodiment, if sample (x, y) is at first prediction region (i.e., distance (x, y) is larger than or equal to theta1) , W0 [x] [y] is defined as N. (or W0 [x] [y] is defined following the derivation for the blending region which results in a value equal to N or approaching to N.

● In another embodiment, thata1 equal to 0 means no blending within the first prediction region. That is, W0 [x] [y] is defined as N in the first prediction region.

● In another embodiment, thata2 equal to 0 means no blending within the second prediction region. That is, W0 [x] [y] is defined as 0 in the second prediction region.

● In another embodiment, if sample (x, y) is at blending region (i.e., distance (x, y) is larger than –theta2 and smaller than theta1) , W0 [x] [y] is defined according to the distance, theta1 and/or theta2. For example, W0 [x] [y] is defined following the existing GPM weight derivation (e.g. VVC method) by setting the theta (used in GPM weight derivation) as the proposed theta1 or theta2.

○ In another sub-embodiment, if sample (x, y) is at blending region within first prediction region (i.e., distance (x, y) is larger than 0 and smaller than theta1) , W0[x] [y] is defined according to distance and theta1. For example, W0 [x] [y] is defined as (N* (distance (x, y) +theta1) ) / (2*theta1) or can be simplified by quantizing. For example, after quantizing, W0 [x] [y] is defined as ( (distance’ (x, y) + 16*theta1 +offset2) >> shift2) with clipping to [0, N] .

■ distance’ can be wIdxL in the GPM introduction section

■ offset2 = theta1>>1

■ shift2 = log2 (theta1)

For another example, after quantizing, W0 [x] [y] is defined as ( (distance’ (x, y) +16*theta1 + offset3) >> shift3) with clipping to [0, N] .

■ distance’ can be wIdxL in the GPM introduction section

■ Offset3 can be N right-shifted by 1. Shift3 can be log2 (N) . Take N equal to 8 as an example. Offset3 will be 4 and shift3 will be 3.

○ In another sub-embodiment, if sample (x, y) is at blending region within second prediction region (i.e., distance (x, y) is smaller than 0 and larger than –theta2) , W0 [x] [y] is defined according to distance and theta2. For example, W0 [x] [y] is defined as (N* (distance (x, y) +theta2) ) / (2*theta2) or can be simplified by quantizing. For example, after quantizing, W0 [x] [y] is define as ( (distance’ (x, y) + 16*theta2 +offset2) >> shift2) with clipping to [0, N] .

■ distance’ can be wIdxL in the GPM introduction section

■ offset2 = theta2>>1

■ shift2 = log2 (theta2)

For another example, after quantizing, W0 [x] [y] is defined as ( (distance’ (x, y) +16*theta2 + offset3) >> shift3) with clipping to [0, N] .

■ distance’ can be wIdxL in the GPM introduction section

■ Offset3 can be N right-shifted by 1. Shift3 can be log2 (N) . Take N equal to 32 as an example. Offset3 will be 16 and shift3 will be 5.

○ In another sub-embodiment, if sample (x, y) is at blending region on the partition line (i.e., distance (x, y) is equal to 0) , W0 [x] [y] is defined as the case “sample (x, y) is at blending region within first prediction region” , the case “sample (x, y) is at blending region within second prediction region” , any proposed embodiments, or defined as equal weight (N >> 1) .

In the second aspect, several embodiments are proposed to decide the values of theta1 and theta2 as follows.

In one embodiment, theta1 is predefined as a fix value (e.g. 0, 1/2, 1/4, 1, 2, 4 or 8) or specified by a block-level, SPS-level, PPS-level, APS-level, PH-level, and/or SH-level syntax. This embodiment is applicable for theta2.

In another embodiment, theta1 is selected from a candidate set including at least one candidate values. This embodiment is applicable for theta2.

In one sub-embodiment, the candidate set includes at least one of {0, 1/2, 1/4, 1, 2, 4 or 8} or any combination of the above values.

In another sub-embodiment, the candidate set includes at least one of {a/b, a, b*a} or any combinations of the above values, where a and b are set as positive integers, such as a = 2 and b = 4.

In another sub-embodiment, the candidate set varies with the block width, block height, and/or the block area. For example, when the shorter side of the current block is equal to or smaller than a predefined threshold, only smaller values are included in the candidate set; otherwise only larger values are included in the candidate set.

In another embodiment, theta 1 can be the same or different from theta2. The benefit of allowing different values of theta1 and theta2 (allowing asymmetric theta1 and theta2) is that the best blending quality for diverse video sequences may need different blending regions for first prediction region and second prediction region. For example, if the area of the first prediction region is smaller, theta1 should be smaller than theta2. Or in an inverse way, if the area of the first prediction region is larger, theta1 should be smaller than theta2.

In one sub-embodiment, theta1 and theta2 have their own candidate sets (e.g. theta1_set and theta2_set) , respectively. For example, the candidate numbers (i.e., the candidate number = the number of candidates in a candidate set) for theta1_set and theta2_set can be different. For another example, one candidate set is the subset of the other candidate set. For another example, the candidate numbers for theta1_set and theta2_set are the same.

In another sub-embodiment, theta1 and theta2 share a single candidate set. For example, theta1 and theta2 are the same. For another example, theta1 and theta2 can be the same or different.

In another embodiment, the candidate number of the candidate set is defined as a fix value (e.g. 3 or 5) or specified by a block-level, SPS-level, PPS-level, APS-level, PH-level, and/or SH-level syntax.

In another embodiment, the selection of theta1 and theta2 depends on explicit signalling.

In one sub-embodiment, two individual syntaxes are signalled at block-level, SPS-level, PPS-level, APS-level, PH-level, and/or SH-level syntax to indicate theta1 and theta2, respectively. For example, theta1 and theta 2 are selected from a candidate set including {0, 1, 2, 4, 8} , respectively. An index (e.g. index_theta1, ranging from 0 to 4) is signalled to select one value from the candidate set and an index (e.g. index_theta2, ranging from 0 to 4) is signalled to select one value from the candidate set.

In one sub-embodiment, a syntax is signalled at the block-level, SPS-level, PPS-level, APS-level, PH-level, and/or SH-level syntax to indicate a combination of theta1 and theta2.

- Theta1 and theta 2 are selected from a candidate set including {0, 1, 2, 4, 8} . The candidate combinations of theta1 and theta2, denoted as (theta1, theta2) , can be

○ (0, 0) , (0, 1) , (0, 2) , (0, 4) , (0, 8) , (1, 0) , (1, 1) , (1, 2) , (1, 4) , (1, 8) , (2, 0) , (2, 1) , (2, 2) , (2, 4) , (2, 8) , (4, 0) , (4, 1) , (4, 2) , (4, 4) , (4, 8) , (8, 0) , (8, 1) , (8, 2) , (8, 4) , (8, 8) . (The number of candidate combinations can be reduced by other proposed methods in this invention. )

- An index (ranging from 0 to the number of candidate combinations-1) is signalled.

○ In one way, the index can be signalled with truncated unary coding.

○ In another way, the index can be context-coding.

○ In another way, the candidate combinations are ordered with their template costs in an ascending order to form a reordered list. (Template cost measurement can be referenced in the section related to implicit derivation rule in this invention. ) The signalled index refers to the position of the used combination in the reordered list. Both encoder and decoder construct the same reordered list based on the template.

■ The candidate combination with smallest template cost uses the shortest codewords among all candidate combinations.

In another embodiment, the selection of theta1 and theta2 depends on implicit derivation.

In one sub-embodiment, template matching is used as the implicit derivation rule:

- Step1: A template (or a neighbouring region of the current block, which was encoded or decoded before the current block) is used to measure the cost for each candidate combination of theta1 and theta2. For example, theta1 and theta 2 are selected from a candidate set including {0, 1, 2, 4, 8} . The candidate combinations of theta1 and theta2, denoted as (theta1, theta2) , can be

- Step2: For each candidate combination, a template cost is calculated according to the distortion between the “prediction” and reconstruction of the template.

○ The “prediction” is generated by applying GPM with blending (i.e., using the candidate combination) to the template. As shown in Fig. 31, the partition line is extended to the template.

○ The distortion can be SATD, SAD, MSE, SSE, or any distortion measurement equations/metrics.

- Step3: theta1 and theta2 are implicitly set by the combination with the smallest template cost.

In another embodiment, GPM variations/extensions can be any inter or intra modes which

- Split the current block into two regions with a split direction (i.e., a partition mode)

- Generate multiple hypotheses of prediction by different prediction modes, respectively

○ A prediction mode refers to a motion candidate, a motion information derived from one or more motion candidates, an intra prediction mode, …)

- Combine the multiple hypotheses of prediction to form prediction of the current block

○ Combining with sample-based weighting. That is, each sample will derive its own weight

○ For the samples near the split direction, combining weight for each hypothesis of prediction is not zero. That is, the predicted samples near the split direction are the combination from the predicted samples based on one prediction mode and the predicted samples based on another prediction mode.

In one sub-embodiment, a GPM variation/extension refers to GPM-MMVD, GPM-TM, GPM-intra, or SGPM.

In another sub-embodiment, a GPM variation/extension refers to GPM-intra or SGPM.

In another embodiment, for a block coded with GPM and/or any one of GPM variations/extensions, a joint index is used to indicate a combination of “a partition mode and one or more prediction modes for multiple hypotheses of prediction” , a combination of “a subset from the partition mode and the one or more prediction modes for multiple hypotheses of prediction” or a combination of “more than one prediction mode for multiple hypotheses of prediction” .

In one sub-embodiment, the block is coded with SGPM. For example, the combination includes a partition mode and two intra prediction modes. For another example, the combination includes two intra prediction modes.

In another sub-embodiment, the block is coded with GPM-intra. For example, the combination includes a partition mode, a motion candidate/information, and an intra prediction mode. For another example, the combination includes a motion candidate/information and an intra prediction mode. For another example, the combination includes a partition mode and an intra prediction mode. For another example, the combination includes a partition mode and a motion candidate/information.

In another sub-embodiment, a combination list is reordered according to a template matching-based method. For example, the combination list including a partition mode, a motion candidate/information, and an intra prediction mode is reordered according to a template matching-based method. For another example, the combination list including a motion candidate/information and an intra prediction mode is reordered according to a template matching-based method, and the template matching costs are determined for the combination list and a signalled partition mode. For another example, the combination list including a partition mode and an intra prediction mode is reordered according to a template matching-based method, and the template matching costs are determined for a combination list and a signalled motion candidate/information. For another example, the combination list including a partition mode and a motion candidate/information is reordered according to a template matching-based method, and the template matching costs are determined for a combination list and a signalled intra prediction mode.

In another sub-embodiment, the joint index is signalled/parsed in the bitstream. For example, the joint index is coded with truncated unary codewords. The following examples show examples of the combination with the shortest codewords.

○ In one example, when the block is coded with GPM-intra, the combination with the shortest codewords contains a pre-defined mode.

■ The pre-defined mode can be a motion candidate/information with merge index equal to M, where M can be a positive integer, such as 0, 1, …, or (size of merge candidate list -1) .

■ The pre-defined mode can be an intra prediction mode such as one of planar, DC, horizontal, vertical, parallel mode, and perpendicular mode.

■ The pre-defined mode can be a partition mode with vertical direction, horizontal direction, or diagonal direction.

○ For another example, the pre-defined rule depends on the block width, height, area, neighbouring mode information.

In another sub-embodiment, the joint index indicates a combination from a combination list. For one example, the order in the combination list implies the signalling priority order of the combinations. That is, the combination at the first position in the combination list is signalled/parsed with the shortest codewords among all combinations. For another example, the combination at the first position in the combination list is predefined.

○ In one way, one of planar, DC, horizontal, vertical, parallel mode, and perpendicular mode is predefined at the first position in the combination list when the current block is GPM-intra or SGPM.

○ In another way, the pre-defined rule depends on the block width, height, area, neighbouring mode information.

For another example, the syntax for indicating the combination at the first position in the combination list is coded with one or more contexts.

○ In one way, the context selection depends on the block width, height, area, neighbouring mode information.

○ In another way, the one or more used contexts are not reused by the remaining combinations in the combination list.

For another example, the syntax for indicating the combinations at the non-first position in the combination list is not coded with contexts.

In another sub-embodiment, the joint index indicates a combination from a reordered combination list according to a template matching-based method. For one example, the order in the reordered combination list implies the signalling priority order of the combinations. That is, the combination at the first position in the combination list is signalled/parsed with the shortest codewords among all combinations. For another example, the syntax for indicating the combination at the first position in the combination list is coded with one or more contexts.

In another embodiment, for a block coded with GPM and/or any one of GPM variations/extensions, an index is used to indicate a reordered list of prediction mode according to a template matching-based method, where the template matching cost is calculated for a prediction mode in the list of prediction mode, a signalled partition mode, and another prediction mode by another signalled index to indicate a list of another prediction mode.

In another sub-embodiment, the block is coded with GPM-intra. For example, a list of prediction mode contains one or more intra prediction modes, and the list is reordered according to a template matching-based method where the template matching cost is calculated for an intra prediction mode in the list of intra prediction modes, a signalled partition mode, and an inter prediction mode where an inter prediction mode is determined by a signalled index to indicate a list of inter prediction modes.

In another embodiment, the design between GPM modes (e.g. GPM and/or different GPM variations/extensions) is proposed to be unified. The benefit is that with the unified design, the circuit can be reused by GPM and/or different GPM variations/extensions.

In another sub-embodiment, the unified design refers to the blending design (e.g. adaptive blending process) . For example, the candidate set used in an adaptive blending process can be unified.

○ In one way, the candidates in the set are unified.

■ With unified, the candidates for the first unified GPM mode are the same as a subset of the candidates for the second unified GPM mode.

○ In another way, the numbers of candidates in the set are unified.

■ With unified, the number of candidates for the first unified GPM mode is the same as the number of candidates for the second unified GPM mode.

For another example, the selection rule to pick one candidate from the candidate set used in an adaptive blending process can be unified.

○ In one way, the selection rule for the first unified GPM mode is the same as the selection rule of the candidates for the second unified GPM mode.

○ In another way, the selection rule depends on signalling/parsing in the bitstream.

○ In another way, the selection rule depends on the block width, block height, block area, or neighbouring mode information.

In another sub-embodiment, the unified design refers to generation of a candidate list (e.g. used to get a prediction mode for a hypothesis of prediction) . For example, the candidate list is used to generate the hypothesis of intra prediction for GPM-Intra and to generate one or more hypotheses of intra prediction for SGPM.

○ In one way, the candidate list is IPM candidate list.

○ In another way, the candidate list is the MPM list used in normal intra mode (e.g. intra mode coding with 67 intra prediction modes or any extension from 67 intra prediction modes such as 131 intra prediction modes) or any subset of the MPM list.

■ The subset can be first N candidates in the MPM list used in normal intra mode where N can be any positive integer such as 1, 2, 3, 4, 5, 6, …or (size_of_MPM_list-1) .

○ In another way, the candidate list includes neighbouring mode information (e.g. neighbouring intra prediction mode) , where the neighbouring blocks can be one or more than one of the following, as shown in Fig. 32.

■ Left neighbouring block (L) or any block adjacent to the left boundary of the current block

■ Above neighbouring block (A) or any block adjacent to the above boundary of the current block

■ Below left neighbouring block (BL)

■ Above right neighbouring block (AR)

■ Above left neighbouring block (AL)

○ In another way, the candidate list includes one or more DIMD intra prediction modes (i.e., with two tallest histogram bars)

○ In another way, the candidate list includes one or more of DC, HOR, and VER.

For another example, the order in the candidate list implies the signalling priority order of the candidates. That is, the candidate at the first position in the list is signalled/parsed with the shortest codewords among all candidates. For another example, the candidate at the first position in the candidate list is predefined.

○ In one way, one of planar, DC, horizontal, vertical, parallel mode, and perpendicular mode is predefined at the first position in the candidate list when the current block is GPM-intra or SGPM.

For another example, the syntax for indicating the candidate at the first position in the candidate list is coded with one or more contexts.

○ In another way, the one or more used contexts are not reused by the remaining candidates in the candidate list.

For another example, the syntax for indicating the candidates at the non-first position in the candidate list is not coded with contexts.

The proposed methods in this invention can be unified with multiple blending tools. For example, the proposed methods used for GPM, GPM extension, and/or spatial GPM are unified.

In one embodiment, the proposed methods in this invention can only be applied to some predefined partition lines among all candidate partition lines.

The proposed methods in this invention can be enabled and/or disabled according to implicit rules (e.g. block width, height, or area) or according to explicit rules (e.g. syntax on the block, tile, slice, picture, SPS, or PPS level) . For example, the proposed method is applied when the block area is larger than a threshold. For another example, the proposed method is applied when the longer block side is larger than or equal to a threshold (e.g. 2) multiplied by the shorter block side.

The term “block” in this invention can refer to TU/TB, CU/CB, PU/PB, a predefined region, or CTU/CTB.

AMVP in this invention is like “AMVP” in JVET-T2002 (VVC tool description) . AMVP motion is from a motion candidate with syntax “merge flag” equal to false (e.g. general_merge_flag in VVC equal to false) .

Any combination of the proposed methods in this invention can be applied.

Any of the foregoing proposed blended prediction methods with a shared candidate list for GPM intra prediction and SGPM can be implemented in encoders and/or decoders. For example, any of the proposed blended prediction methods with a shared candidate list for GPM intra prediction and SGPM can be implemented in an intra/inter coding module (Intra Pred. 150 and/or Inter Pred. 112 in Fig. 1A) of an encoder, an intra prediction module (Intra Pred. 150 in Fig. 1B) and/or a motion compensation module (MC 152 in Fig. 1B) of a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the intra/inter coding module of an encoder and/or motion compensation module, a merge candidate derivation module of the decoder.

Fig. 33 illustrates a flowchart of an exemplary video coding system that utilizes flexible partition with two line sections according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to this method, pixel data associated with a current block are received at an encoder side or coded data associated with the current block to be decoded are received at a decoder side in step 3310. The current block is partitioned into a first region and a second region according to a partition line in step 3320, wherein the partition line comprises at least two partitioning candidates and at least one of said at least two partitioning candidates corresponds to predefined splitting modes. The first region is encoded or decoded using a first coding mode in step 3330. The second region is encoded or decoded using a second coding mode in step 3340.

The flowchart shown is intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, re-arranges the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.

The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced.

Embodiment of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuit circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or field programmable gate array (FPGA) . These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.

The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

A method of video coding, the method comprising:

receiving pixel data associated with a current block at an encoder side or coded data associated with the current block to be decoded at a decoder side;

partitioning the current block into a first region and a second region according to a partition line, wherein the partition line comprises at least two partitioning candidates and at least one of said at least two partitioning candidates corresponds to predefined splitting modes;

encoding or decoding the first region using a first coding mode; and

encoding or decoding the second region using a second coding mode.
The method of Claim 1, wherein one of the pre-defined splitting modes corresponds to horizontal, vertical, diagonal, or inverse diagonal splitting.
The method of Claim 1, wherein the partition line is determined by applying a max operation or a min operation on weighting values from said at least two partitioning candidates.
The method of Claim 1, wherein the partition line is generated by two existing partition candidates.
The method of Claim 1, wherein said partitioning the current block according to the partition line is allowed when first angle index and second angle index associated with said at least two partitioning candidates are different.
The method of Claim 5, wherein said partitioning the current block according to the partition line is allowed when difference between the first angle index and the second angle index is smaller than a threshold.
The method of Claim 1, wherein said partitioning the current block according to the partition line is allowed when width, height, and/or area of the first region or the second region is larger than a threshold.
The method of Claim 1, wherein signalling of the partition line comprises signalling an orientation to indicate starting splitting of the partition line.
The method of Claim 8, wherein the orientation corresponds to vertical orientation or horizontal orientation.
The method of Claim 1, wherein signalling of the partition line comprises signalling a region position to indicate positions of the first region and the second region.
The method of Claim 1, wherein signalling of the partition line comprises signalling a direction to indicate turning direction of an intersection point of the partitioning candidates.
The method of Claim 1, wherein signalling of the partition line sections comprises signalling an intersection position.
The method of Claim 12, wherein said signalling of the partition line sections further comprises signalling a corner covered by the first region.
The method of Claim 1, wherein said at least two partitioning candidates correspond to only two predefined splitting modes including horizontal splitting and vertical splitting.
An apparatus for video coding, the apparatus comprising one or more electronics or processors arranged to:

receive pixel data associated with a current block at an encoder side or coded data associated with the current block to be decoded at a decoder side, wherein the current block is coded using one of coding tools including a first coding tool and a second coding tool;

partition the current block into a first region and a second region according to a partition line, wherein the partition line comprises at least two partitioning candidates and at least one of said at least two partitioning candidates corresponds to predefined splitting modes;

encode or decode the first region using a first coding mode; and

encode or decode the second region using a second coding mode.