CN113273216B - Improvement of MMVD - Google Patents

Improvement of MMVD

Info

Publication number
CN113273216B
CN113273216B
Authority
CN
China
Prior art keywords
precision
video block
block
motion
prediction
Prior art date
Legal status
Active
Application number
CN202080008062.0A
Other languages
Chinese (zh)
Other versions
CN113273216A (en)
Inventor
刘鸿彬
张莉
张凯
王悦
Current Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Original Assignee
Beijing ByteDance Network Technology Co Ltd
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd and ByteDance Inc
Publication of CN113273216A
Application granted
Publication of CN113273216B
Legal status: Active
Anticipated expiration

Classifications

    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This application relates to improvements of MMVD (Merge mode with motion vector difference). A method for video processing includes: determining one or more parameters of a current video block during a conversion between the current video block and a bitstream representation of the current video block, wherein the one or more parameters of the current video block include at least one of a dimension and a prediction direction of the current video block; determining MMVD side information based on at least the one or more parameters of the current video block; and performing the conversion based at least on the MMVD side information; wherein the MMVD mode uses a motion vector expression that includes a motion direction, a motion magnitude distance, and a starting point serving as a base Merge candidate of the current video block.

Description

Improvement of MMVD
This application is the Chinese national phase of international patent application No. PCT/CN2020/071848, filed on January 13, 2020, which claims in a timely manner the priority of and benefit from international patent application No. PCT/CN2019/071503, filed on January 12, 2019. The entire disclosure of the above application is incorporated herein by reference as part of the disclosure of this application.
Technical Field
This document relates to video coding and decoding techniques.
Background
Digital video accounts for the largest share of bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth demand for digital video usage is expected to continue to grow.
Disclosure of Invention
The disclosed techniques may be used by video decoder or encoder embodiments in which interpolation is improved by using a block-shape-dependent interpolation order technique.
In one example aspect, a method of video bitstream processing is disclosed. The method comprises the following steps: the method includes determining a shape of a first video block, determining an interpolation order based on the shape of the first video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed, and performing horizontal interpolation and vertical interpolation on the first video block in the sequence according to the interpolation order to reconstruct a decoded representation of the first video block.
In another example aspect, a method of video bitstream processing includes: the method includes determining a characteristic of a motion vector associated with a first video block, determining an interpolation order based on the characteristic of the motion vector, the interpolation order indicating a sequence in which horizontal and vertical interpolation is performed, and performing the horizontal and vertical interpolation on the first video block in the sequence according to the interpolation order to reconstruct a decoded representation of the first video block.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, by a processor, dimensional characteristics of a first video block; determining, by the processor, to apply a first interpolation filter to the first video block based on the determination of the dimensional characteristic; and performing further processing of the first video block using the first interpolation filter.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, by a processor, a first characteristic of a first video block; determining, by the processor, to apply a first interpolation filter to the first video block based on the first characteristic; performing further processing of the first video block using a first interpolation filter; determining, by the processor, a second characteristic of the second video block; determining, by the processor, to apply a second interpolation filter to the second video block based on the second characteristic, the first interpolation filter and the second interpolation filter being different short tap filters; and performing further processing of the second video block using a second interpolation filter.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, by a processor, a characteristic of the first video block, the characteristic comprising one or more of dimensional information of the first video block, a prediction direction of the first video block, or motion information of the first video block; rounding a Motion Vector (MV) associated with the first video block to integer-pixel precision or half-pixel precision based on the determination of the characteristic of the first video block; and performing further processing of the first video block using the rounded motion vector.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining, by a processor, that a first video block is coded and decoded with a Merge mode; rounding motion information associated with the first video block to integer precision to generate modified motion information based on a determination that the first video block is coded using Merge mode; and performing motion compensation processing of the first video block using the modified motion information.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic being one or both of a size of the first video block or a shape of the first video block; modifying a motion vector associated with the first video block to integer pixel precision or half pixel precision to generate a modified motion vector; and performing further processing of the first video block using the modified motion vector.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic being one or both of a dimension of a size of the first video block or a prediction direction of the first video block; determining MMVD side information based on the determination of the characteristic of the first video block; and performing further processing of the first video block using the MMVD side information.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic being one or both of a size of the first video block or a shape of the first video block; modifying a motion vector associated with the first video block to integer pixel precision or half pixel precision to generate a modified motion vector; and performing further processing of the first video block using the modified motion vector.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic being one or both of a size of the first video block or a shape of the first video block; determining a threshold number of half-pixel Motion Vector (MV) components or quarter-pixel MV components to restrict based on a determination of a characteristic of the first video block; and performing further processing of the first video block using the threshold number.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a characteristic of the first video block, the characteristic comprising a size of the first video block; modifying a Motion Vector (MV) associated with the first video block from fractional precision to integer precision based on the determination of the characteristic of the first video block; and performing motion compensation of the first video block using the modified MV.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining a first dimension of a first video block; determining a first precision of a Motion Vector (MV) associated with the first video block based on the determination of the first dimension; determining a second dimension of the second video block, the first dimension and the second dimension being different dimensions; determining a second precision of the MV relative to the second video block based on the determination of the second dimension, the first precision and the second precision being different precisions; and performing further processing of the first video block using the first dimension and performing further processing of the second video block using the second dimension.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining one or more parameters of a current video block during a conversion between the current video block and a bitstream representation of the current video block, wherein the one or more parameters of the current video block include at least one of a dimension and a prediction direction of the current video block; determining MMVD (Merge mode with motion vector difference) side information based at least on the one or more parameters of the current video block; and performing the conversion based at least on the MMVD side information; wherein the MMVD mode uses a motion vector expression including a motion direction, a motion magnitude distance, and a starting point serving as a base Merge candidate of the current video block.
In another example aspect, a method for video bitstream processing is disclosed. The method comprises the following steps: determining one or more parameters of a current video block during a conversion between the current video block and a bitstream representation of the current video block, wherein the one or more parameters of the current video block include at least one of a size of the current video block and a shape of the current video block; determining a motion vector precision for the current video block based at least on the one or more parameters of the current video block; and performing the conversion based on the determined precision, wherein the current video block is converted in the MMVD mode, and the MMVD mode uses a motion vector expression including a motion direction, a motion magnitude distance, and a starting point serving as a base Merge candidate of the current video block.
In another example aspect, the above method may be implemented by a video decoder apparatus comprising a processor.
In another example aspect, the above-described method may be implemented by a video encoder apparatus that includes a processor for decoding encoded video during a video encoding process.
In yet another example aspect, the methods may be embodied in the form of processor-executable instructions and stored on a computer-readable program medium.
These and other aspects are further described herein.
Drawings
Fig. 1 is a diagram of a quadtree plus binary tree (QTBT) structure.
Figure 2 illustrates an example derivation process for the Merge candidate list construction.
Fig. 3 shows example positions of spatial domain Merge candidates.
Fig. 4 shows an example of a candidate pair considering redundancy check for spatial domain Merge candidates.
Fig. 5a and 5b show example locations of second Prediction Units (PUs) of N × 2N and 2N × N partitions.
Fig. 6 is a diagram of motion vector scaling of temporal Merge candidates.
Fig. 7 shows example candidate positions C0 and C1 for the time domain Merge candidate.
Fig. 8 shows an example of combining bidirectional prediction Merge candidates.
Fig. 9 shows an example of the derivation process of the motion vector prediction candidate.
Fig. 10 is a diagram of motion vector scaling of spatial motion vector candidates.
Fig. 11 shows an example of Alternative Temporal Motion Vector Prediction (ATMVP) motion prediction for a Coding Unit (CU).
Fig. 12 shows an example of one CU with four sub-blocks (a-D) and their neighboring blocks (a-D).
Fig. 13 shows the non-adjacent Merge candidates proposed in J0021.
Fig. 14 shows the non-adjacent Merge candidate proposed in J0058.
Fig. 15 shows a non-adjacent Merge candidate proposed in J0059.
Fig. 16 shows an example of integer sample and fractional sample positions for quarter-sample luminance interpolation.
Fig. 17 is a block diagram of an example of a video processing apparatus.
Fig. 18 shows a block diagram of an example implementation of a video encoder.
Fig. 19 is a flowchart of an example of a video bitstream processing method.
Fig. 20 is a flowchart of an example of a video bitstream processing method.
Fig. 21 shows an example of repeated boundary pixels of a reference block before interpolation.
Fig. 22 is a flowchart of an example of a video bitstream processing method.
Fig. 23 is a flowchart of an example of a video bitstream processing method.
Fig. 24 is a flowchart of an example of a video bitstream processing method.
Fig. 25 is a flowchart of an example of a video bitstream processing method.
Detailed Description
Various techniques are provided herein that may be used by a decoder of a video bitstream to improve the quality of decompressed or decoded digital video. In addition, the video encoder may also implement these techniques during the encoding process in order to reconstruct the decoded frames for further encoding.
For ease of understanding, section headings are used herein and do not limit embodiments and techniques to the respective sections. Thus, embodiments from one section may be combined with embodiments from other sections.
1. Overview
The present invention relates to video coding techniques. In particular, it relates to interpolation in video codecs. It may be applied to existing video codec standards, such as HEVC, or to the standard to be finalized (Versatile Video Coding, VVC). It may also be applicable to future video coding standards or video codecs.
2. Background of the invention
Video codec standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and the H.265/HEVC standard. Since H.262, video codec standards have been based on a hybrid video coding structure in which temporal prediction plus transform coding is utilized. To explore future video codec technologies beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, many new methods have been adopted by JVET and put into a reference software named the Joint Exploration Model (JEM). In April 2018, the Joint Video Experts Team (JVET) between VCEG (Q6/16) and ISO/IEC JTC1 SC29/WG11 (MPEG) was created to work on the VVC standard, targeting a 50% bitrate reduction compared to HEVC.
Fig. 18 is a block diagram of an exemplary implementation of a video encoder.
2.1 Quadtree plus binary tree (QTBT) block structure with larger CTUs
In HEVC, various local characteristics are accommodated by dividing the CTUs into CUs using a quadtree structure (denoted as a coding tree). At the CU level, it is decided whether to encode or decode a picture region using inter-picture (temporal) prediction or intra-picture (spatial) prediction. Each CU may be further divided into one, two, or four PUs depending on the partition type of the PU. In one PU, the same prediction process is applied and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU partition type, the CU may be partitioned into Transform Units (TUs) according to another quadtree structure similar to a coding tree of the CU. An important feature of the HEVC structure is that it has multiple partitioning concepts, including CU, PU, and TU.
The QTBT structure eliminates the concept of multiple partition types. That is, it removes the separation of the CU, PU, and TU concepts and supports more flexibility of CU partition shapes. In the QTBT block structure, a CU can be square or rectangular. As shown in fig. 1, a coding tree unit (CTU) is first divided by a quadtree structure. The quadtree leaf nodes are further partitioned by a binary tree structure. There are two partition types in binary tree partitioning: symmetric horizontal splitting and symmetric vertical splitting. The binary tree leaf nodes are called coding units (CUs), and this partitioning is used for prediction and transform processing without any further partitioning. This means that the CU, PU, and TU have the same block size in the QTBT coding structure. In JEM, a CU sometimes consists of coding blocks (CBs) of different color components; e.g., in P and B slices of the 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs. A CU sometimes consists of CBs of a single component; e.g., in the case of I slices, one CU contains only one luma CB or only two chroma CBs.
The following parameters are defined for the QTBT segmentation scheme.
CTU size: the root node size of the quadtree is the same as the concept in HEVC.
MinQTSize: minimum allowed quadtree leaf node size
MaxBTSize: maximum allowed binary tree root node size
MaxBTDepth: maximum allowed binary tree depth
MinBTSize: minimum allowed binary tree leaf node size
In one example of the QTBT partition structure, the CTU size is set to 128 × 128 luma samples with two corresponding 64 × 64 blocks of chroma samples, MinQTSize is set to 16 × 16, MaxBTSize is set to 64 × 64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. The quadtree partitioning is first applied to the CTU to generate quadtree leaf nodes. The quadtree leaf nodes may have sizes from 16 × 16 (i.e., the MinQTSize) to 128 × 128 (i.e., the CTU size). If a leaf quadtree node is 128 × 128, it will not be further split by the binary tree because its size exceeds the MaxBTSize (i.e., 64 × 64). Otherwise, the leaf quadtree node can be further partitioned by the binary tree. Therefore, a quadtree leaf node is also the root node of a binary tree and has a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered. When a binary tree node has a width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered. Similarly, when a binary tree node has a height equal to MinBTSize, no further vertical splitting is considered. The leaf nodes of the binary tree are further processed by prediction and transform processing without any further partitioning. In JEM, the maximum CTU size is 256 × 256 luma samples.
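To make the interplay of these parameters concrete, the following Python sketch (illustrative only; the function name and the simplified split rules are assumptions, not part of any standard text) checks which further partitions a node may use under the example settings above.

```python
def allowed_splits(width, height, bt_depth,
                   min_qt_size=16, max_bt_size=64,
                   max_bt_depth=4, min_bt_size=4):
    """Return the further partitions a QTBT node may use (simplified sketch)."""
    splits = []
    # Quadtree splitting: only before any binary split, and only while the
    # resulting leaves would stay at or above MinQTSize.
    if bt_depth == 0 and width == height and width > min_qt_size:
        splits.append("QT")
    # Binary splitting: the node must not exceed MaxBTSize and the binary-tree
    # depth limit must not have been reached.
    if width <= max_bt_size and height <= max_bt_size and bt_depth < max_bt_depth:
        if width > min_bt_size:
            splits.append("BT_VER")   # symmetric vertical split (width is halved)
        if height > min_bt_size:
            splits.append("BT_HOR")   # symmetric horizontal split (height is halved)
    return splits

# A 128x128 quadtree leaf exceeds MaxBTSize (64), so only quadtree splitting remains.
print(allowed_splits(128, 128, bt_depth=0))   # ['QT']
print(allowed_splits(64, 64, bt_depth=0))     # ['QT', 'BT_VER', 'BT_HOR']
```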
Fig. 1 shows an example of a block partitioned by using QTBT, and fig. 1 (right side) shows the corresponding tree representation. The solid lines indicate quad-tree partitions and the dashed lines indicate binary tree partitions. In each partition (i.e., non-leaf) node of the binary tree, a flag is signaled to indicate which partition type (i.e., horizontal or vertical) is used, where 0 represents horizontal partition and 1 represents vertical partition. For the quad-tree partition, there is no need to indicate the partition type, because the quad-tree partition always divides one block horizontally and vertically at the same time to generate 4 sub-blocks of the same size.
Furthermore, the QTBT scheme supports the ability for luminance and chrominance to have separate QTBT structures. Currently, luminance and chrominance CTBs in one CTU share the same QTBT structure for P-and B-stripes. However, for the I-slice, the luma CTB is partitioned into CUs with a QTBT structure and the chroma CTB is partitioned into chroma CUs with another QTBT structure. This means that a CU in an I-slice consists of a codec block for a luma component or a codec block for two chroma components, and a CU in a P-slice or a B-slice consists of a codec block for all three color components.
In HEVC, to reduce memory access for motion compensation, inter prediction of small blocks is restricted such that 4 × 8 and 8 × 4 blocks do not support bi-prediction and 4 × 4 blocks do not support inter prediction. In the QTBT of JEM, these restrictions are removed.
2.2 Inter prediction in HEVC/H.265
Each inter-predicted PU has motion parameters of one or two reference picture lists. The motion parameters include a motion vector and a reference picture index. The use of one of the two reference picture lists can also be signaled using inter _ pred _ idc. Motion vectors can be explicitly coded as deltas relative to the predictor.
When a CU is coded in skip mode, one PU is associated with the CU and has no significant residual coefficients, no motion vector delta or reference picture index to code. A Merge mode is specified by which the motion parameters of a current PU may be obtained from neighboring PUs (including spatial and temporal candidates). The Merge mode may be applied to any inter-predicted PU, not just the skip mode. Another option for the Merge mode is the explicit transmission of motion parameters, where motion vectors (more precisely, motion vector differences compared to motion vector predictors), corresponding reference picture indices for each reference picture list, and the use of reference picture lists are all explicitly signaled per PU. In this disclosure, such a mode is named Advanced Motion Vector Prediction (AMVP).
When the signaling indicates that one of the two reference picture lists is to be used, the PU is generated from one block of samples. This is called "uni-directional prediction". Uni-directional prediction is available for both P slices and B slices.
When the signaling indicates that two reference picture lists are to be used, the PU is generated from two blocks of samples. This is called "bi-prediction". Bi-directional prediction is available only for B slices.
Details regarding inter prediction modes specified in HEVC are provided below. The description will start with the Merge mode.
2.2.1 Merge mode
2.2.1.1 derivation of candidates for Merge mode
When predicting a PU using the Merge mode, the index pointing to an entry in the Merge candidate list is parsed from the bitstream and motion information is retrieved using the index. The construction of this list is specified in the HEVC standard and can be summarized in the following sequence of steps:
step 1: initial candidate derivation
Step 1.1: spatial domain candidate derivation
Step 1.2: spatial domain candidate redundancy check
Step 1.3: time domain candidate derivation
Step 2: additional candidate insertions
Step 2.1: creating bi-directional prediction candidates
Step 2.2: inserting zero motion candidates
These steps are also schematically shown in fig. 2. For spatial Merge candidate derivation, a maximum of four Merge candidates are selected among candidates located at five different positions. For temporal Merge candidate derivation, at most one Merge candidate is selected between two candidates. Since a constant number of candidates per PU is assumed at the decoder, additional candidates are generated when the number of candidates obtained from step 1 does not reach the maximum number of Merge candidates (MaxNumMergeCand) signaled in the slice header. Since the number of candidates is constant, the index of the best Merge candidate is encoded using truncated unary binarization (TU). If the size of the CU is equal to 8, all PUs of the current CU share a single Merge candidate list, which is identical to the Merge candidate list of the 2N × 2N prediction unit.
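As a side note, the truncated unary (TU) binarization mentioned above can be sketched as follows (an illustrative helper, not text from the standard); with a fixed list length, the largest index needs no terminating bit.

```python
def truncated_unary(index, max_index):
    """Truncated unary binarization of `index`, given the largest possible index."""
    if index < max_index:
        return "1" * index + "0"      # `index` ones followed by a terminating zero
    return "1" * max_index            # the largest index omits the terminator

# With MaxNumMergeCand = 5, the largest Merge index is 4:
for i in range(5):
    print(i, truncated_unary(i, max_index=4))
# 0 -> 0, 1 -> 10, 2 -> 110, 3 -> 1110, 4 -> 1111
```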
Hereinafter, operations associated with the foregoing steps are described in detail.
2.2.1.2 spatial domain candidate derivation
In the derivation of spatial Merge candidates, a maximum of four Merge candidates are selected among candidates located at the positions shown in fig. 3. The order of derivation is A1, B1, B0, A0, and B2. Position B2 is considered only when any PU at positions A1, B1, B0, A0 is not available (e.g., because it belongs to another slice or tile) or is intra coded. After the candidate at position A1 is added, the addition of the remaining candidates is subject to a redundancy check, which ensures that candidates with the same motion information are excluded from the list, thereby improving coding efficiency. To reduce computational complexity, not all possible candidate pairs are considered in the mentioned redundancy check. Instead, only the pairs linked with arrows in fig. 4 are considered, and a candidate is added to the list only if the corresponding candidate used for the redundancy check does not have the same motion information. Another source of duplicated motion information is the "second PU" associated with partitions other than 2N × 2N. As an example, fig. 5 depicts the second PU for the N × 2N and 2N × N cases, respectively. When the current PU is partitioned as N × 2N, the candidate at position A1 is not considered for list construction. Indeed, adding this candidate would lead to two prediction units having the same motion information, which is redundant to having just one PU in the coding unit. Similarly, position B1 is not considered when the current PU is partitioned as 2N × N.
2.2.1.3 time-domain candidate derivation
In this step, only one candidate is added to the list. In particular, in the derivation of this temporal domain Merge candidate, a scaled motion vector is derived based on the collocated PU belonging to the picture with the smallest POC difference from the current picture in a given reference picture list. The reference picture lists used to derive the collocated PUs are explicitly signaled in the slice header. A scaled motion vector of the temporal-domain Merge candidate is obtained (as indicated by the dashed line in fig. 6), which is scaled from the motion vector of the collocated PU using POC distances tb and td, where tb is defined as the POC difference between the reference picture of the current picture and the current picture, and td is defined as the POC difference between the reference picture of the collocated picture and the collocated picture. The reference picture index of the temporal region Merge candidate is set to zero. The actual implementation of the scaling process is described in the HEVC specification. For B slices, two motion vectors are obtained (one for reference picture list 0 and the other for reference picture list 1) and combined to make it a bi-predictive Merge candidate.
Fig. 6 is a diagram of motion vector scaling of temporal Merge candidates.
In the collocated PU (Y) belonging to the reference frame, the position for the temporal candidate is selected between candidates C0 and C1, as shown in fig. 7. If the PU at position C0 is not available, is intra coded, or is outside the current CTU row, position C1 is used. Otherwise, position C0 is used in the derivation of the temporal Merge candidate.
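The scaling by the POC distances tb and td described above can be illustrated with the simplified sketch below (the helper is hypothetical and uses plain rounding; the actual HEVC specification uses a clipped fixed-point approximation of tb/td).

```python
def scale_temporal_mv(mv, tb, td):
    """Scale a collocated motion vector by the POC-distance ratio tb/td (sketch).

    tb: POC difference between the reference picture of the current picture
        and the current picture.
    td: POC difference between the reference picture of the collocated picture
        and the collocated picture.
    """
    mvx, mvy = mv
    return (round(mvx * tb / td), round(mvy * tb / td))

# The collocated MV spans a POC distance of 4, while the current reference is
# only 2 pictures away, so the MV is roughly halved.
print(scale_temporal_mv((8, -6), tb=2, td=4))   # (4, -3)
```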
2.2.1.4 additional candidate insertions
In addition to spatial and temporal Merge candidates, there are two additional types of Merge candidates: the bidirectional prediction Merge candidate and the zero Merge candidate are combined. The combined bidirectional prediction Merge candidates are generated using spatial and temporal Merge candidates. The combined bi-directional predicted Merge candidates are only for B slices. A combined bi-directional prediction candidate is generated by combining the first reference picture list motion parameters of the initial candidate with the second reference picture list motion parameters of the other candidate. If these two tuples provide different motion hypotheses they will form new bi-directional prediction candidates. As an example, fig. 8 shows the situation where two candidates with MVL0 and refIdxL0 or MVL1 and refIdxL1 in the original list (on the left) are used to create a combined bi-predictive Merge candidate that is added to the final list (on the right). A number of rules are defined regarding combinations that are considered to generate these additional Merge candidates.
Zero motion candidates are inserted to fill the remaining entries in the Merge candidate list until the capacity of MaxNumMergeCand is reached. These candidates have zero spatial displacement and reference picture indices that start from zero and increase each time a new zero motion candidate is added to the list. The number of reference frames used by these candidates is one and two for uni-directional and bi-directional prediction, respectively. Finally, no redundancy check is performed on these candidates.
2.2.1.5 motion estimation regions for parallel processing
To speed up the encoding process, motion estimation may be performed in parallel, thereby deriving motion vectors for all prediction units within a given region simultaneously. The derivation of Merge candidates from spatial neighbors may interfere with parallel processing because a prediction unit cannot derive motion parameters from neighboring PUs until its associated motion estimation is complete. To mitigate the trade-off between coding efficiency and processing delay, HEVC defines a Motion Estimation Region (MER), the size of which is signaled in the picture parameter set using a "log 2_ parallel _ merge _ level _ minus 2" syntax element. When defining MER, the Merge candidates falling into the same region are marked as unavailable and are therefore not considered in the list construction.
2.2.2 AMVP
AMVP exploits the spatial-temporal correlation of motion vectors with neighboring PUs, which is used for the explicit transmission of motion parameters. For each reference picture list, a motion vector candidate list is first constructed by checking the availability of the left and above spatially neighboring PU positions and of temporally neighboring PU positions, removing redundant candidates, and adding zero vectors so that the candidate list has a constant length. The encoder can then select the best predictor from the candidate list and transmit the corresponding index indicating the selected candidate. Similar to Merge index signaling, the index of the best motion vector candidate is encoded using a truncated unary. The maximum value to be encoded in this case is 2 (see fig. 9). In the following sections, details about the derivation process of motion vector prediction candidates are provided.
2.2.2.1 derivation of AMVP candidates
Fig. 9 summarizes the derivation process of the motion vector prediction candidates.
In motion vector prediction, two types of motion vector candidates are considered: spatial motion vector candidates and temporal motion vector candidates. For the derivation of spatial motion vector candidates, two motion vector candidates are finally derived based on the motion vectors of each PU located at five different positions as shown in fig. 3.
For the derivation of temporal motion vector candidates, one motion vector candidate is selected from two candidates, which are derived based on two different collocated positions. After the first list of spatio-temporal candidates is made, the duplicate motion vector candidates in the list are removed. If the number of potential candidates is greater than two, the motion vector candidate with a reference picture index greater than 1 in the associated reference picture list is removed from the list. If the number of spatial-temporal motion vector candidates is less than two, additional zero motion vector candidates are added to the list.
2.2.2.2 spatial motion vector candidates
In deriving spatial motion vector candidates, a maximum of two candidates are considered among five potential candidates, which come from PUs at the positions shown in fig. 3; these positions are the same as those of the motion Merge. For the left side of the current PU, the order of derivation is defined as A0, A1, scaled A0, scaled A1. For the above side of the current PU, the order of derivation is defined as B0, B1, B2, scaled B0, scaled B1, scaled B2. Thus, for each side there are four cases that can be used as motion vector candidates: two cases that do not require spatial scaling and two cases that use spatial scaling. The four different cases are summarized as follows:
-spatial domain free scaling
(1) Same reference picture list, and same reference picture index (same POC)
(2) Different reference picture lists, but the same reference picture (same POC)
-spatial scaling
(3) Same reference picture list, but different reference pictures (different POCs)
(4) Different reference picture lists, and different reference pictures (different POCs)
The no-spatial-scaling cases are checked first, followed by the cases that allow spatial scaling. Spatial scaling is considered when the POC differs between the reference picture of the neighboring PU and that of the current PU, regardless of the reference picture list. If all PUs of the left candidates are not available or are intra coded, scaling for the above motion vector is allowed to help the parallel derivation of the left and above MV candidates. Otherwise, spatial scaling is not allowed for the above motion vector.
Fig. 10 is a graphical representation of the motion vector scaling of spatial motion vector candidates.
In the spatial scaling process, the motion vectors of neighboring PUs are scaled in a similar manner to the temporal scaling, as shown in fig. 10. The main difference is that a reference picture list and an index of the current PU are given as input; the actual scaling process is the same as the time domain scaling process.
2.2.2.3 temporal motion vector candidates
All derivation processes of the temporal domain Merge candidate are the same as those of the spatial motion vector candidate except for the derivation of the reference picture index (see fig. 7). The reference picture index is signaled to the decoder.
2.3 New inter Merge candidates in JEM
2.3.1 sub-CU-based motion vector prediction
In JEM with QTBT, each CU can have at most one set of motion parameters for each prediction direction. Two sub-CU level motion vector prediction methods are considered at the encoder by splitting a large CU into sub-CUs and deriving motion information for all the sub-CUs of the large CU. The Alternative Temporal Motion Vector Prediction (ATMVP) method allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference picture. In the spatial-temporal motion vector prediction (STMVP) method, the motion vectors of the sub-CUs are derived recursively by using the temporal motion vector predictor and spatial neighboring motion vectors.
In order to maintain a more accurate motion field for sub-CU motion prediction, motion compression of the reference frame is currently disabled.
2.3.1.1 alternative temporal motion vector prediction
In the Alternative Temporal Motion Vector Prediction (ATMVP) method, the temporal motion vector prediction (TMVP) is modified by fetching multiple sets of motion information (including motion vectors and reference indices) from blocks smaller than the current CU. As shown in fig. 11, a sub-CU is a square N × N block (N is set to 4 by default).
ATMVP predicts motion vectors of sub-CUs within a CU in two steps. The first step is to identify the corresponding block in the reference picture with a so-called temporal vector. The reference picture is called a motion source picture. The second step is to divide the current CU into sub-CUs and obtain the reference index and motion vector of each sub-CU from the corresponding block of each sub-CU, as shown in fig. 11.
In a first step, the reference picture and the corresponding block are determined from motion information of spatially neighboring blocks of the current CU. To avoid the repeated scanning process of the neighboring blocks, the first Merge candidate in the Merge candidate list of the current CU is used. The first available motion vector and its associated reference index are set as the temporal vector and the index to the motion source picture. In this way, in ATMVP, the corresponding block can be identified more accurately than in TMVP, where the corresponding block (sometimes referred to as a collocated block) is always located in the lower right or center position relative to the current CU.
In the second step, a corresponding block of each sub-CU is identified by the temporal vector in the motion source picture, by adding the temporal vector to the coordinates of the current CU. For each sub-CU, the motion information of its corresponding block (the smallest motion grid covering the center sample) is used to derive the motion information for the sub-CU. After the motion information of the corresponding N × N block is identified, it is converted to the motion vectors and reference indices of the current sub-CU, in the same way as the TMVP method of HEVC, where motion scaling and other procedures apply. For example, the decoder checks whether the low-delay condition is fulfilled (i.e., the POCs of all reference pictures of the current picture are smaller than the POC of the current picture) and possibly uses the motion vector MVx (the motion vector corresponding to reference picture list X) to predict the motion vector MVy for each sub-CU (with X equal to 0 or 1 and Y equal to 1-X).
2.3.1.2 space-time motion vector prediction
In this method, the motion vectors of the sub-CUs are recursively derived in raster scan order. Fig. 12 illustrates this concept. Consider an 8 × 8 CU, which contains four 4 × 4 sub-CUs a, B, C, and D. The adjacent 4x4 blocks in the current frame are labeled a, b, c, and d.
The motion derivation of sub-CU a begins by identifying its two spatial neighbors. The first neighbor is the nxn block (block c) above the sub-CU a. If this block c is not available or intra-coded, the other nxn blocks above the sub-CU a are examined (from left to right, starting at block c). The second neighbor is the block to the left of sub-CU a (block b). If block b is not available or intra-coded, the other blocks to the left of sub-CU a are checked (from top to bottom, starting at block b). The motion information obtained by each list from neighboring blocks is scaled to the first reference frame of a given list. Next, the Temporal Motion Vector Predictor (TMVP) of sub-block a is derived following the same procedure as the TMVP derivation specified in HEVC. The motion information of the collocated block at position D is extracted and correspondingly scaled. Finally, after retrieving and scaling the motion information, all available motion vectors (up to 3) are averaged for each reference list, respectively. The average motion vector is specified as the motion vector of the current sub-CU.
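The final averaging step of STMVP can be sketched as follows (a hypothetical helper, under the assumption that the neighbor MVs have already been scaled to the first reference frame of the list, as described above).

```python
def stmvp_average(candidate_mvs):
    """Average the available (already scaled) MVs of one reference list."""
    available = [mv for mv in candidate_mvs if mv is not None]
    if not available:
        return None                       # no motion information for this list
    n = len(available)
    return (round(sum(mv[0] for mv in available) / n),
            round(sum(mv[1] for mv in available) / n))

# Above neighbor, left neighbor (unavailable here), and TMVP of sub-CU A:
print(stmvp_average([(4, 0), None, (6, -2)]))   # (5, -1)
```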
2.3.1.3 sub-CU motion prediction mode signaling
The sub-CU modes are enabled as additional Merge candidates, and no additional syntax element is needed to signal these modes. Two additional Merge candidates are added to the Merge candidate list of each CU to represent the ATMVP mode and the STMVP mode. If the sequence parameter set indicates that ATMVP and STMVP are enabled, up to seven Merge candidates are used. The coding logic of the additional Merge candidates is the same as that of the Merge candidates in HM, which means that, for each CU in a P slice or B slice, two more RD checks are needed for the two additional Merge candidates.
In JEM, all bins of the Merge index are context coded by CABAC. However, in HEVC, only the first bin is context coded and the remaining bins are bypass coded.
2.3.2 non-neighboring Merge candidates
In J0021, Qualcomm proposes to derive additional spatial Merge candidates from non-adjacent neighboring positions (these positions are labeled 6 to 49, as shown in fig. 13). The derived candidates are added after the TMVP candidate in the Merge candidate list.
In J0058, it is proposed to derive additional spatial Merge candidates from positions in an outer reference region that has an offset of (-96, -96) relative to the current block.
As shown in fig. 14, the positions are labeled A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). Each candidate B(i, j) or C(i, j) has an offset of 16 in the vertical direction compared to its previous B or C candidate. Each candidate A(i, j) or D(i, j) has an offset of 16 in the horizontal direction compared to its previous A or D candidate. Each E(i, j) has an offset of 16 in both the horizontal and the vertical direction compared to its previous E candidate. The candidates are checked from the inside to the outside, and the order of the candidates is A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j). Whether the number of Merge candidates can be further reduced is a subject for further study. The candidates are added after the TMVP candidate in the Merge candidate list.
In J0059, the extended spatial positions from 6 to 27, as shown in fig. 15, are checked in their numerical order after the temporal candidate. To save the MV line buffer, all the spatial candidates are restricted to within two CTU lines.
2.4 Intra prediction in JEM
2.4.1 Intra mode coding with 67 intra prediction modes
For luma interpolation filtering, an 8-tap separable DCT-based interpolation filter is used for 2/4-precision samples, and a 7-tap separable DCT-based interpolation filter is used for 1/4-precision samples, as shown in Table 1.
Table 1: 8-tap DCT-IF coefficients for 1/4th luma interpolation.

Position   Filter coefficients
1/4        {-1, 4, -10, 58, 17, -5, 1}
2/4        {-1, 4, -11, 40, 40, -11, 4, -1}
3/4        {1, -5, 17, 58, -10, 4, -1}
Similarly, a 4-tap separable DCT-based interpolation filter is used for the chrominance interpolation filter, as shown in table 2.
Table 2: 4-tap DCT-IF coefficients for 1/8th chroma interpolation.

Position   Filter coefficients
1/8        {-2, 58, 10, -2}
2/8        {-4, 54, 16, -2}
3/8        {-6, 46, 28, -4}
4/8        {-4, 36, 36, -4}
5/8        {-4, 28, 46, -6}
6/8        {-2, 16, 54, -4}
7/8        {-2, 10, 58, -2}
For vertical interpolation of 4:2:2 and horizontal and vertical interpolation of 4:4:4 chroma channels, the odd positions in table 2 are not used, resulting in 1/4 chroma interpolation.
For bi-directional prediction, the bit depth of the output of the interpolation filter is kept at 14-bit precision, regardless of the source bit depth, before the averaging of the two prediction signals. The actual averaging process is performed implicitly, and the bit depth reduction process is as follows:
predSamples[x, y] = (predSamplesL0[x, y] + predSamplesL1[x, y] + offset) >> shift
where shift = (15 - BitDepth) and offset = 1 << (shift - 1).
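A small numerical sketch of this averaging and bit-depth-reduction step (illustrative values only; the specification additionally clips the result to the valid sample range):

```python
def bi_average(pred_l0, pred_l1, bit_depth=8):
    """Combine two 14-bit intermediate prediction samples into one output sample."""
    shift = 15 - bit_depth          # shift = (15 - BitDepth)
    offset = 1 << (shift - 1)       # offset = 1 << (shift - 1)
    return (pred_l0 + pred_l1 + offset) >> shift

# For 8-bit output, shift = 7 and offset = 64:
print(bi_average(8192, 8320))       # (8192 + 8320 + 64) >> 7 = 129
```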
If both the horizontal and the vertical components of the motion vector point to sub-pixel positions, horizontal interpolation is always performed first, followed by vertical interpolation. For example, to interpolate the sub-pixel j0,0 shown in fig. 16, b0,k (k = -3, -2, ..., 4) is first interpolated according to Equation 2-1, and then j0,0 is interpolated according to Equation 2-2. Here, shift1 is Min(4, BitDepthY - 8) and shift2 is 6.
b0,k = (-A-3,k + 4*A-2,k - 11*A-1,k + 40*A0,k + 40*A1,k - 11*A2,k + 4*A3,k - A4,k) >> shift1    (2-1)
j0,0 = (-b0,-3 + 4*b0,-2 - 11*b0,-1 + 40*b0,0 + 40*b0,1 - 11*b0,2 + 4*b0,3 - b0,4) >> shift2    (2-2)
Alternatively, vertical interpolation can be performed first, followed by horizontal interpolation. In this case, to interpolate j0,0, hk,0 (k = -3, -2, ..., 4) is first interpolated according to Equation 2-3, and then j0,0 is interpolated according to Equation 2-4. When BitDepthY is less than or equal to 8, shift1 is 0 and nothing is lost in the first interpolation stage; therefore, the final interpolation result is not affected by the interpolation order. However, when BitDepthY is greater than 8, shift1 is greater than 0. In this case, the final interpolation result may be different when a different interpolation order is applied.
hk,0 = (-Ak,-3 + 4*Ak,-2 - 11*Ak,-1 + 40*Ak,0 + 40*Ak,1 - 11*Ak,2 + 4*Ak,3 - Ak,4) >> shift1    (2-3)
j0,0 = (-h-3,0 + 4*h-2,0 - 11*h-1,0 + 40*h0,0 + 40*h1,0 - 11*h2,0 + 4*h3,0 - h4,0) >> shift2    (2-4)
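The two interpolation orders can be compared with the sketch below, which applies the half-sample 8-tap filter of Table 1 in both orders on assumed 10-bit sample values (the data and helper names are illustrative; only the filter taps and the shift values come from the text above). Because shift1 > 0 for a 10-bit source, the intermediate rounding differs and the two results may differ by one.

```python
import random

TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]   # 2/4 (half-sample) DCT-IF from Table 1

def filt(samples, shift):
    """Apply the 8-tap filter to 8 samples and right-shift the sum by `shift`."""
    return sum(c * s for c, s in zip(TAPS, samples)) >> shift

def interp_half_half(block, row, col, shift1, shift2, horizontal_first=True):
    """Interpolate the (half, half) position around integer sample (row, col)."""
    ks = range(-3, 5)                      # the 8 input offsets used by the filter
    if horizontal_first:
        inter = [filt([block[row + k][col + m] for m in ks], shift1) for k in ks]
    else:
        inter = [filt([block[row + m][col + k] for m in ks], shift1) for k in ks]
    return filt(inter, shift2)

# A random 10-bit "reference block"; with BitDepthY = 10, shift1 = Min(4, 10 - 8) = 2
# and shift2 = 6, so the order of the two passes can change the result.
random.seed(0)
block = [[random.randint(0, 1023) for _ in range(16)] for _ in range(16)]
print(interp_half_half(block, 6, 6, shift1=2, shift2=6, horizontal_first=True))
print(interp_half_half(block, 6, 6, shift1=2, shift2=6, horizontal_first=False))
```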
3. Examples of problems addressed by embodiments
For the luma block size WxH, if we always perform horizontal interpolation first, the required interpolation (per pixel) is as shown in table 3.
Table 3: interpolation required for HEVC/JEM WxH luma component
[Table content shown as an image in the original document.]
On the other hand, if we first perform vertical interpolation, the required interpolation is as shown in table 4. Obviously, the optimal interpolation order is the order between table 3 and table 4 that requires a smaller number of interpolations.
Table 4: the interpolation required for the WxH luminance component when the interpolation order is reversed.
[Table content shown as an image in the original document.]
For the chroma component, if we always perform horizontal interpolation first, the required interpolation per pixel is ((H + 3) × W + W × H)/(W × H) = 2 + 3/H; if we always perform vertical interpolation first, the required interpolation per pixel is ((W + 3) × H + W × H)/(W × H) = 2 + 3/W.
As described above, when the bit depth of the input video is greater than 8, different interpolation orders may result in different interpolation results. Therefore, the interpolation order should be implicitly defined in both the encoder and the decoder.
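A small numeric check of the chroma counts above (illustrative only, using the 4-tap chroma filter; the helper name is an assumption):

```python
def chroma_interp_per_pixel(w, h, horizontal_first, taps=4):
    """Interpolations per pixel for a WxH chroma block with an N-tap filter."""
    extra = taps - 1                        # 3 extra rows/columns for a 4-tap filter
    if horizontal_first:
        total = (h + extra) * w + w * h     # (H+3) filtered rows, then H output rows
    else:
        total = (w + extra) * h + w * h     # (W+3) filtered columns, then W output columns
    return total / (w * h)

for w, h in [(4, 16), (16, 4), (8, 8)]:
    print(w, h,
          round(chroma_interp_per_pixel(w, h, True), 3),    # 2 + 3/H
          round(chroma_interp_per_pixel(w, h, False), 3))   # 2 + 3/W
# 4x16: 2.188 vs 2.750  -> a tall block prefers horizontal-first interpolation
# 16x4: 2.750 vs 2.188  -> a wide block prefers vertical-first interpolation
```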
4. Examples of the embodiments
To address these issues and provide other benefits, we propose a shape-dependent interpolation order. Assume that the number of interpolation filter taps (in motion compensation) is N (e.g., 8, 6, 4, or 2) and that the current block size is W×H.
Assume that the number of allowed MVDs (such as the number of entries in the distance table) in MMVD is M.
The following detailed examples should be considered as examples to explain the general concept. These examples should not be construed narrowly. Further, these examples may be combined in any manner.
1. It is proposed that the interpolation order depends on the current codec block shape (e.g. the codec block is a CU).
a. In one example, for a block with width > height (such as a CU, PU, or sub-block used in sub-block based prediction (e.g., affine, ATMVP, or BIO)), vertical interpolation is performed first, followed by horizontal interpolation; for example, the pixels dk,0, hk,0, and nk,0 are interpolated first, and then e0,0 to r0,0 are interpolated. Equations 2-3 and 2-4 show an example for j0,0.
i. Alternatively, for a block with width >= height (such as a CU, PU, or sub-block used in sub-block based prediction (e.g., affine, ATMVP, or BIO)), vertical interpolation is performed first, followed by horizontal interpolation.
b. In one example, for a block with width <= height (such as a CU, PU, or sub-block used in sub-block based prediction (e.g., affine, ATMVP, or BIO)), horizontal interpolation is performed first, followed by vertical interpolation.
i. Alternatively, for a block with width < height (such as a CU, PU, or sub-block used in sub-block based prediction (e.g., affine, ATMVP, or BIO)), horizontal interpolation is performed first, followed by vertical interpolation.
c. In one example, both the luma component and the chroma component follow the same interpolation order.
d. Alternatively, when one chroma codec block corresponds to multiple luma codec blocks (e.g., one chroma 4x4 block may correspond to two 8x4 or 4x8 luma blocks for a 4:2:0 color format), luma and chroma may use different interpolation orders.
e. In one example, when different interpolation orders are used, the scaling factors in the multiple stages (i.e., shift1 and shift2) may be further changed accordingly.
2. Alternatively, it is furthermore proposed that the interpolation order of the luminance components may further depend on the MVs.
a. In one example, if the vertical MV component points to a quarter-pixel position and the horizontal MV component points to a half-pixel position, then horizontal interpolation is performed first, followed by vertical interpolation.
b. In one example, if the vertical MV component points to a half-pixel position and the horizontal MV component points to a quarter-pixel position, then vertical interpolation is performed first, followed by horizontal interpolation.
c. In one example, the proposed method is only applicable to square codec blocks.
3. It is proposed that for blocks coded using a Merge mode (e.g., conventional Merge mode, triangle Merge mode, affine Merge mode, or other non-intra/non-AMVP mode), the relevant motion information may be modified to integer precision (such as via rounding) before invoking the motion compensation process.
a. Alternatively, Merge candidates having fractional motion vectors may be excluded from the Merge list.
b. Alternatively, when a Merge candidate derived from spatial or temporal blocks or other means (such as HMVP, pairwise bi-predictive Merge candidates) is associated with a fractional motion vector, the fractional motion vector may first be modified to integer precision (such as via rounding) before being added to the Merge list.
c. In one example, a separate HMVP table may be maintained in real-time to store motion candidates with integer precision.
d. Alternatively, the above method may be applied only when the Merge candidate is a bidirectional prediction candidate.
e. In one example, the above approach applies to certain block dimensions, such as 4x16, 16x4, 4x8, 8x4, 4x4.
f. In one example, the above method is applied to an AMVP codec block, where a Merge candidate may be replaced with an AMVP candidate.
g. In one example, the above method is applied to certain block modes, such as non-affine modes.
4. It is proposed that MMVD side information (such as distance table, direction) may depend on block dimensions and/or prediction direction (e.g. unidirectional prediction or bi-directional prediction).
a. In one example, a distance table with full integer precision may be defined or signaled.
b. In one example, if the base Merge candidate is associated with a motion vector of fractional precision, it may first be modified (such as via rounding) to integer precision and then used to derive a final motion vector for motion compensation.
5. For some block sizes or block shapes, the MVs in MMVD mode can be limited to integer-pixel precision or half-pixel precision.
a. In one example, if integer pixel precision is selected for the MMVD codec block, the base Merge candidate used in MMVD may first be modified to integer pixel precision (such as via rounding).
b. In one example, if half pixel precision is selected for the MMVD codec block, the base Merge candidate used in MMVD may be modified to half pixel precision (such as via rounding).
i. In one example, rounding may be performed in the base Merge list construction process, and therefore rounded MVs are used in pruning.
ii. In one example, rounding may be performed after the base Merge list construction process, so unrounded MVs are used in pruning.
c. In one example, if integer-pel precision or half-pel precision is used for MMVD mode, only MVDs of the same or lower precision are allowed.
i. For example, if integer-pel precision is used for MMVD mode, only MVDs of integer-pel precision, 2-pel precision, or N-pel precision (N >= 1) are allowed.
d. In one example, if K MVDs are not allowed in the MMVD mode, the binarization of the MVD index may be modified, since the maximum MVD index becomes M-K-1 instead of M-1. Meanwhile, different contexts may be used in CABAC coding.
e. In one example, rounding may be performed after the MV is derived in MMVD mode.
f. The constraints may be different for bi-directional prediction and uni-directional prediction. For example, no limitation may be applied in unidirectional prediction.
g. The limits may be different for different block sizes or block shapes.
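The sketch below illustrates one possible reading of bullet 5: the base Merge candidate is rounded to the selected MMVD precision, and only distances of the same or coarser precision are retained. The distance table shown is only an assumed example (in 1/4-pel units), and the function names are hypothetical.

MMVD_DISTANCES_QPEL = [1, 2, 4, 8, 16, 32, 64, 128]   # assumed table, in 1/4-pel units

def allowed_mmvd_distances(precision_qpel):
    # precision_qpel: 4 for an integer-pel restriction, 2 for a half-pel restriction.
    # Keep only distances that are multiples of the selected precision (bullet 5.c).
    return [d for d in MMVD_DISTANCES_QPEL if d % precision_qpel == 0]

def round_base_candidate(mv, precision_qpel):
    # Round the base Merge candidate to the selected precision (bullets 5.a / 5.b).
    off = precision_qpel >> 1
    def rnd(v):
        mag = ((abs(v) + off) // precision_qpel) * precision_qpel
        return mag if v >= 0 else -mag
    return (rnd(mv[0]), rnd(mv[1]))

With K entries removed from an M-entry table, the binarization of the distance index would then use M-K-1 as its maximum value, as noted in 5.d.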
6. It is proposed that for some block sizes or block shapes, the maximum number of half-pixel MV components or 1/4-pixel MV components (e.g., horizontal MV or vertical MV) may be limited.
a. In one example, the bitstream should comply with the restriction.
b. The constraints may be different for bi-directional prediction and uni-directional prediction. For example, no limitation may be applied in unidirectional prediction.
c. The constraints may be different for different block sizes or block shapes.
d. In one example, for a bi-predicted block, at most three 1/4-pixel MV components are allowed.
e. In one example, for a bi-predicted block, at most two 1/4-pixel MV components are allowed.
f. In one example, for a bi-predicted block, at most one 1/4-pixel MV component is allowed.
g. In one example, for a bi-predicted block, no 1/4-pixel MV components are allowed.
h. In one example, for a uni-directional prediction block, at most one 1/4-pixel MV component is allowed.
i. In one example, for a uni-directional prediction block, no 1/4-pixel MV components are allowed.
7. It is proposed to round some components of the MV to integer-pixel precision or half-pixel precision depending on the dimensions (e.g. width and/or height, ratio of width and height) of the block or/and prediction direction or/and motion information.
a. In one example, the MV is rounded to the nearest integer pixel precision MV or/and half pixel precision MV.
b. In one example, different rounding approaches may be used. For example, rounding-down, rounding-up, rounding-towards-zero, or rounding-away from-zero may be used.
c. In one example, if the size (i.e., width x height) of the block is less than (or greater than) (and/or equal to) a threshold L (e.g., L = 16 or 64), then MV rounding may be applied to the horizontal or/and vertical MV components.
d. In one example, if the width (or height) of the block is less than (and/or equal to) a threshold L1 (e.g., L1 = 4 or 8), MV rounding can be applied to the horizontal (or vertical) MV component.
e. In one example, the thresholds L and L1 may be different for bi-directional and uni-directional prediction blocks. For example, a smaller threshold may be used for bi-prediction blocks.
f. In one example, MV rounding may be applied if the ratio between width and height is greater than a first threshold or less than a second threshold (such as for narrow blocks like 4x16 or 16x4).
g. In one example, MV rounding can only be applied when both the horizontal and vertical components of the MV are fractional (i.e., they point to fractional pixel positions rather than integer pixel positions).
h. Whether MV rounding is applied may depend on whether the current block is bi-directionally predicted or uni-directionally predicted.
i. For example, MV rounding is only applied when the current block is bi-directionally predicted.
i. Whether MV rounding is applied may depend on the prediction direction (e.g., from list 0 or list 1) and/or the associated motion vector. In one example, for bi-predicted blocks, whether MV rounding is applied may be different for different prediction directions.
i. In one example, if the MV in the prediction direction X (X = 0 or 1) has fractional components in both the horizontal and vertical directions, MV rounding may be applied to the N MV components in the prediction direction X; otherwise, MV rounding may not be applied. Here, N is 0, 1 or 2.
ii. In one example, if N (N >= 0) MV components have fractional precision, MV rounding may be applied to M of the N MV components (0 <= M <= N).
1. For bi-directional and uni-directional prediction blocks, N and M may be different.
2. For different block sizes (width or/and height or/and width x height), N and M may be different.
3. For example, for a bi-prediction block, N equals 4 and M equals 4.
4. For example, for a bi-prediction block, N equals 4, and M equals 3.
5. For example, for a bi-prediction block, N equals 4 and M equals 2.
6. For example, for a bi-prediction block, N equals 4 and M equals 1.
7. For example, for a bi-prediction block, N equals 3 and M equals 3.
8. For example, for a bi-prediction block, N equals 3 and M equals 2.
9. For example, for a bi-prediction block, N equals 3 and M equals 1.
10. For example, for a bi-prediction block, N equals 2 and M equals 2.
11. For example, for a bi-prediction block, N equals 2 and M equals 1.
12. For example, for a bi-prediction block, N equals 1 and M equals 1.
13. For example, for a uni-directional prediction block, N equals 2 and M equals 2.
14. For example, for a uni-directional prediction block, N equals 2 and M equals 1.
15. For example, for a uni-directional prediction block, N equals 1 and M equals 1.
iii. In one example, K of the M MV components are rounded to integer-pixel precision and M-K MV components are rounded to half-pixel precision, where K = 0, 1, ..., M-1.
j. Whether MV rounding is applied may be different for different color components (such as Y, Cb and Cr).
i. For example, whether and how MV rounding is applied may depend on color formats such as 4:2:0, 4:2:2, or 4:4:4.
k. Whether and/or how MV rounding is applied may depend on block size (or width, height), block shape, prediction direction, etc.
i. In one example, some MV components of the 4x16 or/and 16x4 bi-directionally predicted or/and uni-directionally predicted luma blocks may be rounded to one-half pixel precision.
in one example, some MV components of a 4x4 uni-directional predicted or/and bi-directional predicted luma block may be rounded to integer-pixel precision.
in one example, some MV components of a bi-directionally predicted or/and uni-directionally predicted luma block at 4x8 or/and 8x4 may be rounded to integer-pixel precision.
In one example, MV rounding may not be applied to sub-block prediction, such as affine prediction.
i. In an alternative example, MV rounding may be applied to sub-block prediction, such as ATMVP prediction. In this case, each sub-block is treated as a codec block to determine whether and how to apply MV rounding.
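A sketch of one possible rounding policy under bullet 7 follows, with purely illustrative thresholds (area <= 64 for bi-prediction, width or height equal to 4); the actual conditions and the number of rounded components are design choices left open by the text above.

def conditionally_round_mv(mv, width, height, is_bi, step_qpel=4):
    # mv in 1/4-pel units; step_qpel = 4 rounds to integer-pel, 2 to half-pel.
    def rnd(v):
        off = step_qpel >> 1
        mag = ((abs(v) + off) // step_qpel) * step_qpel
        return mag if v >= 0 else -mag
    mvx, mvy = mv
    if is_bi and width * height <= 64:   # small bi-predicted block: round both components (7.c)
        return (rnd(mvx), rnd(mvy))
    if width <= 4:                       # narrow block: round the horizontal component (7.d)
        return (rnd(mvx), mvy)
    if height <= 4:                      # short block: round the vertical component (7.d)
        return (mvx, rnd(mvy))
    return (mvx, mvy)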
8. It is proposed that for some block sizes, the motion vector of a block should be modified to integer precision before being used for motion compensation, e.g., if it has fractional precision.
9. In one example, the stored motion vectors and the motion vectors used for motion compensation may have different accuracies for certain block dimensions.
a. In one example, sub-pixel precision (also referred to as fractional precision, such as 1/4 pixels, 1/16 pixels) may be stored for blocks with certain block dimensions, but the motion compensation process is based on integer versions of these motion vectors (such as via rounding).
10. It is proposed that an indication that bi-prediction is not allowed for certain block dimensions may be signaled in a sequence parameter set/picture parameter set/sequence header/picture header/slice group header/CTU row/region/other high level syntax.
a. Alternatively, an indication that uni-directional prediction is not allowed for certain block dimensions may be signaled in the sequence parameter set/picture parameter set/sequence header/picture header/slice group header/CTU row/region/other high level syntax.
b. Alternatively, an indication that bi-directional prediction and/or uni-directional prediction is not allowed for certain block dimensions may be signaled in the sequence parameter set/picture parameter set/sequence header/picture header/slice group header/CTU line/region/other high level syntax.
c. Alternatively, additionally, such indications may only be applied to certain modes, such as non-affine modes.
d. Alternatively, additionally, when the block does not allow unidirectional/bidirectional prediction, the signaling of the AMVR index may be modified accordingly, such as only integer-pixel precision is allowed, or a different MV precision may be used.
e. Alternatively or additionally, the methods described above (such as bullets 3-9) may also be applied.
11. It is proposed that a conformant bitstream should follow the following rule: for some block dimensions, bi-predictive codec blocks only allow integer-pixel motion vectors.
12. In one example, the above-mentioned block dimensions are, for example, 4x16, 16x4, 4x8, 8x4, or 4x4.
13. It is proposed that different interpolation filters (e.g., with different numbers of filter taps and/or different filter coefficients) may be used in the interpolation depending on the dimensions of the block (e.g., width and/or height, or the ratio of width to height); a sketch of such a filter selection follows this list.
a. Different filters may be used for vertical interpolation and horizontal interpolation. For example, vertical interpolation may apply a shorter-tap filter than a horizontal interpolation filter.
b. In one example, in some cases, an interpolation filter with fewer taps than the interpolation filter in VTM-3.0 may be applied. These interpolation filters with fewer taps are also referred to as "short tap filters".
c. In one example, if the size (i.e., width x height) of the block is less than (or greater than) a threshold L (e.g., L = 16 or 64), then a different filter (e.g., a short tap filter) may be used for horizontal or/and vertical interpolation.
d. In one example, if the width (or height) of the block is less than (and/or equal to) a threshold L1 (e.g., L1 = 4 or 8), then a different filter (e.g., a short tap filter) may be used for horizontal (or vertical) interpolation.
e. In one example, if the ratio between the width and the height is greater than a first threshold or less than a second threshold (such as for narrow blocks like 4x16 or 16x4), then a different filter (e.g., a short tap filter) may be selected than for other types of blocks.
f. In one example, the short tap filter may be used only when both the horizontal and vertical components of the MV are fractional (i.e., they point to fractional pixel positions rather than integer pixel positions).
g. Which filter to use (e.g., a short tap filter may or may not be used) may depend on whether the current block is bi-directional predicted or uni-directional predicted.
i. For example, a short tap filter is used only when the current block is bi-directionally predicted.
h. Which filter to use (e.g., short tap filter may or may not be used) may depend on the prediction direction (e.g., from list 0 or list 1) and/or the associated motion vector. In one example, for bi-predicted blocks, whether to use a short tap filter may be different for different prediction directions.
i. In one example, if the MV of the prediction direction X (X = 0 or 1) has fractional components in both the horizontal and vertical directions, then a short tap filter is used for the prediction direction X; otherwise, the short tap filter is not used.
ii. In one example, if N (N >= 0) MV components have fractional precision, then a short tap filter may be applied to M (0 <= M <= N) of the N MV components.
1. For bi-directional and uni-directional prediction blocks, N and M may be different.
2. For different block sizes (width or/and height or/and width x height), N and M may be different.
3. For example, for a bi-prediction block, N equals 4 and M equals 4.
4. For example, for a bi-prediction block, N equals 4 and M equals 3.
5. For example, for a bi-prediction block, N equals 4, and M equals 2.
6. For example, for a bi-prediction block, N equals 4 and M equals 1.
7. For example, for a bi-prediction block, N equals 3 and M equals 3.
8. For example, for a bi-prediction block, N equals 3 and M equals 2.
9. For example, for a bi-prediction block, N equals 3 and M equals 1.
10. For example, for a bi-prediction block, N equals 2 and M equals 2.
11. For example, for a bi-prediction block, N equals 2 and M equals 1.
12. For example, for a bi-prediction block, N equals 1 and M equals 1.
13. For example, for a uni-directional prediction block, N equals 2 and M equals 2.
14. For example, for a uni-directional prediction block, N equals 2 and M equals 1.
15. For example, for a uni-directional prediction block, N equals 1 and M equals 1.
M MV components may use different short-tap filters.
1. In one example, K of the M MV components use an S1-tap filter and M-K MV components use an S2-tap filter, where K = 0, 1, ..., M-1. For example, S1 equals 6, and S2 equals 4.
i. In one example, different filters (e.g., short tap filters) may be used for only some pixels. For example, they are only used for boundary pixels of the block.
i. For example, they are only used for the N1 right side columns or/and the N2 left side columns or/and the N3 top rows or/and the N4 bottom rows of blocks.
j. Whether to use a short tap filter may be different for the uni-directional prediction block and the bi-directional prediction block.
k. Whether or not to use short tap filters may be different for different color components (such as Y, Cb and Cr).
i. Whether and how the short tap filter is applied may depend on a color format such as 4:2:0, 4:2:2, or 4:4:4, for example.
Different short tap filters may be used for different blocks. The short tap filter selected may depend on block size (or width, height), block shape, prediction direction, etc.
i. In one example, a 7 tap filter is used for horizontal and vertical interpolation of 4x16 or/and 16x4 bi-directional prediction or/and uni-directional prediction luma blocks.
in one example, a 7 tap filter is used for horizontal (or vertical) interpolation of 4x4 uni-directional prediction or/and bi-directional prediction luma blocks.
in one example, a 6 tap filter is used for horizontal and vertical interpolation of 4x8 or/and 8x4 bi-directional predicted or/and uni-directional predicted luma blocks.
1. Alternatively, for bi-directional prediction or/and uni-directional prediction luma blocks of 4x8 or/and 8x4, a 6-tap filter and a 5-tap filter (or a 5-tap filter and a 6-tap filter) are used in horizontal interpolation and vertical interpolation, respectively.
Different short tap filters can be used for different types of motion vectors.
i. In one example, a longer tap length filter may be used for motion vectors having fractional components in only one direction (i.e., horizontal or vertical), and a shorter tap length filter may be used for motion vectors having fractional components in both the horizontal and vertical directions.
For example, 8-tap filters are used for 4x16 or/and 16x4 or/and 4x8 or/and 8x4 or/and 4x4 bi-prediction or/and uni-prediction blocks, which have fractional MV components in only one direction, and short-tap filters described in the clause 3.h are used for 4x16 or/and 16x4 or/and 4x8 or/and 8x4 or/and 4x4 bi-prediction or/and uni-prediction blocks, which have fractional MV components in both directions.
in one example, the interpolation filter for affine motion may be different from the interpolation filter for translational motion vectors.
in one example, a short tap interpolation filter may be used for affine motion as compared to a filter used to translate motion vectors.
In one example, the short tap filter may not be applied to sub-block prediction, such as affine prediction.
i. In an alternative example, a short tap filter may be applied to sub-block prediction, such as ATMVP prediction. In this case, each sub-block is treated as a codec block to determine whether and how to apply the short tap filter.
In one example, whether and/or how the short tap filter is applied may depend on the block dimensions, codec information, and the like.
i. In one example, a short tap filter may be applied when certain modes are enabled for a block (such as OBMC, interleaved affine prediction mode).
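The following sketch shows one way bullet 13 could be realized: the luma interpolation filter length is reduced for small bi-predicted blocks whose MV is fractional in both directions. The 7-tap/6-tap choices echo the examples given above but are assumptions rather than a normative mapping.

def luma_filter_taps(width, height, is_bi, frac_x, frac_y, default_taps=8):
    both_fractional = (frac_x != 0) and (frac_y != 0)
    if not (is_bi and both_fractional):
        return default_taps                      # keep the regular 8-tap luma filter
    if (width, height) in ((4, 16), (16, 4), (4, 4)):
        return 7                                 # short-tap filter for narrow / tiny blocks
    if (width, height) in ((4, 8), (8, 4)):
        return 6
    return default_taps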
14. It is proposed that (W+N-1-PW) × (H+N-1-PH) reference pixels may be extracted (instead of (W+N-1) × (H+N-1) reference pixels) for motion compensation of a WxH block, where PW and PH cannot both be equal to 0 at the same time (a padding sketch follows this list).
a. In addition, in one example, for the remaining reference pixels (not extracted, but needed for motion compensation), padding or derivation from the extracted reference samples may be applied.
b. Additionally or alternatively, the pixels at the reference block boundaries (top, left, bottom, and right boundaries) are repeated to generate a block of (W+N-1) × (H+N-1) pixels, which is used for the final interpolation. An example is shown in Fig. 21, where W = 8, H = 4, N = 7, PW = 2, and PH = 3.
c. The extracted reference pixels may be identified by (x + MVXInt - N/2 + offSet1, y + MVYInt - N/2 + offSet2), where (x, y) is the upper-left position of the current block, (MVXInt, MVYInt) is the integer part of the MV, and offSet1 and offSet2 are integers such as -2, -1, 0, 1, or 2.
d. In one example, PH is zero and only the left boundary or/and the right boundary are repeated.
e. In one example, PW is zero, and only the top or/and bottom boundaries are repeated.
f. In one example, PW and PH are both greater than zero, and the left or/and right boundary is repeated first, and then the top or/and bottom boundary is repeated.
g. In one example, PW and PH are both greater than zero, and the top or/and bottom boundary is repeated first, and then the left or/and right boundary is repeated.
h. In one example, the left boundary is repeated M1 times and the right boundary is repeated PW-M1 times, where M1 is an integer and M1 >= 0.
i. Alternatively, if M1 (or PW-M1) is greater than 1, then instead of repeating the first left (or right) column M1 times, multiple columns may be utilized; e.g., the M1 left-most columns (or the PW-M1 right-most columns) may be repeated.
i. In one example, the top boundary is repeated M2 times and the bottom boundary is repeated PH-M2 times, where M2 is an integer and M2 >= 0.
i. Alternatively, if M2 (or PH-M2) is greater than 1, then instead of repeating the first top (or bottom) row M2 times, multiple rows may be utilized; e.g., the M2 top-most rows (or the PH-M2 bottom-most rows) may be repeated.
j. In one example, some default values may be used for boundary padding.
k. In one example, this boundary pixel repetition method can be used only when both the horizontal and vertical components of the MV are fractional (i.e., they point to fractional pixel positions rather than integer pixel positions).
In one example, this boundary pixel repetition method may be applied to some or all of the reference blocks.
i. In one example, if the MV of the prediction direction X (X = 0 or 1) has fractional components in both the horizontal and vertical directions, the boundary pixel repetition method is used for the prediction direction X; otherwise, the boundary pixel repetition method is not used.
ii. In one example, if N (N >= 0) MV components have fractional precision, then the boundary pixel repetition method is applied to M (0 <= M <= N) of the N MV components.
1. For bi-directional and uni-directional prediction blocks, N and M may be different.
2. For different block sizes (width or/and height or/and width x height), N and M may be different.
3. For example, for a bi-prediction block, N equals 4 and M equals 4.
4. For example, for a bi-prediction block, N equals 4 and M equals 3.
5. For example, for a bi-prediction block, N equals 4, and M equals 2.
6. For example, for a bi-prediction block, N equals 4, and M equals 1.
7. For example, for a bi-prediction block, N equals 3 and M equals 3.
8. For example, for a bi-prediction block, N equals 3 and M equals 2.
9. For example, for a bi-prediction block, N equals 3 and M equals 1.
10. For example, for a bi-prediction block, N equals 2 and M equals 2.
11. For example, for a bi-prediction block, N equals 2 and M equals 1.
12. For example, for a bi-prediction block, N equals 1 and M equals 1.
13. For example, for a uni-directional prediction block, N equals 2 and M equals 2.
14. For example, for a unidirectional prediction block, N equals 2 and M equals 1.
15. For example, for a uni-directional prediction block, N equals 1 and M equals 1.
M MV components may use different boundary pixel repetition methods.
For different color components such as Y, Cb and Cr, PW and/or PH may be different.
i. For example, whether and how border pixel repetition is applied may depend on a color format such as 4:2:0, 4:2:2, or 4:4:4.
In one example, the PW and/or PH may be different for different block sizes or shapes.
i. In one example, PW and PH are set to 1 for 4x16 or/and 16x4 bi-directional prediction or/and uni-directional prediction blocks.
in one example, PW and PH are set to 0 and 1 (or 1 and 0), respectively, for a 4x4 bi-predictive or/and uni-predictive block.
in one example, PW and PH are set to 2 for 4x8 or/and 8x4 bi-directional prediction or/and uni-directional prediction blocks.
1. Alternatively, for 4x8 or/and 8x4 bi-directional prediction or/and uni-directional prediction blocks, PW and PH are set to 2 and 3 (or 3 and 2), respectively.
In one example, PW and PH may be different for unidirectional prediction and bi-directional prediction.
p. The PW and PH values may be different for different kinds of motion vectors.
i. In one example, PW and PH may be smaller (even zero) for motion vectors having fractional components in only one direction (i.e., horizontal or vertical), and PW and PH may be larger for motion vectors having fractional components in both the horizontal and vertical directions.
For example, PW and PH are set to 0 for 4x16 or/and 16x4 or/and 4x8 or/and 8x4 or/and 4x4 bi-prediction or/and uni-directional prediction blocks with fractional MV components in only one direction, and PW and PH in item symbol 4.i are used for 4x16 or/and 16x4 or/and 4x8 or/and 8x4 or/and 4x4 bi-prediction or/and uni-directional prediction blocks with fractional MV components in both directions.
Fig. 21 shows an example of repeated boundary pixels of a reference block before interpolation.
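A minimal sketch of the boundary repetition described in bullet 14: a reduced reference area of (W+N-1-PW) x (H+N-1-PH) samples is extended back to (W+N-1) x (H+N-1) by repeating M1 left / PW-M1 right columns and M2 top / PH-M2 bottom rows. The column-then-row order follows 14.f; everything else in the sketch is an illustrative assumption.

def pad_reference(ref, pw, ph, m1=0, m2=0):
    # ref: fetched reference area as a list of rows, of size (H+N-1-PH) x (W+N-1-PW).
    rows = [list(r) for r in ref]
    # Repeat m1 copies of the left-most column and (pw - m1) copies of the right-most column.
    rows = [[r[0]] * m1 + r + [r[-1]] * (pw - m1) for r in rows]
    # Repeat m2 copies of the top row and (ph - m2) copies of the bottom row.
    top = [list(rows[0]) for _ in range(m2)]
    bottom = [list(rows[-1]) for _ in range(ph - m2)]
    return top + rows + bottom

# Example: with W = 8, H = 4, N = 7, PW = 2, PH = 3 (as in Fig. 21), the fetched
# 12x7 area is padded to the 14x10 area needed for interpolation.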
15. The proposed method may be applied to certain modes, block sizes/shapes and/or certain sub-block sizes.
a. The proposed method can be applied to certain modes, such as bi-directional prediction mode.
b. The proposed method can be applied to certain block sizes.
i. In one example, it is only applied to blocks with w × h <= T, where w and h are the width and height of the current block.
ii. In one example, it is only applied to blocks with h <= T.
c. The proposed method may be applied to certain color components (such as only the luminance component).
16. The rounding operation can be defined as follows (an illustrative sketch in code follows this list):
a. Shift(x, s) is defined as
Shift(x, s) = (x + off) >> s
b. SignShift(x, s) is defined as
SignShift(x, s) = (x + off) >> s if x >= 0; otherwise SignShift(x, s) = -((-x + off) >> s)
where off is an integer such as 0 or 2^(s-1).
c. They may be defined as the rounding operations used in AMVR processing, affine processing, or other processing modules for motion vector rounding.
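A direct transcription of the Shift and SignShift operations of bullet 16 into Python, assuming the rounding offset defaults to 2^(s-1); the negative branch of SignShift follows the reconstruction given above.

def shift(x, s, off=None):
    # Shift(x, s) = (x + off) >> s
    off = (1 << (s - 1)) if off is None else off
    return (x + off) >> s

def sign_shift(x, s, off=None):
    # SignShift(x, s) rounds the magnitude and keeps the sign of x.
    off = (1 << (s - 1)) if off is None else off
    return (x + off) >> s if x >= 0 else -((-x + off) >> s)

# Example: shift(-6, 2) = -1, whereas sign_shift(-6, 2) = -2.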
17. In one example, how to round the MV may depend on the MV component.
a. For example, the y component of the MV is rounded to integer pixels, while the x component of the MV is not rounded.
b. In one example, the MV may be rounded to integer pixels prior to motion compensation for the luma component, but rounded to 2 pixels prior to motion compensation for the chroma component when the color format is 4:2:0.
18. It is proposed to use a bilinear filter for interpolation filtering in one or more specific cases (a sketch of such a selection follows this list), such as:
a. 4x4 uni-directional prediction;
b. 4x8 bi-directional prediction;
c. 8x4 bi-directional prediction;
d. 4x16 bi-directional prediction;
e. 16x4 bi-directional prediction;
f. 8x8 bi-directional prediction;
g. 8x4 uni-directional prediction;
h. 4x8 uni-directional prediction.
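As a simple illustration of bullet 18, the sketch below flags the block classes listed above and applies a 1-D bilinear tap; the 1/16-pel weighting shown is an assumption about the MV precision, not a statement of the actual filter used.

BILINEAR_CASES = {
    # (width, height, is_bi_predicted) tuples for cases a-h above.
    (4, 4, False), (4, 8, True), (8, 4, True), (4, 16, True),
    (16, 4, True), (8, 8, True), (8, 4, False), (4, 8, False),
}

def use_bilinear_filter(width, height, is_bi):
    return (width, height, is_bi) in BILINEAR_CASES

def bilinear_tap(a, b, frac, frac_bits=4):
    # Weighted average of two neighboring samples at a 1/16-pel position `frac`.
    return (((1 << frac_bits) - frac) * a + frac * b + (1 << (frac_bits - 1))) >> frac_bits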
19. It is proposed that when multi-hypothesis prediction is applied to a block, a short-tap or otherwise different interpolation filter may be applied compared to the filter applied in the conventional prediction mode.
a. In one example, a bilinear filter may be used.
b. A short-tap or second interpolation filter may be applied to the reference picture list that involves multiple reference blocks, while for the other reference picture list with only one reference block, the same filter as used for the conventional prediction mode may be applied.
c. The proposed method may be applied under certain conditions, such as for certain temporal layers, or when the quantization parameter of the block/slice/picture containing the block is within a range (such as greater than a threshold).
Fig. 17 is a block diagram of the video processing apparatus 1700. Apparatus 1700 may be used to implement one or more methods herein. The apparatus 1700 may be implemented in a smartphone, tablet, computer, internet of things (IoT) receiver, and/or the like. The apparatus 1700 may include one or more processors 1702, one or more memories 1704, and video processing hardware 1706. The processor 1702 may be configured to implement one or more of the methods described herein. Memory 1704 may be used to store data and code for implementing the methods and techniques herein. The video processing hardware 1706 may be used to implement some of the techniques described herein in hardware circuits.
Fig. 19 is a flow chart of a method 1900 of video bitstream processing. The method 1900 includes: determining (1905) a shape of the video block, determining (1910) an interpolation order based on the video block, the interpolation order indicating a sequence in which horizontal interpolation and vertical interpolation are performed, and performing horizontal interpolation and vertical interpolation according to the interpolation order of the video block to reconstruct (1915) a decoded representation of the video block.
Fig. 20 is a flow chart of a method 2000 of video bitstream processing. The method 2000 includes: determining (2005) a characteristic of a motion vector associated with the video block, determining (2010) an interpolation order for the video block based on the characteristic of the motion vector, the interpolation order indicating a sequence in which horizontal and vertical interpolation is performed, and performing the horizontal and vertical interpolation according to the interpolation order for the video block to reconstruct (2015) a decoded representation of the video block.
Fig. 22 is a flow chart of a method 2200 of video bitstream processing. The method 2200 comprises: determining (2205) dimensional characteristics of the first video block, determining (2210) to apply a first interpolation filter to the first video block based on the determination of the dimensional characteristics, and performing (2215) further processing of the first video block using the first interpolation filter.
Fig. 23 is a flow chart of a method 2300 of video bitstream processing. The method 2300 comprises: determining (2305) a first characteristic of the first video block, determining (2310) to apply a first interpolation filter to the first video block based on the determination of the first characteristic, performing (2315) further processing of the first video block using the first interpolation filter, determining (2320) a second characteristic of the second video block, determining (2325) to apply a second interpolation filter to the second video block based on the second characteristic, the first and second interpolation filters being different short tap filters, and performing (2330) further processing of the second video block using the second interpolation filter.
Fig. 24 is a flow chart of a method 2400 of video bitstream processing. The method 2400 includes: determining (2405) one or more parameters of the current video block during a transition between the current video block and a bitstream representation of the current video block, wherein the one or more parameters of the current video block include at least one of a dimension and a prediction direction of the current video block; determining (2410) MMVD (motion vector difference Merge mode) side information based on at least one or more parameters of the current video block; and performing (2415) a conversion based at least on the MMVD side information; wherein the MMVD mode uses a motion vector expression including a motion direction, a motion magnitude distance, and a starting point as a base Merge candidate of the current video block.
Fig. 25 is a flow chart of a method 2500 of video bitstream processing. The method 2500 includes: determining (2505) one or more parameters of a current video block during a transition between the current video block and a bitstream representation of the current video block, wherein the one or more parameters of the current video block include at least one of a size of the current video block and a shape of the current video block; determining (2510) a motion vector precision for the current video block based at least on one or more parameters of the current video block; and performing (2515) a conversion based on the determined precision, wherein the current video block is converted in the MMVD mode, and the MMVD mode uses a motion vector representation including a motion direction, a motion magnitude distance, and a starting point as a base Merge candidate of the current video block.
With reference to methods 1900, 2000, 2200, 2300, 2400, and 2500, section 4 herein describes examples of the order in which horizontal interpolation and vertical interpolation are performed and their use. For example, as described in section 4, either horizontal interpolation or vertical interpolation may be performed first depending on the shape of the video block. In some embodiments, horizontal interpolation is performed before vertical interpolation, and in some embodiments, vertical interpolation is performed before horizontal interpolation.
Referring to methods 1900, 2000, 2200, 2300, 2400, and 2500, a video block may be encoded in a video bitstream, wherein bit efficiency is achieved by using a bitstream generation rule related to an interpolation order that also depends on the shape of the video block.
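A schematic rendering of method 2500 in Python follows; the size/shape-to-precision mapping is purely hypothetical and only shows how the determined precision could gate the MMVD conversion described above.

def mmvd_mv_precision(width, height, is_bi):
    # Hypothetical mapping from block parameters to the MMVD motion vector precision.
    if is_bi and (width, height) in ((4, 4), (4, 8), (8, 4)):
        return "integer"
    if (width, height) in ((4, 16), (16, 4)):
        return "half"
    return "quarter"

# Example: mmvd_mv_precision(4, 8, True) -> "integer", so the base Merge candidates
# and the allowed MMVD distances would be restricted to integer-pel precision.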
The method can comprise the following steps: wherein rounding the motion vector comprises one or more of: rounding to the nearest integer-pixel precision MV or rounding to the nearest half-pixel precision MV.
The method can comprise the following steps: wherein rounding the MV comprises one or more of: rounding down, rounding up, rounding towards zero, or rounding away from zero.
The method can comprise the following steps: wherein the dimension information indicates that the size of the first video block is less than a threshold, and the rounded MV is applied to one or both of the horizontal MV component or the vertical MV component based on the dimension information indicating that the size of the first video block is less than the threshold.
The method can comprise the following steps: wherein the dimension information indicates that the width or height of the first video block is less than a threshold, and the rounded MV is applied to one or both of the horizontal MV component or the vertical MV component based on the dimension information indicating that the width or height of the first video block is less than the threshold.
The method can comprise the following steps: wherein the threshold is different for bi-directional and uni-directional prediction blocks.
The method can comprise the following steps: wherein the dimension information indicates that a ratio between a width and a height of the first video block is greater than a first threshold or less than a second threshold, and wherein the rounding of the MV is based on the determination of the dimension information.
The method can comprise the following steps: wherein rounding the MV is further based on the horizontal and vertical components of the MV being fractional.
The method can comprise the following steps: wherein the rounded MV is further based on whether the first video block is bi-predicted or uni-predicted.
The method can comprise the following steps: wherein the rounding MV is further based on a prediction direction associated with the first video block.
The method can comprise the following steps: wherein the rounded MV is further based on the color component of the first video block.
The method can comprise the following steps: wherein the rounding MV is further based on a size of the first video block, a shape of the first video block, or a prediction direction of the first video block.
The method can comprise the following steps: wherein the rounded MV is applied in sub-block prediction.
The method can comprise the following steps: wherein the short-tap filter is applied to the MV components based on the MV components having fractional precision.
The method can comprise the following steps: wherein the short tap filter is applied based on a dimension of the first video block or codec information of the first video block.
The method can comprise the following steps: wherein the short tap filter is applied based on the mode of the first video block.
The method can comprise the following steps: wherein default values are used for boundary padding associated with the first video block.
The method can comprise the following steps: wherein the Merge mode is one or more of: regular Merge mode, triangle Merge mode, affine Merge mode, or other non-intra or non-AMVP mode.
The method can comprise the following steps: wherein Merge candidates having fractional motion vectors are excluded from the Merge list.
The method can comprise the following steps: wherein rounding the motion information comprises: rounding Merge candidates associated with fractional motion vectors to integer precision and inserting the modified motion information into the Merge list.
The method can comprise the following steps: wherein the motion information is a bi-directional prediction candidate.
The method can comprise the following steps: wherein MMVD refers to the motion vector difference Merge mode.
The method can comprise the following steps: wherein the motion vector is in MMVD mode.
The method can comprise the following steps: wherein the first video block is an MMVD coded block to be associated with integer-pixel precision, and wherein a base Merge candidate used in the MMVD is modified to integer-pixel precision via rounding.
The method can comprise the following steps: wherein the first video block is an MMVD coded block to be associated with one-half pixel precision, and wherein the base Merge candidate used in MMVD is modified to one-half pixel precision via rounding.
The method can comprise the following steps: wherein the threshold number is a maximum number of allowable half-pixel MV components or quarter-pixel MV components.
The method can comprise the following steps: wherein the threshold number is different between bi-directional prediction and uni-directional prediction.
The method can comprise the following steps: wherein the indication that bi-prediction is not allowed is signaled in a sequence parameter set, a picture parameter set, a sequence header, a picture header, a slice group header, a CTU row, a region, or other high level syntax.
The method can comprise the following steps: wherein the method complies with a bitstream rule that only allows integer pixel motion vectors of bi-predictive coded blocks with a particular dimension.
The method can comprise the following steps: wherein the size of the first video block is: 4x16, 16x4, 4x8, 8x4, or 4x4.
The method can comprise the following steps: wherein modifying or rounding the motion information comprises: the different MV components are modified in different ways.
The method can comprise the following steps: wherein the y-component of the first MV is modified or rounded to integer pixels and the x-component of the first MV is not modified or rounded.
The method can comprise the following steps: wherein the luma component of the first MV is rounded to integer pixels and the chroma component of the first MV is rounded to 2 pixels.
The method can comprise the following steps: wherein the first MV is associated with a video block having a color format of 4:2: 0.
The method can comprise the following steps: wherein the bilinear filter is used for 4x4 uni-directional prediction, 4x8 bi-directional prediction, 8x4 bi-directional prediction, 4x16 bi-directional prediction, 16x4 bi-directional prediction, 8x8 bi-directional prediction, 8x4 uni-directional prediction, or 4x8 uni-directional prediction.
It should be appreciated that the disclosed techniques may be implemented in a video encoder or decoder to improve compression efficiency when the compressed codec unit has a shape that is significantly different from a conventional square or nearly-square rectangular block. For example, new codec tools using long or tall codec units (such as units of 4x32 or 32x4 size) may benefit from the disclosed techniques.
The disclosed and other solutions, examples, embodiments, modules, and functional operations described herein may be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed herein and their structural equivalents, or in combinations of one or more of them. The disclosed embodiments and other embodiments may be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a combination of a machine-readable storage device, a machine-readable storage substrate, a memory device, a substance that affects a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" includes all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this patent document contains many specifics, these should not be construed as limitations on the scope of any subject matter or claims, but rather as descriptions of features of certain embodiments of certain technologies. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various functions described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claim combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Also, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments of this patent document should not be understood as requiring such separation in all embodiments.
Only a few implementations and examples have been described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.
5. Examples of the embodiments
In the following examples, PW and PH are specified for 4x16, 16x4, 4x4, 8x4 and 4x8 blocks.
Assume that the MV of a block in reference list X is MVX, the horizontal and vertical components of MVX are MVX[0] and MVX[1], respectively, and the integer parts of MVX[0] and MVX[1] are MVXInt[0] and MVXInt[1], respectively, where X is 0 or 1. Assume that the interpolation filter tap (in motion compensation) is N (e.g., 8, 6, 4, or 2), the current block size is WxH, and the position of the current block (i.e., the position of the top-left pixel) is (x, y). Row and column indices start from 1, e.g., the H rows of a block are indexed 1, ..., H.
The following boundary pixel repetition processing is performed only when MVX[0] and MVX[1] are both fractional.
5.1 example
For the 4x16 and 16x4 uni-directional and bi-directional prediction blocks, both PW and PH are set to 1 for the prediction direction X. First, (W+N-2) × (H+N-2) reference pixels are extracted from the reference picture, where the upper-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 1, MVXInt[1] + y - N/2 + 1). Then, the (W+N-1)-th column is generated by copying the (W+N-2)-th column. Finally, the (H+N-1)-th row is generated by copying the (H+N-2)-th row.
For a 4x4 uni-directional prediction block, PW and PH are set to 0 and 1, respectively. First, (W+N-1) × (H+N-2) reference pixels are extracted from the reference picture, where the upper-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 1, MVXInt[1] + y - N/2 + 1). Then, the (H+N-1)-th row is generated by copying the (H+N-2)-th row.
For the 4x8 and 8x4 uni-directional and bi-directional prediction blocks, PW and PH are set to 2 and 3, respectively. First, (W+N-3) × (H+N-4) reference pixels are extracted from the reference picture, where the upper-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 2, MVXInt[1] + y - N/2 + 2). Then, the first column is copied to its left side to obtain (W+N-2) columns, and the (W+N-1)-th column is generated by copying the (W+N-2)-th column. Finally, the first row is copied above to obtain (H+N-3) rows, and then the (H+N-2)-th and (H+N-1)-th rows are generated by copying the (H+N-3)-th row.
5.2 example
For the 4x16 and 16x4 uni-directional and bi-directional prediction blocks, both PW and PH are set to 1 for the prediction direction X. First, (W+N-2) × (H+N-2) reference pixels are extracted from the reference picture, where the upper-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 2, MVXInt[1] + y - N/2 + 2). Then, the first column is copied to its left side to obtain (W+N-1) columns. Finally, the first row is copied above to obtain (H+N-1) rows.
For a 4x4 uni-directional prediction block, PW and PH are set to 0 and 1, respectively. First, (W+N-1) × (H+N-2) reference pixels are extracted from the reference picture, where the upper-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 1, MVXInt[1] + y - N/2 + 2). The first row is then copied above to obtain (H+N-1) rows.
For the 4x8 and 8x4 uni-directional and bi-directional prediction blocks, PW and PH are set to 2 and 3, respectively. First, (W+N-3) × (H+N-4) reference pixels are extracted from the reference picture, where the upper-left position of the reference pixels is identified by (MVXInt[0] + x - N/2 + 2, MVXInt[1] + y - N/2 + 2). Then, the first column is copied to its left side to obtain (W+N-2) columns, and the (W+N-1)-th column is generated by copying the (W+N-2)-th column. Finally, the first row is copied above to obtain (H+N-3) rows, and then the (H+N-2)-th and (H+N-1)-th rows are generated by copying the (H+N-3)-th row.
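To make the fetch geometry of examples 5.1 and 5.2 concrete, the sketch below computes the size and top-left position of the reduced reference area for one prediction direction; the offsets off1/off2 correspond to the "+1"/"+2" terms in the examples, and the function name is illustrative only.

def reduced_fetch_region(x, y, mvx_int, mvy_int, w, h, n, pw, ph, off1, off2):
    # Returns (left, top, width, height) of the (W+N-1-PW) x (H+N-1-PH) area
    # extracted from the reference picture before boundary repetition.
    width = w + n - 1 - pw
    height = h + n - 1 - ph
    left = mvx_int + x - n // 2 + off1
    top = mvy_int + y - n // 2 + off2
    return (left, top, width, height)

# Example 5.1, 4x16 block with an 8-tap filter (N = 8), PW = PH = 1, off1 = off2 = 1:
# the fetched area is 10 x 22 instead of the usual 11 x 23.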

Claims (22)

1. A video processing method, comprising:
determining one or more parameters of a current video block during a transition between a current video block and a bitstream of the current video block, wherein the one or more parameters of the current video block include at least one of a size of the current video block and a shape of the current video block;
determining a motion vector precision for the current video block based at least on the one or more parameters of the current video block; and
performing the conversion based on the determined precision;
wherein the current video block is converted in an MMVD (motion vector difference Merge mode) mode, and the MMVD mode uses a motion vector expression including a motion direction, a motion magnitude distance, and a starting point as a base Merge candidate of the current video block;
wherein performing the conversion based on the determined precision comprises:
modifying the accuracy of one or more base Merge candidates in the base Merge candidate list from which the starting point is selected, based on the determined accuracy.
2. The method of claim 1, wherein performing the conversion based on the determined precision comprises: modifying the precision of the starting point based on the determined precision.
3. The method of claim 2, wherein modifying the precision of the starting point comprises:
in response to the determined precision being integer-pixel precision, modifying the starting point to integer-pixel precision.
4. The method of claim 2, wherein modifying the precision of the starting point comprises:
in response to the determined precision being one-half pixel precision, modifying the starting point to one-half pixel precision.
5. The method of any of claims 2 to 4, wherein modifying the precision of the starting point comprises:
the starting point is modified via rounding.
6. The method of claim 1, wherein modifying the precision of one or more base Merge candidates in the base Merge candidate list comprises:
modifying the one or more base Merge candidates via rounding when constructing the base Merge candidate list.
7. The method of claim 1, wherein modifying the accuracy of one or more base Merge candidates in the base Merge candidate list comprises:
after the base Merge candidate list is constructed, the one or more base Merge candidates are modified through rounding.
8. The method of any of claims 1-4, wherein performing the conversion based on the determined precision comprises:
selecting the motion magnitude distance based on the determined accuracy.
9. The method of claim 8, wherein selecting the motion magnitude distance based on the determined precision comprises:
selecting the motion magnitude distance with the same or lower precision in response to the determined precision being integer pixel precision or one-half pixel precision.
10. The method of claim 8, wherein performing the conversion based on the determined precision comprises:
modifying binarization of a motion amplitude index in response to not allowing K motion amplitudes in the MMVD mode based on the determined precision, wherein K is an integer greater than 0.
11. The method of any of claims 1-4, wherein performing the conversion based on the determined precision comprises:
the finally derived MV (motion vector) is modified based on the determined precision.
12. The method of claim 11, wherein modifying the finally derived MV based on the determined precision comprises:
modifying the finally derived MV via rounding based on the determined precision.
13. The method of any one of claims 1 to 4,
the constraints on the accuracy are different for bi-directional prediction and uni-directional prediction.
14. The method of any one of claims 1 to 4,
the limitations on the precision are different for different video blocks of different sizes or shapes.
15. The method of claim 1, wherein the one or more parameters of the current video block further comprise at least one of a dimension and a prediction direction of the current video block;
the method further comprises the following steps:
determining MMVD side information based at least on the one or more parameters of the current video block; and
performing the converting based at least on the MMVD side information.
16. The method of claim 15, wherein the MMVD side-information comprises: at least one of a motion magnitude distance table from which the motion magnitude distance of the current video block is selected and a motion direction table from which the motion direction of the current video block is selected.
17. The method of claim 16, wherein the motion magnitude distance table is a table with full integer precision and the motion magnitude distance table is predefined or signaled.
18. The method of any of claims 15 to 17, further comprising:
in response to the starting point being associated with a motion vector of fractional precision, modifying the starting point to integer precision for deriving a final MV (motion vector) in the MMVD mode.
19. A video processing apparatus comprising a processor configured to implement the method of any of claims 1 to 18.
20. The device of claim 19, wherein the device is a video encoder.
21. The device of claim 19, wherein the device is a video decoder.
22. A computer-readable recording medium having recorded thereon a program including codes for causing a processor to implement the method of any one of claims 1 to 18.
CN202080008062.0A 2019-01-12 2020-01-13 Improvement of MMVD Active CN113273216B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CNPCT/CN2019/071503 2019-01-12
CN2019071503 2019-01-12
PCT/CN2020/071848 WO2020143837A1 (en) 2019-01-12 2020-01-13 Mmvd improvement

Publications (2)

Publication Number Publication Date
CN113273216A CN113273216A (en) 2021-08-17
CN113273216B true CN113273216B (en) 2022-09-13

Family

ID=71520954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080008062.0A Active CN113273216B (en) 2019-01-12 2020-01-13 Improvement of MMVD

Country Status (2)

Country Link
CN (1) CN113273216B (en)
WO (1) WO2020143837A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023118273A1 (en) * 2021-12-21 2023-06-29 Interdigital Vc Holdings France, Sas Mmvd (merged motion vector difference) using depth map and/or motion map

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016078511A1 (en) * 2014-11-18 2016-05-26 Mediatek Inc. Method of bi-prediction video coding based on motion vectors from uni-prediction and merge candidate
CN107113440A (en) * 2014-10-31 2017-08-29 三星电子株式会社 The video encoder and video decoding apparatus and its method of coding are skipped using high accuracy
CN107690809A (en) * 2015-06-11 2018-02-13 高通股份有限公司 Use space and/or the sub- predicting unit motion vector prediction of time movable information
CN108353184A (en) * 2015-11-05 2018-07-31 联发科技股份有限公司 The method and apparatus of the inter-prediction using average motion vector for coding and decoding video
CN109076236A (en) * 2016-05-13 2018-12-21 高通股份有限公司 The merging candidate item of motion-vector prediction for video coding

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2488816A (en) * 2011-03-09 2012-09-12 Canon Kk Mapping motion vectors from a plurality of reference frames to a single reference frame
US9686559B2 (en) * 2012-07-03 2017-06-20 Sharp Kabushiki Kaisha Image decoding device, and image encoding device
US10887597B2 (en) * 2015-06-09 2021-01-05 Qualcomm Incorporated Systems and methods of determining illumination compensation parameters for video coding

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107113440A (en) * 2014-10-31 2017-08-29 三星电子株式会社 The video encoder and video decoding apparatus and its method of coding are skipped using high accuracy
WO2016078511A1 (en) * 2014-11-18 2016-05-26 Mediatek Inc. Method of bi-prediction video coding based on motion vectors from uni-prediction and merge candidate
CN107690809A (en) * 2015-06-11 2018-02-13 高通股份有限公司 Use space and/or the sub- predicting unit motion vector prediction of time movable information
CN108353184A (en) * 2015-11-05 2018-07-31 联发科技股份有限公司 The method and apparatus of the inter-prediction using average motion vector for coding and decoding video
CN109076236A (en) * 2016-05-13 2018-12-21 高通股份有限公司 The merging candidate item of motion-vector prediction for video coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CE 4: Extension on MMVD (Test 4.2.5); Xu Chen et al.; Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11; 2019-01-10; full text *

Also Published As

Publication number Publication date
CN113273216A (en) 2021-08-17
WO2020143837A1 (en) 2020-07-16

Similar Documents

Publication Publication Date Title
CN110581998B (en) Video processing method, apparatus and computer-readable recording medium
CN110620933B (en) Different precisions of different reference lists
CN113711589B (en) Half-pixel interpolation filter in inter-frame coding and decoding mode
CN113170181A (en) Affine inheritance method in intra-block copy mode
CN110677668B (en) Spatial motion compression
CN113196777B (en) Reference pixel padding for motion compensation
CN113273216B (en) Improvement of MMVD
CN113366839A (en) Refined quantization step in video coding and decoding
CN110719475B (en) Shape dependent interpolation order
WO2020143830A1 (en) Integer mv motion compensation
CN110677650B (en) Reducing complexity of non-adjacent mere designs
CN113574867B (en) MV precision constraint
CN113273208A (en) Improvement of affine prediction mode
CN113711592B (en) One-half pixel interpolation filter in intra block copy coding mode

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant