CN113994672B - Method and apparatus for video encoding and decoding using triangle prediction - Google Patents


Info

Publication number
CN113994672B
Authority
CN
China
Prior art keywords
list
prediction
motion vector
candidate
merge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202080042822.XA
Other languages
Chinese (zh)
Other versions
CN113994672A (en)
Inventor
王祥林
陈漪纹
修晓宇
马宗全
朱弘正
叶水明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Publication of CN113994672A
Application granted
Publication of CN113994672B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/52: Processing of motion vectors by encoding by predictive encoding
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/70: Methods or arrangements characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Methods and apparatus for video encoding and decoding are provided. The method comprises the following steps: partitioning a video picture into a plurality of Coding Units (CUs), wherein at least one CU of the plurality of CUs is further partitioned into two Prediction Units (PUs), the two PUs including at least one geometrically shaped PU; constructing a first merge list comprising a plurality of candidates, wherein each candidate comprises one or more motion vectors; and obtaining a uni-directional prediction merge list for the geometrically shaped PU by directly selecting one or more motion vectors from the first merge list.
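The direct-selection step in the abstract can be sketched as follows; this is a hypothetical Python illustration, not the normative process: the candidate representation, the alternating list-0/list-1 preference, and the list size are all assumptions.

```python
# Hypothetical sketch: derive a uni-directional prediction merge list for a
# geometrically shaped PU by picking motion vectors directly from a regular
# merge list. The alternating L0/L1 preference is an assumed rule, not the
# normative VVC derivation.

def build_uni_pred_list(first_merge_list, max_size=5):
    """Select one uni-directional MV per candidate, preferring list 0 for
    even candidate indices and list 1 for odd ones (an assumed rule)."""
    uni_list = []
    for idx, cand in enumerate(first_merge_list):
        if len(uni_list) >= max_size:
            break
        # cand is a dict that may hold a list-0 MV, a list-1 MV, or both
        preferred = 'L0' if idx % 2 == 0 else 'L1'
        fallback = 'L1' if preferred == 'L0' else 'L0'
        mv = cand.get(preferred) or cand.get(fallback)
        if mv is not None:
            uni_list.append(mv)
    return uni_list

merge_list = [
    {'L0': (4, -2), 'L1': (3, 1)},   # bi-directional candidate
    {'L0': (0, 0)},                   # uni-directional, list 0 only
    {'L1': (-1, 5)},                  # uni-directional, list 1 only
]
print(build_uni_pred_list(merge_list))  # [(4, -2), (0, 0), (-1, 5)]
```

Because each entry is taken directly from the first merge list, no separate candidate-derivation pass is needed for the geometric PUs, which is the point of the claimed method.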

Description

Method and apparatus for video encoding and decoding using triangle prediction
Cross Reference to Related Applications
The present application claims priority to U.S. Provisional Application No. 62/838,935, entitled "Video Coding with Triangle Prediction," filed on April 25, 2019, the entire content of which is incorporated herein by reference for all purposes.
Technical Field
The present application relates generally to video coding and compression, and in particular, but not exclusively, to methods and apparatus for motion compensated prediction using a triangular prediction unit (i.e., a special case of a geometrically partitioned prediction unit) in video coding.
Background
Various electronic devices (such as digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video game consoles, smart phones, video teleconferencing devices, video streaming devices, etc.) support digital video. The electronic devices transmit, receive, encode, decode, and/or store digital video data by performing video compression/decompression. Digital video devices implement video codec techniques, such as those described in the standards defined by Versatile Video Coding (VVC), the Joint Exploration test Model (JEM), MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards.
Video coding typically uses prediction methods (e.g., inter-prediction, intra-prediction) that exploit redundancy present in video images or sequences. An important goal of video codec technology is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality. As evolving video services become available, there is a need for codec techniques with better codec efficiency.
Video compression typically includes performing spatial (intra) prediction and/or temporal (inter) prediction to reduce or remove redundancy inherent in the video data. For block-based video coding, a video frame is partitioned into one or more slices, each slice having multiple video blocks, which may also be referred to as Coding Tree Units (CTUs). Each CTU may contain one Coding Unit (CU) or be recursively split into smaller CUs until a predefined minimum CU size is reached. Each CU (also referred to as a leaf CU) contains one or more Transform Units (TUs), and each CU also contains one or more Prediction Units (PUs). Each CU may be coded in intra, inter, or IBC mode. Video blocks in an intra-coded (I) slice of a video frame are encoded using spatial prediction with respect to reference samples in neighboring blocks within the same video frame. Video blocks in an inter-coded (P or B) slice of a video frame may use spatial prediction with respect to reference samples in neighboring blocks within the same video frame, or temporal prediction with respect to reference samples in other previous and/or future reference video frames.
A prediction block for a current video block to be encoded is derived based on spatial prediction or temporal prediction of a reference block (e.g., a neighboring block) that has been previously encoded. The process of finding the reference block may be accomplished by a block matching algorithm. Residual data representing pixel differences between a current block to be encoded and a prediction block is referred to as a residual block or prediction error. The inter-coded block is encoded according to the residual block and a motion vector pointing to a reference block in the reference frame that forms the prediction block. The process of determining motion vectors is commonly referred to as motion estimation. The intra-coded block is coded according to the intra-prediction mode and the residual block. For further compression, the residual block is transformed from the pixel domain to a transform domain (e.g., frequency domain), resulting in residual transform coefficients, which may then be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, which are then entropy encoded into a video bitstream to achieve even greater compression.
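As a toy illustration of the pipeline just described (residual computation, quantization, and scanning of the two-dimensional coefficient array into a one-dimensional vector), the following sketch uses a plain uniform quantizer in place of a real transform; the block values and step size are made up for illustration.

```python
# Toy residual pipeline: subtract the predictor from the current block,
# quantize (uniform quantizer stands in for transform + quantization),
# then zigzag-scan the 2-D array into a 1-D coefficient vector.

def residual_block(current, predictor):
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, predictor)]

def quantize(block, step):
    return [[round(v / step) for v in row] for row in block]

def zigzag_scan(block):
    """Visit anti-diagonals in order, alternating traversal direction."""
    n = len(block)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [block[r][c] for r, c in order]

cur = [[10, 12], [14, 16]]
pred = [[8, 12], [13, 20]]
res = residual_block(cur, pred)              # [[2, 0], [1, -4]]
coeffs = zigzag_scan(quantize(res, 1))
print(coeffs)  # [2, 0, 1, -4]
```

The 1-D vector produced by the scan is what a real codec would then entropy-encode into the bitstream.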
The encoded video bitstream is then saved in a computer-readable storage medium (e.g., flash memory) to be accessed by another electronic device having digital video capability, or to be transmitted directly to the electronic device in a wired or wireless manner. The electronic device then performs video decompression (which is the reverse of the video compression described above) by, e.g., parsing the encoded video bitstream to obtain syntax elements from the bitstream and reconstructing the digital video data into its original format from the encoded video bitstream based at least in part on the syntax elements obtained from the bitstream, and the electronic device presents the reconstructed digital video data on a display of the electronic device.
As digital video quality moves from high definition to 4K×2K or even 8K×4K, the amount of video data to be encoded/decoded grows exponentially. How to encode/decode the video data more efficiently while maintaining the image quality of the decoded video data is a long-standing challenge.
At a Joint Video Experts Team (JVET) meeting, JVET defined the first draft of Versatile Video Coding (VVC) and the VVC Test Model 1 (VTM1) encoding method. The decision included using a quadtree with a nested multi-type tree of binary and ternary splits as the initial new coding block structure of VVC. Since then, the reference software VTM and a draft VVC decoding process have been developed during JVET meetings to implement the encoding method.
Disclosure of Invention
In general, this disclosure describes examples of techniques related to motion compensated prediction using geometrically shaped prediction units in video coding.
According to a first aspect of the present disclosure, there is provided a method for video coding using geometric prediction, comprising: partitioning a video picture into a plurality of coding units, CUs, wherein at least one CU of the plurality of CUs is further partitioned into two prediction units, PUs, the two PUs including at least one geometrically shaped PU; constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates is a motion vector comprising a list 0 motion vector, or a list 1 motion vector, or both a list 0 motion vector and a list 1 motion vector; receiving a signaled first index value indicating a first candidate selected from the first merge list; receiving a signaled second index value indicating a second candidate selected from the first merge list; receiving a signaled first binary flag indicating whether to select the list 0 motion vector of the first candidate or the list 1 motion vector of the first candidate for a first PU of the geometric prediction; and inferring, based on the first binary flag and based on whether the current picture uses backward prediction, a second binary flag indicating whether to select the list 0 motion vector of the second candidate or the list 1 motion vector of the second candidate for a second PU of the geometric prediction.
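A minimal sketch of the flag-inference step in the first aspect, assuming one plausible derivation rule (the disclosure does not fix the rule at this point, so the mapping below is an assumption for illustration only):

```python
# Hypothetical flag-inference rule: the decoder receives a binary flag for
# the first geometric PU and derives the second PU's flag from it plus the
# backward-prediction property of the current picture. The exact mapping is
# an illustrative assumption, not the disclosed normative rule.

def infer_second_flag(first_flag, picture_uses_backward_pred):
    """Return the inferred list-selection flag (0 = list 0, 1 = list 1)
    for the second geometric PU."""
    if picture_uses_backward_pred:
        # The two lists reference distinct directions; reuse the signaled flag.
        return first_flag
    # Low-delay case: put the two PUs on opposite lists so their
    # uni-directional predictions come from different candidates.
    return 1 - first_flag

print(infer_second_flag(0, False))  # 1
print(infer_second_flag(1, True))   # 1
```

The point of the inference is bit savings: only one binary flag is signaled for the two PUs, and the decoder reconstructs the other deterministically.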
According to a second aspect of the present disclosure, there is provided a method for video coding using geometric prediction, comprising: partitioning a video picture into a plurality of coding units, CUs, wherein at least one CU of the plurality of CUs is further partitioned into two prediction units, PUs, the two PUs including at least one geometrically shaped PU; constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates is a motion vector comprising a list 0 motion vector, or a list 1 motion vector, or both a list 0 motion vector and a list 1 motion vector; receiving a signaled first index value indicating a first candidate selected from the first merge list; receiving a signaled second index value indicating a second candidate selected from the first merge list; inferring whether to select the list 0 motion vector of the first candidate or the list 1 motion vector of the first candidate for a first PU of the geometric prediction; and inferring whether to select the list 0 motion vector of the second candidate or the list 1 motion vector of the second candidate for a second PU of the geometric prediction.
According to a third aspect of the present disclosure, there is provided an apparatus for video coding using geometric prediction, comprising: one or more processors; and a memory configured to store instructions executable by the one or more processors; wherein the one or more processors, upon execution of the instructions, are configured to: partition a video picture into a plurality of coding units, CUs, wherein at least one CU of the plurality of CUs is further partitioned into two prediction units, PUs, the two PUs including at least one geometrically shaped PU; construct a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates is a motion vector comprising a list 0 motion vector, or a list 1 motion vector, or both a list 0 motion vector and a list 1 motion vector; receive a signaled first index value indicating a first candidate selected from the first merge list; receive a signaled second index value indicating a second candidate selected from the first merge list; receive a signaled first binary flag indicating whether to select the list 0 motion vector of the first candidate or the list 1 motion vector of the first candidate for a first PU of the geometric prediction; and infer, based on the first binary flag and based on whether the current picture uses backward prediction, a second binary flag indicating whether to select the list 0 motion vector of the second candidate or the list 1 motion vector of the second candidate for a second PU of the geometric prediction.
Drawings
A more detailed description of examples of the present disclosure will be presented by reference to specific examples shown in the drawings. Whereas these drawings depict only some examples and are not therefore to be considered limiting of scope, the examples will be described and explained with additional specificity and detail through the use of the accompanying drawings.
Fig. 1 is a block diagram illustrating an exemplary video encoder according to some embodiments of the present disclosure.
Fig. 2 is a block diagram illustrating an exemplary video decoder according to some embodiments of the present disclosure.
Fig. 3 is a schematic diagram illustrating a quadtree plus binary tree (QTBT) structure, according to some embodiments of the present disclosure.
Fig. 4 is a schematic diagram illustrating an example of a picture divided into CTUs according to some embodiments of the present disclosure.
Fig. 5 is a schematic diagram illustrating a multi-type tree partitioning mode according to some embodiments of the present disclosure.
Fig. 6 is a schematic diagram illustrating partitioning of CUs into triangle prediction units according to some embodiments of the present disclosure.
Fig. 7 is a schematic diagram illustrating the location of neighboring blocks according to some embodiments of the present disclosure.
Fig. 8 is a schematic diagram illustrating the locations of spatial merging candidates according to some embodiments of the present disclosure.
Fig. 9 is a schematic diagram illustrating motion vector scaling of temporal merging candidates according to some embodiments of the present disclosure.
Fig. 10 is a schematic diagram illustrating candidate locations of temporal merging candidates according to some embodiments of the present disclosure.
Fig. 11A is a schematic diagram illustrating one example of unidirectional prediction Motion Vector (MV) selection for a triangular prediction mode according to some embodiments of the present disclosure.
Fig. 11B is a schematic diagram illustrating another example of unidirectional prediction Motion Vector (MV) selection for a triangular prediction mode according to some embodiments of the present disclosure.
Fig. 12A is a schematic diagram illustrating one example of unidirectional prediction MV selection for a triangular prediction mode according to some embodiments of the present disclosure.
Fig. 12B is a schematic diagram illustrating another example of unidirectional prediction MV selection for a triangular prediction mode according to some embodiments of the present disclosure.
Fig. 12C is a schematic diagram illustrating another example of unidirectional prediction MV selection for a triangular prediction mode according to some embodiments of the present disclosure.
Fig. 12D is a schematic diagram illustrating another example of unidirectional prediction MV selection for a triangular prediction mode according to some embodiments of the present disclosure.
Fig. 13 is a schematic diagram illustrating an example of unidirectional prediction MV selection for a triangular prediction mode according to some embodiments of the present disclosure.
Fig. 14 is a block diagram illustrating an exemplary apparatus for video encoding and decoding according to some embodiments of the present disclosure.
Fig. 15 is a flowchart illustrating an exemplary process for video codec for motion compensated prediction using a geometry prediction unit according to some embodiments of the present disclosure.
Detailed Description
Reference will now be made in detail to the specific embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. It will be apparent to those of ordinary skill in the art that various alternatives may be used. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
Reference throughout this specification to "one embodiment," "an example," "some embodiments," "some examples," or similar language means that a particular feature, structure, or characteristic described is included in at least one embodiment or example. Features, structures, elements, or characteristics described in connection with one or some embodiments may be applicable to other embodiments unless explicitly stated otherwise.
Throughout this disclosure, unless explicitly stated otherwise, the terms "first," "second," "third," and the like, are used as nomenclature for referring to related elements (e.g., devices, components, compositions, steps, etc.) only, and do not denote any spatial or temporal order. For example, a "first device" and a "second device" may refer to two separately formed devices or two portions, components, or operational states of the same device, and may be arbitrarily named.
As used herein, the term "if" or "when" is understood to mean "when" or "in response to" depending on the context. These terms, if present in the claims, may not indicate that the relevant limitations or features are conditional or optional.
The terms "module," "sub-module," "circuit," "sub-circuit," "circuitry," "sub-circuitry," "unit," or "sub-unit" may include memory (shared, dedicated, or combination) that stores code or instructions executable by one or more processors. A module may include one or more circuits with or without stored code or instructions. A module or circuit may include one or more components connected directly or indirectly. These components may or may not be physically attached to each other or positioned adjacent to each other.
The units or modules may be implemented purely in software, purely in hardware or by a combination of hardware and software. In a software-only implementation, for example, a unit or module may include functionally related code blocks or software components that are directly or indirectly linked together in order to perform a particular function.
Fig. 1 illustrates a block diagram showing an exemplary block-based hybrid video encoder 100 that may be used in conjunction with many video codec standards that use block-based processing. In the encoder 100, a video frame is divided into a plurality of video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction method or an intra prediction method. In inter prediction, one or more predictors are formed by motion estimation and motion compensation based on pixels from a previously reconstructed frame. In intra prediction, a predictor is formed based on reconstructed pixels in the current frame. Through mode decision, the best predictor may be selected to predict the current block.
The prediction residual, which represents the difference between the current video block and its predictor, is sent to transform circuit 102. The transform coefficients are then sent from the transform circuit 102 to the quantization circuit 104 for entropy reduction. The quantized coefficients are then fed to entropy encoding circuitry 106 to generate a compressed video bitstream. As shown in fig. 1, prediction related information 110 (such as video block partition information, motion vectors, reference picture indices, and intra prediction modes) from inter-prediction circuitry and/or intra-prediction circuitry 112 is also fed through entropy encoding circuitry 106 and saved into compressed video bitstream 114.
In the encoder 100, decoder-related circuitry is also required for prediction purposes in order to reconstruct the pixels. First, the prediction residual is reconstructed by inverse quantization 116 and inverse transform circuit 118. The reconstructed prediction residual is combined with the block predictor 120 to generate unfiltered reconstructed pixels of the current video block.
Spatial prediction (or "intra prediction") predicts a current video block using pixels from samples (which are referred to as reference samples) of neighboring blocks already encoded in the same video frame as the current video block.
Temporal prediction (also referred to as "inter prediction") predicts a current video block using reconstructed pixels from already encoded video pictures. Temporal prediction reduces the inherent temporal redundancy in video signals. The temporal prediction signal of a given Coding Unit (CU) or coding block is typically signaled by one or more Motion Vectors (MVs) indicating the amount and direction of motion between the current CU and its temporal reference. Furthermore, if a plurality of reference pictures are supported, one reference picture index for identifying from which reference picture in the reference picture memory the temporal prediction signal originates is additionally transmitted.
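A minimal sketch of the motion-compensation step described above, assuming integer-pel motion vectors and ignoring sub-pel interpolation and reference-index handling (block sizes and sample values are made up):

```python
# Minimal motion compensation: the predictor for the current block is
# fetched from a reference frame at an integer MV offset. Real codecs add
# sub-pel interpolation filters and reference picture selection.

def motion_compensate(reference, x, y, w, h, mv):
    """Copy a w*h predictor from `reference`, displaced by mv=(dx, dy)
    relative to the current block position (x, y)."""
    dx, dy = mv
    return [row[x + dx : x + dx + w] for row in reference[y + dy : y + dy + h]]

# 8x8 reference frame with sample value r*10 + c at row r, column c
ref = [[r * 10 + c for c in range(8)] for r in range(8)]
pred = motion_compensate(ref, x=2, y=2, w=2, h=2, mv=(1, -1))
print(pred)  # [[13, 14], [23, 24]]
```

The MV (1, -1) shifts the fetch window one sample right and one sample up, which is exactly the "amount and direction of motion" the signaled MV conveys.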
After performing spatial and/or temporal prediction, intra/inter mode decision circuit 121 in encoder 100 selects the best prediction mode, e.g., based on a rate-distortion optimization method. Then subtracting the block predictor 120 from the current video block; and decorrelates the resulting prediction residual using transform circuit 102 and quantization circuit 104. The resulting quantized residual coefficients are dequantized by dequantization circuit 116 and inverse transformed by inverse transformation circuit 118 to form reconstructed residuals, which are then added back to the prediction block to form the reconstructed signal of the CU. The reconstructed CU may be further applied with loop filtering 115, such as a deblocking filter, a Sample Adaptive Offset (SAO), and/or an Adaptive Loop Filter (ALF), before being placed in a reference picture memory of a picture buffer 117 and used to encode and decode subsequent video blocks. To form the output video bitstream 114, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy encoding unit 106 for further compression and packaging to form a bitstream.
For example, a deblocking filter is available in AVC, HEVC, and the current version of VVC. In HEVC, an additional loop filter called Sample Adaptive Offset (SAO) is defined to further improve coding efficiency. In the current version of the VVC standard, yet another loop filter called Adaptive Loop Filter (ALF) is being actively investigated, and it is very likely to be included in the final standard.
These loop filter operations are optional. Performing these operations helps to improve coding efficiency and visual quality. They may also be turned off based on decisions presented by the encoder 100 to save computational complexity.
It should be noted that intra prediction is typically based on unfiltered reconstructed pixels, while inter prediction is based on filtered reconstructed pixels (with the encoder 100 turning on these filter options).
Fig. 2 is a block diagram illustrating an exemplary block-based video decoder 200 that may be used in connection with many video codec standards. The decoder 200 is similar to the reconstruction-related portion residing in the encoder 100 of fig. 1. In the decoder 200, an input video bitstream 201 is first decoded by entropy decoding 202 to derive quantized coefficient levels and prediction related information. The quantized coefficient levels are then processed by inverse quantization 204 and inverse transformation 206 to obtain reconstructed prediction residues. The block predictor mechanism implemented in the intra/inter mode selector 212 is configured to perform intra prediction 208 or motion compensation 210 based on the decoded prediction information. A set of unfiltered reconstructed pixels is obtained by summing the reconstructed prediction residual from the inverse transform 206 and the prediction output generated by the block predictor mechanism using a summer 214.
The reconstructed block may further pass through a loop filter 209 before being stored in a picture buffer 213 that serves as a reference picture memory. The reconstructed video in the picture buffer 213 may be sent to drive a display device and used to predict subsequent video blocks. With loop filter 209 open, a filtering operation is performed on these reconstructed pixels to derive the final reconstructed video output 222.
The video encoding/decoding standards mentioned above (such as VVC, JEM, HEVC, and MPEG-4 Part 10) are conceptually similar. For example, they all use block-based processing. The block partitioning schemes of some of the standards are set forth below.
HEVC is based on a hybrid block-based motion-compensated transform coding architecture. The basic unit for compression is called a Coding Tree Unit (CTU). For the 4:2:0 chroma format, the maximum CTU size is defined as up to 64×64 luma samples plus two corresponding 32×32 blocks of chroma samples. Each CTU may contain one Coding Unit (CU) or be recursively split into four smaller CUs until a predefined minimum CU size is reached. Each CU (also referred to as a leaf CU) includes one or more Prediction Units (PUs) and a tree of Transform Units (TUs).
In general, except for monochrome content, a CTU may include one luma Coding Tree Block (CTB) and two corresponding chroma CTBs; a CU may include one luma Coding Block (CB) and two corresponding chroma CBs; a PU may include one luma Prediction Block (PB) and two corresponding chroma PBs; and a TU may include one luma Transform Block (TB) and two corresponding chroma TBs. However, exceptions can occur because the minimum TB size is 4×4 for both luma and chroma (i.e., 2×2 chroma TBs are not supported for the 4:2:0 color format), and each intra chroma CB always has only one intra chroma PB regardless of the number of intra luma PBs in the corresponding intra luma CB.
For an intra CU, the luma CB may be predicted by one or four luma PBs, and each of the two chroma CBs is always predicted by one chroma PB, where each luma PB has one intra luma prediction mode and the two chroma CBs share one intra chroma prediction mode. Moreover, for an intra CU, the TB size cannot be larger than the PB size. In each PB, intra prediction is applied to predict the samples of each TB inside the PB from the neighboring reconstructed samples of the TB. For each PB, in addition to 33 directional intra prediction modes, DC mode and planar mode are supported, to predict flat regions and gradually varying regions, respectively.
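The DC mode mentioned above can be sketched as follows; boundary-availability handling is simplified, and the neighbour sample arrays are illustrative:

```python
# Toy DC intra prediction: every sample of the block is predicted as the
# average of the already-reconstructed neighbour samples above and to the
# left. Real codecs additionally handle unavailable neighbours and edge
# filtering, which are omitted here.

def dc_predict(top, left, w, h):
    """Fill a w*h block with the rounded mean of the neighbour samples."""
    neighbours = list(top) + list(left)
    dc = round(sum(neighbours) / len(neighbours))
    return [[dc] * w for _ in range(h)]

block = dc_predict(top=[100, 102, 104, 106], left=[98, 100, 102, 104], w=4, h=4)
print(block[0])  # [102, 102, 102, 102]
```

A flat predictor like this works well exactly in the "flat regions" the text says DC mode targets; planar and directional modes handle gradients and edges.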
For each inter PU, one of three prediction modes, including inter, skip, and merge, may be selected. In general, a Motion Vector Competition (MVC) scheme is introduced to select a motion candidate from a given candidate set comprising spatial and temporal motion candidates. Multiple-reference motion estimation allows the best reference to be found among two possible reconstructed reference picture lists (namely, list 0 and list 1). For inter mode (referred to as AMVP mode, where AMVP stands for advanced motion vector prediction), an inter prediction indicator (list 0, list 1, or bi-prediction), a reference index, a motion candidate index, a Motion Vector Difference (MVD), and the prediction residual are transmitted. For skip mode and merge mode, only a merge index is transmitted, and the current PU inherits the inter prediction indicator, the reference index, and the motion vector from the neighboring PU referred to by the coded merge index. In the case of a skip-coded CU, the residual signal is also omitted.
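The merge-mode inheritance just described can be sketched as follows; the candidate field names are illustrative assumptions:

```python
# Merge-mode inheritance sketch: only a merge index is parsed from the
# bitstream, and the current PU copies the chosen candidate's inter
# prediction indicator, reference index, and motion vector.

def apply_merge_mode(merge_index, candidate_list):
    cand = candidate_list[merge_index]
    return {
        'inter_dir': cand['inter_dir'],  # 'L0', 'L1', or 'BI' (bi-prediction)
        'ref_idx': cand['ref_idx'],
        'mv': cand['mv'],
    }

candidates = [
    {'inter_dir': 'L0', 'ref_idx': 0, 'mv': (3, -1)},
    {'inter_dir': 'BI', 'ref_idx': 1, 'mv': (0, 2)},
]
print(apply_merge_mode(1, candidates)['mv'])  # (0, 2)
```

This is why merge/skip modes are cheap to signal: the whole motion description collapses to a single index into the candidate list.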
A joint exploration test model (JEM) builds on top of the HEVC test model. The basic encoding and decoding flow of HEVC is kept unchanged in JEM; however, the design elements of the most important modules (including the block structure, intra and inter prediction, residual transform, loop filter and entropy codec modules) are slightly modified and additional codec tools are added. The following new codec features are included in JEM.
In HEVC, a CTU is split into CUs by using a quadtree structure, denoted as a coding tree, to adapt to various local characteristics. The decision on whether to code a picture area using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU can be further split into one, two, or four PUs according to the PU splitting type. Inside one PU, the same prediction process is applied, and the relevant information is transmitted to the decoder on a PU basis. After obtaining the residual block by applying the prediction process based on the PU splitting type, the CU can be partitioned into Transform Units (TUs) according to another quadtree structure, similar to the coding tree for the CU. One of the key features of the HEVC structure is that it has multiple partition concepts, including CU, PU, and TU.
Fig. 3 is a schematic diagram illustrating a quadtree plus binary tree (QTBT) structure, according to some embodiments of the present disclosure.
The QTBT structure removes the concept of multiple partition types, i.e., it removes the distinction between the CU, PU, and TU concepts and supports more flexibility for CU partition shapes. In the QTBT block structure, a CU can have either a square or a rectangular shape. As shown in Fig. 3, a Coding Tree Unit (CTU) is first partitioned by a quadtree structure. The quadtree leaf nodes may be further partitioned by a binary tree structure. There are two split types in binary tree splitting: symmetric horizontal splitting and symmetric vertical splitting. The binary tree leaf nodes are called Coding Units (CUs), and such a partition is used for the prediction and transform processing without any further partitioning. This means that the CU, PU, and TU have the same block size in the QTBT coding block structure. In JEM, a CU sometimes consists of Coding Blocks (CBs) of different color components; e.g., in the case of P and B slices of the 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs. And a CU sometimes consists of a CB of a single component; e.g., in the case of I slices, one CU contains only one luma CB or only two chroma CBs.
The following parameters are defined for the QTBT partitioning scheme:
- CTU size: the root node size of the quadtree (the same concept as in HEVC);
- MinQTSize: the minimum allowed quadtree leaf node size;
- MaxBTSize: the maximum allowed binary tree root node size;
- MaxBTDepth: the maximum allowed binary tree depth;
- MinBTSize: the minimum allowed binary tree leaf node size.
In one example of a QTBT partition structure, the CTU size is set to 128×128 luma samples with two corresponding 64×64 chroma sample blocks (for the 4:2:0 chroma format), MinQTSize is set to 16×16, MaxBTSize is set to 64×64, MinBTSize (for both width and height) is set to 4, and MaxBTDepth is set to 4. Quadtree partitioning is first applied to the CTU to produce quadtree leaf nodes. The quadtree leaf nodes may have a size from 16×16 (i.e., MinQTSize) to 128×128 (i.e., the CTU size). If a quadtree leaf node is 128×128, it will not be further partitioned by the binary tree, since its size exceeds MaxBTSize (i.e., 64×64). Otherwise, the quadtree leaf node may be further partitioned by the binary tree. Therefore, the quadtree leaf node is also the root node of the binary tree, and its binary tree depth is 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further partitioning is considered. When the binary tree node has a width equal to MinBTSize (i.e., 4), no further horizontal partitioning is considered. Similarly, when the binary tree node has a height equal to MinBTSize, no further vertical partitioning is considered. The leaf nodes of the binary tree are further processed by prediction and transform processing without any further partitioning. In JEM, the maximum CTU size is 256×256 luma samples.
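The split constraints in this example can be condensed into a small helper. The sketch below is illustrative only (the function and parameter names are not from any specification); it checks which binary splits remain possible for a node under the example parameter values above.

```python
def binary_splits_allowed(width, height, bt_depth,
                          max_bt_size=64, max_bt_depth=4, min_bt_size=4):
    """Return the set of binary splits still allowed for a node under the
    example QTBT parameters above (hypothetical helper, not spec text)."""
    splits = set()
    # A node larger than MaxBTSize is never split by the binary tree.
    if max(width, height) > max_bt_size:
        return splits
    # No further splitting once the binary tree depth reaches MaxBTDepth.
    if bt_depth >= max_bt_depth:
        return splits
    # Width equal to MinBTSize: no further horizontal splitting.
    if width > min_bt_size:
        splits.add("HOR")
    # Height equal to MinBTSize: no further vertical splitting.
    if height > min_bt_size:
        splits.add("VER")
    return splits
```

For example, a 128×128 quadtree leaf node yields an empty set (it exceeds MaxBTSize), while a 4×8 node at binary tree depth 1 allows only a vertical split.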
An example of block partitioning by using the QTBT scheme, together with the corresponding tree representation, is shown in fig. 3. The solid lines indicate quadtree partitioning and the dashed lines indicate binary tree partitioning. As shown in fig. 3, the Coding Tree Unit (CTU) 300 is first partitioned by the quadtree structure, and three of the four quadtree leaf nodes 302, 304, 306, 308 are further partitioned by either the quadtree structure or the binary tree structure. For example, the quadtree leaf node 306 is further partitioned by quadtree partitioning; the quadtree leaf node 304 is further partitioned into two leaf nodes 304a, 304b by binary tree partitioning; and the quadtree leaf node 302 is further partitioned by binary tree partitioning. At each partitioning (i.e., non-leaf) node of the binary tree, one flag is signaled to indicate which partitioning type (i.e., horizontal or vertical) is used, where 0 indicates horizontal partitioning and 1 indicates vertical partitioning. For example, for the quadtree leaf node 304, 0 is signaled to indicate horizontal partitioning, and for the quadtree leaf node 302, 1 is signaled to indicate vertical partitioning. For quadtree partitioning, there is no need to indicate the partitioning type, because quadtree partitioning always partitions a block both horizontally and vertically to produce 4 sub-blocks of equal size.
In addition, the QTBT scheme supports the ability for luma and chroma to have separate QTBT structures. Currently, for P and B slices, the luma and chroma CTBs in one CTU share the same QTBT structure. However, for I slices, the luma CTB is partitioned into CUs by one QTBT structure, and the chroma CTBs are partitioned into chroma CUs by another QTBT structure. This means that a CU in an I slice consists of a coding block of the luma component or coding blocks of the two chroma components, whereas a CU in a P or B slice consists of coding blocks of all three color components.
At a Joint Video Experts Team (JVET) meeting, JVET defined the first draft of Versatile Video Coding (VVC) and the VVC Test Model 1 (VTM1) encoding method. The decision includes using a quadtree with a nested multi-type tree of binary-split and ternary-split coding block structure as the initial new codec feature of VVC.
In VVC, a picture partitioning structure divides the input video into blocks called Coding Tree Units (CTUs). A CTU is split into Coding Units (CUs) using a quadtree with a nested multi-type tree structure, where the leaf coding unit (CU) defines a region sharing the same prediction mode (e.g., intra or inter). Here, the term "unit" defines a region of an image covering all components; the term "block" is used to define a region covering a particular component (e.g., luma), and the blocks of different components may differ in spatial location when a chroma sampling format (such as 4:2:0) is considered.
Dividing a picture into CTUs
Fig. 4 is a schematic diagram illustrating an example of a picture divided into CTUs according to some embodiments of the present disclosure.
In VVC, pictures are divided into CTU sequences, and CTU concepts are the same as those of HEVC. For a picture with three sample arrays, the CTU consists of an nxn block of luma samples and two corresponding blocks of chroma samples. Fig. 4 shows an example of a picture 400 divided into CTUs 402.
The maximum allowed size of the luma block in a CTU is specified as 128×128 (although the maximum size of the luma transform block is 64×64).
Segmentation of CTUs using tree structures
Fig. 5 is a schematic diagram illustrating a multi-type tree partitioning mode according to some embodiments of the present disclosure.
In HEVC, CTUs are partitioned into CUs by using a quad-tree structure, denoted as a coding tree, to accommodate various local characteristics. The decision whether to use inter-picture (temporal) prediction or intra-picture (spatial) prediction to codec a picture region is made at the leaf-CU level. Each leaf CU may be further divided into one, two, or four PUs according to PU division types. Within one PU, the same prediction process is applied and related information is sent to the decoder based on the PU. After obtaining the residual block by applying the prediction process based on the PU partition type, the leaf CU may be partitioned into Transform Units (TUs) according to another quadtree structure of the CU that is similar to the coding tree. One of the key features of the HEVC structure is that it has multiple partitioning concepts including CUs, PUs, and TUs.
In VVC, the concept of multiple partition unit types is replaced with a quadtree with a nested multi-type tree of binary-split and ternary-split structure, i.e., it removes the distinction between the CU, PU and TU concepts (except for a CU whose size is too large for the maximum transform length, which still requires such a distinction) and supports greater flexibility for CU partition shapes. In the coding tree structure, a CU may have a square or rectangular shape. The Coding Tree Unit (CTU) is first partitioned by a quaternary tree (i.e., quadtree) structure. The quaternary leaf nodes may then be further partitioned by a multi-type tree structure. As shown in fig. 5, there are four partition types in the multi-type tree structure: vertical binary partitioning 502 (SPLIT_BT_VER), horizontal binary partitioning 504 (SPLIT_BT_HOR), vertical ternary partitioning 506 (SPLIT_TT_VER), and horizontal ternary partitioning 508 (SPLIT_TT_HOR). The multi-type tree leaf nodes are called Coding Units (CUs), and unless the CU is too large for the maximum transform length, this partition is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CU, PU and TU have the same block size in the quadtree with nested multi-type tree coding block structure. An exception occurs when the maximum supported transform length is smaller than the width or height of a color component of the CU. In VTM1, a CU is composed of Coding Blocks (CBs) of different color components, e.g., one CU contains one luma CB and two chroma CBs (unless the video is monochrome, i.e., has only one color component).
Partitioning a CU into multiple prediction units
In VVC, for each CU partitioned based on the structure shown above, prediction of block content may be performed for the entire CU block or in a sub-block manner as explained in the following paragraphs. Such a predicted operation unit is referred to as a prediction unit (or PU).
In the case of intra prediction (or intra prediction), the size of a PU is typically equal to the size of a CU. In other words, prediction is performed on the entire CU block. For inter prediction (or inter prediction), the size of the PU may be equal to or smaller than the size of the CU. In other words, there are cases where a CU may be divided into multiple PUs for prediction.
Some examples of PU sizes smaller than the CU size include the affine prediction mode, the advanced temporal motion vector prediction (ATMVP) mode, the triangle prediction mode, and so on.
In affine prediction mode, a CU may be divided into multiple 4×4 PUs for prediction. Motion vectors may be derived for each 4 x 4PU and motion compensation may be performed on the 4 x 4PU accordingly. In ATMVP mode, a CU may be partitioned into one or more 8 x 8 PUs for prediction. Motion vectors are derived for each 8 x 8PU and motion compensation may be performed on the 8 x 8PU accordingly. In the triangle prediction mode, a CU may be divided into two triangle shape prediction units. A motion vector is derived for each PU and motion compensation is performed accordingly. For inter prediction, a triangle prediction mode is supported. More details of the triangle prediction mode are set forth below.
Triangle prediction mode (or triangle segmentation mode)
Fig. 6 is a schematic diagram illustrating partitioning of CUs into triangle prediction units according to some embodiments of the present disclosure.
The concept of the triangle prediction mode is to introduce triangular partitions for motion compensated prediction. The triangle prediction mode may also be named the triangle prediction unit mode or the triangle partition mode. As shown in fig. 6, a CU 602 or 604 is divided into two triangular prediction units PU1 and PU2 in either the diagonal or the anti-diagonal direction (i.e., a division from the upper-left corner to the lower-right corner as shown in CU 602, or a division from the upper-right corner to the lower-left corner as shown in CU 604). Each triangular prediction unit in the CU is inter-predicted using its own uni-directional prediction motion vector and reference frame index, which are derived from a uni-directional prediction candidate list. After the triangular prediction units are predicted, an adaptive weighting process is performed on the diagonal edge. Then, transform and quantization processes are applied to the entire CU. Note that this mode is applied only to the skip mode and the merge mode in the current VVC. Although the CUs are shown as square blocks in fig. 6, the triangle prediction mode may also be applied to non-square (i.e., rectangular) shaped CUs.
The unidirectional prediction candidate list may include one or more candidates, and each candidate may be a motion vector. Thus, the terms "uni-directional prediction candidate list", "uni-directional prediction motion vector candidate list", and "uni-directional prediction merge list" are used interchangeably throughout this disclosure; and the terms "uni-directional prediction merge candidate" and "uni-directional prediction motion vector" are also used interchangeably.
Unidirectional prediction motion vector candidate list
Fig. 7 is a schematic diagram illustrating the location of neighboring blocks according to some embodiments of the present disclosure.
In some examples, the uni-directional prediction motion vector candidate list may include two to five uni-directional prediction motion vector candidates. In some other examples, other numbers are possible. The list is derived from neighboring blocks. As shown in fig. 7, the uni-directional prediction motion vector candidate list is derived from seven neighboring blocks, including five spatial neighboring blocks (1 to 5) and two temporal neighboring blocks (6 to 7). The motion vectors of the seven neighboring blocks are collected into a first merge list. Then, the uni-directional prediction candidate list is formed from the motion vectors of the first merge list according to a specific order. Based on that order, the uni-directional prediction motion vectors from the first merge list are placed in the uni-directional prediction motion vector candidate list first, followed by the reference picture list 0 (or L0) motion vectors of the bi-directional prediction motion vectors, then the reference picture list 1 (or L1) motion vectors of the bi-directional prediction motion vectors, and then the averages of the L0 and L1 motion vectors of the bi-directional prediction motion vectors. At that point, if the number of candidates is still smaller than the target number (which is 5 in the current VVC), zero motion vectors are added to the list to meet the target number.
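The ordering rule above can be illustrated with the following sketch, where each first-merge-list candidate is represented as a hypothetical (list0_mv, list1_mv) pair with None marking an absent direction. This is a simplified illustration, not the normative derivation: pruning and reference indices are ignored.

```python
def build_uni_pred_candidate_list(first_merge_list, target=5):
    """Order: uni-directional MVs, then L0 MVs of bi-directional candidates,
    then L1 MVs, then L0/L1 averages, then zero MVs (simplified sketch)."""
    out = []

    def add(mv):
        if len(out) < target:
            out.append(mv)

    # 1) uni-directional motion vectors first
    for l0, l1 in first_merge_list:
        if (l0 is None) != (l1 is None):
            add(l0 if l0 is not None else l1)
    # 2) list-0 motion vectors of bi-directional candidates
    for l0, l1 in first_merge_list:
        if l0 is not None and l1 is not None:
            add(l0)
    # 3) list-1 motion vectors of bi-directional candidates
    for l0, l1 in first_merge_list:
        if l0 is not None and l1 is not None:
            add(l1)
    # 4) averages of the L0 and L1 motion vectors
    for l0, l1 in first_merge_list:
        if l0 is not None and l1 is not None:
            add(((l0[0] + l1[0]) // 2, (l0[1] + l1[1]) // 2))
    # 5) zero motion vectors fill the list to the target number
    while len(out) < target:
        out.append((0, 0))
    return out
```

With one uni-directional candidate and one bi-directional candidate, the uni-directional MV comes first, then the L0 and L1 halves of the bi-directional one, then their average, and finally zero vectors.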
A predictor is derived for each triangle PU based on its motion vector. Notably, the derived predictors cover a larger area than the actual triangle PU, such that there is an overlapping area of two predictors along the shared diagonal edge of the two triangle PUs. A weighting procedure is applied to the diagonal edge region between the two predictors to derive the final prediction of the CU. The weighting factors currently used for luminance and chrominance samples are {7/8,6/8,5/8,4/8,3/8,2/8,1/8} and {6/8,4/8,2/8}, respectively.
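As an illustration of how one of these factors blends two predictor samples along the diagonal, the sketch below uses an assumed integer-arithmetic form with a rounding offset; the exact arithmetic in the specification may differ.

```python
def blend_sample(p1, p2, w8):
    """Blend two predictor samples with weight w8/8 for p1 and
    (8 - w8)/8 for p2, using integer arithmetic with rounding
    (illustrative form; not the exact specification formula)."""
    return (w8 * p1 + (8 - w8) * p2 + 4) >> 3
```

For instance, a luma sample on the diagonal itself may take the 4/8 weight, which reduces to an even average of the two predictors.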
Triangle prediction mode semantics and signaling
Here, the triangle prediction mode is signaled using a triangle prediction flag. The triangle prediction flag is signaled when the CU is encoded in the skip mode or the merge mode. For a given CU, if the value of the triangle prediction flag is 1, it indicates that the corresponding CU is encoded using the triangle prediction mode. Otherwise, the CU is encoded using a prediction mode other than the triangle prediction mode.
For example, the triangle prediction flag is conditionally signaled in the skip mode or the merge mode. First, a triangle prediction tool enable/disable flag is signaled in the Sequence Parameter Set (SPS). The triangle prediction flag is signaled at the CU level only if this triangle prediction tool enable/disable flag is true. Second, the triangle prediction tool is allowed only in B slices. Therefore, the triangle prediction flag is signaled at the CU level only in B slices. Third, the triangle prediction flag is signaled only for a CU whose size is equal to or greater than a certain threshold; if the size of the CU is less than the threshold, the triangle prediction flag is not signaled. Fourth, the triangle prediction flag is signaled for a CU only if the CU is not encoded in a sub-block merge mode, which includes both the affine mode and the ATMVP mode. In the four cases listed above where the triangle prediction flag is not signaled, it is inferred to be 0 on the decoder side.
When the triangle prediction flag is signaled, the triangle prediction flag is signaled with a specific context using a context-adaptive binary arithmetic coding (CABAC) entropy codec. A context is formed based on the triangle prediction flag values of the top block and the left block of the current CU.
To codec (i.e., encode or decode) the triangle prediction flag of the current block (or current CU), the triangle prediction flags from both the top block (or CU) and the left block (or CU) are derived and their values are summed. This generates three possible contexts corresponding to:
1) The left block and the top block both have triangle prediction flag 0;
2) Both the left block and the top block have triangle prediction flag 1;
3) Others.
Separate probabilities are maintained for each of the three contexts. Once a context value is determined for the current block, the triangle prediction flag of the current block is encoded using a CABAC probability model corresponding to the context value.
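The context derivation above amounts to summing the two neighbour flags. A minimal sketch (the function name is illustrative; unavailable neighbours would be treated as flag 0):

```python
def triangle_flag_context(left_flag, top_flag):
    """Context 0: both neighbour flags are 0; context 2: both are 1;
    context 1: otherwise (exactly one neighbour flag is 1)."""
    return left_flag + top_flag
```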
If the triangle prediction flag is true, a triangle partition direction flag is signaled to indicate whether the partition direction is from the upper left corner to the lower right corner or from the upper right corner to the lower left corner.
Then, two merge index values are signaled to indicate the merge index value of the first uni-directional prediction merge candidate and the merge index value of the second uni-directional prediction merge candidate used for triangle prediction, respectively. The two merge index values are used to locate two merge candidates from the uni-directional prediction motion vector candidate list for the first partition and the second partition, respectively. For triangle prediction, the two merge index values are required to be different so that the two predictors of the two triangular partitions can differ from each other. Thus, the first merge index value is signaled directly. To signal the second merge index value, it is signaled directly if it is less than the first merge index value; otherwise, 1 is subtracted from it before it is signaled to the decoder. On the decoder side, the first merge index is decoded directly and used. To decode the second merge index value, a value denoted "idx" is first decoded from the CABAC engine. If idx is less than the first merge index value, the second merge index value is equal to idx; otherwise, the second merge index value is equal to (idx + 1).
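The index signaling rule can be captured by the following encoder/decoder pair (hypothetical helper names; the CABAC coding of the value itself is omitted):

```python
def code_second_merge_index(first_idx, second_idx):
    """Value actually written for the second merge index: subtract 1
    when it exceeds the first index, since equality is impossible."""
    assert first_idx != second_idx
    return second_idx if second_idx < first_idx else second_idx - 1

def parse_second_merge_index(first_idx, idx):
    """Recover the second merge index from the decoded value idx."""
    return idx if idx < first_idx else idx + 1
```

The pair round-trips: for any distinct first and second indices, parsing the coded value recovers the original second index.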
Conventional merge mode motion vector candidate list
According to the current VVC, in a conventional merge mode in which the entire CU is predicted without being divided into more than one PU, a motion vector candidate list or a merge candidate list is constructed using a process different from that used for the triangle prediction mode.
Fig. 8 is a schematic diagram illustrating the positions of spatial merge candidates according to some embodiments of the present disclosure. As shown in fig. 8, spatial motion vector candidates are first selected based on the motion vectors of neighboring blocks. In the derivation of the spatial merge candidates for the current block 802, a maximum of four merge candidates are selected among candidates located at the positions depicted in fig. 8. The order of derivation is A1 → B1 → B0 → A0 → (B2). Position B2 is considered only when any PU at positions A1, B1, B0, A0 is not available or is intra-coded.
Next, a temporal merge candidate is derived. In the derivation of the temporal merge candidate, a scaled motion vector is derived based on the co-located PU belonging to the picture, within a given reference picture list, that has the smallest Picture Order Count (POC) difference from the current picture. The reference picture list to be used for deriving the co-located PU is explicitly indicated in the slice header. Fig. 9 illustrates motion vector scaling for the temporal merge candidate according to some embodiments of the present disclosure. As indicated by the dashed lines in fig. 9, the scaled motion vector of the temporal merge candidate is obtained by scaling the motion vector of the co-located PU col_PU using the POC distances tb and td, where tb is defined as the POC difference between the reference picture curr_ref of the current picture and the current picture curr_pic, and td is defined as the POC difference between the reference picture col_ref of the co-located picture and the co-located picture col_pic. The reference picture index of the temporal merge candidate is set equal to zero. The actual implementation of the scaling process is described in the HEVC draft specification. For B slices, two motion vectors (one for reference picture list 0 and the other for reference picture list 1) are obtained and combined to form the bi-predictive merge candidate.
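In simplified form, the tb/td scaling looks like the following floating-point sketch; the HEVC specification uses clipped fixed-point arithmetic, which is deliberately omitted here.

```python
def scale_temporal_mv(mv, curr_poc, curr_ref_poc, col_poc, col_ref_poc):
    """Scale the co-located PU's motion vector by tb/td (simplified;
    the spec's clipped fixed-point arithmetic is not reproduced)."""
    tb = curr_poc - curr_ref_poc   # current picture to its reference
    td = col_poc - col_ref_poc     # co-located picture to its reference
    return (round(mv[0] * tb / td), round(mv[1] * tb / td))
```

For example, if the current picture is half the POC distance from its reference compared to the co-located picture, the motion vector is halved.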
Fig. 10 is a schematic diagram illustrating candidate locations of temporal merging candidates according to some embodiments of the present disclosure.
As depicted in fig. 10, the location of the co-located PU is selected between the two candidate locations C3 and H. If the PU at position H is not available, or is intra-coded, or is outside the current CTU, position C3 is used to derive temporal merging candidates. Otherwise, the position H is used to derive a temporal merging candidate.
After the spatial and temporal motion vectors have been inserted into the merge candidate list as described above, history-based merge candidates are added. The so-called history-based merge candidates are motion vectors from previously encoded CUs, which are maintained in a separate motion vector list and managed based on certain rules.
After inserting the history-based candidates, if the merge candidate list is not full, the paired average motion vector candidates are further added to the list. As its name indicates, this type of candidate is constructed by averaging the candidates already in the current list. More specifically, two candidates in the merge candidate list are employed each time based on a specific order or rule, and the average motion vector of the two candidates is appended to the current list.
After inserting the pairwise average motion vectors, if the merge candidate list is still not full, zero motion vectors will be added to fill the list.
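The overall insertion order described in the last few paragraphs can be sketched as follows. This is an illustration only: pruning is reduced to a simple exact-match check, and the pairwise-average rule is reduced to averaging the first two candidates; both are simplifications of the actual rules.

```python
def build_regular_merge_list(spatial, temporal, history, target=6):
    """Sketch of the insertion order: spatial, temporal, history-based,
    pairwise average, then zero MVs (simplified rules throughout)."""
    out = []

    def add(mv):
        if len(out) < target and mv not in out:
            out.append(mv)

    for mv in spatial:          # spatial candidates first
        add(mv)
    for mv in temporal:         # then the temporal candidate(s)
        add(mv)
    for mv in history:          # then history-based candidates
        add(mv)
    if 2 <= len(out) < target:  # pairwise average candidate
        a, b = out[0], out[1]
        add(((a[0] + b[0]) // 2, (a[1] + b[1]) // 2))
    while len(out) < target:    # zero MVs fill the remainder
        out.append((0, 0))
    return out
```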
Building a first merge list for triangle prediction using a conventional merge list building process
The triangular prediction modes in current VVCs share some similarity with conventional merged prediction modes throughout their formation of predictors. For example, in both prediction modes, a merge list needs to be constructed based at least on the neighboring spatial motion vector and the co-located motion vector of the current CU. Meanwhile, the triangle prediction mode also has some aspects different from the conventional merge prediction mode.
For example, although it is necessary to construct a merge list in a triangle prediction mode and a conventional merge prediction mode, the detailed procedure for obtaining such a list is different.
These differences result in additional cost for codec implementation because of the additional logic required. The process and logic of building the merge list may be unified and shared between the triangle prediction mode and the regular merge prediction mode.
In some examples, when forming a uni-directional prediction (also referred to as single prediction) merge list for a triangular prediction mode, new motion vectors are completely pruned for those already in the list before adding them to the merge list. In other words, the new motion vector is compared with each motion vector already in the uni-directionally predicted merge list, and the new motion vector is added to the list only if the new motion vector is different from each motion vector in the merge list. Otherwise, the new motion vector is not added to the list.
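Full pruning as described can be sketched as a single guarded append (an illustrative exact-match comparison; names are hypothetical):

```python
def add_with_full_pruning(uni_merge_list, new_mv):
    """Append new_mv only if it differs from every motion vector
    already in the uni-directional prediction merge list."""
    if all(new_mv != mv for mv in uni_merge_list):
        uni_merge_list.append(new_mv)
        return True
    return False
```

The partial-pruning variant mentioned above would compare against only a subset of the list, trading a possible duplicate for lower complexity.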
According to some examples of the present disclosure, in the triangular prediction mode, a unidirectional prediction merge list may be obtained or constructed from a conventional merge mode motion vector candidate list (which may be referred to as a conventional merge list).
More specifically, in order to construct a merge candidate list for a triangle prediction mode, a first merge list is first constructed based on a merge list construction process for conventional merge prediction. The first merge list includes a plurality of candidates, each candidate being a motion vector. The motion vectors in the first merge list are then used to further construct or derive a uni-directional prediction merge list for the triangle prediction mode.
It should be noted that the first merge list constructed in this case may have a list size different from that of the general merge mode (i.e., the regular merge mode). In one example of the present disclosure, the first merge list has the same size as the list of the general merge mode. In another example of the present disclosure, the constructed first merge list has a list size different from that of the general merge mode.
Building a unidirectional prediction merge list from a first merge list
According to some examples of the present disclosure, a unidirectional prediction merge list for a triangle prediction mode may be constructed or derived from a first merge list based on one of the following methods.
In an example of the present disclosure, to construct or derive a uni-directional prediction merge list, a candidate's prediction list 0 motion vector in a first merge list is first examined and selected into the uni-directional prediction merge list. If the uni-directional prediction merge list is not full after this process (e.g., the number of candidates in this list is still less than the target number), then the prediction list 1 motion vector of the candidate in the first merge list is checked and selected into the uni-directional prediction merge list. If the uni-directional prediction merge list is still not full, then the prediction list 0 zero vector is added to the uni-directional prediction merge list. If the uni-directional prediction merge list is still not full, then the prediction list 1 zero vector is added to the uni-directional prediction merge list.
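This first method can be sketched as follows, with candidates again represented as hypothetical (list0_mv, list1_mv) pairs (None meaning that direction is absent); the separate list-0 and list-1 zero-vector passes are condensed into a single (0, 0) fill.

```python
def uni_list_l0_then_l1(first_merge_list, target=5):
    """All list-0 MVs first, then all list-1 MVs, then zero vectors."""
    out = []
    for l0, _ in first_merge_list:          # pass 1: prediction list 0
        if l0 is not None and len(out) < target:
            out.append(l0)
    for _, l1 in first_merge_list:          # pass 2: prediction list 1
        if l1 is not None and len(out) < target:
            out.append(l1)
    while len(out) < target:                # zero vectors fill the rest
        out.append((0, 0))
    return out
```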
In another example of the present disclosure, for each candidate in the first merge list, its prediction list 0 motion vector and prediction list 1 motion vector are added to the unidirectional prediction merge list in an interleaved manner. More specifically, for each candidate in the first merge list, if the candidate is a uni-directional predicted motion vector, it is directly added to the uni-directional predicted merge list. Otherwise, if the candidate is a bi-predictive motion vector in the first merge list, its prediction list 0 motion vector is first added to the uni-predictive merge list, followed by its prediction list 1 motion vector. Once all motion vector candidates in the first merge list have been checked and added, while the uni-directional prediction merge list is still not full, a uni-directional prediction zero motion vector may be added. For example, for each reference frame index, the prediction list 0 zero motion vector and the prediction list 1 zero motion vector may be added separately to the uni-directional prediction merge list until the list is full.
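The interleaved variant differs only in visiting each candidate's two directions together before moving to the next candidate (same hypothetical pair representation as above; zero-vector handling condensed):

```python
def uni_list_interleaved(first_merge_list, target=5):
    """L0 then L1 of each candidate in turn, then zero vectors (sketch)."""
    out = []
    for l0, l1 in first_merge_list:
        for mv in (l0, l1):                 # L0 first, then L1, per candidate
            if mv is not None and len(out) < target:
                out.append(mv)
    while len(out) < target:
        out.append((0, 0))
    return out
```

Note how the same two input candidates produce a different ordering than the two-pass method: the bi-directional candidate's L1 motion vector now precedes the next candidate's L0 motion vector.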
In yet another example of the present disclosure, a uni-directional predicted motion vector from a first merge list is first selected into a uni-directional predicted merge list. If the uni-directional prediction merge list is not full after this process, for each bi-directional prediction motion vector in the first merge list, its prediction list 0 motion vector is first added to the uni-directional prediction merge list, followed by its prediction list 1 motion vector. After this process, if the uni-directional prediction merge list is still not full, a uni-directional prediction zero motion vector may be added. For example, for each reference frame index, the prediction list 0 zero motion vector and the prediction list 1 zero motion vector may be added separately to the uni-directional prediction merge list until the list is full.
In the above description, when a uni-directional predicted motion vector is added to the uni-directional prediction merge list, a motion vector pruning process may be performed to ensure that the new motion vector to be added is different from those already in the uni-directional prediction merge list. Such a motion vector pruning process may also be performed in a partial manner to obtain lower complexity, e.g. to check the new motion vectors to be added only for some but not all motion vectors already in the uni-directional prediction merge list. In extreme cases, no motion vector pruning (i.e., motion vector comparison operations) is performed in the process.
Constructing a unidirectional prediction merge list from a first merge list based on a picture prediction configuration
In some examples of the disclosure, the unidirectional prediction merge list may be adaptively constructed based on whether the current picture uses backward prediction. For example, a unidirectional prediction merge list may be constructed using different methods depending on whether the current picture uses backward prediction. If the Picture Order Count (POC) value of all reference pictures is not greater than the POC value of the current picture, it indicates that the current picture does not use backward prediction.
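The backward-prediction test stated above reduces to a POC comparison (sketch; the function name is illustrative):

```python
def uses_backward_prediction(curr_poc, ref_pocs):
    """True if any reference picture has a POC greater than the current
    picture's POC, i.e., follows the current picture in output order."""
    return any(poc > curr_poc for poc in ref_pocs)
```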
In examples of the present disclosure, when the current picture does not use backward prediction, or after determining that the current picture does not use backward prediction, candidate prediction list 0 motion vectors in the first merge list are first checked and selected into the unidirectional prediction merge list, followed by those candidate prediction list 1 motion vectors; if the uni-directional prediction merge list is still not full, a uni-directional prediction zero motion vector may be added. Otherwise, if the current picture uses backward prediction, the prediction list 0 motion vector and the prediction list 1 motion vector of each candidate in the first merge list may be checked and selected into the unidirectional prediction merge list in an interleaved manner as described above, i.e., the prediction list 0 motion vector of the first candidate in the first merge list is added, then the prediction list 1 motion vector of the first candidate is added, then the prediction list 0 motion vector of the second candidate is added, then the prediction list 1 motion vector of the second candidate is added, and so on. At the end of the process, if the uni-directional prediction merge list is still not full, a uni-directional prediction zero vector may be added.
In another example of the present disclosure, if the current picture does not use backward prediction, then candidate prediction list 1 motion vectors in the first merge list are first checked and selected into the unidirectional prediction merge list, followed by those candidate prediction list 0 motion vectors; if the uni-directional prediction merge list is still not full, a uni-directional prediction zero motion vector may be added. Otherwise, if the current picture uses backward prediction, the prediction list 0 motion vector and the prediction list 1 motion vector of each candidate in the first merge list may be checked and selected into the unidirectional prediction merge list in an interleaved manner as described above, i.e., the prediction list 0 motion vector of the first candidate in the first merge list is added, then the prediction list 1 motion vector of the first candidate is added, then the prediction list 0 motion vector of the second candidate is added, then the prediction list 1 motion vector of the second candidate is added, and so on. At the end of the process, if the uni-directional prediction merge list is still not full, a uni-directional prediction zero vector may be added.
In yet another example of the present disclosure, if the current picture does not use backward prediction, only the candidate prediction list 0 motion vector in the first merge list is first checked and selected into the unidirectional prediction merge list, and if the unidirectional prediction merge list is still not full, a unidirectional prediction zero motion vector may be added. Otherwise, if the current picture uses backward prediction, the prediction list 0 motion vector and the prediction list 1 motion vector of each candidate in the first merge list may be checked and selected into the unidirectional prediction merge list in an interleaved manner as described above, i.e., the prediction list 0 motion vector of the first candidate in the first merge list is added, then the prediction list 1 motion vector of the first candidate is added, then the prediction list 0 motion vector of the second candidate is added, then the prediction list 1 motion vector of the second candidate is added, and so on. At the end of the process, if the uni-directional prediction merge list is still not full, a uni-directional prediction zero vector may be added.
In yet another example of the present disclosure, if the current picture does not use backward prediction, only the candidate prediction list 1 motion vector in the first merge list is first checked and selected into the unidirectional prediction merge list, and if the unidirectional prediction merge list is still not full, a unidirectional prediction zero motion vector may be added. Otherwise, if the current picture uses backward prediction, the prediction list 0 motion vector and the prediction list 1 motion vector of each candidate in the first merge list may be checked and selected into the unidirectional prediction merge list in an interleaved manner as described above, i.e., the prediction list 0 motion vector of the first candidate in the first merge list is added, then the prediction list 1 motion vector of the first candidate is added, then the prediction list 0 motion vector of the second candidate is added, then the prediction list 1 motion vector of the second candidate is added, and so on. At the end of the process, if the uni-directional prediction merge list is still not full, a uni-directional prediction zero vector may be added.
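The three variants above share the same backward-prediction-dependent structure and differ only in which prediction list is scanned first when backward prediction is absent. The following Python sketch illustrates the first variant (list 1 motion vectors checked before list 0 motion vectors); the candidate representation as dicts with optional `l0`/`l1` entries and all function names are assumptions made for illustration, not part of the disclosure:

```python
# Hedged sketch of the uni-prediction merge list construction described above.
# Each first-merge-list candidate is modeled as a dict that may carry an 'l0'
# and/or 'l1' motion vector; ZERO_MV pads the list when it is not yet full.

ZERO_MV = (0, 0)

def build_uni_pred_list(first_merge_list, uses_backward_pred, list_size=5,
                        preferred_list='l1'):
    """Select uni-prediction candidates from the first merge list.

    Without backward prediction: all motion vectors of the preferred
    prediction list are checked first, then those of the other list.
    With backward prediction: the list 0 and list 1 motion vectors of
    each candidate are checked in an interleaved manner.
    """
    uni_list = []
    if not uses_backward_pred:
        other = 'l0' if preferred_list == 'l1' else 'l1'
        for key in (preferred_list, other):
            for cand in first_merge_list:
                if cand.get(key) is not None and len(uni_list) < list_size:
                    uni_list.append(cand[key])
    else:
        for cand in first_merge_list:
            for key in ('l0', 'l1'):  # interleave l0 then l1 per candidate
                if cand.get(key) is not None and len(uni_list) < list_size:
                    uni_list.append(cand[key])
    while len(uni_list) < list_size:  # pad with uni-prediction zero MVs
        uni_list.append(ZERO_MV)
    return uni_list
```

The other two variants correspond to scanning only one prediction list in the non-backward-prediction branch before padding with zero motion vectors.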
In another example of the present disclosure, when the current picture does not use backward prediction, the prediction list 0 motion vectors of the candidates in the first merge list are used as unidirectional prediction merge candidates, and indexes are set according to the same index order as they have in the first merge list. Otherwise, if the current picture uses backward prediction, the list 0 motion vector and the list 1 motion vector of each candidate in the first merge list are used as unidirectional prediction merge candidates, and the indexes are set based on the interleaving manner described above, i.e., first the list 0 motion vector of the first candidate in the first merge list, then the list 1 motion vector of the first candidate, then the list 0 motion vector of the second candidate, then the list 1 motion vector of the second candidate, and so on. In case a candidate in the first merge list is a unidirectional motion vector, a zero motion vector is indexed as the unidirectional prediction merge candidate following that candidate's motion vector in the unidirectional prediction merge list. This ensures that, for the case where backward prediction is used for the current picture, each candidate in the first merge list (whether it is a bi-predictive motion vector or a uni-predictive motion vector) can provide two unidirectional motion vectors as unidirectional prediction merge candidates.
In another example of the present disclosure, when the current picture does not use backward prediction, the prediction list 1 motion vectors of the candidates in the first merge list are used as unidirectional prediction merge candidates, and indexes are set according to the same index order as they have in the first merge list. Otherwise, if the current picture uses backward prediction, the list 0 motion vector and the list 1 motion vector of each candidate in the first merge list are used as unidirectional prediction merge candidates, and the indexes are set based on the interleaving manner described above, i.e., first the list 0 motion vector of the first candidate in the first merge list, then the list 1 motion vector of the first candidate, then the list 0 motion vector of the second candidate, then the list 1 motion vector of the second candidate, and so on. In case a candidate in the first merge list is a unidirectional motion vector, that motion vector plus a specific motion offset is indexed as the unidirectional prediction merge candidate following that candidate's motion vector in the unidirectional prediction merge list.
In the above description, although the motion vectors are described as being selected from the first merge list into the unidirectional prediction merge list, in practice the method may be implemented in different ways, with or without physically forming the unidirectional prediction merge list. For example, the first merge list may be used directly without physically creating a unidirectional prediction merge list: the list 0 motion vectors and/or list 1 motion vectors of each candidate in the first merge list may simply be indexed based on a particular order and accessed directly from the first merge list. It should be noted that such an indexing order may follow the same selection order described in the examples above. This means that, given the merge index of a PU coded in the triangle prediction mode, its corresponding unidirectional prediction merge candidate can be obtained directly from the first merge list without physically forming the unidirectional prediction merge list.
In this process, pruning may be performed in whole or in part when checking for new motion vectors to be added to the list. When partially performed, this means that the new motion vector is compared to some, but not all, of the motion vectors already in the uni-directional prediction merge list. In extreme cases, no motion vector pruning (i.e., motion vector comparison operations) is performed in the process.
Such motion vector pruning may also be performed adaptively, based on whether the current picture uses backward prediction, when forming the unidirectional prediction merge list. For example, for all examples of the present disclosure described above in this section, the motion vector pruning operation is performed fully or partially when the current picture does not use backward prediction. When the current picture uses backward prediction, no motion vector pruning operation is performed when forming the unidirectional prediction merge list.
Triangle prediction using a first merge list without creating a unidirectional prediction merge list
In the above examples, the unidirectional prediction merge list for triangle prediction is constructed by selecting motion vectors from the first merge list into the unidirectional prediction merge list. However, in practice, the method may be implemented in different ways, with or without physically forming a unidirectional prediction (or uni-prediction) merge list. In some examples, the first merge list may be used directly without physically creating a unidirectional prediction merge list. For example, the list 0 motion vectors and/or list 1 motion vectors of each candidate in the first merge list may simply be indexed based on a particular order and accessed directly from the first merge list.
For example, the first merge list may be obtained from a decoder or other electronic device/component. In other examples, after constructing a first merge list including a plurality of candidates (each candidate being one or more motion vectors) based on the merge list construction process for conventional merge prediction, a unidirectional prediction merge list is not constructed; instead, a predefined index list including a plurality of reference indices (each reference index being a reference to a motion vector of a candidate in the first merge list) is used to derive the unidirectional merge candidates for the triangle prediction mode. The index list may be considered a representation of the unidirectional prediction merge list for triangle prediction, and the unidirectional prediction merge list includes at least a subset of the candidates in the first merge list corresponding to the reference indices. It should be noted that the order of the indices may follow any of the selection orders described in the examples of building a unidirectional prediction merge list. In practice, such an index list may be implemented in different ways. For example, it may be explicitly implemented as a list. In other examples, it may also be implemented or obtained through specific logic and/or program functions without explicitly forming any list.
In some examples of the disclosure, the index list may be adaptively determined based on whether the current picture uses backward prediction. For example, the reference indices in the index list may be arranged according to whether the current picture uses backward prediction, i.e., based on a comparison of the Picture Order Count (POC) of the current picture with the POCs of the reference pictures. If the POC values of all reference pictures are not greater than the POC value of the current picture, the current picture does not use backward prediction.
In one example of the present disclosure, when the current picture does not use backward prediction, a prediction list 0 motion vector of candidates in the first merge list is used as a unidirectional prediction merge candidate, and indexes are set according to the same index order as they are in the first merge list. That is, after determining that the POC of the current picture is greater than each of the POC of the reference picture, the reference indices are arranged according to the same order of list 0 motion vectors of candidates in the first merge list. Otherwise, if the current picture uses backward prediction, the list 0 motion vector and the list 1 motion vector of each candidate in the first merge list are used as unidirectional prediction merge candidates and are indexed based on the interleaving manner, i.e., first the list 0 motion vector of the first candidate in the first merge list, then the list 1 motion vector of the first candidate, then the list 0 motion vector of the second candidate, then the list 1 motion vector of the second candidate, and so on. That is, after determining that the POC of the current picture is less than at least one of the POC of the reference picture, in the case where each candidate in the first merge list is a bi-predictive motion vector, the reference index is arranged according to an interleaved manner of the list 0 motion vector and the list 1 motion vector of each candidate in the first merge list. In the case where the candidate in the first merge list is a uni-directional motion vector, a zero motion vector setting is indexed as a uni-directional prediction merge candidate following the motion vector of the candidate. 
This ensures that for the case where backward prediction is used for the current picture, each candidate in the first merge list (whether it is a bi-directional predicted motion vector or a uni-directional predicted motion vector) provides two uni-directional motion vectors as uni-directional predicted merge candidates.
In another example of the present disclosure, when the current picture does not use backward prediction, a prediction list 0 motion vector of the candidates in the first merge list is used as a unidirectional prediction merge candidate, and indexes are set according to the same index order as they are in the first merge list. Otherwise, if the current picture uses backward prediction, the list 0 motion vector and the list 1 motion vector of each candidate in the first merge list are used as unidirectional prediction merge candidates and are indexed based on the interleaving manner as described above, i.e., first the list 0 motion vector of the first candidate in the first merge list, then the list 1 motion vector of the first candidate, then the list 0 motion vector of the second candidate, then the list 1 motion vector of the second candidate, and so on. In case the candidate in the first merge list is a uni-directional motion vector, the motion vector plus a specific motion offset setting is indexed as a uni-directional prediction merge candidate following the motion vector of the candidate.
Thus, in the case where a candidate in the first merge list is a unidirectional motion vector, after determining that the POC of the current picture is smaller than at least one of the POCs of the reference pictures, the reference indices are arranged in an interleaved manner in which each such candidate's motion vector in the first merge list is followed by either a zero motion vector or that motion vector plus an offset.
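The index-list derivation described above can be sketched as follows; the `(candidate_index, prediction_list)` pair representation and all names are illustrative assumptions, since the disclosure allows the index list to exist only implicitly in logic or program functions:

```python
# Hedged sketch: derive a reference-index list that represents the
# uni-prediction merge list without physically building it. Each entry
# refers into the first merge list; 'zero' marks a zero-MV (or MV-plus-
# offset) slot contributed by a uni-directional candidate.

def derive_index_list(first_merge_list, uses_backward_pred):
    refs = []
    if not uses_backward_pred:
        # same order as the list 0 motion vectors of the first merge list
        for i, _ in enumerate(first_merge_list):
            refs.append((i, 'l0'))
    else:
        # interleave list 0 and list 1 per candidate; a uni-directional
        # candidate contributes its own MV followed by a substitute slot
        for i, cand in enumerate(first_merge_list):
            if cand.get('l0') is not None and cand.get('l1') is not None:
                refs.append((i, 'l0'))
                refs.append((i, 'l1'))
            elif cand.get('l0') is not None:
                refs.append((i, 'l0'))
                refs.append((i, 'zero'))
            else:
                refs.append((i, 'l1'))
                refs.append((i, 'zero'))
    return refs
```

With this representation, a merge index signaled for a triangle-mode PU can be mapped to a motion vector by a direct lookup into the first merge list.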
In the above procedure, pruning may be performed entirely or partially when checking for a new motion vector to be added to the uni-directional prediction merge list. When partially performed, this means that the new motion vector is compared to some, but not all, of the motion vectors already in the uni-directional prediction merge list. In extreme cases, no motion vector pruning (i.e., motion vector comparison operations) is performed in the process.
When forming the unidirectional prediction merge list, motion vector pruning may also be adaptively performed based on whether the current picture uses backward prediction. For example, for the examples of the present disclosure related to index list determination based on picture prediction configuration, motion vector pruning operations are performed in whole or in part when the current picture does not use backward prediction. When the current picture uses backward prediction, no motion vector pruning operation is performed.
Selecting unidirectional prediction merge candidates for triangle prediction modes
In addition to the examples described above, other ways of unidirectional prediction merge list construction or unidirectional prediction merge candidate selection are also disclosed.
In one example of the present disclosure, once the first merge list for the regular merge mode is constructed, a uni-directional prediction merge candidate may be selected for triangle prediction according to the following rules:
For a motion vector candidate in the first merge list, one and only one of its list 0 motion vector or list 1 motion vector is used for triangle prediction;
for a given motion vector candidate in the first merge list, if its merge index value in the list is even, its list 0 motion vector is used for triangle prediction if available, and its list 1 motion vector is used for triangle prediction if this motion vector candidate does not have a list 0 motion vector; and
for a given motion vector candidate in the first merge list, if its merge index value in the list is odd, its list 1 motion vector is used for triangle prediction if available, and its list 0 motion vector is used for triangle prediction if this motion vector candidate does not have a list 1 motion vector.
Fig. 11A shows an example of unidirectional prediction motion vector (MV) selection (or unidirectional prediction merge candidate selection) for the triangle prediction mode. In this example, the first 5 merge MV candidates derived in the first merge list are indexed from 0 to 4, and each row has two columns representing the list 0 and list 1 motion vectors, respectively, of the candidates in the first merge list. Each candidate in the list may be unidirectionally or bidirectionally predicted. A unidirectional prediction candidate has only a list 0 motion vector or a list 1 motion vector, but not both; a bi-prediction candidate has both. In fig. 11A, for each merge index, the motion vectors marked with "x" are those used first for triangle prediction if available. If a motion vector marked with "x" is not available, the unmarked motion vector corresponding to the same merge index is used for triangle prediction. In other words, according to this method, given the merge index value of a PU coded in the triangle prediction mode, the index value can be used directly to locate the merge candidate in the first merge list; then, depending on the parity of the index value (i.e., whether the index value is even or odd), the list 0 or list 1 motion vector of the located merge candidate is selected for the PU based on the rules described above. There is no need to physically form a unidirectional prediction merge list in this process.
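The parity rule of Fig. 11A can be expressed compactly; the following sketch uses assumed names and the same dict-based candidate representation purely for illustration:

```python
# Hedged sketch of the Fig. 11A parity rule: an even merge index prefers
# the list 0 motion vector and an odd index prefers the list 1 motion
# vector, falling back to the other list when the preferred one is absent.

def select_uni_mv(first_merge_list, merge_index):
    cand = first_merge_list[merge_index]
    preferred, fallback = ('l0', 'l1') if merge_index % 2 == 0 else ('l1', 'l0')
    mv = cand.get(preferred)
    return mv if mv is not None else cand.get(fallback)
```

The rule of Fig. 11B is obtained by swapping which list the even and odd indices prefer.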
The concepts described above may be extended to other examples. Fig. 11B illustrates another example of unidirectional prediction Motion Vector (MV) selection for a triangular prediction mode. According to fig. 11B, a rule for selecting a unidirectional prediction merge candidate for triangle prediction is as follows:
for a motion vector candidate in the first merge list, one and only one of its list 0 motion vector or list 1 motion vector is used for triangle prediction;
for a given motion vector candidate in the first merge list, if the merge index value of the motion vector candidate in the list is even, its list 1 motion vector is used for triangle prediction if available, and its list 0 motion vector is used for triangle prediction if this motion vector candidate does not have a list 1 motion vector; and
for a given motion vector candidate in the first merge list, if its merge index value in the list is odd, its list 0 motion vector is used for triangle prediction if available, and its list 1 motion vector is used for triangle prediction if this motion vector candidate does not have a list 0 motion vector.
In some examples, other different orders may be defined and used to select unidirectional prediction merge candidates for triangle prediction from those in the first merge list. More specifically, for a given motion vector candidate in the first merge list, the decision to use its list 0 motion vector or list 1 motion vector first when the motion vector candidate is available for triangle prediction need not depend on the parity of the index values of the candidates in the first merge list as described above. For example, the following rules may also be used:
For a motion vector candidate in the first merge list, one and only one of its list 0 motion vector or list 1 motion vector is used for triangle prediction;
for several motion vector candidates in the first merge list, based on some predefined pattern, their list 0 motion vectors are used for triangle prediction if available, and the corresponding list 1 motion vector is used for triangle prediction when no list 0 motion vector is present; and
for the remaining motion vector candidates in the first merge list, based on the same predefined pattern, their list 1 motion vectors are used for triangle prediction if available, and the corresponding list 0 motion vector is used for triangle prediction when no list 1 motion vector is present.
Fig. 12A to 12D show some examples of predefined patterns in unidirectional prediction motion vector (MV) selection for the triangle prediction mode. For each merge index, the motion vectors marked with "x" are those used first for triangle prediction if available. If a motion vector marked with "x" is not available, the unmarked motion vector corresponding to the same merge index is used for triangle prediction.
In fig. 12A, for the first three motion vector candidates in the first merge list, their list 0 motion vectors are checked first. Only when a list 0 motion vector is not available is the corresponding list 1 motion vector used for triangle prediction. For the fourth and fifth motion vector candidates in the first merge list, their list 1 motion vectors are checked first. Only when a list 1 motion vector is not available is the corresponding list 0 motion vector used for triangle prediction. Fig. 12B to 12D show three other patterns of selecting unidirectional prediction merge candidates from the first merge list. The examples shown in the drawings are not limiting, and further examples exist; for instance, horizontally and/or vertically mirrored versions of the patterns shown in fig. 12A to 12D may also be used.
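A pattern-driven selection such as that of Fig. 12A can be sketched as follows; the pattern tuple and all names are assumptions for illustration, and mirrored or other patterns would simply substitute a different tuple:

```python
# Hedged sketch of pattern-based uni-prediction MV selection. For the
# Fig. 12A-style pattern, the first three candidates check list 0 first
# and the last two check list 1 first, with fallback to the other list.

FIG_12A_PATTERN = ('l0', 'l0', 'l0', 'l1', 'l1')

def select_by_pattern(first_merge_list, merge_index, pattern=FIG_12A_PATTERN):
    cand = first_merge_list[merge_index]
    preferred = pattern[merge_index]
    fallback = 'l1' if preferred == 'l0' else 'l0'
    mv = cand.get(preferred)
    return mv if mv is not None else cand.get(fallback)
```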
The selected unidirectional prediction merge candidates may be indexed and accessed directly from the first merge list, or they may be put into a unidirectional prediction merge list for triangle prediction. The derived unidirectional prediction merge list comprises a plurality of unidirectional prediction merge candidates, each comprising one motion vector of the corresponding candidate in the first merge list. According to some examples of the present disclosure, each candidate in the first merge list includes at least one of a list 0 motion vector and a list 1 motion vector, and each unidirectional prediction merge candidate may be one of the list 0 motion vector and the list 1 motion vector of the corresponding candidate in the first merge list. Each unidirectional prediction merge candidate is associated with a merge index of an integer value, and the list 0 or list 1 motion vector is selected for each unidirectional prediction merge candidate based on a preset rule.
In one example, for each uni-directional prediction merge candidate having an even merge index value, a list 0 motion vector of the corresponding candidate in the first merge list having the same merge index is selected as the uni-directional prediction merge candidate; and for each uni-directional predicted merge candidate having an odd merge index value, selecting a list 1 motion vector of the corresponding candidate in the first merge list having the same merge index. In another example, for each uni-directional predicted merge candidate having an even merge index value, a list 1 motion vector of the corresponding candidate in the first merge list having the same merge index is selected; and for each uni-directional predicted merge candidate having an odd merge index value, selecting a list 0 motion vector for the corresponding candidate in the first merge list having the same merge index.
In yet another example, for each uni-directional prediction merge candidate, in the event that it is determined that a list 1 motion vector of the corresponding candidate in the first merge list is available, selecting the list 1 motion vector as the uni-directional prediction merge candidate; and in the event that it is determined that a list 1 motion vector is not available, selecting a list 0 motion vector for the corresponding candidate in the first merged list.
In yet another example, for each uni-directionally predicted merge candidate having a merge index value within a first range, a list 0 motion vector of the corresponding candidate in the first merge list is selected as the uni-directionally predicted merge candidate; and for each uni-directionally predicted merge candidate having a merge index value within the second range, selecting a list 1 motion vector for the corresponding candidate in the first merge list.
Selecting unidirectional prediction merge candidates directly from a first merge list for triangular prediction mode
Fig. 13 illustrates another example of the present disclosure. Once the first merge list for the normal merge mode is constructed, the uni-directional prediction motion vector is selected directly from the list for triangle prediction. To indicate a particular list 0 motion vector or list 1 motion vector for triangle prediction, an index value is first signaled to indicate which candidate is selected from the first merge list. Then, a binary reference list indication flag, which is referred to as l0l1_flag in the following description, is signaled to indicate whether the list 0 motion vector of the candidate selected from the first merge list or the list 1 motion vector of the candidate is selected for the first partition of triangle prediction. The same signaling method is used to indicate that the second list 0 motion vector or the second list 1 motion vector is used for the second partition of triangle prediction. For example, semantics signaled for a CU for triangle mode codec may include index1, l0l1_flag1, index2, l0l1_flag2. Here, index1 and index2 are merging index values of two candidates selected from the first merging list for the first partition and the second partition, respectively. L0l1_flag1 is a binary flag of the first partition to indicate whether to select the list 0 motion vector of the candidate selected based on index1 from the first merge list or the list 1 motion vector of the candidate. L0l1_flag2 is a binary flag of the second partition to indicate whether to select the list 0 motion vector of the candidate selected based on index2 from the first merge list or the list 1 motion vector of the candidate. It is worth mentioning that different signaling orders of the semantics described above may be used in the method of the present disclosure. In one example, the signaling order may follow index1→l0l1_flag1→index2→l0l1_flag2; in another example, the signaling order may follow index1→index2→l0l1_flag1→l0l1_flag2; etc. 
Thus, in the methods of the present disclosure, the order of description of the signaled semantics should not be interpreted as the unique order of signaling of those semantics based on the method; alternatively, other different signaling orders of those semantics may be used, which should be understood to also be encompassed in the methods of the present disclosure.
In the above signaling method, in the triangle prediction mode, each list 0 motion vector and/or list 1 motion vector as indicated by the symbol "x" in the rectangular box in fig. 13 may be indicated/signaled to the decoder for deriving the prediction of the first partition, and each list 0 motion vector and/or list 1 motion vector as indicated by the symbol "x" in the rectangular box in fig. 13 may also be indicated/signaled to the decoder for deriving the prediction of the second partition. Therefore, the selection of unidirectional predicted motion vectors from the first merge list becomes very flexible. Given a first merge list of N candidates in size, up to 2N unidirectional prediction motion vectors may be used for each of the two triangle partitions. The two merge index values of the two partitions in the triangle prediction mode do not have to be different from each other. In other words, they may take the same value. The index value is signaled directly without adjustment prior to signaling. More specifically, unlike what is currently defined in VVC, the second index value is directly signaled to the decoder without any adjustment being performed on the second index value prior to signaling.
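The decoder-side lookup implied by this signaling scheme is straightforward; the following sketch uses assumed names and the dict-based candidate representation for illustration only:

```python
# Hedged sketch: given a signaled merge index and l0l1 flag per partition,
# pick the list 0 or list 1 motion vector of that candidate from the
# first merge list for each of the two triangle partitions.

def mv_for_partition(first_merge_list, index, l0l1_flag):
    cand = first_merge_list[index]
    return cand.get('l1') if l0l1_flag == 1 else cand.get('l0')

def triangle_mvs(first_merge_list, index1, flag1, index2, flag2):
    return (mv_for_partition(first_merge_list, index1, flag1),
            mv_for_partition(first_merge_list, index2, flag2))
```

Note that `index1` and `index2` may take the same value here, consistent with the flexibility described above.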
In another example of the present disclosure, when the two index values are the same, it is not necessary to signal the binary flag l0l1_flag2 of the second partition. Instead, the binary flag l0l1_flag2 is inferred to have the value opposite to that of the binary flag l0l1_flag1 of the first partition. In other words, in this case, l0l1_flag2 may take the value (1 - l0l1_flag1).
In another example of the present disclosure, l0l1_flag1 and l0l1_flag2 may be encoded as CABAC context binary bits. The context for L0L1_Flag1 may be separate from the context for L0L1_Flag2. The CABAC probabilities in each context may be initialized at the beginning of the video sequence and/or at the beginning of a picture and/or at the beginning of a parallel block group.
In another example of the present disclosure, a unidirectional prediction zero motion vector may alternatively be used when the motion vector indicated by the combined index value and the associated l0l1_flag is not present.
In another example of the present disclosure, when the motion vector indicated by the merge index value and the associated l0l1_flag is not present, a corresponding motion vector indicated by the same merge index value but from another list (i.e., list (1-l0l1_flag)) may alternatively be used.
In another example of the present disclosure, for a CU of the triangle mode codec, a second l0l1_flag (i.e., l0l1_flag2) associated with a second index (i.e., index 2) is not signaled but is always inferred. In this case, it is still necessary to signal index1, l0l1_flag1 and index2 semantics. In one approach, the inference of l0l1_flag2 is based on the value of l0l1_flag1 and whether the current picture uses backward prediction. More specifically, for a triangle mode encoded CU, if the current picture uses backward prediction, the value of l0l1_flag2 is inferred to be the inverse binary value of l0l1_flag1 (i.e., 1-l0l1_flag1); if the current picture does not use backward prediction, the value of L0L1_Flag2 is inferred to be the same as L0L1_Flag1. In addition, if the current picture does not use backward prediction, it may be further forced that the value of index2 is different from the value of index1 because both motion vectors (one motion vector for each triangle partition) are from the same prediction list. If the value of index2 is equal to the value of index1, this means that the same motion vector will be used for both triangle partitions, which is useless from the codec efficiency point of view. In this case, when signaling the value of index2, a corresponding adjustment of the value of index2 may be performed prior to index binarization, as in current VVC designs signaling index 2. For example, in the case where the actual value of index1 is smaller than the actual value of index2, the value of index2 is signaled using a CABAC binarized codeword corresponding to (index 2-1); otherwise, the value of index2 is signaled using the CABAC binarized codeword corresponding to index 2. Accordingly, at the decoder side, if the signaled index2 value is smaller than the signaled index1 value, the actual value of index2 is set equal to the signaled index2 value; otherwise, the actual value of index2 is adjusted to be equal to the signaled index2 value plus one. 
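The index2 codeword adjustment described above can be illustrated with a short sketch; function names are assumed, and the CABAC binarization itself is abstracted away to show only the value shift:

```python
# Hedged sketch of the index2 adjustment: when index2 is forced to differ
# from index1, the encoder shifts index2 down by one before binarization
# whenever it exceeds index1, and the decoder reverses the shift.

def encode_index2(index1, index2):
    assert index2 != index1          # the two indices must differ here
    return index2 - 1 if index1 < index2 else index2

def decode_index2(index1, signaled_index2):
    return signaled_index2 if signaled_index2 < index1 else signaled_index2 + 1
```

The round trip recovers the actual index2 for any pair of distinct values, which is what makes the one-codeword saving possible.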
It is worth mentioning that, based on this new example of the present disclosure, optionally, index2 may also be forced to have a different value than index1, with the same index2 value adjustment for CABAC binarization, in the case when the current picture uses backward prediction.
In another example of the present disclosure, for a CU of triangle mode codec, no l0l1_flag is signaled. Rather, they are all inferred. In this case, it is still necessary to signal index1 semantics and index2 semantics, where the index1 semantics and index2 semantics represent the merge index values of two candidates selected from the first merge list for the first partition and the second partition, respectively. Given the merge candidate index value, a particular method may be defined in determining whether to select a list 0 motion vector or a list 1 motion vector of the respective merge candidate from the first list for triangle mode prediction. In one approach, for index1, the mode shown in fig. 11A is used to determine from which prediction list to select the motion vector of the merge candidate for triangle mode prediction; and for index2, the mode shown in fig. 11B is used to determine from which prediction list the motion vector of the merge candidate is selected for triangle mode prediction. In other words, if index1 is an even value, a list 0 motion vector is selected, and if index1 is an odd value, a list 1 motion vector is selected. For index2, a list 1 motion vector is selected if index2 is an even value, and a list 0 motion vector is selected if index2 is an odd value. In case a motion vector corresponding to a particular prediction list does not exist, some default motion vector, such as a zero motion vector, or a motion vector from a corresponding candidate of another prediction list, etc., may alternatively be used. It is worth mentioning that the present disclosure also covers the case where the mode shown in fig. 11A is used for index2 and the mode shown in fig. 11B is used for index1 in determining from which prediction list the motion vector of the merge candidate is selected for triangle mode prediction.
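The fully inferred-flag variant above combines the Fig. 11A rule for index1 with the Fig. 11B rule for index2; a minimal sketch, with assumed names and a zero motion vector standing in as the default when the chosen list is absent:

```python
# Hedged sketch: infer the prediction list from the index parity. For the
# first partition (index1), even -> list 0, odd -> list 1 (Fig. 11A rule);
# for the second partition (index2), even -> list 1, odd -> list 0
# (Fig. 11B rule). A zero MV is the assumed default when absent.

ZERO_MV = (0, 0)

def inferred_mv(first_merge_list, index, is_first_partition):
    cand = first_merge_list[index]
    even_pref = 'l0' if is_first_partition else 'l1'
    odd_pref = 'l1' if is_first_partition else 'l0'
    key = even_pref if index % 2 == 0 else odd_pref
    mv = cand.get(key)
    return mv if mv is not None else ZERO_MV
```

As noted above, the disclosure equally covers the swapped assignment (Fig. 11B for index1, Fig. 11A for index2).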
In the above procedure, although a first merge list containing 5 merge candidates is described in all examples in this disclosure, in practice the size of the first merge list may be defined differently, for example, as 6 or 4 or some other value. All the methods described in this disclosure are equally applicable when the first merge list has a size other than 5.
In the above process, motion vector pruning may also be performed. Such pruning may be performed fully or partially. When performed partially, a new motion vector is compared against some, but not all, of the motion vectors already in the uni-directional prediction merge list. It may also mean that only some, but not all, new motion vectors need to be checked for pruning before being used as merge candidates for triangle prediction. One specific example is to check the second motion vector against only the first motion vector for pruning, instead of checking it against all other motion vectors, before using the second motion vector as a merge candidate for triangle prediction. In the extreme case, no motion vector pruning (i.e., no motion vector comparison operation) is performed in the process.
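Partial pruning as described above can be sketched as follows. This is a minimal illustration, not the normative process: the parameters `compare_count` and `max_size` are assumptions for the sketch, and a motion vector is modeled as a simple tuple. Each new motion vector is compared against only the first `compare_count` entries already in the list.

```python
def add_with_partial_pruning(merge_list, new_mv, compare_count=1, max_size=5):
    """Add new_mv to merge_list unless it duplicates one of the first
    compare_count entries (partial pruning). Returns True if added."""
    for existing in merge_list[:compare_count]:
        if existing == new_mv:  # duplicate found among the checked entries
            return False        # pruned: not added
    if len(merge_list) < max_size:
        merge_list.append(new_mv)
        return True
    return False                # list already full
```

Note that with partial pruning a duplicate may still enter the list when it matches an entry outside the checked range; this is the accepted trade-off between complexity and list quality described above.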
Although the methods of forming the uni-directional prediction merge list in the present disclosure are described with respect to the triangle prediction mode, these methods are applicable to other prediction modes of a similar kind. For example, in a more general geometric partition prediction mode in which a CU is partitioned into two PUs along a line that is not necessarily diagonal, the two PUs may have geometric shapes such as triangles, wedges, or trapezoids. In such cases, the prediction for each PU is formed in a manner similar to that in the triangle prediction mode, and the methods described herein are equally applicable.
Fig. 14 is a block diagram illustrating an apparatus for video encoding and decoding according to some embodiments of the present disclosure. The apparatus 1400 may be a terminal such as a mobile phone, a tablet computer, a digital broadcast terminal, or a personal digital assistant.
As shown in fig. 14, the apparatus 1400 may include one or more of the following components: processing component 1402, memory 1404, power component 1406, multimedia component 1408, audio component 1410, input/output (I/O) interface 1412, sensor component 1414, and communication component 1416.
The processing component 1402 generally controls overall operation of the device 1400, such as operations related to display, telephone calls, data communications, camera operations, and recording operations. The processing component 1402 may include one or more processors 1420 for executing instructions to perform all or part of the steps of the methods described above. Further, the processing component 1402 can include one or more modules for facilitating interactions between the processing component 1402 and other components. For example, the processing component 1402 may include a multimedia module for facilitating interaction between the multimedia component 1408 and the processing component 1402.
The memory 1404 is configured to store different types of data to support the operation of the device 1400. Examples of such data include instructions for any application or method operating on the device 1400, contact data, phonebook data, messages, pictures, videos, and the like. The memory 1404 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 1406 provides power to the different components of the device 1400. The power components 1406 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 1400.
The multimedia component 1408 includes a screen that provides an output interface between the device 1400 and the user. In some examples, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen that receives input signals from a user. The touch panel may include one or more touch sensors for sensing touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or sliding action, but also the duration and pressure associated with the touch or sliding operation. In some examples, the multimedia component 1408 may include a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 1400 is in an operational mode, such as a shooting mode or a video mode.
The audio component 1410 is configured to output and/or input audio signals. For example, the audio component 1410 includes a Microphone (MIC). The microphone is configured to receive external audio signals when the device 1400 is in an operational mode, such as a call mode, a recording mode, and a speech recognition mode. The received audio signals may further be stored in the memory 1404 or transmitted via the communication component 1416. In some examples, audio component 1410 further includes a speaker for outputting audio signals.
I/O interface 1412 provides an interface between processing component 1402 and peripheral interface modules. The peripheral interface module may be a keyboard, click wheel, button, etc. These buttons may include, but are not limited to, a home button, a volume button, an activate button, and a lock button.
The sensor assembly 1414 includes one or more sensors for providing status assessments of various aspects of the apparatus 1400. For example, the sensor assembly 1414 may detect the on/off state of the device 1400 and the relative positions of components, such as the display and the keyboard of the device 1400. The sensor assembly 1414 may also detect a change in the position of the device 1400 or of a component of the device 1400, the presence or absence of user contact with the device 1400, the orientation or acceleration/deceleration of the device 1400, and a change in the temperature of the device 1400. The sensor assembly 1414 may include a proximity sensor configured to detect the presence of nearby objects without any physical touch. The sensor assembly 1414 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some examples, the sensor assembly 1414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1416 is configured to facilitate wired or wireless communication between the apparatus 1400 and other devices. The device 1400 may access a wireless network based on a communication standard such as WiFi, 4G, or a combination thereof. In an example, the communication component 1416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an example, the communication component 1416 can further include a Near Field Communication (NFC) module for facilitating short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an example, the apparatus 1400 may be implemented by one or more of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components to perform the methods described above.
The non-transitory computer readable storage medium may be, for example, a Hard Disk Drive (HDD), a Solid State Drive (SSD), a flash memory, a hybrid drive or Solid State Hybrid Drive (SSHD), a read-only memory (ROM), a compact disk read-only memory (CD-ROM), a magnetic tape, a floppy disk, etc.
Fig. 15 is a flowchart illustrating an exemplary process for video codec for motion compensated prediction using a geometry prediction unit according to some embodiments of the present disclosure.
In step 1501, the processor 1420 partitions the video picture into a plurality of coding units (CUs), wherein at least one CU of the plurality of CUs is further partitioned into two prediction units (PUs). The two PUs may include PUs of at least one geometric shape. For example, a geometric PU may include a pair of triangle-shaped PUs, a pair of wedge-shaped PUs, or PUs of other geometric shapes.
At step 1502, the processor 1420 constructs a first merge list comprising a plurality of candidates, wherein each candidate includes one or more motion vectors: a list 0 motion vector, a list 1 motion vector, or both. For example, the processor 1420 may construct the first merge list based on the merge list construction process for conventional merge prediction. The processor 1420 may also obtain the first merge list from another electronic device or from memory.
In step 1503, the processor 1420 receives a signaled first index value indicating a first candidate selected from a first merge list.
At step 1504, the processor 1420 receives a signaled second index value indicating a second candidate selected from the first merge list.
In step 1505, the processor 1420 receives a signaled first binary flag indicating whether to select a list 0 motion vector of the first candidate or a list 1 motion vector of the first candidate for a first PU of the geometric prediction.
In step 1506, the processor 1420 infers a second binary flag indicating whether to select a list 0 motion vector of the second candidate or a list 1 motion vector of the second candidate for a second PU of the geometric prediction, based on the first binary flag and based on whether the current picture uses backward prediction.
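The decoder-side inference in steps 1505-1506, together with the index adjustment described later in the claims, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the normative process: flags are modeled as integers 0/1, and the function names are hypothetical. The second flag takes the opposite value of the first flag when the current picture uses backward prediction, and the same value otherwise; when backward prediction is not used, a signaled second index equal to or greater than the first index is incremented by one so the two selected candidates differ.

```python
def infer_second_flag(first_flag, picture_uses_backward_prediction):
    """Opposite binary value when backward prediction is used,
    same binary value otherwise."""
    if picture_uses_backward_prediction:
        return 1 - first_flag
    return first_flag

def adjust_second_index(first_index, signaled_second_index,
                        picture_uses_backward_prediction):
    """When backward prediction is not used, the two candidates must
    differ, so a signaled second index >= the first index is
    incremented by one to recover the actual second index."""
    if (not picture_uses_backward_prediction
            and signaled_second_index >= first_index):
        return signaled_second_index + 1
    return signaled_second_index
```

Signaling the second index this way saves one codeword value whenever the two indices are guaranteed to differ.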
In some examples, an apparatus for video encoding and decoding is provided. The apparatus includes a processor 1420; and a memory 1404 configured to store instructions executable by the processor; wherein the processor, when executing the instructions, is configured to perform the method as shown in fig. 15.
In some other examples, a non-transitory computer-readable storage medium 1404 having instructions stored therein is provided. The instructions, when executed by the processor 1420, cause the processor to perform the method as shown in fig. 15.
The description of the present disclosure has been presented for purposes of illustration and is not intended to be exhaustive or limited to the disclosure. Many modifications, variations and alternative embodiments will come to mind to one skilled in the art having the benefit of the teachings presented in the foregoing descriptions and the associated drawings.
The examples were chosen and described in order to explain the principles of the present disclosure and to enable others skilled in the art to understand the disclosure for various embodiments and with various modifications as are suited to the particular use contemplated. Therefore, it is to be understood that the scope of the disclosure is not to be limited to the specific examples of the disclosed embodiments, and that modifications and other embodiments are intended to be included within the scope of the disclosure.

Claims (6)

1. A method for video encoding and decoding using geometric prediction, comprising:
partitioning a video picture into a plurality of coding units, CUs, wherein at least one CU of the plurality of CUs is further partitioned into two prediction units, PUs, both PUs being geometrically shaped PUs;
constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates comprises a list 0 motion vector or a list 1 motion vector or both a list 0 motion vector and a list 1 motion vector;
receiving a signaled first index value indicating a first candidate selected from the first merge list;
Receiving a signaled second index value indicating a second candidate selected from the first merge list;
receiving a signaled first binary flag indicating whether to select a list 0 motion vector of the first candidate or a list 1 motion vector of the first candidate for a first PU of the geometric prediction;
inferring a second binary flag indicating whether to select a list 0 motion vector of the second candidate or a list 1 motion vector of the second candidate for a second PU of the geometric prediction based on the first binary flag and based on whether a current picture uses backward prediction,
wherein inferring, based on the first binary flag and based on whether the current picture uses backward prediction, the second binary flag indicating whether to select a list 0 motion vector of the second candidate or a list 1 motion vector of the second candidate for the second PU of the geometric prediction further comprises:
in response to the current picture using backward prediction, inferring the second binary flag as an opposite binary value of the first binary flag;
in response to the current picture not using backward prediction, inferring the second binary flag as the same binary value as the first binary flag.
2. The method for video encoding and decoding using geometric prediction of claim 1, wherein the second index value is different from the first index value in response to the current picture not using backward prediction.
3. The method for video encoding and decoding using geometric prediction of claim 1, further comprising:
in response to the current picture not using backward prediction and in response to the signaled second index value being equal to or greater than the signaled first index value, adjusting the second index value to be equal to the signaled second index value plus one.
4. An apparatus for video encoding and decoding using geometric prediction, comprising:
one or more processors; and
a memory configured to store instructions executable by the one or more processors;
wherein the one or more processors, when executing the instructions, are configured to:
partitioning a video picture into a plurality of coding units, CUs, wherein at least one CU of the plurality of CUs is further partitioned into two prediction units, PUs, both PUs being geometrically shaped PUs;
constructing a first merge list comprising a plurality of candidates based on a merge list construction process for conventional merge prediction, wherein each candidate of the plurality of candidates comprises a list 0 motion vector or a list 1 motion vector or both a list 0 motion vector and a list 1 motion vector;
Receiving a signaled first index value indicating a first candidate selected from the first merge list;
receiving a signaled second index value indicating a second candidate selected from the first merge list;
receiving a signaled first binary flag indicating whether to select a list 0 motion vector of the first candidate or a list 1 motion vector of the first candidate for a first PU of the geometric prediction;
inferring a second binary flag indicating whether to select a list 0 motion vector of the second candidate or a list 1 motion vector of the second candidate for a second PU of the geometric prediction based on the first binary flag and based on whether a current picture uses backward prediction,
wherein the one or more processors, when executing the instructions, are further configured to:
in response to the current picture using backward prediction, inferring the second binary flag as an opposite binary value of the first binary flag;
in response to the current picture not using backward prediction, infer the second binary flag as the same binary value as the first binary flag.
5. The apparatus for video encoding and decoding using geometric prediction of claim 4, wherein the signaled second index value is different from the signaled first index value in response to the current picture not using backward prediction.
6. The apparatus for video encoding and decoding using geometric prediction of claim 4, wherein the one or more processors, when executing the instructions, are further configured to:
in response to the current picture not using backward prediction and in response to the signaled second index value being equal to or greater than the signaled first index value, adjust the second index value to be equal to the signaled second index value plus one.
CN202080042822.XA 2019-04-25 2020-04-27 Method and apparatus for video encoding and decoding using triangle prediction Active CN113994672B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962838935P 2019-04-25 2019-04-25
US62/838,935 2019-04-25
PCT/US2020/030124 WO2020220037A1 (en) 2019-04-25 2020-04-27 Methods and apparatuses for video coding with triangle prediction

Publications (2)

Publication Number Publication Date
CN113994672A CN113994672A (en) 2022-01-28
CN113994672B (en) 2023-07-25

Family

ID=72941866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080042822.XA Active CN113994672B (en) 2019-04-25 2020-04-27 Method and apparatus for video encoding and decoding using triangle prediction

Country Status (2)

Country Link
CN (1) CN113994672B (en)
WO (1) WO2020220037A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011501508A (en) * 2007-10-12 2011-01-06 トムソン ライセンシング Method and apparatus for video encoding and decoding of geometrically partitioned bi-predictive mode partition
EP3739883B1 (en) * 2010-05-04 2022-11-16 LG Electronics Inc. Method and apparatus for encoding and decoding a video signal
US20170201773A1 (en) * 2014-06-12 2017-07-13 Nec Corporation Video coding apparatus, video coding method, and recording medium
CN115914625A (en) * 2016-08-01 2023-04-04 韩国电子通信研究院 Image encoding/decoding method

Also Published As

Publication number Publication date
CN113994672A (en) 2022-01-28
WO2020220037A1 (en) 2020-10-29

Similar Documents

Publication Publication Date Title
CN116800960B (en) Method, apparatus and storage medium for video decoding
US20220021894A1 (en) Methods and apparatuses for signaling of merge modes in video coding
CN113545050B (en) Video encoding and decoding method and device using triangle prediction
CN116156164B (en) Method, apparatus and readable storage medium for decoding video
US20230089782A1 (en) Methods and apparatuses for video coding using geometric partition
US20220239902A1 (en) Methods and apparatuses for video coding using triangle partition
US20220070445A1 (en) Methods and apparatuses for video coding with triangle prediction
US20220014780A1 (en) Methods and apparatus of video coding for triangle prediction
CN113994672B (en) Method and apparatus for video encoding and decoding using triangle prediction
CN114982230A (en) Method and apparatus for video coding and decoding using triangle partitions
WO2020236991A1 (en) Methods and apparatuses for video coding using triangle partition
CN114080807A (en) Method and device for video coding and decoding by utilizing triangular partition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant