WO2023194334A1 - Video encoding and decoding using reference picture resampling - Google Patents


Info

Publication number
WO2023194334A1
WO2023194334A1 PCT/EP2023/058736
Authority
WO
WIPO (PCT)
Prior art keywords
picture
motion vector
prediction
size
vector difference
Prior art date
Application number
PCT/EP2023/058736
Other languages
French (fr)
Inventor
Philippe Bordes
Tangi POIRIER
Franck Galpin
Antoine Robert
Original Assignee
Interdigital Ce Patent Holdings, Sas
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interdigital Ce Patent Holdings, Sas filed Critical Interdigital Ce Patent Holdings, Sas
Publication of WO2023194334A1 publication Critical patent/WO2023194334A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/43Hardware specially adapted for motion estimation or compensation
    • H04N19/433Hardware specially adapted for motion estimation or compensation characterised by techniques for memory access
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Definitions

  • At least one of the present embodiments generally relates to a method or an apparatus for video encoding or decoding, and more particularly, to a method or an apparatus performing Geometric Partitioning or Template Matching (TM) based reordering for Motion Vector Difference (MVD) sign prediction or for enhanced prediction of both Motion Vector and Motion Vector Difference.
  • image and video coding schemes usually employ prediction, including motion vector prediction, and transform to leverage spatial and temporal redundancy in the video content.
  • intra or inter prediction is used to exploit the intra or inter frame correlation, then the differences between the original image and the predicted image, often denoted as prediction errors or prediction residuals, are transformed, quantized, and entropy coded.
  • the compressed data are decoded by inverse processes corresponding to the entropy coding, quantization, transform, and prediction.
  • a method comprises predicting a block of a picture of a video using geometric partition mode combining at least two partitions and storing intra parameters in a motion information buffer of the picture, wherein, responsive to a sub-block of one of the at least two partitions being inter, the method further comprises obtaining an intra parameter from the intra parameters stored in a motion information buffer of a reference picture at a location corresponding to the motion-compensated location of the sub-block, the location being scaled using the size ratio of the reference picture relative to the picture when the reference picture has a size different from the size of the picture.
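The scaled look-up described in the first method can be sketched as follows. This is an illustrative Python sketch, not the normative process; the names (`MI_GRID`, `fetch_ref_ipm`) and the integer scaling are assumptions:

```python
# Hypothetical sketch: fetch the intra prediction mode (IPM) of an
# inter-coded GPM sub-block from the reference picture's motion-info
# buffer, scaling the motion-compensated location by the picture-size
# ratio when the reference picture has a different size (RPR).

MI_GRID = 4  # assumed: one motion-info entry per 4x4 luma sub-block

def fetch_ref_ipm(ref_mi, x, y, mvx, mvy, cur_size, ref_size):
    """ref_mi: 2D grid of IPM values for the reference picture.
    (x, y): sub-block position in the current picture, in luma samples.
    (mvx, mvy): motion vector in luma-sample units.
    cur_size, ref_size: (width, height) of current / reference picture."""
    px, py = x + mvx, y + mvy          # motion-compensated location
    if ref_size != cur_size:           # RPR: map into the reference grid
        px = px * ref_size[0] // cur_size[0]
        py = py * ref_size[1] // cur_size[1]
    return ref_mi[py // MI_GRID][px // MI_GRID]
```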
  • a second method comprises predicting a block of a picture of a video by motion compensation, the method further comprising reconstructing a motion vector by adding a motion vector prediction to a motion vector difference, wherein at least the motion vector difference is obtained using template-based matching to order a list of candidates, and wherein the template-based matching is modified for a reference picture having a size different from a size of the picture.
  • the template-based matching is used to reorder a list of candidates for Motion Vector Difference (MVD) sign prediction or for enhanced prediction of both Motion Vector and Motion Vector Difference.
  • a third method comprises decoding data representative of a block of a picture of a video wherein the decoding further comprises predicting a block according to the predicting method adapted to Reference Picture Re-scaling in any of its variants and decoding picture data of the block of the picture of the video based on the predicted block.
  • a fourth method comprises encoding data representative of a block of a picture of a video wherein the encoding further comprises predicting a block according to the predicting method adapted to Reference Picture Re-scaling in any of its variants and encoding picture data of the block of the picture of the video based on the predicted block.
  • an apparatus comprising one or more processors, wherein the one or more processors are configured to implement the method for video decoding according to any of its variants.
  • another apparatus comprising one or more processors, wherein the one or more processors are configured to implement the method for video encoding according to any of its variants.
  • a device comprising an apparatus according to any of the decoding embodiments; and at least one of (i) an antenna configured to receive a signal, the signal including the video block, (ii) a band limiter configured to limit the received signal to a band of frequencies that includes the video block, or (iii) a display configured to display an output representative of the video block.
  • a non-transitory computer readable medium containing data content generated according to any of the described encoding embodiments or variants.
  • a signal comprising video data generated according to any of the described encoding embodiments or variants.
  • a bitstream is formatted to include data content generated according to any of the described encoding embodiments or variants.
  • a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out any of the described encoding/decoding embodiments or variants.
  • Figure 1 illustrates a block diagram of an example apparatus in which various aspects of the embodiments may be implemented.
  • Figure 2 illustrates a block diagram of an embodiment of video encoder in which various aspects of the embodiments may be implemented.
  • Figure 3 illustrates a block diagram of an embodiment of video decoder in which various aspects of the embodiments may be implemented.
  • Figure 4 illustrates the principles of Reference Picture Resampling at the encoder side.
  • Figure 5 illustrates the principles of Reference Picture Resampling at the decoder side.
  • Figure 6 illustrates the different examples of predictions in a geometric partitioning mode.
  • Figure 7 illustrates an example of GPM prediction and coarser storage parameters in motion info.
  • Figure 8 illustrates the storage of current CU coding intra mode information at sub-block precision in the motion info buffer.
  • Figure 9 illustrates samples of the template in a current picture and reference samples of the template in reference pictures.
  • Figure 10 illustrates Template Matching (TM) based reordering for MVD sign prediction.
  • Figure 11 illustrates the difference of size between the reference template and the current template when RPR is enabled.
  • Figure 12 illustrates a generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
  • Figure 13 illustrates another generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
  • Figure 14 illustrates Template Matching (TM) based reordering for enhanced motion vector and motion vector difference prediction.
  • Figure 15 illustrates another generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
  • Figure 16 illustrates a flowchart of an example of decoding using prediction adapted to reference picture re-sampling according to at least one embodiment.
  • Figure 17 illustrates a flowchart of an example of encoding using prediction adapted to reference picture re-sampling according to at least one embodiment.
  • Various embodiments relate to a video coding system in which, in at least one embodiment, it is proposed to adapt video coding tools to the use of Reference Picture Re-scaling (RPR) where a reference picture has a different size than the current picture to be coded or decoded.
  • RPR Reference Picture Re-scaling
  • Different embodiments are proposed hereafter, introducing some tools modifications to increase coding efficiency and improve the codec consistency when RPR is enabled.
  • an encoding method, a decoding method, an encoding apparatus, and a decoding apparatus based on this principle are proposed.
  • VVC Versatile Video Coding
  • HEVC High Efficiency Video Coding
  • ECM Enhanced Compression Model
  • FIG. 1 illustrates a block diagram of an example of a system in which various aspects and embodiments can be implemented.
  • System 100 may be embodied as a device including the various components described below and is configured to perform one or more of the aspects described in this application. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers.
  • Elements of system 100, singly or in combination, may be embodied in a single integrated circuit, multiple ICs, and/or discrete components.
  • the processing and encoder/decoder elements of system 100 are distributed across multiple ICs and/or discrete components.
  • system 100 is communicatively coupled to other systems, or to other electronic devices, via, for example, a communications bus or through dedicated input and/or output ports.
  • system 100 is configured to implement one or more of the aspects described in this application.
  • the system 100 includes at least one processor 110 configured to execute instructions loaded therein for implementing, for example, the various aspects described in this application.
  • Processor 110 may include embedded memory, input output interface, and various other circuitries as known in the art.
  • the system 100 includes at least one memory 120 (e.g. a volatile memory device, and/or a non-volatile memory device).
  • System 100 includes a storage device 140, which may include non-volatile memory and/or volatile memory, including, but not limited to, EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, magnetic disk drive, and/or optical disk drive.
  • the storage device 140 may include an internal storage device, an attached storage device, and/or a network accessible storage device, as non-limiting examples.
  • System 100 includes an encoder/decoder module 130 configured, for example, to process data to provide an encoded video or decoded video, and the encoder/decoder module 130 may include its own processor and memory.
  • the encoder/decoder module 130 represents module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, a device may include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 130 may be implemented as a separate element of system 100 or may be incorporated within processor 110 as a combination of hardware and software as known to those skilled in the art.
  • Program code to be loaded onto processor 110 or encoder/decoder 130 to perform the various aspects described in this application may be stored in storage device 140 and subsequently loaded onto memory 120 for execution by processor 110.
  • one or more of processor 110, memory 120, storage device 140, and encoder/decoder module 130 may store one or more of various items during the performance of the processes described in this application. Such stored items may include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
  • memory inside of the processor 110 and/or the encoder/decoder module 130 is used to store instructions and to provide working memory for processing that is needed during encoding or decoding.
  • a memory external to the processing device (for example, the processing device may be either the processor 110 or the encoder/decoder module 130) is used for one or more of these functions.
  • the external memory may be the memory 120 and/or the storage device 140, for example, a dynamic volatile memory and/or a non-volatile flash memory.
  • an external non-volatile flash memory is used to store the operating system of a television.
  • a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for HEVC or VVC.
  • the input to the elements of system 100 may be provided through various input devices as indicated in block 105.
  • Such input devices include, but are not limited to, (i) an RF portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Composite input terminal, (iii) a USB input terminal, and/or (iv) an HDMI input terminal.
  • the input devices of block 105 have associated respective input processing elements as known in the art.
  • the RF portion may be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which may be referred to as a channel in certain embodiments, (iv) demodulating the down converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets.
  • the RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers.
  • the RF portion may include a tuner that performs various of these functions, including, for example, down converting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband.
  • the RF portion and its associated input processing element receives an RF signal transmitted over a wired (for example, cable) medium, and performs frequency selection by filtering, down converting, and filtering again to a desired frequency band.
  • Adding elements may include inserting elements in between existing elements, for example, inserting amplifiers and an analog-to-digital converter.
  • the RF portion includes an antenna.
  • USB and/or HDMI terminals may include respective interface processors for connecting system 100 to other electronic devices across USB and/or HDMI connections.
  • various aspects of input processing, for example, Reed-Solomon error correction, may be implemented, for example, within a separate input processing IC or within processor 110 as necessary.
  • aspects of USB or HDMI interface processing may be implemented within separate interface ICs or within processor 110 as necessary.
  • the demodulated, error corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 110, and encoder/decoder 130 operating in combination with the memory and storage elements to process the data stream as necessary for presentation on an output device.
  • connection arrangement 115, for example, an internal bus as known in the art, including the I2C bus, wiring, and printed circuit boards.
  • the system 100 includes communication interface 150 that enables communication with other devices via communication channel 190.
  • the communication interface 150 may include, but is not limited to, a transceiver configured to transmit and to receive data over communication channel 190.
  • the communication interface 150 may include, but is not limited to, a modem or network card and the communication channel 190 may be implemented, for example, within a wired and/or a wireless medium.
  • Data is streamed to the system 100, in various embodiments, using a Wi-Fi network such as IEEE 802.11.
  • the Wi-Fi signal of these embodiments is received over the communications channel 190 and the communications interface 150 which are adapted for Wi-Fi communications.
  • the communications channel 190 of these embodiments is typically connected to an access point or router that provides access to outside networks including the Internet for allowing streaming applications and other over-the-top communications.
  • Other embodiments provide streamed data to the system 100 using a set-top box that delivers the data over the HDMI connection of the input block 105.
  • Still other embodiments provide streamed data to the system 100 using the RF connection of the input block 105.
  • the system 100 may provide an output signal to various output devices, including a display 165, speakers 175, and other peripheral devices 185.
  • the other peripheral devices 185 include, in various examples of embodiments, one or more of a stand-alone DVR, a disk player, a stereo system, a lighting system, and other devices that provide a function based on the output of the system 100.
  • control signals are communicated between the system 100 and the display 165, speakers 175, or other peripheral devices 185 using signaling such as AV.Link, CEC, or other communications protocols that enable device-to-device control with or without user intervention.
  • the output devices may be communicatively coupled to system 100 via dedicated connections through respective interfaces 160, 170, and 180.
  • the output devices may be connected to system 100 using the communications channel 190 via the communications interface 150.
  • the display 165 and speakers 175 may be integrated in a single unit with the other components of system 100 in an electronic device, for example, a television.
  • the display interface 160 includes a display driver, for example, a timing controller (T Con) chip.
  • the display 165 and speaker 175 may alternatively be separate from one or more of the other components, for example, if the RF portion of input 105 is part of a separate set-top box.
  • the output signal may be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.
  • Figure 2 illustrates an example video encoder 200, such as a VVC (Versatile Video Coding) encoder.
  • Figure 2 may also illustrate an encoder in which improvements are made to the VVC standard or an encoder employing technologies similar to VVC.
  • the terms “reconstructed” and “decoded” may be used interchangeably, the terms “encoded” or “coded” may be used interchangeably, and the terms “image,” “picture” and “frame” may be used interchangeably.
  • the term “reconstructed” is used at the encoder side while “decoded” is used at the decoder side.
  • the video sequence may go through pre-encoding processing (201), for example, applying a color transform to the input color picture (e.g., conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to get a signal distribution more resilient to compression (for instance using a histogram equalization of one of the color components).
  • Metadata can be associated with the preprocessing and attached to the bitstream.
  • pre-encoding comprises a re-scaling of the input picture for Reference Picture Re-scaling as described hereafter.
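As an illustration of the first pre-processing example above (RGB 4:4:4 to YCbCr 4:2:0), here is a minimal sketch using BT.709 luma coefficients and naive 2x2 chroma averaging. Real systems use standard-specific matrices and resampling filters; the function name is hypothetical:

```python
# Illustrative sketch only: full-range BT.709 RGB -> YCbCr, then naive
# 2x2 averaging of the chroma planes (4:4:4 -> 4:2:0 subsampling).
import numpy as np

def rgb_to_ycbcr420(rgb):
    """rgb: (H, W, 3) float array in [0, 1]; H and W assumed even."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b        # luma (BT.709 weights)
    cb = (b - y) / 1.8556                           # blue-difference chroma
    cr = (r - y) / 1.5748                           # red-difference chroma

    def down2(p):  # average each 2x2 block -> half-resolution plane
        return 0.25 * (p[0::2, 0::2] + p[1::2, 0::2]
                       + p[0::2, 1::2] + p[1::2, 1::2])

    return y, down2(cb), down2(cr)
```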
  • a picture is encoded by the encoder elements as described below.
  • the picture to be encoded is partitioned (202) and processed in units of, for example, CUs.
  • Each unit is encoded using, for example, either an intra or inter mode.
  • intra prediction 260
  • inter mode motion estimation (275) and compensation (270) are performed.
  • the encoder decides (205) which one of the intra mode or inter mode to use for encoding the unit, and indicates the intra/inter decision by, for example, a prediction mode flag.
  • Prediction residuals are calculated, for example, by subtracting (210) the predicted block from the original image block.
  • the prediction residuals are then transformed (225) and quantized (230).
  • the quantized transform coefficients, as well as motion vectors and other syntax elements, are entropy coded (245) to output a bitstream.
  • the encoder can skip the transform and apply quantization directly to the non-transformed residual signal.
  • the encoder can bypass both transform and quantization, i.e., the residual is coded directly without the application of the transform or quantization processes.
  • the encoder decodes an encoded block to provide a reference for further predictions.
  • the quantized transform coefficients are de-quantized (240) and inverse transformed (250) to decode prediction residuals.
  • In-loop filters (265) are applied to the reconstructed picture to perform, for example, deblocking/SAO (Sample Adaptive Offset) filtering to reduce encoding artifacts.
  • the filtered image is stored at a reference picture buffer (280).
  • Figure 3 illustrates a block diagram of an example video decoder 300.
  • a bitstream is decoded by the decoder elements as described below.
  • Video decoder 300 generally performs a decoding pass reciprocal to the encoding pass as described in Figure 2.
  • the encoder 200 also generally performs video decoding as part of encoding video data.
  • the input of the decoder includes a video bitstream, which can be generated by video encoder 200.
  • the bitstream is first entropy decoded (330) to obtain transform coefficients, motion vectors, and other coded information.
  • the picture partition information indicates how the picture is partitioned.
  • the decoder may therefore divide (335) the picture according to the decoded picture partitioning information.
  • the transform coefficients are dequantized (340) and inverse transformed (350) to decode the prediction residuals. Combining (355) the decoded prediction residuals and the predicted block, an image block is reconstructed.
  • the predicted block can be obtained (370) from intra prediction (360) or motion-compensated prediction (i.e., inter prediction) (375).
  • In-loop filters (365) are applied to the reconstructed image.
  • the filtered image is stored at a reference picture buffer (380).
  • the decoded picture can further go through post-decoding processing (385), for example, an inverse color transform (e.g., conversion from YCbCr 4:2:0 to RGB 4:4:4) or an inverse remapping performing the inverse of the remapping process performed in the pre-encoding processing (201).
  • post decoding comprises a re-scaling of the decoded picture performing the inverse of the re-scaling of the encoding.
  • the post-decoding processing can use metadata derived in the pre-encoding processing and signaled in the bitstream.
  • a video coding system may comprise a plurality of different tools for encoding and decoding according to different coding modes.
  • a coding mode is selected for a block of image or a larger area of an image or a video according to rate-distortion optimization.
  • examples of such tools are Reference Picture Resampling (RPR), Geometric Partitioning Mode (GPM), and Template Matching (TM) based reordering for Motion Vector Difference (MVD) sign prediction, among others.
  • Figure 4 illustrates the principles of Reference Picture Resampling at the encoder side.
  • Figure 5 illustrates the principles of Reference Picture Resampling at the decoder side.
  • Reference Picture Resampling is a picture-based re-scaling feature. The principle is that, when possible, the encoding and decoding processes may operate on smaller images which may increase the overall compression rate.
  • the encoder may choose, for each frame of the video sequence, the resolution (in other words the picture size) to use for coding the frame.
  • Different picture parameter sets (PPS) are coded in the bitstream with the different possible sizes of the pictures and the slice header or picture header indicates which PPS to use to decode the current picture part included in the video coding layer (VCL) network abstraction layer (NAL) unit.
  • VCL video coding layer
  • NAL network abstraction layer
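The per-picture size signalling described above can be sketched as a toy look-up: several PPSs carry the candidate picture sizes, and each picture header references one of them by id. Field names are illustrative and do not follow exact VVC syntax:

```python
# Hypothetical sketch of PPS-based picture-size selection for RPR:
# the bitstream carries one PPS per candidate resolution, and the
# picture header picks one by id.

pps_list = {
    0: {"pic_width": 1920, "pic_height": 1080},  # full resolution
    1: {"pic_width":  960, "pic_height":  540},  # half resolution
}

def decode_picture_size(picture_header):
    """Resolve the coded size of the current picture from its header."""
    pps = pps_list[picture_header["pps_id"]]
    return pps["pic_width"], pps["pic_height"]
```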
  • the down-sampler (440) and the up-sampler (540) functions are respectively used as pre-processing (such as pre-encoding processing 201 in figure 2) or post-processing (post-decoding processing 385 in figure 3). These functions are generally not specified by the video coding standard.
  • the encoder selects whether to encode at original (full size) or down-sized resolution (e.g., picture width/height divided by 2).
  • the reference picture buffer (280 in figure 2 and 380 in figure 3)
  • DPB Decoded Picture Buffer
  • a re-scaling function (430 in figure 4 for the encoder side and 530 in figure 5 for the decoder side) down-scales or up-scales the reference block to build the prediction block during the motion compensation process (270 in figure 2, 375 in figure 3).
  • the re-scaling (430, 530) of the reference block is made implicitly during the motion compensation process (270, 375).
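The implicit re-scaling during motion compensation can be sketched as a per-sample position mapping: rather than resampling the whole reference picture, each prediction-sample position is projected into the reference grid using the horizontal and vertical scaling ratios. The fixed-point details of VVC (1/16384-precision ratios, interpolation filters) are omitted, and the function name is hypothetical:

```python
# Illustrative sketch: map a current-picture sample plus its motion
# vector to a (possibly fractional) position in a reference picture of
# different size, as done implicitly during RPR motion compensation.

def ref_sample_pos(x, y, mvx, mvy, cur_w, cur_h, ref_w, ref_h):
    """(x, y): sample position in the current picture.
    (mvx, mvy): motion vector in luma-sample units.
    cur_w/cur_h, ref_w/ref_h: current and reference picture sizes."""
    sx = ref_w / cur_w   # horizontal scaling ratio
    sy = ref_h / cur_h   # vertical scaling ratio
    return ((x + mvx) * sx, (y + mvy) * sy)
```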
  • the RPR tool may be enabled explicitly or implicitly at different levels using different mechanisms.
  • At sequence level: in the sequence parameter sets (SPS) that describe elements common to a series of pictures, a flag indicates that RPR may be applied for coding at least one picture. This is the case in VVC and ECM, using a flag named ref_pic_resampling_enabled_flag.
  • SPS sequence parameter sets
  • At picture level: RPR is enabled if it is enabled at sequence level (as above) and the current picture uses at least one reference picture with a size different from the current picture size.
  • At CU level: RPR is enabled if it is enabled at picture level and the current CU uses at least one reference picture with a size different from the current picture size.
  • the term “RPR enabled” can be understood at any of these levels.
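The three "RPR enabled" levels listed above can be sketched as nested checks; the dictionary keys and function names are hypothetical:

```python
# Illustrative sketch of the sequence / picture / CU "RPR enabled" levels.

def rpr_enabled_seq(sps):
    # sequence level: SPS flag allows RPR for at least one picture
    return sps["ref_pic_resampling_enabled_flag"] == 1

def rpr_enabled_pic(sps, cur_size, ref_sizes):
    # picture level: sequence flag set AND at least one reference
    # picture has a size different from the current picture
    return rpr_enabled_seq(sps) and any(s != cur_size for s in ref_sizes)

def rpr_enabled_cu(sps, cur_size, cu_ref_sizes):
    # CU level: same test, restricted to the references this CU uses
    return rpr_enabled_pic(sps, cur_size, cu_ref_sizes)
```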
  • a tool used in video coding system is Geometric Partitioning Mode (GPM).
  • the geometric partitioning mode allows predicting one CU with two non-rectangular partitions.
  • the Intra mode storage in GPM with inter and intra prediction is now detailed.
  • Figure 6 illustrates the different examples of predictions used in GPM.
  • Each partition may be Inter (375) or Intra (360).
  • the samples of the inter partition are predicted with the regular inter prediction process, using motion compensated reference samples picked from one (uni-prediction) reference picture.
  • the samples of the intra partition are predicted with regular intra prediction mode (IPM) and prediction process where the available IPM candidates are the parallel angular mode against the GPM block boundary (Parallel mode), the perpendicular angular mode against the GPM block boundary (Perpendicular mode), and the Planar mode.
  • Figure 6a shows the combination of inter prediction and intra prediction using the parallel angular mode against the GPM block boundary (Parallel mode), the IPM being represented by the arrow.
  • Figure 6b shows the combination of inter prediction and intra prediction using the perpendicular angular mode against the GPM block boundary (Perpendicular mode).
  • Figure 6c shows the combination of inter prediction and intra prediction using the planar mode.
  • Figure 6d shows the combination of two intra predictions respectively using the Parallel mode and perpendicular mode.
  • the motion vectors used to reconstruct the inter prediction part(s) and the IPM used to reconstruct the intra partition(s) are stored in a buffer (called motion info, "MI") associated with the current picture in the DPB.
  • since the partitions may be non-rectangular, for the sub-blocks (or sub-partitions) shared by the two partitions, the stored information corresponds to the partition which occupies the larger area.
  • for an inter sub-block, the motion information (MV and reference index) is stored and the IPM information is retrieved from the MI buffer of the reference picture at the location miRef((x+mvx)/4, (y+mvy)/4).
  • for an intra sub-block, the IPM information is stored, and the motion information is undefined.
  • Figure 7 illustrates examples of two GPM predictions and coarser storage parameters (dashed sub-blocks) in motion/IPM info.
  • the CU of size 32x32 is predicted with one partition inter and one partition intra.
  • the corresponding inter and intra parameters are stored in the MI buffer at 4x4 resolution as shown on figure 7(b) with dashed sub-blocks.
  • Figure 8 illustrates the storage of current CU coding mode information at sub-block precision in the motion info buffer. This information may be later used for coding subsequent CUs in the current picture and CUs in the subsequent pictures (in coding order), to predict the motion or the IPM to be used for example.
  • a reconstructed CU is input (810).
  • the intra parameters are selected (830) to be stored in the MI buffer (850).
  • the intra parameters are retrieved from the reference Ml buffer, which is the Ml buffer associated with the reference samples used for inter prediction for this sub-block.
  • the location of the intra parameters in the reference Ml buffer is ((x+mvX)/si , (y+mvY)/si) where (mvX,mvY) is the motion vector of the inter prediction and si is the ratio between the current picture size and the motion info buffer size and (x,y) is the location of the sub-block in the current frame.
  • the coding mode and the intra parameters either retrieved from Intra partition (830) or Inter partition (840) are stored (850) in the current Ml buffer at location mi cur (x/si, y/si).
  • refO may be used if it is coded in intra. If both refO and ref 1 are coded in inter, one may choose the reference picture with POC closest to the current POC.
  • the mvd is made of 2 components (mvdx, mvdy). These values may be parsed from the bitstream in case of the current CU is coded in AMVP mode for example.
  • AMVP mode a list of pre-defined mvd refinement positions along kxn78 diagonal angles is built and the index of the mvd to use is coded.
  • Figure 9 illustrates samples of the template in a current picture and reference samples of the template in reference pictures in the more general case of bi-prediction.
  • HLS HTTP live Streaming
  • a manifest can be associated, for example, to a version or collection of versions of a content to provide characteristics of the version or collection of versions.
  • references to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
  • such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.


Abstract

In a video coding system, it is proposed to adapt video coding tools to the use of Reference Picture Re-scaling where a reference picture has a different size than the current picture to be coded or decoded. Different embodiments are proposed hereafter for prediction (geometric partitioning, template matching based re-ordering for MVD sign prediction, template matching based re-ordering for enhanced MV prediction) introducing some tools modifications to increase coding efficiency and improve the codec consistency when RPR is enabled. A video encoding method, a decoding method, a video encoder and a video decoder are described.

Description

VIDEO ENCODING AND DECODING USING REFERENCE PICTURE RESAMPLING
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of European Application No. 22305485.9, filed on April 7th, 2022, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
At least one of the present embodiments generally relates to a method or an apparatus for video encoding or decoding, and more particularly, to a method or an apparatus performing Geometric Partitioning or Template Matching (TM) based reordering for Motion Vector Difference (MVD) sign prediction or for enhanced prediction of both Motion Vector and Motion Vector Difference.
BACKGROUND
To achieve high compression efficiency, image and video coding schemes usually employ prediction, including motion vector prediction, and transform to leverage spatial and temporal redundancy in the video content. Generally, intra or inter prediction is used to exploit the intra or inter frame correlation, then the differences between the original image and the predicted image, often denoted as prediction errors or prediction residuals, are transformed, quantized, and entropy coded. To reconstruct the video, the compressed data are decoded by inverse processes corresponding to the entropy coding, quantization, transform, and prediction.
Existing methods for coding and decoding show some limitations with the usage of some tools with Reference Picture Re-scaling (RPR), where a reference picture has a different size than the current picture to be coded or decoded. Therefore, there is a need to improve the state of the art.
SUMMARY
The drawbacks and disadvantages of the prior art are solved and addressed by the general aspects described herein. According to a first aspect, there is provided a method. The method comprises predicting a block of a picture of a video using geometrical partition mode combining at least two partitions and storing intra parameters in a motion information buffer of the picture, wherein, responsive to a subblock of a partition among the at least two partitions being inter, the method further comprises obtaining an intra parameter from an intra parameter stored in a motion information buffer of a reference picture at a location corresponding to the motion compensated location of the subblock of the partition, the location being scaled using a ratio of the reference picture relative to the picture when the reference picture has a size different from the size of the picture.
According to another aspect, there is provided a second method. The method comprises predicting a block of a picture of a video by motion compensation, the method further comprising reconstructing a motion vector by adding a motion vector prediction to a motion vector difference, wherein at least the motion vector difference is obtained using template-based matching to order a list of candidates and wherein the template-based matching is modified for a reference picture having a size different from a size of the picture. Advantageously, the template-based matching (TM) is used to reorder a list of candidates for Motion Vector Difference (MVD) sign prediction or for enhanced prediction of both Motion Vector and Motion Vector Difference.
According to another aspect, there is provided a third method. The method comprises decoding data representative of a block of a picture of a video wherein the decoding further comprises predicting a block according to the predicting method adapted to Reference Picture Re-scaling in any of its variants and decoding picture data of the block of the picture of the video based on the predicted block.
According to another aspect, there is provided a fourth method. The method comprises encoding data representative of a block of a picture of a video wherein the encoding further comprises predicting a block according to the predicting method adapted to Reference Picture Re-scaling in any of its variants and encoding picture data of the block of the picture of the video based on the predicted block.
According to another aspect, there is provided an apparatus. The apparatus comprises one or more processors, wherein the one or more processors are configured to implement the method for video decoding according to any of its variants. According to another aspect, there is provided another apparatus. The apparatus comprises one or more processors, wherein the one or more processors are configured to implement the method for video encoding according to any of its variants.
According to another general aspect of at least one embodiment, there is provided a device comprising an apparatus according to any of the decoding embodiments; and at least one of (i) an antenna configured to receive a signal, the signal including the video block, (ii) a band limiter configured to limit the received signal to a band of frequencies that includes the video block, or (iii) a display configured to display an output representative of the video block.
According to another general aspect of at least one embodiment, there is provided a non- transitory computer readable medium containing data content generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, there is provided a signal comprising video data generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, a bitstream is formatted to include data content generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, there is provided a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out any of the described encoding/decoding embodiments or variants.
These and other aspects, features and advantages of the general aspects will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings, examples of several embodiments are illustrated.
Figure 1 illustrates a block diagram of an example apparatus in which various aspects of the embodiments may be implemented.

Figure 2 illustrates a block diagram of an embodiment of a video encoder in which various aspects of the embodiments may be implemented.
Figure 3 illustrates a block diagram of an embodiment of video decoder in which various aspects of the embodiments may be implemented.
Figure 4 illustrates the principles of Reference Picture Resampling at the encoder side.
Figure 5 illustrates the principles of Reference Picture Resampling at the decoder side.
Figure 6 illustrates the different examples of predictions in a geometric partitioning mode.
Figure 7 illustrates an example of GPM prediction and coarser storage parameters in the motion info buffer.

Figure 8 illustrates the storage of current CU coding intra mode information at sub-block precision in the motion info buffer.
Figure 9 illustrates samples of the template in a current picture and reference samples of the template in reference pictures.
Figure 10 illustrates Template Matching (TM) based reordering for MVD sign prediction.
Figure 11 illustrates the difference of size between the reference template and the current template when RPR is enabled.
Figure 12 illustrates a generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
Figure 13 illustrates another generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
Figure 14 illustrates Template Matching (TM) based reordering for enhanced motion vector and motion vector difference prediction.
Figure 15 illustrates another generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
Figure 16 illustrates a flowchart of an example of decoding using prediction adapted to reference picture re-sampling according to at least one embodiment.
Figure 17 illustrates a flowchart of an example of encoding using prediction adapted to reference picture re-sampling according to at least one embodiment.
DETAILED DESCRIPTION
Various embodiments relate to a video coding system in which, in at least one embodiment, it is proposed to adapt video coding tools to the use of Reference Picture Re-scaling (RPR) where a reference picture has a different size than the current picture to be coded or decoded. Different embodiments are proposed hereafter, introducing some tools modifications to increase coding efficiency and improve the codec consistency when RPR is enabled. Amongst others, an encoding method, a decoding method, an encoding apparatus, a decoding apparatus based on this principle are proposed.
Moreover, the present aspects, although describing principles related to particular drafts of VVC (Versatile Video Coding) or to HEVC (High Efficiency Video Coding) specifications, or to ECM (Enhanced Compression Model) reference software are not limited to VVC or HEVC or ECM, and can be applied, for example, to other standards and recommendations, whether pre-existing or future-developed, and extensions of any such standards and recommendations (including VVC and HEVC and ECM). Unless indicated otherwise, or technically precluded, the aspects described in this application can be used individually or in combination.
The acronyms used herein are reflecting the current state of video coding developments and thus should be considered as examples of naming that may be renamed at later stages while still representing the same techniques.
Figure 1 illustrates a block diagram of an example of a system in which various aspects and embodiments can be implemented. System 100 may be embodied as a device including the various components described below and is configured to perform one or more of the aspects described in this application. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. Elements of system 100, singly or in combination, may be embodied in a single integrated circuit, multiple ICs, and/or discrete components. For example, in at least one embodiment, the processing and encoder/decoder elements of system 100 are distributed across multiple ICs and/or discrete components. In various embodiments, the system 100 is communicatively coupled to other systems, or to other electronic devices, via, for example, a communications bus or through dedicated input and/or output ports. In various embodiments, the system 100 is configured to implement one or more of the aspects described in this application.
The system 100 includes at least one processor 110 configured to execute instructions loaded therein for implementing, for example, the various aspects described in this application. Processor 110 may include embedded memory, input output interface, and various other circuitries as known in the art. The system 100 includes at least one memory 120 (e.g. a volatile memory device, and/or a non-volatile memory device). System 100 includes a storage device 140, which may include non-volatile memory and/or volatile memory, including, but not limited to, EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, magnetic disk drive, and/or optical disk drive. The storage device 140 may include an internal storage device, an attached storage device, and/or a network accessible storage device, as non-limiting examples.
System 100 includes an encoder/decoder module 130 configured, for example, to process data to provide an encoded video or decoded video, and the encoder/decoder module 130 may include its own processor and memory. The encoder/decoder module 130 represents module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, a device may include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 130 may be implemented as a separate element of system 100 or may be incorporated within processor 110 as a combination of hardware and software as known to those skilled in the art.
Program code to be loaded onto processor 110 or encoder/decoder 130 to perform the various aspects described in this application may be stored in storage device 140 and subsequently loaded onto memory 120 for execution by processor 110. In accordance with various embodiments, one or more of processor 110, memory 120, storage device 140, and encoder/decoder module 130 may store one or more of various items during the performance of the processes described in this application. Such stored items may include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
In several embodiments, memory inside of the processor 110 and/or the encoder/decoder module 130 is used to store instructions and to provide working memory for processing that is needed during encoding or decoding. In other embodiments, however, a memory external to the processing device (for example, the processing device may be either the processor 110 or the encoder/decoder module 130) is used for one or more of these functions. The external memory may be the memory 120 and/or the storage device 140, for example, a dynamic volatile memory and/or a non-volatile flash memory. In several embodiments, an external non-volatile flash memory is used to store the operating system of a television. In at least one embodiment, a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for HEVC, or VVC.
The input to the elements of system 100 may be provided through various input devices as indicated in block 105. Such input devices include, but are not limited to, (i) an RF portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Composite input terminal, (iii) a USB input terminal, and/or (iv) an HDMI input terminal. In various embodiments, the input devices of block 105 have associated respective input processing elements as known in the art. For example, the RF portion may be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which may be referred to as a channel in certain embodiments, (iv) demodulating the down converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets. The RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers. The RF portion may include a tuner that performs various of these functions, including, for example, down converting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband. In one set-top box embodiment, the RF portion and its associated input processing element receives an RF signal transmitted over a wired (for example, cable) medium, and performs frequency selection by filtering, down converting, and filtering again to a desired frequency band. 
Various embodiments rearrange the order of the above-described (and other) elements, remove some of these elements, and/or add other elements performing similar or different functions. Adding elements may include inserting elements in between existing elements, for example, inserting amplifiers and an analog-to-digital converter. In various embodiments, the RF portion includes an antenna.
Additionally, the USB and/or HDMI terminals may include respective interface processors for connecting system 100 to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of input processing, for example, Reed-Solomon error correction, may be implemented, for example, within a separate input processing IC or within processor 110 as necessary. Similarly, aspects of USB or HDMI interface processing may be implemented within separate interface ICs or within processor 110 as necessary. The demodulated, error corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 110, and encoder/decoder 130 operating in combination with the memory and storage elements to process the data stream as necessary for presentation on an output device.
Various elements of system 100 may be provided within an integrated housing. Within the integrated housing, the various elements may be interconnected and transmit data therebetween using a suitable connection arrangement 115, for example, an internal bus as known in the art, including the I2C bus, wiring, and printed circuit boards. The system 100 includes communication interface 150 that enables communication with other devices via communication channel 190. The communication interface 150 may include, but is not limited to, a transceiver configured to transmit and to receive data over communication channel 190. The communication interface 150 may include, but is not limited to, a modem or network card and the communication channel 190 may be implemented, for example, within a wired and/or a wireless medium.
Data is streamed to the system 100, in various embodiments, using a Wi-Fi network such as IEEE 802.11. The Wi-Fi signal of these embodiments is received over the communications channel 190 and the communications interface 150 which are adapted for Wi-Fi communications. The communications channel 190 of these embodiments is typically connected to an access point or router that provides access to outside networks including the Internet for allowing streaming applications and other over-the-top communications. Other embodiments provide streamed data to the system 100 using a set-top box that delivers the data over the HDMI connection of the input block 105. Still other embodiments provide streamed data to the system 100 using the RF connection of the input block 105.
The system 100 may provide an output signal to various output devices, including a display 165, speakers 175, and other peripheral devices 185. The other peripheral devices 185 include, in various examples of embodiments, one or more of a stand-alone DVR, a disk player, a stereo system, a lighting system, and other devices that provide a function based on the output of the system 100. In various embodiments, control signals are communicated between the system 100 and the display 165, speakers 175, or other peripheral devices 185 using signaling such as AV.Link, CEC, or other communications protocols that enable device-to-device control with or without user intervention. The output devices may be communicatively coupled to system 100 via dedicated connections through respective interfaces 160, 170, and 180. Alternatively, the output devices may be connected to system 100 using the communications channel 190 via the communications interface 150. The display 165 and speakers 175 may be integrated in a single unit with the other components of system 100 in an electronic device, for example, a television. In various embodiments, the display interface 160 includes a display driver, for example, a timing controller (T-Con) chip.
The display 165 and speaker 175 may alternatively be separate from one or more of the other components, for example, if the RF portion of input 105 is part of a separate set-top box. In various embodiments in which the display 165 and speakers 175 are external components, the output signal may be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.

Figure 2 illustrates an example video encoder 200, such as a VVC (Versatile Video Coding) encoder. Figure 2 may also illustrate an encoder in which improvements are made to the VVC standard or an encoder employing technologies similar to VVC.
In the present application, the terms “reconstructed” and “decoded” may be used interchangeably, the terms “encoded” or “coded” may be used interchangeably, and the terms “image,” “picture” and “frame” may be used interchangeably. Usually, but not necessarily, the term “reconstructed” is used at the encoder side while “decoded” is used at the decoder side.
Before being encoded, the video sequence may go through pre-encoding processing (201), for example, applying a color transform to the input color picture (e.g., conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to get a signal distribution more resilient to compression (for instance using a histogram equalization of one of the color components). Metadata can be associated with the preprocessing and attached to the bitstream. According to another example, pre-encoding comprises a re-scaling of the input picture for Reference Picture Re-scaling as described hereafter.
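By way of illustration, such a pre-encoding color transform from RGB 4:4:4 to YCbCr 4:2:0 can be sketched as below, using full-range BT.601 weights and simple 2x2 averaging for chroma subsampling. This is only one possible choice: the exact conversion matrix and downsampling filter are application decisions and are not mandated by this description, and the function name is illustrative.

```python
def rgb_to_ycbcr_420(rgb):
    """Convert an RGB 4:4:4 picture (H x W list of (r, g, b) tuples,
    values 0..255) to a full-resolution Y plane plus Cb and Cr planes
    subsampled to (H/2) x (W/2), i.e. YCbCr 4:2:0.
    Uses full-range BT.601 weights; real systems may use other matrices."""
    h, w = len(rgb), len(rgb[0])
    y = [[0.0] * w for _ in range(h)]
    cb = [[0.0] * w for _ in range(h)]
    cr = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            r, g, b = rgb[i][j]
            y[i][j] = 0.299 * r + 0.587 * g + 0.114 * b
            cb[i][j] = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
            cr[i][j] = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # 4:2:0 chroma subsampling: average each 2x2 neighbourhood
    def sub(p):
        return [[(p[2*i][2*j] + p[2*i][2*j+1]
                  + p[2*i+1][2*j] + p[2*i+1][2*j+1]) / 4.0
                 for j in range(w // 2)] for i in range(h // 2)]
    return y, sub(cb), sub(cr)
```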
In the encoder 200, a picture is encoded by the encoder elements as described below. The picture to be encoded is partitioned (202) and processed in units of, for example, CUs. Each unit is encoded using, for example, either an intra or inter mode. When a unit is encoded in an intra mode, it performs intra prediction (260). In an inter mode, motion estimation (275) and compensation (270) are performed. The encoder decides (205) which one of the intra mode or inter mode to use for encoding the unit, and indicates the intra/inter decision by, for example, a prediction mode flag. Prediction residuals are calculated, for example, by subtracting (210) the predicted block from the original image block.
The prediction residuals are then transformed (225) and quantized (230). The quantized transform coefficients, as well as motion vectors and other syntax elements, are entropy coded (245) to output a bitstream. The encoder can skip the transform and apply quantization directly to the non-transformed residual signal. The encoder can bypass both transform and quantization, i.e., the residual is coded directly without the application of the transform or quantization processes.
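By way of illustration, the quantization and inverse quantization steps referred to above can be sketched as a uniform scalar quantizer. This is a deliberate simplification: actual codecs derive the step size from a quantization parameter (QP) and use rounding offsets tuned per prediction mode, and the function names below are illustrative, not taken from any standard.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization of transform coefficients:
    divide by the step and round half away from zero.
    Real codecs derive `step` from a QP and use tuned offsets."""
    return [int(c / step + (0.5 if c >= 0 else -0.5)) for c in coeffs]

def dequantize(levels, step):
    """Inverse quantization: reconstruct approximate coefficients
    from the transmitted integer levels."""
    return [lvl * step for lvl in levels]
```

Note that quantization is the lossy step: `dequantize(quantize(c, step), step)` only approximates the original coefficients, which is why the encoder reconstructs blocks through the same inverse path as the decoder.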
The encoder decodes an encoded block to provide a reference for further predictions. The quantized transform coefficients are de-quantized (240) and inverse transformed (250) to decode prediction residuals. Combining (255) the decoded prediction residuals and the predicted block, an image block is reconstructed. In-loop filters (265) are applied to the reconstructed picture to perform, for example, deblocking/SAO (Sample Adaptive Offset) filtering to reduce encoding artifacts. The filtered image is stored at a reference picture buffer (280).
Figure 3 illustrates a block diagram of an example video decoder 300. In the decoder 300, a bitstream is decoded by the decoder elements as described below. Video decoder 300 generally performs a decoding pass reciprocal to the encoding pass as described in Figure 2. The encoder 200 also generally performs video decoding as part of encoding video data.
In particular, the input of the decoder includes a video bitstream, which can be generated by video encoder 200. The bitstream is first entropy decoded (330) to obtain transform coefficients, motion vectors, and other coded information. The picture partition information indicates how the picture is partitioned. The decoder may therefore divide (335) the picture according to the decoded picture partitioning information. The transform coefficients are dequantized (340) and inverse transformed (350) to decode the prediction residuals. Combining (355) the decoded prediction residuals and the predicted block, an image block is reconstructed. The predicted block can be obtained (370) from intra prediction (360) or motion-compensated prediction (i.e., inter prediction) (375). In-loop filters (365) are applied to the reconstructed image. The filtered image is stored at a reference picture buffer (380).
The decoded picture can further go through post-decoding processing (385), for example, an inverse color transform (e.g., conversion from YCbCr 4:2:0 to RGB 4:4:4) or an inverse remapping performing the inverse of the remapping process performed in the pre-encoding processing (201). According to another example, post decoding comprises a re-scaling of the decoded picture performing the inverse of the re-scaling of the encoding. The post-decoding processing can use metadata derived in the pre-encoding processing and signaled in the bitstream.
A video coding system may comprise a plurality of different tools for encoding and decoding according to different coding modes. A coding mode is selected for a block of image or a larger area of an image or a video according to rate-distortion optimization. Examples of tools are Reference Picture Resampling (RPR), Geometric Partitioning Mode (GPM), Template Matching (TM) based reordering for Motion Vector Difference (MVD) sign prediction among others.
Figure 4 illustrates the principles of Reference Picture Resampling at the encoder side, while Figure 5 illustrates the principles of Reference Picture Resampling at the decoder side. Reference Picture Resampling (RPR) is a picture-based re-scaling feature. The principle is that, when possible, the encoding and decoding processes may operate on smaller images which may increase the overall compression rate.
Given an original video sequence composed of pictures of size (width x height), the encoder may choose, for each frame of the video sequence, the resolution (in other words the picture size) to use for coding the frame. Different picture parameter sets (PPS) are coded in the bitstream with the different possible sizes of the pictures and the slice header or picture header indicates which PPS to use to decode the current picture part included in the video coding layer (VCL) network abstraction layer (NAL) unit.
The down-sampler (440) and the up-sampler (540) functions are respectively used as pre-processing (such as the pre-encoding processing 201 in figure 2) or post-processing (such as the post-decoding processing 385 in figure 3). These functions are generally not specified by the video coding standard.
For each frame, the encoder selects whether to encode at original (full size) or down-sized resolution (ex: picture width/height divided by 2). The choice can be made with two-pass encoding or by considering spatial and temporal activity in the original pictures, for example. Consequently, the reference picture buffer (280 in figure 2 and 380 in figure 3), also known as Decoded Picture Buffer (DPB), may contain reference pictures of different sizes than the size of the current picture. In case one reference picture in the DPB has a size different from the current picture, a re-scaling function (430 in figure 4 for the encoder side and 530 in figure 5 for the decoder side) down-scales or up-scales the reference block to build the prediction block during the motion compensation process (270 in figure 2, 375 in figure 3). The re-scaling (430, 530) of the reference block is made implicitly during the motion compensation process (270, 375).
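By way of illustration, the implicit re-scaling during motion compensation amounts to mapping the motion compensated position from the current-picture sample grid onto the reference-picture sample grid. The sketch below shows this mapping with floating-point ratios; a real codec such as VVC performs the same mapping in fixed point (with scaling ratios stored in fractional units) and feeds the resulting fractional position to its interpolation filters. The function name is illustrative.

```python
def scaled_ref_position(x, y, mv_x, mv_y, cur_size, ref_size):
    """Map a luma position (x, y) in the current picture, displaced by
    the motion vector (mv_x, mv_y), onto the grid of a reference picture
    of a different size (Reference Picture Resampling).
    `cur_size` and `ref_size` are (width, height) pairs.  The result is
    a fractional position that interpolation filters would sample."""
    sx = ref_size[0] / cur_size[0]   # horizontal scaling ratio
    sy = ref_size[1] / cur_size[1]   # vertical scaling ratio
    return ((x + mv_x) * sx, (y + mv_y) * sy)
```

When `ref_size == cur_size`, the ratios are 1 and the expression reduces to ordinary motion compensation, which is why the re-scaling can be folded into the motion compensation process rather than performed as a separate pass.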
The RPR tool may be enabled explicitly or implicitly at different levels using different mechanisms. At sequence level: in the sequence parameter sets (SPS) that describe elements common to a series of pictures, a flag indicates that RPR may be applied for coding at least one picture. This is the case in VVC and ECM using a flag named ref_pic_resampling_enabled_flag. At picture level: if RPR is enabled at sequence level (as above) and the current picture uses at least one reference picture with a size different from the current picture size. At CU level: if RPR is enabled at picture level and the current CU uses at least one reference picture with a size different from the current picture size. In the following, the term “RPR enabled” can be understood at any of these levels.

According to a first aspect, a tool used in a video coding system is the Geometric Partitioning Mode (GPM). The geometric partitioning mode (GPM) allows predicting one CU with two non-rectangular partitions. The intra mode storage in GPM with inter and intra prediction is now detailed.
Figure 6 illustrates different examples of predictions used in GPM. Each partition may be inter (375) or intra (360). The samples of the inter partition are predicted with the regular inter prediction process, using motion compensated reference samples picked from one reference picture (uni-prediction). The samples of the intra partition are predicted with a regular intra prediction mode (IPM) and prediction process, where the available IPM candidates are the angular mode parallel to the GPM block boundary (Parallel mode), the angular mode perpendicular to the GPM block boundary (Perpendicular mode), and the Planar mode. Figure 6a shows the combination of inter prediction and intra prediction using the angular mode parallel to the GPM block boundary (Parallel mode), the IPM being represented by the arrow. Figure 6b shows the combination of inter prediction and intra prediction using the angular mode perpendicular to the GPM block boundary (Perpendicular mode). Figure 6c shows the combination of inter prediction and intra prediction using the Planar mode. Figure 6d shows the combination of two intra predictions using the Parallel mode and the Perpendicular mode, respectively.
The motion vectors used to reconstruct the inter prediction part(s) and the IPM used to reconstruct the intra partition(s) are stored in a buffer (called motion info, “MI”) associated with the current picture in the DPB. To reduce the storage amount, the information may be stored at a coarser resolution than the current picture (e.g., 4x4 resolution, figure 7). For example, if the ratio between the current picture size and the motion info buffer size is si=4, then the “MI” parameters of the block located at (x,y) are stored at mi(x/4, y/4) in the motion info buffer. In case of GPM, since the partitions may be non-rectangular, for the sub-blocks (or sub-partitions) shared by the two partitions, the stored information corresponds to the partition which occupies the larger area. For a sub-partition coded in inter, the motion information (MV and reference index) is stored and the IPM information is retrieved from the MI buffer of the reference picture at the location miref((x+mvx)/4, (y+mvy)/4). For a sub-partition coded in intra, the IPM information is stored, and the motion information is undefined.
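The coarse-resolution indexing described above can be sketched as follows (a minimal illustration; the function name is hypothetical and the si=4 ratio follows the example in the text):

```python
SI = 4  # ratio between the current picture resolution and the MI buffer resolution

def mi_index(x, y, si=SI):
    """Map a sample location (x, y) in the current picture to its cell in the
    motion info (MI) buffer stored at coarser resolution."""
    return (x // si, y // si)

# All samples of the same 4x4 sub-block map to the same MI cell, so the "MI"
# parameters of a block located at (x, y) are stored at mi(x/4, y/4).
```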
Figure 7 illustrates examples of two GPM predictions and the coarser storage of parameters (dashed sub-blocks) in the motion/IPM info. For example, in figure 7(a) the CU of size 32x32 is predicted with one inter partition and one intra partition. The corresponding inter and intra parameters are stored in the MI buffer at 4x4 resolution as shown in figure 7(b) with dashed sub-blocks.
Figure 8 illustrates the storage of current CU coding mode information at sub-block precision in the motion info buffer. This information may be later used for coding subsequent CUs in the current picture and CUs in subsequent pictures (in coding order), for example to predict the motion or the IPM to be used. First, a reconstructed CU is input (810). Then it is determined (820), for a given sub-block, which partition (inter or intra) occupies the largest area in the sub-block. Given a sub-block (x,y), if more samples are predicted with intra than with inter (820), the intra parameters are selected (830) to be stored in the MI buffer (850). If more samples are predicted with inter than with intra (840), then the intra parameters are retrieved from the reference MI buffer, which is the MI buffer associated with the reference samples used for inter prediction for this sub-block. The location of the intra parameters in the reference MI buffer is ((x+mvX)/si, (y+mvY)/si), where (mvX,mvY) is the motion vector of the inter prediction, si is the ratio between the current picture size and the motion info buffer size, and (x,y) is the location of the sub-block in the current frame. Thus, the intra parameters from the reference MI buffer are stored in the current MI buffer according to micur(x/si, y/si) = miref( (x+mvX)/si , (y+mvY)/si ) (840).
Then the coding mode and the intra parameters, either retrieved from the intra partition (830) or from the inter partition (840), are stored (850) in the current MI buffer at location micur(x/si, y/si).
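The storage rule of figure 8 can be sketched as follows (a hypothetical illustration; the sub-block fields and the dictionary-based buffer layout are assumptions, not the reference implementation):

```python
def store_mi(sub_block, mi_cur, mi_ref, si):
    """Store the coding mode info of one GPM sub-block in the current MI buffer.

    sub_block is a dict with the sub-block location (x, y), per-partition
    sample counts, intra parameters, and motion vector (all illustrative).
    """
    x, y = sub_block["x"], sub_block["y"]
    if sub_block["n_intra"] > sub_block["n_inter"]:
        # Intra partition occupies the larger area: keep its own intra parameters (830).
        params = sub_block["intra_params"]
    else:
        # Inter partition dominates: retrieve the intra parameters from the
        # reference MI buffer at the motion-compensated location (840).
        mvx, mvy = sub_block["mv"]
        params = mi_ref[((x + mvx) // si, (y + mvy) // si)]
    mi_cur[(x // si, y // si)] = params  # (850)
```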
In case of bi-prediction, some rules are applied to select between the ref0 and ref1 intra parameters. For example, ref0 may be used if it is coded in intra. If both ref0 and ref1 are coded in inter, one may choose the reference picture with the POC closest to the current POC.
According to another aspect, a Motion Vector Difference (MVD) can be coded in inter prediction. The motion vector values are predicted from a list of motion vector candidates Mv[] which is built from previously reconstructed CUs. These candidates are derived from spatially neighboring CUs, the co-located CU, and history-based motion vectors. An index “idx1” is coded for the current CU to indicate which candidate in the list to use. For bi-prediction, the MV candidates may be pairs of motion vectors. The motion vector value “Mv[idx1]” is corrected with mvd(mvdx,mvdy) and the motion vector is computed as:
Mv = Mv[idx1] + mvd
The mvd is made of 2 components (mvdx, mvdy). These values may be parsed from the bitstream in case the current CU is coded in AMVP mode, for example. In case of MMVD, a list of pre-defined mvd refinement positions along k×π/8 diagonal angles is built and the index of the mvd to use is coded. In case of bi-prediction and SMVD mode, the pair of mvds is symmetric: mvd1 = -mvd0, and only one mvd value is coded. In case of coding modes with MVD sign prediction, the absolute values of the mvd refinements (absMvdx, absMvdy) are parsed or derived from the bitstream for the current CU. In case of bi-prediction or affine, two or three pairs of absolute values of mvd may be decoded, respectively.
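The motion vector reconstruction and the SMVD symmetry above can be sketched as follows (illustrative helper names, not the reference software API):

```python
def reconstruct_mv(mv_candidates, idx1, mvd):
    """Mv = Mv[idx1] + mvd, applied component-wise."""
    mvx, mvy = mv_candidates[idx1]
    return (mvx + mvd[0], mvy + mvd[1])

def smvd_pair(mvd0):
    """SMVD mode: the pair of mvds is symmetric, mvd1 = -mvd0,
    so only one mvd value needs to be coded."""
    return (mvd0, (-mvd0[0], -mvd0[1]))
```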
Figure 9 illustrates samples of the template in a current picture and reference samples of the template in reference pictures in the more general case of bi-prediction.
According to a first variant embodiment, MVD coding uses Template Matching (TM) based reordering for MVD sign prediction.
Figure 10 illustrates Template Matching (TM) based reordering for MVD sign prediction. First, the absolute values of the mvd refinements (absMvdx, absMvdy) are parsed or derived from the bitstream for the current CU (1010). Then, to determine the sign of the mvd component values, a list of sign candidates corresponding to all the possible combinations of mvd component sign values is built (1020). The size of the list depends on the number of mvd values with non-zero components. For example:
•	Uni-prediction with 2 non-zero components: {(-1,-1), (-1,+1), (+1,-1), (+1,+1)}
•	Uni-prediction with 1 zero component: {(-1), (+1)}
•	Bi-prediction with 3 non-zero components and 1 zero component: {(+1,+1,+1), (+1,+1,-1), (+1,-1,+1), (+1,-1,-1), (-1,+1,+1), (-1,+1,-1), (-1,-1,+1), (-1,-1,-1)}
For 3 mvds with all non-zero absolute values, the list size is 64.
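The sizes of the sign-candidate lists above can be reproduced with a small enumeration (a sketch; only the non-zero components receive a sign):

```python
from itertools import product

def sign_candidates(abs_components):
    """Enumerate all sign combinations for the non-zero MVD components."""
    n_nonzero = sum(1 for c in abs_components if c != 0)
    return list(product((-1, +1), repeat=n_nonzero))
```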
The index “idx2” of the mvd sign candidate to use is parsed from the bitstream for the current CU (1060). The list of mvd sign values may be adaptively re-ordered using template matching (TM) cost (1040).
The template matching cost (1030) of a mvd sign candidate is measured by the sum of absolute differences (SAD) between samples of a template T of the current block and their corresponding reference samples (RT0 or RT1 as shown in figure 9) translated by Mv[idx1] + sign[idx2]*(mvd), where the signs are multiplied with the non-zero components. The template T comprises a set of reconstructed samples neighboring the current block. The reference samples of the template (RT0, RT1) are located by the motion information of the Mv candidate. When a Mv candidate utilizes bi-directional prediction (Mv is a pair of motion vectors), the reference samples of the template of the candidate are also generated by bi-prediction as shown in figure 9.
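The TM cost of one sign candidate can be sketched as follows (a simplified 1-D illustration: real templates are 2-D sample arrays and the reference template samples come from motion compensation; helper names are hypothetical):

```python
def apply_signs(abs_mvd, signs):
    """Distribute the candidate signs over the non-zero MVD components;
    zero components keep no sign."""
    it = iter(signs)
    return tuple(next(it) * c if c != 0 else 0 for c in abs_mvd)

def tm_cost(template, ref_template):
    """SAD between the current-template samples T and the reference-template
    samples RT located by Mv[idx1] + sign[idx2]*(mvd)."""
    return sum(abs(a - b) for a, b in zip(template, ref_template))
```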
According to a second variant embodiment, MVD coding uses enhanced Template Matching (TM) based reordering for candidates and MVD prediction.
Figure 14 illustrates Template Matching (TM) based reordering for enhanced motion vector and motion vector difference prediction. In ECM, an enhanced process (1400) may combine (1420) both the list of candidates (built in 1405) and the MMVD values (built in 1410). This single list then contains several MV candidates (with their reference index) and several mvd value combinations. Note that each candidate may use uni-directional or bi-directional prediction. Thus, the list size may be large. For a list containing uni-directional candidates only, the list size is the number of candidates multiplied by the number of mvd values. For bi-prediction candidates, the list size is up to the number of candidates multiplied by the number of mvd values squared. As for the variant of the sign prediction, a template matching cost (1430) of a pair of MV candidate and mvd candidate is measured by the sum of absolute differences (SAD) between samples of a template T of the current block and their corresponding reference samples (RT0 or RT1) translated by (Mv+mvd)[idx]. Then, the (Mv+mvd) candidates are ordered according to the TM cost. According to this variant, a single index (idx) allows signaling both the MV candidate and the MVD correction as depicted in figure 14. This index is parsed (1460) from the bitstream and allows selecting the pair of MV and mvd candidates for inter prediction.
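The list sizes discussed above can be checked with a small helper (a sketch; the exact list construction in ECM differs):

```python
def combined_list_size(n_candidates, n_mvd, bi_prediction):
    """Upper bound on the size of the combined MV+MVD candidate list:
    uni-directional candidates: n_candidates * n_mvd;
    bi-prediction candidates:   up to n_candidates * n_mvd**2."""
    return n_candidates * (n_mvd ** 2 if bi_prediction else n_mvd)
```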
In ECM, in case RPR is enabled, the reference template RT may have a size different from the current template T. Figure 11 illustrates the difference of size between the reference template RT and the current template T when RPR is enabled. Thus, the motion compensation to obtain the reference template may include an implicit re-scaling using the regular RPR process. However, this may induce significant complexity.
In the current ECM, GPM with inter and intra prediction, the Template Matching (TM) based reordering for MMVD and affine MMVD, MVD sign prediction, and enhanced TM based MV prediction do not support RPR.
This is addressed by the general aspects described herein, which are directed to modifications of GPM with inter and intra prediction and of Template Matching (TM) based reordering for inter prediction to support RPR. In a first embodiment, it is proposed to modify the storage of the intra parameters in the MI buffer for GPM with inter and intra predictions when RPR is enabled.
In case the current CU is coded with GPM, for the sub-blocks marked as inter, the intra parameters to store in the current MI buffer (micur) are derived from the reference MI buffer (miref). Denoting (x,y) the location of the sub-block in the current picture, the derivation of the location of the intra parameters to be copied from the reference MI buffer to the current MI buffer (840) is modified as follows: micur(x/si, y/si) = miref( (x+mvX)*rix/si , (y+mvY)*riy/si )
Where:
•	(x,y) is the location of the sub-block in the current picture;
•	(mvX, mvY) is the motion vector associated with the GPM partition coded in inter;
•	si is the sub-sampling ratio of the (current) MI buffer: a block of size (Sx,Sy) in the current picture corresponds to an area (Sx/si, Sy/si) in the current MI buffer;
•	(rix, riy) is the scaling ratio of the reference picture relative to the current picture: rix = (reference picture size X) / (current picture size X), riy = (reference picture size Y) / (current picture size Y).
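The modified derivation of the first embodiment can be sketched as follows (a hypothetical helper; the use of integer truncation is an assumption, an actual implementation may round differently):

```python
def mi_ref_location(x, y, mv, si, ref_size, cur_size):
    """Location in the reference MI buffer of the intra parameters to copy,
    with the RPR scaling ratio (rix, riy) applied to the motion-compensated
    position: miref((x+mvX)*rix/si, (y+mvY)*riy/si)."""
    rix = ref_size[0] / cur_size[0]
    riy = ref_size[1] / cur_size[1]
    return (int((x + mv[0]) * rix) // si, int((y + mv[1]) * riy) // si)
```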
In a second embodiment, it is proposed to adapt the TM based reordering for MVD sign prediction when RPR is enabled. If the current CU is coded in inter mode, it is proposed not to apply the TM based MVD sign prediction if at least one reference picture has a size different from the current picture while RPR is enabled.
Figure 12 illustrates a generic prediction method using TM based reordering for MVD sign when RPR is enabled according to a general aspect of at least one embodiment. First, the reference picture indexes are determined (1210). They can be parsed, or derived from candidates in case of merge, for example. The size of the reference pictures is retrieved (1250) and compared to the size of the current picture. If at least one reference picture used in the prediction has a size different from the size of the current picture, then TM based MVD sign prediction is not applied, and an alternative method is used to derive the mvd (1230). In a first variant, the signed MVD values may be parsed from the bitstream. In a second variant, the TM based method of figure 10 is modified to skip the reordering of the list of sign candidates, that is, step 1020 is followed by step 1050. In a third variant, the TM based method of figure 10 is modified to compute (1030) a cost depending on a processing other than template matching, then the list is reordered (1040) using the computed cost. According to different variants of this alternative cost computation, the cost can be set to a same default value (low or high depending on whether to favor those candidates or not), the cost can be a function of the ratio of the size of the current picture to the size of the reference picture (so as to foster the prediction with more resolution), or the cost can be a function of the POC difference between the current picture and the reference picture (so as to foster the prediction with the nearest POC).
Back to figure 12, if all the reference pictures used in the prediction have the same size as the current picture, then TM based MVD sign prediction can be applied. In other words, the TM based MVD sign prediction is only applied on condition that the additional test (1250) determines that every reference picture used in the prediction has the same size as the current picture. Advantageously, this arrangement allows implementing MVD sign prediction in inter prediction while RPR is enabled, at a limited complexity for computing the TM cost (a condition on the size of the reference pictures and a modified/skipped sign candidate re-ordering).
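The guard of the second embodiment can be sketched as follows (illustrative names; sizes are (width, height) tuples):

```python
def use_tm_sign_prediction(ref_sizes, cur_size, rpr_enabled):
    """Apply TM based MVD sign prediction only when every reference picture
    used by the prediction has the same size as the current picture (1250);
    otherwise fall back, e.g. parse the signed MVD from the bitstream (1230)."""
    if rpr_enabled and any(size != cur_size for size in ref_sizes):
        return False
    return True
```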
Figure 13 illustrates another generic prediction method using TM based reordering for MVD sign when RPR is enabled according to a general aspect of at least one embodiment. In this variant, the comparison between the size of the reference pictures and the size of the current picture is performed before computing the TM cost. According to a variant, if at least one reference picture has a size different from the current picture, then the TM based re-ordering is skipped. Accordingly, the method of figure 10 is modified to add a test (1310) on the size of the reference pictures. If at least one reference picture used in the prediction of the current CU has a size different from the current picture (yes), then step (1030) is not applied (TM based costs are not computed) and the re-ordering (1040) is also not performed. Instead, the MVD is selected in the list of sign candidates using the MVD index parsed from the bitstream. According to another variant, the TM based costs are set to a default value and the re-ordering (1040) is performed based on the default value. Advantageously, this arrangement allows implementing MVD sign prediction in inter prediction while RPR is enabled, at a limited complexity for reordering the list with the TM cost.
In a third embodiment, it is proposed to adapt the enhanced TM based reordering for candidates and MVD prediction with RPR enabled. Figure 15 illustrates a generic prediction method using template matching based reordering for motion vector and motion vector difference adapted to RPR according to a general aspect of at least one embodiment. Accordingly, the method of figure 14 is modified to add a test (1525) on the size of the reference pictures and an alternative TM setting (1535). In case the list contains candidates that may have different reference picture indexes, if at least one reference picture has a size different from the size of the current picture, then the step of TM cost computing (1430) is not applied (TM based costs are not computed) and the TM based costs are set to a default value (1535). The default value may be zero (to favor this candidate, since the coding cost of the index will be small because it is at the top of the re-ordered list) or a maximal value (to disadvantage this candidate, since the coding cost of the index will be high because it is at the bottom of the re-ordered list). As described above, different variants of the cost computation comprise setting the cost to a same default value, determining the cost as a function of the ratio between the size of the current picture and the size of the reference picture, or determining the cost as a function of the POC difference between the current picture and the reference picture. Conversely, if all the reference pictures used by the prediction candidate have the same size as the current picture, step 1430 is performed and the list is re-ordered with the TM based costs of each candidate.
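The default-cost re-ordering of the third embodiment can be sketched as follows (a hypothetical structure; a default cost of 0 favors the size-mismatched candidates, a large value disadvantages them):

```python
def order_by_tm_cost(candidates, cur_size, tm_cost_fn, default_cost):
    """Re-order the combined MV+MVD list (1440): compute the TM cost (1430)
    only for candidates whose reference pictures all match the current picture
    size; assign default_cost to the others (1535)."""
    def cost(cand):
        if all(size == cur_size for size in cand["ref_sizes"]):
            return tm_cost_fn(cand)
        return default_cost
    return sorted(candidates, key=cost)
```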
Figure 16 illustrates a flowchart of an example of decoding using prediction adapted to reference picture re-sampling according to any of the previous embodiments. This method is for example implemented in a decoder 300 of figure 3 or in a decoder 130 of a device 100 of figure 1. In step 1610, a block is predicted according to at least one of the embodiments described above, namely prediction with GPM, prediction using MVD sign TM based reordering or prediction using MV+MVD TM based reordering in any of their variants. In step 1620, picture data of the block of the picture of the video is decoded based on the predicted block.
Figure 17 illustrates a flowchart of an example of encoding using prediction adapted to reference picture re-sampling according to any of the previous embodiments. This method is for example implemented in an encoder 200 of figure 2 or in an encoder 130 of a device 100 of figure 1. In step 1710, a block is predicted according to at least one of the embodiments described above. In step 1720, picture data of the block of the picture of the video is encoded based on the predicted block.
Additional Embodiments and Information
Various methods are described herein, and each of the methods comprises one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is required for proper operation of the method, the order and/or use of specific steps and/or actions may be modified or combined. Additionally, terms such as “first”, “second”, etc. may be used in various embodiments to modify an element, component, step, operation, etc., for example, a “first decoding” and a “second decoding”. Use of such terms does not imply an ordering to the modified operations unless specifically required. So, in this example, the first decoding need not be performed before the second decoding, and may occur, for example, before, during, or in an overlapping time period with the second decoding. Various methods and other aspects described in this application can be used to modify modules, for example, the selection of the motion vectors in motion compensation or motion estimation modules (270, 275, 375), of a video encoder 200 and decoder 300 as shown in figure 2 and figure 3. Moreover, the present aspects are not limited to VVC or HEVC, and can be applied, for example, to other standards and recommendations, and extensions of any such standards and recommendations. Unless indicated otherwise, or technically precluded, the aspects described in this application can be used individually or in combination.
Various numeric values are used in the present application. The specific values are for example purposes and the aspects described are not limited to these specific values.
Various implementations involve decoding. “Decoding,” as used in this application, may encompass all or part of the processes performed, for example, on a received encoded sequence in order to produce a final output suitable for display. In various embodiments, such processes include one or more of the processes typically performed by a decoder, for example, entropy decoding, inverse quantization, inverse transformation, and differential decoding. Whether the phrase “decoding process” is intended to refer specifically to a subset of operations or generally to the broader decoding process will be clear based on the context of the specific descriptions and is believed to be well understood by those skilled in the art.
Various implementations involve encoding. In an analogous way to the above discussion about “decoding”, “encoding” as used in this application may encompass all or part of the processes performed, for example, on an input video sequence in order to produce an encoded bitstream.
Note that the syntax elements as used herein are descriptive terms. As such, they do not preclude the use of other syntax element names.
The implementations and aspects described herein may be implemented as various pieces of information, such as for example syntax, that can be transmitted or stored, for example. This information can be packaged or arranged in a variety of manners, including for example manners common in video standards such as putting the information into an SPS, a PPS, a NAL unit, a header (for example, a NAL unit header, or a slice header), or an SEI message. Other manners are also available, including for example manners common for system level or application level standards such as putting the information into one or more of the following:
• SDP (session description protocol), a format for describing multimedia communication sessions for the purposes of session announcement and session invitation, for example as described in RFCs and used in conjunction with RTP (Real-time Transport Protocol) transmission;
• DASH MPD (Media Presentation Description) Descriptors, for example as used in DASH and transmitted over HTTP, a Descriptor is associated to a Representation or collection of Representations to provide additional characteristic to the content Representation;
• RTP header extensions, for example as used during RTP streaming;
• ISO Base Media File Format, for example as used in OMAF and using boxes which are object-oriented building blocks defined by a unique type identifier and length also known as 'atoms' in some specifications;
• HLS (HTTP live Streaming) manifest transmitted over HTTP. A manifest can be associated, for example, to a version or collection of versions of a content to provide characteristics of the version or collection of versions.
The implementations and aspects described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
Additionally, this application may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application may refer to “accessing” various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information. Additionally, this application may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.
Also, as used herein, the word “signal” refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals a quantization matrix for de-quantization. In this way, in an embodiment the same parameter is used at both the encoder side and the decoder side. Thus, for example, an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.
As will be evident to one of ordinary skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
We describe a number of embodiments. Features of these embodiments can be provided alone or in any combination, across various claim categories and types. Further, embodiments can include one or more of the following features, devices, or aspects, alone or in any combination, across various claim categories and types.

Claims

1. A method for predicting a block of a picture of a video using geometrical partition mode combining at least two partitions comprising storing (800) intra parameters in a motion information buffer of the picture wherein responsive that a subblock of a partition among the at least two partitions is inter, obtaining (840) intra parameter from intra parameter stored in a motion information buffer of a reference picture at a location corresponding to motion compensated location of the subblock of the partition scaled using a ratio of the reference picture relatively to the picture; wherein the reference picture has a size different from the size of the picture.
2. A method for predicting a block of a picture of a video comprising reconstructing a motion vector by adding a motion vector prediction to a motion vector difference wherein at least the motion vector difference is obtained using template-based matching to order a list of candidates and wherein the template-based matching is modified for a reference picture having a size different from a size of the picture.
3. The method of claim 2, further comprising: obtaining (1210) at least one reference picture used in prediction; and in response (1250) that at least one reference picture has a size different from the size of the picture, deriving a sign of the motion vector difference with other method than template based matching motion vector difference sign prediction (1230).
4. The method of any of claims 2 or 3, further comprising in response (1250) that any of at least one reference picture used in prediction has a same size as the size of the picture, deriving a sign of the motion vector difference using template based matching motion vector difference sign prediction (1220).
5. The method of claim 2 further comprising: obtaining (1010) absolute value of the motion vector difference; determining (1020) a list of sign candidates for the motion vector difference; and in response (1310) that at least one reference picture has a size different from the size of the picture, deriving a sign for the motion vector difference from the list of sign candidates for the motion vector difference and a coded index.
6. The method of claim 2 further comprising: obtaining (1010) absolute value of the motion vector difference; determining (1020) a list of sign candidates for the motion vector difference; and in response (1310) that at least one reference picture has a size different from the size of the picture, setting a default value to a template based matching cost for a sign candidate, ordering the list of sign candidates for the motion vector difference according to the template based matching cost, and deriving a sign for the motion vector difference from the ordered list and a coded index.
7. The method of any of claims 2, 5 or 6 further comprising: in response (1310) that any of at least one reference picture used in prediction has a same size as the size of the picture, computing a template based matching cost for a sign candidate of the list of sign candidates for the motion vector difference, ordering the list of sign candidates according to the template based matching cost and deriving a sign for the motion vector difference from the ordered list and a coded index.
8. The method of claim 2, further comprising: determining (1420) a list of candidates combining motion vector and motion vector difference; and in response (1525) to at least one reference picture having a size different from the size of the picture, setting a default value to a template based matching cost for a candidate combining motion vector and motion vector difference, ordering the list of candidates combining motion vector prediction and motion vector difference according to the template based matching cost, and deriving the motion vector from the ordered list and a coded index.
9. The method of any of claims 2 or 8, further comprising: determining (1420) a list of combined candidates for the motion vector and the motion vector difference; and in response (1525) to any of the at least one reference picture used in prediction having the same size as the size of the picture, computing a template based matching cost for a candidate of a list of candidates combining motion vector prediction and motion vector difference for the motion vector, ordering the list of candidates combining motion vector prediction and motion vector difference according to the template based matching cost, and deriving the motion vector from the ordered list and a coded index.
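Claims 8 and 9 extend the same ordering idea from sign candidates to full candidates that combine a motion vector predictor with a signed motion vector difference. A minimal sketch of that reading follows; it assumes one sign per MVD component and hypothetical names throughout, and is not the claimed implementation.

```python
# Assumed placeholder cost used when template matching is skipped
# because a reference picture size differs from the picture size.
DEFAULT_COST = 1 << 30

def derive_motion_vector(mv_predictors, mvd_magnitude, coded_index,
                         refs_same_size, template_cost):
    """Derive a motion vector from combined (predictor, signed MVD) candidates.

    mv_predictors: list of (x, y) motion vector predictors.
    mvd_magnitude: (|mvd_x|, |mvd_y|) absolute MVD components.
    coded_index: index decoded from the bitstream into the ordered list.
    refs_same_size: True if all reference pictures match the picture size.
    template_cost: callable returning a matching cost for a candidate MV.
    """
    # Build the combined candidate list (claim 8 / claim 9, step 1420):
    # each predictor paired with each sign combination of the MVD.
    candidates = []
    for px, py in mv_predictors:
        for sx in (+1, -1):
            for sy in (+1, -1):
                candidates.append((px + sx * mvd_magnitude[0],
                                   py + sy * mvd_magnitude[1]))
    # Claim 8 branch: default cost when template matching is unavailable;
    # claim 9 branch: computed template based matching cost.
    costs = [template_cost(mv) if refs_same_size else DEFAULT_COST
             for mv in candidates]
    # Stable sort: equal default costs preserve the construction order.
    ordered = [mv for _, mv in sorted(zip(costs, candidates),
                                      key=lambda pair: pair[0])]
    return ordered[coded_index]
```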
10. A method for decoding data representative of a block of a picture of a video comprising:
- predicting (1610) a block according to the method of any of claims 1 to 9; and
- decoding (1620) picture data of the block of the picture of the video based on the predicted block.
11. A method for encoding data representative of a block of a picture of a video comprising:
- predicting (1710) a block according to the method of any of claims 1 to 9; and
- encoding (1720) data representative of the block of the picture of the video based on the predicted block.
12. An apparatus (100) comprising a decoder (130) for decoding picture data, the decoder being configured to:
- predict a block according to the method of any of claims 1 to 9; and
- decode picture data of the block of the picture of the video based on the predicted block.
13. An apparatus (100) comprising an encoder (130) for encoding picture data, the encoder being configured to:
- predict a block according to the method of any of claims 1 to 9; and
- encode data representative of the block of the picture of the video based on the predicted block.
14. A computer program comprising program code instructions for implementing the steps of a method according to any of claims 1 to 9 when executed by a processor.
15. A non-transitory program storage device, readable by a computer, tangibly embodying a program of instructions executable by the computer for performing the method according to any one of claims 1 to 9.
PCT/EP2023/058736 2022-04-07 2023-04-04 Video encoding and decoding using reference picture resampling WO2023194334A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP22305485.9 2022-04-07
EP22305485 2022-04-07

Publications (1)

Publication Number Publication Date
WO2023194334A1

Family

ID=81388840

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/058736 WO2023194334A1 (en) 2022-04-07 2023-04-04 Video encoding and decoding using reference picture resampling

Country Status (1)

Country Link
WO (1) WO2023194334A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210195172A1 (en) * 2019-12-20 2021-06-24 Qualcomm Incorporated Reference picture scaling ratios for reference picture resampling in video coding


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BORDES (INTERDIGITAL) P ET AL: "EE2-related: bug fixes for enabling RPR in ECM", no. JVET-X0121 ; m57921, 8 October 2021 (2021-10-08), XP030298006, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/24_Teleconference/wg11/JVET-X0121-v2.zip JVET-X0121-v2.docx> [retrieved on 20211008] *
BROWNE A ET AL: "Algorithm description for Versatile Video Coding and Test Model 16 (VTM 16)", no. JVET-Y2002 ; m59197, 30 March 2022 (2022-03-30), XP030302159, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/25_Teleconference/wg11/JVET-Y2002-v1.zip JVET-Y2002-v1.docx> [retrieved on 20220330] *
KIDANI (KDDI) Y ET AL: "EE2-related: Combination of JVET-X0078 (Test 7/8), JVET-X0147 (Proposal-2), and GPM direct motion storage", no. JVET-X0166 ; m58206, 11 October 2021 (2021-10-11), XP030298118, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/24_Teleconference/wg11/JVET-X0166-v3.zip JVET-X0166-v3_clean.docx> [retrieved on 20211011] *
SEREGIN (QUALCOMM) V ET AL: "EE2: Summary Report on Enhanced Compression beyond VVC capability", no. JVET-Y0024, 12 January 2022 (2022-01-12), XP030300227, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/25_Teleconference/wg11/JVET-Y0024-v1.zip JVET-Y0024-v1.docx> [retrieved on 20220112] *
ZHANG (QUALCOMM) Z ET AL: "Non-EE2: Fixing issues for RPR enabling and non-CTC configuration in ECM", no. JVET-Y0128, 14 January 2022 (2022-01-14), XP030300461, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/25_Teleconference/wg11/JVET-Y0128-v4.zip JVET-Y0128-v4.docx> [retrieved on 20220114] *

Similar Documents

Publication Publication Date Title
US20210051342A1 (en) Method and apparatus for video encoding and decoding based on a linear model responsive to neighboring samples
US20220345744A1 (en) Secondary transform for video encoding and decoding
CN112970264A (en) Simplification of coding modes based on neighboring sample-dependent parametric models
US20230095387A1 (en) Neural network-based intra prediction for video encoding or decoding
US20220014778A1 (en) Unified process and syntax for generalized prediction in video coding/decoding
WO2022063729A1 (en) Template matching prediction for versatile video coding
WO2022167322A1 (en) Spatial local illumination compensation
WO2021130025A1 (en) Estimating weighted-prediction parameters
US20210360273A1 (en) Method and apparatus for video encoding and decoding using list of predictor candidates
US20230018401A1 (en) Motion vector prediction in video encoding and decoding
WO2023194334A1 (en) Video encoding and decoding using reference picture resampling
CN112703733A (en) Translation and affine candidates in a unified list
US20230024223A1 (en) Intra sub partitions for video encoding and decoding combined with multiple transform selection, matrix weighted intra prediction or multi-reference-line intra prediction
US20230262268A1 (en) Chroma format dependent quantization matrices for video encoding and decoding
WO2023072554A1 (en) Video encoding and decoding using reference picture resampling
KR20240099324A (en) Video encoding and decoding using reference picture resampling
WO2023046463A1 (en) Methods and apparatuses for encoding/decoding a video
WO2023046917A1 (en) Methods and apparatus for dmvr with bi-prediction weighting
WO2024099962A1 (en) ENCODING AND DECODING METHODS OF INTRA PREDICTION MODES USING DYNAMIC LISTS OF MOST PROBABLE MODEs AND CORRESPONDING APPARATUSES
WO2023052141A1 (en) Methods and apparatuses for encoding/decoding a video
WO2022101018A1 (en) A method and an apparatus for encoding or decoding a video
WO2024126045A1 (en) Methods and apparatuses for encoding and decoding an image or a video
WO2024083500A1 (en) Methods and apparatuses for padding reference samples
WO2022268623A1 (en) Template-based intra mode derivation
WO2020260310A1 (en) Quantization matrices selection for separate color plane mode

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23716546

Country of ref document: EP

Kind code of ref document: A1