WO2023194334A1 - Video encoding and decoding using reference picture resampling - Google Patents


Info

Publication number
WO2023194334A1
WO2023194334A1 PCT/EP2023/058736
Authority
WO
WIPO (PCT)
Prior art keywords
picture
motion vector
prediction
size
vector difference
Prior art date
Application number
PCT/EP2023/058736
Other languages
French (fr)
Inventor
Philippe Bordes
Tangi POIRIER
Franck Galpin
Antoine Robert
Original Assignee
Interdigital Ce Patent Holdings, Sas
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interdigital Ce Patent Holdings, Sas filed Critical Interdigital Ce Patent Holdings, Sas
Publication of WO2023194334A1 publication Critical patent/WO2023194334A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/59Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/43Hardware specially adapted for motion estimation or compensation
    • H04N19/433Hardware specially adapted for motion estimation or compensation characterised by techniques for memory access
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Definitions

  • At least one of the present embodiments generally relates to a method or an apparatus for video encoding or decoding, and more particularly, to a method or an apparatus performing Geometric Partitioning or Template Matching (TM) based reordering for Motion Vector Difference (MVD) sign prediction or for enhanced prediction of both Motion Vector and Motion Vector Difference.
  • image and video coding schemes usually employ prediction, including motion vector prediction, and transform to leverage spatial and temporal redundancy in the video content.
  • intra or inter prediction is used to exploit the intra or inter frame correlation, then the differences between the original image and the predicted image, often denoted as prediction errors or prediction residuals, are transformed, quantized, and entropy coded.
  • the compressed data are decoded by inverse processes corresponding to the entropy coding, quantization, transform, and prediction.
  • a method comprises predicting a block of a picture of a video using geometric partition mode combining at least two partitions and storing intra parameters in a motion information buffer of the picture, wherein, responsive to a sub-block of one of the at least two partitions being inter, the method further comprises obtaining an intra parameter from the intra parameters stored in a motion information buffer of a reference picture at a location corresponding to the motion-compensated location of the sub-block, the location being scaled using the size ratio of the reference picture relative to the picture when the reference picture has a size different from the size of the picture.
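The scaled look-up described in the first method can be sketched as follows. This is an illustrative Python sketch, not the normative process; the names (`MI_GRID`, `fetch_ref_ipm`) and the integer scaling are assumptions:

```python
# Hypothetical sketch: fetch the intra prediction mode (IPM) of an
# inter-coded GPM sub-block from the reference picture's motion-info
# buffer, scaling the motion-compensated location by the picture-size
# ratio when the reference picture has a different size (RPR).

MI_GRID = 4  # assumed: one motion-info entry per 4x4 luma sub-block

def fetch_ref_ipm(ref_mi, x, y, mvx, mvy, cur_size, ref_size):
    """ref_mi: 2D grid of IPM values for the reference picture.
    (x, y): sub-block position in the current picture, in luma samples.
    (mvx, mvy): motion vector in luma-sample units.
    cur_size, ref_size: (width, height) of current / reference picture."""
    px, py = x + mvx, y + mvy          # motion-compensated location
    if ref_size != cur_size:           # RPR: map into the reference grid
        px = px * ref_size[0] // cur_size[0]
        py = py * ref_size[1] // cur_size[1]
    return ref_mi[py // MI_GRID][px // MI_GRID]
```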
  • a second method comprises predicting a block of a picture of a video by motion compensation, the method further comprising reconstructing a motion vector by adding a motion vector prediction to a motion vector difference, wherein at least the motion vector difference is obtained using template-based matching to order a list of candidates, and wherein the template-based matching is modified for a reference picture having a size different from a size of the picture.
  • the template-based matching is used to reorder a list of candidates for Motion Vector Difference (MVD) sign prediction or for enhanced prediction of both Motion Vector and Motion Vector Difference.
  • a third method comprises decoding data representative of a block of a picture of a video wherein the decoding further comprises predicting a block according to the predicting method adapted to Reference Picture Re-scaling in any of its variants and decoding picture data of the block of the picture of the video based on the predicted block.
  • a fourth method comprises encoding data representative of a block of a picture of a video wherein the encoding further comprises predicting a block according to the predicting method adapted to Reference Picture Re-scaling in any of its variants and encoding picture data of the block of the picture of the video based on the predicted block.
  • an apparatus comprising one or more processors, wherein the one or more processors are configured to implement the method for video decoding according to any of its variants.
  • another apparatus comprising one or more processors, wherein the one or more processors are configured to implement the method for video encoding according to any of its variants.
  • a device comprising an apparatus according to any of the decoding embodiments; and at least one of (i) an antenna configured to receive a signal, the signal including the video block, (ii) a band limiter configured to limit the received signal to a band of frequencies that includes the video block, or (iii) a display configured to display an output representative of the video block.
  • a non-transitory computer readable medium containing data content generated according to any of the described encoding embodiments or variants.
  • a signal comprising video data generated according to any of the described encoding embodiments or variants.
  • a bitstream is formatted to include data content generated according to any of the described encoding embodiments or variants.
  • a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out any of the described encoding/decoding embodiments or variants.
  • Figure 1 illustrates a block diagram of an example apparatus in which various aspects of the embodiments may be implemented.
  • Figure 2 illustrates a block diagram of an embodiment of video encoder in which various aspects of the embodiments may be implemented.
  • Figure 3 illustrates a block diagram of an embodiment of video decoder in which various aspects of the embodiments may be implemented.
  • Figure 4 illustrates the principles of Reference Picture Resampling at the encoder side.
  • Figure 5 illustrates the principles of Reference Picture Resampling at the decoder side.
  • Figure 6 illustrates the different examples of predictions in a geometric partitioning mode.
  • Figure 7 illustrates an example of GPM prediction and coarser storage parameters in motion info.
  • Figure 8 illustrates the storage of current CU coding intra mode information at sub-block precision in the motion info buffer.
  • Figure 9 illustrates samples of the template in a current picture and reference samples of the template in reference pictures.
  • Figure 10 illustrates Template Matching (TM) based reordering for MVD sign prediction.
  • Figure 11 illustrates the difference of size between the reference template and the current template when RPR is enabled.
  • Figure 12 illustrates a generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
  • Figure 13 illustrates another generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
  • Figure 14 illustrates Template Matching (TM) based reordering for enhanced motion vector and motion vector difference prediction.
  • Figure 15 illustrates another generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
  • Figure 16 illustrates a flowchart of an example of decoding using prediction adapted to reference picture re-sampling according to at least one embodiment.
  • Figure 17 illustrates a flowchart of an example of encoding using prediction adapted to reference picture re-sampling according to at least one embodiment.
  • Various embodiments relate to a video coding system in which, in at least one embodiment, it is proposed to adapt video coding tools to the use of Reference Picture Re-scaling (RPR) where a reference picture has a different size than the current picture to be coded or decoded.
  • RPR Reference Picture Re-scaling
  • Different embodiments are proposed hereafter, introducing some tools modifications to increase coding efficiency and improve the codec consistency when RPR is enabled.
  • an encoding method, a decoding method, an encoding apparatus, and a decoding apparatus based on this principle are proposed.
  • VVC Versatile Video Coding
  • HEVC High Efficiency Video Coding
  • ECM Enhanced Compression Model
  • FIG. 1 illustrates a block diagram of an example of a system in which various aspects and embodiments can be implemented.
  • System 100 may be embodied as a device including the various components described below and is configured to perform one or more of the aspects described in this application. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers.
  • Elements of system 100, singly or in combination, may be embodied in a single integrated circuit, multiple ICs, and/or discrete components.
  • the processing and encoder/decoder elements of system 100 are distributed across multiple ICs and/or discrete components.
  • system 100 is communicatively coupled to other systems, or to other electronic devices, via, for example, a communications bus or through dedicated input and/or output ports.
  • system 100 is configured to implement one or more of the aspects described in this application.
  • the system 100 includes at least one processor 110 configured to execute instructions loaded therein for implementing, for example, the various aspects described in this application.
  • Processor 110 may include embedded memory, input output interface, and various other circuitries as known in the art.
  • the system 100 includes at least one memory 120 (e.g. a volatile memory device, and/or a non-volatile memory device).
  • System 100 includes a storage device 140, which may include non-volatile memory and/or volatile memory, including, but not limited to, EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, magnetic disk drive, and/or optical disk drive.
  • the storage device 140 may include an internal storage device, an attached storage device, and/or a network accessible storage device, as non-limiting examples.
  • System 100 includes an encoder/decoder module 130 configured, for example, to process data to provide an encoded video or decoded video, and the encoder/decoder module 130 may include its own processor and memory.
  • the encoder/decoder module 130 represents module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, a device may include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 130 may be implemented as a separate element of system 100 or may be incorporated within processor 110 as a combination of hardware and software as known to those skilled in the art.
  • Program code to be loaded onto processor 110 or encoder/decoder 130 to perform the various aspects described in this application may be stored in storage device 140 and subsequently loaded onto memory 120 for execution by processor 110.
  • one or more of processor 110, memory 120, storage device 140, and encoder/decoder module 130 may store one or more of various items during the performance of the processes described in this application. Such stored items may include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
  • memory inside of the processor 110 and/or the encoder/decoder module 130 is used to store instructions and to provide working memory for processing that is needed during encoding or decoding.
  • a memory external to the processing device (for example, the processing device may be either the processor 110 or the encoder/decoder module 130) is used for one or more of these functions.
  • the external memory may be the memory 120 and/or the storage device 140, for example, a dynamic volatile memory and/or a non-volatile flash memory.
  • an external non-volatile flash memory is used to store the operating system of a television.
  • a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for HEVC or VVC.
  • the input to the elements of system 100 may be provided through various input devices as indicated in block 105.
  • Such input devices include, but are not limited to, (i) an RF portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Composite input terminal, (iii) a USB input terminal, and/or (iv) an HDMI input terminal.
  • the input devices of block 105 have associated respective input processing elements as known in the art.
  • the RF portion may be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which may be referred to as a channel in certain embodiments, (iv) demodulating the down converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets.
  • the RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers.
  • the RF portion may include a tuner that performs various of these functions, including, for example, down converting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband.
  • the RF portion and its associated input processing element receives an RF signal transmitted over a wired (for example, cable) medium, and performs frequency selection by filtering, down converting, and filtering again to a desired frequency band.
  • Adding elements may include inserting elements in between existing elements, for example, inserting amplifiers and an analog-to-digital converter.
  • the RF portion includes an antenna.
  • USB and/or HDMI terminals may include respective interface processors for connecting system 100 to other electronic devices across USB and/or HDMI connections.
  • various aspects of input processing, for example, Reed-Solomon error correction, may be implemented, for example, within a separate input processing IC or within processor 110 as necessary.
  • aspects of USB or HDMI interface processing may be implemented within separate interface ICs or within processor 110 as necessary.
  • the demodulated, error corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 110, and encoder/decoder 130 operating in combination with the memory and storage elements to process the data stream as necessary for presentation on an output device.
  • connection arrangement 115, for example, an internal bus as known in the art, including the I2C bus, wiring, and printed circuit boards.
  • the system 100 includes communication interface 150 that enables communication with other devices via communication channel 190.
  • the communication interface 150 may include, but is not limited to, a transceiver configured to transmit and to receive data over communication channel 190.
  • the communication interface 150 may include, but is not limited to, a modem or network card and the communication channel 190 may be implemented, for example, within a wired and/or a wireless medium.
  • Data is streamed to the system 100, in various embodiments, using a Wi-Fi network such as IEEE 802.11.
  • the Wi-Fi signal of these embodiments is received over the communications channel 190 and the communications interface 150 which are adapted for Wi-Fi communications.
  • the communications channel 190 of these embodiments is typically connected to an access point or router that provides access to outside networks including the Internet for allowing streaming applications and other over-the-top communications.
  • Other embodiments provide streamed data to the system 100 using a set-top box that delivers the data over the HDMI connection of the input block 105.
  • Still other embodiments provide streamed data to the system 100 using the RF connection of the input block 105.
  • the system 100 may provide an output signal to various output devices, including a display 165, speakers 175, and other peripheral devices 185.
  • the other peripheral devices 185 include, in various examples of embodiments, one or more of a stand-alone DVR, a disk player, a stereo system, a lighting system, and other devices that provide a function based on the output of the system 100.
  • control signals are communicated between the system 100 and the display 165, speakers 175, or other peripheral devices 185 using signaling such as AV.Link, CEC, or other communications protocols that enable device-to-device control with or without user intervention.
  • the output devices may be communicatively coupled to system 100 via dedicated connections through respective interfaces 160, 170, and 180.
  • the output devices may be connected to system 100 using the communications channel 190 via the communications interface 150.
  • the display 165 and speakers 175 may be integrated in a single unit with the other components of system 100 in an electronic device, for example, a television.
  • the display interface 160 includes a display driver, for example, a timing controller (T Con) chip.
  • the display 165 and speaker 175 may alternatively be separate from one or more of the other components, for example, if the RF portion of input 105 is part of a separate set-top box.
  • the output signal may be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.
  • Figure 2 illustrates an example video encoder 200, such as a VVC (Versatile Video Coding) encoder.
  • Figure 2 may also illustrate an encoder in which improvements are made to the VVC standard or an encoder employing technologies similar to VVC.
  • the terms “reconstructed” and “decoded” may be used interchangeably, the terms “encoded” or “coded” may be used interchangeably, and the terms “image,” “picture” and “frame” may be used interchangeably.
  • the term “reconstructed” is used at the encoder side while “decoded” is used at the decoder side.
  • the video sequence may go through pre-encoding processing (201), for example, applying a color transform to the input color picture (e.g., conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to get a signal distribution more resilient to compression (for instance using a histogram equalization of one of the color components).
  • Metadata can be associated with the preprocessing and attached to the bitstream.
  • pre-encoding comprises a re-scaling of the input picture for Reference Picture Re-scaling as described hereafter.
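As an illustration of the first pre-processing example above (RGB 4:4:4 to YCbCr 4:2:0), here is a minimal sketch using BT.709 luma coefficients and naive 2x2 chroma averaging. Real systems use standard-specific matrices and resampling filters; the function name is hypothetical:

```python
# Illustrative sketch only: full-range BT.709 RGB -> YCbCr, then naive
# 2x2 averaging of the chroma planes (4:4:4 -> 4:2:0 subsampling).
import numpy as np

def rgb_to_ycbcr420(rgb):
    """rgb: (H, W, 3) float array in [0, 1]; H and W assumed even."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b        # luma (BT.709 weights)
    cb = (b - y) / 1.8556                           # blue-difference chroma
    cr = (r - y) / 1.5748                           # red-difference chroma

    def down2(p):  # average each 2x2 block -> half-resolution plane
        return 0.25 * (p[0::2, 0::2] + p[1::2, 0::2]
                       + p[0::2, 1::2] + p[1::2, 1::2])

    return y, down2(cb), down2(cr)
```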
  • a picture is encoded by the encoder elements as described below.
  • the picture to be encoded is partitioned (202) and processed in units of, for example, CUs.
  • Each unit is encoded using, for example, either an intra or inter mode.
  • intra prediction 260
  • inter mode motion estimation (275) and compensation (270) are performed.
  • the encoder decides (205) which one of the intra mode or inter mode to use for encoding the unit, and indicates the intra/inter decision by, for example, a prediction mode flag.
  • Prediction residuals are calculated, for example, by subtracting (210) the predicted block from the original image block.
  • the prediction residuals are then transformed (225) and quantized (230).
  • the quantized transform coefficients, as well as motion vectors and other syntax elements, are entropy coded (245) to output a bitstream.
  • the encoder can skip the transform and apply quantization directly to the non-transformed residual signal.
  • the encoder can bypass both transform and quantization, i.e., the residual is coded directly without the application of the transform or quantization processes.
  • the encoder decodes an encoded block to provide a reference for further predictions.
  • the quantized transform coefficients are de-quantized (240) and inverse transformed (250) to decode prediction residuals.
  • In-loop filters (265) are applied to the reconstructed picture to perform, for example, deblocking/SAO (Sample Adaptive Offset) filtering to reduce encoding artifacts.
  • the filtered image is stored at a reference picture buffer (280).
  • Figure 3 illustrates a block diagram of an example video decoder 300.
  • a bitstream is decoded by the decoder elements as described below.
  • Video decoder 300 generally performs a decoding pass reciprocal to the encoding pass as described in Figure 2.
  • the encoder 200 also generally performs video decoding as part of encoding video data.
  • the input of the decoder includes a video bitstream, which can be generated by video encoder 200.
  • the bitstream is first entropy decoded (330) to obtain transform coefficients, motion vectors, and other coded information.
  • the picture partition information indicates how the picture is partitioned.
  • the decoder may therefore divide (335) the picture according to the decoded picture partitioning information.
  • the transform coefficients are dequantized (340) and inverse transformed (350) to decode the prediction residuals. Combining (355) the decoded prediction residuals and the predicted block, an image block is reconstructed.
  • the predicted block can be obtained (370) from intra prediction (360) or motion-compensated prediction (i.e., inter prediction) (375).
  • In-loop filters (365) are applied to the reconstructed image.
  • the filtered image is stored at a reference picture buffer (380).
  • the decoded picture can further go through post-decoding processing (385), for example, an inverse color transform (e.g., conversion from YCbCr 4:2:0 to RGB 4:4:4) or an inverse remapping performing the inverse of the remapping process performed in the pre-encoding processing (201).
  • post decoding comprises a re-scaling of the decoded picture performing the inverse of the re-scaling of the encoding.
  • the post-decoding processing can use metadata derived in the pre-encoding processing and signaled in the bitstream.
  • a video coding system may comprise a plurality of different tools for encoding and decoding according to different coding modes.
  • a coding mode is selected for a block of image or a larger area of an image or a video according to rate-distortion optimization.
  • examples of such tools are Reference Picture Resampling (RPR), Geometric Partitioning Mode (GPM), and Template Matching (TM) based reordering for Motion Vector Difference (MVD) sign prediction, among others.
  • Figure 4 illustrates the principles of Reference Picture Resampling at the encoder side.
  • Figure 5 illustrates the principles of Reference Picture Resampling at the decoder side.
  • Reference Picture Resampling is a picture-based re-scaling feature. The principle is that, when possible, the encoding and decoding processes may operate on smaller images which may increase the overall compression rate.
  • the encoder may choose, for each frame of the video sequence, the resolution (in other words the picture size) to use for coding the frame.
  • Different picture parameter sets (PPS) are coded in the bitstream with the different possible sizes of the pictures and the slice header or picture header indicates which PPS to use to decode the current picture part included in the video coding layer (VCL) network abstraction layer (NAL) unit.
  • VCL video coding layer
  • NAL network abstraction layer
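The per-picture size signalling described above can be sketched as a toy look-up: several PPSs carry the candidate picture sizes, and each picture header references one of them by id. Field names are illustrative and do not follow exact VVC syntax:

```python
# Hypothetical sketch of PPS-based picture-size selection for RPR:
# the bitstream carries one PPS per candidate resolution, and the
# picture header picks one by id.

pps_list = {
    0: {"pic_width": 1920, "pic_height": 1080},  # full resolution
    1: {"pic_width":  960, "pic_height":  540},  # half resolution
}

def decode_picture_size(picture_header):
    """Resolve the coded size of the current picture from its header."""
    pps = pps_list[picture_header["pps_id"]]
    return pps["pic_width"], pps["pic_height"]
```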
  • the down-sampler (440) and the up-sampler (540) functions are respectively used as pre-processing (such as pre-encoding processing 201 in figure 2) or post-processing (post-decoding processing 385 in figure 3). These functions are generally not specified by the video coding standard.
  • the encoder selects whether to encode at original (full size) or down-sized resolution (e.g., picture width/height divided by 2).
  • the reference picture buffer (280 in figure 2 and 380 in figure 3)
  • DPB Decoded Picture Buffer
  • a re-scaling function (430 in figure 4 for the encoder side and 530 in figure 5 for the decoder side) down-scales or up-scales the reference block to build the prediction block during the motion compensation process (270 in figure 2, 375 in figure 3).
  • the re-scaling (430, 530) of the reference block is made implicitly during the motion compensation process (270, 375).
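The implicit re-scaling during motion compensation can be sketched as a per-sample position mapping: rather than resampling the whole reference picture, each prediction-sample position is projected into the reference grid using the horizontal and vertical scaling ratios. The fixed-point details of VVC (1/16384-precision ratios, interpolation filters) are omitted, and the function name is hypothetical:

```python
# Illustrative sketch: map a current-picture sample plus its motion
# vector to a (possibly fractional) position in a reference picture of
# different size, as done implicitly during RPR motion compensation.

def ref_sample_pos(x, y, mvx, mvy, cur_w, cur_h, ref_w, ref_h):
    """(x, y): sample position in the current picture.
    (mvx, mvy): motion vector in luma-sample units.
    cur_w/cur_h, ref_w/ref_h: current and reference picture sizes."""
    sx = ref_w / cur_w   # horizontal scaling ratio
    sy = ref_h / cur_h   # vertical scaling ratio
    return ((x + mvx) * sx, (y + mvy) * sy)
```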
  • the RPR tool may be enabled explicitly or implicitly at different levels using different mechanisms.
  • At sequence level: in the sequence parameter sets (SPS) that describe elements common to a series of pictures, a flag indicates that RPR may be applied for coding at least one picture. This is the case in VVC and ECM, using a flag named ref_pic_resampling_enabled_flag.
  • SPS sequence parameter sets
  • At picture level: RPR is enabled if it is enabled at sequence level (as above) and the current picture uses at least one reference picture with a size different from the current picture size.
  • At CU level: RPR is enabled if it is enabled at picture level and the current CU uses at least one reference picture with a size different from the current picture size.
  • the term “RPR enabled” can be understood at any of these levels.
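The three "RPR enabled" levels listed above can be sketched as nested checks; the dictionary keys and function names are hypothetical:

```python
# Illustrative sketch of the sequence / picture / CU "RPR enabled" levels.

def rpr_enabled_seq(sps):
    # sequence level: SPS flag allows RPR for at least one picture
    return sps["ref_pic_resampling_enabled_flag"] == 1

def rpr_enabled_pic(sps, cur_size, ref_sizes):
    # picture level: sequence flag set AND at least one reference
    # picture has a size different from the current picture
    return rpr_enabled_seq(sps) and any(s != cur_size for s in ref_sizes)

def rpr_enabled_cu(sps, cur_size, cu_ref_sizes):
    # CU level: same test, restricted to the references this CU uses
    return rpr_enabled_pic(sps, cur_size, cu_ref_sizes)
```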
  • a tool used in video coding system is Geometric Partitioning Mode (GPM).
  • the geometric partitioning mode allows predicting one CU with two non-rectangular partitions.
  • the Intra mode storage in GPM with inter and intra prediction is now detailed.
  • Figure 6 illustrates the different examples of predictions used in GPM.
  • Each partition may be Inter (375) or Intra (360).
  • the samples of the inter partition are predicted with the regular inter prediction process, using motion compensated reference samples picked from one (uni-prediction) reference picture.
  • the samples of the intra partition are predicted with regular intra prediction mode (IPM) and prediction process where the available IPM candidates are the parallel angular mode against the GPM block boundary (Parallel mode), the perpendicular angular mode against the GPM block boundary (Perpendicular mode), and the Planar mode.
  • Figure 6a shows the combination of inter prediction and intra prediction using the parallel angular mode against the GPM block boundary (Parallel mode), the IPM being represented by the arrow.
  • Figure 6b shows the combination of inter prediction and intra prediction using the perpendicular angular mode against the GPM block boundary (Perpendicular mode).
  • Figure 6c shows the combination of inter prediction and intra prediction using the planar mode.
  • Figure 6d shows the combination of two intra predictions respectively using the Parallel mode and perpendicular mode.
  • the motion vectors used to reconstruct the inter prediction part(s) and the IPM used to reconstruct the intra partition(s) are stored in a buffer (called motion info, "MI") associated with the current picture in the DPB.
  • since the partitions may be non-rectangular, for the sub-blocks (or sub-partitions) shared by the two partitions, the stored information corresponds to the partition which occupies the larger area.
  • for an inter sub-block, the motion information (MV and reference index) is stored and the IPM information is retrieved from the MI buffer of the reference picture at the location miRef((x+mvx)/4, (y+mvy)/4).
  • for an intra sub-block, the IPM information is stored, and the motion information is undefined.
  • Figure 7 illustrates examples of two GPM predictions and coarser storage parameters (dashed sub-blocks) in motion/IPM info.
  • the CU of size 32x32 is predicted with one partition inter and one partition intra.
  • the corresponding inter and intra parameters are stored in the MI buffer at 4x4 resolution as shown on figure 7(b) with dashed sub-blocks.
  • Figure 8 illustrates the storage of current CU coding mode information at sub-block precision in the motion info buffer. This information may be later used for coding subsequent CUs in the current picture and CUs in the subsequent pictures (in coding order), to predict the motion or the IPM to be used for example.
  • a reconstructed CU is input (810).
  • the intra parameters are selected (830) to be stored in the MI buffer (850).
  • the intra parameters are retrieved from the reference Ml buffer, which is the Ml buffer associated with the reference samples used for inter prediction for this sub-block.
  • the location of the intra parameters in the reference Ml buffer is ((x+mvX)/si , (y+mvY)/si) where (mvX,mvY) is the motion vector of the inter prediction and si is the ratio between the current picture size and the motion info buffer size and (x,y) is the location of the sub-block in the current frame.
  • the coding mode and the intra parameters either retrieved from Intra partition (830) or Inter partition (840) are stored (850) in the current Ml buffer at location mi cur (x/si, y/si).
  • refO may be used if it is coded in intra. If both refO and ref 1 are coded in inter, one may choose the reference picture with POC closest to the current POC.
  • the mvd is made of 2 components (mvdx, mvdy). These values may be parsed from the bitstream in case of the current CU is coded in AMVP mode for example.
  • AMVP mode a list of pre-defined mvd refinement positions along kxn78 diagonal angles is built and the index of the mvd to use is coded.
  • Figure 9 illustrates samples of the template in a current picture and reference samples of the template in reference pictures in the more general case of bi-prediction.
  • HLS HTTP live Streaming
  • a manifest can be associated, for example, to a version or collection of versions of a content to provide characteristics of the version or collection of versions.
  • references to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
  • such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.


Abstract

In a video coding system, it is proposed to adapt video coding tools to the use of Reference Picture Re-scaling where a reference picture has a different size than the current picture to be coded or decoded. Different embodiments are proposed hereafter for prediction (geometric partitioning, template matching based re-ordering for MVD sign prediction, template matching based re-ordering for enhanced MV prediction) introducing some tools modifications to increase coding efficiency and improve the codec consistency when RPR is enabled. A video encoding method, a decoding method, a video encoder and a video decoder are described.

Description

VIDEO ENCODING AND DECODING USING REFERENCE PICTURE RESAMPLING
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of European Application No. 22305485.9, filed on April 7th, 2022, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
At least one of the present embodiments generally relates to a method or an apparatus for video encoding or decoding, and more particularly, to a method or an apparatus performing Geometric Partitioning or Template Matching (TM) based reordering for Motion Vector Difference (MVD) sign prediction or for enhanced prediction of both Motion Vector and Motion Vector Difference.
BACKGROUND
To achieve high compression efficiency, image and video coding schemes usually employ prediction, including motion vector prediction, and transform to leverage spatial and temporal redundancy in the video content. Generally, intra or inter prediction is used to exploit the intra or inter frame correlation, then the differences between the original image and the predicted image, often denoted as prediction errors or prediction residuals, are transformed, quantized, and entropy coded. To reconstruct the video, the compressed data are decoded by inverse processes corresponding to the entropy coding, quantization, transform, and prediction.
Existing methods for coding and decoding show some limitations with the usage of some tools with Reference Picture Re-scaling (RPR), where a reference picture has a different size than the current picture to be coded or decoded. Therefore, there is a need to improve the state of the art.
SUMMARY
The drawbacks and disadvantages of the prior art are solved and addressed by the general aspects described herein. According to a first aspect, there is provided a method. The method comprises predicting a block of a picture of a video using geometrical partition mode combining at least two partitions and storing intra parameters in a motion information buffer of the picture, wherein, responsive to a subblock of a partition among the at least two partitions being inter, the method further comprises obtaining an intra parameter from an intra parameter stored in a motion information buffer of a reference picture at a location corresponding to the motion compensated location of the subblock of the partition, the location being scaled using a ratio of the reference picture relative to the picture when the reference picture has a size different from the size of the picture.
According to another aspect, there is provided a second method. The method comprises predicting a block of a picture of a video by motion compensation, the method further comprising reconstructing a motion vector by adding a motion vector prediction to a motion vector difference, wherein at least the motion vector difference is obtained using template-based matching to order a list of candidates and wherein the template-based matching is modified for a reference picture having a size different from a size of the picture. Advantageously, the template-based matching (TM) is used to reorder a list of candidates for Motion Vector Difference (MVD) sign prediction or for enhanced prediction of both Motion Vector and Motion Vector Difference.
According to another aspect, there is provided a third method. The method comprises decoding data representative of a block of a picture of a video wherein the decoding further comprises predicting a block according to the predicting method adapted to Reference Picture Re-scaling in any of its variants and decoding picture data of the block of the picture of the video based on the predicted block.
According to another aspect, there is provided a fourth method. The method comprises encoding data representative of a block of a picture of a video wherein the encoding further comprises predicting a block according to the predicting method adapted to Reference Picture Re-scaling in any of its variants and encoding picture data of the block of the picture of the video based on the predicted block.
According to another aspect, there is provided an apparatus. The apparatus comprises one or more processors, wherein the one or more processors are configured to implement the method for video decoding according to any of its variants. According to another aspect, there is provided another apparatus. The apparatus comprises one or more processors, wherein the one or more processors are configured to implement the method for video encoding according to any of its variants.
According to another general aspect of at least one embodiment, there is provided a device comprising an apparatus according to any of the decoding embodiments; and at least one of (i) an antenna configured to receive a signal, the signal including the video block, (ii) a band limiter configured to limit the received signal to a band of frequencies that includes the video block, or (iii) a display configured to display an output representative of the video block.
According to another general aspect of at least one embodiment, there is provided a non- transitory computer readable medium containing data content generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, there is provided a signal comprising video data generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, a bitstream is formatted to include data content generated according to any of the described encoding embodiments or variants.
According to another general aspect of at least one embodiment, there is provided a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out any of the described encoding/decoding embodiments or variants.
These and other aspects, features and advantages of the general aspects will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the drawings, examples of several embodiments are illustrated.
Figure 1 illustrates a block diagram of an example apparatus in which various aspects of the embodiments may be implemented.

Figure 2 illustrates a block diagram of an embodiment of a video encoder in which various aspects of the embodiments may be implemented.
Figure 3 illustrates a block diagram of an embodiment of video decoder in which various aspects of the embodiments may be implemented.
Figure 4 illustrates the principles of Reference Picture Resampling at the encoder side.
Figure 5 illustrates the principles of Reference Picture Resampling at the decoder side.
Figure 6 illustrates the different examples of predictions in a geometric partitioning mode.
Figure 7 illustrates an example of GPM prediction and coarser storage parameters in the motion info buffer.

Figure 8 illustrates the storage of current CU coding intra mode information at sub-block precision in the motion info buffer.
Figure 9 illustrates samples of the template in a current picture and reference samples of the template in reference pictures.
Figure 10 illustrates Template Matching (TM) based reordering for MVD sign prediction.
Figure 11 illustrates the difference of size between the reference template and the current template when RPR is enabled.
Figure 12 illustrates a generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
Figure 13 illustrates another generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
Figure 14 illustrates Template Matching (TM) based reordering for enhanced motion vector and motion vector difference prediction.
Figure 15 illustrates another generic prediction method using template matching based reordering for motion vector difference adapted to RPR according to a general aspect of at least one embodiment.
Figure 16 illustrates a flowchart of an example of decoding using prediction adapted to reference picture re-sampling according to at least one embodiment.
Figure 17 illustrates a flowchart of an example of encoding using prediction adapted to reference picture re-sampling according to at least one embodiment.
DETAILED DESCRIPTION
Various embodiments relate to a video coding system in which, in at least one embodiment, it is proposed to adapt video coding tools to the use of Reference Picture Re-scaling (RPR) where a reference picture has a different size than the current picture to be coded or decoded. Different embodiments are proposed hereafter, introducing some tools modifications to increase coding efficiency and improve the codec consistency when RPR is enabled. Amongst others, an encoding method, a decoding method, an encoding apparatus, a decoding apparatus based on this principle are proposed.
Moreover, the present aspects, although describing principles related to particular drafts of VVC (Versatile Video Coding) or to HEVC (High Efficiency Video Coding) specifications, or to ECM (Enhanced Compression Model) reference software are not limited to VVC or HEVC or ECM, and can be applied, for example, to other standards and recommendations, whether pre-existing or future-developed, and extensions of any such standards and recommendations (including VVC and HEVC and ECM). Unless indicated otherwise, or technically precluded, the aspects described in this application can be used individually or in combination.
The acronyms used herein are reflecting the current state of video coding developments and thus should be considered as examples of naming that may be renamed at later stages while still representing the same techniques.
Figure 1 illustrates a block diagram of an example of a system in which various aspects and embodiments can be implemented. System 100 may be embodied as a device including the various components described below and is configured to perform one or more of the aspects described in this application. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. Elements of system 100, singly or in combination, may be embodied in a single integrated circuit, multiple ICs, and/or discrete components. For example, in at least one embodiment, the processing and encoder/decoder elements of system 100 are distributed across multiple ICs and/or discrete components. In various embodiments, the system 100 is communicatively coupled to other systems, or to other electronic devices, via, for example, a communications bus or through dedicated input and/or output ports. In various embodiments, the system 100 is configured to implement one or more of the aspects described in this application.
The system 100 includes at least one processor 110 configured to execute instructions loaded therein for implementing, for example, the various aspects described in this application. Processor 110 may include embedded memory, input output interface, and various other circuitries as known in the art. The system 100 includes at least one memory 120 (e.g. a volatile memory device, and/or a non-volatile memory device). System 100 includes a storage device 140, which may include non-volatile memory and/or volatile memory, including, but not limited to, EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, magnetic disk drive, and/or optical disk drive. The storage device 140 may include an internal storage device, an attached storage device, and/or a network accessible storage device, as non-limiting examples.
System 100 includes an encoder/decoder module 130 configured, for example, to process data to provide an encoded video or decoded video, and the encoder/decoder module 130 may include its own processor and memory. The encoder/decoder module 130 represents module(s) that may be included in a device to perform the encoding and/or decoding functions. As is known, a device may include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 130 may be implemented as a separate element of system 100 or may be incorporated within processor 110 as a combination of hardware and software as known to those skilled in the art.
Program code to be loaded onto processor 110 or encoder/decoder 130 to perform the various aspects described in this application may be stored in storage device 140 and subsequently loaded onto memory 120 for execution by processor 110. In accordance with various embodiments, one or more of processor 110, memory 120, storage device 140, and encoder/decoder module 130 may store one or more of various items during the performance of the processes described in this application. Such stored items may include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
In several embodiments, memory inside of the processor 110 and/or the encoder/decoder module 130 is used to store instructions and to provide working memory for processing that is needed during encoding or decoding. In other embodiments, however, a memory external to the processing device (for example, the processing device may be either the processor 110 or the encoder/decoder module 130) is used for one or more of these functions. The external memory may be the memory 120 and/or the storage device 140, for example, a dynamic volatile memory and/or a non-volatile flash memory. In several embodiments, an external non-volatile flash memory is used to store the operating system of a television. In at least one embodiment, a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for HEVC, or VVC.
The input to the elements of system 100 may be provided through various input devices as indicated in block 105. Such input devices include, but are not limited to, (i) an RF portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Composite input terminal, (iii) a USB input terminal, and/or (iv) an HDMI input terminal. In various embodiments, the input devices of block 105 have associated respective input processing elements as known in the art. For example, the RF portion may be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) down converting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which may be referred to as a channel in certain embodiments, (iv) demodulating the down converted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets. The RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers. The RF portion may include a tuner that performs various of these functions, including, for example, down converting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband. In one set-top box embodiment, the RF portion and its associated input processing element receives an RF signal transmitted over a wired (for example, cable) medium, and performs frequency selection by filtering, down converting, and filtering again to a desired frequency band. 
Various embodiments rearrange the order of the above-described (and other) elements, remove some of these elements, and/or add other elements performing similar or different functions. Adding elements may include inserting elements in between existing elements, for example, inserting amplifiers and an analog-to-digital converter. In various embodiments, the RF portion includes an antenna.
Additionally, the USB and/or HDMI terminals may include respective interface processors for connecting system 100 to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of input processing, for example, Reed-Solomon error correction, may be implemented, for example, within a separate input processing IC or within processor 110 as necessary. Similarly, aspects of USB or HDMI interface processing may be implemented within separate interface ICs or within processor 110 as necessary. The demodulated, error corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 110, and encoder/decoder 130 operating in combination with the memory and storage elements to process the data stream as necessary for presentation on an output device.
Various elements of system 100 may be provided within an integrated housing. Within the integrated housing, the various elements may be interconnected and transmit data therebetween using a suitable connection arrangement 115, for example, an internal bus as known in the art, including the I2C bus, wiring, and printed circuit boards. The system 100 includes communication interface 150 that enables communication with other devices via communication channel 190. The communication interface 150 may include, but is not limited to, a transceiver configured to transmit and to receive data over communication channel 190. The communication interface 150 may include, but is not limited to, a modem or network card and the communication channel 190 may be implemented, for example, within a wired and/or a wireless medium.
Data is streamed to the system 100, in various embodiments, using a Wi-Fi network such as IEEE 802.11. The Wi-Fi signal of these embodiments is received over the communications channel 190 and the communications interface 150 which are adapted for Wi-Fi communications. The communications channel 190 of these embodiments is typically connected to an access point or router that provides access to outside networks including the Internet for allowing streaming applications and other over-the-top communications. Other embodiments provide streamed data to the system 100 using a set-top box that delivers the data over the HDMI connection of the input block 105. Still other embodiments provide streamed data to the system 100 using the RF connection of the input block 105.
The system 100 may provide an output signal to various output devices, including a display 165, speakers 175, and other peripheral devices 185. The other peripheral devices 185 include, in various examples of embodiments, one or more of a stand-alone DVR, a disk player, a stereo system, a lighting system, and other devices that provide a function based on the output of the system 100. In various embodiments, control signals are communicated between the system 100 and the display 165, speakers 175, or other peripheral devices 185 using signaling such as AV.Link, CEC, or other communications protocols that enable device-to-device control with or without user intervention. The output devices may be communicatively coupled to system 100 via dedicated connections through respective interfaces 160, 170, and 180. Alternatively, the output devices may be connected to system 100 using the communications channel 190 via the communications interface 150. The display 165 and speakers 175 may be integrated in a single unit with the other components of system 100 in an electronic device, for example, a television. In various embodiments, the display interface 160 includes a display driver, for example, a timing controller (T-Con) chip.
The display 165 and speaker 175 may alternatively be separate from one or more of the other components, for example, if the RF portion of input 105 is part of a separate set-top box. In various embodiments in which the display 165 and speakers 175 are external components, the output signal may be provided via dedicated output connections, including, for example, HDMI ports, USB ports, or COMP outputs.

Figure 2 illustrates an example video encoder 200, such as a VVC (Versatile Video Coding) encoder. Figure 2 may also illustrate an encoder in which improvements are made to the VVC standard or an encoder employing technologies similar to VVC.
In the present application, the terms “reconstructed” and “decoded” may be used interchangeably, the terms “encoded” or “coded” may be used interchangeably, and the terms “image,” “picture” and “frame” may be used interchangeably. Usually, but not necessarily, the term “reconstructed” is used at the encoder side while “decoded” is used at the decoder side.
Before being encoded, the video sequence may go through pre-encoding processing (201), for example, applying a color transform to the input color picture (e.g., conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to get a signal distribution more resilient to compression (for instance using a histogram equalization of one of the color components). Metadata can be associated with the preprocessing and attached to the bitstream. According to another example, pre-encoding comprises a re-scaling of the input picture for Reference Picture Re-scaling as described hereafter.
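By way of illustration, such a pre-encoding color transform from RGB 4:4:4 to YCbCr 4:2:0 can be sketched as below, using full-range BT.601 weights and simple 2x2 averaging for chroma subsampling. This is only one possible choice: the exact conversion matrix and downsampling filter are application decisions and are not mandated by this description, and the function name is illustrative.

```python
def rgb_to_ycbcr_420(rgb):
    """Convert an RGB 4:4:4 picture (H x W list of (r, g, b) tuples,
    values 0..255) to a full-resolution Y plane plus Cb and Cr planes
    subsampled to (H/2) x (W/2), i.e. YCbCr 4:2:0.
    Uses full-range BT.601 weights; real systems may use other matrices."""
    h, w = len(rgb), len(rgb[0])
    y = [[0.0] * w for _ in range(h)]
    cb = [[0.0] * w for _ in range(h)]
    cr = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            r, g, b = rgb[i][j]
            y[i][j] = 0.299 * r + 0.587 * g + 0.114 * b
            cb[i][j] = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
            cr[i][j] = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # 4:2:0 chroma subsampling: average each 2x2 neighbourhood
    def sub(p):
        return [[(p[2*i][2*j] + p[2*i][2*j+1]
                  + p[2*i+1][2*j] + p[2*i+1][2*j+1]) / 4.0
                 for j in range(w // 2)] for i in range(h // 2)]
    return y, sub(cb), sub(cr)
```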
In the encoder 200, a picture is encoded by the encoder elements as described below. The picture to be encoded is partitioned (202) and processed in units of, for example, CUs. Each unit is encoded using, for example, either an intra or inter mode. When a unit is encoded in an intra mode, it performs intra prediction (260). In an inter mode, motion estimation (275) and compensation (270) are performed. The encoder decides (205) which one of the intra mode or inter mode to use for encoding the unit, and indicates the intra/inter decision by, for example, a prediction mode flag. Prediction residuals are calculated, for example, by subtracting (210) the predicted block from the original image block.
The prediction residuals are then transformed (225) and quantized (230). The quantized transform coefficients, as well as motion vectors and other syntax elements, are entropy coded (245) to output a bitstream. The encoder can skip the transform and apply quantization directly to the non-transformed residual signal. The encoder can bypass both transform and quantization, i.e., the residual is coded directly without the application of the transform or quantization processes.
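By way of illustration, the quantization and inverse quantization steps referred to above can be sketched as a uniform scalar quantizer. This is a deliberate simplification: actual codecs derive the step size from a quantization parameter (QP) and use rounding offsets tuned per prediction mode, and the function names below are illustrative, not taken from any standard.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization of transform coefficients:
    divide by the step and round half away from zero.
    Real codecs derive `step` from a QP and use tuned offsets."""
    return [int(c / step + (0.5 if c >= 0 else -0.5)) for c in coeffs]

def dequantize(levels, step):
    """Inverse quantization: reconstruct approximate coefficients
    from the transmitted integer levels."""
    return [lvl * step for lvl in levels]
```

Note that quantization is the lossy step: `dequantize(quantize(c, step), step)` only approximates the original coefficients, which is why the encoder reconstructs blocks through the same inverse path as the decoder.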
The encoder decodes an encoded block to provide a reference for further predictions. The quantized transform coefficients are de-quantized (240) and inverse transformed (250) to decode prediction residuals. Combining (255) the decoded prediction residuals and the predicted block, an image block is reconstructed. In-loop filters (265) are applied to the reconstructed picture to perform, for example, deblocking/SAO (Sample Adaptive Offset) filtering to reduce encoding artifacts. The filtered image is stored at a reference picture buffer (280).
Figure 3 illustrates a block diagram of an example video decoder 300. In the decoder 300, a bitstream is decoded by the decoder elements as described below. Video decoder 300 generally performs a decoding pass reciprocal to the encoding pass as described in Figure 2. The encoder 200 also generally performs video decoding as part of encoding video data.
In particular, the input of the decoder includes a video bitstream, which can be generated by video encoder 200. The bitstream is first entropy decoded (330) to obtain transform coefficients, motion vectors, and other coded information. The picture partition information indicates how the picture is partitioned. The decoder may therefore divide (335) the picture according to the decoded picture partitioning information. The transform coefficients are dequantized (340) and inverse transformed (350) to decode the prediction residuals. Combining (355) the decoded prediction residuals and the predicted block, an image block is reconstructed. The predicted block can be obtained (370) from intra prediction (360) or motion-compensated prediction (i.e., inter prediction) (375). In-loop filters (365) are applied to the reconstructed image. The filtered image is stored at a reference picture buffer (380).
The decoded picture can further go through post-decoding processing (385), for example, an inverse color transform (e.g., conversion from YCbCr 4:2:0 to RGB 4:4:4) or an inverse remapping performing the inverse of the remapping process performed in the pre-encoding processing (201). According to another example, post decoding comprises a re-scaling of the decoded picture performing the inverse of the re-scaling of the encoding. The post-decoding processing can use metadata derived in the pre-encoding processing and signaled in the bitstream.
A video coding system may comprise a plurality of different tools for encoding and decoding according to different coding modes. A coding mode is selected for a block of image or a larger area of an image or a video according to rate-distortion optimization. Examples of tools are Reference Picture Resampling (RPR), Geometric Partitioning Mode (GPM), Template Matching (TM) based reordering for Motion Vector Difference (MVD) sign prediction among others.
Figure 4 illustrates the principles of Reference Picture Resampling at the encoder side, while Figure 5 illustrates the principles of Reference Picture Resampling at the decoder side. Reference Picture Resampling (RPR) is a picture-based re-scaling feature. The principle is that, when possible, the encoding and decoding processes may operate on smaller images which may increase the overall compression rate.
Given an original video sequence composed of pictures of size (width x height), the encoder may choose, for each frame of the video sequence, the resolution (in other words the picture size) to use for coding the frame. Different picture parameter sets (PPS) are coded in the bitstream with the different possible sizes of the pictures and the slice header or picture header indicates which PPS to use to decode the current picture part included in the video coding layer (VCL) network abstraction layer (NAL) unit.
The down-sampler (440) and the up-sampler (540) functions are respectively used as pre-processing (such as the pre-encoding processing 201 in figure 2) or post-processing (such as the post-decoding processing 385 in figure 3). These functions are generally not specified by the video coding standard.
For each frame, the encoder selects whether to encode at original (full size) or down-sized resolution (ex: picture width/height divided by 2). The choice can be made with two-pass encoding or by considering spatial and temporal activity in the original pictures, for example. Consequently, the reference picture buffer (280 in figure 2 and 380 in figure 3), also known as Decoded Picture Buffer (DPB), may contain reference pictures of different sizes than the size of the current picture. In case one reference picture in the DPB has a size different from the current picture, a re-scaling function (430 in figure 4 for the encoder side and 530 in figure 5 for the decoder side) down-scales or up-scales the reference block to build the prediction block during the motion compensation process (270 in figure 2, 375 in figure 3). The re-scaling (430, 530) of the reference block is made implicitly during the motion compensation process (270, 375).
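By way of illustration, the implicit re-scaling during motion compensation amounts to mapping the motion compensated position from the current-picture sample grid onto the reference-picture sample grid. The sketch below shows this mapping with floating-point ratios; a real codec such as VVC performs the same mapping in fixed point (with scaling ratios stored in fractional units) and feeds the resulting fractional position to its interpolation filters. The function name is illustrative.

```python
def scaled_ref_position(x, y, mv_x, mv_y, cur_size, ref_size):
    """Map a luma position (x, y) in the current picture, displaced by
    the motion vector (mv_x, mv_y), onto the grid of a reference picture
    of a different size (Reference Picture Resampling).
    `cur_size` and `ref_size` are (width, height) pairs.  The result is
    a fractional position that interpolation filters would sample."""
    sx = ref_size[0] / cur_size[0]   # horizontal scaling ratio
    sy = ref_size[1] / cur_size[1]   # vertical scaling ratio
    return ((x + mv_x) * sx, (y + mv_y) * sy)
```

When `ref_size == cur_size`, the ratios are 1 and the expression reduces to ordinary motion compensation, which is why the re-scaling can be folded into the motion compensation process rather than performed as a separate pass.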
The RPR tool may be enabled explicitly or implicitly at different levels using different mechanisms. At sequence level: in the sequence parameter sets (SPS) that describe elements common to a series of pictures, a flag indicates that RPR may be applied for coding at least one picture. This is the case in VVC and ECM using a flag named ref_pic_resampling_enabled_flag. At picture level: if RPR is enabled at sequence level (as above) and the current picture uses at least one reference picture with a size different from the current picture size. At CU level: if RPR is enabled at picture level and the current CU uses at least one reference picture with a size different from the current picture size. In the following, the term “RPR enabled” can be understood at any of these levels.

According to a first aspect, a tool used in a video coding system is the Geometric Partitioning Mode (GPM). The geometric partitioning mode (GPM) allows predicting one CU with two non-rectangular partitions. The intra mode storage in GPM with inter and intra prediction is now detailed.
Figure 6 illustrates different examples of predictions used in GPM. Each partition may be inter (375) or intra (360). The samples of the inter partition are predicted with the regular inter prediction process, using motion compensated reference samples picked from one reference picture (uni-prediction). The samples of the intra partition are predicted with a regular intra prediction mode (IPM) and prediction process, where the available IPM candidates are the angular mode parallel to the GPM block boundary (Parallel mode), the angular mode perpendicular to the GPM block boundary (Perpendicular mode), and the Planar mode. Figure 6a shows the combination of inter prediction and intra prediction using the angular mode parallel to the GPM block boundary (Parallel mode), the IPM being represented by the arrow. Figure 6b shows the combination of inter prediction and intra prediction using the angular mode perpendicular to the GPM block boundary (Perpendicular mode). Figure 6c shows the combination of inter prediction and intra prediction using the Planar mode. Figure 6d shows the combination of two intra predictions using the Parallel mode and the Perpendicular mode, respectively.
The motion vectors used to reconstruct the inter prediction part(s) and the IPM used to reconstruct the intra partition(s) are stored in a buffer (called motion info, “MI”) associated with the current picture in the DPB. To reduce the storage amount, the information may be stored at a coarser resolution than the current picture (e.g., 4x4 resolution, figure 7). For example, if the ratio between the current picture size and the motion info buffer size is si=4, then the “MI” parameters of the block located at (x,y) are stored at mi(x/4, y/4) in the motion info buffer. In case of GPM, since the partitions may be non-rectangular, for the sub-blocks (or sub-partitions) shared by the two partitions, the stored information corresponds to the partition which occupies the larger area. For a sub-partition coded in inter, the motion information (MV and reference index) is stored and the IPM information is retrieved from the MI buffer of the reference picture at the location miref((x+mvx)/4, (y+mvy)/4). For a sub-partition coded in intra, the IPM information is stored, and the motion information is undefined.
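The coarse-resolution indexing described above can be sketched as follows (a minimal illustration; the function name is hypothetical and the si=4 ratio follows the example in the text):

```python
SI = 4  # ratio between the current picture resolution and the MI buffer resolution

def mi_index(x, y, si=SI):
    """Map a sample location (x, y) in the current picture to its cell in the
    motion info (MI) buffer stored at coarser resolution."""
    return (x // si, y // si)

# All samples of the same 4x4 sub-block map to the same MI cell, so the "MI"
# parameters of a block located at (x, y) are stored at mi(x/4, y/4).
```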
Figure 7 illustrates examples of two GPM predictions and the coarser storage of parameters (dashed sub-blocks) in the motion/IPM info. For example, in figure 7(a) the CU of size 32x32 is predicted with one inter partition and one intra partition. The corresponding inter and intra parameters are stored in the MI buffer at 4x4 resolution as shown in figure 7(b) with dashed sub-blocks.
Figure 8 illustrates the storage of current CU coding mode information at sub-block precision in the motion info buffer. This information may be later used for coding subsequent CUs in the current picture and CUs in subsequent pictures (in coding order), for example to predict the motion or the IPM to be used. First, a reconstructed CU is input (810). Then it is determined (820), for a given sub-block, which partition (inter or intra) occupies the largest area in the sub-block. Given a sub-block (x,y), if more samples are predicted with intra than with inter (820), the intra parameters are selected (830) to be stored in the MI buffer (850). If more samples are predicted with inter than with intra (840), then the intra parameters are retrieved from the reference MI buffer, which is the MI buffer associated with the reference samples used for inter prediction for this sub-block. The location of the intra parameters in the reference MI buffer is ((x+mvX)/si, (y+mvY)/si), where (mvX,mvY) is the motion vector of the inter prediction, si is the ratio between the current picture size and the motion info buffer size, and (x,y) is the location of the sub-block in the current frame. Thus, the intra parameters from the reference MI buffer are stored in the current MI buffer according to micur(x/si, y/si) = miref( (x+mvX)/si , (y+mvY)/si ) (840).
Then the coding mode and the intra parameters, either retrieved from the intra partition (830) or from the inter partition (840), are stored (850) in the current MI buffer at location micur(x/si, y/si).
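The storage rule of figure 8 can be sketched as follows (a hypothetical illustration; the sub-block fields and the dictionary-based buffer layout are assumptions, not the reference implementation):

```python
def store_mi(sub_block, mi_cur, mi_ref, si):
    """Store the coding mode info of one GPM sub-block in the current MI buffer.

    sub_block is a dict with the sub-block location (x, y), per-partition
    sample counts, intra parameters, and motion vector (all illustrative).
    """
    x, y = sub_block["x"], sub_block["y"]
    if sub_block["n_intra"] > sub_block["n_inter"]:
        # Intra partition occupies the larger area: keep its own intra parameters (830).
        params = sub_block["intra_params"]
    else:
        # Inter partition dominates: retrieve the intra parameters from the
        # reference MI buffer at the motion-compensated location (840).
        mvx, mvy = sub_block["mv"]
        params = mi_ref[((x + mvx) // si, (y + mvy) // si)]
    mi_cur[(x // si, y // si)] = params  # (850)
```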
In case of bi-prediction, some rules are applied to select between the ref0 and ref1 intra parameters. For example, ref0 may be used if it is coded in intra. If both ref0 and ref1 are coded in inter, one may choose the reference picture with the POC closest to the current POC.
According to another aspect, a Motion Vector Difference (MVD) can be coded in inter prediction. The motion vector values are predicted from a list of motion vector candidates Mv[] which is built from previously reconstructed CUs. These candidates are derived from spatially neighboring CUs, the co-located CU, and history-based motion vectors. An index “idx1” is coded for the current CU to indicate which candidate in the list to use. For bi-prediction, the MV candidates may be pairs of motion vectors. The motion vector value “Mv[idx1]” is corrected with mvd(mvdx,mvdy) and the motion vector is computed as:
Mv = Mv[idx1] + mvd
The mvd is made of 2 components (mvdx, mvdy). These values may be parsed from the bitstream in case the current CU is coded in AMVP mode, for example. In case of MMVD, a list of pre-defined mvd refinement positions along k×π/8 diagonal angles is built and the index of the mvd to use is coded. In case of bi-prediction and SMVD mode, the pair of mvds is symmetric: mvd1 = -mvd0, and only one mvd value is coded. In case of coding modes with MVD sign prediction, the absolute values of the mvd refinements (absMvdx, absMvdy) are parsed or derived from the bitstream for the current CU. In case of bi-prediction or affine, two or three pairs of absolute values of mvd may be decoded, respectively.
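The motion vector reconstruction and the SMVD symmetry above can be sketched as follows (illustrative helper names, not the reference software API):

```python
def reconstruct_mv(mv_candidates, idx1, mvd):
    """Mv = Mv[idx1] + mvd, applied component-wise."""
    mvx, mvy = mv_candidates[idx1]
    return (mvx + mvd[0], mvy + mvd[1])

def smvd_pair(mvd0):
    """SMVD mode: the pair of mvds is symmetric, mvd1 = -mvd0,
    so only one mvd value needs to be coded."""
    return (mvd0, (-mvd0[0], -mvd0[1]))
```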
Figure 9 illustrates samples of the template in a current picture and reference samples of the template in reference pictures in the more general case of bi-prediction.
According to a first variant embodiment, MVD coding uses Template Matching (TM) based reordering for MVD sign prediction.
Figure 10 illustrates Template Matching (TM) based reordering for MVD sign prediction. First, the absolute values of the mvd refinements (absMvdx, absMvdy) are parsed or derived from the bitstream for the current CU (1010). Then, to determine the sign of the mvd component values, a list of sign candidates corresponding to all the possible combinations of mvd component sign values is built (1020). The size of the list depends on the number of mvd values with non-zero components. For example:
•	Uni-prediction with 2 non-zero components: {(-1,-1), (-1,+1), (+1,-1), (+1,+1)}
•	Uni-prediction with 1 zero component: {(-1), (+1)}
•	Bi-prediction with 3 non-zero components and 1 zero component: {(+1,+1,+1), (+1,+1,-1), (+1,-1,+1), (+1,-1,-1), (-1,+1,+1), (-1,+1,-1), (-1,-1,+1), (-1,-1,-1)}
For 3 mvds with all non-zero absolute values, the list size is 64.
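The sizes of the sign-candidate lists above can be reproduced with a small enumeration (a sketch; only the non-zero components receive a sign):

```python
from itertools import product

def sign_candidates(abs_components):
    """Enumerate all sign combinations for the non-zero MVD components."""
    n_nonzero = sum(1 for c in abs_components if c != 0)
    return list(product((-1, +1), repeat=n_nonzero))
```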
The index “idx2” of the mvd sign candidate to use is parsed from the bitstream for the current CU (1060). The list of mvd sign values may be adaptively re-ordered using template matching (TM) cost (1040).
The template matching cost (1030) of a mvd sign candidate is measured by the sum of absolute differences (SAD) between samples of a template T of the current block and their corresponding reference samples (RT0 or RT1 as shown in figure 9) translated by Mv[idx1] + sign[idx2]*(mvd), where the signs are multiplied with the non-zero components. The template T comprises a set of reconstructed samples neighboring the current block. The reference samples of the template (RT0, RT1) are located by the motion information of the Mv candidate. When a Mv candidate utilizes bi-directional prediction (Mv is a pair of motion vectors), the reference samples of the template of the candidate are also generated by bi-prediction as shown in figure 9.
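The TM cost of one sign candidate can be sketched as follows (a simplified 1-D illustration: real templates are 2-D sample arrays and the reference template samples come from motion compensation; helper names are hypothetical):

```python
def apply_signs(abs_mvd, signs):
    """Distribute the candidate signs over the non-zero MVD components;
    zero components keep no sign."""
    it = iter(signs)
    return tuple(next(it) * c if c != 0 else 0 for c in abs_mvd)

def tm_cost(template, ref_template):
    """SAD between the current-template samples T and the reference-template
    samples RT located by Mv[idx1] + sign[idx2]*(mvd)."""
    return sum(abs(a - b) for a, b in zip(template, ref_template))
```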
According to a second variant embodiment, MVD coding uses enhanced Template Matching (TM) based reordering for candidates and MVD prediction.
Figure 14 illustrates Template Matching (TM) based reordering for enhanced motion vector and motion vector difference prediction. In ECM, an enhanced process (1400) may combine (1420) both the list of candidates (built in 1405) and the MMVD values (built in 1410). This single list then contains several MV candidates (with their reference index) and several mvd value combinations. Note that each candidate may use uni-directional or bi-directional prediction. Thus, the list size may be large. For a list containing uni-directional candidates only, the list size is the number of candidates multiplied by the number of mvd values. For bi-prediction candidates, the list size is up to the number of candidates multiplied by the number of mvd values squared. As for the variant of the sign prediction, a template matching cost (1430) of a pair of MV candidate and mvd candidate is measured by the sum of absolute differences (SAD) between samples of a template T of the current block and their corresponding reference samples (RT0 or RT1) translated by (Mv+mvd)[idx]. Then, the (Mv+mvd) candidates are ordered according to the TM cost. According to this variant, a single index (idx) allows signaling both the MV candidate and the MVD correction as depicted in figure 14. This index is parsed (1460) from the bitstream and allows selecting the pair of MV and mvd candidates for inter prediction.
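The list sizes discussed above can be checked with a small helper (a sketch; the exact list construction in ECM differs):

```python
def combined_list_size(n_candidates, n_mvd, bi_prediction):
    """Upper bound on the size of the combined MV+MVD candidate list:
    uni-directional candidates: n_candidates * n_mvd;
    bi-prediction candidates:   up to n_candidates * n_mvd**2."""
    return n_candidates * (n_mvd ** 2 if bi_prediction else n_mvd)
```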
In ECM, in case RPR is enabled, the reference template RT may have a size different from the current template T. Figure 11 illustrates the difference of size between the reference template RT and the current template T when RPR is enabled. Thus, the motion compensation to obtain the reference template may include an implicit re-scaling using the regular RPR process. However, this may induce significant complexity.
In the current ECM, GPM with inter and intra prediction, the Template Matching (TM) based reordering for MMVD and affine MMVD, MVD sign prediction, and enhanced TM based MV prediction do not support RPR.
This is addressed by the general aspects described herein, which are directed to modifications of GPM with inter and intra prediction and of Template Matching (TM) based reordering for inter prediction to support RPR. In a first embodiment, it is proposed to modify the storage of the intra parameters in the MI buffer for GPM with inter and intra predictions when RPR is enabled.
In case the current CU is coded with GPM, for the sub-blocks marked as inter, the intra parameters to store in the current MI buffer (micur) are derived from the reference MI buffer (miref). Denoting (x,y) the location of the sub-block in the current picture, the derivation of the location of the intra parameters to be copied from the reference MI buffer to the current MI buffer (840) is modified as follows: micur(x/si, y/si) = miref( (x+mvX)*rix/si , (y+mvY)*riy/si )
Where:
•	(x,y) is the location of the sub-block in the current picture;
•	(mvX, mvY) is the motion vector associated with the GPM partition coded in inter;
•	si is the sub-sampling ratio of the (current) MI buffer: a block of size (Sx,Sy) in the current picture corresponds to an area (Sx/si, Sy/si) in the current MI buffer;
•	(rix, riy) is the scaling ratio of the reference picture relative to the current picture: rix = (reference picture size X) / (current picture size X), riy = (reference picture size Y) / (current picture size Y).
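The modified derivation of the first embodiment can be sketched as follows (a hypothetical helper; the use of integer truncation is an assumption, an actual implementation may round differently):

```python
def mi_ref_location(x, y, mv, si, ref_size, cur_size):
    """Location in the reference MI buffer of the intra parameters to copy,
    with the RPR scaling ratio (rix, riy) applied to the motion-compensated
    position: miref((x+mvX)*rix/si, (y+mvY)*riy/si)."""
    rix = ref_size[0] / cur_size[0]
    riy = ref_size[1] / cur_size[1]
    return (int((x + mv[0]) * rix) // si, int((y + mv[1]) * riy) // si)
```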
In a second embodiment, it is proposed to adapt the TM based reordering for MVD sign prediction when RPR is enabled. If the current CU is coded in inter mode, it is proposed not to apply the TM based MVD sign prediction if at least one reference picture has a size different from the current picture while RPR is enabled.
Figure 12 illustrates a generic prediction method using TM based reordering for MVD sign when RPR is enabled according to a general aspect of at least one embodiment. First, the reference picture indexes are determined (1210). They can be parsed, or derived from candidates in case of merge, for example. The size of the reference pictures is retrieved (1250) and compared to the size of the current picture. If at least one reference picture used in the prediction has a size different from the size of the current picture, then TM based MVD sign prediction is not applied, and an alternative method is used to derive the mvd (1230). In a first variant, the signed MVD values may be parsed from the bitstream. In a second variant, the TM based method of figure 10 is modified to skip the reordering of the list of sign candidates, that is, step 1020 is followed by step 1050. In a third variant, the TM based method of figure 10 is modified to compute (1030) a cost depending on a processing other than template matching, then the list is reordered (1040) using the computed cost. According to different variants of this alternative cost computation, the cost can be set to a same default value (low or high depending on whether to favor those candidates or not), the cost can be a function of the ratio of the size of the current picture to the size of the reference picture (so as to foster the prediction with more resolution), or the cost can be a function of the POC difference between the current picture and the reference picture (so as to foster the prediction with the nearest POC).
Back to figure 12, if all the reference pictures used in the prediction have the same size as the current picture, then TM based MVD sign prediction can be applied. In other words, the TM based MVD sign prediction is only applied on condition that the additional test (1250) determines that every reference picture used in the prediction has the same size as the current picture. Advantageously, this arrangement allows implementing MVD sign prediction in inter prediction while RPR is enabled, at a limited complexity for computing the TM cost (a condition on the size of the reference pictures and a modified/skipped sign candidate re-ordering).
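The guard of the second embodiment can be sketched as follows (illustrative names; sizes are (width, height) tuples):

```python
def use_tm_sign_prediction(ref_sizes, cur_size, rpr_enabled):
    """Apply TM based MVD sign prediction only when every reference picture
    used by the prediction has the same size as the current picture (1250);
    otherwise fall back, e.g. parse the signed MVD from the bitstream (1230)."""
    if rpr_enabled and any(size != cur_size for size in ref_sizes):
        return False
    return True
```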
Figure 13 illustrates another generic prediction method using TM based reordering for MVD sign when RPR is enabled according to a general aspect of at least one embodiment. In this variant, the comparison between the size of the reference pictures and the size of the current picture is performed before computing the TM cost. According to a variant, if at least one reference picture has a size different from the current picture, then the TM based re-ordering is skipped. Accordingly, the method of figure 10 is modified to add a test (1310) on the size of the reference pictures. If at least one reference picture used in the prediction of the current CU has a size different from the current picture (yes), then step (1030) is not applied (TM based costs are not computed) and the re-ordering (1040) is also not performed. Instead, the MVD is selected in the list of sign candidates using the MVD index parsed from the bitstream. According to another variant, the TM based costs are set to a default value and the re-ordering (1040) is performed based on the default value. Advantageously, this arrangement allows implementing MVD sign prediction in inter prediction while RPR is enabled, at a limited complexity for reordering the list with the TM cost.
In a third embodiment, it is proposed to adapt the enhanced TM based reordering for candidates and MVD prediction with RPR enabled. Figure 15 illustrates a generic prediction method using template matching based reordering for motion vector and motion vector difference adapted to RPR according to a general aspect of at least one embodiment. Accordingly, the method of figure 14 is modified to add a test (1525) on the size of the reference pictures and an alternative TM setting (1535). In case the list contains candidates that may have different reference picture indexes, if at least one reference picture has a size different from the size of the current picture, then the step of TM cost computing (1430) is not applied (TM based costs are not computed) and the TM based costs are set to a default value (1535). The default value may be zero (to favor this candidate, since the coding cost of the index will be small because it is at the top of the re-ordered list) or a maximal value (to disadvantage this candidate, since the coding cost of the index will be high because it is at the bottom of the re-ordered list). As described above, different variants of the cost computation comprise setting the cost to a same default value, determining the cost as a function of the ratio between the size of the current picture and the size of the reference picture, or determining the cost as a function of the POC difference between the current picture and the reference picture. Conversely, if all the reference pictures used by the prediction candidate have the same size as the current picture, step 1430 is performed and the list is re-ordered with the TM based costs of each candidate.
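The default-cost re-ordering of the third embodiment can be sketched as follows (a hypothetical structure; a default cost of 0 favors the size-mismatched candidates, a large value disadvantages them):

```python
def order_by_tm_cost(candidates, cur_size, tm_cost_fn, default_cost):
    """Re-order the combined MV+MVD list (1440): compute the TM cost (1430)
    only for candidates whose reference pictures all match the current picture
    size; assign default_cost to the others (1535)."""
    def cost(cand):
        if all(size == cur_size for size in cand["ref_sizes"]):
            return tm_cost_fn(cand)
        return default_cost
    return sorted(candidates, key=cost)
```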
Figure 16 illustrates a flowchart of an example of decoding using prediction adapted to reference picture re-sampling according to any of the previous embodiments. This method is for example implemented in a decoder 300 of figure 3 or in a decoder 130 of a device 100 of figure 1. In step 1610, a block is predicted according to at least one of the embodiments described above, namely prediction with GPM, prediction using MVD sign TM based reordering or prediction using MV+MVD TM based reordering in any of their variants. In step 1620, picture data of the block of the picture of the video is decoded based on the predicted block.
Figure 17 illustrates a flowchart of an example of encoding using prediction adapted to reference picture re-sampling according to any of the previous embodiments. This method is for example implemented in an encoder 200 of figure 2 or in an encoder 130 of a device 100 of figure 1. In step 1710, a block is predicted according to at least one of the embodiments described above. In step 1720, picture data of the block of the picture of the video is encoded based on the predicted block.
Additional Embodiments and Information
Various methods are described herein, and each of the methods comprises one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is required for proper operation of the method, the order and/or use of specific steps and/or actions may be modified or combined. Additionally, terms such as “first”, “second”, etc. may be used in various embodiments to modify an element, component, step, operation, etc., for example, a “first decoding” and a “second decoding”. Use of such terms does not imply an ordering to the modified operations unless specifically required. So, in this example, the first decoding need not be performed before the second decoding, and may occur, for example, before, during, or in an overlapping time period with the second decoding. Various methods and other aspects described in this application can be used to modify modules, for example, the selection of the motion vectors in motion compensation or motion estimation modules (270, 275, 375), of a video encoder 200 and decoder 300 as shown in figure 2 and figure 3. Moreover, the present aspects are not limited to VVC or HEVC, and can be applied, for example, to other standards and recommendations, and extensions of any such standards and recommendations. Unless indicated otherwise, or technically precluded, the aspects described in this application can be used individually or in combination.
Various numeric values are used in the present application. The specific values are for example purposes and the aspects described are not limited to these specific values.
Various implementations involve decoding. “Decoding,” as used in this application, may encompass all or part of the processes performed, for example, on a received encoded sequence in order to produce a final output suitable for display. In various embodiments, such processes include one or more of the processes typically performed by a decoder, for example, entropy decoding, inverse quantization, inverse transformation, and differential decoding. Whether the phrase “decoding process” is intended to refer specifically to a subset of operations or generally to the broader decoding process will be clear based on the context of the specific descriptions and is believed to be well understood by those skilled in the art.
Various implementations involve encoding. In an analogous way to the above discussion about “decoding”, “encoding” as used in this application may encompass all or part of the processes performed, for example, on an input video sequence in order to produce an encoded bitstream.
Note that the syntax elements as used herein are descriptive terms. As such, they do not preclude the use of other syntax element names.
The implementations and aspects described herein may be implemented as various pieces of information, such as for example syntax, that can be transmitted or stored, for example. This information can be packaged or arranged in a variety of manners, including for example manners common in video standards such as putting the information into an SPS, a PPS, a NAL unit, a header (for example, a NAL unit header, or a slice header), or an SEI message. Other manners are also available, including for example manners common for system level or application level standards such as putting the information into one or more of the following:
• SDP (session description protocol), a format for describing multimedia communication sessions for the purposes of session announcement and session invitation, for example as described in RFCs and used in conjunction with RTP (Real-time Transport Protocol) transmission;
• DASH MPD (Media Presentation Description) Descriptors, for example as used in DASH and transmitted over HTTP, a Descriptor is associated to a Representation or collection of Representations to provide additional characteristic to the content Representation;
• RTP header extensions, for example as used during RTP streaming;
• ISO Base Media File Format, for example as used in OMAF and using boxes which are object-oriented building blocks defined by a unique type identifier and length also known as 'atoms' in some specifications;
• HLS (HTTP live Streaming) manifest transmitted over HTTP. A manifest can be associated, for example, to a version or collection of versions of a content to provide characteristics of the version or collection of versions.
The implementations and aspects described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
Additionally, this application may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application may refer to “accessing” various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information. Additionally, this application may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.
Also, as used herein, the word “signal” refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals a quantization matrix for de-quantization. In this way, in an embodiment the same parameter is used at both the encoder side and the decoder side. Thus, for example, an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.
As will be evident to one of ordinary skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry the bitstream of a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
We describe a number of embodiments. Features of these embodiments can be provided alone or in any combination, across various claim categories and types. Further, embodiments can include one or more of the following features, devices, or aspects, alone or in any combination, across various claim categories and types.

Claims

1. A method for predicting a block of a picture of a video using geometrical partition mode combining at least two partitions comprising storing (800) intra parameters in a motion information buffer of the picture wherein responsive that a subblock of a partition among the at least two partitions is inter, obtaining (840) intra parameter from intra parameter stored in a motion information buffer of a reference picture at a location corresponding to motion compensated location of the subblock of the partition scaled using a ratio of the reference picture relatively to the picture; wherein the reference picture has a size different from the size of the picture.
2. A method for predicting a block of a picture of a video comprising reconstructing a motion vector by adding a motion vector prediction to a motion vector difference wherein at least the motion vector difference is obtained using template-based matching to order a list of candidates and wherein the template-based matching is modified for a reference picture having a size different from a size of the picture.
3. The method of claim 2, further comprising: obtaining (1210) at least one reference picture used in prediction; and in response (1250) that at least one reference picture has a size different from the size of the picture, deriving a sign of the motion vector difference with other method than template based matching motion vector difference sign prediction (1230).
4. The method of any of claims 2 or 3, further comprising in response (1250) that any of at least one reference picture used in prediction has a same size as the size of the picture, deriving a sign of the motion vector difference using template based matching motion vector difference sign prediction (1220).
5. The method of claim 2 further comprising: obtaining (1010) absolute value of the motion vector difference; determining (1020) a list of sign candidates for the motion vector difference; and in response (1310) that at least one reference picture has a size different from the size of the picture, deriving a sign for the motion vector difference from the list of sign candidates for the motion vector difference and a coded index.
6. The method of claim 2 further comprising: obtaining (1010) absolute value of the motion vector difference; determining (1020) a list of sign candidates for the motion vector difference; and in response (1310) that at least one reference picture has a size different from the size of the picture, setting a default value to a template based matching cost for a sign candidate, ordering the list of sign candidates for the motion vector difference according to the template based matching cost, and deriving a sign for the motion vector difference from the ordered list and a coded index.
7. The method of any of claims 2, 5 or 6 further comprising: in response (1310) that any of at least one reference picture used in prediction has a same size as the size of the picture, computing a template based matching cost for a sign candidate of the list of sign candidates for the motion vector difference, ordering the list of sign candidates according to the template based matching cost and deriving a sign for the motion vector difference from the ordered list and a coded index.
8. The method of claim 2, further comprising: determining (1420) a list of candidates combining motion vector and motion vector difference; and in response (1525) to at least one reference picture having a size different from the size of the picture, setting a default value to a template based matching cost for a candidate combining motion vector and motion vector difference, ordering the list of candidates combining motion vector prediction and motion vector difference according to the template based matching cost, and deriving the motion vector from the ordered list and a coded index.
9. The method of any of claims 2 or 8, further comprising: determining (1420) a list of combined candidates for the motion vector and the motion vector difference; and in response (1525) to any of the at least one reference picture used in prediction having the same size as the size of the picture, computing a template based matching cost for a candidate of a list of candidates combining motion vector prediction and motion vector difference for the motion vector, ordering the list of candidates combining motion vector prediction and motion vector difference according to the template based matching cost, and deriving the motion vector from the ordered list and a coded index.
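Claims 8 and 9 extend the same ordering idea from sign candidates to full candidates that combine a motion vector predictor with a signed motion vector difference. A minimal sketch of that reading follows; it assumes one sign per MVD component and hypothetical names throughout, and is not the claimed implementation.

```python
# Assumed placeholder cost used when template matching is skipped
# because a reference picture size differs from the picture size.
DEFAULT_COST = 1 << 30

def derive_motion_vector(mv_predictors, mvd_magnitude, coded_index,
                         refs_same_size, template_cost):
    """Derive a motion vector from combined (predictor, signed MVD) candidates.

    mv_predictors: list of (x, y) motion vector predictors.
    mvd_magnitude: (|mvd_x|, |mvd_y|) absolute MVD components.
    coded_index: index decoded from the bitstream into the ordered list.
    refs_same_size: True if all reference pictures match the picture size.
    template_cost: callable returning a matching cost for a candidate MV.
    """
    # Build the combined candidate list (claim 8 / claim 9, step 1420):
    # each predictor paired with each sign combination of the MVD.
    candidates = []
    for px, py in mv_predictors:
        for sx in (+1, -1):
            for sy in (+1, -1):
                candidates.append((px + sx * mvd_magnitude[0],
                                   py + sy * mvd_magnitude[1]))
    # Claim 8 branch: default cost when template matching is unavailable;
    # claim 9 branch: computed template based matching cost.
    costs = [template_cost(mv) if refs_same_size else DEFAULT_COST
             for mv in candidates]
    # Stable sort: equal default costs preserve the construction order.
    ordered = [mv for _, mv in sorted(zip(costs, candidates),
                                      key=lambda pair: pair[0])]
    return ordered[coded_index]
```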
10. A method for decoding data representative of a block of a picture of a video comprising:
- predicting (1610) a block according to the method of any of claims 1 to 9; and
- decoding (1620) picture data of the block of the picture of the video based on the predicted block.
11. A method for encoding data representative of a block of a picture of a video comprising:
- predicting (1710) a block according to the method of any of claims 1 to 9; and
- encoding (1720) data representative of the block of the picture of the video based on the predicted block.
12. An apparatus (100) comprising a decoder (130) for decoding picture data, the decoder being configured to:
- predict a block according to the method of any of claims 1 to 9; and
- decode picture data of the block of the picture of the video based on the predicted block.
13. An apparatus (100) comprising an encoder (130) for encoding picture data, the encoder being configured to:
- predict a block according to the method of any of claims 1 to 9; and
- encode data representative of the block of the picture of the video based on the predicted block.
14. A computer program comprising program code instructions for implementing the steps of a method according to any of claims 1 to 9 when executed by a processor.
15. A non-transitory program storage device, readable by a computer, tangibly embodying a program of instructions executable by the computer for performing the method according to any one of claims 1 to 9.
PCT/EP2023/058736 2022-04-07 2023-04-04 Video encoding and decoding using reference picture resampling WO2023194334A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP22305485.9 2022-04-07
EP22305485 2022-04-07

Publications (1)

Publication Number Publication Date
WO2023194334A1

Family

ID=81388840

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/058736 WO2023194334A1 (en) 2022-04-07 2023-04-04 Video encoding and decoding using reference picture resampling

Country Status (1)

Country Link
WO (1) WO2023194334A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210195172A1 (en) * 2019-12-20 2021-06-24 Qualcomm Incorporated Reference picture scaling ratios for reference picture resampling in video coding


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BORDES (INTERDIGITAL) P ET AL: "EE2-related: bug fixes for enabling RPR in ECM", no. JVET-X0121 ; m57921, 8 October 2021 (2021-10-08), XP030298006, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/24_Teleconference/wg11/JVET-X0121-v2.zip JVET-X0121-v2.docx> [retrieved on 20211008] *
BROWNE A ET AL: "Algorithm description for Versatile Video Coding and Test Model 16 (VTM 16)", no. JVET-Y2002 ; m59197, 30 March 2022 (2022-03-30), XP030302159, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/25_Teleconference/wg11/JVET-Y2002-v1.zip JVET-Y2002-v1.docx> [retrieved on 20220330] *
KIDANI (KDDI) Y ET AL: "EE2-related: Combination of JVET-X0078 (Test 7/8), JVET-X0147 (Proposal-2), and GPM direct motion storage", no. JVET-X0166 ; m58206, 11 October 2021 (2021-10-11), XP030298118, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/24_Teleconference/wg11/JVET-X0166-v3.zip JVET-X0166-v3_clean.docx> [retrieved on 20211011] *
SEREGIN (QUALCOMM) V ET AL: "EE2: Summary Report on Enhanced Compression beyond VVC capability", no. JVET-Y0024, 12 January 2022 (2022-01-12), XP030300227, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/25_Teleconference/wg11/JVET-Y0024-v1.zip JVET-Y0024-v1.docx> [retrieved on 20220112] *
ZHANG (QUALCOMM) Z ET AL: "Non-EE2: Fixing issues for RPR enabling and non-CTC configuration in ECM", no. JVET-Y0128, 14 January 2022 (2022-01-14), XP030300461, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/25_Teleconference/wg11/JVET-Y0128-v4.zip JVET-Y0128-v4.docx> [retrieved on 20220114] *

Similar Documents

Publication Publication Date Title
US20210051342A1 (en) Method and apparatus for video encoding and decoding based on a linear model responsive to neighboring samples
US20220345744A1 (en) Secondary transform for video encoding and decoding
CN112970264A (en) Simplification of coding modes based on neighboring sample-dependent parametric models
US20230095387A1 (en) Neural network-based intra prediction for video encoding or decoding
US20220014778A1 (en) Unified process and syntax for generalized prediction in video coding/decoding
WO2022063729A1 (en) Template matching prediction for versatile video coding
WO2022167322A1 (en) Spatial local illumination compensation
WO2021130025A1 (en) Estimating weighted-prediction parameters
US20210360273A1 (en) Method and apparatus for video encoding and decoding using list of predictor candidates
US20230018401A1 (en) Motion vector prediction in video encoding and decoding
WO2023194334A1 (en) Video encoding and decoding using reference picture resampling
CN112703733A (en) Translation and affine candidates in a unified list
US20230024223A1 (en) Intra sub partitions for video encoding and decoding combined with multiple transform selection, matrix weighted intra prediction or multi-reference-line intra prediction
US20230262268A1 (en) Chroma format dependent quantization matrices for video encoding and decoding
WO2023072554A1 (en) Video encoding and decoding using reference picture resampling
KR20240099324A (en) Video encoding and decoding using reference picture resampling
WO2023046463A1 (en) Methods and apparatuses for encoding/decoding a video
WO2023046917A1 (en) Methods and apparatus for dmvr with bi-prediction weighting
WO2024099962A1 (en) ENCODING AND DECODING METHODS OF INTRA PREDICTION MODES USING DYNAMIC LISTS OF MOST PROBABLE MODEs AND CORRESPONDING APPARATUSES
WO2023052141A1 (en) Methods and apparatuses for encoding/decoding a video
WO2022101018A1 (en) A method and an apparatus for encoding or decoding a video
WO2024126045A1 (en) Methods and apparatuses for encoding and decoding an image or a video
WO2024083500A1 (en) Methods and apparatuses for padding reference samples
WO2022268623A1 (en) Template-based intra mode derivation
WO2020260310A1 (en) Quantization matrices selection for separate color plane mode

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23716546

Country of ref document: EP

Kind code of ref document: A1