CN117397240A

CN117397240A - System and method for explicit signaling of jointly encoded scaling factors for motion vector differences

Info

Publication number: CN117397240A
Application number: CN202280021655.XA
Authority: CN
Inventors: 赵亮; 赵欣; 刘杉
Original assignee: Tencent America LLC
Current assignee: Tencent America LLC
Priority date: 2022-04-06
Filing date: 2022-11-10
Publication date: 2024-01-12

Abstract

Systems and methods are provided for receiving an encoded video bitstream including a current frame, first and second reference frames associated with a current block in the current frame, a Joint Motion Vector Difference (JMVD) associated with the first and second reference frames, and a flag indicating whether one or more scaling factors are used to derive a Motion Vector Difference (MVD) from the JMVD. A first MVD/second MVD associated with the first reference frame/the second reference frame may be derived based on applying a first scaling factor/a second scaling factor of the one or more scaling factors to the JMVD or based on a distance between the first reference frame/the second reference frame and the current frame. The current block may be decoded based on the first MVD and the second MVD.

Description

System and method for explicit signaling of jointly encoded scaling factors for motion vector differences

Cross Reference to Related Applications

The present application claims the benefit of priority from U.S. Provisional Patent Application No. 63/328,062, filed with the U.S. Patent and Trademark Office in April 2022, and from U.S. Patent Application No. 17/983,089, filed with the U.S. Patent and Trademark Office in November 2022.

Technical Field

The present disclosure relates to advanced image and video coding techniques, and more particularly, to systems and methods for explicit signaling of jointly encoded scaling factors for motion vector differences.

Background

Streaming audiovisual content is becoming increasingly popular. Considerable bandwidth is required to support this growth in the amount and quality of streaming content. There is therefore a need for efficient encoding and decoding schemes that use less bandwidth while maintaining high quality. For example, H.265/HEVC (High Efficiency Video Coding), VP9, and AOMedia Video 1 (AV1) are coding schemes developed for this purpose.

The H.265/HEVC standard was published by ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) VCEG (Video Coding Experts Group, Q6/16) and ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) MPEG (Moving Picture Experts Group, JTC 1/SC 29/WG 11) in 2013 (version 1), 2014 (version 2), 2015 (version 3), and 2016 (version 4).

AOMedia Video 1 (AV1) is an open video coding format designed for video transmission over the Internet. It was developed as a successor to VP9 by the Alliance for Open Media (AOMedia), a consortium founded in 2015 that includes semiconductor companies, video-on-demand providers, video content producers, software development companies, and web browser vendors. Many of the components of the AV1 project come from previous research work by Alliance members. Individual contributors had started experimental technology platforms years earlier: Xiph's/Mozilla's Daala published code in 2010, Google's experimental VP9 evolution project VP10 was announced on September 12, 2014, and Cisco's Thor was published on August 11, 2015. Building on the codebase of VP9, AV1 incorporates additional techniques, several of which were developed in these experimental formats. The first version, 0.1.0, of the AV1 reference codec was published on April 7, 2016. The Alliance announced the release of the AV1 bitstream specification, together with a software-based reference encoder and decoder, on March 28, 2018. A validated version 1.0.0 of the specification was released on June 25, 2018, and a validated version 1.0.0 with Errata 1 was released on January 8, 2019. The AV1 bitstream specification includes a reference video codec.

The development of next-generation video codecs is also underway. For example, AOMedia has formally introduced a standard for next-generation video codecs called Versatile Video Coding (VVC).

Disclosure of Invention

According to one aspect of the present disclosure, there is provided a method comprising: receiving an encoded video bitstream, the encoded video bitstream comprising a current frame, a first reference frame and a second reference frame associated with a current block in the current frame, a Joint Motion Vector Difference (JMVD) associated with the first reference frame and the second reference frame, and a flag indicating whether one or more scaling factors are used to derive a Motion Vector Difference (MVD) from the JMVD; deriving a first MVD associated with the first reference frame based on applying a first scaling factor of the one or more scaling factors to the JMVD or based on a distance between the first reference frame and the current frame; deriving a second MVD associated with the second reference frame based on applying a second scaling factor of the one or more scaling factors to the JMVD or based on a distance between the second reference frame and the current frame; and decoding the current block based on the first MVD and the second MVD.

According to other aspects of the disclosure, apparatus and computer-readable media consistent with the method are also provided.

Drawings

The above and other aspects, features and advantages of some embodiments of the present disclosure will become more apparent from the following description in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic diagram illustrating components of one or more devices according to various embodiments.

FIG. 2 is a diagram illustrating a method of explicit signaling of scaling factors for joint coding of motion vector differences, according to various embodiments.

Detailed Description of Embodiments

VVC includes a number of improvements, such as the introduction of joint motion vector difference (JMVD) coding. This new inter prediction coding mode, named JOINT_NEWMV, indicates whether the motion vector differences (MVDs) of the two reference lists are jointly signaled in the bitstream. If the inter prediction mode is equal to JOINT_NEWMV, the MVDs of reference list 0 and reference list 1 are jointly signaled: only one MVD, named joint_mvd, is written into the bitstream and transmitted to the decoder, which derives the MVDs of reference list 0 and reference list 1 from joint_mvd. Typically, the JOINT_NEWMV mode is signaled together with the NEAR_NEWMV, NEW_NEWMV, and GLOBAL_GLOBALMV modes. No additional contexts are added.

When the JOINT_NEWMV mode is signaled and the picture order count (POC) distances between the two reference frames and the current frame differ, the MVD is scaled for reference list 0 or reference list 1 based on the POC distance. For example, if the distance (td0) between the reference frame of list 0 and the current frame is equal to or greater than the distance (td1) between the reference frame of list 1 and the current frame, joint_mvd is used directly for reference list 0, and the MVD of reference list 1 is derived from joint_mvd based on equation (1):

$$\text{derived\_mvd} = \frac{\mathrm{td1}}{\mathrm{td0}} \times \text{joint\_mvd} \qquad (1)$$

Otherwise, if td1 is equal to or greater than td0, joint_mvd is used directly for reference list 1, and the MVD of reference list 0 is derived from joint_mvd based on equation (2):

$$\text{derived\_mvd} = \frac{\mathrm{td0}}{\mathrm{td1}} \times \text{joint\_mvd} \qquad (2)$$
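As a minimal sketch of this distance-based derivation, the Python function below assumes that the derived MVD equals joint_mvd scaled by the ratio of the smaller POC distance to the larger one, as the surrounding text suggests. The function name, the tuple representation of an MVD, and the rounding rule are illustrative assumptions and are not part of the disclosure.

```python
from fractions import Fraction


def derive_mvds_by_distance(joint_mvd, td0, td1):
    """Return (mvd_list0, mvd_list1) derived from a joint MVD.

    joint_mvd is an (x, y) tuple of integers; td0 and td1 are the POC
    distances of the list 0 and list 1 reference frames to the current frame.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    if td0 <= 0 or td1 <= 0:
        raise ValueError("POC distances must be positive in this sketch")
    if td0 >= td1:
        # joint_mvd is used directly for list 0; list 1 is scaled by td1/td0.
        scale = Fraction(td1, td0)
        return joint_mvd, tuple(round(c * scale) for c in joint_mvd)
    # Otherwise joint_mvd is used directly for list 1; list 0 is scaled by td0/td1.
    scale = Fraction(td0, td1)
    return tuple(round(c * scale) for c in joint_mvd), joint_mvd


# Example: list 0 reference is twice as far away as list 1 reference.
mvd0, mvd1 = derive_mvds_by_distance((8, -4), td0=2, td1=1)
```

The sketch ignores any sign convention for references on opposite sides of the current frame, which the text here does not specify.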

When the JMVD mode is selected for a block, the JMVD is written into the bitstream for (or signaled for) two reference frames, and the MVDs of the two reference frames are derived from the JMVD based on the distance between the reference frame and the current frame. This assumes that there is linear motion between the backward reference frame and the forward reference frame relative to the current frame. However, the motion between two reference frames may not always be linear. For example, the motion from the backward reference frame to the forward reference frame may become slower or faster.

According to one aspect of the present disclosure, there is provided a method comprising: receiving an encoded video code stream, the encoded video code stream comprising: a current frame, a first reference frame associated with a current block in the current frame, a second reference frame, a joint Motion Vector Difference (MVD), and a flag signaling one or more scaling factors; deriving a first MVD associated with the first reference frame based on one or more scaling factors or a distance between the first reference frame and the current frame; deriving a second MVD associated with the second reference frame based on one or more scaling factors or a distance between the second reference frame and the current frame; and decoding the current block based on the first MVD and the second MVD.

In some embodiments, the signaling flag includes a pair of scaling factors for the first reference frame and the second reference frame.

In some embodiments, the one or more scaling factors are limited to $2^n$, where n is an integer value.

In some embodiments, the one or more scaling factors are limited to a value of $m/M$, where $M = 2^n$, and m and n are integer values.
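One practical reason to restrict a scaling factor to the form m/M with M a power of two is that the scaling can then be done with an integer multiply and a bit shift. The following Python sketch illustrates this under stated assumptions: the function name is hypothetical, n is taken to be non-negative, and rounding half away from zero is a choice made for this example only.

```python
def scale_by_dyadic_factor(component, m, n):
    """Scale an integer MVD component by m / 2**n using integer arithmetic.

    Rounds to the nearest integer, with halves rounded away from zero
    (an assumption of this sketch, not taken from the disclosure).
    """
    if n < 0:
        raise ValueError("n must be non-negative in this sketch")
    product = component * m
    if n == 0:
        return product
    offset = 1 << (n - 1)  # rounding offset, equal to M / 2
    magnitude = (abs(product) + offset) >> n
    return magnitude if product >= 0 else -magnitude


# Example: scale a component of 8 by 3/4 (m = 3, M = 2**2).
scaled = scale_by_dyadic_factor(8, 3, 2)
```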

In some embodiments, the one or more scaling factors are one scaling factor, and when the signaling flag indicates that the one scaling factor is not equal to the first default value, the one scaling factor is used to derive one of the first MVD and the second MVD, and the scaling factor used to derive the other of the first MVD and the second MVD is a second default value that is different from the first default value.

In some embodiments, the context for signaling the one or more scaling factors is based on coding information of the current block or of one or more neighboring blocks of the current block.

In some embodiments, the context is based on a block size of the current block, wherein the one or more scaling factors include a first set of scaling factors when the block size of the current block is equal to or less than a first threshold size, and the one or more scaling factors include a second set of scaling factors when the block size of the current block is greater than the first threshold size.

In some embodiments, the block size corresponds to one or more of a block width, a block height, a number of pixels in a current block, a minimum block width, a minimum block height, a maximum block width, and a maximum block height.
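The block-size-dependent selection described above can be sketched as follows. This is only an illustration: the function names, the measure labels, and the example sets and threshold are hypothetical, since the text does not give concrete values. In particular, "min_side" and "max_side" are one possible reading of the minimum and maximum block dimensions.

```python
def block_size_measure(width, height, measure="pixels"):
    """Return one of several block size measures (labels are hypothetical)."""
    measures = {
        "width": width,
        "height": height,
        "pixels": width * height,
        "min_side": min(width, height),
        "max_side": max(width, height),
    }
    return measures[measure]


def select_scaling_factor_set(width, height, threshold,
                              first_set, second_set, measure="pixels"):
    """Pick the first set for blocks at or below the threshold, else the second."""
    size = block_size_measure(width, height, measure)
    return first_set if size <= threshold else second_set


# Hypothetical sets and threshold, chosen only for illustration.
SMALL_BLOCK_FACTORS = (1, 2)
LARGE_BLOCK_FACTORS = (1, 2, 4)
chosen = select_scaling_factor_set(8, 8, 64,
                                   SMALL_BLOCK_FACTORS, LARGE_BLOCK_FACTORS)
```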

In some embodiments, the context is based on whether the motion vector prediction (Motion Vector Prediction, MVP) of the current block is symmetric.

In some embodiments, the context is based on an index of Motion Vector Prediction (MVP) candidates of the current block.
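The two MVP-based context choices can be sketched separately, as below. The text does not define "symmetric" or how an index maps to a context, so this sketch assumes that two predictors are symmetric when they have equal magnitude and opposite direction, and that candidate indices are clamped to the number of available contexts. Both function names are hypothetical.

```python
def context_from_symmetry(mvp0, mvp1):
    """Return context 1 if the two MVPs mirror each other, else 0.

    Symmetry is assumed here to mean mvp0 == -mvp1 component-wise.
    """
    symmetric = mvp0[0] == -mvp1[0] and mvp0[1] == -mvp1[1]
    return 1 if symmetric else 0


def context_from_mvp_index(mvp_index, num_contexts):
    """Map an MVP candidate index to a context, clamping to the last context."""
    if mvp_index < 0 or num_contexts < 1:
        raise ValueError("index must be >= 0 and num_contexts >= 1")
    return min(mvp_index, num_contexts - 1)


# Example: mirrored predictors select context 1.
ctx = context_from_symmetry((4, -2), (-4, 2))
```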

In some embodiments, the syntax is signaled in a sequence header, a frame header, or a slice header to indicate whether a signaling flag is included in the encoded bitstream.
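A high-level enable syntax of this kind typically gates whether the block-level flag is parsed at all. The sketch below shows that gating on a list of bits. The function name, the bit-list representation, and the choice to infer the flag as false when it is absent are all assumptions made for illustration.

```python
def read_scaling_flag(bits, position, enabled_in_header):
    """Return (flag, new_position) for the block-level scaling flag.

    bits is a list of 0/1 values. When the header-level syntax disables the
    tool, no bit is consumed and the flag is inferred to be False (assumed).
    """
    if not enabled_in_header:
        return False, position
    return bool(bits[position]), position + 1


# Example: the tool is enabled, so one bit is consumed from the stream.
flag, next_position = read_scaling_flag([1, 0], 0, enabled_in_header=True)
```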

In some embodiments, the first reference frame belongs to a list of backward reference frames and the second reference frame belongs to a list of forward reference frames, or the first reference frame belongs to a list of forward reference frames and the second reference frame belongs to a list of backward reference frames.

In some embodiments, the first MVD and/or the second MVD is derived according to a first equation when td1 is greater than td0, and according to a second equation when td0 is greater than td1 [equations not recoverable from the source text], where td0 corresponds to the distance between the first reference frame and the current frame, and td1 corresponds to the distance between the second reference frame and the current frame.

In some embodiments, the first MVD and/or the second MVD is derived according to a first equation when td1 is greater than td0, and according to a second equation when td0 is greater than td1 [equations not recoverable from the source text], where td0 corresponds to the distance between the current frame and the first reference frame, td1 corresponds to the distance between the current frame and the second reference frame, and M corresponds to $2^n$, where n is an integer.

According to one aspect of the present disclosure, there is provided an apparatus comprising: a memory for storing program code; and at least one processor configured to execute the program code and to operate as instructed by the program code, the program code comprising: receiving code for causing at least one of the at least one processor to receive an encoded video stream, the encoded video stream comprising: a current frame, a first reference frame associated with a current block in the current frame, a second reference frame, a JMVD, and a flag signaling one or more scaling factors; deriving code for causing at least one of the at least one processor to derive a first MVD associated with a first reference frame based on one or more scaling factors or a distance between the first reference frame and a current frame; deriving code for causing at least one of the at least one processor to derive a second MVD associated with a second reference frame based on one or more scaling factors or a distance between the second reference frame and the current frame; and decoding code for causing at least one of the at least one processor to decode the current block based on the first MVD and the second MVD.

In some embodiments, the context for signaling one or more scaling factors is based on coding information of the current block or one or more neighboring blocks of the current block.

According to one aspect of the present disclosure, there is provided a non-transitory computer readable medium storing computer readable program code which, when executed by a processor, causes the processor to at least: receiving an encoded video code stream, the encoded video code stream comprising: a current frame, a first reference frame associated with a current block in the current frame, a second reference frame, a JMVD, and a flag signaling one or more scaling factors; deriving a first MVD associated with the first reference frame based on one or more scaling factors or a distance between the first reference frame and the current frame; deriving a second MVD associated with the second reference frame based on one or more scaling factors or a distance between the second reference frame and the current frame; and decoding the current block based on the first MVD and the second MVD.

The following detailed description of example embodiments refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.

The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the implementations. Furthermore, one or more features or components of one embodiment may be incorporated into or combined with another embodiment (or one or more features of another embodiment). Further, in the flowcharts and descriptions of operations provided below, it should be appreciated that one or more operations may be omitted, one or more operations may be added, one or more operations may be performed concurrently (at least in part), and the order of one or more operations may be switched.

It is to be understood that the systems and/or methods described herein may be implemented in various forms of hardware, firmware, or combinations thereof. The actual specialized control hardware or software code used to implement the systems and/or methods is not limiting of the implementation. Accordingly, the operations and behavior of the systems and/or methods are described herein without reference to the specific software code. It should be understood that software and hardware may be designed to implement the systems and/or methods based on the description herein.

Even if specific combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of possible implementations. Indeed, many of these features may be combined in ways that are not specifically recited in the claims and/or disclosed in the specification. Although each of the dependent claims listed below may depend directly on only one claim, the disclosure of possible implementations includes a combination of each dependent claim with each other claim in the claim set.

No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Furthermore, as used herein, the articles "a" and "an" are intended to include one or more items and may be used interchangeably with "one or more". If only one item is intended, the term "one" or similar language is used. Furthermore, as used herein, the terms "having," "including," and the like are intended to be open-ended terms. Furthermore, the phrase "based on" means "based, at least in part, on" unless explicitly stated otherwise. Furthermore, expressions such as "at least one of [A] and [B]" or "at least one of [A] or [B]" should be understood to include A alone, B alone, or both A and B.

As described above, when the JMVD mode is selected for a block, JMVDs are written into a code stream for two reference frames, and MVDs of the two reference frames are derived from the JMVDs based on the distance between the reference frame and the current frame. This assumes that there is linear motion between the backward reference frame and the forward reference frame relative to the current frame. However, the motion between two reference frames may not always be linear. For example, the motion from the backward reference frame to the forward reference frame may become slower or faster.

A system and method for template matching based scale factor derivation for joint coding of motion vector differences (JMVD) is provided according to various embodiments of the present disclosure. When the JMVD mode is selected for a current block, a template region is defined for the current block and its prediction block in a reference frame list corresponding to each Motion Vector Difference (MVD). The template may be used to derive a prediction scaling factor for the MVD before being used to obtain the prediction block.

FIG. 1 is a schematic diagram illustrating components of one or more devices according to various embodiments. Referring to fig. 1, device 100 may include a bus 110, one or more processors 120, a memory 130, a storage component 140, and a communication interface 150. It should be appreciated that one or more components may be omitted and/or one or more additional components may be included.

Bus 110 includes components that allow communication among the components of device 100. The processor 120 may be implemented in hardware, firmware, or a combination of hardware and software. The processor 120 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Accelerated Processing Unit (APU), a microprocessor, a microcontroller, a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a sparse tensor core, or another type of processing component. Processor 120 may include one or more processors. For example, the processor 120 may include one or more CPUs, APUs, FPGAs, ASICs, sparse tensor cores, or other types of processing components. One or more of the processors 120 can be programmed to perform a function.

Memory 130 includes Random Access Memory (RAM), Read-Only Memory (ROM), and/or other types of dynamic or static storage devices (e.g., flash memory, magnetic memory, and/or optical memory) that store information and/or instructions for use by processor 120.

Storage component 140 stores information and/or software related to the operation and use of device 100. For example, storage component 140 may include a hard disk (e.g., magnetic, optical, magneto-optical, and/or solid state disk), a Compact Disk (CD), a Digital Versatile Disk (DVD), a floppy disk, a magnetic cassette, a magnetic tape, and/or other types of non-transitory computer-readable media, and a corresponding drive.

Communication interface 150 includes transceiver-like components (e.g., a transceiver and/or separate receivers and transmitters) that enable device 100 to communicate with other devices, for example, via a wired connection, a wireless connection, or a combination of wired and wireless connections. Communication interface 150 may allow device 100 to receive information from and/or provide information to another device. For example, communication interface 150 may include an ethernet interface, an optical interface, a coaxial interface, an infrared interface, a Radio Frequency (RF) interface, a Universal Serial Bus (USB) interface, a Wi-Fi interface, a cellular network interface, and so forth.

Device 100 may perform one or more processes or functions described herein. The device 100 may perform operations based on the processor 120 executing software instructions stored by a non-transitory computer readable medium (e.g., the memory 130 and/or the storage component 140). A computer-readable medium is defined herein as a non-transitory memory device. A memory device includes memory space within a single physical storage device or memory space distributed across multiple physical storage devices.

The software instructions may be read into memory 130 and/or storage component 140 from another computer-readable medium or from another device via communication interface 150. The software instructions stored in memory 130 and/or storage component 140, when executed, may cause processor 120 to perform one or more processes described herein.

Additionally or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software.

The number and arrangement of components shown in FIG. 1 are provided as examples. In practice, the device 100 may include more components, fewer components, different components, or differently arranged components than those shown in FIG. 1. Additionally or alternatively, one set of components (e.g., one or more components) of device 100 may perform one or more functions described as being performed by another set of components of device 100.

Any of the operations or processes described below (e.g., fig. 2-3) may be implemented by or using any of the elements shown in fig. 1.

Fig. 2 is a schematic diagram illustrating a method 200 of explicit signaling of a jointly encoded scaling factor for motion vector differences, in accordance with various embodiments.

At 201, method 200 includes receiving an encoded bitstream that includes a current frame, first and second reference frames associated with a current block in the current frame, a Joint Motion Vector Difference (JMVD) associated with the first and second reference frames, and a flag indicating whether one or more scaling factors are used to derive a Motion Vector Difference (MVD) from the JMVD.

In some embodiments, the first reference frame may belong to a list of backward reference frames and the second reference frame may belong to a list of forward reference frames. In some embodiments, the first reference frame may belong to a list of forward reference frames and the second reference frame may belong to a list of backward reference frames.

In some embodiments, the context for signaling one or more scaling factors is based on coding information of the current block or one or more neighboring blocks of the current block. The context may be based on a block size of the current block, wherein the one or more scaling factors include a first set of scaling factors when the block size of the current block is equal to or less than a first threshold size, and the one or more scaling factors include a second set of scaling factors when the block size of the current block is greater than the first threshold size. The block size may correspond to one or more of a block width, a block height, a number of pixels in a current block, a minimum block width, a minimum block height, a maximum block width, and a maximum block height. In some embodiments, the context is based on whether Motion Vector Prediction (MVP) of the current block is symmetric. In some embodiments, the context is based on an index of Motion Vector Prediction (MVP) candidates of the current block.

At 202, the method 200 includes deriving a first MVD associated with a first reference frame. For example, the device 100 may derive the first MVD associated with the first reference frame based on applying a first scaling factor of the one or more scaling factors to the JMVD or based on a distance between the first reference frame and the current frame. In some embodiments, the one or more scaling factors are one scaling factor, and when the device 100 determines that the signaling flag indicates that the one scaling factor is not equal to the first default value, the device 100 uses the one scaling factor to derive one of the first MVD and the second MVD, and the device 100 uses a second default value different from the first default value to derive the other of the first MVD and the second MVD.

In some embodiments, the first MVD is derived according to a first equation when td1 is greater than td0, and according to a second equation when td0 is greater than td1 [equations not recoverable from the source text], where td0 corresponds to the distance between the first reference frame and the current frame, and td1 corresponds to the distance between the second reference frame and the current frame.

In some embodiments, the first MVD and/or the second MVD is derived according to a first equation when td1 is greater than td0, and according to a second equation when td0 is greater than td1 [equations not recoverable from the source text], where td0 corresponds to the distance between the current frame and the first reference frame, td1 corresponds to the distance between the current frame and the second reference frame, and M corresponds to $2^n$, where n is an integer.

At 203, the method 200 includes deriving a second MVD associated with a second reference frame. For example, the device 100 may derive a second MVD associated with a second reference frame based on applying a second scaling factor of the one or more scaling factors to the JMVD or based on a distance between the second reference frame and the current frame.

In some embodiments, the second MVD is derived according to a first equation when td1 is greater than td0, and according to a second equation when td0 is greater than td1 [equations not recoverable from the source text], where td0 corresponds to the distance between the first reference frame and the current frame, and td1 corresponds to the distance between the second reference frame and the current frame.

At 204, the method 200 includes decoding the current block based on the first MVD and the second MVD. For example, the device 100 may decode the current block based on the first MVD and the second MVD.

In some embodiments, the first MVD and/or the second MVD is derived according to a first equation when td1 is greater than td0, and according to a second equation when td0 is greater than td1 [equations not recoverable from the source text], where td0 corresponds to the distance between the current frame and the first reference frame, td1 corresponds to the distance between the current frame and the second reference frame, and M corresponds to $2^n$, where n is an integer.
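Operations 201 to 204 of method 200 can be summarized in a short Python sketch. It is a simplified illustration under several assumptions: parsing is omitted and the parsed values are passed in directly, the distance-based branch scales the JMVD by the ratio of the smaller distance to the larger one, and "decoding" is reduced to reconstructing each motion vector as predictor plus MVD. All function and parameter names are hypothetical.

```python
from fractions import Fraction


def scale(mvd, factor):
    """Scale an (x, y) MVD by a rational factor, rounding to nearest integer."""
    return tuple(round(c * Fraction(factor)) for c in mvd)


def derive_mvds(jmvd, use_scaling_factors, factors=None, td0=1, td1=1):
    """Operations 202 and 203: derive the first and second MVD from the JMVD."""
    if use_scaling_factors:
        # The flag indicates explicitly signaled factors, one per reference frame.
        first_factor, second_factor = factors
        return scale(jmvd, first_factor), scale(jmvd, second_factor)
    # Otherwise fall back to scaling by frame distance (assumed linear motion).
    if td0 >= td1:
        return jmvd, scale(jmvd, Fraction(td1, td0))
    return scale(jmvd, Fraction(td0, td1)), jmvd


def reconstruct_mvs(mvp0, mvp1, mvd0, mvd1):
    """Operation 204, simplified: motion vector = predictor + difference."""
    mv0 = (mvp0[0] + mvd0[0], mvp0[1] + mvd0[1])
    mv1 = (mvp1[0] + mvd1[0], mvp1[1] + mvd1[1])
    return mv0, mv1


# Example with hypothetical values: explicit factors 1 and 1/2.
mvd0, mvd1 = derive_mvds((8, -4), True, factors=(1, Fraction(1, 2)))
mv0, mv1 = reconstruct_mvs((10, 10), (-10, -10), mvd0, mvd1)
```

Keeping the two derivation paths behind a single flag mirrors the structure of the method: the decoder applies the signaled factors when the flag says so and otherwise relies on the frame distances.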

The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise form disclosed. Modifications and variations are possible in light of the above disclosure or may be acquired from practice of the implementations.

Some embodiments may relate to a system, a method, and/or a computer-readable medium at any possible technical detail level of integration. Furthermore, one or more of the components described above may be implemented as instructions stored on a computer-readable medium and executable by at least one processor (and/or may include at least one processor). The computer-readable medium may include a computer-readable non-transitory storage medium (or media) having computer-readable program instructions thereon for causing a processor to perform operations.

A computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: portable computer diskettes, hard disks, Random Access Memories (RAMs), Read-Only Memories (ROMs), Erasable Programmable Read-Only Memories (EPROMs or flash memories), Static Random Access Memories (SRAMs), portable Compact Disc Read-Only Memories (CD-ROMs), Digital Versatile Disks (DVDs), memory sticks, floppy disks, mechanically encoded devices (e.g., punch cards or raised structures in grooves having instructions recorded thereon), and any suitable combination of the foregoing. Computer-readable storage media, as used herein, should not be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., optical pulses through fiber optic cables), or electrical signals transmitted through wires.

The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a corresponding computing/processing device or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical transmission fibers, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program code/instructions for performing operations may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, configuration data for an integrated circuit, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, and the like, and a procedural programming language, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, including, for example, programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects or operations.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having the instructions stored therein comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer readable media according to various embodiments. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). The method, computer system, and computer readable medium may include additional blocks, fewer blocks, different blocks, or differently arranged blocks than those shown in the figures. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims

1. A method, the method comprising:

receiving an encoded video stream, the encoded video stream comprising: a current frame, first and second reference frames associated with a current block in the current frame, a Joint Motion Vector Difference (JMVD) associated with the first and second reference frames, and a flag indicating whether one or more scaling factors are used to derive a Motion Vector Difference (MVD) from the JMVD;

deriving a first MVD associated with the first reference frame based on applying a first scaling factor of the one or more scaling factors to the JMVD or based on a distance between the first reference frame and the current frame;

deriving a second MVD associated with the second reference frame based on applying a second scaling factor of the one or more scaling factors to the JMVD or based on a distance between the second reference frame and the current frame; and

decoding the current block based on the first MVD and the second MVD.

2. The method of claim 1, wherein the signaling flag comprises a pair of scaling factors for the first reference frame and the second reference frame.

3. The method of claim 1, wherein the one or more scaling factors are limited to 2ⁿ, wherein n is an integer value.

4. The method of claim 1, wherein the one or more scaling factors are limited to a value of m/M, where M = 2ⁿ, and m and n are integer values.

5. The method of claim 1, wherein the one or more scaling factors are one scaling factor, and wherein, when the signaling flag indicates that the one scaling factor is not equal to a first default value, the one scaling factor is used to derive one of the first MVD and the second MVD, and the scaling factor used to derive the other of the first MVD and the second MVD is a second default value different from the first default value.

6. The method of claim 1, wherein the context for signaling the one or more scaling factors is based on coding information of the current block or one or more neighboring blocks of the current block.

7. The method of claim 6, wherein the context is based on a block size of the current block, wherein the one or more scaling factors comprise a first set of scaling factors when the block size of the current block is equal to or less than a first threshold size, and wherein the one or more scaling factors comprise a second set of scaling factors when the block size of the current block is greater than the first threshold size.

8. The method of claim 6, wherein the block size corresponds to one or more of a block width, a block height, a number of pixels in the current block, a minimum block width, a minimum block height, a maximum block width, and a maximum block height.

9. The method of claim 6, wherein the context is based on whether Motion Vector Prediction (MVP) of the current block is symmetric.

10. The method of claim 6, wherein the context is based on an index of Motion Vector Prediction (MVP) candidates of the current block.

11. The method of claim 1, wherein a syntax is signaled in a sequence header, a frame header, or a slice header to indicate whether the signaling flag is included in the encoded bitstream.

12. The method of claim 1, wherein the first reference frame belongs to a list of backward reference frames and the second reference frame belongs to a list of forward reference frames, or wherein the first reference frame belongs to a list of forward reference frames and the second reference frame belongs to a list of backward reference frames.

13. The method of claim 1, wherein, when td1 is greater than td0, the first MVD and/or the second MVD is derived according to [equation not reproduced in the source text]; and when the td0 is greater than the td1, the first MVD and/or the second MVD is derived according to [equation not reproduced in the source text],

wherein the td0 corresponds to a distance between the first reference frame and the current frame, and the td1 corresponds to a distance between the second reference frame and the current frame.

14. The method of claim 1, wherein, when td1 is greater than td0, the first MVD and/or the second MVD is derived according to [equation not reproduced in the source text]; and when the td0 is greater than the td1, the first MVD and/or the second MVD is derived according to [equation not reproduced in the source text],

wherein the td0 corresponds to a distance between the current frame and the first reference frame, the td1 corresponds to a distance between the current frame and the second reference frame, and M corresponds to 2ⁿ, wherein n is an integer.

15. An apparatus, the apparatus comprising:

a memory for storing program code; and

at least one processor configured to execute the program code and to operate as instructed by the program code, the program code comprising:

receiving code for causing at least one of the at least one processor to receive an encoded video stream, the encoded video stream comprising: a current frame, first and second reference frames associated with a current block in the current frame, a Joint Motion Vector Difference (JMVD) associated with the first and second reference frames, and a flag indicating whether one or more scaling factors are used to derive a Motion Vector Difference (MVD) from the JMVD;

deriving code for causing at least one of the at least one processor to derive a first MVD associated with the first reference frame based on applying a first scaling factor of the one or more scaling factors to the JMVD or based on a distance between the first reference frame and the current frame;

deriving code for causing at least one of the at least one processor to derive a second MVD associated with the second reference frame based on applying a second scaling factor of the one or more scaling factors to the JMVD or based on a distance between the second reference frame and the current frame; and

decoding code for causing at least one of the at least one processor to decode the current block based on the first MVD and the second MVD.

16. The apparatus of claim 15, wherein the one or more scaling factors are one scaling factor, and wherein, when the signaling flag indicates that the one scaling factor is not equal to a first default value, the one scaling factor is used to derive one of the first MVD and the second MVD, and the scaling factor used to derive the other of the first MVD and the second MVD is a second default value different from the first default value.

17. The apparatus of claim 15, wherein the context for signaling the one or more scaling factors is based on coding information of the current block or one or more neighboring blocks of the current block.

18. A non-transitory computer-readable medium storing computer readable program code which, when executed by a processor, causes the processor to at least:

decode the current block based on the first MVD and the second MVD.

19. The non-transitory computer-readable medium of claim 18, wherein the one or more scaling factors are one scaling factor, and wherein when the signaling flag indicates that the one scaling factor is not equal to a first default value, the one scaling factor is used to derive one of the first MVD and the second MVD, and wherein the scaling factor used to derive the other of the first MVD and the second MVD is a second default value that is different from the first default value.

20. The non-transitory computer-readable medium of claim 18, wherein the context for signaling the one or more scaling factors is based on coding information of the current block or one or more neighboring blocks of the current block.
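
For illustration only, and not as part of the claims, the derivation recited in claims 1 and 4 might be sketched in Python as shown below. All names (MV, factor_m_over_M, scale_mv, derive_mvds) are hypothetical and do not come from the disclosure. The distance-based branch is an assumption: the equations of claims 13 and 14 are not legible in this text, so the sketch simply keeps the JMVD for the reference frame that is farther from the current frame and scales it by the ratio of the distances for the nearer one. Distances are treated as unsigned magnitudes, and rounding behavior is likewise an assumption.

```python
from fractions import Fraction
from typing import NamedTuple, Optional, Tuple


class MV(NamedTuple):
    """A motion vector difference with horizontal and vertical components."""
    x: int
    y: int


def factor_m_over_M(m: int, n: int) -> Fraction:
    """Scaling factor of the form m/M with M = 2**n (the form recited in claim 4)."""
    if n < 0:
        raise ValueError("n must be non-negative in this sketch")
    return Fraction(m, 1 << n)


def scale_mv(mv: MV, factor: Fraction) -> MV:
    """Scale both components by an exact rational factor, rounding to an integer."""
    return MV(round(mv.x * factor), round(mv.y * factor))


def derive_mvds(
    jmvd: MV,
    use_explicit_scaling: bool,
    scale0: Optional[Fraction] = None,
    scale1: Optional[Fraction] = None,
    td0: int = 1,
    td1: int = 1,
) -> Tuple[MV, MV]:
    """Derive (first MVD, second MVD) from a joint MVD.

    use_explicit_scaling models the signaled flag of claim 1.
    td0 and td1 are the distances from the current frame to the first
    and second reference frames.
    """
    if use_explicit_scaling:
        if scale0 is None or scale1 is None:
            raise ValueError("explicit scaling needs both scaling factors")
        return scale_mv(jmvd, scale0), scale_mv(jmvd, scale1)

    # ASSUMPTION (not taken from the source): the farther reference frame
    # keeps the JMVD unchanged, the nearer one gets a distance-scaled copy.
    if td0 >= td1:
        return jmvd, scale_mv(jmvd, Fraction(td1, td0))
    return scale_mv(jmvd, Fraction(td0, td1)), jmvd


if __name__ == "__main__":
    jmvd = MV(8, -4)
    # Explicit factors 1/2 and 4/2 signaled for the two reference frames.
    print(derive_mvds(jmvd, True, factor_m_over_M(1, 1), factor_m_over_M(4, 1)))
    # Flag off: fall back to the assumed distance-based derivation.
    print(derive_mvds(jmvd, False, td0=4, td1=2))
```

Exact rational arithmetic is used here only to keep the sketch easy to check; a practical decoder would more likely apply m/M with an integer multiply and a right shift by n, which is one reason to restrict M to a power of two.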