WO2017194756A1 - Methods and arrangements for coding and decoding motion vectors - Google Patents


Info

Publication number
WO2017194756A1
Authority
WO
WIPO (PCT)
Prior art keywords
mvp
pixel resolution
candidate
motion vector
corresponding difference
Application number
PCT/EP2017/061499
Other languages
French (fr)
Inventor
Per Wennersten
Ruoyang YU
Rickard Sjöberg
Usman HAKEEM
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Publication of WO2017194756A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/521 Processing of motion vectors for estimating the reliability of the determined motion vectors or motion vector field, e.g. for smoothing the motion vector field or for correcting motion vectors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy

Definitions

  • the embodiments relate to encoding and decoding of motion vectors.
  • High Efficiency Video Coding (HEVC) is a block-based video codec standardized by ITU-T and ISO/IEC through the Joint Collaborative Team on Video Coding (JCT-VC).
  • Spatial prediction is achieved using intra (I) prediction from within the current picture.
  • Temporal prediction is achieved using inter (P) or bi-directional inter (B) prediction on a block level from previously decoded reference pictures.
  • the difference between the original pixel data and the predicted pixel data is transformed into the frequency domain, quantized, and entropy encoded (e.g., with Context-Adaptive Variable Length Coding (CAVLC) or Context-Adaptive Binary Arithmetic Coding (CABAC)) before being transmitted together with necessary prediction parameters, e.g., mode selections and motion vectors, which are also entropy encoded.
  • the level of quantization is determined by the quantization parameter (QP).
  • the decoder performs entropy decoding, inverse quantization, and inverse transformation to obtain the residual, and then adds the residual to an intra- or inter-prediction to reconstruct a picture.
  • When encoding video, the first picture will typically be encoded as a still image. Subsequent pictures will include references to previous pictures, along with motion vectors indicating the movement between the pictures. Motion vectors will typically be supplied for blocks of pixels, ranging from 4x4 to 64x64 pixels in size, and can constitute up to about 50% of the total bitrate of encoded video.
  • the list of MV prediction candidates contains two separate prediction candidates.
  • the MVs in HEVC are defined in quarter-pixel resolution, so a motion vector with an x-component of 4 corresponds to one full pixel of movement in the horizontal direction.
  • the two MV prediction candidates are rounded to full-pixel positions, and the coded MV difference (the delta) is treated as a full-pixel difference from the selected MV prediction candidate.
  • the coded MV difference is multiplied by four before being added to the selected MV prediction candidate on the decoder side. For example, normally it would be possible to use pixel positions 0 through 8, with 0, 4 and 8 corresponding to full-pixel positions, and the rest being half-pixel or quarter-pixel positions. Using full-pixel coding, they would be rounded to 0, 4 or 8, coded as 0, 1 or 2 and then multiplied by 4 on the decoder side to arrive at 0, 4 and 8 again.
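The full-pixel coding and decoding described above can be sketched as follows. This is an illustrative Python sketch in quarter-pel units; the function names and the tie-breaking direction of the rounding are assumptions, not from the patent.

```python
FULL = 4  # one full pixel = 4 quarter-pel units

def encode_full_pel(pos):
    # round a (non-negative) quarter-pel position to the nearest full-pel
    # position, then code it in full-pel units; ties round upward (assumption)
    rounded = (pos + FULL // 2) // FULL * FULL
    return rounded // FULL

def decode_full_pel(code):
    # the decoder multiplies the coded value by four to get back
    # to quarter-pel units
    return code * FULL

# positions 0, 4 and 8 are full-pel positions: coded as 0, 1 and 2
codes = [encode_full_pel(p) for p in (0, 4, 8)]
decoded = [decode_full_pel(c) for c in codes]
```

With this sketch, a quarter-pel position such as 3 is first rounded to 4 and then coded as 1.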
  • a first aspect of the embodiments defines a method, performed by a video encoder, for encoding motion vectors.
  • a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate (also referred to as delta).
  • the method comprises selecting a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution.
  • the first pixel resolution is different than the second pixel resolution.
  • the method comprises sending an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
  • a second aspect of the embodiments defines a video encoder, for encoding motion vectors.
  • a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the video encoder comprises processing means operative to select a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution.
  • the video encoder comprises processing means operative to send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
  • a third aspect of the embodiments defines a computer program, for encoding motion vectors, wherein a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the computer program comprises code means which, when run on a computer, causes the computer to select a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution.
  • the computer program comprises code means which, when run on a computer, causes the computer to send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
  • a fourth aspect of the embodiments defines a carrier comprising a computer program according to the third embodiment.
  • the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
  • a fifth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program according to the third aspect, stored on the computer readable means.
  • a sixth aspect of the embodiments defines a method, performed by a video decoder, of decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the method comprises decoding an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution.
  • the method comprises decoding the corresponding difference.
  • the method comprises reconstructing the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
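The decoding steps above can be sketched as follows. This is a minimal illustrative Python sketch with assumed names, all values in quarter-pel units.

```python
def reconstruct_mv(mvp_candidates, index, diff):
    # MV = MVP candidate selected by the decoded index, plus the decoded
    # corresponding difference, added per component
    px, py = mvp_candidates[index]
    dx, dy = diff
    return (px + dx, py + dy)

# e.g. candidate list [(4, 4), (4, 6)], decoded index 1, decoded delta (0, 2)
mv = reconstruct_mv([(4, 4), (4, 6)], 1, (0, 2))
```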
  • a seventh aspect of the embodiments defines a video decoder, for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the video decoder comprises processing means operative to decode an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution.
  • the video decoder comprises processing means operative to decode the corresponding difference.
  • the video decoder comprises processing means operative to reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
  • An eighth aspect of the embodiments defines a computer program for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the computer program comprises code means which, when run on a computer, causes the computer to decode an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution.
  • the computer program comprises code means which, when run on a computer, causes the computer to decode the corresponding difference.
  • the computer program comprises code means which, when run on a computer, causes the computer to reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
  • a ninth aspect of the embodiments defines a carrier comprising a computer program according to the eighth embodiment.
  • the carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
  • a tenth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program according to the ninth aspect, stored on the computer readable means.
  • the video encoder and/or the video decoder can be implemented in a server or a user device, e.g., in a server at a content distributor.
  • the user device may be a camera, a mobile phone, tablet, laptop, etc.
  • At least some of the embodiments provide that more possible MVs can be signaled in the full-pixel mode, without requiring more bits to be coded. Tests show that this provides a compression efficiency improvement of 0.1%, while having a relatively low complexity cost.
  • any feature of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth and tenth aspects may be applied to any other aspect, whenever appropriate.
  • any advantage of the first aspect may equally apply to the second, third, fourth, fifth, sixth, seventh, eighth, ninth and tenth aspect respectively, and vice versa.
  • Figures 1A and 1B illustrate examples of motion vector prediction according to the prior art.
  • Figure 1C illustrates an example of motion vector prediction according to an embodiment of the present invention.
  • Figure 2A illustrates the steps performed in an encoding method according to the embodiments of the present invention.
  • Figure 2B illustrates the steps performed in a decoding method according to the embodiments of the present invention.
  • Figure 3 illustrates how a motion vector prediction candidate is rounded according to an embodiment of the present invention.
  • Figures 4A, 5A, 6A and 7A depict a schematic block diagram illustrating functional units of a video encoder for encoding a motion vector according to embodiments of the present invention.
  • Figures 4B, 5B, 6B and 7B depict a schematic block diagram illustrating functional units of a video decoder for decoding a motion vector according to embodiments of the present invention.
  • Figures 8A and 8B are schematic diagrams illustrating an example of how functionality can be distributed or partitioned between different network devices according to embodiments of the present invention.
  • the resolution of MVs in HEVC is generally quarter-pixel, but full-pixel resolution is occasionally used to reduce the cost of signaling the MVs.
  • the MV prediction candidates are rounded to full-pixel positions before a full-pixel motion vector difference is added. This gives final MVs at full-pixel positions, which is beneficial because no filtering (i.e., interpolation) is required during motion compensation. Filtering in this context could imply weighted averaging.
  • An example of filtering is when the full-pixel values are taken from the previous frame and interpolated in order to guess the sub-pixel values.
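As a simplified illustration of such interpolation, a half-pel sample can be guessed by averaging two neighbouring full-pel samples. HEVC itself uses longer (7/8-tap) interpolation filters; the two-tap average below is only a sketch.

```python
def half_pel_sample(a, b):
    # average of two neighbouring full-pel sample values, with a rounding
    # offset; a stand-in for the longer filters used in real codecs
    return (a + b + 1) // 2

val = half_pel_sample(10, 20)  # guessed sample halfway between the two
```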
  • Figure 1A illustrates a MV predictor and some possible final MVs after a full-pixel delta is applied.
  • the circle grid represents full-pixel positions.
  • the big circle corresponds to the MV predictor.
  • if the MV delta is in full-pixel resolution, the final MVs will also be at full-pixel positions, which are indicated by the small circles.
  • Figure 1B illustrates two MV predictors, both rounded to full-pixel, with a full-pixel delta applied. Rounding both MV prediction candidates to full-pixel positions means that only full-pixel positions can be signaled for either candidate, as illustrated in Figure 1B.
  • the additional MV predictor is shown with the big cross. The possible final MVs when using this additional MV predictor are indicated by the small crosses. As can be seen, the number of possible final MV positions does not increase.
  • a method, performed by a video encoder, for encoding motion vectors is provided, as described in Figure 2A.
  • a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate, the difference also being referred to as delta.
  • the method comprises a step SI of selecting a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution.
  • the method comprises a step S2 of sending an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
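A minimal sketch of steps S1 and S2, assuming the candidates have already been rounded as described and that the encoder simply minimizes the delta magnitude (a real encoder would use a rate-distortion cost; all names are illustrative):

```python
def select_mvp(mv, rounded_candidates):
    # step S1: select the candidate that minimizes the size of the
    # corresponding difference (delta); a simplistic stand-in cost
    def diff(c):
        return (mv[0] - c[0], mv[1] - c[1])
    index = min(range(len(rounded_candidates)),
                key=lambda i: sum(abs(d) for d in diff(rounded_candidates[i])))
    # step S2: the index and the delta are what is signaled to the decoder
    return index, diff(rounded_candidates[index])

index, delta = select_mvp((4, 6), [(4, 4), (4, 6)])
```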
  • the list of MVP candidates consists of two MVP candidates. This is a typical scenario in HEVC and H.264/AVC.
  • the two candidates have indices 0 and 1 in the list of MVP candidates.
  • the first MVP candidate in the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution
  • the second MVP candidate is rounded to the second pixel resolution and the corresponding difference is rounded to the first pixel resolution.
  • only the second MVP candidate is rounded to the second pixel resolution, whereas both differences and the first MVP candidate are rounded to the first pixel resolution.
  • the first pixel resolution is the quarter-pixel resolution and the second pixel resolution is the half-pixel resolution.
  • the list of MVP candidates consists of two MVP candidates, where the first MVP candidate in the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution, whereas the second MVP candidate is rounded to the second pixel resolution and the corresponding difference is rounded to the first pixel resolution.
  • the first resolution is achieved by representing the difference between the motion vector and the first MVP candidate (delta) with full-pixel positions, while the second resolution is achieved for the second MVP candidate by rounding to full-pixel positions only if it is not a half-pixel position. In other words, instead of rounding all values to the nearest value divisible by four, only the quarter-pixel positions (odd values) are rounded, while the half-pixel positions (even values) are kept.
  • the second resolution contains full pixel positions and half-pixel positions while the first resolution contains full pixel positions.
  • the second resolution could be achieved by representing the MV prediction candidate with full pixel positions while the first resolution is achieved for the delta by rounding to full pixel positions only if it is not a half-pixel position. In this case the first resolution contains full pixel positions and half-pixel positions and the second resolution contains only full pixel positions.
  • the counting could be done in different ways, implying that it does not always have to be the odd values that are rounded.
  • the rounding has a sort of bias towards full-pixel positions. If there is 1/16-pel (pixel) resolution, one could still round to half-pel positions, but the directions of rounding of the 15 sub-pel positions could be different.
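A sketch of such biased rounding at 1/16-pel resolution, where one full pixel is 16 units and half-pel positions are multiples of 8. Rounding every value to the nearest half-pel position is only one possible choice of directions; the unit size and tie-breaking below are assumptions.

```python
STEP = 8  # one half pixel = 8 sixteenth-pel units

def biased_round_half_pel(v):
    # round a 1/16-pel value to the nearest half-pel position (multiple of 8),
    # so the 15 finer sub-pel positions in each pixel are biased away
    return (v + STEP // 2) // STEP * STEP
```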
  • Figure 1C shows one MVP candidate rounded to half-pixel precision before the delta is applied, leading to twice as many possible final motion vectors.
  • the different resolutions are achieved by rounding differently for each candidate.
  • it is possible for the video encoder to achieve a different pixel resolution in order to get a different sub-pixel position for the final MV (i.e., the selected MV prediction candidate + the delta).
  • the pixel resolution containing both half-pixel resolution and full-pixel resolution is achieved by performing conditional rounding to full-pixel resolution for, e.g., the second prediction candidate.
  • the rounding of the second prediction candidate (or its associated delta) is done separately for its x- and y-component. If the component is at half-pixel resolution, then no rounding to full-pixel will be performed for this component. Otherwise, if the component is at quarter-pixel resolution, it will be rounded to full-pixel resolution. However, instead of always rounding up or always rounding down, we round to the nearest full-pixel position.
  • the full-pixel positions have a slight advantage: they require no filtering (interpolation) during motion estimation. For example, assuming the two prediction candidates before rounding are (5, 3) and (5, 6), with our proposed rounding, they become (4, 4) and (4, 6).
  • the encoder can then decide whether to use the first candidate, making the final motion vector point to full-pixel resolution, or the second candidate, making the y-component of the final motion vector point to half-pixel resolution.
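The per-component conditional rounding described above can be sketched as follows (quarter-pel units; tie-breaking at exact midpoints is an assumption, as the text only requires rounding to the nearest full-pixel position):

```python
FULL = 4  # one full pixel = 4 quarter-pel units

def round_to_full(v):
    # round to the nearest full-pel position (nearest multiple of 4)
    return (v + FULL // 2) // FULL * FULL

def round_second_candidate(mvp):
    # keep components already at half-pel resolution (even values);
    # round quarter-pel components (odd values) to the nearest full-pel
    return tuple(c if c % 2 == 0 else round_to_full(c) for c in mvp)

# the example from the text: the first candidate (5, 3) is fully rounded,
# the second candidate (5, 6) is conditionally rounded
first = tuple(round_to_full(c) for c in (5, 3))   # becomes (4, 4)
second = round_second_candidate((5, 6))           # becomes (4, 6)
```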
  • a method, performed by a video decoder, of decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate is provided, as described in Figure 2B.
  • the method comprises decoding an index of the MVP candidate in a list of at least two MVP candidates.
  • One of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution.
  • the method comprises decoding the corresponding difference.
  • the method comprises reconstructing the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
  • the list of MVP candidates consists of two MVP candidates.
  • the first MVP candidate in the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution
  • the second MVP candidate is rounded to the second pixel resolution and the corresponding difference is rounded to the first pixel resolution.
  • only the second MVP candidate is rounded to the second pixel resolution
  • both differences and the first MVP candidate are rounded to the first pixel resolution.
  • the first pixel resolution is the quarter-pixel resolution and the second pixel resolution is the half-pixel resolution.
  • the list of MVP candidates consists of two MVP candidates, where the first MVP candidate in the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution, whereas the second MVP candidate is rounded to the second pixel resolution and the corresponding difference is rounded to the first pixel resolution.
  • the first resolution is achieved by representing the difference between the motion vector and the first MVP candidate (delta) with full-pixel positions while the second resolution is achieved for the second MVP candidate by rounding to full pixel positions only if it is not a half-pixel position.
  • a video encoder for encoding motion vectors (MVs) is provided, wherein a motion vector is represented as a sum of a MVP candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the video encoder is configured to select a MVP candidate from a list of at least two MVP candidates. One of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution.
  • the video encoder is configured to send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
  • a video decoder is provided for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the video decoder is configured to decode an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution.
  • the video decoder is configured to decode the corresponding difference.
  • the video decoder is further configured to reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
  • the video encoder encodes motion vectors, wherein a motion vector is represented as a sum of a MVP candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the video encoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the encoder to select a MVP candidate from a list of at least two MVP candidates.
  • One of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution.
  • the first pixel resolution is different than the second pixel resolution.
  • the video encoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the encoder to send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
  • the video decoder decodes a motion vector represented as a sum of a MVP candidate and a corresponding difference between the motion vector and the MVP candidate.
  • the video decoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the video decoder to decode an index of the MVP candidate in a list of at least two MVP candidates.
  • One of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution.
  • the first pixel resolution is different than the second pixel resolution.
  • the video decoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the video decoder to decode the corresponding difference.
  • the video decoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the video decoder to reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
  • the video encoder and the video decoder may be implemented in hardware, in software or a combination of hardware and software.
  • the video encoder and the video decoder may be implemented in, e.g.
  • user equipment, such as a mobile telephone (e.g., a smart phone), tablet, desktop, netbook, multimedia player, video streaming server, set-top box or computer.
  • Figure 4A illustrates a particular hardware implementation of a video encoder 100a according to the embodiments.
  • the video encoder 100a comprises an input unit 104a configured to obtain the video to be encoded.
  • the video encoder 100a may also comprise an output unit 105a configured to output an encoded bitstream.
  • the input unit 104a could be in the form of a general input unit, in particular in the case of a wired connection to external devices.
  • the input unit 104a could be in the form of a receiver or transceiver, in particular in the case of a wireless connection to external devices.
  • the output unit 105a could be in the form of a general output unit, in particular in the case of a wired connection to external devices.
  • the output unit 105a could be in the form of a transmitter or transceiver, in particular in the case of a wireless connection to external devices.
  • the input unit 104a is preferably connected to the encoding unit 101a to forward the video to be encoded thereto.
  • the encoding unit 101a is preferably connected to the output unit 105a to forward the encoded bitstream to a decoder.
  • processing circuitry such as one or more processors or processing units.
  • processing circuitry includes, but is not limited to, one or more microprocessors, one or more Digital Signal Processors (DSPs), one or more Central Processing Units (CPUs), video acceleration hardware, and/or any suitable programmable logic circuitry such as one or more Field Programmable Gate Arrays (FPGAs), or one or more Programmable Logic Controllers (PLCs).
  • Figure 4B illustrates a particular hardware implementation of a video decoder 100b according to the embodiments.
  • the video decoder 100b comprises an input unit 104b configured to obtain the bitstream representing the video to be decoded.
  • the video decoder 100b may also comprise an output unit 105b configured to output a decoded video.
  • the input unit 104b could be in the form of a general input unit, in particular in the case of a wired connection to external devices.
  • the input unit 104b could be in the form of a receiver or transceiver, in particular in the case of a wireless connection to external devices.
  • the output unit 105b could be in the form of a general output unit, in particular in the case of a wired connection to external devices.
  • the output unit 105b could be in the form of a transmitter or transceiver, in particular in the case of a wireless connection to external devices.
  • the input unit 104b is preferably connected to the decoding unit 101b to forward the video bitstream to be decoded thereto.
  • the decoding unit 101b is preferably connected to the output unit 105b.
  • the video encoder 110a may comprise multiple different physical components that make up a single illustrated component, e.g. the input unit 113a may comprise terminals for coupling wires for a wired connection and a radio transceiver for a wireless connection.
  • the video encoder 110a may be composed of multiple physically separate components which may each have their own respective processor, memory, and interface components. In certain scenarios in which the video encoder 110a comprises multiple separate components, one or more of the separate components may be shared among several devices. For example, a single memory unit may be shared by multiple video encoders 110a.
  • the processor 111a may be a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, software and/or encoded logic operable to provide, either alone or in conjunction with other device components such as the memory 112a, the device functionality.
  • the processor 111a may execute instructions stored in the memory 112a.
  • Such functionality may include providing various encoding features and/or any of the other features or benefits disclosed herein.
  • the memory 112a may comprise any form of volatile or non-volatile computer readable memory including, without limitation, persistent memory, solid state memory, remotely mounted memory, magnetic media, optical media, random access memory (RAM), read-only memory (ROM), removable media, or any other suitable local or remote memory component.
  • the memory 112a may store any suitable instructions, data or information, including software and encoded logic, utilized by the device 110a.
  • the memory 112a may be used to store any calculations made by the processor 111a and/or any data received via the I/O interfaces, including the input unit 113a.
  • the video encoder 110a also comprises an input unit 113a and an output unit 114a, i.e. I/O interfaces, which may be used in the wired or wireless communication of video and/or data to and from the video encoder 110a.
  • the I/O interfaces may include a radio transmitter and/or receiver that may be coupled to or a part of an antenna.
  • the I/O interfaces may receive video that is to be encoded.
  • the memory 112a may comprise computer readable means on which a computer program can be stored.
  • the computer program may include instructions which cause the processor 111 a, and any operatively coupled entities and devices, such as the input unit 113a, the output unit 114a, and the memory 112a, to execute methods according to video encoding embodiments described herein.
  • the computer program and/or computer program product may thus provide means for performing any steps herein disclosed.
  • Each functional module may comprise software, computer programs, sub-routines, libraries, source code, or any other form of executable instructions that are executed by, for example, a processor.
  • each functional module may be implemented in hardware and/or in software.
  • one or more or all functional modules may be implemented by the processor 111a, possibly in cooperation with the memory 112a.
  • the processor 111a and the memory 112a may, thus, be arranged to allow the processor 111a to fetch instructions from the memory 112a and execute the fetched instructions to allow the respective functional module to perform any steps or functions disclosed herein.
  • the components of Figure 5B are depicted as single boxes located within a single larger box.
  • the video decoder 110b may comprise multiple different physical components that make up a single illustrated component, e.g. the input unit 113b may comprise terminals for coupling wires for a wired connection and a radio transceiver for a wireless connection.
  • the video decoder 110b may be composed of multiple physically separate components which may each have their own respective processor, memory, and interface components.
  • where the video decoder 110b comprises multiple separate components, one or more of the separate components may be shared among several devices. For example, a single memory unit may be shared by multiple video decoders 110b.
  • the processor 111b may be a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, software and/or decoding logic operable to provide, either alone or in conjunction with other device components, such as the memory 112b, device functionality.
  • the processor 111b may execute instructions stored in the memory 112b.
  • Such functionality may include providing various decoding features and/or any of the other features or benefits disclosed herein.
  • the memory 112b may comprise any form of volatile or non-volatile computer readable memory including, without limitation, persistent memory, solid state memory, remotely mounted memory, magnetic media, optical media, random access memory (RAM), read-only memory (ROM), removable media, or any other suitable local or remote memory component.
  • the memory 112b may store any suitable instructions, data or information, including software and decoding logic, utilized by the video decoder 110b.
  • the memory 112b may be used to store any calculations made by the processor 111b and/or any data received via the I/O interfaces, including the input unit 113b.
  • the video decoder 110b also comprises an input unit 113b and an output unit 114b, i.e. I/O interfaces, which may be used in the wired or wireless communication of video and/or data to and from the video decoder 110b.
  • the I/O interfaces may include a radio transmitter and/or receiver that may be coupled to or a part of an antenna.
  • the I/O interfaces may receive video that is to be decoded. Any appropriate steps, methods, or functions may be performed through a computer program product that may, for example, be executed by the components and equipment illustrated in the attached figures.
  • the memory 112b may comprise computer readable means on which a computer program can be stored.
  • the computer program may include instructions which cause the processor 111b, and any operatively coupled entities and devices, such as the input unit 113b, the output unit 114b, and the memory 112b, to execute methods according to embodiments described herein.
  • the computer program and/or computer program product may thus provide means for performing any steps herein disclosed.
  • Each functional module may comprise software, computer programs, sub-routines, libraries, source code, or any other form of executable instructions that are executed by, for example, a processor.
  • each functional module may be implemented in hardware and/or in software.
  • one or more or all functional modules may be implemented by the processor 111b, possibly in cooperation with the memory 112b.
  • the processor 111b and the memory 112b may, thus, be arranged to allow the processor 111b to fetch instructions from the memory 112b and execute the fetched instructions to allow the respective functional module to perform any steps or functions disclosed herein.
  • FIG. 6A is a schematic block diagram illustrating an example of a user equipment (UE) 200a comprising a processor 210a, an associated memory 220a and a communication circuitry 230a.
  • a computer program 240a which is loaded into the memory 220a for execution by processing circuitry including one or more processors 210a.
  • the processor 210a and the memory 220a are interconnected to each other to enable normal software execution.
  • a communication circuitry 230a is also interconnected to the processor 210a and/or the memory 220a to enable input and/or output of video data and tune-in or seek requests.
  • the user equipment 200a can be any device or apparatus that can receive and process video data, i.e., encode motion vectors according to the embodiments described above.
  • the user equipment 200a could be a computer, either stationary or portable, such as a laptop, a smart phone, a tablet, a set-top box, etc.
  • the term 'processor' should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
  • the processing circuitry including one or more processors is thus configured to perform, when executing the computer program, well-defined processing tasks such as those described herein.
  • the processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedure and/or blocks, but may also execute other tasks.
  • the proposed technology also provides a carrier 250a comprising the computer program 240a.
  • the carrier 250a is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium 250a.
  • the software or computer program 240a may be realized as a computer program product, which is normally carried or stored on a computer-readable medium 240a, preferably nonvolatile computer-readable storage medium 250a.
  • the computer-readable medium 250a may include one or more removable or non-removable memory devices including, but not limited to, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Blu-ray disc, a Universal Serial Bus (USB) memory, a Hard Disk Drive (HDD) storage device, a flash memory, a magnetic tape, or any other conventional memory device.
  • the computer program 240a may thus be loaded into the operating memory of a computer or equivalent processing device, represented by the user equipment 200a in Figure 6A, for execution by the processor 210a thereof.
  • FIG. 6B is a schematic block diagram illustrating an example of a user equipment (UE) 200b comprising a processor 210b, an associated memory 220b and a communication circuitry 230b.
  • a computer program 240b which is loaded into the memory 220b for execution by processing circuitry including one or more processors 210b.
  • the processor 210b and the memory 220b are interconnected to each other to enable normal software execution.
  • a communication circuitry 230b is also interconnected to the processor 210b and/or the memory 220b to enable input and/or output of video data and tune-in or seek requests.
  • the user equipment 200b can be any device or apparatus that can receive and process video data, i.e., decode motion vectors according to the embodiments described above.
  • the user equipment 200b could be a computer, either stationary or portable, such as a laptop, a smart phone, a tablet, a set-top box, etc.
  • the term 'processor' should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
  • the processing circuitry including one or more processors is thus configured to perform, when executing the computer program, well-defined processing tasks such as those described herein.
  • the processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedure and/or blocks, but may also execute other tasks.
  • the proposed technology also provides a carrier 250b comprising the computer program 240b.
  • the carrier 250b is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium 250b.
  • the software or computer program 240b may be realized as a computer program product, which is normally carried or stored on a computer-readable medium 240b, preferably nonvolatile computer-readable storage medium 250b.
  • the computer-readable medium 250b may include one or more removable or non-removable memory devices including, but not limited to, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Blu-ray disc, a Universal Serial Bus (USB) memory, a Hard Disk Drive (HDD) storage device, a flash memory, a magnetic tape, or any other conventional memory device.
  • the computer program 240b may thus be loaded into the operating memory of a computer or equivalent processing device, represented by the user equipment 200b in Figure 6B, for execution by the processor 210b thereof.
  • a further aspect of certain embodiments defines a computer program product for a video encoder comprising a computer program 240a for an encoder and a computer readable means 250a on which the computer program 240a for a video encoder is stored.
  • a further aspect of certain embodiments defines a computer program product for a video decoder comprising a computer program 240b for a decoder and a computer readable means 250b on which the computer program 240b for a video decoder is stored.
  • a corresponding device may be defined as a group of function modules, where each step performed by the processor corresponds to a function module.
  • the function modules are implemented as a computer program running on the processor.
  • the device may alternatively be defined as a group of function modules, where the function modules are implemented as a computer program running on at least one processor.
  • the computer program residing in memory may thus be organized as appropriate function modules configured to perform, when executed by the processor, at least part of the steps and/or tasks described herein.
  • An example of such function modules is illustrated in Figures 7A and 7B.
  • FIG. 7A is a schematic block diagram of a video encoder 120a with function modules.
  • the video encoder 120a comprises an encoder 810a further comprising a selection module for selecting a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution, and a sending module for sending an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
  • a further aspect of the embodiments relates to a user equipment comprising a video encoder according to the embodiments, such as illustrated in any of Figs. 4A, 5A or 7A.
  • the user equipment is selected from a group consisting of a computer, a laptop, a desktop, a multimedia player, a video streaming server, a mobile telephone, a smart phone, a tablet and a set-top box.
  • Yet another aspect of the embodiments relates to a signal representing an encoded version wherein the motion vector is encoded according to the present invention.
  • computing services such as hardware and/or software
  • network devices such as network nodes and/or servers
  • functionality can be distributed or re-located to one or more separate physical nodes or servers.
  • the functionality may be re-located or distributed to one or more jointly acting physical and/or virtual machines that can be positioned in separate physical node(s), i.e. in the so-called cloud.
  • cloud computing is a model for enabling ubiquitous on-demand network access to a pool of configurable computing resources such as networks, servers, storage, applications and general or customized services.
  • FIG. 7B is a schematic block diagram of a video decoder 120b with function modules.
  • the video decoder 120b comprises a decoder 820b further comprising a decoding module for decoding an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution.
  • the decoder further comprises a decoding module for decoding the corresponding difference and a reconstruction module for reconstructing the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
  • a further aspect of the embodiments relates to a user equipment comprising a video decoder according to the embodiments, such as illustrated in any of Figures 4B, 5B or 7B.
  • the user equipment is selected from a group consisting of a computer, a laptop, a desktop, a multimedia player, a video streaming server, a mobile telephone, a smart phone, a tablet and a set-top box.
  • Yet another aspect of the embodiments relates to a signal representing a decoded version wherein the motion vector is decoded according to the present invention.
  • computing services such as hardware and/or software
  • network devices such as network nodes and/or servers
  • functionality can be distributed or re-located to one or more separate physical nodes or servers.
  • the functionality may be re-located or distributed to one or more jointly acting physical and/or virtual machines that can be positioned in separate physical node(s), i.e. in the so-called cloud.
  • cloud computing is a model for enabling ubiquitous on-demand network access to a pool of configurable computing resources such as networks, servers, storage, applications and general or customized services.
  • Figure 8A is a schematic diagram illustrating an example of how functionality can be distributed or partitioned between different network devices 300a, 301a, 302a in a general case.
  • the network devices 300a, 301a, 302a may be part of the same wireless communication system, or one or more of the network devices may be so-called cloud-based network devices located outside of the wireless communication system.
  • Figure 8B is a schematic diagram illustrating an example of how functionality can be distributed or partitioned between different network devices 300b, 301b, 302b in a general case.
  • the network devices 300b, 301b, 302b may be part of the same wireless communication system, or one or more of the network devices may be so-called cloud-based network devices located outside of the wireless communication system.

Abstract

There are provided mechanisms for encoding motion vectors, wherein a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate. The method comprises selecting a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The method comprises sending an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder. There are provided mechanisms for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate. The method comprises decoding (S3) an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The method comprises decoding the corresponding difference. The method comprises reconstructing the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.

Description

METHODS AND ARRANGEMENTS FOR CODING AND DECODING MOTION VECTORS
TECHNICAL FIELD
In the embodiments methods and arrangements for video encoding and decoding are provided. In particular, the embodiments relate to encoding and decoding of motion vectors.
BACKGROUND
High Efficiency Video Coding (HEVC) is a block based video codec standardized by the
Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) and the Moving Picture Experts Group (MPEG) that utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction from within the current picture. Temporal prediction is achieved using inter (P) or bi-directional inter (B) prediction on a block level from previously decoded reference pictures. The difference between the original pixel data and the predicted pixel data, referred to as the residual, is transformed into the frequency domain, quantized and entropy encoded (e.g., using Context-Adaptive Variable Length Coding (CAVLC) or Context-Adaptive Binary Arithmetic Coding (CABAC)) before being transmitted together with necessary prediction parameters, e.g., mode selections and motion vectors, which are likewise entropy encoded. By quantizing the transformed residuals, the tradeoff between bitrate and quality of the video may be controlled. The level of quantization is determined by the quantization parameter (QP). The decoder performs entropy decoding, inverse quantization, and inverse transformation to obtain the residual, and then adds the residual to an intra- or inter-prediction to reconstruct a picture.
When encoding video, the first picture will typically be encoded as a still image. Subsequent pictures will include references to previous pictures, along with motion vectors indicating the movement between the pictures. Motion vectors will typically be supplied for blocks of pixels, ranging from 4x4 to 64x64 pixels in size, and can constitute up to about 50% of the total bitrate of encoded video.
Several techniques are used to reduce the size of the encoded motion vectors, to reduce costs, and/or to improve performance. An example of such a technique is motion vector prediction. In motion vector prediction, instead of explicitly signaling the motion vector to be used in a block, the encoder and decoder follow an identical procedure in order to construct a list of motion vector (MV) prediction candidates. The encoder then signals which of the MV prediction candidates should be used, and an MV difference between the MV prediction candidate and the desired MV. This difference is also referred to as delta. That implies that the MV prediction candidate + the delta = the desired MV. In HEVC, the list of MV prediction candidates contains two separate prediction candidates.
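The predictor-plus-delta relationship above can be illustrated with a minimal sketch (the function and variable names here are illustrative only, not taken from HEVC or any other specification):

```python
def reconstruct_mv(mvp_candidates, index, delta):
    """Reconstruct a motion vector as predictor + signaled difference.

    mvp_candidates: list of (x, y) predictor candidates, built identically
                    on the encoder and decoder sides.
    index:          which candidate the encoder signaled.
    delta:          (dx, dy) difference between the desired MV and the
                    selected predictor.
    """
    px, py = mvp_candidates[index]
    dx, dy = delta
    return (px + dx, py + dy)

# HEVC-style list of two prediction candidates (quarter-pixel units).
candidates = [(4, -2), (6, 0)]

# To signal the desired MV (7, -1), the encoder sends index 0 and
# delta (3, 1); the decoder reconstructs the same vector:
assert reconstruct_mv(candidates, 0, (3, 1)) == (7, -1)
```

Because both sides derive the same candidate list, only the index and the (usually small) delta need to be transmitted.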
Furthermore, the MVs in HEVC are defined in quarter-pixel resolution, so a motion vector with an x-component of 4 corresponds to one full pixel of movement in the horizontal direction.
In the current draft of JEM, the working model for a successor to HEVC, it is possible to specify for certain areas that full-pixel resolution should be used for MVs instead. In this case, the two MV prediction candidates are rounded to full-pixel positions, and the coded MV difference (the delta) is treated as a full-pixel difference from the selected MV prediction candidate. In terms of quarter-pixel positions the two MV prediction candidates are rounded to values divisible by four, and the coded MV difference is multiplied by four before being added to the selected MV prediction candidate on the decoder side. For example, normally it would be possible to use pixel positions 0 through 8, with 0, 4 and 8 corresponding to full-pixel positions, and the rest being half-pixel or quarter-pixel positions. Using full-pixel coding, they would be rounded to 0, 4 or 8, coded as 0, 1 or 2 and then multiplied by 4 on the decoder side to arrive at 0, 4 and 8 again.
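The full-pixel rounding and scaling described above can be sketched as follows for non-negative 1-D quarter-pixel coordinates (the tie-breaking rule when rounding and the function names are our assumptions):

```python
def round_to_full_pixel(v):
    """Round a non-negative quarter-pixel coordinate to the nearest
    full-pixel position, i.e. the nearest multiple of 4 (ties round up)."""
    return 4 * ((v + 2) // 4)

def code_full_pixel_delta(mv, predictor):
    """Encoder side: the difference from the rounded predictor is coded
    in full-pixel units (divided by 4)."""
    return (mv - round_to_full_pixel(predictor)) // 4

def apply_full_pixel_delta(coded_delta, predictor):
    """Decoder side: multiply the coded difference by 4 before adding it
    to the rounded predictor."""
    return round_to_full_pixel(predictor) + 4 * coded_delta

# Quarter-pixel positions 0..8, where 0, 4 and 8 are full-pixel positions:
assert [round_to_full_pixel(v) for v in (0, 4, 8)] == [0, 4, 8]
# A predictor at quarter-pixel position 5 is rounded to 4; a desired
# full-pixel MV of 8 is then coded as delta 1 and recovered exactly:
assert apply_full_pixel_delta(code_full_pixel_delta(8, 5), 5) == 8
```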
Although many techniques are used to reduce the number of bits spent on coding motion vectors, they can still constitute up to 50% of video bitrate, and reducing video bitrate continues to be very important.
SUMMARY
In order to reduce the cost, in terms of a number of bits used for coding of MVs, a method and arrangements for encoding and decoding of motion vectors are provided.
A first aspect of the embodiments defines a method, performed by a video encoder, for encoding motion vectors. A motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate (also referred to as delta). The method comprises selecting a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The method comprises sending an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
A second aspect of the embodiments defines a video encoder, for encoding motion vectors. A motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate. The video encoder comprises processing means operative to select a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The video encoder comprises processing means operative to send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
A third aspect of the embodiments defines a computer program, for encoding motion vectors, wherein a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate. The computer program comprises code means which, when run on a computer, causes the computer to select a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The computer program comprises code means which, when run on a computer, causes the computer to send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
A fourth aspect of the embodiments defines a carrier comprising a computer program according to the third aspect. The carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.
A fifth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program according to the third aspect, stored on the computer readable means.

A sixth aspect of the embodiments defines a method, performed by a video decoder, of decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate. The method comprises decoding an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The method comprises decoding the corresponding difference. The method comprises reconstructing the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
A seventh aspect of the embodiments defines a video decoder, for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate. The video decoder comprises processing means operative to decode an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The video decoder comprises processing means operative to decode the corresponding difference. The video decoder comprises processing means operative to reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
An eighth aspect of the embodiments defines a computer program for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate. The computer program comprises code means which, when run on a computer, causes the computer to decode an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The computer program comprises code means which, when run on a computer, causes the computer to decode the corresponding difference. The computer program comprises code means which, when run on a computer, causes the computer to reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
A ninth aspect of the embodiments defines a carrier comprising a computer program according to the eighth aspect. The carrier is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium.

A tenth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program according to the eighth aspect, stored on the computer readable means.
The video encoder and/or the video decoder can be implemented in a server or a user device, e.g., in a server at a content distributor. The user device may be a camera, a mobile phone, tablet, laptop, etc.
Advantageously, at least some of the embodiments provide that more possible MVs can be signaled in the full-pixel mode, without requiring more bits to be coded. Tests show that this provides a compression efficiency improvement of 0.1%, while having a relatively low complexity cost.
It is to be noted that any feature of the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth and tenth aspects may be applied to any other aspect, whenever appropriate. Likewise, any advantage of the first aspect may equally apply to the second, third, fourth, fifth, sixth, seventh, eighth, ninth and tenth aspect respectively, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims and from the drawings.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the element, apparatus, component, means, step, etc." are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention, together with further objects and advantages thereof, may best be understood by referring to the following description taken together with the accompanying drawings, in which:
Figures 1A and 1B illustrate examples of motion vector prediction according to the prior art.
Figure 1C illustrates an example of motion vector prediction according to an embodiment of the present invention.
Figure 2A illustrates the steps performed in an encoding method according to the embodiments of the present invention. Figure 2B illustrates the steps performed in a decoding method according to the embodiments of the present invention.
Figure 3 illustrates how a motion vector prediction candidate is rounded according to an embodiment of the present invention.
Figures 4A, 5A, 6A and 7A depict a schematic block diagram illustrating functional units of a video encoder for encoding a motion vector according to embodiments of the present invention.
Figures 4B, 5B, 6B and 7B depict a schematic block diagram illustrating functional units of a video decoder for decoding a motion vector according to embodiments of the present invention.
Figures 8 A and 8B are schematic diagrams illustrating an example of how functionality can be distributed or partitioned between different network devices according to embodiments of the present invention.
DETAILED DESCRIPTION
As mentioned above, the resolution of MVs in HEVC is generally quarter-pixel, but full-pixel resolution is occasionally used to reduce the cost of signaling the MVs. When using full-pixel precision, the MV prediction candidates are rounded to full-pixel positions before a full-pixel motion vector difference is added. This gives final MVs at full-pixel positions, which is beneficial because no filtering (i.e., interpolation) is required during motion compensation. Filtering in this context could imply weighted averaging. An example of filtering is when the full-pixel values are coded by the previous frame, but are interpolated in order to guess the sub-pixel values.
Figure 1A illustrates a MV predictor and some possible final MVs after a full-pixel delta is applied. The circle grid represents full-pixel positions. The big circle corresponds to the MV predictor. As the MV delta is in full-pixel resolution, the final MVs also lie at full-pixel positions, which are indicated by the small circles.
Figure 1B illustrates two MV predictors both rounded to full-pixel with a full-pixel delta applied. Rounding to full-pixel positions for both the MV prediction candidates means that only full-pixel positions can be signaled for either candidate, as illustrated in Figure 1B. Compared to Figure 1A, the additional MV predictor is shown with the big cross. The possible final MVs when using this additional MV predictor are indicated by the small crosses. As can be seen, the number of possible final MV positions does not increase. According to one aspect, a method, performed by a video encoder, for encoding motion vectors is provided, as described in Figure 2A. A motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate, the difference also being referred to as delta. The method comprises a step S1 of selecting a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The method comprises a step S2 of sending an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
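Steps S1 and S2 can be sketched in one dimension as follows (choosing the candidate that minimizes the magnitude of the difference is a simplifying assumption for illustration; a real encoder would typically use a rate-distortion cost):

```python
def select_mvp(mv, rounded_candidates):
    """Step S1: pick the MVP candidate giving the smallest difference.

    mv:                 desired motion vector component.
    rounded_candidates: candidate predictors, each already rounded to its
                        own pixel resolution (one to the first resolution,
                        one to the second).
    Returns (index, difference) -- the pair that step S2 would send to
    the decoder.
    """
    index = min(range(len(rounded_candidates)),
                key=lambda i: abs(mv - rounded_candidates[i]))
    return index, mv - rounded_candidates[index]

# Desired MV component 10, candidates rounded to 4 and 6:
assert select_mvp(10, [4, 6]) == (1, 4)
```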
According to an embodiment, the list of MVP candidates consists of two MVP candidates. This is a typical scenario in HEVC and H.264/AVC. The two candidates have indices 0 and 1 in the list of MVP candidates. According to this embodiment, the first MVP candidate in the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution, whereas the second MVP candidate is rounded to the second pixel resolution and the corresponding difference is rounded to the first pixel resolution. Thus, according to this embodiment, only the second MVP candidate is rounded to the second pixel resolution, whereas both differences and the first MVP candidate are rounded to the first pixel resolution.
According to an embodiment, the first pixel resolution is the quarter-pixel resolution and the second pixel resolution is the half-pixel resolution. Similar to the embodiment described above, the list of MVP candidates consists of two MVP candidates, where the first MVP candidate in the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution, whereas the second MVP candidate is rounded to the second pixel resolution and the corresponding difference is rounded to the first pixel resolution. The first resolution is achieved by representing the difference between the motion vector and the first MVP candidate (delta) with full-pixel positions, while the second resolution is achieved for the second MVP candidate by rounding to full-pixel positions only if it is not a half-pixel position. In other words, instead of rounding all values to the nearest value divisible by four, only the odd values (the quarter-pixel positions) are rounded in this way. An example of this is shown in Figure 3, showing a range for the second MVP candidate values of 0 through 8. Here value 1 is rounded to 0, values 3 and 5 are rounded to 4, and value 7 is rounded to 8. The rest are not rounded. Therefore, the second resolution contains full-pixel positions and half-pixel positions, while the first resolution contains only full-pixel positions. It should also be noted that the second resolution could be achieved by representing the MV prediction candidate with full-pixel positions while the first resolution is achieved for the delta by rounding to full-pixel positions only if it is not a half-pixel position. In this case the first resolution contains full-pixel positions and half-pixel positions, and the second resolution contains only full-pixel positions.
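The conditional rounding described above, matching the Figure 3 example, can be expressed compactly. This is a sketch in Python with values in quarter-pel units (even values are half-pel or full-pel positions, odd values are quarter-pel positions):

```python
def conditional_round(v):
    """Round a component to full-pel only if it is not a half-pel position.

    v is in quarter-pel units: multiples of 4 are full-pel positions,
    other even values are half-pel positions, odd values are quarter-pel.
    """
    if v % 2 == 0:
        return v  # already a half-pel or full-pel position: keep it
    # Odd (quarter-pel) value: snap to the nearest multiple of 4 (full-pel).
    return 4 * round(v / 4)
```

Applied to the values 0 through 8 this reproduces the Figure 3 mapping: 1 is rounded to 0, 3 and 5 to 4, and 7 to 8, while the even values are left unchanged.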
It should be noted that the counting (coding) could be done in different ways, implying that it does not always have to be the odd values that are rounded. The rounding has a bias towards full-pixel positions. If there is 1/16-pel (pixel) resolution, one could still round to half-pel positions, but the directions of rounding of the 15 sub-pel positions could be different.
Figure 1C shows one MVP candidate rounded to half-pixel precision before the delta is applied, leading to twice as many possible final motion vectors. According to the embodiments and as illustrated in Figure 1C, the different resolutions are achieved by rounding differently for each candidate. Hence, it is possible for the video encoder to achieve a different pixel resolution in order to get a different sub-pixel position for the final MV (i.e. the selected MV prediction candidate + the delta).
According to an embodiment, the pixel resolution containing both half-pixel resolution and full-pixel resolution is achieved by performing conditional rounding to full-pixel resolution for e.g. the second prediction candidate. The rounding of the second prediction candidate (or its associated delta) is done separately for its x- and y-components. If the component is at half-pixel resolution, then no rounding to full-pixel will be performed for this component. Otherwise, if the component is at quarter-pixel resolution, it will be rounded to full-pixel resolution. However, instead of always rounding up or always rounding down, we round to the nearest full-pixel position.
The reason for this is that the full-pixel positions have a slight advantage: they require no filtering (interpolation) during motion estimation. For example, assuming the two prediction candidates before rounding are (5, 3) and (5, 6), with our proposed rounding they become (4, 4) and (4, 6). The encoder can then decide whether to use the first candidate to make the final motion vector point to full-pixel resolution, or use the second candidate to make the y-component of the final motion vector point to half-pixel resolution.
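The per-component rounding of this example can be sketched as follows, with components in quarter-pel units (an illustrative Python sketch, not the normative procedure):

```python
def round_candidate(mvp):
    """Round each component of an (x, y) MVP candidate independently:
    half-pel components (even, in quarter-pel units) are kept, and
    quarter-pel components (odd) are snapped to the nearest full-pel
    position (the nearest multiple of 4)."""
    return tuple(c if c % 2 == 0 else 4 * round(c / 4) for c in mvp)

# Reproducing the worked example from the text:
# (5, 3) -> (4, 4) and (5, 6) -> (4, 6)
```

Note that Python's `round` rounds exact ties to the nearest even integer, but ties never occur here: an odd quarter-pel value divided by 4 is never exactly halfway between two integers.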
This proposed rounding scheme still benefits from the cost (bitrate) saving of signaling full-pixel motion vector differences, but also offers the possibility of letting the final motion vector have sub-pixel resolution. As in the prior art, this rounding is performed separately for the x- and y-directions. According to one aspect, a method, performed by a video decoder, of decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate is provided, as described in Figure 2B. The method comprises decoding an index of the MVP candidate in a list of at least two MVP candidates. One of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different from the second pixel resolution. The method comprises decoding the corresponding difference. The method comprises reconstructing the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
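The decoder-side reconstruction is a plain sum of the selected (already rounded) candidate and the decoded delta, which can be sketched as (hypothetical names; assumes both vectors are in the same quarter-pel units):

```python
def reconstruct_mv(mvp_list, idx, delta):
    """Sketch of the decoder's reconstruction step.

    mvp_list: MVP candidate list, each candidate already rounded to its
              per-candidate pixel resolution (assumption)
    idx:      decoded index of the selected candidate
    delta:    decoded motion vector difference (dx, dy)
    Returns the reconstructed motion vector (x, y).
    """
    px, py = mvp_list[idx]
    dx, dy = delta
    return (px + dx, py + dy)
```

For this to yield the encoder's motion vector, the decoder must apply exactly the same per-candidate rounding as the encoder, i.e. the two sides must be synchronized as described in the following embodiments.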
According to an embodiment, the list of MVP candidates consists of two MVP candidates. The first MVP candidate in the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution, whereas the second MVP candidate is rounded to the second pixel resolution and the corresponding difference is rounded to the first pixel resolution. Thus, according to this embodiment, only the second MVP candidate is rounded to the second pixel resolution, whereas both differences and the first MVP candidate are rounded to the first pixel resolution. This embodiment assumes that the same resolutions for the MVP candidates and the corresponding differences are used at the encoder side, i.e. that the encoder and the decoder are synchronized on how to represent the MVP candidates and the corresponding differences.
According to an embodiment, the first pixel resolution is the quarter-pixel resolution and the second pixel resolution is the half-pixel resolution. Similar to the embodiment described above, the list of MVP candidates consists of two MVP candidates, where the first MVP candidate in the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution, whereas the second MVP candidate is rounded to the second pixel resolution and the corresponding difference is rounded to the first pixel resolution. The first resolution is achieved by representing the difference between the motion vector and the first MVP candidate (delta) with full-pixel positions, while the second resolution is achieved for the second MVP candidate by rounding to full-pixel positions only if it is not a half-pixel position. Similarly, this embodiment also assumes that the same resolutions for the MVP candidates and the corresponding differences are used at the encoder side, i.e. that the encoder and the decoder are synchronized on how to represent the MVP candidates and the corresponding differences. A video encoder for encoding motion vectors, MV, is provided, wherein a motion vector is represented as a sum of a MVP candidate and a corresponding difference between the motion vector and the MVP candidate. The video encoder is configured to select a MVP candidate from a list of at least two MVP candidates. One of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different from the second pixel resolution. The video encoder is configured to send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
Further, a video decoder for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate is provided. The video decoder is configured to decode an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The video decoder is configured to decode the corresponding difference. The video decoder is further configured to reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
Another aspect of certain embodiments defines a video encoder. The video encoder encodes motion vectors, wherein a motion vector is represented as a sum of a MVP candidate and a corresponding difference between the motion vector and the MVP candidate. The video encoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the encoder to select a MVP candidate from a list of at least two MVP candidates. One of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The video encoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the encoder to send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
Another aspect of certain embodiments defines a video decoder. The video decoder decodes a motion vector represented as a sum of a MVP candidate and a corresponding difference between the motion vector and the MVP candidate. The video decoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the video decoder to decode an index of the MVP candidate in a list of at least two MVP candidates. One of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution. The first pixel resolution is different than the second pixel resolution. The video decoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the video decoder to decode the corresponding difference. The video decoder comprises processing means and a memory comprising instructions which, when executed by the processing means, cause the video decoder to reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
The video encoder and the video decoder may be implemented in hardware, in software or a combination of hardware and software. The video encoder and the video decoder may be implemented in, e.g.
comprised in, user equipment, such as a mobile telephone, such as smart phone, tablet, desktop, netbook, multimedia player, video streaming server, set-top box or computer.
Figure 4A illustrates a particular hardware implementation of a video encoder 100a according to the embodiments. In an embodiment, the video encoder 100a comprises an input unit 104a configured to obtain the video to be encoded.
The video encoder 100a may also comprise an output unit 105a configured to output an encoded bitstream.
The input unit 104a could be in the form of a general input unit, in particular in the case of a wired connection to external devices. Alternatively, the input unit 104a could be in the form of a receiver or transceiver, in particular in the case of a wireless connection to external devices. Correspondingly, the output unit 105a could be in the form of a general output unit, in particular in the case of a wired connection to external devices. Alternatively, the output unit 105a could be in the form of a transmitter or transceiver, in particular in the case of a wireless connection to external devices.
The input unit 104a is preferably connected to the encoding unit 101a to forward the video to be encoded thereto. The encoding unit 101a is preferably connected to the output unit 105a to forward the encoded bitstream to a decoder.
Alternatively, at least some of the steps, functions, procedures, modules and/or blocks described herein may be implemented in software such as a computer program for execution by suitable processing circuitry such as one or more processors or processing units. Examples of processing circuitry include, but are not limited to, one or more microprocessors, one or more Digital Signal Processors (DSPs), one or more Central Processing Units (CPUs), video acceleration hardware, and/or any suitable programmable logic circuitry such as one or more Field Programmable Gate Arrays (FPGAs), or one or more Programmable Logic Controllers (PLCs).
It should also be understood that it may be possible to re-use the general processing capabilities of any conventional device or unit in which the proposed technology is implemented. It may also be possible to re-use existing software, e.g. by reprogramming of the existing software or by adding new software components.
Figure 4B illustrates a particular hardware implementation of a video decoder 100b according to the embodiments. In an embodiment, the video decoder 100b comprises an input unit 104b configured to obtain the bitstream representing the video to be decoded.
The video decoder 100b may also comprise an output unit 105b configured to output a decoded video.
The input unit 104b could be in the form of a general input unit, in particular in the case of a wired connection to external devices. Alternatively, the input unit 104b could be in the form of a receiver or transceiver, in particular in the case of a wireless connection to external devices. Correspondingly, the output unit 105b could be in the form of a general output unit, in particular in the case of a wired connection to external devices. Alternatively, the output unit 105b could be in the form of a transmitter or transceiver, in particular in the case of a wireless connection to external devices.
The input unit 104b is preferably connected to the decoding unit 101b to forward the video bitstream to be decoded thereto. The decoding unit 101b is preferably connected to the output unit 105b.
The components of Figure 5A are depicted as single boxes located within a single larger box. In practice however, the video encoder 110a may comprise multiple different physical components that make up a single illustrated component, e.g. the input unit 113a may comprise terminals for coupling wires for a wired connection and a radio transceiver for a wireless connection. Similarly, the video encoder 110a may be composed of multiple physically separate components which may each have their own respective processor, memory, and interface components. In certain scenarios in which the video encoder 110a comprises multiple separate components, one or more of the separate components may be shared among several devices. For example, a single memory unit may be shared by multiple video encoders 110a. The processor 111a may be a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, software and/or encoded logic operable to provide, either alone or in conjunction with other device components, such as the memory 112a, device functionality. For example, the processor 111a may execute instructions stored in the memory 112a. Such functionality may include providing various encoding features and/or any of the other features or benefits disclosed herein.
The memory 112a may comprise any form of volatile or non-volatile computer readable memory including, without limitation, persistent memory, solid state memory, remotely mounted memory, magnetic media, optical media, random access memory (RAM), read-only memory (ROM), removable media, or any other suitable local or remote memory component. The memory 112a may store any suitable instructions, data or information, including software and encoded logic, utilized by the device 110a. The memory 112a may be used to store any calculations made by the processor 111a and/or any data received via the I/O interfaces, including the input unit 113a.
The video encoder 110a also comprises an input unit 113a and an output unit 114a, i.e. I/O interfaces, which may be used in the wired or wireless communication of video and/or data to and from the video encoder 110a. The I/O interfaces may include a radio transmitter and/or receiver that may be coupled to or a part of an antenna. The I/O interfaces may receive video that is to be encoded.
Any appropriate steps, methods, or functions may be performed through a computer program product that may, for example, be executed by the components and equipment illustrated in the attached figures. For example, the memory 112a may comprise computer readable means on which a computer program can be stored. The computer program may include instructions which cause the processor 111a, and any operatively coupled entities and devices, such as the input unit 113a, the output unit 114a, and the memory 112a, to execute methods according to video encoding embodiments described herein. The computer program and/or computer program product may thus provide means for performing any steps herein disclosed.
Any appropriate steps, methods, or functions may be performed through one or more functional modules. Each functional module may comprise software, computer programs, sub-routines, libraries, source code, or any other form of executable instructions that are executed by, for example, a processor. In some embodiments, each functional module may be implemented in hardware and/or in software. For example, one or more or all functional modules may be implemented by the processor 111a, possibly in cooperation with the memory 112a. The processor 111a and the memory 112a may, thus, be arranged to allow the processor 111a to fetch instructions from the memory 112a and execute the fetched instructions to allow the respective functional module to perform any steps or functions disclosed herein.
The components of Figure 5B are depicted as single boxes located within a single larger box. In practice however, the video decoder 110b may comprise multiple different physical components that make up a single illustrated component, e.g. the input unit 113b may comprise terminals for coupling wires for a wired connection and a radio transceiver for a wireless connection. Similarly, the video decoder 110b may be composed of multiple physically separate components which may each have their own respective processor, memory, and interface components. In certain scenarios in which the video decoder 110b comprises multiple separate components, one or more of the separate components may be shared among several devices. For example, a single memory unit may be shared by multiple video decoders 110b.
The processor 111b may be a combination of one or more of a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, or any other suitable computing device, resource, or combination of hardware, software and/or decoding logic operable to provide, either alone or in conjunction with other device components, such as the memory 112b, device functionality. For example, the processor 111b may execute instructions stored in the memory 112b. Such functionality may include providing various decoding features and/or any of the other features or benefits disclosed herein.
The memory 112b may comprise any form of volatile or non-volatile computer readable memory including, without limitation, persistent memory, solid state memory, remotely mounted memory, magnetic media, optical media, random access memory (RAM), read-only memory (ROM), removable media, or any other suitable local or remote memory component. The memory 112b may store any suitable instructions, data or information, including software and decoding logic, utilized by the video decoder 110b. The memory 112b may be used to store any calculations made by the processor 111b and/or any data received via the I/O interfaces, including the input unit 113b.
The video decoder 110b also comprises an input unit 113b and an output unit 114b, i.e. I/O interfaces, which may be used in the wired or wireless communication of video and/or data to and from the video decoder 110b. The I/O interfaces may include a radio transmitter and/or receiver that may be coupled to or a part of an antenna. The I/O interfaces may receive video that is to be decoded. Any appropriate steps, methods, or functions may be performed through a computer program product that may, for example, be executed by the components and equipment illustrated in the attached figures. For example, the memory 112b may comprise computer readable means on which a computer program can be stored. The computer program may include instructions which cause the processor 111b, and any operatively coupled entities and devices, such as the input unit 113b, the output unit 114b, and the memory 112b, to execute methods according to embodiments described herein. The computer program and/or computer program product may thus provide means for performing any steps herein disclosed.
Any appropriate steps, methods, or functions may be performed through one or more functional modules. Each functional module may comprise software, computer programs, sub-routines, libraries, source code, or any other form of executable instructions that are executed by, for example, a processor. In some embodiments, each functional module may be implemented in hardware and/or in software. For example, one or more or all functional modules may be implemented by the processor 111b, possibly in cooperation with the memory 112b. The processor 111b and the memory 112b may, thus, be arranged to allow the processor 111b to fetch instructions from the memory 112b and execute the fetched instructions to allow the respective functional module to perform any steps or functions disclosed herein.
Figure 6A is a schematic block diagram illustrating an example of a user equipment (UE) 200a comprising a processor 210a, an associated memory 220a and a communication circuitry 230a.
In this particular example, at least some of the steps, functions, procedures, modules and/or blocks described herein are implemented in a computer program 240a, which is loaded into the memory 220a for execution by processing circuitry including one or more processors 210a. The processor 210a and the memory 220a are interconnected to each other to enable normal software execution. A communication circuitry 230a is also interconnected to the processor 210a and/or the memory 220a to enable input and/or output of video data and tune-in or seek requests.
The user equipment 200a can be any device or apparatus that can receive and process video data, i.e., encode motion vectors according to the embodiments described above. For instance, the user equipment 200a could be a computer, either stationary or portable, such as laptop, a smart phone, a tablet, a set-top box, etc. The term 'processor' should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
The processing circuitry including one or more processors is thus configured to perform, when executing the computer program, well-defined processing tasks such as those described herein.
The processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedure and/or blocks, but may also execute other tasks.
The proposed technology also provides a carrier 250a comprising the computer program 240a. The carrier 250a is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium 250a.
By way of example, the software or computer program 240a may be realized as a computer program product, which is normally carried or stored on a computer-readable medium 250a, preferably a non-volatile computer-readable storage medium 250a. The computer-readable medium 250a may include one or more removable or non-removable memory devices including, but not limited to, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Blu-ray disc, a Universal Serial Bus (USB) memory, a Hard Disk Drive (HDD) storage device, a flash memory, a magnetic tape, or any other conventional memory device. The computer program 240a may thus be loaded into the operating memory of a computer or equivalent processing device, represented by the user equipment 200a in Figure 6A, for execution by the processor 210a thereof.
Figure 6B is a schematic block diagram illustrating an example of a user equipment (UE) 200b comprising a processor 210b, an associated memory 220b and a communication circuitry 230b.
In this particular example, at least some of the steps, functions, procedures, modules and/or blocks described herein are implemented in a computer program 240b, which is loaded into the memory 220b for execution by processing circuitry including one or more processors 210b. The processor 210b and the memory 220b are interconnected to each other to enable normal software execution. A communication circuitry 230b is also interconnected to the processor 210b and/or the memory 220b to enable input and/or output of video data and tune-in or seek requests. The user equipment 200b can be any device or apparatus that can receive and process video data, i.e., encode motion vectors according to the embodiments described above. For instance, the user equipment 200b could be a computer, either stationary or portable, such as laptop, a smart phone, a tablet, a set-top box, etc.
The term 'processor' should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task.
The processing circuitry including one or more processors is thus configured to perform, when executing the computer program, well-defined processing tasks such as those described herein.
The processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedure and/or blocks, but may also execute other tasks.
The proposed technology also provides a carrier 250b comprising the computer program 240b. The carrier 250b is one of an electronic signal, an optical signal, an electromagnetic signal, a magnetic signal, an electric signal, a radio signal, a microwave signal, or a computer-readable storage medium 250b.
By way of example, the software or computer program 240b may be realized as a computer program product, which is normally carried or stored on a computer-readable medium 250b, preferably a non-volatile computer-readable storage medium 250b. The computer-readable medium 250b may include one or more removable or non-removable memory devices including, but not limited to, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Blu-ray disc, a Universal Serial Bus (USB) memory, a Hard Disk Drive (HDD) storage device, a flash memory, a magnetic tape, or any other conventional memory device. The computer program 240b may thus be loaded into the operating memory of a computer or equivalent processing device, represented by the user equipment 200b in Figure 6B, for execution by the processor 210b thereof.
A further aspect of certain embodiments defines a computer program product for a video encoder comprising a computer program 240a for an encoder and a computer readable means 250a on which the computer program 240a for a video encoder is stored. A further aspect of certain embodiments defines a computer program product for a video decoder comprising a computer program 240b for a decoder and a computer readable means 250b on which the computer program 240b for a video decoder is stored.
The flow diagram or diagrams presented herein may therefore be regarded as a computer flow diagram or diagrams, when performed by one or more processors. A corresponding device may be defined as a group of function modules, where each step performed by the processor corresponds to a function module. In this case, the function modules are implemented as a computer program running on the processor. Hence, the device may alternatively be defined as a group of function modules, where the function modules are implemented as a computer program running on at least one processor.
The computer program residing in memory may thus be organized as appropriate function modules configured to perform, when executed by the processor, at least part of the steps and/or tasks described herein. An example of such function modules is illustrated in Figures 7A and 7B.
Figure 7A is a schematic block diagram of a video encoder 120a with function modules. The video encoder 120a comprises an encoder 810a further comprising a selection module for selecting a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution, and a sending module for sending an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
A further aspect of the embodiments relates to a user equipment comprising a video encoder according to the embodiments, such as illustrated in any of Figures 4A, 5A or 7A. The user equipment is selected from a group consisting of a computer, a laptop, a desktop, a multimedia player, a video streaming server, a mobile telephone, a smart phone, a tablet and a set-top box.
Yet another aspect of the embodiments relates to a signal representing an encoded version wherein the motion vector is encoded according to the present invention.
It is becoming increasingly popular to provide computing services, such as hardware and/or software, in network devices, such as network nodes and/or servers, where the resources are delivered as a service to remote locations over a network. By way of example, this means that functionality, as described herein, can be distributed or re-located to one or more separate physical nodes or servers. The functionality may be re-located or distributed to one or more jointly acting physical and/or virtual machines that can be positioned in separate physical node(s), i.e. in the so-called cloud. This is sometimes also referred to as cloud computing, which is a model for enabling ubiquitous on-demand network access to a pool of configurable computing resources such as networks, servers, storage, applications and general or customized services.
Figure 7B is a schematic block diagram of a video decoder 120b with function modules. The video decoder 120b comprises a decoder 820b further comprising a decoding module for decoding an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution. The decoder further comprises a decoding module for decoding the corresponding difference and a reconstruction module for reconstructing the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
A further aspect of the embodiments relates to a user equipment comprising a video decoder according to the embodiments, such as illustrated in any of Figs. 4B, 5B or 7B. The user equipment is selected from a group consisting of a computer, a laptop, a desktop, a multimedia player, a video streaming server, a mobile telephone, a smart phone, a tablet and a set-top box.
Yet another aspect of the embodiments relates to a signal representing a decoded version wherein the motion vector is decoded according to the present invention.
It is becoming increasingly popular to provide computing services, such as hardware and/or software, in network devices, such as network nodes and/or servers, where the resources are delivered as a service to remote locations over a network. By way of example, this means that functionality, as described herein, can be distributed or re-located to one or more separate physical nodes or servers. The functionality may be re-located or distributed to one or more jointly acting physical and/or virtual machines that can be positioned in separate physical node(s), i.e. in the so-called cloud. This is sometimes also referred to as cloud computing, which is a model for enabling ubiquitous on-demand network access to a pool of configurable computing resources such as networks, servers, storage, applications and general or customized services.

Figure 8A is a schematic diagram illustrating an example of how functionality can be distributed or partitioned between different network devices 300a, 301a, 302a in a general case. In this example, there are at least two individual, but interconnected network devices 300a, 301a, which may have different functionalities, or parts of the same functionality, partitioned between the network devices 300a, 301a. There may be additional network devices 302a being part of such a distributed implementation. The network devices 300a, 301a, 302a may be part of the same wireless communication system, or one or more of the network devices may be so-called cloud-based network devices located outside of the wireless communication system.
Figure 8B is a schematic diagram illustrating an example of how functionality can be distributed or partitioned between different network devices 300b, 301b, 302b in a general case. In this example, there are at least two individual, but interconnected network devices 300b, 301b, which may have different functionalities, or parts of the same functionality, partitioned between the network devices 300b, 301b. There may be additional network devices 302b being part of such a distributed implementation. The network devices 300b, 301b, 302b may be part of the same wireless communication system, or one or more of the network devices may be so-called cloud-based network devices located outside of the wireless communication system.
Certain aspects of the inventive concept have mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, embodiments other than the ones disclosed above are equally possible and within the scope of the inventive concept. Similarly, while a number of different combinations have been discussed, all possible combinations have not been disclosed. One skilled in the art would appreciate that other combinations exist and are within the scope of the inventive concept. Moreover, as is understood by the skilled person, the herein disclosed embodiments are as such also applicable to other standards and encoder or decoder systems, and any feature disclosed in connection with a particular figure may be applicable to any other figure and/or combined with different features.

Claims

1. A method, performed by a video encoder, for encoding motion vectors, wherein a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate, the method comprising:
selecting (S1) a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution;
sending (S2) an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
2. The method according to claim 1, wherein the list of MVP candidates consists of two MVP candidates, and wherein the other MVP candidate from the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution.
3. The method according to claim 1 or 2, wherein the first pixel resolution is the quarter-pixel resolution and wherein the second pixel resolution is the half-pixel resolution.
4. The method according to any of the preceding claims, wherein the video encoder is an HEVC encoder.
5. A method, performed by a video decoder, of decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate, the method comprising:
decoding (S3) an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution;
decoding (S4) the corresponding difference;
reconstructing (S5) the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
6. The method according to claim 5, wherein the list of MVP candidates consists of two MVP candidates, and wherein the other MVP candidate from the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution.
7. The method according to claim 5 or 6, wherein the first pixel resolution is the quarter-pixel resolution and wherein the second pixel resolution is the half-pixel resolution.
8. The method according to any of claims 5-7, wherein the video decoder is an HEVC decoder.
9. A video encoder, for encoding motion vectors, wherein a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate, the video encoder comprising processing means operative to:
select a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution;
send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
10. The video encoder according to claim 9, wherein the list of MVP candidates consists of two MVP candidates, and wherein the other MVP candidate from the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution.
11. The video encoder according to claim 9 or 10, wherein the first pixel resolution is the quarter-pixel resolution and wherein the second pixel resolution is the half-pixel resolution.
12. The video encoder according to any of claims 9-11, wherein the video encoder is an HEVC encoder.
13. The video encoder according to any of claims 9-12, wherein the processing means comprise a processor and a memory, wherein said memory contains instructions executable by said processor.
14. A video decoder, for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate, the video decoder comprising processing means operative to:
decode an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution;
decode the corresponding difference;
reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
15. The video decoder according to claim 14, wherein the list of MVP candidates consists of two MVP candidates, and wherein the other MVP candidate from the list of MVP candidates and the corresponding difference are rounded to the first pixel resolution.
16. The video decoder according to claim 14 or 15, wherein the first pixel resolution is the quarter-pixel resolution and wherein the second pixel resolution is the half-pixel resolution.
17. The video decoder according to any of claims 14-16, wherein the video decoder is an HEVC decoder.
18. The video decoder according to any of claims 14-17, wherein the processing means comprise a processor and a memory, wherein said memory contains instructions executable by said processor.
19. A computer program, for encoding motion vectors, wherein a motion vector is represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate, the computer program comprising code means which, when run on a computer, causes the computer to:
select a MVP candidate from a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution;
send an index of the selected MVP candidate in the list of MVP candidates and the corresponding difference to a video decoder.
20. A computer program, for decoding a motion vector represented as a sum of a motion vector prediction (MVP) candidate and a corresponding difference between the motion vector and the MVP candidate, the computer program comprising code means which, when run on a computer, causes the computer to:
decode an index of the MVP candidate in a list of at least two MVP candidates, wherein one of the MVP candidates is rounded to a second pixel resolution and the corresponding difference is rounded to a first pixel resolution, wherein the first pixel resolution is different than the second pixel resolution;
decode the corresponding difference;
reconstruct the motion vector as a sum of the MVP candidate with the decoded index and the corresponding difference.
21. A computer program product comprising computer readable means and a computer program according to claim 19 stored on the computer readable means.
22. A computer program product comprising computer readable means and a computer program according to claim 20 stored on the computer readable means.
23. A carrier comprising a computer program according to claim 21.
24. A carrier comprising a computer program according to claim 22.
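The encoder-side selection of claims 1-3 and the matching decoder reconstruction of claims 5-7 can be sketched as follows. All names are hypothetical, motion-vector components are integers in quarter-pel units, the half-pel-rounded candidate is assumed to sit at list position 1, and smallest absolute difference stands in for the encoder's real rate-distortion selection:

```python
def round_to_half_pel(v: int) -> int:
    # Quarter-pel units: a half-pel position is a multiple of 2;
    # round to the nearest such multiple, ties toward positive infinity.
    return 2 * ((v + 1) // 2)

def predictor(mvp_candidates, index: int) -> int:
    # Candidate 1 is used at half-pel resolution (assumption for this
    # sketch); candidate 0 stays at quarter-pel.
    mvp = mvp_candidates[index]
    return round_to_half_pel(mvp) if index == 1 else mvp

def encode_mv(mv: int, mvp_candidates):
    # Pick the candidate whose quarter-pel difference is cheapest to send;
    # absolute magnitude is a stand-in for the actual bit cost.
    diffs = [mv - predictor(mvp_candidates, i)
             for i in range(len(mvp_candidates))]
    index = min(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return index, diffs[index]          # what is signalled to the decoder

def decode_mv(mvp_candidates, index: int, diff: int) -> int:
    # Reconstruct MV = MVP (at its resolution) + quarter-pel difference.
    return predictor(mvp_candidates, index) + diff
```

For mv = 7 with candidates [5, 3], the quarter-pel candidate gives a difference of 2 while the half-pel-rounded candidate (3 rounds to 4) gives 3, so index 0 is chosen, and decode_mv([5, 3], 0, 2) recovers the original 7.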
PCT/EP2017/061499 2016-05-12 2017-05-12 Methods and arrangements for coding and decoding motion vectors WO2017194756A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662335125P 2016-05-12 2016-05-12
US62/335125 2016-05-12

Publications (1)

Publication Number Publication Date
WO2017194756A1 (en) 2017-11-16

Family

ID=58701655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/061499 WO2017194756A1 (en) 2016-05-12 2017-05-12 Methods and arrangements for coding and decoding motion vectors

Country Status (1)

Country Link
WO (1) WO2017194756A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150195562A1 (en) * 2014-01-09 2015-07-09 Qualcomm Incorporated Adaptive motion vector resolution signaling for video coding
US20150195525A1 (en) * 2014-01-08 2015-07-09 Microsoft Corporation Selection of motion vector precision
US20150264390A1 (en) * 2014-03-14 2015-09-17 Canon Kabushiki Kaisha Method, device, and computer program for optimizing transmission of motion vector related information when transmitting a video stream from an encoder to a decoder

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEN J ET AL: "Algorithm description of Joint Exploration Test Model 2", 2. JVET MEETING; 20-2-2016 - 26-2-2016; SAN DIEGO; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-B1001, 8 March 2016 (2016-03-08), XP030150091 *
SAMUELSSON J ET AL: "Motion vector coding optimizations", 3. JVET MEETING; 26-5-2016 - 1-6-2016; GENEVA; (THE JOINT VIDEO EXPLORATION TEAM OF ISO/IEC JTC1/SC29/WG11 AND ITU-T SG.16 ); URL: HTTP://PHENIX.INT-EVRY.FR/JVET/,, no. JVET-C0068-v3, 28 May 2016 (2016-05-28), XP030150172 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11973962B2 (en) 2018-06-05 2024-04-30 Beijing Bytedance Network Technology Co., Ltd Interaction between IBC and affine
CN112970262A (zh) * 2018-11-10 2021-06-15 Beijing Bytedance Network Technology Co., Ltd. Rounding in triangle prediction mode
CN112970262B (zh) * 2018-11-10 2024-02-20 Beijing Bytedance Network Technology Co., Ltd. Rounding in triangle prediction mode

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17723110

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17723110

Country of ref document: EP

Kind code of ref document: A1