CN111107373B - Inter-frame prediction method based on affine prediction mode and related device - Google Patents

Inter-frame prediction method based on affine prediction mode and related device

Info

Publication number
CN111107373B
CN111107373B (application CN201910154692.8A)
Authority
CN
China
Prior art keywords
processed
gbi index
block
prediction
image block
Prior art date
Legal status
Active
Application number
CN201910154692.8A
Other languages
Chinese (zh)
Other versions
CN111107373A (en)
Inventor
陈焕浜
杨海涛
张恋
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2019/114142 (WO2020088482A1)
Publication of CN111107373A
Application granted
Publication of CN111107373B
Status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
        • H04N 19/513: using predictive coding involving temporal prediction (19/50, 19/503), motion estimation or motion compensation (19/51), processing of motion vectors
        • H04N 19/109: using adaptive coding (19/10, 19/102), selection of coding mode or of prediction mode (19/103) among a plurality of temporal predictive coding modes
        • H04N 19/176: using adaptive coding characterised by the coding unit (19/169), the unit being an image region, e.g. an object (19/17), the region being a block, e.g. a macroblock
        • H04N 19/463: embedding additional information in the video signal during the compression process (19/46) by compressing encoding parameters before transmission
        • H04N 19/577: using predictive coding involving temporal prediction (19/50, 19/503), motion estimation or motion compensation (19/51), motion compensation with bidirectional frame interpolation, i.e. using B-pictures
        • H04N 19/70: characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An embodiment of this application provides an inter prediction method based on an affine prediction mode, and a related apparatus. The method includes the following steps: obtaining GBi index numbers of a plurality of control points of an image block to be processed; determining, according to the GBi index numbers of the plurality of control points, a weight value corresponding to a reference frame of the image block to be processed; and performing weighted prediction according to the weight value to obtain a predicted value of the image block to be processed. Implementing this technical scheme helps improve the prediction accuracy of the motion information of image blocks and improve coding and decoding performance.

Description

Inter-frame prediction method based on affine prediction mode and related device
Technical Field
The present application relates to the field of video encoding and decoding, and in particular, to a method and apparatus for inter prediction.
Background
Video coding (video encoding and decoding) is widely used in digital video applications such as broadcast digital television, video distribution over the internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content acquisition and editing systems, and camcorders for security applications.
Since the development of the block-based hybrid video coding approach in the H.261 standard in 1990, new video coding techniques and tools have evolved and formed the basis for new video coding standards. Other video coding standards include MPEG-1 video, MPEG-2 video, ITU-T H.262/MPEG-2, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC) …, and extensions of such standards, such as scalability and/or 3D (three-dimensional) extensions. As video creation and use become ever more widespread, video traffic has become the biggest burden on communication networks and data storage. One of the goals of most video coding standards is therefore to reduce the bit rate, relative to the previous standard, without sacrificing picture quality. Even though the latest standard, High Efficiency Video Coding (HEVC), compresses video roughly twice as much as AVC without sacrificing picture quality, new techniques are still needed to compress video further.
Disclosure of Invention
The embodiment of the invention provides an inter-frame prediction method and device for video images, and a corresponding encoder and decoder, which can improve the prediction accuracy of motion information of image blocks to a certain extent and improve the encoding and decoding performance.
In a first aspect, an embodiment of the present invention provides an inter prediction method based on an affine prediction mode, including: obtaining GBi index numbers (generalized bi-prediction weight indices) of a plurality of control points of an image block to be processed (also called the current block); determining, according to the GBi index numbers of the plurality of control points, a weight value corresponding to a reference frame (for example, a reference frame in a given direction) of the image block to be processed; and performing weighted prediction according to the weight value to obtain a predicted value of the image block to be processed.
The GBi index numbers of the plurality of control points are derived from different processed image blocks and are used to determine the weight values of the reference frames of those processed image blocks in generalized bi-prediction (that is, there is a correspondence between GBi index numbers and weight values). The weight value corresponding to the reference frame of a processed image block represents the weight that the pixel values of that reference frame carry in generalized bi-prediction.
It should be noted that, in a possible application scenario, the "GBi index number" referred to in the present invention may be given another name, such as a weight value index, index information, or GBi weight value index; for example, it may also be called a "bi-prediction with weighted averaging (BWA)" index number. The present invention is not limited in this respect.
It can be seen that, even when the GBi index numbers corresponding to the control points of the current block to be processed differ, implementing the scheme of this embodiment of the invention allows the weight value of the reference frame of the current block to be determined quickly, so that weighted prediction can be performed based on that weight value. This ensures that the bi-directional predictive coding process runs normally and improves coding efficiency and accuracy. The index-to-weight correspondence is sketched below.
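As a concrete illustration of this correspondence, the following minimal Python sketch maps a GBi index number to the pair of reference-frame weights. Only the fact that one index (here 0) corresponds to the weight 1/2 is fixed by this text; the remaining table entries follow the weight set commonly cited for generalized bi-prediction and are an assumption.
```python
# Illustrative GBi index -> weight table; index 0 maps to 1/2 (average
# weighting); the other entries are assumed example values.
GBI_WEIGHTS_L1 = [4/8, 5/8, 3/8, 10/8, -2/8]  # weight of the list-1 reference frame

def gbi_weights(gbi_index: int) -> tuple[float, float]:
    """Return (w0, w1): the weights of the list-0 and list-1 reference
    frames. They sum to 1, so the bi-prediction stays normalized."""
    w1 = GBI_WEIGHTS_L1[gbi_index]
    return 1.0 - w1, w1

assert gbi_weights(0) == (0.5, 0.5)  # the preset index yields average weighting
```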
Based on the first aspect, in a specific embodiment, the image block to be processed includes a plurality of sub-blocks, and the motion vectors of the plurality of control points are respectively determined from the motion vectors of different neighbouring processed blocks (for example, the inter prediction process based on the affine prediction mode adopts the constructed control point motion vector prediction method). The method further includes: obtaining a motion vector of each sub-block in the image block to be processed according to the motion vectors of the plurality of control points.
Correspondingly, performing weighted prediction according to the weight value to obtain a predicted value of the image block to be processed includes: obtaining at least two motion compensation blocks (motion compensation blocks may also be called reference blocks or prediction blocks) of each sub-block according to at least two motion vectors of each sub-block in the image block to be processed and the at least two reference frames respectively corresponding to the at least two motion vectors; and weighting the pixel values of the at least two motion compensation blocks according to the weight values respectively corresponding to the at least two reference frames, to obtain the predicted value of each sub-block.
It can be seen that, in the inter prediction process, if the current block uses an affine motion model and the constructed control point motion vector prediction method, the weight value corresponding to the reference frame of the current block can be determined from the GBi index numbers of the neighbouring decoded blocks of the control points. Weighted prediction can then be performed based on that weight value to obtain the predicted value of each sub-block of the current block, which ensures that the encoding/decoding process proceeds smoothly and improves coding efficiency and prediction accuracy. This weighting step is sketched below.
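The per-sub-block weighting can be illustrated as follows (a sketch, not the normative process; array shapes and names are assumptions):
```python
import numpy as np

def predict_subblock(mc_block_l0: np.ndarray, mc_block_l1: np.ndarray,
                     w0: float, w1: float) -> np.ndarray:
    """Generalized bi-prediction of one sub-block: each motion
    compensation block is fetched from one reference frame using the
    sub-block's motion vector, then the two are blended with the weights
    determined from the GBi index."""
    return w0 * mc_block_l0 + w1 * mc_block_l1

# Example: a 4x4 sub-block predicted with weights (5/8, 3/8).
p0 = np.full((4, 4), 100.0)  # motion compensation block from reference list 0
p1 = np.full((4, 4), 108.0)  # motion compensation block from reference list 1
pred = predict_subblock(p0, p1, 5/8, 3/8)  # every sample equals 103.0
```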
Based on the first aspect, in some specific embodiments, determining the weight value corresponding to the reference frame of the image block to be processed according to the GBi index numbers of the plurality of control points includes: determining, according to the GBi index numbers of the plurality of control points, the GBi index number of the image block to be processed; and using the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed.
The GBi index number of the image block to be processed is used to determine the weight value of the reference frame of the image block to be processed in generalized bi-prediction (that is, there is a correspondence between the GBi index number and the weight value); the weight value corresponding to the reference frame of the image block to be processed represents the weight that the pixel values of that reference frame carry in generalized bi-prediction. In a specific implementation, when the image block to be processed is predicted using the merge mode based on the affine motion model (affine merge mode), the GBi index number of the image block to be processed may be included in the motion information in the constructed control point motion vector merge candidate list.
It can be seen that, even when the GBi index numbers corresponding to the control points of the current block to be processed differ, implementing the scheme of this embodiment allows the GBi index number corresponding to the candidate motion information of each control point to be determined quickly. This ensures that the bi-directional predictive coding process runs normally, and the GBi index number of the current block can continue to be used when encoding/decoding subsequent image blocks, which improves coding efficiency and prediction accuracy.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when the GBi index numbers of the plurality of control points are all the same, using that GBi index number as the GBi index number of the image block to be processed.
Correspondingly, the weight value corresponding to that GBi index number is used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when different GBi index numbers exist among the GBi index numbers of the plurality of control points, using the GBi index number corresponding to a preset value as the GBi index number of the image block to be processed.
Correspondingly, the preset value may be used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when some of the GBi index numbers of the plurality of control points are the same, using the most frequent GBi index number among the GBi index numbers of the plurality of control points as the GBi index number of the image block to be processed.
Correspondingly, the weight value corresponding to that most frequent GBi index number may be used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when the GBi index numbers of the plurality of control points are all different from each other, using the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
Correspondingly, the preset value may be used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when at least one of the weight values corresponding to the GBi index numbers of the plurality of control points equals the preset value, using the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
Correspondingly, the preset value may be used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when all of the weight values corresponding to the GBi index numbers of the plurality of control points differ from the preset value, using the GBi index number corresponding to the weight value, among those weight values, whose difference from the preset value is smallest, as the GBi index number of the image block to be processed.
Correspondingly, the weight value whose difference from the preset value is smallest may be used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when the average of the weight values corresponding to the GBi index numbers of the plurality of control points equals the preset value, using the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
Correspondingly, the preset value may be used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when all of the weight values corresponding to the GBi index numbers of the plurality of control points differ from the preset value and the average of those weight values does not equal the preset value, using the GBi index number corresponding to the weight value whose difference from the preset value is smallest as the GBi index number of the image block to be processed.
Correspondingly, the weight value whose difference from the preset value is smallest may be used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
Based on the first aspect, in a possible implementation manner, determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points includes: when all of the weight values corresponding to the GBi index numbers of the plurality of control points differ from the preset value but the average of at least two of those weight values equals the preset value, using the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
Correspondingly, the preset value may be used as the weight value corresponding to the reference frame of the image block to be processed in generalized bi-prediction.
In the various possible embodiments above, the preset value may be, for example, 1/2, and the GBi index number corresponding to the preset value may be, for example, 0. A sketch combining several of these rules follows.
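The following Python sketch illustrates one possible combination of the rules above (an illustrative selection under the example preset value, not the only embodiment described):
```python
from collections import Counter

PRESET_WEIGHT = 0.5  # example preset value given in the text
PRESET_INDEX = 0     # example GBi index number corresponding to it

def derive_block_gbi_index(cp_gbi_indices: list[int]) -> int:
    """Combine the control points' GBi index numbers into one index for
    the block to be processed: inherit the index if all agree, otherwise
    take the most frequent one, otherwise fall back to the preset index."""
    index, count = Counter(cp_gbi_indices).most_common(1)[0]
    if count > 1:        # all identical, or a most frequent index exists
        return index
    return PRESET_INDEX  # all mutually different

assert derive_block_gbi_index([2, 2, 2]) == 2             # identical indices
assert derive_block_gbi_index([2, 2, 1]) == 2             # most frequent index
assert derive_block_gbi_index([1, 2, 3]) == PRESET_INDEX  # all different
```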
In a second aspect, an embodiment of the present invention provides an inter prediction method based on an affine prediction mode, the method including: taking a preset GBi index number as the GBi index number of the image block to be processed, wherein the motion vectors of a plurality of control points of the image block to be processed are respectively obtained according to the motion vectors of a plurality of processed image blocks (for example, a constructed control point motion vector prediction method is adopted in an inter-frame prediction process based on an affine prediction mode); taking the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed; and carrying out weighted prediction according to the weight value to obtain a predicted value of the image block to be processed.
There is a correspondence between GBi index numbers and weight values; the weight value corresponding to the reference frame of a processed image block represents the weight that the pixel values of that reference frame carry in generalized bi-prediction.
It can be seen that, in this solution, when the motion vectors of the plurality of control points are obtained from the motion vectors of different processed image blocks, the preset GBi index number is adopted directly as the GBi index number of the image block to be processed, whether or not the GBi index numbers of the processed image blocks corresponding to the control points of the current block are the same, and the weight value corresponding to that GBi index number is used as the weight value corresponding to the reference frame of the image block to be processed, so that weighted prediction is performed based on that weight value. Therefore, by implementing this embodiment of the invention, the GBi index number of the current image block and the weight value of the reference frame of the block to be processed can be determined quickly, weighted prediction can be performed based on that weight value, the bi-directional predictive coding process runs normally, and coding efficiency and accuracy are improved.
Based on the second aspect, in a specific embodiment, the image block to be processed includes a plurality of sub-blocks, and the method further includes: obtaining a motion vector of each sub-block in the image block to be processed according to the motion vectors of the plurality of control points;
correspondingly, the performing weighted prediction according to the weight value to obtain a predicted value of the image block to be processed includes: obtaining at least two motion compensation blocks of each sub-block according to at least two motion vectors of each sub-block in the image block to be processed and at least two reference frames respectively corresponding to the at least two motion vectors; and weighting the pixel values of the at least two motion compensation blocks according to the weight values respectively corresponding to the at least two reference frames to obtain the predicted value of each sub-block.
It can be seen that, in the inter prediction process, if the current block uses an affine motion model and the constructed control point motion vector prediction method, the preset GBi index number is adopted directly as the GBi index number of the image block to be processed, and the weight value corresponding to that GBi index number is used as the weight value corresponding to the reference frame of the image block to be processed. Weighted prediction is then performed based on that weight value to obtain the predicted value of each sub-block of the current block, which ensures that the encoding/decoding process proceeds smoothly and improves coding efficiency and prediction accuracy.
In this scheme, the preset GBi index number is, for example, 0, the weight value corresponding to the GBi index number of the current image block to be processed is, for example, 1/2, and the weighted (bi-directional) prediction performed according to this weight value is then simple average weighting, as sketched below.
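A minimal sketch of this second-aspect behaviour, assuming the example preset index 0 and weight 1/2 (function and variable names are illustrative):
```python
import numpy as np

PRESET_GBI_INDEX = 0  # assumed preset index; it maps to the weight 1/2

def predict_subblock_preset(mc_block_l0: np.ndarray,
                            mc_block_l1: np.ndarray) -> np.ndarray:
    """With the preset weight 1/2 for each reference frame, generalized
    bi-prediction degenerates to averaging the two motion compensation
    blocks."""
    return 0.5 * (mc_block_l0 + mc_block_l1)
```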
In a third aspect, an embodiment of the present invention provides an apparatus, including: the acquisition module is used for acquiring GBi index numbers (the Generalization Bi-prediction weight index) of a plurality of control points of the image block to be processed; the weight determining module is used for determining a weight value corresponding to the reference frame of the image block to be processed according to the GBi index numbers of the plurality of control points; and the prediction module is used for carrying out weighted prediction according to the weight value so as to obtain a predicted value of the image block to be processed.
The functional modules of the apparatus are specifically configured to implement the method described in the first aspect.
In a fourth aspect, embodiments of the present invention provide yet another apparatus, the apparatus comprising: the weight determining module is used for taking a preset GBi index number as the GBi index number of the image block to be processed, wherein the motion vectors of a plurality of control points of the image block to be processed are respectively obtained according to the motion vectors of a plurality of processed image blocks; taking the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed; and the prediction module is used for carrying out weighted prediction according to the weight value so as to obtain a predicted value of the image block to be processed.
The functional modules of the apparatus are specifically configured to implement the method described in the second aspect.
In a fifth aspect, an embodiment of the present invention provides a video codec apparatus, the apparatus including: a non-volatile memory and a processor coupled to each other, the processor invoking program code stored in the memory to perform the method as described in the first aspect.
In a sixth aspect, an embodiment of the present invention provides a video codec apparatus, the apparatus including: a non-volatile memory and a processor coupled to each other, the processor invoking program code stored in the memory to perform the method as described in the second aspect.
In a seventh aspect, an embodiment of the present invention provides an apparatus for decoding video, the apparatus comprising:
a memory for storing video data in the form of a code stream;
a decoder, configured to: obtain GBi index numbers of a plurality of control points of an image block to be processed; determine, according to the GBi index numbers of the plurality of control points, a weight value corresponding to a reference frame of the image block to be processed; and perform weighted prediction according to the weight value to obtain a predicted value of the image block to be processed.
In an eighth aspect, an embodiment of the present invention provides an apparatus for decoding video, the apparatus comprising:
a memory for storing video data in the form of a code stream;
a decoder, configured to: use a preset GBi index number as the GBi index number of the image block to be processed, where the motion vectors of a plurality of control points of the image block to be processed are respectively obtained from the motion vectors of a plurality of processed image blocks; use the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed; and perform weighted prediction according to the weight value to obtain a predicted value of the image block to be processed.
In a ninth aspect, an embodiment of the present invention provides an apparatus for encoding video, the apparatus comprising:
a memory for storing video data in the form of a code stream;
an encoder, configured to: obtain GBi index numbers of a plurality of control points of the image block to be processed from a plurality of processed image blocks, where the image block to be processed uses an affine prediction mode; and determine the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points, where the GBi index number of the image block to be processed is used to determine a weight value corresponding to the reference frame of the image block to be processed.
In a tenth aspect, an embodiment of the present invention provides an apparatus for encoding video, the apparatus comprising:
a memory for storing video data in the form of a code stream;
an encoder, configured to use a preset GBi index number as the GBi index number of the image block to be processed, where the motion vectors of a plurality of control points of the image block to be processed are respectively obtained from the motion vectors of a plurality of processed image blocks; correspondingly, the weight value corresponding to the GBi index number of the image block to be processed is the weight value corresponding to the reference frame of the image block to be processed.
In an eleventh aspect, embodiments of the present invention provide a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to encode video data. The instructions cause the one or more processors to perform the method according to any possible embodiment of the first aspect.
In a twelfth aspect, embodiments of the present invention provide a computer-readable storage medium having instructions stored thereon that, when executed, cause one or more processors to encode video data. The instructions cause the one or more processors to perform the method according to any possible embodiment of the second aspect.
In a thirteenth aspect, embodiments of the present invention provide a computer program comprising program code which, when run on a computer, performs the method according to any of the possible embodiments of the first aspect.
In a fourteenth aspect, embodiments of the present invention provide a computer program comprising program code which, when run on a computer, performs a method according to any of the possible embodiments of the second aspect.
It can be seen that, when the motion vector of each control point of the current block to be processed is derived from processed blocks (neighbouring encoded/decoded blocks), the GBi index numbers of those processed blocks may differ from one another.
Drawings
To describe the embodiments of the present invention or the technical solutions in the background art more clearly, the following briefly introduces the accompanying drawings used in the embodiments of the present invention or in the background art.
FIG. 1A is a block diagram of an example of a video encoding and decoding system 10 for implementing an embodiment of the invention;
FIG. 1B is a block diagram of an example of a video coding system 40 for implementing an embodiment of the invention;
FIG. 2 is a block diagram of an example structure of an encoder 20 for implementing an embodiment of the present invention;
FIG. 3 is a block diagram of an example architecture of a decoder 30 for implementing an embodiment of the present invention;
FIG. 4 is a block diagram of an example of a video coding apparatus 400 for implementing an embodiment of the invention;
FIG. 5 is a block diagram of another example encoding or decoding device for implementing an embodiment of the present invention;
FIG. 6 is an exemplary diagram for representing current block spatial and temporal candidate motion information;
FIG. 7 is an exemplary diagram for representing affine model motion information retrieval;
FIG. 8A is an exemplary schematic diagram of a method of motion vector prediction for a control point of a construct;
FIG. 8B is an exemplary flow chart of a method of motion vector prediction for a constructed control point;
FIG. 9 is an exemplary schematic diagram of ATMVP technology;
FIG. 10 is an exemplary schematic diagram of a PLANAR (inter-frame plane mode) technique;
FIG. 11A is an exemplary flow chart of an inter prediction method;
FIG. 11B is an exemplary flow chart of yet another method of inter prediction;
FIG. 12 is an exemplary diagram of control point motion information;
FIG. 13 is a block diagram of an example of an apparatus 1000 for implementing an embodiment of the invention;
FIG. 14 is a block diagram of an example of a device 2000 for implementing an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. In the following description, reference is made to the accompanying drawings which form a part hereof and which show by way of illustration specific aspects in which embodiments of the invention may be practiced. It is to be understood that embodiments of the invention may be used in other aspects and may include structural or logical changes not depicted in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims. For example, it should be understood that the disclosure in connection with the described methods may be equally applicable to a corresponding apparatus or system for performing the methods, and vice versa. For example, if one or more specific method steps are described, the corresponding apparatus may comprise one or more units, such as functional units, to perform the one or more described method steps (e.g., one unit performing one or more steps, or multiple units each performing one or more of the multiple steps), even if such one or more units are not explicitly described or illustrated in the figures. On the other hand, if a specific apparatus is described based on one or more units such as a functional unit, for example, the corresponding method may include one step to perform the functionality of the one or more units (e.g., one step to perform the functionality of the one or more units, or multiple steps each to perform the functionality of one or more units, even if such one or more steps are not explicitly described or illustrated in the figures). Further, it is to be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.
In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a alone, a and B together, and B alone, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
The technical scheme related to the embodiment of the invention can be applied to the existing video coding standards (such as H.264, HEVC and the like) and future video coding standards (such as H.266). The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments of the invention only and is not intended to be limiting of the invention. Some concepts that may be related to embodiments of the present invention are briefly described below.
Video coding generally refers to processing a sequence of pictures that form a video or video sequence. In the field of video coding, the terms "picture", "frame", and "image" may be used as synonyms. Video coding as used herein refers to video encoding or video decoding. Video encoding is performed on the source side, and typically includes processing (e.g., by compression) the original video picture to reduce the amount of data required to represent it, for more efficient storage and/or transmission. Video decoding is performed on the destination side, and typically involves inverse processing relative to the encoder to reconstruct the video pictures. References to the "coding" of video pictures in the embodiments should be understood as referring to the "encoding" or "decoding" of a video sequence. The combination of the encoding part and the decoding part is also called codec (encoding and decoding).
A video sequence comprises a series of pictures, each picture is further divided into slices, and each slice is further divided into blocks. Video coding is performed block by block, and in some new video coding standards the concept of a block is further extended. For example, the H.264 standard has macro blocks (MBs), which can be further divided into multiple prediction blocks (partitions) used for predictive coding. In the high efficiency video coding (HEVC) standard, basic concepts such as the coding unit (CU), prediction unit (PU), and transform unit (TU) are adopted, and the various block units are functionally divided and described using a brand-new tree-based structure. For example, a CU may be divided into smaller CUs according to a quadtree, and a smaller CU may continue to be divided, thereby forming a quadtree structure; the CU is the basic unit for partitioning and encoding a coded image. Similar tree structures exist for PUs and TUs: a PU may correspond to a prediction block and is the basic unit of predictive coding, and a CU is further divided into a plurality of PUs according to a partitioning pattern; a TU may correspond to a transform block and is the basic unit for transforming the prediction residual. In essence, however, CUs, PUs, and TUs all belong to the concept of blocks (or picture blocks).
In HEVC, for example, a CTU is split into multiple CUs by using a quadtree structure denoted as a coding tree. The decision whether to encode a picture region using inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU may be further split into one, two, or four PUs depending on the PU split type. The same prediction process is applied within one PU, and the relevant information is transmitted to the decoder on a PU basis. After the residual block is obtained by applying the prediction process based on the PU split type, the CU may be partitioned into transform units (TUs) according to another quadtree structure similar to the coding tree for the CU. In a recent development of video compression technology, a quadtree plus binary tree (QTBT) partitioning structure is used to partition the coding blocks. In the QTBT block structure, a CU may be square or rectangular in shape. A quadtree partition of a CTU can be sketched as follows.
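A hedged illustration of the quadtree partitioning just described (the split decision stands in for the encoder's signalled choice, and all names are assumptions):
```python
def quadtree_split(x: int, y: int, size: int, min_size: int,
                   should_split) -> list[tuple[int, int, int]]:
    """Recursively partition a square CTU region into leaf CUs: each
    block either becomes a CU or splits into four equal sub-blocks."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # this block is a leaf CU
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus += quadtree_split(x + dx, y + dy, half, min_size, should_split)
    return cus

# Example: split every block larger than 32 samples.
cus = quadtree_split(0, 0, 128, 8, lambda x, y, s: s > 32)
assert len(cus) == 16  # a 128x128 CTU partitioned into sixteen 32x32 CUs
```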
Herein, for convenience of description and understanding, an image block to be processed (simply referred to as a to-be-processed image block) in a current encoded image may be referred to as a current block, for example, in encoding, the to-be-processed image block refers to a block currently being encoded; in decoding, an image block to be processed refers to a block currently being decoded. A decoded image block in a reference image used for predicting a current block is referred to as a reference block, i.e. a reference block is a block providing a reference signal for the current block, wherein the reference signal represents pixel values within the image block. A block in the reference picture that provides a prediction signal for the current block may be referred to as a prediction block, where the prediction signal represents pixel values or sample signals within the prediction block. For example, after traversing multiple reference blocks, the best reference block is found, which will provide prediction for the current block, which may be referred to as a prediction block.
In the case of lossless video coding, the original video picture may be reconstructed, i.e., the reconstructed video picture has the same quality as the original video picture (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, the amount of data needed to represent a video picture is reduced by performing further compression, e.g. quantization, whereas the decoder side cannot reconstruct the video picture completely, i.e. the quality of the reconstructed video picture is lower or worse than the quality of the original video picture.
Several video coding standards since H.261 belong to the class of "lossy hybrid video codecs" (i.e., spatial and temporal prediction in the sample domain is combined with 2D transform coding for applying quantization in the transform domain). Each picture of a video sequence is typically partitioned into a set of non-overlapping blocks and typically encoded at the block level. In other words, the encoder side typically processes, i.e., encodes, video at the block (video block) level; for example, it generates a prediction block by spatial (intra-picture) or temporal (inter-picture) prediction, subtracts the prediction block from the current block (the block currently being processed or to be processed) to obtain a residual block, and transforms and quantizes the residual block in the transform domain to reduce the amount of data to be transmitted (compressed). The decoder side applies the inverse processing, relative to the encoder, to the encoded or compressed block to reconstruct the current block for representation. In addition, the encoder replicates the decoder processing loop, so that the encoder and the decoder generate identical predictions (e.g., intra prediction and inter prediction) and/or reconstructions for processing, i.e., encoding, subsequent blocks. This loop is sketched below.
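A minimal sketch of this hybrid loop for a single block, with the transform and quantizer passed in as placeholders (all names and the toy quantizer are assumptions, not any standard's normative process):
```python
import numpy as np

def hybrid_code_block(block, prediction, transform, inv_transform,
                      quantize, dequantize):
    """Encode one block and reconstruct it exactly as a decoder would:
    the quantized coefficients are what gets transmitted, and the
    reconstruction is reused for predicting subsequent blocks."""
    residual = block - prediction                 # prediction error
    coeffs = quantize(transform(residual))        # lossy, transmitted part
    recon_residual = inv_transform(dequantize(coeffs))
    reconstruction = prediction + recon_residual  # matches the decoder
    return coeffs, reconstruction

# Toy usage: identity transform and a step-8 scalar quantizer.
q = 8.0
coeffs, recon = hybrid_code_block(
    np.array([[120.0, 121.0], [119.0, 118.0]]),
    np.full((2, 2), 118.0),
    transform=lambda r: r, inv_transform=lambda c: c,
    quantize=lambda c: np.round(c / q), dequantize=lambda c: c * q)
```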
The system architecture to which the embodiments of the present invention are applied is described below. Referring to fig. 1A, fig. 1A schematically illustrates a block diagram of a video encoding and decoding system 10 to which embodiments of the present invention are applied. As shown in fig. 1A, video encoding and decoding system 10 may include a source device 12 and a destination device 14, source device 12 generating encoded video data, and thus source device 12 may be referred to as a video encoding apparatus. Destination device 14 may decode encoded video data generated by source device 12, and thus destination device 14 may be referred to as a video decoding apparatus. Various implementations of source apparatus 12, destination apparatus 14, or both may include one or more processors and memory coupled to the one or more processors. The memory may include, but is not limited to RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures that can be accessed by a computer, as described herein. The source device 12 and the destination device 14 may include a variety of devices including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, vehicle mount computers, wireless communication devices, or the like.
Although fig. 1A depicts source device 12 and destination device 14 as separate devices, a device embodiment may also include both of them or the functionality of both, i.e., source device 12 or the corresponding functionality and destination device 14 or the corresponding functionality. In such embodiments, source device 12 or the corresponding functionality and destination device 14 or the corresponding functionality may be implemented using the same hardware and/or software, using separate hardware and/or software, or any combination thereof.
A communication connection may be made between source device 12 and destination device 14 via link 13, and destination device 14 may receive encoded video data from source device 12 via link 13. Link 13 may include one or more media or devices capable of moving encoded video data from source device 12 to destination device 14. In one example, link 13 may include one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. In this example, source apparatus 12 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination apparatus 14. The one or more communication media may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). The one or more communication media may include routers, switches, base stations, or other equipment that facilitate communication from source apparatus 12 to destination apparatus 14.
Source device 12 includes an encoder 20 and, alternatively, source device 12 may also include a picture source 16, a picture preprocessor 18, and a communication interface 22. In a specific implementation, the encoder 20, the picture source 16, the picture preprocessor 18, and the communication interface 22 may be hardware components in the source device 12 or may be software programs in the source device 12. The descriptions are as follows:
the picture source 16 may include or be any type of picture capture device for capturing, for example, real-world pictures, and/or any type of device for generating pictures or comments (for screen content encoding, some text on the screen is also considered part of the picture or image to be encoded), for example, a computer graphics processor for generating computer-animated pictures, or any type of device for capturing and/or providing real-world pictures or computer-animated pictures (e.g., screen content or virtual reality (VR) pictures), and/or any combination thereof (e.g., augmented reality (AR) pictures). Picture source 16 may be a camera for capturing pictures or a memory for storing pictures, and picture source 16 may also include any type of (internal or external) interface for storing previously captured or generated pictures and/or for capturing or receiving pictures. When picture source 16 is a camera, picture source 16 may be, for example, a local camera or a camera integrated in the source device; when picture source 16 is a memory, picture source 16 may be, for example, local memory or memory integrated in the source device. When picture source 16 includes an interface, the interface may, for example, be an external interface that receives pictures from an external video source, where the external video source is, for example, an external picture capture device such as a camera, an external memory, or an external picture generation device such as an external computer graphics processor, computer, or server. The interface may be any kind of interface according to any proprietary or standardized interface protocol, for example a wired or wireless interface or an optical interface.
A picture can be regarded as a two-dimensional array or matrix of pixels (picture elements). A pixel in the array may also be called a sample. The number of samples of the array or picture in the horizontal and vertical directions (or axes) defines the size and/or resolution of the picture. To represent color, three color components are typically employed, i.e., a picture may be represented as, or contain, three sample arrays. For example, in the RGB format or color space, a picture includes corresponding red, green, and blue sample arrays. In video coding, however, each pixel is typically represented in a luminance/chrominance format or color space; for example, a picture in the YUV format includes a luminance component indicated by Y (sometimes also indicated by L) and two chrominance components indicated by U and V. The luminance (luma) component Y represents brightness or grayscale level intensity (e.g., both are the same in a grayscale picture), while the two chrominance (chroma) components U and V represent the chrominance or color information components. Accordingly, a picture in the YUV format includes a luminance sample array of luminance sample values (Y) and two chrominance sample arrays of chrominance values (U and V). A picture in the RGB format may be converted or transformed into the YUV format, and vice versa; this process is also known as color transformation or conversion. If a picture is black and white, the picture may include only a luminance sample array. In this embodiment of the present invention, the picture transmitted from picture source 16 to the picture processor may also be referred to as original picture data 17.
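For concreteness, one common instance of the RGB-to-YUV conversion mentioned above is the BT.601 transform, shown here purely as an illustration (the embodiments do not mandate any particular matrix):
```python
def rgb_to_yuv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """BT.601 color transform for components normalized to [0, 1]."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted brightness
    u = 0.492 * (b - y)                    # chroma: blue-difference
    v = 0.877 * (r - y)                    # chroma: red-difference
    return y, u, v

# A pure gray input yields (approximately) zero chrominance:
# rgb_to_yuv(0.5, 0.5, 0.5) -> (0.5, 0.0, 0.0)
```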
A picture preprocessor 18, configured to receive the original picture data 17 and perform preprocessing on it to obtain a preprocessed picture 19 or preprocessed picture data 19. For example, the preprocessing performed by the picture preprocessor 18 may include trimming, color format conversion (e.g., from RGB format to YUV format), color correction, or denoising.
Encoder 20 (also called video encoder 20) receives the preprocessed picture data 19 and processes it using a relevant prediction mode (such as a prediction mode in the various embodiments herein) to provide encoded picture data 21 (structural details of encoder 20 are further described below based on fig. 2, fig. 4, or fig. 5). In some embodiments, encoder 20 may be configured to perform the various embodiments described below, to implement the encoding-side application of the chroma block prediction method described in the present invention.
Communication interface 22 may be used to receive encoded picture data 21 and may transmit encoded picture data 21 over link 13 to destination device 14 or any other device (e.g., memory) for storage or direct reconstruction, which may be any device for decoding or storage. Communication interface 22 may be used, for example, to encapsulate encoded picture data 21 into a suitable format, such as a data packet, for transmission over link 13.
Destination device 14 includes a decoder 30, and alternatively destination device 14 may also include a communication interface 28, a picture post-processor 32, and a display device 34. The descriptions are as follows:
communication interface 28 may be used to receive encoded picture data 21 from source device 12 or any other source, such as a storage device, such as an encoded picture data storage device. The communication interface 28 may be used to transmit or receive encoded picture data 21 via a link 13 between the source device 12 and the destination device 14, such as a direct wired or wireless connection, or via any type of network, such as a wired or wireless network or any combination thereof, or any type of private and public networks, or any combination thereof. Communication interface 28 may, for example, be used to decapsulate data packets transmitted by communication interface 22 to obtain encoded picture data 21.
Both communication interface 28 and communication interface 22 may be configured as unidirectional communication interfaces or bidirectional communication interfaces and may be used, for example, to send and receive messages to establish connections, to acknowledge and to exchange any other information related to the communication link and/or to the transmission of data, for example, encoded picture data transmissions.
Decoder 30 (also called video decoder 30) is configured to receive the encoded picture data 21 and provide decoded picture data 31 or a decoded picture 31 (structural details of decoder 30 are further described below based on fig. 3, fig. 4, or fig. 5). In some embodiments, decoder 30 may be configured to perform the various embodiments described below, to implement the decoding-side application of the chroma block prediction method described in the present invention.
A picture post-processor 32, configured to post-process the decoded picture data 31 (also called reconstructed picture data) to obtain post-processed picture data 33. The post-processing performed by the picture post-processor 32 may include color format conversion (e.g., from YUV format to RGB format), color correction, trimming, or resampling, or any other processing; the picture post-processor 32 may also be configured to transmit the post-processed picture data 33 to the display device 34.
A display device 34 for receiving the post-processed picture data 33 for displaying pictures to, for example, a user or viewer. The display device 34 may be or include any type of display for presenting reconstructed pictures, for example, an integrated or external display or monitor. For example, the display may include a liquid crystal display (liquid crystal display, LCD), an organic light emitting diode (organic light emitting diode, OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (liquid crystal on silicon, LCoS), a digital light processor (digital light processor, DLP), or any other type of display.
It will be apparent to those skilled in the art from this description that the functionality of the different units or the existence and (exact) division of the functionality of the source device 12 and/or destination device 14 shown in fig. 1A may vary depending on the actual device and application. Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or stationary device, such as a notebook or laptop computer, mobile phone, smart phone, tablet or tablet computer, video camera, desktop computer, set-top box, television, camera, in-vehicle device, display device, digital media player, video game console, video streaming device (e.g., content service server or content distribution server), broadcast receiver device, broadcast transmitter device, etc., and may not use or use any type of operating system.
Encoder 20 and decoder 30 may each be implemented as any of a variety of suitable circuits, such as, for example, one or more microprocessors, digital signal processors (digital signal processor, DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware or any combinations thereof. If the techniques are implemented in part in software, an apparatus may store instructions of the software in a suitable non-transitory computer-readable storage medium and may execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered one or more processors.
In some cases, the video encoding and decoding system 10 shown in fig. 1A is merely an example, and the techniques of this disclosure may be applied to video encoding settings (e.g., video encoding or video decoding) that do not necessarily involve any data communication between encoding and decoding devices. In other examples, the data may be retrieved from local memory, streamed over a network, and the like. The video encoding device may encode and store data to the memory and/or the video decoding device may retrieve and decode data from the memory. In some examples, encoding and decoding are performed by devices that do not communicate with each other, but instead only encode data to memory and/or retrieve data from memory and decode data.
Referring to fig. 1B, fig. 1B is an illustration of an example of a video coding system 40 including encoder 20 of fig. 2 and/or decoder 30 of fig. 3, according to an example embodiment. Video coding system 40 may implement a combination of the various techniques of embodiments of the present invention. In the illustrated embodiment, video coding system 40 may include an imaging device 41, an encoder 20, a decoder 30 (and/or a video codec implemented by logic circuitry 47 of a processing unit 46), an antenna 42, one or more processors 43, one or more memories 44, and/or a display device 45.
As shown in fig. 1B, the imaging device 41, the antenna 42, the processing unit 46, the logic circuit 47, the encoder 20, the decoder 30, the processor 43, the memory 44, and/or the display device 45 can communicate with each other. As discussed, although video coding system 40 is depicted with encoder 20 and decoder 30, in different examples, video coding system 40 may include only encoder 20 or only decoder 30.
In some examples, antenna 42 may be used to transmit or receive an encoded bitstream of video data. Additionally, in some examples, display device 45 may be used to present video data. In some examples, logic circuit 47 may be implemented by processing unit 46. The processing unit 46 may comprise application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like. The video coding system 40 may also include an optional processor 43, which may similarly comprise application-specific integrated circuit (ASIC) logic, a graphics processor, a general-purpose processor, or the like. In some examples, logic circuit 47 may be implemented in hardware, such as dedicated video encoding hardware, while processor 43 may be implemented in general-purpose software, an operating system, or the like. In addition, memory 44 may be any type of memory, such as volatile memory (e.g., static random access memory (Static Random Access Memory, SRAM), dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.) or non-volatile memory (e.g., flash memory, etc.). In a non-limiting example, memory 44 may be implemented by cache memory. In some examples, logic circuit 47 may access memory 44 (e.g., for implementing an image buffer). In other examples, logic circuit 47 and/or processing unit 46 may include memory (e.g., a cache, etc.) for implementing an image buffer or the like.
In some examples, encoder 20 implemented by logic circuitry may include an image buffer (e.g., implemented by processing unit 46 or memory 44) and a graphics processing unit (e.g., implemented by processing unit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include encoder 20 implemented by logic circuitry 47 to implement the various modules discussed with reference to fig. 2 and/or any other encoder system or subsystem described herein. Logic circuitry may be used to perform various operations discussed herein.
In some examples, decoder 30 may be implemented in a similar manner by logic circuit 47 to implement the various modules discussed with reference to decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. In some examples, decoder 30 implemented by logic circuitry may include an image buffer (e.g., implemented by processing unit 46 or memory 44) and a graphics processing unit (e.g., implemented by processing unit 46). The graphics processing unit may be communicatively coupled to the image buffer. The graphics processing unit may include decoder 30 implemented by logic circuit 47 to implement the various modules discussed with reference to fig. 3 and/or any other decoder system or subsystem described herein.
In some examples, antenna 42 may be used to receive an encoded bitstream of video data. As discussed, the encoded bitstream may include data related to the encoded video frame, indicators, index values, mode selection data, etc., discussed herein, such as data related to the encoded partitions (e.g., transform coefficients or quantized transform coefficients, optional indicators (as discussed), and/or data defining the encoded partitions). Video coding system 40 may also include a decoder 30 coupled to antenna 42 and used to decode the encoded bitstream. The display device 45 is used to present video frames.
It should be understood that decoder 30 may be used to perform the reverse process for the example described with reference to encoder 20 in embodiments of the present invention. Regarding signaling syntax elements, decoder 30 may be configured to receive and parse such syntax elements and decode the associated video data accordingly. In some examples, encoder 20 may entropy encode the syntax elements into an encoded video bitstream. In such examples, decoder 30 may parse such syntax elements and decode the relevant video data accordingly.
It should be noted that the method described in the embodiment of the present invention is mainly used in the inter prediction process, and this process exists in both the encoder 20 and the decoder 30. In the embodiment of the present invention, the encoder 20 and the decoder 30 may be, for example, a codec corresponding to a video standard protocol such as h.263, h.264, HEVC, MPEG-2, MPEG-4, VP8, or VP9, or to a next-generation video standard protocol (such as h.266, etc.).
Referring to fig. 2, fig. 2 shows a schematic/conceptual block diagram of an example of an encoder 20 for implementing an embodiment of the invention. In the example of fig. 2, encoder 20 includes a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a buffer 216, a loop filter unit 220, a decoded picture buffer (decoded picture buffer, DPB) 230, a prediction processing unit 260, and an entropy encoding unit 270. The prediction processing unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a mode selection unit 262. The inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The encoder 20 shown in fig. 2 may also be referred to as a hybrid video encoder or a video encoder according to a hybrid video codec.
For example, the residual calculation unit 204, the transform processing unit 206, the quantization unit 208, the prediction processing unit 260 and the entropy encoding unit 270 form a forward signal path of the encoder 20, whereas for example the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the decoded picture buffer (decoded picture buffer, DPB) 230, the prediction processing unit 260 form a backward signal path of the encoder, wherein the backward signal path of the encoder corresponds to the signal path of the decoder (see decoder 30 in fig. 3).
Encoder 20 receives picture 201 or an image block 203 of picture 201, e.g., a picture in a sequence of pictures forming a video or video sequence, through, e.g., input 202. Image block 203 may also be referred to as a current encoded block or a to-be-processed image block, and picture 201 may be referred to as a current picture or a to-be-encoded picture (especially when distinguishing the current picture from other pictures in video encoding, such as previously encoded and/or decoded pictures in the same video sequence, i.e., a video sequence that also includes the current picture).
An embodiment of encoder 20 may comprise a partitioning unit (not shown in fig. 2) for partitioning the picture 201 into a plurality of blocks, e.g., image blocks 203, typically into a plurality of non-overlapping blocks. The partitioning unit may be configured to use the same block size for all pictures of the video sequence and the corresponding grid defining the block size, or to change the block size between pictures or subsets or groups of pictures, and to partition each picture into the corresponding blocks.
In one example, prediction processing unit 260 of encoder 20 may be used to perform any combination of the above-described partitioning techniques.
Like picture 201, image block 203 is also or may be considered as a two-dimensional array or matrix of sampling points having sampling values, albeit of smaller size than picture 201. In other words, the image block 203 may comprise, for example, one sampling array (e.g., a luminance array in the case of a black-and-white picture 201) or three sampling arrays (e.g., one luminance array and two chrominance arrays in the case of a color picture) or any other number and/or class of arrays depending on the color format applied. The number of sampling points in the horizontal and vertical directions (or axes) of the image block 203 defines the size of the image block 203.
The encoder 20 as shown in fig. 2 is used for encoding a picture 201 block by block, for example, performing encoding and prediction for each image block 203.
The residual calculation unit 204 is configured to calculate a residual block 205 based on the picture image block 203 and the prediction block 265 (further details of the prediction block 265 are provided below), for example, by subtracting sample values of the prediction block 265 from sample values of the picture image block 203 on a sample-by-sample (pixel-by-pixel) basis to obtain the residual block 205 in a sample domain.
The transform processing unit 206 is configured to apply a transform, such as a discrete cosine transform (discrete cosine transform, DCT) or a discrete sine transform (discrete sine transform, DST), on the sample values of the residual block 205 to obtain transform coefficients 207 in the transform domain. The transform coefficients 207 may also be referred to as transform residual coefficients and represent the residual block 205 in the transform domain.
The transform processing unit 206 may be used to apply integer approximations of DCT/DST, such as the transforms specified for HEVC/H.265. Such integer approximations are typically scaled by some factor compared to the orthogonal DCT transform. To preserve the norm of the residual block processed by the forward and inverse transforms, an additional scaling factor is applied as part of the transform process. The scaling factor is typically selected based on certain constraints, e.g., it is a trade-off between being a power of 2 for shift operations, the bit depth of the transform coefficients, accuracy, and implementation cost. For example, a specific scaling factor is specified for the inverse transform by, for example, inverse transform processing unit 312 on the decoder 30 side (and for the corresponding inverse transform by, for example, inverse transform processing unit 212 on the encoder 20 side), and accordingly, a corresponding scaling factor may be specified for the forward transform by transform processing unit 206 on the encoder 20 side.
The quantization unit 208 is used to quantize the transform coefficients 207, for example by applying scalar quantization or vector quantization, to obtain quantized transform coefficients 209. The quantized transform coefficients 209 may also be referred to as quantized residual coefficients 209. The quantization process may reduce the bit depth associated with some or all of the transform coefficients 207. For example, n-bit transform coefficients may be rounded down to m-bit transform coefficients during quantization, where n is greater than m. The quantization level may be modified by adjusting the quantization parameter (quantization parameter, QP). For example, for scalar quantization, different scales may be applied to achieve finer or coarser quantization. Smaller quantization step sizes correspond to finer quantization, while larger quantization step sizes correspond to coarser quantization. The appropriate quantization step size may be indicated by a quantization parameter (QP). For example, the quantization parameter may be an index into a predefined set of suitable quantization step sizes: a smaller quantization parameter may correspond to fine quantization (a smaller quantization step size) and a larger quantization parameter may correspond to coarse quantization (a larger quantization step size), or vice versa. Quantization may involve division by a quantization step size, with the corresponding inverse quantization, e.g., performed by inverse quantization unit 210, involving multiplication by the quantization step size. Embodiments according to some standards, such as HEVC, may use the quantization parameter to determine the quantization step size. In general, the quantization step size may be calculated from the quantization parameter using a fixed-point approximation of an equation that includes division. Additional scaling factors may be introduced for quantization and inverse quantization to restore the norm of the residual block, which may be modified due to the scales used in the fixed-point approximation of the equation relating the quantization step size and the quantization parameter. In one example embodiment, the scales of the inverse transform and the inverse quantization may be combined. Alternatively, a custom quantization table may be used and signaled from the encoder to the decoder, e.g., in the bitstream. Quantization is a lossy operation, and the larger the quantization step size, the larger the loss.
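For illustration only, the following is a minimal Python sketch of quantization parameter driven scalar quantization; it assumes the HEVC-style relationship in which the quantization step size approximately doubles for every increase of 6 in QP, and the function names and sample values are illustrative rather than part of any standard:

    def qstep_from_qp(qp):
        # Assumed HEVC-style mapping: the step size doubles every 6 QP values
        return 2.0 ** ((qp - 4) / 6.0)

    def quantize(coeff, qp):
        # Forward quantization: divide by the step size and round (lossy)
        return round(coeff / qstep_from_qp(qp))

    def dequantize(level, qp):
        # Inverse quantization: multiply the level back by the step size
        return level * qstep_from_qp(qp)

    # A larger QP (coarser step size) reconstructs the coefficient less accurately:
    for qp in (22, 27, 32, 37):
        print(qp, dequantize(quantize(100.0, qp), qp))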
The inverse quantization unit 210 is configured to apply the inverse quantization of quantization unit 208 to the quantized coefficients to obtain dequantized coefficients 211, e.g., to apply, based on or using the same quantization step size as quantization unit 208, the inverse of the quantization scheme applied by quantization unit 208. The dequantized coefficients 211, which may also be referred to as dequantized residual coefficients 211, correspond to the transform coefficients 207, although they typically differ from the transform coefficients due to the loss caused by quantization.
The inverse transform processing unit 212 is configured to apply an inverse transform of the transform applied by the transform processing unit 206, for example, an inverse discrete cosine transform (discrete cosine transform, DCT) or an inverse discrete sine transform (discrete sine transform, DST), to obtain an inverse transform block 213 in the sample domain. The inverse transform block 213 may also be referred to as an inverse transformed inverse quantized block 213 or an inverse transformed residual block 213.
A reconstruction unit 214 (e.g., a summer 214) is used to add the inverse transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 to obtain the reconstructed block 215 in the sample domain, e.g., to add sample values of the reconstructed residual block 213 to sample values of the prediction block 265.
Optionally, a buffer unit 216, e.g. a line buffer 216 (or simply "buffer" 216), is used to buffer or store the reconstructed block 215 and the corresponding sample values for e.g. intra prediction. In other embodiments, the encoder may be configured to use the unfiltered reconstructed block and/or the corresponding sample values stored in the buffer unit 216 for any kind of estimation and/or prediction, such as intra prediction.
For example, embodiments of encoder 20 may be configured such that buffer unit 216 is used not only to store reconstructed blocks 215 for intra prediction 254, but also for loop filter unit 220 (not shown in fig. 2), and/or such that buffer unit 216 and decoded picture buffer unit 230 form one buffer, for example. Other embodiments may be used to use the filtered block 221 and/or blocks or samples (neither shown in fig. 2) from the decoded picture buffer 230 as an input or basis for the intra prediction 254.
The loop filter unit 220 (or simply "loop filter" 220) is used to filter the reconstructed block 215 to obtain a filtered block 221, so as to smooth pixel transitions or otherwise improve video quality. Loop filter unit 220 is intended to represent one or more loop filters, such as deblocking filters, sample-adaptive offset (SAO) filters, or other filters, such as bilateral filters, adaptive loop filters (adaptive loop filter, ALF), or sharpening or smoothing filters, or collaborative filters. Although loop filter unit 220 is shown in fig. 2 as an in-loop filter, in other configurations loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as a filtered reconstructed block 221. Decoded picture buffer 230 may store the reconstructed encoded block after loop filter unit 220 performs a filtering operation on the reconstructed encoded block.
Embodiments of encoder 20 (and correspondingly loop filter unit 220) may be configured to output loop filter parameters (e.g., sample adaptive offset information), e.g., directly or after entropy encoding by entropy encoding unit 270 or any other entropy encoding unit, e.g., such that decoder 30 may receive and apply the same loop filter parameters for decoding.
Decoded picture buffer (decoded picture buffer, DPB) 230 may be a reference picture memory that stores reference picture data for use by encoder 20 in encoding video data. DPB 230 may be formed of any of a variety of memory devices, such as dynamic random access memory (dynamic random access memory, DRAM) (including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), or resistive RAM (RRAM)) or other types of memory devices. DPB 230 and buffer 216 may be provided by the same memory device or by separate memory devices. In a certain example, the decoded picture buffer (decoded picture buffer, DPB) 230 is used to store the filtered block 221. The decoded picture buffer 230 may further be used to store other previously filtered blocks, e.g., previously reconstructed and filtered blocks 221, of the same current picture or of different pictures, e.g., previously reconstructed pictures, and may provide complete previously reconstructed, i.e., decoded, pictures (and corresponding reference blocks and samples) and/or partially reconstructed current pictures (and corresponding reference blocks and samples), e.g., for inter prediction. In a certain example, if the reconstructed block 215 is reconstructed without in-loop filtering, the decoded picture buffer (decoded picture buffer, DPB) 230 is used to store the reconstructed block 215.
The prediction processing unit 260, also referred to as block prediction processing unit 260, is adapted to receive or obtain image blocks 203 (current image blocks 203 of a current picture 201) and reconstructed slice data, e.g. reference samples of the same (current) picture from the buffer 216 and/or reference picture data 231 of one or more previously decoded pictures from the decoded picture buffer 230, and to process such data for prediction, i.e. to provide a prediction block 265, which may be an inter-predicted block 245 or an intra-predicted block 255.
The mode selection unit 262 may be used to select a prediction mode (e.g., intra or inter prediction mode) and/or a corresponding prediction block 245 or 255 used as the prediction block 265 to calculate the residual block 205 and reconstruct the reconstructed block 215.
Embodiments of mode selection unit 262 may be used to select the prediction mode (e.g., from those supported by prediction processing unit 260) that provides the best match or the minimum residual (minimum residual means better compression in transmission or storage), or that provides the minimum signaling overhead (minimum signaling overhead means better compression in transmission or storage), or that considers or balances both. The mode selection unit 262 may be adapted to determine the prediction mode based on rate-distortion optimization (rate distortion optimization, RDO), i.e., to select the prediction mode that provides the minimum rate-distortion cost, or to select a prediction mode whose associated rate-distortion at least meets a prediction mode selection criterion.
The prediction processing performed by an instance of encoder 20 (e.g., by prediction processing unit 260) and the mode selection performed (e.g., by mode selection unit 262) will be explained in detail below.
As described above, the encoder 20 is configured to determine or select the best or optimal prediction mode from a (predetermined) set of prediction modes. The set of prediction modes may include, for example, intra prediction modes and/or inter prediction modes.
In a possible implementation, the set of intra prediction modes may comprise 35 different intra prediction modes, e.g., non-directional modes such as the DC (or mean) mode and the planar mode, and directional modes as defined in h.265; or it may comprise 67 different intra prediction modes, e.g., non-directional modes such as the DC (or mean) mode and the planar mode, and directional modes as defined in the developing h.266.
In a possible implementation, the set of inter prediction modes depends on the available reference pictures (i.e., at least part of the decoded pictures stored in the DPB 230 as described above) and on other inter prediction parameters, e.g., on whether the entire reference picture or only a part of it, such as a search window area around the area of the current block, is used to search for the best matching reference block, and/or on whether pixel interpolation, such as half-pixel and/or quarter-pixel interpolation, is applied. The set of inter prediction modes may comprise, for example, the advanced motion vector prediction (Advanced Motion Vector Prediction, AMVP) mode and the fusion (merge) mode. In specific implementations, the set of inter prediction modes may include prediction modes based on an affine motion model, for example, the advanced motion vector prediction mode based on an affine motion model (Affine AMVP mode) or the fusion mode based on an affine motion model (Affine Merge mode); specifically, the control-point-based AMVP mode (using the inherited control point motion vector prediction method or the constructed control point motion vector prediction method) and the control-point-based merge mode (using the inherited control point motion vector prediction method or the constructed control point motion vector prediction method) described in the embodiments of the present invention; the advanced temporal motion vector prediction (advanced temporal motion vector prediction, ATMVP) method, the PLANAR method, and the like; or a sub-block fusion mode (Sub-block based merging mode) formed by combining the affine motion model-based merge mode, the ATMVP method, and/or the PLANAR method. In the embodiment of the present invention, the inter prediction of the to-be-processed image block may use unidirectional prediction (forward or backward), bidirectional prediction (forward and backward), or multi-frame prediction; when bidirectional prediction is applied, generalized bi-prediction (Generalized Bi-prediction, GBi) at the bidirectional prediction block level, or a weighted prediction method, may be used, as sketched below. In one example, the inter prediction unit 244 may be used to perform any combination of the inter prediction techniques described below.
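For illustration of the generalized bi-prediction mentioned above, the following is a minimal Python sketch in which the final predictor is a weighted average of the two directional predictors, selected by a GBi index; the weight table follows a VVC-style design and is an assumption made here for illustration, not a normative value of this embodiment:

    GBI_WEIGHTS = [-2, 3, 4, 5, 10]  # hypothetical weight table, in units of 1/8

    def gbi_predict(p0, p1, gbi_index):
        # Blend the list-0 and list-1 predictors: P = ((8 - w) * P0 + w * P1) / 8
        w = GBI_WEIGHTS[gbi_index]
        return [((8 - w) * a + w * b + 4) >> 3 for a, b in zip(p0, p1)]

    # GBi index 2 selects w = 4, i.e., the ordinary average (P0 + P1) / 2:
    print(gbi_predict([100, 102], [110, 98], 2))  # -> [105, 100]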
In addition to the above prediction modes, embodiments of the present invention may also apply skip modes and/or direct modes.
The prediction processing unit 260 may be further operative to partition the image block 203 into smaller block partitions or sub-blocks, for example, by iteratively using a quad-tree (QT) partition, a binary-tree (BT) partition, or a ternary-tree (TT) partition, or any combination thereof, and to perform prediction for each of the block partitions or sub-blocks, for example, wherein the mode selection includes selecting a tree structure of the partitioned image block 203 and selecting a prediction mode applied to each of the block partitions or sub-blocks.
The inter prediction unit 244 may include a motion estimation (motion estimation, ME) unit (not shown in fig. 2) and a motion compensation (motion compensation, MC) unit (not shown in fig. 2). The motion estimation unit is used to receive or obtain the picture image block 203 (the current picture image block 203 of the current picture 201) and a decoded picture 231, or at least one or more previously reconstructed blocks, e.g., reconstructed blocks of one or more other/different previously decoded pictures 231, based on the determined inter prediction mode. For example, the video sequence may include the current picture and the previously decoded pictures 231; in other words, the current picture and the previously decoded pictures 231 may be part of, or form, the sequence of pictures forming the video sequence.
For example, encoder 20 may be configured to select a reference block from a plurality of reference blocks of the same or different pictures of a plurality of other pictures (reference pictures), and provide the reference picture and/or an offset (spatial offset) between a position (X, Y coordinates) of the reference block and a position of a current block to a motion estimation unit (not shown in fig. 2) as the inter prediction parameter. This offset is also called Motion Vector (MV).
The motion compensation unit is used to acquire inter prediction parameters and perform inter prediction based on or using the inter prediction parameters to acquire the inter prediction block 245. The motion compensation performed by the motion compensation unit (not shown in fig. 2) may involve fetching or generating a prediction block (predictor) based on motion/block vectors determined by motion estimation (possibly performing interpolation of sub-pixel accuracy). Interpolation filtering may generate additional pixel samples from known pixel samples, potentially increasing the number of candidate prediction blocks available for encoding a picture block. Upon receiving the motion vector for the PU of the current picture block, motion compensation unit 246 may locate the prediction block to which the motion vector points in a reference picture list. Motion compensation unit 246 may also generate syntax elements associated with the blocks and video slices for use by decoder 30 in decoding the picture blocks of the video slices.
Specifically, the inter-prediction unit 244 may transmit syntax elements to the entropy encoding unit 270, where the syntax elements include, for example, inter-prediction parameters (such as indication information of an inter-prediction mode selected for current block prediction after traversing a plurality of inter-prediction modes), index numbers of candidate motion vector lists, and optionally GBi index numbers, reference frame indexes, and the like. In a possible application scenario, if the inter prediction mode is only one, the inter prediction parameter may not be carried in the syntax element, and the decoder 30 may directly use the default prediction mode for decoding. It is appreciated that the inter prediction unit 244 may be used to perform any combination of inter prediction techniques.
The intra prediction unit 254 is used to obtain, e.g., receive, the picture block 203 (the current picture block) and one or more previously reconstructed blocks of the same picture, e.g., reconstructed neighboring blocks, for intra estimation. For example, encoder 20 may be configured to select an intra prediction mode from a plurality of (predetermined) intra prediction modes.
Embodiments of encoder 20 may be used to select an intra-prediction mode based on optimization criteria, such as based on a minimum residual (e.g., the intra-prediction mode that provides a prediction block 255 most similar to current picture block 203) or minimum rate distortion.
The intra prediction unit 254 is further adapted to determine an intra prediction block 255 based on intra prediction parameters like the selected intra prediction mode. In any case, after the intra-prediction mode for the block is selected, the intra-prediction unit 254 is also configured to provide the intra-prediction parameters, i.e., information indicating the selected intra-prediction mode for the block, to the entropy encoding unit 270. In one example, intra-prediction unit 254 may be used to perform any combination of intra-prediction techniques.
Specifically, the intra-prediction unit 254 may transmit a syntax element including an intra-prediction parameter (such as indication information of an intra-prediction mode selected for the current block prediction after traversing a plurality of intra-prediction modes) to the entropy encoding unit 270. In a possible application scenario, if there is only one intra prediction mode, then the intra prediction parameters may not be carried in the syntax element, and the decoder 30 may directly use the default prediction mode for decoding.
The entropy encoding unit 270 is configured to apply an entropy encoding algorithm or scheme (e.g., a variable length coding (variable length coding, VLC) scheme, a context adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, context adaptive binary arithmetic coding (context adaptive binary arithmetic coding, CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (probability interval partitioning entropy, PIPE) coding, or another entropy encoding method or technique) to one or all of the quantized residual coefficients 209, inter prediction parameters, intra prediction parameters, and/or loop filter parameters (or to none of them) to obtain encoded picture data 21 that may be output by the output 272, for example in the form of an encoded bitstream 21. The encoded bitstream may be transmitted to the decoder 30 or archived for later transmission or retrieval by the decoder 30. Entropy encoding unit 270 may also be used to entropy encode other syntax elements of the current video slice being encoded.
Other structural variations of encoder 20 may be used to encode the video stream. For example, the non-transform based encoder 20 may directly quantize the residual signal without a transform processing unit 206 for certain blocks or frames. In another embodiment, encoder 20 may have quantization unit 208 and inverse quantization unit 210 combined into a single unit.
In a particular embodiment, encoder 20 may be used to implement the inter prediction method described in the embodiment of fig. 11B, below.
It should be appreciated that other structural variations of encoder 20 may be used to encode a video stream. For example, for some image blocks or image frames, encoder 20 may directly quantize the residual signal without processing by transform processing unit 206, and accordingly without processing by inverse transform processing unit 212; alternatively, for some image blocks or image frames, encoder 20 does not generate residual data and accordingly does not need to be processed by transform processing unit 206, quantization unit 208, inverse quantization unit 210, and inverse transform processing unit 212; alternatively, encoder 20 may store the reconstructed image block directly as a reference block without processing via filter 220; alternatively, the quantization unit 208 and the inverse quantization unit 210 in the encoder 20 may be combined together. The loop filter 220 is optional, and in the case of lossless compression encoding, the transform processing unit 206, quantization unit 208, inverse quantization unit 210, and inverse transform processing unit 212 are optional. It should be appreciated that inter-prediction unit 244 and intra-prediction unit 254 may be selectively enabled depending on the different application scenarios.
Referring to fig. 3, fig. 3 shows a schematic/conceptual block diagram of an example of a decoder 30 for implementing an embodiment of the invention. Decoder 30 is for receiving encoded picture data (e.g., an encoded bitstream) 21, e.g., encoded by encoder 20, to obtain decoded picture 231. During the decoding process, decoder 30 receives video data, such as an encoded video bitstream representing picture blocks of an encoded video slice and associated syntax elements, from encoder 20.
In the example of fig. 3, decoder 30 includes entropy decoding unit 304, inverse quantization unit 310, inverse transform processing unit 312, reconstruction unit 314 (e.g., summer 314), buffer 316, loop filter 320, decoded picture buffer 330, and prediction processing unit 360. The prediction processing unit 360 may include an inter prediction unit 344, an intra prediction unit 354, and a mode selection unit 362. In some examples, decoder 30 may perform a decoding pass that is substantially reciprocal to the encoding pass described with reference to encoder 20 of fig. 2.
Entropy decoding unit 304 is used to perform entropy decoding on encoded picture data 21 to obtain, for example, quantized coefficients 309 and/or decoded encoding parameters (not shown in fig. 3), e.g., any or all of inter-prediction, intra-prediction parameters, loop filter parameters, and/or other syntax elements (decoded). Entropy decoding unit 304 is further configured to forward inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to prediction processing unit 360. Decoder 30 may receive syntax elements at the video slice level and/or the video block level.
Inverse quantization unit 310 may be functionally identical to inverse quantization unit 210, inverse transform processing unit 312 may be functionally identical to inverse transform processing unit 212, reconstruction unit 314 may be functionally identical to reconstruction unit 214, buffer 316 may be functionally identical to buffer 216, loop filter 320 may be functionally identical to loop filter 220, and decoded picture buffer 330 may be functionally identical to decoded picture buffer 230.
The prediction processing unit 360 may include an inter prediction unit 344 and an intra prediction unit 354, where the inter prediction unit 344 may be similar in function to the inter prediction unit 244 and the intra prediction unit 354 may be similar in function to the intra prediction unit 254. The prediction processing unit 360 is typically used to perform block prediction and/or to obtain a prediction block 365 from the encoded data 21, as well as to receive or obtain prediction related parameters and/or information about the selected prediction mode (explicitly or implicitly) from, for example, the entropy decoding unit 304.
When a video slice is encoded as an intra-coded (I) slice, the intra prediction unit 354 of prediction processing unit 360 is used to generate a prediction block 365 for a picture block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When a video frame is encoded as an inter-coded (i.e., B or P) slice, the inter prediction unit 344 (e.g., a motion compensation unit) of prediction processing unit 360 is used to generate a prediction block 365 for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 304. For inter prediction, a prediction block may be generated from one of the reference pictures within one of the reference picture lists. Decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on the reference pictures stored in DPB 330.
The prediction processing unit 360 is configured to determine prediction information for a video block of a current video slice by parsing the motion vector and other syntax elements, and generate a prediction block for the current video block being decoded using the prediction information. In an example of this disclosure, prediction processing unit 360 uses some syntax elements received to determine a prediction mode (e.g., intra or inter prediction) for encoding video blocks of a video slice, an inter prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of a reference picture list of the slice, motion vectors for each inter-encoded video block of the slice, inter prediction state for each inter-encoded video block of the slice, and other information to decode video blocks of a current video slice. In another example of the present disclosure, syntax elements received by decoder 30 from the bitstream include syntax elements received in one or more of an adaptive parameter set (adaptive parameter set, APS), a sequence parameter set (sequence parameter set, SPS), a picture parameter set (picture parameter set, PPS), or a slice header.
Inverse quantization unit 310 may be used to inverse quantize (i.e., inverse quantize) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 304. The inverse quantization process may include using quantization parameters calculated by encoder 20 for each video block in a video stripe to determine the degree of quantization that should be applied and likewise the degree of inverse quantization that should be applied.
The inverse transform processing unit 312 is configured to apply an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to generate a residual block in the pixel domain.
A reconstruction unit 314 (e.g., a summer 314) is used to add the inverse transform block 313 (i.e., the reconstructed residual block 313) to the prediction block 365 to obtain a reconstructed block 315 in the sample domain, e.g., by adding sample values of the reconstructed residual block 313 to sample values of the prediction block 365.
Loop filter unit 320 is used (during or after the encoding cycle) to filter reconstructed block 315 to obtain filtered block 321, to smooth pixel transitions or improve video quality. In one example, loop filter unit 320 may be used to perform any combination of the filtering techniques described below. Loop filter unit 320 is intended to represent one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or other filters, such as a bilateral filter, an adaptive loop filter (adaptive loop filter, ALF), or a sharpening or smoothing filter, or a collaborative filter. Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations loop filter unit 320 may be implemented as a post-loop filter.
The decoded video blocks 321 in a given frame or picture are then stored in a decoded picture buffer 330 that stores reference pictures for subsequent motion compensation.
Decoder 30 is for outputting decoded picture 31, e.g., via output 332, for presentation to a user or for viewing by a user.
Other variations of decoder 30 may be used to decode the compressed bit stream. For example, decoder 30 may generate the output video stream without loop filter unit 320. For example, the non-transform based decoder 30 may directly inverse quantize the residual signal without an inverse transform processing unit 312 for certain blocks or frames. In another embodiment, the decoder 30 may have an inverse quantization unit 310 and an inverse transform processing unit 312 combined into a single unit.
In a particular embodiment, decoder 30 may be used to implement the inter prediction method described in the embodiment of fig. 11A, below.
It should be appreciated that other structural variations of decoder 30 may be used to decode the encoded video bitstream. For example, decoder 30 may generate an output video stream without processing by filter 320; alternatively, for some image blocks or image frames, the entropy decoding unit 304 of the decoder 30 does not decode quantized coefficients, and accordingly does not need to be processed by the inverse quantization unit 310 and the inverse transform processing unit 312. Loop filter 320 is optional; and for the case of lossless compression, the inverse quantization unit 310 and the inverse transform processing unit 312 are optional. It should be appreciated that the inter prediction unit and the intra prediction unit may be selectively enabled according to different application scenarios.
It should be understood that, in the encoder 20 and the decoder 30 of the present invention, the processing result of a certain stage may be further processed and then output to the next stage; for example, after stages such as interpolation filtering, motion vector derivation, or loop filtering, the processing result of the corresponding stage may be further processed by operations such as clipping (Clip) or shifting.
For example, the motion vector of the control point of the current image block, which is derived from the motion vector of the neighboring affine coded block, may be further processed; the present invention is not limited in this respect. For example, the value range of the motion vector is constrained to be within a certain bit width. Assuming that the allowed bit width of the motion vector is bitDepth, the range of the motion vector is -2^(bitDepth-1) to 2^(bitDepth-1) - 1, where "^" represents exponentiation. If bitDepth is 16, the value range is -32768 to 32767.
If bitDepth is 18, the value range is -131072 to 131071. The constraint may be performed in either of the following two modes:
Mode 1: remove the overflowing high-order bits of the motion vector:
ux = (vx + 2^bitDepth) % 2^bitDepth
vx = (ux >= 2^(bitDepth-1)) ? (ux - 2^bitDepth) : ux
uy = (vy + 2^bitDepth) % 2^bitDepth
vy = (uy >= 2^(bitDepth-1)) ? (uy - 2^bitDepth) : uy
For example, if the value of vx is -32769, the value obtained by the above formulas is 32767. Because values are stored in the computer as two's complements, the two's complement of -32769 is 1,0111,1111,1111,1111 (17 bits); the computer handles the overflow by discarding the high-order bit, so the value of vx becomes 0111,1111,1111,1111, i.e., 32767, which is consistent with the result obtained by the formulas.
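For illustration only, a minimal Python sketch of mode 1 follows; the default bit depth of 16 is illustrative:

    def wrap_mv_component(v, bit_depth=16):
        # Keep only the low bit_depth bits (the overflowing high bits are discarded)
        u = (v + (1 << bit_depth)) % (1 << bit_depth)
        # Reinterpret the result as a signed two's-complement value
        return u - (1 << bit_depth) if u >= (1 << (bit_depth - 1)) else u

    print(wrap_mv_component(-32769))  # -> 32767, matching the example above
    print(wrap_mv_component(32768))   # -> -32768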
Mode 2: clip the motion vector, as shown in the following formulas:
vx = Clip3(-2^(bitDepth-1), 2^(bitDepth-1) - 1, vx)
vy = Clip3(-2^(bitDepth-1), 2^(bitDepth-1) - 1, vy)
where Clip3 denotes clamping the value of z to the interval [x, y]:

Clip3(x, y, z) = x if z < x; y if z > y; z otherwise.
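For illustration only, a minimal Python sketch of mode 2 follows; unlike mode 1, it saturates the value rather than wrapping it:

    def clip3(x, y, z):
        # Clamp z to the closed interval [x, y]
        return x if z < x else (y if z > y else z)

    def clip_mv_component(v, bit_depth=16):
        return clip3(-(1 << (bit_depth - 1)), (1 << (bit_depth - 1)) - 1, v)

    print(clip_mv_component(-32769))  # -> -32768 (saturated rather than wrapped)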
referring to fig. 4, fig. 4 is a schematic structural diagram of a video decoding apparatus 400 (e.g., a video encoding apparatus 400 or a video decoding apparatus 400) according to an embodiment of the present invention. The video coding apparatus 400 is adapted to implement the embodiments described herein. In one embodiment, video coding device 400 may be a video decoder (e.g., decoder 30 of fig. 1A) or a video encoder (e.g., encoder 20 of fig. 1A). In another embodiment, video coding apparatus 400 may be one or more of the components described above in decoder 30 of fig. 1A or encoder 20 of fig. 1A.
The video coding apparatus 400 includes: an ingress port 410 and a receiver unit (Rx) 420 for receiving data; a processor, logic unit, or central processing unit (CPU) 430 for processing data; a transmitter unit (Tx) 440 and an egress port 450 for transmitting data; and a memory 460 for storing data. The video coding apparatus 400 may further include optical-to-electrical conversion components and electrical-to-optical (EO) components coupled to the ingress port 410, the receiver unit 420, the transmitter unit 440, and the egress port 450, for the egress or ingress of optical or electrical signals.
The processor 430 is implemented in hardware and software. Processor 430 may be implemented as one or more CPU chips, cores (e.g., multi-core processors), FPGAs, ASICs, and DSPs. Processor 430 is in communication with the ingress port 410, the receiver unit 420, the transmitter unit 440, the egress port 450, and the memory 460. The processor 430 includes a coding module 470 (e.g., an encoding module 470 or a decoding module 470). The encoding/decoding module 470 implements the embodiments disclosed herein to implement the chroma block prediction methods provided by the embodiments of the present invention. For example, the encoding/decoding module 470 implements, processes, or provides various encoding operations. Thus, the encoding/decoding module 470 provides a substantial improvement to the functionality of the video coding apparatus 400 and affects the switching of the video coding apparatus 400 between different states. Alternatively, the encoding/decoding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.
The memory 460 includes one or more disks, tape drives, and solid state drives, and may be used as an overflow data storage device to store programs when such programs are selected for execution and to store instructions and data read during program execution. The memory 460 may be volatile and/or non-volatile, and may be read-only memory (ROM), random access memory (RAM), ternary content-addressable memory (TCAM), and/or static random access memory (SRAM).
Referring to fig. 5, fig. 5 is a simplified block diagram of an apparatus 500 that may be used as either or both of the source device 12 and the destination device 14 in fig. 1A, according to an example embodiment. The apparatus 500 may implement the techniques of the present invention. In other words, fig. 5 is a schematic block diagram of one implementation of an encoding device or decoding device (simply referred to as decoding device 500) of an embodiment of the present invention. The decoding device 500 may include, among other things, a processor 510, a memory 530, and a bus system 550. The processor is connected with the memory through the bus system, the memory is used for storing instructions, and the processor is used for executing the instructions stored by the memory. The memory of the decoding device stores program code, and the processor may invoke the program code stored in the memory to perform the various video encoding or decoding methods described herein. To avoid repetition, a detailed description is not provided herein.
In an embodiment of the present invention, the processor 510 may be a central processing unit (Central Processing Unit, abbreviated as "CPU"), and the processor 510 may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 530 may include a Read Only Memory (ROM) device or a Random Access Memory (RAM) device. Any other suitable type of storage device may also be used as memory 530. Memory 530 may include code and data 531 accessed by processor 510 using bus 550. Memory 530 may further include an operating system 533 and an application 535, which application 535 includes at least one program that allows processor 510 to perform the video encoding or decoding methods described herein. For example, applications 535 may include applications 1 through N, which further include video encoding or decoding applications (simply video coding applications) that perform the video encoding or decoding methods described in this disclosure.
The bus system 550 may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of illustration, the various buses are labeled in the figure as bus system 550.
Optionally, the decoding device 500 may also include one or more output devices, such as a display 570. In one example, the display 570 may be a touch sensitive display that incorporates a display with a touch sensitive unit operable to sense touch input. A display 570 may be connected to processor 510 via bus 550.
Although processor 510 and memory 530 of apparatus 500 are depicted in fig. 5 as being integrated in a single unit, other configurations may also be used. The operations of processor 510 may be distributed among a plurality of directly couplable machines, each having one or more processors, or distributed in a local area or other network. Memory 530 may be distributed among multiple machines, such as network-based memory or memory in multiple machines running apparatus 500. Although depicted here as a single bus, the bus 550 of the apparatus 500 may be formed of multiple buses. Further, memory 530 may be coupled directly to other components of apparatus 500 or may be accessible through a network, and may comprise a single integrated unit, such as one memory card, or multiple units, such as multiple memory cards. Thus, the apparatus 500 may be implemented in a variety of configurations.
In order to better understand the technical solution of the embodiment of the present invention, the inter-frame prediction mode, the non-translational motion model, the inherited control point motion vector prediction method, the constructed control point motion vector prediction method, the advanced motion vector prediction mode based on the affine motion model, the fusion mode based on the affine motion model, the sub-block fusion mode and the generalized bi-directional prediction method related to the embodiment of the present invention are further described below.
1) Inter prediction mode. In HEVC, two inter prediction modes are used, advanced motion vector prediction (advanced motion vector prediction, AMVP) mode and fusion (merge) mode, respectively.
For the AMVP mode, the encoded blocks adjacent to the current block in the spatial or temporal domain (denoted as neighboring blocks) are traversed first, and a candidate motion vector list (which may also be referred to as a motion information candidate list) is constructed according to the motion information of each neighboring block. Then, the optimal motion vector is determined from the candidate motion vector list according to the rate-distortion cost, and the candidate motion information with the minimum rate-distortion cost is used as the motion vector predictor (motion vector predictor, MVP) of the current block. The positions of the neighboring blocks and their traversal order are predefined. The rate-distortion cost is calculated according to formula (1), where J represents the rate-distortion cost RD Cost, SAD is the sum of absolute differences (sum of absolute differences, SAD) between the predicted pixel values obtained by motion estimation using the candidate motion vector predictor and the original pixel values, R represents the code rate, and λ represents the Lagrange multiplier. The encoder side transmits the index value of the selected motion vector predictor in the candidate motion vector list and the reference frame index value to the decoder side. Further, a motion search is performed in a neighborhood centered on the MVP to obtain the actual motion vector of the current block, and the encoder side transmits the difference (motion vector difference) between the MVP and the actual motion vector to the decoder side.
J=SAD+λR (1)
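For illustration only, a minimal Python sketch of the selection according to formula (1) follows; the candidate list, rate values, and λ below are assumptions made for illustration:

    def sad(pred, orig):
        # Sum of absolute differences between predicted and original samples
        return sum(abs(p - o) for p, o in zip(pred, orig))

    def best_mvp(candidates, orig, lam):
        # Pick the candidate minimizing J = SAD + lambda * R (formula (1));
        # each candidate is (index_in_list, predicted_samples, rate_in_bits)
        return min(candidates, key=lambda c: sad(c[1], orig) + lam * c[2])

    orig = [50, 52, 49, 51]
    candidates = [
        (0, [48, 52, 50, 55], 2.0),  # hypothetical candidate 0: cheap but inaccurate
        (1, [50, 51, 49, 52], 3.0),  # hypothetical candidate 1: accurate but costlier
    ]
    print(best_mvp(candidates, orig, lam=1.0)[0])  # -> 1 (J = 5 beats J = 9)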
For the Merge mode, a candidate motion vector list is first constructed from the motion information of the encoded blocks adjacent to the current block in the spatial or temporal domain. Then, the optimal motion information is determined from the candidate motion vector list as the motion information of the current block by calculating the rate-distortion cost, and the index value (denoted as merge index, the same below) of the position of the optimal motion information in the candidate motion vector list is transmitted to the decoder side. The spatial and temporal candidate motion information of the current block is shown in fig. 6: the spatial candidate motion information comes from the 5 spatially neighboring blocks (A0, A1, B0, B1, and B2); if a neighboring block is unavailable (the neighboring block does not exist, or is not yet encoded, or the prediction mode adopted by the neighboring block is not an inter prediction mode), its motion information is not added to the candidate motion vector list. The temporal candidate motion information of the current block is obtained by scaling the MV of the corresponding-position block in the reference frame according to the picture order counts (picture order count, POC) of the reference frame and the current frame, as sketched below. First, whether the block at position T in the reference frame is available is checked; if not, the block at position C is selected.
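For illustration only, a minimal Python sketch of the temporal scaling mentioned above follows; the co-located MV is stretched by the ratio of the two POC distances, and the fixed-point rounding details of a real codec are omitted:

    def scale_temporal_mv(mv_col, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
        # Scale the co-located MV by the ratio of the two POC distances
        tb = poc_cur - poc_cur_ref  # distance from the current frame to its reference
        td = poc_col - poc_col_ref  # distance from the co-located frame to its reference
        return tuple(round(c * tb / td) for c in mv_col)

    # A co-located MV spanning 4 pictures, reused over a distance of 2 pictures:
    print(scale_temporal_mv((8, -4), 8, 6, 4, 0))  # -> (4, -2)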
Similar to the AMVP mode, the position of the neighboring block and its traversal order of the Merge mode are also predefined, and the position of the neighboring block and its traversal order may be different in different modes.
It can be seen that in both the AMVP mode and the Merge mode, a candidate motion vector list needs to be maintained. Before new motion information is added to the candidate list, it is first checked whether the same motion information already exists in the list; if it does, the motion information is not added to the list. This checking procedure is referred to as pruning of the candidate motion vector list. List pruning prevents identical motion information from appearing in the list, avoiding redundant rate-distortion cost calculations.
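For illustration only, a minimal Python sketch of the pruning check follows; motion information is reduced here to an (mv, reference index) tuple:

    def try_add_candidate(cand_list, motion_info):
        # Pruning: only append motion information that is not already in the list
        if motion_info not in cand_list:
            cand_list.append(motion_info)

    candidates = []
    try_add_candidate(candidates, ((4, -2), 0))
    try_add_candidate(candidates, ((4, -2), 0))  # identical motion information, pruned
    try_add_candidate(candidates, ((1, 3), 1))
    print(candidates)  # -> [((4, -2), 0), ((1, 3), 1)]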
In inter prediction of HEVC, all pixels in a coding block use the same motion information, and then motion compensation is performed according to the motion information to obtain a predicted value of the pixels of the coding block. However, in the encoded block, not all pixels have the same motion characteristics, and using the same motion information may lead to inaccuracy of motion compensated prediction, thereby increasing residual information.
Existing video coding standards use block-matched motion estimation based on a translational motion model and assume that the motion of all pixels in a block is consistent. However, in the real world there are many objects with non-translational motion, such as rotating objects, roller coasters turning in different directions, fireworks being set off, and some stunts in movies, especially moving objects in UGC scenes. For such objects, if the block motion compensation technique based on the translational motion model in the current coding standard is adopted, coding efficiency is greatly affected; hence non-translational motion models, such as the affine motion model, were introduced to further improve coding efficiency.
Based on this, according to the difference of motion models, the AMVP mode may be divided into an AMVP mode based on a translational model and an AMVP mode based on a non-translational model; the Merge mode may be classified into a Merge mode based on a translational model and a Merge mode based on a non-translational motion model.
2) Non-translational motion model. Non-translational motion model prediction means that the same motion model is used at the encoder and decoder sides to derive the motion information of each sub motion compensation unit in the current block, and motion compensation is performed according to the motion information of the sub motion compensation units to obtain the prediction block, thereby improving prediction efficiency. Common non-translational motion models are the 4-parameter affine motion model and the 6-parameter affine motion model.
The sub-motion compensation unit in the embodiment of the present invention may be a pixel point, or a pixel block of size N1 × N2 divided according to a specific method, where N1 and N2 are both positive integers and N1 may or may not be equal to N2.
As noted, the common non-translational motion models are the 4-parameter affine motion model and the 6-parameter affine motion model; in a possible application scenario, an 8-parameter bilinear model may also be used. Each is described separately below.
The 4-parameter affine motion model is shown in formula (2):

vx = a × x − b × y + c
vy = b × x + a × y + d (2)
The 4-parameter affine motion model can be represented by the motion vectors of two pixel points and their coordinates relative to the top-left vertex pixel of the current block; the pixel points used to represent the motion model parameters are called control points. If the top-left vertex (0, 0) and top-right vertex (W, 0) pixel points are used as control points, the motion vectors (vx0, vy0) and (vx1, vy1) of the top-left and top-right vertices of the current block are determined first, and then the motion information of each sub-motion compensation unit in the current block is obtained according to formula (3), where (x, y) is the coordinate of the sub-motion compensation unit relative to the top-left vertex pixel of the current block, and W is the width of the current block:

vx = (vx1 − vx0)/W × x − (vy1 − vy0)/W × y + vx0
vy = (vy1 − vy0)/W × x + (vx1 − vx0)/W × y + vy0 (3)
The 6-parameter affine motion model is shown in formula (4):

vx = a × x + b × y + e
vy = c × x + d × y + f (4)
The 6-parameter affine motion model can be represented by the motion vectors of three pixel points and their coordinates relative to the top-left vertex pixel of the current block. If the top-left vertex (0, 0), top-right vertex (W, 0) and bottom-left vertex (0, H) pixel points are used as control points, the motion vectors (vx0, vy0), (vx1, vy1) and (vx2, vy2) of the top-left, top-right and bottom-left vertices of the current block are determined first, and then the motion information of each sub-motion compensation unit in the current block is obtained according to formula (5), where (x, y) is the coordinate of the sub-motion compensation unit relative to the top-left vertex pixel of the current block, and W and H are the width and height of the current block respectively:

vx = (vx1 − vx0)/W × x + (vx2 − vx0)/H × y + vx0
vy = (vy1 − vy0)/W × x + (vy2 − vy0)/H × y + vy0 (5)
The 8-parameter bilinear model is shown in formula (6):

vx = a1 × x + a2 × y + a3 × x × y + a4
vy = a5 × x + a6 × y + a7 × x × y + a8 (6)

The 8-parameter bilinear model can be represented by the motion vectors of four pixel points and their coordinates relative to the top-left vertex pixel of the current coding block. If the top-left vertex (0, 0), top-right vertex (W, 0), bottom-left vertex (0, H) and bottom-right vertex (W, H) pixel points are used as control points, the motion vectors (vx0, vy0), (vx1, vy1), (vx2, vy2) and (vx3, vy3) of the top-left, top-right, bottom-left and bottom-right vertices of the current coding block are determined first, and then the motion information of each sub-motion compensation unit in the current coding block is derived according to the following formula (7), where (x, y) is the coordinate of the sub-motion compensation unit relative to the top-left vertex pixel of the current coding block, and W and H are the width and height of the current coding block respectively:

vx = (vx1 − vx0)/W × x + (vx2 − vx0)/H × y + (vx3 + vx0 − vx1 − vx2)/(W × H) × x × y + vx0
vy = (vy1 − vy0)/W × x + (vy2 − vy0)/H × y + (vy3 + vy0 − vy1 − vy2)/(W × H) × x × y + vy0 (7)
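As an illustration, the derivation of each sub-motion compensation unit's motion vector from the control point motion vectors can be sketched as follows, per formulas (3), (5) and (7) as reconstructed above; floating point is used for clarity in place of a codec's fixed-point arithmetic:

```python
# A sketch of deriving a sub-motion compensation unit's MV from the control
# point MVs of a 4-parameter, 6-parameter, or 8-parameter bilinear model.

def affine_mv(x, y, W, H, cps):
    """cps: control point MVs [(vx0, vy0), (vx1, vy1), ...]; 2 entries mean a
    4-parameter model, 3 a 6-parameter model, 4 an 8-parameter bilinear model.
    (x, y): coordinate of the unit relative to the top-left vertex pixel."""
    (vx0, vy0), (vx1, vy1) = cps[0], cps[1]
    if len(cps) == 2:                         # 4-parameter, formula (3)
        vx = (vx1 - vx0) / W * x - (vy1 - vy0) / W * y + vx0
        vy = (vy1 - vy0) / W * x + (vx1 - vx0) / W * y + vy0
    elif len(cps) == 3:                       # 6-parameter, formula (5)
        (vx2, vy2) = cps[2]
        vx = (vx1 - vx0) / W * x + (vx2 - vx0) / H * y + vx0
        vy = (vy1 - vy0) / W * x + (vy2 - vy0) / H * y + vy0
    else:                                     # 8-parameter bilinear, formula (7)
        (vx2, vy2), (vx3, vy3) = cps[2], cps[3]
        vx = ((vx1 - vx0) / W * x + (vx2 - vx0) / H * y
              + (vx3 + vx0 - vx1 - vx2) / (W * H) * x * y + vx0)
        vy = ((vy1 - vy0) / W * x + (vy2 - vy0) / H * y
              + (vy3 + vy0 - vy1 - vy2) / (W * H) * x * y + vy0)
    return vx, vy
```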
The encoded blocks predicted using affine motion models may also be referred to as affine encoded blocks. From the above, it can be seen that the affine motion model is related to the motion information of the control points of the affine coding block.
In general, the motion information of the control point of the affine coding block may be obtained using an advanced motion vector prediction (Advanced Motion Vector Prediction, AMVP) mode based on an affine motion model or a fusion (Merge) mode based on an affine motion model. Further, the motion information of the control point of the current coding block may be obtained by an inherited control point motion vector prediction method or a constructed control point motion vector prediction method.
3) Inherited control point motion vector prediction methods. The inherited control point motion vector prediction method is to determine the candidate control point motion vector of the current block by using the motion model of the adjacent coded affine coding block.
Taking the current block shown in fig. 7 as an example, the adjacent position blocks around the current block are traversed in a set order, such as A1→B1→B0→A0→B2, to find the affine coding block in which an adjacent position block of the current block is located; the control point motion information of that affine coding block is obtained, and the motion model constructed from it is then used to derive the control point motion vectors (for the Merge mode) or the control point motion vector predictors (for the AMVP mode) of the current block. The order A1→B1→B0→A0→B2 is only an example; other orders and combinations are also applicable to the present invention, and the adjacent position blocks are not limited to A1, B1, B0, A0 and B2.
An adjacent position block may be a pixel point, or a pixel block of a preset size divided according to a specific method, for example a 4x4 pixel block, a 4x2 pixel block, or a pixel block of another size, without limitation. The affine coding block is a coded block adjacent to the current block that was predicted using an affine motion model in the coding stage (also simply called an adjacent affine coding block).
The following describes the determination process of the candidate control point motion vectors of the current block, taking A1 shown in fig. 7 as an example; the other positions are handled analogously:
If the coding block where A1 is located is a 4-parameter affine coding block (i.e., the affine coding block is predicted using a 4-parameter affine motion model), the motion vector (vx4, vy4) of the top-left vertex (x4, y4) and the motion vector (vx5, vy5) of the top-right vertex (x5, y5) of that affine coding block are obtained.
Then the motion vector (vx0, vy0) of the top-left vertex (x0, y0) of the current block is obtained by calculation using the following formula (8):

vx0 = vx4 + (vx5 − vx4)/(x5 − x4) × (x0 − x4) − (vy5 − vy4)/(x5 − x4) × (y0 − y4)
vy0 = vy4 + (vy5 − vy4)/(x5 − x4) × (x0 − x4) + (vx5 − vx4)/(x5 − x4) × (y0 − y4) (8)
The motion vector (vx1, vy1) of the top-right vertex (x1, y1) of the current block is obtained by calculation using the following formula (9):

vx1 = vx4 + (vx5 − vx4)/(x5 − x4) × (x1 − x4) − (vy5 − vy4)/(x5 − x4) × (y1 − y4)
vy1 = vy4 + (vy5 − vy4)/(x5 − x4) × (x1 − x4) + (vx5 − vx4)/(x5 − x4) × (y1 − y4) (9)
the combination of the motion vector (vx 0, vy 0) of the upper left vertex (x 0, y 0) and the motion vector (vx 1, vy 1) of the upper right vertex (x 1, y 1) of the current block obtained by the affine-coded block on which A1 is located as above is a candidate control point motion vector of the current block.
If the coding block where A1 is located is a 6-parameter affine coding block (i.e., the affine coding block is predicted using a 6-parameter affine motion model), the motion vector (vx4, vy4) of the top-left vertex (x4, y4), the motion vector (vx5, vy5) of the top-right vertex (x5, y5), and the motion vector (vx6, vy6) of the bottom-left vertex (x6, y6) of that affine coding block are obtained.
Then the motion vector (vx0, vy0) of the top-left vertex (x0, y0) of the current block is calculated using the following formula (10):

vx0 = vx4 + (vx5 − vx4)/(x5 − x4) × (x0 − x4) + (vx6 − vx4)/(y6 − y4) × (y0 − y4)
vy0 = vy4 + (vy5 − vy4)/(x5 − x4) × (x0 − x4) + (vy6 − vy4)/(y6 − y4) × (y0 − y4) (10)
The motion vector (vx1, vy1) of the top-right vertex (x1, y1) of the current block is calculated using the following formula (11):

vx1 = vx4 + (vx5 − vx4)/(x5 − x4) × (x1 − x4) + (vx6 − vx4)/(y6 − y4) × (y1 − y4)
vy1 = vy4 + (vy5 − vy4)/(x5 − x4) × (x1 − x4) + (vy6 − vy4)/(y6 − y4) × (y1 − y4) (11)
The motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block is obtained by calculation using the following formula (12):

vx2 = vx4 + (vx5 − vx4)/(x5 − x4) × (x2 − x4) + (vx6 − vx4)/(y6 − y4) × (y2 − y4)
vy2 = vy4 + (vy5 − vy4)/(x5 − x4) × (x2 − x4) + (vy6 − vy4)/(y6 − y4) × (y2 − y4) (12)
The combination of the motion vector (vx0, vy0) of the top-left vertex (x0, y0), the motion vector (vx1, vy1) of the top-right vertex (x1, y1) and the motion vector (vx2, vy2) of the bottom-left vertex (x2, y2) of the current block, obtained as above based on the affine coding block where A1 is located, is a candidate control point motion vector of the current block.
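As an illustration, the inherited derivation of formulas (8) to (12) as reconstructed above can be sketched as follows; the tuple layout ((x, y), (vx, vy)) is an illustrative assumption:

```python
# A sketch of inherited control point MV derivation: the neighbouring affine
# block's control points define a motion model that is evaluated at the
# current block's control point coordinate.

def inherit_cpmv(point, cp4, cp5, cp6=None):
    """point: control point coordinate (x, y) of the current block.
    cp4/cp5/cp6: ((x, y), (vx, vy)) of the neighbour's control points;
    cp6 is None for a 4-parameter neighbour."""
    (x4, y4), (vx4, vy4) = cp4
    (x5, _y5), (vx5, vy5) = cp5
    x, y = point
    dvx = (vx5 - vx4) / (x5 - x4)   # horizontal gradient of vx
    dvy = (vy5 - vy4) / (x5 - x4)   # horizontal gradient of vy
    if cp6 is None:                 # 4-parameter neighbour: formulas (8), (9)
        vx = vx4 + dvx * (x - x4) - dvy * (y - y4)
        vy = vy4 + dvy * (x - x4) + dvx * (y - y4)
    else:                           # 6-parameter neighbour: formulas (10)-(12)
        (_x6, y6), (vx6, vy6) = cp6
        vx = vx4 + dvx * (x - x4) + (vx6 - vx4) / (y6 - y4) * (y - y4)
        vy = vy4 + dvy * (x - x4) + (vy6 - vy4) / (y6 - y4) * (y - y4)
    return vx, vy
```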
It should be noted that other motion models, candidate positions, and search traversal sequences may also be applied to the present invention, which is not described in detail in the embodiments of the present invention.
It should be noted that, the method of using other control points to represent the motion model of the neighboring and current coding blocks may also be applied to the present invention, and will not be described herein.
4) Constructed control point motion vector (constructed control point motion vectors) prediction method 1: a constructed control point motion vector prediction method for the AMVP mode based on the affine motion model.
The constructed control point motion vector prediction method combines the motion vectors of coded blocks adjacent to the control points of the current block as the motion vectors of the control points of the current affine coding block, without considering whether those adjacent coded blocks are themselves affine coding blocks. The constructed control point motion vector prediction methods based on different prediction modes (the AMVP mode based on the affine motion model and the Merge mode based on the affine motion model) are different.
The constructed control point motion vector prediction method for the AMVP mode based on the affine motion model is described below with reference to fig. 8A, in which the motion vectors of the top-left and top-right vertices of the current block are determined using the motion information of coded blocks adjacent to the current coding block. It should be noted that fig. 8A is only an example.
As shown in fig. 8A, in an embodiment (e.g., the current block is predicted using a 4-parameter affine motion model), the motion vectors of the coded blocks A2, B2 and B3 adjacent to the top-left vertex are used as candidate motion vectors for the top-left vertex of the current block, and the motion vectors of the coded blocks B1 and B0 adjacent to the top-right vertex are used as candidate motion vectors for the top-right vertex of the current block. The candidate motion vectors of the top-left and top-right vertices are combined into 2-tuples; the motion vectors of the two coded blocks in a 2-tuple can serve as a candidate control point motion vector of the current block, as shown in (13A):

{vA2, vB1}, {vA2, vB0}, {vB2, vB1}, {vB2, vB0}, {vB3, vB1}, {vB3, vB0} (13A)

where vA2 represents the motion vector of A2, vB1 the motion vector of B1, vB0 the motion vector of B0, vB2 the motion vector of B2, and vB3 the motion vector of B3.
In yet another embodiment (e.g., the current block is predicted using a 6-parameter affine motion model), as shown in fig. 8A, the motion vectors of the coded blocks A2, B2 and B3 adjacent to the top-left vertex are used as candidate motion vectors for the top-left vertex of the current block; the motion vectors of the coded blocks B1 and B0 adjacent to the top-right vertex are used as candidate motion vectors for the top-right vertex of the current block; and the motion vectors of the coded blocks A0 and A1 adjacent to the bottom-left vertex are used as candidate motion vectors for the bottom-left vertex of the current block. The candidate motion vectors of the top-left, top-right and bottom-left vertices are combined into triplets; the motion vectors of the three coded blocks in a triplet can serve as a candidate control point motion vector of the current block, as shown in (13B) and (13C):

{vA2, vB1, vA0}, {vA2, vB0, vA0}, {vB2, vB1, vA0}, {vB2, vB0, vA0}, {vB3, vB1, vA0}, {vB3, vB0, vA0} (13B)
{vA2, vB1, vA1}, {vA2, vB0, vA1}, {vB2, vB1, vA1}, {vB2, vB0, vA1}, {vB3, vB1, vA1}, {vB3, vB0, vA1} (13C)

where vA2 represents the motion vector of A2, vB1 the motion vector of B1, vB0 the motion vector of B0, vB2 the motion vector of B2, vB3 the motion vector of B3, vA0 the motion vector of A0, and vA1 the motion vector of A1.
It should be noted that other methods for combining motion vectors of control points are also applicable to the present invention, and are not described herein.
It should be noted that, the method of using other control points to represent the motion model of the neighboring and current coding blocks may also be applied to the present invention, and will not be described herein.
5) Constructed control point motion vector (constructed control point motion vectors) prediction method 2: a constructed control point motion vector prediction method for the Merge mode based on the affine motion model.
This constructed control point motion vector prediction method is described with reference to fig. 8B, in which the motion vectors of the top-left and top-right vertices of the current block are determined using the motion information of coded blocks adjacent to the current coding block. It should be noted that fig. 8B is only an example.
As shown in fig. 8B, CPk (k = 1, 2, 3, 4) represents the k-th control point. A0, A1, A2, B0, B1, B2 and B3 are spatially adjacent positions of the current block, used to predict CP1, CP2 or CP3; T is a temporally adjacent position of the current block, used to predict CP4. Assume that the coordinates of CP1, CP2, CP3 and CP4 are (0, 0), (W, 0), (0, H) and (W, H) respectively, where W and H are the width and height of the current block. Then, for each control point of the current block, the motion information can be obtained in the following order:
1. For CP1, the checking order is B2→A2→B3; if B2 is available, the motion information of B2 is used. Otherwise, A2 and B3 are checked in turn. If the motion information of none of the three positions is available, the motion information of CP1 cannot be obtained.
2. For CP2, the checking order is B0→B1; if B0 is available, CP2 uses the motion information of B0. Otherwise, B1 is checked. If the motion information of neither position is available, the motion information of CP2 cannot be obtained.
3. For CP3, the checking order is A0→A1; if A0 is available, CP3 uses the motion information of A0. Otherwise, A1 is checked. If the motion information of neither position is available, the motion information of CP3 cannot be obtained.
4. For CP4, the motion information of T is used.
Here, X being available means that the block at position X (X being A0, A1, A2, B0, B1, B2, B3 or T) has already been encoded and uses an inter prediction mode; otherwise, position X is not available. It should be noted that other methods for obtaining control point motion information are also applicable to the embodiments of the present invention and are not described here.
Then, the motion information of the control point of the current block is combined to obtain the constructed motion information of the control point.
In one embodiment (e.g., the current block is predicted using a 4-parameter affine motion model), the motion information of two control points of the current block is combined to form a binary group, which is used to construct the 4-parameter affine motion model. The two control points may be combined as {CP1, CP4}, {CP2, CP3}, {CP1, CP2}, {CP2, CP4}, {CP1, CP3} or {CP3, CP4}. For example, a 4-parameter Affine motion model constructed using the binary group of control points CP1 and CP2 may be denoted Affine(CP1, CP2).
In yet another embodiment (e.g., the current block is predicted using a 6-parameter affine motion model), the motion information of the three control points of the current block are combined to form a triplet for constructing the 6-parameter affine motion model. The three control points may be combined in the form of { CP1, CP2, CP4}, { CP1, CP2, CP3}, { CP2, CP3, CP4}, { CP1, CP3, CP4}. For example, a 6-parameter Affine motion model constructed using a triplet of CP1, CP2 and CP3 control points may be denoted as Affine (CP 1, CP2, CP 3).
In yet another embodiment (e.g., the current block is predicted using an 8-parameter bilinear model), the motion information of the four control points of the current block is combined to form a quadruple for constructing the 8-parameter bilinear model. An 8-parameter Bilinear model constructed by four groups of control points of CP1, CP2, CP3 and CP4 is recorded as Bilinear (CP 1, CP2, CP3 and CP 4).
In the embodiment of the present invention, for convenience of description, a combination of the motion information of two control points (or two encoded blocks) is simply referred to as a binary group, a combination of the motion information of three control points (or three encoded blocks) as a ternary group, and a combination of the motion information of four control points (or four encoded blocks) as a quaternary group.
These models are traversed in a preset order; if the motion information of a control point corresponding to a combined model is unavailable, the model is considered unavailable. Otherwise, the reference frame index of the model is determined and the control point motion vectors are scaled; if the scaled motion information of all control points is identical, the model is illegal. If the motion information of the control points constructing the model is available and the model is legal, the motion information of these control points is added to the motion information candidate list.
The control point motion vector scaling method is shown in formula (14):

MVs = (CurPoc − DesPoc) / (CurPoc − SrcPoc) × MV (14)

where CurPoc represents the POC number of the current frame, DesPoc the POC number of the reference frame of the current block, SrcPoc the POC number of the reference frame of the control point, MVs the scaled motion vector, and MV the motion vector of the control point.
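As an illustration, the scaling of formula (14) can be sketched as follows; floating point is used for clarity while a codec would use fixed-point scaling factors:

```python
# A minimal sketch of POC-based control point MV scaling, formula (14).

def scale_mv(mv, cur_poc, des_poc, src_poc):
    """Scale a control point MV from its own reference frame (src_poc) to the
    reference frame of the current block (des_poc)."""
    factor = (cur_poc - des_poc) / (cur_poc - src_poc)
    return (mv[0] * factor, mv[1] * factor)
```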
In a possible embodiment, in the candidate list construction process, the construction device (such as an encoder or a decoder) determines whether reference frame indexes of all control points in the optional motion information combination are the same; in case the reference frame index of all control points in the optional motion information combination is the same, the construction means adds the optional motion information combination to the motion information candidate list. Compared with the method described in the embodiment of fig. 8B, the construction device does not perform judgment of the motion vector of the control point in the combination and scaling of the motion vector, so that the problem of higher computational complexity caused by scaling of the motion vector is solved. That is, in such a scene, the reference frame indexes of all control points in the optional motion information combination satisfying the preset condition are the same.
In a possible embodiment, in the candidate list construction process, the construction device (such as an encoder or a decoder) determines whether reference frame indexes of all control points in the optional motion information combination are the same; in the case that the reference frame indexes of all control points in the optional motion information combination are the same, the construction device judges whether the optional motion vectors of all control points in the optional motion information combination are the same or not; if the optional motion vectors of all control points in the optional motion information combination are different, the construction means stores the optional motion information combination in a motion information candidate list. Compared with the method described in the embodiment of fig. 8B, the construction device does not perform scaling of the motion vector, and solves the problem of higher computational complexity caused by scaling of the motion vector. That is, in such a scene, reference frame indexes of all control points in the optional motion information combination satisfying the preset condition are the same, and optional motion vectors of at least two control points are different.
In a possible embodiment, in the candidate list construction process, the optional motion information combination generated by the construction device (such as an encoder or a decoder) may further include at least optional motion information of a first control point and optional motion information of a second control point, where the first control point and the second control point are adjacent control points of the current image block. That is, the optional motion information combination may not include only the optional motion information of the diagonal control point of the current image block.
It should be noted that, in the embodiment of the present invention, the combination of different control points may also be converted into the control points at the same location.
For example, 4-parameter affine motion models obtained from the combinations {CP1, CP4}, {CP2, CP3}, {CP2, CP4}, {CP1, CP3} and {CP3, CP4} are converted to be represented by {CP1, CP2} or {CP1, CP2, CP3}. The conversion method is to substitute the motion vectors of the control points and their coordinates into formula (2) to obtain the model parameters, and then substitute the coordinates of {CP1, CP2} into formula (3) to obtain their motion vectors.
More directly, the conversion can be performed according to the following formulas (15) to (23), where W represents the width of the current block and H its height; in formulas (15) to (23), (vx0, vy0) represents the motion vector of CP1, (vx1, vy1) the motion vector of CP2, (vx2, vy2) the motion vector of CP3, and (vx3, vy3) the motion vector of CP4.
Conversion of {CP1, CP2} to {CP1, CP2, CP3} can be achieved by the following formula (15), i.e., the motion vector of CP3 in {CP1, CP2, CP3} is determined by formula (15):

vx2 = vx0 − (vy1 − vy0) × H/W
vy2 = vy0 + (vx1 − vx0) × H/W (15)
Conversion of {CP1, CP3} to {CP1, CP2} or {CP1, CP2, CP3} can be achieved by the following formula (16):

vx1 = vx0 + (vy2 − vy0) × W/H
vy1 = vy0 − (vx2 − vx0) × W/H (16)
The conversion of {CP2, CP3} to {CP1, CP2} or {CP1, CP2, CP3} can be achieved by the following formula (17):

vx0 = vx1 − W × (W × (vx1 − vx2) − H × (vy1 − vy2)) / (W² + H²)
vy0 = vy1 − W × (H × (vx1 − vx2) + W × (vy1 − vy2)) / (W² + H²) (17)
The conversion of {CP1, CP4} to {CP1, CP2} can be achieved by the following formula (18), and the conversion of {CP1, CP4} to {CP1, CP2, CP3} by formulas (18) and (19):

vx1 = vx0 + W × (W × (vx3 − vx0) + H × (vy3 − vy0)) / (W² + H²)
vy1 = vy0 + W × (W × (vy3 − vy0) − H × (vx3 − vx0)) / (W² + H²) (18)

vx2 = vx0 − H × (W × (vy3 − vy0) − H × (vx3 − vx0)) / (W² + H²)
vy2 = vy0 + H × (W × (vx3 − vx0) + H × (vy3 − vy0)) / (W² + H²) (19)
conversion of { CP2, CP4} to { CP1, CP2} can be achieved by the following formula (20), conversion of { CP2, CP4} to { CP1, CP2, CP3} can be achieved by the formulas (20) and (21):
The conversion of {CP3, CP4} to {CP1, CP2} can be achieved by the following formula (22), and the conversion of {CP3, CP4} to {CP1, CP2, CP3} can be achieved by formulas (22) and (23):

vx0 = vx2 + (vy3 − vy2) × H/W
vy0 = vy2 − (vx3 − vx2) × H/W (22)

vx1 = vx3 − vx2 + vx0
vy1 = vy3 − vy2 + vy0 (23)
For example, 6-parameter affine motion models of the combinations {CP1, CP2, CP4}, {CP2, CP3, CP4} and {CP1, CP3, CP4} are converted to be represented by the control points {CP1, CP2, CP3}. The conversion method is to substitute the motion vectors of the control points and their coordinates into formula (4) to obtain the model parameters, and then substitute the coordinates of {CP1, CP2, CP3} into formula (5) to obtain their motion vectors.
More directly, the conversion may be performed according to the following formulas (24) to (26), where W represents the width of the current block and H its height; in formulas (24) to (26), (vx0, vy0) represents the motion vector of CP1, (vx1, vy1) the motion vector of CP2, (vx2, vy2) the motion vector of CP3, and (vx3, vy3) the motion vector of CP4.
Conversion of {CP1, CP2, CP4} to {CP1, CP2, CP3} can be achieved by formula (24):

vx2 = vx3 + vx0 − vx1
vy2 = vy3 + vy0 − vy1 (24)
Conversion of {CP2, CP3, CP4} to {CP1, CP2, CP3} can be achieved by formula (25):

vx0 = vx1 + vx2 − vx3
vy0 = vy1 + vy2 − vy3 (25)
The conversion of {CP1, CP3, CP4} to {CP1, CP2, CP3} can be achieved by formula (26):

vx1 = vx0 + vx3 − vx2
vy1 = vy0 + vy3 − vy2 (26)
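As an illustration, the 6-parameter conversions of formulas (24) to (26) as reconstructed above reduce to the linear relation v(CP4) = v(CP2) + v(CP3) − v(CP1) among the four corner motion vectors, which can be sketched as follows; the dictionary-based interface is an illustrative assumption:

```python
# A sketch of conversions (24)-(26): the missing corner of any triplet
# follows from the other three under the 6-parameter model.

def convert_to_cp123(combo):
    """combo: maps 'CP1'..'CP4' to (vx, vy); exactly one of CP1..CP3 missing."""
    add = lambda a, b: (a[0] + b[0], a[1] + b[1])
    sub = lambda a, b: (a[0] - b[0], a[1] - b[1])
    if 'CP3' not in combo:    # {CP1, CP2, CP4}, formula (24)
        combo['CP3'] = sub(add(combo['CP4'], combo['CP1']), combo['CP2'])
    elif 'CP1' not in combo:  # {CP2, CP3, CP4}, formula (25)
        combo['CP1'] = sub(add(combo['CP2'], combo['CP3']), combo['CP4'])
    elif 'CP2' not in combo:  # {CP1, CP3, CP4}, formula (26)
        combo['CP2'] = sub(add(combo['CP1'], combo['CP4']), combo['CP3'])
    return combo['CP1'], combo['CP2'], combo['CP3']
```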
In a possible embodiment, after the currently constructed control point motion information has been added to the candidate motion vector list, if the length of the candidate list is smaller than the maximum list length (e.g., MaxAffineNumMrgCand), the combinations are traversed in a preset order to obtain legal combinations as candidate control point motion information. If the candidate motion vector list is empty, the candidate control point motion information is added to it; otherwise, the motion information already in the list is traversed in turn to check whether motion information identical to the candidate control point motion information exists. If no identical motion information exists in the candidate motion vector list, the candidate control point motion information is added to it.
Illustratively, one preset order is as follows: Affine(CP1, CP2, CP3) → Affine(CP1, CP2, CP4) → Affine(CP1, CP3, CP4) → Affine(CP2, CP3, CP4) → Affine(CP1, CP2) → Affine(CP1, CP3) → Affine(CP2, CP3) → Affine(CP1, CP4) → Affine(CP2, CP4) → Affine(CP3, CP4), a total of 10 combinations.
If the control point motion information corresponding to a combination is unavailable, the combination is considered unavailable. If the combination is available, its reference frame index is determined (for two control points, the smaller reference frame index is selected as the reference frame index of the combination; for more than two control points, the reference frame index with the largest occurrence count is selected, and if several indexes occur equally often, the smallest of them is selected), and the control point motion vectors are scaled. If the scaled motion information of all control points is identical, the combination is illegal.
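As an illustration, the reference frame index selection rule for a combination can be sketched as follows; this is a reading of the rule above, not a normative procedure:

```python
# Two control points -> the smaller index; more than two -> the most frequent
# index, with ties broken toward the smaller index.
from collections import Counter

def combo_ref_idx(ref_indices):
    if len(ref_indices) == 2:
        return min(ref_indices)
    counts = Counter(ref_indices)
    most = max(counts.values())
    return min(idx for idx, n in counts.items() if n == most)
```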
6) Advanced temporal motion vector prediction (advanced temporal motion vector prediction, ATMVP) method. In inter prediction of HEVC, all pixels in a Coding Unit (CU) use the same motion information for motion compensation to obtain a predicted value of a pixel in the CU. However, the pixels in the CU do not necessarily all have the same motion characteristics, and predicting all pixels in the CU using the same motion information may reduce the accuracy of motion compensation. And the ATMVP method is beneficial to improving the accuracy of motion compensation.
Taking fig. 9 as an example, the process of inter-prediction of the current image using the ATMVP technique mainly includes: determining an offset motion vector of a current block to be processed in a current coded image; determining a corresponding sub-block of the sub-block to be processed in a corresponding reference image (target image) according to the position of the sub-block to be processed in the current block to be processed and the offset motion vector; determining the motion vector of the current sub-block to be processed according to the motion vector of the corresponding sub-block; and performing motion compensation prediction on the sub-block to be processed according to the motion vector of the sub-block to be processed to obtain a more accurate predicted pixel value of the sub-block to be processed. It will be appreciated that more accurate predicted pixel values for the current block to be processed may be further obtained based on the above-described process.
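As an illustration, the ATMVP steps above can be sketched at a high level as follows; mv_field_at and scale_mv stand in for reading the target image's stored motion field and for POC-based MV scaling, and are assumptions rather than a standard-defined API:

```python
# A high-level sketch of ATMVP sub-block MV derivation over simplified data.

def atmvp_sub_block_mvs(sub_positions, offset_mv, mv_field_at, scale_mv):
    """sub_positions: (x, y) of each sub-block of the current block to be
    processed; offset_mv: the block's offset motion vector."""
    mvs = {}
    for (x, y) in sub_positions:
        cx, cy = x + offset_mv[0], y + offset_mv[1]   # corresponding sub-block
        mvs[(x, y)] = scale_mv(mv_field_at(cx, cy))   # derive sub-block MV
    return mvs
```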
7) PLANAR mode. The PLANAR method uses two linear filters in the horizontal and vertical directions, and takes the average value of the two as the predicted value of the current block pixel. The method can enable the predicted pixel value to change smoothly, and improves the subjective quality of the image.
Taking fig. 10 as an example, the motion information of the upper spatial neighboring position, the left spatial neighboring position, the right position, and the lower position of each sub-block (sub-coding unit) of the current block is obtained by using the PLANAR method, and the average value is calculated and converted into the motion information of each sub-block.
Specifically, for a sub-block with coordinates (x, y), the sub-block motion vector P(x, y) is obtained from the horizontal interpolation motion vector Ph(x, y) and the vertical interpolation motion vector Pv(x, y), as shown in formula (27):

P(x, y) = (H × Ph(x, y) + W × Pv(x, y) + H × W) / (2 × H × W) (27)

The horizontal interpolation motion vector Ph(x, y) and the vertical interpolation motion vector Pv(x, y) are calculated from the motion vectors on the left, right, upper and lower sides of the current sub-block, as shown in formulas (28) and (29):

Ph(x, y) = (W − 1 − x) × L(−1, y) + (x + 1) × R(W, y) (28)
Pv(x, y) = (H − 1 − y) × A(x, −1) + (y + 1) × B(x, H) (29)

where L(−1, y) and R(W, y) represent the motion vectors at the left and right positions of the current sub-block, and A(x, −1) and B(x, H) represent the motion vectors at the upper and lower positions of the current sub-block.
The left motion vector L and the above motion vector A are derived from the spatially adjacent blocks of the current coding block: the motion vectors L(−1, y) and A(x, −1) of the coded blocks at the preset positions (−1, y) and (x, −1) are obtained according to the sub-block coordinates (x, y).
The right motion vector R(W, y) and the bottom motion vector B(x, H) are extracted as follows: first, the temporal motion information BR at the bottom-right position of the current coding block is extracted; then the right motion vector R(W, y) is obtained by weighting the motion vector AR of the top-right spatially adjacent position and the temporal motion information BR of the bottom-right position, as shown in the following formula (30):

R(W, y) = ((H − y − 1) × AR + (y + 1) × BR) / H (30)
The bottom motion vector B(x, H) is obtained by weighting the motion vector BL of the bottom-left spatially adjacent position and the temporal motion information BR of the bottom-right position, as shown in the following formula (31):

B(x, H) = ((W − x − 1) × BL + (x + 1) × BR) / W (31)
all motion vectors used in the above calculations are scaled to point to the first reference frame in a particular reference frame queue.
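As an illustration, the PLANAR interpolation of formulas (27) to (31) as reconstructed above can be sketched as follows; L, A, AR, BL and BR are the neighbouring motion vectors named in the text, passed in as plain (vx, vy) tuples for the given sub-block:

```python
# A sketch of PLANAR sub-block motion vector interpolation.

def planar_sub_block_mv(x, y, W, H, L, A, AR, BL, BR):
    def blend(w0, v0, w1, v1, denom):
        return ((w0 * v0[0] + w1 * v1[0]) / denom,
                (w0 * v0[1] + w1 * v1[1]) / denom)
    R = blend(H - y - 1, AR, y + 1, BR, H)            # formula (30)
    B = blend(W - x - 1, BL, x + 1, BR, W)            # formula (31)
    Ph = ((W - 1 - x) * L[0] + (x + 1) * R[0],        # formula (28)
          (W - 1 - x) * L[1] + (x + 1) * R[1])
    Pv = ((H - 1 - y) * A[0] + (y + 1) * B[0],        # formula (29)
          (H - 1 - y) * A[1] + (y + 1) * B[1])
    return ((H * Ph[0] + W * Pv[0] + H * W) / (2 * H * W),   # formula (27)
            (H * Ph[1] + W * Pv[1] + H * W) / (2 * H * W))
```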
8) Advanced motion vector prediction mode based on the affine motion model (Affine AMVP mode). For the Affine AMVP mode, the candidate motion vector list of the AMVP mode based on the affine motion model may be constructed using the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method. In the embodiment of the present invention, this list may be referred to as the control point motion vector predictor candidate list (control point motion vectors predictor candidate list), where each control point motion vector predictor in the list includes the motion vectors of 2 control points (e.g., in the case that the current block uses a 4-parameter affine motion model) or of 3 control points (e.g., in the case that the current block uses a 6-parameter affine motion model).
In a possible application scenario, the candidate list of the motion vector predicted values of the control points can be pruned and ordered according to a specific rule, and can be truncated or filled to a specific number.
Then, at the encoding end, the encoder (such as the encoder 20 described above) uses each control point motion vector predictor in the control point motion vector predictor candidate list to obtain, through formula (3), (5) or (7), the motion vector of each sub-motion compensation unit in the current coding block, and further obtains the pixel value at the position in the reference frame pointed to by the motion vector of each sub-motion compensation unit; this pixel value serves as the predictor for motion compensation with the affine motion model. The average of the differences between the original and predicted values of every pixel in the current coding block is calculated, and the control point motion vector predictor with the smallest average difference is selected as the optimal control point motion vector predictor and used as the motion vector predictors of the 2, 3 or 4 control points of the current coding block. In a possible embodiment, a motion search is then performed within a certain search range using the control point motion vector predictor as the search start point to obtain the control point motion vectors (control point motion vectors, CPMV), and the differences between the control point motion vectors and the control point motion vector predictors (control point motion vectors differences, CPMVD) are calculated. The encoder then encodes into the bitstream, and transmits to the decoding end, an index number indicating the position of the control point motion vector predictor in the control point motion vector predictor candidate list together with the CPMVD.
At the decoding end, the decoder (e.g., the decoder 30) parses the index number and the control point motion vector difference value (CPMVD) in the code stream, determines a control point motion vector predicted value (control point motion vectors predictor, CPMVP) from the control point motion vector predicted value candidate list according to the index number, and adds the CPMVP and the CPMVD to obtain the control point motion vector.
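As an illustration, the decoder-side reconstruction CPMV = CPMVP + CPMVD can be sketched as follows; the list layouts are illustrative assumptions:

```python
# A minimal sketch of Affine AMVP control point MV reconstruction.

def reconstruct_cpmvs(cpmvp_candidates, index, cpmvds):
    """cpmvp_candidates: the control point motion vector predictor candidate
    list; index: the parsed index number; cpmvds: parsed CPMVD per control
    point. One candidate holds 2 or 3 predictors (4- or 6-parameter model)."""
    cpmvps = cpmvp_candidates[index]
    return [(px + dx, py + dy)
            for (px, py), (dx, dy) in zip(cpmvps, cpmvds)]
```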
9) Fusion patterns based on affine motion models (Affine Merge mode). For Affine Merge mode, a control point motion vector fusion candidate list may be constructed using inherited control point motion vector prediction methods and/or constructed control point motion vector prediction methods (control point motion vectors merge candidate list).
In a possible application scenario, the control point motion vector fusion candidate list may be pruned and ordered according to a specific rule, and may be truncated or filled to a specific number.
At the encoding end, the encoder (e.g., the encoder 20 described above) uses each control point motion vector in the fusion candidate list to obtain, through formula (3), (5) or (7), the motion vector of each sub-motion compensation unit in the current coding block (a pixel point, or an N1×N2 pixel block divided by a specific method); it then obtains the pixel value at the position in the reference frame pointed to by the motion vector of each sub-motion compensation unit and uses it as the predictor for affine motion compensation. The average of the differences between the original and predicted values of every pixel in the current coding block is calculated, and the control point motion vectors with the smallest average difference are selected as the motion vectors of the 2, 3 or 4 control points of the current coding block. An index number indicating the position of the control point motion vectors in the candidate list is encoded into the bitstream and sent to the decoding end.
At the decoding end, the decoder (e.g., the aforementioned decoder 30) parses the index number, and determines the control point motion vector (control point motion vectors, CPMV) from the control point motion vector fusion candidate list based on the index number.
10) Sub-block fusion mode (sub-block based merging mode). On the basis of 9), the sub-block fusion mode adds candidate motion information obtained by the ATMVP method and/or the PLANAR method to the candidate list. That is, the sub-block fusion candidate list (sub-block based merging candidate list) may be constructed using the inherited control point motion vector prediction method, and/or the constructed control point motion vector prediction method, and/or the ATMVP method, and/or the PLANAR method.
In a possible application scenario, the sub-block fusion candidate list may be pruned and ordered according to a specific rule, and may be truncated or filled to a specific number.
At the encoding end, each candidate in the sub-block fusion candidate list is used as follows: if the candidate was obtained by the ATMVP or PLANAR method, the motion information of each sub-block is obtained according to the method of 6) or 7); if the candidate is an affine motion mode, the motion vector of each sub-motion compensation unit in the current coding block (a sub-block, a pixel point, or an N1×N2 pixel block divided by a specific method) is obtained from the control point motion vectors through formula (3), (5) or (7), and the pixel value at the position in the reference frame pointed to by the motion vector of each sub-motion compensation unit is taken as its predictor to perform affine motion compensation. The average of the differences between the original and predicted values of every pixel in the current coding block is calculated; the candidate with the smallest average difference is selected as the motion information of the current coding block, and an index number representing the position of this candidate motion information in the candidate list is encoded into the bitstream and sent to the decoder.
At the decoding end, the index number is parsed, and the control point motion vectors (control point motion vectors, CPMV) or, in the case of the ATMVP or PLANAR method, the motion information of the sub-blocks is determined from the sub-block fusion candidate list according to the index number.
11) Block-level generalized bi-prediction (Generalized Bi-prediction, GBi) method, which may also be referred to as a block-level weighted prediction method for bi-prediction. Bi-prediction includes first direction prediction and second direction prediction. The first direction prediction predicts a motion vector of the current block based on a reference image in the first direction, thereby obtaining a reference block (or prediction block) of the current block in the first direction, where the reference image in the first direction is one of a first reference image frame set containing a certain number of reference images. The second direction prediction predicts a second motion vector of the current block based on a reference image in the second direction, thereby obtaining a reference block (or prediction block) of the current block in the second direction, where the reference image in the second direction is one of a second reference image frame set containing a certain number of reference images. For example, the first reference image frame set is reference image list0 (reference picture list, list0) and the second reference image frame set is reference image list1 (reference picture list, list1); or, conversely, the first set is list1 and the second set is list0.
When using the bi-prediction method, there will be two reference blocks (or two prediction blocks) for the current image block, each of which needs a motion vector and a reference frame index to indicate. The bi-prediction may specifically be to select one reference image from each of the first reference image frame set and the second reference image frame set to obtain a reference block, and then determine a predicted value of a pixel point in the current image block according to pixel values of pixel points in the two reference blocks.
For example, bi-prediction may also be referred to as forward backward prediction, that is, bi-prediction includes forward prediction and backward prediction, in which case when a first direction prediction is forward prediction, then a second direction prediction is backward prediction accordingly; when the first direction prediction is a backward prediction, then the second direction prediction is correspondingly a forward prediction. For example, in one bi-prediction implementation, two encoded frame lists (e.g., list0 and List 1) may be provided, both of which may contain forward and backward multiple encoded reference frames, with the forward and backward reference blocks (prediction blocks) of the current pending image block of the current encoded image being provided by the reference frames in the two encoded frame lists.
In the GBi method, each bi-predicted image block may select a set of weight values from a plurality of sets of weight value combinations to complete the weighted prediction.
For example, at the encoding end, the predictor P_bi-pred of a bi-directionally predicted image block can be obtained by formula (32):

P_bi-pred = ((8 − w) * P0 + w * P1 + 4) >> 3 (32)

where P0 and P1 are the predicted pixels obtained by motion compensation from the first direction reference frame and the second direction reference frame respectively, w is the weighting parameter for list-1 prediction represented in 1/8 precision, i.e., the weight that the pixel value of the second direction reference frame occupies in the predictor of the bi-directionally predicted image block, and >> denotes a right shift operation.

The value of w can be selected from the value set {4, −2, 10, 3, 5}, and the GBi index numbers corresponding to the weight values {4/8, −2/8, 10/8, 3/8, 5/8} are {0, 1, 2, 3, 4} in sequence. In non-low-delay image blocks, the value of w may, for example, be selected from the value set {3, 4, 5}, and the GBi index numbers corresponding to the weight values {3/8, 4/8, 5/8} can be set to {0, 1, 2} in sequence; that is, there is a correspondence between weight values and GBi index numbers. Weighted prediction is performed according to the weight value to obtain the predictor of the bi-directionally predicted image block.
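As an illustration, the weighted prediction of formula (32) can be sketched as follows:

```python
# A minimal sketch of GBi weighted bi-prediction, formula (32); w is the
# list-1 weight in 1/8 precision, so list-0 implicitly receives (8 - w).

def gbi_bipred(p0: int, p1: int, w: int) -> int:
    """p0, p1: motion-compensated predictors of one pixel; w: e.g. one of
    {4, -2, 10, 3, 5}, selected through the GBi index number."""
    return ((8 - w) * p0 + w * p1 + 4) >> 3

# w = 4 (GBi index number 0 above) reduces to the ordinary average:
assert gbi_bipred(100, 120, 4) == 110
```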
In an embodiment of the present invention, the correspondence between w and GBi index numbers may also be directly established, for example, when the value of w may be selected from the value sets {4, -2,10,3,5}, the GBi index numbers corresponding to the value sets {4, -2,10,3,5} may be set to be {0,1,2,3,4}, in order. And carrying out weighted prediction according to the weight value, so as to obtain the predicted value of the image block of the bidirectional prediction. The technical solutions/technical ideas of the present invention are mainly described herein from the perspective of having a correspondence between weight values and GBi index numbers, and it should be understood that these technical solutions/technical ideas are also applicable to the case of "having a correspondence between w and GBi index numbers", and for brevity of description, detailed description will not be expanded herein.
In embodiments of the present invention, the GBi index number (the Generalization Bi-prediction weight index) may be used to determine the weight value of a directional reference frame (first directional reference frame or second directional reference frame) of an image block in bi-prediction of the GBi method. Specifically, in an example, the weight value corresponding to the GBi index number may be used to represent the weight occupied by the reference block pixel value of the reference image in list0 in the current block prediction value. In yet another example, the weight value corresponding to the GBi index number may be used to represent the weight that the reference block pixel values of the reference picture in list1 occupy in the current block predictor.
It should be noted that the above examples are only for explaining the scheme and are not limiting; other mappings between GBi weight values and index numbers are equally applicable to this patent. For example, in one implementation, the index numbers corresponding to {−2/8, 3/8, 4/8, 5/8, 10/8} may be {0, 1, 2, 3, 4} in order, and the index numbers corresponding to {4/8, 3/8, 5/8} may be {0, 1, 2} in order.
When the weight value of the first direction reference frame (e.g., the backward reference frame) is determined (i.e., one of the set of weight values), then the weight value of the second direction reference frame (e.g., the forward reference frame) is also determined accordingly. For example, in the non-low-delay image block, when the weight value corresponding to the backward reference frame is 3/8 (the index number is 0, for example), then the weight value of the forward reference frame is 5/8 correspondingly. Such a combination of weight values <3/8,5/8> is a set of weight value combinations. It will be appreciated that different index numbers may correspond to different weight value combinations. Each bi-predicted image block may select a set of weight values from a plurality of sets of weight value combinations to complete the weighted prediction. Specifically, the predicted value of the bi-directionally predicted image block can be calculated according to each set of weight values, and the rate distortion Cost (RD Cost) is calculated according to the original value and the predicted value of each pixel point in the image block, so that a set of weight values with the minimum rate distortion Cost is selected as a final weight value combination, and the index number related to the weight value combination is transmitted to the decoding end.
At the decoding end, in the AMVP mode, directly analyzing the code stream to obtain GBi index numbers; in the Merge mode, a candidate motion information list is required to be constructed, then motion information and GBi index numbers are obtained from the list according to Merge index, then a target weight value combination is selected from a plurality of groups of weight value combinations according to the index numbers, and weighted prediction is carried out by utilizing the weight value combination to obtain predicted pixels.
When the GBi prediction method is used, in affine motion prediction, candidate motion information of each control point of the current block is obtained by using an inherited control point motion vector prediction method or a constructed control point motion vector prediction method, and GBi index numbers corresponding to each control point are obtained at the same time and stored for subsequent motion compensation and prediction; the obtained candidate motion information, GBi index number, will also be used in subsequent other decoding processes, e.g. as motion vector prediction in neighboring block decoding process, etc.
For the inherited control point motion vector prediction method, the GBi index numbers corresponding to all control points of the current image block to be processed are derived from the same coding unit (i.e., the same adjacent coded block), so the GBi index numbers corresponding to all control points are consistent. In the constructed control point motion vector prediction method of the affine fusion mode, however, the GBi index numbers of the control points of the current image block to be processed come from different coding units (i.e., different adjacent coded blocks), so the GBi index numbers of the control points may differ, which may ultimately leave the GBi index numbers of the sub-blocks of the current block inconsistent. For example, suppose the current affine decoding block adopts a 4-parameter affine motion model and the GBi index numbers of the top-left and top-right corner sub-blocks are set to the GBi index numbers of the top-left and top-right vertex control points, which come from different coding units. If the GBi index number of the top-left sub-block is 0 and that of the top-right sub-block is 1, it cannot be judged whether the GBi index number of the current image block to be processed is 0 or 1; consequently, in the subsequent prediction of each sub-block, the weight value combination for bi-directional prediction cannot be determined, prediction becomes problematic, and the whole coding process is affected.
The embodiment of the invention provides a solution to solve the problems, ensures the normal running of the coding process when using the GBi prediction method, and improves the coding efficiency and accuracy. Some embodiments of determining GBi index numbers of current image blocks to be processed based on a constructed control point motion vector prediction method are described in detail below, and motion vectors of a plurality of control points of the current image blocks to be processed are obtained from motion vectors of a plurality of processed image blocks (a plurality of neighboring encoded blocks for an encoding end and a plurality of neighboring decoded blocks for a decoding end), respectively.
In some possible embodiments, the preset GBi index number may be used as the GBi index number of the current image block to be processed, and the weight value corresponding to the GBi index number of the image block to be processed may be used as the weight value corresponding to the reference frame of the image block to be processed. For example, if the preset GBi index number is 0 and the weight value corresponding to the GBi index number of the current image block to be processed is equal to 1/2, the weighting mode of the weighted prediction (bi-directional prediction) according to the weight value is the average weighting.
In some possible implementations, GBi index numbers for multiple control points of an image block to be processed may be obtained; in the constructed control point motion vector prediction method, GBi index numbers of the plurality of control points are GBi index numbers of processed image blocks corresponding to the control points respectively; and then determining a weight value corresponding to a reference frame in a certain direction of the current image block to be processed according to GBi index numbers of the control points. For example, if the weight value corresponding to a reference frame (for example, a backward reference frame) in a certain direction is equal to 1/2 according to GBi index numbers of the control points, the weighting mode of weighting prediction according to the weight value is average weighting. In a possible implementation scenario, the GBi index number of the current image block to be processed may be determined according to the GBi index numbers of the plurality of control points, and a weight value corresponding to the GBi index number of the image block to be processed is used as a weight value corresponding to the reference frame of the image block to be processed. For example, according to GBi index numbers of the plurality of control points, the GBi index number of the current image block to be processed is determined to be 0, and the weight value 1/2 corresponding to the GBi index number 0 determines the weight value corresponding to a reference frame (for example, a backward reference frame) in a certain direction.
In a possible implementation, the preset value may be used as a weight value of a certain direction reference frame (first direction reference frame or second direction reference frame, hereinafter the same) of the image block currently to be processed in the GBi method, for example, the preset value is 1/2. That is, the GBi index number corresponding to the candidate motion information of each control point obtained by the control point motion vector prediction method based on the structure is set as the index number (e.g., 0) corresponding to the weight value 1/2.
In a possible implementation manner, when GBi indexes of coding units corresponding to a plurality of control points of a current image block to be processed (which may be simply referred to as GBi indexes of a plurality of control points, hereinafter the same) are all the same, a weight value determined by the same GBi index is used as a weight value of a reference frame in a certain direction of the image block to be processed in a GBi method, and a GBi index corresponding to candidate motion information of a control point is set as the same GBi index; and under the condition that GBi index numbers of the control points are not identical, taking a preset value as a weight value of a reference frame in a certain direction of the image block to be processed in the GBi method, wherein the preset value is equal to 1/2, and GBi index numbers corresponding to candidate motion information of each control point are set as index numbers corresponding to the weight value of 1/2.
For example, the GBi index number corresponding to the weight value 1/2 may be defined as K, and the GBi index numbers of the candidate motion information obtained by the constructed control point motion vector prediction method may be set as follows:
if it is presentThe image block to be processed has two control points, its candidate motion information is respectively obtained from the motion information of adjacent coded blocks A and B, and GBi index numbers of adjacent coded blocks A and B are respectively I A And I B . Then can be according to I A And I B To derive GBi index numbers corresponding to candidate motion information: if I A =I B The GBi index number corresponding to the candidate motion information is I A The method comprises the steps of carrying out a first treatment on the surface of the If I A ≠I B And the GBi index number corresponding to the candidate motion information is K.
If the current image block to be processed has three control points, whose candidate motion information is obtained from the motion information of adjacent coded blocks A, B and C with GBi index numbers I_A, I_B and I_C respectively, then the GBi index numbers corresponding to the candidate motion information of the control points can be derived from I_A, I_B and I_C: if I_A = I_B = I_C, the GBi index number corresponding to the candidate motion information of each control point is I_A; if I_A, I_B and I_C are not all identical, the GBi index number corresponding to the candidate motion information of each control point is set to K.
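As an illustration, the rule of this implementation (inherit the common GBi index number, otherwise fall back to K) can be sketched as follows:

```python
# A minimal sketch: the candidate inherits the control points' GBi index
# number only when they all agree; otherwise K, the index number of weight
# value 1/2, is used.

def gbi_index_for_candidate(cp_indices: list, K: int) -> int:
    """cp_indices: GBi index numbers of the coding units the control points
    are taken from (two or three entries in the examples above)."""
    first = cp_indices[0]
    return first if all(i == first for i in cp_indices) else K
```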
In a possible implementation, if some of the GBi index numbers of the plurality of control points are the same, the weight value determined by the GBi index number with the largest occurrence count is used as the weight value of a reference frame in a certain direction of the image block to be processed in the GBi method, and the GBi index number corresponding to the candidate motion information of the control points is set to that GBi index number; if the GBi index numbers of the plurality of control points are all different from one another, a preset value is used as the weight value of a reference frame in a certain direction of the image block to be processed, where the preset value is 1/2, and the GBi index numbers corresponding to the candidate motion information of the control points are set to the index number corresponding to the weight value 1/2.
For example, the GBi index number corresponding to the weight value 1/2 may be defined as K, and the GBi index numbers of the candidate motion information obtained by the constructed control point motion vector prediction method may be set as follows:
if it is presentThe image block to be processed has two control points, its candidate motion information is respectively obtained from the motion information of adjacent coded blocks A and B, and GBi index numbers of A and B are respectively I A And I B . According to I A And I B To derive GBi index numbers corresponding to candidate motion information: if I A =I B The GBi index number corresponding to the candidate motion information is I A The method comprises the steps of carrying out a first treatment on the surface of the If I A ≠I B And the GBi index number corresponding to the candidate motion information is K.
If the current image block to be processed has three control points, whose candidate motion information is obtained from the motion information of adjacent coded blocks A, B and C with GBi index numbers I_A, I_B and I_C respectively, then the GBi index numbers corresponding to the candidate motion information can be derived from I_A, I_B and I_C: if some of I_A, I_B and I_C are equal, the GBi index number corresponding to the candidate motion information of each control point is the index number with the highest occurrence count; if I_A, I_B and I_C are all different from one another, the GBi index number corresponding to the candidate motion information of each control point is set to K.
In a possible implementation manner, when a weight value equal to a preset value exists among the plurality of weight values determined by the GBi index numbers of the plurality of control points, the preset value is used as the weight value of the reference frame in a certain direction of the image block to be processed in the GBi method, and the GBi index number corresponding to the candidate motion information of the control points is set to the index number corresponding to that weight value. When all the weight values determined by the GBi index numbers of the plurality of control points differ from the preset value, the weight value with the smallest difference from the preset value is used as the weight value of the reference frame of the image block to be processed in the GBi method, and the GBi index number corresponding to the candidate motion information of the control points is set to the GBi index number corresponding to that weight value.
For example, the GBi index number corresponding to the weight value 1/2 may be defined as K, and the GBi index number of the candidate motion information obtained by the constructed control point motion vector prediction method may be set as follows.
If the current image block to be processed has two control points, its candidate motion information is the motion information of adjacent coded blocks A and B, whose GBi index numbers are I_A and I_B respectively. The GBi index number corresponding to the candidate motion information is then derived from I_A and I_B: if one or both of I_A and I_B equal K, the GBi index number corresponding to the candidate motion information of the control points is set to K; if neither I_A nor I_B equals K, the GBi index number corresponding to the candidate motion information is set to the index number corresponding to whichever weight value is closest to 1/2.
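A sketch of this weight-based rule follows; the table mapping GBi index numbers to weight values is a made-up example, not taken from the patent.

```python
# Hypothetical table: GBi index number -> weight of one reference direction.
WEIGHTS = {0: 1/2, 1: 3/8, 2: 5/8, 3: 1/4, 4: 3/4}
K = 0  # index number of the preset weight value 1/2

def derive_gbi_index_by_weight(control_point_indices):
    """If any control point already carries the preset weight 1/2, use K;
    otherwise pick the index whose weight is closest to 1/2."""
    if any(WEIGHTS[i] == 1/2 for i in control_point_indices):
        return K
    return min(control_point_indices, key=lambda i: abs(WEIGHTS[i] - 1/2))

print(derive_gbi_index_by_weight([1, 0]))  # I_B equals K          -> 0
print(derive_gbi_index_by_weight([3, 1]))  # 3/8 is closest to 1/2 -> 1
```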
In a possible implementation manner, in a case that an average value of a plurality of weight values determined by GBi index numbers of the plurality of control points is a preset value, the preset value is used as a weight value of a reference frame in a certain direction of the image block to be processed in the GBi method. And under the condition that the average value of a plurality of weight values determined by the GBi index numbers of the plurality of control points is not equal to a preset value, taking the weight value with the smallest difference value with the preset value in the plurality of weight values as the weight value of the reference frame of the image block to be processed in the GBi method, and setting the GBi index number corresponding to the candidate motion information of the control point as the GBi index number corresponding to the weight value with the smallest difference value.
For example, the GBi index number corresponding to the weight value 1/2 may be defined as K, and the GBi index number of the candidate motion information obtained by the constructed control point motion vector prediction method may be set as follows.
If the current image block to be processed has two control points, its candidate motion information is the motion information of adjacent coded blocks A and B, whose GBi index numbers are I_A and I_B respectively. The GBi index number corresponding to the candidate motion information is then derived from I_A and I_B: if the average of the weight values corresponding to I_A and I_B is 1/2, the GBi index number in the candidate motion information list is set to K; if the average of the weight values corresponding to I_A and I_B is not 1/2, the GBi index number corresponding to the candidate motion information is set to the index number corresponding to whichever weight value is closer to 1/2.
In a possible implementation manner, when a plurality of weight values determined by GBi index numbers of the plurality of control points are all different from a preset value, and an average value of at least two weight values in the plurality of weight values is equal to the preset value, the preset value is taken as a weight value of a reference frame in a certain direction of the image block to be processed in the GBi method.
For example, the GBi index number corresponding to the weight value 1/2 may be defined as K, and the GBi index number of the candidate motion information obtained by the constructed control point motion vector prediction method may be set as follows.
If the current image block to be processed has three control points, its candidate motion information is obtained from the motion information of adjacent coded blocks A, B and C respectively, and the GBi index numbers of A, B and C are I_A, I_B and I_C respectively. The GBi index number corresponding to the candidate motion information is derived from I_A, I_B and I_C: if one or more of I_A, I_B and I_C equal K, the GBi index number in the candidate motion information list may be set to K; if none of I_A, I_B and I_C equals K but the average of at least two of the corresponding weight values is 1/2 (for example, the average of the weight values corresponding to I_A and I_B equals 1/2), the GBi index number corresponding to the candidate motion information of each control point is set to K; otherwise, the GBi index number corresponding to the candidate motion information of each control point is set to the index number corresponding to whichever weight value is closer to 1/2.
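For the three-control-point variant with the pairwise-average condition, a sketch under the same assumed weight table could look like this:

```python
from itertools import combinations

WEIGHTS = {0: 1/2, 1: 3/8, 2: 5/8, 3: 1/4, 4: 3/4}  # assumed table
K = 0  # index number of the preset weight value 1/2

def derive_gbi_index_pairwise(control_point_indices):
    """K if any index is K or if two of the weights average to 1/2;
    otherwise the index whose weight is closest to 1/2."""
    weights = [WEIGHTS[i] for i in control_point_indices]
    if any(w == 1/2 for w in weights):
        return K
    if any((a + b) / 2 == 1/2 for a, b in combinations(weights, 2)):
        return K
    return min(control_point_indices, key=lambda i: abs(WEIGHTS[i] - 1/2))

print(derive_gbi_index_pairwise([1, 2, 3]))  # 3/8 and 5/8 average to 1/2 -> 0
print(derive_gbi_index_pairwise([1, 1, 3]))  # no pair averages to 1/2    -> 1
```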
It should be noted that the examples in the above embodiments (for example, a GBi index number of 0 or a weight value of 1/2) are used to explain the technical solution of the present application and are not limiting.
It should be noted that other similar schemes may also be implemented based on the above technical ideas. In addition, in a specific application scenario, one or more of the above schemes may be combined.
It can be seen that, when the GBi index numbers of the control points of a current block differ, implementing the scheme of the embodiment of the present invention allows the GBi index number corresponding to the candidate motion information of each control point to be determined rapidly, thereby ensuring the normal running of the bidirectional prediction coding process and improving coding efficiency and accuracy.
In the embodiment of the present invention, the encoding end may indicate the inter prediction mode of the current block and other related information to the decoding end using syntax elements.
Table 1 exemplarily shows part of a syntax structure commonly used at present for parsing the inter prediction mode of the current block. It should be noted that the syntax elements in the syntax structure may also be represented by other identifiers, which is not specifically limited in the present invention.
TABLE 1
(The body of Table 1 is rendered as an image in the source and is not reproduced here; its syntax elements are described below.)
In Table 1, ae(v) denotes a syntax element encoded using context-based adaptive binary arithmetic coding (CABAC).
The syntax element merge_flag[x0][y0] may be used to indicate whether the merge mode is employed for the current block. For example, merge_flag[x0][y0] = 1 indicates that the merge mode is adopted for the current block, and merge_flag[x0][y0] = 0 indicates that it is not. x0, y0 represent the coordinates of the current block in the video image.
The syntax element merge_subblock_flag[x0][y0] may be used to indicate whether the sub-block-based merge mode is employed for the current block; it applies when the type (slice_type) of the slice where the current block is located is P or B. For example, merge_subblock_flag[x0][y0] = 1 indicates that the sub-block-based merge mode is adopted for the current block, and merge_subblock_flag[x0][y0] = 0 indicates that it is not, in which case the merge mode of the translational motion model may be adopted.
The syntax element merge_idx[x0][y0] may be used to indicate an index value for the merge candidate list.
The syntax element merge_subblock_idx[x0][y0] may be used to indicate an index value for the sub-block-based merge candidate list.
The syntax element inter_affine_flag[x0][y0] may be used to indicate whether the AMVP mode based on an affine motion model is employed for the current block when the slice in which the current block is located is a P-type or B-type slice.
The syntax element cu_affine_type_flag[x0][y0] may be used to indicate whether motion compensation for the current block employs a 6-parameter affine motion model when the slice where the current block is located is a P-type or B-type slice.
cu_affine_type_flag[x0][y0] = 0 indicates that motion compensation for the current block does not use a 6-parameter affine motion model, and only a 4-parameter affine motion model may be used; cu_affine_type_flag[x0][y0] = 1 indicates that motion compensation for the current block uses a 6-parameter affine motion model.
Referring to Table 2, MotionModelIdc[x0][y0] = 1 indicates that a 4-parameter affine motion model is used, MotionModelIdc[x0][y0] = 2 indicates that a 6-parameter affine motion model is used, and MotionModelIdc[x0][y0] = 0 indicates that a translational motion model is used.
TABLE 2
MotionModelIdc[x0][y0]    motion model for motion compensation
0                         translational motion
1                         4-parameter affine motion
2                         6-parameter affine motion
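For illustration only, one plausible way a decoder could derive MotionModelIdc from the two flags described above is sketched below; the exact derivation logic is an assumption, not quoted from the patent.

```python
def motion_model_idc(inter_affine_flag: int, cu_affine_type_flag: int) -> int:
    """0: translational, 1: 4-parameter affine, 2: 6-parameter affine."""
    if not inter_affine_flag:
        return 0                         # no affine mode: translational model
    return 2 if cu_affine_type_flag else 1

print(motion_model_idc(0, 0))  # -> 0
print(motion_model_idc(1, 0))  # -> 1
print(motion_model_idc(1, 1))  # -> 2
```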
In Table 1, the variables MaxNumMergeCand and MaxNumSubblockMergeCand indicate the maximum lengths of the constructed candidate motion vector lists. inter_pred_idc[x0][y0] is used to indicate the prediction direction, where PRED_L1 indicates backward prediction and PRED_L0 indicates forward prediction. num_ref_idx_l0_active_minus1 indicates the number of reference frames in the forward reference frame list, and ref_idx_l0[x0][y0] indicates the forward reference frame index value of the current block. mvd_coding(x0, y0, 0) indicates the first motion vector difference. mvp_l0_flag[x0][y0] indicates the forward MVP candidate list index value. num_ref_idx_l1_active_minus1 indicates the number of reference frames in the backward reference frame list, ref_idx_l1[x0][y0] indicates the backward reference frame index value of the current block, and mvp_l1_flag[x0][y0] indicates the backward MVP candidate list index value.
Based on the above description, the inter prediction method provided by the embodiment of the present invention is further described below from the viewpoint of the decoding end. Referring to fig. 11A, the method includes, but is not limited to, the following steps:
S601: parse the code stream and determine the inter prediction mode of the current image block to be processed (also called the current decoding block or the current block).
For example, the bitstream may be parsed based on the syntax structure shown in table 1, thereby determining an inter prediction mode of the current block.
If it is determined that the inter prediction mode of the current block is the AMVP mode based on an affine motion model, i.e., the syntax elements merge_flag = 0 and inter_affine_flag = 1, S602a-S606a are performed subsequently.
If it is determined that the inter prediction mode of the current block is the merge mode based on an affine motion model, i.e., the syntax elements merge_flag = 1 and affine_merge_flag = 1, S602b-S605b are performed subsequently.
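The branching just described can be summarized by a small dispatch sketch; the flag names follow the syntax elements above, while the helper itself is illustrative.

```python
def select_prediction_path(merge_flag: int, affine_flag: int) -> str:
    """Map the parsed flags to the decoding path of fig. 11A."""
    if merge_flag and affine_flag:
        return "affine merge mode"   # proceed with S602b-S605b
    if not merge_flag and affine_flag:
        return "affine AMVP mode"    # proceed with S602a-S606a
    return "non-affine mode"         # outside the scope of this example

print(select_prediction_path(0, 1))  # -> affine AMVP mode
print(select_prediction_path(1, 1))  # -> affine merge mode
```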
S602a: and constructing a candidate motion vector list corresponding to the AMVP mode based on the affine motion model.
In the embodiment of the present invention, the candidate control point motion vectors of the current block are derived using the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method, and added to the candidate motion vector list corresponding to the AMVP mode. The motion information in the candidate motion information list may include: candidate control point motion vector predictors and the prediction direction. In a possible embodiment, the motion information may also include other information.
The candidate motion vector list may include a two-tuple list (when the current coding block uses a 4-parameter affine motion model), a triplet list, or a quadruple list. The two-tuple list includes one or more two-tuples used for constructing the 4-parameter affine motion model. The triplet list includes one or more triplets used for constructing the 6-parameter affine motion model. The quadruple list includes one or more quadruples used for constructing the 8-parameter bilinear model.
In the process of determining the candidate control point motion vector of the current block according to the inherited control point motion vector prediction method, motion vectors of at least two sub-blocks of adjacent affine decoding blocks are adopted to derive the candidate control point motion vector predicted value (candidate motion vector binary group/ternary group/quaternary group) of the current block, so as to add the candidate motion vector list. For details of the inherited control point motion vector prediction method, reference is made to the detailed description in the foregoing 3), and for brevity of description, details are not repeated here.
In the process of determining candidate control point motion vectors of the current block according to the constructed control point motion vector prediction method, the motion vectors of the coded blocks adjacent to the periphery of the control point of the current block are combined to be used as the motion vectors of the control points of the current affine coding block. For details reference is made to the previous descriptions of 4) and 5).
Illustratively, if the affine motion model adopted by the current block is a 4-parameter affine motion model (i.e., MotionModelIdc is 1), the motion vectors of the top-left and top-right vertices of the current block are determined using the motion information of coded blocks adjacent to the periphery of the current block. Specifically, the control point motion vector prediction method 1 or 2 can be adopted to obtain the candidate control point motion vectors of the current block, which are then added to the candidate motion vector list corresponding to the AMVP mode.
Illustratively, if the affine motion model adopted by the current block is a 6-parameter affine motion model (i.e., MotionModelIdc is 2), the motion vectors of the top-left, top-right and bottom-left vertices of the current block are determined using the motion information of coded blocks adjacent to the periphery of the current block. Specifically, the control point motion vector prediction method 1 or 2 can be adopted to obtain the candidate control point motion vectors of the current block, which are then added to the candidate motion vector list corresponding to the AMVP mode.
It should be noted that, the method of using other control points to represent the motion models of the neighboring block and the current block may also be applied to the present invention, which is not described herein.
In a possible embodiment of the present invention, the candidate motion vector binary/ternary/quaternary list may be pruned and ordered according to a specific rule, and may be truncated or filled to a specific number.
S603a: and analyzing the code stream and determining the optimal control point motion vector predicted value.
Specifically, the index number of the candidate motion vector list is obtained by parsing the code stream, and the optimal control point motion vector predictor (control point motion vectors predictor, CPMVP) is determined from the candidate motion vector list constructed in S602a according to the index number of the candidate motion vector list.
For example, if the affine motion model adopted by the current block is a 4-parameter affine motion model (MotionModelIdc is 1), the index number of the candidate motion vector list (for example, mvp_l0_flag or mvp_l1_flag) is obtained by parsing, and the optimal motion vector predictors of the 2 control points are determined from the candidate motion vector list according to the index number.
For another example, if the affine motion model adopted by the current block is a 6-parameter affine motion model (MotionModelIdc is 2), the index number of the candidate motion vector list is obtained by parsing, and the optimal motion vector predictors of the 3 control points are determined from the candidate motion vector list according to the index number.
S604a: and analyzing the code stream and determining the motion vector of the control point.
Specifically, the motion vector difference (control point motion vectors differences, CPMVD) of each control point is obtained by parsing the code stream, and the motion vector of the control point is then obtained from its motion vector difference and the optimal control point motion vector predictor (CPMVP) determined in S603a.
For example, if the affine motion model adopted by the current block is a 4-parameter affine motion model (MotionModelIdc is 1), the motion vector differences of the 2 control points are mvd_coding(x0, y0, 0) and mvd_coding(x0, y0, 1), respectively. The motion vector differences of the upper-left and upper-right control points of the current block are decoded from the code stream, and the motion vector difference and the motion vector predictor of each control point are added to obtain the motion vector values of the upper-left and upper-right control points of the current block.
For another example, if the affine motion model adopted by the current block is a 6-parameter affine motion model (MotionModelIdc is 2), then for forward prediction the motion vector differences of the 3 control points are mvd_coding(x0, y0, 0), mvd_coding(x0, y0, 1) and mvd_coding(x0, y0, 2), respectively. The motion vector differences of the upper-left, upper-right and lower-left control points are decoded from the code stream, and the motion vector difference and the motion vector predictor of each control point are added to obtain the motion vector values of the upper-left, upper-right and lower-left control points of the current block.
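In other words, each control point motion vector is the sum of its parsed difference and its predictor. A minimal sketch, with vectors represented as (x, y) tuples and illustrative names:

```python
def control_point_mvs(cpmvp, cpmvd):
    """CPMV = CPMVP + CPMVD, applied per control point."""
    return [(px + dx, py + dy) for (px, py), (dx, dy) in zip(cpmvp, cpmvd)]

# 4-parameter model: upper-left and upper-right control points
cpmvp = [(4, -2), (6, -2)]   # predictors determined in S603a
cpmvd = [(1, 0), (0, 1)]     # differences parsed in S604a
print(control_point_mvs(cpmvp, cpmvd))  # -> [(5, -2), (6, -1)]
```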
It should be noted that, in the embodiment of the present invention, other affine motion models and other control point positions may also be used, which is not described herein.
S605a: and obtaining the motion vector value of each sub-block in the current block according to the motion information of the control point and the affine motion model adopted by the current block.
For each sub-block of the current affine decoding block (a sub-block is equivalent to a motion compensation unit, whose width and height are smaller than those of the current block), the motion information of a pixel point at a preset position in the motion compensation unit may be used to represent the motion information of all pixel points in that motion compensation unit. Assuming the size of the motion compensation unit is M×N, the pixel point at the preset position may be the center point (M/2, N/2), the top-left pixel point (0, 0), the top-right pixel point (M-1, 0), or a pixel point at another position of the motion compensation unit. The motion compensation unit center point is taken as an example below, see fig. 12. In fig. 12, V0 represents the motion vector of the upper-left control point and V1 represents the motion vector of the upper-right control point. Each small box represents a motion compensation unit.
The coordinates of the motion compensation unit center point relative to the top-left vertex pixel of the current affine decoding block are calculated using formula (33), where i is the i-th motion compensation unit in the horizontal direction (left to right), j is the j-th motion compensation unit in the vertical direction (top to bottom), and (x_(i,j), y_(i,j)) represents the coordinates of the (i, j)-th motion compensation unit center point relative to the top-left control point pixel of the current affine decoding block.
For example, if the affine motion model employed by the current affine decoding block is a 6-parameter affine motion model, (x_(i,j), y_(i,j)) is substituted into the 6-parameter affine motion model formula (34) to obtain the motion vector of the center point of each motion compensation unit, which is used as the motion vector (vx_(i,j), vy_(i,j)) of all pixel points in that motion compensation unit.
For example, if the affine motion model employed by the current affine decoding block is a 4-parameter affine motion model, (x_(i,j), y_(i,j)) is substituted into the 4-parameter affine motion model formula (35) to obtain the motion vector of the center point of each motion compensation unit, which is used as the motion vector (vx_(i,j), vy_(i,j)) of all pixel points in that motion compensation unit.
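A sketch of this per-sub-block derivation, assuming formula (33) places the unit center at (M*i + M/2, N*j + N/2) and formula (35) is the usual 4-parameter affine model driven by the top-left (V0) and top-right (V1) control point motion vectors:

```python
def subblock_mvs(v0, v1, width, height, m=4, n=4):
    """Return the motion vector of each MxN motion compensation unit center."""
    vx0, vy0 = v0
    vx1, vy1 = v1
    a = (vx1 - vx0) / width          # 4-parameter model coefficients
    b = (vy1 - vy0) / width
    mvs = {}
    for j in range(height // n):     # j-th unit, top to bottom
        for i in range(width // m):  # i-th unit, left to right
            x = m * i + m / 2        # center relative to the top-left vertex
            y = n * j + n / 2
            mvs[(i, j)] = (vx0 + a * x - b * y, vy0 + b * x + a * y)
    return mvs

print(subblock_mvs((0, 0), (8, 0), width=16, height=16)[(1, 1)])  # (3.0, 3.0)
```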
S606a: and analyzing the code stream to obtain the GBi index number and the reference frame index of the current block, and obtaining the pixel prediction value of each sub-block according to the GBi index number and the reference frame index of the current block and the motion vector value of each sub-block.
For example, in bi-prediction, after the GBi index number and the reference frame index of the current block are obtained, a first-direction reference frame and a second-direction reference frame may be obtained from a first reference image frame set and a second reference image frame set (e.g., List0 and List1) according to the reference frame index, and the reference block (prediction block) of each sub-block is determined in the first-direction and second-direction reference frames according to the motion vector value of the sub-block obtained in S605a. A group of weight value combinations corresponding to the GBi index number is determined, and weighted prediction and motion compensation are then performed on the reference blocks according to this group of weight value combinations to obtain the pixel prediction value of each sub-block of the current block. The detailed implementation process may also refer to the detailed description in the foregoing 11); for brevity, it is not repeated here.
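Conceptually, the weighted prediction combines the two motion-compensated blocks sample by sample. The sketch below assumes the second-direction weight w is looked up in a hypothetical GBi weight table and the first-direction weight is 1 - w:

```python
WEIGHTS = {0: 1/2, 1: 3/8, 2: 5/8, 3: 1/4, 4: 3/4}  # hypothetical index -> w

def gbi_weighted_prediction(pred0, pred1, gbi_index):
    """Weight the two motion-compensated blocks sample by sample."""
    w = WEIGHTS[gbi_index]
    return [[(1 - w) * p0 + w * p1 for p0, p1 in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]

p0 = [[100, 100], [100, 100]]  # prediction block from the first direction
p1 = [[120, 120], [120, 120]]  # prediction block from the second direction
print(gbi_weighted_prediction(p0, p1, 2))  # w = 5/8 -> 112.5 everywhere
```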
S602b: and constructing a motion information candidate list of a merge mode based on the affine motion model.
Specifically, a motion information candidate list of the merge mode based on an affine motion model can be constructed using the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method, where the motion information in the candidate motion information list includes: candidate control point motion vector predictors (candidate motion vector combinations), the prediction direction, the GBi index number, and the reference frame index. In a possible embodiment, the motion information may also include other information. The related implementation may refer to the foregoing detailed description of 9); for brevity, it is not repeated here.
In a possible embodiment, the sub-block based fusion candidate list (sub-block based merging candidate list) may also be constructed using an ATMVP method, and/or an inherited control point motion vector prediction method, and/or a constructed control point motion vector prediction method, and/or a PLANAR method. The related implementation may refer to the detailed descriptions of the foregoing 6), 7), 9), and 10), and will not be repeated here.
For a specific implementation of the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method, reference may be made to the foregoing detailed descriptions of 3), 4), and 5), and a detailed description thereof will be omitted herein.
For example, after candidate motion vector combinations of the control points are obtained using the constructed control point motion vector prediction method, if the length of the candidate list is smaller than the maximum list length MaxAffineNumMrgCand, the combinations are traversed in a preset order and each legal combination is taken as candidate control point motion information. If the candidate motion vector list is empty, the candidate control point motion information is added to it; otherwise, the motion information already in the list is traversed in turn to check whether motion information identical to the candidate control point motion information exists, and the candidate control point motion information is added to the list only if no identical motion information is found.
Illustratively, one preset sequence is as follows: Affine(CP1, CP2, CP3) -> Affine(CP1, CP2, CP4) -> Affine(CP1, CP3, CP4) -> Affine(CP2, CP3, CP4) -> Affine(CP1, CP2) -> Affine(CP1, CP3) -> Affine(CP2, CP3) -> Affine(CP1, CP4) -> Affine(CP2, CP4) -> Affine(CP3, CP4), for a total of 10 combinations.
If the control point motion information corresponding to a combination is not available, the combination is considered unavailable. If the combination is available, the reference frame index of the combination is determined (when two control points are used, the smaller reference frame index is selected as the reference frame index of the combination; when more than two control points are used, the reference frame index with the largest occurrence count is selected first, and if several reference frame indexes occur equally often, the smallest one is selected as the reference frame index of the combination), and the motion vectors of the control points are scaled. If the motion information of all the control points after scaling is identical, the combination is illegal.
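The reference-frame-index selection rule in the parenthesis above can be sketched as follows (the function name is illustrative):

```python
from collections import Counter

def combined_reference_index(ref_indices):
    """Two control points: take the smaller index. More than two: take the
    most frequent index, breaking frequency ties by the smaller index."""
    if len(ref_indices) == 2:
        return min(ref_indices)
    counts = Counter(ref_indices)
    best = max(counts.values())
    return min(idx for idx, c in counts.items() if c == best)

print(combined_reference_index([1, 0]))     # -> 0
print(combined_reference_index([2, 2, 0]))  # -> 2
print(combined_reference_index([0, 1, 2]))  # all tie at one occurrence -> 0
```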
And if the control point motion information corresponding to the combination is available, determining the GBi index number corresponding to the combination. Since the motion vectors of the control points of the current image block to be processed are from different neighboring decoded blocks, the GBi index numbers of the different neighboring decoded blocks may be processed using one or more of the embodiments described above, thereby obtaining GBi index numbers corresponding to the combination.
For another example, if the candidate motion vectors of the control points are obtained by the inherited control point motion vector prediction method, the GBi index numbers corresponding to the candidate motion vectors of the respective control points are from neighboring decoded blocks. The candidate motion vector and the corresponding GBi index number of the control point may then be added to the candidate motion information list.
Optionally, the embodiment of the present invention may further fill the candidate motion vector list: for example, if after the above traversal the length of the candidate motion vector list is still smaller than the maximum list length MaxAffineNumMrgCand, the list may be filled until its length equals MaxAffineNumMrgCand.
In addition, in a possible embodiment, the motion information candidate list may be pruned and ordered according to a specific rule, and may be truncated or filled to a specific number. For example, the padding may be performed by adding zero motion vectors, or by combining and weighted-averaging the motion information of candidates already in the list. Other methods for filling the candidate motion vector list may also be applied to the present invention and are not described here.
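A minimal sketch of the zero-motion-vector padding mentioned above, with candidates simplified to pairs of motion vectors:

```python
def pad_candidate_list(candidates, max_len, zero_candidate=((0, 0), (0, 0))):
    """Pad with a zero-motion-vector candidate until the list reaches max_len,
    and truncate if it is already longer."""
    while len(candidates) < max_len:
        candidates.append(zero_candidate)
    return candidates[:max_len]

print(len(pad_candidate_list([((1, 2), (3, 4))], max_len=5)))  # -> 5
```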
S603b: and analyzing the code stream and determining optimal control point motion information.
Specifically, the index number of the candidate motion vector list is obtained by parsing the code stream, and the optimal control point motion information is determined from the candidate motion vector list constructed in S602b according to that index number. It will be appreciated that the optimal control point motion information includes an optimal candidate motion vector combination, a prediction direction, a GBi index number, and a reference frame index. In a possible embodiment, the optimal control point motion information may also include other information.
S604b: and obtaining the motion vector value of each sub-block in the current block according to the optimal control point motion information and the affine motion model adopted by the current decoding block. The specific implementation process may refer to the description in S605a, and will not be repeated here.
S605b: and obtaining the pixel predicted value of each sub-block according to the GBi index number and the reference frame index of the current block and the motion vector value of each sub-block.
Specifically, the GBi index number and the reference frame index corresponding to the optimal candidate motion vector combination are obtained through S603b, and the GBi index number can be used as the GBi index number of the current block.
In bi-prediction, the current block may obtain a first-direction reference frame and a second-direction reference frame from a first reference image frame set and a second reference image frame set (e.g., List0 and List1) according to the reference frame index, and determine the reference blocks (prediction blocks) of each sub-block in the first-direction and second-direction reference frames according to the motion vector values of the sub-blocks obtained in S604b. A group of weight value combinations corresponding to the GBi index number is determined, and weighted prediction and motion compensation are then performed on the reference blocks according to this group of weight value combinations to obtain the pixel prediction value of each sub-block of the current block. The detailed implementation process may also refer to the detailed description in the foregoing 11); for brevity, it is not repeated here.
It can be seen that in the inter prediction process of the embodiment of the present invention, the decoding end may perform bi-directional prediction by combining the GBi method with the AMVP mode or the merge mode based on an affine motion model. If the current block adopts an affine motion model and the inter prediction process adopts the constructed control point motion vector prediction method, the GBi index number of the current block can be obtained directly from the code stream (AMVP mode) or derived by processing the GBi index numbers of the adjacent decoded blocks of the control points. This ensures the smooth proceeding of the decoding process, allows the GBi index number of the current block to be reused when decoding subsequent blocks, and improves coding efficiency and prediction accuracy.
Based on the foregoing description, the inter prediction method provided by the embodiment of the present invention is further described below from the point of view of the encoding end. Referring to fig. 11B, the method includes, but is not limited to, the following steps:
S701: determine the inter prediction mode of the current block.
In a specific implementation, for inter prediction at the encoding end, a plurality of inter prediction modes may also be preset, where the plurality of inter prediction modes include, for example, the AMVP mode based on the affine motion model and the merge mode based on the affine motion model described above, and the encoding end traverses the plurality of inter prediction modes, so as to determine the inter prediction mode optimal for prediction of the current block.
In yet another specific implementation, only one inter-prediction mode may be preset in the inter-prediction of the encoding end, that is, in this case, the encoding end directly determines that the default inter-prediction mode is currently adopted, and the default inter-prediction mode is an AMVP mode based on an affine motion model or a merge mode based on an affine motion model.
In the embodiment of the present invention, if it is determined that the inter prediction mode of the current block is an AMVP mode based on an affine motion model, S702a-S705a are performed subsequently.
In the embodiment of the present invention, if it is determined that the inter prediction mode of the current block is the merge mode based on the affine motion model, S702b to S704b are performed subsequently.
S702a: and constructing a candidate motion vector list corresponding to the AMVP mode based on the affine motion model.
In the embodiment of the present invention, the candidate control point motion vectors of the current block are derived using the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method, and added to the candidate motion vector list corresponding to the AMVP mode; the motion information in the candidate motion vector list includes: candidate control point motion vector predictors (candidate motion vector combinations) and the prediction direction. The specific implementation of S702a may refer to the description of S602a and is not repeated here.
For a specific implementation of the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method, reference may be made to the foregoing detailed descriptions of 3), 4), and 5), and a detailed description thereof will be omitted herein.
S703a: and determining an optimal control point motion vector predicted value according to the rate distortion cost.
In an example, the encoding end may obtain the motion vector of each sub-motion compensation unit in the current block by using the control point motion vector predictors (e.g., candidate motion vector tuples/triplets/quadruples) in the candidate motion vector list through formulas (3), (5) or (7), obtain the pixel value of the corresponding position in the reference frame pointed to by the motion vector of each sub-motion compensation unit, and use it as the predicted value for motion compensation with the affine motion model. The average of the differences between the original value and the predicted value of each pixel point in the current coding block is calculated, the control point motion vector predictor corresponding to the smallest average is selected as the optimal control point motion vector predictor, and it is used as the motion vector predictor of the 2, 3, or 4 control points of the current block.
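A simplified sketch of this selection, assuming the cost is the mean absolute difference between original and predicted samples; a real encoder would also account for the bits needed to signal the candidate.

```python
def best_predictor(original, predictions):
    """Return the index of the candidate whose prediction is closest to original."""
    def cost(pred):
        diffs = [abs(o - p) for ro, rp in zip(original, pred)
                 for o, p in zip(ro, rp)]
        return sum(diffs) / len(diffs)
    return min(range(len(predictions)), key=lambda k: cost(predictions[k]))

orig = [[100, 100], [100, 100]]
cands = [[[90, 90], [90, 90]], [[101, 99], [100, 100]]]
print(best_predictor(orig, cands))  # -> 1 (mean abs diff 0.5 vs 10)
```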
S704a: and determining the GBi index number corresponding to the optimal control point motion vector predicted value.
For example, if the optimal control point motion vector predictor is obtained by using an inherited control point motion vector prediction method, that is, the optimal control point motion vector predictor is obtained by using a motion model of an adjacent encoded affine encoding block (abbreviated as an adjacent encoded block), the GBi index number of the adjacent encoded affine encoding block may be used as the GBi index number corresponding to the optimal control point motion vector predictor (that is, the GBi index number of the current block).
For example, if the optimal control point motion vector predictor (combination) is obtained by using the constructed control point motion vector prediction method, in the optimal control point motion vector predictor, the motion vector predictors of different control points are from different neighboring encoded blocks, and since GBi index numbers of different neighboring encoded blocks may be different, GBi index numbers of different neighboring encoded blocks may be processed by using one or more embodiments described above, so as to obtain the GBi index number corresponding to the optimal control point motion vector predictor (i.e., the GBi index number of the current block).
S705a: index numbers of the optimal control point motion vector predicted values in the candidate motion vector list, motion vector difference values (control point motion vectors differences, CPPVD) of the control points, GBi index numbers, reference frame indexes and indication information of inter-frame prediction modes are coded into a code stream.
In an example, the encoding end may perform a motion search within a certain search range using the optimal control point motion vector predictor as the search start point to obtain the control point motion vectors (control point motion vectors, CPMV), and calculate the difference between the control point motion vectors and the control point motion vector predictors (control point motion vectors differences, CPMVD). The encoding end then encodes the index number of the optimal control point motion vector predictor in the candidate motion vector list, the CPMVD, the GBi index number, the reference frame index, and the indication information of the inter prediction mode into the code stream for subsequent transmission to the decoding end.
In specific implementations, the syntax elements of the encoded bitstream may refer to the descriptions of the foregoing tables 1 and 2, and are not repeated here.
S702b: and constructing a candidate motion vector list corresponding to the Merge mode based on the affine motion model.
Specifically, a motion information candidate list of a merge mode based on an affine motion model can be constructed by using an inherited control point motion vector prediction method and/or a constructed control point motion vector prediction method, wherein motion information in the candidate motion information list comprises: candidate control point motion vector predictors (candidate motion vector combinations), prediction direction, GBi index number, reference frame index. The relevant implementation is also referred to the detailed description of 9) above.
For example, if the candidate control point motion vector predictor is obtained by adopting an inherited control point motion vector prediction method, that is, the candidate control point motion vector predictor is obtained by using a motion model of an adjacent coded block, the GBi index of the adjacent coded block can be used as the GBi index corresponding to the candidate control point motion vector predictor.
For example, if the candidate control point motion vector predictor is obtained by using the constructed control point motion vector predictor, the motion vector predictors of different control points in the candidate control point motion vector predictors are from different neighboring encoded blocks, and the GBi index numbers of different neighboring encoded blocks may be different, the GBi index numbers of different neighboring encoded blocks may be processed by using one or more embodiments described above, so as to obtain the GBi index number corresponding to the candidate control point motion vector predictor.
In a possible embodiment, the sub-block based fusion candidate list (sub-block based merging candidate list) may also be constructed using an ATMVP method, and/or an inherited control point motion vector prediction method, and/or a constructed control point motion vector prediction method, and/or a PLANAR method. The related implementation may refer to the detailed descriptions of the foregoing 6), 7), 9), and 10), and will not be repeated here.
For a detailed description of the inherited control point motion vector prediction method and/or the constructed control point motion vector prediction method reference is made to the previous 3), 4), 5). The specific implementation of S702b may also refer to the description of S602b, which is not repeated here.
S703b: and determining optimal control point motion information according to the rate distortion cost.
In an example, the encoding end may obtain the motion vector of each sub-motion compensation unit in the current coding block by using the control point motion vectors (e.g., candidate motion vector tuples/triplets/quadruples) in the candidate motion vector list through formulas (3), (5) or (7), obtain the pixel value of the position in the reference frame pointed to by the motion vector of each sub-motion compensation unit, and use it as the predicted value for affine motion compensation. The average of the differences between the original value and the predicted value of each pixel point in the current coding block is calculated, and the control point motion vectors corresponding to the smallest average difference are selected as the optimal control point motion vectors and used as the motion vectors of the 2, 3, or 4 control points of the current coding block.
In yet another example, if the sub-block-based merge candidate list is constructed through S702b, each candidate motion information in the merge candidate list is traversed. If the candidate uses the ATMVP or PLANAR mode, the motion information of each sub-block is obtained according to the method of 6) or 7); if the candidate uses an affine motion mode, the motion vector of each sub-block in the current block is obtained from the control point motion vectors through formula (3), (5) or (7), the pixel value of the position in the reference frame pointed to by the motion vector of each sub-block is obtained and used as its predicted value, and affine motion compensation is performed. The average of the differences between the original value and the predicted value of each pixel point in the current block is calculated, and the candidate corresponding to the smallest average difference is selected as the optimal control point motion information of the current block.
S704b: and encoding the index number of the optimal control point motion information in the candidate motion vector list and the indication information of the inter-frame prediction mode into a code stream so as to facilitate the subsequent transmission to a decoding end.
In specific implementations, the syntax elements of the encoded bitstream may refer to the descriptions of the foregoing tables 1 and 2, and are not repeated here.
It should be noted that, the foregoing embodiments only describe the process of implementing coding and code stream transmission by the coding end, and those skilled in the art understand that, according to the foregoing description, the coding end may implement other methods described in the embodiments of the present invention in other links. For example, in the prediction of the current block at the encoding end, the specific implementation of the reconstruction process of the current block may refer to the related method (such as the embodiment of fig. 11A) described at the decoding end, which is not described herein.
It can be seen that in the inter prediction process of the embodiment of the present invention, the encoding end may combine the GBi method with the AMVP mode or the merge mode based on an affine motion model. If the current block adopts an affine motion model and the inter prediction process adopts the constructed control point motion vector prediction method, the GBi index number corresponding to the candidate motion vector information of the current block can be derived by processing the GBi index numbers of the adjacent coded blocks of the control points. This ensures the smooth proceeding of the encoding process, allows the GBi index number of the current block to be reused when encoding subsequent blocks, and improves coding efficiency and prediction accuracy.
Referring to fig. 13, based on the same inventive concept as the above method, the embodiment of the present invention further provides an apparatus 1000, where the apparatus 1000 includes an acquisition module 1001, a weight determining module 1002, and a prediction module 1003, where:
an obtaining module 1001, configured to obtain the GBi index numbers (generalized bi-prediction weight indexes) of a plurality of control points of an image block to be processed, where the GBi index numbers of the plurality of control points are derived from different processed image blocks, and a GBi index number is used to determine the weight value of a reference frame of a processed image block in generalized bi-prediction;
a weight determining module 1002, configured to determine a weight value corresponding to a reference frame of the image block to be processed in the generalized bi-prediction according to GBi index numbers of the plurality of control points;
and a prediction module 1003, configured to perform weighted prediction according to the weight value of the reference frame of the image block to be processed, so as to obtain a predicted value of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: according to the GBi index numbers of the control points, determining the GBi index numbers of the image blocks to be processed; and taking the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and under the condition that the GBi index numbers of the control points are the same, taking the same GBi index number as the GBi index number of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and under the condition that different GBi index numbers exist in the GBi index numbers of the plurality of control points, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and under the condition that the same GBi index numbers exist in the GBi index numbers of the plurality of control points, taking the GBi index number with the largest number in the GBi index numbers of the plurality of control points as the GBi index number of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and under the condition that GBi index numbers of the control points are different from each other, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and under the condition that at least one corresponding weight value in the GBi index numbers of the control points is equal to a preset value, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and under the condition that a plurality of weight values corresponding to the GBi index numbers of the plurality of control points are different from a preset value, taking the GBi index number corresponding to the weight value with the smallest difference value with the preset value in the plurality of weight values as the GBi index number of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed under the condition that the average value of a plurality of weight values corresponding to the GBi index numbers of the control points is equal to the preset value.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and under the condition that a plurality of weight values corresponding to the GBi index numbers of the plurality of control points are different from the preset value and the average value of the plurality of weight values is not equal to the preset value, taking the GBi index number corresponding to the weight value with the smallest difference value with the preset value in the plurality of weight values as the GBi index number of the image block to be processed.
In some possible embodiments, the weight determining module 1002 is specifically configured to: and under the condition that a plurality of weight values corresponding to the GBi index numbers of the control points are different from a preset value and the average value of at least two weight values in the plurality of weight values is equal to the preset value, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
In some possible embodiments, the preset value in the above embodiments is, for example, 1/2.
In some possible embodiments, the image block to be processed comprises a plurality of sub-blocks. The prediction module 1003 is further configured to: obtain the motion vector of each sub-block in the image block to be processed according to the motion vectors of the plurality of control points. The prediction module 1003 is specifically configured to: obtain at least two motion compensation blocks of each sub-block according to at least two motion vectors of each sub-block in the image block to be processed and the at least two reference frames respectively corresponding to the at least two motion vectors; and weight the pixel values of the at least two motion compensation blocks according to the weight values respectively corresponding to the at least two reference frames to obtain the predicted value of each sub-block.
It should be further noted that, specific implementations of the obtaining module 1001, the weight determining module 1002, and the predicting module 1003 may refer to fig. 11A, fig. 11B, and the related descriptions of the foregoing embodiments, which are not repeated herein for brevity of description.
Referring to fig. 14, based on the same inventive concept as the above method, the embodiment of the present invention further provides an apparatus 2000, the apparatus 2000 comprising a weight determining module 2002, a predicting module 2003, wherein:
The weight determining module 2002 is configured to use a preset GBi index number as the GBi index number of an image block to be processed, where the motion vectors of a plurality of control points of the image block to be processed are obtained according to the motion vectors of a plurality of processed image blocks respectively, and to use the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed;
and a prediction module 2003, configured to perform weighted prediction according to the weight value, so as to obtain a predicted value of the image block to be processed.
In some possible embodiments, the image block to be processed comprises a plurality of sub-blocks:
the prediction module 2003 is also used to: obtaining a motion vector of each sub-block in the image block to be processed according to the motion vectors of the control points; the prediction module 2003 is specifically configured to: obtaining at least two motion compensation blocks of each sub-block according to at least two motion vectors of each sub-block in the image block to be processed and at least two reference frames respectively corresponding to the at least two motion vectors; and weighting the pixel values of the at least two motion compensation blocks according to the weight values respectively corresponding to the at least two reference frames to obtain the predicted value of each sub-block.
In some possible embodiments, the weight value corresponding to the GBi index number of the image block to be processed is, for example, 1/2.
In some possible embodiments, the preset GBi index number is, for example, 0.
Similarly, specific implementations of the weight determining module 2002 and the predicting module 2003 may refer to fig. 11A and 11B and the related descriptions of the foregoing embodiments, and are not repeated herein for brevity of the description.
Those of skill in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps described in connection with the disclosure herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the various illustrative logical blocks, modules, and steps may be stored on a computer readable medium or transmitted as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media corresponding to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general-purpose microprocessors, Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functions described herein with reference to the various illustrative logical blocks, modules, and steps may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices for performing the disclosed techniques, but such components do not necessarily require realization by different hardware units. Indeed, as described above, the various units may be combined, together with suitable software and/or firmware, in a codec hardware unit, or provided by interoperable hardware units (including one or more processors as described above).
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The foregoing is merely a description of specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (32)

1. An inter prediction method based on affine prediction mode, comprising:
acquiring GBi index numbers (Generalized Bi-prediction weight indices) of a plurality of control points of an image block to be processed;
determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points, wherein, under the condition that different GBi index numbers exist among the GBi index numbers of the plurality of control points, the GBi index number corresponding to a preset value is used as the GBi index number of the image block to be processed;
taking the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed;
and carrying out weighted prediction according to the weight value to obtain a predicted value of the image block to be processed.
2. The method according to claim 1, wherein the method further comprises:
and under the condition that the GBi index numbers of the control points are the same, taking the same GBi index number as the GBi index number of the image block to be processed.
3. The method according to claim 1, wherein the method further comprises:
and under the condition that identical GBi index numbers exist among the GBi index numbers of the plurality of control points, taking the GBi index number that occurs most frequently among the GBi index numbers of the plurality of control points as the GBi index number of the image block to be processed.
4. The method according to claim 1, wherein the method further comprises:
and under the condition that GBi index numbers of the control points are different from each other, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
5. The method according to claim 1, wherein the method further comprises:
and under the condition that at least one weight value corresponding to the GBi index numbers of the plurality of control points is equal to a preset value, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
6. The method according to claim 1, wherein the method further comprises:
and under the condition that the plurality of weight values corresponding to the GBi index numbers of the plurality of control points are all different from a preset value, taking the GBi index number corresponding to the weight value, among the plurality of weight values, having the smallest difference from the preset value as the GBi index number of the image block to be processed.
7. The method according to claim 1, wherein the method further comprises:
and under the condition that the average value of the plurality of weight values corresponding to the GBi index numbers of the plurality of control points is equal to the preset value, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
8. The method according to claim 6 or 7, characterized in that the method further comprises:
and under the condition that the plurality of weight values corresponding to the GBi index numbers of the plurality of control points are all different from the preset value and the average value of the plurality of weight values is not equal to the preset value, taking the GBi index number corresponding to the weight value, among the plurality of weight values, having the smallest difference from the preset value as the GBi index number of the image block to be processed.
9. The method according to claim 1, wherein the method further comprises:
and under the condition that the plurality of weight values corresponding to the GBi index numbers of the plurality of control points are all different from a preset value and the average value of at least two of the plurality of weight values is equal to the preset value, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
10. The method according to any one of claims 4-7, 9, wherein the preset value is 1/2.
11. The method according to any one of claims 1-7 and 9, wherein the image block to be processed comprises a plurality of sub-blocks, the method further comprising:
obtaining a motion vector of each sub-block in the image block to be processed according to the motion vectors of the control points;
correspondingly, the performing weighted prediction according to the weight value to obtain a predicted value of the image block to be processed includes:
obtaining at least two motion compensation blocks of each sub-block according to at least two motion vectors of each sub-block in the image block to be processed and at least two reference frames respectively corresponding to the at least two motion vectors;
and weighting the pixel values of the at least two motion compensation blocks according to the weight values respectively corresponding to the at least two reference frames to obtain the predicted value of each sub-block.
12. An inter prediction method based on affine prediction mode, comprising:
under the condition that different GBi index numbers exist among the GBi index numbers of a plurality of control points of an image block to be processed, taking a preset GBi index number as the GBi index number of the image block to be processed, wherein the motion vectors of the plurality of control points of the image block to be processed are respectively obtained according to the motion vectors of a plurality of processed image blocks;
taking the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed;
and carrying out weighted prediction according to the weight value to obtain a predicted value of the image block to be processed.
13. The method of claim 12, wherein the image block to be processed comprises a plurality of sub-blocks, the method further comprising:
obtaining a motion vector of each sub-block in the image block to be processed according to the motion vectors of the control points;
correspondingly, the performing weighted prediction according to the weight value to obtain a predicted value of the image block to be processed includes:
obtaining at least two motion compensation blocks of each sub-block according to at least two motion vectors of each sub-block in the image block to be processed and at least two reference frames respectively corresponding to the at least two motion vectors;
and weighting the pixel values of the at least two motion compensation blocks according to the weight values respectively corresponding to the at least two reference frames to obtain the predicted value of each sub-block.
14. The method according to claim 12 or 13, wherein the GBi index number of the image block to be processed corresponds to a weight value of 1/2.
15. The method according to any one of claims 12-14, wherein the preset GBi index number is 0.
16. An apparatus, comprising:
the acquisition module is used for acquiring GBi index numbers (Generalized Bi-prediction weight indices) of a plurality of control points of an image block to be processed;
the weight determining module is used for determining the GBi index number of the image block to be processed according to the GBi index numbers of the plurality of control points, wherein, under the condition that different GBi index numbers exist among the GBi index numbers of the plurality of control points, the GBi index number corresponding to a preset value is used as the GBi index number of the image block to be processed, and the weight value corresponding to the GBi index number of the image block to be processed is used as the weight value corresponding to the reference frame of the image block to be processed;
and the prediction module is used for carrying out weighted prediction according to the weight value so as to obtain a predicted value of the image block to be processed.
17. The apparatus of claim 16, wherein the weight determining module is specifically configured to:
and under the condition that the GBi index numbers of the control points are the same, taking the same GBi index number as the GBi index number of the image block to be processed.
18. The apparatus of claim 16, wherein the weight determining module is specifically configured to:
and under the condition that identical GBi index numbers exist among the GBi index numbers of the plurality of control points, taking the GBi index number that occurs most frequently among the GBi index numbers of the plurality of control points as the GBi index number of the image block to be processed.
19. The apparatus of claim 16, wherein the weight determining module is specifically configured to:
and under the condition that GBi index numbers of the control points are different from each other, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
20. The apparatus of claim 16, wherein the weight determining module is specifically configured to:
and under the condition that at least one weight value corresponding to the GBi index numbers of the plurality of control points is equal to a preset value, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
21. The apparatus of claim 16, wherein the weight determining module is specifically configured to:
and under the condition that the plurality of weight values corresponding to the GBi index numbers of the plurality of control points are all different from a preset value, taking the GBi index number corresponding to the weight value, among the plurality of weight values, having the smallest difference from the preset value as the GBi index number of the image block to be processed.
22. The apparatus of claim 16, wherein the weight determining module is specifically configured to:
and under the condition that the average value of the plurality of weight values corresponding to the GBi index numbers of the plurality of control points is equal to the preset value, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
23. The apparatus according to claim 21 or 22, wherein the weight determining module is specifically configured to:
and under the condition that the plurality of weight values corresponding to the GBi index numbers of the plurality of control points are all different from the preset value and the average value of the plurality of weight values is not equal to the preset value, taking the GBi index number corresponding to the weight value, among the plurality of weight values, having the smallest difference from the preset value as the GBi index number of the image block to be processed.
24. The apparatus of claim 16, wherein the weight determining module is specifically configured to:
and under the condition that the plurality of weight values corresponding to the GBi index numbers of the plurality of control points are all different from a preset value and the average value of at least two of the plurality of weight values is equal to the preset value, taking the GBi index number corresponding to the preset value as the GBi index number of the image block to be processed.
25. The apparatus of any one of claims 17-24, wherein the preset value is 1/2.
26. The apparatus according to any one of claims 16-25, wherein the image block to be processed comprises a plurality of sub-blocks, and wherein:
the prediction module is further configured to: obtaining a motion vector of each sub-block in the image block to be processed according to the motion vectors of the control points;
the prediction module is specifically configured to: obtaining at least two motion compensation blocks of each sub-block according to at least two motion vectors of each sub-block in the image block to be processed and at least two reference frames respectively corresponding to the at least two motion vectors; and weighting the pixel values of the at least two motion compensation blocks according to the weight values respectively corresponding to the at least two reference frames to obtain the predicted value of each sub-block.
27. An apparatus, comprising:
the weight determining module is used for taking a preset GBi index number as the GBi index number of an image block to be processed under the condition that different GBi index numbers exist among the GBi index numbers of a plurality of control points of the image block to be processed, wherein the motion vectors of the plurality of control points of the image block to be processed are respectively obtained according to the motion vectors of a plurality of processed image blocks; and taking the weight value corresponding to the GBi index number of the image block to be processed as the weight value corresponding to the reference frame of the image block to be processed;
and the prediction module is used for carrying out weighted prediction according to the weight value so as to obtain a predicted value of the image block to be processed.
28. The apparatus of claim 27, wherein the image block to be processed comprises a plurality of sub-blocks, and wherein:
the prediction module is further configured to: obtaining a motion vector of each sub-block in the image block to be processed according to the motion vectors of the control points;
the prediction module is specifically configured to: obtaining at least two motion compensation blocks of each sub-block according to at least two motion vectors of each sub-block in the image block to be processed and at least two reference frames respectively corresponding to the at least two motion vectors; and weighting the pixel values of the at least two motion compensation blocks according to the weight values respectively corresponding to the at least two reference frames to obtain the predicted value of each sub-block.
29. The apparatus according to claim 27 or 28, wherein the GBi index number of the image block to be processed corresponds to a weight value of 1/2.
30. The apparatus of any one of claims 27-29, wherein the preset GBi index number is 0.
31. A video codec apparatus, the apparatus comprising: a non-volatile memory and a processor coupled to each other, the processor invoking program code stored in the memory to perform the method of any of claims 1-11.
32. A video codec apparatus, the apparatus comprising: a non-volatile memory and a processor coupled to each other, the processor invoking program code stored in the memory to perform the method of any of claims 13-15.
CN201910154692.8A 2018-10-29 2019-02-28 Inter-frame prediction method based on affine prediction mode and related device Active CN111107373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/114142 WO2020088482A1 (en) 2018-10-29 2019-10-29 Affine prediction mode-based inter-frame prediction method and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018112679632 2018-10-29
CN201811267963 2018-10-29

Publications (2)

Publication Number Publication Date
CN111107373A CN111107373A (en) 2020-05-05
CN111107373B true CN111107373B (en) 2023-11-03

Family

ID=70420439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910154692.8A Active CN111107373B (en) 2018-10-29 2019-02-28 Inter-frame prediction method based on affine prediction mode and related device

Country Status (2)

Country Link
CN (1) CN111107373B (en)
WO (1) WO2020088482A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020293843B2 (en) 2019-06-14 2023-12-07 Lg Electronics Inc. Image decoding method and device for deriving weight index information for generation of prediction sample
WO2022077495A1 (en) * 2020-10-16 2022-04-21 Oppo广东移动通信有限公司 Inter-frame prediction methods, encoder and decoders and computer storage medium
CN113965753B (en) * 2021-12-20 2022-05-17 康达洲际医疗器械有限公司 Inter-frame image motion estimation method and system based on code rate control
CN117579839B (en) * 2024-01-15 2024-03-22 电子科技大学 Image compression method based on rate-distortion optimized color space conversion matrix

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130272409A1 (en) * 2012-04-12 2013-10-17 Qualcomm Incorporated Bandwidth reduction in video coding through applying the same reference index
WO2018037919A1 (en) * 2016-08-26 2018-03-01 シャープ株式会社 Image decoding device, image coding device, image decoding method, and image coding method

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107465921A (en) * 2011-11-08 2017-12-12 株式会社Kt Method for decoding a video signal using a decoding apparatus
US9245026B1 (en) * 2013-06-26 2016-01-26 Amazon Technologies, Inc. Increasing the relevancy of search results across categories
WO2017054630A1 (en) * 2015-09-29 2017-04-06 华为技术有限公司 Image prediction method and device
WO2017086738A1 (en) * 2015-11-19 2017-05-26 한국전자통신연구원 Method and apparatus for image encoding/decoding
WO2017118409A1 (en) * 2016-01-07 2017-07-13 Mediatek Inc. Method and apparatus for affine merge mode prediction for video coding system
WO2017130696A1 (en) * 2016-01-29 2017-08-03 シャープ株式会社 Prediction image generation device, moving image decoding device, and moving image encoding device
CN108605137A (en) * 2016-03-01 2018-09-28 联发科技股份有限公司 Method and apparatus of video coding with affine motion compensation
WO2017197146A1 (en) * 2016-05-13 2017-11-16 Vid Scale, Inc. Systems and methods for generalized multi-hypothesis prediction for video coding
CN109076214A (en) * 2016-05-28 2018-12-21 联发科技股份有限公司 Method and apparatus of current picture referencing for video coding using affine motion compensation
WO2018062892A1 (en) * 2016-09-28 2018-04-05 엘지전자(주) Method and apparatus for performing optimal prediction on basis of weight index

Also Published As

Publication number Publication date
CN111107373A (en) 2020-05-05
WO2020088482A1 (en) 2020-05-07

Similar Documents

Publication Publication Date Title
CN113545040B (en) Weighted prediction method and device for multi-hypothesis coding
CN115243039B (en) Video image prediction method and device
JP7279154B2 (en) Motion vector prediction method and apparatus based on affine motion model
CN111698515B (en) Method and related device for inter-frame prediction
CN111107373B (en) Inter-frame prediction method based on affine prediction mode and related device
CN110891180B (en) Video decoding method and video decoder
US20220078441A1 (en) Inter prediction method and apparatus
AU2020261145B2 (en) Picture prediction method and apparatus, and computer-readable storage medium
CN115243048B (en) Video image decoding and encoding method and device
US20210360275A1 (en) Inter prediction method and apparatus
US11956444B2 (en) Inter prediction method and apparatus, and corresponding encoder and decoder
CN112042197A (en) Candidate motion vector list obtaining method and device and coder-decoder
AU2024201357A1 (en) Picture prediction method and apparatus, and computer-readable storage medium
CN112135137B (en) Video encoder, video decoder and corresponding methods
CN111432219B (en) Inter-frame prediction method and device
EP3910955A1 (en) Inter-frame prediction method and device
WO2020182194A1 (en) Inter-frame prediction method and related device
CN111372086B (en) Video image decoding method and device
CN111726630B (en) Processing method and device based on triangular prediction unit mode
CN112135129B (en) Inter-frame prediction method and device
CN118233646A (en) Processing method and device based on triangular prediction unit mode
CN118075484A (en) Processing method and device based on triangular prediction unit mode

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant