WO2023191599A1 - Video signal encoding/decoding method and recording medium for storing a bitstream - Google Patents


Info

Publication number
WO2023191599A1
Authority
WO
WIPO (PCT)
Prior art keywords
reference picture
block
prediction
current
picture
Application number
PCT/KR2023/004412
Other languages
English (en)
Korean (ko)
Inventor
임성원
Original Assignee
주식회사 케이티
Application filed by 주식회사 케이티
Publication of WO2023191599A1

Classifications

    • H04N19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/109 — Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/513 — Motion estimation or motion compensation; processing of motion vectors
    • H04N19/70 — Syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/82 — Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation, involving filtering within a prediction loop

Definitions

  • This disclosure relates to a video signal processing method and apparatus.
  • Demand for high-resolution, high-quality images such as HD (High Definition) images and UHD (Ultra High Definition) images is increasing.
  • As video data becomes higher resolution and higher quality, the amount of data increases relative to existing video data. Therefore, when video data is transmitted over media such as existing wired or wireless broadband lines, or stored using existing storage media, transmission costs and storage costs increase.
  • High-efficiency video compression technologies can be used to solve these problems that arise as video data becomes higher resolution and higher quality.
  • Video compression technologies include inter prediction, which predicts the pixel values included in the current picture from pictures before or after the current picture; intra prediction, which predicts the pixel values included in the current picture using pixel information within the current picture; and entropy coding, which assigns short codes to values with a high frequency of occurrence and long codes to values with a low frequency of occurrence.
  • the purpose of the present disclosure is to provide a method for generating a virtual reference picture using a plurality of reference pictures and an apparatus for performing the same.
  • the purpose of the present disclosure is to provide a method for performing inter prediction using a virtual reference picture and an apparatus for performing the same.
  • An image decoding method includes selecting a reference picture for a current block from a reference picture list; and obtaining a prediction block for the current block based on the reference picture.
  • the reference picture may be a virtual reference picture generated based on a plurality of reference pictures included in the reference picture list.
  • An image encoding method includes selecting a reference picture for a current block from a reference picture list; and obtaining a prediction block for the current block based on the reference picture.
  • the reference picture may be a virtual reference picture generated based on a plurality of reference pictures included in the reference picture list.
  • the output order of the virtual reference picture may be the same as that of the current picture including the current block.
  • the virtual reference picture may be generated based on a first reference picture in an output order earlier than the current picture and a second reference picture in an output order later than the current picture.
  • the generation of the virtual reference picture is performed block by block, and a block in the virtual reference picture can be generated by a weighted sum operation of a pair of reference blocks with the lowest cost obtained from a first search area in the first reference picture and a second search area in the second reference picture.
  • a smoothing filter may be applied to the block based on the extended area of a neighboring block adjacent to the block.
  • the smoothing filter may be performed by applying a weighted sum between a first sample included in the block and a second sample at the same location as the first sample in the extension area.
  • the generation of the virtual reference picture is performed block by block; a first block belonging to a first region in the virtual reference picture is generated by a weighted sum operation of a pair of reference blocks with the lowest cost obtained from a first search region in the first reference picture and a second search region in the second reference picture, and a second block belonging to a second region in the virtual reference picture may be generated by a weighted sum operation of the blocks co-located with the second block in the first reference picture and the second reference picture.
  • the first area is an area in which a standard deviation derived based on the difference between the first reference picture and the second reference picture is equal to or greater than a threshold
  • the second area may be an area where the standard deviation is smaller than the threshold value
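  • As a rough illustration of the block-wise generation described above, the following Python/NumPy sketch (hypothetical function names; a mirrored-displacement search and equal weights are assumptions, since the text does not fix the search strategy or the weights) builds one block of a virtual reference picture from two reference pictures: where the variation between the two reference pictures is small it averages the co-located blocks, otherwise it averages the lowest-cost reference block pair. A boundary smoothing step, such as the weighted sum with the extended area of a neighboring block mentioned above, would be applied afterwards.

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences between two equally sized blocks.
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def generate_virtual_block(ref0, ref1, x, y, size=16, search=8, std_threshold=5.0):
    """Generate one block of a virtual reference picture from two reference
    pictures (a simplified sketch of the block-wise procedure described above)."""
    h, w = ref0.shape
    col0 = ref0[y:y + size, x:x + size]          # co-located block in ref0
    col1 = ref1[y:y + size, x:x + size]          # co-located block in ref1

    # Region classification: if the local variation between the two reference
    # pictures is small, simply average the co-located blocks.
    if np.std(col0.astype(np.int64) - col1.astype(np.int64)) < std_threshold:
        return ((col0.astype(np.int64) + col1.astype(np.int64) + 1) >> 1).astype(ref0.dtype)

    # Otherwise, search mirrored displacements in both reference pictures
    # (one possible search strategy) and keep the block pair with the lowest cost.
    best_cost, best_pair = None, (col0, col1)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = y + dy, x + dx              # candidate position in ref0
            y1, x1 = y - dy, x - dx              # mirrored position in ref1
            if not (0 <= y0 <= h - size and 0 <= x0 <= w - size and
                    0 <= y1 <= h - size and 0 <= x1 <= w - size):
                continue
            b0 = ref0[y0:y0 + size, x0:x0 + size]
            b1 = ref1[y1:y1 + size, x1:x1 + size]
            cost = sad(b0, b1)
            if best_cost is None or cost < best_cost:
                best_cost, best_pair = cost, (b0, b1)

    b0, b1 = best_pair
    # Weighted sum (equal weights here) of the selected reference block pair.
    return ((b0.astype(np.int64) + b1.astype(np.int64) + 1) >> 1).astype(ref0.dtype)
```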
  • the motion vector of the current block may be set to a zero vector.
  • when bidirectional prediction is not applied to the current block, the virtual reference picture may be selected as the reference picture.
  • information indicating whether the virtual reference picture is used may be signaled through a bitstream.
  • the video encoding/decoding method further includes determining an inter prediction mode of the current block, and whether the virtual reference picture can be used may be adaptively determined according to the inter prediction mode.
  • the virtual reference picture may be located first in the reference picture list.
  • the amount of encoded data can be reduced by generating a virtual reference picture based on a plurality of reference pictures on the decoder side.
  • prediction accuracy can be improved by performing inter prediction using a virtual reference picture.
  • FIG. 1 is a block diagram showing a video encoding device according to an embodiment of the present disclosure.
  • Figure 2 is a block diagram showing a video decoding device according to an embodiment of the present disclosure.
  • Figures 3 and 4 are flowcharts of the inter prediction method.
  • Figure 5 shows an example in which motion estimation is performed.
  • Figures 6 and 7 show an example in which a prediction block of the current block is generated based on motion information generated through motion estimation.
  • Figure 8 shows positions referenced to derive motion vector prediction values.
  • Figure 9 is a diagram for explaining a template-based motion estimation method.
  • Figure 10 shows examples of template configurations.
  • Figure 11 is a diagram for explaining a motion estimation method based on a bilateral matching method.
  • Figure 12 shows an example of generating a virtual reference picture.
  • Figure 13 shows an example in which a virtual reference picture is generated based on a two-way matching method.
  • Figure 14 shows an example in which two-way matching is applied only to some areas of the virtual reference picture.
  • Figure 15 shows an example in which two-way matching is applied to a block of extended size.
  • Figure 16 shows an example in which a smoothing filter is applied at the boundary between blocks.
  • first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The above terms are used only for the purpose of distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, the second component may be referred to as a first component without departing from the scope of the present disclosure.
  • the term and/or includes any of a plurality of related stated items or a combination of a plurality of related stated items.
  • Figure 1 is a block diagram showing a video encoding device according to an embodiment of the present disclosure.
  • the image encoding device 100 may include a picture division unit 110, prediction units 120 and 125, a conversion unit 130, a quantization unit 135, a reordering unit 160, an entropy encoding unit 165, an inverse quantization unit 140, an inverse transform unit 145, a filter unit 150, and a memory 155.
  • Each component shown in FIG. 1 is shown independently to represent different characteristic functions in the video encoding device, and this does not mean that each component consists of separate hardware or a single software component. That is, each component is listed as a separate component for convenience of explanation; at least two of the components may be combined to form one component, or one component may be divided into a plurality of components to perform its functions. Embodiments in which such components are integrated and embodiments in which they are separated are also included in the scope of the present disclosure as long as they do not deviate from its essence.
  • components may not be essential components that perform essential functions in the present disclosure, but may simply be optional components to improve performance.
  • The present disclosure can be implemented by including only the components essential for implementing its essence, excluding components used only to improve performance, and a structure that includes only the essential components, excluding the optional components used only to improve performance, is also included in the scope of rights of this disclosure.
  • the picture division unit 110 may divide the input picture into at least one processing unit.
  • the processing unit may be a prediction unit (PU), a transformation unit (TU), or a coding unit (CU).
  • The picture division unit 110 may divide one picture into a combination of a plurality of coding units, prediction units, and transformation units, and encode the picture by selecting one combination of coding units, prediction units, and transformation units based on a predetermined criterion (for example, a cost function).
  • one picture may be divided into a plurality of coding units.
  • To divide a picture into coding units, a recursive tree structure such as a quad tree, ternary tree, or binary tree can be used.
  • A coding unit that is divided into other coding units, with one image or a largest coding unit as the root, may be split with as many child nodes as the number of divided coding units.
  • a coding unit that is no longer divided according to certain restrictions becomes a leaf node. For example, when it is assumed that quad tree partitioning is applied to one coding unit, one coding unit may be split into up to four different coding units.
  • the coding unit may be used to mean a unit that performs encoding or may be used to mean a unit that performs decoding.
  • A prediction unit may be obtained by dividing one coding unit into at least one square or rectangle of the same size, or one of the prediction units divided within a coding unit may be divided to have a different shape and/or size from another prediction unit.
  • the conversion unit and prediction unit may be set to be the same. At this time, after dividing the coding unit into a plurality of transformation units, intra-screen prediction may be performed for each transformation unit.
  • a coding unit may be divided in the horizontal or vertical direction. The number of transformation units generated by dividing the coding unit may be 2 or 4, depending on the size of the coding unit.
  • The prediction units 120 and 125 may include an inter prediction unit 120 that performs inter prediction and an intra prediction unit 125 that performs intra prediction. Whether to use inter prediction or intra prediction for a coding unit may be determined, and specific information according to each prediction method (e.g., intra prediction mode, motion vector, reference picture, etc.) may be determined. At this time, the processing unit in which prediction is performed and the processing unit in which the prediction method and its specific contents are determined may be different. For example, the prediction method and prediction mode may be determined in coding units, while prediction is performed in prediction units or transformation units. The residual value (residual block) between the generated prediction block and the original block may be input to the conversion unit 130.
  • Prediction mode information, motion vector information, etc. used for prediction may be encoded in the entropy encoding unit 165 together with the residual value and transmitted to the decoding device.
  • The inter-picture prediction unit 120 may predict a prediction unit based on information on at least one of the pictures before or after the current picture, and in some cases may predict a prediction unit based on information on a partially encoded region within the current picture.
  • the inter-screen prediction unit 120 may include a reference picture interpolation unit, a motion prediction unit, and a motion compensation unit.
  • The reference picture interpolation unit may receive reference picture information from the memory 155 and generate pixel information in units of less than an integer pixel from the reference picture.
  • A DCT-based 8-tap interpolation filter with varying filter coefficients can be used to generate pixel information in units of 1/4 pixel.
  • A DCT-based 4-tap interpolation filter with varying filter coefficients can be used to generate pixel information in units of 1/8 pixel.
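  • As a hedged illustration of such sub-pixel interpolation, the sketch below applies an 8-tap filter to a row of integer pixels to produce half-pel samples. The taps shown are HEVC-style example coefficients and are not taken from this document, and the function name is hypothetical.

```python
import numpy as np

# Illustrative DCT-based 8-tap filter taps for the half-pel position
# (HEVC-style example values; the coefficients actually used by an
# encoder may differ).
HALF_PEL_TAPS = np.array([-1, 4, -11, 40, 40, -11, 4, -1])

def interpolate_half_pel_row(samples):
    """Horizontally interpolate half-pel samples from a 1-D row of integer
    pixels using an 8-tap filter (edges padded by replication)."""
    padded = np.pad(samples.astype(np.int64), (3, 4), mode='edge')
    out = np.empty_like(samples, dtype=np.int64)
    for i in range(len(samples)):
        window = padded[i:i + 8]
        out[i] = (window * HALF_PEL_TAPS).sum()
    # Normalize (the filter gain is 64) and clip to the 8-bit sample range.
    return np.clip((out + 32) >> 6, 0, 255)
```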
  • the motion prediction unit may perform motion prediction based on the reference picture interpolated by the reference picture interpolation unit.
  • Various methods, such as FBMA (Full search-based Block Matching Algorithm), TSS (Three Step Search), and NTS (New Three-Step Search Algorithm), can be used to calculate the motion vector.
  • the motion vector may have a motion vector value in 1/2 or 1/4 pixel units based on the interpolated pixel.
  • the motion prediction unit can predict the current prediction unit by using a different motion prediction method.
  • Various methods can be used as motion prediction methods, such as the skip method, merge method, AMVP (Advanced Motion Vector Prediction) method, and intra block copy method.
  • the intra-screen prediction unit 125 may generate a prediction block based on reference pixel information, which is pixel information in the current picture.
  • Reference pixel information may be derived from one selected among a plurality of reference pixel lines.
  • the N-th reference pixel line among the plurality of reference pixel lines may include left pixels whose x-axis difference with the top-left pixel in the current block is N and top pixels whose y-axis difference with the top-left pixel is N.
  • the number of reference pixel lines that the current block can select may be 1, 2, 3, or 4.
  • If a neighboring block of the current prediction unit is a block on which inter prediction was performed, so that its reference pixels are pixels reconstructed through inter prediction, the reference pixels included in the block on which inter prediction was performed may be replaced with reference pixel information of a surrounding block on which intra prediction was performed. That is, when a reference pixel is not available, information on the unavailable reference pixel can be replaced with information on at least one of the available reference pixels.
  • the prediction mode can include a directional prediction mode that uses reference pixel information according to the prediction direction and a non-directional mode that does not use directional information when performing prediction.
  • The mode for predicting luminance information and the mode for predicting chrominance information may be different, and the intra prediction mode information used to predict the luminance information, or the predicted luminance signal information, may be used to predict the chrominance information.
  • Intra prediction for the prediction unit may be performed based on the pixel on the left of the prediction unit, the pixel on the upper left, and the pixels on the top of the prediction unit.
  • the intra-screen prediction method can generate a prediction block after applying a smoothing filter to the reference pixel according to the prediction mode. Depending on the selected reference pixel line, whether to apply a smoothing filter may be determined.
  • the intra prediction mode of the current prediction unit can be predicted from the intra prediction mode of prediction units existing around the current prediction unit.
  • If the intra prediction modes of the current prediction unit and a neighboring prediction unit are the same, information indicating that the prediction modes of the current prediction unit and the neighboring prediction unit are the same can be transmitted using predetermined flag information; if the prediction modes of the current prediction unit and the neighboring prediction units are different, the prediction mode information of the current block can be encoded by performing entropy encoding.
  • a residual block may be generated that includes residual information that is the difference between the prediction unit on which prediction was performed and the original block of the prediction unit.
  • the generated residual block may be input to the conversion unit 130.
  • The transform unit 130 may transform the residual block, which contains the residual value information between the original block and the prediction unit generated through the prediction units 120 and 125, using a transform method such as DCT (Discrete Cosine Transform), DST (Discrete Sine Transform), or KLT. Whether to apply DCT, DST, or KLT to transform the residual block may be decided based on at least one of the size of the transformation unit, the shape of the transformation unit, the prediction mode of the prediction unit, or the intra prediction mode information of the prediction unit.
  • the quantization unit 135 may quantize the values converted to the frequency domain by the conversion unit 130.
  • the quantization coefficient may change depending on the block or the importance of the image.
  • the value calculated by the quantization unit 135 may be provided to the inverse quantization unit 140 and the realignment unit 160.
  • the rearrangement unit 160 may rearrange coefficient values for the quantized residual values.
  • the rearrangement unit 160 can change the coefficients in a two-dimensional block form into a one-dimensional vector form through a coefficient scanning method.
  • the realignment unit 160 can scan from DC coefficients to coefficients in the high frequency region using a zig-zag scan method and change it into a one-dimensional vector form.
  • Instead of the zig-zag scan, a vertical scan that scans the two-dimensional block-shaped coefficients in the column direction, a horizontal scan that scans them in the row direction, or a diagonal scan that scans them diagonally may also be used. That is, depending on the size of the transformation unit and the intra prediction mode, it can be determined which of the zig-zag scan, vertical scan, horizontal scan, or diagonal scan will be used.
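  • The following sketch illustrates one possible zig-zag scan, reordering a 2-D block of quantized coefficients into a 1-D vector from the DC coefficient toward the high-frequency region; the exact scan pattern used by an encoder may differ.

```python
import numpy as np

def zigzag_scan(block):
    """Reorder a square 2-D coefficient block into a 1-D vector,
    scanning from the DC coefficient toward the high-frequency corner."""
    n = block.shape[0]
    order = []
    for s in range(2 * n - 1):                     # walk the anti-diagonals
        coords = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            coords.reverse()                       # alternate scan direction
        order.extend(coords)
    return np.array([block[i, j] for i, j in order])

# Example: a 4x4 block of quantized coefficients.
coeffs = np.arange(16).reshape(4, 4)
print(zigzag_scan(coeffs))
```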
  • the entropy encoding unit 165 may perform entropy encoding based on the values calculated by the reordering unit 160.
  • Entropy coding can use various coding methods, such as Exponential Golomb, Context-Adaptive Variable Length Coding (CAVLC), and Context-Adaptive Binary Arithmetic Coding (CABAC).
  • The entropy encoding unit 165 may receive various information, such as residual value coefficient information and block type information of the coding unit, prediction mode information, division unit information, prediction unit information, transmission unit information, motion vector information, reference frame information, block interpolation information, and filtering information, from the reordering unit 160 and the prediction units 120 and 125, and encode this information.
  • the entropy encoding unit 165 may entropy encode the coefficient value of the coding unit input from the reordering unit 160.
  • the inverse quantization unit 140 and the inverse transformation unit 145 inversely quantize the values quantized in the quantization unit 135 and inversely transform the values transformed in the transformation unit 130.
  • The residual value generated in the inverse quantization unit 140 and the inverse transform unit 145 may be combined with the prediction unit predicted through the motion estimation unit, motion compensation unit, and intra prediction unit included in the prediction units 120 and 125 to create a reconstructed block.
  • the filter unit 150 may include at least one of a deblocking filter, an offset correction unit, and an adaptive loop filter (ALF).
  • The deblocking filter can remove block distortion caused by boundaries between blocks in the restored picture. To determine whether to perform deblocking, whether to apply a deblocking filter to the current block can be determined based on the pixels included in several columns or rows of the block. When a deblocking filter is applied to a block, a strong filter or a weak filter can be applied depending on the required deblocking filtering strength. Additionally, when applying a deblocking filter, vertical filtering and horizontal filtering can be processed in parallel.
  • the offset correction unit may correct the offset of the deblocked image from the original image in pixel units.
  • To correct the offset for a specific picture, a method may be used in which the pixels included in the image are divided into a certain number of areas, the area in which to perform offset correction is determined, and the offset is applied to that area; alternatively, a method of applying the offset in consideration of the edge information of each pixel may be used.
  • Adaptive Loop Filtering can be performed based on a comparison between the filtered restored image and the original image. After dividing the pixels included in the image into predetermined groups, filtering can be performed differentially for each group by determining one filter to be applied to that group. Information related to whether to apply ALF may be transmitted for each coding unit (CU), and the shape and filter coefficients of the ALF filter to be applied may vary for each block. Additionally, an ALF filter of the same type (fixed type) may be applied regardless of the characteristics of the block to which it is applied.
  • the memory 155 may store a reconstructed block or picture calculated through the filter unit 150, and the stored reconstructed block or picture may be provided to the prediction units 120 and 125 when inter-prediction is performed.
  • Figure 2 is a block diagram showing a video decoding device according to an embodiment of the present disclosure.
  • The image decoding device 200 may include an entropy decoding unit 210, a reordering unit 215, an inverse quantization unit 220, an inverse transform unit 225, prediction units 230 and 235, a filter unit 240, and a memory 245.
  • the input bitstream can be decoded in a procedure opposite to that of the video encoding device.
  • the entropy decoding unit 210 may perform entropy decoding in a procedure opposite to the procedure in which entropy encoding is performed in the entropy encoding unit of the video encoding device. For example, various methods such as Exponential Golomb, CAVLC (Context-Adaptive Variable Length Coding), and CABAC (Context-Adaptive Binary Arithmetic Coding) may be applied in response to the method performed in the image encoding device.
  • the entropy decoder 210 can decode information related to intra-screen prediction and inter-screen prediction performed by the encoding device.
  • the reordering unit 215 may rearrange the bitstream entropy-decoded by the entropy decoding unit 210 based on the method in which the encoder rearranges the bitstream. Coefficients expressed in the form of a one-dimensional vector can be restored and rearranged as coefficients in the form of a two-dimensional block.
  • the reordering unit 215 may receive information related to coefficient scanning performed by the encoder and perform reordering by reverse scanning based on the scanning order performed by the encoder.
  • the inverse quantization unit 220 may perform inverse quantization based on the quantization parameters provided by the encoding device and the coefficient values of the rearranged blocks.
  • the inverse transform unit 225 may perform inverse transform, that is, inverse DCT, inverse DST, and inverse KLT, on the transform performed by the transformer, that is, DCT, DST, and KLT, on the quantization result performed by the image encoding device. Inverse transformation may be performed based on the transmission unit determined by the video encoding device.
  • In the inverse transform unit 225 of the video decoding device, the transform technique (e.g., DCT, DST, KLT) may be performed selectively according to multiple pieces of information, such as the prediction method, the size and shape of the current block, the prediction mode, and the intra prediction direction.
  • the prediction units 230 and 235 may generate a prediction block based on prediction block generation-related information provided by the entropy decoder 210 and previously decoded block or picture information provided by the memory 245.
  • Intra prediction for the prediction unit may be performed based on the pixel existing on the left of the prediction unit, the pixel existing in the upper left, and the pixels existing on the top.
  • Alternatively, intra prediction may be performed using reference pixels based on the transformation unit.
  • intra-picture prediction using NxN partitioning only for the minimum coding unit can be used.
  • the prediction units 230 and 235 may include a prediction unit determination unit, an inter-screen prediction unit, and an intra-screen prediction unit.
  • The prediction unit determination unit receives various information, such as prediction unit information input from the entropy decoding unit 210, prediction mode information of the intra prediction method, and motion prediction-related information of the inter prediction method, distinguishes the prediction unit in the current coding unit, and determines whether the prediction unit performs inter prediction or intra prediction.
  • The inter prediction unit 230 may perform inter prediction for the current prediction unit based on information included in at least one of the pictures before or after the current picture containing the current prediction unit, using the information required for inter prediction of the current prediction unit provided by the video encoding device. Alternatively, inter prediction may be performed based on information on a pre-restored partial region within the current picture including the current prediction unit.
  • To perform inter prediction, it is possible to determine, on a coding unit basis, which of skip mode, merge mode, AMVP mode, and intra block copy mode is used as the motion prediction method of the prediction unit included in the coding unit.
  • the intra-screen prediction unit 235 may generate a prediction block based on pixel information in the current picture. If the prediction unit is a prediction unit that has performed intra-prediction, intra-prediction can be performed based on the intra-prediction mode information of the prediction unit provided by the video encoding device.
  • the intra-screen prediction unit 235 may include an Adaptive Intra Smoothing (AIS) filter, a reference pixel interpolation unit, and a DC filter.
  • AIS filter is a part that performs filtering on the reference pixels of the current block, and can be applied by determining whether or not to apply the filter according to the prediction mode of the current prediction unit.
  • AIS filtering can be performed on the reference pixel of the current block using the prediction mode and AIS filter information of the prediction unit provided by the video encoding device. If the prediction mode of the current block is a mode that does not perform AIS filtering, the AIS filter may not be applied.
  • the reference pixel interpolator may interpolate the reference pixel to generate a reference pixel in pixel units of an integer value or less. If the prediction mode of the current prediction unit is a prediction mode that generates a prediction block without interpolating the reference pixel, the reference pixel may not be interpolated.
  • the DC filter can generate a prediction block through filtering when the prediction mode of the current block is DC mode.
  • the restored block or picture may be provided to the filter unit 240.
  • the filter unit 240 may include a deblocking filter, an offset correction unit, and an ALF.
  • the deblocking filter of the video decoding device receives information related to the deblocking filter provided by the video encoding device, and the video decoding device can perform deblocking filtering on the corresponding block.
  • the offset correction unit may perform offset correction on the reconstructed image based on the type of offset correction applied to the image during encoding and offset value information.
  • ALF can be applied to the coding unit based on ALF application availability information, ALF coefficient information, etc. provided from the coding device. This ALF information may be included and provided in a specific parameter set.
  • the memory 245 can store the restored picture or block so that it can be used as a reference picture or reference block, and can also provide the restored picture to an output unit.
  • the term coding unit is used as a coding unit for convenience of explanation, but it may also be a unit that performs not only encoding but also decoding.
  • the current block represents an encoding/decoding target block, and depending on the encoding/decoding stage, it may be a coding tree block (or coding tree unit), a coding block (or coding unit), a transform block (or transform unit), or a prediction block. (or prediction unit) or may represent a block to which an in-loop filter is applied.
  • 'unit' may represent a basic unit for performing a specific encoding/decoding process
  • 'block' may represent a pixel array of a predetermined size.
  • ‘block’ and ‘unit’ can be used with the same meaning.
  • a coding block (coding block) and a coding unit (coding unit) may be understood to have equivalent meanings.
  • the picture including the current block will be called the current picture.
  • overlapping data between pictures can be removed through inter prediction.
  • Inter prediction can be performed on a block basis.
  • a prediction block of the current block can be generated from a reference picture using motion information of the current block.
  • the motion information may include at least one of a motion vector, a reference picture index, and a prediction direction.
  • Figures 3 and 4 are flowcharts of the inter prediction method.
  • Figure 3 shows the operation of the encoder
  • Figure 4 shows the operation of the decoder
  • the encoder performs motion estimation (S310) and obtains a prediction block based on motion information derived as a motion estimation result (S320).
  • the motion information may include at least one of a motion vector, reference picture index, motion vector precision, bidirectional weight, and whether L0 prediction is performed or L1 prediction is performed.
  • the encoder may determine an inter prediction mode for performing inter prediction and encode information for deriving motion information according to the determined inter prediction mode (S330).
  • an inter prediction mode is determined based on information signaled from the bitstream (S410), and motion information is acquired according to the determined inter prediction mode (S420).
  • motion information is acquired, a prediction block of the current block can be obtained based on the obtained motion information (S430).
  • Motion information of the current block can be generated through motion estimation.
  • Figure 5 shows an example in which motion estimation is performed.
  • the search range for motion estimation can be set from the same position as the reference point of the current block in the reference picture.
  • the reference point may be the location of the upper left sample of the current block.
  • A rectangle of size (w0+w1) x (h0+h1), centered on the reference point, is set as the search range.
  • w0, w1, h0, and h1 may have the same value.
  • at least one of w0, w1, h0, and h1 may be set to have a different value from the other.
  • the sizes of w0, w1, h0, and h1 may be determined so as not to exceed a Coding Tree Unit (CTU) boundary, slice boundary, tile boundary, or picture boundary.
  • the cost of each reference block compared to the current block can be measured.
  • the cost can be calculated using the similarity between two blocks.
  • the cost may be calculated based on the Sum of Absolute Difference (SAD) of difference values between the original samples in the current block and the original samples (or restored samples) in the reference block.
  • the reference block with the optimal cost can be set as the prediction block of the current block.
  • the distance between the current block and the reference block can be set as a motion vector.
  • the x-coordinate difference and y-coordinate difference between the current block and the reference block may be set as a motion vector.
  • the index of the picture containing the reference block specified through motion estimation is set as the reference picture index.
  • the prediction direction can be set based on whether the reference picture belongs to the L0 reference picture list or the L1 reference picture list.
  • motion estimation may be performed for each of the L0 direction and L1 direction.
  • motion information in the L0 direction and motion information in the L1 direction can be generated respectively.
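  • A minimal full-search sketch of the motion estimation described above (hypothetical function names; integer-pel search only, SAD cost): every candidate position inside the (w0+w1) x (h0+h1) window is compared with the current block, and the displacement of the best-matching reference block becomes the motion vector.

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences between two equally sized blocks.
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def full_search(cur_block, ref_picture, ref_x, ref_y, w0, w1, h0, h1):
    """Full search over a (w0+w1) x (h0+h1) window around the reference point
    (ref_x, ref_y); returns the best motion vector and its SAD cost."""
    bh, bw = cur_block.shape
    H, W = ref_picture.shape
    best_mv, best_cost = (0, 0), None
    for dy in range(-h0, h1 + 1):
        for dx in range(-w0, w1 + 1):
            x, y = ref_x + dx, ref_y + dy
            if not (0 <= x <= W - bw and 0 <= y <= H - bh):
                continue                        # keep the candidate inside the picture
            cost = sad(cur_block, ref_picture[y:y + bh, x:x + bw])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```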
  • Figures 6 and 7 show an example in which a prediction block of the current block is generated based on motion information generated through motion estimation.
  • Figure 6 shows an example of generating a prediction block through unidirectional (i.e., L0 direction) prediction
  • Figure 7 shows an example of generating a prediction block through bidirectional (i.e., L0 and L1 directions) prediction.
  • a prediction block of the current block is generated using one piece of motion information.
  • the motion information may include an L0 motion vector, an L0 reference picture index, and prediction direction information indicating the L0 direction.
  • a prediction block is created using two pieces of motion information.
  • Specifically, a reference block in the L0 direction specified based on motion information in the L0 direction (L0 motion information) can be set as an L0 prediction block, and a reference block in the L1 direction specified based on motion information in the L1 direction (L1 motion information) can be used to generate an L1 prediction block.
  • the L0 prediction block and the L1 prediction block can be weighted to generate the prediction block of the current block.
  • In the illustrated example, the L0 reference picture exists in the direction before the current picture (i.e., its POC value is smaller than that of the current picture), and the L1 reference picture exists in the direction after the current picture (i.e., its POC value is larger than that of the current picture).
  • an L0 reference picture may exist in the direction after the current picture, or an L1 reference picture may exist in the direction before the current picture.
  • both the L0 reference picture and the L1 reference picture may exist in the previous direction of the current picture, or both may exist in the subsequent direction of the current picture.
  • bidirectional prediction may be performed using an L0 reference picture that exists in the direction after the current picture and an L1 reference picture that exists in the direction before the current picture.
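  • A minimal sketch of the bidirectional combination described above, assuming equal weights by default (the actual bidirectional weights may be signalled or derived differently):

```python
import numpy as np

def bi_prediction(pred_l0, pred_l1, w0=0.5, w1=0.5):
    """Weighted sum of the L0 and L1 prediction blocks; equal weights give
    a simple average, but other bidirectional weights could be used."""
    p = w0 * pred_l0.astype(np.float64) + w1 * pred_l1.astype(np.float64)
    return np.clip(np.rint(p), 0, 255).astype(pred_l0.dtype)
```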
  • Motion information of the block on which inter prediction was performed may be stored in memory. At this time, motion information may be stored in sample units. Specifically, motion information of the block to which a specific sample belongs may be stored as motion information of the specific sample. The stored motion information can be used to derive motion information of a neighboring block to be encoded/decoded later.
  • The encoder may signal to the decoder information encoding the residual sample, which corresponds to the difference value between the sample of the current block (i.e., the original sample) and the prediction sample, together with the motion information necessary to generate the prediction block.
  • the decoder may decode information about the signaled difference value to derive a difference sample, and add a prediction sample within a prediction block generated using motion information to the difference sample to generate a restored sample.
  • one of a plurality of inter prediction modes may be selected.
  • the plurality of inter prediction modes may include a motion information merge mode and a motion vector prediction mode.
  • the motion vector prediction mode is a mode in which the difference value between a motion vector and a motion vector predicted value is encoded and signaled.
  • the motion vector prediction value may be derived based on motion information of neighboring blocks or neighboring samples adjacent to the current block.
  • Figure 8 shows positions referenced to derive motion vector prediction values.
  • the current block has a size of 4x4.
  • 'LB' represents samples included in the leftmost column and bottommost row in the current block.
  • 'RT' represents the sample included in the rightmost column and topmost row in the current block.
  • A0 to A4 represent samples neighboring to the left of the current block
  • B0 to B5 represent samples neighboring to the top of the current block.
  • A1 represents a sample neighboring to the left of LB
  • B1 represents a sample neighboring to the top of RT.
  • A neighboring block containing a sample that is spatially adjacent to the current block (i.e., one of A0 to A4 or one of B0 to B5) may be referred to as a spatial neighboring block.
  • Col indicates the position of a sample neighboring the bottom right of the current block in the co-located picture.
  • the collocated picture is a different picture from the current picture, and information for specifying the collocated picture can be explicitly encoded and signaled in the bitstream.
  • a reference picture with a predefined reference picture index may be set as a collocated picture.
  • a neighboring block containing a sample (i.e., Col) temporally adjacent to the current block may be referred to as a temporal neighboring block.
  • the motion vector prediction value of the current block may be derived from at least one motion vector prediction candidate included in the motion vector prediction list.
  • the number of motion vector prediction candidates that can be inserted into the motion vector prediction list (i.e., the size of the list) may be predefined in the encoder and decoder.
  • the maximum number of motion vector prediction candidates may be two.
  • a motion vector stored at the position of a neighboring sample adjacent to the current block or a scaled motion vector derived by scaling the motion vector may be inserted into the motion vector prediction list as a motion vector prediction candidate.
  • a motion vector prediction candidate can be derived by scanning neighboring samples adjacent to the current block in a predefined order.
  • the earliest discovered available motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate.
  • Alternatively, a motion vector prediction candidate can be derived based on the first available motion vector found. Specifically, after scaling the first available motion vector found, the scaled motion vector can be inserted into the motion vector prediction list as a motion vector prediction candidate. At this time, the scaling may be performed based on the output order difference between the current picture and its reference picture (i.e., a POC difference) and the output order difference between the current picture and the reference picture of the neighboring sample (i.e., a POC difference).
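  • As an illustration of this scaling, the following sketch (hypothetical helper; simplified floating-point arithmetic rather than the fixed-point arithmetic a real codec would use) scales a neighboring motion vector by the ratio of the two POC differences:

```python
def scale_motion_vector(mv, poc_cur, poc_ref_cur, poc_ref_neighbor):
    """Scale a neighboring block's motion vector by the ratio of POC
    differences, as described above."""
    td = poc_cur - poc_ref_neighbor          # distance spanned by the neighboring MV
    tb = poc_cur - poc_ref_cur               # distance for the current block
    if td == 0:
        return mv
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))
```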
  • a motion vector prediction candidate may be derived based on the motion vector of a block including a sample at a predetermined position.
  • a motion vector prediction candidate can be derived from a sample adjacent to the left of the current block, and a motion vector prediction candidate can be derived from a sample adjacent to the top of the current block.
  • the motion vector prediction candidate derived from the left sample may be inserted into the motion vector prediction list before the motion vector prediction candidate derived from the top sample.
  • the index assigned to the motion vector prediction candidate derived from the left sample may have a smaller value than the motion vector prediction candidate derived from the top sample.
  • the motion vector prediction candidate derived from the top sample may be inserted into the motion vector prediction list before the motion vector prediction candidate derived from the left sample.
  • the motion vector prediction candidate with the highest coding efficiency may be set as the motion vector predictor (MVP) of the current block.
  • index information indicating a motion vector prediction candidate that is set as the motion vector prediction value of the current block among a plurality of motion vector prediction candidates may be encoded and signaled to the decoder.
  • The index information may be a 1-bit flag (e.g., an MVP flag).
  • In addition, the motion vector difference (MVD), i.e., the difference between the motion vector of the current block and the motion vector prediction value, may be encoded and signaled to the decoder.
  • the decoder can construct a motion vector prediction list in the same way as the encoder. Additionally, index information may be decoded from the bitstream, and one of a plurality of motion vector prediction candidates may be selected based on the decoded index information. The selected motion vector prediction candidate can be set as the motion vector prediction value of the current block.
  • motion vector difference values can be decoded from the bitstream. Afterwards, the motion vector of the current block can be derived by combining the motion vector prediction value and the motion vector difference value.
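  • A minimal sketch of this decoder-side derivation (hypothetical function and variable names): the signalled index selects one motion vector prediction candidate, and the decoded motion vector difference is added to it.

```python
def derive_motion_vector(mvp_candidates, mvp_flag, mvd):
    """Decoder-side derivation: pick the motion vector prediction candidate
    signalled by the index and add the decoded motion vector difference to
    obtain the motion vector of the current block."""
    mvp = mvp_candidates[mvp_flag]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

# Hypothetical example with two candidates, as in the description above.
candidates = [(4, -2), (0, 1)]
print(derive_motion_vector(candidates, mvp_flag=0, mvd=(1, 3)))   # -> (5, 1)
```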
  • a motion vector prediction list can be generated for each of the L0 direction and L1 direction. That is, the motion vector prediction list may be composed of motion vectors in the same direction. Accordingly, the motion vector of the current block and the motion vector prediction candidates included in the motion vector prediction list have the same direction.
  • the reference picture index and prediction direction information may be explicitly encoded and signaled to the decoder.
  • the reference picture index can be explicitly encoded and signaled to the decoder.
  • If the reference picture list includes only one reference picture, encoding/decoding of the reference picture index may be omitted.
  • Prediction direction information may be an index indicating one of L0 unidirectional prediction, L1 unidirectional prediction, or bidirectional prediction.
  • the L0 flag indicating whether prediction in the L0 direction is performed and the L1 flag indicating whether prediction in the L1 direction is performed may be encoded and signaled, respectively.
  • the motion information merge mode is a mode that sets the motion information of the current block to be the same as the motion information of the neighboring block.
  • motion information can be encoded/decoded using a motion information merge list.
  • a motion information merge candidate may be derived based on motion information of a neighboring block or neighboring sample adjacent to the current block. For example, after pre-defining a reference position around the current block, it is possible to check whether motion information exists at the pre-defined reference position. If motion information exists at a predefined reference location, motion information at that location can be inserted into the motion information merge list as a motion information merge candidate.
  • the predefined reference position may include at least one of A0, A1, B0, B1, B5, and Col.
  • motion information merging candidates can be derived in the following order: A1, B1, B0, A0, B5, and Col.
  • a motion information merging candidate may be derived based on motion information of a block including a sample of a predefined reference position.
  • the motion information of the motion information merge candidate with the optimal cost can be set as the motion information of the current block.
  • Index information (e.g., a merge index) indicating the motion information merge candidate selected from among the plurality of motion information merge candidates may be encoded and transmitted to the decoder.
  • a motion information merge list can be constructed in the same way as in the encoder. Then, a motion information merge candidate can be selected based on the merge index decoded from the bitstream. The motion information of the selected motion information merge candidate may be set as the motion information of the current block.
  • the motion information merge list consists of a single list regardless of the prediction direction. That is, the motion information merge candidate included in the motion information merge list may have only L0 motion information or L1 motion information, or may have bidirectional motion information (i.e., L0 motion information and L1 motion information).
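  • A simplified sketch of merge-list construction and selection as described above (hypothetical data layout; pruning of duplicate candidates is omitted):

```python
def build_merge_list(neighbor_motion, positions=('A1', 'B1', 'B0', 'A0', 'B5', 'Col'),
                     max_candidates=6):
    """Scan the predefined reference positions in order and collect available
    motion information as merge candidates."""
    merge_list = []
    for pos in positions:
        info = neighbor_motion.get(pos)          # None if no motion info is stored there
        if info is not None:
            merge_list.append(info)
        if len(merge_list) == max_candidates:
            break
    return merge_list

# Hypothetical neighbors: only A1 and Col carry motion information.
neighbors = {'A1': {'mv': (3, 0), 'ref_idx': 0}, 'Col': {'mv': (-1, 2), 'ref_idx': 1}}
merge_list = build_merge_list(neighbors)
merge_idx = 0                                    # parsed from the bitstream
current_motion = merge_list[merge_idx]           # motion info copied to the current block
```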
  • Motion information of the current block can also be derived using the restored sample area around the current block.
  • the restored sample area used to derive motion information of the current block may be called a template.
  • Figure 9 is a diagram for explaining a template-based motion estimation method.
  • Whereas the method described above determines the prediction block of the current block based on the cost between the current block and reference blocks within the search range, in template-based motion estimation the motion estimation for the current block can be performed based on the cost between a template neighboring the current block (hereinafter referred to as the current template) and a reference template having the same size and shape as the current template.
  • Here, the cost may be calculated based on the sum of the absolute values of the difference values between the restored samples in the current template and the restored samples in the reference template. The smaller the sum of absolute values, the lower the cost.
  • the reference block neighboring the reference template can be set as the prediction block of the current block.
  • motion information of the current block can be set based on the distance between the current block and the reference block, the index of the picture to which the reference block belongs, and whether the reference picture is included in the L0 or L1 reference picture list.
  • the decoder itself can perform motion estimation in the same manner as the encoder. Accordingly, when motion information is derived using a template, there is no need to encode and signal motion information other than information indicating whether the template is used.
  • the current template may include at least one of an area adjacent to the top of the current block or an area adjacent to the left. At this time, the area adjacent to the top may include at least one row, and the area adjacent to the left may include at least one column.
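  • A minimal sketch of template-based motion estimation, under the assumptions of an integer-pel search and a one-line template above and to the left of the block (hypothetical function names; blocks are assumed not to touch the picture boundary):

```python
import numpy as np

def sad(a, b):
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def get_template(picture, x, y, bw, bh, lines=1):
    """Template made of `lines` rows above and `lines` columns to the left of
    the block whose top-left corner is (x, y)."""
    top = picture[y - lines:y, x:x + bw]
    left = picture[y:y + bh, x - lines:x]
    return top, left

def template_search(cur_picture, ref_picture, x, y, bw, bh, search=8, lines=1):
    """Pick the displacement whose reference template best matches the current
    template; only previously reconstructed samples are compared, so the
    decoder can repeat the same search."""
    H, W = ref_picture.shape
    cur_top, cur_left = get_template(cur_picture, x, y, bw, bh, lines)
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if not (lines <= rx <= W - bw and lines <= ry <= H - bh):
                continue
            ref_top, ref_left = get_template(ref_picture, rx, ry, bw, bh, lines)
            cost = sad(cur_top, ref_top) + sad(cur_left, ref_left)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv
```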
  • Figure 10 shows examples of template configurations.
  • a current template may be constructed following one of the examples shown in Figure 10.
  • the template may be configured only from the area adjacent to the left side of the current block, or may be configured only from the area adjacent to the top of the current block.
  • the size and/or shape of the current template may be predefined in the encoder and decoder.
  • index information specifying one of the plurality of template candidates can be encoded and signaled to the decoder.
  • one of a plurality of template candidates may be adaptively selected based on at least one of the size, shape, or location of the current block. For example, if the current block touches the upper border of the CTU, the current template can be constructed only from the area adjacent to the left side of the current block.
  • Template-based motion estimation can be performed for each reference picture stored in the reference picture list.
  • motion estimation may be performed on only some of the reference pictures.
  • For example, motion estimation may be performed only on the reference picture whose reference picture index is 0, only on reference pictures whose reference picture index is smaller than a threshold, or only on reference pictures whose POC difference from the current picture is smaller than a threshold.
  • the reference picture index can be explicitly encoded and signaled, and then motion estimation can be performed only on the reference picture indicated by the reference picture index.
  • motion estimation can be performed targeting the reference picture of a neighboring block corresponding to the current template. For example, if the template consists of a left neighboring area and a top neighboring area, at least one reference picture can be selected using at least one of the reference picture index of the left neighboring block or the reference picture index of the top neighboring block. Afterwards, motion estimation can be performed on at least one selected reference picture.
  • Information indicating whether template-based motion estimation has been applied may be encoded and signaled to the decoder.
  • the information may be a 1-bit flag. For example, if the flag is true (1), it indicates that template-based motion estimation is applied to the L0 direction and L1 direction of the current block. On the other hand, if the flag is false (0), it indicates that template-based motion estimation is not applied. In this case, motion information of the current block may be derived based on the motion information merging mode or motion vector prediction mode.
  • template-based motion estimation can be applied only when it is determined that the motion information merge mode and motion vector prediction mode are not applied to the current block. For example, when the first flag indicating whether the motion information merge mode is applied and the second flag indicating whether the motion vector prediction mode is applied are both 0, motion estimation based on the template may be performed.
  • For each of the L0 direction and the L1 direction, information indicating whether template-based motion estimation has been applied may be signaled. That is, whether template-based motion estimation is applied to the L0 direction and whether it is applied to the L1 direction can be determined independently of each other. Accordingly, template-based motion estimation may be applied to one of the L0 direction and the L1 direction, while another mode (e.g., motion information merge mode or motion vector prediction mode) may be applied to the other direction.
  • a prediction block of the current block may be generated based on a weighted sum operation of the L0 prediction block and the L1 prediction block.
  • In this case as well, the prediction block of the current block can be generated based on a weighted sum operation of the L0 prediction block and the L1 prediction block. This will be described later through Equation 2.
  • a motion estimation method based on a template may be inserted as a motion information merging candidate in a motion information merging mode or a motion vector prediction candidate in a motion vector prediction mode.
  • whether to apply the template-based motion estimation method may be determined based on whether the selected motion information merge candidate or the selected motion vector prediction candidate indicates the template-based motion estimation method.
  • In this case, motion information of the current block can also be generated through the template-based motion estimation method.
  • Figure 11 is a diagram for explaining a motion estimation method based on a bilateral matching method.
  • the two-way matching method can be performed only when the temporal order of the current picture (i.e., POC) exists between the temporal order of the L0 reference picture and the temporal order of the L1 reference picture.
  • the search range can be set for each of the L0 reference picture and L1 reference picture.
  • the L0 reference picture index for identifying the L0 reference picture and the L1 reference picture index for identifying the L1 reference picture may be encoded and signaled, respectively.
  • Only the L0 reference picture index may be encoded and signaled, and the L1 reference picture may then be selected based on the distance between the current picture and the L0 reference picture (hereinafter referred to as the L0 POC difference).
  • For example, an L1 reference picture whose absolute distance to the current picture (hereinafter referred to as the L1 POC difference) is equal to the L0 POC difference may be selected.
  • the L1 reference picture that has a different temporal direction from the L0 reference picture can be used for bilateral matching. For example, if the POC of the L0 reference picture is smaller than that of the current picture, one of the L1 reference pictures whose POC is larger than the current picture can be selected.
  • the L1 reference picture index may be encoded and signaled, and the L0 reference picture may be selected based on the distance between the current picture and the L1 reference picture.
  • A two-way matching method may also be performed using the L0 reference picture that is closest in distance to the current picture among the L0 reference pictures, and the L1 reference picture that is closest in distance to the current picture among the L1 reference pictures.
  • Alternatively, the LX (X is 0 or 1) reference picture may be selected based on an explicitly signaled reference picture index, while the L(1-X) reference picture may be selected based on its distance from the current picture.
  • the L0 and/or L1 reference picture may be selected based on the motion information of the neighboring block of the current block.
  • the L0 and/or L1 reference picture to be used for two-way matching can be selected using the reference picture index of the left or top neighboring block of the current block.
  • the search range can be set to within a predetermined range from the collocated block in the reference picture.
  • the search range can be set based on initial motion information.
  • Initial motion information may be derived from a neighboring block of the current block. For example, motion information of the left neighboring block or the top neighboring block of the current block may be set as the initial motion information of the current block.
  • the L0 motion vector and the L1 motion vector are set in opposite directions. That is, the L0 motion vector and the L1 motion vector have opposite signs.
  • the size of the LX motion vector may be proportional to the distance (i.e., POC difference) between the current picture and the LX reference picture.
  • Hereinafter, the reference block within the search range of the L0 reference picture is referred to as the L0 reference block, and the reference block within the search range of the L1 reference picture is referred to as the L1 reference block.
  • The ratio between the sizes of the L0 motion vector and the L1 motion vector can be determined by the ratio of the distance between the current picture and the L0 reference picture to the distance between the current picture and the L1 reference picture.
  • In the illustrated example, the absolute value of the distance between the current picture (T) and the L0 reference picture (T-1) is identical to the absolute value of the distance between the current picture (T) and the L1 reference picture (T+1). Accordingly, the L0 motion vector (x0, y0) and the L1 motion vector (x1, y1) have the same size but opposite directions. If an L1 reference picture with a POC of (T+2) had been used instead, the L1 motion vector (x1, y1) would be set to (-2*x0, -2*y0).
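  • A minimal sketch of how the L1 motion vector can be derived from the L0 motion vector under the bilateral-matching constraint above (opposite sign, magnitude scaled by the ratio of POC distances); the function name and inputs are illustrative:

```python
def derive_l1_mv(l0_mv, poc_cur, poc_l0, poc_l1):
    # Distances (POC differences) to the L0 and L1 reference pictures.
    d0 = poc_cur - poc_l0
    d1 = poc_l1 - poc_cur
    scale = d1 / d0
    # Opposite direction, magnitude scaled by the ratio of POC distances.
    return (-scale * l0_mv[0], -scale * l0_mv[1])

# Example from the text: L0 reference at POC T-1, L1 reference at POC T+2
# -> L1 motion vector = (-2*x0, -2*y0).
```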
  • the L0 reference block and L1 reference block can be set as the L0 prediction block and L1 prediction block of the current block, respectively.
  • the final prediction block of the current block can be generated through a weighted sum operation of the L0 reference block and the L1 reference block.
  • a prediction block of the current block may be generated according to Equation 2, which will be described later.
  • the decoder can perform motion estimation in the same way as the encoder. Accordingly, while information indicating whether the two-way matching method is applied is explicitly encoded/decoded, encoding/decoding of motion information such as motion vectors can be omitted. As described above, at least one of the L0 reference picture index or the L1 reference picture index may be explicitly encoded/decoded.
  • information indicating whether the two-way matching method has been applied may be explicitly encoded/decoded, but if the two-way matching method has been applied, the L0 motion vector or the L1 motion vector may be explicitly encoded and signaled. If the L0 motion vector is signaled, the L1 motion vector can be derived based on the POC difference between the current picture and the L0 reference picture and the POC difference between the current picture and the L1 reference picture. If the L1 motion vector is signaled, the L0 motion vector can be derived based on the POC difference between the current picture and the L0 reference picture and the POC difference between the current picture and the L1 reference picture. At this time, the encoder can explicitly encode the smaller one of the L0 motion vector and the L1 motion vector.
  • Information indicating whether the two-way matching method has been applied may be a 1-bit flag. As an example, if the flag is true (eg, 1), it may indicate that the two-way matching method is applied to the current block. If the flag is false (eg, 0), it may indicate that the two-way matching method is not applied to the current block. In this case, motion information merge mode or motion vector prediction mode may be applied to the current block.
  • the two-way matching method can be applied only when it is determined that the motion information merge mode and motion vector prediction mode are not applied to the current block. For example, when the first flag indicating whether the motion information merge mode is applied and the second flag indicating whether the motion vector prediction mode is applied are both 0, the two-way matching method may be applied.
  • the two-way matching method may be inserted as a motion information merge candidate in the motion information merge mode or a motion vector prediction candidate in the motion vector prediction mode.
  • whether to apply the two-way matching method may be determined based on whether the selected motion information merge candidate or the selected motion vector prediction candidate indicates the two-way matching method.
  • The two-way matching method requires the temporal order of the current picture to lie between the temporal order of the L0 reference picture and the temporal order of the L1 reference picture. It is also possible to generate a prediction block of the current block by applying a one-way matching method to which this constraint of the two-way matching method does not apply.
  • In the one-way matching method, two reference pictures whose temporal order (i.e., POC) is earlier than that of the current picture, or two reference pictures whose temporal order is later than that of the current picture, can be used.
  • both reference pictures may be derived from the L0 reference picture list or the L1 reference picture list.
  • one of the two reference pictures may be derived from the L0 reference picture list, and the other may be derived from the L1 reference picture list.
  • Inter prediction can be performed using a picture different from the current picture as a reference picture.
  • a picture different from the current picture may have a different output order (Picture Order Count, POC) from the current picture.
  • a virtual reference picture may be generated using a plurality of pictures whose output order is different from the current picture, and inter prediction on the current block may be performed using the virtual reference picture. Generation of a virtual reference picture can be performed equally in the encoder and decoder.
  • the number of selectable reference pictures can be increased and prediction accuracy can be improved. Additionally, since the decoder can generate a virtual reference picture in the same way as the encoder, a new reference picture can be generated without increasing the amount of data to be encoded.
  • Figure 12 shows an example of generating a virtual reference picture.
  • T represents the output order of the current picture.
  • the reference pictures used to generate the virtual reference picture have a different output order (T-2, T-1, T+1, T+2) than the current picture.
  • a virtual reference picture can be generated using at least n reference pictures whose output order is different from the current picture.
  • n may be an integer of 2 or more.
  • the plurality of reference pictures used to obtain the virtual reference picture may include reference pictures that are earlier in temporal order than the current picture (i.e., reference pictures whose POC is smaller than that of the current picture) and reference pictures that are later in temporal order than the current picture (i.e., reference pictures whose POC is larger than that of the current picture).
  • When generating a virtual reference picture, the virtual reference picture may be generated using at least one pair of reference pictures that are at the same distance from the current picture.
  • the distance to the current picture means the absolute value of the POC difference with the current picture.
  • For example, when a forward reference picture whose distance from the current picture is 1 (i.e., a reference picture with a POC of (T-1)) is selected, a backward reference picture whose distance from the current picture is 1 (i.e., a reference picture with a POC of (T+1)) may also be selected.
  • a virtual reference picture can be generated by selecting at least one reference picture from the L0 reference picture list and at least one reference picture from the L1 reference picture list.
  • a virtual reference picture may be generated using only reference pictures belonging to the L0 reference picture list, or a virtual reference picture may be generated using only reference pictures belonging to the L1 reference picture list.
  • the output order of the virtual reference picture may be set to be the same as the current picture.
  • the output order of the virtual reference picture is set to T, the same as that of the current picture.
  • the output order of the virtual reference picture can be set differently depending on the output order of the reference pictures used to generate it. For example, if the distance D1 between the current picture and the forward reference picture used to generate the virtual reference picture is different from the distance D2 between the current picture and the backward reference picture used to generate it, the output order of the virtual reference picture may be set differently from that of the current picture. At this time, the output order of the virtual reference picture may also be set differently from the output orders of the reference pictures included in the reference picture list.
  • a virtual reference picture can be created through a two-way matching method.
  • Figure 13 shows an example in which a virtual reference picture is generated based on a two-way matching method.
  • a virtual reference picture is created using a reference picture with an output order of (T-1) and a reference picture with an output order of (T+1).
  • a virtual reference picture may be generated in block units.
  • an empty virtual reference picture can be divided into a plurality of blocks.
  • the size of the block may be predefined in the encoder and decoder.
  • the block size may be 4x4, 8x8, 16x16, or 32x32.
  • information about the block size may be encoded and signaled.
  • the information may be encoded and signaled through a video parameter set, sequence parameter set, picture header, or slice header.
  • the size of the divided block may be set to the same size as the coding tree unit or to the 1/N size of the coding tree unit.
  • N may be 2, 4, or 8.
  • a search area can be set centered on the same location of the block in the reference pictures, and a combination of reference blocks with the lowest cost in the search area can be searched.
  • the cost may be the SAD (Sum of Absolute Differences).
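  • A simplified sketch of the block-wise virtual reference picture generation described above, assuming two reference pictures, mirrored displacements around the co-located position, and a SAD cost; block size and search range are assumed values:

```python
import numpy as np

def build_virtual_reference(ref0, ref1, block=8, search_range=4):
    # ref0 / ref1: two reconstructed reference pictures (e.g. POC T-1 and T+1).
    # block, search_range: assumed values; picture dimensions are assumed to be
    # multiples of the block size.
    h, w = ref0.shape
    virt = np.zeros_like(ref0)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            best = None
            for dy in range(-search_range, search_range + 1):
                for dx in range(-search_range, search_range + 1):
                    # Mirrored displacements around the co-located position.
                    y0, x0 = by + dy, bx + dx
                    y1, x1 = by - dy, bx - dx
                    if (min(y0, x0, y1, x1) < 0 or
                            max(y0, y1) + block > h or max(x0, x1) + block > w):
                        continue  # displacement falls outside the picture
                    b0 = ref0[y0:y0 + block, x0:x0 + block].astype(int)
                    b1 = ref1[y1:y1 + block, x1:x1 + block].astype(int)
                    cost = np.abs(b0 - b1).sum()  # SAD between the reference blocks
                    if best is None or cost < best[0]:
                        best = (cost, b0, b1)
            _, b0, b1 = best
            # The block of the virtual picture is the average of the best pair.
            virt[by:by + block, bx:bx + block] = (b0 + b1) // 2
    return virt
```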
  • a virtual reference picture may be generated using a larger number of reference pictures than shown in FIG. 13.
  • a virtual reference picture can be created using three or more reference pictures.
  • a virtual reference picture can be generated using two reference pictures whose temporal order is earlier than that of the current picture and two reference pictures whose temporal order is later than that of the current picture.
  • the two-way matching method when generating a virtual reference picture, can be applied only to areas where object movement exists. In areas where two-way matching is not applied, the sample value within the virtual reference picture may be determined based on a weighted sum operation or average operation of co-located samples within the reference pictures.
  • Figure 14 shows an example in which two-way matching is applied only to some areas of the virtual reference picture.
  • the virtual reference picture to be created is divided into a plurality of blocks.
  • reference pictures used to generate virtual reference pictures are also divided into a plurality of blocks. Afterwards, for the two reference pictures, samples at the same location are subtracted, and then the standard deviation is derived on a block basis.
  • Figures 14 (a) and (b) show reference pictures used to generate a virtual reference picture, and Figure 14 (c) shows an area where the standard deviation is greater than the threshold.
  • white areas represent blocks whose standard deviation is smaller than the threshold. In that area, the object may be determined to have no or weak movement.
  • black areas represent blocks whose standard deviation is equal to or greater than the threshold. These areas are determined to contain object movement.
  • For blocks determined to contain object movement, two-way matching is applied to derive the pair of reference blocks with the lowest cost, and the corresponding block in the virtual reference picture is derived through a weighted sum operation or average operation of that pair of reference blocks.
  • For blocks determined to contain no (or weak) movement, the corresponding block in the virtual reference picture is derived through a weighted sum or average operation of the co-located blocks in the reference pictures.
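  • A rough sketch of the per-block classification described above (sample-wise difference of the two reference pictures, block-wise standard deviation compared against a threshold); the block size and threshold are assumed values:

```python
import numpy as np

def classify_motion_blocks(ref0, ref1, block=16, threshold=5.0):
    # Per-block standard deviation of the sample-wise difference between the
    # two reference pictures; block size and threshold are assumed values.
    diff = ref0.astype(int) - ref1.astype(int)
    h, w = diff.shape
    motion_map = np.zeros((h // block, w // block), dtype=bool)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            std = diff[by:by + block, bx:bx + block].std()
            # True: standard deviation at/above threshold -> apply two-way matching.
            # False: below threshold -> average/weight co-located samples instead.
            motion_map[by // block, bx // block] = std >= threshold
    return motion_map
```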
  • Because the virtual reference picture is generated in block units, blocking artifacts may occur between blocks, and the prediction accuracy of a block that uses the virtual reference picture may deteriorate as a result.
  • To mitigate this, two-way matching can be performed on a block with a larger size than the block currently being generated in the virtual reference picture (hereinafter referred to as the current virtual block).
  • Figure 15 shows an example in which two-way matching is applied to a block of extended size.
  • Compared to a block of size w x h, the width is expanded by (w0+w1) and the height by (h0+h1), and two-way matching can be applied to the block of the expanded size.
  • w0 and w1 represent the left and right extended areas, respectively
  • h0 and h1 represent the upper and lower extended areas, respectively.
  • the size of the extended area in each direction may be the same. That is, the values of w0, w1, h0, and h1 may all be set to be the same. At this time, the size of the extended area may be predefined in the encoder and decoder.
  • the size of the extension area in each direction may be variably determined depending on whether a block to be created in a virtual picture or a reference block in a reference picture corresponding to the block borders a picture boundary.
  • information about the size of the extended area may be encoded and signaled.
  • Blocks obtained through two-way matching can be divided into a block area and an extended area.
  • the block area is an area corresponding to the current virtual block
  • the extended area represents the remaining area excluding the current virtual block.
  • the extended area overlaps with other blocks in the virtual reference picture, and smoothing can be performed on other blocks using the extended area. As a result of smoothing, blocking artifacts between blocks can be reduced.
  • Figure 16 shows an example in which a smoothing filter is applied at the boundary between blocks.
  • the size of the extended area is 2.
  • a 6x6 block can be obtained through two-way matching. Accordingly, the extended area of the block of the extended size obtained through two-way matching overlaps with the adjacent block.
  • a smoothing filter can be applied to a partial area of the current virtual block using pixels that overlap with a partial area of the current virtual block among the extended areas of the left neighboring block.
  • the smoothing filter may be applied by weighted summing overlapping samples.
  • the size of the extended area may be adaptively adjusted according to the encoding/decoding order. For example, if the encoding/decoding order proceeds sequentially from upper left to lower right (e.g., left ⁇ right, top ⁇ bottom), previously created virtual blocks may exist to the left and top of the current virtual block.
  • the extended area can be set only to the left and top of the block. Accordingly, a smoothing filter based on the extended area of the current virtual block may be applied to virtual blocks neighboring the left and top sides of the current virtual block.
  • the extended area can be set only to the right and bottom of the block. Accordingly, a smoothing filter may be applied to the current virtual block based on the extension areas of each of the left neighboring block and the top neighboring block.
  • the weight applied to each sample may be determined according to the priority between samples.
  • the priority may be determined based on whether the sample is included in the current virtual block, whether the sample is included in the extended area of a neighboring block, or the distance from a specific boundary of the current virtual block.
  • the weight applied to the sample included in the current virtual block may have a larger value than the weight applied to the sample included in the extended area of the neighboring block.
  • a smoothing filter that weights the b0 sample and the a0 sample may be applied to the b0 position in the current virtual block. At this time, a greater weight may be assigned to b0 than to a0.
  • the weight applied to the sample included in the current virtual block may have a smaller value than the weight applied to the sample included in the extended area of the neighboring block.
  • the weight may be adaptively determined depending on the location of the sample in the current virtual block.
  • The weight applied to a sample within the current virtual block and the weight applied to the corresponding sample included in the extended area of the neighboring block may be determined adaptively according to the sample position. Specifically, the closer the sample is to the left boundary of the current virtual block, the greater the weight applied to the sample included in the extended area of the neighboring block may be.
  • For example, at the b0 position, the weight applied to the sample included in the extended area of the neighboring block (i.e., a0) may have a larger value than the weight applied, at the b4 position, to the sample included in the extended area of the neighboring block (i.e., a4).
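  • A simplified sketch of the position-dependent weighting described above for the overlap between the current virtual block and the extended area of the left neighboring block; the exact weight values are assumptions, chosen only so that the neighboring block's weight decreases with distance from the left boundary while the current block's sample always keeps the larger weight:

```python
def smooth_left_boundary(cur_row, ext_row):
    # cur_row: samples of the current virtual block next to its left boundary
    #          (e.g. [b0, b1, b2, b3, b4]).
    # ext_row: the overlapping samples from the extended area of the left
    #          neighbouring block (e.g. [a0, a1, a2, a3, a4]).
    n = len(cur_row)
    out = []
    for i, (b, a) in enumerate(zip(cur_row, ext_row)):
        w_ext = (n - i) / (2.0 * (n + 1))  # decreases with distance from the boundary
        w_cur = 1.0 - w_ext                # always larger than w_ext
        out.append(w_cur * b + w_ext * a)
    return out

# Example: smooth_left_boundary([b0, b1, b2, b3, b4], [a0, a1, a2, a3, a4])
```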
  • BDOF (Bi-Directional Optical Flow)
  • an in-loop filter may be applied to the virtual reference picture.
  • the in-loop filter may include at least one of a deblocking filter, a sample adaptive offset filter, or an adaptive loop filter.
  • Using the virtual reference picture, a prediction block for the current block can be generated.
  • optimal motion information can be derived through motion estimation in a virtual reference picture, and then a prediction block of the current block can be generated using the derived optimal motion information.
  • motion information of the current block may be encoded and signaled.
  • a reference picture list can also be configured to include a virtual reference picture.
  • information indicating whether the virtual reference picture is included in the reference picture list may be encoded and signaled at a higher level, such as a sequence parameter set, slice header, or picture header.
  • When a virtual reference picture in the reference picture list is selected, inter prediction for the current block can be performed using the virtual reference picture.
  • whether to generate a virtual reference picture may be adaptively determined. For example, if the current picture is of type B, and there is a reference picture earlier than the current picture and a reference picture later than the current picture, a virtual reference picture may be created.
  • a virtual reference picture can be inserted at a specific position in the reference picture list.
  • the specific position may be the first position (i.e., index 0) in the reference picture list.
  • the specific location may be the last location in the reference picture list.
  • the virtual reference picture may be inserted into at least one of the L0 reference picture list or the L1 reference picture list.
  • Information indicating whether inter prediction using a virtual reference picture is performed on a block basis in the current picture may be signaled.
  • the information may be a 1-bit flag. If it is determined that inter prediction using a virtual reference picture is performed for the current block, encoding/decoding of part of the motion information may be omitted. For example, when it is determined that inter prediction using a virtual reference picture is performed on the current block, encoding/decoding of at least one of the prediction direction, reference picture index, or motion vector may be omitted.
  • When a virtual reference picture is inserted into the reference picture list, whether the virtual reference picture is used as a reference picture may be determined based on the reference picture index. For example, when the reference picture index points to index 0, inter prediction may be performed using the virtual reference picture.
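  • A minimal sketch of inserting the virtual reference picture into a reference picture list and checking whether it is selected through the reference picture index, following the options described above; names are illustrative:

```python
def build_ref_list_with_virtual(ref_pictures, virtual_ref, first=True):
    # The virtual reference picture is inserted either at index 0 or at the
    # last position of the reference picture list (both options appear above).
    ref_list = list(ref_pictures)
    if first:
        ref_list.insert(0, virtual_ref)
    else:
        ref_list.append(virtual_ref)
    return ref_list

def uses_virtual_reference(ref_idx, virtual_position=0):
    # With the virtual picture at index 0, a reference picture index of 0
    # selects the virtual reference picture.
    return ref_idx == virtual_position
```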
  • the virtual reference picture may be used for at least one of a motion vector prediction mode, a motion vector merging mode, or a template-based motion information estimation method.
  • whether or not to use a virtual reference picture may be determined depending on the inter prediction mode. As an example, using a virtual reference picture may be allowed only in motion vector prediction mode and may not be allowed in motion vector merging mode.
  • When a virtual reference picture is selected, the motion vector of the current block may be fixed to the zero vector (0, 0). That is, when a virtual reference picture is selected, the prediction block of the current block can be derived from the block at the same location as the current block within the virtual reference picture. In this case, encoding/decoding of the motion vector for the current block may be omitted.
  • Alternatively, whether the motion vector of the current block is set to the zero vector may depend on the output order of the virtual reference picture. For example, the motion vector of the current block can be set to the zero vector (0, 0) only when the output order of the virtual reference picture is the same as the output order of the current picture.
  • the motion vector for the current block may be encoded/decoded, and a reference block within the virtual reference picture may be selected based on the encoded/decoded motion vector.
  • the size of the motion vector of the current block can be limited to within a preset range.
  • the absolute values of the horizontal component and vertical component of the motion vector of the current block may not exceed N, respectively.
  • Size candidates and direction candidates for each of the horizontal and vertical components of the motion vector may be stored, and then index information indicating one of the size candidates and index information indicating one of the direction candidates may be encoded and signaled.
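  • A minimal sketch of signaling a motion-vector component through a size-candidate index and a direction-candidate index, as described above; the candidate sets themselves are assumptions:

```python
# Assumed candidate sets; the actual sets are not specified in the text above.
MV_SIZE_CANDIDATES = [1, 2, 4, 8]     # candidate sizes (integer-pel)
MV_DIRECTION_CANDIDATES = [+1, -1]    # candidate directions (sign)

def decode_mv_component(size_idx, dir_idx):
    # One motion-vector component (horizontal or vertical) is reconstructed
    # from a signalled size index and a signalled direction index.
    return MV_DIRECTION_CANDIDATES[dir_idx] * MV_SIZE_CANDIDATES[size_idx]

# Example: size index 2 and direction index 1 -> component value -4.
mv_x = decode_mv_component(2, 1)
```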
  • Deriving a motion vector prediction candidate or a motion information merge candidate from the collocated block may be omitted.
  • a virtual reference picture can be used only when unidirectional prediction is applied. That is, only when L0 direction prediction or L1 direction prediction is applied to the current block, the virtual reference picture can be set as the reference picture of the current block.
  • the prediction block of the current block can be obtained through a weighted sum or average operation of a reference block derived from a virtual reference picture and a reference block derived from a reference picture different from the virtual reference picture.
  • the reference picture used to generate the virtual reference picture may not be selected together with the virtual reference picture during bidirectional prediction of the current block. That is, a reference picture that is different from the virtual reference picture may be a reference picture that is not used to generate the virtual reference picture.
  • the reference picture used to generate the virtual reference picture may be excluded from the reference picture list. That is, in the reference picture list, only virtual reference pictures and reference pictures that are not used to generate the virtual reference picture can be present.
  • the reference picture index may be explicitly encoded and signaled.
  • the reference picture list can be reconstructed using only available reference pictures, and the reference picture index can be encoded to indicate one of the reference pictures included in the reconstructed reference picture list.
  • the virtual reference picture is excluded and the indices of the reference pictures are reallocated. Afterwards, the reference picture index is encoded to point to one of the reallocated indices.
  • Alternatively, the indices of the reference pictures may be reallocated by excluding the unavailable reference pictures. Afterwards, the reference picture index is encoded to point to one of the reallocated indices.
  • the reference picture index is restored by comparing the decoded reference picture index with the index of the unavailable reference picture. For example, if the decoded reference picture index is smaller than the index of an unavailable reference picture, the decoded reference picture index is used as is. On the other hand, if the decoded reference picture index is equal to or greater than the index of an unavailable reference picture, 1 is added to the decoded reference picture index.
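  • A minimal sketch of the index restoration rule described above, where one reference picture is unavailable and the decoded index is compared against its position:

```python
def restore_ref_idx(decoded_idx, unavailable_idx):
    # If the decoded index is smaller than the index of the unavailable
    # reference picture, it is used as is; otherwise 1 is added.
    return decoded_idx if decoded_idx < unavailable_idx else decoded_idx + 1

# Example (hypothetical): the picture at index 2 is unavailable.
assert restore_ref_idx(1, 2) == 1   # indices below 2 are unchanged
assert restore_ref_idx(2, 2) == 3   # indices at/above 2 are shifted by one
```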
  • a first prediction block is obtained by weighting a first reference block derived from a first reference picture and a second reference block derived from a second reference picture. Thereafter, the second prediction block can be obtained by performing a weighted sum of the first prediction block and the third reference block in the virtual reference picture.
  • At least one of the first reference picture and the second reference picture may not be used to generate a virtual reference picture.
  • each of the components (e.g., units, modules, etc.) constituting the block diagram in the above-described disclosure may be implemented as a hardware device or as software, and a plurality of components may be combined into a single hardware device or piece of software.
  • the above-described disclosure may be implemented in the form of program instructions that can be executed through various computer components and recorded on a computer-readable recording medium.
  • the computer-readable recording medium may include program instructions, data files, data structures, etc., singly or in combination.
  • Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes, optical recording media such as CD-ROMs and DVDs, magneto-optical media such as floptical disks, and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • the hardware devices may be configured to operate as one or more software modules to perform processing according to the present disclosure, and vice versa.
  • Embodiments through this disclosure can be applied to electronic devices that encode or decode images.


Abstract

An image decoding method disclosed by the present invention comprises the steps of: selecting a reference picture for the current block from a reference picture list; and obtaining a prediction block for the current block on the basis of the reference picture. In this case, the reference picture may be a virtual reference picture generated on the basis of a plurality of reference pictures included in the reference picture list.
PCT/KR2023/004412 2022-04-01 2023-03-31 Procédé de codage/décodage de signal vidéo et support d'enregistrement pour stocker un flux binaire WO2023191599A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2022-0041298 2022-04-01
KR20220041298 2022-04-01

Publications (1)

Publication Number Publication Date
WO2023191599A1 true WO2023191599A1 (fr) 2023-10-05

Family

ID=88203130

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2023/004412 WO2023191599A1 (fr) 2022-04-01 2023-03-31 Procédé de codage/décodage de signal vidéo et support d'enregistrement pour stocker un flux binaire

Country Status (2)

Country Link
KR (1) KR20230142375A (fr)
WO (1) WO2023191599A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190116067A (ko) * 2018-04-03 2019-10-14 한국전자통신연구원 딥 러닝에 기반하여 생성된 참조 프레임을 이용하는 인터 예측을 위한 방법 및 장치
US20200359041A1 (en) * 2015-10-15 2020-11-12 Cisco Technology, Inc. Low-complexity method for generating synthetic reference frames in video coding
KR20210109049A (ko) * 2017-08-22 2021-09-03 구글 엘엘씨 비디오 코딩에서 모션 보상 예측을 위한 광흐름 추정
KR20210131978A (ko) * 2017-08-29 2021-11-03 에스케이텔레콤 주식회사 양방향 옵티컬 플로우를 이용한 움직임 보상 방법 및 장치
WO2022065977A1 (fr) * 2020-09-28 2022-03-31 현대자동차주식회사 Procédé de prédiction inter basé sur un apprentissage profond à coefficient variable


Also Published As

Publication number Publication date
KR20230142375A (ko) 2023-10-11

Similar Documents

Publication Publication Date Title
WO2017222325A1 (fr) Dispositif et procédé de traitement de signal vidéo
WO2018106047A1 (fr) Procédé et appareil de traitement de signal vidéo
WO2018212577A1 (fr) Procédé et dispositif de traitement de signal vidéo
WO2018221817A1 (fr) Procédé et dispositif de réalisation de décodage d'image basé sur une intraprédiction dans un système de codage d'image
WO2018008906A1 (fr) Procédé et appareil de traitement de signal vidéo
WO2019164031A1 (fr) Procédé et appareil de décodage d'image en fonction d'une structure de division de bloc dans un système de codage d'image
WO2018236031A1 (fr) Procédé de traitement d'image basé sur un mode d'intraprédiction, et appareil associé
WO2020096425A1 (fr) Procédé de codage/décodage de signal d'image, et dispositif associé
WO2018236028A1 (fr) Procédé de traitement d'image basé sur un mode d'intra-prédiction et appareil associé
WO2018008905A1 (fr) Procédé et appareil de traitement de signal vidéo
WO2019066524A1 (fr) Procédé et appareil de codage/ décodage d'image et support d'enregistrement pour stocker un train de bits
WO2018066958A1 (fr) Procédé et appareil de traitement de signal vidéo
WO2018044089A1 (fr) Procédé et dispositif pour traiter un signal vidéo
WO2019182295A1 (fr) Procédé et appareil de traitement de signal vidéo
WO2018105759A1 (fr) Procédé de codage/décodage d'image et appareil associé
WO2018056701A1 (fr) Procédé et appareil de traitement de signal vidéo
WO2017086738A1 (fr) Procédé et appareil de codage/décodage d'image
WO2021006617A1 (fr) Procédé et dispositif de codage et de décodage vidéo utilisant une prédiction inter
WO2019182329A1 (fr) Appareil/procédé de décodage d'image, appareil/procédé de codage d'image, et train de bits de stockage de support d'enregistrement
WO2017150823A1 (fr) Procédé d'encodage/décodage de signal vidéo, et appareil associé
WO2021040458A1 (fr) Procédé et dispositif de traitement de signal vidéo
WO2019078427A1 (fr) Procédé de traitement d'image basé sur un mode d'interprédiction et dispositif associé
WO2016200235A1 (fr) Procédé de traitement d'image basé sur un mode de prédiction intra et appareil associé
WO2017030270A1 (fr) Procédé de traitement d'image basé sur un mode d'intraprédiction, et appareil associé
WO2021125904A1 (fr) Procédé et dispositif de traitement de signal vidéo

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23781448

Country of ref document: EP

Kind code of ref document: A1