WO2024071523A1 - Video coding method and device using improved cross-component linear model prediction - Google Patents

Video coding method and device using improved cross-component linear model prediction

Info

Publication number
WO2024071523A1
WO2024071523A1 (PCT/KR2022/019676)
Authority
WO
WIPO (PCT)
Prior art keywords
mode
predictor
prediction
pixels
block
Prior art date
Application number
PCT/KR2022/019676
Other languages
English (en)
Korean (ko)
Inventor
전병우
이지환
김범윤
허진
박승욱
Original Assignee
현대자동차주식회사
기아 주식회사
성균관대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020220167522A external-priority patent/KR20240043043A/ko
Application filed by 현대자동차주식회사, 기아 주식회사, 성균관대학교 산학협력단 filed Critical 현대자동차주식회사
Publication of WO2024071523A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 - Selection of coding mode or of prediction mode
    • H04N19/105 - Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 - Selection of coding mode or of prediction mode
    • H04N19/11 - Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a colour or a chrominance component
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • This disclosure relates to a video coding method and apparatus using improved cross-component linear model prediction.
  • Since video data has a larger amount of data than audio data or still image data, it requires a lot of hardware resources, including memory, to store or transmit it without processing for compression.
  • Accordingly, when storing or transmitting video data, an encoder is typically used to compress the video data and store or transmit it, and a decoder receives the compressed video data, decompresses it, and plays it.
  • video compression technologies include H.264/AVC, HEVC (High Efficiency Video Coding), and VVC (Versatile Video Coding), which improves coding efficiency by about 30% or more compared to HEVC.
  • an image to be encoded is partitioned into coding units (CUs) of various shapes and sizes and then encoded in units of CUs.
  • the tree structure represents information defining the division of these CU units, and can be transmitted from the encoder to the decoder to indicate the division type of the image.
  • The luma image and the chroma image may be divided independently of each other.
  • Alternatively, the luma signal and the chroma signal may be divided into CUs of the same structure.
  • The technology in which the chroma block can have a partitioning structure different from that of the luma block is called CST (Chroma Separate Tree) technology; in this case, the chroma block may have a different partitioning than the luma block.
  • A technology in which luma signals and chroma signals have the same division structure is called single tree technology. When the single tree technique is used, the chroma block has the same partitioning as the luma block.
  • CCLM Cross-Component Linear Model
  • CCLM prediction For intra prediction of the current chroma block, CCLM prediction first determines the luma area corresponding to the current chroma block within the luma image. Afterwards, CCLM prediction derives a linear model between the pixels in the surrounding pixel lines of the current chroma block and the corresponding luma pixels. Finally, CCLM prediction uses the derived linear model to generate a predictor of the current chroma block from the pixel value of the corresponding luma area.
  • In CCLM prediction as described above, neighboring pixels of the current chroma block are used to derive the linear model, but there is a problem in that these restored neighboring pixels are not used when generating the predictor. Therefore, to improve picture quality and coding efficiency when using CCLM prediction for intra prediction of the current chroma block, a method of additionally using the reconstructed neighboring pixels needs to be considered.
  • In order to improve the prediction performance of CCLM (Cross-Component Linear Model) prediction in intra prediction of the current chroma block, the purpose of the present disclosure is to provide a video coding method and device that generate a first predictor of the current chroma block according to CCLM prediction, additionally generate a second predictor of the current chroma block based on the restored surrounding pixels, and then weightedly combine the first predictor and the second predictor.
  • According to one aspect of the present disclosure, a method of intra-predicting a current chroma block performed by a video decoding device includes: decoding a cross-component prediction mode for cross-component prediction of the current chroma block, where the cross-component prediction predicts the current chroma block using the corresponding luma area for the current chroma block and pixels of the corresponding luma area; generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode; inferring a representative mode from the reconstructed information of a surrounding chroma pixel area, where the surrounding chroma pixel area includes pixels surrounding the current chroma block and the reconstructed information contains the values of pixels in the surrounding chroma pixel area, the positions of the pixels, and the number of pixels; generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode; and generating an intra predictor of the current chroma block by weightedly combining the first predictor and the second predictor.
  • According to another aspect of the present disclosure, a method of intra-predicting a current chroma block performed by a video encoding device includes: determining a cross-component prediction mode for cross-component prediction of the current chroma block, where the cross-component prediction predicts the current chroma block using the corresponding luma area for the current chroma block and pixels of the corresponding luma area; generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode; inferring a representative mode from the reconstructed information of a surrounding chroma pixel area, where the surrounding chroma pixel area includes pixels surrounding the current chroma block and the reconstructed information contains the values of pixels in the surrounding chroma pixel area, the positions of the pixels, and the number of pixels; generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode; and generating an intra predictor of the current chroma block by weightedly combining the first predictor and the second predictor.
  • According to another aspect of the present disclosure, a computer-readable recording medium stores a bitstream generated by an image encoding method, wherein the image encoding method includes: determining a cross-component prediction mode for cross-component prediction of a current chroma block, where the cross-component prediction predicts the current chroma block using the corresponding luma area for the current chroma block and pixels of the corresponding luma area; generating a first predictor of the current chroma block by performing the cross-component prediction based on the cross-component prediction mode; inferring a representative mode from the reconstructed information of a surrounding chroma pixel area, where the surrounding chroma pixel area includes pixels surrounding the current chroma block and the reconstructed information contains the values of pixels in the surrounding chroma pixel area, the positions of the pixels, and the number of pixels; generating a second predictor of the current chroma block by performing intra prediction using neighboring pixels of the current chroma block based on the representative mode; and generating an intra predictor of the current chroma block by weightedly combining the first predictor and the second predictor.
  • As described above, according to this embodiment, a first predictor of the current chroma block is generated according to CCLM prediction, a second predictor of the current chroma block is generated based on the reconstructed neighboring pixels, and the two predictors are weightedly combined, so that the prediction performance of intra prediction of the current chroma block can be improved.
  • FIG. 1 is an example block diagram of a video encoding device that can implement the techniques of the present disclosure.
  • Figure 2 is a diagram to explain a method of dividing a block using the QTBTTT structure.
  • Figures 3A and 3B are diagrams showing a plurality of intra prediction modes including wide-angle intra prediction modes.
  • Figure 4 is an example diagram of neighboring blocks of the current block.
  • Figure 5 is an example block diagram of a video decoding device that can implement the techniques of the present disclosure.
  • Figure 6 is an example diagram showing surrounding pixels referenced for CCLM prediction.
  • Figure 7 is an example diagram showing information that can be used in intra prediction of a chroma channel.
  • FIG. 8 is an exemplary diagram illustrating an intra prediction unit that performs intra prediction of a chroma block according to an embodiment of the present disclosure.
  • Figure 9 is an exemplary diagram showing a peripheral chroma pixel area according to an embodiment of the present disclosure.
  • Figures 10 and 11 are exemplary diagrams showing intensity histograms for each directional mode, according to an embodiment of the present disclosure.
  • Figure 12 is an exemplary diagram showing a peripheral chroma pixel area according to another embodiment of the present disclosure.
  • Figures 13A and 13B are exemplary diagrams showing intensity histograms for each directional mode according to another embodiment of the present disclosure.
  • Figure 14 is an exemplary diagram showing a peripheral chroma pixel area according to another embodiment of the present disclosure.
  • Figure 15 is an exemplary diagram showing a distortion histogram for each directional mode, according to an embodiment of the present disclosure.
  • Figures 16A and 16B are flowcharts showing an intra prediction method of a current chroma block according to an embodiment of the present disclosure.
  • Figure 17 is an exemplary diagram showing a luma pixel area within a corresponding luma area, according to an embodiment of the present disclosure.
  • Figure 18 is an exemplary diagram showing a luma pixel area according to another embodiment of the present disclosure.
  • Figures 19 to 21 are exemplary diagrams showing the distribution of neighboring blocks and prediction modes of the current chroma block, according to another embodiment of the present disclosure.
  • Figures 22 and 23 are exemplary diagrams showing the distribution of blocks and prediction modes included in the corresponding luma area, according to another embodiment of the present disclosure.
  • Figure 24 is an example diagram showing an intensity histogram for each directional mode according to another embodiment of the present disclosure.
  • Figure 25 is an example diagram showing the distribution of neighboring blocks and prediction modes of the current chroma block, according to another embodiment of the present disclosure.
  • FIG. 1 is an example block diagram of a video encoding device that can implement the techniques of the present disclosure.
  • the video encoding device and its sub-configurations will be described with reference to the illustration in FIG. 1.
  • The image encoding device may be configured to include a picture division unit 110, a prediction unit 120, a subtractor 130, a transform unit 140, a quantization unit 145, a rearrangement unit 150, an entropy encoding unit 155, an inverse quantization unit 160, an inverse transform unit 165, an adder 170, a loop filter unit 180, and a memory 190.
  • Each component of the video encoding device may be implemented as hardware or software, or may be implemented as a combination of hardware and software. Additionally, the function of each component may be implemented as software and a microprocessor may be implemented to execute the function of the software corresponding to each component.
  • One image consists of one or more sequences including a plurality of pictures. Each picture is divided into a plurality of regions, and encoding is performed for each region. For example, one picture is divided into one or more tiles and/or slices. Here, one or more tiles can be defined as a tile group. Each tile or slice is divided into one or more Coding Tree Units (CTUs), and each CTU is divided into one or more CUs (Coding Units) by a tree structure. Information applied to each CU is encoded as the syntax of the CU, and information commonly applied to CUs included in one CTU is encoded as the syntax of the CTU.
  • CTUs Coding Tree Units
  • Additionally, information commonly applied to all blocks within one slice is encoded as the syntax of the slice header, and information applied to all blocks constituting one or more pictures is encoded in a picture parameter set (PPS) or a picture header. Furthermore, information commonly referenced by multiple pictures is encoded in a sequence parameter set (SPS), and information commonly referenced by one or more SPSs is encoded in a video parameter set (VPS). Additionally, information commonly applied to one tile or tile group may be encoded as the syntax of a tile or tile group header. Syntax included in the SPS, PPS, slice header, tile, or tile group header may be referred to as high-level syntax.
  • the picture division unit 110 determines the size of the CTU (Coding Tree Unit). Information about the size of the CTU (CTU size) is encoded as SPS or PPS syntax and transmitted to the video decoding device.
  • CTU size Information about the size of the CTU (CTU size) is encoded as SPS or PPS syntax and transmitted to the video decoding device.
  • The picture division unit 110 divides each picture constituting the image into a plurality of CTUs (Coding Tree Units) of a predetermined size, and then recursively divides the CTUs using a tree structure.
  • a leaf node in the tree structure becomes a coding unit (CU), the basic unit of encoding.
  • CU coding unit
  • The tree structure may be a QuadTree (QT) in which the parent node is divided into four child nodes of the same size, a BinaryTree (BT) in which the parent node is divided into two child nodes, a TernaryTree (TT) in which the parent node is divided into three child nodes at a 1:2:1 ratio, or a structure that mixes two or more of these QT, BT, and TT structures.
  • QTBT QuadTree plus BinaryTree
  • QTBTTT QuadTree plus BinaryTree TernaryTree
  • BT and TT may be collectively referred to as MTT (Multiple-Type Tree).
  • Figure 2 is a diagram to explain a method of dividing a block using the QTBTTT structure.
  • the CTU can first be divided into a QT structure. Quadtree splitting can be repeated until the size of the splitting block reaches the minimum block size (MinQTSize) of the leaf node allowed in QT.
  • the first flag (QT_split_flag) indicating whether each node of the QT structure is split into four nodes of the lower layer is encoded by the entropy encoder 155 and signaled to the video decoding device. If the leaf node of QT is not larger than the maximum block size (MaxBTSize) of the root node allowed in BT, it may be further divided into either the BT structure or the TT structure. In the BT structure and/or TT structure, there may be multiple division directions.
  • Additionally, for a node corresponding to a leaf node of QT, a second flag indicating whether the node has been further split is encoded, and if it is split, a flag indicating the splitting direction (vertical or horizontal) and/or a flag indicating the splitting type (Binary or Ternary) are encoded by the entropy encoding unit 155 and signaled to the video decoding device.
  • Alternatively, before the first flag (QT_split_flag) is encoded, a CU split flag (split_cu_flag) indicating whether the node is split may be encoded. If the CU split flag (split_cu_flag) value indicates that the node is not split, the block of the corresponding node becomes a leaf node in the split tree structure and becomes a CU (coding unit), which is the basic unit of coding. When the CU split flag (split_cu_flag) value indicates splitting, the video encoding device starts encoding from the first flag in the above-described manner.
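  • As an illustration only, the recursive consumption of the split flags described above can be sketched as follows. The reader function, the flag names other than QT_split_flag, and the size limits are hypothetical placeholders rather than the normative syntax.

```python
# Hypothetical sketch of QTBTTT partitioning driven by split flags.
# read_flag(name) stands in for entropy decoding of one syntax element;
# it is supplied by the caller and decides when splitting stops.

def split_node(x, y, w, h, read_flag, min_qt_size=16, min_cu_size=4):
    """Recursively split a node into leaf CUs; returns a list of (x, y, w, h) CUs."""
    # First flag: QT split into four child nodes of the same size.
    if w > min_qt_size and h > min_qt_size and read_flag("QT_split_flag"):
        hw, hh = w // 2, h // 2
        cus = []
        for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh)):
            cus += split_node(x + dx, y + dy, hw, hh, read_flag, min_qt_size, min_cu_size)
        return cus
    # Second flag: optional MTT (BT or TT) split of the QT leaf.
    if min(w, h) > min_cu_size and read_flag("mtt_split_flag"):
        vertical = read_flag("split_vertical_flag")   # splitting direction
        binary = read_flag("split_binary_flag")       # BT (two parts) vs TT (three parts)
        # (start, size, denominator) fractions of the parent along the split axis
        parts = [(0, 1, 2), (1, 1, 2)] if binary else [(0, 1, 4), (1, 2, 4), (3, 1, 4)]
        cus = []
        for start, size, den in parts:
            if vertical:
                cus += split_node(x + w * start // den, y, w * size // den, h,
                                  read_flag, min_qt_size, min_cu_size)
            else:
                cus += split_node(x, y + h * start // den, w, h * size // den,
                                  read_flag, min_qt_size, min_cu_size)
        return cus
    return [(x, y, w, h)]  # leaf node: this block becomes a CU
```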
  • When QTBT is used as another example of a tree structure, there may be two splitting types: a type that horizontally splits the block of the node into two blocks of the same size (i.e., symmetric horizontal splitting) and a type that splits it vertically (i.e., symmetric vertical splitting).
  • a split flag (split_flag) indicating whether each node of the BT structure is divided into blocks of a lower layer and split type information indicating the type of division are encoded by the entropy encoder 155 and transmitted to the video decoding device.
  • split_flag split flag
  • As an additional example, the asymmetric form may include dividing the block of the corresponding node into two rectangular blocks with a size ratio of 1:3, or dividing the block of the corresponding node diagonally.
  • a CU can have various sizes depending on the QTBT or QTBTTT division from the CTU.
  • Hereinafter, the block corresponding to a CU (i.e., a leaf node of QTBTTT) to be encoded or decoded is referred to as the 'current block'.
  • the shape of the current block may be rectangular as well as square.
  • the prediction unit 120 predicts the current block and generates a prediction block.
  • the prediction unit 120 includes an intra prediction unit 122 and an inter prediction unit 124.
  • each current block in a picture can be coded predictively.
  • prediction of the current block is done using intra prediction techniques (using data from the picture containing the current block) or inter prediction techniques (using data from pictures coded before the picture containing the current block). It can be done.
  • Inter prediction includes both unidirectional prediction and bidirectional prediction.
  • the intra prediction unit 122 predicts pixels within the current block using pixels (reference pixels) located around the current block within the current picture including the current block.
  • the plurality of intra prediction modes may include two non-directional modes including a planar mode and a DC mode and 65 directional modes.
  • the surrounding pixels and calculation formulas to be used are defined differently for each prediction mode.
  • the directional modes (67 to 80, -1 to -14 intra prediction modes) shown by dotted arrows in FIG. 3B can be additionally used. These may be referred to as “wide angle intra-prediction modes”.
  • the arrows point to corresponding reference samples used for prediction and do not indicate the direction of prediction. The predicted direction is opposite to the direction indicated by the arrow.
  • Wide-angle intra prediction modes are modes that perform prediction in the opposite direction of a specific directional mode without transmitting additional bits when the current block is rectangular. At this time, among the wide-angle intra prediction modes, some wide-angle intra prediction modes available for the current block may be determined according to the ratio of the width and height of the rectangular current block.
  • For example, wide-angle intra prediction modes 67 to 80, with angles larger than 45 degrees, are available when the current block has a rectangular shape whose width is greater than its height, and wide-angle intra prediction modes -1 to -14, with angles smaller than -135 degrees, are available when the current block has a rectangular shape whose height is greater than its width.
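  • A simplified sketch of this aspect-ratio rule is shown below; it only reflects the availability described above, not the exact number of replaced modes, which in VVC depends on the precise width/height ratio.

```python
def available_wide_angle_modes(width, height):
    """Simplified sketch: which wide-angle intra modes may become available.

    Modes 67..80 apply to blocks wider than tall; modes -14..-1 apply to blocks
    taller than wide. Square blocks use only the regular modes 2..66.
    """
    if width > height:
        return list(range(67, 81))
    if height > width:
        return list(range(-14, 0))
    return []
```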
  • the intra prediction unit 122 can determine the intra prediction mode to be used to encode the current block.
  • For example, the intra prediction unit 122 may encode the current block using several intra prediction modes and select an appropriate intra prediction mode to use from among the tested modes. For example, the intra prediction unit 122 may calculate rate-distortion values using rate-distortion analysis for the several tested intra prediction modes and select the intra prediction mode with the best rate-distortion characteristics among the tested modes.
  • the intra prediction unit 122 selects one intra prediction mode from a plurality of intra prediction modes and predicts the current block using surrounding pixels (reference pixels) and an operation formula determined according to the selected intra prediction mode.
  • Information about the selected intra prediction mode is encoded by the entropy encoding unit 155 and transmitted to the video decoding device.
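  • For illustration, a minimal sketch of such rate-distortion based mode selection is shown below; the SSD distortion measure, the lambda value, and the per-mode bit estimates are placeholder assumptions rather than the encoder's actual cost model.

```python
def select_intra_mode(original, candidate_predictors, bits_per_mode, lam=10.0):
    """Pick the intra mode with the lowest rate-distortion cost D + lambda * R.

    original: 2-D list of original pixel values.
    candidate_predictors: dict mapping mode index -> predicted block (2-D list).
    bits_per_mode: dict mapping mode index -> estimated signalling bits.
    """
    best_mode, best_cost = None, float("inf")
    for mode, pred in candidate_predictors.items():
        # Sum of squared differences as a simple distortion measure.
        ssd = sum((o - p) ** 2
                  for row_o, row_p in zip(original, pred)
                  for o, p in zip(row_o, row_p))
        cost = ssd + lam * bits_per_mode.get(mode, 1)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```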
  • the inter prediction unit 124 generates a prediction block for the current block using a motion compensation process.
  • the inter prediction unit 124 searches for a block most similar to the current block in a reference picture that has been encoded and decoded before the current picture, and generates a prediction block for the current block using the searched block. Then, a motion vector (MV) corresponding to the displacement between the current block in the current picture and the prediction block in the reference picture is generated.
  • MV motion vector
  • motion estimation is performed on the luma component, and a motion vector calculated based on the luma component is used for both the luma component and the chroma component.
  • Motion information including information about the reference picture and information about the motion vector used to predict the current block is encoded by the entropy encoding unit 155 and transmitted to the video decoding device.
  • the inter prediction unit 124 may perform interpolation on a reference picture or reference block to increase prediction accuracy. That is, subsamples between two consecutive integer samples are interpolated by applying filter coefficients to a plurality of consecutive integer samples including the two integer samples. If the process of searching for the block most similar to the current block is performed for the interpolated reference picture, the motion vector can be expressed with precision in decimal units rather than precision in integer samples.
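  • A minimal sketch of such sub-sample interpolation is shown below; the filter taps are illustrative placeholders only and are not the codec's actual interpolation filter.

```python
def interpolate_sub_sample(samples, pos, taps=(-1, 5, 5, -1), shift=3):
    """Illustrative half-sample interpolation between samples[pos] and samples[pos + 1].

    Applies (hypothetical) filter taps to the consecutive integer samples
    samples[pos - 1 .. pos + 2]; requires 1 <= pos <= len(samples) - 3.
    The taps sum to 2**shift so the result stays in the sample range.
    """
    window = samples[pos - 1:pos + 3]
    value = sum(t * s for t, s in zip(taps, window))
    return (value + (1 << (shift - 1))) >> shift
```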
  • the precision or resolution of the motion vector may be set differently for each target area to be encoded, for example, slice, tile, CTU, CU, etc.
  • AMVR adaptive motion vector resolution
  • information about the motion vector resolution to be applied to each target area must be signaled for each target area. For example, if the target area is a CU, information about the motion vector resolution applied to each CU is signaled.
  • Information about motion vector resolution may be information indicating the precision of a differential motion vector, which will be described later.
  • the inter prediction unit 124 may perform inter prediction using bi-prediction.
  • In bidirectional prediction, two reference pictures and two motion vectors indicating the positions of the blocks most similar to the current block within each reference picture are used.
  • The inter prediction unit 124 selects the first reference picture and the second reference picture from reference picture list 0 (RefPicList0) and reference picture list 1 (RefPicList1), respectively, searches for blocks similar to the current block within each reference picture, and generates a first reference block and a second reference block. Then, the first reference block and the second reference block are averaged or weighted to generate a prediction block for the current block. Then, motion information including information about the two reference pictures used to predict the current block and information about the two motion vectors is transmitted to the encoder 150.
  • reference picture list 0 may be composed of pictures before the current picture in display order among the restored pictures
  • reference picture list 1 may be composed of pictures after the current picture in display order among the restored pictures.
  • Additionally, restored pictures after the current picture may be additionally included in reference picture list 0, and conversely, restored pictures before the current picture may be additionally included in reference picture list 1.
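  • The averaging or weighting of the two reference blocks described above can be sketched as follows; the integer rounding and the default equal weights are illustrative choices.

```python
def bi_predict(block_ref0, block_ref1, w0=1, w1=1):
    """Average (or weight) two reference blocks to form the bi-prediction block.

    block_ref0 / block_ref1: equally sized 2-D lists fetched with the two motion
    vectors from reference picture list 0 and list 1. Equal weights give plain
    averaging; unequal weights give a weighted combination.
    """
    total = w0 + w1
    return [[(w0 * a + w1 * b + total // 2) // total
             for a, b in zip(row0, row1)]
            for row0, row1 in zip(block_ref0, block_ref1)]
```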
  • the motion information of the current block can be transmitted to the video decoding device by encoding information that can identify the neighboring block. This method is called ‘merge mode’.
  • the inter prediction unit 124 selects a predetermined number of merge candidate blocks (hereinafter referred to as 'merge candidates') from neighboring blocks of the current block.
  • The surrounding blocks for deriving merge candidates may include all or part of the left block (A0), bottom left block (A1), top block (B0), top right block (B1), and upper left block (A2) adjacent to the current block in the current picture.
  • a block located within a reference picture (which may be the same or different from the reference picture used to predict the current block) rather than the current picture where the current block is located may be used as a merge candidate.
  • a block co-located with the current block within the reference picture or blocks adjacent to the co-located block may be additionally used as merge candidates. If the number of merge candidates selected by the method described above is less than the preset number, the 0 vector is added to the merge candidates.
  • the inter prediction unit 124 uses these neighboring blocks to construct a merge list including a predetermined number of merge candidates.
  • a merge candidate to be used as motion information of the current block is selected from among the merge candidates included in the merge list, and merge index information is generated to identify the selected candidate.
  • the generated merge index information is encoded by the encoder 150 and transmitted to the video decoding device.
  • Merge skip mode is a special case of merge mode. After performing quantization, when all transformation coefficients for entropy encoding are close to zero, only peripheral block selection information is transmitted without transmitting residual signals. By using merge skip mode, relatively high coding efficiency can be achieved in low-motion images, still images, screen content images, etc.
  • merge mode and merge skip mode are collectively referred to as merge/skip mode.
  • AMVP Advanced Motion Vector Prediction
  • the inter prediction unit 124 uses neighboring blocks of the current block to derive predicted motion vector candidates for the motion vector of the current block.
  • The surrounding blocks used to derive predicted motion vector candidates may include all or part of the left block (A0), bottom left block (A1), top block (B0), top right block (B1), and upper left block (A2) adjacent to the current block in the current picture shown in FIG. 4. Additionally, a block located within a reference picture (which may be the same as or different from the reference picture used to predict the current block) rather than the current picture where the current block is located may also be used as a surrounding block for deriving prediction motion vector candidates.
  • a collocated block located at the same location as the current block within the reference picture or blocks adjacent to the block at the same location may be used. If the number of motion vector candidates is less than the preset number by the method described above, the 0 vector is added to the motion vector candidates.
  • the inter prediction unit 124 derives predicted motion vector candidates using the motion vectors of the neighboring blocks, and determines a predicted motion vector for the motion vector of the current block using the predicted motion vector candidates. Then, the predicted motion vector is subtracted from the motion vector of the current block to calculate the differential motion vector.
  • The predicted motion vector can be obtained by applying a predefined function (e.g., median, average value calculation, etc.) to the predicted motion vector candidates.
  • a predefined function eg, median, average value calculation, etc.
  • the video decoding device also knows the predefined function.
  • the neighboring blocks used to derive predicted motion vector candidates are blocks for which encoding and decoding have already been completed, the video decoding device also already knows the motion vectors of the neighboring blocks. Therefore, the video encoding device does not need to encode information to identify the predicted motion vector candidate. Therefore, in this case, information about the differential motion vector and information about the reference picture used to predict the current block are encoded.
  • the predicted motion vector may be determined by selecting one of the predicted motion vector candidates.
  • information for identifying the selected prediction motion vector candidate is additionally encoded, along with information about the differential motion vector and information about the reference picture used to predict the current block.
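  • The following sketch illustrates this AMVP-style coding of motion information; choosing the candidate closest to the current motion vector is just one possible selection rule (the text above also mentions predefined functions such as the median or average).

```python
def amvp_encode(mv_current, mv_candidates):
    """Sketch of AMVP-style motion vector coding.

    mv_current: (mvx, mvy) of the current block.
    mv_candidates: non-empty list of (mvx, mvy) predictors taken from already
    coded neighbouring blocks. Only the chosen candidate index and the
    differential motion vector (MVD) would need to be signalled.
    """
    def cost(c):
        return abs(mv_current[0] - c[0]) + abs(mv_current[1] - c[1])

    idx = min(range(len(mv_candidates)), key=lambda i: cost(mv_candidates[i]))
    pred = mv_candidates[idx]
    mvd = (mv_current[0] - pred[0], mv_current[1] - pred[1])
    return idx, mvd
```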
  • the subtractor 130 generates a residual block by subtracting the prediction block generated by the intra prediction unit 122 or the inter prediction unit 124 from the current block.
  • the transform unit 140 converts the residual signal in the residual block having pixel values in the spatial domain into transform coefficients in the frequency domain.
  • the conversion unit 140 may convert the residual signals in the residual block by using the entire size of the residual block as a conversion unit, or divide the residual block into a plurality of subblocks and perform conversion by using the subblocks as a conversion unit. You may.
  • Additionally, the residual signals can be converted by dividing the residual block into two subblocks, a transform region and a non-transform region, and using only the transform region subblock as a transform unit.
  • the transformation area subblock may be one of two rectangular blocks with a size ratio of 1:1 based on the horizontal axis (or vertical axis).
  • a flag indicating that only the subblock has been converted (cu_sbt_flag), directional (vertical/horizontal) information (cu_sbt_horizontal_flag), and/or position information (cu_sbt_pos_flag) are encoded by the entropy encoding unit 155 and signaled to the video decoding device.
  • Additionally, the size of the transform region subblock may have a size ratio of 1:3 based on the horizontal axis (or vertical axis), and in this case, a flag (cu_sbt_quad_flag) that distinguishes the corresponding division is additionally encoded by the entropy encoding unit 155 and signaled to the video decoding device.
  • the transformation unit 140 can separately perform transformation on the residual block in the horizontal and vertical directions.
  • various types of transformation functions or transformation matrices can be used.
  • a pair of transformation functions for horizontal transformation and vertical transformation can be defined as MTS (Multiple Transform Set).
  • the conversion unit 140 may select a conversion function pair with the best conversion efficiency among MTSs and convert the residual blocks in the horizontal and vertical directions, respectively.
  • Information (mts_idx) about the transformation function pair selected from the MTS is encoded by the entropy encoder 155 and signaled to the video decoding device.
  • the quantization unit 145 quantizes the transform coefficients output from the transform unit 140 using a quantization parameter, and outputs the quantized transform coefficients to the entropy encoding unit 155.
  • the quantization unit 145 may directly quantize a residual block related to a certain block or frame without conversion.
  • the quantization unit 145 may apply different quantization coefficients (scaling values) depending on the positions of the transform coefficients within the transform block.
  • the quantization matrix applied to the quantized transform coefficients arranged in two dimensions may be encoded and signaled to the video decoding device.
  • the rearrangement unit 150 may rearrange coefficient values for the quantized residual values.
  • the rearrangement unit 150 can change a two-dimensional coefficient array into a one-dimensional coefficient sequence using coefficient scanning.
  • the realignment unit 150 can scan from DC coefficients to coefficients in the high frequency region using zig-zag scan or diagonal scan to output a one-dimensional coefficient sequence.
  • a vertical scan that scans a two-dimensional coefficient array in the column direction or a horizontal scan that scans the two-dimensional block-type coefficients in the row direction may be used instead of the zig-zag scan. That is, the scan method to be used among zig-zag scan, diagonal scan, vertical scan, and horizontal scan may be determined depending on the size of the transformation unit and the intra prediction mode.
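  • A simplified sketch of turning the two-dimensional coefficient array into a one-dimensional sequence is shown below; the order within each anti-diagonal is illustrative and does not reproduce the exact normative scan.

```python
def diagonal_scan(coeffs):
    """Turn a 2-D block of quantized coefficients into a 1-D sequence.

    Scans anti-diagonals starting from the DC coefficient (top-left) towards
    the high-frequency corner, one of the scan orders mentioned above.
    """
    h, w = len(coeffs), len(coeffs[0])
    order = []
    for s in range(h + w - 1):      # s = row + col index of each anti-diagonal
        for r in range(h):
            c = s - r
            if 0 <= c < w:
                order.append(coeffs[r][c])
    return order
```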
  • The entropy encoding unit 155 generates a bitstream by encoding the sequence of one-dimensional quantized transform coefficients output from the rearrangement unit 150 using various encoding methods, such as CABAC (Context-based Adaptive Binary Arithmetic Coding) and Exponential Golomb coding.
  • Additionally, the entropy encoder 155 encodes information related to block splitting, such as the CTU size, CU split flag, QT split flag, MTT split type, and MTT split direction, so that the video decoding device can split blocks in the same way as the video encoding device.
  • Additionally, the entropy encoding unit 155 encodes information about the prediction type indicating whether the current block is encoded by intra prediction or inter prediction and, according to the prediction type, encodes intra prediction information (i.e., information about the intra prediction mode) or inter prediction information (the coding mode of the motion information (merge mode or AMVP mode), the merge index in the case of merge mode, and information on the reference picture index and the differential motion vector in the case of AMVP mode).
  • the entropy encoding unit 155 encodes information related to quantization, that is, information about quantization parameters and information about the quantization matrix.
  • the inverse quantization unit 160 inversely quantizes the quantized transform coefficients output from the quantization unit 145 to generate transform coefficients.
  • the inverse transform unit 165 restores the residual block by converting the transform coefficients output from the inverse quantization unit 160 from the frequency domain to the spatial domain.
  • the addition unit 170 restores the current block by adding the restored residual block and the prediction block generated by the prediction unit 120. Pixels in the restored current block are used as reference pixels when intra-predicting the next block.
  • The loop filter unit 180 performs filtering on the restored pixels to reduce blocking artifacts, ringing artifacts, blurring artifacts, etc. that occur due to block-based prediction and transformation/quantization.
  • The loop filter unit 180 is an in-loop filter and may include all or part of a deblocking filter 182, a Sample Adaptive Offset (SAO) filter 184, and an Adaptive Loop Filter (ALF) 186.
  • The deblocking filter 182 filters the boundaries between restored blocks to remove blocking artifacts caused by block-level encoding/decoding, and the SAO filter 184 and the ALF 186 perform additional filtering on the image after deblocking filtering.
  • The SAO filter 184 and the ALF 186 are filters used to compensate for the difference between the restored pixel and the original pixel caused by lossy coding.
  • the SAO filter 184 improves not only subjective image quality but also coding efficiency by applying an offset in units of CTU.
  • the ALF 186 performs filtering on a block basis, distinguishing the edge and degree of change of the block and applying different filters to compensate for distortion.
  • Information about filter coefficients to be used in ALF may be encoded and signaled to a video decoding device.
  • the restored block filtered through the deblocking filter 182, SAO filter 184, and ALF 186 is stored in the memory 190.
  • the reconstructed picture can be used as a reference picture for inter prediction of blocks in the picture to be encoded later.
  • FIG. 5 is an example block diagram of a video decoding device that can implement the techniques of the present disclosure.
  • the video decoding device and its sub-configurations will be described with reference to FIG. 5.
  • The image decoding device may be configured to include an entropy decoding unit 510, a rearrangement unit 515, an inverse quantization unit 520, an inverse transform unit 530, a prediction unit 540, an adder 550, a loop filter unit 560, and a memory 570.
  • each component of the video decoding device may be implemented as hardware or software, or may be implemented as a combination of hardware and software. Additionally, the function of each component may be implemented as software and a microprocessor may be implemented to execute the function of the software corresponding to each component.
  • The entropy decoder 510 decodes the bitstream generated by the video encoding device, extracts information related to block division to determine the current block to be decoded, and extracts the prediction information and the information about residual signals needed to restore the current block.
  • the entropy decoder 510 extracts information about the CTU size from a Sequence Parameter Set (SPS) or Picture Parameter Set (PPS), determines the size of the CTU, and divides the picture into CTUs of the determined size. Then, the CTU is determined as the highest layer of the tree structure, that is, the root node, and the CTU is divided using the tree structure by extracting the division information for the CTU.
  • SPS Sequence Parameter Set
  • PPS Picture Parameter Set
  • each node below the leaf node of QT is recursively divided into a BT or TT structure.
  • each node may undergo 0 or more repetitive MTT divisions after 0 or more repetitive QT divisions. For example, MTT division may occur immediately in the CTU, or conversely, only multiple QT divisions may occur.
  • the first flag (QT_split_flag) related to the division of the QT is extracted and each node is divided into four nodes of the lower layer. And, for the node corresponding to the leaf node of QT, a split flag (split_flag) indicating whether to further split into BT and split direction information are extracted.
  • the entropy decoding unit 510 determines the current block to be decoded using division of the tree structure, it extracts information about the prediction type indicating whether the current block is intra-predicted or inter-predicted.
  • prediction type information indicates intra prediction
  • the entropy decoder 510 extracts syntax elements for intra prediction information (intra prediction mode) of the current block.
  • prediction type information indicates inter prediction
  • the entropy decoder 510 extracts syntax elements for inter prediction information, that is, information indicating a motion vector and a reference picture to which the motion vector refers.
  • the entropy decoding unit 510 extracts information about quantized transform coefficients of the current block as quantization-related information and information about the residual signal.
  • The reordering unit 515 re-organizes the sequence of one-dimensional quantized transform coefficients entropy-decoded by the entropy decoding unit 510 into a two-dimensional coefficient array (i.e., a block), in the reverse order of the coefficient scanning performed by the image encoding device.
  • The inverse quantization unit 520 inversely quantizes the quantized transform coefficients using a quantization parameter.
  • the inverse quantization unit 520 may apply different quantization coefficients (scaling values) to quantized transform coefficients arranged in two dimensions.
  • the inverse quantization unit 520 may perform inverse quantization by applying a matrix of quantization coefficients (scaling values) from an image encoding device to a two-dimensional array of quantized transform coefficients.
  • the inverse transform unit 530 inversely transforms the inverse quantized transform coefficients from the frequency domain to the spatial domain to restore the residual signals, thereby generating a residual block for the current block.
  • Additionally, when the inverse transform unit 530 inversely transforms only a partial area (subblock) of the transform block, it extracts the flag (cu_sbt_flag) indicating that only the subblock of the transform block has been transformed, the directionality (vertical/horizontal) information of the subblock (cu_sbt_horizontal_flag), and/or the position information of the subblock (cu_sbt_pos_flag). It then restores the residual signals by inversely transforming the transform coefficients of the corresponding subblock from the frequency domain to the spatial domain, and fills the area that was not inversely transformed with a value of '0' as the residual signal, thereby creating the final residual block for the current block.
  • the inverse transform unit 530 determines a transformation function or transformation matrix to be applied in the horizontal and vertical directions, respectively, using the MTS information (mts_idx) signaled from the video encoding device, and uses the determined transformation function. Inverse transformation is performed on the transformation coefficients in the transformation block in the horizontal and vertical directions.
  • the prediction unit 540 may include an intra prediction unit 542 and an inter prediction unit 544.
  • the intra prediction unit 542 is activated when the prediction type of the current block is intra prediction
  • the inter prediction unit 544 is activated when the prediction type of the current block is inter prediction.
  • The intra prediction unit 542 determines the intra prediction mode of the current block among the plurality of intra prediction modes from the syntax elements for the intra prediction mode extracted by the entropy decoder 510, and predicts the current block using reference pixels around the current block according to the intra prediction mode.
  • The inter prediction unit 544 uses the syntax elements for the inter prediction mode extracted by the entropy decoder 510 to determine the motion vector of the current block and the reference picture to which the motion vector refers, and predicts the current block using the motion vector and the reference picture.
  • the adder 550 restores the current block by adding the residual block output from the inverse transform unit and the prediction block output from the inter prediction unit or intra prediction unit. Pixels in the restored current block are used as reference pixels when intra-predicting a block to be decoded later.
  • the loop filter unit 560 may include a deblocking filter 562, a SAO filter 564, and an ALF 566 as an in-loop filter.
  • the deblocking filter 562 performs deblocking filtering on the boundaries between restored blocks to remove blocking artifacts that occur due to block-level decoding.
  • the SAO filter 564 and the ALF 566 perform additional filtering on the reconstructed block after deblocking filtering to compensate for the difference between the reconstructed pixel and the original pixel caused by lossy coding.
  • The filter coefficients of the ALF are determined using information about the filter coefficients decoded from the bitstream.
  • the restored block filtered through the deblocking filter 562, SAO filter 564, and ALF 566 is stored in the memory 570.
  • the reconstructed picture is later used as a reference picture for inter prediction of blocks in the picture to be encoded.
  • This embodiment relates to the encoding and decoding of images (videos) as described above. More specifically, in intra prediction of the current chroma block, it provides a video coding method and device that generate a first predictor of the current chroma block according to CCLM prediction, additionally generate a second predictor of the current chroma block based on the reconstructed neighboring pixels, and weightedly combine the first predictor and the second predictor.
  • the following embodiments may be performed by the intra prediction unit 122 in a video encoding device. Additionally, it may be performed by the intra prediction unit 542 in a video decoding device.
  • the video encoding device may generate signaling information related to this embodiment in terms of bit rate distortion optimization when predicting the current block.
  • The video encoding device can encode the signaling information using the entropy encoding unit 155 and then transmit it to the video decoding device.
  • the video decoding device can decode signaling information related to prediction of the current block from the bitstream using the entropy decoding unit 510.
  • Hereinafter, the term 'target block' to be encoded/decoded may be used in the same sense as the current block or coding unit (CU) described above, or may mean a partial region of the coding unit.
  • the target block includes a luma block including a luma component and a chroma block including a chroma component.
  • the chroma block of the target block is expressed as the target chroma block or the current chroma block.
  • the luma block of the target block is expressed as the target luma block or the current luma block.
  • the aspect ratio of a block is defined as the horizontal length of the block divided by the vertical length.
  • The intra prediction mode of the luma block has fine-grained directional modes (i.e., modes 2 to 66) in addition to the non-directional modes (i.e., Planar and DC), as illustrated in FIG. 3A. Additionally, as illustrated in FIG. 3B, the intra prediction mode of the luma block has directional modes (-14 to -1 and 67 to 80) according to wide-angle intra prediction.
  • The chroma block can also use intra prediction of these fine-grained directional modes, but only to a limited extent: the various directional modes other than horizontal and vertical that the luma block can use are not always available for the chroma block.
  • To use such a directional mode, the prediction mode of the current chroma block must be set to DM mode. By setting DM mode in this way, the current chroma block can use a directional mode of the luma block other than horizontal and vertical.
  • The intra prediction modes available for the chroma block, chosen as the most frequently used modes or to maintain image quality, include the Planar, DC, Vertical, Horizontal, and DM modes.
  • In DM mode, the intra prediction mode of the luma block spatially corresponding to the current chroma block is used as the intra prediction mode of the chroma block.
  • the video encoding device can signal to the video decoding device whether the intra prediction mode of the chroma block is DM mode. At this time, there may be several ways to transmit the DM mode to the video decoding device. For example, the video encoding device can indicate whether it is in DM mode by setting intra_chroma_pred_mode, which is information for indicating the intra prediction mode of a chroma block, to a specific value and then transmitting it to the video decoding device.
  • intra_chroma_pred_mode which is information for indicating the intra prediction mode of a chroma block
  • The intra prediction unit 542 of the video decoding device can set IntraPredModeC, the intra prediction mode of the chroma block, according to Table 1.
  • Hereinafter, intra_chroma_pred_mode and IntraPredModeC, which are information related to the intra prediction mode of a chroma block, are referred to as the chroma intra prediction mode indicator and the chroma intra prediction mode, respectively.
  • lumaIntraPredMode is the intra prediction mode of the luma block corresponding to the current chroma block (hereinafter referred to as 'luma intra prediction mode').
  • lumaIntraPredMode represents one of the prediction modes illustrated in FIG. 3A.
  • lumaIntraPredMode of 18, 50, and 66 indicates the directional modes referred to as horizontal, vertical, and VDIA, respectively.
  • When intra_chroma_pred_mode is 0, 1, 2, or 3, the Planar, Vertical, Horizontal, and DC prediction modes are indicated, respectively.
  • In the case of DM mode, the IntraPredModeC value, which is the chroma intra prediction mode, is set equal to the lumaIntraPredMode value.
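  • A simplified sketch of this mapping is shown below. It assumes the indicator values 0 to 3 select Planar, Vertical, Horizontal, and DC, that any other indicator value denotes DM mode, and it omits the special-case substitutions of the full Table 1.

```python
def map_chroma_intra_mode(intra_chroma_pred_mode, luma_intra_pred_mode):
    """Simplified sketch of setting IntraPredModeC from the chroma mode indicator.

    Mode numbers follow the usual convention of FIG. 3A: Planar = 0, DC = 1,
    Horizontal = 18, Vertical = 50.
    """
    fixed_modes = {0: 0, 1: 50, 2: 18, 3: 1}   # Planar, Vertical, Horizontal, DC
    if intra_chroma_pred_mode in fixed_modes:
        return fixed_modes[intra_chroma_pred_mode]
    return luma_intra_pred_mode                 # DM mode: reuse the luma mode
```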
  • the video encoding device determines encoding information in terms of bit rate distortion optimization. Afterwards, the video encoding device encodes them to generate a bitstream and then signals it to the video decoding device. Additionally, the video encoding device can obtain encoding information from a higher level and proceed with the subsequent encoding process.
  • When performing prediction in a video encoding/decoding device, a method of generating the prediction block of the current block from a color component different from the color component of the target block to be currently encoded and decoded is defined as cross-component prediction.
  • cross-component prediction is performed using the linear relationship between chroma pixels and corresponding luma pixels to intra-predict the current chroma block, which is called CCLM (Cross-component Linear Model) prediction.
  • CCLM Cross-component Linear Model
  • The video decoding device parses cclm_mode_flag, which indicates whether CCLM prediction mode is used. If cclm_mode_flag is 1 and CCLM mode is used, the video decoding device parses cclm_mode_idx to obtain the index of the CCLM mode. At this time, depending on the value of cclm_mode_idx, the CCLM mode may indicate one of three modes. On the other hand, when cclm_mode_flag is 0 and CCLM mode is not used, the video decoding device parses intra_chroma_pred_mode indicating the intra prediction mode, as described above.
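  • This parsing order can be sketched as follows; read(name) is only a stand-in for entropy-decoding one syntax element and is not a real API.

```python
def parse_chroma_intra_syntax(read):
    """Sketch of the decoder-side parsing order described above.

    Returns either ('CCLM', cclm_mode_idx) or ('intra', chroma mode indicator).
    """
    if read("cclm_mode_flag") == 1:            # CCLM prediction is used
        cclm_mode_idx = read("cclm_mode_idx")  # selects one of the three CCLM modes
        return ("CCLM", cclm_mode_idx)
    return ("intra", read("intra_chroma_pred_mode"))
```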
  • Figure 6 is an example diagram showing surrounding pixels referenced for CCLM prediction.
  • the image decoding device determines the area in the luma image corresponding to the current chroma block (hereinafter, 'corresponding luma area').
  • For CCLM prediction, the left reference pixels and top reference pixels of the corresponding luma area, and the left reference pixels and top reference pixels of the target chroma block, may be used.
  • Hereinafter, the left reference pixels and the top reference pixels are collectively referred to as reference pixels, surrounding pixels, or adjacent pixels.
  • reference pixels of the chroma component are indicated as chroma reference pixels
  • reference pixels of the luma component are indicated as luma reference pixels.
  • The size of the chroma block, that is, the number of pixels, is expressed as N×N (where N is a natural number).
  • In CCLM prediction, a linear model is derived between the reference pixels of the corresponding luma area and the reference pixels of the chroma block, and the linear model is then applied to the restored pixels of the corresponding luma area to create a prediction block that serves as the predictor of the target chroma block.
  • For example, as illustrated in FIG. 6, four pairs of pixels combining pixels in the surrounding pixel line of the current chroma block and pixels in the corresponding luma area can be used to derive the linear model.
  • The image decoding device may derive α and β representing the linear model from the four pairs of pixels, as shown in Equation 1.
  • Here, X_a and X_b represent the average value of the two minimum values and the average value of the two maximum values of the luma reference pixels, respectively.
  • Similarly, Y_a and Y_b represent the average value of the two minimum values and the average value of the two maximum values of the chroma reference pixels, respectively.
  • The image decoding device can generate the predictor pred_C(i,j) of the current chroma block from the pixel value rec'_L(i,j) of the corresponding luma area using the linear model, as shown in Equation 2, i.e., pred_C(i,j) = α·rec'_L(i,j) + β.
  • the image decoding device checks whether the size of the corresponding luma area is the same as the size of the current chroma block. If the sizes between the two are different depending on the subsampling method of the chroma channel, the video decoding device can adjust the size of the corresponding luma area to be the same as the size of the current chroma block by applying downsampling to the corresponding luma area.
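  • A minimal numerical sketch of the CCLM steps described above (matching the corresponding luma area to the chroma block size, deriving α and β from four pixel pairs per Equation 1, and applying the linear model of Equation 2) is shown below; the 2×2 averaging downsample and the floating-point arithmetic are simplifications of the actual procedure.

```python
def downsample_2x(luma):
    """Naive 2x2 averaging used here only to match a 4:2:0 chroma block size.
    (The actual codec uses defined downsampling filters.)"""
    h, w = len(luma) // 2, len(luma[0]) // 2
    return [[(luma[2 * i][2 * j] + luma[2 * i][2 * j + 1] +
              luma[2 * i + 1][2 * j] + luma[2 * i + 1][2 * j + 1] + 2) >> 2
             for j in range(w)] for i in range(h)]

def derive_cclm_model(luma_ref, chroma_ref):
    """Derive (alpha, beta) from four (luma, chroma) reference pixel pairs.

    X_a/Y_a average the two pairs with the smallest luma values and X_b/Y_b the
    two pairs with the largest, following the description of Equation 1."""
    pairs = sorted(zip(luma_ref, chroma_ref))      # sort the 4 pairs by luma value
    x_a = (pairs[0][0] + pairs[1][0]) / 2.0
    y_a = (pairs[0][1] + pairs[1][1]) / 2.0
    x_b = (pairs[2][0] + pairs[3][0]) / 2.0
    y_b = (pairs[2][1] + pairs[3][1]) / 2.0
    alpha = 0.0 if x_b == x_a else (y_b - y_a) / (x_b - x_a)
    beta = y_a - alpha * x_a
    return alpha, beta

def cclm_predict(rec_luma, alpha, beta):
    """Equation 2: pred_C(i, j) = alpha * rec'_L(i, j) + beta."""
    return [[alpha * v + beta for v in row] for row in rec_luma]
```

  • For example, derive_cclm_model([60, 80, 100, 120], [40, 50, 62, 70]) yields α = 0.525 and β = 8.25, so a reconstructed luma value of 60 maps to a chroma prediction of about 40, consistent with the reference pairs.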
  • the CCLM mode is divided into three modes: CCLM_LT, CCLM_L, and CCLM_T, depending on the positions of surrounding pixels used in the derivation process of the linear model.
  • the CCLM_LT mode uses two pixels in each direction among the surrounding pixels adjacent to the left and top of the current chroma block.
  • CCLM_L uses 4 pixels from surrounding pixels adjacent to the left of the current chroma block.
  • CCLM_T uses four pixels from among the surrounding pixels adjacent to the top of the current chroma block.
  • Figure 7 is an example diagram showing information that can be used in intra prediction of a chroma channel.
  • the video decoding device may use a method of generating a predictor using information (1) of the corresponding luma area, or a method of generating a predictor using information (2) of the same channel.
  • In VVC technology, there are various techniques for each method, and these techniques are distinguished by prediction modes. Additionally, the predictor generation method can be specified by indicating the prediction mode.
  • setting the predictor generation method is described as setting the prediction mode.
  • generating a predictor using information (1) of the corresponding luma area is expressed as 'cross component prediction'
  • the method is expressed as 'cross component prediction mode' or 'cross component prediction method'.
  • generating a predictor using information (2) of the same channel is expressed as 'same-channel prediction', and the method is expressed as 'same-channel prediction mode' or 'same-channel prediction method'.
  • the cross component prediction method using information (1) of the corresponding luma area includes the CCLM mode as described above.
• other cross component prediction methods include: a method of deriving multiple linear models between the corresponding luma area and the current chroma block and predicting with them; a method of deriving a linear model from gradient values (i.e., change values) of the luma pixels instead of the luma pixel values at the corresponding positions and predicting with it; and a method of predicting each pixel value of the current chroma block by many-to-one matching, which uses the luma pixel at the same position together with its surrounding pixel values.
  • co-channel prediction methods that use information (2) of the same channel include planar, DC, and directional modes.
  • co-channel prediction methods include technologies such as ISP (Intra Sub Partition), MIP (Matrix-weighted Intra Prediction), and MRL (Multiple Reference Line).
• a method of predicting by inferring a directional or non-directional mode from several reference lines around the current block, and a method of calculating a weight based on the distance between a pixel in the corresponding luma area and the pixels around the block and then predicting by weighting the pixels in the current chroma block and the surrounding chroma pixels with this weight, can also be co-channel prediction methods.
• this problem of the existing technology can be solved by considering surrounding pixel information of the current channel when predicting according to the CCLM mode. This means that, in addition to using information (1), the prediction also uses information (2).
• likewise, this problem of the existing technology can be solved by additionally using luma area information when making predictions using information on surrounding pixels within the same channel (for example, when performing directional or non-directional intra prediction). This means that, in addition to using information (2), the prediction also uses information (1).
  • FIG. 8 is an exemplary diagram illustrating an intra prediction unit that performs intra prediction of a chroma block according to an embodiment of the present disclosure.
• the intra prediction unit 542 in the video decoding device generates a first predictor based on the CCLM mode.
• a predictor of the current chroma block is then generated by weighted combination of this first predictor and a second predictor additionally generated based on an intra prediction mode.
  • the CCLM mode uses information (1) of the corresponding luma area
  • the intra prediction mode uses information (2) of the same channel.
  • the intra prediction unit 542 according to this embodiment includes all or part of an input unit 802, a first predictor generator 804, a second predictor generator 806, and a weighted summer 808. Meanwhile, the intra prediction unit 122 in the video encoding device may also include the same components.
  • the input unit 802 acquires a CCLM mode for CCLM prediction of the current chroma block.
• the input unit 802 may obtain a cross-component prediction mode for cross-component prediction of the current chroma block.
  • the first predictor generator 804 performs CCLM prediction based on CCLM mode to generate a first predictor of the current chroma block.
  • the first predictor generator 804 may generate the first predictor of the current chroma block by performing cross-component prediction based on the cross-component prediction mode.
  • the second predictor generator 806 generates a second predictor of the current chroma block based on an intra prediction mode using neighboring pixels. That is, the second predictor generator 806 generates the second predictor based on the same-channel prediction mode using the same-channel information.
  • the weighted summer 808 generates an intra predictor of the current chroma block by weightedly combining the first predictor and the second predictor using a weight.
• the image decoding device may combine the first predictor and the second predictor with weights, as shown in Equation 3.
• (i,j) represents the position of a pixel within the block
  • pred C (i,j) represents the intra predictor of the current chroma block
  • pred CCLM (i,j) represents the first predictor
  • pred intra (i,j) represents the second predictor
  • w CCLM (i,j) represents the weight.
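• Equation 3 is not reproduced in this text; a hedged reconstruction consistent with the definitions above (and with the statement below that the weights of additional predictors are divided out of 1 − w CCLM (i,j)) is:

$$pred_C(i,j) = w_{CCLM}(i,j)\, pred_{CCLM}(i,j) + \bigl(1 - w_{CCLM}(i,j)\bigr)\, pred_{intra}(i,j) \qquad \text{(Equation 3, reconstructed)}$$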
  • pred CCLM (i,j) represents a predictor based on CCLM prediction, but may comprehensively represent a predictor based on cross-component prediction.
• the terms 'second predictor' and 'additional predictor' are used interchangeably. If there are multiple (e.g., n) additional predictors, further pred intra terms are added to Equation 3, and the weights of the additional predictors are divided out of 1 − w CCLM (i,j).
• in Equation 3 the weight is expressed in terms of w CCLM , but depending on the embodiment, it may be implemented in terms of w intra , as in Equation 4.
  • the second predictor according to the co-channel prediction mode and the first predictor according to the CCLM mode can be weighted and combined as shown in Equation 4.
  • the predictor according to the co-channel prediction mode may be called the first predictor
  • the predictor according to the CCLM mode may be called the second predictor.
• in this case, the intra prediction mode for generating the first predictor using the same-channel information is parsed, and the CCLM mode for generating the second predictor may be inferred using the information in the corresponding luma area. Therefore, depending on the implementation, it should be understood that the first predictor and the second predictor may cover both the cases shown in Equation 3 and Equation 4.
  • the predictor according to the CCLM mode will be referred to as the first predictor
  • the predictor according to the intra prediction mode using surrounding pixel information will be referred to as the second predictor
• and the weight expressed in terms of w CCLM , as shown in Equation 3, is used.
  • the CCLM prediction mode for generating the first predictor may be parsed and the intra prediction mode for generating the second predictor may be inferred.
  • the term 'adjacent' refers to the case where two objects are spatially in contact
• the term 'periphery' (or 'surrounding'), which includes the meaning of 'adjacent', refers to the spatial sense in which one object exists within a certain distance from another object. When channel information needs to be indicated, it is specified in the context.
  • the temporal meaning of 'surrounding' is not separately mentioned, but subsequent realization examples can be realized at corresponding positions in other frames.
  • the video decoding device can independently infer an intra prediction mode using neighboring pixels, or use the prediction mode transmitted on the bitstream by the video encoding device. Additionally, the video decoding device may independently infer a method of weighted combining the first predictor and the second predictor, or use a method transmitted on a bitstream by the video encoding device. Methods for inferring/transmitting the intra prediction mode and methods for inferring/transmitting weights can be combined in various ways. For example, the intra prediction mode can be inferred by a video decoding device, and the weighted combining method can be transmitted through a bitstream. Conversely, the intra prediction mode is transmitted through a bitstream and the weighted combining method can be inferred by the video decoding device. Below, preferred embodiments of these various combinations are described.
  • the video decoding device can set the preset prediction mode as the prediction mode of the second predictor without explicitly receiving a signal about the prediction mode of the second predictor from the video encoding device.
• the image decoding device may infer at least one prediction mode of the second predictor based on at least one piece of information among the width/height/area/aspect ratio/prediction mode/position/number of the surrounding chroma blocks of the current chroma block and their distance to the current chroma block, and the value/position/number of the surrounding chroma pixels and their distance to the current chroma block.
  • blocks included in the corresponding luma area are defined as blocks in which all or part of the block is included in the corresponding luma area.
• alternatively, the same-channel prediction mode for generating the intra predictor (pred intra ) may be parsed, and the generation method of the predictor (pred CCLM ) that uses information in the corresponding luma area may be inferred.
  • the video decoding device can set the preset prediction mode as the prediction mode of pred CCLM without explicitly receiving a signal about the prediction mode of pred CCLM from the video encoding device.
• the image decoding device may infer at least one prediction mode of pred CCLM based on at least one piece of information among: the width/height/area/aspect ratio/prediction mode/position/number of the surrounding chroma blocks of the current chroma block and their distance to the current chroma block; the value/position/number of the surrounding chroma pixels and their distance to the current chroma block; the width/height/area/aspect ratio/prediction mode/position/number of the blocks included in the corresponding luma area and their surrounding blocks; and the value/position/number of the luma pixels in and around the corresponding luma area.
• <Realization Example 1-1> Setting a predefined prediction mode as the prediction mode of the second predictor
  • the video decoding device sets a predefined prediction mode as the prediction mode of the second predictor (pred intra ).
  • the available prediction mode may be a mode that generates a predictor based on surrounding pixels, such as the 67 intra prediction modes (IPM), matrix-weighted intra prediction (MIP) mode, etc. illustrated in FIG. 3A.
  • the prediction mode of the second predictor may be Planar mode. Therefore, by applying Equation 3, the image decoding device can generate a predictor of the current chroma block as shown in Equation 5.
• when two additional predictors are used, their prediction modes may be Planar mode and DC mode, respectively. Accordingly, the image decoding device can generate a predictor of the current chroma block as shown in Equation 6.
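• Equations 5 and 6 are not reproduced in this text; hedged reconstructions following Equation 3 and the weight-splitting rule described above are:

$$pred_C(i,j) = w_{CCLM}(i,j)\, pred_{CCLM}(i,j) + \bigl(1 - w_{CCLM}(i,j)\bigr)\, pred_{Planar}(i,j) \qquad \text{(Equation 5, reconstructed)}$$

$$pred_C(i,j) = w_{CCLM}\, pred_{CCLM}(i,j) + w_{Planar}\, pred_{Planar}(i,j) + w_{DC}\, pred_{DC}(i,j), \quad w_{Planar} + w_{DC} = 1 - w_{CCLM} \qquad \text{(Equation 6, reconstructed)}$$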
  • the preset prediction mode is called 'representative mode', which will be explained later.
  • the prediction mode of the second predictor (pred CCLM ) using information of the corresponding luma area may be preset. This mode may be at least one of the cross-component prediction modes described above. At this time, the same-channel prediction mode for generating a predictor using the same-channel information can be parsed.
• <Realization Example 1-2> Using restored chroma information around the current chroma block
• the image decoding device sets, as the prediction mode of the second predictor, a prediction mode (hereinafter referred to as 'representative mode') derived using the restored chroma information around the current chroma block, that is, information such as the values/positions/number of the pixels around the current chroma block and their distance to the current chroma block.
  • the number of representative modes derived by the video decoding device depends on the number of additional predictors that are weighted.
  • the representative mode setting method according to this implementation can be applied when the second predictor is pred CCLM or pred intra .
  • the derivation of the representative mode for the case where the second predictor is pred intra is described.
  • the video decoding device can use one of the following methods as a method for deriving the representative mode.
  • the prediction mode with the highest intensity among prediction modes derived from the values of surrounding pixels of the current chroma block using an edge detection filter may be set as the representative mode.
  • the video decoding device can borrow the method used to derive the prediction mode in DIMD (Decoder-side Intra Mode Derivation) technology as follows.
  • a 'surrounding chroma pixel area' is set including pixels surrounding the current chroma block.
  • the peripheral chroma pixel area may be set in various ways depending on the embodiment other than the example in FIG. 9.
• the video decoding device applies an edge detection filter such as a Sobel filter, Prewitt filter, or Roberts cross filter to the determined surrounding chroma pixel area, as shown in the example of FIG. 9.
  • the video decoding device calculates the gradient for each pixel in the region and replaces it with the directional mode of intra prediction.
  • the video decoding device may derive the intensity (I) for the corresponding directional mode based on the size of the gradient value and then generate an intensity histogram by accumulating the intensities for each directional mode as shown in the example of FIG. 10.
  • I IPM represents the intensity of each directional mode.
  • the video decoding device may use mode 19, which is the directional mode with the highest intensity in the histogram illustrated in FIG. 10, as a representative mode.
  • the directional modes with the next highest intensity after mode 19 in the histogram can be used as representative modes in order of size.
  • a representative mode can be derived by specifying priorities.
• as the priority, a predetermined order such as {horizontal mode, vertical mode, mode 66, ...}, ascending order of mode index, or descending order of mode index may be used. For example, when priority is given in descending order, in the example of FIG. 11 the index of mode 22 is greater than that of mode 21, so mode 22 has the higher priority. Therefore, the video decoding device can derive mode 22 as the representative mode.
  • the representative mode can be inferred according to predefined rules.
  • the video decoding device divides the surrounding pixels into surrounding chroma pixel areas on the left and top of the current chroma block, as shown in the example of FIG.
  • the directional mode with the highest intensity can be used as the representative mode.
• the image decoding device can use mode 19, derived from the histogram illustrated in FIG. 13A, and mode 22, derived from the histogram illustrated in FIG. 13B, as representative modes.
  • this implementation may be limited to the case where the second predictor is pred intra according to Equation 3.
  • the distortion of each prediction mode can be compared using information on surrounding pixels of the current chroma block, and then the prediction mode with the smallest distortion can be set as the representative mode.
  • the video decoding device can borrow the method used to derive the prediction mode from template-based intra prediction mode derivation (TIMD) technology as follows.
  • the surrounding chroma pixel area of the current chroma block is set as shown in the example in FIG. 14.
• the image decoding device generates a predictor of the surrounding chroma pixel area by performing prediction for each prediction mode candidate using the surrounding pixels on the left and top of that area.
• in the example of FIG. 14, the surrounding pixels referred to when generating the predictor are limited to neighboring pixels adjacent to the set surrounding chroma pixel area, but neighboring pixels slightly distant from the set area may also be used. Additionally, the surrounding chroma pixel area may be set in various ways depending on the embodiment.
  • prediction mode candidates may be prediction modes that generate a predictor based on surrounding pixels, such as IPM (Intra Prediction Mode) or MIP mode. Additionally, when the second predictor is pred CCLM , the prediction mode candidates may be one of the cross-component prediction modes, which may be a prediction mode that generates a predictor based on information in the corresponding luma area.
  • the image decoding device calculates the distortion (D) between the predictor of the surrounding chroma pixel area according to each prediction mode candidate and the restored pixel values of the area.
  • various image similarity measurement methods such as Mean Square Error (MSE), Sum of Absolute Differences (SAD), and Sum of Absolute Transformed Differences (SATD) may be used.
  • the video decoding device uses the prediction mode with the smallest distortion among prediction mode candidates as the representative mode. For example, when a distortion (D IPM ) histogram is generated for prediction mode candidates as shown in the example of FIG. 15, mode 50 has the smallest distortion. Therefore, the video decoding device can use mode 50 as a representative mode.
• when multiple representative modes are needed, the selection criterion is replaced with increasing order of distortion, and the first method described above can then be applied in the same way.
  • the representative mode can be derived using various methods in addition to the two examples described above.
  • 16A and 16B are flowcharts showing an intra prediction method of a current chroma block according to an embodiment of the present disclosure.
• the video decoding device parses cclm_mode_idx (S1600). By parsing the index cclm_mode_idx, the video decoding device obtains the CCLM mode to apply to the current chroma block. Alternatively, the video decoding device can parse the index and obtain a cross-component prediction mode to apply to the current chroma block. Meanwhile, cclm_mode_idx or the cross-component prediction mode can be determined by the video encoding device in terms of optimizing coding efficiency.
  • the video decoding device generates the first predictor (pred CCLM ) by performing the existing getCclmPred() function using the parsed CCLM mode as input (S1602).
  • the video decoding device may generate a first predictor using the parsed cross-component prediction mode as an input.
  • the video decoding device performs the getExtraIntraMode() function to infer the representative mode (S1604).
  • the getExtraIntraMode() function is called 'representative mode derivation function' or 'derivation function' for short.
  • the video decoding device generates a second predictor (pred intra ) by using the representative mode as an input and performing the existing getIntraPred() function (S1606).
  • the image decoding device generates a predictor (pred C ) of the current chroma block by weighting the first predictor and the second predictor (S1608).
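• The sketch below summarizes steps S1600 to S1608 of FIG. 16A. The callables are stand-ins for the parsing step and for the getCclmPred(), getExtraIntraMode(), and getIntraPred() functions named above; their bodies are not reproduced here, and a constant weight is assumed for simplicity.

```python
def predict_chroma_block(parse_cclm_mode_idx, get_cclm_pred, get_extra_intra_mode,
                         get_intra_pred, w_cclm):
    """Hedged sketch of the decoding flow of FIG. 16A (S1600-S1608)."""
    cclm_mode = parse_cclm_mode_idx()                # S1600: parse cclm_mode_idx
    pred_cclm = get_cclm_pred(cclm_mode)             # S1602: first predictor (pred CCLM)
    rep_mode = get_extra_intra_mode()                # S1604: infer the representative mode
    pred_intra = get_intra_pred(rep_mode)            # S1606: second predictor (pred intra)
    # S1608: weighted combination according to Equation 3
    return [[w_cclm * c + (1.0 - w_cclm) * i for c, i in zip(row_c, row_i)]
            for row_c, row_i in zip(pred_cclm, pred_intra)]
```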
• the video decoding device parses intra_chroma_pred_mode (S1620). By parsing the index intra_chroma_pred_mode, the video decoding device obtains the intra prediction mode to apply to the current chroma block. Meanwhile, intra_chroma_pred_mode may be determined by the video encoding device in terms of optimizing coding efficiency.
  • the video decoding device generates the first predictor (pred intra ) by performing the existing getIntraPred() function using the parsed intra mode as input (S1622).
  • the video decoding device performs the getExtraIntraMode() function to infer the representative mode (S1624).
  • the video decoding device generates a second predictor (pred CCLM ) by using the representative mode as an input and performing the existing getCclmPred() function (S1626).
  • the image decoding device generates a predictor (pred C ) of the current chroma block by weighting the first predictor and the second predictor (S1628).
• the derivation function getExtraIntraMode(), which infers the representative mode according to the various representative mode derivation methods of Realization Example 1-2, can be implemented in various ways.
  • the operation of getExtraIntraMode() will be described based on the example of FIG. 16A, but can be equally described based on the example of FIG. 16B.
  • the upper-left pixel coordinates (x0, y0), width, and height of the current chroma block can be provided as basic input to getExtraIntraMode(), which implements this implementation.
  • the derivation function sets the surrounding chroma pixel area from which to derive the representative mode based on the coordinates (x0,y0), width, and height of the upper left pixel of the current block.
  • the derivation function creates an intensity histogram as follows.
  • the derivation function generates an intensity histogram in the form of a list with a length equal to the number of preset directional mode candidates and then initializes it to 0.
  • the derivation function positions the filter within the surrounding chroma pixel area according to the size of the preset filter and derives the slope value and intensity of the pixel area whose positions overlap with the filter.
  • the derivation function replaces the slope value with the directional mode index.
  • the derivation function uses the directional mode index as the position index in the histogram list and accumulates the derived intensity at that position.
  • the derivation function moves the center position of the filter and repeats the above operation until all points that can be the center position of the filter within the surrounding chroma pixel area are searched.
  • the derivation function derives the position index with the largest intensity value from the intensity histogram, and then outputs the directional mode index corresponding to the position index as the representative mode.
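• The sketch below illustrates, under simplifying assumptions, the intensity-histogram derivation just described. The mapping from gradient orientation to a directional-mode index is a simplified linear mapping, not the exact DIMD mapping, and the filter is fixed to a 3×3 Sobel filter.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def derive_mode_by_intensity(area, num_dir_modes=65):
    """area: 2-D array of reconstructed samples of the surrounding chroma pixel area.
    Returns the directional-mode index with the largest accumulated intensity."""
    area = np.asarray(area, dtype=float)
    h, w = area.shape
    hist = np.zeros(num_dir_modes)                       # one bin per directional-mode candidate
    for y in range(1, h - 1):                            # every possible 3x3 filter centre
        for x in range(1, w - 1):
            patch = area[y - 1:y + 2, x - 1:x + 2]
            gx = float(np.sum(patch * SOBEL_X))
            gy = float(np.sum(patch * SOBEL_Y))
            intensity = abs(gx) + abs(gy)                # gradient magnitude used as intensity
            if intensity == 0.0:
                continue
            angle = np.arctan2(gy, gx) % np.pi           # orientation in [0, pi)
            mode = int(angle / np.pi * (num_dir_modes - 1))  # simplified mode mapping
            hist[mode] += intensity                      # accumulate intensity per mode
    return int(np.argmax(hist)), hist                    # representative mode and histogram
```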
  • the derivation function sets the surrounding chroma pixel area from which to derive the representative mode based on the coordinates (x0,y0), width, and height of the upper left pixel of the current block and selects the range of reference pixels to predict the set area.
  • the derivation function creates a distortion histogram as follows.
  • the derivation function groups a preset prediction mode candidate and the distortion of that mode into a pair, sets up a vector-shaped distortion histogram with this as each component, and then initializes the distortion of all candidates to 0.
  • the derivation function generates a predictor of the surrounding chroma pixel area from reference pixels using the prediction mode candidate, which is the first component in the vector.
  • the derivation function uses a preset image similarity comparison measurement method to calculate the distortion value between the generated predictor and the restored pixel values of the surrounding chroma pixel area.
  • the derivation function updates the second component in the vector with the derived distortion value.
  • the derivation function repeats the above operation for all prediction mode candidates in the vector.
  • the derivation function outputs the prediction mode paired with the smallest distortion value from the distortion histogram as the representative mode.
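• The sketch below illustrates, under simplifying assumptions, the distortion-based derivation just described. Only three toy prediction-mode candidates (DC, horizontal, vertical) are implemented, and SAD is used as the image similarity measure.

```python
import numpy as np

def predict_area(mode, left_ref, top_ref, h, w):
    """Toy stand-ins for the real prediction-mode candidates."""
    if mode == "DC":
        return np.full((h, w), (np.mean(left_ref) + np.mean(top_ref)) / 2.0)
    if mode == "HOR":
        return np.tile(np.asarray(left_ref, dtype=float).reshape(h, 1), (1, w))
    return np.tile(np.asarray(top_ref, dtype=float).reshape(1, w), (h, 1))   # "VER"

def derive_mode_by_distortion(recon_area, left_ref, top_ref,
                              candidates=("DC", "HOR", "VER")):
    """Predict the surrounding chroma pixel area with each candidate, measure the SAD
    against the reconstructed samples, and output the candidate with the smallest distortion."""
    recon_area = np.asarray(recon_area, dtype=float)
    h, w = recon_area.shape
    distortion = {}
    for mode in candidates:
        pred = predict_area(mode, left_ref, top_ref, h, w)
        distortion[mode] = float(np.sum(np.abs(pred - recon_area)))          # SAD
    return min(distortion, key=distortion.get), distortion
```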
• the operation of the derivation function getExtraIntraMode() described above for the two representative mode derivation methods covers the case of generating one representative mode. If multiple representative modes are to be generated, the derivation function can be extended by additionally taking numExtraMode, the number of representative modes, as an input.
  • the representative mode inferred by getExtraIntraMode() may be the same type of prediction mode as the existing predictor's prediction mode, or the representative mode may not be inferred at all. In this case, the prediction mode with the next priority can be used as the representative mode in the inference process. Alternatively, the representative mode may be inferred using another inference method, or a preset mode may be used as the representative mode.
• the image decoding device sets, as the prediction mode of the second predictor (pred intra ), a prediction mode (hereinafter referred to as 'representative mode') derived using the reconstructed information in and around the luma area corresponding to the current chroma block (hereinafter, 'corresponding luma area'), that is, information such as the values/positions/number of pixels in and around the corresponding luma area.
  • the number of representative modes derived by the video decoding device depends on the number of second predictors that are weighted.
  • a prediction mode eg, CCLM mode
  • the video decoding device can use one of the following methods as a method for deriving the representative mode.
  • the most dominant prediction mode among prediction modes derived from the values of pixels in and around the corresponding luma area using an edge detection filter can be set as the representative mode.
  • the video decoding device can borrow the method used to derive the prediction mode in DIMD technology.
• a specific area among the pixels in the corresponding luma area is set as the 'luma pixel area'; in addition, the luma pixel area may be set in various ways depending on the embodiment, including pixels surrounding the corresponding luma area.
  • the video decoding device may derive one or more representative modes in the same manner as the first method of deriving the representative mode in Realization Example 1-2.
  • the prediction mode with the smallest distortion can be set as the representative mode.
  • the video decoding device can borrow the method used to derive the prediction mode in TIMD technology.
  • a luma pixel area is set including pixels in the corresponding luma area and surrounding pixels.
  • the video decoding device generates a predictor by performing prediction for each prediction mode candidate using surrounding pixels on the left and top of the corresponding area.
  • the surrounding pixels referred to when generating the predictor are limited to neighboring pixels adjacent to the set luma pixel area, but in addition, neighboring pixels slightly distant from the set area may be used.
  • the luma pixel area is limited to pixels within the corresponding luma area, but the luma pixel area can be set in various ways depending on the embodiment, including both pixels within the corresponding luma area and surrounding pixels. .
  • the video decoding device may derive one or more representative modes in the same manner as the second method of deriving the representative mode in Realization Example 1-2.
  • the representative mode can be derived using various methods in addition to the two examples described above.
  • the example of FIG. 16A may show the process of generating the final predictor in this implementation.
• the derivation function getExtraIntraMode(), which infers the representative mode according to the various representative mode derivation methods of Realization Example 1-3, can be implemented in various ways.
  • the upper left pixel coordinates (x0, y0), width, and height of the current chroma block can be provided as basic input to getExtraIntraMode(), which implements this implementation.
  • the derivation function sets the luma pixel area from which the representative mode is derived from the pixels inside the corresponding luma area and the surrounding pixels based on the coordinates (x0,y0), width, and height of the upper left pixel of the current block.
  • the derivation function creates an intensity histogram as follows.
  • the derivation function generates an intensity histogram in the form of a list with a length equal to the number of preset directional mode candidates and then initializes it to 0.
  • the derivation function positions the filter within the luma pixel area according to the size of the preset filter and derives the slope value and intensity of the pixel area whose positions overlap with the filter.
  • the derivation function replaces the slope value with the directional mode index.
  • the directional mode index is used as the position index in the histogram list to accumulate the derived intensity at that position.
  • the derivation function moves the center position of the filter and repeats the above operation until all points that can be the center position of the filter within the luma pixel area are searched.
  • the derivation function derives the position index with the largest intensity value from the intensity histogram, and then outputs the directional mode index corresponding to the position index as the representative mode.
  • the derivation function sets the luma pixel area to derive the representative mode from the pixels inside the corresponding luma area and the surrounding pixels based on the coordinates (x0,y0), width, and height of the upper left pixel of the current block and predicts the set area. Select the range of reference pixels for this purpose.
  • the derivation function creates a distortion histogram as follows.
  • the derivation function groups a preset prediction mode candidate and the distortion of that mode into a pair, sets up a vector-shaped distortion histogram with this as each component, and then initializes the distortion of all candidates to 0.
  • the derivation function generates a predictor of the luma pixel area from reference pixels using the prediction mode candidate, which is the first component in the vector.
  • the derivation function uses a preset image similarity comparison measurement method to calculate the distortion value between the generated predictor and the restored pixel values of the luma pixel area.
  • the derivation function updates the second component in the vector with the derived distortion value.
  • the derivation function repeats the above operation for all prediction mode candidates in the vector.
  • the derivation function outputs the prediction mode paired with the smallest distortion value from the distortion histogram as the representative mode.
• the operation of the derivation function getExtraIntraMode() described above for the two representative mode derivation methods covers the case of generating one representative mode. If multiple representative modes are to be generated, the derivation function can be extended by additionally taking numExtraMode, the number of representative modes, as an input.
  • the representative mode inferred by getExtraIntraMode() may be the same type of prediction mode as the existing predictor's prediction mode, or the representative mode may not be inferred at all. In this case, the prediction mode with the next priority can be used as the representative mode in the inference process. Alternatively, the representative mode may be inferred using another inference method, or a preset mode may be used as the representative mode.
• when generating a plurality of second predictors, the video decoding device uses a prediction mode (hereinafter referred to as 'representative mode') to generate each second predictor. To infer each representative mode, one of the methods presented in Realization Example 1-1 to Realization Example 1-3 can be selected and used.
  • the image decoding device may infer representative modes using different methods when generating each second predictor.
  • the video decoding device can infer the first representative mode using Realization Example 1-1 and the second representative mode using Realization Example 1-2. .
• alternatively, the first representative mode can be inferred based on the first method of Realization Example 1-2 (the method of deriving the prediction mode with the largest intensity), and the second representative mode can be inferred based on the second method of Realization Example 1-2 (the method of deriving the prediction mode with the smallest distortion).
  • combinations of various inference methods may exist, and as the number of additional predictors increases, more various combinations of inference methods may be used.
  • a method of weightedly combining a second predictor generated based on the representative mode inferred according to Realization Example 1 and a first predictor generated according to existing CCLM prediction based on Equation 3 is described.
• the video decoding device can use a fixed, specific weight that is not explicitly signaled by the video encoding device.
• alternatively, the video decoding device can infer the weight based on the width/height/area/aspect ratio/prediction mode/position/number of the surrounding chroma blocks of the current chroma block and their distance to the current chroma block, and the value/position/number of the surrounding chroma pixels and their distance to the current chroma block.
  • the video decoding device can implement various weighted combining methods by appropriately setting w(i,j) in Equation 3.
• below, weighted combination methods are described for the case where the first predictor is the predictor based on information (1) of the corresponding luma area and the second predictor is the predictor based on information (2) of the same channel, but the same methods can also be applied when a plurality of additional predictors exist.
  • Realization Examples 2-1 to 2-5 are methods for setting the same weight for all pixels in the predictor.
  • the weighted combining method for the corresponding implementations is described without considering the influence of pixel coordinates (i,j) in the predictor, as shown in Equation 7.
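• Equation 7 is not reproduced in this text; a hedged reconstruction is Equation 3 with a weight that does not depend on the pixel position:

$$pred_C(i,j) = w_{CCLM}\, pred_{CCLM}(i,j) + (1 - w_{CCLM})\, pred_{intra}(i,j) \qquad \text{(Equation 7, reconstructed)}$$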
  • the image decoding device uses a predefined weight w CCLM .
• as the predefined weight, equal weights, weights that favor the CCLM predictor (3:1, 7:1, ...), or weights that favor it less (1:3, 1:7, ...) can be used.
  • an image decoding device can set equal weights for all predictors.
  • the image decoding device may set a higher weight to the first predictor according to CCLM prediction.
• <Realization Example 2-2> Using information from chroma blocks surrounding the current chroma block
• the image decoding device sets the weight using information such as the width/height/area/prediction mode/position/number of the surrounding chroma blocks of the current chroma block and their distance to the current chroma block.
  • the correlation between the current chroma block and surrounding chroma blocks using a prediction mode (hereinafter referred to as 'representative mode') for generating the second predictor (pred intra ) can be quantified.
  • the numerical correlation is referred to as peripheral pixel correlation r C.
  • the video decoding device can set the value of the weight w CCLM using the neighboring pixel correlation r C .
  • the representative mode may be one of the 67 intra prediction modes illustrated in FIG. 3A, as described above.
  • the representative mode may be an intra prediction mode that collectively refers to all 67 intra prediction modes. If the representative mode is all intra prediction modes, prediction modes other than the representative mode may include MIP mode, CCLM mode, etc.
• conversely, using a prediction mode (hereinafter referred to as 'representative mode') for generating the predictor (pred CCLM ) of the current chroma block together with information on the neighboring blocks of the current chroma block, the surrounding pixel correlation r C for the predictor (pred CCLM ) can be inferred, and the value of the weight w intra can then be set using this correlation r C .
  • the video decoding device can use one of the following three methods as a method of deriving the peripheral pixel correlation r C.
  • the surrounding chroma blocks considered in the examples of the methods below include blocks adjacent to the current chroma block or blocks that are slightly distant, and the range of the surrounding chroma blocks may be set in various ways depending on the embodiment.
  • r C can be derived by calculating the ratio of neighboring blocks that use the representative mode among neighboring chroma blocks of the current chroma block based on the number of blocks.
  • this ratio may be set to r C.
  • the video decoding device can use this ratio as the weight of the second predictor, as shown in Equation 10, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.
  • the video decoding device may set 3/5 as the weight of the second predictor and 2/5 as the weight of the first predictor generated by CCLM mode according to Equation 10. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor. .
  • r C can be derived by calculating the ratio of neighboring blocks that use the representative mode among neighboring chroma blocks of the current chroma block based on block area.
  • this ratio may be set to r C.
  • the image decoding device can use this ratio as the weight of the second predictor, as shown in Equation 11, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.
• the video decoding device can set 28/68 as the weight of the second predictor and 40/68 as the weight of the first predictor generated by CCLM mode according to Equation 11. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor.
  • r C can be derived based on the ratio of the lengths of the sides adjacent to the current block of neighboring blocks using the representative mode among the lengths of all sides adjacent to the current chroma block and neighboring chroma blocks.
  • this ratio may be set to r C. .
  • the video decoding device can use this ratio as the weight of the second predictor, as shown in Equation 12, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.
• according to Equation 12, the image decoding device can set 10/16 as the weight of the second predictor and 6/16 as the weight of the first predictor generated by CCLM mode.
• if a plurality of additional predictors are weighted and combined, the video decoding device can likewise set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor. A hedged formulation of the three ratio-based weights described above is sketched below.
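• A hedged formulation of the three ratios just described (the exact Equations 10 to 12 are not reproduced in this text) is:

$$r_C = \frac{N_{rep}}{N_{total}} \ \text{(block count)}, \qquad r_C = \frac{A_{rep}}{A_{total}} \ \text{(block area)}, \qquad r_C = \frac{L_{rep}}{L_{total}} \ \text{(adjacent side length)},$$

$$w_{intra} = r_C, \qquad w_{CCLM} = 1 - r_C,$$

where N rep , A rep , and L rep are, respectively, the number of neighboring chroma blocks coded in the representative mode, their total area, and the length of the sides they share with the current chroma block, and N total , A total , and L total are the corresponding totals over all neighboring chroma blocks.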
• the representative mode may be one specific intra prediction mode, that is, Planar mode.
  • the representative mode may be an intra prediction mode that collectively refers to all 67 intra prediction modes.
• the image decoding device can approximate each denominator and numerator to a power of 2 using the operation shown in Equation 13, and then derive the surrounding pixel correlation using Equations 10 to 12.
• the video decoding device may adjust the contribution of the surrounding pixel correlation by additionally multiplying it by a predetermined proportion value p. For example, by applying the proportion p, the surrounding pixel correlation in Equation 10 can be expressed as Equation 14.
• the image decoding device can approximate the derived weight to the nearest power of 1/2, such as 1/2, 1/4, or 1/8.
• alternatively, the video decoding device divides the weight interval between 0 and 1 into equal parts, such as 2, 4, or 8 parts, or uses variable partition lengths, selects the values at the split positions as representative weight values, and then approximates the derived weight with the closest representative weight value. Both approximations are sketched below.
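• The snippet below sketches both approximations under the stated assumptions; the candidate sets are illustrative.

```python
def snap_to_power_of_half(w, max_exp=3):
    """Approximate a derived weight with the nearest of 1/2, 1/4, ..., 1/2**max_exp."""
    candidates = [1.0 / (2 ** e) for e in range(1, max_exp + 1)]
    return min(candidates, key=lambda c: abs(c - w))

def snap_to_uniform_grid(w, parts=8):
    """Approximate a derived weight with the nearest representative value obtained by
    splitting the interval [0, 1] into 'parts' equal parts."""
    candidates = [k / parts for k in range(parts + 1)]
    return min(candidates, key=lambda c: abs(c - w))
```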
  • the weights may be additionally adjusted using various conditional expressions or calculation formulas.
• <Realization Example 2-3> Using information on blocks included in the corresponding luma area and their surrounding blocks
• the image decoding device sets the weight using information such as the width/height/area/prediction mode/aspect ratio/position/number of the blocks included in the luma area corresponding to the current chroma block (hereinafter, 'corresponding luma area') and their surrounding blocks.
  • the numerical correlation is referred to as luma pixel correlation r L.
  • the video decoding device can set the value of the weight w CCLM using the luma pixel correlation r L.
  • the weight of the representative mode can be derived for the prediction mode (e.g., IPM) that generates the predictor based on the surrounding pixel information. .
  • the representative mode may be one of the 67 intra prediction modes illustrated in FIG. 3A, as described above.
  • the representative mode may be an intra prediction mode that collectively refers to all 67 intra prediction modes. If the representative mode is all intra prediction modes, prediction modes other than the representative mode may include MIP mode, CCLM mode, etc.
  • the video decoding device can use one of the following two methods as a method of deriving the luma pixel correlation r L.
  • blocks considered in the examples of the methods below are blocks included in the corresponding luma area, but the range of blocks may be set in various ways depending on the embodiment, including blocks surrounding the corresponding luma area.
• r L can be derived by calculating the ratio of blocks using the representative mode among the blocks included in the corresponding luma area based on the number of blocks.
  • this ratio may be set to r L .
  • the image decoding device can use this ratio as the weight of the second predictor, as shown in Equation 15, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.
  • the video decoding device may set 2/5 as the weight of the second predictor and 3/5 as the weight of the first predictor generated by CCLM mode according to Equation 15. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor. .
  • r L can be derived by calculating the ratio of blocks using the representative mode among blocks included in the corresponding luma area based on the block area.
  • this ratio may be set to r L .
  • the image decoding device can use this ratio as the weight of the second predictor, as shown in Equation 16, and set the value obtained by subtracting the ratio from 1 as the weight of the first predictor generated in CCLM mode.
  • the video decoding device can set 96/256 as the weight of the second predictor and 160/256 as the weight of the first predictor generated by CCLM mode according to Equation 16. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor. .
• the representative mode may be one specific intra prediction mode, that is, Planar mode.
  • the representative mode may be an intra prediction mode that collectively refers to all 67 intra prediction modes.
• the image decoding device approximates each denominator and numerator in the form of a power of 2 using an operation similar to that shown in Equation 13, and can then derive the luma pixel correlation using Equation 15 and Equation 16.
  • the video decoding device may adjust the proportion of the luma pixel correlation r L by additionally multiplying the predetermined proportion value p.
  • the luma pixel correlation in Equation 15 can be expressed as Equation 17.
  • the video decoding device may approximate the derived weight to the nearest power of 1/2, such as 1/2, 1/4, or 1/8.
  • the video decoding device divides the weight section between 0 and 1 into equal parts such as 2 parts, 4 parts, 8 parts, etc. or uses a variable partition length, selects the value of the split position as the representative weight value, and then uses the derived weight can be approximated with the closest representative weight value.
  • the weights may be additionally adjusted using various conditional expressions or calculation formulas.
• <Realization Example 2-4> Using restored chroma information around the current chroma block
  • the image decoding device sets the weight using the restored chroma information around the current chroma block, that is, information such as value/position/number/distance to the current chroma block of pixels around the current chroma block.
• the restored information around the current chroma block may also include the width/height/area/aspect ratio/prediction mode/position/number of the surrounding chroma blocks and their distance to the current chroma block, but the method of using these follows Realization Example 2-2. Therefore, this implementation mainly describes a method based on information such as the values/positions/number of the pixels surrounding the current chroma block and their distance to the current chroma block. As described above, the area containing the surrounding pixels of the current chroma block is called the surrounding chroma pixel area.
  • the weight setting method according to this implementation example can be applied when the second predictor is pred CCLM or pred intra .
  • a method for setting weights for the case where the second predictor is pred intra is described.
  • the video decoding device can use one of the following methods to derive weights using the restored chroma information around the current chroma block.
• the image decoding device calculates the prediction modes and intensities derived from the values of the pixels surrounding the current chroma block using an edge detection filter, and can then set the ratio of the intensity of the representative mode to the total intensity as the weight of the second predictor.
  • the video decoding device can borrow the method used to derive the prediction mode in DIMD technology.
  • This implementation is very similar in operation to the first method of Realization Example 1-2.
  • the prediction mode that can be inferred using the edge detection filter is limited to the directional prediction mode, so this implementation may be limited to the case where the second predictor is pred intra according to Equation 3.
• assume that the set representative mode is a directional mode and that an intensity histogram is generated for the directional modes, as shown in the example of FIG. 24, according to the first method of Realization Example 1-2.
  • the video decoding device can set weights using the intensity of the representative mode from the intensity histogram.
  • representative mode M is assumed to be mode 19.
  • the ratio of the intensity of the representative mode to the total of the intensities in the intensity histogram can be set as the weight, as shown in Equation 18.
  • the weight of the representative mode is set to 25/95.
  • the weight of the representative mode is set to 25/65.
  • the neighboring blocks and prediction modes of the current chroma block are distributed as shown in the example of FIG. 25.
• the ratio of the intensity of the representative mode to the total of the intensities of these directional modes can be set as the weight.
  • the weight of the representative mode is set to 25/56.
  • weights can be set using the intensity of the representative mode according to various methods. If a plurality of additional predictors are weighted and combined, after calculating the weight of each additional predictor in the same way, the video decoding device can set the value obtained by subtracting the sum of the weights of the additional predictors from 1 as the weight of the first predictor. .
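• A hedged formulation of the intensity-ratio weight described above (Equation 18 is not reproduced in this text) is:

$$w_{intra} = \frac{I_M}{\sum_m I_m}, \qquad w_{CCLM} = 1 - w_{intra},$$

where I M is the intensity of the representative mode M; the set of modes summed in the denominator may differ depending on the embodiment, which is why the example values above (25/95, 25/65, 25/56) differ.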
• the image decoding device calculates the distortion of each prediction mode based on information on the surrounding pixels of the current chroma block, and can then use the ratio of the distortion of the representative mode to the total distortion when setting the weight of the second predictor.
  • the video decoding device can borrow the method used to derive the prediction mode in TIMD technology.
  • This implementation is very similar in operation to the second method of Realization Example 1-2.
• the video decoding device calculates the distortion value D of the prediction mode candidates according to the second method of Realization Example 1-2, and then replaces the intensity value used in the first method of Realization Example 1-2 with the value calculated for each prediction mode candidate according to Equation 21 or Equation 22.
  • the video decoding device can generate an intensity histogram in the same manner as the first method of this implementation and then set weights using it.
  • the ratio of the distortion of the representative mode among the distortion values of the prediction mode candidates is calculated using various methods, and then the weight can be set using this ratio.
• the image decoding device sets the weight using the reconstructed information in and around the luma area corresponding to the current chroma block (hereinafter, 'corresponding luma area'), that is, the values of the pixels in and around the corresponding luma area.
• the restored information in and around the corresponding luma area may also include the width/height/area/aspect ratio/prediction mode/position/number of the blocks included in the corresponding luma area and the blocks surrounding them, but the method of utilizing these follows Realization Example 2-3. Therefore, in this implementation, a method mainly based on information such as the values/positions/number of pixels in and around the corresponding luma area is described.
  • the prediction mode i.e., CCLM mode
• as described above, an area containing pixels within and around the corresponding luma area is referred to as the luma pixel area.
  • the image decoding device may use one of the following methods to derive weights using the reconstructed information in and around the corresponding luma area.
  • the image decoding device calculates prediction modes and intensities derived from the values of pixels in and around the corresponding luma area using an edge detection filter, and then calculates the ratio of the intensity of the representative mode among all intensities. It can be set to the weight of the second predictor.
  • the video decoding device can borrow the method used to derive the prediction mode in DIMD technology. This implementation is very similar in operation to the first method of Realization Example 1-3.
• assume that the set representative mode is a directional mode and that an intensity histogram is generated for the directional modes, as shown in the example of FIG. 24, according to the first method of Realization Example 1-3.
  • the video decoding device can set weights using the intensity of the representative mode from the intensity histogram. Thereafter, in the same manner as the first method of Realization Example 2-4, the video decoding device can set the weight using the intensity.
• the image decoding device calculates the distortion of each prediction mode based on information on the pixels in and around the corresponding luma area, and can then use the ratio of the distortion of the representative mode to the total distortion when setting the weight of the second predictor.
  • the video decoding device can borrow the method used to derive the prediction mode in TIMD technology.
  • This implementation is very similar in operation to the second method of Realization Examples 1-3.
• the image decoding device may calculate the distortion value D of the prediction mode candidates according to the second method of Realization Example 1-3, and then set the weight using the same substituted intensity values as in the second method of Realization Example 2-4.
  • the video decoding device does not infer information for intra prediction of the current chroma block, but uses information signaled from the video encoding device. That is, information related to the prediction mode of the second predictor, information related to weighted combination, etc. are transmitted from the video encoding device to the video decoding device. Additionally, whether or not this embodiment is applied can be signaled from the video encoding device to the video decoding device.
  • information related to the prediction mode (hereinafter referred to as 'representative mode') for generating the second predictor is directly signaled from the video encoding device to the video decoding device.
  • related information includes the number of representative modes, representative mode derivation method, representative mode index, etc.
  • the representative mode number can be signaled as follows.
• the number of representative modes can be preset at a level higher than the CU, such as the SPS (Sequence Parameter Set), VPS (Video Parameter Set), PPS (Picture Parameter Set), SH (Slice Header), or CTU (Coding Tree Unit).
• ccip in the variable names is an abbreviation for 'CCLM Intra Prediction'.
  • ccip is inserted into the variable name of the signal related to this embodiment.
  • the video encoding device encodes a preset number of representative modes, includes them in the bitstream, and signals them to the video decoding device.
  • the video decoding device parses sps_ccip_extra_mode_num in the bitstream. Afterwards, the number of representative modes to be derived when performing prediction is determined according to the value of sps_ccip_extra_mode_num.
  • the number of representative modes may be signaled each time prediction is performed at the CU level.
  • the intra prediction mode parsing process of the chroma channel described above in Table 2 may be changed as shown in the examples in Tables 4 to 6. According to Table 4 or Table 5, the number of representative modes required when predicting each block can be signaled by additionally parsing ccip_extra_mode_num according to the type of prediction mode.
  • ccip_extra_mode_num can be additionally parsed regardless of the type of prediction mode.
  • the representative mode derivation method can be signaled as follows. First, the representative mode derivation methods presented in Realization Example 1 can be classified by index as illustrated in Table 7.
  • a representative mode derivation method may be set in advance at a level higher than CU, such as SPS/VPS/PPS/SH/CTU.
• the index sps_ccip_mode_infer_idx of the representative mode derivation method may be defined in advance on the SPS.
  • the video encoding device encodes the index of the predefined representative mode derivation method and then includes it in the bitstream and signals it to the video decoding device.
  • the video decoding device parses sps_ccip_mode_infer_idx in the bitstream. Afterwards, the representative mode derivation method to be used when performing prediction is determined according to the value of sps_ccip_mode_infer_idx.
  • the representative mode derivation method may be signaled each time prediction is performed at the CU level.
  • the representative mode derivation method required for predicting each block can be signaled by additionally parsing ccip_mode_infer_idx according to the type of prediction mode, as shown in Table 9 or Table 10.
  • ccip_mode_infer_idx can be additionally parsed regardless of the type of prediction mode.
  • the representative mode index can be signaled as follows.
  • a representative mode index may be set in advance at a level higher than CU, such as SPS/VPS/PPS/SH/CTU.
  • the representative mode index sps_ccip_extra_mode_idx may be defined in advance on the SPS.
  • the video encoding device encodes a predefined representative mode index, includes it in the bitstream, and signals it to the video decoding device.
  • the video decoding device parses sps_ccip_extra_mode_idx in the bitstream. Afterwards, the representative mode to be used when performing prediction is determined according to the value of sps_ccip_extra_mode_idx.
  • the representative mode index may be signaled each time prediction is performed at the CU level.
  • the representative mode index required for prediction of each block can be signaled by additionally parsing ccip_extra_mode_idx according to the type of prediction mode, as shown in Table 13 or Table 14.
  • ccip_extra_mode_idx can be additionally parsed regardless of the type of prediction mode.
  • ccip_extra_mode_idx indicates a single index when the number of representative modes is one. Additionally, ccip_extra_mode_idx can be a list of multiple representative mode indices when multiple representative modes are used.
  • a representative mode derivation method may be signaled together with information on the number of representative modes.
  • a preset number may be used without signaling representative mode number information, and the index of the representative mode may be signaled instead of information on the representative mode derivation method.
  • various prediction methods can be created by selecting various combinations of relevant information to be signaled and related information not to be signaled.
  • Realization Example 3-2: Method of signaling information related to weighted combination
  • weighted combining related information is signaled from the video encoding device to the video decoding device.
  • information related to weighted combination includes the weighted combination method, the weights of the weighted combination, the relative importance value, etc.
  • the relative importance value is a value multiplied in the process of calculating the peripheral pixel correlation and the luma pixel correlation in Realization Examples 2-2 and 2-3.
  • the weighted combination method can be signaled as follows. First, the weighted combination methods presented in Realization Example 2 can be classified by index as illustrated in Table 16.
  • a weighted combination method may be set in advance at a level higher than CU, such as SPS/VPS/PPS/SH/CTU.
  • the index sps_ccip_weight_calc_mode_idx of the weighted combination method on SPS may be defined in advance.
  • the video encoding device encodes the index of a predefined weighted combination method and then includes it in the bitstream and signals it to the video decoding device.
  • the video decoding device parses sps_ccip_weight_calc_mode_idx in the bitstream. Afterwards, the weighted combination method to be used when performing prediction is determined according to the value of sps_ccip_weight_calc_mode_idx.
  • the weighted combination method may be signaled each time prediction is performed at the CU level.
  • the weighted combination method to be used when predicting each block can be signaled by additionally parsing ccip_weight_calc_mode_idx according to the type of prediction mode, as shown in Table 18 or Table 19.
  • ccip_weight_calc_mode_idx can be additionally parsed regardless of the type of prediction mode.
  • the weights of the weighted combination can be signaled as follows.
  • the weight of the weighted combination may be set in advance at a level higher than the CU, such as SPS/VPS/PPS/SH/CTU.
  • the weight sps_ccip_pred_weight of the weighted combination may be defined in advance on the SPS.
  • the video encoding device encodes the weight of the predefined weighted combination and then includes it in the bitstream and signals it to the video decoding device.
  • the video decoding device parses sps_ccip_pred_weight in the bitstream. Afterwards, the weight of the weighted combination to be used when performing prediction is determined according to the value of sps_ccip_pred_weight.
  • the weight of the weighted combination may be signaled each time prediction is performed at the CU level.
  • the weight of the weighted combination required for prediction of each block can be signaled by additionally parsing ccip_pred_weight in the intra prediction mode parsing process of the chroma channel according to the type of prediction mode, as shown in Table 22 or Table 23 (a combination sketch is provided after this list).
  • ccip_pred_weight can be additionally parsed regardless of the type of prediction mode.
  • ccip_pred_weight represents one weight for the first predictor (or second predictor) when the number of representative modes is one. Additionally, when multiple representative modes are used, ccip_pred_weight may be a list whose number of weights increases according to the number of representative modes.
  • the relative importance value multiplied in the calculation process of the peripheral pixel correlation and the luma pixel correlation can also be signaled in the same manner as the weight of the weighted combination.
  • the relative importance value can be signaled by replacing ccip_pred_weight with ccip_relativity_importance in Tables 21 to 24.
  • a flag may be set in advance at a level higher than the CU, such as SPS/VPS/PPS/SH/CTU.
  • a flag sps_ccip_mode_flag indicating use of an improved predictor on SPS may be defined in advance.
  • the video encoding device encodes the predefined flag indicating use of the improved predictor, includes it in the bitstream, and signals it to the video decoding device.
  • a combination of Realization Example 3-1 and Realization Example 3-2 is possible. For example, if it is determined that the present invention is applied by signaling whether the present invention is applied, the methods of Realization Example 3-1 and Realization Example 3-2 can then be applied.
  • whether or not the present invention is applied may be signaled at a lower level. That is, application of the present invention can be determined using ccip_mode_flag at the CU level. If ccip_mode_flag is 0, the video decoding device does not apply the present invention, and if ccip_mode_flag is 1, the video decoding device may generate a final predictor by weightedly combining the first predictor and the second predictor (a gating sketch is provided after this list).
  • the video decoding device can parse ccip_mode_flag as shown in Tables 26 and 27, depending on the type of prediction mode.
  • ccip_mode_flag may be parsed regardless of the type of prediction mode.
  • the representative mode according to the present invention may be a co-channel prediction mode that generates a predictor using information (2) of the same channel.
  • the representative mode according to the present invention may be a cross-component prediction mode that generates a predictor using information (1) of the corresponding luma area.
  • Non-transitory recording media include, for example, all types of recording devices that store data in a form readable by a computer system.
  • non-transitory recording media include storage media such as erasable programmable read only memory (EPROM), flash drives, optical drives, magnetic hard drives, and solid state drives (SSD).
  • EPROM erasable programmable read only memory
  • SSD solid state drives
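The sketches below are illustrative only and are not drawn from the disclosure's normative syntax tables; every helper name in them (BitReader, parse_num_extra_modes, the assumed 2-bit field widths) is hypothetical, since Tables 4 to 6 are not reproduced in this text. This first sketch shows one plausible way a decoder could obtain the number of representative modes from sps_ccip_extra_mode_num at the SPS level, with an optional per-CU override through ccip_extra_mode_num.

```python
# Minimal sketch, not the normative parsing process: field widths and ordering are assumptions.

class BitReader:
    """MSB-first bit reader over a bytes object."""
    def __init__(self, data: bytes):
        self.bits = ''.join(f'{b:08b}' for b in data)
        self.pos = 0

    def u(self, n: int) -> int:
        """Read an n-bit unsigned value."""
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value


def parse_num_extra_modes(sps: BitReader, cu: BitReader, signaled_per_cu: bool) -> int:
    """Number of representative modes for the current chroma block."""
    sps_ccip_extra_mode_num = sps.u(2)   # preset at a level higher than the CU (assumed 2 bits)
    if signaled_per_cu:
        ccip_extra_mode_num = cu.u(2)    # signaled each time prediction is performed at the CU level
        return ccip_extra_mode_num
    return sps_ccip_extra_mode_num


# Example: the SPS presets one extra mode, the CU overrides it with two.
print(parse_num_extra_modes(BitReader(b'\x40'), BitReader(b'\x80'), signaled_per_cu=True))  # -> 2
```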
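The next sketch illustrates dispatching on a derivation-method index such as sps_ccip_mode_infer_idx or ccip_mode_infer_idx. The index-to-method mapping is defined in Table 7, which is not reproduced in this text, so the concrete methods listed below (most frequent neighboring mode, co-located luma mode, a fixed default) are assumptions used purely for illustration.

```python
# Minimal sketch, assuming a Table 7-style mapping that is not reproduced here.
from collections import Counter

PLANAR = 0  # assumed default mode index


def derive_representative_mode(infer_idx: int, neighbor_modes: list[int], colocated_luma_mode: int) -> int:
    """Pick the representative mode according to the signaled derivation-method index."""
    if infer_idx == 0:
        # Assumed method 0: most frequent intra mode among neighboring blocks (co-channel information).
        return Counter(neighbor_modes).most_common(1)[0][0]
    if infer_idx == 1:
        # Assumed method 1: reuse the intra mode of the corresponding luma area (cross-component information).
        return colocated_luma_mode
    # Assumed fallback: a fixed default mode.
    return PLANAR


print(derive_representative_mode(0, [18, 18, 50], colocated_luma_mode=66))  # -> 18
```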
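The following sketch shows the weighted combination of the first predictor (CCLM-based) and the second predictor (representative-mode-based) controlled by a signaled weight such as ccip_pred_weight. The fixed-point precision (eighths), the interpretation of the weight as the first predictor's share, and the 10-bit clipping are assumptions; the actual weight semantics are given by Tables 21 to 24, which are not reproduced here.

```python
# Minimal sketch, assuming 1/8-step weights and 10-bit samples.
import numpy as np


def weighted_combination(first_pred: np.ndarray, second_pred: np.ndarray,
                         ccip_pred_weight: int, shift: int = 3) -> np.ndarray:
    """Blend two chroma predictors with a signaled weight in [0, 2**shift]."""
    w1 = ccip_pred_weight               # weight applied to the first (CCLM) predictor
    w2 = (1 << shift) - w1              # complementary weight for the second predictor
    offset = 1 << (shift - 1)           # rounding offset
    combined = (w1 * first_pred.astype(np.int32) +
                w2 * second_pred.astype(np.int32) + offset) >> shift
    return np.clip(combined, 0, 1023).astype(np.uint16)


# Example: a 4x4 block blended with weight 5/8 toward the CCLM predictor.
p1 = np.full((4, 4), 600, dtype=np.uint16)
p2 = np.full((4, 4), 520, dtype=np.uint16)
print(weighted_combination(p1, p2, ccip_pred_weight=5))  # every sample -> 570
```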
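Finally, this sketch illustrates the CU-level gating with ccip_mode_flag: when the flag is 0 the CCLM predictor is used as-is, and when it is 1 the final predictor is formed by weighted combination. The helpers cclm_predict and representative_mode_predict are placeholders standing in for the first- and second-predictor derivations described above; their internals are assumptions, not the disclosed derivations.

```python
# Minimal sketch of the ccip_mode_flag gating; the two predictor helpers are placeholders.
import numpy as np


def cclm_predict(luma_block: np.ndarray) -> np.ndarray:
    # Placeholder first predictor: a real decoder would apply the derived linear model a*luma + b.
    return (luma_block // 2).astype(np.int32)


def representative_mode_predict(neighbor_pixels: np.ndarray, shape) -> np.ndarray:
    # Placeholder second predictor: DC-like fill from pre-reconstructed neighboring chroma pixels.
    return np.full(shape, int(neighbor_pixels.mean()), dtype=np.int32)


def predict_chroma_cu(ccip_mode_flag: int, luma_block: np.ndarray, neighbors: np.ndarray,
                      weight: int = 4, shift: int = 3) -> np.ndarray:
    first = cclm_predict(luma_block)
    if ccip_mode_flag == 0:
        return first                    # embodiment not applied: keep the CCLM predictor
    second = representative_mode_predict(neighbors, first.shape)
    offset = 1 << (shift - 1)
    return (weight * first + ((1 << shift) - weight) * second + offset) >> shift


luma = np.full((4, 4), 800)
nbrs = np.array([500, 510, 490, 505])
print(predict_chroma_cu(1, luma, nbrs))  # blended 4x4 predictor (all samples 451 here)
```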

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Disclosed are a video coding method and device using improved cross-component linear model (CCLM) prediction. The present embodiment relates to a video coding method and device in which, in intra prediction of a current chroma block, a first predictor of the current chroma block is generated according to CCLM prediction, a second predictor of the current chroma block is further generated on the basis of pre-reconstructed neighboring pixels, and the first predictor and the second predictor are then combined by weighting.
PCT/KR2022/019676 2022-09-26 2022-12-06 Procédé et dispositif de codage vidéo utilisant une prédiction améliorée de modèle linéaire inter-composantes WO2024071523A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2022-0121555 2022-09-26
KR20220121555 2022-09-26
KR10-2022-0167522 2022-12-05
KR1020220167522A KR20240043043A (ko) 2022-09-26 2022-12-05 개선된 크로스 컴포넌트 선형 모델 예측을 이용하는 비디오 코딩방법 및 장치

Publications (1)

Publication Number Publication Date
WO2024071523A1 true WO2024071523A1 (fr) 2024-04-04

Family

ID=90478320

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/019676 WO2024071523A1 (fr) 2022-09-26 2022-12-06 Procédé et dispositif de codage vidéo utilisant une prédiction améliorée de modèle linéaire inter-composantes

Country Status (1)

Country Link
WO (1) WO2024071523A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018034374A1 (fr) * 2016-08-19 2018-02-22 엘지전자(주) Procédé et appareil de codage et décodage d'un signal vidéo à l'aide d'un filtrage de prédiction-intra
KR20190046852A (ko) * 2016-09-15 2019-05-07 퀄컴 인코포레이티드 비디오 코딩을 위한 선형 모델 크로마 인트라 예측
KR20200113173A (ko) * 2019-03-20 2020-10-06 현대자동차주식회사 예측모드 추정에 기반하는 인트라 예측장치 및 방법
KR102194113B1 (ko) * 2015-06-18 2020-12-22 퀄컴 인코포레이티드 인트라 예측 및 인트라 모드 코딩
US20210014506A1 (en) * 2018-03-31 2021-01-14 Huawei Technologies Co., Ltd. Method and apparatus for intra prediction of picture block

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22961122

Country of ref document: EP

Kind code of ref document: A1