CN118056406A - Image encoding/decoding method and apparatus based on inseparable main transform and recording medium storing bit stream - Google Patents


Info

Publication number
CN118056406A
Authority
CN
China
Prior art keywords
transform
mode
current block
inseparable
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280067472.1A
Other languages
Chinese (zh)
Inventor
崔璋元
崔情娥
具文模
金昇焕
林宰显
赵杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
LG Electronics Inc
Original Assignee
LG Electronics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by LG Electronics Inc filed Critical LG Electronics Inc
Priority claimed from PCT/KR2022/014977 external-priority patent/WO2023059056A1/en
Publication of CN118056406A publication Critical patent/CN118056406A/en

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An image encoding/decoding method and apparatus are provided. The image decoding method may include the steps of: obtaining prediction information of a current block; determining a prediction mode of the current block according to the prediction information; selecting a transform kernel for generating a residual block of the current block based on the prediction mode being a predetermined mode; and generating the residual block of the current block by performing an inseparable main transform on the current block using the selected transform kernel.

Description

Image encoding/decoding method and apparatus based on inseparable main transform and recording medium storing bit stream
Technical Field
The present disclosure relates to an image encoding/decoding method and apparatus and a recording medium for storing a bitstream, and more particularly, to an image encoding/decoding method and apparatus based on an inseparable main transform and a recording medium for storing a bitstream generated by the image encoding method/apparatus of the present disclosure.
Background
Recently, demand for high-resolution, high-quality images, such as High Definition (HD) and Ultra High Definition (UHD) images, is increasing in various fields. As the resolution and quality of image data improve, the amount of transmitted information or bits increases relative to existing image data. An increase in the amount of transmitted information or bits results in increased transmission and storage costs.
Therefore, there is a need for efficient image compression techniques for efficiently transmitting, storing, and reproducing information about high resolution and high quality images.
Disclosure of Invention
Technical problem
An object of the present disclosure is to provide an image encoding/decoding method and apparatus having improved encoding/decoding efficiency.
It is an object of the present disclosure to provide an image encoding/decoding method and apparatus for performing an inseparable main transform.
It is an object of the present disclosure to provide an image encoding/decoding method and apparatus for performing an inseparable main transform on a block to which a Combined Inter and Intra Prediction (CIIP) mode is applied.
It is an object of the present disclosure to provide an image encoding/decoding method and apparatus for performing an inseparable main transform on a block to which a Geometric Partition Mode (GPM) is applied.
It is an object of the present disclosure to provide an image encoding/decoding method and apparatus for performing an inseparable main transform when decoder-side intra mode derivation (DIMD) is used in CIIP mode.
It is an object of the present disclosure to provide an image encoding/decoding method and apparatus for performing an inseparable main transform when template-based intra mode derivation (TIMD) is used in CIIP mode.
It is an object of the present disclosure to provide an index signaling method for the case where both separable transform kernels and non-separable main transform kernels are available.
It is an object of the present disclosure to provide an image encoding/decoding method and apparatus for performing an inseparable main transform based on residual characteristics of a current block or neighboring blocks.
It is another object of the present disclosure to provide a non-transitory computer-readable recording medium storing a bitstream generated by an image encoding method or apparatus according to the present disclosure.
It is another object of the present disclosure to provide a recording medium storing a bitstream received, decoded, and used for reconstructing an image by an image decoding apparatus according to the present disclosure.
It is another object of the present disclosure to provide a method of transmitting a bitstream generated by an image encoding method or apparatus according to the present disclosure.
The technical problems solved by the present disclosure are not limited to the above technical problems, and other technical problems not described herein will be apparent to those skilled in the art from the following description.
Technical proposal
According to one embodiment of the present disclosure, an image decoding method performed by an image decoding apparatus may include: obtaining prediction information of a current block; determining a prediction mode of the current block according to the prediction information; selecting a transform kernel for generating a residual block of the current block based on the prediction mode being a predetermined mode; and generating the residual block of the current block by performing an inseparable main transform on the current block using the selected transform kernel.
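Purely as an illustration (not part of the claimed method), the inverse-transform step above can be sketched as follows; the kernel matrix, its dimensions, and the assumption that it is orthonormal are hypothetical:

```python
import numpy as np

def non_separable_primary_inverse_transform(coeffs, kernel):
    """Invert a non-separable primary transform: the whole coefficient
    block is vectorized and multiplied by a single kernel matrix,
    unlike a separable transform, which applies independent row and
    column transforms. The kernel is assumed orthonormal, so its
    transpose serves as its inverse."""
    h, w = coeffs.shape
    vec = coeffs.reshape(-1)      # vectorize the h*w coefficient block
    residual = kernel.T @ vec     # one (h*w) x (h*w) matrix multiply
    return residual.reshape(h, w)
```

A separable transform on the same block would instead cost two small N x N multiplies rather than one N² x N² multiply, which is why non-separable kernels are typically restricted to small blocks.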
According to one embodiment of the present disclosure, the predetermined mode may be one of a Combined Inter and Intra Prediction (CIIP) mode or a Geometric Partition Mode (GPM).
According to one embodiment of the present disclosure, it may also be determined whether to apply the inseparable primary transform based on the number of pixels included in the current block.
According to one embodiment of the present disclosure, the image decoding method may further include obtaining information specifying whether to apply the inseparable main transform for a predetermined mode based on a prediction mode of the current block, and may determine whether to apply the inseparable main transform to the current block based on the obtained information.
According to one embodiment of the present disclosure, a transform kernel set or a transform kernel applied to the inseparable primary transform may be adaptively selected based on information about the predetermined mode.
According to one embodiment of the present disclosure, based on the predetermined mode being the CIIP mode, the information about the predetermined mode may include at least one of a CIIP weight or a CIIP intra mode.
According to one embodiment of the present disclosure, the CIIP intra mode may be at least one of a planar mode, a CIIP_PDPC mode, a CIIP_DIMD mode, or a CIIP_TIMD mode.
According to one embodiment of the present disclosure, based on the predetermined mode being the GPM mode, the information about the predetermined mode may include at least one of a GPM index, an angle index, or a distance index.
According to one embodiment of the present disclosure, selecting a transform kernel for the current block may include obtaining transform kernel selection information about the current block, and the transform kernel selection information may be information for selecting one transform kernel from among one or more non-separable transform kernels and one or more separable transform kernels.
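As a hypothetical sketch of such signaling (the candidate list, its ordering, and the kernel names are illustrative, not taken from the disclosure), a single decoded index can select from one list that mixes separable and non-separable kernels, so no separate on/off flag is needed:

```python
SEPARABLE = "separable"
NON_SEPARABLE = "non_separable"

# Illustrative candidate list; a real codec would build it per block
# from the prediction mode and block size.
CANDIDATES = [
    (SEPARABLE, "DCT2_DCT2"),
    (SEPARABLE, "DST7_DST7"),
    (NON_SEPARABLE, "NSPT_SET0"),
    (NON_SEPARABLE, "NSPT_SET1"),
]

def select_transform_kernel(kernel_list, signaled_index):
    """Return (kernel_type, kernel_id) for a decoded kernel index."""
    if not 0 <= signaled_index < len(kernel_list):
        raise ValueError("kernel index outside candidate list")
    return kernel_list[signaled_index]
```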
According to one embodiment of the present disclosure, it may also be determined whether to apply the inseparable main transform to the current block based on residual characteristics of the current block or neighboring blocks.
According to one embodiment of the present disclosure, the residual characteristic may be determined based on a position of a last significant coefficient in a residual block of the current block or the number of significant coefficients in the residual block of the current block.
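A minimal sketch of such a residual-characteristic gate follows; both thresholds are illustrative placeholders, not values specified in the disclosure:

```python
def allow_nspt_by_residual(last_sig_pos, num_sig_coeffs,
                           max_last_pos=16, min_sig_coeffs=2):
    """Hypothetical gate: enable the inseparable primary transform only
    when the last significant coefficient occurs early in scan order
    (i.e., the residual energy is compact) and the block is not
    nearly empty."""
    return last_sig_pos < max_last_pos and num_sig_coeffs >= min_sig_coeffs
```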
According to one embodiment of the present disclosure, an image decoding apparatus may include a memory and at least one processor. The at least one processor may obtain prediction information of a current block; determine a prediction mode of the current block according to the prediction information; select a transform kernel for generating a residual block of the current block based on the prediction mode being a predetermined mode; and generate the residual block of the current block by performing an inseparable main transform on the current block using the selected transform kernel.
According to one embodiment of the present disclosure, an image encoding method performed by an image encoding apparatus may include the steps of: determining a prediction mode of a current block; selecting a transform kernel for encoding a residual block of the current block based on the prediction mode being a predetermined mode; and encoding the residual block of the current block by performing an inseparable main transform on the current block using the selected transform kernel.
According to one embodiment of the present disclosure, the predetermined mode may be one of a Combined Inter and Intra Prediction (CIIP) mode or a Geometric Partition Mode (GPM).
According to one embodiment of the present disclosure, it may also be determined whether to apply the inseparable primary transform based on the number of pixels included in the current block.
According to one embodiment of the present disclosure, a transform kernel set or a transform kernel applied to the inseparable primary transform may be adaptively selected based on information about the predetermined mode.
According to one embodiment of the present disclosure, based on the predetermined mode being the CIIP mode, the information about the predetermined mode may include at least one of a CIIP weight or a CIIP intra mode, and the CIIP intra mode may be at least one of a planar mode, a CIIP_PDPC mode, a CIIP_DIMD mode, or a CIIP_TIMD mode.
According to one embodiment of the present disclosure, based on the predetermined mode being the GPM mode, the information about the predetermined mode may include at least one of a GPM index, an angle index, or a distance index.
According to one embodiment of the present disclosure, in a computer-readable recording medium storing a bitstream generated by an image encoding method, the image encoding method may include: determining a prediction mode of a current block; selecting a transform kernel for encoding a residual block of the current block based on the prediction mode being a predetermined mode; and encoding the residual block of the current block by performing an inseparable main transform on the current block using the selected transform kernel.
According to one embodiment of the present disclosure, a method of transmitting a bitstream generated by an image encoding method may include: determining a prediction mode of a current block; selecting a transform kernel for encoding a residual block of the current block based on the prediction mode being a predetermined mode; and encoding the residual block of the current block by performing an inseparable main transform on the current block using the selected transform kernel.
Advantageous effects
According to the present disclosure, an image encoding/decoding method and apparatus having improved encoding/decoding efficiency may be provided.
According to the present disclosure, an image encoding/decoding method and apparatus for performing an inseparable main transform may be provided.
In accordance with the present disclosure, an image encoding/decoding method and apparatus for performing an inseparable main transform on a block to which a Combined Inter and Intra Prediction (CIIP) mode is applied may be provided.
In accordance with the present disclosure, an image encoding/decoding method and apparatus for performing an inseparable main transform on a block to which a Geometric Partition Mode (GPM) is applied may be provided.
In accordance with the present disclosure, an image encoding/decoding method and apparatus for performing an inseparable primary transform when decoder-side intra mode derivation (DIMD) is used in CIIP mode may be provided.
In accordance with the present disclosure, an image encoding/decoding method and apparatus for performing an inseparable primary transform when template-based intra mode derivation (TIMD) is used in CIIP mode may be provided.
According to the present disclosure, an index signaling method for the case where both separable transform kernels and non-separable main transform kernels are available may be provided.
According to the present disclosure, an image encoding/decoding method and apparatus for performing an inseparable main transform based on residual characteristics of a current block or neighboring blocks may be provided.
Further, according to the present disclosure, a non-transitory computer readable recording medium storing a bitstream generated by the image encoding method or apparatus according to the present disclosure may be provided.
Further, according to the present disclosure, a non-transitory computer-readable recording medium storing a bitstream received, decoded, and used to reconstruct an image by an image decoding apparatus according to the present disclosure may be provided.
Further, according to the present disclosure, a method of transmitting a bitstream generated by an image encoding method or apparatus according to the present disclosure may be provided.
It will be appreciated by those skilled in the art that the effects that can be achieved by the present disclosure are not limited to the effects that have been specifically described above, and other advantages of the present disclosure will be more clearly understood from the detailed description.
Drawings
Fig. 1 is a diagram schematically illustrating a video coding system to which an embodiment of the present disclosure is applicable.
Fig. 2 is a diagram schematically illustrating an image encoding apparatus to which embodiments of the present disclosure are applicable.
Fig. 3 is a diagram schematically illustrating an image decoding apparatus to which an embodiment of the present disclosure is applicable.
Fig. 4a to 4d show reference samples defined in the PDPC applied to various prediction modes.
Fig. 5 is a diagram illustrating 65 directional intra prediction modes.
Fig. 6 is a diagram illustrating a top neighboring block and a left neighboring block used in the CIIP weight derivation process.
Fig. 7 is a flowchart illustrating a CIIP mode application method using the PDPC.
Fig. 8 shows one example of GPM partitioning.
Fig. 9 is a diagram illustrating LFNST application methods.
Fig. 10 is a flowchart illustrating an encoding process of performing AMT.
Fig. 11 is a flowchart illustrating a decoding process of performing AMT.
Fig. 12 is a flowchart illustrating an encoding process of performing NSST.
Fig. 13 is a flowchart illustrating a decoding process of performing NSST.
Fig. 14 and 15 are diagrams illustrating a method of performing NSST.
Fig. 16 and 17 are diagrams illustrating a method of performing RST.
Fig. 18 is a diagram illustrating a transformation and inverse transformation process according to one embodiment of the present disclosure.
Fig. 19 is a flowchart illustrating a transformation method according to one embodiment of the present disclosure.
Fig. 20 is a flowchart illustrating an inverse transformation method according to one embodiment of the present disclosure.
Fig. 21 is a flowchart illustrating sub-block inseparable main transform/inverse transform according to one embodiment of the present disclosure.
Fig. 22a and 22b are diagrams illustrating an inseparable main transform (i.e., a forward inseparable main transform process) at the encoder stage.
Fig. 23 is a diagram illustrating a GPM for performing symmetric block partitioning.
Fig. 24 is a flowchart showing an example of a main transform method for a block to which a predetermined mode is applied.
Fig. 25 is a flowchart illustrating a method of applying the inseparable main transform according to a predetermined condition in a block to which a predetermined mode is applied.
Fig. 26 is a flowchart illustrating an image encoding method according to an embodiment of the present disclosure.
Fig. 27 is a flowchart illustrating an image decoding method according to an embodiment of the present disclosure.
Fig. 28 is a diagram illustrating a content streaming system to which an embodiment of the present disclosure is applicable.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so as to be easily implemented by those skilled in the art. However, the present disclosure may be embodied in a variety of different forms and is not limited to the embodiments described herein.
In describing the present disclosure, in the event that it is determined that a detailed description of related known functions or configurations thereof renders the scope of the present disclosure unnecessarily ambiguous, the detailed description thereof will be omitted. In the drawings, parts irrelevant to the description of the present disclosure are omitted, and like reference numerals are attached to like parts.
In this disclosure, when a component is "connected," "coupled," or "linked" to another component, it can include not only a direct connection, but also an indirect connection in which intervening components are present. In addition, when an element is "comprising" or "having" other elements, it is intended that the other elements may be further included, rather than excluded, unless stated otherwise.
In this disclosure, the terms first, second, etc. may be used solely for the purpose of distinguishing one component from another and not limitation of the order or importance of the components unless otherwise specified. Accordingly, a first component in one embodiment may be referred to as a second component in another embodiment, and similarly, a second component in one embodiment may be referred to as a first component in another embodiment, within the scope of the present disclosure.
In this disclosure, components that are distinguished from each other are intended to clearly describe each feature without implying that the components must be separated. That is, a plurality of components may be integrated and implemented in one hardware or software unit or one component may be distributed and implemented in a plurality of hardware or software units. Accordingly, such embodiments in which the components are integrated or distributed are included within the scope of the present disclosure, even if not otherwise described.
In the present disclosure, the components described in the various embodiments do not necessarily mean necessary components, and some components may be optional components. Thus, embodiments consisting of a subset of the components described in the embodiments are also included within the scope of the present disclosure. In addition, embodiments that include components other than those described in the various embodiments are included within the scope of the present disclosure.
The present disclosure relates to encoding and decoding of images, and unless newly defined in the present disclosure, terms used in the present disclosure may have general meanings commonly used in the technical field to which the present disclosure pertains.
In this disclosure, "video" may refer to a set of images that change over time.
In this disclosure, "picture" generally refers to a unit representing one image in a specific time period, a slice/tile is a unit that forms part of a picture in coding, and one picture may be composed of one or more slices/tiles. In addition, a slice/tile may include one or more Coding Tree Units (CTUs).
In this disclosure, "pixel" or "pel" may represent the smallest unit constituting one picture (or image). In addition, "sample" may be used as a term corresponding to a pixel. A sample may generally represent a pixel or a pixel value, and may represent only a pixel/pixel value of a luminance component or only a pixel/pixel value of a chrominance component.
In the present disclosure, "unit" may represent a basic unit of image processing. The unit may include at least one of a specific region of the picture and information related to the region. In some cases, the unit may be used interchangeably with terms such as "sample array", "block" or "region". In general, an mxn block may comprise samples (or an array of samples) or sets (or arrays) of M columns and N rows of transform coefficients.
In this disclosure, a "current block" may represent one of a "current encoding block", "current encoding unit", "encoding target block", "decoding target block", or "processing target block". When performing prediction, the "current block" may represent a "current prediction block" or a "prediction target block". When performing transform (inverse transform)/quantization (dequantization), a "current block" may represent a "current transform block" or a "transform target block". When filtering is performed, the "current block" may represent a "filtering target block".
In addition, in the present disclosure, a "current block" may refer to a block including both a luminance component block and a chrominance component block or a "luminance block of a current block" unless explicitly stated as a chrominance block. The luminance component block of the current block may be expressed by including an explicit description of a luminance component block such as "luminance block" or "current luminance block". In addition, the "chroma component block of the current block" may be expressed by including an explicit description of a chroma component block such as "chroma block" or "current chroma block".
In this disclosure, the terms "/" and "," should be interpreted as meaning "and/or". For example, the expressions "A/B" and "A, B" may represent "A and/or B". Further, "A/B/C" and "A, B, C" may refer to "at least one of A, B and/or C".
In this disclosure, the term "or" should be interpreted to mean "and/or". For example, the expression "A or B" may include 1) only "A", 2) only "B", and/or 3) both "A and B". In other words, in the present disclosure, the term "or" should be interpreted as meaning "additionally or alternatively".
In this disclosure, "at least one of A, B and C" may mean "only A", "only B", "only C", or "any and all combinations of A, B and C". In addition, "at least one of A, B or C" or "at least one of A, B and/or C" may mean "at least one of A, B and C".
Parentheses as used in this disclosure may mean "for example". For example, if "prediction (intra prediction)" is indicated, "intra prediction" may be proposed as an example of "prediction". In other words, "prediction" in the present disclosure is not limited to "intra prediction", and "intra prediction" may be proposed as an example of "prediction". In addition, even when "prediction (i.e., intra prediction)" is indicated, "intra prediction" may be proposed as an example of "prediction".
Overview of video coding System
Fig. 1 is a diagram illustrating a video coding system to which an embodiment of the present disclosure is applicable.
A video coding system according to one embodiment may include an encoding apparatus 10 and a decoding apparatus 20. The encoding apparatus 10 may deliver encoded video and/or image information or data to the decoding apparatus 20 in the form of a file or stream via a digital storage medium or network.
The encoding apparatus 10 according to one embodiment may include a video source generator 11, an encoding unit 12, and a transmitter 13. The decoding apparatus 20 according to one embodiment may include a receiver 21, a decoding unit 22, and a renderer 23. The encoding unit 12 may be referred to as a video/image encoding unit, and the decoding unit 22 may be referred to as a video/image decoding unit. The transmitter 13 may be included in the encoding unit 12. The receiver 21 may be included in the decoding unit 22. The renderer 23 may include a display, and the display may be configured as a separate device or an external component.
The video source generator 11 may acquire video/images through a process of capturing, synthesizing, or generating the video/images. The video source generator 11 may comprise video/image capturing means and/or video/image generating means. The video/image capturing means may comprise, for example, one or more cameras, video/image files comprising previously captured video/images, etc. Video/image generating means may include, for example, computers, tablets and smartphones, and may generate video/images (electronically). For example, virtual video/images may be generated by a computer or the like. In this case, the video/image capturing process may be replaced by a process of generating related data.
The encoding unit 12 may encode the input video/image. The encoding unit 12 may perform a series of processes such as prediction, transformation, and quantization for compression and encoding efficiency. The encoding unit 12 may output encoded data (encoded video/image information) in the form of a bit stream.
The transmitter 13 may obtain the encoded video/image information or data output in the form of a bitstream and transmit it in the form of a file or stream through a digital storage medium or network to the receiver 21 of the decoding apparatus 20 or another external object. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmitter 13 may include an element for generating a media file through a predetermined file format, and may include an element for transmission through a broadcast/communication network. The transmitter 13 may be provided as a transmitting device separate from the encoding unit 12. In this case, the transmitting device may include at least one processor that acquires the encoded video/image information or data output in the form of a bitstream, and a transmitting unit that transmits it in the form of a file or stream. The receiver 21 may extract/receive the bitstream from the storage medium or network and transmit it to the decoding unit 22.
The decoding unit 22 may decode the video/image by performing a series of processes such as dequantization, inverse transformation, and prediction corresponding to the operation of the encoding unit 12.
The renderer 23 may render the decoded video/images. The rendered video/images may be displayed by a display.
Overview of image coding apparatus
Fig. 2 is a diagram schematically illustrating an image encoding apparatus to which embodiments of the present disclosure are applicable.
As shown in fig. 2, the image encoding apparatus 100 may include an image divider 110, a subtractor 115, a transformer 120, a quantizer 130, an inverse quantizer 140, an inverse transformer 150, an adder 155, a filter 160, a memory 170, an inter prediction unit 180, an intra prediction unit 185, and an entropy encoder 190. The inter prediction unit 180 and the intra prediction unit 185 may be collectively referred to as a "predictor". The transformer 120, quantizer 130, inverse quantizer 140, and inverse transformer 150 may be included in a residual processor. The residual processor may also include a subtractor 115.
In some implementations, all or at least a portion of the plurality of components configuring image encoding device 100 may be configured by one hardware component (e.g., an encoder or processor). In addition, the memory 170 may include a Decoded Picture Buffer (DPB) and may be configured by a digital storage medium.
The image divider 110 may divide an input image (or picture or frame) input to the image encoding apparatus 100 into one or more processing units. For example, the processing unit may be referred to as a Coding Unit (CU). The coding unit may be obtained by recursively dividing a Coding Tree Unit (CTU) or Largest Coding Unit (LCU) according to a quadtree/binary tree/ternary tree (QT/BT/TT) structure. For example, one coding unit may be divided into multiple coding units of deeper depth based on a quadtree structure, a binary tree structure, and/or a ternary tree structure. For the partitioning of the coding unit, the quadtree structure may be applied first, and the binary tree structure and/or the ternary tree structure may be applied later. The encoding process according to the present disclosure may be performed based on the final coding unit that is no longer divided. The largest coding unit may be used as the final coding unit, or a coding unit of deeper depth obtained by dividing the largest coding unit may be used as the final coding unit. Here, the encoding process may include prediction, transformation, and reconstruction processes, which will be described later. As another example, the processing unit of the encoding process may be a Prediction Unit (PU) or a Transform Unit (TU). The prediction unit and the transform unit may be split or divided from the final coding unit. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and/or a unit for deriving a residual signal from the transform coefficients.
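The quadtree stage of the QT/BT/TT partitioning described above can be sketched as follows; the split decision callback is a placeholder for the encoder's rate-distortion decision, and the subsequent binary/ternary stages are omitted for brevity:

```python
def quadtree_partition(x, y, size, min_size, should_split):
    """Recursively split a square region into final coding units.
    `should_split` stands in for the encoder's split decision;
    splitting always stops once `min_size` is reached."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        units = []
        for dy in (0, half):
            for dx in (0, half):
                units += quadtree_partition(x + dx, y + dy, half,
                                            min_size, should_split)
        return units
    return [(x, y, size)]  # a final coding unit: top-left corner and size
```

For example, a 64x64 CTU whose split decision stops at 32x32 yields four final coding units.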
The prediction unit (the inter prediction unit 180 or the intra prediction unit 185) may perform prediction on a block to be processed (a current block) and generate a prediction block including prediction samples of the current block. The prediction unit may determine whether to apply intra prediction or inter prediction on the basis of the current block or CU. The prediction unit may generate various information related to the prediction of the current block and transmit the generated information to the entropy encoder 190. Information about the prediction may be encoded in the entropy encoder 190 and output in the form of a bitstream.
The intra prediction unit 185 may predict the current block by referring to samples in the current picture. The referenced samples may be located in the neighborhood of the current block or may be located apart from it, depending on the intra prediction mode and/or intra prediction technique. The intra prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a planar mode. Depending on the granularity of the prediction direction, the directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes. However, this is merely an example, and more or fewer directional prediction modes may be used according to settings. The intra prediction unit 185 may determine the prediction mode applied to the current block by using the prediction mode applied to a neighboring block.
The inter prediction unit 180 may derive a prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, sub-blocks, or samples based on the correlation of the motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction direction (L0 prediction, L1 prediction, bi-prediction, etc.) information. In the case of inter prediction, the neighboring blocks may include a spatial neighboring block present in the current picture and a temporal neighboring block present in the reference picture. The reference picture including the reference block and the reference picture including the temporal neighboring block may be the same or different. The temporal neighboring block may be referred to as a collocated reference block, a collocated CU (colCU), etc. The reference picture including the temporal neighboring block may be referred to as a collocated picture (colPic). For example, the inter prediction unit 180 may configure a motion information candidate list based on the neighboring blocks and generate information indicating which candidate is used to derive the motion vector and/or reference picture index of the current block. Inter prediction may be performed based on various prediction modes. For example, in the case of the skip mode and the merge mode, the inter prediction unit 180 may use motion information of a neighboring block as the motion information of the current block. In the case of the skip mode, unlike the merge mode, the residual signal may not be transmitted.
In the case of a Motion Vector Prediction (MVP) mode, a motion vector of a neighboring block may be used as a motion vector predictor, and a motion vector of a current block may be signaled by encoding a motion vector difference and an indicator for the motion vector predictor. The motion vector difference may represent a difference between a motion vector of the current block and a motion vector predictor.
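The MVP reconstruction described above can be sketched as follows. This is an illustrative sketch, not the normative decoding process; the candidate list, index, and difference are passed in as hypothetical inputs standing in for the signaled values.

```python
# Minimal sketch of motion-vector reconstruction in MVP mode: the decoder
# selects a predictor from a candidate list via the signaled indicator,
# then adds the decoded motion vector difference (MVD).

def reconstruct_mv(mvp_candidates, mvp_idx, mvd):
    """mvp_candidates: list of (x, y) predictors from neighboring blocks;
    mvp_idx: signaled indicator selecting one predictor;
    mvd: signaled motion vector difference (x, y)."""
    mvp = mvp_candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

For example, with candidates [(4, -2), (0, 0)], index 0, and MVD (1, 1), the reconstructed motion vector is (5, -1).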
The prediction unit may generate the prediction signal based on various prediction methods and prediction techniques described below. For example, the prediction unit may apply not only intra prediction or inter prediction but also both intra prediction and inter prediction at the same time in order to predict the current block. A prediction method that simultaneously applies both intra prediction and inter prediction to predict the current block may be referred to as combined inter and intra prediction (CIIP). In addition, the prediction unit may perform intra block copy (IBC) for prediction of the current block. Intra block copy may be used for content image/video coding such as screen content coding (SCC), e.g., for games. IBC is a method of predicting the current block using a previously reconstructed reference block located in the current picture at a predetermined distance from the current block. When IBC is applied, the position of the reference block in the current picture may be encoded as a vector (block vector) corresponding to the predetermined distance. IBC basically performs prediction within the current picture, but may be performed similarly to inter prediction in that a reference block is derived within the current picture. That is, IBC may use at least one of the inter prediction techniques described in this disclosure.
The prediction signal generated by the prediction unit may be used to generate a reconstructed signal or to generate a residual signal. The subtractor 115 may generate a residual signal (residual block or residual sample array) by subtracting the prediction signal (prediction block or prediction sample array) output from the prediction unit from the input image signal (original block or original sample array). The generated residual signal may be transmitted to the transformer 120.
The transformer 120 may generate transform coefficients by applying a transform technique to the residual signal. For example, the transform techniques may include at least one of discrete cosine transform (DCT), discrete sine transform (DST), Karhunen-Loève transform (KLT), graph-based transform (GBT), or conditionally non-linear transform (CNT). Here, GBT refers to a transform obtained from a graph when relationship information between pixels is represented by the graph. CNT refers to a transform derived based on a prediction signal generated using all previously reconstructed pixels. In addition, the transform process may be applied to square pixel blocks having the same size, or may be applied to blocks of variable size that are not square.
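As one example of the transform techniques listed above, a separable 2-D DCT-II can be sketched as below. This is a naive floating-point illustration only; actual codecs use fixed-point integer approximations of the transform matrices.

```python
import math

# Illustrative sketch: an orthonormal 1-D DCT-II applied separably
# (rows, then columns) to a residual block.

def dct2_1d(v):
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def dct2_2d(block):
    rows = [dct2_1d(r) for r in block]           # transform each row
    cols = [dct2_1d(list(c)) for c in zip(*rows)]  # then each column
    return [list(r) for r in zip(*cols)]
```

A constant residual block concentrates all its energy in the DC coefficient, e.g. `dct2_2d([[4, 4], [4, 4]])` yields 8.0 at position (0, 0) and zeros elsewhere.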
The quantizer 130 may quantize the transform coefficient and send it to the entropy encoder 190. The entropy encoder 190 may encode the quantized signal (information about quantized transform coefficients) and output a bitstream. The information on the quantized transform coefficients may be referred to as residual information. The quantizer 130 may rearrange the quantized transform coefficients of the block type into a one-dimensional vector form based on the coefficient scan order, and generate information about the quantized transform coefficients based on the quantized transform coefficients in the one-dimensional vector form.
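The rearrangement of a two-dimensional coefficient block into a one-dimensional vector can be sketched as follows. The up-right diagonal order used here is one common coefficient scan order chosen for illustration; the actual scan order is determined by the codec.

```python
# Hedged sketch: flatten a 2-D block of quantized transform coefficients
# into a 1-D vector using an up-right diagonal scan (each anti-diagonal
# traversed from bottom-left to top-right).

def diagonal_scan(block):
    h, w = len(block), len(block[0])
    order = sorted(
        ((y, x) for y in range(h) for x in range(w)),
        key=lambda p: (p[0] + p[1], -p[0]),  # anti-diagonal index, then bottom-left first
    )
    return [block[y][x] for (y, x) in order]
```

For a 2×2 block [[1, 2], [3, 4]] this yields [1, 3, 2, 4].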
The entropy encoder 190 may perform various encoding methods such as, for example, exponential Golomb coding, context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), and the like. The entropy encoder 190 may encode information necessary for video/image reconstruction other than the quantized transform coefficients (e.g., values of syntax elements, etc.) together or separately. The encoded information (e.g., encoded video/image information) may be transmitted or stored in units of network abstraction layer (NAL) units in the form of a bitstream. The video/image information may also include information about various parameter sets, such as an Adaptation Parameter Set (APS), a Picture Parameter Set (PPS), a Sequence Parameter Set (SPS), or a Video Parameter Set (VPS). In addition, the video/image information may also include general constraint information. The signaled information, transmitted information, and/or syntax elements described in this disclosure may be encoded through the above-described encoding process and included in the bitstream.
The bit stream may be transmitted over a network or may be stored in a digital storage medium. The network may include a broadcast network and/or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, blu-ray, HDD, SSD, etc. A transmitter (not shown) that transmits a signal output from the entropy encoder 190 and/or a storage unit (not shown) that stores the signal may be included as internal/external elements of the image encoding apparatus 100. Alternatively, a transmitter may be provided as a component of the entropy encoder 190.
The quantized transform coefficients output from the quantizer 130 may be used to generate a residual signal. For example, the residual signal (residual block or residual sample) may be reconstructed by applying inverse quantization and inverse transform to the quantized transform coefficients by the inverse quantizer 140 and the inverse transformer 150.
The adder 155 adds the reconstructed residual signal to the prediction signal output from the inter prediction unit 180 or the intra prediction unit 185 to generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array). If there is no residual for the block to be processed, such as in the case of applying skip mode, the prediction block may be used as a reconstructed block. Adder 155 may be referred to as a reconstructor or a reconstructed block generator. The generated reconstructed signal may be used for intra prediction of the next block to be processed in the current picture, and may be used for inter prediction of the next picture by filtering as described below.
Meanwhile, as described above, luma mapping with chroma scaling (LMCS) may be applied in the picture encoding and/or reconstruction process.
The filter 160 may improve subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 160 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture and store the modified reconstructed picture in the memory 170, specifically in the DPB of the memory 170. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, and the like. The filter 160 may generate various information related to filtering and transmit the generated information to the entropy encoder 190, as described later in the description of each filtering method. The information related to filtering may be encoded by the entropy encoder 190 and output in the form of a bitstream.
The modified reconstructed picture sent to the memory 170 may be used as a reference picture in the inter prediction unit 180. When the inter prediction is applied by the image encoding apparatus 100, prediction mismatch between the image encoding apparatus 100 and the image decoding apparatus can be avoided, and encoding efficiency can be improved.
The DPB of the memory 170 may store the modified reconstructed picture to be used as a reference picture in the inter prediction unit 180. The memory 170 may store motion information of blocks from which motion information in the current picture is derived (or encoded) and/or motion information of blocks in a picture that has been reconstructed. The stored motion information may be transmitted to the inter prediction unit 180 and used as motion information of a spatially neighboring block or motion information of a temporally neighboring block. The memory 170 may store reconstructed samples of the reconstructed block in the current picture and may transfer the reconstructed samples to the intra prediction unit 185.
Overview of image decoding apparatus
Fig. 3 is a diagram schematically illustrating an image decoding apparatus to which an embodiment of the present disclosure is applicable.
As shown in fig. 3, the image decoding apparatus 200 may include an entropy decoder 210, an inverse quantizer 220, an inverse transformer 230, an adder 235, a filter 240, a memory 250, an inter prediction unit 260, and an intra prediction unit 265. The inter prediction unit 260 and the intra prediction unit 265 may be collectively referred to as a "predictor". The inverse quantizer 220 and the inverse transformer 230 may be included in a residual processor.
According to an embodiment, all or at least a portion of the plurality of components configuring the image decoding apparatus 200 may be configured by a hardware component (e.g., a decoder or a processor). In addition, the memory 250 may include a Decoded Picture Buffer (DPB) or may be configured by a digital storage medium.
The image decoding apparatus 200, which has received the bitstream including the video/image information, may reconstruct an image by performing a process corresponding to the process performed by the image encoding apparatus 100 of fig. 2. For example, the image decoding apparatus 200 may perform decoding using a processing unit applied in the image encoding apparatus. Thus, the decoded processing unit may be, for example, an encoding unit. The coding unit may be obtained by dividing a coding tree unit or a maximum coding unit. The reconstructed image signal decoded and output by the image decoding apparatus 200 may be reproduced by a reproducing apparatus (not shown).
The image decoding apparatus 200 may receive a signal output in the form of a bitstream from the image encoding apparatus of fig. 2. The received signal may be decoded by the entropy decoder 210. For example, the entropy decoder 210 may parse the bitstream to derive information (e.g., video/image information) necessary for image reconstruction (or picture reconstruction). The video/image information may also include information about various parameter sets, such as an Adaptation Parameter Set (APS), a Picture Parameter Set (PPS), a Sequence Parameter Set (SPS), or a Video Parameter Set (VPS). In addition, the video/image information may also include general constraint information. The image decoding apparatus may also decode the picture based on the information about the parameter sets and/or the general constraint information. The signaled/received information and/or syntax elements described in this disclosure may be decoded and obtained from the bitstream through the decoding process. For example, the entropy decoder 210 may decode the information in the bitstream based on an encoding method such as exponential Golomb coding, CAVLC, or CABAC, and output values of syntax elements required for image reconstruction and quantized values of transform coefficients for a residual. More specifically, the CABAC entropy decoding method may receive a bin corresponding to each syntax element in the bitstream, determine a context model using decoding-target syntax element information, decoding information of neighboring blocks and the decoding-target block, or information of a symbol/bin decoded in a previous stage, perform arithmetic decoding on the bin by predicting the occurrence probability of the bin according to the determined context model, and generate a symbol corresponding to the value of each syntax element.
In this case, the CABAC entropy decoding method may update the context model by using information of the decoded symbol/bin of the context model for the next symbol/bin after determining the context model. The prediction-related information among the information decoded by the entropy decoder 210 may be provided to the prediction units (the inter prediction unit 260 and the intra prediction unit 265), and the residual values (i.e., quantized transform coefficients and related parameter information) on which entropy decoding is performed in the entropy decoder 210 may be input to the inverse quantizer 220. In addition, information on filtering among the information decoded by the entropy decoder 210 may be provided to the filter 240. Meanwhile, a receiver (not shown) for receiving a signal output from the image encoding apparatus may be further configured as an internal/external element of the image decoding apparatus 200, or the receiver may be a component of the entropy decoder 210.
Meanwhile, the image decoding apparatus according to the present disclosure may be referred to as a video/image/picture decoding apparatus. The image decoding apparatus may be classified into an information decoder (video/image/picture information decoder) and a sample decoder (video/image/picture sample decoder). The information decoder may include an entropy decoder 210. The sample decoder may include at least one of an inverse quantizer 220, an inverse transformer 230, an adder 235, a filter 240, a memory 250, an inter prediction unit 260, or an intra prediction unit 265.
The inverse quantizer 220 may inversely quantize the quantized transform coefficient and output the transform coefficient. The inverse quantizer 220 may rearrange the quantized transform coefficients in the form of two-dimensional blocks. In this case, the rearrangement may be performed based on the coefficient scan order performed in the image encoding apparatus. The inverse quantizer 220 may perform inverse quantization on the quantized transform coefficients by using quantization parameters (e.g., quantization step size information) and obtain the transform coefficients.
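The inverse quantization step can be sketched as below. The scalar quantization step passed in here is a hypothetical stand-in for the step derived from the quantization parameter; actual codecs use integer scaling lists and QP-dependent shifts rather than a plain multiply.

```python
# Illustrative inverse-quantization sketch: scale each quantized level in
# the 2-D coefficient block by a quantization step (qstep assumed given).

def dequantize(levels, qstep):
    """levels: 2-D list of quantized transform-coefficient levels;
    qstep: quantization step derived from the quantization parameter."""
    return [[lvl * qstep for lvl in row] for row in levels]
```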
The inverse transformer 230 may inverse transform the transform coefficients to obtain residual signals (residual blocks, residual sample arrays).
The prediction unit may perform prediction on the current block and generate a prediction block including prediction samples of the current block. The prediction unit may determine whether intra prediction or inter prediction is applied to the current block based on information about prediction output from the entropy decoder 210, and may determine a specific intra/inter prediction mode (prediction technique).
The prediction unit may generate a prediction signal based on various prediction methods (techniques) to be described later, as described in the prediction unit of the image encoding apparatus 100.
The intra prediction unit 265 may predict the current block by referring to samples in the current picture. The description of the intra prediction unit 185 applies equally to the intra prediction unit 265.
The inter prediction unit 260 may derive a prediction block of the current block based on a reference block (reference sample array) specified by a motion vector on a reference picture. In this case, in order to reduce the amount of motion information transmitted in the inter prediction mode, the motion information may be predicted in units of blocks, sub-blocks, or samples based on the correlation of the motion information between the neighboring block and the current block. The motion information may include a motion vector and a reference picture index. The motion information may also include inter prediction direction (L0 prediction, L1 prediction, bi-prediction, etc.) information. In the case of inter prediction, the neighboring blocks may include a spatial neighboring block present in the current picture and a temporal neighboring block present in the reference picture. For example, the inter prediction unit 260 may configure a motion information candidate list based on the neighboring blocks and derive the motion vector and/or reference picture index of the current block based on the received candidate selection information. Inter prediction may be performed based on various prediction modes, and the information on the prediction may include information indicating the inter prediction mode of the current block.
The adder 235 may generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the obtained residual signal to a prediction signal (prediction block, prediction sample array) output from a prediction unit (including the inter prediction unit 260 and/or the intra prediction unit 265). If there is no residual for the block to be processed, such as when a skip mode is applied, the prediction block may be used as a reconstructed block. The description of adder 155 applies equally to adder 235. Adder 235 may be referred to as a reconstructor or a reconstructed block generator. The generated reconstructed signal may be used for intra prediction of the next block to be processed in the current picture, and may be used for inter prediction of the next picture by filtering as described below.
The filter 240 may improve subjective/objective image quality by applying filtering to the reconstructed signal. For example, the filter 240 may generate a modified reconstructed picture by applying various filtering methods to the reconstructed picture and store the modified reconstructed picture in the memory 250, specifically in the DPB of the memory 250. The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, and the like.
The (modified) reconstructed picture stored in the DPB of the memory 250 may be used as a reference picture in the inter prediction unit 260. The memory 250 may store motion information of a block from which motion information in a current picture is derived (or decoded) and/or motion information of a block in a picture that has been reconstructed. The stored motion information may be transmitted to the inter prediction unit 260 so as to be used as motion information of a spatial neighboring block or motion information of a temporal neighboring block. The memory 250 may store reconstructed samples of the reconstructed block in the current picture and transmit the reconstructed samples to the intra prediction unit 265.
In the present disclosure, the embodiments described in the filter 160, the inter prediction unit 180, and the intra prediction unit 185 of the image encoding apparatus 100 may be equally or correspondingly applied to the filter 240, the inter prediction unit 260, and the intra prediction unit 265 of the image decoding apparatus 200.
Prediction sample derivation based on intra prediction mode/type
The prediction unit of the encoding apparatus/decoding apparatus may derive reference samples according to the intra prediction mode of the current block from among the neighboring reference samples of the current block, and may generate the prediction samples of the current block based on the reference samples.
For example, (i) the prediction samples may be derived based on an average value or interpolation of the neighboring reference samples of the current block, or (ii) the prediction samples may be derived based on reference samples existing in a specific (prediction) direction with respect to the prediction samples among the neighboring reference samples of the current block. The case of (i) may be referred to as a non-directional mode or non-angular mode, and the case of (ii) may be referred to as a directional mode or angular mode. Further, the prediction sample may be generated by interpolating, based on the prediction sample of the current block, a first neighboring sample located in the prediction direction of the intra prediction mode of the current block and a second neighboring sample located in the opposite direction, from among the neighboring reference samples. This may be referred to as linear interpolation intra prediction (LIP). In addition, a temporary prediction sample of the current block may be derived based on filtered neighboring reference samples, and the prediction sample of the current block may be derived by weighted-summing the temporary prediction sample with at least one reference sample derived according to the intra prediction mode from among the existing (i.e., unfiltered) neighboring reference samples. This may be referred to as position-dependent intra prediction (PDPC). In addition, a reference sample line having the highest prediction accuracy may be selected from among multiple neighboring reference sample lines of the current block, and a prediction sample may be derived using the reference samples located in the prediction direction on the selected line; intra prediction encoding may then be performed in such a manner that the used reference sample line is indicated (signaled) to the decoding apparatus. This may be referred to as multi-reference line (MRL) intra prediction or MRL-based intra prediction.
In addition, the current block may be divided into vertical or horizontal sub-partitions, with intra prediction performed based on the same intra prediction mode but with the neighboring reference samples derived and used in units of sub-partitions. That is, in this case the intra prediction mode of the current block is equally applied to the sub-partitions, but intra prediction performance may in some cases be improved by deriving and using the neighboring reference samples in units of sub-partitions. Such a prediction method may be referred to as intra sub-partitions (ISP) or ISP-based intra prediction. Specific details will be described later. In addition, when the prediction direction of a prediction sample points between neighboring reference samples, that is, when the prediction direction indicates a fractional sample position, the value of the prediction sample may be derived by interpolating a plurality of reference samples located around the prediction direction (around the fractional sample position).
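The fractional-position interpolation just mentioned can be sketched as follows. The 1/32-sample precision and the 2-tap linear filter are illustrative assumptions (typical of directional intra prediction in modern codecs), not the patent's normative filter.

```python
# Hedged sketch: derive a prediction sample when the prediction direction
# points to a fractional position between two reference samples, using a
# 2-tap linear interpolation at 1/32-sample precision (assumed precision).

def interp_ref(ref, idx, frac32):
    """ref: 1-D row of reference samples; idx: integer sample index;
    frac32: fractional offset in 1/32-sample units (0..31)."""
    return ((32 - frac32) * ref[idx] + frac32 * ref[idx + 1] + 16) >> 5
```

For instance, halfway (frac32 = 16) between reference values 0 and 32 gives 16.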
The above-described intra prediction method may be referred to as an intra prediction type to distinguish from an intra prediction mode. Intra prediction types may be referred to as various terms such as intra prediction techniques or additional intra prediction modes. For example, the intra prediction type (or additional intra prediction modes, etc.) may include at least one of LIP, PDPC, MRL or ISP described above. Information about the type of intra prediction may be encoded in the encoding device, included in the bitstream, and signaled to the decoding device. The information about the intra prediction type may be implemented in various forms such as flag information indicating whether each intra prediction type is applied or index information indicating one of several intra prediction types.
The MPM list for deriving the above-described intra prediction modes may be variously constructed according to the type of intra prediction. Alternatively, the MPM list may be generally constructed regardless of the intra prediction type.
Overview of PDPC (position-dependent intra prediction combination)
The PDPC may indicate an intra prediction method of deriving filtered reference samples by performing filtering based on a predefined filter of the PDPC, deriving temporary prediction samples of the current block based on the intra prediction mode of the current block and the filtered reference samples, and deriving the prediction samples of the current block by weighted-summing the temporary prediction samples with at least one reference sample (i.e., an unfiltered reference sample) derived according to the intra prediction mode from among the existing reference samples. Here, the predefined filter may be one of five 7-tap filters. Alternatively, the predefined filter may be one of a 3-tap filter, a 5-tap filter, and a 7-tap filter. The 3-tap filter, the 5-tap filter, and the 7-tap filter may indicate a filter having 3 filter coefficients, a filter having 5 filter coefficients, and a filter having 7 filter coefficients, respectively.
For example, the prediction result of the intra plane mode may also be modified by the PDPC.
Alternatively, for example, the PDPC may be applied, without separate signaling, to the intra planar mode, the intra DC mode, the horizontal intra prediction mode, the vertical intra prediction mode, the lower-left intra prediction mode (i.e., intra prediction mode #2) and the 8 directional intra prediction modes adjacent to it, and the upper-right intra prediction mode and the 8 directional intra prediction modes adjacent to it.
Specifically, when the PDPC is applied, the prediction sample at the (x, y) coordinates may be derived as a linear combination of the reference samples and the intra-predicted sample according to the intra prediction mode, as shown in the following formula.
pred(x,y) = (wL × R(-1,y) + wT × R(x,-1) - wTL × R(-1,-1) + (64 - wL - wT + wTL) × pred(x,y) + 32) >> 6
where R(x,-1) and R(-1,y) represent the upper and left reference samples located above and to the left of the current sample at the (x, y) coordinates, respectively, and R(-1,-1) represents the upper-left reference sample located at the upper-left corner of the current block.
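The combination formula above can be transcribed directly as below. This is a per-sample sketch only: the weight derivation for wL, wT, and wTL is mode- and position-dependent (per Table 1), so the weights are taken as inputs rather than computed.

```python
# Direct transcription of the PDPC combination formula:
# pred = (wL*R(-1,y) + wT*R(x,-1) - wTL*R(-1,-1)
#         + (64 - wL - wT + wTL)*pred + 32) >> 6

def pdpc_sample(pred, r_left, r_top, r_topleft, wL, wT, wTL):
    """pred: temporary prediction sample at (x, y);
    r_left = R(-1, y), r_top = R(x, -1), r_topleft = R(-1, -1);
    wL, wT, wTL: position-dependent weights (see Table 1)."""
    return (wL * r_left + wT * r_top - wTL * r_topleft
            + (64 - wL - wT + wTL) * pred + 32) >> 6
```

With all weights zero the formula reduces to the unmodified prediction sample, which is a quick sanity check on the 64-weight normalization.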
Further, when the PDPC is applied to an intra plane mode, an intra DC mode, a horizontal intra prediction mode, and a vertical intra prediction mode, an additional boundary filter such as a DC mode boundary filter or a vertical/horizontal mode edge filter of the existing HEVC may not be required.
Fig. 4a to 4d show the reference samples (R(x,-1), R(-1,y), R(-1,-1)) defined in the PDPC as applied to various prediction modes.
Furthermore, the weight of the PDPC can be derived from the prediction mode. The weight of the PDPC can be derived as shown in table 1 below.
TABLE 1
In position-dependent intra prediction combination (PDPC), prediction samples are generated using reference samples according to the prediction mode, and the prediction samples are then modified using neighboring reference samples. Rather than being applied to all intra prediction modes, the PDPC is restrictively applied, based on the 65 directional intra prediction modes, to the planar mode, the DC mode, mode 2 (the lower-left diagonal mode), the VDIA mode (the upper-right diagonal mode), Hor (the horizontal mode), Ver (the vertical mode), the neighboring modes of mode 2 (modes #3 to #10), and the neighboring modes of the VDIA mode (modes #58 to #65). In addition, the PDPC is not applied to all prediction samples within the block currently being encoded, but is applied variably in consideration of the block size.
Overview of DIMD (decoder-side intra mode derivation)
In DIMD, the intra prediction may be derived as a weighted average of one planar mode and two angular modes. The two angular modes may be selected from a histogram of gradients (HoG) computed from the neighboring pixels of the current block. Once the two angular modes are selected, their two predictors and the planar predictor are computed, and the weighted average of these predictors is used as the final predictor of the current block. To determine the weights, the corresponding HoG amplitude may be used for each of the two angular modes.
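The DIMD-style blending can be sketched as follows. The fixed planar weight (1/3 here) and the amplitude-proportional split are illustrative assumptions, not the normative weight derivation; only the structure (planar plus two angular predictors blended per HoG amplitudes) follows the description above.

```python
# Hedged sketch of DIMD blending: the planar predictor receives a fixed
# weight (assumed 1/3 for illustration), and the remaining weight is split
# between the two angular predictors in proportion to their HoG amplitudes.

def dimd_blend(pred_planar, pred_ang1, pred_ang2, amp1, amp2, w_planar=1/3):
    """pred_*: flat lists of co-located prediction samples;
    amp1, amp2: HoG amplitudes of the two selected angular modes."""
    w1 = (1 - w_planar) * amp1 / (amp1 + amp2)
    w2 = (1 - w_planar) * amp2 / (amp1 + amp2)
    return [w_planar * p + w1 * a1 + w2 * a2
            for p, a1, a2 in zip(pred_planar, pred_ang1, pred_ang2)]
```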
The derived intra mode may be included in the primary intra MPM list, so the DIMD process may be performed before the MPM list is constructed. The primary derived intra mode of a DIMD block may be stored with the block and used in the MPM list construction of neighboring blocks.
TIMD (template-based intra mode derivation) overview
For each intra prediction mode in the MPM list, the SATD between the prediction samples and the reconstructed samples of the template may be calculated. The two intra prediction modes with the smallest SATD may be selected as the TIMD modes. These two TIMD modes may be combined with weights (weighted summation), and such weighted intra prediction may be used to encode the current CU. The PDPC (position-dependent intra prediction combination) may be included in the TIMD derivation process. That is, the SATD calculation for deriving the TIMD modes may be performed based on prediction blocks to which the PDPC has been applied.
The costs of the two selected modes (costMode1 and costMode2) are compared using a threshold with a cost factor of 2, as follows.
costMode2<2*costMode1
When the above condition is true, the two modes are combined; otherwise, only mode 1 is used.
The weights of the modes may be calculated from their SATD costs as follows:
weight1 = costMode2 / (costMode1 + costMode2)
weight2 = 1 - weight1
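The TIMD decision and weight formulas above can be transcribed directly as below. Note that the lower-cost mode deliberately receives the larger weight, since weight1 = costMode2 / (costMode1 + costMode2).

```python
# Direct transcription of the TIMD fusion rule: blend the two best modes
# only when costMode2 < 2 * costMode1; otherwise use mode 1 alone.

def timd_weights(cost1, cost2):
    """cost1, cost2: SATD costs of the two selected modes (cost1 <= cost2).
    Returns (weight1, weight2) for the weighted summation."""
    if cost2 < 2 * cost1:
        w1 = cost2 / (cost1 + cost2)
        return w1, 1.0 - w1
    return 1.0, 0.0  # fusion disabled: only mode 1 is used
```

For example, costs (10, 15) pass the threshold and yield weights (0.6, 0.4), while costs (10, 30) fail it and fall back to mode 1 only.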
Overview of motion information derivation
Inter prediction may be performed using motion information of the current block. The encoding apparatus may derive optimal motion information for the current block through motion estimation. For example, the encoding apparatus may search for a similar reference block with high correlation, in units of fractional pixels, within a predetermined search range in the reference picture using the original block in the original picture for the current block, thereby deriving motion information. The similarity of blocks may be derived based on differences between phase-based sample values. For example, the similarity of blocks may be calculated from the SAD between the current block (or a template of the current block) and the reference block (or a template of the reference block). In this case, the motion information may be derived based on the reference block having the smallest SAD in the search region. The derived motion information may be signaled to the decoding apparatus according to various methods based on the inter prediction mode.
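The SAD-based block matching described above can be sketched as follows. This is a 1-D, integer-pel, full-search illustration only; the actual encoder search operates on 2-D blocks at fractional-pel precision with more elaborate search strategies.

```python
# Illustrative full-search motion estimation over a small search range,
# using SAD as the block-matching cost.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def motion_search(cur_block, ref_row, search_range):
    """cur_block: 1-D current-block samples; ref_row: 1-D reference row;
    returns (best_offset, best_sad) over integer offsets in the range."""
    n = len(cur_block)
    best = None
    for off in range(min(search_range, len(ref_row) - n) + 1):
        cost = sad(cur_block, ref_row[off:off + n])
        if best is None or cost < best[1]:
            best = (off, cost)
    return best
```

For example, matching [5, 6] against the row [0, 0, 5, 6, 0] finds a zero-cost match at offset 2.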
CIIP (combined inter and intra prediction) overview
CIIP may be applied to the current block. An additional flag (e.g., ciip_flag) may be signaled to indicate whether the CIIP mode is to be applied to the current CU. For example, when a CU is encoded in the merge mode, if the CU contains at least 64 luma samples (i.e., the CU width times the CU height is equal to or greater than 64), and if both the CU width and the CU height are less than 128 luma samples, an additional flag may be signaled to indicate whether the CIIP mode is applied to the current CU. As the name suggests, CIIP prediction combines an inter prediction signal with an intra prediction signal. The inter prediction signal P_inter in the CIIP mode may be derived using the same inter prediction process as applied to the regular merge mode, and the intra prediction signal P_intra may be derived following the regular intra prediction process of the planar mode. The intra prediction signal and the inter prediction signal may then be combined using a weighted averaging method, in which the weight value is calculated according to the coding modes of the top and left neighboring blocks (see fig. 6).
isIntraTop is set to 1 if the top neighboring block is available and intra coded; otherwise, isIntraTop is set to 0.
isIntraLeft is set to 1 if the left neighboring block is available and intra coded; otherwise, isIntraLeft is set to 0.
If (isIntraTop + isIntraLeft) is equal to 2, wt is set to 3.
Otherwise, if (isIntraTop + isIntraLeft) is equal to 1, wt is set to 2.
Otherwise, wt is set to 1.
The CIIP prediction is formed as follows:
P_CIIP = ((4 - wt) * P_inter + wt * P_intra + 2) >> 2
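The weight derivation and the weighted combination above can be sketched in Python as follows; `ciip_weight` and `ciip_sample` are hypothetical names, and only the integer arithmetic of the formula above is reproduced.

```python
def ciip_weight(is_intra_top: bool, is_intra_left: bool) -> int:
    # wt depends on how many of the top/left neighboring blocks are intra coded
    n_intra = int(is_intra_top) + int(is_intra_left)
    if n_intra == 2:
        return 3
    if n_intra == 1:
        return 2
    return 1

def ciip_sample(p_inter: int, p_intra: int, wt: int) -> int:
    # P_CIIP = ((4 - wt) * P_inter + wt * P_intra + 2) >> 2
    return ((4 - wt) * p_inter + wt * p_intra + 2) >> 2
```

For example, with one intra-coded neighbor (wt = 2), the inter and intra samples are averaged with equal weight and rounding.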
Overview of CIIP blending with PDPC
The CIIP mode can be extended by combining it with the PDPC. In the extended mode (CIIP_PDPC), the top reconstructed samples R_(x,-1) and the left reconstructed samples R_(-1,y) can be used to improve the prediction of the conventional merge mode. Such an improvement inherits the PDPC scheme. A prediction flowchart of the CIIP_PDPC mode is shown in fig. 7. Here, W_T and W_L are weights according to the sample position in the block, as defined in the PDPC.
The CIIP_PDPC mode can be signaled together with the CIIP mode. If the CIIP flag is true, another flag, e.g., a CIIP_PDPC flag, may be additionally signaled to indicate whether CIIP_PDPC is used.
As shown in fig. 7, the inter prediction signal predInter(x, y) of CIIP may be generated by an inter prediction process 730 with MV data as input. Here, the MV data may be motion information obtained based on the conventional merge mode, as in the existing CIIP mode.
In addition, the intra prediction signal predPdpc(x, y) of CIIP can be generated by applying the PDPC 720 to the block generated by performing the intra prediction process 710 according to the planar mode (intraPredMode = 0).
Finally, based on predInter(x, y) and predPdpc(x, y), the prediction sample P_CIIP_PDPC(x, y) at the (x, y) coordinates of the prediction block can be derived by the following formula.
P_CIIP_PDPC(x, y) = (32 + (predPdpc(x, y) << 6) + (64 - W_L - W_T) * predInter(x, y)) >> 6
Overview of GPM (geometric partition mode)
GPM may be provided as an inter prediction mode. The GPM mode may be signaled using a CU-level flag as one kind of merge mode, with the other merge modes including the regular merge mode, the MMVD mode, the CIIP mode, and the subblock merge mode. For each possible CU size W×H = 2^m × 2^n (m, n ∈ {3, ..., 6}), excluding 8×64 and 64×8, the GPM mode can support a total of 64 partitions.
Fig. 8 shows one example of GPM partitioning. Referring to fig. 8, when GPM is used, a CU is divided into two partitions by a geometrically located straight line. The position of the splitting line may be derived from the angle and offset parameters of the specific partition. Each partition of the geometric partitioning in the CU may use its own motion vector for inter prediction. Only uni-prediction is allowed for each partition; that is, each partition has one motion vector and one reference index. The uni-prediction motion constraint may be employed to ensure that, as in conventional bi-prediction, only two motion-compensated predictions are required per CU.
If GPM is used for the current CU, a geometric partition index indicating the partition mode (angle and offset) of the geometric partitioning and two merge indices (one for each partition) are further signaled. The maximum number of GPM merge candidates is explicitly signaled in the SPS and specifies the syntax binarization of the GPM merge indices. After predicting each partition of the geometric partitioning, the sample values along the geometric partition edge may be adjusted using a blending process with adaptive weights. The result is the prediction signal for the whole CU, and the transform and quantization processes are applied to the whole CU as in other prediction modes.
Overview of transformation/inverse transformation
As described above, the encoding apparatus may derive a residual block (residual sample) based on a block (prediction block) predicted by intra/inter/IBC prediction, and derive a quantized transform coefficient by applying transform and quantization to the derived residual sample. Information about quantized transform coefficients (residual information) may be included and encoded in a residual coding syntax and output in the form of a bitstream. The decoding apparatus may acquire and decode information (residual information) on the quantized transform coefficients from the bitstream to derive the quantized transform coefficients. The decoding device may derive residual samples by dequantizing/inverse transforming based on the quantized transform coefficients. As described above, at least one of quantization/dequantization and/or transform/inverse transform may be skipped. When quantization/dequantization is skipped, the quantized transform coefficients may be referred to as transform coefficients. When skipping the transform/inverse transform, the transform coefficients may be referred to as coefficients or residual coefficients, or for consistency of the expression, may still be referred to as transform coefficients. Whether to skip the transform/inverse transform may be signaled based on the transform_skip_flag.
In addition, in the present disclosure, the quantized transform coefficients and the transform coefficients may be referred to as transform coefficients and scaled transform coefficients, respectively. In this case, the residual information may include information about the transform coefficient, and the information about the transform coefficient may be signaled through a residual coding syntax. The transform coefficients may be derived based on residual information (or information about the transform coefficients), and the scaled transform coefficients may be derived by inverse transformation (scaling) of the transform coefficients. Residual samples may be derived based on an inverse transform (transform) of the scaled transform coefficients. This may be similarly applied/expressed in other parts of the disclosure.
The transform/inverse transform may be performed based on a transform kernel. For example, a Multiple Transform Selection (MTS) scheme is applicable in accordance with the present disclosure. In this case, some of a plurality of transform kernels may be selected and applied to the current block. A transform kernel may be referred to by various terms such as transform matrix or transform type. For example, a transform kernel set may indicate a combination of a vertical-direction transform kernel (vertical transform kernel) and a horizontal-direction transform kernel (horizontal transform kernel).
For example, MTS index information (or mts_idx syntax element) may be generated/encoded in the encoding device and signaled to the decoding device to indicate one of the transform core sets. For example, the set of transform kernels may be derived from the value of the MTS index information, as shown in table 2.
TABLE 2
tu_mts_idx[x0][y0] 0 1 2 3 4
trTypeHor 0 1 2 1 2
trTypeVer 0 1 1 2 2
Table 2 shows the trTypeHor and trTypeVer values according to tu_mts_idx[x0][y0].
For example, the transform core set may be determined as shown in table 3 based on the cu_sbt_horizontal_flag and the cu_sbt_pos_flag.
TABLE 3
cu_sbt_horizontal_flag cu_sbt_pos_flag trTypeHor trTypeVer
0 0 2 1
0 1 1 1
1 0 1 2
1 1 1 1
Table 3 shows the trTypeHor and trTypeVer values according to cu_sbt_horizontal_flag and cu_sbt_pos_flag. Here, cu_sbt_horizontal_flag equal to 1 may indicate that the current coding unit is horizontally divided into two transform blocks. In contrast, cu_sbt_horizontal_flag equal to 0 may indicate that the current coding unit is vertically divided into two transform blocks. In addition, cu_sbt_pos_flag equal to 1 may indicate that the syntax elements tu_cbf_luma, tu_cbf_cb, and tu_cbf_cr of the first transform unit in the current coding unit are not present in the bitstream. In contrast, cu_sbt_pos_flag equal to 0 may indicate that the syntax elements tu_cbf_luma, tu_cbf_cb, and tu_cbf_cr of the second transform unit in the current coding unit are not present in the bitstream.
Further, in Tables 2 and 3, trTypeHor may represent the horizontal-direction transform kernel, and trTypeVer may represent the vertical-direction transform kernel. A trTypeHor/trTypeVer value of 0 may indicate DCT2, a value of 1 may indicate DST7, and a value of 2 may indicate DCT8. However, this is an example, and different values may be mapped to different DCT/DST types by prior agreement.
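The mappings of Tables 2 and 3 can be sketched as simple lookups; the function names and dictionary layout below are illustrative only.

```python
# trType values per the text above: 0 = DCT2, 1 = DST7, 2 = DCT8.
MTS_KERNELS = {0: (0, 0), 1: (1, 1), 2: (2, 1), 3: (1, 2), 4: (2, 2)}  # Table 2

SBT_KERNELS = {  # Table 3, keyed by (cu_sbt_horizontal_flag, cu_sbt_pos_flag)
    (0, 0): (2, 1),
    (0, 1): (1, 1),
    (1, 0): (1, 2),
    (1, 1): (1, 1),
}

def mts_kernels(tu_mts_idx: int):
    """Return (trTypeHor, trTypeVer) for a given tu_mts_idx."""
    return MTS_KERNELS[tu_mts_idx]

def sbt_kernels(cu_sbt_horizontal_flag: int, cu_sbt_pos_flag: int):
    """Return (trTypeHor, trTypeVer) for the given SBT flags."""
    return SBT_KERNELS[(cu_sbt_horizontal_flag, cu_sbt_pos_flag)]
```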
Table 4 exemplarily shows the basis functions of the above DCT2, DCT8, and DST 7.
TABLE 4
In the present disclosure, an MTS-based transform is applied as the primary transform, and a secondary transform may also be applied. The secondary transform may be applied only to the coefficients in the upper-left W×H region of the coefficient block to which the primary transform has been applied, and may be referred to as a reduced secondary transform (RST). For example, W and/or H may be 4 or 8. In the transform, the primary transform and the secondary transform may be sequentially applied to the residual block, and in the inverse transform, the inverse secondary transform and the inverse primary transform may be sequentially applied to the transform coefficients. The secondary transform (RST) may be referred to as a low frequency coefficient transform (LFCT) or a low frequency inseparable transform (LFNST). The inverse secondary transform may be referred to as the inverse LFCT or the inverse LFNST.
Fig. 9 is a diagram showing LFNST application methods.
Referring to fig. 9, lfnst is applied between forward main transform 911 and quantization 913 at the encoder stage, and between dequantization 921 and inverse main transform (or main inverse transform) 923 at the decoder stage.
In LFNST, a 4×4 inseparable transform or an 8×8 inseparable transform may be (selectively) applied according to the block size. For example, the 4×4 LFNST may be applied to relatively small blocks (i.e., min(width, height) < 8), and the 8×8 LFNST may be applied to larger blocks (i.e., min(width, height) > 4). In fig. 9, as an example, it is shown that the 4×4 forward LFNST is applied to 16 input coefficients and the 8×8 forward LFNST is applied to 64 input coefficients. In addition, in fig. 9, as an example, it is shown that the 4×4 inverse LFNST is applied to 8 input coefficients and the 8×8 inverse LFNST is applied to 16 input coefficients.
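The size-dependent LFNST selection and the forward input/output coefficient counts described above can be sketched as follows, assuming the selection is made by min(width, height); the function names are hypothetical.

```python
def lfnst_size(width: int, height: int) -> int:
    # 4x4 LFNST for relatively small blocks, 8x8 LFNST for larger blocks
    return 4 if min(width, height) < 8 else 8

def forward_lfnst_io(size: int):
    # (input, output) coefficient counts of the forward LFNST:
    # 4x4 forward LFNST maps 16 inputs to 8 outputs,
    # 8x8 forward LFNST maps 64 inputs to 16 outputs
    return (16, 8) if size == 4 else (64, 16)
```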
In LFNST, a total of four transform sets and two inseparable transform matrices (kernels) for each transform set may be used. As shown in table 5, a mapping from intra prediction modes to transform sets may be predefined.
TABLE 5
IntraPredMode Transform set index
IntraPredMode<0 1
0<=IntraPredMode<=1 0
2<=IntraPredMode<=12 1
13<=IntraPredMode<=23 2
24<=IntraPredMode<=44 3
45<=IntraPredMode<=55 2
56<=IntraPredMode<=80 1
81<=IntraPredMode<=83 0
Referring to Table 5, if one of the three CCLM modes with prediction mode numbers 81 to 83 (i.e., 81 <= IntraPredMode <= 83) is used for the current chroma block, transform set 0 may be selected for the current chroma block. For each transform set, the selected inseparable secondary transform candidate may be additionally specified by an explicitly signaled LFNST index. The index may be signaled in the bitstream once per intra CU, after the transform coefficients.
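The intra-mode-to-transform-set mapping of Table 5 can be written as a chain of range checks; `lfnst_transform_set` is a hypothetical name.

```python
def lfnst_transform_set(intra_pred_mode: int) -> int:
    """Return the LFNST transform set index per Table 5."""
    if intra_pred_mode < 0:
        return 1
    if intra_pred_mode <= 1:
        return 0
    if intra_pred_mode <= 12:
        return 1
    if intra_pred_mode <= 23:
        return 2
    if intra_pred_mode <= 44:
        return 3
    if intra_pred_mode <= 55:
        return 2
    if intra_pred_mode <= 80:
        return 1
    return 0  # CCLM modes 81..83 map to transform set 0
```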
Further, the transform/inverse transform may be performed in units of CUs or TUs. That is, the transform/inverse transform is applicable to the residual samples in a CU or the residual samples in a TU. The CU size may be equal to the TU size, or multiple TUs may exist in the CU region. Further, the CU size may generally indicate the luma component (sample) CB size. The TU size may generally indicate the luma component (sample) TB size. The chroma component (sample) CB or TB size may be derived based on the luma component (sample) CB or TB size according to the component ratio (e.g., 4:4:4, 4:2:2, 4:2:0, etc.) of the color format (chroma format).
As described above, a transform may be applied to the residual block. This is to de-correlate the residual block as much as possible, concentrate the coefficients at low frequencies, and create a zero tail at the end of the block. The transform part in the JEM software includes two main functions: the core transform and the secondary transform. The core transform consists of DCT (discrete cosine transform) and DST (discrete sine transform) family transforms applied to all rows and columns of the residual block. The secondary transform may then be additionally applied to the upper-left corner of the output of the core transform. Similarly, the inverse transforms may be applied in the following order: the inverse secondary transform and then the inverse core transform. Specifically, the inverse secondary transform may be applied to the upper-left corner of the coefficient block. The inverse core transform is then applied to the rows and columns of the output of the inverse secondary transform. The core transform/inverse transform may be referred to as the primary transform/inverse transform.
Overview of AMT (adaptive multi-core transformation)
In addition to the existing DCT-2 and 4×4 DST-7, an adaptive multiple transform or explicit multiple transform (AMT or EMT) technique may be used for the residual coding of inter-coded and intra-coded blocks. Hereinafter, AMT and EMT will be used interchangeably. In AMT, multiple transforms selected from the DCT/DST family may be used in addition to the existing transforms. The transform matrices newly introduced in JEM are DST-7, DCT-8, DST-1, and DCT-5. The basis functions of the DST/DCT types used in AMT are shown in Table 6.
TABLE 6
The EMT may be applied to CUs having a width and height of less than or equal to 64, and whether the EMT is applied may be controlled by a CU-level flag. For example, if the CU level flag is 0, DCT-2 is applied to the CU to encode the residual. For luma coded blocks within a CU to which EMT is applied, two additional flags are signaled to identify the horizontal and vertical transforms to be used. In JEM, the residual of a block may be encoded in transform skip mode. For intra residual coding, a mode dependent transform candidate selection procedure may be used due to different residual statistics for different intra prediction modes. For example, three transform subsets may be defined as shown in table 7, and the transform subsets may be selected based on the intra prediction modes as shown in table 8.
TABLE 7
TABLE 8
With the subset concept, for a CU whose CU-level emt_cu_flag is equal to 1, the transform subset is first identified using the intra prediction mode of the CU, based on the mapping in Table 8. Then, for each of the horizontal (emt_tu_horizontal_flag) and vertical (emt_tu_vertical_flag) transforms, one of the two transform candidates in the identified transform subset (Table 7) may be selected based on explicit signaling using the corresponding flag.
TABLE 9
Table 9 shows the set of transformation configurations for which AMT is applied.
Referring to table 9, transform configuration groups are determined based on the prediction modes, and the number of groups may be 6 (G0 to G5) in total. In addition, G0 to G4 correspond to a case where intra prediction is applied, and G5 may indicate a transform combination (or a transform set, a transform combination set) applied to a residual block generated by inter prediction.
One transform combination consists of a horizontal transform (or row transform) applied to the rows of the corresponding 2D block and a vertical transform (or column transform) applied to the columns.
Here, all the transformation configuration groups may each have four transformation combination candidates. Four transform combination candidates may be selected or determined by transform combination indexes of 0 to 3, and the transform combination indexes may be encoded and transmitted from the encoder to the decoder.
For example, the residual data (or residual signal) obtained through intra prediction may have different statistical characteristics according to the intra prediction mode. Thus, as shown in Table 9, transforms other than the general cosine transform may be applied for each intra prediction mode. In this disclosure, a transform type may be expressed as, for example, DCT type 2, DCT-II, or DCT-2.
Table 9 shows a case where 35 intra prediction modes are used and a case where 67 intra prediction modes are used. Multiple transform combinations may be applied to each transform configuration group divided in each intra prediction mode column. For example, the plurality of transform combinations may be composed of four (row-direction transform, column-direction transform) combinations. As a specific example, in group 0 DST-7 and DCT-5 may be applied in both row (horizontal) and column (vertical) directions, so a total of 4 combinations are possible.
Since a total of four transform kernel combinations may be applied to each intra prediction mode, a transform combination index for selecting one of them may be transmitted for each transform unit. In this disclosure, the transform combination index may be referred to as an AMT index and may be expressed as amt_idx.
In addition, in addition to the transform kernels presented in table 9, there may be cases where DCT-2 is optimal in both row and column directions due to the nature of the residual signal. Thus, the transformation can be adaptively applied by defining AMT flags for each coding unit. Here, if the AMT flag is 0, DCT-2 is applied to both the row direction and the column direction, and if the AMT flag is 1, one of four combinations can be selected or determined by the AMT index.
For example, when the AMT flag is 1 and the number of transform coefficients of one transform unit is less than 3, the transform kernels in Table 9 are not applied and DST-7 may be applied to both the row direction and the column direction.
For example, if the transform coefficient value is first parsed and the number of transform coefficients is less than 3, the AMT index is not parsed and DST-7 is applied, thereby reducing the amount of additional information transmitted.
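The AMT flag and coefficient-count rules above can be sketched at the decoder as follows; the function and kernel-name strings are illustrative, and the Table 9 lookup is abstracted behind a callback.

```python
def select_amt_kernels(amt_flag: int, num_coeffs: int, parse_amt_index):
    """Decoder-side kernel selection sketch, returning (horizontal, vertical).

    parse_amt_index stands in for reading amt_idx from the bitstream and
    mapping it to a kernel pair via Table 9.
    """
    if amt_flag == 0:
        # DCT-2 in both the row and column directions
        return ("DCT-2", "DCT-2")
    if num_coeffs < 3:
        # AMT index is not parsed; DST-7 is applied in both directions
        return ("DST-7", "DST-7")
    return parse_amt_index()
```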
As an example, AMT may be applied only when the width and height of the transform unit are both 32 or less.
As an example, table 9 may be preset through offline training.
As an example, an AMT index may be defined as one index that may indicate a combination of horizontal and vertical transforms at the same time. Alternatively, the AMT index may be defined as separate horizontal and vertical transform indexes.
Fig. 10 is a flowchart illustrating an encoding process of performing AMT.
AMT may be applied regardless of whether the transform is the primary transform or the secondary transform. In other words, there is no restriction that AMT be applied to only one of the two; it is applicable to both. Here, the primary transform may refer to a transform for initially transforming the residual block, and the secondary transform may refer to a transform applied to the block generated as a result of the primary transform.
First, the encoding apparatus may determine a transform group corresponding to a current block (S1010). Here, the transform group may refer to the transform group described above with reference to table 9, but is not limited thereto, and may be composed of other transform combinations.
The encoding device may perform a transform on candidate transform combinations available within the transform group (S1020).
As a result of performing the transform, the encoding device may determine or select a transform combination having the lowest RD (rate distortion) cost (S1030).
The encoding apparatus may encode a transform combination index corresponding to the selected transform combination (S1040).
Fig. 11 is a flowchart showing a decoding process of performing AMT.
First, the decoding apparatus may determine a transform group of the current block (S1110).
The decoding apparatus may parse the transform combination index (S1120). Here, the transform combination index may correspond to one of a plurality of transform combinations in the transform group. The step S1110 of determining the transform group and the step S1120 of parsing the transform combination index may be performed simultaneously.
The decoding apparatus may derive a transform combination corresponding to the transform combination index (S1130). Here, the transform combination may refer to the transform combination described above with reference to table 9, but is not limited thereto. In other words, configurations using different combinations of transforms are also possible.
The decoding apparatus may perform an inverse transform on the current block based on the transform combination (S1140). If the transform combination consists of a row transform and a column transform, the row transform may be applied first, and then the column transform may be applied. However, the order is not limited thereto and may be reversed.
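A separable inverse transform as in step S1140 can be illustrated with orthonormal DCT-2 kernels, applying the row transform first and then the column transform. This is a floating-point numerical sketch, not the integer-arithmetic implementation of an actual codec.

```python
import numpy as np

def dct2_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-2 basis; rows are basis vectors
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= np.sqrt(1.0 / n)
    m[1:] *= np.sqrt(2.0 / n)
    return m

def forward_separable(resid: np.ndarray, hor: np.ndarray, ver: np.ndarray) -> np.ndarray:
    # Vertical kernel applied to the columns, horizontal kernel to the rows
    return ver @ resid @ hor.T

def inverse_separable(coeffs: np.ndarray, hor: np.ndarray, ver: np.ndarray) -> np.ndarray:
    # For orthonormal kernels the inverse is the transpose:
    # row (horizontal) transform first, then column (vertical) transform
    return ver.T @ (coeffs @ hor)
```

A forward transform followed by the inverse reconstructs the residual block exactly (up to floating-point error), which is the round-trip property the codec relies on.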
Overview of secondary transform (NSST) index coding
For the secondary transform/inverse transform, a mode-dependent inseparable secondary transform (MDNSST) may be applied. To maintain low complexity, MDNSST may be applied only to the low-frequency coefficients after the primary transform. If both the width (W) and the height (H) of the transform coefficient block are greater than or equal to 8, an 8×8 inseparable secondary transform is applied to the upper-left 8×8 region of the transform coefficient block. In contrast, if the width or height is less than 8, a 4×4 inseparable secondary transform is applied, and the 4×4 inseparable secondary transform may be performed on the upper-left min(8, W) × min(8, H) region of the transform coefficient block. Here, min(A, B) is a function that outputs the smaller of A and B.
There may be a total of 35 x 3 inseparable secondary transforms for both 4x 4 and 8 x 8 block sizes. Here, 35 may mean the number of transform sets specified by the intra prediction mode, and 3 may mean the number of NSST candidates for each intra prediction mode. The mapping from intra prediction modes to transform sets may be defined as shown in table 10.
TABLE 10
To indicate the transform kernel within the transform set, an NSST index (NSST idx) may be encoded. If NSST is not applied, an NSST index with a value of 0 may be signaled.
The secondary transform (e.g., MDNSST) is not applied to blocks encoded in the transform skip mode. If the MDNSST index is signaled for a CU and is not zero, MDNSST is not used for the blocks of a component encoded in the transform skip mode within the CU. The overall coding structure including the coefficient coding and the NSST index coding is shown in figs. 12 and 13. A coded block flag (CBF) is encoded to determine whether coefficient coding and NSST index coding are performed. In figs. 12 and 13, the CBF flag may represent a luma block CBF flag (cbf_luma flag) or a chroma block CBF flag (cbf_cb flag or cbf_cr flag). The transform coefficients are encoded when the CBF flag is 1.
Fig. 12 is a flowchart showing an encoding process of performing NSST.
Referring to fig. 12, the encoding apparatus checks whether the CBF flag is 1 (S1210). If the CBF flag is 0 (no in S1210), the encoding apparatus does not perform transform coefficient encoding and NSST index encoding. In contrast, when the CBF flag is 1 (yes in S1210), the encoding apparatus encodes the transform coefficients (S1220). Then, the encoding apparatus determines whether to perform NSST index encoding (S1230) and, if so, performs NSST index encoding (S1240). If NSST index encoding is not applied (no in S1230), the encoding apparatus may terminate the transform process and perform the subsequent steps (e.g., quantization) without applying NSST.
Fig. 13 is a flowchart showing a decoding process of performing NSST.
Referring to fig. 13, the decoding apparatus checks whether the CBF flag is 1 (S1310). If the CBF flag is 0 (no in S1310), the decoding apparatus does not perform transform coefficient decoding and NSST index decoding. In contrast, when the CBF flag is 1 (yes in S1310), the decoding apparatus decodes the transform coefficients (S1320). Thereafter, the decoding apparatus determines whether the NSST index has been encoded (S1330) and parses the NSST index (S1340).
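The CBF-gated decoding flow of fig. 13 can be sketched as follows; the reader callbacks stand in for the entropy decoder and are hypothetical.

```python
def decode_transform_unit(cbf_flag: int, read_coeffs, read_nsst_index):
    """Sketch of the fig. 13 flow: the coefficients and the NSST index are
    decoded only when the CBF flag is 1; otherwise the NSST index
    defaults to 0 (NSST not applied)."""
    if cbf_flag == 0:
        return None, 0
    coeffs = read_coeffs()
    nsst_idx = read_nsst_index()
    return coeffs, nsst_idx
```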
NSST may be applied not to the entire block (e.g., the entire TU) to which the primary transform has been applied, but only to the upper-left 8×8 or 4×4 region of the block. For example, if the block size is 8×8 or more, the 8×8 NSST may be applied, and if the block size is less than 8×8, the 4×4 NSST may be applied. In addition, when the 4×4 NSST is applied, it may be applied to each 4×4 block. Both the 8×8 NSST and the 4×4 NSST follow the transform set configuration described above. The 8×8 NSST has 64 input data and 64 output data, and the 4×4 NSST has 16 inputs and 16 outputs.
Figs. 14 and 15 are diagrams showing a method of performing NSST. Fig. 14 shows a Givens rotation, and fig. 15 shows one round of a 4×4 NSST consisting of Givens rotation layers and permutations.
Both the 8×8 NSST and the 4×4 NSST may consist of a hierarchical combination of Givens rotations. The matrix corresponding to one Givens rotation is shown in Equation 1, and the matrix product is illustrated in fig. 14.
[Equation 1]
R(θ) = [cos θ, -sin θ; sin θ, cos θ]
As shown in fig. 14, by applying the matrix of Equation 1 to the two input data x_m and x_n, the two output data t_m = x_m·cos θ - x_n·sin θ and t_n = x_m·sin θ + x_n·cos θ can be obtained.
Since one Givens rotation rotates two data, 32 Givens rotations are required to process 64 data (for the 8×8 NSST) and 8 Givens rotations are required to process 16 data (for the 4×4 NSST). Thus, a Givens rotation layer is constructed by grouping 32 or 8 Givens rotations.
As shown in fig. 15, the output data of one Givens rotation layer is forwarded, through a permutation (or shuffling), as the input data of the next Givens rotation layer. As shown in fig. 15, the permutation patterns are regularly defined, and, in the case of the 4×4 NSST, four Givens rotation layers and the corresponding permutations form one round. The 4×4 NSST is performed in 2 rounds, and the 8×8 NSST is performed in 4 rounds. Different rounds use the same permutation pattern, but the applied Givens rotation angles are different. Therefore, the angle data configuring all the Givens rotations must be stored for each transform.
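A Givens rotation and one rotation layer can be sketched as follows, assuming the standard 2D rotation t_m = x_m·cos θ - x_n·sin θ, t_n = x_m·sin θ + x_n·cos θ; the pairing and angle tables of an actual NSST kernel are not reproduced here.

```python
import math

def givens_rotation(xm: float, xn: float, theta: float):
    # t_m = x_m*cos(theta) - x_n*sin(theta); t_n = x_m*sin(theta) + x_n*cos(theta)
    c, s = math.cos(theta), math.sin(theta)
    return xm * c - xn * s, xm * s + xn * c

def givens_layer(data, pairs, angles):
    """One Givens rotation layer: each (m, n) index pair is rotated by its
    angle. For the 4x4 NSST, 16 data are processed by 8 rotations per layer."""
    out = list(data)
    for (m, n), theta in zip(pairs, angles):
        out[m], out[n] = givens_rotation(data[m], data[n], theta)
    return out
```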
As a final step, the data output through the Givens rotation layer may be finally subjected to one permutation, and information about the permutation may be stored separately for each transformation. In the case of forward NSST, the permutation may be performed last, and in the case of reverse NSST, the reverse process of the permutation (i.e., reverse direction permutation or reverse permutation) may be performed first.
In the case of the backward NSST, the Givens rotation layers and permutations applied in the forward NSST are applied in reverse order, and the negative value of each Givens rotation angle is taken.
Overview of the reduced secondary transform (RST)
Fig. 16 and 17 are diagrams showing a method of executing RST.
Assuming that the orthogonal matrix representing one transform has an N×N form, a reduced transform (RT) keeps only R (R < N) of the N transform basis vectors. The matrix of the forward RT that generates the transform coefficients may be defined as Equation 2.
[Equation 2]
T_(R×N) = [t_1, t_2, ..., t_R]^T, where t_i (1 <= i <= R) denotes the i-th transform basis (row) vector of length N
Since the matrix for the backward RT is the transpose of the forward RT matrix, the application of the forward RT and the backward RT can be shown as in figs. 16 (a) and (b), respectively.
The RT applied to the upper-left 8×8 block of the transform coefficient block to which the primary transform has been applied may be referred to as the 8×8 RST. When the value of R is set to 16 in Equation 2, the forward 8×8 RST has the form of a 16×64 matrix, and the backward 8×8 RST has the form of a 64×16 matrix. The transform set configuration shown in Table 10 can also be applied to the 8×8 RST. That is, the 8×8 RST can be determined from the transform sets in Table 10. Since one transform set consists of two or three transforms according to the intra prediction mode, one of up to four transforms, including the case where no secondary transform is applied (this case may correspond to an identity matrix), may be selected. Assuming that indices 0, 1, 2, and 3 are assigned to the four transforms (e.g., index 0 may be assigned to the identity matrix, i.e., the case where no secondary transform is applied), the applied transform may be specified by signaling a syntax element corresponding to the NSST index for each transform coefficient block. That is, for the upper-left 8×8 block, through the NSST index, the 8×8 NSST may be specified in the NSST configuration, and the 8×8 RST may be specified in the RST configuration.
When the forward 8×8 RST shown in Equation 2 above is applied, 16 significant transform coefficients are generated. That is, the 64 input data constituting the 8×8 region are reduced to 16 output data, and, from the viewpoint of the two-dimensional region, only 1/4 of the region is filled with significant transform coefficients. Accordingly, the 16 output data obtained by applying the forward 8×8 RST fill the upper-left 4×4 region of fig. 17.
Fig. 17 shows a process of performing inverse scanning from the 64th coefficient to the 17th coefficient according to the inverse scanning order.
In fig. 17, the upper left 4×4 region becomes an ROI (region of interest) filled with significant transform coefficients, and the remaining region is left empty. The empty region may be filled with a default value of 0. If a non-zero significant transform coefficient is found in a region other than the ROI of fig. 17, it is determined that 8×8RST is not applied, and thus NSST index encoding may be skipped. In contrast, if non-zero transform coefficients are not found in the region other than the ROI of fig. 17 (i.e., the region other than the ROI is filled with 0), 8×8RST may be applied, and thus the NSST index may be encoded. Such conditional NSST index encoding may be performed after the residual encoding process since the presence or absence of non-zero transform coefficients needs to be checked.
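The conditional NSST index parsing described above can be sketched as a scan for nonzero coefficients outside the upper-left 4×4 ROI; the function name is hypothetical.

```python
def may_parse_nsst_index(coeff_block) -> bool:
    """Return True if the NSST index may be parsed for an 8x8 RST block.

    The forward 8x8 RST leaves significant coefficients only in the
    upper-left 4x4 ROI; a nonzero coefficient anywhere else means the
    8x8 RST was not applied, so NSST index coding is skipped."""
    for r, row in enumerate(coeff_block):
        for c, v in enumerate(row):
            if v != 0 and (r >= 4 or c >= 4):
                return False
    return True
```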
In this disclosure, NSST/RT/RST may be collectively referred to as LFNST, and NSST index or (R) ST index may be collectively referred to as LFNST index. The LFNST may be applied in an inseparable transform format based on a transform kernel (transform matrix or transform matrix kernel) of low-frequency transform coefficients located in an upper left region of the transform coefficient block.
Fig. 18 is a diagram illustrating a transformation and inverse transformation process according to one embodiment of the present disclosure. In fig. 18, a transforming unit 1810 may correspond to the transformer 120 of fig. 2, and an inverse transforming unit 1820 may correspond to the inverse transformer 150 of fig. 2 or the inverse transformer 230 of fig. 3.
Referring to fig. 18, the transformation unit 1810 may include a main transformation unit 1811 and a sub transformation unit 1812.
The main transform unit 1811 may generate (main) transform coefficients B by applying a main transform to the residual samples a. In this disclosure, the primary transform may be referred to as a core transform.
The primary transform may be performed based on the MTS scheme. When the existing MTS is applied, a transform from the spatial domain to the frequency domain is applied to the residual signal (or residual block) based on DCT type 2, DST type 7, or DCT type 8, and transform coefficients (or primary transform coefficients) may be generated. Herein, DCT type 2, DST type 7, and DCT type 8 may be referred to as transform types, transform kernels, or transform cores. Examples of the basis functions of DCT type 2, DST type 7, and DCT type 8 are described above with reference to Table 4. However, this is an example, and embodiments of the present disclosure may be applied even when the configuration of the existing MTS kernels is different, that is, even when it includes different types of DCT/DST or a transform skip.
The existing MTS has the form of a separable transform in which one kernel is applied in the horizontal direction and one kernel is applied in the vertical direction. It is well known that an inseparable transform kernel provides higher encoding/decoding efficiency than a separable transform kernel, but the inseparable transform method has not been used in the conventional primary transform.
Thus, according to embodiments of the present disclosure, the main transformation may be performed based on the inseparable transformation core. In this disclosure, a primary transform based on an inseparable transform core may be referred to as an inseparable primary transform or an inseparable core transform.
The inseparable primary transform may replace at least one of the existing MTS candidates or may be added as a new MTS candidate. For example, only DCT type 2 and the inseparable primary transform may be used as MTS candidates, or the inseparable primary transform may be used as an MTS candidate in addition to DCT type 2, DST type 7, and DCT type 8.
Since the inseparable primary transform is included in the MTS candidates, the table for the MTS index (e.g., tu_mts_idx[x0][y0]) in Table 2 above may be modified, for example, as shown in Table 11 or Table 12.
TABLE 11
tu_mts_idx[x0][y0] 0 1 2
trTypeHor 0 1 2
trTypeVer 0 1 2
TABLE 12
tu_mts_idx[x0][y0] 0 1 2 3 4 5
trTypeHor 0 1 2 1 2 3
trTypeVer 0 1 1 2 2 3
In Tables 11 and 12, trTypeHor may represent the horizontal direction transform kernel, and trTypeVer may represent the vertical direction transform kernel. A trTypeHor/trTypeVer value of 0 may indicate DCT type 2, a value of 1 may indicate DST type 7, and a value of 2 may indicate DCT type 8. In addition, a trTypeHor/trTypeVer value of 3 may indicate the inseparable primary transform. However, this is merely an example, and embodiments of the present disclosure are not limited thereto.
Since the inseparable primary transform has the characteristic that the horizontal direction transform and the vertical direction transform are not separated, the transform kernel of the inseparable primary transform needs to be the same for the horizontal and vertical directions. Thus, according to one embodiment of the present disclosure, when trTypeHor has the value indicating the inseparable primary transform, trTypeVer may also be constrained to have the value indicating the inseparable primary transform. For example, in Table 12, if the trTypeHor value is 3, the trTypeVer value may not be 0 to 2 and may be constrained to only 3.
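The constraint on Table 12 can be sketched as follows. The dictionary transcribes Table 12 (0 = DCT type 2, 1 = DST type 7, 2 = DCT type 8, 3 = inseparable primary transform), and the check mirrors the restriction that trTypeVer must be 3 whenever trTypeHor is 3; the function name is illustrative, not part of any specification.

```python
# Illustrative sketch (not normative) of Table 12: mapping tu_mts_idx to the
# (trTypeHor, trTypeVer) kernel pair.
MTS_TABLE_12 = {
    0: (0, 0),  # DCT-2 / DCT-2
    1: (1, 1),  # DST-7 / DST-7
    2: (2, 1),  # DCT-8 / DST-7
    3: (1, 2),  # DST-7 / DCT-8
    4: (2, 2),  # DCT-8 / DCT-8
    5: (3, 3),  # inseparable primary transform in both directions
}

def kernels_for_mts_idx(tu_mts_idx):
    """Return (trTypeHor, trTypeVer) and enforce the constraint that the
    inseparable primary transform (value 3) is never mixed with a separable
    kernel in the other direction."""
    tr_hor, tr_ver = MTS_TABLE_12[tu_mts_idx]
    if (tr_hor == 3) != (tr_ver == 3):
        raise ValueError("value 3 must be used in both directions")
    return tr_hor, tr_ver
```
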
Furthermore, in some embodiments, the inseparable primary transform may be added as an option separate from the MTS scheme described above. For example, the inseparable primary transform may not be included in the MTS candidates but may be used as an independent transform candidate. In this case, a predetermined first flag (e.g., nspt_flag) may be signaled to indicate whether the inseparable primary transform is applied, and a second flag (e.g., mts_flag) indicating whether MTS is applied may be signaled only when the first flag is 0 (i.e., indicating that the inseparable primary transform is not applied).
For example, the inseparable primary transform may be performed with a 4×4 block as input as follows. An example of the 4×4 input block X is shown in Equation 3.

[Equation 3]

X = [ X00 X01 X02 X03
      X10 X11 X12 X13
      X20 X21 X22 X23
      X30 X31 X32 X33 ]

If the input block X is expressed in the form of a vector, it may be expressed as Equation 4.

[Equation 4]

X→ = [ X00 X01 X02 X03 X10 X11 X12 X13 X20 X21 X22 X23 X30 X31 X32 X33 ]T

In this case, the inseparable primary transform may be calculated as shown in Equation 5.

[Equation 5]

F→ = T · X→

Here, F→ represents a transform coefficient vector, T represents a 16×16 inseparable transform matrix, and the operator · represents multiplication of a matrix and a vector.

The 16×1 transform coefficient vector F→ can be derived from Equation 5, and F→ may be reorganized into a 4×4 block based on a scan order (e.g., horizontal, vertical, diagonal, or a predetermined/stored scan order). However, this is merely an example, and various optimized non-separable transform computation methods may be used to reduce the computational complexity of the inseparable primary transform.
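Equations 3 to 5 can be sketched in pure Python as below. The block is vectorized row by row and multiplied by a 16×16 matrix T; the contents of T are placeholders, since an actual codec would use trained kernel coefficients, and the function names are illustrative.

```python
# Minimal sketch of Equations 3-5: vectorize a 4x4 residual block and apply
# a 16x16 inseparable transform matrix T (placeholder coefficients).

def nspt_forward_4x4(block, T):
    """block: 4x4 list of lists; T: 16x16 list of lists.
    Returns the 16x1 transform coefficient vector F = T . x."""
    x = [v for row in block for v in row]            # Equation 4: vectorize
    return [sum(T[i][j] * x[j] for j in range(16))   # Equation 5: F = T . x
            for i in range(16)]

def vector_to_4x4(coeffs):
    """Reorganize the 16x1 coefficient vector into a 4x4 block using a
    horizontal (row-by-row) scan order."""
    return [coeffs[4 * r:4 * r + 4] for r in range(4)]
```

With an identity matrix for T, the output is simply the vectorized input, which makes the data flow easy to check.
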
As such, according to an embodiment of the present disclosure, the main transform unit 1811 may generate (main) transform coefficients B by applying an inseparable main transform to the residual samples a.
The auxiliary transform unit 1812 may generate the (auxiliary) transform coefficient C by applying an auxiliary transform to the (main) transform coefficient B. In one example, LFNST above may be applied as a secondary transformation. In addition, the (secondary) transform coefficients C may be encoded by quantization and entropy encoding processes and used to generate a bitstream.
Next, the inverse transforming unit 1820 may include an (inverse) auxiliary transforming unit 1821 and an (inverse) main transforming unit 1822.
The (inverse) auxiliary transform unit 1821 applies an (inverse) auxiliary transform to the dequantized (auxiliary) transform coefficients C 'to generate (main) (inverse) transform coefficients B'. Here, the (inverse) auxiliary transform may correspond to an inverse process of the auxiliary transform performed by the transform unit 1810.
The (inverse) primary transform unit 1822 may generate residual samples a 'by applying an (inverse) primary transform to the (primary) (inverse) transform coefficients B'.
According to embodiments of the present disclosure, the (inverse) primary transform may comprise an inseparable primary inverse transform. The inseparable primary inverse transform may be included as an MTS candidate or provided as a separate inverse transform candidate. The inseparable primary inverse transform corresponds to the inverse process of the inseparable primary transform, and its specific details are described above with respect to the inseparable primary transform.
Fig. 19 is a flowchart illustrating a transformation method according to one embodiment of the present disclosure.
The transformation method in fig. 19 may be performed by the image encoding apparatus of fig. 2. For example, steps S1910 to S1930 may be performed by the transformer 120.
Referring to fig. 19, the image encoding apparatus may generate (main) transform coefficients by applying an inseparable main transform to residual samples (S1910). All residual samples in the residual block may be modified by the inseparable main transform. In one embodiment, the inseparable primary transform may have a Reduced Transform (RT) form with a number of output coefficients smaller than the number of input samples. In this case, zeroing may be performed for all regions where no output coefficient is generated.
The image encoding apparatus may determine whether a secondary transform is applied to the (primary) transform coefficients (S1920). The secondary transform may be an inseparable secondary transform such as NSST or RST. In one embodiment, the image encoding apparatus may determine whether to apply the secondary transform based on the (primary) transform coefficients. For example, the image encoding apparatus may determine that the secondary transform is applied when the number of non-zero transform coefficients included in the region to which the secondary transform is applied is greater than or equal to a predetermined threshold. On the other hand, if the number of non-zero transform coefficients included in that region is less than the predetermined threshold, the image encoding apparatus may determine that the secondary transform is not applied. Information on whether the secondary transform is applied may be encoded as a predetermined syntax element (e.g., sps_lfnst_enabled_flag, lfnst_idx, etc.).
Upon determining that the auxiliary transform is applied (yes in S1920), the image encoding device may generate (auxiliary) transform coefficients by applying the auxiliary transform to the (main) transform coefficients (S1930). In this case, the bit stream may be generated based on the (secondary) transform coefficients.
In contrast, when it is determined that the auxiliary transform is not applied (no in S1920), the image encoding apparatus may not perform the auxiliary transform on the (main) transform coefficient. In this case, the bit stream may be generated based on the (main) transform coefficients.
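The decision at S1920 can be sketched as follows. The threshold value and the representation of the secondary-transform region as a list of coefficient positions are illustrative assumptions, since the text only speaks of "a predetermined threshold".

```python
# Sketch of the encoder-side decision at S1920 (threshold is an assumption).

def should_apply_secondary(primary_coeffs, region, threshold=1):
    """primary_coeffs: 2-D list of (primary) transform coefficients.
    region: (row, col) positions the secondary transform would cover.
    Returns True if the count of non-zero coefficients in the region
    reaches the threshold."""
    nonzero = sum(1 for r, c in region if primary_coeffs[r][c] != 0)
    return nonzero >= threshold
```
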
Fig. 20 is a flowchart illustrating an inverse transformation method according to one embodiment of the present disclosure.
The inverse transformation method of fig. 20 may be performed by the image encoding apparatus of fig. 2 or the image decoding apparatus of fig. 3. For example, steps S2010 to S2030 may be performed by the inverse transformer 150 of fig. 2 or the inverse transformer 230 of fig. 3. Hereinafter, for convenience of description, the description will focus on the image decoding apparatus.
Referring to fig. 20, the image decoding apparatus may determine whether to apply a secondary inverse transform to transform coefficients obtained from a bitstream (S2010). The secondary inverse transform may be an inseparable secondary inverse transform such as NSST or RST. In one embodiment, the image decoding apparatus may determine whether to apply the secondary inverse transform based on a predetermined syntax element (e.g., sps_lfnst_enabled_flag, lfnst_idx, etc.) obtained from the bitstream. For example, when lfnst_idx has a first value (e.g., 0), the image decoding apparatus may determine that the secondary inverse transform is not applied. In contrast, when lfnst_idx has a value different from the first value, the image decoding apparatus may determine that the secondary inverse transform is applied.
Upon determining that the auxiliary inverse transform is applied (yes in S2010), the image decoding apparatus may generate (main) transform coefficients by applying the auxiliary inverse transform to the transform coefficients obtained from the bitstream (S2020). In this case, the transform coefficients obtained from the bitstream may correspond to the (auxiliary) transform coefficients.
On the other hand, upon determining that the auxiliary inverse transform is not applied (no in S2010), the image decoding apparatus may not perform the auxiliary inverse transform on the transform coefficient obtained from the bitstream. In this case, the transform coefficients obtained from the bitstream may correspond to the (main) transform coefficients.
The image decoding apparatus may generate residual samples by applying an inseparable primary inverse transform to the (primary) transform coefficients (S2030). All (primary) transform coefficients may be modified by an inseparable primary inverse transform. In one embodiment, the inseparable primary inverse transform may have a Reduced Transform (RT) form with a number of output coefficients greater than the number of input coefficients.
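The decoder-side flow of steps S2010 to S2030 can be sketched as below; the step names are descriptive strings, not normative identifiers.

```python
# Sketch of the control flow of Fig. 20: lfnst_idx == 0 skips the secondary
# inverse transform (S2010/S2020), and the inseparable primary inverse
# transform (S2030) is always applied last.

def inverse_transform_steps(lfnst_idx):
    steps = []
    if lfnst_idx != 0:                      # first value 0 => not applied
        steps.append("secondary inverse transform")
    steps.append("inseparable primary inverse transform")
    return steps
```
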
Sub-block based inseparable main transformation
In one embodiment, instead of applying an inseparable primary transform fitted to the width and height of a relatively large input block, the block may be partitioned into sub-blocks, and the inseparable primary transform may then be applied to each sub-block using an inseparable transform matrix fitted to the width and height of that sub-block. For example, when the inseparable primary transform is applied to a 4×8 block, the 4×8 block may be horizontally divided into two 4×4 sub-blocks in the spatial domain, and the inseparable primary transform in units of 4×4 blocks may be applied to each 4×4 sub-block. Alternatively, when the inseparable primary transform is applied to a 16×8 block, the 16×8 block may be vertically divided into two 8×8 sub-blocks in the spatial domain, and the inseparable primary transform in units of 8×8 blocks may be applied to each 8×8 sub-block.
Fig. 21 is a flowchart illustrating sub-block inseparable main transform/inverse transform according to one embodiment of the present disclosure.
The transformation method in fig. 21 may be performed by the image encoding apparatus of fig. 2. For example, steps S2110 to S2130 may be performed by the transformer 120. In addition, the inverse transformation method of fig. 21 may be performed by the image encoding apparatus of fig. 2 or the image decoding apparatus of fig. 3. For example, steps S2110 to S2130 may be performed by the inverse transformer 150 of fig. 2 or the inverse transformer 230 of fig. 3.
Referring to fig. 21, the image encoding/decoding apparatus may determine whether a predetermined sub-block transform/inverse transform condition is satisfied (S2110). In one embodiment, the image encoding/decoding apparatus may determine whether the sub-block transform/inverse transform condition is satisfied based on a result of comparing the size of the input block with a predetermined threshold. Here, the threshold values may include a first threshold value of 4×4 size and a second threshold value of 8×8 size. Specifically, when the size of the input block is greater than the first threshold value and less than the second threshold value, the image encoding/decoding apparatus may determine that the sub-block transform/inverse transform condition is satisfied. Further, the image encoding/decoding apparatus may determine that the sub-block transform/inverse transform condition is satisfied when the size of the input block is greater than the second threshold.
If the sub-block transform condition is satisfied (yes in S2110), the image encoding/decoding apparatus may obtain a plurality of sub-blocks by dividing the input block (S2120). For example, the image encoding/decoding apparatus may obtain two 4×4 sub-blocks by vertically dividing an 8×4 block. Alternatively, the image encoding/decoding apparatus may obtain two 8×8 sub-blocks by horizontally dividing an 8×16 block.
In contrast, if the sub-block transform condition is not satisfied (no in S2110), the image encoding/decoding apparatus may determine that the input block is not divided and proceed to step S2130.
In addition, the image encoding/decoding apparatus may apply an inseparable main transform/inverse transform to the input block or each sub-block (S2130). When the inseparable main transform/inverse transform is applied to the entire input block, the inseparable transform matrix may be determined based on the width and height of the input block. In contrast, when the inseparable main transform/inverse transform is applied to each sub-block, the inseparable transform matrix may be determined based on the width and height of each sub-block.
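Under the 4×4 and 8×8 thresholds stated for S2110, the split decision might be sketched as follows. How a block of exactly 8×8 is handled is an assumption, since the text leaves that boundary open; only the 8×4 and 16×8 style examples are taken directly from the description.

```python
# Sketch of the sub-block rule of S2110-S2120 (boundary handling assumed).

def nspt_subblocks(width, height):
    """Return the list of (w, h) units the inseparable transform is applied to."""
    if width <= 4 and height <= 4:
        return [(width, height)]                 # no split: at most 4x4
    sub = 4 if (width <= 8 and height <= 8) else 8
    return [(sub, sub)] * ((width // sub) * (height // sub))
```

For example, an 8×4 block yields two 4×4 units and a 16×8 block yields two 8×8 units, matching the examples in the text.
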
Fig. 22a and 22b are diagrams illustrating a sub-block-based inseparable main transformation process.
Figs. 22a and 22b show the inseparable main transform at the encoder stage (i.e., the forward inseparable main transform process).
Fig. 22a shows a case where the size of an input block is 8×4 or 4×8, and fig. 22b shows a case where the size of an input block is 16×8 or 8×16.
In fig. 22a to 22b, the region indicated by the bold line represents the region to which the inseparable main transformation is applied.
First, referring to (a) of fig. 22a, an 8×4 input block may be divided into two 4×4 sub-blocks Sb1 and Sb2. Further, a 4×4 inseparable main transform may be applied to each of the sub-blocks Sb1 and Sb2. Specifically, a 16×16 inseparable main transform matrix may be applied to the first sub-block Sb1 to generate 16 (main) transform coefficients. In addition, a 16×16 inseparable main transform matrix may be applied to the second sub-block Sb2 to generate 16 (main) transform coefficients.
Referring to (b) of fig. 22a, a 4×8 input block may be divided into two 4×4 sub-blocks Sb3 and Sb4. In addition, a 4×4 inseparable main transform may be applied to each of the sub-blocks Sb3 and Sb4. Specifically, a 16×16 inseparable main transform matrix may be applied to the third sub-block Sb3 to generate 16 (main) transform coefficients. In addition, a 16×16 inseparable main transform matrix may be applied to the fourth sub-block Sb4 to generate 16 (main) transform coefficients.
Next, referring to (a) of fig. 22b, the 16×8 input block may be divided into two 8×8 sub-blocks Sb1 and Sb2. Further, an 8×8 inseparable main transform may be applied to each of the sub-blocks Sb1 and Sb2. Specifically, a 64×64 inseparable main transform matrix may be applied to the first sub-block Sb1 to generate 64 (main) transform coefficients. Further, a 64×64 inseparable main transform matrix may be applied to the second sub-block Sb2 to generate 64 (main) transform coefficients.
Referring to (b) of fig. 22b, the 8×16 input block may be divided into two 8×8 sub-blocks Sb3 and Sb4. In addition, an 8×8 inseparable main transform may be applied to each of the sub-blocks Sb3 and Sb4. Specifically, a 64×64 inseparable main transform matrix may be applied to the third sub-block Sb3 to generate 64 (main) transform coefficients. In addition, a 64×64 inseparable main transform matrix may be applied to the fourth sub-block Sb4 to generate 64 (main) transform coefficients.
Further, in some embodiments, a different non-separable main transform matrix may be applied to each sub-block. For example, in fig. 22a, a 16×16 inseparable main transform matrix may be applied to the first sub-block Sb1, and a 16×8 inseparable main transform matrix may be applied to the second sub-block Sb2. In this case, the region in the second sub-block Sb2 where the (main) transform coefficient is not generated may be filled with zero values (i.e., zeroed).
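The zero-out described for Sb2 can be sketched as below. The matrix is written here as R×16 rows-by-columns, so the "16×8" matrix in the text is read as producing R = 8 output coefficients; that reading, and the toy matrix in the test, are assumptions for illustration only.

```python
# Sketch of a reduced inseparable transform with zero-out: an R x 16 matrix
# produces only R coefficients, and the remaining positions are zero-filled.

def reduced_nspt(x, T_reduced, out_len):
    """x: vectorized sub-block; T_reduced: R x len(x) matrix.
    Returns a coefficient vector of length out_len with the tail zeroed."""
    R = len(T_reduced)
    coeffs = [sum(T_reduced[i][j] * x[j] for j in range(len(x)))
              for i in range(R)]
    return coeffs + [0] * (out_len - R)   # zero the untransformed region
```
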
Application of inseparable transforms in CIIP (combined inter and intra prediction)
According to one embodiment of the present disclosure, whether to apply the inseparable primary transform to a block to which the CIIP mode is applied may be determined, and/or an inseparable primary transform kernel set may be determined. According to one embodiment of the present disclosure, when various transform kernels are present, a transform kernel set and/or a transform kernel for the current transform block may be selected based on the CIIP mode.
If the CIIP mode is applied to the current block, the existing DCT type 2, DST type 7, and DCT type 8 may be used. In this embodiment, the inseparable primary transform may also be applied to a block to which the CIIP mode is applied. For example, when the inseparable primary transform is applied to a block to which the CIIP mode is applied, only DCT type 2 and the inseparable primary transform may be used. Alternatively, the inseparable primary transform may be used in addition to the existing DCT type 2, DST type 7, and DCT type 8. Alternatively, one or more of the existing DCT type 2, DST type 7, and DCT type 8 kernels may be replaced with the inseparable primary transform. In addition, whether to apply the inseparable primary transform to a block to which the CIIP mode is applied may also be determined based on the size and/or shape of the current block. More specifically, when the prediction mode of the current block is the CIIP mode, the primary transform method of the current block may be adaptively selected as in the following examples.
Always apply inseparable main transformations
Determining based on information (e.g., a 1-bit flag, such as ciip_nspt_mode) indicating whether to apply an inseparable primary transform
A determination is made based on the number of pixels in the current block (e.g., if the number of pixels in the current block is greater than 16 but less than 1024, the inseparable main transform is always applied). In other cases, no inseparable primary transformations are applied.
Signaling information specifying whether to apply the inseparable primary transform based on the number of pixels in the current block (e.g., if the number of pixels in the current block is 16 or more but less than 1024, the primary transform method is determined from the information specifying whether to apply the inseparable primary transform). In other cases, no inseparable primary transformations are applied.
In the above example, the conditions regarding the block size and/or shape for determining whether to apply the main transform are not limited to the above example. For example, the particular value for the number of pixels in the current block may vary.
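The adaptive selection rules above for a CIIP block might be sketched as follows. The 16/1024 pixel bounds follow the examples in the text, the flag name ciip_nspt_mode mirrors the 1-bit flag mentioned there, and the ">= 16" reading of the lower boundary is an assumption (the bullets give both ">" and ">=").

```python
# Sketch of the adaptive NSPT decision for a CIIP-predicted block.

def use_nspt_for_ciip(num_pixels, ciip_nspt_mode=None):
    if not (16 <= num_pixels < 1024):
        return False                      # outside the range: never applied
    if ciip_nspt_mode is not None:
        return bool(ciip_nspt_mode)       # explicitly signalled decision
    return True                           # otherwise always applied in range
```
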
As described above, according to one embodiment of the present disclosure, the inseparable primary transform may be performed on a block predicted in the CIIP mode. At this time, the transform kernel of the inseparable primary transform may be selected from among n transform kernel sets and k transform kernels in each set. At this time, n and k may be adaptively changed. For example, when there are various inseparable primary transform kernels, the transform kernel set and/or the transform kernel of the current transform block may be selected based on information about the CIIP weight and/or the CIIP intra predictor. Here, the information on the CIIP intra predictor may be information indicating the specific mode used to generate the CIIP intra predictor. The specific mode may refer to, for example, the existing planar mode, the PDPC hybrid mode, a mode using DIMD in CIIP, and/or a mode using TIMD in CIIP, which will be described later. In this disclosure, the information about the CIIP intra predictor may be referred to as the "CIIP intra mode".
Specifically, the CIIP mode is a mode in which prediction is performed by a weighted sum of a prediction value generated in the planar intra prediction mode and a prediction value generated in the merge inter prediction mode. Accordingly, the transform kernel set and/or the transform kernel may be adaptively selected according to the type of the generated predictor and/or the CIIP weight. More specifically, the transform kernel of the inseparable primary transform of the current block predicted in the CIIP mode may be selected or determined by at least one of the following examples.
-Using a single transformation core in a single set of transformation cores. That is, a single non-separable main transformation core may be used.
-Selecting/determining from a plurality of transformation cores in a single set of transformation cores. The selection from the plurality of transformation cores may be made adaptively or based on signaled core selection information.
-Selecting/determining from a set of multiple transform cores and a single transform core in the set. At this point, a set of multiple transform kernels may be selected based on the weight values of CIIP, as shown in the examples of table 13 and/or table 14.
-Selecting/determining from a set of a plurality of transformation cores and a plurality of transformation cores in the set. At this point, a set of multiple transform kernels may be selected according to the weight values of CIIP, as shown in examples of table 13 and/or table 14. Further, the selection from among the plurality of transform cores in the set may be made adaptively or based on signaled core selection information.
TABLE 13
CIIP weight TrSetIdx
1 0
2 1
3 2
TABLE 14
CIIP weight TrSetIdx
1 0
2 1
3 1
Table 13 is an example of a method of selecting the inseparable primary transform kernel set (TrSetIdx, n=3) based on CIIP weights. In addition, table 14 is an example of a method of selecting the inseparable main transform kernel set (TrSetIdx, n=2) based on CIIP weights.
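Tables 13 and 14 can be transcribed directly as lookups mapping the CIIP weight to the kernel set index TrSetIdx; the dictionary and function names are illustrative.

```python
# Tables 13 (n = 3 sets) and 14 (n = 2 sets) as lookup dictionaries.
TR_SET_BY_WEIGHT_N3 = {1: 0, 2: 1, 3: 2}   # Table 13
TR_SET_BY_WEIGHT_N2 = {1: 0, 2: 1, 3: 1}   # Table 14

def select_kernel_set(ciip_weight, table=TR_SET_BY_WEIGHT_N3):
    """Return TrSetIdx for the given CIIP weight."""
    return table[ciip_weight]
```
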
As described above, a PDPC hybrid mode obtained by combining the CIIP mode with PDPC may be applied. In this case, the transform kernel set and/or the transform kernel to be applied to the inseparable primary transform of the current block may be selected/determined based on the CIIP weight value and/or the type of the generated predictor. Here, the type of predictor may mean whether it is a predictor obtained based on the existing CIIP mode (normal) or a predictor obtained based on the PDPC hybrid mode. According to another embodiment of the present disclosure, when the PDPC hybrid mode is available, the primary transform method of the current block predicted in CIIP may be adaptively selected as in the following examples. In this disclosure, the case where the existing planar mode is applied during CIIP intra prediction may be referred to as the normal mode, which is one of the CIIP intra modes. In addition, the case where PDPC is applied during CIIP intra prediction may be referred to as the PDPC hybrid mode or CIIP_pdpc mode, as one of the CIIP intra modes.
-Using a single transformation core in a single set of transformation cores. That is, a single non-separable main transformation core may be used.
-Selecting/determining from a plurality of transformation cores in a single set of transformation cores. The selection from the plurality of transformation cores may be made adaptively or based on signaled core selection information.
-Selecting/determining from a set of multiple transform cores and a single transform core in the set. At this point, a set of multiple transform kernels may be selected according to CIIP intra modes and weight values of CIIP, as shown in the examples of table 15 and/or table 16.
-Selecting/determining from a set of a plurality of transformation cores and a plurality of transformation cores in the set. At this point, a set of multiple transform kernels may be selected based on the CIIP intra mode and weight values of CIIP, as shown in the examples of table 15 and/or table 16. The selection from the plurality of transform cores in the set may be made adaptively or based on signaled core selection information.
TABLE 15
CIIP intra mode TrSetIdx
Normal state 0
PDPC mixing 1
TABLE 16
CIIP intra mode CIIP weight TrSetIdx
Normal state 1 0
Normal state 2 1
Normal state 3 2
PDPC mixing 1 3
PDPC mixing 2 4
PDPC mixing 3 5
Table 15 is an example of a method of selecting an inseparable main transform kernel set (TrSetIdx, n=2) based on CIIP intra modes. Further, table 16 is an example of a method of selecting an inseparable primary transform kernel set (TrSetIdx, n=6) based on CIIP intra modes and CIIP weights.
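Tables 15 and 16 likewise reduce to lookups keyed by the CIIP intra mode, or by the (CIIP intra mode, CIIP weight) pair; the string keys "normal" and "pdpc" are illustrative abbreviations of the table rows.

```python
# Table 15 (n = 2): kernel set from the CIIP intra mode alone.
TR_SET_TABLE_15 = {"normal": 0, "pdpc": 1}

# Table 16 (n = 6): kernel set from (CIIP intra mode, CIIP weight).
TR_SET_TABLE_16 = {
    ("normal", 1): 0, ("normal", 2): 1, ("normal", 3): 2,
    ("pdpc", 1): 3, ("pdpc", 2): 4, ("pdpc", 3): 5,
}
```
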
As described above, when the CIIP intra mode and the CIIP weight are used, they may be grouped into n sets, and each set may include k transform kernels. In the present disclosure, the number of CIIP weights and CIIP intra modes and the method of grouping them are not limited to the above-described embodiments. For example, the number of sets and the number of transform kernels in each set may be determined according to the width and/or height of the input block.
Hereinafter, various embodiments of applying the inseparable primary transform when DIMD (decoder-side intra mode derivation) or TIMD (template-based intra mode derivation) is used during CIIP intra prediction will be described. In the present disclosure, the case of using DIMD during CIIP intra prediction may be referred to as the CIIP_DIMD mode, as one of the CIIP intra modes. In addition, the case of using TIMD during CIIP intra prediction may be referred to as the CIIP_TIMD mode, as one of the CIIP intra modes.
In addition to the planar intra prediction mode described above, the CIIP intra prediction signal may also be generated by a prediction mode based on PDPC weights and/or DIMD/TIMD. According to the present disclosure, a transform kernel set and/or a transform kernel may be adaptively selected based on the type of the generated predictor. More specifically, when the PDPC hybrid mode and/or TIMD/DIMD is available, the primary transform method of the current block predicted by CIIP may be adaptively selected as in the following examples.
-Using a single transformation core in a single set of transformation cores. That is, a single non-separable main transformation core may be used.
-Selecting/determining from a plurality of transformation cores in a single set of transformation cores. The selection from the plurality of transformation cores may be made adaptively or based on signaled core selection information.
-Selecting/determining from a set of multiple transform cores and a single transform core in the set. At this time, a set of a plurality of transform cores may be selected based on CIIP intra modes, as shown in the example of table 17.
-Selecting/determining from a set of a plurality of transformation cores and a plurality of transformation cores in the set. At this time, a set of a plurality of transform cores may be selected according to CIIP intra modes, as shown in the example of table 17. The selection from the plurality of transform cores in the set may be made adaptively or based on signaled core selection information.
TABLE 17
CIIP intra mode TrSetIdx
Normal state 0
PDPC mixing 1
CIIP with DIMD 2
CIIP with TIMD 3
Table 17 illustrates a method of selecting an inseparable primary transform kernel set (TrSetIdx, n=4) based on CIIP intra modes.
As described above, when the CIIP intra mode and the CIIP weight are used, they may be grouped into n sets, and each set may include k transform kernels. In the present disclosure, the number of CIIP weights and CIIP intra modes and the method of grouping them are not limited to the above-described embodiments. For example, the number of sets and the number of transform kernels in each set may be determined according to the width and/or height of the input block.
Application of inseparable main transformation in GPM (geometric partition mode)
Hereinafter, various embodiments of applying an inseparable main transform to blocks predicted in a GPM will be described.
According to one embodiment of the present disclosure, it may be determined whether to apply the inseparable primary transform to a block to which GPM (geometric partition mode) is applied and/or to determine a set of inseparable primary transform cores. According to one embodiment of the present disclosure, when there are various transform cores, a set of transform cores and/or a transform core for a current transform block may be selected based on the GPM.
If the GPM is applied to the current block, the existing DCT type 2, DST type 7, and DCT type 8 may be used. In this embodiment, the inseparable primary transform may also be applied to a block to which the GPM is applied. For example, when the inseparable primary transform is applied to a block to which the GPM is applied, only DCT type 2 and the inseparable primary transform may be used. Alternatively, the inseparable primary transform may be used in addition to the existing DCT type 2, DST type 7, and DCT type 8. Alternatively, one or more of the existing DCT type 2, DST type 7, and DCT type 8 kernels may be replaced with the inseparable primary transform. Further, whether to apply the inseparable primary transform to a block to which the GPM is applied may be determined according to the size and/or shape of the current block. More specifically, when the prediction mode of the current block is the GPM, the primary transform method of the current block may be adaptively selected as in the following examples.
Always apply inseparable main transformations
-Determining based on information (e.g., a 1-bit flag, such as gpm_nspt_mode) indicating whether to apply an inseparable primary transform
A determination is made based on the number of pixels in the current block (e.g., if the number of pixels in the current block is greater than 16 but less than 1024, the inseparable main transform is always applied). In other cases, no inseparable primary transformations are applied.
Signaling information specifying whether to apply the inseparable primary transform based on the number of pixels in the current block (e.g., if the number of pixels in the current block is 16 or more but less than 1024, the primary transform method is determined based on the information specifying whether to apply the inseparable primary transform). In other cases, no inseparable primary transformations are applied.
In the above example, the conditions regarding the block size and/or shape for determining whether to apply the main transform are not limited to the above example. For example, the particular value for the number of pixels in the current block may vary.
As described above, according to one embodiment of the present disclosure, the inseparable primary transform may be performed on a block predicted in the GPM. At this time, the transform kernel of the inseparable primary transform may be selected from among n transform kernel sets and k transform kernels in each set. At this time, n and k may be adaptively changed. For example, the GPM may generate 64 block partitions based on various angle indices and distance indices, as shown in Table 18.
TABLE 18
GPM index       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Angular index   0  0  2  2  2  2  3  3  3  3  4  4  4  4  5  5
Distance index  1  3  0  1  2  3  0  1  2  3  0  1  2  3  0  1

GPM index      16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Angular index   5  5  8  8 11 11 11 11 12 12 12 12 13 13 13 13
Distance index  2  3  1  3  0  1  2  3  0  1  2  3  0  1  2  3

GPM index      32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
Angular index  14 14 14 14 16 16 18 18 18 19 19 19 20 20 20 21
Distance index  0  1  2  3  1  3  1  2  3  1  2  3  1  2  3  1

GPM index      48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
Angular index  21 21 24 24 27 27 27 28 28 28 29 29 29 30 30 30
Distance index  2  3  1  3  1  2  3  1  2  3  1  2  3  1  2  3
Table 18 shows the 64 GPM block partition indices together with their corresponding angle and distance indices.
Thus, the transform kernel set and the transform kernel may be adaptively selected based on the generated block partition shape. More specifically, the transform kernel of the inseparable main transform applied to the current block coded with GPM may be selected or determined by at least one of the following examples.
- Use a single transform kernel in a single transform kernel set. That is, a single inseparable main transform kernel may be used.
- Select/determine from among a plurality of transform kernels in a single transform kernel set. The selection from among the plurality of transform kernels may be made adaptively or based on signaled kernel selection information.
- Select/determine from among a plurality of transform kernel sets, with a single transform kernel in each set. At this time, the set may be selected from among the plurality of transform kernel sets based on the GPM index, the angle index, and/or the distance index, as illustrated in the examples of Table 19, Table 20, and/or Table 21.
- Select/determine from among a plurality of transform kernel sets and a plurality of transform kernels in each set. At this time, the set may be selected from among the plurality of transform kernel sets according to the GPM index, the angle index, or the distance index, as illustrated in the examples of Table 19, Table 20, and/or Table 21. Further, the selection from among the plurality of transform kernels in the set may be made adaptively or based on signaled kernel selection information.
TABLE 19
GPM index              TrSetIdx
GPM index <= 17        0
17 < GPM index <= 35   1
35 < GPM index <= 49   2
49 < GPM index <= 63   3
TABLE 20
Angular index   TrSetIdx
0, 16           0
2, 18           1
3, 19           2
4, 20           3
5, 21           4
8, 24           5
TABLE 21
Distance index   TrSetIdx
0                0
1                1
2                2
3                3
Table 19 illustrates a method of selecting an inseparable primary transform core set (TrSetIdx, n=4) based on a GPM index. Table 20 illustrates a method of selecting an inseparable primary transform kernel set (TrSetIdx, n=10) based on an angle index. Table 21 illustrates a method of selecting an inseparable primary transform kernel set (TrSetIdx, n=4) based on a distance index.
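The set-selection rules of Tables 19 to 21 can be sketched as simple lookups. The function names are illustrative, and the angle-index map reproduces only the rows actually listed in Table 20:

```python
def trset_from_gpm_index(gpm_idx):
    """Table 19: select a kernel-set index (n=4) from the GPM partition index."""
    for bound, trset in ((17, 0), (35, 1), (49, 2), (63, 3)):
        if gpm_idx <= bound:
            return trset
    raise ValueError("GPM index out of range 0..63")

# Table 20 (rows as shown in the text): angle index -> kernel-set index
ANGLE_TO_TRSET = {0: 0, 16: 0, 2: 1, 18: 1, 3: 2, 19: 2,
                  4: 3, 20: 3, 5: 4, 21: 4, 8: 5, 24: 5}

def trset_from_distance_index(dist_idx):
    """Table 21: the distance index (0..3) maps directly to the set index."""
    return dist_idx
```

These lookups would run once per block, before choosing a kernel within the selected set.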
As described above, the GPM shapes are grouped into n sets, and each set may include k transform kernels. In the present disclosure, the number of GPM shapes and the grouping method are not limited to the above examples. For example, the number of sets and/or the number of transform kernels in each set may be determined according to the width and/or height of the input block, and may thus vary with the block dimensions.
In another embodiment of the present disclosure, a method of transposing the input data for symmetric block partitions and using the same transform kernel set is presented. Referring to fig. 23, when GPM is used for block division, two different GPM indexes may produce symmetric partitions. Accordingly, the present disclosure proposes a method of transposing the input data and using the same main transform kernel set for symmetric GPM indexes when configuring the inseparable main transform kernel sets. For example, the main transform kernel set for GPM index 11 may equally be used for GPM index 44 (with the input data transposed). Transposing the input data means that, for two-dimensional MxN block data, the rows become columns and the columns become rows, forming NxM data.
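A minimal sketch of the transpose-and-reuse idea, assuming plain nested-list block data (the pairing of GPM indexes 11 and 44 follows the example in the text; the symmetry map is illustrative):

```python
def transpose_residual(block):
    """Transpose M x N residual data to N x M so a symmetric GPM
    partition (e.g. index 44) can reuse the kernel set of its mirror
    partition (e.g. index 11). Pure-Python, illustrative only."""
    return [list(col) for col in zip(*block)]
```

In a real configuration, each symmetric GPM index pair would share one stored kernel set, and the decoder would transpose the input (and inverse-transpose the output) for the mirrored member of the pair.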
Fig. 24 is a flowchart illustrating an example of a main transform method for a block to which a predetermined mode is applied according to the present disclosure. The method shown in fig. 24 may be performed in an image encoding apparatus and/or an image decoding apparatus to transform and/or inverse transform a residual block of the current block.
Referring to fig. 24, it may be determined whether the prediction mode of the current block is a predetermined mode (S2401). Here, as described above, the predetermined mode may be the CIIP mode or GPM. However, the predetermined mode is not limited thereto. When it is determined in step S2401 that the prediction mode of the current block is the predetermined mode (yes in S2401), the inseparable main transform (NSPT) may be applied to the current block (S2402). When it is determined in step S2401 that the prediction mode of the current block is not the predetermined mode, the inseparable main transform (NSPT) may not be applied to the current block. In this case, a transform according to a method different from the inseparable main transform of the present disclosure (e.g., a conventional separable transform) may be performed on the current block, or the main transform may be omitted.
In the various embodiments of the present disclosure described below, when it is determined that the inseparable main transform is not applied to the current block, it will be apparent to those skilled in the art that a transform according to a different method (e.g., a conventional separable transform) may be performed on the current block, or the main transform may be omitted, even though this is not separately described. According to the example shown in fig. 24, since whether to apply the inseparable main transform is determined according to the prediction mode of the current block, the encoding apparatus does not need to separately transmit a flag indicating whether to apply the inseparable main transform.
According to another embodiment of the present disclosure, when the prediction mode of the current block is a predetermined mode, a flag (e.g., a 1-bit flag such as ciip_nspt_mode or gpm_nspt_mode) specifying whether to apply the inseparable main transform may be transmitted from the encoding apparatus to the decoding apparatus. For example, when the prediction mode of the current block is a predetermined mode (CIIP mode or GPM), the image encoding apparatus may determine whether to apply the inseparable main transform to transform the current block and encode flag information related thereto into a bitstream. When the prediction mode of the current block is a predetermined mode (CIIP mode or GPM), the image decoding apparatus may parse (decode) the flag information from the bitstream and determine whether to apply the inseparable main transform to transform the current block based on the flag information.
According to another embodiment of the present disclosure, when the prediction mode of the current block is a predetermined mode (CIIP mode or GPM), the image encoding apparatus and/or the image decoding apparatus may determine whether to apply the inseparable main transform to the current block based on the size and/or shape condition of the current block.
Fig. 25 is a flowchart illustrating a method of applying the inseparable main transform according to the block size in a predetermined mode. The method shown in fig. 25 may be performed in an image encoding apparatus and/or an image decoding apparatus to transform and/or inverse transform a residual block of the current block.
Referring to fig. 25, it may be determined whether the prediction mode of the current block is a predetermined mode (S2501). Here, as described above, the predetermined mode may be the CIIP mode or GPM. However, the predetermined mode is not limited thereto. When it is determined in step S2501 that the prediction mode of the current block is the predetermined mode (yes in S2501), it may be determined whether the size and/or shape of the current block satisfies a predetermined condition (S2502). For example, the predetermined condition may relate to whether the number of pixels in the current block is within a predetermined range, less than or equal to (or less than) a predetermined threshold, or greater than (or exceeding) a predetermined threshold. For example, the predetermined condition may be that the number of pixels in the current block is 16 or more but less than 1024. However, the predetermined condition is not limited to the above example. For example, the specific values for the number of pixels in the current block may vary. If the size and/or shape of the current block satisfies the predetermined condition in step S2502 (yes in S2502), the inseparable main transform (NSPT) may be applied to the current block (S2503). According to the example shown in fig. 25, whether to apply the inseparable main transform is determined according to whether the predetermined condition is satisfied, and thus the encoding apparatus does not need to separately transmit a flag indicating whether to apply the inseparable main transform.
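The decision flow of fig. 25 might be expressed as follows. The mode labels and the 16/1024 thresholds are the example values from the text, not normative:

```python
def apply_nspt(pred_mode, width, height):
    """Sketch of the Fig. 25 flow: NSPT applies only when the prediction
    mode is one of the predetermined modes (CIIP or GPM, per the text)
    and the example pixel-count condition 16 <= w*h < 1024 holds."""
    if pred_mode not in ("CIIP", "GPM"):
        return False                      # S2501: not a predetermined mode
    return 16 <= width * height < 1024    # S2502: size/shape condition
```

Because both inputs are available at the decoder, no flag needs to be transmitted under this variant.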
According to another embodiment of the present disclosure, when the size and/or shape of the current block satisfies a predetermined condition, a flag (e.g., a 1-bit flag such as ciip_nspt_mode or gpm_nspt_mode) specifying whether to apply the inseparable main transform may be transmitted from the encoding apparatus to the decoding apparatus. For example, when the prediction mode of the current block is a predetermined mode (CIIP mode or GPM) and the size and/or shape of the current block satisfies a predetermined condition (e.g., the number of pixels in the current block is 16 or more but less than 1024), the image encoding apparatus may determine whether to apply the inseparable main transform to transform the current block and encode flag information related thereto into a bitstream. When the prediction mode of the current block is a predetermined mode (CIIP mode or GPM) and the size and/or shape of the current block satisfies the predetermined condition (e.g., the number of pixels in the current block is 16 or more but less than 1024), the image decoding apparatus may parse (decode) the flag information from the bitstream and determine whether to apply the inseparable main transform to transform the current block based on the flag information.
Transform index signaling method in inseparable main transform
Hereinafter, various embodiments of a signaling method related to the inseparable main transformation according to the present disclosure will be described.
In one embodiment according to the present disclosure, when the main transform is applied, both separable and non-separable transforms may be included. That is, when a transform from the spatial domain to the frequency domain is applied, both a separable transform and a non-separable transform may be applicable. The main transform methods based on separable transforms may include DCT type 2, DST type 7, DCT type 8, DCT type 5, DST type 4, DST type 1, IDT (identity transform), or other methods that are not based on non-separable transforms (e.g., transform skip). In the case of separable transforms, there are a plurality of selectable transforms; in the case of non-separable transforms, the computational complexity or memory requirements may be greater than for separable transforms. Accordingly, the number of selectable inseparable transforms may be one or more.
In accordance with one embodiment of the present disclosure, various signaling methods (hereinafter simply referred to as the first method) for the case where there is one inseparable transform kernel and a plurality of separable transforms will be described below.
(A) According to one example, the inseparable transform may be assigned transform index "0", and the indexes of the separable transforms may start from 1. For example, if there are five separable transform methods, transform index 0 may indicate the inseparable transform, and transform indexes 1 through 5 may indicate predefined separable transform methods. Thus, a transform index value of 0 to 5 may be signaled from the encoding apparatus to the decoding apparatus; from the decoded index value, it can be known whether an inseparable transform or a separable transform is used, and which separable transform is used can be determined.
(B) As another example, the inseparable transform index may be set to a predefined value "N", and the separable transform indexes may be defined sequentially, excluding N, starting from 0 up to a predefined maximum index value M (where M >= N). For example, if there are five available separable transform methods and the index of the inseparable transform is defined as 3 (i.e., N=3), the indexes of the separable transforms are allocated from 0 to 5 in a predetermined order, excluding "3". In this case, index values from 0 to 5 may be signaled from the encoding apparatus to the decoding apparatus; from the decoded index value, it can be known whether an inseparable transform or a separable transform is used, and in the case of a separable transform, which separable transform is used can be determined.
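Scheme (B) can be sketched as a small index mapping. The defaults N=3 and five separable methods follow the worked example above, and the return labels are illustrative:

```python
def decode_transform_choice(idx, n=3, num_separable=5):
    """Sketch of scheme (B): the non-separable transform occupies a
    predefined index N (here N=3); the separable transforms take the
    remaining values 0..M (M = num_separable here) in signaling order,
    skipping N. Returns ("nonsep", None) or ("sep", method_number)."""
    m = num_separable  # max index value M, since one value is reserved for N
    if idx == n:
        return ("nonsep", None)
    # separable methods are numbered 0..num_separable-1 in signaling order
    sep_order = [i for i in range(m + 1) if i != n]
    return ("sep", sep_order.index(idx))
```

So with N=3, decoded index 4 names the fourth separable method, because index 3 was skipped when the separable indexes were laid out.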
(C) In another embodiment, the encoding apparatus may first signal whether the inseparable transform is used by means of a flag indicating whether the inseparable transform is used. If the flag indicating whether the inseparable transform is used is 0, the index of the separable transform may be additionally signaled. The flag indicating whether the inseparable transform is used may be coded as a context-coded bin whose occurrence probability is predicted via a context model. At this time, the context model for probability prediction may be constructed using the size of the block, the shape of the block, the intra prediction mode, or information about previously coded blocks.
(D) The above-mentioned transform index value (or separable transform index value) may be binarized by FLC (fixed length coding) or TBC (truncated binary code), in which case it may be coded by context coding (i.e., treated as a context-coded bin) or by bypass coding (i.e., treated as a bypass-coded bin). Alternatively, the transform index may be expressed by truncated unary (TU) binarization and coded by context coding (i.e., treated as context-coded bins) with predicted occurrence probabilities, or coded with equal probability by bypass coding (i.e., treated as bypass-coded bins).
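For reference, truncated unary (TU) binarization as commonly defined in video-coding standards produces the bin strings below. This is a generic sketch, not tied to any particular syntax element in this disclosure:

```python
def truncated_unary(value, cmax):
    """Truncated Unary (TU) binarization: `value` ones followed by a
    terminating zero, except the zero is dropped when value == cMax,
    since no larger value is possible."""
    return "1" * value + ("" if value == cmax else "0")
```

Each bin of the resulting string may then be either context coded (with a probability model) or bypass coded (equiprobable), as described above.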
According to another embodiment of the present disclosure, various signaling methods (hereinafter referred to as the second method) for the case where there are two or more inseparable transform kernels and a plurality of separable transforms will be described below.
(A) According to one example, the encoding apparatus may first signal whether the inseparable transform is used by means of a flag indicating whether the inseparable transform is used. The inseparable transform kernel index may be additionally signaled if the flag indicating whether the inseparable transform is used is 1, and the separable transform index may be additionally signaled if the flag is 0. At this time, the inseparable transform kernel index value or the separable transform index value may be binarized by FLC (fixed length coding) or TBC (truncated binary code), in which case it may be coded by context coding or bypass coding. Alternatively, the inseparable transform kernel index value or the separable transform index value may be expressed by truncated unary (TU) binarization and coded by context coding with predicted occurrence probability, or coded with equal probability by bypass coding. Furthermore, the flag indicating whether the inseparable transform is used may be coded as a context-coded bin whose probability is predicted via a context model. At this time, the context model for probability prediction may use the size of the block, the shape of the block, the intra prediction mode, or information about previously coded blocks.
(B) As another example, the encoding apparatus may signal the transform index once, without separately signaling a flag indicating whether the inseparable transform is used. If there are N inseparable transform kernels and M available separable transforms, transform index values from 0 to N-1 may indicate the N inseparable transform kernels (0 to N-1), and transform index values from N to N+M-1 may indicate separable transform indexes 0 to M-1. Besides the above, the method of mapping the available separable transform indexes and inseparable transform kernels from the transform index may be applied in various predefined forms.
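The joint-index mapping of this scheme might look like the following sketch (function name and return labels are illustrative):

```python
def map_joint_transform_index(idx, n_nonsep, m_sep):
    """Sketch of the joint-index scheme: values 0..N-1 name the N
    non-separable kernels, values N..N+M-1 name separable transforms
    0..M-1. Other predefined mappings are equally possible."""
    if idx < n_nonsep:
        return ("nonsep", idx)
    if idx < n_nonsep + m_sep:
        return ("sep", idx - n_nonsep)
    raise ValueError("transform index out of range")
```

With N=2 and M=5, for instance, decoded index 2 selects separable transform 0, and index 6 selects separable transform 4.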
In defining the separable transform index according to the above-described embodiments, the encoding apparatus may determine the transforms to be applied in the horizontal direction and the vertical direction from a given separable transform index value according to a predetermined rule. Alternatively, the separable transform index may be separated into a horizontal separable transform index and a vertical separable transform index and transmitted, and the transform specified by each index may be applied to the corresponding direction.
Inseparable main transformation method using residual characteristics
Hereinafter, various embodiments of performing an inseparable main transform using characteristics of a residual signal according to the present disclosure will be described.
According to one embodiment of the present disclosure, when the inseparable main transform is applied to a block to which the CIIP mode is applied or to a block to which GPM is applied, the encoding apparatus may adaptively apply the inseparable main transform using characteristics of the residual of the current block or a neighboring block. For blocks with lower residual signal energy (i.e., blocks with a smaller number of residual coefficients), using one inseparable transform kernel is more efficient than using multiple inseparable transform kernels. In this case, the encoding apparatus can achieve more efficient compression performance by reducing the bits consumed for signaling the inseparable transform kernel. On the other hand, a block with higher residual signal energy (i.e., a block with a larger number of residual coefficients) may have various characteristics, and thus it is effective to use a plurality of inseparable transform kernels even at the cost of additional signaling. In addition, since the computational complexity or memory requirement of the inseparable transform is greater than that of the separable transform, the encoding apparatus can reduce computational complexity (and power consumption) by the methods proposed in the present disclosure.
According to one embodiment of the present disclosure, the encoding apparatus may adaptively apply the inseparable main transform to a CIIP-mode-applied block and/or a GPM-applied block using the position of the last significant coefficient (last non-zero coefficient) of the current block or a neighboring block. The position of the last significant coefficient in a block can be an important clue for determining the residual characteristics of the block. In general, the closer the position of the last significant coefficient is to the upper-left pixel of the block, the more uniform the distribution of pixel values in the spatial domain; the farther it is from the upper-left pixel, the greater the variance of pixel values in the spatial domain. In particular, since the CIIP mode and GPM are inter prediction and the values of the high-frequency coefficients in a block are small, this can be used to efficiently determine whether to apply the inseparable main transform to a block to which the CIIP mode and/or GPM is applied. When lastScanPos is defined as one-dimensional position information identifying the last significant-coefficient position in the block, lastScanPos is set to 0 at the upper-left pixel position and may take a greater value toward the lower-right pixel of the block. In the decoding apparatus, the lastScanPos value may be derived based on syntax values (e.g., last_sig_coeff_x_prefix, last_sig_coeff_y_prefix, last_sig_coeff_x_suffix, last_sig_coeff_y_suffix) signaling the position information of the last significant coefficient, or may be allocated in a predetermined coefficient scan order, or in the order opposite to the coefficient scan order, according to the block size.
According to one embodiment of the present disclosure, the encoding apparatus may change the number of inseparable transform cores by dividing a range using lastScanPos information of a current block or a neighboring block. For example, when two ranges are used, the encoding device may use one non-separable transform kernel when the lastScanPos value of the current block or neighboring block is less than or equal to the first threshold (the last significant coefficient is located near the top-left pixel of the block), otherwise, use multiple non-separable transform kernels. As another example, if the lastScanPos value of the current block or neighboring block is less than or equal to the first threshold, then no inseparable transform is applied and only a separable transform (e.g., DCT-2) is applied, otherwise one or more inseparable transform kernels may be used. As another example, when lastScanPos values of the current block or neighboring blocks are greater than or equal to the first threshold, the encoding device may use only the non-separable transform or may use both the separable transform and the non-separable transform. As another example, when three ranges are used, if the lastScanPos value of the current block or neighboring block is less than or equal to the second threshold, the encoding device may use one inseparable transform kernel, use M inseparable transform kernels when the lastScanPos value is greater than the second threshold and less than or equal to the third threshold, and use N inseparable transform kernels if the lastScanPos value is greater than the third threshold. At this time, if the significant coefficient in the current block exists only in the upper left pixel (i.e., in the DC position), the above-described method is not applied, and a separable transform (e.g., DCT-2) or a separately determined transform method may be applied.
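The three-range lastScanPos rule above might be sketched as follows. The thresholds T2/T3 and kernel counts M/N are illustrative placeholders, since the text leaves them to experimental determination:

```python
def num_nspt_kernels_from_lastscanpos(last_scan_pos, t2=4, t3=16, m=2, n=4):
    """Three-range example from the text: 1 kernel when
    lastScanPos <= T2, M kernels when T2 < lastScanPos <= T3,
    N kernels above T3. All threshold/count values here are
    illustrative placeholders, not normative."""
    if last_scan_pos <= t2:
        return 1   # energy concentrated near the top-left: one kernel suffices
    if last_scan_pos <= t3:
        return m
    return n
```

Fewer candidate kernels for low-energy blocks means fewer (or no) kernel-index bits, which is where the signaling saving comes from.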
In the present embodiment, the first threshold, the second threshold, the third threshold, and the number of threshold ranges may be experimentally determined values, and may be adaptively changed and applied according to the size of the current block.
As described above, it may be determined whether to perform the inseparable main transform on the current block based on lastScanPos. This embodiment can be combined with other embodiments of the present disclosure. For example, a lastScanPos-based determination process may be added to the embodiment of fig. 24 or the embodiment of fig. 25. Alternatively, the determination process in the embodiment of fig. 24 or 25 (S2401, S2501, or S2502) may be partially replaced.
According to another embodiment of the present disclosure, the encoding apparatus may adaptively apply the inseparable main transform to a CIIP-mode-applied block and/or a GPM-applied block using the number of non-zero coefficients of the current block or a neighboring block. The number of non-zero coefficients in a block can be an important clue for determining the residual characteristics of the block. In general, the fewer the non-zero coefficients in a block, the more uniform the distribution of pixel values in the spatial domain; the more non-zero coefficients, the greater the variance of pixel values in the spatial domain. In particular, since the CIIP mode and GPM are inter prediction and most coefficients in the block are 0, the encoding apparatus can use this to efficiently determine whether to apply the inseparable main transform to a CIIP-mode-applied block and/or a GPM-applied block. When numSigCoeff is defined as the number of significant coefficients in a block, the numSigCoeff value may be derived based on syntax values (e.g., sig_coeff_flag and/or sb_coded_flag) signaling whether a coefficient is significant, or may be calculated by a separate counter. In the present disclosure, the number of inseparable transform kernels may be changed by dividing ranges using the numSigCoeff information of the current block or a neighboring block. For example, when two ranges are used, if the numSigCoeff value of the current block or a neighboring block is less than or equal to the fourth threshold (i.e., if the number of significant coefficients in the block is small), the encoding apparatus may use one inseparable transform kernel; otherwise, it may use multiple inseparable transform kernels.
As another example, if the numSigCoeff value of the current block or a neighboring block is less than or equal to the fourth threshold, the inseparable transform is not applied and only a separable transform (e.g., DCT-2) is applied; otherwise, one or more inseparable transform kernels may be used. As another example, if the numSigCoeff value of the current block or a neighboring block is greater than or equal to the fourth threshold, the encoding apparatus may use only the inseparable transform or may use both the separable transform and the inseparable transform. As another example, when three ranges are used, the encoding apparatus may use one inseparable transform kernel if the numSigCoeff value of the current block or a neighboring block is less than or equal to the fifth threshold, K inseparable transform kernels if the numSigCoeff value is greater than the fifth threshold and less than or equal to the sixth threshold, and L inseparable transform kernels if the numSigCoeff value is greater than the sixth threshold. At this time, if the significant coefficient in the current block exists only at the upper-left pixel (i.e., at the DC position), the above-described method is not applied, and a separable transform (e.g., DCT-2) or a separately determined transform method may be applied.
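Similarly, a two-range numSigCoeff rule with a separable fallback might be sketched as follows (the threshold T4 is an illustrative placeholder for the experimentally chosen fourth threshold):

```python
def transform_choice_from_numsigcoeff(num_sig_coeff, t4=3):
    """Two-range example from the text: with few significant
    coefficients (numSigCoeff <= T4) skip NSPT and use a separable
    transform such as DCT-2; otherwise allow one or more
    non-separable kernels. T4 here is an illustrative placeholder."""
    return "separable" if num_sig_coeff <= t4 else "nspt"
```

Both the decoder and the encoder can evaluate this rule after residual parsing, so no extra flag is needed for this particular variant.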
In the present embodiment, the fourth threshold, the fifth threshold, the sixth threshold, and the number of threshold ranges may be experimentally determined values, and may be adaptively changed and applied according to the size of the current block.
As described above, it may be determined whether to perform the inseparable primary transform on the current block based on numSigCoeff. This embodiment mode can be combined with other embodiment modes of the present disclosure. For example, numSigCoeff-based determination processes may be added to the embodiment of fig. 24 or the embodiment of fig. 25. Alternatively, the determination process in the embodiment of fig. 24 or 25 (S2401, S2501, or S2502) may be partially replaced. Alternatively, to determine whether to apply the inseparable primary transform to the current block, a determination regarding numSigCoeff and a determination regarding lastScanPos may be performed in addition to or as an alternative to other embodiments of the present disclosure.
In the decoding apparatus, the syntax regarding the residual may be parsed before the syntax regarding the inseparable transform information (whether the inseparable transform is applied and/or the inseparable transform index information). The method of signaling whether the inseparable transform is applied and/or the inseparable transform index information according to the present disclosure may follow the transform index signaling methods in the inseparable main transform described above. More specifically, the first method may be applied when one inseparable main transform kernel exists for the current block determined according to the above methods, and the second method may be applied when a plurality of inseparable main transform kernels exist for the current block determined according to the above methods. If the number of inseparable main transform kernels of the current block determined according to the above methods is 0 (i.e., when the inseparable main transform is not applied), inseparable-main-transform-related information (a flag specifying whether the inseparable main transform is applied, etc., e.g., a 1-bit flag such as ciip_nspt_mode or gpm_nspt_mode) may not be transmitted.
Hereinafter, an image encoding/decoding method according to an embodiment of the present disclosure will be described in detail with reference to fig. 26 and 27.
Fig. 26 is a flowchart illustrating an image encoding method according to an embodiment of the present disclosure.
The image encoding method of fig. 26 may be performed by the image encoding apparatus of fig. 2. For example, step S2604 may be performed by the transform unit 120.
Referring to fig. 26, the image encoding apparatus may determine a prediction mode of a current block (S2601).
In one embodiment, the image encoding apparatus may determine the prediction mode of the current block through various methods such as RD cost comparison. The image encoding device may encode the determined prediction mode into a bitstream as the prediction information.
The image encoding apparatus may determine whether the prediction mode is a predetermined mode (S2602). For example, the predetermined mode may be CIIP mode or GPM. If the prediction mode of the current block is a predetermined mode (yes in S2602), the image encoding apparatus may select a transform core for encoding a residual block of the current block (S2603). At this point, the selected transform core may be an inseparable primary transform core.
Then, the image encoding apparatus may encode the residual block of the current block by performing the inseparable main transform on the current block using the selected transform kernel (S2604).
As described above, in one embodiment, whether to apply the inseparable main transform may also be determined based on the number of pixels included in the current block. For example, the inseparable main transform may be applied if the number of pixels included in the current block is 16 or more but less than 1024; however, the condition on the number of pixels included in the current block is not limited to the above example.
In addition, in one embodiment, the image encoding apparatus may determine whether to apply the inseparable main transform to the current block based on the prediction mode of the current block being a predetermined mode, and encode information indicating this.
In one embodiment, the transform kernel set or transform kernel applied to the inseparable main transform may be adaptively selected based on information about the predetermined mode. For example, when the predetermined mode is the CIIP mode, the information about the predetermined mode may include at least one of a CIIP weight or a CIIP intra mode. Here, the CIIP intra mode may be at least one of a planar mode, a CIIP_PDPC mode, a CIIP_DIMD mode, or a CIIP_TIMD mode. Alternatively, based on the predetermined mode being GPM, the information about the predetermined mode may be at least one of a GPM index, an angle index, or a distance index.
In one embodiment, the image encoding apparatus may encode transform core selection information for the current block so as to signal the transform core selected for the current block. The transform core selection information may be information for selecting one transform core from one or more non-separable transform cores and one or more separable transform cores.
In one embodiment, it may also be determined whether to apply the inseparable primary transform to the current block based on residual characteristics of the current block or neighboring blocks. At this time, the residual characteristic may be determined based on the position of the last significant coefficient in the residual block of the current block or the number of significant coefficients in the residual block of the current block.
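A possible form of such a residual-characteristic test is sketched below. The thresholds and the exact decision rule are assumptions; the text only names the inputs (position of the last significant coefficient and the number of significant coefficients):

```python
def nspt_allowed_by_residual(coeff_block,
                             last_pos_threshold: int = 8,
                             count_threshold: int = 4) -> bool:
    """Illustrative residual-characteristic test: permit the inseparable
    transform only when the residual energy is concentrated, judged by
    the scan position of the last significant (nonzero) coefficient or
    by the significant-coefficient count. Thresholds are assumptions."""
    flat = [c for row in coeff_block for c in row]  # raster scan, for simplicity
    sig_positions = [i for i, c in enumerate(flat) if c != 0]
    if not sig_positions:
        return False  # nothing to transform
    last_pos = sig_positions[-1]
    return last_pos < last_pos_threshold or len(sig_positions) <= count_threshold
```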
Fig. 27 is a flowchart illustrating an image decoding method according to an embodiment of the present disclosure.
The image decoding method in fig. 27 may be performed by the image decoding apparatus of fig. 3.
Referring to fig. 27, the image decoding apparatus may obtain prediction information of a current block (S2701). Specifically, the image decoding apparatus may parse (decode) information (prediction information) related to a prediction mode of the current block from the bitstream.
In addition, the image decoding apparatus may determine a prediction mode of the current block based on the prediction information (S2702).
In addition, the image decoding apparatus may determine whether the prediction mode is a predetermined mode (S2703). Based on the prediction mode of the current block being a predetermined mode, the image decoding apparatus may select a transform core for the current block (S2704). At this time, the selected transform core may be an inseparable main transform core for generating a residual block of the current block.
In one embodiment, the predetermined mode may be one of the CIIP mode or the GPM.
Then, the image decoding apparatus may generate the residual block of the current block by performing an inseparable main transform on the current block using the selected transform core (S2705).
As described above, in one embodiment, whether to apply the inseparable main transform may also be determined based on the number of pixels included in the current block. For example, the inseparable main transform may be applied if the number of pixels included in the current block is 16 or more but less than 1024; however, the number of pixels is not limited to the above example.
In addition, in one embodiment, the image decoding apparatus may obtain information specifying whether to apply the inseparable main transform to the current block based on the prediction mode of the current block being a predetermined mode. Whether to apply the inseparable primary transform to the current block may be determined based on the obtained information.
In one embodiment, the transform core or the set of transform cores applied to the inseparable main transform may be adaptively selected based on information about the predetermined mode. For example, when the predetermined mode is the CIIP mode, the information about the predetermined mode may include at least one of a CIIP weight or a CIIP intra mode. Here, the CIIP intra mode may be at least one of a planar mode, a CIIP_PDPC mode, a CIIP_DIMD mode, or a CIIP_TIMD mode. Alternatively, based on the predetermined mode being the GPM, the information about the predetermined mode may be at least one of a GPM index, an angle index, or a distance index.
In one embodiment, the image decoding apparatus may obtain transform core selection information of the current block so as to select a transform core with respect to the current block. The transform core selection information may be information for selecting one transform core from one or more non-separable transform cores and one or more separable transform cores.
In one embodiment, it may also be determined whether to apply the inseparable primary transform to the current block based on residual characteristics of the current block or neighboring blocks. At this time, the residual characteristic may be determined according to a position of the last significant coefficient in the residual block of the current block or the number of significant coefficients in the residual block of the current block.
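Putting the decoder-side steps S2703 through S2705 together, a minimal sketch could look as follows. It assumes orthonormal kernels stored per prediction mode; the separable fallback path is left as a placeholder, and the kernel storage scheme is an assumption:

```python
import numpy as np

def decode_residual(prediction_mode: str, coeffs: np.ndarray,
                    kernels: dict) -> np.ndarray:
    """Sketch of S2703-S2705: when the prediction mode is a predetermined
    mode (CIIP or GPM), select an inseparable-transform kernel and apply
    the inverse transform to reconstruct the residual block."""
    if prediction_mode in ("CIIP", "GPM"):          # S2703: mode check
        kernel = kernels[prediction_mode]           # S2704: kernel selection
        h, w = coeffs.shape
        # S2705: inverse non-separable transform (transpose of an
        # orthonormal forward kernel)
        return (kernel.T @ coeffs.reshape(-1)).reshape(h, w)
    raise NotImplementedError("separable inverse transform path")
```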
According to embodiments of the present disclosure, the inseparable main transform may be adaptively/selectively applied based on a predetermined prediction mode or various conditions, thereby improving coding efficiency.
In addition, according to the method of signaling inseparable main transform-related information of the present disclosure, an efficient signaling method can be provided in all cases where various types of transforms are applicable.
Although, for clarity of description, the above-described exemplary methods of the present disclosure are represented as a series of operations, this is not intended to limit the order in which the steps are performed, and the steps may be performed simultaneously or in a different order as necessary. In implementing the method according to the present disclosure, additional steps may be included together with the described steps, some of the described steps may be omitted, or some of the described steps may be combined with additional steps.
In the present disclosure, the image encoding apparatus or the image decoding apparatus that performs a predetermined operation (step) may perform an operation (step) of confirming an execution condition or status of the corresponding operation (step). For example, if it is described that the predetermined operation is performed when the predetermined condition is satisfied, the image encoding apparatus or the image decoding apparatus may perform the predetermined operation after determining whether the predetermined condition is satisfied.
The various embodiments of the present disclosure are not a list of all possible combinations but are intended to describe representative aspects of the present disclosure, and the matters described in the various embodiments may be applied independently or in combinations of two or more.
Various embodiments of the present disclosure may be implemented in hardware, firmware, software, or a combination thereof. Where the present disclosure is implemented in hardware, the present disclosure may be implemented in an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a general purpose processor, a controller, a microcontroller, a microprocessor, or the like.
In addition, the image decoding apparatus and the image encoding apparatus to which the embodiments of the present disclosure are applied may be included in a multimedia broadcast transmitting and receiving device, a mobile communication terminal, a home theater video device, a digital cinema video device, a surveillance camera, a video chat device, a real-time communication device such as video communication, a mobile streaming device, a storage medium, a camcorder, a video on demand (VoD) service providing device, an over-the-top (OTT) video device, an Internet streaming service providing device, a three-dimensional (3D) video device, a video telephony video device, a medical video device, and the like, and may be used to process video signals or data signals. For example, OTT video devices may include game consoles, Blu-ray players, Internet-access TVs, home theater systems, smartphones, tablet PCs, digital video recorders (DVRs), and the like.
Fig. 28 is a diagram illustrating a content streaming system to which an embodiment of the present disclosure is applicable.
As shown in fig. 28, a content streaming system to which embodiments of the present disclosure are applied may mainly include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.
The encoding server compresses content input from a multimedia input device such as a smart phone, a camera, a camcorder, etc. into digital data to generate a bit stream, and transmits the bit stream to the streaming server. As another example, the encoding server may be omitted when the multimedia input device, such as a smart phone, a camera, a camcorder, etc., directly generates the bitstream.
The bit stream may be generated by applying the image encoding method or the image encoding apparatus of the embodiments of the present disclosure, and the streaming server may temporarily store the bit stream in the course of transmitting or receiving the bit stream.
The streaming server transmits multimedia data to the user device through the web server based on a request of the user, and the web server serves as a medium informing the user of available services. When the user requests a desired service from the web server, the web server delivers the request to the streaming server, and the streaming server transmits multimedia data to the user. In this case, the content streaming system may include a separate control server, which controls commands/responses between devices in the content streaming system.
The streaming server may receive content from the media store and/or the encoding server. For example, the content may be received in real-time as it is received from the encoding server. In this case, in order to provide a smooth streaming service, the streaming server may store the bitstream for a predetermined time.
Examples of user devices may include mobile phones, smartphones, laptop computers, digital broadcast terminals, personal digital assistants (PDAs), portable multimedia players (PMPs), navigation devices, tablet PCs, ultrabooks, wearable devices (e.g., smart watches, smart glasses, head mounted displays), digital TVs, desktop computers, digital signage, and the like.
Each server in the content streaming system may operate as a distributed server, in which case the data received from each server may be distributed.
The scope of the present disclosure includes software or machine-executable instructions (e.g., operating systems, applications, firmware, programs, etc.) that cause operations of the methods according to various embodiments to be executed on a device or computer, and non-transitory computer-readable media having such software or instructions stored thereon and executable on the device or computer.
INDUSTRIAL APPLICABILITY
Embodiments of the present disclosure may be used to encode or decode images.

Claims (20)

1. An image decoding method performed by an image decoding apparatus, the image decoding method comprising the steps of:
obtaining prediction information of a current block;
determining a prediction mode of the current block based on the prediction information;
Selecting a transform core for generating a residual block of the current block based on the prediction mode being a predetermined mode; and
generating the residual block of the current block by performing an inseparable main transform on the current block based on the selected transform core.
2. The image decoding method of claim 1, wherein the predetermined mode is one of a combined inter and intra prediction (CIIP) mode or a geometric partitioning mode (GPM).
3. The image decoding method of claim 1, wherein whether to apply the inseparable primary transform is further determined based on a number of pixels included in the current block.
4. The image decoding method according to claim 1, further comprising obtaining information specifying whether to apply the inseparable main transform based on the prediction mode of the current block being the predetermined mode,
Wherein whether to apply the inseparable primary transform to the current block is determined based on the obtained information.
5. The image decoding method of claim 1, wherein a transform kernel or a set of transform kernels applied to the inseparable primary transform is adaptively selected based on information about the predetermined mode.
6. The image decoding method of claim 5, wherein, based on the predetermined mode being the CIIP mode, the information about the predetermined mode includes at least one of a CIIP weight or a CIIP intra mode.
7. The image decoding method of claim 6, wherein the CIIP intra mode is at least one of a planar mode, a CIIP_PDPC mode, a CIIP_DIMD mode, or a CIIP_TIMD mode.
8. The image decoding method of claim 5, wherein, based on the predetermined mode being the GPM, the information about the predetermined mode includes at least one of a GPM index, an angle index, or a distance index.
9. The image decoding method of claim 1,
Wherein the step of selecting the transform core for the current block includes obtaining transform core selection information for the current block, and
Wherein the transformation core selection information is information for selecting one transformation core from one or more non-separable transformation cores and one or more separable transformation cores.
10. The image decoding method of claim 1, wherein it is further determined whether to apply the inseparable primary transform to the current block based on residual characteristics of the current block or neighboring blocks.
11. The image decoding method of claim 10, wherein the residual characteristic is determined based on a position of a last significant coefficient in the residual block of the current block or a number of significant coefficients in the residual block of the current block.
12. An image decoding apparatus comprising a memory and at least one processor,
Wherein the at least one processor is configured to:
obtain prediction information of a current block;
determine a prediction mode of the current block based on the prediction information;
select a transform core for generating a residual block of the current block based on the prediction mode being a predetermined mode; and
generate the residual block of the current block by performing an inseparable main transform on the current block based on the selected transform core.
13. An image encoding method performed by an image encoding apparatus, the image encoding method comprising the steps of:
determining a prediction mode of the current block;
Selecting a transform core for encoding a residual block of the current block based on the prediction mode being a predetermined mode; and
encoding the residual block of the current block by performing an inseparable main transform on the current block based on the selected transform core.
14. The image encoding method of claim 13, wherein the predetermined mode is one of a combined inter and intra prediction (CIIP) mode or a geometric partitioning mode (GPM).
15. The image encoding method of claim 13, wherein whether to apply the inseparable primary transform is further determined based on a number of pixels included in the current block.
16. The image encoding method of claim 13, wherein a transform kernel or a set of transform kernels applied to the inseparable primary transform is adaptively selected based on information about the predetermined mode.
17. The image encoding method according to claim 16,
wherein, based on the predetermined mode being the CIIP mode, the information about the predetermined mode includes at least one of a CIIP weight or a CIIP intra mode, and
wherein the CIIP intra mode is at least one of a planar mode, a CIIP_PDPC mode, a CIIP_DIMD mode, or a CIIP_TIMD mode.
18. The image encoding method of claim 16, wherein, based on the predetermined mode being the GPM, the information about the predetermined mode includes at least one of a GPM index, an angle index, or a distance index.
19. A computer readable recording medium storing a bit stream generated by an image encoding method, the image encoding method comprising the steps of:
determining a prediction mode of the current block;
Selecting a transform core for encoding a residual block of the current block based on the prediction mode being a predetermined mode; and
encoding the residual block of the current block by performing an inseparable main transform on the current block based on the selected transform core.
20. A method of transmitting a bitstream generated by an image encoding method, the image encoding method comprising the steps of:
determining a prediction mode of the current block;
Selecting a transform core for encoding a residual block of the current block based on the prediction mode being a predetermined mode; and
encoding the residual block of the current block by performing an inseparable main transform on the current block based on the selected transform core.

Applications Claiming Priority (4)

- US 63/252,155, priority date 2021-10-05
- US 202163272673P, priority date 2021-10-27, filed 2021-10-27
- US 63/272,673, priority date 2021-10-27
- PCT/KR2022/014977 (WO2023059056A1), filed 2022-10-05: Non-separable primary transform-based image encoding/decoding method and device, and recording medium for storing bitstream

Publications (1)

- CN118056406A, published 2024-05-17

Family ID: 91050577



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination