CN111699682A - Method and apparatus for encoding and decoding using selective information sharing between channels


Info

Publication number
CN111699682A
CN111699682A (Application No. CN201880088927.1A)
Authority
CN
China
Prior art keywords
information
block
channel
transform
target block
Prior art date
Legal status
Pending
Application number
CN201880088927.1A
Other languages
Chinese (zh)
Inventor
全东山
姜晶媛
高玄硕
林成昶
李镇浩
李河贤
全炳宇
金晖容
朴智允
Current Assignee
Sungkyunkwan University School Industry Cooperation
Electronics and Telecommunications Research Institute ETRI
Sungkyunkwan University Research and Business Foundation
Original Assignee
Sungkyunkwan University School Industry Cooperation
Electronics and Telecommunications Research Institute ETRI
Priority date
Filing date
Publication date
Application filed by Sungkyunkwan University School Industry Cooperation, Electronics and Telecommunications Research Institute ETRI filed Critical Sungkyunkwan University School Industry Cooperation
Priority claimed from PCT/KR2018/015573 (WO2019112394A1)
Publication of CN111699682A

Classifications

    All of the following fall under H04N19/00 (methods or arrangements for coding, decoding, compressing or decompressing digital video signals), within section H (Electricity), class H04 (Electric communication technique), subclass H04N (Pictorial communication, e.g. television):
    • H04N19/463: Embedding additional information in the video signal during the compression process by compressing encoding parameters before transmission
    • H04N19/593: Predictive coding involving spatial prediction techniques
    • H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/122: Selection of transform size, e.g. 8x8 or 2x4x8 DCT; selection of sub-band transforms of varying structure or type
    • H04N19/129: Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
    • H04N19/157: Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176: The coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/186: The coding unit being a colour or a chrominance component
    • H04N19/60: Transform coding
    • H04N19/70: Syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/82: Filtering within a prediction loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A video decoding method, a video decoding apparatus, a video encoding method, and a video encoding apparatus are disclosed. Coding decision information of a representative channel of a target block is shared as the coding decision information of a target channel of the target block, and the target block is decoded using the coding decision information of the target channel. Because the coding decision information of the representative channel is shared with the other channels, the same coding decision information need not be signaled repeatedly, which improves the efficiency of encoding and decoding the target block.

Description

Method and apparatus for encoding and decoding using selective information sharing between channels
Technical Field
The following embodiments relate generally to a video decoding method and apparatus and a video encoding method and apparatus, and more particularly, to a video decoding method and apparatus and a video encoding method and apparatus using sharing of selective information between channels.
Background
With the continuous development of the information and communication industries, broadcasting services supporting high-definition (HD) resolution have spread throughout the world. As a result, a large number of users have become accustomed to high-resolution, high-definition images and videos.
To meet this demand for high definition, many organizations have accelerated the development of next-generation imaging devices. In addition to high-definition TV (HDTV) and full high-definition (FHD) TV, user interest in UHD TV, whose resolution is more than four times that of FHD TV, has also increased. With this growing interest, image encoding/decoding techniques for images of even higher resolution and definition are continually required.
The image encoding/decoding apparatus and method may use an inter prediction technique, an intra prediction technique, an entropy encoding technique, etc. in order to perform encoding/decoding on high resolution and high definition images. The inter prediction technique may be a technique for predicting values of pixels included in a target picture using a temporally preceding picture and/or a temporally succeeding picture. The intra prediction technique may be a technique for predicting values of pixels included in a target picture using information on the pixels in the target picture. The entropy coding technique may be a technique for allocating short codewords to frequently occurring symbols and long codewords to rarely occurring symbols.
Recently, demand for high-quality images capable of providing high resolution, a wider color space, and excellent image quality, such as ultra-high-definition (UHD) images, has increased in various application fields. As images trend toward higher resolution and quality, the amount of data required to represent them grows beyond that of existing image data. Whether image data is transmitted over communication media such as wired/wireless broadband lines, over broadcasting media such as satellite, terrestrial waves, an Internet Protocol (IP) network, a wireless network, cable, or a mobile communication network, or stored on media such as a compact disc (CD), digital versatile disc (DVD), universal serial bus (USB) medium, or HD-DVD, transmission and storage costs rise as the amount of image data increases.
As high-resolution, high-quality images come into use, efficient image encoding/decoding techniques are required to cope with this inevitable growth in image data while providing images of higher resolution and image quality.
Disclosure of Invention
Technical problem
Embodiments are directed to providing an encoding apparatus and method and a decoding apparatus and method using sharing of selective information between channels.
Technical scheme
According to an aspect, there is provided a decoding method comprising: sharing the coding decision information of the representative channel of the target block as the coding decision information of the target channel of the target block; and performing decoding on a target block using the coding decision information of the target channel.
The decoding method may further include: a bitstream including information about a target block is received.
The information about the target block may include coding decision information of the representative channel.
The information on the target block may not include coding decision information of the target channel.
The coding decision information of the representative channel may be transform skip information indicating whether a transform is to be skipped.
The coding decision information of the representative channel may indicate which transform is to be used for the transform block of the channel.
The coding decision information of the representative channel may be intra coding decision information of the representative channel.
The representative channel and the target channel may be channels in a YCbCr color space.
The representative channel may be a luminance channel.
The target channel may be a chrominance channel.
The representative channel may be a color channel having the highest correlation with the luminance signal.
The representative channel may be determined by an index in the bitstream indicating the selected representative channel.
The sharing operation may be performed when image attributes of the plurality of channels of the target block are similar to each other.
When the intra prediction mode of the chroma channel of the target block is the direct mode, the image attributes of the plurality of channels may be determined to be similar to each other.
When cross-channel prediction is used, the sharing operation may be performed.
Whether cross-channel prediction is used may be derived based on information obtained from the bitstream.
When cross-channel prediction is used, the sharing operation may be performed.
Whether to use cross-channel prediction may be determined based on an intra prediction mode of a target block.
When the intra prediction mode of the target block is one of the INTRA_CCLM mode, the INTRA_MMLM mode, and the INTRA_MFLM mode, cross-channel prediction may be used.
Whether sharing is to be performed may be determined based on the size of the target block.
The encoding decision information of a representative channel of the plurality of channels of the target block may be used for all of the plurality of channels.
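The parsing behavior implied by the above aspects may be illustrated with a short sketch. The following Python code is a minimal illustration under stated assumptions: the block structure, the flag reader, and the exact rule for enabling sharing are simplified stand-ins rather than syntax defined by this disclosure; only the mode names (direct mode, INTRA_CCLM, INTRA_MMLM, INTRA_MFLM) are taken from the text above.

from dataclasses import dataclass, field

# Chroma intra modes under which the channels' image attributes are treated
# as similar. The mode names come from the text above; everything else in
# this sketch (class layout, flag reader) is an illustrative assumption.
DIRECT_MODE = "DM"
CROSS_CHANNEL_MODES = {"INTRA_CCLM", "INTRA_MMLM", "INTRA_MFLM"}

@dataclass
class TargetBlock:
    representative_channel: str             # e.g., "Y" in a YCbCr color space
    chroma_intra_mode: str
    decisions: dict = field(default_factory=dict)

def sharing_enabled(block: TargetBlock) -> bool:
    """Sharing applies when channel attributes are likely similar: here,
    when chroma uses the direct mode or a cross-channel prediction mode."""
    return (block.chroma_intra_mode == DIRECT_MODE
            or block.chroma_intra_mode in CROSS_CHANNEL_MODES)

def decode_coding_decision(read_flag, block: TargetBlock, channel: str) -> int:
    """Return coding decision info (e.g., a transform-skip flag) for one
    channel, parsing the bitstream only when the flag is not shared."""
    rep = block.representative_channel
    if channel != rep and sharing_enabled(block):
        # Reuse the representative channel's decision: the same flag is
        # never signaled twice in the bitstream.
        return block.decisions[rep]
    block.decisions[channel] = read_flag()  # entropy-decode one flag
    return block.decisions[channel]

# Usage: the luma transform-skip flag is parsed once and reused for Cb, Cr.
bits = iter([1])                            # stub bitstream holding one flag
blk = TargetBlock("Y", "DM")
for ch in ("Y", "Cb", "Cr"):
    print(ch, decode_coding_decision(lambda: next(bits), blk, ch))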
According to another aspect, there is provided an encoding method comprising: determining coding decision information of a representative channel of a target block; and performing encoding on the target block using the encoding decision information of the representative channel, wherein the encoding decision information of the representative channel is shared with another channel of the target block.
The encoding method may further include: a bitstream including information on the target block is generated.
The information about the target block may include coding decision information of the representative channel.
The information on the target block may not include coding decision information of the further channel.
The representative channel and the further channel may be channels in a YCbCr color space.
According to another aspect, there is provided a computer-readable storage medium storing a bitstream for image decoding, the bitstream including information on a target block, wherein the information on the target block includes encoding decision information of a representative channel of the target block, wherein the encoding decision information of the representative channel of the target block is used and shared as the encoding decision information of the target channel of the target block, and wherein decoding of the target block is performed using the encoding decision information of the target channel.
Advantageous effects
An encoding apparatus and method and a decoding apparatus and method using sharing of selective information between channels are provided.
Drawings
Fig. 1 is a block diagram showing a configuration of an embodiment of an encoding apparatus to which the present disclosure is applied;
fig. 2 is a block diagram showing a configuration of an embodiment of a decoding apparatus to which the present disclosure is applied;
fig. 3 is a diagram schematically showing a partition structure of an image when the image is encoded and decoded;
fig. 4 is a diagram illustrating a form of a Prediction Unit (PU) that a Coding Unit (CU) can include;
fig. 5 is a diagram illustrating a form of a Transform Unit (TU) that can be included in a CU;
FIG. 6 illustrates partitioning of blocks according to an example;
FIG. 7 is a diagram for explaining an embodiment of an intra prediction process;
fig. 8 is a diagram for explaining positions of reference samples used in an intra prediction process;
fig. 9 is a diagram for explaining an embodiment of an inter prediction process;
FIG. 10 illustrates spatial candidates according to an embodiment;
fig. 11 illustrates an order of adding motion information of spatial candidates to a merge list according to an embodiment;
FIG. 12 illustrates a transform and quantization process according to an example;
FIG. 13 illustrates a diagonal scan according to an example;
FIG. 14 shows a horizontal scan according to an example;
FIG. 15 shows a vertical scan according to an example;
fig. 16 is a configuration diagram of an encoding device according to an embodiment;
fig. 17 is a configuration diagram of a decoding apparatus according to an embodiment;
fig. 18 is a flow diagram of a method for decoding coding decision information according to an embodiment;
fig. 19 is a flowchart of a decoding method for determining whether a transform is to be skipped, according to an embodiment;
fig. 20 is a flowchart of a decoding method for determining whether a transform is to be skipped with reference to an intra mode according to an embodiment;
FIG. 21 is a flow diagram of a method for sharing transformation selection information according to an embodiment;
FIG. 22 illustrates a single treeblock partition structure, according to an example;
FIG. 23 illustrates a dual treeblock partition structure, according to an example;
FIG. 24 illustrates a scheme for specifying a corresponding block based on a location in a corresponding region, according to an example;
FIG. 25 illustrates a scheme for specifying corresponding blocks based on areas in corresponding regions, according to an example;
FIG. 26 illustrates another scheme for specifying corresponding blocks based on areas in corresponding regions, according to an example;
FIG. 27 illustrates a scheme for specifying a corresponding block based on a form of the block in a corresponding region, according to an example;
FIG. 28 illustrates another scheme for specifying corresponding blocks based on the form of the blocks in the corresponding region, according to an example;
FIG. 29 illustrates a scheme for specifying a corresponding block based on an aspect ratio of blocks in a corresponding region, according to an example;
FIG. 30 illustrates another scheme for specifying corresponding blocks based on aspect ratios of blocks in corresponding regions, according to an example;
fig. 31 illustrates a scheme for specifying a corresponding block based on encoding characteristics of blocks in a corresponding region, according to an example;
FIG. 32 is a flow diagram of an encoding method according to an embodiment; and
fig. 33 is a flowchart of a decoding method according to an embodiment.
Detailed Description
The present invention may be variously modified and may have various embodiments, and specific embodiments will be described in detail below with reference to the accompanying drawings. It should be understood, however, that these examples are not intended to limit the invention to the particular forms disclosed, but to include all changes, equivalents, and modifications that are within the spirit and scope of the invention.
The following exemplary embodiments will be described in detail with reference to the accompanying drawings showing specific embodiments. These embodiments are described so that those of ordinary skill in the art to which this disclosure pertains will be readily able to practice them. It should be noted that the various embodiments are distinct from one another, but are not necessarily mutually exclusive. For example, particular shapes, structures, and characteristics described herein may be implemented as other embodiments without departing from the spirit and scope of the embodiments with respect to one embodiment. Further, it is to be understood that the location or arrangement of individual components within each disclosed embodiment can be modified without departing from the spirit and scope of the embodiments. Therefore, the appended detailed description is not intended to limit the scope of the disclosure, and the scope of exemplary embodiments is defined only by the appended claims and equivalents thereof, as they are properly described.
In the drawings, like numerals are used to designate the same or similar functions in various respects. The shapes, sizes, and the like of components in the drawings may be exaggerated for clarity of the description.
Terms such as "first" and "second" may be used to describe various components, but the components are not limited by the terms. The terms are only used to distinguish one component from another component. For example, a first component may be termed a second component without departing from the scope of the present description. Similarly, the second component may be referred to as the first component. The term "and/or" may include a combination of multiple related items or any one of multiple related items.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, the two elements may be directly connected or coupled to each other, or intervening elements may be present between them. It will be understood that when components are referred to as being "directly connected" or "directly coupled," no intervening components are present between the two components.
Further, components described in the embodiments are independently illustrated to represent different feature functions, but this does not mean that each component is formed of a separate hardware or software. That is, a plurality of components are individually arranged and included for convenience of description. For example, at least two of the plurality of components may be integrated into a single component. Instead, one component may be divided into a plurality of components. Embodiments in which a plurality of components are integrated or embodiments in which some components are separated are included in the scope of the present specification as long as they do not depart from the essence of the present specification.
Further, it should be noted that, in the exemplary embodiments, the expression that a component is described as "including" a specific component means that another component may be included within the scope of practical or technical spirit of the exemplary embodiments, but does not exclude the presence of components other than the specific component.
The terminology used in the description is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Singular references include plural references unless the context specifically indicates the contrary. In this specification, it is to be understood that terms such as "including" or "having" are only intended to indicate that there are features, numbers, steps, operations, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, operations, components, parts, or combinations thereof will be present or added.
The embodiments will be described in detail below with reference to the accompanying drawings so that those skilled in the art to which the embodiments pertain can easily practice the embodiments. In the following description of the embodiments, a detailed description of known functions or configurations incorporated herein will be omitted. In addition, the same reference numerals are used to designate the same components throughout the drawings, and repeated description of the same components will be omitted.
Hereinafter, "image" may represent a single picture constituting a video, or may represent the video itself. For example, "encoding and/or decoding of an image" may mean "encoding and/or decoding of a video", and may also mean "encoding and/or decoding of any one of a plurality of images constituting a video".
Hereinafter, the terms "video" and "moving picture" may be used to have the same meaning and may be used interchangeably with each other.
Hereinafter, the target image may be an encoding target image that is a target to be encoded and/or a decoding target image that is a target to be decoded. Further, the target image may be an input image input to the encoding apparatus or an input image input to the decoding apparatus.
Hereinafter, the terms "image", "picture", "frame", and "screen" may be used to have the same meaning and may be used interchangeably with each other.
Hereinafter, the target block may be an encoding target block (i.e., a target to be encoded) and/or a decoding target block (i.e., a target to be decoded). Furthermore, the target block may be a current block, i.e., a target that is currently to be encoded and/or decoded. Here, the terms "target block" and "current block" may be used to have the same meaning and may be used interchangeably with each other.
Hereinafter, the terms "block" and "unit" may be used to have the same meaning and may be used interchangeably with each other. Alternatively, a "block" may represent a particular unit.
Hereinafter, the terms "region" and "fragment" are used interchangeably with each other.
Hereinafter, the specific signal may be a signal indicating a specific block. For example, the original signal may be a signal indicating a target block. The prediction signal may be a signal indicating a prediction block. The residual signal may be a signal indicating a residual block.
In the following embodiments, particular information, data, flags, elements, and attributes may have their respective values. A value of "0" corresponding to each of the information, data, flags, elements, and attributes may indicate a logical false or first predefined value. In other words, the values "0", false, logical false, and the first predefined value may be used interchangeably with each other. A value of "1" corresponding to each of the information, data, flags, elements, and attributes may indicate a logical true or a second predefined value. In other words, the values "1", true, logically true, and second predefined values may be used interchangeably with each other.
When a variable such as i or j is used to indicate a row, column, or index, the value i may be an integer 0 or greater than 0, or may be an integer 1 or greater than 1. In other words, in an embodiment, each of the rows, columns, and indexes may count from 0, or may count from 1.
Hereinafter, terms to be used in the embodiments will be described.
An encoder: the encoder represents an apparatus for performing encoding.
A decoder: the decoder represents means for performing decoding.
A unit: the "unit" may represent a unit of image encoding and decoding. The terms "unit" and "block" may be used to have the same meaning and may be used interchangeably with each other.
The "cell" may be an array of M × N spots. M and N may be positive integers, respectively. The term "cell" may generally denote a two-dimensional (2D) array of spots.
During the encoding and decoding of an image, a "unit" may be a region generated by partitioning an image. In other words, a "cell" may be a region designated in one image. A single image may be partitioned into multiple cells. Alternatively, one image may be partitioned into sub-parts, and a unit may represent each partitioned sub-part when encoding or decoding is performed on the partitioned sub-parts.
During the encoding and decoding of the image, a predefined processing may be performed on each unit according to the type of unit.
Unit types may be classified into macro-units, Coding Units (CUs), Prediction Units (PUs), residual units, Transform Units (TUs), etc., according to function. Alternatively, the unit may represent a block, a macroblock, a coding tree unit, a coding tree block, a coding unit, a coding block, a prediction unit, a prediction block, a residual unit, a residual block, a transform unit, a transform block, and the like according to functions.
The term "unit" may denote information including a luminance (luma) component block, a chrominance (chroma) component block corresponding to the luminance component block, and syntax elements for the respective blocks such that the unit is designated to be distinguished from the blocks.
Units may be implemented in various sizes and shapes. In particular, the shape of a unit may include not only a square but also any geometric shape representable in two dimensions (2D), such as a rectangle, trapezoid, triangle, or pentagon.
Further, the unit information may include one or more of a type of the unit, a size of the unit, a depth of the unit, an encoding order of the unit, a decoding order of the unit, and the like. For example, the type of the unit may indicate one of a CU, a PU, a residual unit, and a TU.
A unit may be partitioned into sub-units, each sub-unit having a size smaller than the size of the associated unit.
Depth: depth may represent the degree to which a unit is partitioned. Furthermore, when units are represented in a tree structure, the depth of a unit may indicate the level at which that unit exists.
The unit partition information may include a depth indicating the depth of the unit. The depth may indicate the number of times the unit is partitioned and/or the degree to which the unit is partitioned.
In the tree structure, the depth of the root node can be considered to be the smallest and the depth of the leaf nodes the largest.
A single unit may be hierarchically partitioned into a plurality of sub-units, while the single unit has tree structure based depth information. In other words, a unit and a child unit generated by partitioning the unit may correspond to a node and a child node of the node, respectively. Each partitioned sub-unit may have a unit depth. Since the depth indicates the number of times the unit is partitioned and/or the degree to which the unit is partitioned, the partition information of the sub-unit may include information on the size of the sub-unit.
In the tree structure, the top node may correspond to the initial node before partitioning. The top node may be referred to as the "root node". Further, the root node may have a minimum depth value. Here, the depth of the top node may be level "0".
A node with a depth of level "1" may represent a unit generated when an initial unit is partitioned once. A node with a depth of level "2" may represent a unit generated when an initial unit is partitioned twice.
A leaf node with a depth of level "n" may represent a unit generated when an initial unit is partitioned n times.
A leaf node may be the bottom node that cannot be partitioned further. The depth of a leaf node may be a maximum level. For example, the predefined value for the maximum level may be 3.
QT depth may represent the depth of a quadtree partition. BT depth may represent the depth of a binary-tree partition. TT depth may represent the depth of a ternary-tree partition.
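For intuition, the following Python sketch shows how a single partition of each type divides a unit and how depth accumulates under repeated quadtree splits. The 128x128 root size and the vertical orientation of the BT/TT splits are illustrative assumptions, not values fixed by this description.

def child_sizes(width, height, split):
    """Return the (width, height) of the sub-units produced by one split."""
    if split == "QT":   # quad-partition: four half-width, half-height children
        return [(width // 2, height // 2)] * 4
    if split == "BT":   # binary partition: two children (vertical split shown)
        return [(width // 2, height)] * 2
    if split == "TT":   # ternary partition: 1/4, 1/2, 1/4 (vertical split)
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    raise ValueError(split)

# A 128x128 root node (depth 0) quad-split twice yields 32x32 units at depth 2.
size = (128, 128)
for depth in (1, 2):
    size = child_sizes(size[0], size[1], "QT")[0]
    print(f"QT depth {depth}: {size[0]}x{size[1]}")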
Sample: a sample may be the elementary unit constituting a block. A sample may be represented by a value from 0 to 2^Bd - 1 according to the bit depth (Bd); for example, values range from 0 to 255 when Bd is 8, and from 0 to 1023 when Bd is 10.
The samples may be pixels or pixel values.
In the following, the terms "pixel" and "sample" may be used with the same meaning and may be used interchangeably with each other.
Coding Tree Unit (CTU): a CTU may be composed of a single luma component (Y) coding tree block and two chroma component (Cb, Cr) coding tree blocks associated with the luma coding tree block. Further, the CTU may represent information that includes the above-described blocks and the syntax elements for each block.
-each Coding Tree Unit (CTU) may be partitioned using one or more partitioning methods, such as Quadtree (QT), Binary Tree (BT) and Ternary Tree (TT), in order to configure sub-units, such as coding units, prediction units and transform units. Further, each coding tree unit may be partitioned using a multi-type tree using one or more partitioning methods.
"CTU" may be used as a term designating a pixel block as a processing unit in an image decoding and encoding process, such as in the case of partitioning an input image.
Coding Tree Block (CTB): "CTB" may be used as a term designating any one of a Y coding tree block, a Cb coding tree block, and a Cr coding tree block.
Neighboring block: a neighboring block (or adjacent block) may represent a block adjacent to the target block. A neighboring block may also represent a reconstructed neighboring block.
Hereinafter, the terms "adjacent block" and "neighboring block" may be used with the same meaning and may be used interchangeably with each other.
Spatially adjacent blocks: the spatially neighboring block may be a block spatially adjacent to the target block. The neighboring blocks may include spatially neighboring blocks.
The target block and the spatially neighboring blocks may be comprised in the target picture.
Spatially neighboring blocks may represent blocks whose boundaries are in contact with the target block or blocks which are located within a predetermined distance from the target block.
The spatially neighboring blocks may represent blocks adjacent to the vertex of the target block. Here, the blocks adjacent to the vertex of the target block may represent blocks vertically adjacent to an adjacent block horizontally adjacent to the target block or blocks horizontally adjacent to an adjacent block vertically adjacent to the target block.
Temporal neighboring blocks: the temporally adjacent block may be a block temporally adjacent to the target block. The neighboring blocks may include temporally neighboring blocks.
The temporally adjacent blocks may comprise co-located blocks (col blocks).
A col block may be a block in a previously reconstructed co-located picture (col picture). The location of the col block in the col picture may correspond to the location of the target block in the target picture. Alternatively, the location of the col block in the col picture may be equal to the location of the target block in the target picture. The col picture may be a picture included in the reference picture list.
The temporally neighboring blocks may be blocks temporally adjacent to spatially neighboring blocks of the target block.
A prediction unit: the prediction unit may be a basic unit for prediction such as inter prediction, intra prediction, inter compensation, intra compensation, and motion compensation.
A single prediction unit may be divided into multiple partitions or sub-prediction units of smaller size. The plurality of partitions may also be basic units in performing prediction or compensation. The partition generated by dividing the prediction unit may also be the prediction unit.
Prediction unit partitioning: the prediction unit partition may be a shape into which the prediction unit is divided.
Reconstructed neighboring unit: a reconstructed neighboring unit may be a unit that has already been decoded and reconstructed around the target unit.
A reconstructed neighboring unit may be a unit spatially adjacent to the target unit or temporally adjacent to the target unit.
A reconstructed spatially neighboring unit may be a unit in the target picture that has already been reconstructed through encoding and/or decoding.
A reconstructed temporally neighboring unit may be a unit in a reference image that has already been reconstructed through encoding and/or decoding. The position of the reconstructed temporally neighboring unit in the reference image may be the same as, or may correspond to, the position of the target unit in the target picture.
Parameter set: the parameter set may be header information in the structure of the bitstream. For example, parameter sets may include video parameter sets, sequence parameter sets, picture parameter sets, adaptive parameter sets, and the like.
Further, the parameter set may include slice header information and tile (parallel block) header information.
Rate-distortion optimization: an encoding device may use rate-distortion optimization to provide high encoding efficiency by evaluating combinations of: the size of a coding unit (CU), a prediction mode, the size of a prediction unit (PU), motion information, and the size of a transform unit (TU).
The rate-distortion optimization scheme may calculate the rate-distortion cost of each combination to select the optimal combination from the combinations. The rate-distortion cost may be calculated using equation 1 below. In general, the combination that minimizes the rate-distortion cost may be selected as the optimal combination under the rate-distortion optimization scheme.
[ equation 1]
D+λ*R
D may represent distortion. D may be the average of the squares of the differences between the original transform coefficients and the reconstructed transform coefficients in the transform unit (i.e., the mean square error).
-R may represent the rate, which may represent the bit rate using the relevant context information.
λ represents the Lagrange multiplier. R may include not only coding parameter information, such as a prediction mode, motion information, and a coded block flag, but also the bits generated by encoding the transform coefficients.
The coding device may perform processes such as inter-and/or intra-prediction, transformation, quantization, entropy coding, inverse quantization (dequantization) and inverse transformation in order to calculate the exact D and R. These processes can add significant complexity to the encoding device.
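As an illustration of Equation 1, the following Python sketch selects the candidate combination that minimizes the rate-distortion cost. The candidate measurements and the λ value are hypothetical placeholders; a real encoder derives λ from the quantization parameter.

def rd_cost(distortion, rate_bits, lam):
    """J = D + lambda * R: D is the mean squared error, R the bit cost."""
    return distortion + lam * rate_bits

def select_best(candidates, lam=10.0):
    """Pick the combination (mode, unit sizes, ...) minimizing the RD cost."""
    return min(candidates, key=lambda c: rd_cost(c["D"], c["R"], lam))

# Usage: three hypothetical (distortion, rate) measurements for one block.
candidates = [
    {"name": "intra_4x4",  "D": 120.0, "R": 30.0},   # J = 420
    {"name": "intra_8x8",  "D": 150.0, "R": 18.0},   # J = 330
    {"name": "inter_skip", "D": 210.0, "R": 2.0},    # J = 230 -> chosen
]
print(select_best(candidates)["name"])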
-a bit stream: the bitstream may represent a stream of bits including encoded image information.
Parsing: parsing may be the determination of the value of a syntax element, performed by entropy-decoding the bitstream. Alternatively, the term "parsing" may denote such entropy decoding itself.
Symbol: the symbol may be at least one of a syntax element, an encoding parameter, and a transform coefficient of the encoding target unit and/or the decoding target unit. Further, the symbol may be a target of entropy encoding or a result of entropy decoding.
Reference picture: a reference picture may be an image referenced by a unit in order to perform inter prediction or motion compensation. Alternatively, a reference picture may be an image including a reference unit that is referred to by the target unit to perform inter prediction or motion compensation.
Hereinafter, the terms "reference picture" and "reference image" may be used to have the same meaning and may be used interchangeably with each other.
Reference picture list: the reference picture list may be a list including one or more reference pictures used for inter prediction or motion compensation.
The types of reference picture lists may include List Combined (LC), List 0 (L0), List 1 (L1), List 2 (L2), List 3 (L3), and so on.
For inter prediction, one or more reference picture lists may be used.
Inter prediction indicator: the inter prediction indicator may indicate an inter prediction direction for the target unit. The inter prediction may be one of unidirectional prediction and bidirectional prediction. Alternatively, the inter prediction indicator may represent the number of reference pictures used to generate the prediction unit of the target unit. Alternatively, the inter prediction indicator may represent the number of prediction blocks used for inter prediction or motion compensation of the target unit.
Reference picture index: the reference picture index may be an index indicating a specific reference picture in the reference picture list.
Motion Vector (MV): the motion vector may be a 2D vector for inter prediction or motion compensation. The motion vector may represent an offset between the target image and the reference image.
For example, an MV may be represented as (mv_x, mv_y), where mv_x indicates the horizontal component and mv_y indicates the vertical component.
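For example, a minimal sketch of using an MV as a 2D offset between the target image and the reference image (integer-pel motion assumed; interpolation and bounds handling omitted):

def motion_compensate_pixel(ref, x, y, mv_x, mv_y):
    # The prediction for a target pixel comes from the reference pixel
    # displaced by (mv_x, mv_y).
    return ref[y + mv_y][x + mv_x]

ref = [[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]
print(motion_compensate_pixel(ref, x=0, y=0, mv_x=1, mv_y=2))   # prints 7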
Search range: the search range may be a 2D region in which a search for MVs is performed during inter prediction. For example, the size of the search range may be M × N, where M and N are positive integers.
Motion vector candidates: the motion vector candidate may be a block that is a prediction candidate when the motion vector is predicted or a motion vector of a block that is a prediction candidate.
The motion vector candidate may be comprised in a motion vector candidate list.
Motion vector candidate list: the motion vector candidate list may be a list configured using one or more motion vector candidates.
Motion vector candidate index: the motion vector candidate index may be an indicator for indicating a motion vector candidate in the motion vector candidate list. Alternatively, the motion vector candidate index may be an index of a motion vector predictor.
Motion information: motion information may include at least one of a motion vector, a reference picture index, an inter prediction indicator, a reference picture list, a reference picture, a motion vector candidate, a motion vector candidate index, a merge candidate, and a merge index.
Merge candidate list: the merge candidate list may be a list configured using one or more merge candidates.
Merge candidate: a merge candidate may be a spatial merge candidate, a temporal merge candidate, a combined bi-predictive merge candidate, a zero merge candidate, or the like. A merge candidate may include motion information such as prediction type information, a reference picture index for each list, and a motion vector.
Merge index: the merge index may be an indicator for indicating a merge candidate in the merge candidate list.
The merging index may indicate a reconstruction unit for deriving a merging candidate between a reconstruction unit spatially neighboring the target unit and a reconstruction unit temporally neighboring the target unit.
The merge index may indicate at least one of pieces of motion information of the merge candidates.
A transformation unit: the transform unit may be a basic unit of residual signal encoding and/or residual signal decoding, such as transform, inverse transform, quantization, inverse quantization, transform coefficient encoding, and transform coefficient decoding. A single transform unit may be partitioned into multiple transform units having smaller sizes.
Scaling: scaling may refer to the process of multiplying a transform coefficient level by a factor.
-as a result of scaling the transform coefficient level, transform coefficients may be generated. Scaling may also be referred to as "inverse quantization".
Quantization Parameter (QP): the quantization parameter may be a value used to generate a transform coefficient level for a transform coefficient in quantization. Alternatively, the quantization parameter may also be a value used to generate a transform coefficient by scaling the transform coefficient level in inverse quantization. Alternatively, the quantization parameter may be a value mapped to a quantization step.
Delta quantization parameter: the delta quantization parameter is the difference between the quantization parameter of the target unit and the predicted quantization parameter.
Scanning: scanning may represent a method of arranging the order of coefficients in a unit, block, or matrix. For example, the method of arranging a 2D array into the form of a one-dimensional (1D) array may be referred to as "scanning". Alternatively, the method of arranging a 1D array into the form of a 2D array may also be referred to as "scanning" or "inverse scanning".
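For illustration, the following sketch performs one common form of scanning, reading an N×N block of coefficients along its anti-diagonals; the horizontal and vertical scans mentioned elsewhere in this description differ only in traversal order:

def diagonal_scan(block):
    """Read an NxN block along its anti-diagonals (2D array -> 1D array)."""
    n = len(block)
    order = []
    for s in range(2 * n - 1):          # s = row + col indexes each diagonal
        for row in range(n):
            col = s - row
            if 0 <= col < n:
                order.append(block[row][col])
    return order

coeffs = [[9, 5, 1],
          [4, 2, 0],
          [1, 0, 0]]
print(diagonal_scan(coeffs))            # [9, 5, 4, 1, 2, 1, 0, 0, 0]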
Transform coefficients: the transform coefficient may be a coefficient value generated when the encoding apparatus performs the transform. Alternatively, the transform coefficient may be a coefficient value generated when the decoding apparatus performs at least one of entropy decoding and inverse quantization.
Quantized levels generated by applying quantization to the transform coefficients or the residual signal or quantized transform coefficient levels may also be included in the meaning of the term "transform coefficients".
Level of quantization: the level of quantization may be a value generated when the encoding apparatus performs quantization on the transform coefficient or the residual signal. Alternatively, the quantized level may be a value that is a target of inverse quantization when the decoding apparatus performs inverse quantization.
The quantized transform coefficient levels as a result of the transform and quantization may also be included in the meaning of quantized levels.
Non-zero transform coefficients: the non-zero transform coefficient may be a transform coefficient having a value other than 0, or may be a transform coefficient level having a value other than 0. Alternatively, the non-zero transform coefficient may be a transform coefficient whose value is not 0 in magnitude, or may be a transform coefficient level whose value is not 0 in magnitude.
Quantization matrix: the quantization matrix may be a matrix used in a quantization process or an inverse quantization process in order to improve subjective image quality or objective image quality of an image. The quantization matrix may also be referred to as a "scaling list".
Quantization matrix coefficients: the quantization matrix coefficient may be each element in the quantization matrix. The quantized matrix coefficients may also be referred to as "matrix coefficients".
Default matrix: the default matrix may be a quantization matrix predefined by the encoding device and the decoding device.
Non-default matrix: the non-default matrix may be a quantization matrix that is not predefined by the encoding device and the decoding device. The non-default matrix may be signaled by the encoding device to the decoding device.
Most Probable Mode (MPM): an MPM may represent an intra prediction mode that has a high probability of being used for intra prediction of the target block.
The encoding apparatus and the decoding apparatus may determine one or more MPMs based on the encoding parameters related to the target block and the attributes of the entity related to the target block.
The encoding apparatus and the decoding apparatus may determine one or more MPMs based on an intra prediction mode of a reference block. The reference block may include a plurality of reference blocks. The plurality of reference blocks may include a spatially neighboring block adjacent to the left side of the target block and a spatially neighboring block adjacent to the upper side of the target block. In other words, one or more different MPMs may be determined according to which intra prediction modes have been used for the reference block.
One or more MPMs may be determined in the same manner in both the encoding device and the decoding device. That is, the encoding apparatus and the decoding apparatus may share the same MPM list including one or more MPMs.
List of MPMs: the MPM list may be a list including one or more MPMs. The number of one or more MPMs in the MPM list may be predefined.
MPM indicator: the MPM indicator may indicate an MPM to be used for intra prediction for the target block among one or more MPMs in the MPM list. For example, the MPM indicator may be an index for an MPM list.
Since the MPM list is determined in the same manner in both the encoding apparatus and the decoding apparatus, it may not be necessary to transmit the MPM list itself from the encoding apparatus to the decoding apparatus.
The MPM indicator may be signaled from the encoding device to the decoding device. Since the MPM indicator is signaled, the decoding apparatus may determine an MPM to be used for intra prediction for the target block among MPMs in the MPM list.
MPM usage indicator: the MPM usage indicator may indicate whether an MPM usage mode is to be used for prediction for the target block. The MPM use mode may be a mode that determines an MPM to be used for intra prediction for the target block using the MPM list.
The MPM usage indicator may be signaled from the encoding device to the decoding device.
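A minimal sketch of such a shared MPM derivation follows; the list length of 3, the fallback modes, and the mode numbering are illustrative assumptions, since the description requires only that the encoding apparatus and the decoding apparatus construct the same list:

PLANAR, DC, VERTICAL = 0, 1, 50         # illustrative mode numbers

def build_mpm_list(left_mode, above_mode):
    """Derive the MPM list from the intra modes of the left and above
    reference blocks. This runs identically in encoder and decoder, so only
    the MPM indicator (an index into this list) needs to be signaled."""
    mpms = []
    for m in (left_mode, above_mode, PLANAR, DC, VERTICAL):
        if m is not None and m not in mpms:
            mpms.append(m)
        if len(mpms) == 3:
            break
    return mpms

print(build_mpm_list(left_mode=18, above_mode=18))   # [18, 0, 1]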
Signaling: "signaling" may mean that information is sent from an encoding device to a decoding device. Alternatively, "signaling" may mean that the information is included in a bitstream or storage medium. The information signaled by the encoding device may be used by the decoding device.
Fig. 1 is a block diagram showing a configuration of an embodiment of an encoding apparatus to which the present disclosure is applied.
The encoding device 100 may be an encoder, a video encoding device, or an image encoding device. A video may comprise one or more images (pictures). The encoding apparatus 100 may sequentially encode one or more images of a video.
Referring to fig. 1, the encoding apparatus 100 includes an inter prediction unit 110, an intra prediction unit 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization (dequantization) unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.
The encoding apparatus 100 may perform encoding on a target image using an intra mode and/or an inter mode.
Further, the encoding apparatus 100 may generate a bitstream including information on encoding by encoding the target image, and may output the generated bitstream. The generated bitstream may be stored in a computer-readable storage medium and may be streamed through a wireless/wired transmission medium.
When the intra mode is used as the prediction mode, the switch 115 may switch to the intra mode. When the inter mode is used as the prediction mode, the switch 115 may switch to the inter mode.
The encoding apparatus 100 may generate a prediction block of a target block. Further, after the prediction block has been generated, the encoding apparatus 100 may encode a residual between the target block and the prediction block.
When the prediction mode is the intra mode, the intra prediction unit 120 may use pixels of previously encoded/decoded neighboring blocks around the target block as reference samples. The intra prediction unit 120 may perform spatial prediction on the target block using the reference samples, and may generate prediction samples for the target block via the spatial prediction.
The inter prediction unit 110 may include a motion prediction unit and a motion compensation unit.
When the prediction mode is the inter mode, the motion prediction unit may search the reference image for the region that best matches the target block during the motion prediction process, and may derive a motion vector between the target block and the found region.
The reference image may be stored in the reference picture buffer 190. More specifically, when encoding and/or decoding of a reference image has been processed, the reference image may be stored in the reference picture buffer 190.
The motion compensation unit may generate the prediction block for the target block by performing motion compensation using the motion vector. Here, the motion vector may be a two-dimensional (2D) vector for inter prediction. Further, the motion vector may represent an offset between the target image and the reference image.
When the motion vector has a fractional (non-integer) value, the motion prediction unit and the motion compensation unit may generate the prediction block by applying an interpolation filter to a partial region of the reference image. To perform inter prediction or motion compensation, it may be determined, on a CU basis, which of a skip mode, a merge mode, an Advanced Motion Vector Prediction (AMVP) mode, and a current picture reference mode is used as the method for predicting and compensating for the motion of a PU included in the CU, and inter prediction or motion compensation may then be performed according to the determined mode.
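For illustration only, the sketch below shows fractional-sample motion compensation with a simple bilinear (2-tap) interpolation filter; practical codecs typically use longer filters (for example, 8-tap), and the function and parameter names here are hypothetical rather than part of the disclosed embodiments.

```python
import numpy as np

def motion_compensate(ref, x, y, w, h, mv_x4, mv_y4):
    """Fetch a w x h prediction block from reference image 'ref' for a
    block at (x, y), displaced by a motion vector given in 1/4-sample
    units. Bilinear blend of the four nearest integer samples; bounds
    checking is omitted for brevity."""
    ix, iy = mv_x4 >> 2, mv_y4 >> 2                  # integer-sample part
    fx, fy = (mv_x4 & 3) / 4.0, (mv_y4 & 3) / 4.0    # fractional part
    p = ref[y + iy:y + iy + h + 1, x + ix:x + ix + w + 1].astype(float)
    top = (1 - fx) * p[:h, :w] + fx * p[:h, 1:w + 1]
    bot = (1 - fx) * p[1:h + 1, :w] + fx * p[1:h + 1, 1:w + 1]
    return np.rint((1 - fy) * top + fy * bot).astype(ref.dtype)
```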
The subtractor 125 may generate a residual block, wherein the residual block is a difference between the target block and the prediction block. The residual block may also be referred to as a "residual signal".
The residual signal may be the difference between the original signal and the predicted signal. Alternatively, the residual signal may be a signal generated by transforming or quantizing the difference between the original signal and the prediction signal or a signal generated by transforming and quantizing the difference. The residual block may be a residual signal for a block unit.
The transform unit 130 may generate a transform coefficient by transforming the residual block, and may output the generated transform coefficient. Here, the transform coefficient may be a coefficient value generated by transforming the residual block.
The transformation unit 130 may use one of a plurality of predefined transformation methods when performing the transformation.
The plurality of predefined transform methods may include Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Karhunen-Loeve transform (KLT), and the like.
The transform method for transforming the residual block may be determined according to at least one of the encoding parameters for the target block and/or the neighboring blocks. For example, the transform method may be determined based on at least one of an inter prediction mode for the PU, an intra prediction mode for the PU, a size of the TU, and a shape of the TU. Alternatively, transform information indicating a transform method may be signaled from the encoding apparatus 100 to the decoding apparatus 200.
When the transform skip mode is used, the transform unit 130 may omit an operation of transforming the residual block.
By performing quantization on the transform coefficients, quantized transform coefficient levels or quantized levels may be generated. Hereinafter, in the embodiment, each of the quantized transform coefficient level and the quantized level may also be referred to as a "transform coefficient".
The quantization unit 140 may generate a quantized transform coefficient level or a quantized level by quantizing the transform coefficient according to a quantization parameter. The quantization unit 140 may output the generated quantized transform coefficient level or the quantized level. In this case, the quantization unit 140 may quantize the transform coefficient using a quantization matrix.
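As a numerical illustration of this step (not the disclosed quantizer itself), the following sketch assumes an HEVC-style step size that roughly doubles for every 6 quantization parameter (QP) units, and omits the integer scaling tables and quantization matrices used in practice.

```python
def qstep(qp):
    # Approximate HEVC-style step size: doubles every 6 QP units.
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeff, qp):
    # Round-to-nearest scalar quantization of one transform coefficient.
    s = qstep(qp)
    return int(coeff / s + (0.5 if coeff >= 0 else -0.5))

def dequantize(level, qp):
    # Inverse quantization: reconstruct an approximate coefficient.
    return level * qstep(qp)

# e.g., qstep(28) == 16.0; quantize(100.0, 28) -> 6; dequantize(6, 28) -> 96.0
```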
The entropy encoding unit 150 may generate a bitstream by performing probability distribution-based entropy encoding based on the values calculated by the quantization unit 140 and/or the encoding parameter values calculated in the encoding process. The entropy encoding unit 150 may output the generated bitstream.
The entropy encoding unit 150 may perform entropy encoding on information about pixels of an image and information required for decoding the image. For example, information required for decoding an image may include syntax elements and the like.
When entropy coding is applied, fewer bits may be allocated to more frequently occurring symbols and more bits may be allocated to less frequently occurring symbols. Since the symbols are represented by this allocation, the size of the bit string for the target symbol to be encoded can be reduced. Accordingly, the compression performance of video encoding can be improved by entropy encoding.
Also, in order to perform entropy encoding, the entropy encoding unit 150 may use an encoding method such as Exponential-Golomb coding, Context-Adaptive Variable-Length Coding (CAVLC), or Context-Adaptive Binary Arithmetic Coding (CABAC). For example, the entropy encoding unit 150 may perform entropy encoding using a Variable-Length Coding/code (VLC) table. For example, the entropy encoding unit 150 may derive a binarization method for a target symbol. Furthermore, the entropy encoding unit 150 may derive a probability model for a target symbol/bin. The entropy encoding unit 150 may perform arithmetic encoding using the derived binarization method, probability model, and context model.
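As a concrete instance of the bit-allocation principle described above, the following sketch produces 0th-order Exponential-Golomb codewords, in which small (more frequent) symbol values receive shorter codes; it is illustrative only.

```python
def exp_golomb_0(value):
    """0th-order Exp-Golomb codeword (as a bit string) for an unsigned
    integer: a prefix of leading zeros followed by the binary form of
    value + 1."""
    v = value + 1
    prefix_len = v.bit_length() - 1
    return "0" * prefix_len + format(v, "b")

# exp_golomb_0(0) -> '1', exp_golomb_0(1) -> '010', exp_golomb_0(4) -> '00101'
```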
The entropy encoding unit 150 may transform the coefficients in the form of 2D blocks into the form of 1D vectors by a transform coefficient scanning method so as to encode the quantized transform coefficient levels.
The encoding parameter may be information required for encoding and/or decoding. The encoding parameter may include information encoded by the encoding apparatus 100 and transmitted from the encoding apparatus 100 to the decoding apparatus, and may also include information that may be derived in an encoding or decoding process. For example, the information sent to the decoding device may include syntax elements.
The encoding parameters may include not only information (or flags or indexes), such as syntax elements, that is encoded by the encoding apparatus and signaled by the encoding apparatus to the decoding apparatus, but also information derived in the encoding or decoding process. In addition, the encoding parameters may include information required to encode or decode the image. For example, the encoding parameters may include at least one of the following values, a combination of the following values, or statistics of the following values: the size of a unit/block, the depth of a unit/block, partition information of a unit/block, the partition structure of a unit/block, information indicating whether a unit/block is partitioned in a quad-tree structure, information indicating whether a unit/block is partitioned in a binary-tree structure, the partitioning direction of a binary-tree structure (horizontal or vertical), the partitioning form of a binary-tree structure (symmetric or asymmetric), information indicating whether a unit/block is partitioned in a ternary-tree structure, the partitioning direction of a ternary-tree structure (horizontal or vertical), the partitioning form of a ternary-tree structure (symmetric or asymmetric, etc.), information indicating whether a unit/block is partitioned in a composite-tree structure, the combination and direction (horizontal, vertical, etc.) of the partitions of a composite-tree structure, the prediction scheme (intra prediction or inter prediction), the intra prediction mode/direction, the reference sample filtering method, the prediction block boundary filtering method, the filter taps for filtering, the filter coefficients for filtering, the inter prediction mode, motion information, a motion vector, a reference picture index, the inter prediction direction, an inter prediction indicator, a reference picture list, a reference picture, a motion vector predictor, a motion vector prediction candidate, a motion vector candidate list, information indicating whether a merge mode is used, a merge candidate list, information indicating whether a skip mode is used, the type of an interpolation filter, the taps of an interpolation filter, the filter coefficients of an interpolation filter, the magnitude of a motion vector, the accuracy of the motion vector representation, the transform type, the transform size, information indicating whether a primary transform is used, information indicating whether an additional (secondary) transform is used, a transform coefficient, a motion vector prediction mode, first transform selection information (or a first transform index), second transform selection information (or a second transform index), information indicating the presence or absence of a residual signal, a coded block pattern, a coded block flag, a quantization parameter, a quantization matrix, information about an in-loop filter, information indicating whether an in-loop filter is applied, the coefficients of an in-loop filter, the taps of an in-loop filter, the shape/form of an in-loop filter, information indicating whether a deblocking filter is applied, the coefficients of a deblocking filter, the taps of a deblocking filter, the deblocking filter strength, the shape/form of a deblocking filter, information indicating whether an adaptive sample offset is applied, the value of an adaptive sample offset, the class of an adaptive sample offset, the type of an adaptive sample offset, information indicating whether an adaptive loop filter is applied, the coefficients of an adaptive loop filter, the taps of an adaptive loop filter, the shape/form of an adaptive loop filter, the binarization/inverse-binarization method, the context model decision method, the context model update method, information indicating whether a normal mode is performed, information indicating whether a bypass mode is performed, a context bin, a bypass bin, the transform coefficient level scanning method, the image display/output order, slice identification information, the slice type, slice partition information, parallel block identification information, the parallel block type, parallel block partition information, the picture type, the bit depth, information about a luminance signal, and information about a chrominance signal. The prediction scheme may represent one of the intra prediction mode and the inter prediction mode.
The first transform selection information may indicate a first transform applied to the target block.
The second transform selection information may indicate a secondary transform applied to the target block.
The residual signal may represent the difference between the original signal and the predicted signal. Alternatively, the residual signal may be a signal generated by transforming a difference between the original signal and the prediction signal. Alternatively, the residual signal may be a signal generated by transforming and quantizing the difference between the original signal and the prediction signal. The residual block may be a residual signal for the block.
Here, signaling the flag or the index may indicate that the encoding apparatus 100 includes an entropy-encoded flag or an entropy-encoded index generated by performing entropy encoding on the flag or the index in the bitstream, and may indicate that the decoding apparatus 200 acquires the flag or the index by performing entropy decoding on the entropy-encoded flag or the entropy-encoded index extracted from the bitstream.
Since the encoding apparatus 100 performs encoding via inter prediction, the encoded target image can be used as a reference image for another image to be subsequently processed. Accordingly, the encoding apparatus 100 may reconstruct or decode the encoded target image and store the reconstructed or decoded image as a reference image in the reference picture buffer 190. For decoding, inverse quantization and inverse transformation of the encoded target image may be performed.
The quantized levels may be inversely quantized by the inverse quantization unit 160, and may be inversely transformed by the inverse transform unit 170. The adder 175 may add the inversely quantized and/or inversely transformed coefficients to the prediction block, thereby generating a reconstructed block. Here, the inversely quantized and/or inversely transformed coefficients may denote coefficients on which one or more of inverse quantization and inverse transform have been performed, and may also denote the reconstructed residual block.
The reconstructed block may be filtered by the filter unit 180. Filter unit 180 may apply one or more of a deblocking filter, a Sample Adaptive Offset (SAO) filter, an Adaptive Loop Filter (ALF), and a non-local filter (NLF) to the reconstructed block or reconstructed picture. The filter unit 180 may also be referred to as a "loop filter".
The deblocking filter may remove block distortion occurring at the boundaries between blocks. Whether to apply the deblocking filter to the target block may be determined based on the pixels included in a certain number of columns or rows of the block.
When the deblocking filter is applied to the target block, the filter that is applied may differ depending on the strength of the deblocking filtering that is required. In other words, among different filters, a filter determined in consideration of the strength of the deblocking filtering may be applied to the target block. When the deblocking filter is applied to the target block, a filter corresponding to one of a strong filter and a weak filter may be applied to the target block according to the required deblocking filtering strength.
Further, when vertical filtering and horizontal filtering are performed on the target block, the horizontal filtering and the vertical filtering may be performed in parallel.
The SAO may add an appropriate offset to pixel values so as to compensate for coding errors. The SAO may correct the deblocked image on a per-pixel basis, using an offset corresponding to the difference between the original image and the deblocked image. To perform offset correction on an image, a method of dividing the pixels of the image into a certain number of regions, determining a region to which an offset is to be applied among the divided regions, and applying the offset to the determined region may be used, and a method of applying an offset in consideration of the edge information of each pixel may also be used.
ALF may perform filtering based on values obtained by comparing a reconstructed image with an original image. After pixels included in an image have been divided into a predetermined number of groups, a filter to be applied to each group may be determined, and filtering may be performed differently for the respective groups. For luminance signals, information about whether to apply the adaptive loop filter may be signaled for each CU. The shape and filter coefficients of the ALF to be applied to each block may be different for each block. Alternatively, ALF having a fixed form may be applied to a block regardless of the characteristics of the block.
The non-local filter may perform filtering based on a reconstructed block similar to the target block. A region similar to the target block may be selected from the reconstructed picture, and filtering of the target block may be performed using statistical properties of the selected similar region. Information about whether to apply a non-local filter may be signaled for a Coding Unit (CU). Further, the shape and filter coefficients of the non-local filter to be applied to a block may be different according to the block.
The reconstructed block or the reconstructed image filtered by the filter unit 180 may be stored in the reference picture buffer 190. The reconstructed block filtered by the filter unit 180 may be a portion of a reference picture. In other words, the reference picture may be a reconstructed picture composed of the reconstructed block filtered by the filter unit 180. The stored reference pictures can then be used for inter prediction.
Fig. 2 is a block diagram showing a configuration of an embodiment of a decoding apparatus to which the present disclosure is applied.
The decoding apparatus 200 may be a decoder, a video decoding apparatus, or an image decoding apparatus.
Referring to fig. 2, the decoding apparatus 200 may include an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra prediction unit 240, an inter prediction unit 250, a switch 245, an adder 255, a filter unit 260, and a reference picture buffer 270.
The decoding apparatus 200 may receive the bitstream output from the encoding apparatus 100. The decoding apparatus 200 may receive a bitstream stored in a computer-readable storage medium, and may receive a bitstream streamed through a wired/wireless transmission medium.
The decoding apparatus 200 may perform decoding on the bitstream in an intra mode and/or an inter mode. Further, the decoding apparatus 200 may generate a reconstructed image or a decoded image via decoding, and may output the reconstructed image or the decoded image.
For example, an operation of switching to an intra mode or an inter mode based on a prediction mode for decoding may be performed by the switch 245. When the prediction mode for decoding is intra mode, switch 245 may be operated to switch to intra mode. When the prediction mode for decoding is an inter mode, switch 245 may be operated to switch to the inter mode.
The decoding apparatus 200 may acquire a reconstructed residual block by decoding an input bitstream and may generate a prediction block. When the reconstructed residual block and the prediction block are acquired, the decoding apparatus 200 may generate a reconstructed block, which is a target to be decoded, by adding the reconstructed residual block to the prediction block.
The entropy decoding unit 210 may generate symbols by performing entropy decoding on the bitstream based on a probability distribution. The generated symbols may include symbols in the form of quantized transform coefficient levels. Here, the entropy decoding method may be similar to the entropy encoding method described above. That is, the entropy decoding method may be the inverse process of the entropy encoding method described above.
The entropy decoding unit 210 may change coefficients having a one-dimensional (1D) vector form into a 2D block shape by a transform coefficient scanning method in order to decode quantized transform coefficient levels.
For example, the coefficients of a block may be changed to a 2D block shape by scanning the block coefficients using an upper right diagonal scan. Alternatively, which one of the upper right diagonal scan, the vertical scan, and the horizontal scan is to be used may be determined according to the size of the corresponding block and/or the intra prediction mode.
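For illustration, the sketch below rebuilds a 2D coefficient block from a 1D level array using an up-right diagonal scan; it ignores the 4x4 sub-block grouping used in practice, and the function names are hypothetical.

```python
def diagonal_scan_order(w, h):
    """Up-right diagonal scan positions (x, y) for a w x h block: each
    anti-diagonal is visited from bottom-left to top-right."""
    order = []
    for d in range(w + h - 1):
        for y in range(min(d, h - 1), -1, -1):
            x = d - y
            if x < w:
                order.append((x, y))
    return order

def levels_to_block(levels, w, h):
    """Place 1D quantized levels back into a 2D block in scan order."""
    block = [[0] * w for _ in range(h)]
    for level, (x, y) in zip(levels, diagonal_scan_order(w, h)):
        block[y][x] = level
    return block
```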
The quantized coefficients may be inverse quantized by the inverse quantization unit 220. The inverse quantization unit 220 may generate inverse quantized coefficients by performing inverse quantization on the quantized coefficients. Also, the inverse quantized coefficients may be inverse transformed by the inverse transform unit 230. The inverse transform unit 230 may generate a reconstructed residual block by performing an inverse transform on the inversely quantized coefficients. As a result of inverse quantization and inverse transformation performed on the quantized coefficients, a reconstructed residual block may be generated. Here, when generating the reconstructed residual block, the inverse quantization unit 220 may apply a quantization matrix to the quantized coefficients.
When the intra mode is used, the intra prediction unit 240 may generate a prediction block by performing spatial prediction using pixel values of previously decoded neighboring blocks around the target block.
The inter prediction unit 250 may include a motion compensation unit. Alternatively, the inter prediction unit 250 may be designated as a "motion compensation unit".
When the inter mode is used, the motion compensation unit 250 may generate a prediction block by performing motion compensation using a reference image stored in the reference picture buffer 270 and a motion vector.
The motion compensation unit may apply an interpolation filter to a partial region of the reference image when the motion vector has a value other than an integer, and may generate the prediction block using the reference image to which the interpolation filter is applied. To perform motion compensation, the motion compensation unit may determine which one of a skip mode, a merge mode, an Advanced Motion Vector Prediction (AMVP) mode, and a current picture reference mode corresponds to a motion compensation method for a PU included in the CU based on the CU, and may perform motion compensation according to the determined mode.
The reconstructed residual block and the prediction block may be added to each other by an adder 255. The adder 255 may generate a reconstructed block by adding the reconstructed residual block and the predicted block.
The reconstructed block may be filtered by the filter unit 260. Filter unit 260 may apply at least one of a deblocking filter, SAO filter, ALF, and NLF to the reconstructed block or the reconstructed image. The reconstructed image may be a picture that includes the reconstructed block.
The filtered reconstructed image may be output by the decoding apparatus 200, and may be used by the decoding apparatus.
The reconstructed image filtered by the filter unit 260 may be stored as a reference picture in the reference picture buffer 270. The reconstructed block filtered by the filter unit 260 may be a portion of a reference picture. In other words, the reference picture may be an image composed of the reconstructed block filtered by the filter unit 260. The stored reference pictures can then be used for inter prediction.
Fig. 3 is a diagram schematically showing a partition structure of an image when the image is encoded and decoded.
Fig. 3 may schematically illustrate an example in which a single cell is partitioned into a plurality of sub-cells.
In order to efficiently partition an image, a Coding Unit (CU) may be used in encoding and decoding. The term "unit" may be used to collectively specify 1) a block comprising image samples and 2) syntax elements. For example, "partition of a unit" may represent "partition of a block corresponding to the unit".
A CU can be used as a basic unit for image encoding/decoding. A CU can be used as a unit to which one mode selected from an intra mode and an inter mode is applied in image encoding/decoding. In other words, in image encoding/decoding, it may be determined which one of an intra mode and an inter mode is to be applied to each CU.
Also, a CU may be a basic unit that predicts, transforms, quantizes, inversely transforms, inversely quantizes, and encodes/decodes transform coefficients.
Referring to fig. 3, a picture 300 may be sequentially partitioned into units corresponding to largest coding units (LCUs), and a partition structure may be determined for each LCU. Here, "LCU" may be used with the same meaning as "Coding Tree Unit (CTU)".
Partitioning a unit may mean partitioning a block corresponding to the unit. The block partition information may include depth information regarding a depth of the unit. The depth information may indicate a number of times the unit is partitioned and/or a degree to which the unit is partitioned. A single unit may be hierarchically partitioned into sub-units while having tree structure based depth information. Each partitioned sub-unit may have depth information. The depth information may be information indicating a size of the CU. Depth information may be stored for each CU.
Each CU may have depth information. When a CU is partitioned, the depth of the CU generated from the partition may be increased by 1 from the depth of the partitioned CU.
The partition structure may represent the distribution of Coding Units (CUs) in the LCU 310 for efficient encoding of the image. Such a distribution may be determined according to whether a single CU is to be partitioned into multiple CUs. The number of CUs generated by partitioning may be a positive integer of 2 or more, including 2, 3, 4, 8, 16, etc. According to the number of CUs generated by performing partitioning, the horizontal size and the vertical size of each CU generated by performing partitioning may be smaller than those of the CUs before being partitioned.
Each partitioned CU may be recursively partitioned into four CUs in the same manner. At least one of a horizontal size and a vertical size of each partitioned CU may be reduced via recursive partitioning compared to at least one of a horizontal size and a vertical size of a CU before being partitioned.
Partitioning of CUs may be performed recursively until a predefined depth or a predefined size. For example, the depth of a CU may have a value ranging from 0 to 3. The size of a CU may range from a size of 64 × 64 to a size of 8 × 8, depending on the depth of the CU.
For example, the depth of an LCU may be 0 and the depth of a minimum coding unit (SCU) may be a predefined maximum depth. Here, as described above, the LCU may be a CU having a maximum coding unit size, and the SCU may be a CU having a minimum coding unit size.
Partitioning may begin at LCU 310, and the depth of a CU may increase by 1 each time the horizontal and/or vertical dimensions of the CU are reduced by partitioning.
For example, for each depth, a CU that is not partitioned may have a size of 2N × 2N. Further, in the case where CUs are partitioned, CUs of a size of 2N × 2N may be partitioned into four CUs each of a size of N × N. The value of N may be halved each time the depth is increased by 1.
Referring to fig. 3, an LCU having a depth of 0 may have 64 × 64 pixels or 64 × 64 blocks. 0 may be a minimum depth. An SCU of depth 3 may have 8 × 8 pixels or 8 × 8 blocks. 3 may be the maximum depth. Here, a CU having 64 × 64 blocks as an LCU may be represented by depth 0. A CU with 32 x 32 blocks may be represented with depth 1. A CU with 16 x 16 blocks may be represented with depth 2. A CU with 8 x 8 blocks as SCU can be represented by depth 3.
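The depth-to-size relationship of fig. 3 reduces to a one-line computation; the sketch below assumes a 64 × 64 LCU and square CUs, and is purely illustrative.

```python
def cu_size(depth, lcu_size=64):
    """Square CU width/height at a given quad-tree depth: the size is
    halved at each level (depth 0 -> 64, 1 -> 32, 2 -> 16, 3 -> 8)."""
    return lcu_size >> depth
```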
The information on whether the corresponding CU is partitioned or not may be represented by partition information of the CU. The partition information may be 1-bit information. All CUs except the SCU may include partition information. For example, the value of the partition information of a CU that is not partitioned may be 0. The value of the partition information of the partitioned CU may be 1.
For example, when a single CU is partitioned into four CUs, the horizontal and vertical sizes of each of the four CUs generated by partitioning may be half the horizontal and vertical sizes of the CU before being partitioned. When a CU having a size of 32 × 32 is partitioned into four CUs, the size of each of the partitioned four CUs may be 16 × 16. When a single CU is partitioned into four CUs, the CUs may be considered to have been partitioned in a quadtree structure.
For example, when a single CU is partitioned into two CUs, the horizontal size or the vertical size of each of the two CUs generated by partitioning may be half the horizontal size or the vertical size of the CU before being partitioned. When a CU having a size of 32 × 32 is vertically partitioned into two CUs, the size of each of the partitioned two CUs may be 16 × 32. When a CU having a size of 32 × 32 is horizontally partitioned into two CUs, the size of each of the partitioned two CUs may be 32 × 16. When a single CU is partitioned into two CUs, the CUs may be considered to have been partitioned in a binary tree structure.
Both quad-tree partitioning and binary-tree partitioning are applied to the LCU 310 of fig. 3.
In the encoding apparatus 100, a Coding Tree Unit (CTU) having a size of 64 × 64 may be partitioned into a plurality of smaller CUs by a recursive quad-tree structure. A single CU may be partitioned into four CUs having the same size. Each CU may be recursively partitioned and may have a quadtree structure.
By recursive partitioning of CUs, the optimal partitioning method that results in the smallest rate-distortion cost can be selected.
Fig. 4 is a diagram illustrating a form of a Prediction Unit (PU) that a Coding Unit (CU) can include.
Among CUs partitioned from the LCU, CUs that are no longer partitioned may be divided into one or more Prediction Units (PUs). This division is also referred to as "partitioning".
A PU may be the basic unit for prediction. The PU may be encoded and decoded in any one of a skip mode, an inter mode, and an intra mode. The PU may be partitioned into various shapes according to various modes. For example, the target block described above with reference to fig. 1 and the target block described above with reference to fig. 2 may both be PUs.
A CU may not be partitioned into PUs. When a CU is not divided into PUs, the size of the CU and the size of the PU may be equal to each other.
In skip mode, there may be no partition in a CU. In the skip mode, the 2N × 2N mode 410 may be supported without partitioning, wherein the size of the PU and the size of the CU are the same as each other in the 2N × 2N mode 410.
In inter mode, there may be 8 types of partition shapes in a CU. For example, in the inter mode, a 2N × 2N mode 410, a 2N × N mode 415, an N × 2N mode 420, an N × N mode 425, a 2N × nU mode 430, a 2N × nD mode 435, an nL × 2N mode 440, and an nR × 2N mode 445 may be supported.
In intra mode, a 2N × 2N mode 410 and an N × N mode 425 may be supported.
In the 2N × 2N mode 410, PUs of size 2N × 2N may be encoded. A PU of size 2N × 2N may represent a PU of the same size as the CU. For example, a PU of size 2N × 2N may have a size 64 × 64, 32 × 32, 16 × 16, or 8 × 8.
In the nxn mode 425, PUs of size nxn may be encoded.
For example, in intra prediction, when the size of a PU is 8 × 8, four partitioned PUs may be encoded. The size of each partitioned PU may be 4 x 4.
When a PU is encoded in the intra mode, the PU may be encoded using any one of a plurality of intra prediction modes. For example, HEVC provides 35 intra prediction modes, and the PU may be encoded in any one of the 35 intra prediction modes.
Which of the 2N × 2N mode 410 and the N × N mode 425 is to be used to encode the PU may be determined based on the rate-distortion cost.
The encoding apparatus 100 may perform an encoding operation on PUs having a size of 2N × 2N. Here, the encoding operation may be an operation of encoding the PU in each of a plurality of intra prediction modes that can be used by the encoding apparatus 100. Through the encoding operation, the optimal intra prediction mode for a PU of size 2N × 2N may be derived. The optimal intra prediction mode may be an intra prediction mode in which a minimum rate-distortion cost occurs when a PU having a size of 2N × 2N is encoded, among a plurality of intra prediction modes that can be used by the encoding apparatus 100.
Further, the encoding apparatus 100 may sequentially perform an encoding operation on the respective PUs obtained by performing the N × N partitioning. Here, the encoding operation may be an operation of encoding the PU in each of a plurality of intra prediction modes that can be used by the encoding apparatus 100. Through the encoding operation, the optimal intra prediction mode for a PU of size N × N may be derived. The optimal intra prediction mode may be an intra prediction mode in which a minimum rate-distortion cost occurs when a PU having a size of N × N is encoded, among a plurality of intra prediction modes that can be used by the encoding apparatus 100.
The encoding apparatus 100 may determine which one of a PU of size 2N × 2N and a PU of size N × N is to be encoded based on a comparison between a rate distortion cost of the PU of size 2N × 2N and a rate distortion cost of the PU of size N × N.
A single CU may be partitioned into one or more PUs, and a PU may be partitioned into multiple PUs.
For example, when a single PU is partitioned into four PUs, the horizontal and vertical dimensions of each of the four PUs generated by the partitioning may be half the horizontal and vertical dimensions of the PU before being partitioned. When a PU of size 32 x 32 is partitioned into four PUs, the size of each of the four partitioned PUs may be 16 x 16. When a single PU is partitioned into four PUs, the PUs may be considered to have been partitioned in a quad-tree structure.
For example, when a single PU is partitioned into two PUs, the horizontal or vertical size of each of the two PUs generated by the partitioning may be half the horizontal or vertical size of the PU before being partitioned. When a PU of size 32 x 32 is vertically partitioned into two PUs, the size of each of the two partitioned PUs may be 16 x 32. When a PU of size 32 x 32 is horizontally partitioned into two PUs, the size of each of the two partitioned PUs may be 32 x 16. When a single PU is partitioned into two PUs, the PUs may be considered to have been partitioned in a binary tree structure.
Fig. 5 is a diagram illustrating a form of a Transform Unit (TU) that can be included in a CU.
A Transform Unit (TU) may be a basic unit used in a CU for processes such as transform, quantization, inverse transform, inverse quantization, entropy coding, and entropy decoding.
The TU may have a square shape or a rectangular shape. The shape of a TU may be determined based on the size and/or shape of the CU.
Among CUs partitioned from the LCU, CUs that are no longer partitioned into CUs may be partitioned into one or more TUs. Here, the partition structure of the TU may be a quad-tree structure. For example, as shown in fig. 5, a single CU 510 may be partitioned one or more times according to a quadtree structure. With such partitioning, a single CU 510 may be composed of TUs having various sizes.
A CU may be considered to be recursively divided when a single CU is divided two or more times. By the division, a single CU may be composed of Transform Units (TUs) having various sizes.
Alternatively, a single CU may be divided into one or more TUs based on the number of vertical and/or horizontal lines dividing the CU.
A CU may be divided into symmetric TUs or asymmetric TUs. For the division into asymmetric TUs, information regarding the size and/or shape of each TU may be signaled from the encoding apparatus 100 to the decoding apparatus 200. Alternatively, the size and/or shape of each TU may be derived from information on the size and/or shape of the CU.
A CU may not be divided into TUs. When a CU is not divided into TUs, the size of the CU and the size of the TU may be equal to each other.
A single CU may be partitioned into one or more TUs, and a TU may be partitioned into multiple TUs.
For example, when a single TU is partitioned into four TUs, the horizontal size and the vertical size of each of the four TUs generated by the partitioning may be half of the horizontal size and the vertical size of the TU before being partitioned. When a TU having a size of 32 × 32 is partitioned into four TUs, the size of each of the four partitioned TUs may be 16 × 16. When a single TU is partitioned into four TUs, the TUs may be considered to have been partitioned in a quadtree structure.
For example, when a single TU is partitioned into two TUs, the horizontal size or the vertical size of each of the two TUs generated by the partitioning may be half of the horizontal size or the vertical size of the TU before being partitioned. When a TU of a size of 32 × 32 is vertically partitioned into two TUs, each of the two partitioned TUs may be of a size of 16 × 32. When a TU having a size of 32 × 32 is horizontally partitioned into two TUs, the size of each of the two partitioned TUs may be 32 × 16. When a single TU is partitioned into two TUs, the TUs may be considered to have been partitioned in a binary tree structure.
A CU may be partitioned in a different manner than shown in fig. 5.
For example, a single CU may be divided into three CUs. The horizontal or vertical sizes of the three CUs generated by the division may be 1/4, 1/2, and 1/4, respectively, of the horizontal or vertical size of the original CU before being divided.
For example, when a CU having a size of 32 × 32 is vertically divided into three CUs, the sizes of the three CUs generated by the division may be 8 × 32, 16 × 32, and 8 × 32, respectively. In this way, when a single CU is divided into three CUs, the CU can be considered to be divided in a form of a ternary tree.
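To make the division arithmetic above concrete, the sketch below computes the child-block sizes for the quad-tree, binary-tree, and ternary-tree forms; the split-type names are hypothetical labels for illustration, not signaled syntax.

```python
def child_sizes(w, h, split):
    """Child (width, height) pairs for each division form described above."""
    if split == "quad":      # four equal quadrants
        return [(w // 2, h // 2)] * 4
    if split == "bt_ver":    # vertical binary split
        return [(w // 2, h)] * 2
    if split == "bt_hor":    # horizontal binary split
        return [(w, h // 2)] * 2
    if split == "tt_ver":    # vertical ternary split: 1/4, 1/2, 1/4
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    if split == "tt_hor":    # horizontal ternary split: 1/4, 1/2, 1/4
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    raise ValueError(split)

# child_sizes(32, 32, "tt_ver") -> [(8, 32), (16, 32), (8, 32)]
```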
One of exemplary division forms (i.e., quadtree division, binary tree division, and ternary tree division) may be applied to the division of the CU, and a variety of division schemes may be combined and used together for the division of the CU. Here, a case where a plurality of division schemes are combined and used together may be referred to as "composite tree format division".
Fig. 6 illustrates partitioning of blocks according to an example.
In the video encoding and/or decoding process, as shown in fig. 6, the target block may be divided.
For the division of the target block, an indicator indicating division information may be signaled from the encoding apparatus 100 to the decoding apparatus 200. The partition information may be information indicating how the target block is partitioned.
The partition information may be one or more of a partition flag (hereinafter referred to as "split_flag"), a quad/binary flag (hereinafter referred to as "QB_flag"), a quad-tree flag (hereinafter referred to as "quadtree_flag"), a binary-tree flag (hereinafter referred to as "binarytree_flag"), and a binary type flag (hereinafter referred to as "Btype_flag").

"split_flag" may be a flag indicating whether a block is divided. For example, a split_flag value of 1 may indicate that the corresponding block is divided. A split_flag value of 0 may indicate that the corresponding block is not divided.

"QB_flag" may be a flag indicating which of the quad-tree form and the binary-tree form corresponds to the shape in which the block is divided. For example, a QB_flag value of 0 may indicate that the block is divided in the quad-tree form, and a QB_flag value of 1 may indicate that the block is divided in the binary-tree form. Alternatively, a QB_flag value of 0 may indicate that the block is divided in the binary-tree form, and a QB_flag value of 1 may indicate that the block is divided in the quad-tree form.

"quadtree_flag" may be a flag indicating whether a block is divided in the quad-tree form. For example, a quadtree_flag value of 1 may indicate that the block is divided in the quad-tree form, and a quadtree_flag value of 0 may indicate that the block is not divided in the quad-tree form.

"binarytree_flag" may be a flag indicating whether a block is divided in the binary-tree form. For example, a binarytree_flag value of 1 may indicate that the block is divided in the binary-tree form, and a binarytree_flag value of 0 may indicate that the block is not divided in the binary-tree form.

"Btype_flag" may be a flag indicating which of vertical division and horizontal division corresponds to the division direction when a block is divided in the binary-tree form. For example, a Btype_flag value of 0 may indicate that the block is divided in the horizontal direction, and a Btype_flag value of 1 may indicate that the block is divided in the vertical direction. Alternatively, a Btype_flag value of 0 may indicate that the block is divided in the vertical direction, and a Btype_flag value of 1 may indicate that the block is divided in the horizontal direction.
For example, the partition information of the blocks in fig. 6 may be derived by signaling at least one of quadtree_flag, binarytree_flag, and Btype_flag, as shown in Table 1 below.
TABLE 1
(Table 1 is presented as an image in the original publication; it lists combinations of quadtree_flag, binarytree_flag, and Btype_flag values corresponding to the block partitions of fig. 6.)
For example, the partition information of the blocks in fig. 6 may be derived by signaling at least one of split_flag, QB_flag, and Btype_flag, as shown in Table 2 below.
TABLE 2
(Table 2 is presented as an image in the original publication; it lists combinations of split_flag, QB_flag, and Btype_flag values corresponding to the block partitions of fig. 6.)
The partitioning method may be limited to only the quad tree or only the binary tree, depending on the size and/or shape of the block. When this restriction is applied, split_flag may be a flag indicating whether a block is divided in the quad-tree form, or a flag indicating whether a block is divided in the binary-tree form. The size and shape of a block may be derived from the depth information of the block, and the depth information may be signaled from the encoding apparatus 100 to the decoding apparatus 200.
When the size of a block falls within a certain range, only division in the quad-tree form may be possible. For example, the certain range may be defined by at least one of the maximum block size and the minimum block size for which only quad-tree-form division is possible.
Information indicating the maximum block size and the minimum block size for which only quad-tree-form division is possible may be signaled from the encoding apparatus 100 to the decoding apparatus 200 through a bitstream. Further, this information may be signaled for at least one of units such as a video, a sequence, a picture, and a slice (or a segment).
Alternatively, the maximum block size and/or the minimum block size may be fixed sizes predefined by the encoding apparatus 100 and the decoding apparatus 200. For example, when the size of a block is greater than 64 × 64 and less than 256 × 256, only division in the quad-tree form may be possible. In this case, split_flag may be a flag indicating whether division in the quad-tree form is performed.
When the size of a block falls within a certain range, only division in the binary-tree form may be possible. For example, the certain range may be defined by at least one of the maximum block size and the minimum block size for which only binary-tree-form division is possible.
Information indicating the maximum block size and/or the minimum block size for which only binary-tree-form division is possible may be signaled from the encoding apparatus 100 to the decoding apparatus 200 through a bitstream. Further, this information may be signaled for at least one of units such as a sequence, a picture, and a slice (or a segment).
Alternatively, the maximum block size and/or the minimum block size may be fixed sizes predefined by the encoding apparatus 100 and the decoding apparatus 200. For example, when the size of a block is greater than 8 × 8 and less than 16 × 16, only division in the binary-tree form may be possible. In this case, split_flag may be a flag indicating whether division in the binary-tree form is performed.
The partitioning of the block may be limited by the previous partitioning. For example, when a block is divided in a binary tree form and a plurality of partition blocks are generated, each partition block may be additionally divided only in a binary tree form.
The indicator may not be signaled when the horizontal size or the vertical size of the partition block is a size that cannot be further divided.
Fig. 7 is a diagram for explaining an embodiment of an intra prediction process.
The arrows extending radially from the center of the graph in fig. 7 indicate the prediction directions of the intra prediction modes. Further, numerals appearing in the vicinity of the arrow indicate examples of mode values assigned to the intra prediction mode or to the prediction direction of the intra prediction mode.
Intra-coding and/or decoding may be performed using reference samples of blocks adjacent to the target block. The neighboring blocks may be neighboring reconstructed blocks. For example, intra-coding and/or decoding may be performed using values of reference samples included in each of the neighboring reconstructed blocks or encoding parameters of the neighboring reconstructed blocks.
The encoding apparatus 100 and/or the decoding apparatus 200 may generate a prediction block for the target block by performing intra prediction based on information about the samples in the target image. When intra prediction is performed, the encoding apparatus 100 and/or the decoding apparatus 200 may perform directional prediction and/or non-directional prediction based on at least one reconstructed reference sample.
The prediction block may be a block generated as a result of performing intra prediction. The prediction block may correspond to at least one of a CU, a PU, and a TU.
The units of the prediction block may have a size corresponding to at least one of the CU, the PU, and the TU. The prediction block may have a square shape with a size of 2N × 2N or N × N. The size N × N may include sizes 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, and so on.
Alternatively, the prediction block may be a square block having a size of 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, or the like or a rectangular block having a size of 2 × 8, 4 × 8, 2 × 16, 4 × 16, 8 × 16, or the like.
The intra prediction may be performed in consideration of an intra prediction mode for the target block. The number of intra prediction modes that the target block may have may be a predefined fixed value, and may be a value differently determined according to the properties of the prediction block. For example, the properties of the prediction block may include the size of the prediction block, the type of prediction block, and the like.
For example, the number of intra prediction modes may be fixed to 35 regardless of the size of the prediction block. Alternatively, the number of intra prediction modes may be, for example, 3, 5, 9, 17, 34, 35, or 36.
The intra prediction mode may be a non-directional mode or a directional mode. For example, as shown in fig. 7, the intra prediction modes may include two non-directional modes and 33 directional modes.
The two non-directional modes may include a DC mode and a planar mode.
The directional pattern may be a pattern having a specific direction or a specific angle.
The intra prediction modes may be represented by at least one of a mode number, a mode value, and a mode angle. The number of intra prediction modes may be M. The value of M may be 1 or greater. In other words, the number of intra prediction modes may be M, where M includes the number of non-directional modes and the number of directional modes.
The number of intra prediction modes may be fixed to M regardless of the size and/or color components of the block. For example, the number of intra prediction modes may be fixed to any one of 35 and 67 regardless of the size of the block.
Alternatively, the number of intra prediction modes may be different according to the size of the block and/or the type of color component.
For example, the larger the size of the block, the larger the number of intra prediction modes. Alternatively, the larger the size of the block, the smaller the number of intra prediction modes. When the size of the block is 4 × 4 or 8 × 8, the number of intra prediction modes may be 67. When the size of the block is 16 × 16, the number of intra prediction modes may be 35. When the size of the block is 32 × 32, the number of intra prediction modes may be 19. When the size of the block is 64 × 64, the number of intra prediction modes may be 7.
For example, the number of intra prediction modes may be different according to whether a color component is a luminance signal or a chrominance signal. Alternatively, the number of intra prediction modes corresponding to the luminance component block may be greater than the number of intra prediction modes corresponding to the chrominance component block.
For example, in the vertical mode with the mode value of 26, prediction may be performed in the vertical direction based on the pixel values of the reference sampling points. For example, in the horizontal mode with the mode value of 10, prediction may be performed in the horizontal direction based on the pixel values of the reference sampling points.
Even in a directional mode other than the above-described modes, the encoding apparatus 100 and the decoding apparatus 200 may perform intra prediction on a target unit using reference samples according to an angle corresponding to the directional mode.
The intra prediction mode located on the right side with respect to the vertical mode may be referred to as a "vertical-right mode". The intra prediction mode located below the horizontal mode may be referred to as a "horizontal-below mode". For example, in fig. 7, the intra prediction mode having one of the mode values 27, 28, 29, 30, 31, 32, 33, and 34 may be a vertical-right mode 613. The intra prediction mode having a mode value of one of 2, 3, 4, 5, 6, 7, 8, and 9 may be a horizontal-lower mode 616.
The non-directional mode may include a DC mode and a planar mode. For example, the value of the DC mode may be 1. The value of the planar mode may be 0.
The directional pattern may include an angular pattern. Among the plurality of intra prediction modes, the remaining modes other than the DC mode and the planar mode may be directional modes.
When the intra prediction mode is the DC mode, the prediction block may be generated based on an average value of pixel values of the plurality of reference pixels. For example, the values of the pixels of the prediction block may be determined based on an average of pixel values of a plurality of reference pixels.
The number of intra prediction modes and the mode values of the respective intra prediction modes described above are merely exemplary. The number of intra prediction modes described above and the mode values of the respective intra prediction modes may be defined differently according to embodiments, implementations, and/or requirements.
In order to perform intra prediction on the target block, a step of checking whether samples included in reconstructed neighboring blocks can be used as reference samples of the target block may be performed. When, among the samples of the neighboring blocks, there are samples that cannot be used as reference samples of the target block, values generated via interpolation and/or copying using at least one sample value among the samples included in the reconstructed neighboring blocks may replace the sample values of the samples that cannot be used as reference samples. When a value generated via copying and/or interpolation replaces the sample value of an existing sample, the sample may be used as a reference sample of the target block.
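The substitution just described can be sketched as a copy from the nearest available reference sample; the scan order and the default value of 128 below are illustrative assumptions, not part of the disclosed embodiments.

```python
def substitute_references(refs, available):
    """Replace unavailable reference samples (available[i] == False) with
    the value of the nearest previously scanned available sample."""
    out = list(refs)
    # Default used when no reference sample is available at all.
    last = next((r for r, a in zip(refs, available) if a), 128)
    for i, a in enumerate(available):
        if a:
            last = out[i]
        else:
            out[i] = last
    return out
```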
In intra prediction, a filter may be applied to at least one of the reference samples and the prediction samples based on at least one of the intra prediction mode and the size of the target block.
The type of the filter to be applied to at least one of the reference samples and the prediction samples may be different according to at least one of an intra prediction mode of the target block, a size of the target block, and a shape of the target block. The type of filter may be classified according to one or more of the number of filter taps, the value of the filter coefficient, and the filter strength.
When the intra prediction mode is the planar mode, the sample value of a prediction target sample may be generated, when the prediction block of the target block is generated, using a weighted sum of the reference sample above the target block, the reference sample to the left of the target block, the reference sample at the upper right corner of the target block, and the reference sample at the lower left corner of the target block, according to the position of the prediction target sample within the prediction block.
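For reference, this weighted sum can be written as follows; the sketch follows the HEVC formulation for a square n × n block (n a power of two) and is illustrative rather than a definition of the present embodiments.

```python
def planar_predict(top, left, n):
    """HEVC-style planar prediction. 'top' and 'left' each hold n + 1
    reference samples, so top[n] is the upper-right reference and
    left[n] the lower-left reference."""
    shift = n.bit_length()                 # log2(n) + 1 for power-of-2 n
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            hor = (n - 1 - x) * left[y] + (x + 1) * top[n]
            ver = (n - 1 - y) * top[x] + (y + 1) * left[n]
            pred[y][x] = (hor + ver + n) >> shift
    return pred
```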
When the intra prediction mode is the DC mode, the average of the reference samples above the target block and the reference samples to the left of the target block may be used when the prediction block of the target block is generated. Further, filtering using the values of the reference samples may be performed on specific rows or specific columns in the target block. The specific rows may be one or more upper rows adjacent to the reference samples. The specific columns may be one or more left columns adjacent to the reference samples.
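Similarly, the DC mode reduces to filling the block with a single average value; the per-row/per-column boundary filtering just mentioned is omitted in this sketch.

```python
def dc_predict(top, left, n):
    """Fill an n x n block with the average of the n upper and n left
    reference samples (n a power of two)."""
    shift = (2 * n).bit_length() - 1       # log2(2n)
    dc = (sum(top[:n]) + sum(left[:n]) + n) >> shift
    return [[dc] * n for _ in range(n)]
```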
When the intra prediction mode is a directional mode, the prediction block may be generated using the upper reference sample, the left reference sample, the upper right reference sample, and/or the lower left reference sample of the target block.
To generate the prediction samples described above, interpolation based on real numbers may be performed.
The intra prediction mode of the target block may be predicted from intra prediction modes of neighboring blocks adjacent to the target block, and information for prediction may be entropy-encoded/entropy-decoded.
For example, when the intra prediction modes of the target block and a neighboring block are identical to each other, the fact that the two intra prediction modes are identical may be signaled using a predefined flag.
For example, an indicator for indicating the same intra prediction mode as that of the target block among intra prediction modes of a plurality of neighboring blocks may be signaled.
When the intra prediction modes of the target block and the neighboring block are different from each other, information regarding the intra prediction mode of the target block may be encoded and/or decoded using entropy encoding and/or entropy decoding.
Fig. 8 is a diagram for explaining positions of reference samples used in an intra prediction process.
Fig. 8 illustrates positions of reference samples used for intra prediction of a target block. Referring to fig. 8, the reconstructed reference samples for intra prediction of the target block may include a lower left reference sample 831, a left reference sample 833, an upper left reference sample 835, an upper reference sample 837, and an upper right reference sample 839.
For example, the left reference samples 833 may represent reconstructed reference pixels adjacent to the left side of the target block. The upper reference samples 837 may represent reconstructed reference pixels adjacent to the top of the target block. The upper left reference sample 835 may represent a reconstructed reference pixel located at the upper left corner of the target block. The lower left reference samples 831 may represent reference samples located below the left sample line composed of the left reference samples 833, among the samples located on the same line as the left sample line. The upper right reference samples 839 may represent reference samples located to the right of the upper sample line composed of the upper reference samples 837, among the samples located on the same line as the upper sample line.
When the size of the target block is N × N, the numbers of the lower left reference samples 831, the left reference samples 833, the upper reference samples 837, and the upper right reference samples 839 may all be N.
By performing intra prediction on the target block, a prediction block may be generated. The process of generating the prediction block may include determining values of pixels in the prediction block. The sizes of the target block and the prediction block may be the same.
The reference samples used for intra prediction of the target block may vary according to the intra prediction mode of the target block. The direction of the intra prediction mode may represent a dependency between the reference samples and the pixels of the prediction block. For example, the value of a specified reference sample may be used as the value of one or more specified pixels in the prediction block. In this case, the specified reference sample and the one or more specified pixels in the prediction block may be samples and pixels located on a straight line along the direction of the intra prediction mode. In other words, the value of the specified reference sample may be copied as the value of a pixel located in the direction opposite the direction of the intra prediction mode. Alternatively, the value of a pixel in the prediction block may be the value of the reference sample located in the direction of the intra prediction mode with respect to the position of the pixel.
In an example, when the intra prediction mode of the target block is a vertical mode having a mode value of 26, the upper reference samples 837 may be used for intra prediction. When the intra prediction mode is a vertical mode, the value of a pixel in the prediction block may be a value of a reference sample point vertically above the position of the pixel. Therefore, the upper reference samples 837 adjacent to the top of the target block may be used for intra prediction. Also, the values of pixels in a row of the prediction block may be the same as the values of the pixels of the upper reference sample 837.
In an example, when the intra prediction mode of the target block is a horizontal mode having a mode value of 10, the left reference sampling point 833 may be used for intra prediction. When the intra prediction mode is a horizontal mode, the value of a pixel in the prediction block may be a value of a reference sample horizontally located to the left of the position of the pixel. Therefore, the left reference sample 833 adjacent to the left side of the target block may be used for intra prediction. Furthermore, the values of the pixels in a column of the prediction block may be the same as the values of the pixels of the left reference sample 833.
In an example, when the mode value of the intra prediction mode of the target block is 18, at least some of the left reference samples 833, the upper-left reference sample 835, and at least some of the upper reference samples 837 may be used for intra prediction. When the mode value of the intra prediction mode is 18, the value of a pixel in the prediction block may be the value of a reference sample located diagonally up and to the left of the pixel.
Also, in the case where an intra prediction mode having a mode value of 27, 28, 29, 30, 31, 32, 33, or 34 is used, at least a portion of the upper-right reference samples 839 may be used for intra prediction.
Also, in the case where an intra prediction mode having a mode value of 2, 3, 4, 5, 6, 7, 8, or 9 is used, at least a portion of the lower left reference samples 831 may be used for intra prediction.
Also, when an intra prediction mode having a mode value ranging from 11 to 25 is used, the upper-left reference sample 835 may be used for intra prediction.
The number of reference samples used to determine the pixel value of one pixel in the prediction block may be one, two, or more.
As described above, the pixel values of the pixels in the prediction block may be determined according to the positions of the pixels and the positions of the reference samples indicated by the direction of the intra prediction mode. When the position of the pixel and the position of the reference sample point indicated by the direction of the intra prediction mode are integer positions, the value of one reference sample point indicated by the integer position may be used to determine the pixel value of the pixel in the prediction block.
When the position of the pixel and the position of the reference sample indicated by the direction of the intra prediction mode are not integer positions, an interpolated reference sample may be generated based on the two reference samples closest to the indicated position. The value of the interpolated reference sample may be used to determine the pixel value of the pixel in the prediction block. In other words, when the position of the pixel in the prediction block and the position of the reference sample indicated by the direction of the intra prediction mode fall between two reference samples, an interpolated value based on the values of the two samples may be generated.
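For illustration only, the following Python sketch shows how one predicted pixel could be derived from reference samples projected onto a single line, covering both the integer-position copy and the two-sample interpolation described above. The function name, the fixed-point representation, and the choice of 5 fractional bits are assumptions made for the example, not part of this disclosure.

```python
def predict_pixel(refs, pos, frac_bits=5):
    # refs: 1D list of reconstructed reference samples projected onto one line
    # pos:  position along that line implied by the prediction direction,
    #       in fixed-point form with 'frac_bits' fractional bits (assumed)
    i = pos >> frac_bits                 # integer part: index of the left reference
    f = pos & ((1 << frac_bits) - 1)     # fractional part between two references
    if f == 0:
        return refs[i]                   # integer position: copy a single reference
    scale = 1 << frac_bits               # otherwise interpolate the two closest
    return (refs[i] * (scale - f) + refs[i + 1] * f + scale // 2) >> frac_bits
```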
The prediction block generated via prediction may be different from the original target block. In other words, there may be a prediction error, which is a difference between the target block and the prediction block, and there may also be a prediction error between pixels of the target block and pixels of the prediction block.
Hereinafter, the terms "difference", "error" and "residual" may be used to have the same meaning and may be used interchangeably with each other.
For example, in the case of directional intra prediction, the longer the distance between the pixels of the predicted block and the reference sample, the larger the prediction error that may occur. Such a prediction error may cause discontinuity between the generated prediction block and the neighboring block.
To reduce the prediction error, a filtering operation for the prediction block may be used. The filtering operation may be configured to adaptively apply a filter to a region in the prediction block that is considered to have a large prediction error. For example, a region considered to have a large prediction error may be a boundary of a prediction block. In addition, regions that are considered to have a large prediction error in a prediction block may be different according to an intra prediction mode, and characteristics of a filter may also be different according to the intra prediction mode.
Fig. 9 is a diagram for explaining an embodiment of an inter prediction process.
The rectangle shown in fig. 9 may represent an image (or picture). In addition, in fig. 9, an arrow may indicate a prediction direction. That is, each image may be encoded and/or decoded according to a prediction direction.
Images can be classified into an intra picture (I picture), a uni-predictive picture or predictive-coded picture (P picture), and a bi-predictive picture or bi-predictive-coded picture (B picture) according to coding types. Each picture may be encoded and/or decoded according to its coding type.
When the target image that is the target to be encoded is an I picture, the target image can be encoded using data contained in the image itself without performing inter prediction with reference to other images. For example, an I picture may be encoded via intra prediction only.
When the target image is a P picture, the target image may be encoded via inter prediction using a reference picture existing in one direction. Here, the one direction may be a forward direction or a backward direction.
When the target image is a B picture, the image may be encoded via inter prediction using reference pictures existing in both directions, or may be encoded via inter prediction using reference pictures existing in one of a forward direction and a backward direction. Here, the two directions may be a forward direction and a backward direction.
P-pictures and B-pictures encoded and/or decoded using reference pictures may be considered images using inter prediction.
Hereinafter, inter prediction in the inter mode according to the embodiment will be described in detail.
Inter prediction may be performed using motion information.
In the inter mode, the encoding apparatus 100 may perform inter prediction and/or motion compensation on the target block. The decoding apparatus 200 may perform inter prediction and/or motion compensation corresponding to the inter prediction and/or motion compensation performed by the encoding apparatus 100 on the target block.
The motion information of the target block may be separately derived by the encoding apparatus 100 and the decoding apparatus 200 during inter prediction. The motion information may be derived using motion information of reconstructed neighboring blocks, motion information of a col block, and/or motion information of blocks adjacent to the col block.
For example, the encoding apparatus 100 or the decoding apparatus 200 may perform prediction and/or motion compensation by using motion information of a spatial candidate and/or a temporal candidate as motion information of a target block. The target block may represent a PU and/or a PU partition.
The spatial candidate may be a reconstructed block spatially adjacent to the target block.
The temporal candidate may be a reconstructed block corresponding to the target block in a previously reconstructed co-located picture (col picture).
In the inter prediction, the encoding apparatus 100 and the decoding apparatus 200 may improve encoding efficiency and decoding efficiency by using motion information of spatial candidates and/or temporal candidates. The motion information of the spatial candidates may be referred to as "spatial motion information". The motion information of the temporal candidates may be referred to as "temporal motion information".
Next, the motion information of the spatial candidate may be the motion information of the PU including the spatial candidate. The motion information of the temporal candidate may be the motion information of the PU including the temporal candidate. The motion information of the candidate block may be motion information of a PU that includes the candidate block.
Inter prediction may be performed using a reference picture.
The reference picture may be at least one of a picture preceding the target picture and a picture following the target picture. The reference picture may be an image used for prediction of the target block.
In inter prediction, a region in a reference picture may be specified using a reference picture index (or refIdx) indicating the reference picture, a motion vector to be described later, or the like. Here, the area specified in the reference picture may indicate a reference block.
Inter prediction may select a reference picture, and may also select a reference block corresponding to the target block from the reference picture. Further, inter prediction may generate a prediction block for a target block using the selected reference block.
The motion information may be derived by each of the encoding apparatus 100 and the decoding apparatus 200 during inter prediction.
The spatial candidates may be 1) blocks that exist in the target picture that 2) have been previously reconstructed via encoding and/or decoding and 3) are adjacent to the target block or located at corners of the target block. Here, the "block located at a corner of the target block" may be a block vertically adjacent to an adjacent block horizontally adjacent to the target block, or a block horizontally adjacent to an adjacent block vertically adjacent to the target block. Further, "a block located at a corner of the target block" may have the same meaning as "a block adjacent to the corner of the target block". The meaning of "a block located at a corner of a target block" may be included in the meaning of "a block adjacent to the target block".
For example, the spatial candidate may be a reconstructed block located to the left of the target block, a reconstructed block located above the target block, a reconstructed block located in the lower left corner of the target block, a reconstructed block located in the upper right corner of the target block, or a reconstructed block located in the upper left corner of the target block.
Each of the encoding apparatus 100 and the decoding apparatus 200 can identify a block existing in a position spatially corresponding to a target block in a col picture. The position of the target block in the target picture and the position of the identified block in the col picture may correspond to each other.
Each of the encoding apparatus 100 and the decoding apparatus 200 may determine, as a temporal candidate, a col block existing at a predefined relative position with respect to the identified block. The predefined relative position may be a position inside and/or outside the identified block.
For example, the col blocks may include a first col block and a second col block. When the coordinates of the identified block are (xP, yP) and the size of the identified block is represented by (nPSW, nPSH), the first col block may be a block located at coordinates (xP + nPSW, yP + nPSH). The second col block may be a block located at coordinates (xP + (nPSW > >1), yP + (nPSH > > 1)). When the first col block is not available, the second col block may be selectively used.
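A minimal sketch of the two col-block positions given above, assuming (xP, yP) is the top-left coordinate of the identified block; the function name is hypothetical:

```python
def col_block_positions(xP, yP, nPSW, nPSH):
    # First col block: the block just outside the lower-right corner
    # of the identified block.
    first = (xP + nPSW, yP + nPSH)
    # Second col block: the block at the center of the identified block,
    # selectively used when the first col block is not available.
    second = (xP + (nPSW >> 1), yP + (nPSH >> 1))
    return first, second
```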
The motion vector of the target block may be determined based on the motion vector of the col block. Each of the encoding apparatus 100 and the decoding apparatus 200 may scale the motion vector of the col block. The scaled motion vector of the col block may be used as the motion vector of the target block. Further, the motion vector of the motion information of the temporal candidate stored in the list may be a scaled motion vector.
The ratio of the motion vector of the target block to the motion vector of the col block may be the same as the ratio of the first distance to the second distance. The first distance may be a distance between the reference picture and a target picture of the target block. The second distance may be a distance between the reference picture and a col picture of the col block.
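The distance-ratio scaling can be sketched as follows. Real codecs typically use clipped fixed-point arithmetic, so this floating-point version is illustrative only, and the parameter names are assumptions:

```python
def scale_col_mv(mv_col, first_distance, second_distance):
    # first_distance:  distance between the target picture and its reference picture
    # second_distance: distance between the col picture and the col block's reference
    ratio = first_distance / second_distance
    return (round(mv_col[0] * ratio), round(mv_col[1] * ratio))
```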
The scheme for deriving the motion information may vary according to the inter prediction mode of the target block. For example, as an inter prediction mode applied to inter prediction, there may be an Advanced Motion Vector Predictor (AMVP) mode, a merge mode, a skip mode, a current picture reference mode, and the like. The merge mode may also be referred to as a "motion merge mode". Each mode will be described in detail below.
1) AMVP mode
When using the AMVP mode, the encoding apparatus 100 may search for similar blocks in the neighborhood of the target block. The encoding apparatus 100 may acquire a prediction block by performing prediction on a target block using motion information of the found similar block. The encoding apparatus 100 may encode a residual block that is a difference between the target block and the prediction block.
1-1) creating a list of predicted motion vector candidates
When the AMVP mode is used as the prediction mode, each of the encoding apparatus 100 and the decoding apparatus 200 may create a list of predicted motion vector candidates using a motion vector of a spatial candidate, a motion vector of a temporal candidate, and a zero vector. The predicted motion vector candidate list may include one or more predicted motion vector candidates. At least one of a motion vector of the spatial candidate, a motion vector of the temporal candidate, and a zero vector may be determined and used as the prediction motion vector candidate.
Hereinafter, the terms "prediction motion vector (candidate)" and "motion vector (candidate)" may be used to have the same meaning and may be used interchangeably with each other.
Hereinafter, the terms "prediction motion vector candidate" and "AMVP candidate" may be used to have the same meaning and may be used interchangeably with each other.
Hereinafter, the terms "predicted motion vector candidate list" and "AMVP candidate list" may be used to have the same meaning and may be used interchangeably with each other.
The spatial candidates may include reconstructed spatially neighboring blocks. In other words, the motion vectors of the reconstructed neighboring blocks may be referred to as "spatial prediction motion vector candidates".
The temporal candidates may include a col block and blocks adjacent to the col block. In other words, a motion vector of a col block or a motion vector of a block adjacent to the col block may be referred to as a "temporal prediction motion vector candidate".
The zero vector may be a (0,0) motion vector.
The predicted motion vector candidate may be a motion vector predictor for predicting a motion vector. Further, in the encoding apparatus 100, each predicted motion vector candidate may be an initial search position for a motion vector.
1-2) searching for motion vector using list of predicted motion vector candidates
The encoding apparatus 100 may determine a motion vector to be used for encoding the target block within the search range using the list of predicted motion vector candidates. Further, the encoding apparatus 100 may determine a predicted motion vector candidate to be used as the predicted motion vector of the target block among the predicted motion vector candidates existing in the predicted motion vector candidate list.
The motion vector to be used for encoding the target block may be a motion vector that can be encoded at a minimum cost.
In addition, the encoding apparatus 100 may determine whether to encode the target block using the AMVP mode.
1-3) Transmission of inter-frame prediction information
The encoding apparatus 100 may generate a bitstream including inter prediction information required for inter prediction. The decoding apparatus 200 may perform inter prediction on the target block using inter prediction information of the bitstream.
The inter prediction information may include 1) mode information indicating whether the AMVP mode is used, 2) a prediction motion vector index, 3) a Motion Vector Difference (MVD), 4) a reference direction, and 5) a reference picture index.
Hereinafter, the terms "prediction motion vector index" and "AMVP index" may be used to have the same meaning and may be used interchangeably with each other.
Furthermore, the inter prediction information may include a residual signal.
When the mode information indicates that the AMVP mode is used, the decoding apparatus 200 may acquire a prediction motion vector index, an MVD, a reference direction, and a reference picture index from the bitstream through entropy decoding.
The prediction motion vector index may indicate a prediction motion vector candidate to be used for predicting the target block among prediction motion vector candidates included in the prediction motion vector candidate list.
1-4) inter prediction in AMVP mode using inter prediction information
The decoding apparatus 200 may derive a predicted motion vector candidate using the predicted motion vector candidate list, and may determine motion information of the target block based on the derived predicted motion vector candidate.
The decoding apparatus 200 may determine a motion vector candidate for the target block among the predicted motion vector candidates included in the predicted motion vector candidate list using the predicted motion vector index. The decoding apparatus 200 may select a predicted motion vector candidate indicated by the predicted motion vector index as the predicted motion vector of the target block from among the predicted motion vector candidates included in the predicted motion vector candidate list.
The motion vector that will actually be used for inter prediction of the target block may not match the predicted motion vector. To indicate the difference between the motion vector that will actually be used for inter-predicting the target block and the predicted motion vector, MVD may be used. The encoding apparatus 100 may derive a prediction motion vector similar to a motion vector that will actually be used for inter-prediction of the target block in order to use an MVD as small as possible.
The MVD may be the difference between the motion vector of the target block and the predicted motion vector. The encoding apparatus 100 may calculate an MVD and may entropy-encode the MVD.
The MVD may be transmitted from the encoding apparatus 100 to the decoding apparatus 200 through a bitstream. The decoding apparatus 200 may decode the received MVD. The decoding apparatus 200 may derive a motion vector of the target block by summing the decoded MVD and the prediction motion vector. In other words, the motion vector of the target block derived by the decoding apparatus 200 may be the sum of the entropy-decoded MVD and the motion vector candidate.
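The MVD relationship described above amounts to a per-component difference on the encoder side and a per-component sum on the decoder side, sketched below with hypothetical function names:

```python
def encode_mvd(mv, predicted_mv):
    # Encoder: the MVD is the difference between the actual motion vector
    # and the predicted motion vector.
    return (mv[0] - predicted_mv[0], mv[1] - predicted_mv[1])

def decode_mv(mvd, predicted_mv):
    # Decoder: the motion vector of the target block is the sum of the
    # decoded MVD and the predicted motion vector.
    return (mvd[0] + predicted_mv[0], mvd[1] + predicted_mv[1])
```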
The reference direction may indicate a list of reference pictures to be used for predicting the target block. For example, the reference direction may indicate one of the reference picture list L0 and the reference picture list L1.
The reference direction indicates only a reference picture list to be used for prediction of the target block, and may not mean that the direction of the reference picture is limited to a forward direction or a backward direction. In other words, each of the reference picture list L0 and the reference picture list L1 may include pictures in the forward direction and/or the backward direction.
The reference direction being unidirectional may mean that a single reference picture list is used. The reference direction being bi-directional may mean that two reference picture lists are used. In other words, the reference direction may indicate one of the following: the case of using only the reference picture list L0, the case of using only the reference picture list L1, and the case of using two reference picture lists.
The reference picture index may indicate a reference picture to be used for prediction of the target block among reference pictures in the reference picture list. The reference picture index may be entropy-encoded by the encoding apparatus 100. The entropy-encoded reference picture index may be signaled by the encoding apparatus 100 to the decoding apparatus 200 through a bitstream.
When two reference picture lists are used for prediction of a target block, a single reference picture index and a single motion vector may be used for each of the reference picture lists. Further, when two reference picture lists are used for predicting the target block, two prediction blocks may be specified for the target block. For example, an average or a weighted sum of two prediction blocks for a target block may be used to generate a (final) prediction block for the target block.
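A sketch of combining the two prediction blocks into the (final) prediction block; the integer rounding and the equal default weights are assumptions made for this example:

```python
def bi_predict(pred0, pred1, w0=1, w1=1):
    # Element-wise weighted sum of two prediction blocks; with w0 == w1
    # this reduces to a rounded average.
    h, w = len(pred0), len(pred0[0])
    s = w0 + w1
    return [[(w0 * pred0[y][x] + w1 * pred1[y][x] + s // 2) // s
             for x in range(w)] for y in range(h)]
```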
The motion vector of the target block may be derived using the prediction motion vector index, the MVD, the reference direction, and the reference picture index.
The decoding apparatus 200 may generate a prediction block for the target block based on the derived motion vector and the reference picture index. For example, the prediction block may be a reference block indicated by a derived motion vector in a reference picture indicated by a reference picture index.
Since the prediction motion vector index and the MVD are encoded while the motion vector itself of the target block is not encoded, the number of bits transmitted from the encoding apparatus 100 to the decoding apparatus 200 can be reduced and the encoding efficiency can be improved.
The motion information of the reconstructed neighboring blocks can be used for the target block. In a specific inter prediction mode, the encoding apparatus 100 may not encode actual motion information of the target block alone. The motion information of the target block is not encoded, but additional information that enables the motion information of the target block to be derived using the reconstructed motion information of the neighboring blocks may be encoded. Since the additional information is encoded, the number of bits transmitted to the decoding apparatus 200 may be reduced and the encoding efficiency may be improved.
For example, as an inter prediction mode in which motion information of a target block is not directly encoded, a skip mode and/or a merge mode may exist. Here, each of the encoding apparatus 100 and the decoding apparatus 200 may use an indicator and/or an index indicating a unit whose motion information is to be used as motion information of the target unit among the reconstructed neighboring units.
2) Merge mode
Merging is one scheme for deriving the motion information of a target block. The term "merging" may mean merging the motion of multiple blocks. "Merging" may mean that the motion information of one block is also applied to other blocks. In other words, the merge mode may be a mode in which the motion information of the target block is derived from the motion information of neighboring blocks.
When the merge mode is used, the encoding apparatus 100 may predict motion information of the target block using motion information of the spatial candidate and/or motion information of the temporal candidate. The spatial candidates may include reconstructed spatially neighboring blocks that are spatially adjacent to the target block. The spatially neighboring blocks may include a left neighboring block and an upper neighboring block. The temporal candidates may include col blocks. The terms "spatial candidate" and "spatial merge candidate" may be used to have the same meaning and may be used interchangeably with each other. The terms "time candidate" and "time merge candidate" may be used to have the same meaning and may be used interchangeably with each other.
The encoding apparatus 100 may acquire a prediction block via prediction. The encoding apparatus 100 may encode a residual block that is a difference between the target block and the prediction block.
2-1) creating a merge candidate list
When the merge mode is used, each of the encoding apparatus 100 and the decoding apparatus 200 may create a merge candidate list using motion information of spatial candidates and/or motion information of temporal candidates. The motion information may include 1) a motion vector, 2) a reference picture index, and 3) a reference direction. The reference direction may be unidirectional or bidirectional.
The merge candidate list may include merge candidates. The merge candidate may be motion information. In other words, the merge candidate list may be a list storing a plurality of pieces of motion information.
The merge candidate may be motion information of a plurality of temporal candidates and/or spatial candidates. Further, the merge candidate list may include a new merge candidate generated by combining merge candidates already existing in the merge candidate list. In other words, the merge candidate list may include new motion information generated by combining a plurality of pieces of motion information previously existing in the merge candidate list.
The merge candidate may be a specific mode of deriving inter prediction information. The merge candidate may be information indicating a specific mode of deriving inter prediction information. Inter prediction information for the target block may be derived from a particular mode indicated by the merge candidate. Further, the particular mode may include a process of deriving a series of inter prediction information. This particular mode may be an inter prediction information derivation mode or a motion information derivation mode.
The inter prediction information of the target block may be derived according to a mode indicated by a merge candidate selected among merge candidates in the merge candidate list by a merge index.
For example, the motion information derivation mode in the merge candidate list may be at least one of the following modes: 1) a motion information derivation mode for the sub-block unit; 2) affine motion information derivation mode.
In addition, the merge candidate list may include motion information of a zero vector. The zero vector may also be referred to as a "zero merge candidate".
In other words, the pieces of motion information in the merge candidate list may be at least one of: 1) motion information of a spatial candidate, 2) motion information of a temporal candidate, 3) motion information generated by combining pieces of motion information previously existing in the merge candidate list, and 4) a zero vector.
The motion information may include 1) a motion vector, 2) a reference picture index, and 3) a reference direction. The reference direction may also be referred to as an "inter prediction indicator". The reference direction may be unidirectional or bidirectional. The unidirectional reference direction may indicate L0 prediction or L1 prediction.
The merge candidate list may be created before performing prediction in merge mode.
The number of merge candidates in the merge candidate list may be predefined. Each of the encoding apparatus 100 and the decoding apparatus 200 may add the merge candidates to the merge candidate list according to a predefined scheme and a predefined priority such that the merge candidate list has a predefined number of merge candidates. The merge candidate list of the encoding apparatus 100 and the merge candidate list of the decoding apparatus 200 may be made identical to each other using a predefined scheme and a predefined priority.
Merging may be applied on a CU or PU basis. When the merging is performed on a CU or PU basis, the encoding apparatus 100 may transmit a bitstream including predefined information to the decoding apparatus 200. For example, the predefined information may include 1) information indicating whether to perform merging for respective block partitions, and 2) information on a block on which merging is to be performed among blocks that are spatial candidates and/or temporal candidates for a target block.
2-2) searching for motion vector using merge candidate list
The encoding apparatus 100 may determine a merge candidate to be used for encoding the target block. For example, the encoding apparatus 100 may perform prediction on the target block using the merge candidate in the merge candidate list, and may generate a residual block for the merge candidate. The encoding apparatus 100 may encode the target block using a merging candidate that generates the minimum cost in the encoding of the prediction and residual blocks.
In addition, the encoding apparatus 100 may determine whether to encode the target block using the merge mode.
2-3) Transmission of inter-frame prediction information
The encoding apparatus 100 may generate a bitstream including inter prediction information required for inter prediction. The encoding apparatus 100 may generate entropy-encoded inter prediction information by performing entropy encoding on the inter prediction information, and may transmit a bitstream including the entropy-encoded inter prediction information to the decoding apparatus 200. The entropy-encoded inter prediction information may be signaled by the encoding apparatus 100 to the decoding apparatus 200 through a bitstream.
The decoding apparatus 200 may perform inter prediction on the target block using inter prediction information of the bitstream.
The inter prediction information may include 1) mode information indicating whether a merge mode is used and 2) a merge index.
Furthermore, the inter prediction information may include a residual signal.
The decoding apparatus 200 may acquire the merge index from the bitstream only when the mode information indicates that the merge mode is used.
The mode information may be a merge flag. The unit of the mode information may be a block. The information on the block may include mode information, and the mode information may indicate whether a merge mode is applied to the block.
The merge index may indicate a merge candidate to be used for prediction of the target block among merge candidates included in the merge candidate list. Alternatively, the merge index may indicate a block to be merged with the target block among neighboring blocks spatially or temporally adjacent to the target block.
The encoding apparatus 100 may select a merge candidate having the highest encoding performance among merge candidates included in the merge candidate list, and set a value of the merge index to indicate the selected merge candidate.
2-4) inter prediction of merge mode using inter prediction information
The decoding apparatus 200 may perform prediction on the target block using the merge candidate indicated by the merge index among the merge candidates included in the merge candidate list.
The motion vector of the target block may be specified by the motion vector of the merging candidate indicated by the merging index, the reference picture index, and the reference direction.
3) Skip mode
The skip mode may be a mode in which motion information of a spatial candidate or motion information of a temporal candidate is applied to the target block without change. Also, the skip mode may be a mode that does not use a residual signal. In other words, when the skip mode is used, the reconstructed block may be a predicted block.
The difference between the merge mode and the skip mode is whether a residual signal is sent or used. That is, the skip mode may be similar to the merge mode except that no residual signal is sent or used.
When the skip mode is used, the encoding apparatus 100 may transmit information on a block whose motion information is to be used as motion information of a target block among blocks that are spatial candidates or temporal candidates to the decoding apparatus 200 through a bitstream. The encoding apparatus 100 may generate entropy-encoded information by performing entropy encoding on the information, and may signal the entropy-encoded information to the decoding apparatus 200 through a bitstream.
Also, when the skip mode is used, the encoding apparatus 100 may not send other syntax information (such as an MVD) to the decoding apparatus 200. For example, when the skip mode is used, the encoding apparatus 100 may not signal syntax elements related to at least one of an MVD, a coded block flag, and a transform coefficient level to the decoding apparatus 200.
3-1) creating a merge candidate list
The skip mode may also use a merge candidate list. In other words, the merge candidate list may be used in both the merge mode and the skip mode. In this regard, the merge candidate list may also be referred to as a "skip candidate list" or a "merge/skip candidate list".
Alternatively, the skip mode may use an additional candidate list different from the candidate list of the merge mode. In this case, in the following description, the merge candidate list and the merge candidate may be replaced with the skip candidate list and the skip candidate, respectively.
The merge candidate list may be created before performing prediction in skip mode.
3-2) searching for motion vector using merge candidate list
The encoding apparatus 100 may determine a merge candidate to be used for encoding the target block. For example, the encoding apparatus 100 may perform prediction on the target block using the merge candidate in the merge candidate list. The encoding apparatus 100 may encode the target block using the merge candidate that generates the smallest cost in the prediction.
In addition, the encoding apparatus 100 may determine whether to encode the target block using the skip mode.
3-3) Transmission of inter-frame prediction information
The encoding apparatus 100 may generate a bitstream including inter prediction information required for inter prediction. The decoding apparatus 200 may perform inter prediction on the target block using inter prediction information of the bitstream.
The inter prediction information may include 1) mode information indicating whether a skip mode is used and 2) a skip index.
The skip index may be the same as the merge index described above.
When the skip mode is used, the target block may be encoded without using a residual signal. The inter prediction information may not include a residual signal. Alternatively, the bitstream may not include a residual signal.
The decoding apparatus 200 may acquire the skip index from the bitstream only when the mode information indicates that the skip mode is used. As described above, the merge index and the skip index may be identical to each other. The decoding apparatus 200 may acquire the skip index from the bitstream only when the mode information indicates that the merge mode or the skip mode is used.
The skip index may indicate a merge candidate to be used for prediction of the target block among merge candidates included in the merge candidate list.
3-4) inter prediction in skip mode using inter prediction information
The decoding apparatus 200 may perform prediction on the target block using the merge candidate indicated by the skip index among the merge candidates included in the merge candidate list.
The motion vector of the target block may be specified by the motion vector of the merging candidate indicated by the skip index, the reference picture index, and the reference direction.
4) Current picture reference mode
The current picture reference mode may represent a prediction mode that uses a previously reconstructed region in the target picture to which the target block belongs.
A motion vector for specifying a previously reconstructed region may be used. The reference picture index of the target block may be used to determine whether the target block has been encoded in the current picture reference mode.
A flag or index indicating whether the target block is a block encoded in the current picture reference mode may be signaled by the encoding apparatus 100 to the decoding apparatus 200. Alternatively, whether the target block is a block encoded in the current picture reference mode may be inferred by the reference picture index of the target block.
When the target block is encoded in the current picture reference mode, the current picture may exist at a fixed position or an arbitrary position in the reference picture list for the target block.
For example, the fixed position may be a position where the value of the reference picture index is 0 or the last position.
When the current picture exists at an arbitrary position in the reference picture list, an additional reference picture index indicating that arbitrary position may be signaled by the encoding apparatus 100 to the decoding apparatus 200.
In the AMVP mode, the merge mode, and the skip mode described above, the index of the list may be used to specify motion information to be used for prediction of the target block among pieces of motion information in the list.
To improve encoding efficiency, the encoding apparatus 100 may signal only an index of an element that generates the smallest cost in inter prediction of the target block among elements in the list. The encoding apparatus 100 may encode the index and may signal the encoded index.
Therefore, the encoding apparatus 100 and the decoding apparatus 200 must be able to derive the above-described lists (i.e., the predicted motion vector candidate list and the merge candidate list) from the same data using the same scheme. Here, the same data may include a reconstructed picture and a reconstructed block. Further, in order to specify an element using an index, the order of the elements in the list must be fixed.
Fig. 10 illustrates spatial candidates according to an embodiment.
In fig. 10, the positions of the spatial candidates are shown.
The large block at the center of fig. 10 may represent the target block. The five small blocks may represent the spatial candidates.
The coordinates of the target block may be (xP, yP), and the size of the target block may be expressed in (nPSW, nPSH).
Spatial candidate A0 may be the block adjacent to the lower-left corner of the target block. A0 may be the block occupying the pixel located at coordinates (xP-1, yP+nPSH+1).

Spatial candidate A1 may be a block adjacent to the left side of the target block. A1 may be the lowermost block among the blocks adjacent to the left side of the target block. Alternatively, A1 may be the block adjacent to the top of A0. A1 may be the block occupying the pixel located at coordinates (xP-1, yP+nPSH).

Spatial candidate B0 may be the block adjacent to the upper-right corner of the target block. B0 may be the block occupying the pixel located at coordinates (xP+nPSW+1, yP-1).

Spatial candidate B1 may be a block adjacent to the top of the target block. B1 may be the rightmost block among the blocks adjacent to the top of the target block. Alternatively, B1 may be the block adjacent to the left of B0. B1 may be the block occupying the pixel located at coordinates (xP+nPSW, yP-1).
The spatial candidate B2 may be a block adjacent to the upper left corner of the target block. B2 may be a block that occupies a pixel located at coordinates (xP-1, yP-1).
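The five candidate positions can be summarized as the pixels their blocks occupy, exactly as listed above; the dictionary layout and function name are illustrative only:

```python
def spatial_candidate_pixels(xP, yP, nPSW, nPSH):
    # Pixel occupied by each spatial candidate block around the target block.
    return {
        'A0': (xP - 1,        yP + nPSH + 1),  # adjacent to the lower-left corner
        'A1': (xP - 1,        yP + nPSH),      # lowermost block on the left side
        'B0': (xP + nPSW + 1, yP - 1),         # adjacent to the upper-right corner
        'B1': (xP + nPSW,     yP - 1),         # rightmost block above the target
        'B2': (xP - 1,        yP - 1),         # adjacent to the upper-left corner
    }
```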
Determination of availability of spatial and temporal candidates
In order to include the motion information of a spatial candidate or a temporal candidate in a list, it must first be determined whether that motion information is available.
Hereinafter, the candidate block may include a spatial candidate and a temporal candidate.
For example, the determination may be performed by sequentially applying the following steps 1) to 4).
Step 1) when a PU including a candidate block is located outside the boundary of the picture, the availability of the candidate block may be set to "false". The expression "availability is set to false" may have the same meaning as "set to unavailable".
Step 2) when a PU including a candidate block is located outside the boundary of a slice, the availability of the candidate block may be set to "false". When the target block and the candidate block are located in different slices, the availability of the candidate block may be set to "false".

Step 3) when the PU including the candidate block is located outside the boundary of a tile, the availability of the candidate block may be set to "false". When the target block and the candidate block are located in different tiles, the availability of the candidate block may be set to "false".
Step 4) when the prediction mode of the PU including the candidate block is an intra prediction mode, the availability of the candidate block may be set to "false". The availability of a candidate block may be set to "false" when a PU that includes the candidate block does not use inter prediction.
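A sketch of applying steps 1) to 4) in order; the helpers picture_contains, slice_of, tile_of, and is_intra are hypothetical and stand in for the codec's own bookkeeping:

```python
def candidate_available(cand_pu, target_block,
                        picture_contains, slice_of, tile_of, is_intra):
    if not picture_contains(cand_pu):                 # step 1: outside the picture
        return False
    if slice_of(cand_pu) != slice_of(target_block):   # step 2: different slice
        return False
    if tile_of(cand_pu) != tile_of(target_block):     # step 3: different tile
        return False
    if is_intra(cand_pu):                             # step 4: not inter-predicted
        return False
    return True
```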
Fig. 11 illustrates an order of adding motion information of spatial candidates to a merge list according to an embodiment.
As shown in fig. 11, when pieces of motion information of spatial candidates are added to the merge list, the order A1, B1, B0, A0, B2 may be used. That is, pieces of motion information of the available spatial candidates may be added to the merge list in the order A1, B1, B0, A0, B2.
Method for deriving merge lists in merge mode and skip mode
As described above, the maximum number of merge candidates in the merge list may be set. The set maximum number may be indicated by "N". The set number may be transmitted from the encoding apparatus 100 to the decoding apparatus 200. The slice header may include N. In other words, the maximum number of merge candidates in the merge list for the target blocks of a slice may be set by the slice header. For example, the value of N may be 5.
Pieces of motion information (i.e., merging candidates) may be added to the merge list in the order of the following steps 1) to 4).
Step 1) Among the spatial candidates, the available spatial candidates may be added to the merge list. The pieces of motion information of the available spatial candidates may be added to the merge list in the order shown in fig. 11. Here, when the motion information of an available spatial candidate overlaps with other motion information already present in the merge list, that motion information may not be added to the merge list. The operation of checking whether given motion information overlaps with other motion information present in the list may be referred to simply as an "overlap check".
The maximum number of pieces of motion information to be added may be N.
Step 2) When the number of pieces of motion information in the merge list is less than N and a temporal candidate is available, the motion information of the temporal candidate may be added to the merge list. Here, when the motion information of the available temporal candidate overlaps with other motion information already existing in the merge list, the motion information of the available temporal candidate may not be added to the merge list.
Step 3) When the number of pieces of motion information in the merge list is less than N and the type of the target slice is "B", combined motion information generated via combined bi-prediction may be added to the merge list.

The target slice may be the slice that includes the target block.
The combined motion information may be a combination of the L0 motion information and the L1 motion information. The L0 motion information may be motion information referring only to the reference picture list L0. The L1 motion information may be motion information referring only to the reference picture list L1.
In the merge list, there may be one or more pieces of L0 motion information. Further, in the merge list, there may be one or more pieces of L1 motion information.
The combined motion information may include one or more pieces of combined motion information. When generating the combined motion information, L0 motion information and L1 motion information to be used for the step of generating the combined motion information among the one or more pieces of L0 motion information and the one or more pieces of L1 motion information may be defined in advance. One or more pieces of combined motion information may be generated in a predefined order via combined bi-prediction using a pair of different pieces of motion information in the merge list. One piece of motion information of the pair of different motion information may be L0 motion information, and the other piece of motion information of the pair of different motion information may be L1 motion information.
For example, the combined motion information added with the highest priority may be a combination of L0 motion information having a merge index of 0 and L1 motion information having a merge index of 1. When the motion information having the merge index 0 is not the L0 motion information or when the motion information having the merge index 1 is not the L1 motion information, the combined motion information may be neither generated nor added. Next, the combined motion information added with the next priority may be a combination of L0 motion information having a merge index of 1 and L1 motion information having a merge index of 0. The detailed combinations that follow may conform to other combinations in the video encoding/decoding field.
Here, when the combined motion information overlaps with other motion information already existing in the merge list, the combined motion information may not be added to the merge list.
Step 4) When the number of pieces of motion information in the merge list is less than N, the motion information of a zero vector may be added to the merge list.
The zero vector motion information may be motion information in which the motion vector is a zero vector.
The number of pieces of zero vector motion information may be one or more. The reference picture indices of one or more pieces of zero vector motion information may be different from each other. For example, the value of the reference picture index of the first zero vector motion information may be 0. The reference picture index of the second zero vector motion information may have a value of 1.
The number of pieces of zero vector motion information may be the same as the number of reference pictures in the reference picture list.
The reference direction of the zero vector motion information may be bi-directional. The two motion vectors may be zero vectors. The number of pieces of zero vector motion information may be the smaller one of the number of reference pictures in the reference picture list L0 and the number of reference pictures in the reference picture list L1. Alternatively, when the number of reference pictures in the reference picture list L0 and the number of reference pictures in the reference picture list L1 are different from each other, the reference direction, which is unidirectional, may be used for the reference picture index that can be applied to only a single reference picture list.
The encoding apparatus 100 and/or the decoding apparatus 200 may sequentially add pieces of zero vector motion information to the merge list while changing the reference picture index.
Zero vector motion information may not be added to the merge list when it overlaps with other motion information already present in the merge list.
The order of the above-described steps 1) to 4) is merely exemplary, and may be changed. Furthermore, some of the above steps may be omitted according to predefined conditions.
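For illustration, the following sketch builds a merge list following steps 1) to 4) above. It simplifies the combined bi-prediction step by pairing L0 and L1 entries in list order rather than by the exact predefined priority, and the data representation is assumed; spatial_infos is assumed to be ordered A1, B1, B0, A0, B2 and to contain only available candidates:

```python
def build_merge_list(spatial_infos, temporal_infos,
                     l0_infos, l1_infos, zero_infos, N=5):
    merge = []

    def push(info):
        # Add a candidate only if the list is not yet full and the
        # overlap check against existing entries passes.
        if len(merge) < N and info not in merge:
            merge.append(info)

    for info in spatial_infos:      # step 1: available spatial candidates
        push(info)
    for info in temporal_infos:     # step 2: available temporal candidates
        push(info)
    for mi0 in l0_infos:            # step 3: combined bi-prediction (B slice);
        for mi1 in l1_infos:        #         simplified pairing order
            push(('BI', mi0, mi1))
    for info in zero_infos:         # step 4: zero-vector motion information
        push(info)
    return merge
```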
Method for deriving predicted motion vector candidate list in AMVP mode
The maximum number of predicted motion vector candidates in the predicted motion vector candidate list may be predefined. A predefined maximum number may be indicated by N. For example, the predefined maximum number may be 2.
Pieces of motion information (i.e., predicted motion vector candidates) may be added to the predicted motion vector candidate list in the following order of step 1) to step 3).
Step 1) Available spatial candidates among the spatial candidates may be added to the predicted motion vector candidate list. The spatial candidates may include a first spatial candidate and a second spatial candidate.
The first spatial candidate may be one of A0, A1, scaled A0, and scaled A1. The second spatial candidate may be one of B0, B1, B2, scaled B0, scaled B1, and scaled B2.
The plurality of pieces of motion information of the available spatial candidates may be added to the prediction motion vector candidate list in the order of the first spatial candidate and the second spatial candidate. In this case, when the motion information of the available spatial candidate overlaps with other motion information already existing in the predicted motion vector candidate list, the motion information of the available spatial candidate may not be added to the predicted motion vector candidate list. In other words, when the value of N is 2, if the motion information of the second spatial candidate is the same as the motion information of the first spatial candidate, the motion information of the second spatial candidate may not be added to the predicted motion vector candidate list.
The maximum number of pieces of motion information to be added may be N.
Step 2) When the number of pieces of motion information in the predicted motion vector candidate list is less than N and a temporal candidate is available, the motion information of the temporal candidate may be added to the predicted motion vector candidate list. In this case, when the motion information of the available temporal candidate overlaps with other motion information already existing in the predicted motion vector candidate list, the motion information of the available temporal candidate may not be added to the predicted motion vector candidate list.
Step 3) When the number of pieces of motion information in the predicted motion vector candidate list is less than N, zero vector motion information may be added to the predicted motion vector candidate list.
The zero vector motion information may include one or more pieces of zero vector motion information. The reference picture indices of the one or more pieces of zero vector motion information may be different from each other.
The encoding apparatus 100 and/or the decoding apparatus 200 may sequentially add pieces of zero vector motion information to the predicted motion vector candidate list while changing the reference picture index.
When the zero vector motion information overlaps with other motion information already existing in the predicted motion vector candidate list, the zero vector motion information may not be added to the predicted motion vector candidate list.
The description of zero vector motion information made above in connection with the merge list is also applicable here. A repeated description thereof will be omitted.
The order of step 1) to step 3) described above is merely exemplary and may be changed. Furthermore, some of the steps may be omitted according to predefined conditions.
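The AMVP list derivation can be sketched the same way, with N = 2 and the first and second spatial candidates already resolved; all names are assumptions:

```python
def build_amvp_list(first_spatial, second_spatial, temporal, zero_mvs, N=2):
    amvp = []

    def push(mv):
        # Add a candidate only if it exists, the list is not full,
        # and it does not overlap an existing entry.
        if mv is not None and len(amvp) < N and mv not in amvp:
            amvp.append(mv)

    push(first_spatial)     # step 1: spatial candidates, first then second
    push(second_spatial)
    push(temporal)          # step 2: temporal candidate
    for mv in zero_mvs:     # step 3: zero-vector candidates
        push(mv)
    return amvp
```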
Fig. 12 illustrates a transform and quantization process according to an example.
As shown in fig. 12, the quantized level may be generated by performing a transform and/or quantization process on the residual signal.
The residual signal may be generated as a difference between the original block and the prediction block. Here, the prediction block may be a block generated via intra prediction or inter prediction.
The residual signal may be transformed into a signal in the frequency domain by a transformation process as part of a quantization process.
The transform kernels used for the transform may include various kernels, such as the Discrete Cosine Transform (DCT) type 2 (DCT-II) kernel and Discrete Sine Transform (DST) kernels.
These transform kernels may perform separable transforms or two-dimensional (2D) inseparable transforms on the residual signal. The separable transform may be a transform indicating that a one-dimensional (1D) transform is performed on the residual signal in each of a horizontal direction and a vertical direction.
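A separable 2D transform can be illustrated as two 1D passes, one vertical and one horizontal. The orthonormal DCT-II matrix below is a standard construction, used here only as an example kernel; the function names are illustrative:

```python
import math

def dct2_matrix(N):
    # Orthonormal DCT-II basis: row i is the i-th basis vector.
    return [[math.sqrt((1.0 if i == 0 else 2.0) / N) *
             math.cos(math.pi * i * (2 * j + 1) / (2 * N))
             for j in range(N)] for i in range(N)]

def separable_transform(block, T):
    # Vertical pass (T @ block), then horizontal pass (result @ T^T).
    N = len(block)
    tmp = [[sum(T[i][k] * block[k][j] for k in range(N))
            for j in range(N)] for i in range(N)]
    return [[sum(tmp[i][k] * T[j][k] for k in range(N))
             for j in range(N)] for i in range(N)]
```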
The DCT type and DST type adaptively used for the 1D transform may include DCT-V, DCT-VIII, DST-I, and DST-VII in addition to DCT-II, as shown in Table 3 below.
TABLE 3
(Table 3 is reproduced as an image in the original document; per the surrounding text, it lists the basis functions of the transform types available for the 1D transform: DCT-II, DCT-V, DCT-VIII, DST-I, and DST-VII.)
As shown in table 3, when a DCT type or a DST type to be used for transformation is derived, a transformation set may be used. Each transform set may include a plurality of transform candidates. Each transform candidate may be a DCT type or a DST type.
Table 4 below shows an example of a transform set applied to the horizontal direction depending on the intra prediction mode.
TABLE 4
(Table 4 is reproduced as an image in the original document; it maps each intra prediction mode of the target block to the number of the transform set applied in the horizontal direction.)
In table 4, the number of each transform set to be applied to the horizontal direction of the residual signal is indicated depending on the intra prediction mode of the target block.
Table 5 below shows an example of a transformation set applied to the vertical direction of a residual signal depending on an intra prediction mode.
TABLE 5
(Table 5 is reproduced as an image in the original document; it maps each intra prediction mode of the target block to the number of the transform set applied in the vertical direction of the residual signal.)
As illustrated in tables 4 and 5, a transformation set to be applied to the horizontal direction and the vertical direction may be predefined according to the intra prediction mode of the target block. The encoding apparatus 100 may perform transformation and inverse transformation on the residual signal using the transformation included in the transformation set corresponding to the intra prediction mode of the target block. Further, the decoding apparatus 200 may perform inverse transformation on the residual signal using the transformation included in the transformation set corresponding to the intra prediction mode of the target block.
In the transform and inverse transform, as illustrated in table 3, table 4, and table 5, a transform set to be applied to a residual signal may be determined and may not be signaled. The transformation indication information may be signaled from the encoding apparatus 100 to the decoding apparatus 200. The transformation indication information may be information indicating which one of a plurality of transformation candidates included in the transformation set to be applied to the residual signal is used.
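The division of labor described here, where the set is derived while the candidate inside the set is signaled, can be sketched as follows. The table contents below are placeholders, not the actual entries of Tables 3 to 5:

```python
# Placeholder tables (NOT the actual contents of Tables 3-5 above).
HORIZONTAL_SET_OF_MODE = {0: 2, 1: 1, 18: 0}
TRANSFORM_SETS = {
    0: ['DST-VII', 'DCT-VIII'],
    1: ['DST-VII', 'DST-I'],
    2: ['DST-VII', 'DCT-V'],
}

def select_horizontal_transform(intra_mode, signaled_candidate_index):
    # The transform set is derived from the intra prediction mode and is
    # not signaled; only the index of the candidate within the set is
    # signaled (the "transform indication information").
    transform_set = TRANSFORM_SETS[HORIZONTAL_SET_OF_MODE[intra_mode]]
    return transform_set[signaled_candidate_index]
```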
As described above, a method using various transforms may be applied to a residual signal generated via intra prediction or inter prediction.
The transformation may include at least one of a first transformation and a second transformation. The transform coefficient may be generated by performing a primary transform on the residual signal, and the secondary transform coefficient may be generated by performing a secondary transform on the transform coefficient.
The first transformation may be referred to as the "primary transformation". Furthermore, the first transformation may also be referred to as the "adaptive multiple transform (AMT)" scheme. As described above, AMT may represent applying different transforms to respective 1D directions (i.e., the vertical and/or horizontal directions) or to selected directions.
Alternatively, AMT may be referred to as multi-transform selection (MTS) or extended multi-transform (EMT).
The secondary transform may be a transform for improving the energy concentration of the transform coefficients generated by the primary transform. Like the primary transform, the secondary transform may be a separable transform or a non-separable transform. Such a non-separable transform may be a non-separable secondary transform (NSST).
The first transformation may be performed using at least one of a predefined plurality of transformation methods. For example, the predefined multiple transform methods may include Discrete Cosine Transform (DCT), Discrete Sine Transform (DST), Karhunen-Loeve transform (KLT), and the like.
Further, the first transformation may be a transformation having various types according to a kernel function defining DCT or DST.
For example, the first transformation may include transforms such as DCT-2, DCT-5, DCT-8, DST-1, and DST-7, according to the transform kernels presented in Table 6 below. Table 6 below illustrates various transform types and transform kernels for Multiple Transform Selection (MTS).
MTS may refer to the selection of a combination of one or more DCT and/or DST kernels to transform the residual signal in the horizontal and/or vertical directions.
TABLE 6
(Table 6 is reproduced as an image in the original document; it lists the transform types used for MTS together with their transform kernels, i.e., basis functions.)
In Table 6, i and j may be integer values equal to or greater than 0 and less than or equal to N-1.
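The image carrying Table 6 is not reproduced in this text. For reference, the following are the basis-function definitions commonly cited for the DCT-2, DST-7, and DCT-8 kernels named above (the forms for DCT-5 and DST-1 are omitted); these standard forms are an assumption here, since the patent's own table is unavailable. $T_i(j)$ denotes the $j$-th element of the $i$-th basis vector of a transform of size $N$:

$$T_i(j)=\omega_0\sqrt{\frac{2}{N}}\cos\left(\frac{\pi i(2j+1)}{2N}\right),\qquad \omega_0=\begin{cases}\sqrt{1/2}, & i=0\\ 1, & i\neq 0\end{cases}\qquad\text{(DCT-2)}$$

$$T_i(j)=\sqrt{\frac{4}{2N+1}}\,\sin\left(\frac{\pi(2i+1)(j+1)}{2N+1}\right)\qquad\text{(DST-7)}$$

$$T_i(j)=\sqrt{\frac{4}{2N+1}}\,\cos\left(\frac{\pi(2i+1)(2j+1)}{4N+2}\right)\qquad\text{(DCT-8)}$$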
A second transformation may be performed on the transformation coefficients generated by performing the first transformation.
The first transform and/or the second transform may be applied to a signal component corresponding to one or more of a luminance (luma) component and a chrominance (chroma) component. Whether to apply the primary transform and/or the secondary transform may be determined according to at least one of encoding parameters for the target block and/or the neighboring blocks. For example, whether to apply the first transformation and/or the second transformation may be determined according to the size and/or shape of the target block.
The transformation method to be applied to the primary transformation and/or the secondary transformation may be determined according to at least one of encoding parameters for the target block and/or the neighboring blocks. The determined transformation method may also indicate that the first transformation and/or the second transformation is not used.
Alternatively, transform information indicating a transform method may be signaled from the encoding apparatus 100 to the decoding apparatus 200. For example, the transformation information may include an index of a transformation to be used for the first transformation and/or the second transformation.
The quantized transform coefficient (i.e., the quantized level) may be generated by performing quantization on a result generated by performing the primary transform and/or the secondary transform or performing quantization on the residual signal.
Fig. 13 illustrates a diagonal scan according to an example.
Fig. 14 shows a horizontal scan according to an example.
Fig. 15 shows a vertical scan according to an example.
The quantized transform coefficients may be scanned via at least one of a (top right) diagonal scan, a vertical scan, and a horizontal scan according to at least one of an intra prediction mode, a block size, and a block shape. The block may be a Transform Unit (TU).
Each scan may be initiated at a particular starting point and may be terminated at a particular ending point.
For example, the quantized transform coefficients may be changed into a 1D vector form by scanning the coefficients of the block using the diagonal scan of fig. 13. Alternatively, the horizontal scan of fig. 14 or the vertical scan of fig. 15 may be used according to the size of the block and/or the intra prediction mode, instead of using the diagonal scan.
The vertical scanning may be an operation of scanning the 2D block type coefficients in the column direction. The horizontal scanning may be an operation of scanning the 2D block type coefficients in a row direction.
In other words, which one of the diagonal scan, the vertical scan, and the horizontal scan is to be used may be determined according to the size of the block and/or the intra prediction mode.
As shown in fig. 13, 14, and 15, the quantized transform coefficients may be scanned in a diagonal direction, a horizontal direction, or a vertical direction.
The quantized transform coefficients may be represented by block shapes. Each block may include a plurality of sub-blocks. Each sub-block may be defined according to a minimum block size or a minimum block shape.
In the scanning, a scanning order according to the type or direction of the scanning may be first applied to the subblocks. Further, a scanning order according to the direction of scanning may be applied to the quantized transform coefficients in each sub-block.
For example, as shown in fig. 13, 14, and 15, when the size of the target block is 8 × 8, quantized transform coefficients may be generated by primary transform, secondary transform, and quantization of a residual signal of the target block. Thus, one of three types of scanning orders may be applied to four 4 × 4 sub-blocks, and the quantized transform coefficients may also be scanned for each 4 × 4 sub-block according to the scanning order.
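As a brief illustration of the scanning just described, the following is a minimal sketch in Python (not taken from the patent, which contains no code): the chosen scan order is applied first to the 4x4 sub-blocks and then to the quantized transform coefficients inside each sub-block.

```python
def diag_scan(w, h):
    """Up-right diagonal scan positions (x, y) for a w x h block."""
    order = []
    for s in range(w + h - 1):        # anti-diagonals, starting at the DC term
        for x in range(s + 1):
            y = s - x
            if x < w and y < h:
                order.append((x, y))  # bottom-left to top-right on a diagonal
    return order

def horizontal_scan(w, h):
    """Row-by-row scan of a w x h block."""
    return [(x, y) for y in range(h) for x in range(w)]

def vertical_scan(w, h):
    """Column-by-column scan of a w x h block."""
    return [(x, y) for x in range(w) for y in range(h)]

def scan_block(coeffs, scan, sub=4):
    """Scan sub-blocks with `scan`, then the coefficients inside each sub-block."""
    h, w = len(coeffs), len(coeffs[0])
    out = []
    for sx, sy in scan(w // sub, h // sub):   # sub-block order
        for x, y in scan(sub, sub):           # coefficient order inside
            out.append(coeffs[sy * sub + y][sx * sub + x])
    return out

# Example: an 8x8 block scanned diagonally yields a 64-element 1D vector.
block = [[8 * y + x for x in range(8)] for y in range(8)]
vector = scan_block(block, diag_scan)
```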
The scanned quantized transform coefficients may be entropy encoded, and the bitstream may include the entropy encoded quantized transform coefficients.
The decoding apparatus 200 may generate quantized transform coefficients via entropy decoding of a bitstream. The quantized transform coefficients may be arranged in the form of 2D blocks via inverse scanning. Here, as a method of the inverse scanning, at least one of the upper right diagonal scanning, the vertical scanning, and the horizontal scanning may be performed.
Inverse quantization may be performed on the quantized transform coefficients. The inverse secondary transform may be performed on the result generated by performing inverse quantization, depending on whether the secondary transform was performed. Further, the inverse primary transform may be performed on the result generated by performing the inverse secondary transform, depending on whether the primary transform was performed. The reconstructed residual signal may be generated by performing the inverse primary transform on the result generated by performing the inverse secondary transform.
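The decoder-side ordering just described can be summarized in a short sketch; the helper bodies below are trivial placeholders (a real codec would use QP-dependent scaling tables and actual inverse transform kernels):

```python
def dequantize(levels, qp_scale):
    # Placeholder: real codecs use a QP-dependent scaling table.
    return [[v * qp_scale for v in row] for row in levels]

def inverse_secondary_transform(coeffs):
    return coeffs  # placeholder for, e.g., the inverse NSST

def inverse_primary_transform(coeffs):
    return coeffs  # placeholder for, e.g., an inverse DCT/DST

def reconstruct_residual(levels, qp_scale, secondary_used, primary_used):
    coeffs = dequantize(levels, qp_scale)              # inverse quantization
    if secondary_used:
        coeffs = inverse_secondary_transform(coeffs)   # only if NSST was applied
    if primary_used:
        coeffs = inverse_primary_transform(coeffs)     # only if not transform-skipped
    return coeffs                                      # reconstructed residual signal
```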
Fig. 16 is a configuration diagram of an encoding apparatus according to an embodiment.
The encoding apparatus 1600 may correspond to the encoding apparatus 100 described above.
The encoding apparatus 1600 may include a processing unit 1610, a memory 1630, a User Interface (UI) input device 1650, a UI output device 1660, and a storage 1640 that communicate with each other over a bus 1690. The encoding device 1600 may also include a communication unit 1620 connected to the network 1699.
The processing unit 1610 may be a Central Processing Unit (CPU) or semiconductor device for executing processing instructions stored in the memory 1630 or the storage 1640. The processing unit 1610 may be at least one hardware processor.
The processing unit 1610 may generate and process a signal, data, or information input to the encoding apparatus 1600, output from the encoding apparatus 1600, or used in the encoding apparatus 1600, and may perform checking, comparison, determination, or the like related to the signal, data, or information. In other words, in embodiments, the generation and processing of data or information, as well as the inspection, comparison, and determination related to the data or information, may be performed by the processing unit 1610.
The processing unit 1610 may include an inter prediction unit 110, an intra prediction unit 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference picture buffer 190.
At least some of the inter prediction unit 110, the intra prediction unit 120, the switch 115, the subtractor 125, the transform unit 130, the quantization unit 140, the entropy encoding unit 150, the inverse quantization unit 160, the inverse transform unit 170, the adder 175, the filter unit 180, and the reference picture buffer 190 may be program modules and may communicate with an external device or system. The program modules may be included in the encoding device 1600 in the form of an operating system, application program modules, or other program modules.
The program modules may be physically stored in various types of well-known storage devices. Furthermore, at least some of the program modules may also be stored in a remote storage device capable of communicating with the encoding apparatus 1600.
Program modules may include, but are not limited to, routines, subroutines, programs, objects, components, and data structures for performing functions or operations in accordance with the embodiments or for implementing abstract data types in accordance with the embodiments.
The program modules may be implemented using instructions or code executed by at least one processor of the encoding apparatus 1600.
The processing unit 1610 may execute instructions or code in the inter-prediction unit 110, the intra-prediction unit 120, the switch 115, the subtractor 125, the transform unit 130, the quantization unit 140, the entropy encoding unit 150, the inverse quantization unit 160, the inverse transform unit 170, the adder 175, the filter unit 180, and the reference picture buffer 190.
The storage unit may represent the memory 1630 and/or the storage 1640. Each of the memory 1630 and the storage 1640 may be any of various types of volatile or non-volatile storage media. For example, the memory 1630 may include at least one of Read-Only Memory (ROM) 1631 and Random Access Memory (RAM) 1632.
The storage unit may store data or information for the operation of the encoding device 1600. In an embodiment, data or information of the encoding apparatus 1600 may be stored in a storage unit.
For example, the storage unit may store pictures, blocks, lists, motion information, inter prediction information, bitstreams, and the like.
The encoding device 1600 may be implemented in a computer system including a computer-readable storage medium.
The storage medium may store at least one module required for the operation of the encoding apparatus 1600. Memory 1630 may store at least one module and may be configured to cause the at least one module to be executed by processing unit 1610.
Functions related to communication of data or information of the encoding apparatus 1600 may be performed by the communication unit 1620.
For example, the communication unit 1620 may transmit the bitstream to the decoding apparatus 1700, to be described later.
Fig. 17 is a configuration diagram of a decoding apparatus according to an embodiment.
The decoding apparatus 1700 may correspond to the decoding apparatus 200 described above.
The decoding apparatus 1700 may include a processing unit 1710, a memory 1730, a User Interface (UI) input device 1750, a UI output device 1760, and a storage 1740 that communicate with each other through a bus 1790. The decoding apparatus 1700 may further include a communication unit 1720 connected to a network 1799.
The processing unit 1710 may be a Central Processing Unit (CPU) or a semiconductor device for executing processing instructions stored in the memory 1730 or the storage 1740. The processing unit 1710 may be at least one hardware processor.
The processing unit 1710 may generate and process a signal, data, or information input to the decoding apparatus 1700, output from the decoding apparatus 1700, or used in the decoding apparatus 1700, and may perform checking, comparison, determination, or the like related to the signal, data, or information. In other words, in embodiments, the generation and processing of data or information, as well as the checking, comparing, and determining related to the data or information, may be performed by the processing unit 1710.
The processing unit 1710 may include the entropy decoding unit 210, the inverse quantization unit 220, the inverse transform unit 230, the intra prediction unit 240, the inter prediction unit 250, the switch 245, the adder 255, the filter unit 260, and the reference picture buffer 270.
At least some of the entropy decoding unit 210, the inverse quantization unit 220, the inverse transform unit 230, the intra prediction unit 240, the inter prediction unit 250, the adder 255, the switch 245, the filter unit 260, and the reference picture buffer 270 of the decoding apparatus 200 may be program modules and may communicate with an external device or system. The program modules may be included in the decoding apparatus 1700 in the form of an operating system, application program modules, or other program modules.
Program modules may be physically stored in various types of well-known memory devices. Furthermore, at least some of the program modules may also be stored in a remote memory storage device that is capable of communicating with the decoding apparatus 1700.
Program modules may include, but are not limited to, routines, subroutines, programs, objects, components, and data structures for performing functions or operations in accordance with the embodiments or for implementing abstract data types in accordance with the embodiments.
The program modules may be implemented using instructions or code executed by at least one processor of the decoding apparatus 1700.
Processing unit 1710 may execute instructions or code in entropy decoding unit 210, inverse quantization unit 220, inverse transform unit 230, intra prediction unit 240, inter prediction unit 250, switch 245, adder 255, filter unit 260, and reference picture buffer 270.
The storage unit may represent the memory 1730 and/or the storage 1740. Each of the memory 1730 and the storage 1740 may be any of various types of volatile or non-volatile storage media. For example, the memory 1730 may include at least one of ROM 1731 and RAM 1732.
The storage unit may store data or information for the operation of the decoding apparatus 1700. In an embodiment, data or information of the decoding apparatus 1700 may be stored in a storage unit.
For example, the storage unit may store pictures, blocks, lists, motion information, inter prediction information, bitstreams, and the like.
The decoding apparatus 1700 may be implemented in a computer system including a computer-readable storage medium.
The storage medium may store at least one module required for the operation of the decoding apparatus 1700. The memory 1730 may store at least one module and may be configured to cause the at least one module to be executed by the processing unit 1710.
Functions related to communication of data or information of the decoding apparatus 1700 can be performed by the communication unit 1720.
For example, the communication unit 1720 may receive a bitstream from the encoding device 1600.
Image processing method using sharing of information between channels
Methods and apparatuses according to embodiments may apply transform coding techniques using prediction and various transforms to high-resolution images, such as 4K or 8K resolution images, may encode and/or decode images by sharing various types of predefined coding decision information between channels, and may decode compressed bitstreams or compressed data for the encoded images by sharing the transmitted coding decision information between channels.
The plurality of channels may represent a plurality of components representing a block. For example, the plurality of channels may include a color channel, a depth channel, an alpha channel, and the like.
Hereinafter, the terms "channel" and "color" may have the same meaning and may be used interchangeably with each other. Further, the term "color" may indicate one of the channels. The term "channel" may be used interchangeably with one or more of the terms "color", "depth", and "alpha".
By using the technique of the present embodiment, the problem of degradation in compression rate and image quality that occurs when conventional techniques are applied to the encoding and decoding of an image can be solved. In particular, when a conventional technique is applied to an image in which the variation of pixel values is spatially concentrated, the problem of degradation of the compression rate and image quality may be serious.
In an example, the following pieces of information may be used as the encoding decision information shared between channels to perform encoding and decoding according to the embodiment. In the names of the following information, "flag" may be omitted.
1) The transform_skip_flag information may indicate whether the transform is selectively skipped. Alternatively, the transform_skip_flag information may indicate which of the transform and the transform skip is used.
2) The intra smoothing filter information may indicate whether smoothing filtering is applied to reference pixels used in intra prediction.
3) The position-dependent intra prediction combination (PDPC) flag (pdpc_flag) information may indicate whether intra prediction is to be performed using both neighboring pixels to which smoothing (i.e., filtering) is applied and neighboring pixels to which smoothing is not applied when a specific intra prediction (e.g., planar prediction) is performed.
4) The Residual Differential Pulse Code Modulation (RDPCM) flag (rdpcm_flag) information may indicate whether RDPCM, which additionally performs Differential Pulse Code Modulation (DPCM) on a residual signal acquired via prediction and thereby acquires the residual signal again, is to be performed.
5) The Multiple Transform Selection (MTS) flag (mts_flag) information may indicate whether an Extended Multiple Transform (EMT)-based encoding method is to be used.
The EMT may be a coding method of selecting and using a designated transform for a transform block as a target block among the provided plurality of transforms.
EMT may also represent "enhanced multi-transform" and may also indicate "multi-transform selection (MTS)".
6) The EMT flag information may indicate whether the EMT is to be used.
7) The MTS index (mts_idx) information may indicate which transforms are to be used in the horizontal and vertical directions when MTS is used.
A part of the mts_idx information (e.g., one designated bit in mts_idx) may be information indicating the transform used in the horizontal direction of the residual signal.
Another part of the mts_idx information, i.e., a part of the remainder of the mts_idx information (e.g., one other designated bit in mts_idx), may be information indicating the transform used in the vertical direction of the residual signal.
For example, the determination of the transform according to the mts_idx information may be configured as shown in Table 7 and Table 8 below.
[ Table 7]
[ Table 8]
In Table 7, the transform in the horizontal direction and the transform in the vertical direction, used according to the intra prediction mode and the value of the mts_idx information, are exemplified.
According to Table 7, when the mts_idx information is acquired in order to perform intra prediction or inter prediction of the target block, the transform in the horizontal direction and the transform in the vertical direction to be used for the transform of the target block can be determined according to the value of the mts_idx information.
For example, when the intra prediction mode of the target block is 6 and the value of the mts_idx information is 2, DST-7 may be used as the transform in the horizontal direction and DCT-8 may be used as the transform in the vertical direction.
Table 8 shows a modified example of Table 7. In Table 8, MTS_CU_flag may indicate that the flag information mts_flag, which indicates whether the Multiple Transform Selection (MTS) method is used, is determined and transmitted on a per-CU basis. Further, MTS_Hor_flag and MTS_Ver_flag may indicate the transform used in the horizontal direction and the transform used in the vertical direction, respectively. Table 8 may illustrate the transforms used in the horizontal and vertical directions by the values of MTS_Hor_flag and MTS_Ver_flag.
Alternatively, the determination of the transform according to the mts_idx information of Table 7 may be configured as shown in Table 9 below.
[ Table 9]
In Table 9, the horizontal transform type and the vertical transform type used in intra prediction and inter prediction according to the value of the mts_idx information are exemplified.
Various values of the horizontal transform type may indicate particular transforms. For example, a value of "1" for the horizontal transform type may represent DST-7. A value of "2" for the horizontal transform type may represent DCT-8.
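A minimal sketch of deriving the two transforms from mts_idx follows; the bit layout is an assumption based on the description above (one designated bit for the horizontal transform and another for the vertical one), with transform type 1 = DST-7 and type 2 = DCT-8 per Table 9:

```python
TRANSFORM_NAME = {1: "DST-7", 2: "DCT-8"}

def transforms_from_mts_idx(mts_idx):
    # Assumed layout: bit 0 selects the horizontal transform,
    # bit 1 selects the vertical transform.
    hor_type = 2 if (mts_idx & 1) else 1
    ver_type = 2 if ((mts_idx >> 1) & 1) else 1
    return TRANSFORM_NAME[hor_type], TRANSFORM_NAME[ver_type]

# Example consistent with the text: mts_idx = 2 -> ("DST-7", "DCT-8"),
# i.e., DST-7 horizontally and DCT-8 vertically.
```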
8) The non-separable secondary transform (NSST) flag (nsst_flag) information may indicate whether an NSST encoding method, which additionally performs a non-separable secondary transform on all or some of the transform coefficients acquired via the primary transform, is to be used.
9) The NSST index (nsst_idx) information may indicate the type of secondary transform to be applied to all or some of the transform coefficients when the NSST encoding method is used.
The nsst_idx information may indicate the transform to be used for the non-separable secondary transform.
10) The CU skip flag information may indicate whether the step of transmitting encoded data for the CU is to be skipped.
11) The CU Local Illumination Compensation (LIC) flag (cu_lic_flag) information may indicate whether the difference between the luminance values of blocks is compensated for.
12) The Overlapped Block Motion Compensation (OBMC) flag (obmc_flag) information may indicate whether multiple overlapped motion compensation blocks are used to generate the final motion compensation block.
13) The codeAlfCtuEnable flag (codeAlfCtuEnable_flag) information may indicate whether the Adaptive Loop Filter (ALF) is applicable to the pixel values of the current CTU.
When such encoding decision information is shared between channels, an image having excellent image quality can be obtained while improving the image compression rate.
In describing the coding decision information to be shared between channels according to the present embodiment, in order to facilitate the overall description and understanding of the embodiments (such as the description of operations, drawings, and equations), the transform_skip_flag information may be used as an example of the coding decision information to be shared between channels.
However, the transform_skip_flag information is only a single example, and the coding decision information to be shared between channels to which the present embodiment is applied does not necessarily represent only the transform_skip_flag information.
For example, it is understood that one or more pieces of the above-described encoding decision information required for decoding (such as 1) the rdpcm_flag information, 2) pieces of transform-related selection information such as the mts_flag information, the mts_idx information, the nsst_flag information, and the nsst_idx information, 3) the obmc_flag information, and 4) the pdpc_flag information) are included in the encoding decision information to be shared between channels.
Also, when a channel that will share coding decision information required for decoding according to an embodiment is described, a YCbCr color space may be used as an example. However, the YCbCr color space is only a single detailed example, and embodiments may be applied to various color spaces, such as YUV color space, XYZ color space, and RGB color space.
The color index cIDX may be a channel index indicating one of the channels in the color space.
For the YCbCr color space and the YUV color space, the cIDX may have a value such as "0/1/2" for the channels sequentially displayed in the corresponding color space. The value "a/b/c" may represent that the value indicating the cIDX of the first channel is 'a', the value indicating the cIDX of the second channel is 'b', and the value indicating the cIDX of the third channel is 'c'.
Alternatively, for the YCbCr color space and the YUV color space, the cIDX may have a value such as "0/2/1" for the channels sequentially displayed in the corresponding color space.
For both the RGB color space and the XYZ color space, cIDX may have a value such as "1/0/2" or "2/0/1" for the channels sequentially displayed in the corresponding color space.
As image compression techniques that have been developed or are being developed for the purpose of achieving efficient image encoding/decoding, there may be various techniques such as 1) an inter prediction technique of predicting values of pixels included in a target picture from pictures before or after the target picture, 2) an intra prediction technique of predicting values of pixels included in a current target picture using information of the pixels in the target picture, 3) a transform and quantization technique of compressing energy of a residual signal remaining as a prediction error, 4) an entropy encoding technique of allocating short codes to more frequently occurring values and long codes to less frequently occurring values, and an arithmetic encoding technique. By utilizing these image compression techniques, image data can be efficiently compressed, transmitted, and stored.
There are various compression techniques that can be applied to the encoding of images. Furthermore, certain compression techniques may be more advantageous than others depending on the properties of the image to be encoded. Accordingly, the encoding apparatus 1600 may perform the most advantageous compression on the target block by adaptively determining whether any one of various types of compression techniques is used for the target block.
Accordingly, in order to select the most advantageous compression technique for the target block from among various selectable compression techniques, the encoding apparatus 1600 may generally perform Rate Distortion Optimization (RDO). From a rate-distortion perspective, it may not be known in advance which of the various image coding decisions that can be selected for the coding of an image is optimal. Thus, the encoding apparatus 1600 may calculate rate-distortion values for a combination of all available image encoding decisions by performing encoding (or simplified encoding) on respective combinations of all available image encoding decisions, and may determine and use an image encoding decision having a smallest rate-distortion value of the calculated rate-distortion values as a final image encoding decision for the target block.
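The RDO procedure described above can be sketched as follows; trial_encode is a hypothetical helper returning the rate and distortion of encoding the block under one candidate decision, and lam is the Lagrange multiplier:

```python
def rdo_select(block, candidate_decisions, lam, trial_encode):
    """Return the coding decision with the smallest cost J = D + lambda * R."""
    best_decision, best_cost = None, float("inf")
    for decision in candidate_decisions:
        rate, distortion = trial_encode(block, decision)  # full or simplified encode
        cost = distortion + lam * rate                    # rate-distortion cost
        if cost < best_cost:
            best_decision, best_cost = decision, cost
    return best_decision
```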
Further, the encoding apparatus 1600 may record an encoding decision derived by performing such RDO or derived using an additional decision method selected by the encoding apparatus 1600 in the bitstream. The decoding apparatus 1700 can read (i.e., parse) the coding decision recorded in the bitstream and can accurately perform decoding on the target block by performing inverse processing corresponding to the coding according to the coding decision.
Here, the information indicating the coding decision may be referred to as "coding decision information" or "coding information" required for decoding.
Hereinafter, the terms "encoding decision information" and "encoding information" may have the same meaning and may be used interchangeably with each other.
In general, the multiple channels of an image (e.g., YUV, YCbCr, RGB, and XYZ) may not always have the same or similar properties. Therefore, from the perspective of improving the compression rate, making coding decisions independently for each of the multiple channels may generally achieve better performance.
For example, as one of the above-described coding decisions, there is transform_skip_flag, a coding decision indicating whether or not to perform a transform on the target block. That is, whether the transform is to be skipped may be determined for each of the blocks, and transform_skip_flag information indicating such a decision may be recorded as coding decision information in the bitstream for each of the plurality of channels.
In general, in encoding for image compression, it has been considered that a transform for a target block is always performed. However, when the spatial change of the values of pixels in a target block that is a target of compression is very large, or particularly when the change of the pixel values is very locally limited, even if the transform is applied, the degree to which the image energy concentrates on the low frequency may not be large, and conversely, a large number of transform coefficients for the high frequency region having a relatively large value may occur.
Therefore, when low-frequency signal components are largely retained and high-frequency signal components are eliminated by the transform and quantization process, or when a transform and quantization process that reduces the amount of data by applying strong quantization is applied, serious degradation of image quality may occur. In particular, such degradation of image quality may be further exacerbated when the spatial change of the values of the pixels is very large or the change of the pixel values is concentrated in a very locally limited area.
In order to solve the above problem, a method for directly encoding the values of pixels in the spatial domain without a transform may be used, instead of uniformly applying the transform to the target block. According to this method, it may be determined whether to perform a transform on each transform block. By performing the transform or skipping the transform based on such a decision, encoding of the transform block may be performed. In the bitstream, transform_skip_flag information, which is coding decision information indicating whether performing the transform is to be skipped, may be recorded.
For example, when the value of the transform_skip_flag information is 1, the transform may be skipped. When the value of the transform_skip_flag information is 0, the transform may be performed. The encoding apparatus 1600 may transmit information on whether to skip the transform for the target block to the decoding apparatus 1700 through the transform_skip_flag information, and may solve the above-described problems by means of such transmission.
Further, pieces of transform_skip_flag information may be set for the luminance channel (i.e., the Y channel) and the chrominance channels (i.e., the Cb channel and the Cr channel), respectively, and then may be transmitted. The decoding apparatus 1700 may perform decoding on the target block by skipping or performing the transform on each channel of the target block according to the value of the transform_skip_flag information for the channel read (i.e., parsed) from the bitstream.
However, when pieces of transform_skip_flag information for channels such as Y, Cb, and Cr are transmitted for all transform blocks, another problem may occur in that overhead may increase due to the signaling of the multiple pieces of transform_skip_flag information, and the compression rate of a picture may be deteriorated.
In order to alleviate problems such as degradation of the compression rate, the flag information indicating whether the transform is to be skipped may be transmitted only when the size of the transform block is less than or equal to a specific transform block size. However, even if such a scheme is used, pieces of flag information indicating whether transforms are to be skipped must still be transmitted for all channels of each transform block whose size is less than or equal to the specific block size, which may still deteriorate the compression rate of the image. Further, such a deteriorated compression rate inevitably reduces the quality of the compressed image.
In order to solve the degradation of the compression rate caused by transmitting a plurality of pieces of encoding decision information selected by the encoding apparatus 1600 for all channels, an encoding and/or decoding method using sharing of information between channels is disclosed in the embodiment.
First, conditions under which image attributes of channels are determined to be similar to each other can be defined in advance. When these conditions are satisfied, encoding decision information for an image or a block determined by the encoding apparatus 1600 for a representative channel among a plurality of channels may be transmitted to the decoding apparatus 1700.
The coding decision information transmitted for the representative channel may be shared and used for all or selected ones of the plurality of channels except for the representative channel. By such sharing and using means, the compression rate of the image can be improved. Therefore, the encoding and/or decoding method according to the present embodiment can provide excellent encoding efficiency even if individual pieces of encoding decision information for a plurality of channels are not transmitted.
Here, the encoding decision information to be shared may include one or more of the above-described transform_skip_flag information, intra smoothing filter information, rdpcm_flag information, mts_idx information, pdpc_flag information, MTS_CU_flag information, MTS_Hor_flag information, MTS_Ver_flag information, nsst_idx information, CU skip flag information, cu_lic_flag information, obmc_flag information, and codeAlfCtuEnable_flag information.
Condition in which image attributes of respective channels are determined to be similar to each other
To determine that the image properties of the channels are similar to each other, it may be checked whether cross-channel prediction (inter-channel prediction) has been used for the target block.
That is, in order to predict a decoding target channel of a target block, it may be checked whether a prediction method for obtaining a prediction value for decoding the target channel by applying a specific model to reconstruction information of another channel (e.g., a luminance channel) is used. For example, the reconstruction information may be a pixel value of a reconstructed pixel or a value of a transform coefficient. The specific model may be a linear model.
The decoding target channel may be a channel that is a target to be currently decoded among a plurality of channels. The encoding target channel may be a channel that is a target to be currently encoded among a plurality of channels. Hereinafter, the encoding target channel and/or the decoding target channel may also be simply referred to as "target channel".
For example, it may be checked whether the intra prediction for the target block uses an intra prediction mode that derives a prediction value for the target channel by using reconstruction information of another channel.
To derive a predicted value for a target channel using the reconstruction information of an additional channel, a cross-component linear model (CCLM) using a single linear model, a multi-model linear model (MMLM) using a plurality of linear models, or a multi-filter linear model (MFLM) using a plurality of filters may be used. In CCLM, the term "component" can be replaced by "channel".
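As a brief illustration, the single-model form commonly used for CCLM is written below; this standard formulation is an assumption here, since the embodiment does not spell out the equation. $\mathrm{rec}_L'$ denotes the (possibly downsampled) reconstructed luma samples, and $\alpha$ and $\beta$ are parameters derived from neighboring reconstructed samples:

$$\mathrm{pred}_C(i,j)=\alpha\cdot\mathrm{rec}_L'(i,j)+\beta$$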
The INTRA_CCLM mode may be an intra prediction mode using CCLM. The INTRA_MMLM mode may be an intra prediction mode using MMLM. The INTRA_MFLM mode may be an intra prediction mode using MFLM.
Alternatively, the determination that the image properties of the channels are similar to each other may be achieved by checking whether the intra prediction mode of the target block (e.g., intra_chroma_pred_mode information indicating the intra prediction mode for the chroma channel of the target block) is one of the INTRA_CCLM mode, the INTRA_MMLM mode, and the INTRA_MFLM mode.
Alternatively, the determination that the image attributes of the channels are similar to each other may be achieved by checking whether the encoding mode of the target channel of the target block uses the encoding mode of another channel (e.g., a luminance channel) without change. For example, the determination that the image attributes of the channels are similar to each other may be accomplished by checking whether the intra prediction mode (e.g., intra_chroma_pred_mode information) of the target block is the Direct Mode (DM). The direct mode may also be referred to as the "derived mode". The DM may be a mode indicating that the intra prediction mode of the luminance channel is used, without change, as the intra prediction mode of the chrominance channel, owing to the characteristic that the correlation between the luminance channel and the chrominance channel may be high.
The feature of DM, which is one of the intra prediction modes, and the detailed operation thereof may be defined in more detail with reference to the following tables 10 and 11.
Table 10 shows a method for setting the IntraPredModeC value for intra prediction of a chrominance signal (when the value of the sps_cclm_enabled_flag information is 0).
Table 11 shows a method for setting the IntraPredModeC value for intra prediction of a chrominance signal (when the value of the sps_cclm_enabled_flag information is 1).
[ Table 10]
[ Table 11]
Generally, in intra prediction, it may be determined whether an intra cross-component linear model (CCLM) mode, an intra multi-model LM (MMLM) mode, or an intra multi-filter LM (MFLM) mode is used, in which the pixel values of reconstructed pixels of a single channel (e.g., a luma channel or, more generally, a representative channel) are used to calculate the predicted values for another channel (e.g., a chroma channel or, more generally, a target channel).
The indication of the case of using the INTRA_CCLM mode, the INTRA_MMLM mode, or the INTRA_MFLM mode may be classified into two types, defined in detail according to the value of the sps_cclm_enabled_flag information, as shown in Tables 10 and 11.
The sps_cclm_enabled_flag information may be information indicating whether the INTRA_CCLM mode, the INTRA_MMLM mode, and the INTRA_MFLM mode are to be enabled. Alternatively, the sps_cclm_enabled_flag information may be information indicating whether the INTRA_CCLM mode, the INTRA_MMLM mode, and the INTRA_MFLM mode have been enabled.
The INTRA_CCLM mode, the INTRA_MMLM mode, and the INTRA_MFLM mode may not be used when the value of sps_cclm_enabled_flag is 0, and the INTRA_CCLM mode may be used when the value of sps_cclm_enabled_flag is 1. Alternatively, when the value of sps_cclm_enabled_flag is 1, at least one of the INTRA_MMLM mode and the INTRA_MFLM mode may be used.
Whether the intra prediction mode (e.g., the intra_chroma_pred_mode information) of the target channel is DM may be determined by checking whether the intra prediction mode of the chroma channel (i.e., the value of intra_chroma_pred_mode) is a specific value (e.g., 4 in Table 10 and 5 in Table 11). In the description of such an operation, Table 10 may be referred to when the value of sps_cclm_enabled_flag is 0, and Table 11 may be referred to when the value of sps_cclm_enabled_flag is 1.
When the value of sps_cclm_enabled_flag is 0 and the value of the intra prediction mode (e.g., intra_chroma_pred_mode) of the target channel is 4, DM may be considered to be applied. Alternatively, when the value of sps_cclm_enabled_flag is 1 and the value of the intra prediction mode (e.g., intra_chroma_pred_mode) of the target channel is 5, DM may be considered to be applied. For intra prediction of the target channel (e.g., a chrominance channel) of the target block indicated by DM, the value of IntraPredModeY, which indicates the intra prediction mode of the representative channel (e.g., the luminance channel), may be used, without change, as the value of IntraPredModeC.
Here, the intra prediction mode intra_chroma_pred_mode of the chrominance signal may be index information indicating which type of intra prediction is to be used for the chrominance signal.
Given such index information, the final value indicating the intra prediction mode actually used for intra prediction of the chrominance signal may be the value of IntraPredModeC. In other words, IntraPredModeC may indicate the intra prediction mode actually used for intra prediction of the chrominance signal.
When the value of sps_cclm_enabled_flag is 0 and DM is applied (i.e., the value of intra_chroma_pred_mode is 4), if the value of IntraPredModeY is 0, 50, 18, or 1, the value of IntraPredModeC may also be 0, 50, 18, or 1, respectively.
Here, a value of 0 may represent the planar mode (i.e., planar prediction or the planar direction), a value of 1 may represent the DC mode, a value of 18 may represent the horizontal mode, a value of 50 may represent the vertical mode, and a value of 66 may represent the diagonal mode.
When the value of IntraPredModeY is another value X, different from any one of the four values 0, 50, 18, and 1, the value of IntraPredModeC may also be X, equal to the value of IntraPredModeY.
In addition, as shown in the first four rows in Table 10, when the value of sps_cclm_enabled_flag is 0, if the value of IntraPredModeY is 0, 50, 18, or 1, the value of IntraPredModeC may be determined according to the value of IntraPredModeY.
For example, as described in the first row of Table 10, when the value of IntraPredModeY is 0, 50, 18, or 1, the value of IntraPredModeC may be 66, 0, 0, or 0, respectively. When the value of IntraPredModeY is a value other than 0, 50, 18, and 1, the value of IntraPredModeC may be 0.
Further, when the value of sps_cclm_enabled_flag is 1 and DM is applied (i.e., the value of intra_chroma_pred_mode is 5), if the value of IntraPredModeY is 0, 50, 18, or 1, the value of IntraPredModeC may also be 0, 50, 18, or 1, respectively.
Here, a value of 0 may represent the planar mode (i.e., planar prediction or the planar direction), a value of 1 may represent the DC mode, a value of 18 may represent the horizontal mode, a value of 50 may represent the vertical mode, and a value of 66 may represent the diagonal mode.
When the value of IntraPredModeY is another value X, different from any one of the four values 0, 50, 18, and 1, the value of IntraPredModeC may also be X, equal to the value of IntraPredModeY.
In addition, as shown in the first five rows in Table 11, when the value of sps_cclm_enabled_flag is 1, if the value of IntraPredModeY is 0, 50, 18, or 1, the value of IntraPredModeC may be determined according to the value of IntraPredModeY.
For example, as described in the first row of Table 11, when the value of IntraPredModeY is 0, 50, 18, or 1, the value of IntraPredModeC may be 66, 0, 0, or 0, respectively. When the value of IntraPredModeY is a value other than 0, 50, 18, and 1, the value of IntraPredModeC may be 0.
In another embodiment, the determination that the image attributes of the channels are similar to each other may be achieved by checking whether the target block uses a mode indicating that only a specific mode constrained by the encoding mode of another channel (e.g., a luminance channel) is to be used as the encoding mode of the target channel. For example, the determination that the image attributes of the channels are similar to each other may be made by checking whether the intra prediction mode of the target channel is the Direct Mode (DM).
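A minimal sketch of the similarity test described in this section follows. The DM signaling values (4 or 5, depending on sps_cclm_enabled_flag) come from Tables 10 and 11 as described above; the cross-channel mode constants are placeholders, since the embodiment does not fix their numeric values:

```python
INTRA_CCLM, INTRA_MMLM, INTRA_MFLM = "CCLM", "MMLM", "MFLM"  # placeholder values
CROSS_CHANNEL_MODES = {INTRA_CCLM, INTRA_MMLM, INTRA_MFLM}

def dm_mode_value(sps_cclm_enabled_flag):
    # Per the text: DM is signaled by intra_chroma_pred_mode == 4 when
    # sps_cclm_enabled_flag is 0, and == 5 when it is 1.
    return 5 if sps_cclm_enabled_flag else 4

def channels_similar(intra_chroma_pred_mode, sps_cclm_enabled_flag):
    """True when a cross-channel mode (CCLM/MMLM/MFLM) or DM is in use."""
    return (intra_chroma_pred_mode in CROSS_CHANNEL_MODES
            or intra_chroma_pred_mode == dm_mode_value(sps_cclm_enabled_flag))
```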
Cross-channel prediction using correlation between channels
Cross-channel prediction may be a technique of using pixel values of pixels in another channel instead of using intra prediction or inter prediction when predicting pixel values of pixels in a target channel.
The fact that the performance of cross-channel prediction is better than that of other types of prediction when the target block is encoded may indicate that there is considerable similarity between the pixel values of the pixels in the channels of the target block.
Therefore, in this case, when the value of the coding decision information of the representative channel is determined, it may be advantageous to use the determined value of the coding decision information of the representative channel for the coding decision information of the other channel as such or to use a specific value indicated by the value of the coding decision information of the representative channel for the coding decision information of the other channel.
For example, when the value of the transform_skip_flag information of the representative channel is 0 (indicating that the transform is not skipped), the probability that the determined value of the transform_skip_flag information will be '0' may be high even in the other channels.
Therefore, for a picture or a block for which cross-channel prediction is effective, it may be unnecessary to individually specify pieces of transform_skip_flag information for the plurality of channels. The reason for this is that the similarity between channels is high, and therefore the probability that the pieces of transform_skip_flag information of the respective channels will be identical to each other is likely to be high.
Despite such image attributes, when pieces of transform_skip_flag information are transmitted separately for the channels of an image, the compression rate and image quality may be deteriorated.
This principle may also be applied to additional coding decision information, i.e., mts_flag information, mts_idx information, nsst_flag information, nsst_idx information, intra smoothing filter information, pdpc_flag information, and rdpcm_flag information, for which the probability that the value of the coding decision information for the representative channel will be the same as the value of the coding decision information for the other channels may likewise be high.
Accordingly, based on the condition that the image properties of the channels are determined to be similar to each other, it may be determined whether cross-channel prediction using correlation between the channels has been determined as the encoding mode of the target block.
For example, the determination of whether cross-channel prediction has been determined as the encoding mode of the target block may aim to determine whether the image properties of the channels are similar to each other according to whether a cross-component linear model (CCLM) mode indicating cross-channel prediction is applied to the target block. To determine whether CCLM is applied to the target block, it may be checked whether the prediction mode of the target block is one of the INTRA_CCLM mode, the INTRA_MMLM mode, and the INTRA_MFLM mode.
For example, the determination of whether cross-channel prediction has been determined as the encoding mode of the target block may be intended to determine whether the intra mode of the representative channel (e.g., luma channel) is used for the other channels (e.g., chroma channels Cb and Cr) without being changed. Alternatively, the determination of whether cross-channel prediction has been determined as the encoding mode of the target block may be intended to determine whether a particular intra mode indicated by the intra mode of the representative channel is used for another channel. Alternatively, the determination of whether cross-channel prediction has been determined as the encoding mode of the target block may be intended to determine whether a particular intra mode derived from the intra mode of the representative channel is used for another channel.
For example, the determination of whether cross-channel prediction has been determined as the encoding mode of the target block may be intended to determine whether the encoding apparatus 1600 and the decoding apparatus 1700 use a specific encoding mode (e.g., an inter-channel sharing mode) in conformity with each other.
For example, the determination of whether cross-channel prediction has been determined as the encoding mode of the target block may be intended to determine whether the intra prediction mode of the chroma channel is DM.
Such DM may be a mode indicating that the intra prediction mode of the luma channel is used, without change, as the intra prediction mode of the chroma channel, owing to the characteristic that the correlation between the luma channel and the chroma channel may be high. Therefore, when the intra prediction mode of the chroma channel of the target block is DM, it may be determined that the condition under which the image attributes of the channels are determined to be similar to each other is satisfied.
In addition to the conditions described in the above examples, it may be determined whether cross-channel prediction has been determined as the encoding mode of the target block based on the size of the block.
For example, the larger the size of a block, the higher the probability that a pixel having a heterogeneous property will be present in the corresponding block. Thus, the similarity between channels of blocks having a larger size may be less than the similarity between channels of blocks having a smaller size. Furthermore, when the size of the block is too small, the similarity between channels of the block may be unstable.
For example, sharing of information between channels may be performed only for blocks having a size less than or equal to a particular size. The specific size may be 64 × 64, 32 × 32, or 16 × 16. When the block size is less than or equal to the specific size, sharing of information between channels is performed, and thus the condition that the image attributes of the channels are determined to be similar to each other can be more reliably satisfied.
Alternatively, sharing of information between channels may be performed only for blocks having a size larger than a certain size. The specific size may be 4 × 4. When the size of the block is larger than the certain size, sharing of information between channels is performed, and thus the condition that the image attributes of the channels are determined to be similar to each other can be satisfied more reliably.
Alternatively, the sharing of information between channels may be performed only when the size of the block is greater than a first specific size (e.g., 4 × 4) and less than or equal to a second specific size (e.g., 32 × 32 or 64 × 64). Sharing of information between channels may be performed only when the size of the block falls within a certain range, and thus the condition that the image attributes of the channels are determined to be similar to each other may be more reliably satisfied.
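A minimal sketch of this size constraint, using the example thresholds given above (greater than 4 x 4, less than or equal to 32 x 32):

```python
MIN_SAMPLES = 4 * 4      # first specific size (exclusive lower bound)
MAX_SAMPLES = 32 * 32    # second specific size (inclusive upper bound)

def sharing_allowed(width, height):
    """Share information between channels only within the configured size range."""
    samples = width * height
    return MIN_SAMPLES < samples <= MAX_SAMPLES
```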
In an embodiment, a method and apparatus for encoding a target block through sharing of information between channels will be described below, and the following functions may be provided.
- The coding decision information of one channel of the target block can be parsed from the compressed bitstream, and the coding decision information of the one channel can be used to perform decoding for all channels or some selected channels of the target block.
- The bitstream may be configured such that the coding decision information is sent only for a representative channel or some selected channels of the target block.
- It may be determined whether a transform is to be skipped for one channel, and the determination of whether the transform is to be skipped may be applied to further channels.
- For one channel of the transform block, transform_skip_flag information may be parsed from the compressed bitstream. Whether a transform is to be skipped may be determined for a channel or channels of the transform block by utilizing the parsed transform_skip_flag information.
- For one channel of the transform block, transform_skip_flag information may be signaled. The transform_skip_flag information for the one channel can be used even for the other channels, as illustrated in the sketch below.
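The following is a minimal decoder-side sketch of the sharing behavior listed above; the bitstream reader is a toy stand-in (its API is hypothetical), not a real parser:

```python
class BitReader:
    """Toy stand-in for a real bitstream parser (hypothetical API)."""
    def __init__(self, bits):
        self.bits = list(bits)
    def read_flag(self, name):
        return self.bits.pop(0)

def decode_transform_skip_flags(reader, channels, representative, share):
    """Parse transform_skip_flag once for the representative channel and
    reuse it for the remaining channels when the similarity condition holds."""
    rep_flag = reader.read_flag("transform_skip_flag")  # parsed once
    flags = {representative: rep_flag}
    for ch in channels:
        if ch == representative:
            continue
        flags[ch] = rep_flag if share else reader.read_flag("transform_skip_flag")
    return flags

# Example: with sharing enabled, one flag covers Y, Cb, and Cr.
flags = decode_transform_skip_flags(BitReader([1]), ["Y", "Cb", "Cr"], "Y", True)
```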
Coding decision information can be efficiently signaled by sharing of information between channels. By such effective signaling, coding efficiency and subjective image quality can be improved.
In particular, when the spatial change of pixel values in a block is very large or very steep, the degree to which the image energy is concentrated at low frequencies may not be large even if the transform is applied to the target block. Further, when low-frequency signal components are largely retained and high-frequency signal components are eliminated by applying transform and quantization processing to such a block, or when strong quantization is applied to such a block, serious deterioration in image quality may occur. In an embodiment, whether the transform for a block is to be skipped may be indicated economically, based on the determination of the encoding apparatus 1600, without causing large overhead. By such economical indication, the compression rate of the image can be improved, and the deterioration of the image quality can be minimized.
When a cross-channel prediction technique using the correlation between channels is used, individual pieces of coding decision information need not be used for each of the plurality of channels. In an embodiment, the coding decision information may be transmitted for one channel, and the coding decision information transmitted for the one channel may be shared with and used by all of the remaining channels or some channels selected from among them. By such sharing, the problem of deterioration of the compression rate and the image quality can be solved.
Determining representative channels from a color space
As a color space for image encoding and decoding, there are YCbCr and YUV spaces for encoding and decoding of general images, and further, there are RGB, XYZ, and YCoCg spaces. When one of the various color spaces is a target color space for encoding and decoding of an image, one of the channels of the target color space may be determined as a representative channel of the target color space.
In an embodiment, a color channel having the highest correlation with a luminance signal among the channels may be determined as the representative channel. For example, in the RGB color space, the G channel may have the highest correlation with the luminance signal, and thus the G channel may be selected as the representative channel. In the XYZ color space, the Y channel may have the highest correlation with the luminance signal, and thus the Y channel may be selected as the representative channel. In the YCoCg color space, the Y channel may have the highest correlation with the luminance signal, and thus the Y channel may be selected as the representative channel.
A channel in color space may be represented by an index value such as "0/1/2". The SelectedCIDX may be an index of the selected color. Alternatively, the SelectedCIDX may be an index value indicating the selected representative channel. The representative channel may be determined by an index SelectedCIDX indicating the selected representative channel among pieces of information on the target block in the bitstream.
For example, in the YCbCr color space, the value of SelectedCIDX may be 0 indicating the Y channel.
For example, in the YCbCr color space, the Cb channel may be determined as a representative channel. When the Cb channel is determined to be a representative channel, the value of SelectedCIDX may be 1 indicating the Cb channel.
For example, in the YUV color space, the U channel may be determined as a representative channel. When the U channel is determined to be a representative channel, the value of SelectedCIDX may be 1 indicating the U channel.
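A minimal sketch of representative-channel selection per color space, following the examples above (the channel most correlated with the luminance signal is chosen); because the cIDX numbering depends on the convention in use, the mapping from channel name to index is passed in:

```python
REPRESENTATIVE_CHANNEL = {
    "YCbCr": "Y",   # or "Cb", as in the alternative example above
    "YUV":   "Y",   # or "U"
    "RGB":   "G",   # highest correlation with the luminance signal
    "XYZ":   "Y",
    "YCoCg": "Y",
}

def selected_cidx(color_space, cidx_of_channel):
    """Return SelectedCIDX for the given color space and cIDX convention."""
    return cidx_of_channel[REPRESENTATIVE_CHANNEL[color_space]]

# Example: selected_cidx("YCbCr", {"Y": 0, "Cb": 1, "Cr": 2}) -> 0
```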
For sharing of coding decision information between channels, a specific channel in the color space may be selected as a representative channel. In encoding and decoding of an image, encoding decision information of a representative channel may be shared among one or more remaining channels.
For example, the encoding apparatus 1600 may signal only the encoding decision information of the representative channel to the decoding apparatus 1700 through the bitstream. Alternatively, the decoding apparatus 1700 may derive the coding decision information of the representative channel using the bitstream. The coding decision information for at least some of the remaining channels may not be signaled separately. The decoding apparatus 1700 may derive the coding decision information of at least some of the remaining channels using the coding decision information of the representative channel. In other words, the coding decision information of a representative channel may be shared with at least some of the remaining channels.
For example, when the Y channel having the highest correlation with a luminance signal is selected as a representative channel in the YCbCr color space, there may be a correlation between the luminance channel (i.e., Y) and the chrominance channels (i.e., Cb and/or Cr). Accordingly, when prediction is performed for image compression, coding decision information for a luminance channel as a representative channel may be implicitly shared as pieces of coding decision information for one or more chrominance blocks, instead of applying independent predictions to three channels in a color space, respectively. The one or more chroma blocks may include one or more of Cb blocks and Cr blocks.
For example, when a Cb channel is selected as a representative channel in the YCbCr color space, there may be a correlation between a Cb signal and a Cr signal constituting a chrominance channel. Therefore, when prediction is performed for image compression, coding decision information for a Cb channel, which is a representative channel, may be implicitly shared as coding decision information for a Cr block, instead of applying independent predictions to two chroma channels, respectively.
Coding decision information shared between channels
The coding decision information that can be shared between the channels may be information such as syntax elements, which is encoded by the encoding apparatus 1600 and signaled to the decoding apparatus 1700 as information included in the bitstream. For example, the coding decision information may include a flag, an index, and the like. Further, the encoding decision information may include information derived during the encoding and/or decoding process. Further, the encoding decision information may represent information required to encode and/or decode the image.
For example, the encoding decision information may include at least one of the following, or a combination thereof: a size of the unit/block, a depth of the unit/block, partition information of the unit/block, a partition structure of the unit/block, partition flag information indicating whether the unit/block is partitioned in the form of a quad-tree, partition flag information indicating whether the unit/block is partitioned in the form of a binary tree, a partition direction (horizontal or vertical) in the form of a binary tree, a partition form (symmetric or asymmetric partition) in the form of a binary tree, partition flag information indicating whether the unit/block is partitioned in the form of a ternary tree, a partition direction (horizontal or vertical) in the form of a ternary tree, a partition form (symmetric or asymmetric partition) in the form of a ternary tree, information indicating whether the unit/block is partitioned in the form of a composite tree, a combination and direction (horizontal or vertical) of partitions in the form of a composite tree, a prediction scheme (intra prediction or inter prediction), an intra prediction mode/direction, a reference sample filtering method, a prediction block boundary filtering method, a filter tap for filtering, a filter coefficient for filtering, an inter prediction mode, motion information, a motion vector, a reference picture index, an inter prediction direction, an inter prediction indicator, a reference picture list, a reference picture, a motion vector predictor, a motion vector prediction candidate, a motion vector candidate list, information indicating whether a merge mode is used, a merge candidate list, information indicating whether a skip mode is used, a type of an interpolation filter, a filter tap of an interpolation filter, a filter coefficient of an interpolation filter, a size of a motion vector, a precision of representation of a motion vector, a transform type, a transform size, information indicating whether a primary transform is used, information indicating whether an additional (secondary) transform is used, primary transform selection information (or a primary transform index), secondary transform selection information (or a secondary transform index), information indicating the presence or absence of a residual signal, a coding block pattern, a coding block flag, a quantization parameter, a quantization matrix, information about an in-loop filter, information about whether an in-loop filter is applied, a coefficient of an in-loop filter, a filter tap of an in-loop filter, a shape/form of an in-loop filter, information indicating whether a deblocking filter is applied, a coefficient of a deblocking filter, a tap of a deblocking filter, a strength of a deblocking filter, a shape/form of a deblocking filter, information indicating whether an adaptive sample offset is applied, a value of an adaptive sample offset, a category of an adaptive sample offset, a type of an adaptive sample offset, information indicating whether an adaptive loop filter is applied, coefficients of an adaptive loop filter, taps of an adaptive loop filter, a shape/form of an adaptive loop filter, a binarization/debinarization method, a context model determination method, a context model update method, information indicating whether a normal mode is performed, information indicating whether a bypass mode is performed, a context binary bit, a bypass binary bit, a transform coefficient level scanning method, an image display/output sequence, slice identification information, a slice type, slice partition information, parallel block identification information, parallel block type information, parallel block partition information, a picture type, a bit depth, information on a luminance signal, information on a chrominance signal, transform_skip_flag information, primary transform selection information, secondary transform selection information, reference sample filtering information, PDPC_flag information, rdpcm_flag information, EMT flag information, mts_idx information, nsst_flag information, and nsst_idx information.
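For orientation, a small C sketch of a structure carrying a few representative fields of such coding decision information is shown below; the structure name and the selection of fields are illustrative assumptions, not an exhaustive or normative layout:

#include <stdint.h>

/* Illustrative subset of the coding decision information carried per
   block; the structure name and the chosen fields are assumptions. */
typedef struct CodingDecisionInfo {
    uint8_t transform_skip_flag; /* whether the transform is skipped     */
    uint8_t mts_flag;            /* whether MTS is used                  */
    uint8_t mts_idx;             /* primary transform selection index    */
    uint8_t nsst_idx;            /* secondary transform selection index  */
    uint8_t intra_pred_mode;     /* intra prediction mode/direction      */
    uint8_t cbf;                 /* coding block flag (residual present) */
    int8_t  qp;                  /* quantization parameter               */
} CodingDecisionInfo;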
Among the pieces of encoding decision information that may be shared between channels, the primary transform selection information may be the transform information required to perform a transform process on a residual signal using a combination of one or more DCT and/or DST transform kernels in the horizontal and/or vertical directions. For example, the primary transform selection information may be the information required in order to use MTS in the primary transform. The primary transform selection information may include mts_flag information and mts_idx information.
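For orientation, an MTS-style mapping from mts_idx to a (horizontal, vertical) pair of transform kernels may be sketched in C as follows; the exact table is an assumption modeled on MTS designs of this generation and is not defined by this document:

/* Illustrative MTS-style kernel selection: when mts_flag is 0, DCT-II
   may be used in both directions; otherwise mts_idx selects an assumed
   (horizontal, vertical) pair of DST-VII/DCT-VIII kernels. */
enum TrKernel { TR_DCT2, TR_DST7, TR_DCT8 };

static const enum TrKernel kMtsPair[4][2] = {
    /* mts_idx 0 */ { TR_DST7, TR_DST7 },
    /* mts_idx 1 */ { TR_DCT8, TR_DST7 },
    /* mts_idx 2 */ { TR_DST7, TR_DCT8 },
    /* mts_idx 3 */ { TR_DCT8, TR_DCT8 },
};

static void select_primary_transform(int mts_flag, int mts_idx,
                                     enum TrKernel *hor, enum TrKernel *ver)
{
    if (!mts_flag) { *hor = TR_DCT2; *ver = TR_DCT2; return; }
    *hor = kMtsPair[mts_idx][0];
    *ver = kMtsPair[mts_idx][1];
}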
The primary transform selection information applied to the target block may be explicitly signaled or, alternatively, may be implicitly derived by the encoding apparatus 1600 and the decoding apparatus 1700 using the coding decision information of the target block and the coding decision information of the neighboring blocks.
After the primary transform is completed in the encoding apparatus 1600, a secondary transform may be performed in order to improve the energy concentration of transform coefficients.
The secondary transform selection information applied to the target block may be explicitly signaled or, alternatively, may be implicitly derived by the encoding apparatus 1600 and the decoding apparatus 1700 using the coding decision information of the target block and the coding decision information of neighboring blocks. The decoding apparatus 1700 may perform the secondary inverse transform depending on whether the secondary inverse transform is to be performed, and may then perform the primary inverse transform on the result of the secondary inverse transform depending on whether the primary inverse transform is to be performed.
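The decoder-side ordering described above (the secondary inverse transform precedes the primary inverse transform) may be sketched in C as follows; the helper function names are assumptions for illustration:

#include <stdint.h>

/* The helper names below are assumptions for illustration; only the
   ordering matters: the secondary inverse transform is applied first,
   followed by the primary inverse transform. */
void inverse_secondary_transform(int16_t *coeffs, int w, int h);
void inverse_primary_transform(int16_t *coeffs, int w, int h);

void reconstruct_residual(int16_t *coeffs, int w, int h,
                          int secondary_used, int primary_used)
{
    if (secondary_used)
        inverse_secondary_transform(coeffs, w, h); /* undo the secondary stage */
    if (primary_used)
        inverse_primary_transform(coeffs, w, h);   /* undo the primary stage   */
}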
The encoding apparatus 1600 may generate rdpcm_flag information for a target block and may record the rdpcm_flag information in a bitstream. The decoding apparatus 1700 may acquire the rdpcm_flag information from the bitstream and may or may not perform RDPCM according to the information indicated by the rdpcm_flag information.
Fig. 18 is a flowchart of a method for decoding coding decision information according to an embodiment.
According to an embodiment, when a specific channel in the color space is selected as the representative channel in order to share information among the plurality of channels, the encoding decision information of the representative channel of the target block may be shared by one or more remaining channels of the plurality of channels other than the representative channel.
For example, when performing intra prediction for a target block in a YCbCr color space, a Y channel may be set as a representative channel, and thereafter, intra coding decision information of the representative channel may be shared and used to perform decoding of channels other than the representative channel (i.e., a Cb channel and/or a Cr channel), instead of independently transmitting pieces of intra coding decision information for three channels in the color space, respectively. Alternatively, after the Cb channel is assumed to be a representative channel in the YCbCr color space, the intra coding decision information of the representative channel may be implicitly used for decoding of the Cr channel, which is a channel other than the representative channel.
For example, the intra coding decision information of the representative channel, which may be shared by the remaining channels, may include one or more of an intra prediction mode, an intra prediction direction, a prediction block boundary filtering method, a filter tap for prediction block boundary filtering, a filter coefficient for prediction block boundary filtering, transform_skip_flag information, primary transform selection information, secondary transform selection information, mts_flag information, mts_idx information, PDPC_flag information, rdpcm_flag information, EMT flag information, nsst_flag information, nsst_idx information, intra smoothing filter information, CU skip flag information, cu_lic_flag information, obmc_flag information, and codeAlfCtuEnable_flag information.
The case where cross-channel prediction is selected from among various techniques for predicting a chrominance signal (including angle prediction, DC prediction, plane prediction, etc.) or the case where cross-channel prediction is more advantageous may mean that the properties of a luminance signal (i.e., a Y signal) are very similar to those of a chrominance signal (i.e., a Cb signal and/or a Cr signal).
In this case, when the channel of the Y signal block is a representative channel, the coding decision information determined in the process of encoding and/or decoding the representative channel may be equally applied to the chrominance block. By this application, the number of bits required to transmit the coding decision information can be reduced. Thus, encoding and decoding may be performed such that a single piece of coding decision information is used for multiple channels via cross-channel prediction.
For example, after the coding decision information for the luminance channel has been determined, the determined coding decision information may be shared among the remaining channels, and encoding may be performed based on the shared coding decision information. Alternatively, pieces of encoding decision information may be applied independently to the three channels instead of being shared among them, and the three channels may be independently encoded and/or decoded. Between the method that shares encoding decision information between channels and the method that encodes the channels independently, the encoding apparatus 1600 may select, as the encoding method, whichever is more advantageous from the viewpoint of rate-distortion. According to this determination, the encoding apparatus 1600 may explicitly write information indicating whether the encoding decision information is shared between channels into the bitstream, and this information may be transmitted to the decoding apparatus 1700 through the bitstream.
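The encoder-side choice described above may be sketched as a plain rate-distortion comparison in C; the cost model J = D + lambda * R and the assumption that trial encodes supply the distortion and rate values are illustrative:

/* Rate-distortion cost J = D + lambda * R; the smaller cost wins.
   The four measurements are assumed to come from trial encodes. */
static double rd_cost(double distortion, double rate_bits, double lambda)
{
    return distortion + lambda * rate_bits;
}

/* Returns 1 when sharing the representative channel's coding decision
   information is the more advantageous encoding method, and 0 when
   independent coding of the channels is more advantageous. */
int choose_sharing(double d_shared, double r_shared,
                   double d_indep, double r_indep, double lambda)
{
    return rd_cost(d_shared, r_shared, lambda)
         < rd_cost(d_indep, r_indep, lambda);
}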
For example, when a specific coding condition is satisfied in the encoding and/or decoding process, information regarding whether coding decision information is shared between channels may not be explicitly signaled, and coding decision information of a representative channel may be shared by the remaining channels.
For example, the specific coding condition may be a condition indicating whether cross-component prediction (CCP), DM, or cross-component linear model (CCLM) is used.
For example, when the remaining chrominance signals (i.e., the Cb signal and/or the Cr signal) are predicted using at least one of the original signal, the reconstructed signal, the residual signal, and the prediction signal of the luminance signal, which has already been encoded or decoded, the sharing of the encoding decision information between channels may be applied.
For example, when the luminance signal is predicted using at least one of the original signal, the reconstructed signal, the residual signal, and the prediction signal of the chrominance signal, which has already been encoded or decoded, the sharing of the encoding decision information between channels may be applied.
For example, when the Cr signal is predicted using at least one of the original signal, the reconstructed signal, the residual signal, and the prediction signal of the Cb signal, which has already been encoded or decoded, the sharing of the encoding decision information between channels may be applied.
For example, when the Cb signal is predicted using at least one of the original signal, the reconstructed signal, the residual signal, and the prediction signal of the Cr signal, which has already been encoded or decoded, the sharing of the encoding decision information between channels may be applied.
When the luminance channel is set as the representative channel in the YCbCr color space, the coding decision information may be signaled only for the luminance signal and may not be separately signaled for the remaining chrominance channels. Such selective transmission can improve the compression rate.
Furthermore, the encoding decision information may be transmitted only for the Cb signal, and the transmitted encoding decision information may be shared for the Cr signal, and thus the compression rate may be improved. Alternatively, the encoding decision information may be transmitted only for the Cr signal, and the transmitted encoding decision information may be shared for the Cb signal, and thus, the compression rate may be improved.
In step 1810, communication unit 1720 may receive a bitstream. The bitstream may include coding decision information.
In step 1820, the processing unit 1710 may determine whether the sharing of coding decision information is to be used for the target channel of the target block.
When the coding decision information is not shared, step 1830 may be performed.
When the coding decision information is to be shared, step 1840 may be performed.
In step 1830, the processing unit 1710 may obtain coding decision information of the target channel from the bitstream. The processing unit 1710 may parse and read the coding decision information of the target channel from the bitstream.
In step 1840, the processing unit 1710 may set the coding decision information such that the coding decision information of the representative channel is used as the coding decision information of the target channel.
Step 1820, step 1830, and step 1840 may be represented by the following code 1:
[ code 1]
if((cIdx != 0) && (cross-channel prediction is used))
    coding decision information[x0][y0][cIdx] = coding decision information[x0][y0][0]
else
    coding decision information[x0][y0][cIdx]
The cIdx may indicate the target channel of the target block. For example, when the number of channels in the target image is 3, cIdx may be one of predefined specific values, e.g., one of {0, 1, 2}.
In an embodiment, the cIdx of a representative channel may be assumed to be 0.
"cidx! 0 "may indicate that the target channel is not a representative channel (e.g., a luminance channel). In other words, a cIdx value of "0" may indicate that the target channel is a representative channel.
In other words, in step 1820, when the target channel is not a representative channel and cross-channel prediction is used for the target block, the processing unit 1710 may determine to use the sharing of coding decision information for the target channel. When the target channel is a representative channel or when cross-channel prediction is not used for the target block, the processing unit 1710 may determine not to use sharing of coding decision information for the target channel.
Whether cross-channel prediction is used for the target block may be 1) derived based on information acquired from a bitstream and 2) implicitly derived according to whether a specific condition is satisfied.
As described above, whether cross-channel prediction is used may be determined based on the intra prediction mode of the target block. Whether cross-channel prediction is used may be determined based on whether the INTRA prediction mode of the target block is one of an INTRA _ CCLM mode, an INTRA _ MMLM mode, and an INTRA _ MFLM mode. For example, when the INTRA prediction mode of the target block is one of the INTRA _ CCLM mode, the INTRA _ MMLM mode, and the INTRA _ MFLM mode, the processing unit 1710 may determine that cross-channel prediction is used.
As described above, whether cross-channel prediction is used may be determined based on whether the intra prediction mode of the chroma channel of the target block has a specific value. For example, when the intra prediction mode of the chroma channel of the target block has a particular value, the processing unit 1710 may determine that cross-channel prediction is used.
The intra prediction mode of the chroma channel of the target block may be indicated by intra _ chroma _ pred _ mode information.
As described above, whether cross-channel prediction is used may be determined based on whether the intra prediction mode of the target channel is DM. For example, when the intra prediction mode of the target channel is DM, the processing unit 1710 may determine that cross-channel prediction is used.
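The checks described above may be gathered into a single predicate, as in the following C sketch; the mode constants are placeholders for the actual values of intra_chroma_pred_mode and are assumptions for illustration:

/* Placeholder mode values; the actual values of intra_chroma_pred_mode
   are given in tables 10 and 11 referenced in the text. */
enum ChromaIntraMode {
    MODE_DM_ASSUMED         = 0,
    MODE_INTRA_CCLM_ASSUMED = 1,
    MODE_INTRA_MMLM_ASSUMED = 2,
    MODE_INTRA_MFLM_ASSUMED = 3
};

/* Returns 1 when cross-channel prediction is in use for the target block. */
static int uses_cross_channel_prediction(int intra_chroma_pred_mode)
{
    return intra_chroma_pred_mode == MODE_DM_ASSUMED
        || intra_chroma_pred_mode == MODE_INTRA_CCLM_ASSUMED
        || intra_chroma_pred_mode == MODE_INTRA_MMLM_ASSUMED
        || intra_chroma_pred_mode == MODE_INTRA_MFLM_ASSUMED;
}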
In step 1830, the processing unit 1710 may obtain coding decision information for the block of the channel indicated by the cIdx from the bitstream.
The processing unit 1710 may parse and read the coding decision information for the block of the channel indicated by the cIdx from the bitstream.
In step 1840, the processing unit 1710 may share the coding decision information of the representative channel as the coding decision information of the block of the channel indicated by the cIdx. In other words, the processing unit 1710 may set the coding decision information of the representative channel to the coding decision information of the block of the channel indicated by the cIdx.
According to an embodiment, operations corresponding to the conditions or execution may be additionally performed before step 1820 or between steps 1820 and 1840.
According to an embodiment, coding decision information may be transmitted for a Cb signal, and coding decision information of the Cb signal may be shared for a Cr signal without transmitting the coding decision information. In this case, the above code 1 may be modified to the following code 2:
[ code 2]
if((cIdx == 2) && (cross-channel prediction is used))
    coding decision information[x0][y0][2] = coding decision information[x0][y0][1]
else
    coding decision information[x0][y0][cIdx]
Fig. 19 is a flowchart of a decoding method for determining whether a transform is to be skipped, according to an embodiment.
The encoding apparatus 1600 may determine whether a transform (e.g., a primary transform and/or a secondary transform) is to be skipped according to the size of the target block.
The target block may be a transform block.
log2TrafoSize may represent the base-2 logarithm of the size of the target block.
For example, when the size of the target block is less than or equal to a threshold value indicating a boundary value of the block size, the encoding apparatus 1600 may skip the transform of the target block.
Log2MaxTransformSkipSize may represent the threshold indicating the boundary value of the block size.
When the transform of the target block is skipped, the encoding apparatus 1600 may set the value of the transform_skip_flag information to 1 without performing the transform. The transform_skip_flag information may be transmitted to the decoding apparatus 1700 through a bitstream.
Further, when the transform of the target block is performed, the encoding apparatus 1600 may perform the transform and may set the value of the transform_skip_flag information to 0. The transform_skip_flag information may be transmitted to the decoding apparatus 1700 through the bitstream.
Here, pieces of transform_skip_flag information may be transmitted separately for the channels constituting the color space of the image.
The decoding apparatus 1700 may acquire the value of the transform_skip_flag information from the bitstream. In other words, the decoding apparatus 1700 may parse and read the transform_skip_flag information from the bitstream.
Here, the decoding apparatus 1700 may acquire the value of the transform_skip_flag information from the bitstream only when the size of the block is less than or equal to the threshold indicating the boundary value of the block size.
In addition, the decoding apparatus 1700 may acquire pieces of transform_skip_flag information for the multiple channels of the image.
The acquisition of the transform_skip_flag information may be represented by the following code 3:
[ code 3]
If(log2TrafoSize<=Log2MaxTransformSkipSize)
transform_skip_flag[x0][y0][cIdx]
x0 and y0 may represent spatial coordinates indicating the position of the target block.
The cIdx may indicate the target channel of the target block.
When there are three image channels, cIdx may have one of the predefined values {0, 1, 2}. The value of the representative channel may be 0.
Code 3 may be modified to the following code 4:
[ code 4]
If((log2TbWidth<=Log2MaxTransformSkipSize_W)&&(log2TbHeight<=Log2MaxTransformSkipSize_H))
transform_skip_flag[x0][y0][cIdx]
log2TbWidth may be a value based on equation 11 below. The "width" may be the width of the target block (i.e., the horizontal length of the target block).
[ equation 11]
log2TbWidth = Log2(width)
log2TbHeight may have a value based on equation 12 below. "height" may be the height of the target block (i.e., the vertical length of the target block).
[ equation 12]
log2TbHeight = Log2(height)
The predefined thresholds Log2MaxTransformSkipSize_W and Log2MaxTransformSkipSize_H may be equal to each other or may be different from each other. For example, the value of Log2MaxTransformSkipSize_W may be 2, and the value of Log2MaxTransformSkipSize_H may be 2.
Code 3 may be modified to the following code 5:
[ code 5]
If((log2TbWidth<=2)&&(log2TbHeight<=2))
transform_skip_flag[x0][y0][cIdx]
As described above, instead of signaling pieces of transform_skip_flag information separately for the multiple channels, the transform_skip_flag information may be signaled only for the luminance (Y) signal and may not be signaled separately for the remaining chrominance channels. Alternatively, the transform_skip_flag information may be signaled only for the Cb signal, may not be signaled separately for the Cr signal, and may instead be shared for the Cr signal.
Next, an embodiment of sharing the transform_skip_flag information will be described.
In an embodiment, the decoding apparatus 1700 may acquire the transform_skip_flag information from a bitstream. This acquisition may be represented by the following code 6:
[ code 6]
if(log2TrafoSize <= Log2MaxTransformSkipSize) {
    if((cIdx != 0) && (cross-channel prediction is used))
        transform_skip_flag[x0][y0][cIdx] = transform_skip_flag[x0][y0][0]
    else
        transform_skip_flag[x0][y0][cIdx]
} else
    transform_skip_flag[x0][y0][cIdx] = 0
x0 and y0 may be spatial coordinates indicating the location of the target block.
The cIdx may indicate a target channel of the target block.
In code 6 and other codes, the condition "if(log2TrafoSize <= Log2MaxTransformSkipSize)" may be replaced with the condition "if((log2TbWidth <= Log2MaxTransformSkipSize_W) && (log2TbHeight <= Log2MaxTransformSkipSize_H))" or with the condition "if((log2TbWidth <= 2) && (log2TbHeight <= 2))".
When the number of channels of an image is 3, cIdx may have one of the predefined values {0, 1, 2}. The value of the representative channel may be 0. Alternatively, the value of the representative channel may be 1 or 2.
In step 1910, the communication unit 1720 may receive a bitstream.
At step 1920, processing unit 1710 may determine whether the transform for the target block may be skipped.
If it is determined that the transform may be skipped, step 1930 may be performed.
If it is determined that it is not possible to skip the transform, step 1960 may be performed.
For example, when the size of the target block is less than or equal to a specific size, processing unit 1710 may determine that the transform may be skipped.
For example, when the size of the target block is greater than a certain size, processing unit 1710 may determine that it is not possible to skip the transform.
Here, the specific size may be a boundary value of a block size that allows the transform to be skipped.
For example, when the following condition in code 7 is satisfied (i.e., when the result of the condition in code 7 is true), the processing unit 1710 may determine that the transform may be skipped, and when the condition in code 7 is not satisfied (i.e., when the result of the condition in code 7 is false), the processing unit 1710 may determine that it is not possible to skip the transform.
[ code 7]
if(log2TrafoSize<=Log2MaxTransformSkipSize)
In step 1930, the processing unit 1710 may determine whether sharing of the transform_skip_flag information is to be used for the target channel of the target block.
If it is determined that the transform_skip_flag information is not to be shared, step 1940 may be performed.
If it is determined that the transform_skip_flag information is to be shared, step 1950 may be performed.
For example, when the following condition in code 8 is satisfied (i.e., when the result of the condition in code 8 is true), the processing unit 1710 may determine that the transform_skip_flag information is to be shared, and when the condition in code 8 is not satisfied (i.e., when the result of the condition in code 8 is false), the processing unit 1710 may determine that the transform_skip_flag information is not to be shared.
[ code 8]
if((cIdx != 0) && (cross-channel prediction is used))
In other words, in step 1930, when the target channel is not the representative channel and cross-channel prediction is used for the target block, the processing unit 1710 may determine that the transform_skip_flag information is to be shared with the target channel. In contrast, when the target channel is the representative channel or when cross-channel prediction is not used for the target block, the processing unit 1710 may determine that the transform_skip_flag information is not to be shared.
In step 1940, the processing unit 1710 may acquire the transform_skip_flag information of the target channel from the bitstream. The processing unit 1710 may parse and read the transform_skip_flag information of the target channel from the bitstream.
The transform_skip_flag information may be stored in transform_skip_flag[x0][y0][cIdx].
In step 1950, the processing unit 1710 may set the transform_skip_flag information such that the transform_skip_flag information of the representative channel is used as the transform_skip_flag information of the target channel.
The processing unit 1710 may use the transform_skip_flag information of the representative channel as the transform_skip_flag information of the target channel without parsing and reading the transform_skip_flag information of the target channel from the bitstream. In other words, the processing unit 1710 may store the value of transform_skip_flag[x0][y0][0] in transform_skip_flag[x0][y0][cIdx].
That is, the value previously stored in transform_skip_flag[x0][y0][0] may be used as-is for transform_skip_flag[x0][y0][cIdx], without a process for parsing and reading the transform_skip_flag information of the target channel from the bitstream.
For example, the transform_skip_flag information signaled for the luminance (Y) channel, as the representative channel, may be used even for the chrominance channels (Cb and/or Cr).
According to an embodiment, the transform_skip_flag information may be transmitted for the Cb signal, and the transform_skip_flag information of the Cb signal may be shared for the Cr signal without transmitting separate transform_skip_flag information for the Cr signal. In this case, the above code 6 may be modified to the following code 9:
[ code 9]
if(log2TrafoSize <= Log2MaxTransformSkipSize) {
    if((cIdx == 2) && (cross-channel prediction is used))
        transform_skip_flag[x0][y0][2] = transform_skip_flag[x0][y0][1]
    else
        transform_skip_flag[x0][y0][cIdx]
} else
    transform_skip_flag[x0][y0][cIdx] = 0
When it is not possible to skip the transform for the target block, step 1960 may be performed.
At step 1960, information indicating that the transform will not be skipped for the target block may be set. The value of transform_skip_flag[x0][y0][cIdx] may be set to 0 because skipping the transform is not allowed for the target block.
Fig. 20 is a flowchart of a decoding method for determining whether a transform is to be skipped depending on an intra mode, according to an embodiment.
There may be a significant correlation between the luminance channel (i.e., Y) and the chrominance channels (i.e., Cb and/or Cr) of the image. For example, a luminance channel may include a large amount of information on a texture of an image, and a Cb channel and a Cr channel, which are chrominance channels, may additionally provide color information to be added to the texture.
Therefore, when performing prediction required for compression and reconstruction of an image, prediction values for a Cb block and a Cr block for which prediction is performed from a signal of a luminance channel previously acquired through decoding may be calculated without performing independent prediction for three channels of a color space, respectively.
The technique for computing these predictions may be referred to as "cross-channel prediction (CCP)" or "CCLM" described above.
The decoding apparatus 1700 may determine whether cross-channel prediction has been used by checking whether the INTRA prediction mode of the target block is one of an INTRA _ CCLM mode, an INTRA _ MMLM mode, and an INTRA _ MFLM mode.
Such cross-channel prediction may be effective since a considerable portion of the texture information of the chrominance signal is also included in the luminance signal. Similarly, a predicted value for a Cr block that is a target of prediction may be calculated from a signal of a Cb channel using cross-channel prediction.
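As a sketch of what such cross-channel prediction computes, a CCLM-style linear model predicts each chroma sample from the co-located (down-sampled) reconstructed luma sample; the fixed-point form below is an illustrative assumption:

#include <stdint.h>

/* CCLM-style prediction sketch: pred_C = ((alpha * rec_L) >> shift) + beta.
   The parameters alpha, beta, and shift would be derived from
   neighbouring reconstructed samples; here they are taken as given. */
void cclm_predict(const int16_t *rec_luma, int16_t *pred_chroma,
                  int num_samples, int alpha, int beta, int shift)
{
    for (int i = 0; i < num_samples; i++)
        pred_chroma[i] = (int16_t)(((alpha * rec_luma[i]) >> shift) + beta);
}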
The case where cross-channel prediction is selected from various techniques including angle prediction, DC prediction, and plane prediction for prediction of a chrominance signal or the case where cross-channel prediction is advantageous may indicate that the signal characteristics of a channel corresponding to the SelectedCIDX are very similar to those of another channel.
Thus, when skipping (or performing) the transform for the block of the channel corresponding to SelectedCIDX is advantageous, it may be equally advantageous to also skip (or perform) the transform for the blocks of the remaining channels.
Therefore, when cross-channel prediction is used, parsing of the bitstream may not be performed separately for the three channels in order to acquire the transform_skip_flag information. When the transform_skip_flag information of the representative channel is parsed, the transform_skip_flag information of the remaining channels may not be parsed separately. The transform_skip_flag information of the representative channel may be shared and used as the transform_skip_flag information of the remaining channels, and information indicating such sharing may be recorded in the bitstream. For example, for such sharing, the channel corresponding to SelectedCIDX may be used to determine whether the transform is to be skipped.
Alternatively, the rate-distortion value may be calculated for a case where the transform is skipped identically for the three channels, and the rate-distortion value may be calculated for a case where the transform is performed identically for the three channels. The rate-distortion value calculated when the transform is skipped and the rate-distortion value calculated when the transform is performed may be compared with each other, and encoding of the channel may be performed using a more advantageous scheme between the scheme for skipping the transform and the scheme for performing the transform based on the result of the comparison.
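The comparison described above may be written directly, as in the following C sketch, under the assumption that trial encodes supply the distortion and rate of each scheme:

/* Chooses between skipping the transform identically for the three
   channels and performing it identically for the three channels,
   by rate-distortion cost J = D + lambda * R. Returns 1 to skip. */
int choose_transform_skip(double d_skip, double r_skip,
                          double d_transform, double r_transform,
                          double lambda)
{
    double j_skip = d_skip + lambda * r_skip;
    double j_transform = d_transform + lambda * r_transform;
    return j_skip < j_transform;
}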
In an embodiment, instead of signaling pieces of transform_skip_flag information for the multiple channels, the transform_skip_flag information may be signaled only for the channel corresponding to SelectedCIDX, and may not be signaled separately for the remaining channels.
Next, an embodiment in which such transform_skip_flag information is shared will be described.
In an embodiment, the decoding apparatus 1700 may acquire the transform_skip_flag information from a bitstream. Such acquisition may be represented by the following code 10:
[ code 10]
if(log2TrafoSize <= Log2MaxTransformSkipSize) {
    if((cIdx != SelectedCIDX) && (cross-channel prediction is used))
        transform_skip_flag[x0][y0][cIdx] = transform_skip_flag[x0][y0][SelectedCIDX]
    else
        transform_skip_flag[x0][y0][cIdx]
} else
    transform_skip_flag[x0][y0][cIdx] = 0
x0 and y0 may be spatial coordinates indicating the location of the target block.
The cIdx may indicate a target channel of the target block.
When the number of channels in the image is 3, the value of cIdx in code 10 may be one of the values {0, 1, 2}. For example, the value of cIdx may be one of the predefined values 0, 1, and 2.
In step 2010, the communication unit 1720 may receive the bitstream.
In step 2020, processing unit 1710 may determine whether the transform for the target block may be skipped.
When a transform may be skipped, step 2030 may be performed.
When it is not possible to skip a transform, step 2060 may be performed.
For example, when the size of the target block is less than or equal to a specific size, processing unit 1710 may determine that the transform may be skipped.
For example, when the size of the target block is greater than a certain size, processing unit 1710 may determine that it is not possible to skip the transform.
Here, the specific size may be a boundary value of a block size that allows the transform to be skipped.
For example, when the following condition in code 11 is satisfied (i.e., when the result of the condition in code 11 is true), the processing unit 1710 may determine that the transform may be skipped, and when the condition in code 11 is not satisfied (i.e., when the result of the condition in code 11 is false), the processing unit 1710 may determine that it is not possible to skip the transform.
[ code 11]
if(log2TrafoSize<=Log2MaxTransformSkipSize)
In step 2030, the processing unit 1710 may determine, based on the selected representative channel, whether sharing of the transform_skip_flag information is to be used for the target channel of the target block.
If it is determined that the transform_skip_flag information is not to be shared, step 2040 may be performed.
If it is determined that the transform_skip_flag information is to be shared, step 2050 may be performed.
For example, when the following condition in code 12 is satisfied (i.e., when the result of the condition in code 12 is true), the processing unit 1710 may determine that the transform_skip_flag information is to be shared, and when the condition in code 12 is not satisfied (i.e., when the result of the condition in code 12 is false), the processing unit 1710 may determine that the transform_skip_flag information is not to be shared.
[ code 12]
if((cIdx != SelectedCIDX) && (cross-channel prediction is used))
In other words, in step 2030, when the target channel is not the selected representative channel indicated by SelectedCIDX and cross-channel prediction is used for the target block, the processing unit 1710 may determine that the transform_skip_flag information is to be shared with the target channel. Further, when the target channel is the selected representative channel indicated by SelectedCIDX or when cross-channel prediction is not used for the target block, the processing unit 1710 may determine that the transform_skip_flag information is not to be shared.
In step 2040, the processing unit 1710 may acquire the transform_skip_flag information of the target channel from the bitstream. The processing unit 1710 may parse and read the transform_skip_flag information of the target channel from the bitstream.
The transform_skip_flag information may be stored in transform_skip_flag[x0][y0][cIdx].
In step 2050, the processing unit 1710 may set the transform_skip_flag information such that the transform_skip_flag information of the selected representative channel indicated by SelectedCIDX is used as the transform_skip_flag information of the target channel.
The processing unit 1710 may use the transform_skip_flag information of the selected representative channel indicated by SelectedCIDX as the transform_skip_flag information of the target channel, without parsing and reading the transform_skip_flag information of the target channel from the bitstream. In other words, the processing unit 1710 may store the value of transform_skip_flag[x0][y0][SelectedCIDX] in transform_skip_flag[x0][y0][cIdx].
That is, the value previously stored in transform_skip_flag[x0][y0][SelectedCIDX] may be used as-is for transform_skip_flag[x0][y0][cIdx], without a process for parsing and reading the transform_skip_flag information of the target channel from the bitstream.
At step 2060, information indicating that the transform will not be skipped for the target channel of the target block may be set. The value of transform_skip_flag[x0][y0][cIdx] may be set to 0 because skipping the transform is not allowed for the target block.
In other words, for the target channel of the target block indicated by cIdx, the predefined value 0 may be set in the transform_skip_flag information to indicate that the transform is not to be skipped for the target channel, without parsing and reading, from the bitstream, the transform_skip_flag information indicating whether the transform is to be skipped.
As described above, in the above-described embodiment, the coding decision information of the selected representative channel may be shared with all channels except the selected representative channel.
The above-described embodiments may be partially modified. In other words, the coding decision information of the selected representative channel may be shared as the coding decision information of another specified channel. For example, the coding decision information may include the transform_skip_flag information.
For example, when the value of SelectedCIDX is 1, the cIdx value 1 indicates the Cb signal, and the cIdx value 2 indicates the Cr signal, the coding decision information for the Cb signal may be shared as the coding decision information of the Cr signal.
In other words, the encoding decision information for the Cb signal may be transmitted from the encoding apparatus 1600 to the decoding apparatus 1700, and the encoding decision information for the Cr signal may be set using the encoding decision information for the Cb signal, without separately transmitting the encoding decision information for the Cr signal.
When the coding decision information for the Cb signal is shared as the coding decision information for the Cr signal, the above code 10 may be modified to the following code 13:
[ code 13]
if(log2TrafoSize <= Log2MaxTransformSkipSize) {
    if((cIdx != 2) || !(cross-channel prediction is used))
        transform_skip_flag[x0][y0][cIdx]
    else
        transform_skip_flag[x0][y0][2] = transform_skip_flag[x0][y0][SelectedCIDX]
} else
    transform_skip_flag[x0][y0][cIdx] = 0
In step 2030, when the following condition in code 14 is satisfied (i.e., when the result of the condition in code 14 is true), the processing unit 1710 may determine that the transform_skip_flag information is not to be shared, and when the condition in code 14 is not satisfied (i.e., when the result of the condition in code 14 is false), the processing unit 1710 may determine that the transform_skip_flag information is to be shared.
[ code 14]
if((cIdx != 2) || !(cross-channel prediction is used))
In other words, in step 2030, the processing unit 1710 may determine not to share the transform_skip_flag information with the target channel 1) when the target channel is not the channel that shares the coding decision information of the selected representative channel indicated by SelectedCIDX, or 2) when cross-channel prediction is not used for the target block. Further, the processing unit 1710 may determine to share the transform_skip_flag information 1) when the target channel is the channel that shares the coding decision information of the selected representative channel indicated by SelectedCIDX, and 2) when cross-channel prediction is used for the target block.
The steps in the code 13 may be implemented as other steps that retain the same meaning. For example, code 13 may be modified to the following code 15:
[ code 15]
if(log2TrafoSize <= Log2MaxTransformSkipSize) {
    if((cIdx == 2) && (cross-channel prediction is used))
        transform_skip_flag[x0][y0][2] = transform_skip_flag[x0][y0][SelectedCIDX]
    else
        transform_skip_flag[x0][y0][cIdx]
} else
    transform_skip_flag[x0][y0][cIdx] = 0
Sharing of transform selection information
Fig. 21 is a flowchart of a method for sharing transform selection information, according to an embodiment.
In the above-described embodiments, the transform_skip_flag information has been described as the coding decision information to be shared. The transform_skip_flag information in the above embodiments may be replaced with another type of coding decision information. Next, the transform selection information will be described as the encoding decision information to be shared.
The transform selection information may be information indicating which transform is to be used for the transform block of the target channel. The transform selection information may include the primary transform selection information and/or the secondary transform selection information described above.
There may be a significant correlation between the luminance channel (i.e., Y) and the chrominance channels (i.e., Cb and/or Cr) of an image. For example, a luminance channel may include a large amount of information on a texture of an image, and a Cb channel and a Cr channel, which are chrominance channels, may additionally provide color information to be added to the texture.
Therefore, when performing prediction required for compression and reconstruction of an image, prediction values for a Cb block and a Cr block for which prediction is performed from a signal of a luminance channel previously acquired through decoding may be calculated without performing independent prediction for three channels of a color space, respectively. Such cross-channel prediction may be effective because a considerable amount of texture information of a chrominance signal may be included in a luminance signal.
The case where cross-channel prediction is selected from various techniques for predicting chrominance signals (including angle prediction, DC prediction, plane prediction, etc.) or where cross-channel prediction is more advantageous may indicate that the signal properties of the luminance channel are very similar to those of the chrominance channels (i.e., Cb and/or Cr).
Thus, when it is advantageous to use a particular transform of a plurality of transforms for a luma block, it may be advantageous to use the same transform for chroma blocks (i.e., Cb blocks and/or Cr blocks).
In other words, in general, the same transform may be used for luma and chroma blocks. Alternatively, when a specific transform is used for a luminance block, another specific transform corresponding to the specific transform for the luminance block may be used for a chrominance block.
Thus, when using cross-channel prediction, if one transform is determined, the determined transform may be used for the luma channel and the chroma channel (i.e., three channels) as such, without signaling the transforms for the luma channel and the chroma channel, respectively.
Alternatively, if one transform is determined for the luminance channel, a transform corresponding to the transform determined for the luminance channel may be used for the chrominance channel.
When the transform to be used for the channels is determined in this manner, the luminance channel and the chrominance channels may be encoded based on the determination. For such encoding, one of the multiple available transforms may be selected only for the luma channel; once the transform for the luma channel is selected, the transform for the chroma channels may be determined automatically.
Alternatively, encoding using the same transform may be performed on three channels. With such encoding, rate-distortion values may be calculated for multiple transforms. Thereafter, by comparison between rate-distortion values of applicable transforms, a transform having the most favorable rate-distortion value may be selected, and encoding may be performed according to the selected transform.
Alternatively, encoding using a transform set may be performed on three channels. The set of transforms may include a particular transform for the luma channel and a transform for the chroma channel that corresponds to the particular transform.
With such encoding, rate-distortion values may be calculated for multiple transform sets. Thereafter, through comparison between rate-distortion values of the applicable transform sets, a transform set having the most favorable rate-distortion value may be selected, and encoding may be performed according to the selected transform set.
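One way to realize such a transform set is a fixed lookup from the transform chosen for the luma channel to the corresponding transform for the chroma channels, as in the following illustrative C sketch; the pairing shown is an assumption:

/* Hypothetical transform set: the chroma transform is a fixed function
   of the transform chosen for the luma channel. The identity pairing
   below models reusing the same transform; a non-identity row would
   model a "corresponding" transform. */
enum TrType { T_DCT2, T_DST7, T_DCT8, T_COUNT };

static const enum TrType kChromaTrForLumaTr[T_COUNT] = {
    /* luma T_DCT2 -> */ T_DCT2,
    /* luma T_DST7 -> */ T_DST7,
    /* luma T_DCT8 -> */ T_DCT8,
};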
In an embodiment, instead of separately signaling a plurality of pieces of transform selection information for a plurality of channels, the transform selection information may be signaled only for a luminance signal (luminance channel), and the transform selection information may not be separately signaled for the remaining channels as chrominance channels.
In an embodiment, the decoding apparatus 1700 may acquire transform selection information from a bitstream.
In step 2110, the communication unit 1720 may receive the bitstream.
In step 2120, the processing unit 1710 may determine whether sharing of transform selection information with a target channel of the target block is to be used.
When the transform selection information is not to be shared, step 2130 may be performed.
When the transform selection information is to be shared, step 2140 may be performed.
In step 2130, the processing unit 1710 may obtain transform selection information of the target channel from the bitstream. The processing unit 1710 may parse and read transform selection information of a target channel from the bitstream.
In step 2140, the processing unit 1710 may set the transform selection information such that the transform selection information of the representative channel is used as the transform selection information of the target channel.
Step 2120, step 2130 and step 2140 may be represented by the following code 16:
[ code 16]
if((cIdx != 0) && (cross-channel prediction is used))
    transform selection information[x0][y0][cIdx] = transform selection information[x0][y0][0]
else
    transform selection information[x0][y0][cIdx]
x0 and y0 may be spatial coordinates indicating the location of the target block.
The cIdx may indicate a target channel of the target block.
When the number of channels in the image is 3, the value of cIdx in code 16 may be one of the values {0, 1, 2}. For example, the value of cIdx may be one of the predefined values 0, 1, and 2.
The cIdx of a representative channel may be assumed to be 0.
"cIdx! 0 "may indicate that the target channel is not a representative channel (e.g., a luminance channel). A cIdx value of 0 may indicate that the target channel is a representative channel.
In other words, in step 2120, when the target channel is not a representative channel and cross-channel prediction is used for the target block, the processing unit 1710 may determine that sharing of transform selection information with the target channel is to be used. When the target channel is a representative channel or when cross-channel prediction is not used for the target block, the processing unit 1710 may determine that sharing of transform selection information with the target channel will not be used.
In step 2130, the processing unit 1710 may obtain transform selection information of the target channel from the bitstream. The processing unit 1710 may parse and read transform selection information of a target channel from the bitstream.
The transform selection information may be stored in transform selection information[x0][y0][cIdx].
In step 2140, the processing unit 1710 may set the transform selection information such that the transform selection information of the representative channel is used as the transform selection information of the target channel.
The processing unit 1710 may use the transform selection information of the representative channel as the transform selection information of the target channel without parsing and reading the transform selection information of the target channel from the bitstream. That is, the processing unit 1710 may store the value of transform selection information[x0][y0][0] in transform selection information[x0][y0][cIdx].
That is, the value previously stored in transform selection information[x0][y0][0] may be used as-is for transform selection information[x0][y0][cIdx], without a process for parsing and reading the transform selection information of the target channel from the bitstream.
Determining whether cross-channel prediction will be used
In the embodiments described above with reference to fig. 18 to 21, it has been illustrated that whether cross-channel prediction is to be used is determined by checking whether the INTRA prediction mode of the target block is one of the INTRA _ CCLM mode, the INTRA _ MMLM mode, and the INTRA _ MFLM mode. In other words, when the INTRA prediction mode of the target block is one of the INTRA _ CCLM mode, INTRA _ MMLM mode, and INTRA _ MFLM mode, cross-channel prediction may be used. When the INTRA prediction mode of the target block is not one of the INTRA _ CCLM mode, the INTRA _ MMLM mode, and the INTRA _ MFLM mode, cross-channel prediction may not be used.
The above determination of whether cross-channel prediction is to be used is only an example, and whether cross-channel prediction is to be used may be determined by one of the following codes 17 to 23. For example, cross-channel prediction may be used when the value of the condition of each of the following codes is true, and may not be used when the value of the condition of each of the following codes is false. The "intra _ chroma _ pred _ mode" may be an intra prediction mode for the chroma channel.
[ code 17]
if(intra_chroma_pred_mode==CCLM mode)
[ code 18]
if(intra_chroma_pred_mode==DM mode)
[ code 19]
if(intra_chroma_pred_mode==INTRA_CCLM mode)
[ code 20]
if(intra_chroma_pred_mode==INTRA_MMLM mode)
[ code 21]
if(intra_chroma_pred_mode==INTRA_MFLM mode)
[ code 22]
if((intra_chroma_pred_mode==INTRA_CCLM mode)||
(intra_chroma_pred_mode==INTRA_MMLM mode)||
(intra_chroma_pred_mode==INTRA_MFLM mode))
[ code 23]
if((intra_chroma_pred_mode==DM mode)||
(intra_chroma_pred_mode==INTRA_CCLM mode)||
(intra_chroma_pred_mode==INTRA_MMLM mode)||
(intra_chroma_pred_mode==INTRA_MFLM mode))
Each of the CCLM mode, DM mode, INTRA_CCLM mode, INTRA_MMLM mode, and INTRA_MFLM mode described in codes 17 to 23 may indicate one value of intra_chroma_pred_mode presented in the first column of tables 10 and 11 described above. For the CCLM mode, DM mode, INTRA_CCLM mode, INTRA_MMLM mode, and INTRA_MFLM mode, the foregoing description of tables 10 and 11 may be referred to.
When it is intended to determine whether cross-channel prediction is to be used, the size of the block may be additionally considered in the schemes in the above-described codes 17 to 23.
The determination of whether cross-channel prediction is to be used may also be performed by one of the following code 24 through code 30. For example, cross-channel prediction may be used when the value of the condition in each of the following codes is true, and may not be used when the value of the condition in each of the following codes is false.
[ code 24]
if((intra_chroma_pred_mode == CCLM mode) && (block size condition))
[ code 25]
if((intra_chroma_pred_mode == DM mode) && (block size condition))
[ code 26]
if((intra_chroma_pred_mode == INTRA_CCLM mode) && (block size condition))
[ code 27]
if((intra_chroma_pred_mode == INTRA_MMLM mode) && (block size condition))
[ code 28]
if((intra_chroma_pred_mode == INTRA_MFLM mode) && (block size condition))
[ code 29]
if(((intra_chroma_pred_mode == INTRA_CCLM mode) ||
(intra_chroma_pred_mode == INTRA_MMLM mode) ||
(intra_chroma_pred_mode == INTRA_MFLM mode)) && (block size condition))
[ code 30]
if(((intra_chroma_pred_mode == DM mode) ||
(intra_chroma_pred_mode == INTRA_CCLM mode) ||
(intra_chroma_pred_mode == INTRA_MMLM mode) ||
(intra_chroma_pred_mode == INTRA_MFLM mode)) && (block size condition))
The block size condition presented in codes 24 to 30 may be replaced with one of the following codes 31, 32, 33, and 34.
[ code 31]
((log2TbWidth<=Log2MaxSizeWidth)&&(log2TbHeight<=Log2MaxSizeHeight))
[ code 32]
((log2TbWidth>=Log2MinSizeWidth)&&(log2TbHeight>=Log2MinSizeHeight))
[ code 33]
((log2TbWidth>Log2MinSizeWidth)&&(log2TbHeight<=Log2MaxSizeHeight))
[ code 34]
((log2TbWidth>Log2MinSizeWidth)&&(log2TbHeight>Log2MaxSizeHeight))
log2TbWidth and log2TbHeight have been described above with reference to equations 11 and 12.
Log2MaxSizeWidth, Log2MaxSizeHeight, Log2MinSizeWidth, and Log2MinSizeHeight may be predefined values. Log2MaxSizeWidth may be the width of the block having the largest size. Log2MaxSizeHeight may be the height of the block having the largest size. Log2MinSizeWidth may be the width of the block having the smallest size. Log2MinSizeHeight may be the height of the block having the smallest size.
For example, the value of Log2MaxSizeWidth may be 16, the value of Log2MaxSizeHeight may be 16, the value of Log2MinSizeWidth may be 4, and the value of Log2MinSizeHeight may be 4.
Alternatively, the value of Log2MaxSizeWidth may be 32, and the value of Log2MaxSizeHeight may be 32.
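Combining the bounds of codes 31 and 32 with the example values above yields the following C sketch of the block size condition; the thresholds are the example values just given and are assumptions rather than normative limits:

/* Block size gate for cross-channel prediction, combining the maximum
   bounds of code 31 with the minimum bounds of code 32; the threshold
   values are the example values from the text. */
int block_size_condition(int log2TbWidth, int log2TbHeight)
{
    const int Log2MaxSizeWidth = 16, Log2MaxSizeHeight = 16;
    const int Log2MinSizeWidth = 4, Log2MinSizeHeight = 4;
    return (log2TbWidth <= Log2MaxSizeWidth)
        && (log2TbHeight <= Log2MaxSizeHeight)
        && (log2TbWidth >= Log2MinSizeWidth)
        && (log2TbHeight >= Log2MinSizeHeight);
}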
When DM is used, the intra prediction mode of the chrominance signal may not be signaled separately. When DM is used, the intra prediction mode signaled for the luminance signal can also be used in the chroma mode without change.
Encoding and decoding using sharing of selective information between channels under block partitioning structure
In general, when an image is encoded, an appropriate encoding scheme may be separately used for a plurality of spatial regions in consideration of spatial characteristics in the image. For such encoding, the image may be partitioned into CUs, and CUs generated from the partitions may be separately encoded.
To perform such encoding, the same block partition structure may be used for the luminance channel and the chrominance channel.
However, the characteristics of the luminance signal and the characteristics of the chrominance signal may be different from each other. Different block partition structures may be used for the luminance channel and the chrominance channel, respectively, in order to achieve more efficient encoding, taking into account the differences between the characteristics.
Hereinafter, for the case where the block partition structures of the luminance signal and the chrominance signal (or of the multiple channels) of an image are identical, the block partition structure is referred to as a "single-tree block partition structure" or a "single tree".
Hereinafter, for the case where the block partition structures of the luminance signal and the chrominance signal (or of the channels) of an image are not identical, the block partition structure is referred to as a "dual-tree block partition structure" or a "dual tree".
In an embodiment, a block of another channel corresponding to a target block of a target channel may be specified between a luminance channel and a chrominance channel (or multiple channels). The block of the other channel corresponding to the target block of the target channel is referred to as a "corresponding block (col-block)".
In an embodiment, when the luminance channel and the chrominance channel (or channels) have the same block partition structure (i.e., when a single tree is applied), blocks of another channel corresponding to a target block of a target channel may be specified.
In an embodiment, when the luminance channel and the chrominance channel (or channels) have different block partition structures (i.e., when a dual tree is applied), blocks of the other channel corresponding to a target block of the target channel may be specified.
When the luminance channel and the chrominance channel (or channels) have different block partition structures (i.e., when a dual tree is applied), the coding decision information of the corresponding block may be shared with the target block of the target channel. By sharing the coding decision information, the target block is coded, and thus coding efficiency can be improved.
Encoding decision information of a corresponding block corresponding to a target block of a target channel may be parsed from the compressed bitstream.
The encoding decision information of the corresponding block corresponding to the target block of the target channel may be parsed from the compressed bitstream, and the target block of the target channel may be decoded using the encoding decision information of the corresponding block.
For example, transform_skip_flag information of a corresponding block corresponding to a target block of a target channel may be parsed, and the transform_skip_flag information of the corresponding block may be used to determine whether a transform is to be skipped for the target block.
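The following is a minimal sketch of this sharing, not the normative decoding process of the embodiments; the Block data structure, its field name, and the function name are hypothetical, and the flag is assumed to have already been parsed for the luminance channel.

from dataclasses import dataclass

@dataclass
class Block:
    transform_skip_flag: bool  # parsed from the bitstream for the luminance channel

def chroma_transform_skip(luma_col_block: Block) -> bool:
    # The chroma block's transform_skip_flag is not signaled separately;
    # the value of the corresponding (col) luminance block is shared instead.
    return luma_col_block.transform_skip_flag

# When the shared flag is true, the (inverse) transform is skipped for the chroma block.
assert chroma_transform_skip(Block(transform_skip_flag=True)) is True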
The shared information may be coding decision information shared between the luminance block and the chrominance block.
When the block partition structure of the first channel and the block partition structure of the second channel are identical to each other, if the spatial position of the first block of the first channel corresponds to the spatial position of the second block of the second channel, the first block and the second block may correspond to each other (i.e., may be co-located). In other words, the corresponding blocks of different channels may be blocks in different channels having corresponding (co-sited) spatial locations. The shared information of the second block corresponding to the first block may be used for encoding and/or decoding of the first block.
When the block partition structure of the chrominance channel is the same as the block partition structure of the luminance channel, a luminance block corresponding to a specific chrominance block of the chrominance channel may be a luminance block at a spatial position corresponding to a spatial position of the specific chrominance block. In this case, the shared information of the luminance block corresponding to the chrominance block may be used for encoding and/or decoding of the chrominance block.
FIG. 22 shows a single-tree block partition structure.
FIG. 23 illustrates a dual-tree block partition structure.
Under a 4:2:0 color sub-sampling structure, a region of the luminance signal spatially corresponding to a chrominance block may occupy an area four times as large as the chrominance block. In other words, the horizontal length (width) and the vertical length (height) of the luminance signal region may be twice the horizontal length and the vertical length of the chrominance block.
As shown in fig. 22, in the corresponding image areas of the chrominance channel and the luminance channel, the block partition structure of the chrominance channel and the block partition structure of the luminance channel may be identical to each other. In other words, a single tree may be used for both the chrominance and luminance channels.
In fig. 22 and subsequent drawings, image areas corresponding to each other are indicated by "corresponding areas".
As shown in fig. 23, in the corresponding image areas of the chrominance channel and the luminance channel, the block partition structure of the chrominance channel and the block partition structure of the luminance channel may be different from each other. In other words, a dual tree may be used for the chrominance and luminance channels.
For example, in fig. 23, a region of a luminance channel spatially corresponding to one chrominance block may be partitioned into eight blocks.
When the block partition structures in the corresponding image areas of the luminance channel and the chrominance channel are identical to each other, the block of the luminance channel spatially corresponding to a specific chrominance block can be explicitly specified.
In contrast, when the block partition structure of the luminance channel is not identical to that of the chrominance channel, for a designated chrominance block determined by partitioning according to the block partition structure of the chrominance channel, the luminance block corresponding to the designated chrominance block may not be explicitly designated in the luminance channel. As shown in fig. 23, this ambiguity is attributed to the fact that the block partition structure of the chrominance channel and the block partition structure of the luminance channel are different from each other.
In the embodiment, for the case where the block partition structure of the luminance channel is not identical to that of the chrominance channel, a method for specifying a luminance block corresponding to a chrominance block will be described.
For example, a plurality of luminance blocks corresponding to the chrominance block may be specified. Based on this specification, one or more pieces of shared information may be acquired from the one or more luminance blocks corresponding to the chrominance block (i.e., the target block), and encoding and/or decoding of the chrominance block may be performed using the acquired piece(s) of shared information.
Next, a method for specifying one or more blocks of the second channel corresponding to the blocks of the first channel will be described for a case where the block partition structure of the first channel is not identical to the block partition structure of the second channel. In the following description, although the first channel will be described as a chrominance channel and the second channel will be described as a luminance channel, the chrominance channel and the luminance channel are only exemplary, and as described above, the first channel and the second channel may be different types of channels.
Fig. 24 illustrates a scheme for specifying a corresponding block based on a location in a corresponding region, according to an example.
A corresponding region indicating a luminance block corresponding to a chrominance block may be designated as a rectangular region. The position of the uppermost leftmost pixel in the rectangular region may be (xCb, yCb). The position of the lowermost rightmost pixel in the rectangular region may be (xCb + cbWidth-1, yCb + cbHeight-1).
The position (xCb, yCb) may indicate the position of the luma pixel corresponding to the position of the uppermost leftmost pixel in the chroma block (i.e., chroma coding block).
cbWidth and cbHeight may be values indicating the width and height, respectively, of the target block based on the luminance pixel.
In other words, a corresponding region indicating a luminance block corresponding to a chrominance block may be defined as a rectangular region in which the position of the uppermost leftmost pixel based on the position of the luminance pixel is (xCb, yCb), and which has a horizontal width cbWidth and a vertical height cbHeight.
The corresponding region indicating the luminance block corresponding to the above-described chrominance block may be applied to the embodiments described below with reference to fig. 24 to fig. 31.
In fig. 24, a luminance block corresponding to a chrominance block may be a luminance block existing at a predefined position in a corresponding region of a luminance channel spatially corresponding to the chrominance block.
In other words, a luminance block existing at a predefined position in a corresponding region of a luminance channel spatially corresponding to a chrominance block may be designated as one or more luminance blocks corresponding to the chrominance block. Alternatively, a luminance block occupying a predefined position in a corresponding region of a luminance channel spatially corresponding to a chrominance block may be designated as one or more luminance blocks corresponding to the chrominance block.
For example, the predefined positions may be a Center (CR) position, a Top Left (TL) position, a Top Right (TR) position, a Bottom Left (BL) position, and a Bottom Right (BR) position in a region of the luminance channel spatially corresponding to the chrominance blocks.
The CR position may indicate (xCb + cbWidth/2, yCb + cbHeight/2). The TL position may indicate (xCb, yCb). The TR position may indicate (xCb + cbWidth-1, yCb). The BL position may indicate (xCb, yCb + cbHeight-1). The BR position may indicate (xCb + cbWidth-1, yCb + cbHeight-1).
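As an illustration only, the five predefined positions may be computed from (xCb, yCb), cbWidth, and cbHeight as in the following sketch; the function name is hypothetical, and integer division is assumed for the center position.

def candidate_positions(xCb, yCb, cbWidth, cbHeight):
    # Five predefined luma positions inside the corresponding region,
    # using the formulas given above.
    return {
        "CR": (xCb + cbWidth // 2, yCb + cbHeight // 2),
        "TL": (xCb, yCb),
        "TR": (xCb + cbWidth - 1, yCb),
        "BL": (xCb, yCb + cbHeight - 1),
        "BR": (xCb + cbWidth - 1, yCb + cbHeight - 1),
    }

# For a 16x16 luma region whose uppermost leftmost pixel is at (32, 64):
# {'CR': (40, 72), 'TL': (32, 64), 'TR': (47, 64), 'BL': (32, 79), 'BR': (47, 79)}
print(candidate_positions(32, 64, 16, 16))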
In an embodiment, blocks of another channel corresponding to a target block of a target channel may be specified in a plurality of channels (such as a luminance channel and a chrominance channel). The blocks of the other channel corresponding to the target blocks of the target channel are referred to as "corresponding blocks".
In an embodiment, in the region of the luminance channel spatially corresponding to the chrominance block, the luminance block including the luminance pixel existing at the position (xCb + cbWidth/2, yCb + cbHeight/2), which indicates the Center (CR), may be the corresponding block. Accordingly, even when the target block is partitioned in the form of a dual tree, a specific block of another channel (e.g., a luminance channel) corresponding to the target block of the target channel (e.g., a chrominance channel) among the plurality of channels can be explicitly specified. In an embodiment, in encoding and/or decoding of the chrominance channel, the luminance block including the luminance pixel existing at the position (xCb + cbWidth/2, yCb + cbHeight/2) may be designated as the corresponding block. The information about the specified corresponding block may be used to encode and/or decode the target block.
For example, the predefined positions may be some of a CR position, a TL position, a TR position, a BL position, and a BR position in a region of the luminance channel spatially corresponding to the chrominance block.
For example, the luminance block corresponding to the chrominance block may be a block including at least one of the pixels located at the following positions in a region of a luminance channel spatially corresponding to the chrominance block: a center position, an upper left position, an upper right position, a lower left position, and a lower right position.
For example, the luminance block corresponding to the chrominance block may be some of blocks including at least one of the pixels located at the following positions in a region of a luminance channel spatially corresponding to the chrominance block: a center position, an upper left position, an upper right position, a lower left position, and a lower right position.
For example, a luma block corresponding to a chroma block may include a block including at least one of the pixels located at the following positions in a region of a luma channel spatially corresponding to the chroma block: a center position, an upper left position, an upper right position, a lower left position, and a lower right position.
Fig. 25 illustrates a scheme for specifying a corresponding block based on an area in a corresponding region according to an example.
Fig. 26 illustrates another scheme for specifying a corresponding block based on an area in a corresponding region according to an example.
The luminance block corresponding to the chrominance block may be the luminance block having the largest area in the region of the luminance channel spatially corresponding to the chrominance block.
Alternatively, the luminance blocks corresponding to the chrominance block may be a predefined number of luminance blocks having the largest areas in the region of the luminance channel spatially corresponding to the chrominance block.
Such a designation is based on the fact that there is a high probability that the characteristics of the luminance block having the largest area, or of the predefined number of luminance blocks having the largest areas, in the region of the luminance channel corresponding to the chrominance block will be similar to the characteristics of the chrominance block.
As shown in fig. 25, the predefined number may be 2. In fig. 25, two blocks (i.e., block 1 and block 2) having the largest areas may be selected from the eight luminance blocks in the region of the luminance channel corresponding to the chrominance block.
As shown in fig. 26, the predefined number may be 3. In fig. 26, three blocks (i.e., block 1, block 2, and block 3) having the largest areas may be selected from the eight luminance blocks in the region of the luminance channel corresponding to the chrominance block.
With this specification scheme, even if the block partition structures of the luminance channel and the chrominance channel are different from each other, the coding efficiency can be improved by using the shared information.
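A minimal sketch of such an area-based selection follows, under the assumption that blocks are represented as (x, y, w, h) tuples in luminance-sample coordinates; the function name and the block representation are hypothetical.

def largest_overlap_blocks(region, luma_blocks, n):
    # Select the n luminance blocks covering the largest area of the
    # corresponding region; region and blocks are (x, y, w, h) tuples.
    rx, ry, rw, rh = region

    def overlap_area(block):
        bx, by, bw, bh = block
        w = max(0, min(rx + rw, bx + bw) - max(rx, bx))
        h = max(0, min(ry + rh, by + bh) - max(ry, by))
        return w * h

    return sorted(luma_blocks, key=overlap_area, reverse=True)[:n]

# With n = 2, the two largest blocks in the region are selected, as in fig. 25.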
Fig. 27 illustrates a scheme for specifying a corresponding block based on a form of a block in a corresponding region according to an example.
Fig. 28 illustrates another scheme for specifying a corresponding block based on a form of the block in a corresponding region according to an example.
The luminance block corresponding to the chrominance block may indicate a luminance block having the same form as the chrominance block in an area of a luminance channel spatially corresponding to the chrominance block.
For example, the form of the block may include the size of the block.
According to a dual-tree block partition structure, such as the structures shown in fig. 27 and fig. 28, the block partition structure of a CU of the chroma channel may be different from the block partition structure of the region of the luma channel corresponding to that CU. Even in this case, the region of the luminance channel corresponding to the region of the target block in the CU of the chrominance channel may exist as a single block.
In this case, the single luminance block in the region of the luminance channel corresponding to the chrominance block can be accurately matched with the chrominance block. The shared information of the matched luminance block may be shared as the coding decision information of the chrominance block.
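The single-block case may be detected as in the following sketch, again assuming a hypothetical (x, y, w, h) block representation in luminance-sample coordinates.

def single_same_form_block(region, luma_blocks):
    # If exactly one luminance block coincides with the corresponding
    # region (same position and same size), return it; otherwise None.
    matches = [b for b in luma_blocks if b == region]
    return matches[0] if len(matches) == 1 else None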
Fig. 29 illustrates a scheme for specifying a corresponding block based on an aspect ratio of blocks in a corresponding region according to an example.
Fig. 30 illustrates another scheme for specifying a corresponding block based on an aspect ratio of blocks in a corresponding region, according to an example.
The luminance block corresponding to the chrominance block may be a luminance block having the same aspect ratio as the chrominance block in an area of a luminance channel spatially corresponding to the chrominance block.
Alternatively, the luminance block corresponding to the chrominance block may be a luminance block having an aspect ratio similar to that of the chrominance block in a region of a luminance channel spatially corresponding to the chrominance block.
For example, in the region of the luminance channel of fig. 29, luminance block 2 may be selected, and in the region of the luminance channel of fig. 30, luminance block 1 may be selected.
Here, the aspect ratio of a block may be the ratio of the horizontal length (width) of the block to its vertical length (height). In other words, the aspect ratio of the block may be a value obtained by dividing the horizontal length of the block by the vertical length of the block.
For example, the following equation 13 may be used to determine whether the aspect ratios of the blocks are equal to each other:
[ equation 13]
(log2WidthChroma-log2HeightChroma)==(log2WidthLuma-log2HeightLuma)
WidthChroma may be the width of the chrominance block. HeightChroma may be the height of the chrominance block.
WidthLuma may be the width of the luminance block corresponding to the chrominance block. HeightLuma may be the height of the luminance block corresponding to the chrominance block.
The following equation 14 may be used to determine whether the aspect ratios are similar to each other:
[ equation 14]
|(log2WidthChroma - log2HeightChroma) - (log2WidthLuma - log2HeightLuma)| < THD
"| x |" may indicate the absolute value of x.
THD may be a threshold. For example, the value of THD may be 2.
According to equations 13 and 14, one or more luminance blocks having the same aspect ratio as the chrominance blocks may be designated as corresponding blocks. Alternatively, one or more luminance blocks having an aspect ratio similar to that of the chrominance blocks may be designated as the corresponding blocks.
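Equations 13 and 14 may be realized as in the following sketch; the function names and the default threshold are illustrative, and the block dimensions are assumed to be powers of two, for which log2 is exact.

from math import log2

def same_aspect_ratio(width_chroma, height_chroma, width_luma, height_luma):
    # Equation 13: the log2 aspect ratios are equal.
    return (log2(width_chroma) - log2(height_chroma)) == \
           (log2(width_luma) - log2(height_luma))

def similar_aspect_ratio(width_chroma, height_chroma, width_luma, height_luma, thd=2):
    # Equation 14: the difference of the log2 aspect ratios is below THD.
    return abs((log2(width_chroma) - log2(height_chroma))
               - (log2(width_luma) - log2(height_luma))) < thd

# A 16x8 chrominance block and a 32x16 luminance block have the same 2:1 aspect ratio.
assert same_aspect_ratio(16, 8, 32, 16)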
Fig. 31 illustrates a scheme for specifying a corresponding block based on encoding properties of blocks in a corresponding region according to an example.
Although the block partition structure of the chrominance channel and the block partition structure of the luminance channel are independent of each other, if, among the luminance blocks in the region of the luminance channel spatially corresponding to the chrominance block, there is a luminance block having the same coding decision information as the chrominance block, the shared information of the chrominance block and the shared information of that luminance block may be identical to each other.
To exploit this property, a luminance block having the same value as the chrominance block for predefined coding decision information may be designated as the luminance block corresponding to the chrominance block. Alternatively, a luminance block having a value similar to that of the chrominance block for predefined coding decision information may be designated as the luminance block corresponding to the chrominance block.
For example, the predefined coding decision information may be information on whether intra prediction is used, intra prediction mode, motion prediction information, motion vector, information on whether merge mode is used, derivation mode, transform selection information, and the like.
For example, the intra prediction mode may be used as the predefined coding decision information. In fig. 31, luminance block 1, luminance block 2, and luminance block 3 are shown in the area of the luminance channel. The luminance block 1, the luminance block 2, and the luminance block 3 may be luminance blocks having the same (or similar) intra prediction mode as that of the chrominance block. The luminance block 1, the luminance block 2, and the luminance block 3 may be designated as luminance blocks corresponding to the chrominance blocks.
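A minimal sketch of this property-based specification follows, with the intra prediction mode as the predefined coding decision information; the (block_id, intra_mode) pair representation of blocks is hypothetical.

def blocks_with_same_intra_mode(chroma_mode, luma_blocks):
    # Keep the luminance blocks whose intra prediction mode equals the
    # chrominance block's mode; blocks are (block_id, intra_mode) pairs.
    return [block_id for block_id, mode in luma_blocks if mode == chroma_mode]

# Blocks 1, 2, and 3 share the chroma block's mode, as in fig. 31.
print(blocks_with_same_intra_mode(26, [(1, 26), (2, 26), (3, 26), (4, 10)]))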
Depending on the specification schemes described above, multiple luminance blocks may correspond to a single chrominance block. When the pieces of shared information of the multiple luminance blocks are identical to each other, no problem arises in using the shared information for the chrominance block. In contrast, when the pieces of shared information of the multiple luminance blocks are different from each other, it may be ambiguous which shared information is to be used for encoding and/or decoding of the chrominance block.
The value occurring most frequently among the values of the pieces of shared information of the multiple corresponding blocks may be used as the value of the shared information of the chrominance block. This decision method may be referred to as a "majority-based shared information decision method". By this method, information can be efficiently shared without additional signaling.
For example, when transform_skip_flag information is shared, the value occurring most frequently among the transform_skip_flag values of the multiple luminance blocks corresponding to the chrominance block may be shared as the value of the transform_skip_flag information of the chrominance block. Based on this sharing, encoding and/or decoding of the chrominance block may be performed.
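A minimal sketch of the majority-based decision, assuming the shared values have already been collected from the corresponding blocks; the function name is hypothetical.

from collections import Counter

def majority_shared_value(values):
    # Majority-based shared information decision: use the value that
    # occurs most often among the corresponding blocks' values.
    return Counter(values).most_common(1)[0][0]

# Three of four corresponding luminance blocks skip the transform,
# so the chrominance block also skips it.
assert majority_shared_value([True, True, False, True]) is True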
For example, the shared information may be used only when exactly one luminance block corresponds to the chrominance block. That is, the shared information may be used only when there is only one luminance block satisfying the above-described specific condition in the region of the luminance channel spatially corresponding to the chrominance block. Alternatively, when the region of the further channel spatially corresponding to the target block of the target channel is occupied by only a single block, the coding decision information may be shared between the channels.
Referring to fig. 27, when the region of the luminance channel spatially corresponding to the target block of the chrominance channel is partitioned into only one block, that one block may be the corresponding block of the luminance channel. The shared information of the corresponding block may be used to encode and/or decode the chrominance block.
When the region of the luminance channel spatially corresponding to the target block of the chrominance channel is not composed of a single block, the coding decision information may not be shared unless separate signaling is provided.
Fig. 32 is a flow chart of an encoding method according to an embodiment.
The encoding method and the bitstream generation method according to the present embodiment may be performed by the encoding apparatus 1600. The present embodiment may be part of a target encoding method or a video encoding method.
In step 3210, the processing unit 1610 may determine coding decision information of a representative channel of the target block.
In step 3220, the processing unit 1610 may generate information on the target block by performing encoding on the target block using the encoding decision information of the representative channel of the target block.
In step 3230, the processing unit 1610 may generate a bitstream including information on the target block.
The information about the target block may include coding decision information of the representative channel. Further, the bitstream and the information on the target block may not include coding decision information of the target channel.
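For illustration only, the flow of steps 3210 to 3230 may be sketched as follows; the callables decide, encode, and pack are hypothetical stand-ins for the encoder stages, not interfaces of the encoding apparatus 1600.

def encode_target_block(target_block, decide, encode, pack):
    # Step 3210: determine the coding decision information of the
    # representative channel of the target block.
    info = decide(target_block)
    # Step 3220: encode the target block using that information.
    data = encode(target_block, info)
    # Step 3230: the bitstream carries the representative channel's
    # information only; no coding decision information of the target channel.
    return pack(data, info)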
The embodiment described with reference to fig. 32 may be combined with the other embodiments described above. Duplicate description will be omitted here.
Fig. 33 is a flowchart of a decoding method according to an embodiment.
The decoding method according to the present embodiment may be performed by the decoding apparatus 1700.
In step 3310, the communication unit 1710 may receive a bitstream including information on the target block.
The information about the target block may include coding decision information of the representative channel. Further, the bitstream and the information on the target block may not include coding decision information of the target channel.
In step 3320, the processing unit 1720 may share the encoding decision information of the representative channel of the target block as the encoding decision information of the target channel of the target block.
The coding decision information of the representative channel may be shared as the coding decision information of the target channel.
In step 3330, the processing unit 1720 may perform decoding on the target block using the coding decision information of the target channel.
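A corresponding decoder-side sketch under the same assumptions (hypothetical stand-in functions, not interfaces of the decoding apparatus 1700):

def decode_target_block(bitstream_info, decode):
    # Step 3310: the parsed information contains the representative
    # channel's coding decision information only.
    representative_info = bitstream_info["representative"]
    # Step 3320: share it as the target channel's coding decision
    # information; nothing further is signaled for the target channel.
    target_info = representative_info
    # Step 3330: decode the target block with the shared information.
    return decode(target_info)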
The embodiment described with reference to fig. 33 may be combined with the further embodiments described above. Duplicate description will be omitted here.
The above-described embodiments may be performed using the same method in the encoding apparatus 1600 and the decoding apparatus 1700.
The order of steps, operations, and processes to be applied in the embodiments may be different from each other in the encoding apparatus 1600 and the decoding apparatus 1700. Alternatively, the order of steps, operations, and processes to be applied in the embodiments may be the same as each other in the encoding apparatus 1600 and the decoding apparatus 1700.
The embodiments may be performed separately for a luminance signal and a chrominance signal. Alternatively, the embodiment may be equally performed on the luminance signal and the chrominance signal.
The form of each block to which the embodiment will be applied may be a square form or a non-square form.
Whether to apply the embodiments may be determined based on the size of at least one of the CU, the PU, the TU, and the target block. Here, the size may be defined as a minimum size and/or a maximum size enabling the embodiments to be applied to the target, or may be defined as a fixed size for which the embodiments are applied to the target.
Further, a first embodiment may be applied at a first size, and a second embodiment may be applied at a second size. That is, the embodiments may be compositely applied according to the size of the target. Further, the embodiments may be applied only to the case where the size of the target is equal to or greater than the minimum size and less than or equal to the maximum size. That is, the embodiments may be applied only to the case where the size of the target falls within a specific range.
For example, the embodiment can be applied only to the case where the size of the target block is equal to or larger than 8 × 8. For example, the embodiment can be applied only to the case where the size of the target block is 4 × 4. For example, the embodiments may be applied only to the case where the size of the target block is less than or equal to 16 × 16. For example, the embodiments can be applied only to the case where the size of the target block is equal to or larger than 16 × 16 and smaller than or equal to 64 × 64.
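For illustration, such a size-range condition may be checked as in the sketch below; the bounds 16 and 64 follow the last example above and are not normative.

def embodiment_applies(width, height, min_size=16, max_size=64):
    # Apply the embodiment only when the target block's size falls within
    # the range [min_size x min_size, max_size x max_size].
    return min(width, height) >= min_size and max(width, height) <= max_size

assert embodiment_applies(32, 32)
assert not embodiment_applies(8, 8)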
Whether to apply the embodiments may be determined according to a temporal layer. To identify the temporal layer to which an embodiment is to be applied, a separate identifier may be signaled. Embodiments may be selectively applied to temporal layers specified by identifiers. Here, such an identifier may indicate the lowest and/or highest layer to which the embodiment is to be applied, and may also indicate a particular layer to which the embodiment is to be applied. Further, the temporal layer to which the embodiment is to be applied may be predefined.
For example, the embodiment may be applied only to a case where the temporal layer of the target image is the lowest layer. For example, the embodiments may be applied only to the case where the temporal layer identifier of the target image is equal to or greater than 1. For example, the embodiment may be applied only to a case where the temporal layer of the target image is the highest layer.
A slice type to which the embodiments are to be applied may be defined, and the embodiments may be selectively applied depending on the slice type.
In the above-described embodiments, although the methods have been described based on flowcharts as a series of steps or units, the present disclosure is not limited to the order of the steps, and some steps may be performed in an order different from that described, or simultaneously with other steps. Furthermore, those skilled in the art will understand that the steps shown in the flowcharts are not exclusive, that other steps may be included, and that one or more steps in the flowcharts may be deleted without departing from the scope of the present disclosure.
The above-described embodiments according to the present disclosure may be implemented as programs that can be executed by various computer devices, and may be recorded on a computer-readable storage medium. Computer readable storage media may include program instructions, data files, and data structures, alone or in combination. The program instructions recorded on the storage medium may be specially designed and configured for the present disclosure, or may be known or available to those having ordinary skill in the computer software art.
Computer-readable storage media may include information used in embodiments of the present disclosure. For example, a computer-readable storage medium may include a bitstream, and the bitstream may contain the information described above in the embodiments of the present disclosure.
The computer-readable storage medium may include a non-transitory computer-readable medium.
Examples of the computer-readable storage medium may include all types of hardware devices specially configured to record and execute program instructions, such as magnetic media (such as hard disks, floppy disks, and magnetic tape), optical media (such as compact disk (CD)-ROMs and digital versatile disks (DVDs)), magneto-optical media (such as floptical disks), ROM, RAM, and flash memory. Examples of program instructions include both machine code, such as that created by a compiler, and high-level language code that may be executed by a computer using an interpreter. The hardware devices may be configured to operate as one or more software modules in order to perform the operations of the present disclosure, and vice versa.
As described above, although the present disclosure has been described based on specific details (such as detailed components) and a limited number of embodiments and drawings, these are provided only for easy understanding of the present disclosure; the present disclosure is not limited to these embodiments, and those skilled in the art may make various changes and modifications based on the above description.
Therefore, it is to be understood that the spirit of the present embodiments is not limited to the above-described embodiments, and that the appended claims and their equivalents and modifications fall within the scope of the present disclosure.

Claims (20)

1. A decoding method, comprising:
sharing coding decision information of a representative channel of a target block as coding decision information of a target channel of the target block; and
decoding the target block using the coding decision information of the target channel.
2. The decoding method of claim 1, further comprising: receiving a bitstream including information on a target block,
wherein the information on the target block includes coding decision information of the representative channel, and
wherein the information on the target block does not include coding decision information of the target channel.
3. The decoding method of claim 1, wherein the coding decision information of the representative channel is transform skip information indicating whether a transform is to be skipped.
4. The decoding method of claim 1, wherein the coding decision information of the representative channel is transform selection information indicating which transform is to be used for a transform block of the channel.
5. The decoding method of claim 1, wherein the coding decision information of the representative channel is intra-coding decision information of the representative channel.
6. The decoding method of claim 1, wherein the representative channel and the target channel are channels in a YCbCr color space.
7. The decoding method of claim 1, wherein:
the representative channel is a luminance channel, and
the target channel is a chrominance channel.
8. The decoding method of claim 1, wherein the representative channel is a color channel having a highest correlation with a luminance signal.
9. The decoding method of claim 1, wherein the representative channel is determined by an index in the bitstream indicating the selected representative channel.
10. The decoding method according to claim 1, wherein the sharing operation is performed when image properties of a plurality of channels of the target block are similar to each other.
11. The decoding method of claim 10, wherein when the intra prediction mode of the chroma channel of the target block is a direct mode, the image properties of the plurality of channels are determined to be similar to each other.
12. The decoding method of claim 1, wherein:
when cross-channel prediction is used, the sharing operation is performed, and
whether cross-channel prediction is used is derived based on information obtained from the bitstream.
13. The decoding method of claim 1, wherein:
when cross-channel prediction is used, the sharing operation is performed, and
determining whether to use cross-channel prediction based on an intra prediction mode of a target block.
14. The decoding method of claim 13, wherein, when the intra prediction mode of the target block is one of an INTRA_CCLM mode, an INTRA_MMLM mode, and an INTRA_MFLM mode, the cross-channel prediction is used.
15. The decoding method of claim 1, wherein whether the sharing operation is to be performed is determined based on a size of a target block.
16. The decoding method of claim 1, wherein the coding decision information of the representative channel of the plurality of channels of the target block is used for all of the plurality of channels.
17. An encoding method, comprising:
determining coding decision information of a representative channel of a target block; and
performing encoding on the target block using the coding decision information of the representative channel,
wherein the coding decision information of the representative channel is shared with a further channel of the target block.
18. The encoding method of claim 17, further comprising: generating a bitstream including information on the target block,
wherein the information on the target block includes coding decision information of the representative channel, and
wherein the information on the target block does not include coding decision information of the further channel.
19. The encoding method of claim 17, wherein the representative channel and the further channel are channels in a YCbCr color space.
20. A computer-readable storage medium storing a bitstream for image decoding, the bitstream comprising:
information on a target block,
wherein the information on the target block includes encoding decision information of a representative channel of the target block,
wherein the coding decision information of the representative channel of the target block is used and shared as the coding decision information of the target channel of the target block, and
wherein decoding of the target block is performed using the coding decision information of the target channel.
CN201880088927.1A 2017-12-07 2018-12-07 Method and apparatus for encoding and decoding using selective information sharing between channels Pending CN111699682A (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
KR20170167729 2017-12-07
KR10-2017-0167729 2017-12-07
KR10-2018-0079012 2018-07-06
KR20180079012 2018-07-06
KR20180106479 2018-09-06
KR10-2018-0106479 2018-09-06
KR20180114333 2018-09-21
KR10-2018-0114333 2018-09-21
PCT/KR2018/015573 WO2019112394A1 (en) 2017-12-07 2018-12-07 Method and apparatus for encoding and decoding using selective information sharing between channels

Publications (1)

Publication Number Publication Date
CN111699682A true CN111699682A (en) 2020-09-22

Family

ID=67064587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880088927.1A Pending CN111699682A (en) 2017-12-07 2018-12-07 Method and apparatus for encoding and decoding using selective information sharing between channels

Country Status (2)

Country Link
KR (2) KR20190067732A (en)
CN (1) CN111699682A (en)


Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113475064A (en) * 2019-01-09 2021-10-01 北京达佳互联信息技术有限公司 Video coding and decoding using cross-component linear model
US20220311996A1 (en) * 2019-06-20 2022-09-29 Electronics And Telecommunications Research Institute Method and apparatus for image encoding and image decoding using prediction based on block type
CN114026865A (en) * 2019-06-21 2022-02-08 北京字节跳动网络技术有限公司 Coding and decoding tool for chrominance component
KR20220035154A (en) * 2019-07-21 2022-03-21 엘지전자 주식회사 Image encoding/decoding method, apparatus and method of transmitting bitstream for signaling chroma component prediction information according to whether or not the palette mode is applied
US11930180B2 (en) 2019-08-06 2024-03-12 Hyundai Motor Company Method and apparatus for intra-prediction coding of video data
CN114270842A (en) * 2019-08-27 2022-04-01 现代自动车株式会社 Video encoding and decoding with differential encoding
KR20220050966A (en) 2019-09-25 2022-04-25 엘지전자 주식회사 Transformation-based video coding method and apparatus
JP7418561B2 (en) * 2019-10-04 2024-01-19 エルジー エレクトロニクス インコーポレイティド Video coding method and device based on conversion
KR20220082847A (en) * 2019-10-28 2022-06-17 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Syntax signaling and parsing based on color components
MX2022005495A (en) * 2019-11-11 2022-08-04 Lg Electronics Inc Image coding method based on transform, and device therefor.
AU2019275552B2 (en) * 2019-12-03 2022-10-13 Canon Kabushiki Kaisha Method, apparatus and system for encoding and decoding a coding tree unit
AU2019275553B2 (en) * 2019-12-03 2022-10-06 Canon Kabushiki Kaisha Method, apparatus and system for encoding and decoding a coding tree unit
WO2021230618A1 (en) * 2020-05-11 2021-11-18 엘지전자 주식회사 Image coding method and device therefor
WO2023200155A1 (en) * 2022-04-12 2023-10-19 엘지전자 주식회사 Image encoding/decoding method, method for transmitting bitstream, and recording medium having bitstream stored therein

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116257494A (en) * 2021-04-21 2023-06-13 华为技术有限公司 Method, system and computer equipment for aggregating communication
CN116257494B (en) * 2021-04-21 2023-12-08 华为技术有限公司 Method, system and computer equipment for aggregating communication

Also Published As

Publication number Publication date
KR20240066144A (en) 2024-05-14
KR20190067732A (en) 2019-06-17

Similar Documents

Publication Publication Date Title
US11902553B2 (en) Method and apparatus for encoding and decoding using selective information sharing between channels
CN110463201B (en) Prediction method and apparatus using reference block
CN111699682A (en) Method and apparatus for encoding and decoding using selective information sharing between channels
CN110476425B (en) Prediction method and device based on block form
CN111567045A (en) Method and apparatus for using inter prediction information
US20220078485A1 (en) Bidirectional intra prediction method and apparatus
US20200029077A1 (en) Block form-based prediction method and device
US20220321890A1 (en) Method, apparatus, and recording medium for encoding/decoding image by using geometric partitioning
US20230013063A1 (en) Method and device for encoding/decoding image by using palette mode, and recording medium
US11812013B2 (en) Method, apparatus and storage medium for image encoding/decoding using subpicture
CN111684801A (en) Bidirectional intra prediction method and apparatus
CN114450946A (en) Method, apparatus and recording medium for encoding/decoding image by using geometric partition
US20220201295A1 (en) Method, apparatus and storage medium for image encoding/decoding using prediction
CN111919448A (en) Method and apparatus for image encoding and image decoding using temporal motion information
US20220272321A1 (en) Method, device, and recording medium for encoding/decoding image using reference picture
CN114270865A (en) Method, apparatus and recording medium for encoding/decoding image
US20220295059A1 (en) Method, apparatus, and recording medium for encoding/decoding image by using partitioning
CN114270828A (en) Method and apparatus for image encoding and image decoding using block type-based prediction
US11838506B2 (en) Method, apparatus and storage medium for image encoding/decoding
US11778169B2 (en) Method, apparatus and storage medium for image encoding/decoding using reference picture
US20230342980A1 (en) Method, apparatus, and storage medium for encoding/decoding multi-resolution feature map
US20230082092A1 (en) Transform information encoding/decoding method and device, and bitstream storage medium
US20220311996A1 (en) Method and apparatus for image encoding and image decoding using prediction based on block type
CN114788288A (en) Transform information encoding/decoding method and apparatus, and bit stream storage medium
CN115066895A (en) Method and apparatus for encoding/decoding image by using palette mode, and recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination