CN113841397A - Image encoding and decoding method and device

Info

Publication number: CN113841397A
Authority: CN (China)
Prior art keywords: sub, information, block, pixel, target block
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202080036117.9A
Other languages: Chinese (zh)
Inventors: 沈东圭, 朴时奈, 崔韩松, 朴胜煜, 林和平
Current and original assignees: Hyundai Motor Co; Industry Academic Collaboration Foundation of Kwangwoon University; Kia Corp (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Hyundai Motor Co, Industry Academic Collaboration Foundation of Kwangwoon University, and Kia Corp
Priority claimed from PCT/KR2020/006419 (WO2020231219A1)
Publication of CN113841397A

Classifications

    All classifications fall under H04N19/00 (H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television), methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/119: Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/129: Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/137: Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/176: The coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/182: The coding unit being a pixel
    • H04N19/423: Implementation details or hardware specially adapted for video compression or decompression, characterised by memory arrangements
    • H04N19/70: Syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/82: Details of filtering operations specially adapted for video compression, involving filtering within a prediction loop
    • H04N19/96: Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to encoding and decoding pictures. A picture encoding apparatus partitions each picture into sub-pictures that can be displayed independently and signals layout information on the sub-pictures, and a picture decoding apparatus identifies each sub-picture through the layout information and decodes it.

Description

Image encoding and decoding method and device
Technical Field
The present invention relates to encoding and decoding of video, and more particularly, to partitioning each image into sub-images capable of being independently displayed and encoding and decoding each sub-image.
Background
Since the amount of video data is larger than the amount of voice data or the amount of still image data, storing or transmitting video data without compression processing requires a large amount of hardware resources including a memory.
Accordingly, when video data is stored or transmitted, it is generally compressed using an encoder to facilitate storage or transmission. A decoder then receives the compressed video data, decompresses it, and reproduces the video. Compression techniques for such video include H.264/AVC and High Efficiency Video Coding (HEVC), which improves on the coding efficiency of H.264/AVC by approximately 40%.
However, the size, resolution, and frame rate of video images are gradually increasing, and thus the amount of data to be encoded is also increasing. Therefore, a new compression technique having better coding efficiency and higher picture quality than the existing compression technique is required.
Further, due to the advent of various applications such as 360 video, a technique of displaying not only the entire area of a decoded image but also a partial area of the image is required.
Disclosure of Invention
Technical problem
The present invention relates to a technique for partitioning each image into sub-images that can be displayed independently of each other, and a technique for encoding and decoding each sub-image.
Technical scheme
According to an aspect of the present invention, there is provided a video decoding method for decoding a bitstream containing an encoded sequence of pictures, each picture being partitioned into a plurality of sub-pictures each comprising a plurality of coding tree blocks. The method comprises: decoding, from the bitstream, layout information on the sub-pictures into which the pictures included in the sequence are partitioned; determining a target block by partitioning, in a tree structure, a coding tree block to be decoded within one of the sub-pictures identified by the layout information; decoding, from the bitstream, prediction information for predicting the target block and information on a residual signal of the target block; predicting pixels in the target block based on the prediction information to generate a prediction block; generating a residual block of the target block based on the information on the residual signal; and reconstructing the target block by adding the prediction block to the residual block.
According to another aspect of the present invention, there is provided a video decoding apparatus for decoding a bitstream containing an encoded sequence of pictures, each picture being partitioned into a plurality of sub-pictures each comprising a plurality of coding tree blocks. The apparatus comprises: a decoder configured to decode, from the bitstream, layout information on the sub-pictures into which the pictures included in the sequence are partitioned, to determine a target block by partitioning, in a tree structure, a coding tree block to be decoded within one of the sub-pictures identified by the layout information, and to decode, from the bitstream, prediction information for predicting the target block and information on a residual signal of the target block; a predictor configured to predict pixels in the target block based on the prediction information and generate a prediction block; a residual reconstructor configured to generate a residual block of the target block based on the information on the residual signal; and an adder configured to reconstruct the target block by adding the prediction block to the residual block.
Herein, the sub-images constituting the image are units that can be displayed independently of each other.
Drawings
FIG. 1 is an exemplary block diagram of a video encoding device capable of implementing the techniques of this disclosure.
Fig. 2 is a schematic diagram showing block segmentation using the QTBTTT structure.
Fig. 3 is a diagram illustrating a plurality of intra prediction modes.
Fig. 4 is an exemplary block diagram of a video decoding device capable of implementing the techniques of this disclosure.
Fig. 5 is an exemplary diagram showing the structure of a bitstream.
Fig. 6 is an exemplary diagram illustrating the structure of a NAL unit containing a third parameter set.
Fig. 7 is an exemplary diagram showing the layout of sub-images constituting each image.
Fig. 8 is another exemplary diagram showing the layout of sub-images constituting each image.
Fig. 9 is an exemplary diagram illustrating a method of processing sub-images overlapping each other.
Detailed Description
Hereinafter, some embodiments of the present invention will be described in detail with reference to the accompanying drawings. It should be noted that, when reference numerals are assigned to constituent elements in the respective drawings, the same reference numerals denote the same elements although the elements are shown in different drawings. Further, in the following description of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted to avoid obscuring the subject matter of the present invention.
FIG. 1 is an exemplary block diagram of a video encoding device capable of implementing the techniques of this disclosure. Hereinafter, a video encoding apparatus and elements of the apparatus will be described with reference to fig. 1.
The video encoding device includes: a block divider 110, a predictor 120, a subtractor 130, a transformer 140, a quantizer 145, a recombiner 150, an entropy coder 155, an inverse quantizer 160, an inverse transformer 165, an adder 170, a loop filtering unit 180, and a memory 190.
Each element of the video encoding apparatus may be implemented in hardware or software or a combination of hardware and software. The functions of the respective elements may be implemented as software, and the microprocessor may be implemented to perform the software functions corresponding to the respective elements.
One video includes a plurality of images. Each image is divided into a plurality of regions, and encoding is performed on each region. For example, an image is segmented into one or more tiles and/or slices. Here, one or more tiles may be defined as a tile group. Each tile or slice is partitioned into one or more Coding Tree Units (CTUs). Each CTU is divided into one or more Coding Units (CUs) by a tree structure. Information applied to each CU is encoded as syntax of the CU, and information commonly applied to the CUs included in one CTU is encoded as syntax of the CTU. In addition, information commonly applied to all blocks in one slice is encoded as syntax of a slice header, and information applied to all blocks constituting one or more pictures is encoded in a Picture Parameter Set (PPS) or a picture header. Further, information commonly referred to by a sequence composed of a plurality of pictures is encoded in a Sequence Parameter Set (SPS). Information commonly applied to one tile or tile group may be encoded as syntax of a tile header or tile group header.
The block partitioner 110 determines the size of a Coding Tree Unit (CTU). Information on the size of the CTU (CTU size) is encoded into the syntax of the SPS or PPS and transmitted to the video decoding apparatus.
The block divider 110 divides each picture constituting a video into a plurality of CTUs having a predetermined size, and then recursively divides the CTUs using a tree structure. In the tree structure, leaf nodes serve as Coding Units (CUs), which are basic units of coding.
The tree structure may be a QuadTree (QT), in which a node (or parent node) is split into four child nodes of the same size; a Binary Tree (BT), in which a node is split into two child nodes; a Ternary Tree (TT), in which a node is split into three child nodes at a ratio of 1:2:1; or a structure formed by combining two or more of the QT, BT, and TT structures. For example, a QuadTree plus BinaryTree (QTBT) structure may be used, or a QuadTree plus BinaryTree TernaryTree (QTBTTT) structure may be used. Here, BT and TT may be collectively referred to as a multiple-type tree (MTT).
Fig. 2 exemplarily shows a QTBTTT split tree structure. As shown in fig. 2, the CTU may first be partitioned in the QT structure. The QT splitting may be repeated until the size of the split block reaches the minimum block size (MinQTSize) allowed for a leaf node in the QT. A first flag (QT_split_flag) indicating whether each node of the QT structure is split into four nodes of a lower layer is encoded by the entropy encoder 155 and signaled to the video decoding apparatus. When a leaf node of the QT is not larger than the maximum block size (MaxBTSize) allowed for a root node in the BT, it may be further partitioned into one or more BT structures or TT structures. The BT structure and/or the TT structure may have a plurality of splitting directions. For example, there may be two directions, namely, splitting the block of a node horizontally and splitting it vertically. As shown in fig. 2, when MTT splitting starts, a second flag (MTT_split_flag) indicating whether a node is split, a flag indicating the splitting direction (vertical or horizontal) in the case of splitting, and/or a flag indicating the splitting type (binary or ternary) are encoded by the entropy encoder 155 and signaled to the video decoding apparatus. Alternatively, a CU split flag (split_cu_flag) indicating whether a node is split may be encoded before the first flag (QT_split_flag) indicating whether each node is split into four nodes of a lower layer. When the value of the CU split flag (split_cu_flag) indicates that no splitting is performed, the block of the node becomes a leaf node in the split tree structure and is used as a Coding Unit (CU), which is the basic unit of coding. When the value of the CU split flag (split_cu_flag) indicates that splitting is performed, the video encoding apparatus starts encoding the flags from the first flag in the above-described manner.
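The flag-driven recursion above can be made concrete with a short sketch. This is a minimal illustration and not the patent's normative syntax: the BitReader stub and the exact flag names (split_cu_flag, qt_split_flag, mtt_split_vertical_flag, mtt_split_ternary_flag) are assumptions modeled on the flags described in this paragraph.

```python
class BitReader:
    """Toy stand-in for an entropy decoder: replays a recorded list of 0/1 flags."""
    def __init__(self, flags):
        self._flags = iter(flags)

    def read_flag(self, name):
        return next(self._flags)


def parse_coding_tree(rd, x, y, w, h, leaves):
    """Recursively split a region until leaf CUs are reached (QTBTTT-style)."""
    if not rd.read_flag("split_cu_flag"):          # no split: block becomes a CU
        leaves.append((x, y, w, h))
        return
    if rd.read_flag("qt_split_flag"):              # QT: four equal squares
        hw, hh = w // 2, h // 2
        for dx, dy in ((0, 0), (hw, 0), (0, hh), (hw, hh)):
            parse_coding_tree(rd, x + dx, y + dy, hw, hh, leaves)
        return
    vertical = rd.read_flag("mtt_split_vertical_flag")   # split direction
    ternary = rd.read_flag("mtt_split_ternary_flag")     # binary or ternary
    span = w if vertical else h
    sizes = [span // 4, span // 2, span // 4] if ternary else [span // 2] * 2
    off = 0                                         # TT splits at a 1:2:1 ratio
    for s in sizes:
        if vertical:
            parse_coding_tree(rd, x + off, y, s, h, leaves)
        else:
            parse_coding_tree(rd, x, y + off, w, s, leaves)
        off += s


# One QT split of a 128x128 CTU, then no further splits -> four 64x64 CUs.
leaves = []
parse_coding_tree(BitReader([1, 1, 0, 0, 0, 0]), 0, 0, 128, 128, leaves)
assert len(leaves) == 4
```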
When QTBT is used as another example of the tree structure, there may be two split types, that is, a type in which a block is horizontally split into two blocks of the same size (i.e., symmetric horizontal splitting) and a type in which a block is vertically split into two blocks of the same size (i.e., symmetric vertical splitting). A split flag (split_flag) indicating whether each node of the BT structure is split into blocks of a lower layer and split type information indicating the split type are encoded by the entropy encoder 155 and transmitted to the video decoding apparatus. There may be an additional type in which the block of a node is split into two asymmetric blocks. The asymmetric split type may include a type in which a block is split into two rectangular blocks at a size ratio of 1:3, or a type in which the block of a node is split diagonally.
CUs may have various sizes according to QTBT or QTBTTT partitioning of CTUs. Hereinafter, a block corresponding to a CU to be encoded or decoded (i.e., a leaf node of the QTBTTT) is referred to as a "current block". When QTBTTT partitioning is employed, the shape of the current block may be square or rectangular.
The predictor 120 predicts the current block to generate a prediction block. The predictor 120 includes an intra predictor 122 and an inter predictor 124.
The intra predictor 122 predicts pixels in the current block using pixels (reference pixels) located around the current block in the current picture containing the current block. There are multiple intra prediction modes according to the prediction direction. For example, as shown in fig. 3, the plurality of intra prediction modes may include two non-directional modes, namely a planar mode and a DC mode, and 65 directional modes. The neighboring pixels to be used and the prediction equation are defined differently for each prediction mode.
The intra predictor 122 may determine the intra prediction mode to be used for encoding the current block. In some examples, the intra predictor 122 may encode the current block using several intra prediction modes and select an appropriate intra prediction mode from the tested modes. For example, the intra predictor 122 may calculate rate-distortion values using rate-distortion analysis of the several tested intra prediction modes and select the intra prediction mode with the best rate-distortion characteristics among them.
The intra predictor 122 selects one intra prediction mode from among a plurality of intra prediction modes, and predicts the current block using neighboring pixels (reference pixels) determined according to the selected intra prediction mode and an equation. The information on the selected intra prediction mode is encoded by the entropy encoder 155 and transmitted to the video decoding apparatus.
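As a concrete illustration of one of the non-directional modes named above, the sketch below fills a block with the mean of its top and left reference pixels, i.e., DC mode. This is an assumption-level example, not the codec's exact derivation, which may additionally involve reference-pixel filtering and boundary handling.

```python
import numpy as np

def intra_dc_predict(top_refs, left_refs, size):
    """DC mode sketch: predict every pixel of a size x size block as the
    rounded mean of the reconstructed reference pixels above and to the left."""
    total = int(np.sum(top_refs)) + int(np.sum(left_refs))
    count = len(top_refs) + len(left_refs)
    dc = (total + count // 2) // count          # integer rounding
    return np.full((size, size), dc, dtype=np.int32)

# Example: a 4x4 block whose neighbors average 102 predicts a flat block of 102.
pred = intra_dc_predict(np.array([100] * 4), np.array([104] * 4), 4)
assert pred[0, 0] == 102
```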
The inter predictor 124 generates a prediction block of the current block through motion compensation. The inter predictor 124 searches for the block most similar to the current block in a reference picture that was encoded and decoded earlier than the current picture, and generates a prediction block of the current block using the found block. It then generates a motion vector corresponding to the displacement between the current block in the current picture and the prediction block in the reference picture. In general, motion estimation is performed on the luminance component, and the motion vector calculated based on the luminance component is used for both the luminance component and the chrominance components. Motion information including information on the reference picture and information on the motion vector used to predict the current block is encoded by the entropy encoder 155 and transmitted to the video decoding apparatus. The inter predictor 124 may perform interpolation on the reference picture or reference block to increase the accuracy of prediction. That is, sub-pixels between two consecutive integer pixels are interpolated by applying filter coefficients to a plurality of consecutive integer pixels including those two pixels. When the search for the block most similar to the current block is performed on the interpolated reference picture, the motion vector can be expressed with fractional precision rather than integer-pixel precision. The precision or resolution of the motion vector may be set differently for each unit of the target area to be encoded, e.g., a slice, a tile, a CTU, or a CU.
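The sub-pixel interpolation just described can be sketched as follows. The 6-tap filter here is the H.264-style half-pel filter, used purely as an illustration; the actual filter taps and lengths depend on the codec in use.

```python
import numpy as np

HALF_PEL_TAPS = np.array([1, -5, 20, 20, -5, 1])   # H.264-style 6-tap, sum 32

def half_pel(row, i):
    """Interpolate the half-pel sample between integer pixels row[i] and
    row[i+1]; indices are clamped so out-of-range taps reuse edge pixels."""
    taps = [int(row[min(max(i + k, 0), len(row) - 1)]) for k in range(-2, 4)]
    val = int(np.dot(HALF_PEL_TAPS, taps))
    return int(np.clip((val + 16) >> 5, 0, 255))   # round, divide by 32, clip

# Example: between two pixels of a flat row the half-pel sample is unchanged.
assert half_pel(np.array([80, 80, 80, 80, 80, 80]), 2) == 80
```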
The subtractor 130 subtracts the prediction block generated by the intra predictor 122 or the inter predictor 124 from the current block to generate a residual block.
The transformer 140 may divide the residual block into one or more sub-blocks and apply the transform to the one or more sub-blocks, thereby transforming the residual values from the pixel domain to the frequency domain. In the frequency domain, the transformed block is referred to as a coefficient block or a transform block containing one or more transform coefficient values. A two-dimensional transform kernel may be used for the transform, and one-dimensional transform kernels may be used for the horizontal transform and the vertical transform, respectively. The transform kernels may be based on the Discrete Cosine Transform (DCT), the Discrete Sine Transform (DST), or the like.
The transformer 140 may transform the residual signal in the residual block using the entire size of the residual block as a transform unit. Alternatively, the residual block may be partitioned into a plurality of sub-blocks, and the residual signals in the sub-blocks may be transformed using the sub-blocks as a transform unit.
The transformer 140 may transform the residual block separately in the horizontal direction and the vertical direction. Various types of transform functions or transform matrices may be used for the transform. For example, a pair of transform functions for the horizontal and vertical transforms may be defined as a Multiple Transform Set (MTS). The transformer 140 may select the pair of transform functions with the best transform efficiency in the MTS and transform the residual block in the horizontal and vertical directions, respectively. Information (MTS_idx) on the pair of transform functions selected from the MTS is encoded by the entropy encoder 155 and signaled to the video decoding apparatus.
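The per-direction kernel choice above amounts to a separable transform. Below is a minimal sketch assuming orthonormal DCT-II kernels; under MTS, a different kernel could be swapped in for each direction.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II basis matrix (one candidate transform kernel)."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0, :] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

def separable_transform(residual, v_kernel, h_kernel):
    """Apply the vertical kernel along columns and the horizontal kernel
    along rows, mirroring the per-direction (MTS) selection described above."""
    return v_kernel @ residual @ h_kernel.T

# Round trip: orthonormal kernels invert by transposition.
d = dct2_matrix(4)
block = np.arange(16, dtype=float).reshape(4, 4)
coeffs = separable_transform(block, d, d)
assert np.allclose(d.T @ coeffs @ d, block)
```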
The quantizer 145 quantizes the transform coefficient output from the transformer 140 using the quantization parameter, and outputs the quantized transform coefficient to the entropy encoder 155. For some blocks or frames, the quantizer 145 may quantize the associated residual block directly without transformation. The quantizer 145 may apply different quantization coefficients (scaling values) according to the positions of transform coefficients in the transform block. A quantized coefficient matrix applied to quantized transform coefficients arranged in two dimensions may be encoded and signaled to a video decoding apparatus.
The recombiner 150 may recombine the coefficient values of the quantized residual. The recombiner 150 may change the 2-dimensional coefficient array into a 1-dimensional coefficient sequence through coefficient scanning. For example, the recombiner 150 may scan the coefficients from the DC coefficient to the coefficients in the high-frequency region using a zig-zag scan or a diagonal scan to output a 1-dimensional sequence of coefficients. Depending on the size of the transform unit and the intra prediction mode, a vertical scan, which scans the two-dimensional coefficient array in the column direction, or a horizontal scan, which scans the coefficients in the row direction, may be used instead of the zig-zag scan. That is, the scan mode to be used may be determined among the zig-zag scan, diagonal scan, vertical scan, and horizontal scan according to the size of the transform unit and the intra prediction mode.
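A sketch of one such scan follows: a diagonal (up-right) scan that linearizes a square coefficient block starting from the DC coefficient. The exact scan geometry differs between codecs; this only illustrates the 2-D to 1-D reordering.

```python
def diagonal_scan(block):
    """Linearize a square 2-D coefficient array anti-diagonal by
    anti-diagonal, starting at the DC coefficient in the top-left corner."""
    n = len(block)
    out = []
    for s in range(2 * n - 1):                         # s = row + col
        for r in range(min(s, n - 1), max(0, s - n + 1) - 1, -1):
            out.append(block[r][s - r])                # bottom-left to top-right
    return out

# 2x2 example: DC first, then the first anti-diagonal, then the corner.
assert diagonal_scan([[1, 2], [3, 4]]) == [1, 3, 2, 4]
```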
The entropy encoder 155 encodes the one-dimensional quantized transform coefficients output from the recombiner 150 using various encoding techniques, such as Context-based Adaptive Binary Arithmetic Coding (CABAC) and exponential Golomb coding, to generate a bitstream.
The entropy encoder 155 encodes information related to block division (e.g., CTU size, CU division flag, QT division flag, MTT division type, and MTT division direction) so that the video decoding apparatus can divide blocks in the same manner as the video encoding apparatus. In addition, the entropy encoder 155 encodes information on a prediction type indicating whether the current block is intra prediction encoded or inter prediction encoded, and encodes intra prediction information (i.e., information on an intra prediction mode) or inter prediction information (i.e., information on a reference picture index and a motion vector) according to the prediction type. In addition, the entropy encoder 155 encodes information related to quantization, that is, information on a quantization parameter and information on a quantization matrix.
The inverse quantizer 160 inversely quantizes the quantized transform coefficient output from the quantizer 145 to generate a transform coefficient. The inverse transformer 165 transforms the transform coefficients output from the inverse quantizer 160 from the frequency domain to the spatial domain and reconstructs a residual block.
The adder 170 adds the reconstructed residual block to the prediction block generated by the predictor 120 to reconstruct the current block. The pixels in the reconstructed current block are used as reference pixels when performing intra prediction of a subsequent block.
The loop filtering unit 180 filters the reconstructed pixels to reduce blocking artifacts, ringing artifacts, and blurring artifacts generated due to block-based prediction and transform/quantization. The loop filtering unit 180 may include one or more of a deblocking filter 182, a Sample Adaptive Offset (SAO) filter 184, or an Adaptive Loop Filter (ALF) 186.
The deblocking filter 182 filters the boundaries between reconstructed blocks to remove block artifacts caused by block-wise encoding/decoding, and the SAO filter 184 performs additional filtering on the deblocking-filtered video. The SAO filter 184 is a filter for compensating for the difference between reconstructed pixels and original pixels caused by lossy encoding, and performs filtering in such a manner that a corresponding offset is added to each reconstructed pixel. The ALF 186 filters a target pixel by applying filter coefficients to the target pixel and its neighboring pixels. The ALF 186 may divide the pixels included in an image into predetermined groups and determine one filter to be applied to each group, thereby performing filtering differently for each group. Information on the filter coefficients to be used for the ALF may be encoded and signaled to the video decoding apparatus.
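A sketch of the per-pixel filtering just described, assuming a hypothetical 5-tap cross-shaped filter; real ALF shapes and coefficient counts are signaled as described later in this document.

```python
import numpy as np

def alf_filter_pixel(img, y, x, coeffs):
    """Filter one target pixel as a weighted sum of itself and its four
    cross-shaped neighbors; indices are clamped at the picture boundary."""
    h, w = img.shape
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]   # center + N/S/W/E
    acc = 0.0
    for c, (dy, dx) in zip(coeffs, offsets):
        acc += c * img[min(max(y + dy, 0), h - 1), min(max(x + dx, 0), w - 1)]
    return acc

# Identity filter: weight 1 on the center leaves the pixel unchanged.
img = np.arange(9.0).reshape(3, 3)
assert alf_filter_pixel(img, 1, 1, [1.0, 0, 0, 0, 0]) == img[1, 1]
```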
The reconstructed block filtered by the loop filtering unit 180 is stored in the memory 190. Once all blocks in a picture are reconstructed, the reconstructed picture can be used as a reference picture for inter-predicting blocks in subsequent pictures to be encoded.
Fig. 4 is an exemplary functional block diagram of a video decoding device capable of implementing the techniques of this disclosure. Hereinafter, a video decoding apparatus and elements of the apparatus will be described with reference to fig. 4.
The video decoding apparatus may include: an entropy decoder 410, a recombiner 415, an inverse quantizer 420, an inverse transformer 430, a predictor 440, an adder 450, a loop filtering unit 460, and a memory 470.
Similar to the video encoding apparatus of fig. 1, each element of the video decoding apparatus may be implemented in hardware, software, or a combination of hardware and software. Further, the function of each element may be implemented as software, and the microprocessor may be implemented to execute the function of the software corresponding to each element.
The entropy decoder 410 determines a current block to be decoded by decoding a bitstream generated by a video encoding apparatus and extracting information related to block division, and extracts prediction information required for reconstructing the current block, information on a residual signal, and the like.
The entropy decoder 410 extracts information on the CTU size from a Sequence Parameter Set (SPS) or a Picture Parameter Set (PPS), determines the size of the CTU, and partitions the picture into CTUs of the determined size. Then, the decoder determines the CTU as the highest layer of the tree structure, that is, a root node of the tree structure, and extracts the partitioning information about the CTU to partition the CTU using the tree structure.
For example, when the CTU is split using the QTBTTT structure, a first flag (QT_split_flag) related to QT splitting is extracted to split each node into four nodes of a lower layer. For a node corresponding to a leaf node of the QT, a second flag (MTT_split_flag) related to MTT splitting and information on the splitting direction (vertical/horizontal) and/or the splitting type (binary/ternary) are extracted, so that the corresponding leaf node is split in the MTT structure. In this way, each node below the leaf node of the QT is recursively split in a BT or TT structure.
As another example, when the CTU is split using the QTBTTT structure, a CU split flag (split_cu_flag) indicating whether the CU is split may be extracted first. When the corresponding block is split, the first flag (QT_split_flag) may be extracted. In the splitting process, zero or more recursive QT splits may be followed by zero or more recursive MTT splits for each node. For example, the CTU may undergo MTT splitting directly without QT splitting, or may undergo only QT splitting multiple times.
As another example, when the CTU is split using the QTBT structure, the first flag (QT_split_flag) related to QT splitting is extracted, and each node is split into four nodes of a lower layer. Then, a split flag (split_flag) indicating whether a node corresponding to a leaf node of the QT is further split in the BT and splitting direction information are extracted.
Once the current block to be decoded is determined through tree structure division, the entropy decoder 410 extracts information on a prediction type indicating whether the current block is intra-predicted or inter-predicted. When the prediction type information indicates intra prediction, the entropy decoder 410 extracts a syntax element of intra prediction information (intra prediction mode) of the current block. When the prediction type information indicates inter prediction, the entropy decoder 410 extracts syntax elements for the inter prediction information, that is, information indicating a motion vector and a reference picture referred to by the motion vector.
The entropy decoder 410 also extracts information regarding transform coefficients of the quantized current block as quantization-related information and information regarding a residual signal.
The recombiner 415 may change the sequence of one-dimensional quantized transform coefficients entropy-decoded by the entropy decoder 410 back into a 2-dimensional coefficient array (i.e., a block) in the reverse order of the coefficient scanning performed by the video encoding apparatus.
The inverse quantizer 420 inversely quantizes the quantized transform coefficients using the quantization parameter. The inverse quantizer 420 may apply different quantization coefficients (scaling values) to the quantized transform coefficients arranged in two dimensions. The inverse quantizer 420 may perform inverse quantization by applying a quantization coefficient (scaling value) matrix from the video encoding apparatus to the 2-dimensional array of quantized transform coefficients.
The inverse transformer 430 inverse-transforms the inverse-quantized transform coefficients from the frequency domain to the spatial domain to reconstruct the residual signal, thereby generating a reconstructed residual block of the current block. In addition, when the MTS is applied, the inverse transformer 430 determines the transform function or transform matrix to be applied in the horizontal and vertical directions, respectively, using the MTS information (MTS_idx) signaled from the video encoding apparatus, and inverse-transforms the transform coefficients in the transform block in the horizontal and vertical directions using the determined transform functions.
The predictor 440 may include an intra predictor 442 and an inter predictor 444. The intra predictor 442 is activated when the prediction type of the current block is intra prediction, and the inter predictor 444 is activated when the prediction type of the current block is inter prediction.
The intra predictor 442 determines an intra prediction mode of the current block among a plurality of intra prediction modes based on syntax elements of the intra prediction modes extracted from the entropy decoder 410, and predicts the current block using reference pixels around the current block according to the intra prediction mode.
The inter predictor 444 determines the motion vector of the current block and the reference picture referred to by the motion vector using the syntax elements for the inter prediction information extracted by the entropy decoder 410, and predicts the current block based on the motion vector and the reference picture.
The adder 450 reconstructs the current block by adding the residual block output from the inverse transformer and the prediction block output from the inter predictor or the intra predictor. When intra-predicting a block to be subsequently decoded, pixels in the reconstructed current block are used as reference pixels.
Loop filtering unit 460 may include at least one of a deblocking filter 462, an SAO filter 464, or an ALF 466. The deblocking filter 462 filters the boundaries between reconstructed blocks to remove block artifacts caused by block-by-block decoding. The SAO filter 464 performs filtering in such a manner that a corresponding offset is added to each reconstructed pixel after deblocking filtering, in order to compensate for the difference between the reconstructed pixel and the original pixel caused by lossy coding. The ALF 466 filters a target pixel by applying filter coefficients to the target pixel and its neighboring pixels. The ALF 466 may divide the pixels in an image into predetermined groups and determine one filter to be applied to each group, thereby performing filtering differently for each group. The filter coefficients of the ALF are determined based on information on the filter coefficients decoded from the bitstream.
The reconstructed block filtered by the loop filtering unit 460 is stored in the memory 470. When all blocks in a picture are reconstructed, the reconstructed picture is used as a reference picture for inter prediction of blocks in pictures to be subsequently encoded.
As described above, the video encoding apparatus transmits a bitstream containing encoded data regarding a video, and the video decoding apparatus decodes the bitstream to reconstruct each image constituting the video.
According to an aspect of the present invention, a bitstream may be composed of a plurality of transport units, that is, Network Abstraction Layer (NAL) units. As shown in fig. 5, a NAL unit includes a NAL unit header and the data carried by the NAL unit. To align the size of the NAL unit in bytes, padding bits may be appended to the end of the data constituting the NAL unit. The NAL unit header includes a NAL unit type indicating the type of data carried by the NAL unit and the temporal layer ID of the NAL unit.
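A sketch of reading the two header fields named above from a NAL unit. The bit widths and positions here are illustrative assumptions, not the normative layout of any particular standard.

```python
def parse_nal_header(header: bytes):
    """Extract the NAL unit type and temporal layer ID from a 2-byte header.
    Assumed layout (illustrative only): six bits of the first byte hold
    nal_unit_type; the low three bits of the second byte hold temporal_id_plus1."""
    nal_unit_type = (header[0] >> 1) & 0x3F
    temporal_id = (header[1] & 0x07) - 1     # signaled as temporal_id_plus1
    return nal_unit_type, temporal_id

# Example: type 16 with temporal ID 0 under the assumed layout.
assert parse_nal_header(bytes([16 << 1, 0x01])) == (16, 0)
```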
NAL units may be classified into Video Coding Layer (VCL) types and non-VCL types according to the type of data carried in the data field. A VCL-type NAL unit is a NAL unit whose data field contains encoded image data of a group of pixels, and it generally contains the data of an encoded slice. This means that the unit in which image data is transmitted is a slice. A non-VCL-type NAL unit includes, in its data field, the parameters necessary for decoding the data of groups of pixels. NAL units carrying high-level syntax, such as the SPS (hereinafter referred to as the "first parameter set"), which includes parameters shared at the sequence level, or the PPS (hereinafter referred to as the "second parameter set"), which includes parameters shared at the picture level or above, correspond to non-VCL-type NAL units. Whether a transmitted NAL unit carries a first parameter set, a second parameter set, or an encoded slice is indicated by the NAL unit type contained in the NAL unit header.
Furthermore, NAL units may be further defined that carry a third parameter set, which includes parameters commonly applied to a picture or to a group of pixels smaller than a picture, e.g., one or more slices. In the present invention, the bitstream representing a sequence includes one or more NAL units carrying a third parameter set.
Fig. 6 is an exemplary diagram illustrating the structure of a NAL unit containing a third parameter set.
The third parameter set is contained in a data field of the NAL unit. The third parameter set includes at least one of ID information and parameter type information. The third parameter set further includes parameters corresponding to parameter type information.
Each third parameter set is associated with one of a plurality of parameter types according to a coding tool, and the parameters it carries are associated with that coding tool. As shown in (a) of fig. 6, information indicating the parameter type may be included in the third parameter set. The NAL unit type in the structure of fig. 6 (a) may indicate, by a specific index value, whether the NAL unit relates to a third parameter set; the parameter type is then identified by the parameter type information carried within the third parameter set. Alternatively, as shown in (b) of fig. 6, the parameter type may be indicated by the NAL unit type contained in the NAL unit header. In this case, the NAL unit types are further subdivided according to the types of parameters included in the third parameter set. For example, in the structure of fig. 6 (b), the NAL unit type may take different index values according to the type of the parameters carried in the third parameter set. Based on the value of the NAL unit type, both whether the data contained in the NAL unit is a third parameter set and the parameter type of that set are identified. As another example, as shown in (c) of fig. 6, only the parameter type information, without ID information, may be contained in the data field of the NAL unit. In this case, the ID of the third parameter set is assigned according to the encoding or decoding order. The ID values may be assigned sequentially in decoding order for each parameter type, or sequentially in decoding order regardless of the parameter type.
The parameter types of the third parameter set may include, for example, a loop filter type, a scaling list type, and a prediction information type. When the parameter type information is a loop filtering type, the third parameter set includes information on one or more sets of filter coefficients used for loop filtering. Here, the filter coefficients belonging to each group may be expressed by the absolute values of the filter coefficients and the signs of the filter coefficients. When the parameter type information is a scaling list type, the third parameter set includes scaling values, i.e., coefficients of a quantization matrix used for quantization of each transform coefficient. When the parameter type information is a prediction information type, the third parameter set includes a set of filter coefficients for generating a prediction signal in a specific prediction mode.
The video decoding apparatus decodes a NAL unit associated with a third parameter set and stores the third parameter set in memory. The ID and parameter type of the third parameter set are determined from the decoding of the NAL unit. For each parameter type, m parameter sets (where m is an integer greater than or equal to 1) may be stored in memory, and m may vary depending on the parameter type. When m parameter sets have already been stored in memory, the video decoding apparatus deletes an existing parameter set and stores the newly decoded third parameter set. For example, a parameter set having the same ID and parameter type as the newly decoded third parameter set may be deleted from memory. Alternatively, the oldest stored third parameter set may be deleted. Alternatively, the ID and parameter type information may be extracted from the bitstream, and the third parameter set corresponding to the extracted information may be deleted.
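The buffer behavior just described can be sketched as follows. This is an assumption-level model (class and method names, and the per-type capacity map, are illustrative): at most m sets are kept per parameter type, a newly decoded set replaces an existing one with the same ID and type, and otherwise the oldest stored set of that type is evicted.

```python
from collections import OrderedDict

class ThirdParamSetStore:
    """Keeps at most capacity[ptype] third parameter sets per parameter type."""
    def __init__(self, capacity):
        self.capacity = capacity                  # ptype -> m (may differ)
        self.sets = OrderedDict()                 # (ptype, pid) -> params

    def insert(self, ptype, pid, params):
        key = (ptype, pid)
        if key in self.sets:
            del self.sets[key]                    # same ID and type: replace
        elif sum(1 for t, _ in self.sets if t == ptype) >= self.capacity[ptype]:
            oldest = next(k for k in self.sets if k[0] == ptype)
            del self.sets[oldest]                 # evict oldest of this type
        self.sets[key] = params

    def lookup(self, ptype, pid):
        return self.sets[(ptype, pid)]

# Example: with m=2 for loop-filter sets, a third insert evicts the oldest.
store = ThirdParamSetStore({"loop_filter": 2})
for pid in (0, 1, 2):
    store.insert("loop_filter", pid, {"coeffs": [pid]})
assert ("loop_filter", 0) not in store.sets and len(store.sets) == 2
```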
Due to the advent of various applications such as 360 video, a technique of displaying not only the entire area of a decoded image but also a partial area of the image is required. To support this requirement, another aspect of the present invention provides a method of partitioning an image into a plurality of sub-images and encoding and decoding each sub-image. The present invention enables independent encoding and independent transmission of the sub-images constituting each image. In addition, the data corresponding to each sub-image can be independently extracted or decoded from the entire bitstream. Further, the sub-images into which an image is partitioned may be displayed independently of each other.
Fig. 7 is an exemplary diagram showing the layout of sub-images constituting each image.
To represent the layout of sub-images from partitions of an image, groups of Coding Units (CUs) may be defined. A CU group may be a CTU, a slice, a tile, or a grid of a predefined size. The CU group may be classified into a first CU group that is a basic unit constituting the sub-image and a second CU group that is composed of a plurality of the first CU groups. For example, the first CU group may be a CTU, and the second CU group may be a slice corresponding to a transmission unit as described above. The sub-image may be composed of one or more second CU groups. In the following, it is assumed for simplicity that the first CU group is a CTU and the second CU group is a slice, but it is apparent that the present invention is not limited thereto. For example, the first CU group may be a mesh having a predefined size and the second CU group may be a slice or a tile.
The layout information is represented by the number of sub-images in the image, the ID of each sub-image, and the position and size of each sub-image in the image. Here, the position and size of each sub-picture may be expressed as information for identifying the CTUs constituting each sub-picture.
The CTUs that make up each sub-picture may be identified by the first CTU and the last CTU that make up the sub-picture in raster scan order. For rectangular sub-pictures, the first and last CTUs in raster scan order refer to the CTU at the top left and the CTU at the bottom right in each sub-picture. Accordingly, the information for identifying the CTUs constituting the sub-image may include identification information for identifying the position of the upper left CTU (e.g., coordinate information on the upper left CTU) and identification information for identifying the CTU located at the lower right (e.g., coordinate information on the lower right CTU). Alternatively, the information for identifying the CTUs constituting the sub-image may be expressed by identification information for identifying the position of the upper left CTU, the number of CTUs in the horizontal direction, and the number of CTUs in the vertical direction of the sub-image.
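A sketch of expanding the second form of signaling above (the top-left CTU plus the horizontal and vertical CTU counts) into the set of CTU addresses a sub-picture covers. The function and parameter names are illustrative assumptions.

```python
def subpicture_ctu_addresses(top_left_addr, ctus_wide, ctus_high, pic_ctus_per_row):
    """Return the raster-scan CTU addresses covered by a rectangular
    sub-picture given its top-left CTU address and its size in CTUs."""
    row0, col0 = divmod(top_left_addr, pic_ctus_per_row)
    return [(row0 + r) * pic_ctus_per_row + (col0 + c)
            for r in range(ctus_high) for c in range(ctus_wide)]

# A 2x2-CTU sub-picture at the top-left of a picture 4 CTUs wide.
assert subpicture_ctu_addresses(0, 2, 2, 4) == [0, 1, 4, 5]
```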
Fig. 8 is another exemplary diagram showing the layout of sub-images constituting each image.
In the example of fig. 8, the first CU group, which is the basic unit constituting a sub-image, is a grid in which a plurality of pixels are grouped. A sub-image may be a set of such grids.
The layout information about the sub-images may be represented by a grid. As layout information on sub-images in an image, the size of a grid is first defined. The size of the grid may be defined by the horizontal and vertical length of the grid, or the number of grids in the horizontal and vertical directions of the image.
The layout information on the sub-images includes the number of sub-images in the image, the ID of each sub-image, and identification information for identifying the grids constituting each sub-image. For example, the identification information includes identification information on the first grid and the last grid in raster scan order within the sub-image. In a rectangular sub-image, the first grid corresponds to the upper-left grid in the sub-image, and the last grid corresponds to the lower-right grid. Accordingly, the identification information includes identification information on the upper-left grid and the lower-right grid in the sub-image. Alternatively, the identification information may include identification information on the upper-left grid in the sub-image and information on the number of grids in the horizontal and vertical directions. In one implementation example, the identification information may be the position of the grid. In another implementation example, the identification information may be represented as an ID or address of the grid. Here, the IDs or addresses of the grids in the image may be assigned in ascending order starting from 0 according to a specific scanning order, e.g., raster scan order.
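Under the ID assignment just described (raster-scan order starting from 0), the grids belonging to a rectangular sub-image can be recovered from the two corner IDs. A minimal sketch, with illustrative names:

```python
def grids_in_subimage(top_left_id, bottom_right_id, grids_per_row):
    """List the grid IDs covered by a rectangular sub-image identified by its
    upper-left and lower-right grid IDs (IDs in raster-scan order from 0)."""
    r0, c0 = divmod(top_left_id, grids_per_row)
    r1, c1 = divmod(bottom_right_id, grids_per_row)
    return [r * grids_per_row + c
            for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# Corner IDs 5 and 10 in an image 4 grids wide span a 2x2 sub-image.
assert grids_in_subimage(5, 10, 4) == [5, 6, 9, 10]
```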
The video encoding apparatus encodes partition information, that is, layout information for partitioning each image into sub-images. The layout information may be included in the first parameter set or the second parameter set described above. The video decoding apparatus extracts layout information included in the first parameter set or the second parameter set to identify sub-images constituting each image.
The video encoding apparatus encodes a sub-image ID of each slice as a transmission unit. The video decoding apparatus can determine which sub-image the slice belongs to by extracting the sub-image ID on a per slice basis. In addition, each sub-image is reconstructed by decoding blocks in one or more slices constituting the sub-image by the above-described decoding process.
The sub-pictures should be independently transmitted and encoded/decoded, and should be displayable independently of each other. Accordingly, the video encoding apparatus and the video decoding apparatus of the present invention may perform processing that does not allow reconstructed pixels outside the boundary of a sub-picture to be referenced in the process of encoding or decoding that sub-picture. Here, the boundary of the sub-picture may be identified from the layout information.
In an embodiment, the predictor 120 of the video encoding apparatus and the predictor 440 of the video decoding apparatus predict pixels within a current block from previously reconstructed pixels based on prediction information (inter prediction information or intra prediction information) on the current block. When the pixel position determined based on the prediction information is outside the boundary of the current sub-image, the predictors 120 and 440 predict pixels in the current block based on the substitute pixels instead of the previously reconstructed pixels at the determined positions even in the case where the previously reconstructed pixels exist at the determined positions. Here, the substitute pixel may have a predefined fixed pixel value, or may be a pixel at a predefined position in the current sub-image, e.g. a pixel in contact with a sub-image boundary in the current sub-image.
The process of replacing previously reconstructed pixels outside the boundary of the current sub-image with substitute pixels may be included in various prediction processes. For example, when a previously reconstructed reference pixel around the current block, used for intra prediction of the current block, is outside the boundary of the sub-picture, the reference pixel may be replaced with a predefined fixed pixel value. As another example, when at least some of the integer pixels used for interpolation of sub-pixels in inter prediction are outside the boundary of the current sub-image, integer pixels inside the sub-image that are adjacent to the boundary may be used instead of the integer pixels outside the boundary.
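A sketch of the substitution rule described in this and the preceding paragraph, with illustrative names: a reference position outside the current sub-image is replaced either by a predefined fixed value or by the nearest pixel inside the sub-image boundary.

```python
def reference_pixel(recon, y, x, bounds, fixed_value=None):
    """Fetch a reconstructed pixel for prediction. Positions outside the
    current sub-image (inclusive bounds y0..y1, x0..x1) are replaced by a
    fixed value if given, else by the nearest in-boundary pixel."""
    y0, x0, y1, x1 = bounds
    if y0 <= y <= y1 and x0 <= x <= x1:
        return recon[y][x]                     # normal case: inside sub-image
    if fixed_value is not None:
        return fixed_value                     # predefined fixed pixel value
    return recon[min(max(y, y0), y1)][min(max(x, x0), x1)]   # clamp to border

# A position left of the sub-image boundary clamps to the boundary column.
recon = [[10, 20], [30, 40]]
assert reference_pixel(recon, 0, -3, (0, 0, 1, 1)) == 10
```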
In another implementation, the loop filtering unit 180 of the video encoding device and the loop filtering unit 460 of the video decoding device may perform loop filtering on the target pixel to be filtered by applying a filter coefficient to the target pixel and neighboring pixels around the target pixel in the currently reconstructed block. When at least one of the neighboring pixels is outside the boundary of the current sub-image, the loop filtering units 180 and 460 perform loop filtering on the target pixel based on the substitute pixel instead of the neighboring pixel outside the boundary of the current sub-image. The substitute pixel may have a predefined fixed pixel value or may be a pixel at a predefined location in the current sub-image, e.g. a pixel in the current sub-image and in contact with the boundary of the current sub-image.
The filter coefficients for loop filtering may be determined from the third parameter set described above. For each group of CUs, e.g., each slice, the video encoding device signals ID information for a third parameter set related to the loop filtering type. The video decoding apparatus decodes ID information regarding a third parameter set related to loop filtering from a header of a CU group such as a slice header, and selects the third parameter set corresponding to the decoded ID information from the third parameter set stored in the memory.
ID information on a plurality of third parameter sets related to loop filtering may be signaled on a per-slice basis. To this end, information on the number of third parameter set IDs included in the slice is signaled first, followed by that many pieces of ID information on the third parameter sets. The number of pieces of ID information signaled may vary depending on the color component of the pixels to be filtered, that is, depending on whether the component is luminance or chrominance. For example, for the luminance component, the number of pieces of ID information is signaled first, and then the corresponding pieces of ID information on the third parameter sets are signaled. On the other hand, for the chrominance component, a single third parameter set of the loop filtering type may always be used. Therefore, only one piece of ID information may be signaled, without signaling information on the number of pieces of ID information.
With the third parameter set selected on a per slice basis, the filter coefficients to be applied to the blocks in the slice may be determined on a per CTU basis. All blocks included in one CTU share the same filter coefficient.
As described above, each third parameter set may include one or more filter coefficient sets. One filter coefficient set includes a plurality of filter coefficients, the number of which is determined by the number of pixels used to filter one pixel. For example, when one pixel is filtered using n pixels in total (the filtering target pixel and the pixels around it), one filter coefficient set includes n filter coefficients corresponding to those n pixels.
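One possible in-memory representation of such a third parameter set, with hypothetical field names, is sketched below; it mirrors the structure described here and in claim 9 (ID information, type information, and the parameters of that type).

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class FilterCoeffSet:
        coeffs: List[int]      # n coefficients, one per pixel in the filter shape

    @dataclass
    class ThirdParamSet:
        param_id: int                      # ID information
        param_type: str                    # type information, e.g. "loop_filter"
        filter_sets: List[FilterCoeffSet]  # one or more filter coefficient sets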
The video encoding apparatus signals filter information for loop filtering on a per CTU basis. Based on the signaled filter information, the video decoding apparatus determines, for each CTU, the filter coefficient set to be applied to that CTU.
In an embodiment in which one third parameter set is selected on a per slice basis, filter index information indicating one filter coefficient set from among the filter coefficient sets included in that third parameter set may be signaled on a per CTU basis.
In another embodiment, in which a plurality of third parameter sets are determined on a per slice basis, parameter ID information indicating which of the plurality of third parameter sets is to be used may be signaled on a per CTU basis. In addition, filter index information indicating one filter coefficient set among the filter coefficient sets included in the third parameter set corresponding to the parameter ID information may be additionally signaled. Alternatively, the filter coefficient set to be used, among those included in the third parameter set corresponding to the parameter ID information, may be derived from the characteristics of the target pixels to be filtered in the CTU. For example, a characteristic such as the directionality or activity of a target pixel may be calculated using the target pixel and its surrounding pixels, and a filter coefficient set may be selected for each target pixel according to the calculated characteristic. Such characteristics may be computed by gradient operations on the target pixel and its neighboring pixels, e.g., the pixels within a predetermined region including the target pixel. In this alternative, there is no need to signal filter index information indicating the filter coefficient set to be used: the video encoding apparatus and the video decoding apparatus each calculate the characteristics of the target pixels and select the filter set according to the calculated characteristics.
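A simple sketch of this derivation is given below; the 1-D Laplacian gradients and the mapping from (direction, activity) to a set index are assumptions chosen for illustration, as the text does not fix a particular classification rule. The pixel (x, y) is assumed not to lie on the array border.

    def select_filter_set(recon, x, y, filter_sets):
        gh = abs(2 * recon[y][x] - recon[y][x - 1] - recon[y][x + 1])  # horizontal gradient
        gv = abs(2 * recon[y][x] - recon[y - 1][x] - recon[y + 1][x])  # vertical gradient
        direction = 0 if gh >= gv else 1        # coarse directionality class
        activity = min((gh + gv) // 64, 1)      # coarse activity bucket
        return filter_sets[(2 * direction + activity) % len(filter_sets)]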
In addition to the filter coefficient sets included in the third parameter set, a plurality of predefined filter coefficient sets may further be used. Here, the predefined filter coefficient sets may be the filter coefficient sets used for one or more CTUs decoded immediately before the current CTU in decoding order, or those used for neighboring CTUs at predefined positions adjacent to the current CTU (e.g., the CTUs above and/or to the left of the current CTU). Alternatively, the predefined filter coefficient sets may be fixed preset sets common to all CTUs. Hereinafter, this collection of preset filter coefficient sets, including those used in previously decoded CTUs, is referred to as a "filter set reference list". First, information indicating whether the filter set reference list is used for the current CTU is signaled.
When the filter set reference list is not used, one or more of the parameter ID information and the filter index information for the third parameter set may be signaled to identify the filter coefficient set to be used in the current CTU, and the video decoding apparatus obtains the filter coefficient set to be applied to the current CTU, or to each pixel of the current CTU, based on the signaled information as described above.
When the filter set reference list is used, the filter set to be applied to the current CTU is selected from the filter sets included in the preset filter set reference list. The selection may be made by filter index information signaled from the video encoding apparatus to the video decoding apparatus, or the selection for each pixel may be inferred from the pixel characteristics after calculating the characteristics of each pixel in the current CTU, as described above.
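Reusing the hypothetical Reader and ThirdParamSet types from the sketches above, the per-CTU selection logic might look as follows; the flag and element order are again assumptions of this sketch.

    def select_ctu_filter(reader, ref_list, third_param_sets):
        if reader.read_uint():                   # assumed use_ref_list_flag == 1
            return ref_list[reader.read_uint()]  # index into the filter set reference list
        pid = reader.read_uint()                 # parameter ID information
        pset = third_param_sets[pid]             # a ThirdParamSet
        return pset.filter_sets[reader.read_uint()]  # filter index information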
The sub-images reconstructed by the above decoding process can be displayed independently of each other. The sub-images may be stored in different storage spaces in the memory, or may be stored separately in a plurality of memories.
When a plurality of sub-images are stored in the same memory, the sub-images may be stored with a predetermined interval between them.
The memory storage structure in units of sub-images may be signaled from the video encoding apparatus to the video decoding apparatus on a per-image or per-sub-image basis, or may be derived from the indexes, coordinates, and reference relationships of the sub-images. As an example, a plurality of sub-images having no reference relationship may be stored in different memories. As another example, a plurality of sub-images having the same position, the same size, the same sub-image index, or the same memory index may be stored in the same memory.
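As a sketch of the derivation path, the rule that sub-images with no reference relationship may be stored in different memories could be approximated as below; the greedy assignment and the refs structure are assumptions of this sketch, not a normative placement rule.

    def assign_memories(num_sub_images, refs):
        # refs[i]: set of sub-image indices that sub-image i references.
        memory_of, next_memory = {}, 0
        for i in range(num_sub_images):
            shared = next((memory_of[j] for j in refs[i] if j in memory_of), None)
            if shared is None:                  # no reference relationship so far:
                shared, next_memory = next_memory, next_memory + 1  # new memory
            memory_of[i] = shared
        return memory_of

    # Example: sub-image 1 references 0; sub-image 2 references nothing.
    print(assign_memories(3, {0: set(), 1: {0}, 2: set()}))  # {0: 0, 1: 0, 2: 1}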
A plurality of reconstructed sub-images stored in different memories may be moved into the same memory before display. A plurality of reconstructed sub-images stored non-contiguously in the same memory may be rearranged contiguously before display.
As shown in Fig. 9, each image may be partitioned into a plurality of sub-images such that some sub-images overlap. The video decoding apparatus may select the pixels of one sub-image from among the pixels of the different sub-images constituting the overlap area and store or display them. Which sub-image's pixels to select may be signaled from the video encoding apparatus to the video decoding apparatus. Alternatively, the video decoding apparatus may obtain the pixel values of the overlap area to be stored or displayed by applying a mathematical operation, such as an average or a weighted average, to the plurality of sub-images constituting the overlap area.
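For the mathematical-operation alternative, a weighted-average blend of the overlap area could be sketched as follows, with the weights as illustrative assumptions.

    def blend_overlap(sub_a, sub_b, w_a=0.5, w_b=0.5):
        # sub_a, sub_b: equally sized 2-D pixel arrays covering the overlap area.
        return [[int(w_a * pa + w_b * pb + 0.5)   # weighted average, rounded
                 for pa, pb in zip(row_a, row_b)]
                for row_a, row_b in zip(sub_a, sub_b)]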
It should be appreciated that the above-described exemplary embodiments may be implemented in many different ways. The functions described in one or more examples may be implemented as hardware, software, firmware, or any combination thereof. It should be appreciated that the functional components described herein have been labeled as "units" to further emphasize their implementation independence.
Various functions or methods described in the present invention may be implemented with instructions stored in a nonvolatile recording medium, which can be read and executed by one or more processors. The nonvolatile recording medium includes, for example, all types of recording devices in which data is stored in a form readable by a computer system. For example, the nonvolatile recording medium includes storage media such as an Erasable Programmable Read Only Memory (EPROM), a flash memory drive, an optical disc drive, a magnetic hard disc drive, and a Solid State Drive (SSD).
Although the exemplary embodiments have been described for illustrative purposes, those skilled in the art will appreciate that various modifications and changes are possible without departing from the spirit and scope of the embodiments. For the sake of brevity and clarity, exemplary embodiments have been described. Accordingly, it will be appreciated by those of ordinary skill that the scope of the embodiments is not limited by the embodiments explicitly described above, but is included in the claims and their equivalents.
Cross Reference to Related Applications
The present application claims priority to Korean Patent Application No. 10-2019-0056973 filed on May 15, 2019, Korean Patent Application No. 10-2019-0121030 filed on September 30, 2019, and Korean Patent Application No. 10-2020-0058245 filed on May 15, 2020, the entire contents of which are incorporated herein by reference.

Claims (17)

1. A video decoding method for decoding a bitstream containing a sequence of encoded images partitioned into a plurality of sub-images comprising a plurality of coding tree blocks, the method comprising:
decoding, from the bitstream, layout information on the sub-images partitioned from the images included in the sequence;
dividing, in a tree structure, a coding tree block to be decoded in any one of the sub-images identified by the layout information, and determining a target block;
decoding, from the bitstream, prediction information for predicting the target block and information on a residual signal of the target block;
predicting pixels in the target block based on the prediction information and generating a prediction block;
generating a residual block of the target block based on the information on the residual signal; and
reconstructing the target block by adding the prediction block and the residual block.
2. The method of claim 1, wherein the sub-images are units that can be displayed independently of each other.
3. The method of claim 1, wherein the layout information comprises: the number of sub-images, identification information for identifying the position of the first coding tree block in raster scan order within each sub-image, and information about the size of the sub-image.
4. The method according to claim 3, wherein the identification information is information for identifying the position of the coding tree block located at the top left of each sub-image.
5. The method of claim 3, wherein the information about the size of the sub-image comprises: the number of coding tree blocks in the horizontal direction and the number of coding tree blocks in the vertical direction.
6. The method of claim 1, wherein generating a prediction block comprises:
predicting, when a previously reconstructed pixel at a position determined based on the prediction information is outside a boundary of the sub-image including the target block, a pixel in the target block based on a substitute pixel that replaces the previously reconstructed pixel.
7. The method of claim 1, further comprising:
performing loop filtering on a target pixel to be filtered in the reconstructed target block by applying filter coefficients to the target pixel and neighboring pixels of the target pixel,
wherein, when at least one of the neighboring pixels is outside a boundary of the sub-image including the target block, the loop filtering is performed on the target pixel based on a substitute pixel that replaces the at least one neighboring pixel.
8. The method of claim 7, wherein the bitstream contains a first parameter set carrying parameters applied in common at the level of a sequence, and a second parameter set carrying parameters applied in common at the level of an image,
wherein the layout information is decoded from the first parameter set or the second parameter set.
9. The method of claim 8, wherein the bitstream contains one or more third parameter sets carrying parameters applied in common to a group of pixels whose size is less than or equal to an image,
wherein each of the third parameter sets comprises: ID information, type information indicating a type of a parameter carried therein among a plurality of parameter types, and a parameter corresponding to the type information,
wherein the plurality of parameter types includes at least a parameter type related to loop filtering.
10. The method of claim 8, wherein each of the sub-images is composed of one or more slices comprising a plurality of coding tree blocks.
11. The method of claim 10, wherein at least one piece of ID information indicating a third parameter set related to loop filtering is decoded from the header of a slice in which the target block is located,
wherein one or more filter coefficient sets used for loop filtering the blocks in the slice are reconstructed from the third parameter set corresponding to the ID information decoded from the slice header.
12. The method of claim 11, wherein the filter coefficients for loop filtering are determined from the one or more filter coefficient sets on a per coding tree block basis,
wherein the loop filtering of the pixels in the reconstructed target block is performed using the filter coefficients corresponding to the coding tree block comprising the target block.
13. A video decoding apparatus for decoding a bitstream containing a sequence of encoded images partitioned into a plurality of sub-images comprising a plurality of coding tree blocks, the apparatus comprising:
a decoder configured to:
decode, from the bitstream, layout information on the sub-images partitioned from the images included in the sequence;
divide, in a tree structure, a coding tree block to be decoded in any one of the sub-images identified by the layout information, and determine a target block; and
decode, from the bitstream, prediction information for predicting the target block and information on a residual signal of the target block;
a predictor configured to predict pixels in the target block based on the prediction information and generate a prediction block;
a residual reconstructor configured to generate a residual block of the target block based on information on the residual signal; and
an adder configured to reconstruct the target block by adding the prediction block to the residual block.
14. The apparatus of claim 13, wherein the sub-images are units that can be displayed independently of each other.
15. The apparatus of claim 13, wherein the layout information comprises: the number of sub-images, identification information for identifying the position of the first coding tree block in raster scan order within each sub-image, and information about the size of the sub-image.
16. The apparatus of claim 13, wherein, when a previously reconstructed pixel at a position determined based on the prediction information is outside a boundary of the sub-image including the target block, the predictor predicts a pixel in the target block based on a substitute pixel that replaces the previously reconstructed pixel.
17. The apparatus of claim 13, further comprising:
a loop filtering unit configured to perform loop filtering on a target pixel to be filtered in the reconstructed target block by applying a filter coefficient to the target pixel and a neighboring pixel of the target pixel,
wherein, when at least one of the neighboring pixels is outside a boundary of the sub-image including the target block, the loop filtering unit performs loop filtering on the target pixel based on a substitute pixel that replaces the at least one neighboring pixel.
CN202080036117.9A 2019-05-15 2020-05-15 Image encoding and decoding method and device Pending CN113841397A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
KR20190056973 2019-05-15
KR10-2019-0056973 2019-05-15
KR10-2019-0121030 2019-09-30
KR20190121030 2019-09-30
PCT/KR2020/006419 WO2020231219A1 (en) 2019-05-15 2020-05-15 Image encoding and decoding method and device

Publications (1)

Publication Number Publication Date
CN113841397A (en) 2021-12-24

Family

ID=73645340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080036117.9A Pending CN113841397A (en) 2019-05-15 2020-05-15 Image encoding and decoding method and device

Country Status (2)

Country Link
KR (1) KR20200132753A (en)
CN (1) CN113841397A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116941246A (en) * 2020-12-23 2023-10-24 LG Electronics Inc. Method and apparatus for generating/receiving media file for signaling sub-picture ID information and computer readable recording medium storing the media file
WO2023132623A1 (en) * 2022-01-05 2023-07-13 KT Corp. Video signal encoding/decoding method, and recording medium in which bitstream is stored

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101711481A (en) * 2006-10-18 2010-05-19 Thomson Licensing Method and apparatus for video coding using prediction data refinement
CN101115201A (en) * 2007-08-30 2008-01-30 Shanghai Jiao Tong University Video decoding method and device
CN103460699A (en) * 2011-03-30 2013-12-18 LG Electronics Inc. In-loop filtering method and apparatus for same
CN102857746A (en) * 2011-06-28 2013-01-02 ZTE Corporation Method and device for coding and decoding loop filters
US20130114694A1 (en) * 2011-11-08 2013-05-09 Qualcomm Incorporated Parameter set groups for coded video data
CN104221383A (en) * 2012-04-13 2014-12-17 Mitsubishi Electric Corporation Moving image encoding device, moving image decoding device, moving image encoding method and moving image decoding method
GB201206571D0 (en) * 2012-04-13 2012-05-30 Canon KK Method and device for providing adaptation parameters to a decoder
CN104247425A (en) * 2012-04-26 2014-12-24 Sony Corporation Chrominance processing in video coding and decoding
US20150195533A1 (en) * 2014-01-03 2015-07-09 Mediatek Inc. Method And Apparatus For Sample Adaptive Offset Processing
US20190052876A1 (en) * 2016-02-12 2019-02-14 Samsung Electronics Co., Ltd. Image encoding method and apparatus, and image decoding method and apparatus
CN109565602A (en) * 2016-08-15 2019-04-02 Nokia Technologies Oy Video coding and decoding
KR20180028299A (en) * 2016-09-08 2018-03-16 Kaonmedia Co., Ltd. A method for seletively decoding a syncronized multi view video by using spatial layout information
CN109691110A (en) * 2016-09-08 2019-04-26 Kaonmedia Co., Ltd. Method and apparatus for decoding synchronized multi-view images using spatial layout information
WO2019083243A1 (en) * 2017-10-23 2019-05-02 SK Telecom Co., Ltd. Method and apparatus for SAO filtering

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BENJAMIN BROSS, ET AL: "Versatile Video Coding (Draft 5)", JVET Meeting, pages 36 *
BYEONGDOO CHOI, ET AL: "AHG12: On sub-picture partitioning", JVET Meeting, pages 1 - 7 *
JILL BOYCE: "Sub-picture design principles", JVET Meeting *
YONG HE, ET AL: "AHG12/AHG17: On Sub-picture parameter set", JVET Meeting *
LI SHENYUAN: "Research on Distributed Video Coding Structure Based on Wavelet Transform Domain", China Master's Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
KR20200132753A (en) 2020-11-25

Similar Documents

Publication Publication Date Title
US20220191530A1 (en) Intra prediction method and device for predicting and dividing prediction unit into sub-units
JP7401566B2 (en) Method and recording medium for intra predictive coding of video data
CN112956193A (en) Video signal encoding/decoding method and apparatus thereof
CN114270826A (en) Method and apparatus for intra prediction encoding of video data
CN113841397A (en) Image encoding and decoding method and device
KR20220071939A (en) Method and Apparatus For Video Encoding and Decoding
CN113924784B (en) Method for encoding and decoding video
CN113892268A (en) Intra-frame prediction device and method based on prediction mode estimation
US20220014769A1 (en) Video encoding and decoding method and device
CN116113985A (en) Video encoding and decoding using deep learning based in-loop filters
KR20220017380A (en) Arbitrary Block Split Based Video Encoding and Decoding
CN113841403A (en) Inverse quantization apparatus and method used in image decoding apparatus
RU2803536C2 (en) Method and device for encoding and decoding video
RU2803081C2 (en) Method and device for encoding and decoding video
RU2803530C2 (en) Method and device for encoding and decoding video
US20220321884A1 (en) Method and apparatus for encoding and decoding video using sub-picture partitioning
RU2803081C9 (en) Method and device for encoding and decoding video
RU2779152C1 (en) Method and device for encoding and decoding video
RU2803531C2 (en) Method and device for encoding and decoding video
US20230308671A1 (en) Method and device for encoding and decoding image involving gradual refresh technique
US20230179762A1 (en) Video encoding and decoding using arbitrary block partitioning
US20240007620A1 (en) Image encoding and decoding method using adaptive alternative mode
US20240007645A1 (en) Video encoding and decoding method using adaptive reference pixel selection
CN114586360A (en) Method and apparatus for encoding and decoding video using sub-picture partitions
CN113574877B (en) Method and apparatus for efficiently decoding residual block

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination