KR20220025686A

KR20220025686A - Method for Frame Packing in MPEG Immersive Video Format

Info

Publication number: KR20220025686A
Application number: KR1020210111352A
Authority: KR
Inventors: 강제원; 박승욱; 민영빈
Original assignee: 현대자동차주식회사; 기아 주식회사; 이화여자대학교 산학협력단
Priority date: 2020-08-24
Filing date: 2021-08-24
Publication date: 2022-03-03

Abstract

Disclosed is a frame packing method in the MPEG immersive video format. The present embodiment provides, for immersive video encoding and decoding, a frame packing method that efficiently arranges texture and depth information of a basic view and an additional view in one picture in order to increase coding efficiency.

Description

Method for Frame Packing in MPEG Immersive Video Format

The present disclosure relates to a frame packing method in the MPEG immersive video format.

The content described below merely provides background information related to the present invention and does not constitute prior art.

The Moving Picture Experts Group (MPEG) has begun standardizing MPEG-I (MPEG-Immersive) as a coding project for immersive video. The standardization body (ISO/IEC 23090 Part 7, Metadata for Immersive Video) is currently developing a compression standard for 3DoF+ video and is expected to extend this work to a compression standard for 6DoF (six degrees of freedom) video. 6DoF video provides free motion parallax for omnidirectional video, whereas 3DoF+ video provides motion parallax within a limited range around the head of a viewer at a fixed position.

6DoF or 3DoF+ video can be acquired in two ways: windowed 6DoF and omnidirectional 6DoF. Windowed 6DoF video is captured with a multi-view camera system, so the user's current and neighboring viewpoints are limited to translational movement, as if looking through a window. Omnidirectional 6DoF composes 360-degree video from multiple views and provides viewing freedom within a limited space that follows the user's viewpoint. For example, a viewer wearing a head mounted display (HMD) can experience a three-dimensional omnidirectional virtual environment within a limited area.

Immersive video generally consists of texture video, composed of RGB or YUV information, and depth video, which carries 3D geometry information. An occupancy map may also be included to represent information that is occluded in 3D space.

Immersive video coding aims to add motion parallax to omnidirectional video while remaining compatible with 2D video codec standards such as High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). Because immersive video must cover the view images needed for a viewing space extended in all directions, its resolution grows as the number of view images and the field of view increase. Once the input/output interfaces and the amount of data to be compressed are taken into account, the pixel rate rises during encoding and coding efficiency falls. Various methods for minimizing the pixel rate, and thereby increasing coding efficiency, therefore need to be considered.

An object of the present disclosure is to provide, for immersive video encoding and decoding, a frame packing method that efficiently arranges texture and depth information of a basic view and an additional view in one picture in order to increase coding efficiency.

According to an embodiment of the present disclosure, there is provided a method, performed by an immersive video decoding apparatus, of unpacking a pack that includes atlas components of an immersive video, the method comprising: decoding a packing flag from a bitstream; decoding packing information from the bitstream when the packing flag is true; decoding a subpicture or a tile from the bitstream to generate the pack; and unpacking the atlas components from the pack using the packing information.
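
The four decoding steps can be sketched as a small control flow. This is only an illustration under stated assumptions: the "bitstream" is a plain dictionary standing in for already-parsed syntax, the key names are hypothetical, and the packing information is assumed to be a set of rectangles `(x, y, width, height)`, which the disclosure does not prescribe at this point.

```python
def crop(frame, rect):
    """Cut the rectangle (x, y, width, height) out of a 2D list of samples."""
    x, y, w, h = rect
    return [row[x:x + w] for row in frame[y:y + h]]


def unpack_pack(stream):
    """Toy unpacking flow; `stream` is a hypothetical stand-in for a bitstream."""
    # Step 1: decode the packing flag.
    packing_flag = stream["packing_flag"]

    # Step 2: packing information is only decoded when the flag is true.
    packing_info = stream["packing_info"] if packing_flag else None

    # Step 3: decode the subpicture or tile to obtain the pack
    # (here the decoded samples are simply read from the dictionary).
    pack = stream["decoded_picture"]
    if not packing_flag:
        return {"picture": pack}

    # Step 4: unpack each atlas component using the packing information.
    return {name: crop(pack, rect) for name, rect in packing_info.items()}
```

With a two-row pack whose left half holds texture and right half holds depth, `unpack_pack` returns the two halves under their labels; with the flag false it returns the picture unchanged.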

According to another embodiment of the present disclosure, there is provided a method, performed by an immersive video encoding apparatus, of packing atlas components of an immersive video, the method comprising: generating the atlas components from the immersive video; obtaining a preset packing flag; obtaining or generating packing information when the packing flag is true; and packing the atlas components based on the packing information to generate a pack.
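
The encoder-side steps can be sketched in the same spirit. The side-by-side layout, the zero padding used as null data, and the function and label names are all assumptions made for illustration; the disclosure leaves the actual arrangement to the packing information.

```python
def pack_components(components, packing_flag):
    """Place atlas components side by side in one frame (illustrative layout).

    `components` maps a label such as "a0T" to a 2D list of samples.
    Returns (pack, packing_info), or (None, None) when packing is disabled.
    """
    if not packing_flag:
        return None, None  # components would be coded as separate videos

    height = max(len(comp) for comp in components.values())
    pack = [[] for _ in range(height)]
    packing_info = {}
    x = 0
    for name, comp in components.items():
        w, h = len(comp[0]), len(comp)
        # Record where this component lands: (x, y, width, height).
        packing_info[name] = (x, 0, w, h)
        for y in range(height):
            # Rows below a shorter component are filled with 0 as null data.
            pack[y].extend(comp[y] if y < h else [0] * w)
        x += w
    return pack, packing_info
```

A 2x2 texture packed next to a 1x1 depth sample yields a 3-wide, 2-high pack with one null sample under the depth component.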

As described above, the present embodiment provides a frame packing method that efficiently arranges texture and depth information of the basic view and the additional view in one picture, which increases coding efficiency in immersive video encoding, reduces the load on the network, and lowers the energy consumption of playback devices that decode immersive video.

FIG. 1 is an exemplary block diagram of an immersive video encoding apparatus that may implement the techniques of the present disclosure.
FIG. 2 is an exemplary diagram illustrating a view optimization process according to an embodiment of the present disclosure.
FIG. 3 is an exemplary diagram illustrating the operation of a pruner according to an embodiment of the present disclosure.
FIG. 4 is an exemplary block diagram of an immersive video decoding apparatus that may implement the techniques of the present disclosure.
FIG. 5 is an exemplary diagram illustrating an encoding scheme in MIV mode according to an embodiment of the present disclosure.
FIGS. 6A, 6B, and 6C are exemplary diagrams illustrating pack structures according to an embodiment of the present disclosure.
FIG. 7 is an exemplary diagram illustrating a pack structure according to another embodiment of the present disclosure.
FIG. 8 is an exemplary diagram of a frame packing scheme that prioritizes depth information according to an embodiment of the present disclosure.
FIG. 9 is an exemplary diagram illustrating a packing scheme for depth information in 4:0:0 format according to an embodiment of the present disclosure.
FIG. 10 is an exemplary diagram of a frame packing scheme that prioritizes depth information according to another embodiment of the present disclosure.
FIG. 11 is a conceptual diagram of a multi-view video group according to an embodiment of the present disclosure.
FIG. 12 is a flowchart of a method by which an encoding apparatus packs atlas components of an immersive video according to an embodiment of the present disclosure.
FIG. 13 is a flowchart of a method by which a decoding apparatus unpacks a pack including atlas components of an immersive video according to an embodiment of the present disclosure.

Hereinafter, embodiments of the present invention are described in detail with reference to exemplary drawings. In assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the embodiments, detailed descriptions of related well-known configurations or functions are omitted where they could obscure the gist of the embodiments.

In describing the components of the embodiments, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to 'include' or 'comprise' a component, this means that it may further include other components rather than excluding them, unless explicitly stated otherwise. Terms such as 'unit' and 'module' refer to a unit that processes at least one function or operation and that may be implemented in hardware, software, or a combination of hardware and software.

The detailed description set forth below in conjunction with the accompanying drawings is intended to describe exemplary embodiments of the present invention, not to represent the only embodiments in which the invention may be practiced.

FIG. 1 is an exemplary block diagram of an immersive video encoding apparatus that may implement the techniques of the present disclosure. The immersive video encoding apparatus (hereinafter, 'encoding apparatus') and its sub-components are described below with reference to FIG. 1.

The encoding apparatus includes all or some of a view optimizer 110, an atlas constructor 120, a texture encoder 130, a depth encoder 140, and a metadata composer 150. The encoding apparatus passes the input multi-view video through the view optimizer 110 and the atlas constructor 120 in that order to generate data in the MIV (MPEG Immersive Video) format, and then encodes the MIV-format data using the texture encoder 130 and the depth encoder 140.

The view optimizer 110 classifies all views included in the input multi-view video into basic views and additional views.

For this view optimization, the view optimizer 110 calculates how many basic views are needed and selects that many basic views. As illustrated in FIG. 2, the view optimizer 110 may determine the basic and additional views using the physical relationship between views (for example, the angular difference between views) and their mutual overlap. The view that shares the most scene content with the other views may thus be selected as a basic view. Once the basic and additional views are selected, the basic views are preserved and input directly to the encoder.

In another embodiment according to the present disclosure, the view optimizer 110 may first group all views in consideration of camera viewpoint and purpose, and then configure basic and additional views for each group.

The atlas constructor 120 constructs atlases from the basic and additional views. As described above, the basic views selected by the view optimizer 110 are included in an atlas as complete images. The atlas constructor 120 generates, from the additional views, patches representing the parts that are difficult to predict from the basic views, and then assembles the patches generated from multiple additional views into one atlas. To generate an atlas, the atlas constructor 120 includes a pruner 122, an aggregator 124, and a patch packer 126, as illustrated in FIG. 1.

As illustrated in FIG. 3, the pruner 122 removes redundant parts of the additional views while preserving the basic views, and generates a binary mask indicating whether each pixel of an additional view is redundant. For example, the mask for one additional view has the same resolution as that view; a value of '1' indicates that the value at the corresponding pixel of the depth image is valid, and a value of '0' indicates a pixel that overlaps a basic view and should be removed.
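
A minimal sketch of such a mask follows. It assumes the basic-view depth has already been warped into the additional view, that `None` marks pixels no basic-view sample reaches, and that a pixel counts as redundant when the two depths agree within a tolerance `tol`. The tolerance test and all names are illustrative assumptions, not the criterion used by the disclosure.

```python
def pruning_mask(depth, warped_basic_depth, tol=1):
    """Return a binary mask with the same resolution as the additional view.

    1 = pixel is kept (valid, not explained by the basic view)
    0 = pixel overlaps the basic view and can be removed
    """
    mask = []
    for row, ref_row in zip(depth, warped_basic_depth):
        mask.append([
            # Redundant only if a basic-view sample exists and matches.
            0 if ref is not None and abs(d - ref) <= tol else 1
            for d, ref in zip(row, ref_row)
        ])
    return mask
```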

The pruner 122 finds redundant information by warping in 3D coordinates based on depth information. Here, warping refers to the process of predicting and compensating the displacement vector between two views using depth information.

As illustrated in FIG. 3, the pruner 122 also checks redundancy against additional views that have already been pruned before generating the final mask. In the example of FIG. 3, for additional view v2 the pruner 122 generates the mask by checking redundancy against reference views v0 and v1, and for additional view v3 it checks redundancy against reference views v0 and v1 and additional view v2.
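
The order of these redundancy checks can be written out as a short sketch. The function name is hypothetical; the logic only restates the v2 and v3 example.

```python
def pruning_references(basic_views, additional_views):
    """List, for each additional view, the views it is checked against.

    Each additional view is compared with every basic view and with every
    additional view that was pruned before it.
    """
    refs = {}
    already_pruned = []
    for view in additional_views:
        refs[view] = list(basic_views) + already_pruned
        already_pruned.append(view)
    return refs
```

For basic views v0 and v1 and additional views v2 and v3, this gives v2 the references [v0, v1] and v3 the references [v0, v1, v2].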

The aggregator 124 accumulates the masks generated for each additional view in temporal order. Accumulating the masks in this way can reduce the amount of configuration information in the final atlas.
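
One way to picture the accumulation is a pixel-wise OR over the masks of successive frames, so that a pixel kept in any frame stays kept for the whole period. The OR rule is an assumption chosen for illustration; the text above only states that masks are accumulated over time.

```python
def aggregate_masks(masks_over_time):
    """Accumulate the binary masks of one view across frames (pixel-wise OR)."""
    first = masks_over_time[0]
    acc = [[0] * len(row) for row in first]
    for mask in masks_over_time:
        acc = [
            [a | m for a, m in zip(acc_row, mask_row)]
            for acc_row, mask_row in zip(acc, mask)
        ]
    return acc
```

Because the accumulated mask is fixed for the period, the patch layout derived from it does not have to be signalled for every frame.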

The patch packer 126 packs the patches of the basic and additional views to generate the final atlases. For the texture and depth information of a basic view, the patch packer 126 uses the original image as a patch to construct the basic-view atlas. For the texture and depth information of an additional view, the patch packer 126 generates block patches using the mask and then packs the block patches to construct the additional-view atlas.

The texture encoder 130 encodes the texture atlas.

The depth encoder 140 encodes the depth atlas.

As described above, the texture encoder 130 and the depth encoder 140 may be implemented using an existing encoder such as HEVC or VVC.

The metadata composer 150 generates sequence parameters related to encoding, metadata for the multi-view cameras, and atlas-related parameters.

The encoding apparatus generates and transmits a bitstream that combines the encoded texture, the encoded depth, and the metadata.

FIG. 4 is an exemplary block diagram of an immersive video decoding apparatus that may implement the techniques of the present disclosure.

The immersive video decoding apparatus (hereinafter, 'decoding apparatus') includes all or some of a texture decoder 410, a depth decoder 420, a metadata parser 430, an atlas patch occupancy map generator 440 (hereinafter, 'occupancy map generator'), and a renderer 450.

The texture decoder 410 decodes the texture atlas from the bitstream.

The depth decoder 420 decodes the depth atlas from the bitstream.

The metadata parser 430 parses the metadata from the bitstream.

The occupancy map generator 440 generates an occupancy map using the atlas-related parameters included in the metadata. The occupancy map is information on the positions of the block patches; it may be generated by the encoding apparatus and transmitted to the decoding apparatus, or generated by the decoding apparatus from the metadata.
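
As a rough sketch of decoder-side generation, assume the atlas-related parameters reduce to a list of patch rectangles `(x, y, width, height)` in atlas coordinates. That representation and the function name are assumptions for illustration; the actual metadata syntax is not given here.

```python
def occupancy_map(atlas_width, atlas_height, patches):
    """Mark every atlas sample covered by a block patch with 1, the rest with 0."""
    occ = [[0] * atlas_width for _ in range(atlas_height)]
    for x, y, w, h in patches:
        for yy in range(y, y + h):
            for xx in range(x, x + w):
                occ[yy][xx] = 1
    return occ
```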

The renderer 450 reconstructs the immersive video to be presented to the user using the texture atlas, the depth atlas, and the occupancy map.

As described above, the atlases may be encoded using an existing encoder such as HEVC or VVC. Two modes may be applied.

FIG. 5 is an exemplary diagram illustrating an encoding scheme in MIV mode according to an embodiment of the present disclosure.

In MIV mode, the encoding apparatus compresses and transmits all of the images. For example, as illustrated in FIG. 5, when ten multi-view videos pass through the view optimizer 110 and the atlas constructor 120 in order, an atlas for one basic view and atlases for three additional views are generated. The encoding apparatus may configure different numbers of basic and additional views depending on the composition of the multi-view video. The encoding apparatus may generate a bitstream by encoding each of the generated atlases with an existing encoder.

In the other mode, MIV view mode, the encoding apparatus transmits, for example, five of the ten views without generating atlases. The decoding apparatus synthesizes the remaining five intermediate views using the received depth and texture information.

Using atlases reduces decoder-side complexity as follows. In the example of FIG. 5, if the encoding apparatus transmits all ten views using a total of 20 encoders, counting texture and depth, the decoding apparatus likewise needs 20 decoders. If instead the encoding apparatus generates atlases for one basic view and three additional views and transmits them using a total of 8 encoders, counting texture and depth, the decoding apparatus needs only 8 decoders, and complexity is greatly reduced.
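
The arithmetic behind these counts is simply the number of streams multiplied by the number of separately coded components per stream. The helper below is a hypothetical illustration of that calculation.

```python
def codec_instances(num_streams, components_per_stream=2):
    """Encoders (or decoders) needed when texture and depth are coded separately."""
    return num_streams * components_per_stream


all_views = codec_instances(10)        # ten views, texture + depth each
with_atlases = codec_instances(1 + 3)  # one basic-view and three additional-view atlases
```

Here `all_views` evaluates to 20 and `with_atlases` to 8, matching the figures in the example.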

In contrast to the approach described above, in which texture and depth video are placed in different atlases, frame packed video coding places the texture and depth images in a single video atlas. This approach can further reduce the number of decoders in the decoding apparatus.

When the packing flag is enabled (that is, true), the encoding apparatus may pack one or more different components of various atlases (for example, video data representing texture, depth, or occupancy) into a frame, or pack, and then perform encoding. The encoding apparatus may encode each pack, which is assigned a unique identifier (id), into a video bitstream. All video data in the same pack have the same YUV sampling format and bit depth.

FIGS. 6A, 6B, and 6C are exemplary diagrams illustrating pack structures according to an embodiment of the present disclosure.

The examples of FIGS. 6A, 6B, and 6C show various pack structures containing six sets of texture video, depth video, and occupancy information generated from three groups (each group including one basic view and additional views). Atlases a0, a2, and a4 each contain a basic view, and atlases a1, a3, and a5 contain additional views. Atlases for additional views may include occupancy information. As described above, the encoding apparatus may include the occupancy information in the pack and transmit it to the decoding apparatus, or the decoding apparatus may generate the occupancy map corresponding to the occupancy information. In FIGS. 6A, 6B, and 6C, the letter after the atlas number denotes the component: T for texture, D for depth, and O for occupancy. The number after r indicates the order among all atlases present in the pack.

In option A, illustrated in FIG. 6A, one pack is composed of different components of the same atlas: texture video, depth video, and occupancy information.

In option B, illustrated in FIG. 6B, one pack is composed of the same component taken from different atlases.

In option C, illustrated in FIG. 6C, one pack may be composed of different components of different atlases: texture video, depth video, and occupancy information.
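
The difference between options A and B can be sketched as two ways of grouping component labels such as a0T or a1O. The table of atlases below assumes that only the additional-view atlases (a1, a3, a5) carry occupancy, and that option B puts all six atlases into each pack; both are assumptions for illustration, since the figures are not reproduced here. Option C corresponds to an arbitrary mix and is not modelled.

```python
# Components per atlas: T = texture, D = depth, O = occupancy (assumed set).
ATLASES = {"a0": "TD", "a1": "TDO", "a2": "TD", "a3": "TDO", "a4": "TD", "a5": "TDO"}


def option_a(atlases):
    """One pack per atlas, holding that atlas's different components."""
    return [[f"{atlas}{c}" for c in comps] for atlas, comps in atlases.items()]


def option_b(atlases):
    """One pack per component type, holding that component from every atlas."""
    packs = {}
    for atlas, comps in atlases.items():
        for c in comps:
            packs.setdefault(c, []).append(f"{atlas}{c}")
    return list(packs.values())
```

Under these assumptions option A yields six packs (for example a1T, a1D, a1O together), while option B yields three packs, one each for texture, depth, and occupancy.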

FIG. 7 is an exemplary diagram illustrating a pack structure according to another embodiment of the present disclosure.

As in option D, illustrated in FIG. 7, one pack may be composed using only the texture video and depth video of one group (occupancy information is excluded). The pack is arranged in the following order: the basic-view texture video at the top, then the additional-view texture video, then the basic-view depth video at the lower left, and then the additional-view depth video. Null data fills the lower right of the pack. In options A through D, the packs are composed with the depth video downsampled.

In options A through D, the atlas components are frame packed, but each may be configured as an HEVC tile or a VVC subpicture and passed to the downstream encoder. The configuration information of each tile or subpicture, for example the top-left coordinates, width, and height of each rectangle, may be conveyed from the encoding apparatus to the decoding apparatus using an SEI (Supplemental Enhancement Information) message.
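
A receiver might sanity-check such rectangle descriptions before cropping. The sketch below assumes each region is given by its top-left corner and size; the `Region` type, its field names, and the validity rules (inside the pack, no overlap) are hypothetical and are not SEI syntax.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Region:
    x: int       # left edge of the rectangle
    y: int       # top edge of the rectangle
    width: int
    height: int


def overlaps(a, b):
    """True if the two rectangles share at least one sample."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def layout_is_valid(pack_width, pack_height, regions):
    """Check that all regions lie inside the pack and none of them overlap."""
    inside = all(
        r.x >= 0 and r.y >= 0
        and r.x + r.width <= pack_width and r.y + r.height <= pack_height
        for r in regions
    )
    disjoint = not any(overlaps(a, b) for a, b in combinations(regions, 2))
    return inside and disjoint
```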

For each option, one encoder or decoder is needed to encode or decode one pack. Considering that, before packs were formed, two encoders or decoders were needed for a single basic or additional view, the complexity of both the encoding apparatus and the decoding apparatus is reduced.

The present embodiment discloses a frame packing method in an immersive video format. More specifically, it provides, for immersive video encoding and decoding, a frame packing method that efficiently arranges texture and depth information of the basic and additional views in one picture in order to increase coding efficiency.

In an embodiment according to the present disclosure, when the packing flag is true, the encoding apparatus packs the atlas components based on the packing information to generate a pack, and the decoding apparatus unpacks the atlas components from the pack based on the packing information.

The encoding apparatus may obtain a preset packing flag.

In another embodiment according to the present disclosure, the encoding apparatus may use a predetermined frame packing scheme without using a packing flag.

In another embodiment according to the present disclosure, the encoding apparatus may generate the packing flag. For example, when the texture video and the depth video share the same format, YUV 4:2:0, the encoding apparatus sets the packing flag to true. When the formats differ, with the texture video in YUV 4:2:0 and the depth video in YUV 4:0:0, the encoding apparatus may set the packing flag to false. Here, the YUV 4:0:0 format uses only the Y channel of the Y, U, and V channels (the U and V channels are absent). Even in this example the packing flag depends on the predetermined format of the input video, so the flag can also be regarded as preset.
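
The format-based rule in this example can be stated in a few lines. The function names and the string representation of the chroma formats are assumptions for illustration; the samples-per-pixel table simply reflects that 4:2:0 carries two quarter-resolution chroma planes and 4:0:0 carries none.

```python
# Average samples stored per pixel for each chroma format.
SAMPLES_PER_PIXEL = {"4:2:0": 1.5, "4:0:0": 1.0}


def derive_packing_flag(texture_format, depth_format):
    """Enable packing only when texture and depth share one sampling format."""
    return texture_format == depth_format


flag_same = derive_packing_flag("4:2:0", "4:2:0")   # formats match
flag_mixed = derive_packing_flag("4:2:0", "4:0:0")  # depth is luma-only
```

In this sketch `flag_same` is true and `flag_mixed` is false, mirroring the two cases described above.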

The atlas components include the texture video of the basic view, the depth video of the basic view, the texture video of the additional view, and the depth video of the additional view, all generated from the immersive video.

패킹 정보는 팩의 구성과 관련된 정보로서, 깊이 비디오의 우선을 지시하는 플래그, 텍스처 비디오의 비트 심도, 깊이 비디오의 비트 심도, 널 데이터(null data) 플래그, 및 다시점 비디오 그룹을 나타내는 플래그 등을 포함할 수 있다. 부호화 장치는 패킹 플래그가 참인 경우, 이러한 패킹 정보를 획득하거나 생성할 수 있다. 예컨대, 팩이 널 데이터를 포함하는 경우, 부호화 장치는 널 데이터 플래그를 참으로 설정한 후, 복호화 장치에게 전달할 수 있다. 널 데이터에 대한 자세한 사항은 추후 설명한다. The packing information is information related to the configuration of the pack, and includes a flag indicating priority of depth video, a bit depth of a texture video, a bit depth of a depth video, a null data flag, and a flag indicating a multi-view video group. may include The encoding apparatus may obtain or generate such packing information when the packing flag is true. For example, when the pack includes null data, the encoding apparatus may set the null data flag to true and then transmit it to the decoding apparatus. Details of the null data will be described later.
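
As a rough sketch, the packing information fields listed here could be grouped in a single record. The class name `PackingInfo`, the field names, and the example values are hypothetical; the disclosure names the fields but does not define a concrete syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackingInfo:
    """Hypothetical container for the packing information fields."""
    depth_priority_flag: bool    # flag indicating priority of the depth video
    texture_bit_depth: int       # bit depth of the texture video
    depth_bit_depth: int         # bit depth of the depth video
    null_data_flag: bool         # True when the pack contains a null data region
    multiview_group_flag: bool   # flag indicating a multi-view video group


def make_packing_info(pack_has_null_data: bool) -> PackingInfo:
    # Example values only; the encoder derives null_data_flag from the pack layout.
    return PackingInfo(True, 10, 16, pack_has_null_data, False)


info = make_packing_info(pack_has_null_data=True)
```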

In frame packing according to the present disclosure, when the flag indicating priority of the depth video is true, the encoding apparatus may configure frame packing that prioritizes depth information.

When different kinds of image information are packed into a single video subpicture or tile, as in option A illustrated in FIG. 6A, option C illustrated in FIG. 6C, or option D illustrated in FIG. 7, the frame pack may be configured based on the dependencies among the pieces of video information in the decoding process. From the standpoint of the decoding apparatus, information that can be decoded more independently is placed earlier in the pack, and information that relies on previously placed information is placed later.

In terms of encoding order and dependencies, the priority order on the encoding apparatus or decoding apparatus side is: the texture video of the basic view, the depth video of the basic view, the depth video of the additional view, and the texture video of the additional view. An alternative order is: the depth video of the basic view, the texture video of the basic view, the depth video of the additional view, and the texture video of the additional view. This is because, as described above, the texture video of the additional view can be decoded only after the depth information has been decoded and warping based on that depth information has been performed.
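
The ordering rule can be checked mechanically. The sketch below assumes a simple dependency map in which the additional-view texture requires both depth videos to be available first; the map, the component names, and the function `respects_dependencies` are illustrative assumptions, not part of the disclosure.

```python
# Assumed dependency map: the additional-view texture needs depth decoded first.
DEPENDS_ON = {
    "add_texture": {"base_depth", "add_depth"},
}


def respects_dependencies(order):
    """Return True if every component appears after the components it depends on."""
    seen = set()
    for name in order:
        if not DEPENDS_ON.get(name, set()) <= seen:
            return False
        seen.add(name)
    return True


ORDER_1 = ["base_texture", "base_depth", "add_depth", "add_texture"]
ORDER_2 = ["base_depth", "base_texture", "add_depth", "add_texture"]
TEXTURE_FIRST = ["base_texture", "add_texture", "base_depth", "add_depth"]

checks = [respects_dependencies(o) for o in (ORDER_1, ORDER_2, TEXTURE_FIRST)]
```

Under this assumed map, both orders given in the text pass the check, while a texture-first layout places the additional-view texture ahead of the depth it relies on.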

FIG. 8 is an exemplary diagram of a frame packing scheme that prioritizes depth information according to an embodiment of the present disclosure.

For example, option D illustrated in FIG. 7 may be configured as a single pack as illustrated in FIG. 8. In option A illustrated in FIG. 6A, the pack may be configured so that texture information comes first in atlas 0, which constitutes the basic view, whereas depth information comes first in atlas 1, which constitutes the additional view.

As described above, configuring the pack according to the dependency order among the atlas components makes it possible to reduce memory usage in the decoding process.

In an embodiment according to the present disclosure, when the texture and depth information have different channels and bit depths, the pack may be configured using a null data region.

As described above, all video data included in the same pack have the same YUV sampling format and bit depth. However, when encoding with the HEVC main10 profile, for example, a YUV 4:2:0 texture video with a bit depth of 10 bits can be encoded without difficulty, whereas compressing a YUV 4:0:0 depth video with a bit depth of 10 bits or more may be problematic. In general, a depth video may be represented by 16-bit depth values.

To address this problem, the encoding apparatus may divide the depth video into an upper part (Most Significant Bit: MSB) and a lower part (Least Significant Bit: LSB), and then pack each of the upper and lower parts into one tile or subpicture. For example, as illustrated in FIG. 9, in option D, depth information with a bit depth of 16 bits may be divided into an upper image and a lower image, each with a bit depth of 8 bits, and then packed. In the pack configuration of option D illustrated in FIG. 7, the null data region may be used to place the upper or lower image. FIG. 9 illustrates the case in which the lower image is placed in the null data region.
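
The MSB/LSB split of a 16-bit depth sample into two 8-bit images can be sketched as follows. The function names and the flat-list sample layout are assumptions for illustration; the round trip shown is exact only if both planes are reconstructed without loss.

```python
def split_depth16(samples):
    """Split 16-bit depth samples into 8-bit upper (MSB) and lower (LSB) planes."""
    msb = [(s >> 8) & 0xFF for s in samples]
    lsb = [s & 0xFF for s in samples]
    return msb, lsb


def merge_depth16(msb, lsb):
    """Recombine the 8-bit upper and lower planes into 16-bit depth samples."""
    return [(m << 8) | l for m, l in zip(msb, lsb)]


depth = [0, 255, 256, 65535, 0x1234]
upper, lower = split_depth16(depth)
restored = merge_depth16(upper, lower)
```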

In another embodiment according to the present disclosure, when the frame packing scheme that prioritizes depth information is used, the pack may be configured, as illustrated in FIG. 10, in the order of the texture video of the basic view, the depth video of the basic view (upper and lower), the depth video of the additional view (upper and lower), and the texture video of the additional view.

Meanwhile, in another embodiment according to the present disclosure, when packing a 4:0:0 format depth video into the 4:2:0 format, the encoding apparatus may use the following scheme.

First, when the bit depth of the depth video is less than or equal to that of the texture video, the encoding apparatus places the 4:0:0 depth video in the Y channel and fills the U and V channels with either a preset value (e.g., 0 or 128) or a downsampled version of the Y channel.
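
A minimal sketch of this first case follows. The disclosure does not specify the downsampling filter, so the 2x2 integer mean used here is an assumption, as are the function name `pack_depth_into_420` and the row-major list layout.

```python
def pack_depth_into_420(y, width, height, fill_value=128, downsample=False):
    """Return (Y, U, V) planes for a 4:0:0 depth frame carried in 4:2:0.

    y is a row-major list of width*height samples; width and height are even.
    """
    cw, ch = width // 2, height // 2
    if downsample:
        # Assumed downsampling: integer mean of each 2x2 luma block.
        chroma = [
            (y[2 * r * width + 2 * c] + y[2 * r * width + 2 * c + 1]
             + y[(2 * r + 1) * width + 2 * c]
             + y[(2 * r + 1) * width + 2 * c + 1]) // 4
            for r in range(ch) for c in range(cw)
        ]
    else:
        # Constant chroma: every U and V sample takes the preset value.
        chroma = [fill_value] * (cw * ch)
    return list(y), list(chroma), list(chroma)


depth = [0, 4, 8, 12,
         4, 8, 12, 16]                      # 4x2 depth frame
y_c, u_c, v_c = pack_depth_into_420(depth, 4, 2)
y_d, u_d, v_d = pack_depth_into_420(depth, 4, 2, downsample=True)
```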

Next, when the bit depth of the depth video is greater than that of the texture video, the encoding apparatus places in the Y channel the upper (MSB) depth video or the lower (LSB) depth video, sized to the bit depth of the texture video, and packs the remaining information into the U and V channels.

Null data refers to data in the region located at the lower right of the example of FIG. 7; no valid information, such as texture or depth information, is packed in that region. When such null data is configured and transmitted as an independent tile or subpicture, as illustrated in FIG. 7, problems may arise such as a small but additional bit cost and a delay in the decoding process. Accordingly, using an accompanying SEI message, the encoding apparatus may signal, in addition to the configuration information of each tile and subpicture, a null data flag indicating whether the region contains null data. If it does, the decoding apparatus may fill the null data region with a preset value (e.g., 0 or 128) without decoding an additional tile.
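
The decoder-side shortcut for a null data region can be sketched as follows. The function `reconstruct_region` and the callback-style tile decoder are hypothetical; the point illustrated is only that the tile decoder is never invoked when the null data flag is true.

```python
def reconstruct_region(null_data_flag, width, height, decode_tile, fill_value=128):
    """Fill a null region with a preset value, or decode the tile otherwise."""
    if null_data_flag:
        # No tile decoding: every sample takes the preset value.
        return [[fill_value] * width for _ in range(height)]
    return decode_tile()


def _must_not_decode():
    raise AssertionError("tile decoding should be skipped for null data")


null_region = reconstruct_region(True, 4, 2, _must_not_decode)
real_region = reconstruct_region(False, 2, 1, lambda: [[7, 9]])
```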

FIG. 11 is a conceptual diagram of multi-view video groups according to an embodiment of the present disclosure.

A multi-view video group is a set of videos of the same subject captured by one or more (multi-view) cameras located in mutually adjacent regions. For example, the camera arrangement illustrated in FIG. 11 forms two groups, g1 and g2. The videos constituting g1 and g2 do not overlap, and the videos of g1 and g2 together make up the entire video. When the flag indicating a multi-view video group is true, each group has its own basic view and additional view. In this case, a pack may consist only of the basic view and additional view of the same group; that is, view videos belonging to different groups are not packed into one picture. The encoding apparatus therefore performs frame packing per multi-view video group, configuring each pack solely from the basic view and additional view included in one group.
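
Group-wise packing can be sketched as a partition of the views by group. The tuple layout `(view_id, group_id, is_basic)`, the function name `packs_by_group`, and the choice to list the basic view first inside each pack are assumptions for illustration.

```python
from collections import defaultdict


def packs_by_group(views):
    """Build one pack per multi-view video group; views never cross groups.

    views is a list of (view_id, group_id, is_basic) tuples.
    """
    groups = defaultdict(list)
    for view_id, group_id, is_basic in views:
        groups[group_id].append((view_id, is_basic))
    # Assumed layout: the basic view comes first inside each pack.
    return {
        group_id: [vid for vid, _ in sorted(members, key=lambda m: not m[1])]
        for group_id, members in groups.items()
    }


views = [("v0", "g1", True), ("v1", "g1", False),
         ("v2", "g2", False), ("v3", "g2", True)]
packs = packs_by_group(views)
```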

Meanwhile, in VVC, configuring subpictures makes it possible to encode each region as an independent picture. In this picture-level encoding, a particular coding tool may be used adaptively or forcibly disabled. For example, the deblocking filter, the sample adaptive offset (SAO) filter, and the adaptive loop filter (ALF), which constitute the in-loop filters of VVC, are coding tools for improving image quality from the standpoint of human visual perception. These tools are essential for the texture atlas image of the basic view, but may be unnecessary for depth images, where differences across boundaries must be preserved, or for atlases of additional views. Accordingly, when such atlas images are configured as subpictures and encoded, the encoding apparatus or decoding apparatus forcibly disables the in-loop filters.
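
The per-subpicture filter policy can be sketched as a simple lookup. This is not a VVC encoder API; the component names and the function `enabled_in_loop_filters` are hypothetical and only mirror the rule stated in the text.

```python
IN_LOOP_FILTERS = ("deblocking", "sao", "alf")


def enabled_in_loop_filters(component):
    """Return the in-loop filters kept for a subpicture of the given component."""
    # Only the basic-view texture keeps the perceptual filters; depth and
    # additional-view subpictures have them forcibly disabled.
    return IN_LOOP_FILTERS if component == "base_texture" else ()


policy = {name: enabled_in_loop_filters(name)
          for name in ("base_texture", "base_depth", "add_texture", "add_depth")}
```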

As described above, when compressing each subpicture, the encoding apparatus or decoding apparatus may apply different codec profiles. For example, a codec profile supporting the 4:2:0 format is used to encode or decode a texture image, whereas a codec profile supporting the 4:0:0 format may be used to encode or decode a depth image.

The encoding apparatus may transmit the packing information related to the configuration of the pack, as described above, to the decoding apparatus.

FIG. 12 is a flowchart of a method by which an encoding apparatus packs atlas components of an immersive video according to an embodiment of the present disclosure.

The encoding apparatus generates atlas components from the immersive video (S1200).

As described above, the encoding apparatus generates the atlas components by performing view optimization and atlas construction on the input immersive video. Here, the atlas components include a texture video of a basic view, a depth video of the basic view, a texture video of an additional view, and a depth video of the additional view, all generated from the immersive video.

The encoding apparatus obtains a preset packing flag (S1202).

When the packing flag is true, the encoding apparatus obtains packing information (S1204). Here, the packing information may include a flag indicating priority of the depth video, the bit depth of the texture video, the bit depth of the depth video, a null data flag, and a flag indicating a multi-view video group. As described above, the encoding apparatus may generate part of the packing information.

When the packing flag is not true, the encoding apparatus may encode each atlas component separately, without obtaining all or part of the packing information and without generating a pack.

The encoding apparatus packs the atlas components based on the packing information to generate a pack (S1206).

When the flag indicating priority of the depth video is not true, the encoding apparatus configures the pack in the order of the texture video of the basic view, the texture video of the additional view, the depth video of the basic view, and the depth video of the additional view.

On the other hand, when the flag indicating priority of the depth video is true, the encoding apparatus configures the pack in the order of the texture video of the basic view, the depth video of the basic view, the depth video of the additional view, and the texture video of the additional view.
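
The two pack orders selected by the depth-priority flag can be sketched as follows. The component names and the function `pack_order` are hypothetical labels for the four atlas components described in the text.

```python
def pack_order(depth_priority_flag):
    """Return the component order of the pack for the given depth-priority flag."""
    if depth_priority_flag:
        # Depth first: additional-view texture comes last, after both depth videos.
        return ["base_texture", "base_depth", "add_depth", "add_texture"]
    # Flag not true: both textures first, then both depth videos.
    return ["base_texture", "add_texture", "base_depth", "add_depth"]


depth_first = pack_order(True)
texture_first = pack_order(False)
```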

When the bit depth of the depth video is 16 bits, the encoding apparatus may configure the pack by dividing the depth video into an upper video and a lower video, each with a bit depth of 8 bits. In this case, when the flag indicating priority of the depth video is true, the encoding apparatus may configure the pack in the order of the texture video of the basic view, the upper depth video for the basic view, the lower depth video for the basic view, the upper depth video for the additional view, the lower depth video for the additional view, and the texture video of the additional view.

In packing depth information of the 4:0:0 format into the 4:2:0 format, when the bit depth of the depth video is less than or equal to that of the texture video, the encoding apparatus may fill the Y channel with the 4:0:0 format depth information and fill the U and V channels with a preset value or with a downsampled version of the Y channel.

On the other hand, when the bit depth of the depth video is greater than that of the texture video, the encoding apparatus may fill the Y channel with the upper depth video or the lower depth video, sized to the bit depth of the texture video, and fill the U and V channels with the remaining information.

As illustrated in FIG. 7 or FIG. 8, the encoding apparatus may include null data in the pack and set the null data flag indicating this to true. In this case, the null data region may be represented by a preset value (e.g., 0 or 128).

When the flag indicating a multi-view video group is true, each of at least one multi-view video group includes its own basic view and additional view. Accordingly, the encoding apparatus configures a pack for each multi-view video group. That is, the encoding apparatus may configure one pack solely from the basic view and additional view included in one group.

The encoding apparatus configures the pack as subpictures or tiles and then encodes it.

The encoding apparatus transmits a bitstream in which the packing flag, the packing information, and the pack are encoded to the decoding apparatus.

FIG. 13 is a flowchart of a method by which a decoding apparatus unpacks a pack including atlas components of an immersive video according to an embodiment of the present disclosure.

The decoding apparatus decodes the packing flag from the bitstream (S1300).

When the packing flag is true, the decoding apparatus decodes the packing information from the bitstream (S1302). Here, the packing information may include a flag indicating priority of the depth video, the bit depth of the texture video, the bit depth of the depth video, a null data flag, and a flag indicating a multi-view video group.

When the packing flag is not true, the decoding apparatus may decode each atlas component separately, without decoding all or part of the packing information and without generating a pack.

The decoding apparatus decodes subpictures or tiles from the bitstream to generate the pack (S1304).

The decoding apparatus unpacks the atlas components from the pack using the packing information (S1306). Here, the atlas components include a texture video of a basic view, a depth video of the basic view, a texture video of an additional view, and a depth video of the additional view, for reconstructing the immersive video.
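
The unpacking step can be sketched as the inverse of the ordering chosen by the encoder. The sketch assumes the pack arrives as a list of four decoded regions in pack order and that no MSB/LSB split or null data region is present; the function `unpack` and the component names are hypothetical.

```python
def unpack(pack_regions, depth_priority_flag):
    """Map decoded pack regions back to named atlas components."""
    if depth_priority_flag:
        order = ["base_texture", "base_depth", "add_depth", "add_texture"]
    else:
        order = ["base_texture", "add_texture", "base_depth", "add_depth"]
    if len(pack_regions) != len(order):
        raise ValueError("expected one region per atlas component")
    return dict(zip(order, pack_regions))


components = unpack(["r0", "r1", "r2", "r3"], depth_priority_flag=True)
```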

When the null data flag is true, the pack includes null data, and the decoding apparatus may fill the null data region with a preset value (e.g., 0 or 128) without additional tile decoding.

As described above, the present embodiment provides a frame packing method that efficiently arranges the texture and depth information of the basic view and the additional view in a single picture, thereby making it possible to increase coding efficiency in immersive video encoding.

Although each flowchart according to the present embodiment describes the processes as being executed sequentially, the embodiment is not necessarily limited thereto. In other words, the processes described in a flowchart may be reordered, or one or more processes may be executed in parallel, so the flowcharts are not limited to a time-series order.

Meanwhile, the various functions or methods described in the present embodiment may be implemented as instructions stored in a non-transitory recording medium that can be read and executed by one or more processors. The non-transitory recording medium includes, for example, all types of recording devices in which data is stored in a form readable by a computer system. For example, the non-transitory recording medium includes storage media such as an erasable programmable read only memory (EPROM), a flash drive, an optical drive, a magnetic hard drive, and a solid state drive (SSD).

The foregoing description merely illustrates the technical idea of the present embodiment, and those of ordinary skill in the art to which the present embodiment pertains may make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to describe, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiment.

110: view optimizer 120: atlas constructor
122: pruner 124: aggregator
126: patch packer
130: texture encoder 140: depth encoder
150: metadata composer
410: texture decoder 450: depth decoder
430: metadata analyzer
440: atlas patch occupancy map generator
450: renderer

Claims

A method, performed by an immersive video decoding apparatus, of unpacking a pack including atlas components of an immersive video, the method comprising:
decoding a packing flag from a bitstream;
decoding packing information from the bitstream when the packing flag is true;
decoding subpictures or tiles from the bitstream to generate the pack; and
unpacking the atlas components from the pack using the packing information.

The method of claim 1, wherein the atlas components include a texture video of a basic view, a depth video of the basic view, a texture video of an additional view, and a depth video of the additional view, for reconstructing the immersive video.

The method of claim 2, wherein the packing information includes a flag indicating priority of the depth video, a bit depth of the texture video, a bit depth of the depth video, a null data flag, and a flag indicating a multi-view video group.

The method of claim 3, wherein, when the flag indicating priority of the depth video is not true, the pack is configured in the order of the texture video of the basic view, the texture video of the additional view, the depth video of the basic view, and the depth video of the additional view.

The method of claim 3, wherein, when the flag indicating priority of the depth video is true, the pack is configured in the order of the texture video of the basic view, the depth video of the basic view, the depth video of the additional view, and the texture video of the additional view.

The method of claim 3, wherein, when the bit depth of the depth video is 16 bits, the pack is configured by dividing the depth video into an upper video and a lower video, each with a bit depth of 8 bits.

The method of claim 6, wherein, when the flag indicating priority of the depth video is true, the pack is configured in the order of the texture video of the basic view, the upper depth video for the basic view, the lower depth video for the basic view, the upper depth video for the additional view, the lower depth video for the additional view, and the texture video of the additional view.

The method of claim 3, wherein, in packing depth information of the 4:0:0 format into the 4:2:0 format, when the bit depth of the depth video is less than or equal to the bit depth of the texture video, the Y channel is filled with the 4:0:0 format depth information, and the U and V channels are filled with a preset value or with a downsampled version of the Y channel.

The method of claim 8, wherein, when the bit depth of the depth video is greater than the bit depth of the texture video, the Y channel is filled with an upper depth video or a lower depth video sized to the bit depth of the texture video, and the U and V channels are filled with the remaining information.

The method of claim 3, wherein, when the null data flag is true, the pack includes null data, and
the unpacking comprises filling the region of the null data with a preset value.

The method of claim 3, wherein, when the flag indicating the multi-view video group is true, each of at least one multi-view group includes its own basic view and additional view, and the pack is configured solely from the basic view and the additional view included in one group.

The method of claim 2, wherein an in-loop filter is not used when decoding subpictures corresponding to the depth video of the basic view, the texture video of the additional view, and the depth video of the additional view.

A method, performed by an immersive video encoding apparatus, of packing atlas components of an immersive video, the method comprising:
generating the atlas components from the immersive video;
obtaining a preset packing flag;
obtaining or generating packing information when the packing flag is true; and
packing the atlas components based on the packing information to generate a pack.

The method of claim 13, wherein the atlas components include a texture video of a basic view, a depth video of the basic view, a texture video of an additional view, and a depth video of the additional view, all generated from the immersive video.

The method of claim 13, wherein the packing information includes a flag indicating priority of the depth video, a bit depth of the texture video, a bit depth of the depth video, a null data flag, and a flag indicating a multi-view video group.

The method of claim 13, further comprising:
generating a bitstream by encoding the packing flag, the packing information, and the pack; and
transmitting the bitstream to an immersive video decoding apparatus.

The method of claim 15, wherein the obtaining or generating of the packing information comprises setting the null data flag to true when the pack includes a region of null data.