KR20220108704A - Apparatus and method of processing audio

Info

Publication number: KR20220108704A
Application number: KR1020210138834A
Authority: KR
Inventors: 남우현; 고상철; 김경래; 김정규; 손윤재; 이태미; 정현권; 황성희
Original assignee: 삼성전자주식회사
Priority date: 2021-01-27
Filing date: 2021-10-18
Publication date: 2022-08-03

Abstract

An aspect of the present disclosure may provide an audio processing device including: a memory that stores one or more instructions; and a processor. The processor executes the one or more instructions stored in the memory to: obtain second audio signals corresponding to channels included in a second channel group from first audio signals corresponding to channels included in a first channel group; downsample at least one third audio signal corresponding to at least one channel identified based on the degree of association with the second channel group among channels included in the first channel group by using an artificial intelligence model; and generate a bitstream including second audio signals corresponding to channels included in the second channel group and the at least one downsampled third audio signal. The first channel group includes a channel group of an original audio signal. The second channel group is configured by combining at least two channels among channels included in the first channel group. It is possible to enhance transmission efficiency.

Description

APPARATUS AND METHOD OF PROCESSING AUDIO

The present disclosure relates to an audio processing apparatus and method. More specifically, the present disclosure relates to an apparatus and method for encoding an audio signal including a plurality of channels, or an apparatus and method for decoding such a signal.

With advances in technology, stereophonic sound devices composed of larger, clearer displays and a plurality of speakers have become widespread. Along with this, research is being conducted on video coding technology for transmitting and receiving more vivid images and on audio coding technology for transmitting and receiving more realistic, immersive audio signals. For example, an immersive audio signal may be encoded by a codec conforming to a predetermined compression standard, such as the Advanced Audio Coding (AAC) standard or the OPUS standard, and then stored in a recording medium in the form of a bitstream or transmitted through a communication channel. A stereophonic sound device may reproduce the immersive audio signal by decoding a bitstream generated according to the predetermined compression standard.

Audio content may be reproduced through various channel layouts depending on the environment in which the audio content is consumed. For example, audio content may be reproduced through a 2-channel layout implemented by a sound output device such as headphones; a 3.1 channel layout or a 3.1.2 channel layout implemented by speakers mounted on a display device such as a TV; or a 5.1 channel layout, a 5.1.2 channel layout, a 7.1 channel layout, or a 7.1.4 channel layout implemented by a plurality of speakers arranged around the viewer.

In particular, with the expansion of over-the-top (OTT) services, the increase in TV resolution, and the growing screen size of electronic devices such as tablets, viewers increasingly want to experience theater-like immersive sound at home. Accordingly, there is an increasing need for audio processing apparatuses to support a three-dimensional channel layout in which a sound image is formed around the display screen.

There is a need for an encoding and decoding method that can increase transmission efficiency while supporting conversion between various channel layouts. In particular, there is a need for an audio encoding and decoding method that can accurately reproduce the original audio signal even when audio content of a given channel layout is converted into, and output in, another channel layout having a different sound image.

One aspect of the present disclosure may provide an audio processing apparatus including: a memory storing one or more instructions; and a processor that, by executing the one or more instructions stored in the memory, obtains second audio signals corresponding to channels included in a second channel group from first audio signals corresponding to channels included in a first channel group; downsamples, using an artificial intelligence model, at least one third audio signal corresponding to at least one channel identified, from among the channels included in the first channel group, based on its relevance to the second channel group; and generates a bitstream including the second audio signals corresponding to the channels included in the second channel group and the downsampled at least one third audio signal, wherein the first channel group includes a channel group of an original audio signal, and the second channel group is configured by combining at least two channels among the channels included in the first channel group.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the processor identifies, from among the channels included in the first channel group, the at least one channel whose relevance to the second channel group is lower than a predetermined value.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the processor assigns weight values to the channels included in the first channel group based on the relevance of each of those channels to the second channel group; obtains the second audio signals from the first audio signals by computing a weighted sum of at least two of the first audio signals based on the weight values assigned to the channels included in the first channel group; and identifies the at least one channel from among the channels included in the first channel group based on the assigned weight values.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the channels included in the first channel group are divided into channels of a first subgroup and channels of a second subgroup, and the processor obtains an audio signal corresponding to one of the channels included in the second channel group by summing the audio signals corresponding to at least two channels of the first channel group based on the weight values assigned to those at least two channels; identifies, among the at least two channels, the channel having the largest assigned weight value as a channel of the first subgroup; and identifies the remaining channels among the at least two channels as channels of the second subgroup.
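The weighted-sum down-mix and the subgroup split described above can be sketched as follows. This is only an illustrative sketch: the channel labels (`L3`, `Ls`, and so on), the weight values, and the names `EXAMPLE_RULE`, `weighted_downmix`, and `split_subgroups` are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List, Set, Tuple

Signals = Dict[str, List[float]]
Rule = Dict[str, Dict[str, float]]

# Hypothetical down-mix rule: output channel -> {input channel: weight}.
EXAMPLE_RULE: Rule = {
    "L3": {"L": 1.0, "Ls": 0.707},
    "R3": {"R": 1.0, "Rs": 0.707},
    "C": {"C": 1.0},
}


def weighted_downmix(signals: Signals, rule: Rule) -> Signals:
    """Form each output channel as a weighted sum of its input channels."""
    n = min(len(s) for s in signals.values())
    return {
        out_ch: [sum(w * signals[ch][i] for ch, w in weights.items())
                 for i in range(n)]
        for out_ch, weights in rule.items()
    }


def split_subgroups(rule: Rule) -> Tuple[Set[str], Set[str]]:
    """For each combined output channel, put the max-weight input channel in
    the first subgroup and the remaining inputs in the second subgroup."""
    first: Set[str] = set()
    second: Set[str] = set()
    for weights in rule.values():
        if len(weights) < 2:
            continue  # channel is passed through, not combined
        top = max(weights, key=lambda ch: weights[ch])
        first.add(top)
        second.update(ch for ch in weights if ch != top)
    return first, second


signals = {"L": [1.0, 2.0], "Ls": [1.0, 0.0], "R": [0.0, 0.0],
           "Rs": [0.0, 0.0], "C": [0.5, 0.5]}
mixed = weighted_downmix(signals, EXAMPLE_RULE)
first, second = split_subgroups(EXAMPLE_RULE)
```

In this sketch, the channels in `second` would be the candidates for AI-based downsampling, since they contribute least to the combined channels.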

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the processor extracts a first audio sample group and a second audio sample group from the at least one third audio signal; obtains, using the artificial intelligence model, downsampling-related information for the first audio sample group and the second audio sample group; and downsamples the at least one third audio signal by applying the downsampling-related information to the first audio sample group and the second audio sample group.
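One way to picture this step is the sketch below. It rests on assumptions that this passage does not state: the two sample groups are taken to be the even-indexed and odd-indexed samples, the downsampling-related information is taken to be a per-sample weight for each group, and a fixed stub (`stub_model`) stands in for the trained artificial intelligence model. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Weights = Tuple[List[float], List[float]]


def split_sample_groups(signal: List[float]) -> Tuple[List[float], List[float]]:
    """Assumed grouping: even-indexed samples and odd-indexed samples."""
    return signal[0::2], signal[1::2]


def stub_model(group_a: List[float], group_b: List[float]) -> Weights:
    """Placeholder for the AI model: returns equal weights for every sample.
    A trained model would instead predict signal-dependent weights."""
    return [0.5] * len(group_a), [0.5] * len(group_b)


def downsample(signal: List[float],
               model: Callable[[List[float], List[float]], Weights] = stub_model
               ) -> List[float]:
    """Halve the sample rate by blending the two groups with model weights."""
    a, b = split_sample_groups(signal)
    n = min(len(a), len(b))  # drop a trailing unpaired sample, if any
    wa, wb = model(a[:n], b[:n])
    return [wa[i] * a[i] + wb[i] * b[i] for i in range(n)]


half_rate = downsample([1.0, 3.0, 5.0, 7.0])
```

With the equal-weight stub this reduces to averaging adjacent sample pairs; the point of a learned model would be to choose weights that make later reconstruction more accurate.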

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the artificial intelligence model is a model trained to obtain the downsampling-related information that minimizes the error between the first audio signals and reconstructed audio signals, the reconstructed audio signals being reconstructed based on the second audio signals and the downsampled at least one third audio signal.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the processor obtains audio signals of a base channel group and audio signals of a dependent channel group from the second audio signals corresponding to the channels included in the second channel group; obtains a first compressed signal by compressing the audio signals of the base channel group; obtains a second compressed signal by compressing the audio signals of the dependent channel group; obtains a third compressed signal by compressing the downsampled at least one third audio signal; and generates the bitstream including the first compressed signal, the second compressed signal, and the third compressed signal.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the base channel group includes two channels for stereo reproduction, and the dependent channel group includes, among the channels included in the second channel group, the channels other than the two channels that are highly related to the two channels for stereo reproduction.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the first audio signals corresponding to the channels included in the first channel group include a listener-centered multi-channel audio signal, and the second audio signals corresponding to the channels included in the second channel group include a listener-front-centered multi-channel audio signal.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the first channel group includes 7.1.4 channels consisting of a front left channel, a front right channel, a center channel, a left channel, a right channel, a rear left channel, a rear right channel, a subwoofer channel, a front upper left channel, a front upper right channel, a rear upper left channel, and a rear upper right channel; the second channel group includes 3.1.2 channels consisting of a front left channel, a front right channel, a center channel, a subwoofer channel, a front upper left channel, and a front upper right channel; and the processor identifies, among the channels included in the first channel group, the left channel, the right channel, the rear left channel, the rear right channel, the rear upper left channel, and the rear upper right channel, which have low relevance to the second channel group, as channels of the second subgroup.
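The channel lists in this embodiment can be written out as data, as in the sketch below. The snake_case identifiers are hypothetical labels for the channel names given in the text. The set difference is only an observation that holds for this particular example; the disclosure identifies the second subgroup by relevance to the second channel group, not by set difference.

```python
# Channels of the 7.1.4 first channel group, in the order listed in the text.
LAYOUT_7_1_4 = [
    "front_left", "front_right", "center", "left", "right",
    "rear_left", "rear_right", "subwoofer",
    "front_upper_left", "front_upper_right",
    "rear_upper_left", "rear_upper_right",
]

# Channels of the 3.1.2 second channel group.
LAYOUT_3_1_2 = [
    "front_left", "front_right", "center", "subwoofer",
    "front_upper_left", "front_upper_right",
]

# Channels the text identifies as the second subgroup (low relevance).
SECOND_SUBGROUP = {
    "left", "right", "rear_left", "rear_right",
    "rear_upper_left", "rear_upper_right",
}

# In this example, the second subgroup coincides with the 7.1.4 channel
# names that have no same-named counterpart in the 3.1.2 layout.
names_only_in_7_1_4 = set(LAYOUT_7_1_4) - set(LAYOUT_3_1_2)
```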

Another aspect of the present disclosure may provide an audio processing apparatus including: a memory storing one or more instructions; and a processor that, by executing the one or more instructions stored in the memory, obtains, from a bitstream, first audio signals corresponding to channels included in a first channel group and a downsampled second audio signal; obtains at least one second audio signal corresponding to at least one channel among channels included in a second channel group by upsampling the downsampled second audio signal using an artificial intelligence model; and reconstructs third audio signals corresponding to the channels included in the second channel group from the first audio signals and the at least one second audio signal, wherein the first channel group includes fewer channels than the second channel group.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the channels included in the second channel group are divided into channels of a first subgroup and channels of a second subgroup based on the relevance of each of those channels to the first channel group, and the processor obtains second audio signals corresponding to the channels of the second subgroup.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the processor obtains fourth audio signals corresponding to the channels of the first subgroup from the first audio signals and the second audio signals corresponding to the channels of the second subgroup, according to a conversion rule from the channels included in the second channel group to the channels included in the first channel group; refines the second audio signals and the fourth audio signals using the artificial intelligence model; and obtains the third audio signals from the refined second audio signals and the refined fourth audio signals.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the fourth audio signals are refined through first layers in the artificial intelligence model and the second audio signals are refined through second layers in the artificial intelligence model; the refined fourth audio signals are obtained by inputting the first audio signals, the second audio signals, and the fourth audio signals to the first layers; and the refined second audio signals are obtained by inputting the first audio signals, the second audio signals, the refined fourth audio signals, and the values output from the first layers to the second layers.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the processor obtains audio signals corresponding to channels included in a base channel group and audio signals corresponding to channels included in a dependent channel group by decompressing the bitstream, and obtains the first audio signals based on additional information obtained from the bitstream, the audio signals corresponding to the channels included in the base channel group, and the audio signals corresponding to the channels included in the dependent channel group.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the base channel group includes two channels for stereo reproduction; the dependent channel group includes, among the channels included in the first channel group, the channels other than the two channels that are highly related to the two channels for stereo reproduction; and the processor obtains mixed audio signals corresponding to the channels included in the first channel group by mixing the audio signals corresponding to the channels included in the base channel group with the audio signals corresponding to the channels included in the dependent channel group, and obtains the first audio signals by rendering the mixed audio signals based on the additional information.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the first audio signals corresponding to the channels included in the first channel group include a listener-front-centered multi-channel audio signal, and the third audio signals corresponding to the channels included in the second channel group include a listener-centered multi-channel audio signal.

In an embodiment of the present disclosure, an audio processing apparatus may be provided in which the first channel group includes 3.1.2 channels consisting of a front left channel, a front right channel, a center channel, a subwoofer channel, a front upper left channel, and a front upper right channel; the second channel group includes 7.1.4 channels consisting of a front left channel, a front right channel, a center channel, a left channel, a right channel, a rear left channel, a rear right channel, a subwoofer channel, a front upper left channel, a front upper right channel, a rear upper left channel, and a rear upper right channel; and the channels of the second subgroup include, among the channels included in the second channel group, the left channel, the right channel, the rear left channel, the rear right channel, the rear upper left channel, and the rear upper right channel, which have low relevance to the first channel group.

Another aspect of the present disclosure may provide an audio processing method including: obtaining second audio signals corresponding to channels included in a second channel group from first audio signals corresponding to channels included in a first channel group; downsampling, using an artificial intelligence model, at least one third audio signal corresponding to at least one channel identified, from among the channels included in the first channel group, based on its relevance to the second channel group; and generating a bitstream including the second audio signals corresponding to the channels included in the second channel group and the downsampled at least one third audio signal, wherein the first channel group includes a channel group of an original audio signal, and the second channel group is configured by combining at least two channels among the channels included in the first channel group.

Another aspect of the present disclosure may provide an audio processing method including: obtaining, from a bitstream, first audio signals corresponding to channels included in a first channel group and a downsampled second audio signal; obtaining at least one second audio signal corresponding to at least one channel among channels included in a second channel group by upsampling the downsampled second audio signal using an artificial intelligence model; and reconstructing third audio signals corresponding to the channels included in the second channel group from the first audio signals and the at least one second audio signal, wherein the first channel group includes fewer channels than the second channel group.

FIG. 1A illustrates an example of an audio processing system in which a sound image is converted based on an audio content consumption environment, according to an embodiment.
FIG. 1B illustrates an example of a method by which an audio encoding apparatus and an audio decoding apparatus process a multi-channel audio signal by dividing it into audio signals of a base channel group and audio signals of a dependent channel group, according to an embodiment.
FIG. 2A is a block diagram of an audio encoding apparatus according to an embodiment.
FIG. 2B is a detailed block diagram of an audio encoding apparatus according to an embodiment.
FIG. 2C is a block diagram of a multi-channel audio signal processing unit included in an audio encoding apparatus according to an embodiment.
FIG. 2D is a diagram for describing an operation of a multi-channel audio signal processing unit according to an embodiment.
FIG. 3A is a block diagram of an audio decoding apparatus according to an embodiment.
FIG. 3B is a detailed block diagram of an audio decoding apparatus according to an embodiment.
FIG. 3C is a block diagram of a multi-channel audio signal reconstruction unit included in an audio decoding apparatus according to an embodiment.
FIG. 3D is a diagram for describing an operation of a mixing unit of a multi-channel audio signal reconstruction unit according to an embodiment.
FIG. 4A is a block diagram of an audio encoding apparatus according to an embodiment.
FIG. 4B is a block diagram of an audio decoding apparatus according to an embodiment.
FIG. 5 illustrates an example of conversion between channel groups performed in an audio processing system according to an embodiment.
FIG. 6 is a block diagram of an audio encoding apparatus according to an embodiment.
FIG. 7A illustrates an example of a conversion rule between channel groups performed in an audio encoding apparatus according to an embodiment.
FIG. 7B illustrates an example of a conversion rule between channel groups performed in an audio encoding apparatus according to an embodiment.
FIG. 8A is a diagram for describing downsampling of a side channel audio signal performed by an audio encoding apparatus according to an embodiment.
FIG. 8B illustrates an operation of a downsampling unit of an audio encoding apparatus according to an embodiment.
FIG. 9 is a block diagram of an audio decoding apparatus according to an embodiment.
FIG. 10A illustrates an operation of a sound image reconstruction unit of an audio decoding apparatus according to an embodiment.
FIG. 10B illustrates operations of an upsampling unit and a refinement unit of an audio decoding apparatus according to an embodiment.
FIG. 10C illustrates examples of operations of a refinement unit of an audio decoding apparatus according to an embodiment.
FIG. 11 is a flowchart of an audio signal encoding method of an audio encoding apparatus according to an embodiment.
FIG. 12 is a flowchart of an audio signal decoding method of an audio decoding apparatus according to an embodiment.
FIG. 13 illustrates an example of conversion between channel groups performed based on sound image characteristic analysis in an audio processing system according to an embodiment.
FIG. 14 illustrates an example in which an audio processing system downsamples an audio signal of a side channel based on channel characteristics, according to an embodiment.
FIG. 15 illustrates embodiments to which an audio processing method according to an embodiment may be applied.

Hereinafter, embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily implement them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present disclosure clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Some embodiments of the present disclosure may be represented by functional block configurations and various processing steps. Some or all of these functional blocks may be implemented by any number of hardware and/or software components that perform specific functions. For example, the functional blocks of the present disclosure may be implemented by one or more microprocessors, or by circuit configurations for a given function. The functional blocks of the present disclosure may also be implemented in various programming or scripting languages, and may be implemented as algorithms running on one or more processors. In addition, the present disclosure may employ conventional techniques for electronic configuration, signal processing, and/or data processing.

In addition, the connecting lines or connecting members between the components shown in the drawings merely illustrate functional connections and/or physical or circuit connections. In an actual device, connections between components may be represented by various functional, physical, or circuit connections that are replaceable or added.

In the present disclosure, a 'deep neural network (DNN)' is a representative example of an artificial neural network model that simulates the nerves of the brain, and is not limited to an artificial neural network model using a specific algorithm.

In the present disclosure, a 'multi-channel audio signal' may mean an audio signal of n channels (where n is an integer greater than 2). A 'mono channel audio signal' or a 'stereo channel audio signal', as distinguished from a 'multi-channel audio signal', may be a one-dimensional or two-dimensional audio signal. A 'multi-channel audio signal' is preferably a three-dimensional audio signal, but is not limited thereto and may include a two-dimensional audio signal.

In the present disclosure, a 'channel layout', 'speaker layout', or 'speaker channel layout' may denote a combination of at least one channel, and may specify the spatial arrangement of the channels or of the speakers through which the audio signals are output. Because a channel here is one through which an audio signal is actually output, it may be referred to as a presentation channel.

For example, a given channel layout may be identified by the name "X.Y.Z channel layout", where X is the number of surround channels, Y is the number of subwoofer channels, and Z is the number of height channels. A given channel layout may specify the spatial position of each surround channel, subwoofer channel, and height channel.
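As an illustrative, non-limiting sketch of the "X.Y.Z" naming convention described above, the following Python snippet splits a layout name into its surround, subwoofer, and height channel counts. The names `ChannelLayout` and `parse_layout` are hypothetical helpers introduced only for this example and do not appear in the disclosure.

```python
from typing import NamedTuple


class ChannelLayout(NamedTuple):
    """Channel counts encoded in an "X.Y.Z" layout name (hypothetical helper)."""
    surround: int   # X: number of surround channels
    subwoofer: int  # Y: number of subwoofer channels
    height: int     # Z: number of height channels

    @property
    def total(self) -> int:
        # Total number of presentation channels in the layout.
        return self.surround + self.subwoofer + self.height


def parse_layout(name: str) -> ChannelLayout:
    """Parse a name such as "7.1.4" into its three channel counts."""
    parts = name.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a name of the form 'X.Y.Z', got {name!r}")
    x, y, z = (int(p) for p in parts)
    return ChannelLayout(x, y, z)
```

Under this sketch, `parse_layout("3.1.2").total` evaluates to 6 and `parse_layout("7.1.4").total` to 12.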

Examples of a 'channel layout' include the 1.0.0 channel (or mono channel) layout, 2.0.0 channel (or stereo channel) layout, 3.1.2 channel layout, 3.1.4 channel layout, 5.1.0 channel layout, 5.1.2 channel layout, 5.1.4 channel layout, 7.1.0 channel layout, 7.1.2 channel layout, and 7.1.4 channel layout. The channel layout is not limited to these examples, and various other channel layouts are possible.

A 'channel layout' may also be referred to as a 'channel group'. The channels constituting a 'channel layout' may be named in various ways, but the present disclosure uses a single naming scheme for convenience of description. The channels constituting a 'channel layout' may be named based on the spatial position of each channel.

For example, the first surround channel of the 1.0.0 channel layout may be named the mono channel. The first surround channel of the 2.0.0 channel layout may be named the L2 channel, and the second surround channel may be named the R2 channel. Here, "L" indicates a channel located to the left of the listener, "R" indicates a channel located to the right of the listener, and "2" indicates that the number of surround channels is two.

In the 3.1.2 channel layout, the first surround channel may be named the L3 channel, the second surround channel the R3 channel, and the third surround channel the C channel. The first subwoofer channel of the 3.1.2 channel layout may be named the LFE channel. The first height channel of the 3.1.2 channel layout may be named the Hfl3 channel (or Tl channel), and the second height channel the Hfr3 channel (or Tr channel).

In the 5.1.0 channel layout, the first surround channel may be named the L5 channel, the second the R5 channel, the third the C channel, the fourth the Ls5 channel, and the fifth the Rs5 channel. Here, "C" indicates a channel located at the center with respect to the listener, and "s" indicates a channel located to the side. The first subwoofer channel of the 5.1.0 channel layout may be named the LFE channel. Here, LFE may stand for low frequency effect; that is, the LFE channel may be a channel for outputting low-frequency sound effects.

The surround channels of the 5.1.2 and 5.1.4 channel layouts may have the same names as the surround channels of the 5.1.0 channel layout. Likewise, the subwoofer channel of the 5.1.2 and 5.1.4 channel layouts may have the same name as the subwoofer channel of the 5.1.0 channel layout.

The first height channel of the 5.1.2 channel layout may be named the Hl5 channel, where "H" denotes a height channel. The second height channel may be named the Hr5 channel.

Meanwhile, in the 5.1.4 channel layout, the first height channel may be named the Hfl channel, the second the Hfr channel, the third the Hbl channel, and the fourth the Hbr channel. Here, "f" indicates a front channel and "b" indicates a rear channel with respect to the listener.

In the 7.1.0 channel layout, the first surround channel may be named the L channel, the second the R channel, the third the C channel, the fourth the Ls channel, the fifth the Rs channel, the sixth the Lb channel, and the seventh the Rb channel.

The surround channels of the 7.1.2 and 7.1.4 channel layouts may have the same names as the surround channels of the 7.1.0 channel layout. Likewise, the subwoofer channel of the 7.1.2 and 7.1.4 channel layouts may have the same name as the subwoofer channel of the 7.1.0 channel layout. In the 7.1.2 channel layout, the first height channel may be named the Hl7 channel, and the second height channel the Hr7 channel.

In the 7.1.4 channel layout, the first height channel may be named the Hfl channel, the second the Hfr channel, the third the Hbl channel, and the fourth the Hbr channel.

Here, some channels are named differently depending on the channel layout but may denote the same channel. For example, the Hl5 channel and the Hl7 channel may be the same channel. Likewise, the Hr5 channel and the Hr7 channel may be the same channel.

The channel names are not limited to those described above, and various other names may be used. For example, the L2 channel may be named the L'' channel; the R2 channel, the R'' channel; the L3 channel, the ML3 channel or L' channel; the R3 channel, the MR3 channel or R' channel; the Hfl3 channel, the MHL3 channel; the Hfr3 channel, the MHR3 channel; the Ls5 channel, the MSL5 channel or Ls' channel; the Rs5 channel, the MSR5 channel; the Hl5 channel, the MHL5 channel or Hl' channel; the Hr5 channel, the MHR5 channel or Hr' channel; and the C channel, the MC channel.

As described above, the at least one channel constituting each channel layout may be named as shown in [Table 1].

Channel layout    Channel names
1.0.0             Mono
2.0.0             L2/R2
3.1.2             L3/C/R3/Hfl3/Hfr3/LFE
5.1.0             L5/C/R5/Ls5/Rs5/LFE
5.1.2             L5/C/R5/Ls5/Rs5/Hl5/Hr5/LFE
5.1.4             L5/C/R5/Ls5/Rs5/Hfl/Hfr/Hbl/Hbr/LFE
7.1.0             L/C/R/Ls/Rs/Lb/Rb/LFE
7.1.2             L/C/R/Ls/Rs/Lb/Rb/Hl7/Hr7/LFE
7.1.4             L/C/R/Ls/Rs/Lb/Rb/Hfl/Hfr/Hbl/Hbr/LFE
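By way of a hedged example, the channel names of [Table 1] can be written as a lookup table and checked against the "X.Y.Z" layout names. The rule used here to classify a name (exactly "LFE" is a subwoofer channel, a leading "H" is a height channel, anything else is a surround channel) is an assumption inferred from the naming described above, and `CHANNEL_NAMES`, `count_by_kind`, and `layout_is_consistent` are hypothetical names.

```python
# Channel names per layout, transcribed from [Table 1].
CHANNEL_NAMES = {
    "1.0.0": ["Mono"],
    "2.0.0": ["L2", "R2"],
    "3.1.2": ["L3", "C", "R3", "Hfl3", "Hfr3", "LFE"],
    "5.1.0": ["L5", "C", "R5", "Ls5", "Rs5", "LFE"],
    "5.1.2": ["L5", "C", "R5", "Ls5", "Rs5", "Hl5", "Hr5", "LFE"],
    "5.1.4": ["L5", "C", "R5", "Ls5", "Rs5", "Hfl", "Hfr", "Hbl", "Hbr", "LFE"],
    "7.1.0": ["L", "C", "R", "Ls", "Rs", "Lb", "Rb", "LFE"],
    "7.1.2": ["L", "C", "R", "Ls", "Rs", "Lb", "Rb", "Hl7", "Hr7", "LFE"],
    "7.1.4": ["L", "C", "R", "Ls", "Rs", "Lb", "Rb",
              "Hfl", "Hfr", "Hbl", "Hbr", "LFE"],
}


def count_by_kind(names):
    """Return (surround, subwoofer, height) counts for a list of channel names.

    Assumed classification rule: "LFE" is the subwoofer channel, names that
    start with "H" are height channels, and all other names are surround.
    """
    subwoofer = sum(1 for n in names if n == "LFE")
    height = sum(1 for n in names if n.startswith("H"))
    surround = len(names) - subwoofer - height
    return (surround, subwoofer, height)


def layout_is_consistent(layout):
    """Check that the table row matches the X.Y.Z counts in the layout name."""
    expected = tuple(int(p) for p in layout.split("."))
    return count_by_kind(CHANNEL_NAMES[layout]) == expected
```

Every row of the table passes `layout_is_consistent` under this classification rule.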

Meanwhile, a 'transmission channel' is a channel for transmitting a compressed audio signal. Some 'transmission channels' may be identical to 'presentation channels', but this is not limiting, and others may be channels of audio signals in which the audio signals of presentation channels are mixed. That is, a 'transmission channel' carries the audio signals of 'presentation channels'; some transmission channels may be identical to presentation channels, while the rest may be mixed channels that differ from the presentation channels.

A 'transmission channel' may be named so as to distinguish it from a 'presentation channel'. For example, when the transmission channels are A/B channels, the A/B channels may carry audio signals corresponding to the L2/R2 channels. When the transmission channels are T/P/Q channels, the T/P/Q channels may carry audio signals corresponding to the C/LFE/Hfl3,Hfr3 channels. When the transmission channels are S/U/V channels, the S/U/V channels may carry audio signals corresponding to the L,R/Ls,Rs/Hfl,Hfr channels.

In the present disclosure, a 'three-dimensional audio signal' may mean an audio signal that allows a listener to perceive a sensation of height in the sound around the listener, thereby providing a deeper audio experience.

In the present disclosure, a 'listener-front-centered multi-channel audio signal' may mean an audio signal based on a channel layout in which the sound image is formed around the front of the listener (for example, a display device). When the listener-front-centered multi-channel audio signal is arranged around the screen of a display device located in front of the listener, it may also be referred to as a 'screen-centered audio signal' or a 'front three-dimensional (Front-3D) audio signal'.

In the present disclosure, a 'listener-centered multi-channel audio signal' may mean an audio signal based on a channel layout in which the sound image is formed around the listener. Because a listener-centered multi-channel audio signal is based on a channel layout arranged omnidirectionally around the listener, it may also be referred to as a 'full three-dimensional (Full-3D) audio signal'.

In the present disclosure, a 'base channel group' may mean a group including at least one 'base channel'. The audio signal of a 'base channel' may include an audio signal that can be decoded independently, without information on the audio signal of another channel (for example, a dependent channel), to form a given channel layout. For example, the audio signal of the 'base channel group' may be a mono channel audio signal, or may include the left channel audio signal and the right channel audio signal constituting a stereo audio signal.

In the present disclosure, a 'dependent channel group' may mean a group including at least one 'dependent channel'. The audio signal of a 'dependent channel' may include an audio signal that is mixed with the audio signal of a 'base channel' to form at least one channel of a given channel layout.

When an encoding apparatus according to an embodiment of the present disclosure encodes and transmits a multi-channel audio signal of a given channel layout, the encoding apparatus may mix the multi-channel audio signal to obtain the audio signal of the base channel group and the audio signal of the dependent channel group, and may compress and transmit the obtained signals. For example, when the base channel group includes the left channel and the right channel constituting a stereo channel, the dependent channel group may include, among the channels included in the given channel layout, the channels other than the two channels corresponding to the base channel group.

A decoding apparatus according to an embodiment of the present disclosure may decode the audio signal of the base channel group and the audio signal of the dependent channel group from a received bitstream, and may reconstruct the multi-channel audio signal of the given channel layout by mixing the audio signal of the base channel group with the audio signal of the dependent channel group.

In the present disclosure, 'side channel information' is information that the decoding apparatus uses to reconstruct, from the audio signals of a second channel layout, the audio signals of a first channel layout having a larger number of channels, and may mean an audio signal of at least one side channel included in the first channel layout. For example, the side channel may include, among the channels included in the first channel layout, a channel whose position has a low degree of association with the channels included in the second channel layout. The present disclosure is not limited to this example. For instance, a channel of the first channel layout that satisfies a predetermined criterion may be a side channel, or a channel designated by the producer of the audio signal may be a side channel.

Hereinafter, embodiments according to the technical idea of the present disclosure are described in detail in order.

For a display device such as a TV to reproduce immersive audio content, an audio codec that reproduces the sound image around the screen of the display device may be used. Depending on how it is installed, however, the display device may be used alone or together with additional speakers such as a sound bar. For example, when a plurality of home theater speakers are added to the speakers built into the display device, a method is needed for restoring audio content whose sound image was converted to be screen-centered into audio content having a listener-centered sound image.

FIG. 1A illustrates an example of an audio processing system in which the sound image is converted according to the environment in which audio content is consumed.

As shown in image 10 of FIG. 1A, a content creator may produce immersive audio content having a listener-centered sound image (for example, audio content of a 7.1.4 channel layout). The produced audio content may be converted into audio content having a screen-centered sound image and transmitted to a user. As shown in image 20, audio content having a screen-centered sound image (for example, audio content of a 3.1.2 channel layout) may be consumed through a display device such as a TV. In this case, as shown in image 30, for the audio content to be consumed in an environment that uses additional speakers besides the display device, the audio content having the screen-centered sound image needs to be restored to audio content having a listener-centered sound image (for example, audio content of a 7.1.4 channel layout).

To allow transmitted audio content to be converted to various channel layouts and output without degradation of sound quality, the audio signals for all channel layouts included in the audio content could be transmitted as they are. This approach, however, has low transmission efficiency and makes backward compatibility with legacy channel layouts, such as a mono channel or a stereo channel, difficult.

Accordingly, an audio encoding apparatus according to an embodiment of the present disclosure may divide a multi-channel audio signal into audio signals of a base channel group and audio signals of dependent channel groups, and may encode and transmit them, so that the signal is suited to a channel layout having a screen-centered sound image and is backward compatible.

FIG. 1B illustrates an example of a method by which an audio encoding apparatus and an audio decoding apparatus according to an embodiment divide a multi-channel audio signal into audio signals of a base channel group and audio signals of dependent channel groups and process them.

To transmit information on the audio signals 170 of the 3.1.2 channel layout, the audio encoding apparatus 200 according to an embodiment may transmit a compressed audio signal of the base channel group, generated by compressing a stereo channel audio signal, and a compressed audio signal of a first dependent channel group, generated by compressing the audio signals of some channels of the 3.1.2 channel layout. The audio encoding apparatus 200 may generate the compressed audio signal of the base channel group by compressing the L2 channel audio signal and the R2 channel audio signal 160 of the stereo channel layout. For example, the L2 channel audio signal may be the left channel signal of a stereo audio signal, and the R2 channel audio signal may be the right channel signal of the stereo audio signal.

The audio encoding apparatus 200 may generate the compressed audio signal of the first dependent channel group by compressing the audio signals of the Hfl3 channel, the Hfr3 channel, the LFE channel, and the C channel among the audio signals 170 of the 3.1.2 channel layout. The 3.1.2 channel layout may be a channel layout of six channels in which the sound image is formed around the front of the listener. In the 3.1.2 channel layout, the C channel may mean the center channel, the LFE channel the subwoofer channel, the Hfl3 channel the upper left channel, and the Hfr3 channel the upper right channel.

The audio encoding apparatus 200 may transmit the compressed audio signal of the base channel group and the compressed audio signal of the first dependent channel group to the audio decoding apparatus 300.

The audio decoding apparatus 300 according to an embodiment may reconstruct the audio signals of the 3.1.2 channel layout from the compressed audio signal of the base channel group and the compressed audio signal of the first dependent channel group.

First, the audio decoding apparatus 300 may decompress the compressed audio signal of the base channel group to obtain the L2 channel audio signal and the R2 channel audio signal. The audio decoding apparatus 300 may also decompress the compressed audio signal of the first dependent channel group to obtain the audio signals of the C, LFE, Hfl3, and Hfr3 channels.

As indicated by arrow (1) in FIG. 1B, the audio decoding apparatus 300 may reconstruct the L3 channel audio signal of the 3.1.2 channel layout by mixing the L2 channel audio signal and the C channel audio signal. As indicated by arrow (2), the audio decoding apparatus 300 may reconstruct the R3 channel audio signal of the 3.1.2 channel layout by mixing the R2 channel audio signal and the C channel audio signal. In the 3.1.2 channel layout, the L3 channel may mean the left channel, and the R3 channel may mean the right channel.
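The mixing indicated by arrows (1) and (2) can be illustrated with a minimal sketch. This passage does not give the mixing rule or its weights, so the example assumes, purely for illustration, that the encoder forms the stereo downmix as L2 = L3 + w·C and R2 = R3 + w·C, in which case the decoder can recover L3 = L2 − w·C and R3 = R2 − w·C. The weight `W_CENTER` and the function names are hypothetical.

```python
import math

# Hypothetical center-channel mixing weight; the disclosure does not state one here.
W_CENTER = 1 / math.sqrt(2)


def downmix_3_to_2(l3, r3, c, w=W_CENTER):
    """Encoder side (assumed rule): fold the C channel into L3/R3 to get L2/R2."""
    l2 = [a + w * b for a, b in zip(l3, c)]
    r2 = [a + w * b for a, b in zip(r3, c)]
    return l2, r2


def demix_2_to_3(l2, r2, c, w=W_CENTER):
    """Decoder side: mix L2/R2 with the decoded C channel to recover L3/R3."""
    l3 = [a - w * b for a, b in zip(l2, c)]
    r3 = [a - w * b for a, b in zip(r2, c)]
    return l3, r3


# Round trip on a few samples: the decoder output matches the original L3/R3
# up to floating-point error (lossy compression is ignored in this sketch).
l3 = [0.1, -0.2, 0.3]
r3 = [0.0, 0.5, -0.5]
c = [0.2, 0.2, -0.4]
l2, r2 = downmix_3_to_2(l3, r3, c)
l3_rec, r3_rec = demix_2_to_3(l2, r2, c)
```

A legacy stereo decoder would play `l2` and `r2` directly, which is how this kind of layering supports backward compatibility.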

Meanwhile, to transmit information on the audio signals 180 of the 5.1.2 channel layout, the audio encoding apparatus 200 according to an embodiment may, in addition to the compressed audio signal of the base channel group and the compressed audio signal of the first dependent channel group, generate a compressed audio signal of a second dependent channel group by compressing the audio signals of the L5 channel and the R5 channel among the audio signals 180 of the 5.1.2 channel layout. The 5.1.2 channel layout may be a channel layout of eight channels in which the sound image is formed around the front of the listener. In the 5.1.2 channel layout, the L5 channel may mean the front left channel, and the R5 channel may mean the front right channel.

The audio encoding apparatus 200 may transmit the compressed audio signal of the base channel group, the compressed audio signal of the first dependent channel group, and the compressed audio signal of the second dependent channel group to the audio decoding apparatus 300.

The audio decoding apparatus 300 according to an embodiment may reconstruct the audio signals of the 5.1.2 channel layout from the compressed audio signal of the base channel group, the compressed audio signal of the first dependent channel group, and the compressed audio signal of the second dependent channel group.

First, the audio decoding apparatus 300 may reconstruct the audio signals 170 of the 3.1.2 channel layout from the compressed audio signal of the base channel group and the compressed audio signal of the first dependent channel group.

Next, the audio decoding apparatus 300 may decompress the compressed audio signal of the second dependent channel group to obtain the audio signals of the L5 channel and the R5 channel.

As indicated by arrow (3) in FIG. 1B, the audio decoding apparatus 300 may reconstruct the Ls5 channel audio signal of the 5.1.2 channel layout by mixing the L3 channel audio signal and the L5 channel audio signal. As indicated by arrow (4), the audio decoding apparatus 300 may reconstruct the Rs5 channel audio signal of the 5.1.2 channel layout by mixing the R3 channel audio signal and the R5 channel audio signal. In the 5.1.2 channel layout, the Ls5 channel may mean the left channel, and the Rs5 channel may mean the right channel. As indicated by arrow (5), the audio decoding apparatus 300 may reconstruct the Hl5 channel audio signal of the 5.1.2 channel layout by mixing the Hfl3 channel audio signal and the Ls5 channel audio signal. As indicated by arrow (6), the audio decoding apparatus 300 may reconstruct the Hr5 channel audio signal of the 5.1.2 channel layout by mixing the Hfr3 channel audio signal and the Rs5 channel audio signal. In the 5.1.2 channel layout, the Hl5 channel may mean the front upper left channel, and the Hr5 channel may mean the front upper right channel.

Meanwhile, to transmit information on the audio signals 190 of the 7.1.4 channel layout, the audio encoding apparatus 200 according to an embodiment may, in addition to the compressed audio signal of the base channel group, the compressed audio signal of the first dependent channel group, and the compressed audio signal of the second dependent channel group, generate a compressed audio signal of a third dependent channel group by compressing the audio signals of the Hfl channel, the Hfr channel, the Ls channel, and the Rs channel among the audio signals 190 of the 7.1.4 channel layout. The 7.1.4 channel layout may be a channel layout of twelve channels in which the sound image is formed around the listener. In the 7.1.4 channel layout, the Hfl channel may mean the front upper left channel, the Hfr channel the front upper right channel, the Ls channel the left channel, and the Rs channel the front right channel.

The audio encoding apparatus 200 may transmit the compressed audio signal of the base channel group, the compressed audio signal of the first dependent channel group, the compressed audio signal of the second dependent channel group, and the compressed audio signal of the third dependent channel group to the audio decoding apparatus 300.
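The layered grouping described with reference to FIG. 1B may be summarized in a short, non-limiting sketch. The group contents below follow the text; the identifiers `CHANNEL_GROUPS`, `GROUPS_NEEDED`, and `transmitted_channels` are hypothetical names chosen for this example.

```python
# Channel groups in transmission order, as described for FIG. 1B.
CHANNEL_GROUPS = [
    ("base", ["L2", "R2"]),                      # stereo, decodable on its own
    ("dependent_1", ["C", "LFE", "Hfl3", "Hfr3"]),
    ("dependent_2", ["L5", "R5"]),
    ("dependent_3", ["Hfl", "Hfr", "Ls", "Rs"]),
]

# Number of leading groups a decoder needs for each target layout.
GROUPS_NEEDED = {"2.0.0": 1, "3.1.2": 2, "5.1.2": 3, "7.1.4": 4}


def transmitted_channels(layout):
    """List the transmitted channels a decoder must decompress for a layout."""
    n = GROUPS_NEEDED[layout]
    return [ch for _, chans in CHANNEL_GROUPS[:n] for ch in chans]
```

In this sketch the number of transmitted channels equals the channel count of the target layout (6 for 3.1.2, 8 for 5.1.2, and 12 for 7.1.4), so adding a layer sends only the channels that the lower layers do not already cover.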

The audio decoding apparatus 300 according to an embodiment may reconstruct the audio signals of the 7.1.4 channel layout from the compressed audio signal of the base channel group, the compressed audio signal of the first dependent channel group, the compressed audio signal of the second dependent channel group, and the compressed audio signal of the third dependent channel group.

First, the audio decoding apparatus 300 may reconstruct the audio signals 180 of the 5.1.2 channel layout from the compressed audio signal of the base channel group, the compressed audio signal of the first dependent channel group, and the compressed audio signal of the second dependent channel group.

다음으로, 오디오 복호화 장치(300)는 제3 종속 채널 그룹의 압축 오디오 신호를 압축 해제하여, Hfl 채널, Hfr 채널, Ls 채널, 및 Rs 채널의 오디오 신호들을 획득할 수 있다.Next, the audio decoding apparatus 300 may decompress the compressed audio signal of the third subordinate channel group to obtain audio signals of the Hfl channel, the Hfr channel, the Ls channel, and the Rs channel.

As indicated by arrow (7) in FIG. 1B, the audio decoding apparatus 300 may reconstruct the Lb channel audio signal of the 7.1.4 channel layout by mixing the Ls5 channel audio signal and the Ls channel audio signal. As indicated by arrow (8), the audio decoding apparatus 300 may reconstruct the Rb channel audio signal of the 7.1.4 channel layout by mixing the Rs5 channel audio signal and the Rs channel audio signal. In the 7.1.4 channel layout, the Lb channel may denote the rear left channel and the Rb channel may denote the rear right channel.

As indicated by arrow (9), the audio decoding apparatus 300 may reconstruct the Hbl channel audio signal of the 7.1.4 channel layout by mixing the Hfl channel audio signal and the Hl5 channel audio signal. As indicated by arrow (10), the audio decoding apparatus 300 may reconstruct the Hbr channel audio signal of the 7.1.4 channel layout by mixing the Hfr channel audio signal and the Hr5 channel audio signal. In the 7.1.4 channel layout, the Hbl channel may denote the rear upper left channel and the Hbr channel may denote the rear upper right channel.
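The reconstruction steps indicated by arrows (3) through (10) can be read as a dependency table: an output channel becomes available once both of its input channels are available. The following Python sketch only tracks which channels can be reconstructed and performs no actual mixing. The names `RECON_STEPS` and `reconstructable` are hypothetical and introduced here for illustration, and the L3, R3, Hfl3, and Hfr3 signals are assumed to be already available from earlier steps.

```python
# Each entry: (arrow number in FIG. 1B, input channels, reconstructed channel).
RECON_STEPS = [
    (3, ("L3", "L5"), "Ls5"),
    (4, ("R3", "R5"), "Rs5"),
    (5, ("Hfl3", "Ls5"), "Hl5"),
    (6, ("Hfr3", "Rs5"), "Hr5"),
    (7, ("Ls5", "Ls"), "Lb"),
    (8, ("Rs5", "Rs"), "Rb"),
    (9, ("Hfl", "Hl5"), "Hbl"),
    (10, ("Hfr", "Hr5"), "Hbr"),
]


def reconstructable(available):
    """Return the channels that can be reconstructed, in the order they become available."""
    have = set(available)
    produced = []
    changed = True
    while changed:
        changed = False
        for _arrow, inputs, output in RECON_STEPS:
            if output not in have and all(ch in have for ch in inputs):
                have.add(output)
                produced.append(output)
                changed = True
    return produced


# Without the third dependent channel group, only the 5.1.2 channels are reachable.
print(reconstructable(["L3", "R3", "Hfl3", "Hfr3", "L5", "R5"]))
```

Adding the Ls, Rs, Hfl, and Hfr channels of the third dependent channel group to the input makes the Lb, Rb, Hbl, and Hbr channels of the 7.1.4 channel layout reachable as well.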

As described above, the audio decoding apparatus 300 according to an embodiment may extend the reconstructed output multi-channel audio signal from the stereo channel layout to the 3.1.2 channel layout, the 5.1.2 channel layout, or the 7.1.4 channel layout. However, the present disclosure is not limited to the example illustrated in FIG. 1B, and the audio signals processed by the audio encoding apparatus 200 and the audio decoding apparatus 300 may be implemented so as to be extensible to various channel layouts other than the stereo channel layout, the 3.1.2 channel layout, the 5.1.2 channel layout, and the 7.1.4 channel layout.

이하에서는, 화면 중심의 음상을 갖는 채널 레이아웃에 적합하고 하위 호환이 가능하도록 다채널 오디오 신호를 처리하는, 일 실시 예에 따른 오디오 부호화 장치(200)에 대하여 구체적으로 설명한다.Hereinafter, the audio encoding apparatus 200 according to an embodiment, which processes a multi-channel audio signal so as to be suitable for a channel layout having a screen-centered sound image and to be backward compatible, will be described in detail.

도 2a는 일 실시 예에 따른 오디오 부호화 장치의 구성을 도시하는 블록도이다.2A is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.

오디오 부호화 장치(200)는 메모리(210) 및 프로세서(230)를 포함한다. 오디오 부호화 장치(200)는 서버, TV, 카메라, 휴대폰, 컴퓨터, 디지털 방송용 단말, 태블릿 PC, 노트북 등 오디오 처리가 가능한 기기로 구현될 수 있다.The audio encoding apparatus 200 includes a memory 210 and a processor 230 . The audio encoding apparatus 200 may be implemented as a device capable of audio processing, such as a server, a TV, a camera, a mobile phone, a computer, a digital broadcasting terminal, a tablet PC, and a notebook computer.

도 2a에는 메모리(210) 및 프로세서(230)가 개별적으로 도시되어 있으나, 메모리(210) 및 프로세서(230)는 하나의 하드웨어 모듈(예를 들어, 칩)을 통해 구현될 수 있다. Although the memory 210 and the processor 230 are separately illustrated in FIG. 2A , the memory 210 and the processor 230 may be implemented through one hardware module (eg, a chip).

프로세서(230)는 신경망 기반의 오디오 처리를 위한 전용 프로세서로 구현될 수 있다. 또는, 프로세서(230)는 AP(application processor), CPU(central processing unit) 또는 GPU(graphic processing unit)와 같은 범용 프로세서와 소프트웨어의 조합을 통해 구현될 수도 있다. 전용 프로세서의 경우, 본 개시의 실시 예를 구현하기 위한 메모리를 포함하거나, 외부 메모리를 이용하기 위한 메모리 처리부를 포함할 수 있다.The processor 230 may be implemented as a dedicated processor for neural network-based audio processing. Alternatively, the processor 230 may be implemented through a combination of a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphic processing unit (GPU) and software. The dedicated processor may include a memory for implementing an embodiment of the present disclosure or a memory processing unit for using an external memory.

프로세서(230)는 복수의 프로세서들로 구성될 수도 있다. 이 경우, 전용 프로세서들의 조합으로 구현될 수도 있고, AP(Application Processor), CPU(Central Processing Unit) 또는 GPU(Graphics Processing Unit)와 같은 다수의 범용 프로세서들과 소프트웨어의 조합을 통해 구현될 수도 있다.The processor 230 may include a plurality of processors. In this case, it may be implemented as a combination of dedicated processors, or may be implemented through a combination of software and a plurality of general-purpose processors such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU).

The memory 210 may store one or more instructions for audio processing. In an embodiment, the memory 210 may store a neural network. If the neural network is implemented as a dedicated hardware chip for artificial intelligence, or as part of an existing general-purpose processor (e.g., a CPU or an application processor) or a dedicated graphics processor (e.g., a GPU), the neural network may not be stored in the memory 210. The neural network may also be implemented by an external device (e.g., a server), in which case the audio encoding apparatus 200 may request result information based on the neural network from the external device and receive that result information from the external device.

The processor 230 may sequentially process consecutive audio frames included in the original audio signal according to the instructions stored in the memory 210 and output a bitstream including a compressed audio signal. In this case, the compressed audio signal may be an audio signal having a number of channels fewer than or equal to that of the original audio signal.

비트스트림은 기본 채널 그룹의 압축 오디오 신호 및 적어도 하나의 종속 채널 그룹의 압축 오디오 신호를 포함할 수 있다. 프로세서(230)는, 전송하고자 하는 채널들의 개수에 따라, 비트스트림에 포함되는 종속 채널 그룹의 개수를 변경할 수 있다.The bitstream may include a compressed audio signal of a base channel group and a compressed audio signal of at least one dependent channel group. The processor 230 may change the number of subordinate channel groups included in the bitstream according to the number of channels to be transmitted.

도 2b는 일 실시 예에 따른 오디오 부호화 장치의 구성을 도시하는 블록도이다.2B is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.

도 2b를 참조하면, 오디오 부호화 장치(200)는 다채널 오디오 부호화부(250), 비트스트림 생성부(280) 및 부가 정보 생성부(285)를 포함할 수 있다.Referring to FIG. 2B , the audio encoding apparatus 200 may include a multi-channel audio encoder 250 , a bitstream generator 280 , and an additional information generator 285 .

The audio encoding apparatus 200 may include the memory 210 and the processor 230 of FIG. 2A, and instructions for implementing the components 250, 260, 270, 280, and 285 of FIG. 2B may be stored in the memory 210 of FIG. 2A. The processor 230 may execute the instructions stored in the memory 210.

The multi-channel audio encoder 250 of the audio encoding apparatus 200 according to an embodiment may process the original audio signal to obtain the compressed audio signal of the base channel group, the compressed audio signal of at least one dependent channel group, and the additional information. The multi-channel audio encoder 250 may include a multi-channel audio signal processing unit 260 and a compression unit 270.

다채널 오디오 신호 처리부(260)는 원본 오디오 신호로부터 기본 채널 그룹의 적어도 하나의 오디오 신호 및 적어도 하나의 종속 채널 그룹의 적어도 하나의 오디오 신호를 획득할 수 있다. 예를 들어, 기본 채널 그룹의 오디오 신호는 모노 채널 오디오 신호 또는 스테레오 채널 레이아웃의 오디오 신호일 수 있다. 적어도 하나의 종속 채널 그룹의 오디오 신호는, 원본 오디오 신호에 포함되는 다채널 오디오 신호들 중에서, 기본 채널 그룹에 대응하는 적어도 하나의 채널을 제외한 나머지 채널들 중에서 적어도 하나의 오디오 신호를 포함할 수 있다.The multi-channel audio signal processing unit 260 may obtain at least one audio signal of a basic channel group and at least one audio signal of at least one subordinate channel group from the original audio signal. For example, the audio signal of the basic channel group may be a mono channel audio signal or an audio signal of a stereo channel layout. The audio signal of at least one subordinate channel group may include at least one audio signal among the remaining channels except for at least one channel corresponding to the basic channel group among multi-channel audio signals included in the original audio signal. .

As an example, when the original audio signal consists of audio signals of the 7.1.4 channel layout, the multi-channel audio signal processing unit 260 may obtain, from the audio signals of the 7.1.4 channel layout, audio signals of the stereo channel layout as the audio signals of the base channel group.

다채널 오디오 신호 처리부(260)는, 복호화 단에서 3.1.2 채널 레이아웃의 오디오 신호들을 복원하기 위해 이용되는, 제1 종속 채널 그룹의 오디오 신호들을 획득할 수 있다. 다채널 오디오 신호 처리부(260)는, 3.1.2 채널 레이아웃을 구성하는 채널들 중에서 기본 채널 그룹의 채널들에 대응하는 두 채널들 이외의 채널들을 제1 종속 채널 그룹으로서 결정할 수 있다. 다채널 오디오 신호 처리부(260)는, 3.1.2 채널 레이아웃의 오디오 신호들 중에서 제1 종속 채널 그룹의 오디오 신호들을 획득할 수 있다.The multi-channel audio signal processing unit 260 may obtain audio signals of the first subordinate channel group, which are used to reconstruct the audio signals of the 3.1.2 channel layout in the decoding stage. The multi-channel audio signal processing unit 260 may determine channels other than two channels corresponding to the channels of the basic channel group as the first subordinate channel group among the channels constituting the 3.1.2 channel layout. The multi-channel audio signal processing unit 260 may obtain audio signals of the first subordinate channel group from among the audio signals of the 3.1.2 channel layout.

The multi-channel audio signal processing unit 260 may obtain the audio signals of the second dependent channel group, which are used at the decoding end to reconstruct the audio signals of the 5.1.2 channel layout. The multi-channel audio signal processing unit 260 may determine, as the second dependent channel group, the channels of the 5.1.2 channel layout other than the six channels corresponding to the channels of the base channel group and the first dependent channel group. The multi-channel audio signal processing unit 260 may obtain the audio signals of the second dependent channel group from among the audio signals of the 5.1.2 channel layout.

The multi-channel audio signal processing unit 260 may obtain the audio signals of the third dependent channel group, which are used at the decoding end to reconstruct the audio signals of the 7.1.4 channel layout. The multi-channel audio signal processing unit 260 may determine, as the third dependent channel group, the channels of the 7.1.4 channel layout other than the eight channels corresponding to the channels of the base channel group, the first dependent channel group, and the second dependent channel group. The multi-channel audio signal processing unit 260 may obtain the audio signals of the third dependent channel group from among the audio signals of the 7.1.4 channel layout.
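The channel counts in the preceding paragraphs (two, six, and eight channels already covered) follow directly from the layout names. The sketch below, in Python, derives the size of each dependent channel group as the difference between consecutive layouts. The function names and the "2.0.0" notation for the stereo layout are assumptions made for this illustration and do not appear in the disclosure.

```python
def channel_count(layout):
    """Total number of channels in an 'S.W.H' layout string (surround.subwoofer.height)."""
    return sum(int(part) for part in layout.split("."))


def dependent_group_sizes(layouts):
    """Number of channels each successive layout adds to the one before it."""
    counts = [channel_count(layout) for layout in layouts]
    return [higher - lower for lower, higher in zip(counts, counts[1:])]


# Stereo base group, then the 3.1.2, 5.1.2, and 7.1.4 channel layouts.
hierarchy = ["2.0.0", "3.1.2", "5.1.2", "7.1.4"]
print([channel_count(layout) for layout in hierarchy])  # channels per layout
print(dependent_group_sizes(hierarchy))                 # channels per dependent group
```

Under these assumptions the third dependent channel group has four channels, which agrees with the Hfl, Hfr, Ls, and Rs channels named earlier.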

The audio signal of the base channel group may be a mono or stereo signal. Alternatively, the audio signal of the base channel group may include an audio signal of a first channel generated by mixing the audio signal L of the left stereo channel with C_1. Here, C_1 may be the audio signal of the center channel in front of the listener, decompressed after compression. In an audio signal name ("X_Y"), "X" may denote the name of the channel, and "Y" may indicate that the signal has been decoded, upmixed, had a factor for error removal applied (i.e., scaled), or had a gain applied. For example, a decoded signal may be denoted "X_1", and a signal generated by upmixing the decoded signal may be denoted "X_2". A signal obtained by applying a gain to the decoded signal may also be denoted "X_2". A signal obtained by applying a factor for error removal to the upmixed signal (i.e., a scaled signal) may be denoted "X_3".

또한, 기본 채널 그룹의 오디오 신호는 우측 스테레오 채널의 오디오 신호 R과 C_1를 믹싱하여 생성된 제 2 채널의 오디오 신호를 포함할 수 있다.Also, the audio signal of the basic channel group may include the audio signal of the second channel generated by mixing the audio signals R and C_1 of the right stereo channel.

The compression unit 270 may obtain the compressed audio signal of the base channel group by compressing at least one audio signal of the base channel group, and may obtain the compressed audio signal of at least one dependent channel group by compressing the audio signals of the at least one dependent channel group. The compression unit 270 may compress the audio signals through processing such as frequency transformation, quantization, and entropy coding. For example, an audio signal compression method such as the AAC standard or the OPUS standard may be used.

부가 정보 생성부(285)는 원본 오디오 신호, 기본 채널 그룹의 압축 오디오 신호 및 종속 채널 그룹의 압축 오디오 신호 중 적어도 하나를 기초로, 부가 정보를 생성할 수 있다. 부가 정보는, 복호화 단에서 다채널 오디오 신호를 복원하기 위해 이용되는 다양한 정보를 포함할 수 있다. The additional information generator 285 may generate additional information based on at least one of an original audio signal, a compressed audio signal of a basic channel group, and a compressed audio signal of a subordinate channel group. The additional information may include various types of information used for reconstructing a multi-channel audio signal in a decoding end.

For example, the additional information may include an audio object signal indicating at least one of the audio signal, position, and direction of an audio object (sound source). Alternatively, the additional information may include information on the total number of audio streams, including the base channel audio stream and the auxiliary channel audio stream. The additional information may also include downmix gain information, channel mapping table information, loudness information, low frequency effect gain (LFE gain) information, dynamic range control (DRC) information, and channel layout rendering information. In addition, the additional information may include information on the number of coupled audio streams, information indicating the multi-channel layout, information on whether dialogue is present in the audio signal and on the dialogue level, information indicating whether a low-frequency effect is output, information on whether an audio object is present on the screen, information on whether a continuous channel audio signal is present, and information on whether a discontinuous channel audio signal is present.

부가 정보는 다채널 오디오 신호를 복원하기 위한, 디믹싱 행렬의 적어도 하나의 디믹싱 가중치 파라미터를 포함하는 디믹싱에 관한 정보를 포함할 수 있다. 디믹싱과 (다운)믹싱은 서로 대응되는 동작이므로, 디믹싱에 관한 정보는 (다운)믹싱에 관한 정보에 대응되고, 디믹싱에 관한 정보는 (다운)믹싱에 관한 정보를 포함할 수 있다. 예를 들어, 디믹싱에 관한 정보는 (다운)믹싱 행렬의 적어도 하나의 (다운)믹싱 가중치 파라미터를 포함할 수 있다. (다운)믹싱 가중치 파라미터를 기초로, 디믹싱 가중치 파라미터가 획득될 수 있다.The additional information may include information about demixing including at least one demixing weight parameter of a demixing matrix for reconstructing a multi-channel audio signal. Since demixing and (down)mixing correspond to each other, information on demixing may correspond to information on (down)mixing, and information on demixing may include information on (down)mixing. For example, the information on demixing may include at least one (down)mixing weight parameter of a (down)mixing matrix. Based on the (down)mixing weight parameter, a demixing weight parameter may be obtained.
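To illustrate how a demixing weight can follow from a (down)mixing weight, the Python sketch below assumes a simple two-term downmix of the form `mixed = known + w * unknown`. This form, the weight value, and the function name `demix` are assumptions for illustration only; the disclosure does not specify the mixing equations at this point. Under that assumption the demixing step is the algebraic inverse of the downmix.

```python
def demix(mixed, known, weight):
    """Recover x from mixed = known + weight * x, sample by sample."""
    if weight == 0:
        raise ValueError("a zero mixing weight cannot be inverted")
    return [(m - k) / weight for m, k in zip(mixed, known)]


# Hypothetical sample values and mixing weight.
w = 0.5
known = [0.5, -0.25, 0.0]
unknown = [0.25, 0.5, -0.5]

# Encoder side: downmix with the assumed weight.
mixed = [k + w * u for k, u in zip(known, unknown)]

# Decoder side: invert the downmix using the same weight.
restored = demix(mixed, known, w)
print(restored)
```

In this sketch the decoder needs only the (down)mixing weight `w` to derive the demixing operation, which is one way to read the statement that demixing information may include (down)mixing information.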

부가 정보는 전술한 정보들의 다양한 조합일 수 있다. 즉, 부가 정보는 전술한 적어도 하나의 정보를 포함할 수 있다.The additional information may be various combinations of the above-mentioned information. That is, the additional information may include the above-described at least one piece of information.

In addition, when an audio signal of a dependent channel corresponding to at least one audio signal of the base channel group exists, the additional information generator 285 may generate information indicating that the audio signal of the corresponding dependent channel exists.

비트스트림 생성부(280)는 기본 채널 그룹의 압축 오디오 신호 및 종속 채널 그룹의 압축 오디오 신호를 포함하는 비트스트림을 생성할 수 있다. 비트스트림 생성부(280)는 부가 정보 생성부(285)에서 생성된 부가 정보를 더 포함하는 비트스트림을 생성할 수 있다. The bitstream generator 280 may generate a bitstream including the compressed audio signal of the basic channel group and the compressed audio signal of the dependent channel group. The bitstream generator 280 may generate a bitstream further including the additional information generated by the additional information generator 285 .

예를 들어, 비트스트림 생성부(280)는, 기본 채널 그룹의 압축 오디오 신호가 기본 채널 오디오 스트림(Base Channel Audio Stream)에 포함되고, 종속 채널 그룹의 압축 오디오 신호가 종속 채널 오디오 스트림(Dependent Channel Audio Stream)에 포함되고, 부가 정보가 메타 데이터에 포함되도록 캡슐화(encapsulation)를 수행함으로써 비트스트림을 생성할 수 있다. 비트스트림 생성부(280)는 기본 채널 오디오 스트림 및 복수의 종속 채널 오디오 스트림들을 포함하는 비트스트림을 생성할 수 있다. For example, the bitstream generator 280 may include a compressed audio signal of a base channel group in a base channel audio stream, and a compressed audio signal of a dependent channel group in a dependent channel audio stream. Audio Stream) and encapsulation is performed so that additional information is included in metadata to generate a bitstream. The bitstream generator 280 may generate a bitstream including a base channel audio stream and a plurality of dependent channel audio streams.
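The encapsulation described above can be pictured as a container with three parts. The following Python sketch is a minimal illustration under that reading; the class `Bitstream`, the function `encapsulate`, and the metadata key `downmix_gain` are hypothetical names, and the actual bitstream syntax is not defined in this passage.

```python
from dataclasses import dataclass, field


@dataclass
class Bitstream:
    """Illustrative container: one base stream, any number of dependent streams, metadata."""
    base_stream: bytes
    dependent_streams: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def encapsulate(base, dependents, additional_info):
    """Place compressed signals and additional information into one Bitstream object."""
    return Bitstream(
        base_stream=base,
        dependent_streams=list(dependents),
        metadata=dict(additional_info),
    )


# Placeholder byte strings stand in for compressed audio payloads.
bs = encapsulate(b"base", [b"dep1", b"dep2"], {"downmix_gain": 0.5})
print(len(bs.dependent_streams))
```

A decoder that supports only the base channel group could read `base_stream` and ignore the remaining fields, which is the backward-compatible behavior the layered structure is meant to allow.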

일 실시 예에 따른 오디오 복호화 장치(300)가, 비트스트림으로부터 복원하는 다채널 오디오 신호의 채널 레이아웃들은 다음의 규칙을 따를 수 있다. The channel layouts of the multi-channel audio signal restored from the bitstream by the audio decoding apparatus 300 according to an embodiment may follow the following rule.

For example, the first channel layout of the multi-channel audio signal that the audio decoding apparatus 300 reconstructs from the compressed audio signal of the base channel group and the compressed audio signal of the first dependent channel group may consist of S_{n-1} surround channels, W_{n-1} subwoofer channels, and H_{n-1} height channels. The second channel layout of the multi-channel audio signal that the audio decoding apparatus 300 reconstructs from the compressed audio signal of the base channel group, the compressed audio signal of the first dependent channel group, and the compressed audio signal of the second dependent channel group may consist of S_n surround channels, W_n subwoofer channels, and H_n height channels.

오디오 복호화 장치(300)가 기본 채널 그룹 및 제1 종속 채널 그룹을 고려하여 복원한 다채널 오디오 신호의 제1 채널 레이아웃과 비교하여, 오디오 복호화 장치(300)가 제2 종속 채널 그룹을 더 고려하여 복원한 다채널 오디오 신호의 제2 채널 레이아웃은, 더 많은 개수의 채널들을 포함할 수 있다. 즉, 제1 채널 레이아웃은, 제2 채널 레이아웃의 하위 채널 레이아웃일 수 있다.Compared with the first channel layout of the multi-channel audio signal restored by the audio decoding apparatus 300 in consideration of the basic channel group and the first dependent channel group, the audio decoding apparatus 300 further considers the second dependent channel group The second channel layout of the reconstructed multi-channel audio signal may include a larger number of channels. That is, the first channel layout may be a sub-channel layout of the second channel layout.

Specifically, S_{n-1} may be less than or equal to S_n, W_{n-1} may be less than or equal to W_n, and H_{n-1} may be less than or equal to H_n. Here, the case in which S_{n-1} equals S_n, W_{n-1} equals W_n, and H_{n-1} equals H_n is excluded. For example, if the first channel layout is a 5.1.2 channel layout, the second channel layout may be a 5.1.4 channel layout or a 7.1.2 channel layout.
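The ordering rule between a lower and a higher channel layout can be checked mechanically. The Python sketch below is one way to express it, assuming layouts are written as "S.W.H" strings; the function names `parse_layout` and `is_lower_layout` are hypothetical.

```python
def parse_layout(layout):
    """Split an 'S.W.H' string into (surround, subwoofer, height) channel counts."""
    surround, subwoofer, height = (int(part) for part in layout.split("."))
    return surround, subwoofer, height


def is_lower_layout(first, second):
    """True if every count in `first` is <= the count in `second` and the layouts differ."""
    a, b = parse_layout(first), parse_layout(second)
    return all(x <= y for x, y in zip(a, b)) and a != b


print(is_lower_layout("5.1.2", "5.1.4"))
print(is_lower_layout("5.1.2", "7.1.2"))
print(is_lower_layout("5.1.2", "5.1.2"))
```

The first two calls correspond to the 5.1.4 and 7.1.2 examples in the text, and the third is the excluded case in which all three counts are equal.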

오디오 부호화 장치(200)에서 생성되어 전송된 비트스트림을 수신한 오디오 복호화 장치(300)는, 기본 채널 오디오 스트림으로부터 기본 채널 그룹의 오디오 신호를 복원할 수 있고, 종속 채널 오디오 스트림을 더 고려하여, 기본 채널 그룹보다 많은 개수의 채널들로 구성되는 다양한 채널 레이아웃의 다채널 오디오 신호를 복원할 수 있다. The audio decoding apparatus 300 receiving the bitstream generated and transmitted by the audio encoding apparatus 200 may reconstruct the audio signal of the base channel group from the base channel audio stream, further considering the dependent channel audio stream, It is possible to restore a multi-channel audio signal having various channel layouts including a greater number of channels than the basic channel group.

도 2c는 일 실시 예에 따른 오디오 부호화 장치(200)의 다채널 오디오 신호 처리부의 구성을 도시하는 블록도이다.2C is a block diagram illustrating a configuration of a multi-channel audio signal processing unit of the audio encoding apparatus 200 according to an exemplary embodiment.

도 2c를 참조하면, 다채널 오디오 신호 처리부(260)는 채널 레이아웃 식별부(261), 다채널 오디오 변환부(262) 및 믹싱부(266)를 포함한다.Referring to FIG. 2C , the multi-channel audio signal processing unit 260 includes a channel layout identification unit 261 , a multi-channel audio conversion unit 262 , and a mixing unit 266 .

채널 레이아웃 식별부(261)는 원본 오디오 신호로부터, 적어도 하나의 채널 레이아웃을 식별할 수 있다. 이때, 적어도 하나의 채널 레이아웃은 계층적인 복수의 채널 레이아웃들을 포함할 수 있다. The channel layout identification unit 261 may identify at least one channel layout from the original audio signal. In this case, at least one channel layout may include a plurality of hierarchical channel layouts.

먼저, 채널 레이아웃 식별부(261)는 원본 오디오 신호의 채널 레이아웃을 식별하고, 원본 오디오 신호의 채널 레이아웃보다 하위 채널 레이아웃을 식별할 수 있다. 예를 들어, 원본 오디오 신호가 7.1.4 채널 레이아웃의 오디오 신호인 경우, 채널 레이아웃 식별부(261)는 7.1.4 채널 레이아웃을 식별하고, 7.1.4 채널 레이아웃보다 하위 채널 레이아웃인 5.1.2 채널, 3.1.2 채널 및 2 채널 등을 식별할 수 있다. First, the channel layout identification unit 261 may identify a channel layout of the original audio signal and identify a lower channel layout than the channel layout of the original audio signal. For example, when the original audio signal is an audio signal of the 7.1.4 channel layout, the channel layout identification unit 261 identifies the 7.1.4 channel layout, and 5.1.2 channel that is a lower channel layout than the 7.1.4 channel layout. , 3.1.2 channels and 2 channels, etc. can be identified.

채널 레이아웃 식별부(261)는, 오디오 부호화 장치(200)가 출력하고자 하는 비트스트림에 포함되는 오디오 신호들의 채널 레이아웃을 타겟 채널 레이아웃으로서 식별할 수 있다. 타겟 채널 레이아웃은 원본 오디오 신호의 채널 레이아웃, 또는 원본 오디오 신호의 채널 레이아웃보다 하위 채널 레이아웃일 수 있다. 채널 레이아웃 식별부(261)는 타겟 채널 레이아웃을 미리 결정된 채널 레이아웃들 중에서 식별할 수 있다.The channel layout identification unit 261 may identify a channel layout of audio signals included in a bitstream to be output by the audio encoding apparatus 200 as a target channel layout. The target channel layout may be a channel layout of the original audio signal or a channel layout lower than the channel layout of the original audio signal. The channel layout identification unit 261 may identify the target channel layout from among predetermined channel layouts.

채널 레이아웃 식별부(261)는 식별된 타겟 채널 레이아웃을 기초로, 제1 다운믹스 채널 오디오 생성부(263), 제2 다운믹스 채널 오디오 생성부(264), ... 제N 다운믹스 채널 오디오 생성부(265) 중 식별된 타겟 채널 레이아웃에 대응하는 다운믹스 채널 오디오 생성부를 결정할 수 있다. 다채널 오디오 변환부 (262)는, 결정된 다운믹스 채널 오디오 생성부를 이용하여 타겟 채널 레이아웃의 오디오 신호를 생성할 수 있다. The channel layout identification unit 261 includes a first downmix channel audio generation unit 263, a second downmix channel audio generation unit 264, ... Nth downmix channel audio based on the identified target channel layout. Among the generators 265 , a downmix channel audio generator corresponding to the identified target channel layout may be determined. The multi-channel audio converter 262 may generate an audio signal of a target channel layout using the determined downmix channel audio generator.

다채널 오디오 변환부(262)의 다운믹스 채널 오디오 생성부들(263, 264, 265)는 다운믹싱 가중치 파라미터를 포함하는 다운믹싱 매트릭스를 이용하여, 제1 채널 레이아웃의 원본 오디오 신호로부터 각각 제2 채널 레이아웃의 오디오 신호, 제3 채널 레이아웃의 오디오 신호 또는, 제4 채널 레이아웃의 오디오 신호를 생성할 수 있다.The downmix channel audio generators 263 , 264 , and 265 of the multi-channel audio converter 262 use a downmixing matrix including a downmixing weight parameter to obtain a second channel from the original audio signal of the first channel layout, respectively. The audio signal of the layout, the audio signal of the third channel layout, or the audio signal of the fourth channel layout may be generated.
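Applying a downmixing matrix amounts to forming each output channel as a weighted sum of the input channels. The Python sketch below shows this for a three-channel input (L, R, C) reduced to two channels. The matrix values, the sample values, and the function name `downmix` are assumptions chosen for illustration and are not the weights used in the disclosure.

```python
def downmix(matrix, channels):
    """Multiply a downmixing matrix (rows = output channels) by per-channel sample lists."""
    num_samples = len(channels[0])
    return [
        [sum(w * ch[t] for w, ch in zip(row, channels)) for t in range(num_samples)]
        for row in matrix
    ]


# Hypothetical input channels, two samples each: L, R, C.
left = [1.0, 0.0]
right = [0.0, 1.0]
center = [0.5, 0.5]

# Hypothetical downmixing weights: the center channel is split into both outputs.
matrix = [
    [1.0, 0.0, 0.5],  # output 1 = L + 0.5 * C
    [0.0, 1.0, 0.5],  # output 2 = R + 0.5 * C
]

out = downmix(matrix, [left, right, center])
print(out)
```

Each downmix channel audio generator could hold its own matrix of this kind, with one row per channel of its target layout.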

도 2c에는 다채널 오디오 변환부(262)가, 복수의 다운믹스 채널 오디오 생성부들(263, 264, 265)을 포함하는 것으로 도시되었으나, 본 개시는 이에 제한되지 않는다. 다채널 오디오 변환부(262)는, 원본 오디오 신호를 적어도 하나의 다른 채널 레이아웃으로 변환하여 출력할 수 있다. 예를 들어, 다채널 오디오 변환부(262)는, 제1 채널 레이아웃의 원본 오디오 신호를 제1 채널 레이아웃의 하위 채널 레이아웃인 제2 채널 레이아웃의 오디오 신호로 변환할 수 있고, 제1 채널 레이아웃 및 제2 채널 레이아웃은 구현에 따라서 다양한 다채널 레이아웃을 포함할 수 있다. Although it is illustrated in FIG. 2C that the multi-channel audio converter 262 includes a plurality of downmix channel audio generators 263 , 264 , and 265 , the present disclosure is not limited thereto. The multi-channel audio converter 262 may convert the original audio signal into at least one other channel layout and output the converted audio signal. For example, the multi-channel audio converter 262 may convert an original audio signal of the first channel layout into an audio signal of a second channel layout that is a sub-channel layout of the first channel layout, the first channel layout and The second channel layout may include various multi-channel layouts according to implementation.

The mixing unit 266 may obtain and output the audio signals of the base channel group and the audio signals of the dependent channel group by mixing the audio signals whose channel layout has been converted by the multi-channel audio converter 262. Accordingly, depending on the audio reproduction environment at the decoding end, either only the audio signals of the base channel group may be output, or a multi-channel audio signal may be reconstructed and output based on the audio signals of the base channel group and the audio signals of the dependent channel group.

일 실시 예에 따르면 '기본 채널 그룹'은 적어도 하나의 '기본 채널'을 포함하는 그룹을 의미할 수 있다. '기본 채널'의 오디오 신호는, 다른 채널(예를 들어, 종속 채널)의 오디오 신호에 대한 정보 없이 독립적으로 복호화되어, 소정 채널 레이아웃을 구성할 수 있는 오디오 신호를 포함할 수 있다.According to an embodiment, the 'basic channel group' may mean a group including at least one 'basic channel'. The audio signal of the 'basic channel' may include an audio signal that is independently decoded without information on an audio signal of another channel (eg, a dependent channel) to configure a predetermined channel layout.

'종속 채널 그룹'은 적어도 하나의 '종속 채널'을 포함하는 그룹을 의미할 수 있다. '종속 채널'의 오디오 신호는, ‘기본 채널의 오디오 신호와 함께 믹싱되어 소정 채널 레이아웃의 적어도 하나의 채널을 구성하는 오디오 신호를 포함할 수 있다. The 'dependent channel group' may mean a group including at least one 'dependent channel'. The audio signal of the 'dependent channel' may include an audio signal that is mixed with the audio signal of the basic channel to configure at least one channel of a predetermined channel layout.

The mixing unit 266 according to an embodiment may obtain the audio signal of a basic channel by mixing the audio signals of at least two channels of the converted channel layout. Among the channels included in the converted channel layout, the mixing unit 266 may obtain the audio signals of the channels other than the at least one channel corresponding to the basic channel group as the audio signals of the dependent channel group.

FIG. 2D is a diagram for explaining an operation of the multi-channel audio signal processing unit 260 according to an embodiment.

Referring to FIG. 2D, the multi-channel audio converter 262 of FIG. 2C may obtain, from the original audio signal of the 7.1.4 channel layout 290, audio signals of lower channel layouts: an audio signal of the 5.1.2 channel layout 291, an audio signal of the 3.1.2 channel layout 292, an audio signal of the 2-channel layout 293, and an audio signal of the mono channel layout 294. Because the downmix channel audio generators 263, 264, ..., 265 of the multi-channel audio converter 262 are connected in a cascade, they may sequentially obtain, from the current channel layout, the audio signal of the next lower channel layout.

FIG. 2D illustrates, as an example, a case in which the mixing unit 266 classifies the audio signal of the mono channel layout 294 as the audio signal of the basic channel group 295 and outputs it.

The mixing unit 266 according to an embodiment may classify the audio signal of the L2 channel, included in the audio signal of the 2-channel layout 293, as the audio signal of dependent channel group #1 (296). As shown in FIG. 2D, the audio signal of the L2 channel and the audio signal of the R2 channel are mixed to generate the audio signal of the mono channel layout 294. The audio decoding apparatus 300 may restore the audio signal of the R2 channel by mixing the audio signal of the mono channel layout 294 (that is, the audio signal of the basic channel group 295) with the audio signal of the L2 channel of dependent channel group #1 (296). Accordingly, even if the audio encoding apparatus 200 does not transmit the audio signal of the R2 channel and transmits only the audio signal of the mono channel layout 294 and the audio signal of the L2 channel of dependent channel group #1 (296), the audio decoding apparatus 300 may restore the audio signal in the mono channel layout 294 or the stereo channel layout 293.
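The mono/stereo relationship described above can be illustrated with a minimal sketch. The disclosure does not state the mixing weights at this point, so the equal weight `w = 0.5` and the function names `downmix_to_mono` and `recover_r2` are assumptions made purely for illustration.

```python
# Sketch only: the weight w = 0.5 is an assumed value, not taken from the disclosure.

def downmix_to_mono(l2, r2, w=0.5):
    """Encoder side: mix L2 and R2 into the mono (basic channel group) signal."""
    return [w * (a + b) for a, b in zip(l2, r2)]

def recover_r2(mono, l2, w=0.5):
    """Decoder side: restore R2 from the mono signal and the transmitted L2."""
    return [m / w - a for m, a in zip(mono, l2)]

l2 = [0.25, -0.5, 0.75]
r2 = [0.5, 0.25, -0.25]

mono = downmix_to_mono(l2, r2)    # transmitted as the basic channel group
r2_restored = recover_r2(mono, l2)  # R2 itself is never transmitted
```

Under this assumption the decoder needs only two transmitted signals (mono and L2) to play either the mono layout or the stereo layout.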

The mixing unit 266 according to an embodiment may classify the audio signals of the Hfl3 channel, the C channel, the LFE channel, and the Hfr3 channel, included in the audio signal of the 3.1.2 channel layout 292, as the audio signals of dependent channel group #2 (297). As shown in FIG. 2D, the audio signal of the L3 channel and the audio signal of the C channel are mixed to generate the audio signal of the L2 channel of the stereo channel layout 293. The audio signal of the R3 channel and the audio signal of the C channel are mixed to generate the audio signal of the R2 channel of the stereo channel layout 293.

The audio decoding apparatus 300 may restore the audio signal of the L3 channel of the 3.1.2 channel layout by mixing the audio signal of the L2 channel of the stereo channel layout 293 with the audio signal of the C channel of dependent channel group #2 (297). The audio decoding apparatus 300 may restore the audio signal of the R3 channel of the 3.1.2 channel layout by mixing the audio signal of the R2 channel of the stereo channel layout 293 with the audio signal of the C channel of dependent channel group #2 (297). Accordingly, even if the audio encoding apparatus 200 does not transmit the audio signals of the L3 channel and the R3 channel of the 3.1.2 channel layout and transmits only the audio signal of the mono channel layout 294, the audio signal of dependent channel group #1 (296), and the audio signals of dependent channel group #2 (297), the audio decoding apparatus 300 may restore the audio signal in the mono channel layout 294, the stereo channel layout 293, or the 3.1.2 channel layout 292.
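As a hedged illustration of the L3/R3 restoration, the sketch below assumes that the encoder forms L2 = L3 + w·C and R2 = R3 + w·C with a center weight of w = 0.707 (about -3 dB). That weight and the function names are assumptions; the disclosure only states that the signals are mixed.

```python
# Sketch only: C_WEIGHT is an assumed downmix weight, not a value from the disclosure.
C_WEIGHT = 0.707

def downmix_312_to_stereo(l3, r3, c, w=C_WEIGHT):
    """Encoder side: fold the C channel into L3 and R3 to obtain L2 and R2."""
    l2 = [a + w * x for a, x in zip(l3, c)]
    r2 = [b + w * x for b, x in zip(r3, c)]
    return l2, r2

def demix_stereo_to_l3_r3(l2, r2, c, w=C_WEIGHT):
    """Decoder side: weighted sum of L2 (or R2) and the transmitted C channel."""
    l3 = [a - w * x for a, x in zip(l2, c)]
    r3 = [b - w * x for b, x in zip(r2, c)]
    return l3, r3

l3, r3, c = [0.5, -0.25], [0.1, 0.3], [0.2, 0.4]
l2, r2 = downmix_312_to_stereo(l3, r3, c)
l3_restored, r3_restored = demix_stereo_to_l3_r3(l2, r2, c)
```

Because C is transmitted in dependent channel group #2, L3 and R3 can be derived at the decoder and need not be sent.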

To transmit the audio signal of the 5.1.2 channel layout 291, the mixing unit 266 according to an embodiment may transmit the audio signals of the L channel and the R channel, which are the audio signals of some channels of the 5.1.2 channel layout 291, as the audio signals of dependent channel group #3 (298). Even if the audio encoding apparatus 200 transmits none of the audio signals of the Ls5, Hl5, Rs5, and Hr5 channels of the 5.1.2 channel layout, the audio decoding apparatus 300 may restore the audio signal in the 5.1.2 channel layout 291 by mixing at least two of the audio signal of the mono channel layout 294, the audio signal of dependent channel group #1 (296), the audio signals of dependent channel group #2 (297), and the audio signals of dependent channel group #3 (298). Based on at least one of these audio signals, the audio decoding apparatus 300 may restore the audio signal in the mono channel layout 294, the stereo channel layout 293, the 3.1.2 channel layout 292, or the 5.1.2 channel layout 291.
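The scalable grouping of FIG. 2D described above can be summarized as a small lookup structure. The sketch below is illustrative: the table `CHANNEL_GROUPS`, the label "Mono", and the helper names are hypothetical, while the channel names per group follow the text.

```python
# Sketch only: group contents follow the description of FIG. 2D; names are illustrative.
CHANNEL_GROUPS = [
    # (group name, layout reachable once this group is decoded, transmitted channels)
    ("basic", "mono", ["Mono"]),
    ("dependent #1", "stereo", ["L2"]),
    ("dependent #2", "3.1.2", ["Hfl3", "C", "LFE", "Hfr3"]),
    ("dependent #3", "5.1.2", ["L", "R"]),
]

def groups_needed(target_layout):
    """Return the channel groups a decoder must decode to reach target_layout."""
    needed = []
    for name, layout, _channels in CHANNEL_GROUPS:
        needed.append(name)
        if layout == target_layout:
            return needed
    raise ValueError(f"unknown layout: {target_layout}")

def transmitted_channel_count(target_layout):
    """Count the audio signals transmitted for target_layout."""
    wanted = set(groups_needed(target_layout))
    return sum(len(ch) for name, _layout, ch in CHANNEL_GROUPS if name in wanted)

groups_for_312 = groups_needed("3.1.2")
count_for_512 = transmitted_channel_count("5.1.2")
```

In this sketch, reaching the 5.1.2 layout takes 1 + 1 + 4 + 2 = 8 transmitted signals, the same number as the channels of a 5.1.2 layout, while a mono or stereo device can stop after the first one or two groups.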

When determining which channels of a predetermined multi-channel layout carry the audio signals of a dependent channel group, the audio encoding apparatus 200 according to an embodiment may preferentially select channels located in front of the listener as the dependent channels. By compressing the audio signals of the channels located in front of the listener as they are and transmitting them as the compressed audio signals of the dependent channel group, the audio encoding apparatus 200 may improve the sound quality of the front audio channels at the decoding end. As a result, the listener may perceive the audio content reproduced through a display device as having better sound quality.

However, the present disclosure is not limited to this embodiment. Among the channels of a predetermined multi-channel layout, channels satisfying a predetermined criterion, or channels set by a user, may be determined as the channels included in the dependent channel group. The channels determined to be included in the dependent channel group may vary depending on the implementation.

In addition, although FIG. 2D illustrates, as an example, a case in which the multi-channel audio converter 262 obtains, from the original audio signal of the 7.1.4 channel layout 290, all of the audio signals of the lower channel layouts, namely the 5.1.2 channel layout 291, the 3.1.2 channel layout 292, the 2-channel layout 293, and the mono channel layout 294, the present disclosure is not limited to this embodiment.

The multi-channel audio converter 262 may convert an original audio signal of a first channel layout into an audio signal of a second channel layout that is a lower channel layout, and the first channel layout and the second channel layout may be any of various multi-channel layouts, depending on the implementation. For example, the multi-channel audio converter 262 may convert an original audio signal of the 7.1.4 channel layout into an audio signal of the 3.1.2 channel layout.

Hereinafter, the audio decoding apparatus 300 according to an embodiment, which restores an audio signal by processing a bitstream transmitted from the audio encoding apparatus 200 according to an embodiment, is described in detail.

FIG. 3A is a block diagram illustrating a configuration of a multi-channel audio decoding apparatus according to an embodiment.

The audio decoding apparatus 300 includes a memory 310 and a processor 330. The audio decoding apparatus 300 may be implemented as a device capable of audio processing, such as a server, a TV, a camera, a mobile phone, a computer, a digital broadcasting terminal, a tablet PC, or a notebook computer.

Although the memory 310 and the processor 330 are illustrated separately in FIG. 3A, the memory 310 and the processor 330 may be implemented as a single hardware module (for example, a chip).

The processor 330 may be implemented as a dedicated processor for neural-network-based audio processing. Alternatively, the processor 330 may be implemented through a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU). A dedicated processor may include a memory for implementing an embodiment of the present disclosure, or a memory processing unit for using an external memory.

The processor 330 may also be composed of a plurality of processors. In this case, it may be implemented as a combination of dedicated processors, or through a combination of software and a plurality of general-purpose processors such as APs, CPUs, or GPUs.

The memory 310 may store one or more instructions for audio processing. In an embodiment, the memory 310 may store a neural network. When the neural network is implemented as a dedicated hardware chip for artificial intelligence, or as part of an existing general-purpose processor (for example, a CPU or an application processor) or a graphics-dedicated processor (for example, a GPU), the neural network may not be stored in the memory 310. The neural network may also be implemented by an external device (for example, a server). In this case, the audio decoding apparatus 300 may request result information based on the neural network from the external device and receive that result information from the external device.

The processor 330 sequentially processes consecutive frames according to the instructions stored in the memory 310 to obtain consecutive restored frames. The consecutive frames may mean the frames constituting the audio.

The processor 330 may receive a bitstream and output a multi-channel audio signal by performing an audio processing operation on the received bitstream. Here, the bitstream may be implemented in a scalable form so that the number of channels can be increased starting from the basic channel group.

The processor 330 may obtain the compressed audio signal of the basic channel group from the bitstream and decompress it to restore the audio signal of the basic channel group (for example, a mono channel audio signal or a stereo channel audio signal). Additionally, the processor 330 may decompress the compressed audio signal of at least one dependent channel group from the bitstream to restore the audio signal of the at least one dependent channel group. Based on the audio signal of the basic channel group and the audio signal of the at least one dependent channel group, the processor 330 may restore a multi-channel audio signal having more channels than the basic channel group.

The processor 330 according to an embodiment may decompress the compressed audio signal of a first dependent channel group from the bitstream to restore the audio signal of the first dependent channel group. The processor 330 may decompress the compressed audio signal of a second dependent channel group to restore the audio signal of the second dependent channel group. Based on the audio signal of the basic channel group and the audio signals of the first and second dependent channel groups, the processor 330 may restore a multi-channel audio signal having more channels than the basic channel group.

The processor 330 according to an embodiment may decompress the compressed audio signals of up to n dependent channel groups (where n is an integer greater than 2) and, based on the audio signal of the basic channel group and the audio signals of the n dependent channel groups, restore a multi-channel audio signal with a further increased number of channels.

FIG. 3B is a block diagram illustrating a configuration of a multi-channel audio decoding apparatus according to an embodiment.

Referring to FIG. 3B, the audio decoding apparatus 300 may include an information obtaining unit 350 and a multi-channel audio decoder 360. The multi-channel audio decoder 360 may include a decompression unit 370 and a multi-channel audio signal restoration unit 380.

The audio decoding apparatus 300 may include the memory 310 and the processor 330 of FIG. 3A, and the instructions for implementing each of the components 350, 360, 370, and 380 of FIG. 3B may be stored in the memory 310. The processor 330 may execute the instructions stored in the memory 310.

The information obtaining unit 350 of the audio decoding apparatus 300 according to an embodiment may obtain a basic channel audio stream, a dependent channel audio stream, and metadata from the bitstream. The information obtaining unit 350 may obtain the basic channel audio stream, the dependent channel audio stream, and the metadata that are encapsulated in the bitstream.

The information obtaining unit 350 may classify, from the bitstream, the basic channel audio stream that includes the compressed audio signal of the basic channel group. The information obtaining unit 350 may obtain the compressed audio signal of the basic channel group from the basic channel audio stream.

In addition, the information obtaining unit 350 may classify, from the bitstream, at least one dependent channel audio stream that includes the compressed audio signal of a dependent channel group. The information obtaining unit 350 may obtain the compressed audio signal of the dependent channel group from the dependent channel audio stream.

The information obtaining unit 350 may obtain additional information related to the restoration of multi-channel audio from the metadata of the bitstream. The information obtaining unit 350 may classify, from the bitstream, the metadata that includes the additional information, and obtain the additional information from the classified metadata.
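The classification performed by the information obtaining unit 350 might be pictured as follows. The disclosure does not define the encapsulation format in this passage, so the sketch assumes a hypothetical already-parsed bitstream represented as a list of `(kind, payload)` units; the kind labels and the metadata key are invented for illustration.

```python
# Sketch only: the (kind, payload) unit format is hypothetical, not the patent's syntax.

def classify_units(units):
    """Split parsed units into the basic stream, dependent streams, and metadata."""
    basic_stream = None
    dependent_streams = []
    metadata = {}
    for kind, payload in units:
        if kind == "basic":
            basic_stream = payload
        elif kind == "dependent":
            dependent_streams.append(payload)  # one entry per dependent channel group
        elif kind == "metadata":
            metadata.update(payload)           # additional information for restoration
        else:
            raise ValueError(f"unknown unit kind: {kind}")
    return basic_stream, dependent_streams, metadata

units = [
    ("metadata", {"demix_weight": 0.707}),  # hypothetical additional information
    ("basic", b"\x01\x02"),
    ("dependent", b"\x03"),
    ("dependent", b"\x04\x05"),
]
basic_stream, dependent_streams, metadata = classify_units(units)
```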

The multi-channel audio decoder 360 of the audio decoding apparatus 300 according to an embodiment may restore an output multi-channel audio signal by decoding the compressed audio signals included in the bitstream. The multi-channel audio decoder 360 may include the decompression unit 370 and the multi-channel audio signal restoration unit 380.

The decompression unit 370 may obtain at least one audio signal of the basic channel group and the audio signals of the dependent channel group by applying a decompression process, such as entropy decoding, inverse quantization, and inverse frequency transformation, to the compressed audio signal of the basic channel group and the compressed audio signal of the dependent channel group. For example, an audio signal restoration method corresponding to an audio signal compression method such as the AAC standard or the OPUS standard may be used.

The decompression unit 370 may decompress at least one compressed audio signal of the basic channel group to restore at least one audio signal of the basic channel group. The decompression unit 370 may decompress the compressed audio signal of a dependent channel group to restore at least one audio signal of at least one dependent channel group.

The decompression unit 370 may include separate first through n-th decompression units (not shown), one for decoding the compressed audio signal of each of the n channel groups. The first through n-th decompression units (not shown) may operate in parallel with one another.
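The parallel operation of per-group decompression units could be sketched with a thread pool, as below. The function `decode_group` is a hypothetical stand-in for a real decoder (such as an AAC or Opus decoder); it merely maps bytes to signed sample values so that the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_group(compressed):
    """Hypothetical stand-in for a real decoder: map each byte to a signed sample."""
    return [b - 128 for b in compressed]

def decompress_all(compressed_groups):
    """Run one decoding task per channel group in parallel, preserving group order."""
    workers = max(1, len(compressed_groups))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_group, compressed_groups))

# One compressed payload per channel group: basic group first, then dependent groups.
compressed_groups = [bytes([128, 130]), bytes([126]), bytes([128, 128, 129])]
decoded_groups = decompress_all(compressed_groups)
```

`ThreadPoolExecutor.map` returns results in input order, so the decoded groups stay aligned with the basic group and the dependent groups regardless of which task finishes first.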

The multi-channel audio signal restoration unit 380 according to an embodiment may restore the output multi-channel audio signal based on at least one audio signal of the basic channel group, at least one audio signal of at least one dependent channel group, and the additional information.

For example, when the audio signal of the basic channel group is a stereo channel audio signal, the multi-channel audio signal restoration unit 380 may restore a listener-front-centered multi-channel audio signal based on the audio signal of the basic channel group and the audio signal of the first dependent channel group. For example, the restored listener-front-centered multi-channel audio signal may be an audio signal of the 3.1.2 channel layout.

Alternatively, the multi-channel audio signal restoration unit 380 may restore a listener-centered multi-channel audio signal based on the audio signal of the basic channel group, the audio signal of the first dependent channel group, and the audio signal of the second dependent channel group. For example, the listener-centered multi-channel audio signal may be an audio signal of the 5.1.2 channel layout or the 7.1.4 channel layout.

The multi-channel audio signal restoration unit 380 may restore the multi-channel audio signal based not only on the audio signal of the basic channel group and the audio signal of the dependent channel group but also on the additional information. Here, the additional information may be additional information for restoring the multi-channel audio signal. The multi-channel audio signal restoration unit 380 may output the restored multi-channel audio signal.

FIG. 3C is a block diagram illustrating a configuration of the multi-channel audio signal restoration unit 380 according to an embodiment.

Referring to FIG. 3C, the multi-channel audio signal restoration unit 380 may include a mixing unit 383 and a rendering unit 381.

The mixing unit 383 of the multi-channel audio signal restoration unit 380 may obtain a mixed audio signal of a predetermined channel layout by mixing at least one audio signal of the basic channel group and the audio signal of at least one dependent channel group. The mixing unit 383 may obtain a weighted sum of the audio signal of at least one basic channel and the audio signal of at least one dependent channel as the audio signal of at least one channel of the predetermined channel layout.

The mixing unit 383 may generate an audio signal of a predetermined channel layout based on the audio signal of the basic channel group and the audio signal of the dependent channel group. The audio signal of the predetermined channel layout may be a multi-channel audio signal. The mixing unit 383 may generate the multi-channel audio signal by further taking into account the additional information (for example, information about a dynamic demixing weight parameter).

The mixing unit 383 may generate an audio signal of a predetermined channel layout by mixing at least one audio signal of the basic channel group and at least one audio signal of the dependent channel group. For example, the mixing unit 383 may generate the audio signals of the L3 channel and the R3 channel of the 3.1.2 channel layout by mixing the audio signals of the L2 channel and the R2 channel included in the basic channel group with the audio signal of the C channel included in the dependent channel group.

The mixing unit 383 may bypass the mixing operation described above for some of the audio signals of the dependent channel group. For example, the mixing unit 383 may obtain the audio signals of the C channel, the LFE channel, the Hfl3 channel, and the Hfr3 channel of the 3.1.2 channel layout directly from the audio signals included in the dependent channel group, without mixing them with any audio signal of the basic channel group.

The mixing unit 383 may generate a multi-channel audio signal of a predetermined channel layout from the audio signal of at least one channel obtained by mixing the audio signal of a basic channel with the audio signal of a dependent channel, together with the audio signals of the dependent channels for which the mixing operation was bypassed. For example, the mixing unit 383 may obtain the audio signals of the 3.1.2 channel layout from the audio signals of the L3 channel and the R3 channel obtained through mixing and the audio signals of the C channel, the LFE channel, the Hfl3 channel, and the Hfr3 channel included in the dependent channel group.
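Assembling the 3.1.2 layout from mixed and bypassed channels can be sketched as follows. The demixing weight `w = 0.707` and the function name `mix_to_312` are assumptions for illustration; in the disclosure the weights may instead come from the additional information.

```python
# Sketch only: w is an assumed demixing weight, not a value given in the disclosure.

def mix_to_312(basic, dependent, w=0.707):
    """Build a 3.1.2 layout: L3/R3 by weighted sum, the other channels bypassed."""
    c = dependent["C"]
    out = {
        # Mixed channels: weighted sum of a basic channel and the dependent C channel.
        "L3": [l - w * x for l, x in zip(basic["L2"], c)],
        "R3": [r - w * x for r, x in zip(basic["R2"], c)],
    }
    # Bypassed channels: copied from the dependent channel group without mixing.
    for name in ("C", "LFE", "Hfl3", "Hfr3"):
        out[name] = list(dependent[name])
    return out

basic = {"L2": [0.6, 0.1], "R2": [0.2, -0.3]}
dependent = {"C": [0.2, 0.4], "LFE": [0.0, 0.1], "Hfl3": [0.05, 0.0], "Hfr3": [0.0, 0.05]}
layout_312 = mix_to_312(basic, dependent)
```

The resulting dictionary holds the six channels of the 3.1.2 layout: two produced by mixing and four passed through unchanged.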

The rendering unit 381 may render and output the multi-channel audio signal obtained by the mixing unit 383. The rendering unit 381 may include a volume controller (not shown) and a limiter (not shown).

For example, the rendering unit 381 may control the loudness of the audio signal of each channel to a target loudness (for example, -24 LKFS) based on loudness information signaled through the bitstream. In addition, after the loudness control, the rendering unit 381 may limit the true peak level of the audio signal (for example, to -1 dBTP).
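A simplified view of the two rendering steps is sketched below. It assumes the signaled loudness is given in LKFS and converts the difference to the target into a linear gain. The limiter here is a plain hard clip on sample values, which is an approximation: a real true-peak limiter would measure inter-sample peaks on an oversampled signal. All function names are illustrative.

```python
# Sketch only: a hard clip on sample peaks stands in for a true-peak limiter.

def loudness_gain(signaled_lkfs, target_lkfs=-24.0):
    """Linear gain that moves the signaled loudness to the target loudness."""
    return 10.0 ** ((target_lkfs - signaled_lkfs) / 20.0)

def limit_peak(samples, ceiling_db=-1.0):
    """Clamp samples so that their magnitude does not exceed the ceiling."""
    ceiling = 10.0 ** (ceiling_db / 20.0)
    return [max(-ceiling, min(ceiling, s)) for s in samples]

samples = [0.9, -1.0, 0.2]
gain = loudness_gain(signaled_lkfs=-30.0)  # content is 6 dB below the target
rendered = limit_peak([s * gain for s in samples])
```

Applying the gain first and the limiter second mirrors the order in the text: loudness control, then peak limiting.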

FIG. 3D is a diagram for explaining an operation of the mixing unit of the multi-channel audio signal restoration unit according to an embodiment.

The mixing unit 383 may obtain an audio signal of a predetermined channel layout by mixing at least one audio signal of the basic channel group and the audio signal of at least one dependent channel group. The mixing unit 383 may obtain a weighted sum of at least one audio signal of a basic channel and at least one audio signal of a dependent channel as the audio signal of at least one channel of the predetermined channel layout.

Referring to FIG. 3D, the mixing unit 383 may include a first demixing unit 384, a second demixing unit 385, ..., and an N-th demixing unit 386.

The mixing unit 383 may obtain at least one audio signal of the basic channel group as the audio signal of a first channel layout. The mixing unit 383 may bypass the mixing operation for the at least one audio signal of the basic channel group.

The first demixing unit 384 may obtain the audio signal of the second channel layout from at least one audio signal of the basic channel group and the audio signal of the first dependent channel group. The first demixing unit 384 may obtain the audio signal of a channel included in the second channel layout by mixing at least one basic channel audio signal with at least one first dependent channel audio signal.

For example, the second channel layout may be a 3.1.2 channel layout, the basic channel group may include the L2 channel and the R2 channel that constitute a stereo channel, and the first dependent channel group may include the Hfl3 channel, the Hfr3 channel, the LFE channel, and the C channel. In this case, the first demixing unit 384 may obtain the weighted sum of the L2 channel audio signal of the basic channel group and the C channel audio signal of the dependent channel group as the L3 channel audio signal of the 3.1.2 channel layout. Likewise, the first demixing unit 384 may obtain the weighted sum of the R2 channel audio signal of the basic channel group and the C channel audio signal of the dependent channel group as the R3 channel audio signal of the 3.1.2 channel layout. The first demixing unit 384 may then obtain the audio signal of the 3.1.2 channel layout from the Hfl3, Hfr3, LFE, and C channel audio signals of the dependent channel group together with the mixed L3 and R3 channel audio signals.
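The weighted sums in this example can be sketched as below. The weights are assumptions, not values from the disclosure: the sketch supposes that the encoder formed L2 = L3 + 0.707·C and R2 = R3 + 0.707·C, so the demixer uses weights 1 and -0.707. The function names are hypothetical.

```python
def weighted_sum(signals, weights):
    """Sample-wise weighted sum of equally long signals."""
    length = len(signals[0])
    return [sum(w * s[i] for s, w in zip(signals, weights)) for i in range(length)]


def demix_stereo_to_312(base, dependent, c_weight=0.707):
    """Rebuild a 3.1.2 layout from the stereo base group and first dependent group.

    c_weight is an assumed downmix coefficient for the center channel.
    """
    return {
        "L3": weighted_sum([base["L2"], dependent["C"]], [1.0, -c_weight]),
        "R3": weighted_sum([base["R2"], dependent["C"]], [1.0, -c_weight]),
        # The dependent channels are carried over unchanged.
        "C": list(dependent["C"]),
        "LFE": list(dependent["LFE"]),
        "Hfl3": list(dependent["Hfl3"]),
        "Hfr3": list(dependent["Hfr3"]),
    }
```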

The second demixing unit 385 may obtain the audio signal of the third channel layout from at least one audio signal of the basic channel group, the audio signal of the first dependent channel group, and the audio signal of the second dependent channel group. The second demixing unit 385 may obtain the audio signal of a channel included in the third channel layout by mixing at least one of the audio signals of the second channel layout obtained by the first demixing unit 384 with at least one of the audio signals of the second dependent channel group.

For example, the third channel layout may be a 5.1.2 channel layout, and the second dependent channel group may include the L channel and the R channel. In this case, the second demixing unit 385 may obtain the Ls5 channel audio signal of the 5.1.2 channel layout by mixing the L3 channel audio signal of the 3.1.2 channel layout with the L channel audio signal of the dependent channel group. The second demixing unit 385 may obtain the Rs5 channel audio signal of the 5.1.2 channel layout by mixing the R3 channel audio signal of the 3.1.2 channel layout with the R channel audio signal of the dependent channel group. The second demixing unit 385 may further obtain the Hl5 channel audio signal of the 5.1.2 channel layout by mixing the Hfl3 channel audio signal of the 3.1.2 channel layout with the newly obtained Ls5 channel audio signal, and may obtain the Hr5 channel audio signal of the 5.1.2 channel layout by mixing the Hfr3 channel audio signal of the 3.1.2 channel layout with the newly obtained Rs5 channel audio signal.

The second demixing unit 385 may obtain the audio signal of the 5.1.2 channel layout from the LFE and C channel audio signals of the first dependent channel group, the L and R channel audio signals of the second dependent channel group, and the mixed Ls5, Rs5, Hl5, and Hr5 channel audio signals.

FIG. 3D illustrates, as an example, a case in which the mixing unit 383 obtains all of the audio signal of the first channel layout, the audio signal of the second channel layout, and the audio signal of the third channel layout through the plurality of demixing units 384 and 385. However, the present disclosure is not limited to this embodiment.

The mixing unit 383 may obtain an audio signal of a predetermined channel layout by mixing at least one audio signal of the basic channel group with an audio signal of at least one dependent channel group. Depending on the implementation, the predetermined channel layout of the audio signal obtained through the mixing unit 383 may be any of various multi-channel layouts.

As described above, the audio decoding apparatus 300 according to an embodiment of the present disclosure restores a multi-channel audio signal from the audio signal of the basic channel group and the audio signal of at least one dependent channel group obtained from the bitstream. The audio signal can therefore be restored not only in a lower channel layout, such as a mono or stereo channel layout, but also in various channel layouts having a screen-centered three-dimensional sound image.

Meanwhile, to further increase transmission efficiency, the audio encoding apparatus according to an embodiment may downsample and separately transmit the audio signals of side channels that are unused or little used when the multi-channel audio signal is converted into a channel layout having a screen-centered sound image. The amount of data transmitted can thus be reduced in proportion to the downsampling factor. In addition, the audio decoding apparatus according to an embodiment of the present disclosure may perform decoding based on artificial intelligence to compensate for the data loss caused by downsampling at the encoding stage.

FIG. 4A is a block diagram of an audio encoding apparatus according to an embodiment, and FIG. 4B is a block diagram of an audio decoding apparatus according to an embodiment.

The audio encoding apparatus 400 according to an embodiment may encode an audio signal and output it as a bitstream. As shown in FIG. 4A, the audio encoding apparatus 400 according to an embodiment may include a multi-channel audio encoder 410, a channel information generator 420, and a bitstream generator 430.

The multi-channel audio encoder 410 of the audio encoding apparatus 400 may downmix first audio signals 405 corresponding to the channels included in a first channel group (hereinafter, "first audio signals corresponding to the first channel group") to obtain second audio signals 415 corresponding to the channels included in a second channel group that has fewer channels (hereinafter, "second audio signals corresponding to the second channel group"). The first channel group may include the channel group of the original audio signal, and the second channel group may be configured by combining at least two of the channels included in the first channel group. The multi-channel audio encoder 410 according to an embodiment may obtain, from the first audio signals 405 corresponding to the first channel group, which form a listener-centered multi-channel audio signal, the second audio signals 415 corresponding to the second channel group, which form a multi-channel audio signal centered in front of the listener (that is, screen-centered).

The number of channels included in the second channel group of the second audio signals 415 obtained by the multi-channel audio encoder 410 must be smaller than the number of channels included in the first channel group. That is, the second channel group must be a lower channel group of the first channel group.

For example, the first channel group may consist of S_n surround channels, W_n subwoofer channels, and H_n height channels, and the second channel group may consist of S_{n-1} surround channels, W_{n-1} subwoofer channels, and H_{n-1} height channels, where n may be an integer greater than or equal to 1. Here, S_{n-1} must be less than or equal to S_n, W_{n-1} must be less than or equal to W_n, and H_{n-1} must be less than or equal to H_n, excluding the case in which S_{n-1} equals S_n, W_{n-1} equals W_n, and H_{n-1} equals H_n all at once. For example, if the first audio signals 405 are 7.1.4 channel audio signals, the second audio signals 415 may be 2 channel, 3.1.2 channel, 3.1.4 channel, 5.1.2 channel, 5.1.4 channel, or 7.1.2 channel audio signals. However, embodiments of the present disclosure are not limited thereto, and audio signals of various channel groups may be used. For example, the first audio signals 405 may include 5.1.4 channel, 5.1.2 channel, 3.1.4 channel, or 3.1.2 channel audio signals, and the second audio signals 415 may be audio signals of a lower channel group of the first audio signals 405.
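The constraint on the channel counts can be checked mechanically. The following is a minimal sketch; the helper names and the "S.W.H" string notation (with a bare "2" standing for stereo) are assumptions made for illustration.

```python
def parse_layout(name):
    """Turn 'S.W.H' (for example '7.1.4') into a tuple (S, W, H).

    Missing fields are taken as 0, so '2' is read as (2, 0, 0).
    """
    parts = [int(p) for p in name.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_lower_layout(candidate, original):
    """True if no count exceeds the original and the layouts are not identical."""
    cand = parse_layout(candidate)
    orig = parse_layout(original)
    return all(c <= o for c, o in zip(cand, orig)) and cand != orig
```

For instance, `is_lower_layout("5.1.2", "7.1.4")` holds, while `is_lower_layout("7.1.4", "7.1.4")` does not because the identical layout is excluded.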

The multi-channel audio encoder 410 may mix and compress the second audio signals 415 corresponding to the second channel group and output the result to the bitstream generator 430.

The channel information generator 420 of the audio encoding apparatus 400 may obtain, from the first audio signals, information about at least one channel that the audio decoding apparatus 500 can use to upmix the audio signal of the second channel group to the first channel group. The channel information generator 420 may identify at least one channel among the channels of the first channel group and downsample the third audio signal corresponding to the identified channel, thereby obtaining at least one downsampled third audio signal. The channel information generator 420 may compress the at least one downsampled third audio signal and output it to the bitstream generator 430.

The bitstream generator 430 may generate a bitstream including information about the second audio signals 415 corresponding to the second channel group and information about the at least one downsampled third audio signal, and may output the bitstream to the audio decoding apparatus 500 shown in FIG. 4B.

As shown in FIG. 4B, the audio decoding apparatus 500 may include an information acquisition unit 510, a multi-channel audio decoder 520, and a sound image restoration unit 530.

The audio decoding apparatus 500 may restore a multi-channel audio signal from the bitstream received from the audio encoding apparatus 400.

The information acquisition unit 510 of the audio decoding apparatus 500 may obtain, from the bitstream, information about first audio signals corresponding to a first channel group and information about a downsampled second audio signal. The multi-channel audio decoder 520 may decompress and mix the compressed audio signal to obtain the first audio signals 505 corresponding to the first channel group.

The sound image restoration unit 530 may decompress and upsample the information about the downsampled second audio signal to obtain at least one second audio signal corresponding to at least one of the channels included in a second channel group. From the first audio signals 505 corresponding to the first channel group and the at least one second audio signal, the sound image restoration unit 530 may restore a third audio signal 515 corresponding to the second channel group, which has more channels than the first channel group.

FIG. 5 illustrates an example of conversion between channel groups performed in an audio processing system according to an embodiment.

The audio encoding apparatus 400 according to an embodiment may receive the first audio signal of the first channel group as the original audio signal. For example, the audio encoding apparatus 400 may receive, as the original audio signal, a 7.1.4 channel audio signal consisting of the Ls, Lb, HBL, L, HFL, C, LFE, HFR, R, HBR, Rb, and Rs channels.

The audio encoding apparatus 400 may convert the first channel group of the original audio signal into a second channel group in which the sound image is formed around the screen of a display device. For example, the audio encoding apparatus 400 may convert the 7.1.4 channel original audio signal into a 3.1.2 channel audio signal (O_tv). The audio encoding apparatus 400 may include the audio signal converted to the second channel group in a bitstream and transmit it to the audio decoding apparatus 500.

When converting the channel group so that the sound image is formed around the screen, the audio encoding apparatus 400 may determine, as a side channel, at least one channel of the first channel group that is not used or whose information is used the least. For example, among the channels of the 7.1.4 channel group, the audio encoding apparatus 400 may determine the Ls, Lb, HBL, HBR, Rb, and Rs channels to be side channels. The audio encoding apparatus 400 may determine the channels of the first channel group other than the at least one side channel to be main channels.

The audio encoding apparatus 400 may downsample, along the time axis, the audio signal of the at least one channel determined to be a side channel and transmit it to the audio decoding apparatus 500. The audio encoding apparatus 400 may further include in the bitstream a signal (M_adv) obtained by downsampling, by a factor of 1/s, the N channels of the first channel group that were determined to be side channels, and transmit it to the audio decoding apparatus 500 (N is an integer greater than 1, and s is a rational number greater than 1).
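Time-axis downsampling of the side channels can be sketched as below. Two simplifications are assumed: the disclosure downsamples with an artificial intelligence model and allows a rational s, whereas this sketch uses plain block averaging with an integer s. The function names are hypothetical.

```python
def downsample(signal, s):
    """Replace each run of s consecutive samples by their mean (integer s >= 2).

    Block averaging is a stand-in for the AI-based downsampler.
    """
    blocks = [signal[i:i + s] for i in range(0, len(signal), s)]
    return [sum(block) / len(block) for block in blocks]


def downsample_side_channels(side_channels, s):
    """Apply the same factor to every side channel; returns the M_adv signals."""
    return {name: downsample(sig, s) for name, sig in side_channels.items()}
```

With s = 2, each side channel carries half as many samples, which is the reduction in transmitted data that the text attributes to the downsampling factor.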

The audio decoding apparatus 500 may upsample the downsampled and transmitted audio signal (M_adv) of the at least one side channel to restore the audio signal of the at least one side channel. For example, through upsampling, the audio decoding apparatus 500 may restore the audio signals of the Ls, Lb, HBL, HBR, Rb, and Rs channels among the channels of the 7.1.4 channel group.
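A decoder-side counterpart can be sketched with linear interpolation. This is an assumed stand-in only: the disclosure compensates for downsampling loss with decoding based on artificial intelligence, which is not modeled here, and the function name and integer factor s are hypothetical.

```python
def upsample_linear(signal, s):
    """Insert s - 1 linearly interpolated samples after each input sample.

    The last sample is held, since it has no right-hand neighbor.
    """
    out = []
    for i, x in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else x
        for k in range(s):
            out.append(x + (nxt - x) * k / s)
    return out
```

Interpolation restores the sample count but not the discarded high-frequency detail, which is the loss the text says the AI-based decoding is meant to compensate.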

Using the restored audio signal of the at least one side channel, the audio decoding apparatus 500 may restore, from the audio signal (O_tv) of the second channel group, the audio signal of the first channel group, in which the sound image is formed around the listener. The audio decoding apparatus 500 may restore the audio signal of the first channel group by using the audio signal of the at least one side channel and the audio signal (O_tv) of the second channel group to cross-refine the main channels and the side channels of the first channel group.

Hereinafter, each component of the audio encoding apparatus 400 is described in more detail with reference to FIG. 6.

FIG. 6 is a block diagram of the audio encoding apparatus 400 according to an embodiment.

The multi-channel audio encoder 410 of the audio encoding apparatus 400 according to an embodiment may convert the first audio signals corresponding to the first channel group into the second audio signals corresponding to the second channel group and encode them, thereby obtaining a first compressed signal of the basic channel group, a second compressed signal of the dependent channel group, and additional information. The multi-channel audio encoder 410 may include a multi-channel audio signal processor 450, a first compressor 411, and an additional information generator 413.

The multi-channel audio converter 451 according to an embodiment may obtain the second audio signals corresponding to the second channel group by downmixing the first audio signals corresponding to the first channel group. According to a channel group conversion rule, the multi-channel audio converter 451 may mix the audio signals of at least two channels included in the first channel group to obtain the audio signal of one channel included in the second channel group.

FIG. 7A illustrates an example of a conversion rule between channel groups applied in an audio encoding apparatus according to an embodiment.

As shown in FIG. 7A, the first channel group is a channel group in which the sound image is formed around the listener, and may be a channel group suited to a listener-centered sound image reproduction system. For example, the first channel group may be a 7.1.4 channel group including three surround channels in front of the listener (Left Channel (L), Center Channel (C), Right Channel (R)), four surround channels to the sides of and behind the listener (Side Left Channel (Ls), Side Right Channel (Rs), Back Left Channel (Lb), Back Right Channel (Rb)), one subwoofer channel in front of the listener (Sub-woofer Channel (LFE)), two height channels in front of the listener (Height Front Left Channel (HFL), Height Front Right Channel (HFR)), and two height channels behind the listener (Height Back Left Channel (HBL), Height Back Right Channel (HBR)).

The second channel group is a channel group in which the sound image is formed around the screen of a display device, and may be a channel group suited to a screen-centered sound image reproduction system. For example, the second channel group may be a 3.1.2 channel group having three surround channels in front of the listener (Left Channel (L3), Center Channel (C), Right Channel (R3)), one subwoofer channel in front of the listener (Sub-woofer Channel (LFE)), and two height channels (Height Front Left Channel (HFL3), Height Front Right Channel (HFR3)).

As shown in FIG. 7A, the multi-channel audio converter 451 may generate the screen-centered sound image signals corresponding to the channels of the second channel group by adding, as a weighted sum (weighted summation), the audio signals of the channels located at the rear of the first channel group to the audio signals of the channels located at the front.

For example, among the channels included in the 7.1.4 channel group, the multi-channel audio converter 451 may mix the audio signals of the front left channel (L), the side left channel (Ls), and the back left channel (Lb) to obtain the audio signal of the left channel (L3) of the 3.1.2 channel group. Similarly, the multi-channel audio converter 451 may mix the audio signals of the front right channel (R), the side right channel (Rs), and the back right channel (Rb) to obtain the audio signal of the right channel (R3) of the 3.1.2 channel group.

In addition, among the channels included in the 7.1.4 channel group, the multi-channel audio converter 451 may mix the audio signals of the side left channel (Ls), the back left channel (Lb), the height front left channel (HFL), and the height back left channel (HBL) to obtain the audio signal of the height left channel (HFL3) of the 3.1.2 channel group. The multi-channel audio converter 451 may mix the audio signals of the side right channel (Rs), the back right channel (Rb), the height front right channel (HFR), and the height back right channel (HBR) to obtain the audio signal of the height right channel (HFR3) of the 3.1.2 channel group.
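The conversion rule described for FIG. 7A can be written as a table of source channels per target channel. In this sketch the source lists for L3, R3, HFL3, and HFR3 follow the text, while the pass-through of C and LFE and the uniform default weights are assumptions; the actual weights are not given in this passage.

```python
# Source channels of the 7.1.4 group feeding each 3.1.2 channel.
DOWNMIX_SOURCES = {
    "L3": ["L", "Ls", "Lb"],
    "R3": ["R", "Rs", "Rb"],
    "C": ["C"],        # assumed pass-through
    "LFE": ["LFE"],    # assumed pass-through
    "HFL3": ["Ls", "Lb", "HFL", "HBL"],
    "HFR3": ["Rs", "Rb", "HFR", "HBR"],
}


def downmix_714_to_312(signals, weights=None):
    """Weighted summation of 7.1.4 channels into a 3.1.2 layout.

    weights maps a target channel to one weight per source channel;
    targets without an entry use uniform weights (placeholder values).
    """
    out = {}
    for target, sources in DOWNMIX_SOURCES.items():
        w = (weights or {}).get(target, [1.0 / len(sources)] * len(sources))
        length = len(signals[sources[0]])
        out[target] = [
            sum(wi * signals[src][i] for wi, src in zip(w, sources))
            for i in range(length)
        ]
    return out
```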

Returning to FIG. 6, the mixing unit 453 according to an embodiment may obtain the audio signals of the basic channel group and the audio signals of the dependent channel group by mixing the second audio signals corresponding to the second channel group.

So that a channel group having a screen-centered sound image, such as the second channel group, remains compatible with a lower channel group such as a mono or stereo channel, the mixing unit 453 according to an embodiment may convert the second audio signals corresponding to the second channel group into the audio signal of the basic channel group and the audio signal of the dependent channel group and output them.

Accordingly, depending on the output speaker layout at the decoding stage, either only the audio signal of the basic channel group is output, or a multi-channel audio signal is restored from the audio signal of the basic channel group and the audio signal of the dependent channel group and then output. The specific method of dividing multi-channel audio signals into a basic channel group and dependent channel groups for encoding is the same as the method described above with reference to FIGS. 2A to 2D, and its description is therefore omitted.

The mixing unit 453 according to an embodiment may obtain the audio signal of a basic channel by mixing the signals of at least two channels included in the second audio signals corresponding to the second channel group. The mixing unit 453 may obtain, as the audio signals of the dependent channel group, the audio signals of the channels of the second channel group other than the at least one channel corresponding to the audio signal of the basic channel group.

For example, when the second channel group is a 3.1.2 channel group and the basic channel group includes the L2 channel and the R2 channel that constitute a stereo channel, the mixing unit 453 may mix the audio signals of the left channel (L3), the right channel (R3), and the center channel (C) of the 3.1.2 channel group to obtain the audio signals of the L2 and R2 channels. The mixing unit 453 may obtain the audio signals of the L2 and R2 channels as the audio signals of the basic channel group. The mixing unit 453 may obtain, as the audio signals of the dependent channel group, the audio signals of the channels of the 3.1.2 channel group other than the two channels corresponding to the stereo channel (that is, the left channel (L3) and the right channel (R3)), namely the center channel (C), the subwoofer channel (LFE), the height left channel (HFL3), and the height right channel (HFR3).
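The split into a basic channel group and a dependent channel group in this example can be sketched as follows. The center weight of 0.707 and the function name are assumptions; the text only states that L3, R3, and C are mixed into L2 and R2.

```python
def split_312_into_base_and_dependent(sig, c_weight=0.707):
    """Return (basic group, dependent group) for a 3.1.2 input.

    Assumed mixing rule: L2 = L3 + c_weight * C and R2 = R3 + c_weight * C.
    """
    n = len(sig["C"])
    base = {
        "L2": [sig["L3"][i] + c_weight * sig["C"][i] for i in range(n)],
        "R2": [sig["R3"][i] + c_weight * sig["C"][i] for i in range(n)],
    }
    # Channels other than L3 and R3 are sent as the dependent group.
    dependent = {name: list(sig[name]) for name in ("C", "LFE", "HFL3", "HFR3")}
    return base, dependent
```

A stereo-only decoder can play the basic group as is, while a 3.1.2 decoder can recover L3 and R3 by subtracting the weighted center channel.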

The first compression unit 411 may obtain a first compressed signal by compressing the audio signals of the base channel group, and may obtain a second compressed signal by compressing the audio signals of the dependent channel group. The first compression unit 411 may compress the audio signals through processing such as frequency transformation, quantization, and entropy coding. For example, an audio signal compression method such as the AAC standard or the OPUS standard may be used.

The additional information generation unit 413 according to an embodiment may obtain additional information from the first audio signals, the first compressed signal of the base channel group, and the second compressed signal of the dependent channel group. The additional information may include information used at the decoding end to decode a multi-channel audio signal based on the audio signal of the base channel and the audio signal of the dependent channel.

The additional information generation unit 413 according to an embodiment may decode each of the first compressed signal, the second compressed signal, and the third compressed signal of the side channel information, obtain from the decoded signals restored audio signals corresponding to the channels included in the first channel group, and obtain the additional information by comparing the restored audio signals with the first audio signals.

As an example, the additional information generation unit 413 may obtain, as the additional information, error removal related information (e.g., a scale factor for error removal) such that the error between the restored audio signals and the first audio signals is minimized.
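One way to picture such a scale factor is a single least-squares gain per channel. The sketch below assumes a squared-error criterion and a hypothetical function name; the text states only that the error is minimized and does not fix the error measure.

```python
def error_removal_scale_factor(original, restored):
    """Return the gain s that minimizes sum((original - s * restored) ** 2)."""
    energy = sum(y * y for y in restored)
    if energy == 0.0:
        return 1.0  # Assumed fallback: leave a silent reconstruction unscaled.
    # Closed-form least-squares solution: <original, restored> / <restored, restored>.
    return sum(x * y for x, y in zip(original, restored)) / energy
```

For instance, if the restored signal is exactly half the amplitude of the original, the function returns 2.0, and applying that gain at the decoding end removes the error.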

As another example, the additional information may include an audio object signal indicating at least one of the audio signal, the position, and the direction of an audio object (sound source). Alternatively, the additional information may include information about the total number of audio streams, including the base channel audio stream and the dependent channel audio stream. The additional information may also include downmix gain information, channel mapping table information, volume information, low-frequency effect gain information, dynamic range control information, and channel group rendering information. In addition, the additional information may include information on the number of coupled audio streams, information indicating a multi-channel group, information on whether dialogue is present in the audio signal and on the dialogue level, information indicating whether a low-frequency effect is output, information on whether an audio object is present on the screen, information on whether a continuous channel audio signal is present, and information on whether a discontinuous channel audio signal is present. The additional information may include information about demixing, including at least one demixing weight parameter of a demixing matrix for restoring the multi-channel audio signal.

The channel information generation unit 420 according to an embodiment may generate information on a side channel that is unused or little used in converting the first audio signals corresponding to the first channel group into the second audio signals corresponding to the second channel group. The channel information generation unit 420 may generate information on at least one side channel included in the first channel group. The channel information generation unit 420 may include a side channel identification unit 421, a downsampling unit 423, and a second compression unit 425.

The side channel identification unit 421 according to an embodiment may identify at least one side channel among the channels included in the first channel group and output the audio signal of the at least one side channel. The side channel identification unit 421 may identify, as a side channel, a channel of the first channel group that has low relevance to the channels included in the second channel group. For example, the side channel identification unit 421 may identify such a channel based on the weight values applied to the channels of the first channel group in order to convert the first channel group into the second channel group. For example, the side channel identification unit 421 may identify, as a side channel, a channel of the first channel group whose applied weight value is equal to or less than a predetermined value.

FIG. 7B illustrates an example of a conversion rule between channel groups performed in an audio encoding apparatus according to an embodiment.

FIG. 7B is described using, as an example, the case in which the first channel group includes 7.1.4 channels having a listener-centered sound image and the second channel group includes 3.1.2 channels having a sound image centered in front of the listener (or centered on the screen). Among the 7.1.4 channels, the Ls channel, the Rs channel, the Lb channel, the Rb channel, the HBL channel, and the HBR channel, which have low correlation with the 3.1.2 channels (i.e., which are far from the front of the listener), may be identified as side channels.

The audio encoding apparatus 400 according to an embodiment may obtain an audio signal corresponding to a channel included in the second channel group by applying a weight to at least one audio signal corresponding to at least one channel included in the first channel group. The audio encoding apparatus 400 may obtain an audio signal corresponding to a channel included in the second channel group as a weighted sum of the audio signals corresponding to at least two channels included in the first channel group. The audio encoding apparatus 400 may identify a side channel based on the weights applied to the audio signals corresponding to the first channel group in order to obtain the audio signals corresponding to the second channel group.

As shown in FIG. 7B, the audio signal of the L3 channel of the 3.1.2 channel group may be expressed as a weighted sum of the audio signals of the L channel, the Ls channel, and the Lb channel of the 7.1.4 channel group. When the weight value w1 applied to the L channel is greater than the weight values w2 and w3 applied to the Ls channel and the Lb channel, the audio encoding apparatus 400 may determine the Ls channel and the Lb channel to be side channels.
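The weighted-sum conversion and the weight-based side channel decision described above can be sketched as follows. The helper names, the numeric weights, and the threshold are all hypothetical; the text gives only the symbolic weights w1, w2, and w3 and an unspecified predetermined value.

```python
def weighted_sum(signals, weights):
    """Mix the channels named in `weights` into one output channel."""
    length = len(next(iter(signals.values())))
    return [sum(weights[ch] * signals[ch][n] for ch in weights) for n in range(length)]


def identify_side_channels(weights, threshold):
    """Channels whose weight is at or below the threshold are side channels."""
    main = [ch for ch, w in weights.items() if w > threshold]
    side = [ch for ch, w in weights.items() if w <= threshold]
    return main, side


# Assumed example values standing in for w1 (L), w2 (Ls), and w3 (Lb).
l3_weights = {"L": 1.0, "Ls": 0.5, "Lb": 0.25}
main_channels, side_channels = identify_side_channels(l3_weights, threshold=0.5)
```

With these assumed values, L is kept as the main channel and Ls and Lb are identified as side channels, which mirrors the L3 example.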

More specifically, as described with reference to FIG. 7A, the L3 channel of the 3.1.2 channel group may be formed as a combination of the L channel, the Ls channel, and the Lb channel of the 7.1.2 channel group. For a display device to output an audio signal according to a 3.1.2 speaker channel layout, which forms a sound image centered on the screen, the rear channels among the channels included in the 7.1.2 channel group must be mapped to front channels. Accordingly, the weight value w1 applied to the L channel, which is located at the front of the 7.1.2 channels, may be greater than the weight values w2 and w3 applied to the Ls channel and the Lb channel, respectively. Based on the applied weight values, the Ls channel and the Lb channel are determined to be the channels least correlated with the L3 channel and may be determined to be side channels. The method of identifying the main channel and the side channels for the L3 channel may be expressed by the following equation.

[Equation]

In the above equation, M_L3 denotes a main channel group including at least one main channel for the L3 channel. S_L3 denotes a side channel group including at least one side channel for the L3 channel. C_L3 denotes the channels of the first channel group used to generate the L3 channel. For example, C_L3 may include the L channel, the Ls channel, and the Lb channel. F denotes a similarity function between two channels. F may include, for example, a cross-correlation function.

According to the above equation, among the channels of the first channel group used to generate the L3 channel (e.g., the L channel, the Ls channel, and the Lb channel), the audio encoding apparatus 400 may identify a channel with high relevance (e.g., the L channel) as a main channel and identify the channels other than the main channel (e.g., the Ls channel and the Lb channel) as side channels.
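For reference, one form that is consistent with the symbol definitions above can be written as follows. It is offered only as an assumption-based sketch, not necessarily the exact equation of the disclosure: it assumes that the main channel is the element of C_L3 that maximizes the similarity F with the L3 channel, and that the side channel group is the remainder of C_L3.

```latex
M_{L3} = \operatorname*{arg\,max}_{c \,\in\, C_{L3}} F(c, L3),
\qquad
S_{L3} = C_{L3} \setminus M_{L3}
```

With C_L3 = {L, Ls, Lb} and F largest for the L channel, this yields M_L3 = {L} and S_L3 = {Ls, Lb}, matching the example in the text.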

In the same manner, with reference to the conversion rule illustrated in FIG. 7B, the audio encoding apparatus 400 may determine that w1, w6, and w8 are greater than w2, w3, w4, and w7. Based on this determination, the audio encoding apparatus 400 may determine the L channel, the R channel, the HFL channel, and the HFR channel among the channels of the 7.1.4 channel group to be main channels. The audio encoding apparatus 400 may determine the Ls channel, the Rs channel, the Lb channel, the Rb channel, the HBL channel, and the HBR channel among the channels of the 7.1.4 channel group to be side channels.

The present disclosure is not limited to the channel conversion rule shown in FIG. 7B, and the weight applied to each channel may vary depending on the implementation. For example, although FIG. 7B shows the same weight w1 applied to the L channel and the R channel of the 7.1.4 channel group, different weights may be applied.

Meanwhile, the present disclosure is not limited to the example of identifying, as a side channel, a channel having low relevance to the channels included in the second channel group. A channel of the first channel group that satisfies a predetermined criterion may be identified as a side channel, or a channel determined by the producer of the audio signal in consideration of sound image reproduction performance may be identified as a side channel.

The downsampling unit 423 of FIG. 6 according to an embodiment may save the resources used to transmit the audio signal by downsampling the audio signal of the at least one side channel. The downsampling unit 423 may extract a first audio sample group and a second audio sample group from the audio signal of the at least one side channel. For example, the downsampling unit 423 may arrange the audio samples constituting the audio signal of a side channel along the time axis and extract a first audio sample group and a second audio sample group, each including a plurality of audio samples. The downsampling unit 423 may obtain downsampled information by combining the first audio sample group and the second audio sample group.

FIG. 8A is a diagram illustrating downsampling of a side channel audio signal performed by an audio encoding apparatus according to an embodiment.

As shown in FIG. 8A, the audio encoding apparatus 400 according to an embodiment may extract, from the audio samples constituting the audio signal of the at least one side channel, a first audio sample group (D_odd) of odd-numbered indices and a second audio sample group (D_even) of even-numbered indices.

As an example, in combining the first audio sample group (D_odd) and the second audio sample group (D_even) for downsampling, the audio encoding apparatus 400 may use a uniform average filter that treats the importance of every group and every time position equally. The audio encoding apparatus 400 may obtain the downsampled audio signal (D) of the at least one side channel by combining the first audio sample group (D_odd) and the second audio sample group (D_even) with the same weight value (e.g., α = β = 0.5) applied to all samples regardless of sample group and time.
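The uniform-average case above can be sketched as follows. The function names are hypothetical, and the sketch assumes 1-based sample numbering, so the odd-numbered samples are the 1st, 3rd, 5th, and so on.

```python
def split_odd_even(samples):
    """Split a time-ordered sample list into D_odd and D_even."""
    d_odd = samples[0::2]   # 1st, 3rd, 5th, ... samples (odd-numbered, 1-based)
    d_even = samples[1::2]  # 2nd, 4th, 6th, ... samples (even-numbered, 1-based)
    return d_odd, d_even


def downsample_uniform(samples, alpha=0.5, beta=0.5):
    """Combine D_odd and D_even with the same weights at every time position."""
    d_odd, d_even = split_odd_even(samples)
    # zip() stops at the shorter group, so a trailing unpaired sample is dropped.
    return [alpha * o + beta * e for o, e in zip(d_odd, d_even)]
```

The output has half as many samples as the input, which is where the saving in transmission resources comes from.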

As another example, the audio encoding apparatus 400 may obtain downsampled data that yields better performance by applying a different weight to each audio sample. The audio encoding apparatus 400 may obtain the downsampled audio signal (D) of the at least one side channel by combining the first audio sample group (D_odd) and the second audio sample group (D_even) using importance weight maps (α_map, β_map) in which different weight values are assigned per group and per time position.

The downsampling unit 423 according to an embodiment may use an artificial intelligence model to obtain downsampling related information for the first audio sample group and the second audio sample group, and may downsample the at least one third audio signal by applying the downsampling related information to the first audio sample group and the second audio sample group. For example, by using the artificial intelligence model to compute a weight that is applied differently to each audio sample, the downsampling unit 423 according to an embodiment may achieve better performance than a method that combines the audio samples with uniform weights.

The downsampling unit 423 may use the artificial intelligence model to obtain the weight values to be applied to each of the first audio sample group and the second audio sample group. Using a pre-trained artificial intelligence model, the downsampling unit 423 may extract a feature of each audio sample, determine the importance of each audio sample based on the extracted feature, and calculate the weight value to be applied to that audio sample. The downsampling unit 423 may obtain a weight map including weight values that differ per sample, depending on time or on the audio sample group, and apply it to the first audio sample group and the second audio sample group. The downsampling unit 423 may obtain, as the side channel information, the weighted sum of the first audio sample group and the second audio sample group to which the weight map has been applied.
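The weight-map variant can be sketched as follows. The importance rule used here (relative sample magnitude) is only a stand-in assumption for the pre-trained artificial intelligence model, whose actual outputs are not specified in the text; all names are hypothetical.

```python
def importance_weight_maps(d_odd, d_even):
    """Stand-in for the AI model: per-sample weights from relative magnitude."""
    alpha_map, beta_map = [], []
    for o, e in zip(d_odd, d_even):
        total = abs(o) + abs(e)
        # Fall back to uniform weights when both samples are zero.
        alpha = abs(o) / total if total > 0 else 0.5
        alpha_map.append(alpha)
        beta_map.append(1.0 - alpha)
    return alpha_map, beta_map


def downsample_weighted(d_odd, d_even, alpha_map, beta_map):
    """Weighted sum of D_odd and D_even using per-sample weight maps."""
    return [a * o + b * e for o, e, a, b in zip(d_odd, d_even, alpha_map, beta_map)]
```

Unlike the uniform filter, the weights here vary with time position, so a sample judged more important contributes more to the downsampled signal.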

The downsampling unit 423 according to an embodiment may obtain a weight for each audio sample and generate downsampled data by applying the weights to the audio samples. The downsampling unit 423 may determine whether applying the weights improves the audio restoration performance at the decoding end, and the model may be trained on the result of that determination. Using the trained artificial intelligence model, the downsampling unit 423 may derive importance weight maps (α_map, β_map) that can increase the audio restoration rate.

FIG. 8B illustrates the operation of a downsampling unit of an audio encoding apparatus according to an embodiment.

FIG. 8B shows an example of a block diagram of the artificial intelligence model used by the downsampling unit 423 according to an embodiment. The artificial intelligence model 802 for downsampling may sample the audio channel of the at least one side channel and perform a convolution with a kernel size of K1 and C1 channels.

S2D (space-to-depth) (S) of the artificial intelligence model 802 denotes an operation of sampling the input audio samples (S) by skipping every other sample. As shown in the notation 803 of FIG. 8B, 1DConv(K1, C1) denotes a one-dimensional convolutional layer, that is, an operation of separating the signal sampled in S2D(S) into C1 channels. In FIG. 8A, 1DRB may denote a one-dimensional residual block, and Prelu may denote an activation function.
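The S2D step can be pictured as rearranging a one-dimensional sample sequence into interleaved sub-sequences. The sketch below is an assumption about how such a step could be written for a general stride `s`; the text describes only the case of skipping every other sample, and the function name is hypothetical.

```python
def space_to_depth_1d(samples, s=2):
    """Rearrange a 1-D sequence into s sub-sequences, each taking every s-th sample."""
    # Drop trailing samples that do not fill a complete group of s.
    usable = len(samples) - len(samples) % s
    return [samples[phase:usable:s] for phase in range(s)]
```

With `s=2`, the two returned sub-sequences correspond to the odd-numbered and even-numbered sample groups that are subsequently fed to the convolutional layers.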

The basis feature extraction unit of the artificial intelligence model 802 may extract features from the input data. For example, the first audio sample group (D_odd) and the second audio sample group (D_even) shown in FIG. 8A may be extracted at the feature level. The region analysis unit may extract local features and analyze the surrounding region. The importance map generation unit may extract a per-sample importance value by which each feature is to be weighted. As shown in FIG. 8B, each module may consist of a plurality of convolutional layers.

FIG. 8B is merely an example for explaining the operation of the downsampling unit 423, and embodiments of the present disclosure are not limited to the example shown in FIG. 8B. The artificial intelligence model used by the audio encoding apparatus 400 according to an embodiment may make decisions autonomously and be extended through learning. The artificial intelligence model used by the audio encoding apparatus 400 may be configured and trained in various ways to improve audio signal restoration performance.

Returning to FIG. 6, the downsampling unit 423 may downsample the audio signal of the at least one side channel and obtain side channel information including the downsampled data. The second compression unit 425 may obtain a third compressed signal by compressing the side channel information. The second compression unit 425 may compress the side channel information through processing such as frequency transformation, quantization, and entropy coding. For example, an audio signal compression method such as the AAC standard or the OPUS standard may be used.

The bitstream generation unit 430 according to an embodiment may generate a bitstream from the first compressed signal of the base channel group, the second compressed signal of the dependent channel group, and the additional information, which are output from the multi-channel audio encoding unit 410, and from the third compressed signal of the side channel generated by the channel information generation unit 420. The bitstream generation unit 430 may generate the bitstream by encapsulating the compressed signals. The bitstream generation unit 430 may generate the bitstream by performing encapsulation such that the first compressed signal of the base channel group is included in a base channel audio stream, the second compressed signal of the dependent channel group and the third compressed signal of the side channel are included in a dependent channel audio stream, and the additional information is included in metadata.
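The grouping performed by the encapsulation can be sketched as a simple container. The class, field names, and in-memory layout below are hypothetical and illustrate only which signal goes into which stream; they do not represent an actual bitstream syntax.

```python
from dataclasses import dataclass


@dataclass
class Bitstream:
    base_channel_audio_stream: bytes       # first compressed signal
    dependent_channel_audio_stream: tuple  # (second, third) compressed signals
    metadata: dict                         # additional information


def encapsulate(first_compressed, second_compressed, third_compressed, additional_info):
    """Group the compressed signals and additional information as described above."""
    return Bitstream(
        base_channel_audio_stream=first_compressed,
        dependent_channel_audio_stream=(second_compressed, third_compressed),
        metadata=dict(additional_info),
    )
```

Keeping the base channel audio stream separate from the dependent channel audio stream means that a receiver interested only in the base channel group can read that stream without touching the side channel data.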

FIG. 9 is a block diagram of an audio decoding apparatus according to an embodiment.

The information obtaining unit 510 of the audio decoding apparatus 500 according to an embodiment may identify the base channel audio stream, the dependent channel audio stream, and the metadata encapsulated in the bitstream.

The information obtaining unit 510 may obtain the compressed audio signal of the base channel group from the base channel audio stream, obtain the compressed audio signal of the dependent channel group and the compressed side channel information from the dependent channel audio stream, and obtain the additional information from the metadata.

The multi-channel audio decoding unit 520 of the audio decoding apparatus 500 according to an embodiment may obtain the first audio signals corresponding to the first channel group by decoding the compressed signals obtained from the bitstream. The multi-channel audio decoding unit 520 may include a first decompression unit 521 and a multi-channel audio signal restoration unit 550.

The first decompression unit 521 may obtain at least one audio signal of the base channel group by applying decompression processes such as entropy decoding, inverse quantization, and inverse frequency transformation to the compressed audio signal of the base channel group. The first decompression unit 521 may obtain at least one audio signal of at least one dependent channel group by applying decompression processes such as entropy decoding, inverse quantization, and inverse frequency transformation to the compressed audio signal of the dependent channel group. For example, an audio signal restoration method corresponding to an audio signal compression method such as the AAC standard or the OPUS standard may be used.

The multi-channel audio signal restoration unit 550 according to an embodiment may restore the first audio signals corresponding to the first channel group based on the at least one audio signal of the base channel group, the at least one audio signal of the at least one dependent channel group, and the additional information.

The mixing unit 555 of the multi-channel audio signal restoration unit 550 may obtain an audio signal mixed into the channels included in the second channel group by mixing the at least one audio signal of the base channel group and the at least one audio signal of the at least one dependent channel group. The mixing unit 555 may obtain, as the mixed audio signal, a weighted sum of the audio signal of at least one base channel and the audio signal of at least one dependent channel.

For example, when the first channel group is a 3.1.2 channel group and the base channel group includes the L2 channel and the R2 channel that constitute a stereo channel, the mixing unit 555 may obtain the weighted sum of the audio signal of the L2 channel included in the base channel group and the audio signal of the C channel included in the dependent channel group as the audio signal of the L3 channel of the 3.1.2 channel group. The mixing unit 555 may obtain the weighted sum of the audio signal of the R2 channel included in the base channel group and the audio signal of the C channel included in the dependent channel group as the audio signal of the R3 channel of the 3.1.2 channel group.
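The decoder-side weighted sum in this example can be sketched as follows. The weights `w_base` and `w_center` are assumptions: the negative center weight corresponds to an encoder that formed L2 as L3 plus a scaled C, which the text does not state explicitly, and the function name is hypothetical.

```python
def mix_to_l3_r3(l2, r2, c, w_base=1.0, w_center=-0.707):
    """Recover L3 and R3 as weighted sums of a base channel and the C channel."""
    l3 = [w_base * l + w_center * m for l, m in zip(l2, c)]
    r3 = [w_base * r + w_center * m for r, m in zip(r2, c)]
    return l3, r3
```

Under that assumption, using the negated encoder-side center gain as `w_center` undoes the downmix exactly, apart from compression loss.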

The second rendering unit 552 may obtain the first audio signals corresponding to the first channel group by rendering the signal mixed by the mixing unit 555 based on the additional information. The additional information may include information calculated and transmitted by the audio encoding apparatus 400 in order to reduce errors when the audio signal is restored.

The sound image restoration unit 530 according to an embodiment may use the side channel information included in the bitstream to restore, from the first audio signals corresponding to the first channel group, third audio signals of a second channel group having more channels than the first channel group. The sound image restoration unit 530 may include a second decompression unit 531, an upsampling unit 533, and a refinement unit 535.

The second decompression unit 531 may obtain the side channel information by decompressing the compressed side channel information through processes such as entropy decoding, inverse quantization, and inverse frequency transform. For example, an audio signal restoration method corresponding to an audio signal compression method such as the AAC standard or the OPUS standard may be used.

The side channel information may include at least one second audio signal corresponding to at least one channel included in the second channel group, which can be used to restore the audio signals corresponding to the second channel group. A side channel may be a channel, among the channels included in the second channel group, that has low relevance to the channels included in the first channel group.

For example, the first audio signals corresponding to the first channel group may include a multi-channel audio signal having a sound image centered in front of the listener, and the third audio signals corresponding to the second channel group may include a multi-channel audio signal having a sound image centered on the listener. In this case, the side channel information may include, among the channels included in the second channel group, the listener's side channel components and rear channel components, which have low relevance to the channels of the first channel group composed of the listener's front channel components. Channels of the second channel group other than the side channels may be identified as main channels having high correlation with the channels of the first channel group.

However, the present disclosure is not limited to the example in which the side channels are channels having low relevance to the channels included in the first channel group. A channel of the second channel group that satisfies a predetermined criterion may be a side channel, or a channel intended by the producer of the audio signal may be a side channel.

For example, when the first channel group is a 3.1.2 channel group having a sound image centered in front of the listener and the second channel group is a 7.1.4 channel group having a sound image centered on the listener, information on at least six channels is required to restore the third audio signals corresponding to the second channel group from the first audio signals corresponding to the first channel group.
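
The figure of six channels follows from the layout labels: a 3.1.2 layout has 3 + 1 + 2 = 6 channels and a 7.1.4 layout has 7 + 1 + 4 = 12. A small sketch of that arithmetic is given below; the helper name `channel_count` is introduced here purely for illustration.

```python
def channel_count(layout: str) -> int:
    """Total channels in an 'a.b.c' layout label (surround + LFE + height)."""
    return sum(int(part) for part in layout.split("."))

first_group = "3.1.2"
second_group = "7.1.4"

# Channels that cannot be taken over directly from the smaller layout.
missing = channel_count(second_group) - channel_count(first_group)
print(channel_count(first_group), channel_count(second_group), missing)  # 6 12 6
```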

As one example, information on the Ls, Rs, Lb, Rb, HBL, and HBR channels, which among the channels of the 7.1.4 channel group have low correlation with the channels of the 3.1.2 channel group (that is, are far from the listener's front), may be included in the side channel information.

As another example, information on the L, R, HFL, HFR, C, and LFE channels, which among the channels of the 7.1.4 channel group have high correlation with the channels of the 3.1.2 channel group, may be included in the side channel information.

The upsampling unit 533 according to an embodiment may obtain at least one second audio signal corresponding to at least one channel among the channels included in the second channel group by upsampling the side channel information. For example, the upsampling unit 533 may use an artificial intelligence model to upsample the downsampled information included in the side channel information. The artificial intelligence model used by the upsampling unit 533 of the audio decoding apparatus 500 may correspond to the artificial intelligence model used by the downsampling unit 423 of the audio encoding apparatus 400, and may be a model trained to improve audio signal restoration performance. The upsampling unit 533 may obtain the second audio signal corresponding to at least one side channel by performing upsampling along the time axis.

The refinement unit 535 according to an embodiment may restore the third audio signals corresponding to the second channel group from the first audio signals corresponding to the first channel group by using the second audio signal corresponding to at least one side channel. To compensate for the data loss caused by downsampling the side channel audio signals at the encoder before transmission, the refinement unit 535 may refine the audio signal of at least one side channel and the audio signal of at least one main channel based on artificial intelligence. The refinement unit 535 may alternately and repeatedly refine the audio signal of the at least one side channel and the audio signal of the at least one main channel based on artificial intelligence.

First, the refinement unit 535 may obtain the audio signal of at least one main channel of the first channel group from the third audio signal and the audio signal of at least one side channel, according to the channel group conversion rule between the first channel group and the second channel group. The refinement unit 535 may set the audio signal of the at least one side channel and the audio signal of the at least one main channel as initial conditions. By applying the initial conditions to a pre-trained artificial intelligence model, the refinement unit 535 may alternately refine the audio signal of the at least one side channel and the audio signal of the at least one main channel, and may obtain a fourth audio signal including the refined side channel audio signal and the refined main channel audio signal. The refinement unit 535 may improve sound image reproduction performance by repeating the operation of alternately refining the side channel audio signal and the main channel audio signal. The operation of the refinement unit 535 is described in more detail below with reference to FIGS. 10A to 10C.

Meanwhile, the audio decoding apparatus 500 according to an embodiment may support various channel layouts depending on the audio playback environment.

The multi-channel audio signal restoration unit 550 of the audio decoding apparatus 500 according to an embodiment may render and output the audio signals of the basic channel group through the first rendering unit 551. For example, the first rendering unit 551 may output a mono channel audio signal or a stereo channel audio signal from the audio signals of the basic channel group. The audio signals of the basic channel group may be decoded and output independently, without information on the audio signals of the other, dependent channel groups.

In addition, the multi-channel audio signal restoration unit 550 of the audio decoding apparatus 500 according to an embodiment may render and output the first audio signals corresponding to the first channel group through the second rendering unit 552. When audio content of the first channel group, which has a sound image centered in front of the listener, is consumed, the audio decoding apparatus 500 may restore and output the first audio signals corresponding to the first channel group without restoring the listener-centered sound image based on the side channel information.

In addition, the sound image restoration unit 530 of the audio decoding apparatus 500 according to an embodiment may restore and output the third audio signals corresponding to the listener-centered second channel group from the first audio signals corresponding to the first channel group and the second audio signal corresponding to at least one side channel.

FIG. 10A illustrates the operation of the sound image restoration unit of the audio decoding apparatus according to an embodiment.

FIG. 10A shows an example of a process in which the audio decoding apparatus according to an embodiment restores audio signals corresponding to a second channel group of 7.1.4 channels from audio signals corresponding to a first channel group of 3.1.2 channels. However, FIG. 10A is merely an example to aid understanding of the operation of the invention. The type and number of side channels used, the input channel group, and the target channel group to be restored may vary depending on the implementation.

The upsampling unit 533 of the audio decoding apparatus 500 according to an embodiment may restore the audio signal of at least one side channel by performing time-axis upsampling on the side channel information (M_adv) that was downsampled and transmitted. The upsampling unit 533 according to an embodiment may use an artificial intelligence model to upsample the downsampled information (M_adv) included in the bitstream.
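
To make the idea of time-axis upsampling concrete, the sketch below substitutes plain linear interpolation for the trained artificial intelligence model, which the description does not specify in enough detail to reproduce. The function `upsample_linear`, the factor of 2, and the sample values in `m_adv` are all assumptions chosen for illustration.

```python
def upsample_linear(samples, factor):
    """Upsample along the time axis by an integer factor (linear interpolation)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if len(samples) < 2:
        return list(samples) * factor
    out = []
    for cur, nxt in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(cur + (nxt - cur) * k / factor)
    # Hold the last sample so the output length is len(samples) * factor.
    out.extend([samples[-1]] * factor)
    return out

# Hypothetical downsampled side channel data, one sequence per channel.
m_adv = {"Ls": [0.0, 1.0, 0.0], "Rs": [0.5, 0.5, 0.5]}

# Initial (pre-refinement) side channel signals after upsampling.
side0 = {ch: upsample_linear(x, 2) for ch, x in m_adv.items()}
```

A learned upsampler would replace `upsample_linear` while keeping the same per-channel, time-axis interface.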

In the example shown in FIG. 10A, upsampling the side channel information initially restores the audio signals (O_LS0, O_RS0, O_LB0, O_RB0, O_HBL0, O_HBR0) of the Ls, Rs, Lb, Rb, HBL, and HBR channels among the channels of the 7.1.4 channel group as the audio signals of the side channel group.

The refinement unit 535 of the audio decoding apparatus 500 according to an embodiment may obtain the audio signals (O_L0, O_R0, O_HFL0, O_HFR0) of the main channel group included in the second channel group from the first audio signals (O_TV) corresponding to the first channel group and the second audio signals (O_LS0, O_RS0, O_LB0, O_RB0, O_HBL0, O_HBR0) corresponding to the side channel group, according to the channel group conversion rule between the first channel group and the second channel group. The refinement unit 535 may set the second audio signals (O_LS0, O_RS0, O_LB0, O_RB0, O_HBL0, O_HBR0) corresponding to the side channel group and the audio signals (O_L0, O_R0, O_HFL0, O_HFR0) of the main channel group as initial conditions. Based on the channel group conversion rule between the first channel group and the second channel group, the first audio signals (O_TV) corresponding to the first channel group, and the initial conditions, the refinement unit 535 may use an artificial intelligence model to alternately refine the audio signals of the side channel group and the audio signals of the main channel group.

The refinement unit 535 of the audio decoding apparatus 500 may obtain the audio signals (O_L0, O_R0, O_HFL0, O_HFR0) corresponding to the main channel group from the first audio signals (O_TV) and the second audio signals (O_LS0, O_RS0, O_LB0, O_RB0, O_HBL0, O_HBR0) corresponding to the side channel group included in the second channel group, according to the conversion rule from the channels included in the second channel group to the channels included in the first channel group. The refinement unit 535 may use the artificial intelligence model to refine the second audio signals (O_LS0, O_RS0, O_LB0, O_RB0, O_HBL0, O_HBR0) and the audio signals (O_L0, O_R0, O_HFL0, O_HFR0).

The refinement unit 535 according to an embodiment may refine the audio signals (O_L0, O_R0, O_HFL0, O_HFR0) through first layers of the artificial intelligence model to obtain refined audio signals (O_L1, O_R1, O_HFL1, O_HFR1), and may refine the second audio signals (O_LS0, O_RS0, O_LB0, O_RB0, O_HBL0, O_HBR0) through second layers of the artificial intelligence model to obtain refined audio signals (O_LS1, O_RS1, O_LB1, O_RB1, O_HBL1, O_HBR1). From the audio signals refined through the artificial intelligence model, the refinement unit 535 may obtain the third audio signals (O_LS2, O_RS2, O_LB2, O_RB2, O_HBL2, O_HBR2, O_L2, O_R2, O_HFL2, O_HFR2) corresponding to the second channel group.

The artificial intelligence model used to refine the audio signals of the side channel group and the audio signals of the main channel group is a model for restoring the third audio signals corresponding to all channels of the second channel group from the first audio signals corresponding to the first channel group and the second audio signals corresponding to the side channel group. It may be a model trained to minimize the error between the restored audio signal and the original audio signal.

The refinement unit 535 shown in FIG. 10A performs the cross-refinement of the audio signals of the side channel group and the audio signals of the main channel group twice.

As a result of the first refinement operation, the refinement unit 535 may obtain the refined audio signals (O_L1, O_R1, O_HFL1, O_HFR1) of the main channel group and the refined audio signals (O_LS1, O_RS1, O_LB1, O_RB1, O_HBL1, O_HBR1) of the side channel group. As a result of the second refinement operation, the refinement unit 535 may obtain the refined audio signals (O_L2, O_R2, O_HFL2, O_HFR2) of the main channel group and the refined audio signals (O_LS2, O_RS2, O_LB2, O_RB2, O_HBL2, O_HBR2) of the side channel group.
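
The two-round alternation described for FIG. 10A can be sketched as a simple loop. This only illustrates the control flow: `refine_main` and `refine_side` are hypothetical stand-ins for the first and second layers of the trained model, and the toy versions below merely count refinement passes. Feeding the freshly refined main channels into the side channel update is also an assumption of this sketch.

```python
def refine(o_tv, main0, side0, refine_main, refine_side, rounds=2):
    """Alternately refine main and side channel signals for `rounds` rounds."""
    main, side = dict(main0), dict(side0)
    history = [(main, side)]  # index 0 holds the initial conditions
    for _ in range(rounds):
        main = refine_main(o_tv, main, side)  # e.g. O_L0 -> O_L1 -> O_L2
        side = refine_side(o_tv, main, side)  # e.g. O_LS0 -> O_LS1 -> O_LS2
        history.append((main, side))
    return history

# Toy stand-ins: each "signal" is just a counter of refinement passes.
def toy_refine_main(o_tv, main, side):
    return {ch: n + 1 for ch, n in main.items()}

def toy_refine_side(o_tv, main, side):
    return {ch: n + 1 for ch, n in side.items()}

main0 = {ch: 0 for ch in ("L", "R", "HFL", "HFR")}
side0 = {ch: 0 for ch in ("Ls", "Rs", "Lb", "Rb", "HBL", "HBR")}

history = refine(None, main0, side0, toy_refine_main, toy_refine_side)
final_main, final_side = history[-1]
```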

FIG. 10B illustrates the operations of the upsampling unit and the refinement unit of the audio decoding apparatus according to an embodiment.

FIG. 10B shows a block diagram of the artificial intelligence model used by the upsampling unit 533 and the refinement unit 535 according to an embodiment. As shown in FIG. 10B, each module may be composed of a plurality of convolutional layers.

In FIG. 10B, 1DRB denotes a one-dimensional residual block, and 1DConv(K1, C1) denotes a one-dimensional convolutional layer. K1 denotes the number of kernels of the convolutional layer, and C1 indicates that the input to the convolutional layer is split into C1 channels. Prelu denotes an activation function.
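
For readers unfamiliar with these building blocks, the following is a minimal single-channel, single-kernel sketch of a one-dimensional convolution, a PReLU activation, and a residual connection. It is not the network of FIG. 10B: the internal layout of the 1DRB, the kernel values, and the slope `alpha` are assumptions, and the K1 and C1 dimensions are omitted for brevity.

```python
def conv1d_same(x, kernel):
    """1-D convolution with zero padding; output has the same length as x."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]

def prelu(x, alpha=0.25):
    """PReLU: identity for non-negative inputs, slope alpha for negative ones."""
    return [v if v >= 0 else alpha * v for v in x]

def residual_block(x, kernel, alpha=0.25):
    """y = x + PReLU(conv(x)); the skip connection adds the input back."""
    return [a + b for a, b in zip(x, prelu(conv1d_same(x, kernel), alpha))]

x = [1.0, -2.0, 3.0]
identity_kernel = [0.0, 1.0, 0.0]  # placeholder; real kernels are learned
y = residual_block(x, identity_kernel)
```

In a trained model both the kernel values and `alpha` would be learned parameters.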

The upsampling unit 533 according to an embodiment may use an artificial neural network to upsample the per-channel data included in the side channel information. The artificial intelligence model used by the upsampling unit 533 may correspond to the artificial intelligence model used by the downsampling unit 423 of the audio encoding apparatus 400, and may be a model trained to improve audio signal restoration performance.

FIG. 10B is only an example for explaining the operations of the upsampling unit 533 and the refinement unit 535, and embodiments of the present disclosure are not limited to the example shown in FIG. 10B. The artificial intelligence model used by the audio decoding apparatus 500 according to an embodiment may make decisions autonomously and be extended through learning. The artificial intelligence model used by the audio decoding apparatus 500 may be configured and trained in various ways to improve audio signal restoration performance.

The upsampling unit 533 may obtain a weight for each channel included in the side channel information and perform upsampling by applying the weights to the per-channel data included in the side channel information. The upsampling unit 533 may determine whether applying the weights improves audio restoration performance at the decoder, and may train on the result of that determination. Using the trained artificial intelligence model, the upsampling unit 533 may derive weights that increase the audio restoration rate.

FIG. 10C illustrates examples of the operation of the refinement unit of the audio decoding apparatus according to an embodiment.

FIG. 10A illustrated an example in which the audio decoding apparatus according to an embodiment performs the cross-refinement of the audio signals of the side channel group and the audio signals of the main channel group twice.

Referring to graph 1031, the audio decoding apparatus 500 according to an embodiment may initially restore the audio signals of the Ls, Rs, Lb, Rb, HBL, and HBR channels among the channels of the 7.1.4 channel group as the audio signals of the side channel group by upsampling the side channel information. Then, through a first-stage refinement operation, the audio decoding apparatus 500 may obtain refined audio signals of the main channel group (that is, the L, R, HFL, and HFR channels) and refined audio signals of the side channel group. Through a second-stage refinement operation, the audio decoding apparatus 500 may obtain second-pass refined audio signals of the main channel group and second-pass refined audio signals of the side channel group.

However, the present disclosure is not limited thereto. The stages and configuration of the cross-restoration operation of the audio decoding apparatus 500 may be modified in various ways depending on the computing environment and latency requirements.

Meanwhile, the audio decoding apparatus 500 according to an embodiment may perform the cross-restoration operation only once.

As shown in graph 1032, the audio decoding apparatus 500 according to an embodiment may initially restore the audio signals of the Ls, Rs, Lb, Rb, HBL, and HBR channels among the channels of the 7.1.4 channel group as the audio signals of the side channel group by upsampling the side channel information. Then, through a first-stage refinement operation, the audio decoding apparatus 500 may obtain refined audio signals of the main channel group (that is, the L, R, HFL, and HFR channels) and refined audio signals of the side channel group.

Meanwhile, the audio decoding apparatus 500 according to an embodiment may perform cross-refinement among the audio signals of the side channels within the side channel group.

Referring to graph 1033, the audio decoding apparatus 500 according to an embodiment may initially restore the audio signals of the Ls, Rs, Lb, Rb, HBL, and HBR channels among the channels of the 7.1.4 channel group as the audio signals of the side channel group by upsampling the side channel information.

In the first-stage refinement operation, the audio decoding apparatus 500 may first obtain refined audio signals of the main channel group (that is, the L, R, HFL, and HFR channels) based on the initial audio signals of the side channel group. Next, the audio decoding apparatus 500 may obtain refined audio signals of the Ls and Rs channels of the side channel group based on the initial audio signals of the side channel group and the refined audio signals of the main channel group. Next, the audio decoding apparatus 500 may obtain refined audio signals of the Lb and Rb channels of the side channel group based on the initial audio signals of the side channel group, the refined audio signals of the main channel group, and the refined audio signals of the Ls and Rs channels of the side channel group. Next, the audio decoding apparatus 500 may obtain refined audio signals of the HBL and HBR channels of the side channel group based on the initial audio signals of the side channel group, the refined audio signals of the main channel group, and the refined audio signals of the Ls, Rs, Lb, and Rb channels of the side channel group.

In the second-stage refinement operation, the audio decoding apparatus 500 may obtain second-pass refined audio signals of the main channel group (that is, the L, R, HFL, and HFR channels) based on the refined audio signals of the main channel group and the refined audio signals of the side channel group. Next, the audio decoding apparatus 500 may obtain second-pass refined audio signals of the Ls and Rs channels of the side channel group based on the refined audio signals of the side channel group and the second-pass refined audio signals of the main channel group. Next, the audio decoding apparatus 500 may obtain second-pass refined audio signals of the Lb and Rb channels of the side channel group based on the refined audio signals of the side channel group, the second-pass refined audio signals of the main channel group, and the second-pass refined audio signals of the Ls and Rs channels of the side channel group. Next, the audio decoding apparatus 500 may obtain second-pass refined audio signals of the HBL and HBR channels of the side channel group based on the refined audio signals of the side channel group, the refined audio signals of the main channel group, and the second-pass refined audio signals of the Ls, Rs, Lb, and Rb channels of the side channel group.
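
The update order described for graph 1033 can be summarized as a schedule in which each step may draw on the channels already updated earlier in the same stage. The sketch below only enumerates that order; the names `MAIN`, `SIDE_STAGES`, and `refinement_schedule` are introduced here for illustration and do not appear in the source.

```python
MAIN = ("L", "R", "HFL", "HFR")
SIDE_STAGES = (("Ls", "Rs"), ("Lb", "Rb"), ("HBL", "HBR"))

def refinement_schedule(stages=2):
    """Return (stage, channels updated, channels already updated this stage)."""
    steps = []
    for stage in range(1, stages + 1):
        done = []
        for group in (MAIN,) + SIDE_STAGES:
            steps.append((stage, group, tuple(done)))
            done.extend(group)
    return steps

for stage, group, available in refinement_schedule():
    print(f"stage {stage}: update {group} using already-updated {available}")
```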

Hereinafter, a method by which the audio encoding apparatus 400 according to an embodiment processes an audio signal is described with reference to the flowchart of FIG. 11. Each step shown in FIG. 11 may be performed by at least one component included in the audio encoding apparatus 400 described above, and redundant descriptions are omitted.

FIG. 11 is a flowchart of an audio signal encoding method of the audio encoding apparatus 400 according to an embodiment.

In step S1101, the audio encoding apparatus 400 according to an embodiment may obtain second audio signals corresponding to the channels included in a second channel group from first audio signals corresponding to the channels included in a first channel group. The audio encoding apparatus 400 may obtain the second audio signals corresponding to the second channel group by downmixing the first audio signals corresponding to the first channel group. For example, the first channel group may include the channel group of the original audio signal, and the second channel group may be formed by combining at least two of the channels included in the first channel group.

The audio encoding apparatus 400 according to an embodiment may assign weight values to the channels included in the first channel group based on the degree of relevance of each of those channels to the second channel group. The audio encoding apparatus 400 may obtain the second audio signals from the first audio signals by computing a weighted sum of at least two of the first audio signals, based on the weight values assigned to the channels included in the first channel group.
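
The weighted sum described above can be illustrated with a minimal sketch. The function name `weighted_downmix`, the sample values, and the weight values are assumptions made for illustration only; the disclosure does not specify concrete weights.

```python
def weighted_downmix(signals, weights):
    """Return the weighted sum of the channels named in `weights`.

    signals: dict mapping channel name -> list of samples (equal lengths)
    weights: dict mapping channel name -> weight value (hypothetical values)
    """
    names = list(weights)
    length = len(signals[names[0]])
    mixed = [0.0] * length
    for name in names:
        w = weights[name]
        for i, sample in enumerate(signals[name]):
            mixed[i] += w * sample
    return mixed


# Hypothetical first audio signals (two samples per channel) and weights.
first_signals = {"L": [1.0, 2.0], "Ls": [3.0, 4.0], "Lb": [5.0, 6.0]}
example_weights = {"L": 0.5, "Ls": 0.25, "Lb": 0.25}

# One channel of the second channel group, obtained from three first-group channels.
l3 = weighted_downmix(first_signals, example_weights)
```

In this sketch, the channel with the larger weight contributes more to the downmixed channel, which mirrors the idea that weights follow the degree of relevance to the second channel group.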

The audio encoding apparatus 400 according to an embodiment may downmix a first audio signal including a listener-centered multi-channel audio signal into a second audio signal including a listener-front-centered multi-channel audio signal. The audio encoding apparatus 400 may convert the first audio signal of the first channel group into the second audio signal of the second channel group based on a predetermined channel group conversion rule. The audio encoding apparatus 400 may obtain the audio signal of one channel of the second channel group by mixing the audio signals of at least two channels of the first channel group.

For example, the audio encoding apparatus 400 may obtain a second audio signal of a 3.1.2 channel group consisting of 6 channels by downmixing a first audio signal of a 7.1.4 channel group consisting of 12 channels.

The 7.1.4 channel group may include an L (Left) channel, a C (Center) channel, an R (Right) channel, an Ls (Side Left) channel, an Rs (Side Right) channel, an Lb (Back Left) channel, an Rb (Back Right) channel, an LFE (Sub-woofer) channel, an HFL (Height Front Left) channel, an HFR (Height Front Right) channel, an HBL (Height Back Left) channel, and an HBR (Height Back Right) channel. The 3.1.2 channel group may include an L3 (Left) channel, a C (Center) channel, an R3 (Right) channel, an LFE channel, an HFL3 (Height Front Left) channel, and an HFR3 (Height Front Right) channel.

For example, the audio encoding apparatus 400 may obtain the audio signal of the left channel (L3) of the 3.1.2 channel group by mixing, among the channels included in the 7.1.4 channel group, the audio signal of the front left channel (L), the audio signal of the side left channel (Ls), and the audio signal of the back left channel (Lb). The audio encoding apparatus 400 may obtain the audio signal of the right channel (R3) of the 3.1.2 channel group by mixing, among the channels included in the 7.1.4 channel group, the audio signal of the front right channel (R), the audio signal of the side right channel (Rs), and the audio signal of the back right channel (Rb).

The audio encoding apparatus 400 may obtain the audio signal of the height left channel (HFL3) of the 3.1.2 channel group by mixing, among the channels included in the 7.1.4 channel group, the audio signal of the side left channel (Ls), the audio signal of the back left channel (Lb), the audio signal of the height front left channel (HFL), and the audio signal of the height back left channel (HBL). The audio encoding apparatus 400 may obtain the audio signal of the height right channel (HFR3) of the 3.1.2 channel group by mixing, among the channels included in the 7.1.4 channel group, the audio signal of the side right channel (Rs), the audio signal of the back right channel (Rb), the audio signal of the height front right channel (HFR), and the audio signal of the height back right channel (HBR). The audio encoding apparatus 400 may obtain the audio signal of the sub-woofer (LFE) channel and the audio signal of the center (C) channel of the 3.1.2 channel group by applying a weight to each of the audio signal of the sub-woofer (LFE) channel and the audio signal of the center (C) channel of the 7.1.4 channel group.
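
The channel routing of this 7.1.4 to 3.1.2 example can be written as a small table. The sketch below is illustrative only: the routing table follows the description above, but the names `DOWNMIX_SOURCES` and `downmix_714_to_312` and the default weights (equal weights per target channel, and a weight of 1.0 for C and LFE) are assumptions, since the disclosure does not give concrete weight values.

```python
# Source channels of the 7.1.4 group that feed each channel of the 3.1.2 group.
DOWNMIX_SOURCES = {
    "L3": ["L", "Ls", "Lb"],
    "R3": ["R", "Rs", "Rb"],
    "HFL3": ["Ls", "Lb", "HFL", "HBL"],
    "HFR3": ["Rs", "Rb", "HFR", "HBR"],
    "C": ["C"],
    "LFE": ["LFE"],
}


def downmix_714_to_312(frame, weights=None):
    """Downmix one time instant of a 7.1.4 frame to 3.1.2.

    frame:   dict mapping each of the 12 channel names -> sample value
    weights: optional dict mapping (target, source) -> weight value;
             missing entries fall back to equal weights (an assumption)
    """
    weights = weights or {}
    out = {}
    for target, sources in DOWNMIX_SOURCES.items():
        default = 1.0 / len(sources)
        out[target] = sum(
            weights.get((target, src), default) * frame[src] for src in sources
        )
    return out


CHANNELS_714 = ["L", "C", "R", "Ls", "Rs", "Lb", "Rb", "LFE",
                "HFL", "HFR", "HBL", "HBR"]
unit_frame = {ch: 1.0 for ch in CHANNELS_714}
mixed_frame = downmix_714_to_312(unit_frame)
```

Note that Ls, Lb, Rs, and Rb each feed two target channels (a front channel and a height channel) in this routing.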

In step S1102, the audio encoding apparatus 400 according to an embodiment may downsample, by using an artificial intelligence model, at least one third audio signal corresponding to at least one channel that is identified, among the channels included in the first channel group, based on its degree of relevance to the second channel group. The audio encoding apparatus 400 may obtain side channel information by downsampling the audio signal of at least one side channel included in the first audio signals corresponding to the first channel group.

The audio encoding apparatus 400 according to an embodiment may identify at least one side channel among the channels included in the first channel group based on the degree of relevance between the channels included in the first channel group and the channels included in the second channel group. For example, the audio encoding apparatus 400 may identify, as a side channel, at least one channel whose degree of relevance to the second channel group is lower than a predetermined value among the channels included in the first channel group. The audio encoding apparatus 400 may identify, as main channels, the channels other than the at least one side channel among the channels included in the first channel group.
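
The threshold-based identification described above can be sketched as follows. The function name `split_by_relevance`, the relevance scores, and the threshold are hypothetical; the disclosure does not define how the degree of relevance is computed.

```python
def split_by_relevance(relevance, threshold):
    """Split channels into (main, side) lists by a relevance threshold.

    relevance: dict mapping channel name -> relevance score to the second
               channel group (hypothetical scores)
    threshold: the predetermined value; channels scoring below it are
               treated as side channels
    """
    side = [ch for ch, score in relevance.items() if score < threshold]
    main = [ch for ch, score in relevance.items() if score >= threshold]
    return main, side


main_channels, side_channels = split_by_relevance(
    {"L": 0.9, "Ls": 0.3, "Lb": 0.2}, threshold=0.5
)
```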

The audio encoding apparatus 400 according to an embodiment may obtain the second audio signal of the second channel group by downmixing the first audio signal of the first channel group based on the weight values assigned to the channels of the first channel group. The audio encoding apparatus 400 may identify at least one side channel and at least one main channel among the channels included in the first channel group based on the weight values assigned to the channels of the first channel group.

Among the channels included in the first channel group, the main channels may be referred to as channels of a first subgroup, and the side channels may be referred to as channels of a second subgroup. When obtaining an audio signal corresponding to one channel of the second channel group by computing a weighted sum of the audio signals corresponding to at least two channels of the first channel group, the audio encoding apparatus 400 may identify, among the at least two channels of the first channel group, the channel with the largest assigned weight value as a main channel and the remaining channels as side channels.

For example, when the first channel group is a 7.1.4 channel group and the second channel group is a 3.1.2 channel group, the audio encoding apparatus 400 may determine the Ls channel, Rs channel, Lb channel, Rb channel, HBL channel, and HBR channel of the first channel group as side channels. The audio encoding apparatus 400 may determine the remaining channels among the channels of the first channel group, namely the L channel, R channel, HFL channel, and HFR channel, as main channels.
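
The largest-weight rule can be checked against this 7.1.4 example with a short sketch. The weight values in `MIX_WEIGHTS` are invented for illustration (the disclosure gives none); they are chosen so that the front channels carry the largest weight in each mix. The rule that a channel counted as main in one mix is not also listed as a side channel is likewise an assumption.

```python
# Hypothetical weights for each mixed channel of the 3.1.2 group.
MIX_WEIGHTS = {
    "L3": {"L": 0.5, "Ls": 0.3, "Lb": 0.2},
    "R3": {"R": 0.5, "Rs": 0.3, "Rb": 0.2},
    "HFL3": {"HFL": 0.4, "HBL": 0.3, "Ls": 0.2, "Lb": 0.1},
    "HFR3": {"HFR": 0.4, "HBR": 0.3, "Rs": 0.2, "Rb": 0.1},
}


def classify_channels(mix_weights):
    """Return (main, side) channel sets using the largest-weight rule."""
    main, side = set(), set()
    for sources in mix_weights.values():
        if len(sources) < 2:
            continue  # the rule applies only to mixes of at least two channels
        top = max(sources, key=sources.get)
        main.add(top)
        side.update(ch for ch in sources if ch != top)
    # Assumption: a channel that is main for any mix is not treated as side.
    return main, side - main


main_set, side_set = classify_channels(MIX_WEIGHTS)
```

With these assumed weights, the result matches the main and side channels listed in the example above.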

Among the channels of the first channel group, the audio encoding apparatus 400 may encode and output only the audio signal of the at least one side channel, excluding the audio signals of the main channels. In this case, the audio encoding apparatus 400 may downsample the audio signal of the at least one side channel in order to increase transmission efficiency.

The audio encoding apparatus 400 according to an embodiment may extract a first audio sample group and a second audio sample group from the audio signal of at least one side channel. The audio encoding apparatus 400 may downsample the audio signal of the at least one side channel by obtaining a weighted sum of the first audio sample group and the second audio sample group.

As one example, the audio encoding apparatus 400 may downsample the audio signal of the at least one side channel by applying weight values that are uniform across the audio samples included in the first audio sample group and the second audio sample group and across the audio sample groups.

As another example, the audio encoding apparatus 400 may downsample the audio signal of the at least one side channel by applying weight values that differ for each audio sample included in the first audio sample group and the second audio sample group and for each audio sample group.
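
The two weighting options above can be sketched as follows. This is an illustration under stated assumptions: the first and second audio sample groups are taken to be the even-indexed and odd-indexed samples (this passage does not say how the groups are extracted), the function names are hypothetical, and the weight values are arbitrary.

```python
def downsample_pairs(signal, first_weights, second_weights):
    """Halve the sample count by a per-sample weighted sum of two groups.

    Assumption: the first group holds even-indexed samples and the second
    group holds odd-indexed samples. A trailing unpaired sample is dropped.
    """
    first_group = signal[0::2]
    second_group = signal[1::2]
    return [
        wa * a + wb * b
        for a, b, wa, wb in zip(first_group, second_group,
                                first_weights, second_weights)
    ]


def downsample_uniform(signal):
    """Uniform case: every sample of both groups gets the same weight (0.5)."""
    half = len(signal) // 2
    return downsample_pairs(signal, [0.5] * half, [0.5] * half)


side_signal = [1.0, 3.0, 5.0, 7.0]
uniform = downsample_uniform(side_signal)
# Non-uniform case: weights differ per sample and per group.
varied = downsample_pairs(side_signal, [0.75, 0.25], [0.25, 0.75])
```

In the uniform case each output sample is the average of a pair; in the non-uniform case the weights could instead be produced per sample by a model.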

The audio encoding apparatus 400 according to an embodiment may obtain downsampling-related information for the first audio sample group and the second audio sample group by using an artificial intelligence model. The audio encoding apparatus 400 may downsample the at least one third audio signal by applying the downsampling-related information to the first audio sample group and the second audio sample group. The artificial intelligence model used here may be a model trained to obtain downsampling-related information that minimizes the error between the original audio signals and the restored audio signals that are reconstructed based on the second audio signals and the downsampled at least one third audio signal.

For example, the audio encoding apparatus 400 may obtain, by using an artificial intelligence model, weight values to be applied to each of the first audio sample group and the second audio sample group. The artificial intelligence model used may be a model trained to obtain weight values that minimize the error between the first audio signal and the restored audio signals of all channels of the first channel group that are to be reconstructed based on the second audio signal and the side channel information.

In step S1103, the audio encoding apparatus 400 according to an embodiment may generate a bitstream including the second audio signals corresponding to the channels included in the second channel group and the downsampled at least one third audio signal. The audio encoding apparatus 400 may generate the bitstream by encoding the second audio signals and the downsampled at least one third audio signal.

The audio encoding apparatus 400 according to an embodiment may obtain audio signals of a base channel group and audio signals of a dependent channel group by mixing the second audio signal. The audio encoding apparatus 400 may obtain a first compressed signal by compressing the audio signals of the base channel group, a second compressed signal by compressing the audio signals of the dependent channel group, and a third compressed signal by compressing the side channel information. The audio encoding apparatus 400 may generate a bitstream including the first compressed signal, the second compressed signal, and the third compressed signal.

For example, the base channel group may include a left channel and a right channel that constitute a stereo channel. The dependent channel group may include, among the channels included in the second channel group, the channels other than the two channels corresponding to the base channel group.

In addition, the audio encoding apparatus 400 according to an embodiment may obtain additional information, which is information used at the decoding end to decode a multi-channel audio signal based on the audio signals of the base channels and the audio signals of the dependent channels.

For example, the audio encoding apparatus 400 may decode each of the first compressed signal, the second compressed signal, and the third compressed signal, and obtain a restored audio signal corresponding to the first channel group from the decoded signals. The audio encoding apparatus 400 may obtain the additional information by comparing the restored audio signal with the first audio signal. The audio encoding apparatus 400 may obtain, as the additional information, a scale factor that can be applied to the audio signal of a channel included in the restored audio signal so that the error between the restored audio signal and the first audio signal is minimized.
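
One way to obtain such a scale factor is a closed-form least-squares fit, sketched below. This assumes the error is measured as the sum of squared sample differences, which the disclosure does not state; the function name `least_squares_scale` is hypothetical. Under that assumption, the factor minimizing the error between `original` and `scale * restored` is the inner product of the two signals divided by the energy of the restored signal.

```python
def least_squares_scale(original, restored):
    """Scale factor g minimizing sum((o - g * r) ** 2) over all samples.

    Returns 1.0 for an all-zero restored signal (an arbitrary fallback).
    """
    energy = sum(r * r for r in restored)
    if energy == 0.0:
        return 1.0
    cross = sum(o * r for o, r in zip(original, restored))
    return cross / energy


# A restored channel that came out at half the original level.
scale = least_squares_scale([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

The decoder could then multiply the restored channel by this factor; a per-channel or per-frame factor would follow the same computation.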

The audio encoding apparatus 400 may generate a bitstream that further includes the additional information in addition to the first compressed signal, the second compressed signal, and the third compressed signal.

Hereinafter, a method by which the audio decoding apparatus 500 according to an embodiment reconstructs an audio signal from a received bitstream is described with reference to the flowchart of FIG. 12. Each step shown in FIG. 12 may be performed by at least one component included in the audio decoding apparatus 500 described above, and redundant descriptions are omitted.

FIG. 12 is a flowchart of an audio signal decoding method of an audio decoding apparatus according to an embodiment.

In step S1201, the audio decoding apparatus 500 according to an embodiment may obtain first audio signals corresponding to channels included in a first channel group and a downsampled second audio signal by decoding a bitstream. The audio decoding apparatus 500 may obtain the first audio signals corresponding to the first channel group and side channel information including the downsampled second audio signal.

The audio decoding apparatus 500 according to an embodiment may obtain audio signals of a base channel group and audio signals of dependent channels by decompressing the bitstream. The audio decoding apparatus 500 may obtain the first audio signals corresponding to the first channel group based on the additional information included in the bitstream, the audio signals of the base channel group, and the audio signals of the dependent channels.

According to an embodiment, the base channel group may include a left channel and a right channel that constitute a stereo channel, and the dependent channel group may include, among the channels included in the first channel group, the channels other than the two channels corresponding to the base channel group.

The audio decoding apparatus 500 according to an embodiment may obtain audio signals mixed into the first channel group by mixing the audio signals of the base channel group and the audio signals of the dependent channels. The audio decoding apparatus 500 may obtain the first audio signals corresponding to the first channel group by rendering the mixed audio signals based on the additional information.

According to an embodiment, the first audio signals corresponding to the first channel group may include a listener-front-centered multi-channel audio signal. For example, the first channel group may include 3.1.2 channels consisting of an L3 channel, a C channel, an R3 channel, an LFE channel, an HFL3 channel, and an HFR3 channel.

In step S1202, the audio decoding apparatus 500 according to an embodiment may obtain at least one second audio signal corresponding to at least one channel among the channels included in a second channel group by upsampling the downsampled second audio signal using an artificial intelligence model. The audio decoding apparatus 500 may obtain the audio signal of at least one side channel included in the second channel group by upsampling the side channel information. The first channel group may be a lower channel group including fewer channels than the second channel group.

The audio decoding apparatus 500 according to an embodiment may obtain the audio signal of the at least one side channel by upsampling, using an artificial intelligence model, the downsampled second audio signal included in the side channel information.

According to an embodiment, the artificial intelligence model is a model for reconstructing the audio signals of all channels of the second channel group from the first audio signals corresponding to the first channel group and the second audio signal of at least one side channel among the channels included in the second channel group, and may be a model trained to minimize the error between the reconstructed audio signal and the original audio signal.

The channels included in the second channel group may be divided, based on the degree of relevance of each of those channels to the first channel group, into channels of a first subgroup having a high degree of relevance to the first channel group and channels of a second subgroup having a low degree of relevance to the first channel group. The channels of the first subgroup may be referred to as main channels, and the channels of the second subgroup may be referred to as side channels.

The audio decoding apparatus 500 according to an embodiment may obtain the audio signal of at least one main channel included in the second channel group from the first audio signals corresponding to the first channel group and the audio signal of the at least one side channel, according to a channel group conversion rule between the first channel group and the second channel group.
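
If the conversion rule is a weighted sum such as L3 = w_L * L + w_Ls * Ls + w_Lb * Lb, a main channel can be recovered by subtracting the weighted side channels from the downmixed channel and dividing by the main channel's weight. The sketch below illustrates this under that assumption; the function name `recover_main_channel` and all weight and sample values are hypothetical.

```python
def recover_main_channel(mixed, side_signals, weights, main_name):
    """Invert a weighted-sum downmix for the single main channel.

    mixed:        samples of the downmixed channel (e.g., L3)
    side_signals: dict mapping side channel name -> samples (e.g., Ls, Lb)
    weights:      dict mapping every source channel name -> downmix weight
    main_name:    name of the main channel to recover (e.g., "L")
    """
    residual = list(mixed)
    for name, samples in side_signals.items():
        w = weights[name]
        residual = [r - w * x for r, x in zip(residual, samples)]
    return [r / weights[main_name] for r in residual]


# Hypothetical weights; L3 below equals 0.5*L + 0.25*Ls + 0.25*Lb for L = [4, 8].
weights_l3 = {"L": 0.5, "Ls": 0.25, "Lb": 0.25}
l3_samples = [3.0, 5.0]
upsampled_sides = {"Ls": [4.0, 0.0], "Lb": [0.0, 4.0]}
recovered_l = recover_main_channel(l3_samples, upsampled_sides, weights_l3, "L")
```

In practice the side channels are only approximations after downsampling and upsampling, so the recovered main channel carries their error, which is one reason a subsequent modification stage is useful.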

The audio decoding apparatus 500 according to an embodiment may modify the audio signal of the at least one side channel and the audio signal of the at least one main channel by using an artificial intelligence model. The audio decoding apparatus 500 may obtain third audio signals corresponding to the second channel group based on the modified audio signal of the side channel and the modified audio signal of the main channel.

According to an embodiment, the third audio signals corresponding to the second channel group may include a listener-centered multi-channel audio signal. For example, the second channel group may include 7.1.4 channels consisting of an L (Left) channel, a C (Center) channel, an R (Right) channel, an Ls (Side Left) channel, an Rs (Side Right) channel, an Lb (Back Left) channel, an Rb (Back Right) channel, an LFE (Sub-woofer) channel, an HFL (Height Front Left) channel, an HFR (Height Front Right) channel, an HBL (Height Back Left) channel, and an HBR (Height Back Right) channel. In this case, the audio signal of the at least one side channel may include the audio signals of the Ls channel, Rs channel, Lb channel, Rb channel, HBL channel, and HBR channel of the second channel group.

The audio decoding apparatus 500 according to an embodiment may obtain fourth audio signals corresponding to the main channels from the first audio signals and the second audio signals corresponding to the side channels, according to a conversion rule from the channels included in the second channel group to the channels included in the first channel group. The audio decoding apparatus 500 may modify the second audio signals and the fourth audio signals by using an artificial intelligence model. The audio decoding apparatus 500 may obtain, from the modified second audio signals and the modified fourth audio signals, third audio signals corresponding to all channels included in the second channel group.

For example, the fourth audio signals may be modified through first layers in the artificial intelligence model, and the second audio signals may be modified through second layers in the artificial intelligence model. The modified fourth audio signals may be obtained by inputting the first audio signals, the second audio signals, and the fourth audio signals to the first layers. The modified second audio signals may be obtained by inputting the first audio signals, the second audio signals, the modified fourth audio signals, and the values output from the first layers to the second layers.
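
The data flow between the two groups of layers can be sketched as plain function wiring. This shows only the order of inputs and outputs described above, not an actual model: `first_layers` and `second_layers` are hypothetical callables standing in for trained layers, and the assumption that the first layers return both the modified signals and a separate set of output values is made for illustration.

```python
def refine_signals(first, second, fourth, first_layers, second_layers):
    """Wire the two-stage modification described in the text.

    first:  first audio signals (lower channel group)
    second: second audio signals (side channels)
    fourth: fourth audio signals (main channels)
    first_layers(first, second, fourth) -> (modified_fourth, layer_outputs)
    second_layers(first, second, modified_fourth, layer_outputs) -> modified_second
    """
    modified_fourth, layer_outputs = first_layers(first, second, fourth)
    modified_second = second_layers(first, second, modified_fourth, layer_outputs)
    return modified_second, modified_fourth


# Stand-in callables (not trained layers), used only to exercise the wiring.
def stub_first_layers(first, second, fourth):
    return [x + 1.0 for x in fourth], {"num_inputs": 3}


def stub_second_layers(first, second, modified_fourth, layer_outputs):
    return [x * 2.0 for x in second]


mod_second, mod_fourth = refine_signals(
    [0.0], [1.0, 2.0], [3.0], stub_first_layers, stub_second_layers
)
```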

In step S1203, the audio decoding apparatus 500 according to an embodiment may reconstruct the third audio signals corresponding to the second channel group based on the first audio signals corresponding to the first channel group and the second audio signal of the at least one side channel.

As described above, when converting a channel group so that the sound image is formed around the screen, the audio encoding apparatus 400 according to an embodiment may determine, as a side channel, a channel among the channels of the second channel group that is not used or whose related information is used the least. However, the present disclosure is not limited to the above example. The audio encoding apparatus 400 according to an embodiment may further include a sound image characteristic analysis module and may change the type and number of side channels according to changes over time in the sound image characteristics of the input audio signal.

FIG. 13 illustrates an example of conversion between channel groups performed based on sound image characteristic analysis in an audio processing system according to an embodiment.

The audio encoding apparatus 400 according to an embodiment may receive an audio signal corresponding to a first channel group as an original audio signal. For example, as shown in FIG. 13, the audio encoding apparatus 400 may receive, as the original audio signal, an audio signal of a 7.1.4 channel group consisting of the Ls channel, Lb channel, HBL channel, L channel, HFL channel, C channel, LFE channel, HFR channel, R channel, HBR channel, Rb channel, and Rs channel.

The audio encoding apparatus 400 according to an embodiment may analyze the sound image characteristics of the input original audio signal. Based on the sound image characteristics, the audio encoding apparatus 400 may convert the first channel group of the original audio signal into a second channel group in which the sound image is formed around the screen of a display device. For example, the audio encoding apparatus 400 may convert the original audio signal of the 7.1.4 channel group into an audio signal (O_tv) of the 3.1.2 channel group. The audio encoding apparatus 400 may include the audio signal converted into the second channel group in a bitstream and transmit it to the audio decoding apparatus 500.

The audio encoding apparatus 400 may determine at least one side channel among the channels of the first channel group based on the sound image characteristics.

The audio encoding apparatus 400 according to an embodiment may analyze the sound source characteristics of the original audio signal for each time unit by using an artificial intelligence model. For example, the audio encoding apparatus 400 may analyze whether the original audio signal is a signal including conversation among a plurality of speakers or a signal including speech uttered by a single speaker. Alternatively, the audio encoding apparatus 400 may analyze the distribution of sound sources in the horizontal direction or in the vertical direction around the listener.

Based on the analysis result, the audio encoding apparatus 400 may determine at least one side channel that can most improve the audio restoration performance at the decoding stage. The audio encoding apparatus 400 may downsample the audio signal of the at least one side channel along the time axis and transmit it to the audio decoding apparatus 500.

When the sound image characteristics of the original audio signal vary with time, the type and number of channels that the audio encoding apparatus 400 determines as side channels among the channels of the first channel group may vary accordingly.

The audio encoding apparatus 400 may further include, in the bitstream, a signal (M_adv) obtained by downsampling the N channels determined as side channels among the channels of the first channel group by a factor of 1/s, and may transmit the bitstream to the audio decoding apparatus 500 (where N is an integer greater than 1 and s is a rational number greater than 1).

The audio decoding apparatus 500 may upsample the downsampled and transmitted audio signal (M_adv) of the at least one side channel to restore the audio signal of the at least one side channel.
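
For illustration only, the downsampling by a factor of 1/s at the encoder and the corresponding upsampling at the decoder may be sketched as below. This sketch substitutes a fixed rule (block averaging and linear interpolation) for the artificial intelligence model and assumes that s is an integer; the function names `downsample` and `upsample` are hypothetical.

```python
import numpy as np

def downsample(signal: np.ndarray, s: int) -> np.ndarray:
    """Reduce the sample count by a factor of s by averaging each block of s samples."""
    usable = len(signal) - len(signal) % s  # drop a trailing partial block
    return signal[:usable].reshape(-1, s).mean(axis=1)

def upsample(m_adv: np.ndarray, s: int) -> np.ndarray:
    """Return s samples per input sample by linear interpolation between block centres."""
    n_out = len(m_adv) * s
    centres = np.arange(len(m_adv)) * s + (s - 1) / 2.0
    # np.interp holds the first/last value constant outside the centre range.
    return np.interp(np.arange(n_out), centres, m_adv)

# Usage: a 200 Hz tone sampled at 48 kHz, sent at one quarter of the sample count.
t = np.arange(480) / 48000.0
side = np.sin(2 * np.pi * 200.0 * t)
m_adv = downsample(side, 4)    # encoder side
restored = upsample(m_adv, 4)  # decoder side
```

A low-frequency signal such as this one survives the round trip with a small error; content above the reduced Nyquist rate would not, which is one reason a trained model or a content-dependent rule may be preferable to a fixed one.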

Using the restored audio signal of the at least one side channel, the audio decoding apparatus 500 may restore, from the audio signal (O_tv) of the second channel group, the audio signal of the first channel group in which the sound image is formed around the listener. In doing so, the audio decoding apparatus 500 may restore the audio signal of the first channel group by reflecting the sound image characteristics.

FIG. 14 illustrates an example in which an audio processing system according to an embodiment downsamples an audio signal of a side channel based on characteristics of the channel.

When downsampling the audio signal of the at least one side channel, the audio encoding apparatus 400 according to an embodiment may perform the downsampling based on a predetermined rule. Performing the downsampling based on a predetermined rule may be advantageous for real-time processing without delay.

According to another embodiment, as shown in FIG. 14, the audio encoding apparatus 400 may extract a characteristic of the side channel (e.g., sparsity) and downsample the audio signal of the side channel based on that characteristic.
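
One way to picture characteristic-based downsampling is to let the measured sparsity of a side-channel frame select the downsampling factor. The sketch below is an assumption for illustration: the sparsity measure, the thresholds, and the candidate factors are hypothetical and are not specified in the present disclosure.

```python
import numpy as np

def sparsity(frame: np.ndarray, threshold: float = 1e-3) -> float:
    """Fraction of samples whose magnitude is below threshold (0 = dense, 1 = silent)."""
    return float(np.mean(np.abs(frame) < threshold))

def choose_factor(frame: np.ndarray, factors=(2, 4, 8)) -> int:
    """Pick a larger downsampling factor for sparser frames (hypothetical thresholds)."""
    sp = sparsity(frame)
    if sp > 0.9:
        return factors[2]
    if sp > 0.5:
        return factors[1]
    return factors[0]

# Usage: a silent frame, a dense frame, and a frame that is 60% silent.
silent = np.zeros(100)
dense = np.ones(100)
mixed = np.concatenate([np.zeros(60), np.ones(40)])
chosen = [choose_factor(f) for f in (silent, dense, mixed)]
```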

As described above, the audio encoding apparatus 400 according to an embodiment may convert a first audio signal including a listener-centered multi-channel audio signal into a second audio signal including a screen-centered multi-channel audio signal and transmit the second audio signal. The audio decoding apparatus 500 may restore the screen-centered multi-channel audio signal or the listener-centered multi-channel audio signal according to a change in the content consumption environment. However, the present disclosure is not limited to an embodiment in which the original input audio signal includes a listener-centered multi-channel audio signal and the transmitted bitstream includes a screen-centered multi-channel audio signal.

FIG. 15 illustrates embodiments to which an audio processing method according to an embodiment may be applied.

As an example, when the input audio forms a screen-centered sound image, the audio encoding apparatus 400 may obtain, from the input audio, an intermediate result audio signal whose sound image has been converted to be centered on the listening environment (e.g., a listening environment using ordinary two-channel speakers, or a listening environment using two-channel earphones), and may extract side channel information that is least correlated with the intermediate result audio signal. The audio encoding apparatus 400 may compress and transmit the intermediate result audio signal and the side channel information. The audio decoding apparatus 500 may restore the audio signals forming the screen-centered sound image from the intermediate result audio signal and the side channel information.

Referring to FIG. 15, the audio encoding apparatus 400 may convert an audio signal 1501 forming a screen-centered sound image into an audio signal 1502 forming a listening-environment-centered sound image and transmit the audio signal 1502. The audio encoding apparatus 400 may also transmit an audio signal of a side channel that has little correlation with the audio signal 1502. The audio decoding apparatus 500 may restore, from the transmitted bitstream, the audio signal 1502 forming the listening-environment-centered sound image. In addition, by using the audio signal of the side channel, the audio decoding apparatus 500 may restore, from the audio signal 1502, an audio signal 1503 forming a screen-centered sound image.
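
As a non-limiting sketch, choosing the side channel that is least correlated with the transmitted signal may be pictured as scoring each candidate channel by its strongest absolute Pearson correlation with any transmitted channel and keeping the lowest-scoring candidates. The scoring rule and the name `least_correlated` are assumptions for illustration and do not represent the selection method of the present disclosure.

```python
import numpy as np

def least_correlated(candidates: dict, transmitted: np.ndarray, n_side: int) -> list:
    """Return the n_side candidate names least correlated with the transmitted channels.

    candidates maps a channel name to a 1-D signal; transmitted has shape
    (channels, samples). Signals must not be constant, or the correlation is undefined.
    """
    scores = {}
    for name, sig in candidates.items():
        corrs = [abs(np.corrcoef(sig, ch)[0, 1]) for ch in transmitted]
        scores[name] = max(corrs)
    return sorted(scores, key=scores.get)[:n_side]

# Usage: two transmitted channels and three candidate channels.
t = np.arange(1000) / 1000.0
left = np.sin(2 * np.pi * 5 * t)
right = np.sin(2 * np.pi * 7 * t)
other = np.sin(2 * np.pi * 11 * t)  # orthogonal to both transmitted channels
candidates = {
    "a": left + 0.1 * other,   # almost a copy of the left channel
    "b": other,                # carries content absent from the transmitted signal
    "c": 0.9 * right + 0.1 * other,
}
side = least_correlated(candidates, np.stack([left, right]), 1)
```

Here channel "b" is selected, because it carries information that cannot be inferred from the transmitted channels and is therefore the most useful to send separately.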

As another example, when the input audio forms a listener-centered sound image, the audio encoding apparatus 400 may obtain an intermediate result audio signal suited to the environment in which the user listens through a device, and may extract side channel information that has little correlation with the intermediate result audio signal. The audio encoding apparatus 400 may compress and transmit the intermediate result audio signal and the side channel information. The audio decoding apparatus 500 may restore the audio signals forming the screen-centered sound image from the intermediate result audio signal and the side channel information.

Referring to FIG. 15, the audio encoding apparatus 400 may convert an audio signal 1511 forming a listener-centered sound image into an audio signal 1512 forming a listening-environment-centered sound image and transmit the audio signal 1512. For example, the audio signal 1512 may include audio signals of a two-channel layout. The audio signal 1512 may be an intermediate result audio signal in which acoustic effects such as channel conversion are emphasized while the listener-centered sound image is maintained. The audio encoding apparatus 400 may also transmit an audio signal of a side channel that has little correlation with the audio signal 1512. The audio decoding apparatus 500 may restore, from the transmitted bitstream, the audio signal 1512 forming the listening-environment-centered sound image. In addition, by using the audio signal of the side channel, the audio decoding apparatus 500 may restore, from the audio signal 1512, an audio signal 1513 forming a screen-centered sound image.

A machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory storage medium' means only that the medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); the term does not distinguish between a case where data is stored semi-permanently in the storage medium and a case where data is stored temporarily. For example, a 'non-transitory storage medium' may include a buffer in which data is temporarily stored.

According to an embodiment, the methods according to the various embodiments disclosed in this document may be provided as part of a computer program product. A computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be at least temporarily stored in, or temporarily created in, a machine-readable storage medium such as the memory of a manufacturer's server, a server of an application store, or a relay server.

Claims

An audio processing apparatus comprising:
a memory storing one or more instructions; and
a processor configured to, by executing the one or more instructions stored in the memory:
obtain second audio signals corresponding to channels included in a second channel group from first audio signals corresponding to channels included in a first channel group,
downsample, by using an artificial intelligence model, at least one third audio signal corresponding to at least one channel identified, from among the channels included in the first channel group, based on a degree of relevance to the second channel group, and
generate a bitstream including the second audio signals corresponding to the channels included in the second channel group and the downsampled at least one third audio signal,
wherein the first channel group includes a channel group of an original audio signal, and the second channel group is configured by combining at least two channels among the channels included in the first channel group.

The audio processing apparatus of claim 1,
wherein the processor is configured to
identify, from among the channels included in the first channel group, the at least one channel whose degree of relevance to the second channel group is lower than a predetermined value.

The audio processing apparatus of claim 1,
wherein the processor is configured to:
assign weight values to the channels included in the first channel group, based on a degree of relevance of each of the channels included in the first channel group to the second channel group,
obtain the second audio signals from the first audio signals by computing a weighted sum of at least two first audio signals among the first audio signals, based on the weight values assigned to the channels included in the first channel group, and
identify the at least one channel from among the channels included in the first channel group, based on the weight values assigned to the channels included in the first channel group.

The audio processing apparatus of claim 3,
wherein the channels included in the first channel group are divided into channels of a first subgroup and channels of a second subgroup, and
wherein the processor is configured to:
obtain an audio signal corresponding to one channel among the channels included in the second channel group by summing audio signals corresponding to at least two channels among the channels included in the first channel group, based on weight values assigned to the at least two channels, and
identify, among the at least two channels, the channel with the maximum assigned weight value as a channel of the first subgroup, and identify the remaining channel among the at least two channels as a channel of the second subgroup.

The audio processing apparatus of claim 1,
wherein the processor is configured to:
extract a first audio sample group and a second audio sample group from the at least one third audio signal,
obtain downsampling-related information for the first audio sample group and the second audio sample group by using the artificial intelligence model, and
downsample the at least one third audio signal by applying the downsampling-related information to the first audio sample group and the second audio sample group.

The audio processing apparatus of claim 5,
wherein the artificial intelligence model is
an artificial intelligence model trained to obtain the downsampling-related information that minimizes an error between the first audio signals and restored audio signals that are restored based on the second audio signals and the downsampled at least one third audio signal.

The audio processing apparatus of claim 1,
wherein the processor is configured to:
obtain audio signals of a basic channel group and audio signals of a dependent channel group from the second audio signals corresponding to the channels included in the second channel group,
obtain a first compressed signal by compressing the audio signals of the basic channel group,
obtain a second compressed signal by compressing the audio signals of the dependent channel group,
obtain a third compressed signal by compressing the downsampled at least one third audio signal, and
generate the bitstream including the first compressed signal, the second compressed signal, and the third compressed signal.

The audio processing apparatus of claim 7,
wherein the basic channel group includes two channels for stereo reproduction, and
wherein the dependent channel group includes, among the channels included in the second channel group, channels other than two channels having a high degree of relevance to the two channels for stereo reproduction.

The audio processing apparatus of claim 1,
wherein the first audio signals corresponding to the channels included in the first channel group include a listener-centered multi-channel audio signal, and
wherein the second audio signals corresponding to the channels included in the second channel group include a listener-front-centered multi-channel audio signal.

The audio processing apparatus of claim 1,
wherein the first channel group includes 7.1.4 channels consisting of a front left channel, a front right channel, a center channel, a left channel, a right channel, a rear left channel, a rear right channel, a subwoofer channel, a front upper left channel, a front upper right channel, a rear upper left channel, and a rear upper right channel,
wherein the second channel group includes 3.1.2 channels consisting of a front left channel, a front right channel, a center channel, a subwoofer channel, a front upper left channel, and a front upper right channel, and
wherein the processor is configured to
identify, as channels of a second subgroup, the left channel, the right channel, the rear left channel, the rear right channel, the rear upper left channel, and the rear upper right channel, which have a low degree of relevance to the second channel group among the channels included in the first channel group.

An audio processing apparatus comprising:
a memory storing one or more instructions; and
a processor configured to, by executing the one or more instructions stored in the memory:
obtain, from a bitstream, first audio signals corresponding to channels included in a first channel group and a downsampled second audio signal,
obtain at least one second audio signal corresponding to at least one channel among channels included in a second channel group by upsampling the downsampled second audio signal using an artificial intelligence model, and
restore third audio signals corresponding to the channels included in the second channel group from the first audio signals and the at least one second audio signal,
wherein the first channel group includes fewer channels than the second channel group.

The audio processing apparatus of claim 11,
wherein the channels included in the second channel group are divided into channels of a first subgroup and channels of a second subgroup, based on a degree of relevance of each of the channels included in the second channel group to the first channel group, and
wherein the processor is configured to
obtain second audio signals corresponding to the channels of the second subgroup.

The audio processing apparatus of claim 12,
wherein the processor is configured to:
obtain fourth audio signals corresponding to the channels of the first subgroup from the first audio signals and the second audio signals corresponding to the channels of the second subgroup, according to a conversion rule from the channels included in the second channel group to the channels included in the first channel group,
refine the second audio signals and the fourth audio signals by using the artificial intelligence model, and
obtain the third audio signals from the refined second audio signals and the refined fourth audio signals.

The audio processing apparatus of claim 13,
wherein the fourth audio signals are refined through first layers in the artificial intelligence model, and the second audio signals are refined through second layers in the artificial intelligence model,
wherein the refined fourth audio signals are obtained by inputting the first audio signals, the second audio signals, and the fourth audio signals to the first layers, and
wherein the refined second audio signals are obtained by inputting the first audio signals, the second audio signals, the refined fourth audio signals, and values output from the first layers to the second layers.

The audio processing apparatus of claim 11,
wherein the processor is configured to:
obtain audio signals corresponding to channels included in a basic channel group and audio signals corresponding to channels included in a dependent channel group by decompressing the bitstream, and
obtain the first audio signals based on additional information obtained from the bitstream, the audio signals corresponding to the channels included in the basic channel group, and the audio signals corresponding to the channels included in the dependent channel group.

The audio processing apparatus of claim 15,
wherein the basic channel group includes two channels for stereo reproduction,
wherein the dependent channel group includes, among the channels included in the first channel group, channels other than two channels having a high degree of relevance to the two channels for stereo reproduction, and
wherein the processor is configured to
obtain mixed audio signals corresponding to the channels included in the first channel group by mixing the audio signals corresponding to the channels included in the basic channel group and the audio signals corresponding to the channels included in the dependent channel group, and obtain the first audio signals by rendering the mixed audio signals based on the additional information.

The audio processing apparatus of claim 11,
wherein the first audio signals corresponding to the channels included in the first channel group include a listener-front-centered multi-channel audio signal, and
wherein the third audio signals corresponding to the channels included in the second channel group include a listener-centered multi-channel audio signal.

The audio processing apparatus of claim 12,
wherein the first channel group includes 3.1.2 channels consisting of a front left channel, a front right channel, a center channel, a subwoofer channel, a front upper left channel, and a front upper right channel,
wherein the second channel group includes 7.1.4 channels consisting of a front left channel, a front right channel, a center channel, a left channel, a right channel, a rear left channel, a rear right channel, a subwoofer channel, a front upper left channel, a front upper right channel, a rear upper left channel, and a rear upper right channel, and
wherein the channels of the second subgroup
include the left channel, the right channel, the rear left channel, the rear right channel, the rear upper left channel, and the rear upper right channel, which have a low degree of relevance to the first channel group among the channels included in the second channel group.

An audio processing method comprising:
obtaining second audio signals corresponding to channels included in a second channel group from first audio signals corresponding to channels included in a first channel group;
downsampling, by using an artificial intelligence model, at least one third audio signal corresponding to at least one channel identified, from among the channels included in the first channel group, based on a degree of relevance to the second channel group; and
generating a bitstream including the second audio signals corresponding to the channels included in the second channel group and the downsampled at least one third audio signal,
wherein the first channel group includes a channel group of an original audio signal, and the second channel group is configured by combining at least two channels among the channels included in the first channel group.

An audio processing method comprising:
obtaining, from a bitstream, first audio signals corresponding to channels included in a first channel group and a downsampled second audio signal;
obtaining at least one second audio signal corresponding to at least one channel among channels included in a second channel group by upsampling the downsampled second audio signal using an artificial intelligence model; and
restoring third audio signals corresponding to the channels included in the second channel group from the first audio signals and the at least one second audio signal,
wherein the first channel group includes fewer channels than the second channel group.