KR20220107913A

KR20220107913A - Apparatus and method of processing multi-channel audio signal

Info

Publication number: KR20220107913A
Application number: KR1020210140579A
Authority: KR
Inventors: 이태미; 고상철; 김경래; 김선민; 김정규; 남우현; 손윤재; 정현권; 황성희
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2021-01-25
Filing date: 2021-10-20
Publication date: 2022-08-02

Abstract

According to one embodiment of the present invention, an audio processing apparatus is disclosed. The apparatus includes at least one processor that executes one or more instructions. The at least one processor obtains, from a bitstream, a second audio signal downmixed from at least one first audio signal; obtains, from the bitstream, information related to error removal for the first audio signal; and reconstructs the first audio signal by applying the error-removal information to the first audio signal demixed from the second audio signal. The error-removal information may be generated using at least one of the original signal power of the first audio signal and the signal power of the first audio signal after decoding. The present invention can encode an audio signal of a three-dimensional audio channel layout that surrounds the listener in all directions.

Description

Multi-channel audio signal processing apparatus and method {APPARATUS AND METHOD OF PROCESSING MULTI-CHANNEL AUDIO SIGNAL}

The present disclosure relates to the field of processing multi-channel audio signals. More specifically, it relates to processing, from a multi-channel audio signal, an audio signal of a three-dimensional audio channel layout in front of a listener.

Audio signals are typically two-dimensional, such as 2-channel, 5.1-channel, 7.1-channel, and 9.1-channel audio signals.

However, because a two-dimensional audio signal carries ambiguous audio information in the height direction, there is a need to generate a three-dimensional audio signal (an n-channel or multi-channel audio signal, where n is an integer greater than 2) that provides a spatial, three-dimensional impression of sound.

In a conventional channel layout for a three-dimensional audio signal, channels are arranged omnidirectionally around the listener. However, as over-the-top (OTT) services expand, TV resolutions increase, and the screens of electronic devices such as tablets grow larger, viewers increasingly want to experience immersive sound, such as theatrical content, at home. Therefore, considering the sound-image representation of on-screen objects (sound sources), there is a need to process audio signals of a three-dimensional audio channel layout in which the channels are arranged in front of the listener (a listener-front 3D audio channel layout).

In addition, conventional three-dimensional audio signal processing systems encode and decode an independent audio signal for each independent channel of the three-dimensional audio signal. In particular, to reconstruct a two-dimensional audio signal such as a conventional stereo signal, such a system must first reconstruct the full three-dimensional audio signal and then downmix it.

The technical objective of one embodiment is to process a multi-channel audio signal that supports a three-dimensional audio channel layout in front of a listener.

An audio processing method according to an embodiment includes: generating a second audio signal by downmixing at least one first audio signal; generating information related to error removal for the first audio signal by using at least one of the original signal power of the first audio signal and the signal power of the first audio signal after decoding; and transmitting the error-removal information for the first audio signal together with the downmixed second audio signal.

The error-removal information includes information about a factor for error removal. Generating the error-removal information for the first audio signal may include, when the original signal power of the first audio signal is less than or equal to a predetermined first value, generating factor information indicating that the value of the error-removal factor is 0.

The error-removal information includes information about a factor for error removal. Generating the error-removal information for the first audio signal may include, when the ratio of the original signal power of the first audio signal to the original signal power of the second audio signal is less than a predetermined second value, generating the factor information based on the original signal power of the first audio signal and the signal power of the first audio signal after decoding.

Generating the factor information may include generating factor information indicating that the value of the error-removal factor is the ratio of the original signal power of the first audio signal to the signal power of the first audio signal after decoding.

Generating the factor information may include, when the ratio of the original signal power of the first audio signal to the signal power of the first audio signal after decoding is greater than 1, generating factor information indicating that the value of the error-removal factor is 1.

The error-removal information includes information about a factor for error removal. Generating the error-removal information for the first audio signal may include, when the ratio of the original signal power of the first audio signal to the original signal power of the second audio signal is greater than or equal to the predetermined second value, generating factor information indicating that the value of the error-removal factor is 1.
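The factor rules above can be collected into a single decision function. The following Python sketch is an illustration only, assuming that "signal power" is a non-negative scalar per frame and that the two predetermined thresholds (called `first_value` and `second_value` here, with hypothetical default values) are chosen by the encoder; the text does not give their values.

```python
def error_removal_factor(orig_first_power: float,
                         decoded_first_power: float,
                         orig_second_power: float,
                         first_value: float = 1e-6,
                         second_value: float = 0.5) -> float:
    """Encoder-side error-removal factor in [0, 1].

    first_value and second_value are hypothetical stand-ins for the
    "predetermined" thresholds, which the text does not specify.
    """
    # Original first signal is (near-)silent: suppress it completely.
    if orig_first_power <= first_value:
        return 0.0
    # First signal is strong relative to the downmix (or the downmix is
    # silent, so the ratio is unbounded): leave it unchanged.
    if orig_second_power <= 0.0 or orig_first_power / orig_second_power >= second_value:
        return 1.0
    # Otherwise use the original-to-decoded power ratio, capped at 1.
    if decoded_first_power <= 0.0:
        return 1.0  # an unbounded ratio is treated as "greater than 1"
    return min(orig_first_power / decoded_first_power, 1.0)
```

The checks follow the order of the paragraphs above: the silence test first, then the strength-ratio test, and only then the original-to-decoded ratio with its cap at 1, which keeps the factor within the range [0, 1].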

The error-removal information may be generated for each frame of the second audio signal.
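Because the error-removal information is generated per frame, the encoder needs a per-frame power measure for the original first signal, the decoded first signal, and the second signal. A minimal Python sketch, assuming mean-square energy as the power measure and non-overlapping frames of a hypothetical length `frame_len` (the text specifies neither):

```python
from typing import List, Sequence


def frame_powers(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square power of each non-overlapping frame.

    A trailing partial frame is kept and averaged over its own length.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    powers: List[float] = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / len(frame))
    return powers
```

Computing the three power sequences with the same framing keeps them aligned, so one factor can be derived and signalled for each frame index.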

The downmixed second audio signal includes an audio signal of a base channel group and an audio signal of a dependent channel group. The dependent channel group audio signal includes a first dependent channel audio signal containing the audio signal of an independent channel included in the three-dimensional audio channels in front of the listener, and the audio signal of the three-dimensional audio channels at the sides and rear of the listener may be obtained by mixing the audio signal of the first dependent channel.

The base channel group audio signal includes a first-channel audio signal and a second-channel audio signal. The first-channel audio signal may be generated by mixing the audio signal of the left stereo channel with the decoded audio signal of the center channel in front of the listener, and the second-channel audio signal may be generated by mixing the audio signal of the right stereo channel with the decoded audio signal of the center channel in front of the listener.
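The mixing above can be sketched as a pair of encoder and decoder functions. One plausible reason for mixing the *decoded* center channel rather than the original is that the decoder holds exactly that signal, so subtracting it recovers the left and right channels without leaving the center's coding error behind. This is a minimal Python sketch; the center gain `CENTER_GAIN` (about -3 dB) and the function names are hypothetical, since the text gives no mixing coefficients.

```python
from typing import List, Sequence, Tuple

# Hypothetical center-mix weight (about -3 dB); the text gives no value.
CENTER_GAIN = 0.7071


def mix_base_group(left: Sequence[float], right: Sequence[float],
                   decoded_center: Sequence[float],
                   gain: float = CENTER_GAIN) -> Tuple[List[float], List[float]]:
    """Encoder: fold the decoded (compressed-then-decompressed) center into L and R."""
    first = [l + gain * c for l, c in zip(left, decoded_center)]
    second = [r + gain * c for r, c in zip(right, decoded_center)]
    return first, second


def demix_base_group(first: Sequence[float], second: Sequence[float],
                     decoded_center: Sequence[float],
                     gain: float = CENTER_GAIN) -> Tuple[List[float], List[float]]:
    """Decoder: subtract the same decoded center to recover L and R."""
    left = [x - gain * c for x, c in zip(first, decoded_center)]
    right = [x - gain * c for x, c in zip(second, decoded_center)]
    return left, right
```

A stereo-only decoder can play the two mixed channels directly, which is how the base channel group stays backward compatible.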

The downmixed second audio signal includes an audio signal of a base channel group and an audio signal of a dependent channel group. Transmitting the error-removal information for the first audio signal and the downmixed second audio signal includes: generating a bitstream that includes the error-removal information and information about the downmixed second audio signal; and transmitting the bitstream. The bitstream is a file stream of a plurality of audio tracks, and generating the bitstream includes: generating an audio stream of a first audio track that includes the compressed audio signal of the base channel group; and generating an audio stream of a second audio track that includes dependent channel audio signal identification information, the second audio track being adjacent to the first audio track. When an audio signal of a dependent channel corresponding to the base channel group audio signal exists, dependent channel audio signal identification information indicating that it exists is generated. When the identification information indicates that a dependent channel audio signal exists, the audio stream of the second audio track includes the compressed audio signal of the dependent channel group; when the identification information indicates that no dependent channel audio signal exists, the audio stream of the second audio track may include the audio signal of the next track of the base channel group.

The downmixed second audio signal includes an audio signal of a base channel group and an audio signal of a dependent channel group, and the base channel audio signal includes a stereo channel audio signal. Transmitting the error-removal information for the first audio signal and the downmixed second audio signal includes: generating a bitstream that includes the error-removal information and information about the downmixed second audio signal; and transmitting the bitstream.

Generating the bitstream includes: generating a base channel audio stream that includes the compressed stereo channel audio signal; and generating a plurality of dependent channel audio streams that include the audio signals of a plurality of dependent channel groups, the plurality including a first dependent channel audio stream and a second dependent channel audio stream. Let $S_{n-1}$, $W_{n-1}$, and $H_{n-1}$ be the numbers of surround, subwoofer, and height channels of the multi-channel audio signal used to generate the base channel audio stream and the first dependent channel audio stream, and let $S_n$, $W_n$, and $H_n$ be the corresponding numbers for the multi-channel audio signal used to generate the base channel audio stream, the first dependent channel audio stream, and the second dependent channel audio stream. The layouts may then be restricted such that $S_{n-1} \le S_n$, $W_{n-1} \le W_n$, and $H_{n-1} \le H_n$, but $(S_{n-1}, W_{n-1}, H_{n-1}) \ne (S_n, W_n, H_n)$.
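The restriction above says that each additional dependent stream must extend the layout without removing channels, and must add at least one. A minimal Python sketch of that check; the `Layout` type and function names are illustrative, and treating plain stereo as 2.0.0 in the usage example is an assumption:

```python
from typing import NamedTuple, Sequence


class Layout(NamedTuple):
    surround: int   # S: number of surround channels
    subwoofer: int  # W: number of subwoofer channels
    height: int     # H: number of height channels


def is_valid_extension(prev: Layout, nxt: Layout) -> bool:
    """S, W, H must not decrease, and the two layouts must not be identical."""
    non_decreasing = (prev.surround <= nxt.surround
                      and prev.subwoofer <= nxt.subwoofer
                      and prev.height <= nxt.height)
    return non_decreasing and prev != nxt


def is_valid_chain(layouts: Sequence[Layout]) -> bool:
    """Check every consecutive step of a scalable layout chain."""
    return all(is_valid_extension(a, b) for a, b in zip(layouts, layouts[1:]))
```

For example, the chain 2.0.0 → 3.1.2 → 5.1.2 → 7.1.4 passes, while 5.1.2 → 5.1.2 (no new channel) and 5.1.2 → 7.1.0 (height channels dropped) fail.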

The audio processing method may further include generating an audio object signal of the three-dimensional audio channels in front of the listener, indicating at least one of the audio signal, position, and direction of an audio object (sound source). Transmitting the error-removal information for the first audio signal and the downmixed second audio signal may include: generating a bitstream that includes the error-removal information, the audio object signal of the three-dimensional audio channels in front of the listener, and information about the downmixed second audio signal; and transmitting the bitstream.

An audio processing method according to another embodiment includes: obtaining, from a bitstream, a second audio signal downmixed from at least one first audio signal; obtaining, from the bitstream, information related to error removal for the first audio signal; demixing the first audio signal from the downmixed second audio signal; and reconstructing the first audio signal by applying the error-removal information to the demixed first audio signal. The error-removal information may be generated using at least one of the original signal power of the first audio signal and the signal power of the first audio signal after decoding.

The error-removal information may include information about a factor for error removal, and the factor may have a value greater than or equal to 0 and less than or equal to 1.

Reconstructing the first audio signal may include reconstructing it so that its signal power equals the signal power of the demixed first audio signal multiplied by the error-removal factor.
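Since the factor scales signal *power*, the corresponding amplitude gain is its square root. A minimal Python sketch, assuming the factor is applied as one broadband gain to the time-domain samples of a frame; the text does not say whether it is applied per frame, per frequency band, or in another domain, and `apply_error_removal` is a hypothetical name:

```python
import math
from typing import List, Sequence


def apply_error_removal(demixed: Sequence[float], factor: float) -> List[float]:
    """Scale a demixed frame so that its power becomes factor * original power."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("error-removal factor must lie in [0, 1]")
    gain = math.sqrt(factor)  # power scales with the square of amplitude
    return [gain * x for x in demixed]
```

With a factor of 0.25, a frame of power 4.0 comes out with power 1.0; a factor of 0 mutes the frame, and a factor of 1 leaves it untouched.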

The bitstream includes information about the audio signal of a base channel group and information about the audio signal of a dependent channel group. The base channel group audio signal is obtained by decoding the base channel group information included in the bitstream, without demixing with the audio signal of any other channel group.

The dependent channel group audio signal may be an audio signal used to reconstruct, through demixing with the base channel group audio signal, the audio signal of an upmix channel group that includes at least one upmix channel.

The dependent channel group audio signal includes a first dependent channel audio signal and a second dependent channel audio signal. The first dependent channel audio signal includes the audio signal of an independent channel in front of the listener, and the second dependent channel audio signal may include an audio signal in which the audio signals of the channels at the sides and rear of the listener are mixed.

The base channel group audio signal includes a first-channel audio signal and a second-channel audio signal. The first-channel audio signal may be generated by mixing the audio signal of the left stereo channel with the decoded audio signal of the center channel in front of the listener, and the second-channel audio signal may be generated by mixing the audio signal of the right stereo channel with the compressed-then-decompressed audio signal of the center channel in front of the listener.

The base channel group includes a mono channel or a stereo channel, and the at least one upmix channel may be a discrete audio channel: at least one channel, other than the channels of the base channel group, among the three-dimensional audio channels in front of the listener or the three-dimensional audio channels surrounding the listener in all directions.

The three-dimensional audio channel layout in front of the listener is 3.1.2, which has three surround channels in front of the listener, one subwoofer channel in front of the listener, and two height channels. The omnidirectional three-dimensional audio channel layout is at least one of 5.1.2 and 7.1.4. The 5.1.2 layout has three surround channels in front of the listener, two surround channels at the sides and rear of the listener, one subwoofer channel in front of the listener, and two height channels in front of the listener. The 7.1.4 layout has three surround channels in front of the listener, four surround channels at the sides and rear of the listener, one subwoofer channel in front of the listener, two height channels in front of the listener, and two height channels at the sides and rear of the listener.
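The three layouts can be written down as a small table in code, which also shows how the X.Y.Z name is formed (total surround channels, subwoofer channels, total height channels). A minimal Python sketch; the `ChannelLayout` type is illustrative, and placing the two 3.1.2 height channels in front follows from it being the listener-front layout:

```python
from typing import Dict, NamedTuple


class ChannelLayout(NamedTuple):
    front_surround: int
    side_rear_surround: int
    subwoofer: int
    front_height: int
    side_rear_height: int

    def label(self) -> str:
        """Conventional surround.subwoofer.height name, e.g. '7.1.4'."""
        surround = self.front_surround + self.side_rear_surround
        height = self.front_height + self.side_rear_height
        return f"{surround}.{self.subwoofer}.{height}"

    def total(self) -> int:
        """Total number of loudspeaker channels."""
        return sum(self)


LAYOUTS: Dict[str, ChannelLayout] = {
    "3.1.2": ChannelLayout(3, 0, 1, 2, 0),  # listener-front layout
    "5.1.2": ChannelLayout(3, 2, 1, 2, 0),  # omnidirectional
    "7.1.4": ChannelLayout(3, 4, 1, 2, 2),  # omnidirectional
}
```

The totals are 6, 8, and 12 channels respectively; every layout shares the same three front surround channels and one subwoofer, which is what allows each larger layout to be built on the smaller one.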

The demixed first audio signal includes the audio signal of at least one upmix channel and the audio signal of an independent channel, and the independent channel audio signal may include part of the base channel group audio signal and the dependent channel group audio signal.

The bitstream is a file stream of a plurality of audio tracks, including a first audio track and a second audio track adjacent to each other. The base channel group audio signal is obtained from the first audio track, and dependent channel audio signal identification information is obtained from the second audio track. When the obtained identification information indicates that a dependent channel audio signal exists in the second audio track, the dependent channel group audio signal is obtained from the second audio track; when it indicates that no dependent channel audio signal exists in the second audio track, the audio signal of the next track of the base channel group may be obtained from the second audio track.
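One way to read the track rule is: a track whose identification information is set completes the preceding base-group track, and a track whose identification information is clear starts the next base group. The Python sketch below models both the encoder-side layout and the decoder-side reading under that interpretation. `AudioTrack`, its boolean `is_dependent` flag, and the function names are hypothetical; the text does not define how the identification information is encoded or what container is used.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AudioTrack:
    # Hypothetical flag standing in for the dependent channel
    # audio signal identification information.
    is_dependent: bool
    payload: bytes  # compressed audio of one channel group


def build_tracks(groups: List[Tuple[bytes, Optional[bytes]]]) -> List[AudioTrack]:
    """Encoder: each base-group track is followed by its dependent track, if any."""
    tracks: List[AudioTrack] = []
    for base, dependent in groups:
        tracks.append(AudioTrack(is_dependent=False, payload=base))
        if dependent is not None:
            tracks.append(AudioTrack(is_dependent=True, payload=dependent))
    return tracks


def parse_tracks(tracks: List[AudioTrack]) -> List[Tuple[bytes, Optional[bytes]]]:
    """Decoder: a flagged track completes the preceding base group;
    an unflagged track starts the next base group."""
    groups: List[List[Optional[bytes]]] = []
    for track in tracks:
        if track.is_dependent:
            if not groups or groups[-1][1] is not None:
                raise ValueError("dependent track without a preceding base track")
            groups[-1][1] = track.payload
        else:
            groups.append([track.payload, None])
    return [(base, dep) for base, dep in groups]
```

Under this reading, a legacy stereo decoder could simply skip every flagged track and still play each base channel group.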

The bitstream includes a base channel audio stream and a plurality of dependent channel audio streams.

The plurality of dependent channel audio streams includes a first dependent channel audio stream and a second dependent channel audio stream.

The base channel audio stream includes a stereo channel audio signal.

Let $S_{n-1}$, $W_{n-1}$, and $H_{n-1}$ be the numbers of surround, subwoofer, and height channels of the multi-channel first audio signal reconstructed from the base channel audio stream and the first dependent channel audio stream, and let $S_n$, $W_n$, and $H_n$ be the corresponding numbers for the multi-channel second audio signal reconstructed from the base channel audio stream, the first dependent channel audio stream, and the second dependent channel audio stream. The layouts may then be restricted such that $S_{n-1} \le S_n$, $W_{n-1} \le W_n$, and $H_{n-1} \le H_n$, but $(S_{n-1}, W_{n-1}, H_{n-1}) \ne (S_n, W_n, H_n)$.

The audio processing method may further include obtaining, from the bitstream, an audio object signal of the three-dimensional audio channels in front of the listener, indicating at least one of the audio signal, position, and direction of an audio object (sound source). The audio signal of the three-dimensional audio channels in front of the listener may be reconstructed based on the listener-front three-dimensional audio channel signal generated from the base channel group audio signal and the dependent channel group audio signal, together with the audio object signal of the three-dimensional audio channels in front of the listener.

The audio processing method may further include obtaining multi-channel audio side information from the bitstream. The side information may include at least one of: information about the total number of audio streams, including the base channel audio stream and the dependent channel audio streams; downmix gain information; channel mapping table information; loudness information; low-frequency effect (LFE) gain information; dynamic range control (DRC) information; channel layout rendering information; information about the number of coupled audio streams; information indicating the multi-channel layout; information about whether dialogue is present in the audio signal and about the dialogue level; information indicating whether the LFE is output; information about whether an audio object is present on screen; information about whether a continuous or a discrete channel audio signal is present; and demixing information including at least one demixing parameter of a demixing matrix for generating the multi-channel audio signal.
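Because the side information is "at least one of" a list of items, a decoder-side container naturally treats every item as optional. A minimal Python sketch covering only a subset of the items; the class name, field names, and units (for example, a gain in dB) are hypothetical, since the text defines no syntax for these fields:

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MultichannelSideInfo:
    """Hypothetical container for a subset of the side-information items."""
    total_stream_count: Optional[int] = None         # base + dependent streams
    downmix_gain_db: Optional[float] = None          # unit is an assumption
    channel_layout: Optional[str] = None             # e.g. "7.1.4"
    dialogue_present: Optional[bool] = None
    lfe_output: Optional[bool] = None
    on_screen_object_present: Optional[bool] = None
    demix_params: Optional[List[float]] = None       # demixing-matrix parameters

    def present_fields(self) -> List[str]:
        """Names of the items actually carried in this instance."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]
```

Keeping absent items as `None` lets the decoder distinguish "not signalled" from an explicit value such as `False` or `0`.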

An audio processing apparatus according to an embodiment includes at least one processor that executes one or more instructions. The at least one processor obtains, from a bitstream, a second audio signal downmixed from at least one first audio signal; obtains, from the bitstream, information related to error removal for the first audio signal; demixes the first audio signal from the downmixed second audio signal; and reconstructs the first audio signal by applying the error-removal information to the first audio signal demixed from the second audio signal. The error-removal information may be generated using at least one of the original signal power of the first audio signal and the signal power of the first audio signal after decoding.

An audio processing method according to an embodiment may include: generating a second audio signal by downmixing at least one first audio signal; generating information related to error removal for the first audio signal by using at least one of the original signal power of the second audio signal and the signal power of the first audio signal after decoding; generating a low-frequency effect (LFE) channel audio signal from the error-removal information by using a neural network for generating LFE channel audio signals; and transmitting the downmixed second audio signal and the LFE channel audio signal.

An audio processing method according to another embodiment includes: obtaining, from a bitstream, a second audio signal downmixed from at least one first audio signal; obtaining, from the bitstream, a low-frequency effect (LFE) channel audio signal; obtaining the error-removal information for the first audio signal from the obtained LFE channel audio signal by using a neural network for obtaining side information; and reconstructing the first audio signal by applying the error-removal information to the first audio signal upmixed from the second audio signal. The error-removal information may be generated using at least one of the original signal power of the first audio signal and the signal power of the first audio signal after decoding.

A program for implementing the audio processing method may be recorded on a computer-readable recording medium.

According to the multi-channel audio signal processing method or apparatus of an embodiment, an audio signal of a three-dimensional audio channel layout in front of the listener can be encoded while maintaining backward compatibility with conventional stereo (2-channel) audio signals, and an audio signal of an omnidirectional three-dimensional audio channel layout can be encoded as well.

According to the multi-channel audio signal processing method or apparatus of an embodiment, an audio signal of a three-dimensional audio channel layout in front of the listener can be decoded while maintaining backward compatibility with conventional stereo (2-channel) audio signals, and an audio signal of an omnidirectional three-dimensional audio channel layout can be decoded as well.

The effects achievable by the apparatus and method for processing a multi-channel audio signal according to an embodiment are not limited to those mentioned above; other effects not mentioned here will be clearly understood from the following description by those of ordinary skill in the art to which the present disclosure pertains.

A brief description of each drawing is provided so that the drawings cited in this specification may be more fully understood.
FIG. 1A is a diagram for explaining a scalable audio channel layout structure according to an embodiment.
FIG. 1B is a diagram for explaining an example of a detailed scalable audio channel layout structure.
FIG. 2A is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.
FIG. 2B is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.
FIG. 2C is a block diagram illustrating a configuration of a multi-channel audio signal processor according to an embodiment.
FIG. 2D is a diagram for explaining an example of a specific operation of an audio signal classifier.
FIG. 3A is a block diagram illustrating a configuration of a multi-channel audio decoding apparatus according to an embodiment.
FIG. 3B is a block diagram illustrating a configuration of a multi-channel audio decoding apparatus according to an embodiment.
FIG. 3C is a block diagram illustrating a configuration of a multi-channel audio signal restoration unit according to an embodiment.
FIG. 3D is a block diagram illustrating a configuration of an upmix channel audio generator according to an embodiment.
FIG. 4A is a block diagram illustrating a configuration of an audio encoding apparatus according to another embodiment.
FIG. 4B is a block diagram illustrating a configuration of a restoration unit according to an embodiment.
FIG. 5A is a block diagram illustrating a configuration of an audio decoding apparatus according to another embodiment.
FIG. 5B is a block diagram illustrating a configuration of a multi-channel audio signal restoration unit according to an embodiment.
FIG. 6 is a diagram illustrating a file structure according to an embodiment.
FIG. 7A is a diagram for describing a specific structure of a file according to an embodiment.
FIG. 7B is a flowchart of a method by which an audio decoding apparatus reproduces an audio signal according to the file structure of FIG. 7A.
FIG. 8A is a diagram for explaining a file structure according to another embodiment.
FIG. 8B is a flowchart of a method by which an audio decoding apparatus reproduces an audio signal according to the file structure of FIG. 8A.
FIG. 9A is a diagram for explaining a packet of an audio track according to the file structure of FIG. 7A.
FIG. 9B is a diagram for explaining a packet of an audio track according to the file structure of FIG. 7C.
FIG. 9C is a diagram for explaining a packet of an audio track according to the file structure of FIG. 8A.
FIG. 10 is a diagram for explaining additional information of a metadata header/metadata audio packet according to an embodiment.
FIG. 11 is a diagram for describing an audio encoding apparatus according to an embodiment.
FIG. 12 is a diagram for describing a metadata generator according to an embodiment.
FIG. 13 is a diagram for describing an audio decoding apparatus according to an embodiment.
FIG. 14 is a diagram for describing a 3.1.2-channel audio rendering unit 1410, a 5.1.2-channel audio rendering unit 1420, and a 7.1.4-channel audio rendering unit 1430, according to an embodiment.
FIG. 15A is a flowchart illustrating a process in which the audio encoding apparatus 400 determines a factor for error removal, according to an embodiment.
FIG. 15B is a flowchart illustrating a process in which the audio encoding apparatus 400 determines a scale factor of an Ls5 signal, according to an embodiment.
FIG. 15C is a flowchart illustrating a process in which the audio decoding apparatus 500 generates an Ls5_3 signal based on a factor for error removal, according to an embodiment.
FIG. 16A is a diagram for describing a configuration of a bitstream for channel layout extension, according to an embodiment.
FIG. 16B is a diagram for describing a configuration of a bitstream for channel layout extension, according to another embodiment.
FIG. 17 is a diagram for explaining an ambisonic audio signal that is added to an audio signal of a 3.1.2 channel layout to extend the channel layout, according to an embodiment.
FIG. 18 is a diagram for explaining a process in which the audio decoding apparatus 1800 generates an on-screen object audio signal based on an audio signal of a 3.1.2 channel layout and sound source object information.
FIG. 19 is a diagram for explaining the transmission order and rules for audio streams in each channel group used by the audio encoding apparatuses 200 and 400, according to an embodiment.
FIG. 20A is a flowchart of an audio processing method according to an embodiment.
FIG. 20B is a flowchart of an audio processing method according to another embodiment.
FIG. 20C is a flowchart of an audio processing method according to another embodiment.
FIG. 20D is a flowchart of an audio processing method according to another embodiment.
FIG. 21 is a diagram for explaining a process in which an audio encoding apparatus uses a first neural network to embed metadata in an LFE signal and transmit it, and an audio decoding apparatus uses a second neural network to obtain the metadata from the LFE signal, according to an embodiment.
FIG. 22A is a flowchart of an audio processing method according to an embodiment.
FIG. 22B is a flowchart of an audio processing method according to an embodiment.

Because the present disclosure may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail below. However, this is not intended to limit the present disclosure to specific embodiments, and the disclosure should be understood to include all modifications, equivalents, and substitutes within its spirit and scope.

In describing the embodiments, a detailed description of related known technology is omitted when it is determined that it may unnecessarily obscure the subject matter. In addition, the numbers used in describing the embodiments (e.g., first, second) are merely identifiers for distinguishing one component from another.

In addition, when a component is referred to in this specification as being "connected" or "coupled" to another component, the component may be directly connected or coupled to the other component; unless specifically stated otherwise, however, it should be understood that the component may also be connected or coupled via another intervening component.

In addition, for components expressed in this specification as a '~unit' or 'module', two or more components may be combined into one component, or one component may be divided into two or more components by more finely subdivided function. Each component described below may additionally perform some or all of the functions of other components besides its own main function, and some of the main functions of a component may be performed exclusively by another component.

In this specification, a 'deep neural network (DNN)' is a representative example of an artificial neural network model that simulates brain neurons, and is not limited to an artificial neural network model using a specific algorithm.

In this specification, a 'parameter' is a value used in the computation of each layer constituting a neural network, and may include, for example, a weight (and a bias) used when an input value is applied to a predetermined operation expression. A parameter may be expressed in matrix form. A parameter is a value set as a result of training and may be updated with separate training data as needed.

In this specification, a 'multi-channel audio signal' may mean an audio signal of n channels, where n is an integer greater than 2. A 'mono channel audio signal' may be a one-dimensional audio signal, a 'stereo channel audio signal' may be a two-dimensional audio signal, and a 'multi-channel audio signal' may be a three-dimensional audio signal.

In this specification, a 'channel (speaker) layout' may represent a combination of at least one channel and may specify the spatial arrangement of the channels (speakers). Because the channels here are the channels through which audio signals are actually output, they may be referred to as presentation channels.

For example, a channel layout may be an X.Y.Z channel layout, where X is the number of surround channels, Y is the number of subwoofer channels, and Z is the number of height channels. The channel layout specifies the spatial position of each surround, subwoofer, and height channel.
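As a minimal sketch of the X.Y.Z notation just described, the following Python parses a layout label into its three channel counts. The names `ChannelLayout` and `parse_layout` are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class ChannelLayout(NamedTuple):
    """Channel counts encoded by an X.Y.Z layout label."""

    surround: int   # X: number of surround channels
    subwoofer: int  # Y: number of subwoofer (LFE) channels
    height: int     # Z: number of height channels

    @property
    def total(self) -> int:
        """Total number of presentation channels in the layout."""
        return self.surround + self.subwoofer + self.height


def parse_layout(label: str) -> ChannelLayout:
    """Parse a label such as '7.1.4' into its X, Y and Z counts."""
    parts = label.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected an X.Y.Z label, got {label!r}")
    x, y, z = (int(p) for p in parts)
    return ChannelLayout(x, y, z)
```

For instance, `parse_layout("7.1.4")` yields 7 surround, 1 subwoofer, and 4 height channels, 12 channels in total.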

Examples of channel (speaker) layouts include the 1.0.0 (mono) channel layout, the 2.0.0 (stereo) channel layout, and the 5.1.0, 5.1.2, 5.1.4, 7.1.0, 7.1.2, and 3.1.2 channel layouts. The layouts are not limited to these, and various other channel layouts are possible.

The channels specified by a channel (speaker) layout may be named in various ways, but for convenience of description a single naming scheme is used here.

Based on the spatial position of each channel, the channels of a channel (speaker) layout may be named as follows.

For example, the first surround channel of the 1.0.0 channel layout may be called the mono channel. The first surround channel of the 2.0.0 channel layout may be called the L2 channel, and the second surround channel may be called the R2 channel.

Here, "L" indicates a channel located to the left of the listener, and "R" indicates a channel located to the right of the listener. "2" indicates a surround channel of a layout that has two surround channels in total.

In the 5.1.0 channel layout, the first surround channel may be called the L5 channel, the second the R5 channel, the third the C channel, the fourth the Ls5 channel, and the fifth the Rs5 channel. Here, "C" indicates a channel located at the center relative to the listener, and "s" indicates a side channel. The first subwoofer channel of the 5.1.0 channel layout may be called the LFE channel, where LFE stands for low-frequency effect; that is, the LFE channel may be a channel for outputting low-frequency sound effects.

The surround channels of the 5.1.2 and 5.1.4 channel layouts may have the same names as those of the 5.1.0 channel layout. Likewise, the subwoofer channel of the 5.1.2 and 5.1.4 channel layouts may have the same name as that of the 5.1.0 channel layout.

The first height channel of the 5.1.2 channel layout may be called Hl5, where H denotes a height channel. The second height channel may be called Hr5.

In the 5.1.4 channel layout, the first height channel may be called the Hfl channel, the second the Hfr channel, the third the Hbl channel, and the fourth the Hbr channel. Here, f denotes a front channel and b denotes a rear (back) channel relative to the listener.

In the 7.1.0 channel layout, the first surround channel may be called the L channel, the second the R channel, the third the C channel, the fourth the Ls channel, the fifth the Rs channel, the sixth the Lb channel, and the seventh the Rb channel.

The surround channels of the 7.1.2 and 7.1.4 channel layouts may have the same names as those of the 7.1.0 channel layout. Likewise, the subwoofer channel of the 7.1.2 and 7.1.4 channel layouts may have the same name as that of the 7.1.0 channel layout.

In the 7.1.2 channel layout, the first height channel may be called the Hl7 channel and the second height channel the Hr7 channel.

In the 7.1.4 channel layout, the first height channel may be called the Hfl channel, the second the Hfr channel, the third the Hbl channel, and the fourth the Hbr channel.

In the 3.1.2 channel layout, the first surround channel may be called the L3 channel, the second the R3 channel, and the third the C channel. The first subwoofer channel may be called the LFE channel. The first height channel may be called the Hfl3 channel (Tl channel), and the second height channel the Hfr3 channel (Tr channel).

Some channels are named differently depending on the channel layout but may represent the same channel. For example, the Hl5 channel and the Hl7 channel may be the same channel. Likewise, the Hr5 channel and the Hr7 channel may be the same channel.

The channel names are not limited to those described above, and various other names may be used.

For example, the L2 channel may be called the L'' channel, the R2 channel the R'' channel, the L3 channel the ML3 channel (L' channel), the R3 channel the MR3 channel (R' channel), the Hfl3 channel the MHL3 channel, the Hfr3 channel the MHR3 channel, the Ls5 channel the MSL5 channel (Ls' channel), the Rs5 channel the MSR5 channel, the Hl5 channel the MHL5 channel (Hl'), the Hr5 channel the MHR5 channel (Hr'), and the C channel the MC channel.
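The alternative names above can be normalized with a simple lookup. This is a minimal sketch; `CHANNEL_ALIASES` and `canonical_name` are hypothetical names, and the Hl5/Hl7 equivalence is deliberately left out because the text says only that those channels *may* be the same.

```python
# Alternative channel names from the paragraph above, mapped to the names
# used in Table 1. Illustrative only; not part of any specification.
CHANNEL_ALIASES = {
    "L''": "L2",
    "R''": "R2",
    "ML3": "L3", "L'": "L3",
    "MR3": "R3", "R'": "R3",
    "MHL3": "Hfl3",
    "MHR3": "Hfr3",
    "MSL5": "Ls5", "Ls'": "Ls5",
    "MSR5": "Rs5",
    "MHL5": "Hl5", "Hl'": "Hl5",
    "MHR5": "Hr5", "Hr'": "Hr5",
    "MC": "C",
}


def canonical_name(name: str) -> str:
    """Return the Table 1 name for an alias; unknown names pass through unchanged."""
    return CHANNEL_ALIASES.get(name, name)
```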

The channel names for the layouts described above are summarized in Table 1 below.

Table 1
Channel layout | Channel names
1.0.0 | Mono
2.0.0 | L2/R2
5.1.0 | L5/C/R5/Ls5/Rs5/LFE
5.1.2 | L5/C/R5/Ls5/Rs5/Hl5/Hr5/LFE
5.1.4 | L5/C/R5/Ls5/Rs5/Hfl/Hfr/Hbl/Hbr/LFE
7.1.0 | L/C/R/Ls/Rs/Lb/Rb/LFE
7.1.2 | L/C/R/Ls/Rs/Lb/Rb/Hl7/Hr7/LFE
7.1.4 | L/C/R/Ls/Rs/Lb/Rb/Hfl/Hfr/Hbl/Hbr/LFE
3.1.2 | L3/C/R3/Hfl3/Hfr3/LFE
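Table 1 can be transcribed into a lookup table and checked against its X.Y.Z labels. In this sketch (hypothetical names), "LFE" is counted as a subwoofer channel and any name beginning with "H" as a height channel; that is a heuristic that happens to fit the naming scheme above, not a rule stated in the text.

```python
from typing import Dict, List

# Table 1, transcribed as layout label -> channel names.
LAYOUT_CHANNELS: Dict[str, List[str]] = {
    "1.0.0": ["Mono"],
    "2.0.0": ["L2", "R2"],
    "5.1.0": ["L5", "C", "R5", "Ls5", "Rs5", "LFE"],
    "5.1.2": ["L5", "C", "R5", "Ls5", "Rs5", "Hl5", "Hr5", "LFE"],
    "5.1.4": ["L5", "C", "R5", "Ls5", "Rs5", "Hfl", "Hfr", "Hbl", "Hbr", "LFE"],
    "7.1.0": ["L", "C", "R", "Ls", "Rs", "Lb", "Rb", "LFE"],
    "7.1.2": ["L", "C", "R", "Ls", "Rs", "Lb", "Rb", "Hl7", "Hr7", "LFE"],
    "7.1.4": ["L", "C", "R", "Ls", "Rs", "Lb", "Rb",
              "Hfl", "Hfr", "Hbl", "Hbr", "LFE"],
    "3.1.2": ["L3", "C", "R3", "Hfl3", "Hfr3", "LFE"],
}


def layout_mismatches(table: Dict[str, List[str]]) -> List[str]:
    """Return layout labels whose channel names do not match their X.Y.Z counts."""
    bad = []
    for label, names in table.items():
        x, y, z = (int(p) for p in label.split("."))
        n_lfe = sum(1 for n in names if n == "LFE")
        n_height = sum(1 for n in names if n.startswith("H"))  # heuristic
        n_surround = len(names) - n_lfe - n_height
        if (n_surround, n_lfe, n_height) != (x, y, z):
            bad.append(label)
    return bad
```

Every row of Table 1 passes this check, which is a quick way to catch transcription slips such as a missing or duplicated channel name.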

A 'transmission channel' is a channel for transmitting a compressed audio signal. Some transmission channels may be the same as presentation channels, but this is not required: others may be channels carrying an audio signal in which presentation-channel audio signals are mixed (mix channels). That is, transmission channels carry the audio of the presentation channels, but some may coincide with presentation channels while the rest are mix channels that differ from them.

Transmission channels may be named distinctly from presentation channels. For example, when the transmission channels are A/B channels, the A/B channels may carry the audio signals of the L2/R2 channels. When the transmission channels are T/P/Q channels, the T/P/Q channels may carry the audio signals of the C/LFE/Hfl3,Hfr3 channels. When the transmission channels are S/U/V channels, the S/U/V channels may carry the audio signals of the L,R/Ls,Rs/Hfl,Hfr channels.
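The slash-and-comma notation above can be read as a mapping from each transmission channel to the presentation channels whose audio it conveys. The sketch below assumes that "/" separates transmission channels and "," joins presentation channels carried together; that reading, and the names used, are interpretations for illustration only. As the preceding paragraph notes, a transmission channel may convey that audio in mixed form rather than verbatim.

```python
from typing import Dict, List, Tuple

# Assumed reading: '/' separates transmission channels and ',' joins
# presentation channels whose audio one transmission channel conveys.
TRANSPORT_CONTENTS: Dict[str, Tuple[str, ...]] = {
    "A": ("L2",),
    "B": ("R2",),
    "T": ("C",),
    "P": ("LFE",),
    "Q": ("Hfl3", "Hfr3"),
    "S": ("L", "R"),
    "U": ("Ls", "Rs"),
    "V": ("Hfl", "Hfr"),
}


def carried_channels(transport: str) -> List[str]:
    """Expand a label such as 'T/P/Q' into the presentation channels it conveys."""
    channels: List[str] = []
    for label in transport.split("/"):
        channels.extend(TRANSPORT_CONTENTS[label])
    return channels
```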

In this specification, a '3D audio signal' may mean an audio signal from which the distribution of sound and the positions of sound sources in 3D space can be determined.

In this specification, a 'listener-front 3D audio channel' may mean a 3D audio channel based on a layout of audio channels arranged in front of the listener. It may also be referred to as a 'front 3D audio channel'. In particular, because it is based on a layout of audio channels arranged around a screen located in front of the listener, it may also be called a 'screen-centered 3D audio channel'.

In this specification, a 'listener omnidirectional 3D audio channel' may mean a 3D audio channel based on a layout of audio channels arranged in all directions around the listener. It may also be referred to as a 'full 3D audio channel'. Here, 'all directions' may mean directions including the front, the sides, and the rear. In particular, because it is based on a layout of audio channels arranged omnidirectionally around the listener, it may also be called a 'listener-centered 3D audio channel'.

In this specification, a 'channel group' is a kind of data unit and may include the (compressed) audio signals of at least one channel. Specifically, channel groups may include at least one of a base channel group, which is independent of other channel groups, and a dependent channel group, which depends on at least one other channel group. The channel group on which a dependent channel group depends may be another dependent channel group, in particular one associated with a lower channel layout, or it may be the base channel group. Because a channel group carries the data of its channels, it may also be called a data group (coding group). A dependent channel group is used to further increase the number of channels beyond those included in the base channel group, and may therefore be called a scalable channel group or an extended channel group.
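The dependency structure just described, where a dependent group relies on another dependent group or on the base group, implies a decoding order. The following is a minimal sketch with a hypothetical `ChannelGroup` record; the example assignment of the A/B, T/P/Q, and S/U/V transmission channels to groups is purely illustrative and is not taken from the bitstream format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChannelGroup:
    """Hypothetical record for one channel group in a bitstream."""

    name: str
    channels: List[str]               # transmission channels carried by the group
    depends_on: Optional[str] = None  # None marks the base channel group


def decoding_chain(groups: Dict[str, ChannelGroup], target: str) -> List[str]:
    """List the groups to decode, base group first, to reach `target`."""
    chain: List[str] = []
    current: Optional[str] = target
    while current is not None:
        if current in chain:
            raise ValueError(f"dependency cycle involving {current!r}")
        chain.append(current)
        current = groups[current].depends_on
    chain.reverse()
    return chain


# Illustrative configuration only: a stereo base group extended twice.
GROUPS = {
    "base": ChannelGroup("base", ["A", "B"]),
    "dep1": ChannelGroup("dep1", ["T", "P", "Q"], depends_on="base"),
    "dep2": ChannelGroup("dep2", ["S", "U", "V"], depends_on="dep1"),
}
```

A decoder that only supports the stereo layout would stop after the base group, which is how the scalable structure preserves backward compatibility.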

The audio signal of the base channel group may include a mono-channel audio signal or a stereo-channel audio signal. It is not limited thereto and may also include an audio signal of listener-front 3D audio channels.

For example, the audio signal of a dependent channel group may include the audio signals of the channels of the listener-front 3D audio channels or the listener omnidirectional 3D audio channels, excluding those carried in the base channel group. Some of these remaining audio signals may be audio signals in which the audio signals of at least one channel are mixed (that is, mix-channel audio signals).

For example, the audio signal of the base channel group may be a mono-channel or stereo-channel audio signal, and the multi-channel audio signal restored from the audio signals of the base and dependent channel groups may be an audio signal of listener-front 3D audio channels or of listener omnidirectional 3D audio channels.

In this specification, 'upmixing' may mean an operation in which, through demixing, the number of presentation channels of the output audio signal becomes larger than the number of presentation channels of the input audio signal.

In this specification, 'demixing' is an operation of separating the audio signal of a specific channel from an audio signal in which the audio signals of several channels are mixed (that is, the audio signal of a mix channel), and may be regarded as one kind of mixing operation. Demixing may be implemented as an operation using a demixing matrix (or the corresponding downmixing matrix), whose coefficients include at least one demixing weight parameter (or the corresponding downmixing weight parameter). Alternatively, demixing may be implemented as an equation based on part of the demixing matrix (or the corresponding downmixing matrix); it is not limited thereto and may be implemented in various ways. As described above, demixing may be related to upmixing.
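Demixing a single channel can be sketched as follows, assuming a mix channel formed as `mixed = w_target * target + w_known * known`, where `known` is a channel decoded independently. For instance, if a lower-layout channel such as L3 were formed as L5 + w·Ls5, then Ls5 could be recovered from the decoded L3 and L5; that channel pairing and the weight are assumptions for illustration, not coefficients from this disclosure.

```python
from typing import List, Sequence


def demix(mixed: Sequence[float], known: Sequence[float],
          w_target: float, w_known: float) -> List[float]:
    """Recover the target channel from mixed = w_target*target + w_known*known.

    `known` was decoded independently, so its weighted contribution can be
    subtracted sample by sample before undoing the target's weight.
    """
    if len(mixed) != len(known):
        raise ValueError("mixed and known signals must have the same length")
    if w_target == 0:
        raise ValueError("w_target must be non-zero to separate the target")
    return [(m - w_known * k) / w_target for m, k in zip(mixed, known)]
```

Under the assumed L3 = L5 + w·Ls5 mix, the call would be `demix(l3, l5, w_target=w, w_known=1.0)`. With lossy coding, the recovered signal also carries the coding errors of both inputs, which motivates the error-removal factor defined later in this section.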

'Mixing' means any operation that generates the audio signal of a new channel (that is, a mix channel) by multiplying the audio signals of a plurality of channels by their respective weights and summing the results (that is, by mixing the audio signals of the plurality of channels).

Mixing may be divided into mixing in the narrow sense, performed by the audio encoding apparatus, and demixing, performed by the audio decoding apparatus.

The mixing performed by the audio encoding apparatus may be implemented as an operation using a (down)mixing matrix, whose coefficients include at least one (down)mixing weight parameter. Alternatively, (down)mixing may be implemented as an equation based on part of the (down)mixing matrix; it is not limited thereto and may be implemented in various ways.
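The weighted-sum definition of mixing and its matrix form can be sketched together: each row of the matrix holds the mixing weight parameters for one output channel. The function name and the example weights below are hypothetical, chosen only to show three input channels being downmixed to two.

```python
from typing import List, Sequence


def apply_mixing_matrix(matrix: Sequence[Sequence[float]],
                        inputs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Mix input channels with a weight matrix.

    Row i of `matrix` holds the mixing weight parameters for output channel i,
    so out[i][t] = sum over j of matrix[i][j] * inputs[j][t].
    """
    n_in = len(inputs)
    if any(len(row) != n_in for row in matrix):
        raise ValueError("each matrix row needs one weight per input channel")
    n_samples = len(inputs[0]) if n_in else 0
    if any(len(ch) != n_samples for ch in inputs):
        raise ValueError("all input channels must have the same length")
    return [
        [sum(row[j] * inputs[j][t] for j in range(n_in)) for t in range(n_samples)]
        for row in matrix
    ]
```

With a hypothetical matrix `[[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]` applied to left, center, and right inputs, the center channel is split equally into two outputs, so the number of presentation channels drops from three to two, which is a downmix in the sense defined below.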

In this specification, an 'upmix channel group' means a group including at least one upmix channel, and an 'upmix channel' may mean a demixed channel separated by demixing the audio signals of encoded/decoded channels. In the narrow sense, an upmix channel group may include only upmix channels; in the broad sense, it may include encoded/decoded channels in addition to upmix channels. Here, an 'encoded/decoded channel' means an independent channel of an audio signal that is encoded (compressed) and included in the bitstream, or an independent channel of an audio signal obtained by decoding the bitstream. No separate (de)mixing operation is required to obtain the audio signal of an encoded/decoded channel.

The audio signal of an 'up-mix channel group' in the broad sense may be a multi-channel audio signal. The output multi-channel audio signal is the audio signal output to a device such as a speaker, and may be one of the at least one multi-channel audio signal (that is, the audio signal of one of the at least one upmix channel group).

In the present specification, 'down-mixing' may refer to an operation in which, through mixing, the output audio signal ends up with fewer channels than the input audio signal.

In the present specification, a 'factor for error removal' may be a factor for removing an error in an audio signal caused by lossy coding.

The error in a signal caused by lossy coding may include quantization error, specifically, error from encoding (quantization) based on psycho-acoustic characteristics. The 'factor for error removal' may also be called a 'coding error removal factor (CER factor)' or an 'error cancellation ratio'. In particular, because the error-removal operation substantially corresponds to a scaling operation, the 'factor for error removal' may also be called a 'scale factor'.
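
Since error removal amounts to scaling, one possible sketch computes the factor as the ratio of the original signal strength to the signal strength after decoding, and applies it as a gain. RMS as the measure of "signal strength", per-frame granularity, and the names `cer_factor` and `apply_cer` are assumptions for illustration; the exact definition used in the embodiments may differ.

```python
import math
from typing import List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level, used here as the (assumed) 'signal strength'."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if len(x) else 0.0


def cer_factor(original: Sequence[float], decoded: Sequence[float],
               eps: float = 1e-12) -> float:
    """Hypothetical error-removal (scale) factor: original strength / decoded strength.

    `eps` guards against division by zero for a silent decoded frame.
    """
    return rms(original) / max(rms(decoded), eps)


def apply_cer(decoded: Sequence[float], factor: float) -> List[float]:
    """Error removal modelled as a plain scaling of the decoded (or demixed) signal."""
    return [factor * v for v in decoded]


original = [1.0, -1.0, 1.0, -1.0]
decoded = [1.2, -1.2, 1.2, -1.2]        # toy frame whose level was inflated by coding
factor = cer_factor(original, decoded)  # computed where both signals are available
restored = apply_cer(decoded, factor)   # applied on the decoding side
```

In this toy case the coding error is a pure level change, so scaling removes it completely; real quantization error is not a pure gain, and a single scale factor can only correct its level.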

Hereinafter, embodiments according to the technical spirit of the present disclosure will be described in detail in turn.

FIG. 1A is a diagram for explaining a scalable audio channel layout structure according to an embodiment.

A conventional 3D audio decoding apparatus received, from a bitstream, compressed audio signals of the independent channels of a specific channel layout, and used them to reconstruct the audio signal of 3D audio channels surrounding the listener in all directions. Only the audio signal of that specific channel layout could be reconstructed.

Alternatively, a conventional 3D audio decoding apparatus received, from a bitstream, compressed audio signals of the independent channels of a specific channel layout (a first independent channel group). For example, the specific channel layout may be a 5.1 channel layout, in which case the compressed audio signals of the first independent channel group may be those of five surround channels and one subwoofer channel.

To increase the number of channels, the conventional 3D audio decoding apparatus additionally received compressed audio signals of other channels independent of the first independent channel group (a second independent channel group). For example, the compressed audio signals of the second independent channel group may be those of two height channels.

That is, the conventional 3D audio decoding apparatus reconstructed the audio signal of omnidirectional 3D audio channels around the listener by using the compressed audio signals of the second independent channel group received from the bitstream, separately from those of the first independent channel group. An audio signal with an increased number of channels was thus reconstructed. Here, the omnidirectional 3D audio signal may be a 5.1.2-channel audio signal.

On the other hand, a legacy audio decoding apparatus that supports only the reproduction of stereo-channel audio signals could not properly process the compressed audio signals included in such a bitstream.

In addition, even a conventional 3D audio decoding apparatus that supports reproduction of 3D audio signals first decompressed (decoded) the compressed audio signals of both the first and second independent channel groups in order to reproduce a stereo-channel audio signal, and then upmixed the audio signals generated by the decompression. This was cumbersome, because an operation such as upmixing always had to be performed just to reproduce a stereo-channel audio signal.

Accordingly, a scalable channel layout structure is needed that allows a legacy audio decoding apparatus to process the compressed audio signal. In addition, a scalable channel layout structure is needed that allows the audio decoding apparatuses 300 and 500 supporting 3D audio reproduction according to various embodiments to process the compressed audio signal according to the 3D audio channel layout they support. Here, a scalable channel layout structure means a layout structure in which the number of channels can be increased freely, starting from a basic channel layout.

The audio decoding apparatuses 300 and 500 according to various embodiments may reconstruct an audio signal of a scalable channel layout structure from a bitstream. According to the scalable channel layout structure of an embodiment, the number of channels may be increased from the stereo channel layout 100 to the 3D audio channel layout 110 in front of the listener. Furthermore, the number of channels may be increased from the 3D audio channel layout 110 in front of the listener to the omnidirectional 3D audio channel layout 120 around the listener. For example, the 3D audio channel layout 110 in front of the listener may be a 3.1.2 channel layout, and the omnidirectional 3D audio channel layout 120 may be a 5.1.2 or 7.1.2 channel layout. However, the scalable channel layouts that can be implemented in the present disclosure are not limited thereto.
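
The layout names follow the common "surround.LFE.height" notation. The sketch below, an illustration rather than part of the disclosure, counts the channels in each layout and checks that each step in a chain such as stereo, 3.1.2, 5.1.2, 7.1.4 only adds channels. Treating "scalable" as "no channel category shrinks" is a simplifying assumption, and `ChannelLayout`, `parse_layout`, and `is_scalable_step` are hypothetical names.

```python
from typing import NamedTuple


class ChannelLayout(NamedTuple):
    """Layout written as 'surround.lfe.height', e.g. 5.1.2."""
    surround: int
    lfe: int
    height: int

    @property
    def total(self) -> int:
        return self.surround + self.lfe + self.height


def parse_layout(name: str) -> ChannelLayout:
    """Parse '2', '2.0', '3.1.2', ...; missing fields default to 0."""
    parts = [int(p) for p in name.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected layout name: {name!r}")
    parts += [0] * (3 - len(parts))
    return ChannelLayout(*parts)


def is_scalable_step(lower: str, upper: str) -> bool:
    """True if `upper` keeps every channel category of `lower` and adds channels."""
    a, b = parse_layout(lower), parse_layout(upper)
    return (b.surround >= a.surround and b.lfe >= a.lfe
            and b.height >= a.height and b.total > a.total)


chain = ["2.0", "3.1.2", "5.1.2", "7.1.4"]
totals = [parse_layout(n).total for n in chain]
```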

The audio signal of the conventional stereo channels may be compressed as the basic channel group. Because a legacy audio decoding apparatus can decompress the compressed audio signal of the basic channel group from the bitstream, it can reproduce the conventional stereo audio signal without difficulty.

Additionally, the audio signals of the channels of a multi-channel audio signal other than the conventional stereo channels may be compressed as a dependent channel group.

However, in the process of increasing the number of channels, some of the audio signals of a channel group may be audio signals in which the signals of some independent channels of a specific channel layout have been mixed.

Accordingly, in the audio decoding apparatuses 300 and 500, some of the audio signals of the basic channel group and of the dependent channel group may be demixed to generate the audio signals of upmix channels included in the specific channel layout.

Meanwhile, there may be one or more dependent channel groups. For example, among the audio signals of the 3D audio channel layout 110 in front of the listener, the audio signals of the channels other than the stereo channels may be compressed as the audio signals of a first dependent channel group.

Among the audio signals of the omnidirectional 3D audio channel layout 120, the audio signals of the channels other than those reconstructed from the basic channel group and the first dependent channel group may be compressed as the audio signals of a second dependent channel group.

The audio decoding apparatuses 300 and 500 according to an embodiment may support reproduction of the audio signal of the omnidirectional 3D audio channel layout 120.

Accordingly, the audio decoding apparatuses 300 and 500 according to an embodiment may reconstruct the audio signal of the omnidirectional 3D audio channel layout 120 based on the audio signals of the basic channel group, the first dependent channel group, and the second dependent channel group.

A legacy audio signal processing apparatus may ignore the compressed audio signals of the dependent channel groups, which it cannot reconstruct from the bitstream, and reproduce only the stereo-channel audio signal reconstructed from the bitstream.

Similarly, the audio decoding apparatuses 300 and 500 may process the compressed audio signals of the basic channel group and the dependent channel groups to reconstruct the audio signal of a supported channel layout among the scalable channel layouts. The audio decoding apparatuses 300 and 500 cannot reconstruct, from the bitstream, compressed audio signals for a higher channel layout that they do not support. They may therefore ignore those compressed audio signals and reconstruct from the bitstream only the audio signal of a supported channel layout.

In particular, conventional audio encoding and decoding apparatuses compressed and decompressed only the audio signals of the independent channels of a specific channel layout, so only audio signals of a limited set of channel layouts could be compressed and decompressed.

However, the audio encoding apparatuses and audio decoding apparatuses 200, 300, 400, and 500 of various embodiments, which support a scalable channel layout, can transmit and reconstruct audio signals of a stereo channel layout, of a 3D channel layout in front of the listener, and of an omnidirectional 3D channel layout around the listener.

That is, the audio encoding/decoding apparatuses 200, 300, 400, and 500 according to various embodiments may transmit and reconstruct an audio signal according to a stereo channel layout. In addition, they may freely convert audio signals of the current channel layout into audio signals of another channel layout; conversion between channel layouts is achieved by mixing/demixing the audio signals of the channels included in the different layouts. Because they support conversion among various channel layouts, the apparatuses 200, 300, 400, and 500 can transmit and reproduce audio signals of various 3D channel layouts. In other words, channel independence is not guaranteed between the channel layout in front of the listener and the omnidirectional channel layout, or between the stereo channel layout and the channel layout in front of the listener, but audio signals can be freely converted between them through mixing/demixing.

Because the audio encoding/decoding apparatuses 200, 300, 400, and 500 according to various embodiments support processing of audio signals of a channel layout in front of the listener, they can transmit and reconstruct audio signals corresponding to speakers arranged around the screen, which can increase the listener's sense of immersion.

Specific operations of the audio encoding/decoding apparatuses 200, 300, 400, and 500 according to various embodiments are described below with reference to FIGS. 2A to 5B.

FIG. 1B is a diagram for explaining an example of a specific scalable audio channel layout structure.

Referring to FIG. 1B, to transmit the audio signal of the stereo channel layout 160, the audio encoding apparatuses 200 and 400 may compress the L2/R2 signals to generate the compressed audio signals (A/B signals) of the basic channel group.

In addition, to transmit the audio signal of the 3.1.2 channel layout 170, one of the 3D audio channel layouts in front of the listener, the audio encoding apparatuses 200 and 400 may compress the C, LFE, Hfl3, and Hfr3 signals to generate the compressed audio signals of a dependent channel group. The audio decoding apparatuses 300 and 500 may decompress the compressed audio signals of the basic channel group to reconstruct the L2/R2 signals, and may decompress the compressed audio signals of the dependent channel group to reconstruct the C, LFE, Hfl3, and Hfr3 signals.

The audio decoding apparatuses 300 and 500 may reconstruct the L3 signal of the 3.1.2 channel layout 170 by demixing (1) the L2 and C signals, and may reconstruct the R3 signal of the 3.1.2 channel layout by demixing (2) the R2 and C signals.

As a result, the audio decoding apparatuses 300 and 500 may output the L3, R3, C, LFE, Hfl3, and Hfr3 signals as the audio signals of the 3.1.2 channel layout 170.
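
Demixing (1) and (2) can be sketched by assuming the encoder formed the stereo pair as L2 = L3 + w·C and R2 = R3 + w·C; the decoder, which already has C from the dependent channel group, then subtracts it back out. The text above does not specify the weight, so w = 0.707 and the function names are hypothetical, and the codec is treated as lossless so the round trip is exact up to floating-point error.

```python
from typing import List, Sequence, Tuple

# Hypothetical centre mixing weight; the text does not give a value.
W_C = 0.707


def mix_312_to_stereo(l3: Sequence[float], r3: Sequence[float], c: Sequence[float],
                      w: float = W_C) -> Tuple[List[float], List[float]]:
    """Assumed encoder-side mixing: L2 = L3 + w*C, R2 = R3 + w*C."""
    l2 = [l + w * x for l, x in zip(l3, c)]
    r2 = [r + w * x for r, x in zip(r3, c)]
    return l2, r2


def demix_stereo_to_312(l2: Sequence[float], r2: Sequence[float], c: Sequence[float],
                        w: float = W_C) -> Tuple[List[float], List[float]]:
    """Demixing (1) and (2): undo the assumed mixing using the decoded C signal."""
    l3 = [l - w * x for l, x in zip(l2, c)]  # demixing (1): L2, C -> L3
    r3 = [r - w * x for r, x in zip(r2, c)]  # demixing (2): R2, C -> R3
    return l3, r3


l3, r3, c = [0.3, -0.1, 0.8], [0.2, 0.4, -0.5], [0.6, 0.0, -0.2]
l2, r2 = mix_312_to_stereo(l3, r3, c)
l3_hat, r3_hat = demix_stereo_to_312(l2, r2, c)
```

A legacy stereo decoder simply plays l2 and r2; only a 3.1.2-capable decoder needs c to run the demixing step.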

Meanwhile, to transmit the audio signal of the omnidirectional 5.1.2 channel layout 180, the audio encoding apparatuses 200 and 400 may additionally compress the L5 and R5 signals to generate the compressed audio signals of a second dependent channel group.

As described above, the audio decoding apparatuses 300 and 500 may decompress the compressed audio signals of the basic channel group to reconstruct the L2/R2 signals, and decompress the compressed audio signals of the first dependent channel group to reconstruct the C, LFE, Hfl3, and Hfr3 signals. Additionally, they may decompress the compressed audio signals of the second dependent channel group to reconstruct the L5 and R5 signals. Also as described above, they may demix some of the decompressed audio signals to reconstruct the L3 and R3 signals.

Additionally, the audio decoding apparatuses 300 and 500 may reconstruct the Ls5 signal by demixing (3) the L3 and L5 signals, and may reconstruct the Rs5 signal by demixing (4) the R3 and R5 signals.

The audio decoding apparatuses 300 and 500 may reconstruct the Hl5 signal by demixing (5) the Hfl3 and Ls5 signals.

The audio decoding apparatuses 300 and 500 may reconstruct the Hr5 signal by demixing (6) the Hfr3 and Rs5 signals. Hfr3 and Hr5 are each the front right channel among the height channels.

As a result, the audio decoding apparatuses 300 and 500 may output the Hl5, Hr5, LFE, L, R, C, Ls5, and Rs5 signals as the audio signals of the 5.1.2 channel layout 180.
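
Demixing (3) to (6) can be sketched under an assumed encoder mixing in which each surround channel was folded into a front channel (L3 = L5 + a·Ls5) and into a height channel (Hfl3 = Hl5 + h·Ls5), with the right side handled symmetrically. The weights a and h and all function names are hypothetical; the text does not define them. The sketch also makes the ordering explicit: (3) and (4) must run before (5) and (6), because the latter consume Ls5 and Rs5.

```python
from typing import List, Sequence, Tuple

Signal = Sequence[float]

# Hypothetical mixing weights; the text does not specify them.
W_S = 0.707  # weight of Ls5/Rs5 folded into L3/R3
W_H = 0.5    # weight of Ls5/Rs5 folded into Hfl3/Hfr3


def mix_512_to_312(l5: Signal, r5: Signal, ls5: Signal, rs5: Signal,
                   hl5: Signal, hr5: Signal) -> Tuple[List[float], ...]:
    """Assumed encoder-side mixing from 5.1.2 channels to 3.1.2 channels."""
    l3 = [a + W_S * s for a, s in zip(l5, ls5)]
    r3 = [a + W_S * s for a, s in zip(r5, rs5)]
    hfl3 = [h + W_H * s for h, s in zip(hl5, ls5)]
    hfr3 = [h + W_H * s for h, s in zip(hr5, rs5)]
    return l3, r3, hfl3, hfr3


def demix_312_to_512(l3: Signal, r3: Signal, hfl3: Signal, hfr3: Signal,
                     l5: Signal, r5: Signal) -> Tuple[List[float], ...]:
    """Demixing (3)-(6); L5/R5 come from the second dependent channel group."""
    ls5 = [(a - b) / W_S for a, b in zip(l3, l5)]   # (3): L3, L5 -> Ls5
    rs5 = [(a - b) / W_S for a, b in zip(r3, r5)]   # (4): R3, R5 -> Rs5
    hl5 = [h - W_H * s for h, s in zip(hfl3, ls5)]  # (5): Hfl3, Ls5 -> Hl5
    hr5 = [h - W_H * s for h, s in zip(hfr3, rs5)]  # (6): Hfr3, Rs5 -> Hr5
    return ls5, rs5, hl5, hr5


l5, r5 = [0.1, 0.2], [0.3, -0.4]
ls5, rs5 = [0.5, -0.6], [0.7, 0.8]
hl5, hr5 = [0.9, -0.1], [0.2, 0.3]
l3, r3, hfl3, hfr3 = mix_512_to_312(l5, r5, ls5, rs5, hl5, hr5)
recovered = demix_312_to_512(l3, r3, hfl3, hfr3, l5, r5)
```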

Meanwhile, to transmit the audio signal of the 7.1.4 channel layout 190, the audio encoding apparatuses 200 and 400 may additionally compress the Hfl, Hfr, Ls, and Rs signals as the audio signals of a third dependent channel group.

As described above, the audio decoding apparatuses 300 and 500 may decompress the compressed audio signals of the basic channel group, the first dependent channel group, and the second dependent channel group, and reconstruct the Hl5, Hr5, LFE, L, R, C, Ls5, and Rs5 signals through demixing (1) to (6).

Additionally, the audio decoding apparatuses 300 and 500 may decompress the compressed audio signals of the third dependent channel group to reconstruct the Hfl, Hfr, Ls, and Rs signals. They may reconstruct the Lb signal of the 7.1.4 channel layout 190 by demixing (7) the Ls5 and Ls signals.

The audio decoding apparatuses 300 and 500 may reconstruct the Rb signal of the 7.1.4 channel layout 190 by demixing (8) the Rs5 and Rs signals.

The audio decoding apparatuses 300 and 500 may reconstruct the Hbl signal of the 7.1.4 channel layout 190 by demixing (9) the Hfl and Hl5 signals.

The audio decoding apparatuses 300 and 500 may reconstruct the Hbr signal of the 7.1.4 channel layout 190 by demixing (or mixing) (10) the Hfr and Hr5 signals.

As a result, the audio decoding apparatuses 300 and 500 may output the Hfl, Hfr, LFE, C, L, R, Ls, Rs, Lb, Rb, Hbl, and Hbr signals as the audio signals of the 7.1.4 channel layout 190.
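
Demixing (7) to (10) follows the same pattern one level up. The sketch assumes the encoder formed Ls5 = a·Ls + b·Lb and Hl5 = Hfl + g·Hbl (right side symmetric), so the decoder solves for the back channels once Ls, Rs, Hfl, and Hfr arrive in the third dependent channel group. All three weights and the function names are hypothetical, chosen only to make the round trip checkable.

```python
from typing import List, Sequence, Tuple

Signal = Sequence[float]

# Hypothetical mixing weights; the text does not specify them.
A_SIDE = 0.707  # weight of Ls/Rs in Ls5/Rs5
A_BACK = 0.707  # weight of Lb/Rb in Ls5/Rs5
G_TOP = 0.707   # weight of Hbl/Hbr in Hl5/Hr5


def mix_714_to_512(ls: Signal, rs: Signal, lb: Signal, rb: Signal,
                   hfl: Signal, hfr: Signal, hbl: Signal,
                   hbr: Signal) -> Tuple[List[float], ...]:
    """Assumed encoder mixing: Ls5 = a*Ls + b*Lb, Hl5 = Hfl + g*Hbl (and right side)."""
    ls5 = [A_SIDE * s + A_BACK * b for s, b in zip(ls, lb)]
    rs5 = [A_SIDE * s + A_BACK * b for s, b in zip(rs, rb)]
    hl5 = [f + G_TOP * b for f, b in zip(hfl, hbl)]
    hr5 = [f + G_TOP * b for f, b in zip(hfr, hbr)]
    return ls5, rs5, hl5, hr5


def demix_512_to_714(ls5: Signal, rs5: Signal, hl5: Signal, hr5: Signal,
                     ls: Signal, rs: Signal, hfl: Signal,
                     hfr: Signal) -> Tuple[List[float], ...]:
    """Demixing (7)-(10); Ls, Rs, Hfl, Hfr come from the third dependent channel group."""
    lb = [(m - A_SIDE * s) / A_BACK for m, s in zip(ls5, ls)]  # (7): Ls5, Ls -> Lb
    rb = [(m - A_SIDE * s) / A_BACK for m, s in zip(rs5, rs)]  # (8): Rs5, Rs -> Rb
    hbl = [(m - f) / G_TOP for m, f in zip(hl5, hfl)]          # (9): Hfl, Hl5 -> Hbl
    hbr = [(m - f) / G_TOP for m, f in zip(hr5, hfr)]          # (10): Hfr, Hr5 -> Hbr
    return lb, rb, hbl, hbr


ls, rs, lb, rb = [0.2, -0.3], [0.1, 0.4], [0.6, 0.0], [-0.5, 0.25]
hfl, hfr, hbl, hbr = [0.3, 0.3], [-0.2, 0.1], [0.05, -0.4], [0.7, 0.2]
ls5, rs5, hl5, hr5 = mix_714_to_512(ls, rs, lb, rb, hfl, hfr, hbl, hbr)
recovered = demix_512_to_714(ls5, rs5, hl5, hr5, ls, rs, hfl, hfr)
```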

Accordingly, by supporting a scalable channel layout in which the number of channels increases through demixing operations, the audio decoding apparatuses 300 and 500 can reconstruct not only audio signals of the conventional stereo channel layout but also audio signals of 3D audio channels in front of the listener and of omnidirectional 3D audio channels around the listener.

The scalable channel layout structure described in detail above with reference to FIG. 1B is only an example; a channel layout structure may be implemented scalably so as to include various channel layouts.

FIG. 2A is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.

The audio encoding apparatus 200 includes a memory 210 and a processor 230. The audio encoding apparatus 200 may be implemented as any device capable of audio processing, such as a server, a TV, a camera, a mobile phone, a tablet PC, or a notebook computer.

Although the memory 210 and the processor 230 are illustrated separately in FIG. 2A, they may be implemented as a single hardware module (e.g., a chip).

The processor 230 may be implemented as a dedicated processor for neural-network-based audio processing. Alternatively, it may be implemented as a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU). A dedicated processor may include a memory for implementing an embodiment of the present disclosure, or a memory processing unit for using an external memory.

The processor 230 may consist of a plurality of processors. In this case, it may be implemented as a combination of dedicated processors, or as a combination of software and a plurality of general-purpose processors such as APs, CPUs, or GPUs.

The memory 210 may store one or more instructions for audio processing. In an embodiment, the memory 210 may store a neural network. If the neural network is implemented as a dedicated hardware chip for artificial intelligence, or as part of an existing general-purpose processor (e.g., a CPU or application processor) or a graphics-only processor (e.g., a GPU), the neural network may not be stored in the memory 210. The neural network may also be implemented by an external device (e.g., a server), in which case the audio encoding apparatus 200 may send a request to the external device and receive result information based on the neural network from it.

The processor 230 sequentially processes consecutive frames according to the instructions stored in the memory 210 to obtain consecutive encoded (compressed) frames. The consecutive frames may be the frames that make up the audio.
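
A minimal sketch of frame-by-frame processing is shown below: the signal is cut into consecutive frames and each is handed, in order, to a per-frame encoder, yielding one encoded frame per input frame. Non-overlapping frames, zero-padding of the last frame, and the helper names `split_frames` and `encode_stream` are assumptions; transform codecs commonly use overlapping windows instead, and the frame length is not specified here.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_frames(samples: Sequence[float], frame_len: int) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping frames; the last one is zero-padded."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))
        yield frame


def encode_stream(samples: Sequence[float], frame_len: int,
                  encode_frame: Callable[[List[float]], T]) -> List[T]:
    """Process frames strictly in order, producing one encoded frame per input frame."""
    return [encode_frame(f) for f in split_frames(samples, frame_len)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
frames = list(split_frames(signal, 2))
encoded = encode_stream(signal, 2, sum)  # `sum` stands in for a real frame encoder
```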

The processor 230 may take an original audio signal as input, perform audio processing operations, and output a bitstream including compressed audio signals. The original audio signal may be a multi-channel audio signal. The compressed audio signal may be a multi-channel audio signal having a number of channels less than or equal to that of the original audio signal.

The bitstream may include a basic channel group and, further, n dependent channel groups (n being an integer greater than or equal to 1). The number of channels can therefore be increased freely according to the number of dependent channel groups.

FIG. 2B is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.

Referring to FIG. 2B, the audio encoding apparatus 200 may include a multi-channel audio encoder 250, a bitstream generator 280, and an additional information generator 285. The multi-channel audio encoder 250 may include a multi-channel audio signal processing unit 260 and a compression unit 270.

Referring back to FIG. 2A, as described above, the audio encoding apparatus 200 may include the memory 210 and the processor 230, and instructions for implementing each of the components 250, 260, 270, 280, and 285 of FIG. 2B may be stored in the memory 210 of FIG. 2A. The processor 230 may execute the instructions stored in the memory 210.

The multi-channel audio signal processing unit 260 may obtain, from the original audio signal, at least one audio signal of a basic channel group and at least one audio signal of at least one dependent channel group. For example, if the original audio signal is a 7.1.4-channel audio signal, the multi-channel audio signal processing unit 260 may obtain a 2-channel (stereo) audio signal from it as the audio signal of the basic channel group.

To enable reconstruction of the audio signal of the 3.1.2 channel layout, one of the 3D audio channel layouts in front of the listener, the multi-channel audio signal processing unit 260 may obtain, as the audio signals of the first dependent channel group, the audio signals of the channels of the 3.1.2 layout other than the two stereo channels. In this case, audio signals of some channels of the first dependent channel group may be demixed to generate audio signals of de-mixed channels.

To reconstruct an audio signal of the 5.1.2 channel layout, one of the 3D audio channel layouts in front of and behind the listener, the multi-channel audio signal processing unit 260 may obtain, as the audio signal of a second dependent channel group, the audio signals of the channels of the 5.1.2 layout other than those of the base channel group and the first dependent channel group. In this case, audio signals of some channels of the second dependent channel group may be demixed to generate an audio signal of a de-mixed channel.

To reconstruct an audio signal of the 7.1.4 channel layout, one of the 3D audio channel layouts all around the listener, the multi-channel audio signal processing unit 260 may obtain, as the audio signal of a third dependent channel group, the audio signals of the channels of the 7.1.4 layout other than those of the base channel group and the first and second dependent channel groups. Similarly, audio signals of some channels of the third dependent channel group may be demixed to obtain an audio signal of a de-mixed channel.
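The number of signals carried by each group in this stereo → 3.1.2 → 5.1.2 → 7.1.4 hierarchy can be counted with a short Python sketch. It assumes only that a layout string "S.W.H" denotes S surround, W subwoofer, and H height channels, and that each dependent group carries as many signals as the next layout adds over the previous one. `channel_count` and `group_sizes` are hypothetical helpers for illustration, not components of the described apparatus.

```python
def channel_count(layout: str) -> int:
    """Total channels in a layout string such as '7.1.4' or '2' (hypothetical helper)."""
    return sum(int(part) for part in layout.split("."))


def group_sizes(hierarchy: list[str]) -> list[int]:
    """Size of the base group followed by the size of each dependent group.

    Each dependent group is assumed to carry exactly the channels that the
    next layout in the hierarchy adds to the layout below it.
    """
    counts = [channel_count(layout) for layout in hierarchy]
    return [counts[0]] + [upper - lower for lower, upper in zip(counts, counts[1:])]


print(group_sizes(["2", "3.1.2", "5.1.2", "7.1.4"]))  # [2, 4, 2, 4]
```

The first dependent group therefore holds the four 3.1.2 channels left after removing the stereo pair, matching the description above, and the group sizes always sum to the channel count of the top layout.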

A detailed operation of the multi-channel audio signal processing unit 260 will be described later with reference to FIG. 2C.

The compression unit 270 may compress the audio signal of the base channel group and the audio signal of the dependent channel group. That is, the compression unit 270 may compress at least one audio signal of the base channel group to obtain at least one compressed audio signal of the base channel group. Here, compression may mean compression based on any of various audio codecs; for example, it may include transform and quantization processes.

Here, the audio signal of the base channel group may be a mono or stereo signal. Alternatively, the audio signal of the base channel group may include an audio signal of a first channel generated by mixing the audio signal L of the left stereo channel with C_1, where C_1 may be the audio signal of the center channel in front of the listener after it has been compressed and then decompressed. In a signal name of the form "X_Y", "X" denotes the channel name and "Y" indicates whether the signal has been decoded, upmixed, scaled by a factor for error removal, or had an LFE gain applied. For example, a decoded signal may be written "X_1", and a signal generated by upmixing the decoded signal (an upmixed signal) may be written "X_2". A decoded LFE signal to which the LFE gain has been applied may also be written "X_2". An upmixed signal to which the error-removal factor has been applied (a scaled signal) may be written "X_3".

Likewise, the audio signal of the base channel group may include an audio signal of a second channel generated by mixing the audio signal R of the right stereo channel with C_1.
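The mixing of a stereo channel with the decoded center C_1 can be illustrated with a minimal Python sketch. The mixing weight `w = 0.5` and the sample values are hypothetical; the text does not specify the weight. The sketch only shows that when the encoder mixes in the same C_1 that a decoder will have after decompressing the center channel, subtracting the same weighted C_1 recovers the side channel exactly (ignoring coding error on the mixed channel itself).

```python
def mix(side: list[float], center_1: list[float], w: float = 0.5) -> list[float]:
    """Encoder side: mix a stereo side channel (L or R) with the decoded centre C_1."""
    return [s + w * c for s, c in zip(side, center_1)]


def demix(mixed: list[float], center_1: list[float], w: float = 0.5) -> list[float]:
    """Decoder side: subtract the same weighted C_1 to recover the side channel."""
    return [m - w * c for m, c in zip(mixed, center_1)]


L = [0.25, -0.5, 1.0]
C_1 = [0.5, 0.5, -1.0]   # stands in for the compressed-then-decompressed centre
L_mixed = mix(L, C_1)    # [0.5, -0.25, 0.5]
print(demix(L_mixed, C_1) == L)  # True
```

The sample values are exact binary fractions, so the round trip compares equal without floating-point tolerance.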

The compression unit 270 may also compress at least one audio signal of at least one dependent channel group to obtain at least one compressed audio signal of that dependent channel group.

The additional information generator 285 may generate additional information based on at least one of the original audio signal, the compressed audio signal of the base channel group, and the compressed audio signal of the dependent channel group. Here, the additional information is information related to the multi-channel audio signal and may be any of various kinds of information used to reconstruct the multi-channel audio signal.

For example, the additional information may include an audio object signal of a 3D audio channel in front of the listener, indicating at least one of the audio signal, position, shape, area, and direction of an audio object (sound source). Alternatively, the additional information may include information about the total number of audio streams, including the base channel audio stream and the dependent channel audio streams. The additional information may also include downmix gain information, channel mapping table information, loudness information, Low Frequency Effect gain (LFE gain) information, dynamic range control (DRC) information, and channel layout rendering information. It may further include information on the number of coupled audio streams, information indicating the multi-channel layout, information on whether dialogue is present in the audio signal and on its dialogue level, information indicating whether the low-frequency effect (LFE) channel is output, information on whether an on-screen audio object is present, information on whether an audio signal of continuous audio channels (a scene-based audio signal, or Ambisonic audio signal) is present, and information on whether an audio signal of discrete audio channels (an object-based audio signal, or spatial multi-channel audio signal) is present. The additional information may also include information about demixing, including at least one demixing weight parameter of a demixing matrix used to reconstruct the multi-channel audio signal. Because demixing and (down)mixing are counterpart operations, information about demixing corresponds to information about (down)mixing and may include it. For example, the information about demixing may include at least one (down)mixing weight parameter of a (down)mixing matrix, from which a demixing weight parameter may be obtained.

The additional information may be any combination of the above; that is, it may include at least one of the pieces of information described above.
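Because the additional information may be any combination of the items listed above, one natural way to model it is a record in which every field is optional. The following Python sketch is hypothetical: the class name, field names, and units are illustrative assumptions, not a bitstream syntax defined in this document, and only a subset of the listed items is shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SideInfo:
    """Hypothetical container for the additional information.

    Every field is optional because the text allows any combination of items.
    """
    num_audio_streams: Optional[int] = None             # base + dependent streams
    downmix_gains: Optional[dict[str, float]] = None    # downmix gain information
    loudness: Optional[float] = None                    # loudness information
    lfe_gain_db: Optional[float] = None                 # LFE gain information
    drc_profile: Optional[str] = None                   # dynamic range control
    has_dialogue: Optional[bool] = None                 # dialogue present?
    dialogue_level: Optional[float] = None              # dialogue level
    downmix_weights: dict[str, float] = field(default_factory=dict)  # demix weights derive from these

    def present_fields(self) -> list[str]:
        """Names of the fields actually carried (not None and not an empty dict)."""
        return [name for name, value in vars(self).items() if value not in (None, {})]


info = SideInfo(num_audio_streams=5, lfe_gain_db=0.0)
print(info.present_fields())  # ['num_audio_streams', 'lfe_gain_db']
```

A value of `0.0` still counts as present, which distinguishes "LFE gain of 0 dB" from "no LFE gain information".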

When an audio signal of a dependent channel corresponding to at least one audio signal of the base channel group exists, the additional information generator 285 may generate dependent channel audio signal identification information indicating that the audio signal of the dependent channel exists.

The bitstream generator 280 may generate a bitstream including the compressed audio signal of the base channel group and the compressed audio signal of the dependent channel group. The bitstream generator 280 may also include in the bitstream the additional information generated by the additional information generator 285.

Specifically, the bitstream generator 280 may generate a base channel audio stream and a dependent channel audio stream. The base channel audio stream may include the compressed audio signal of the base channel group, and the dependent channel audio stream may include the compressed audio signal of the dependent channel group.

The bitstream generator 280 may generate a bitstream including a base channel audio stream and a plurality of dependent channel audio streams. The plurality of dependent channel audio streams may include n dependent channel audio streams, where n is an integer greater than 1. In this case, the base channel audio stream may include a compressed audio signal of a mono channel or of stereo channels.

For example, among the channels of a first multi-channel layout reconstructed from the base channel audio stream and the first dependent channel audio stream, the number of surround channels may be $S_{n-1}$, the number of subwoofer channels $W_{n-1}$, and the number of height channels $H_{n-1}$. Among the channels of a second multi-channel layout reconstructed from the base channel audio stream, the first dependent channel audio stream, and the second dependent channel audio stream, the number of surround channels may be $S_n$, the number of subwoofer channels $W_n$, and the number of height channels $H_n$.

Here, $S_{n-1} \le S_n$, $W_{n-1} \le W_n$, and $H_{n-1} \le H_n$. The case in which $S_{n-1} = S_n$, $W_{n-1} = W_n$, and $H_{n-1} = H_n$ all hold at the same time may be excluded.

In other words, the second multi-channel layout must have more surround channels, more subwoofer channels, or more height channels than the first multi-channel layout.

Moreover, the second multi-channel layout cannot have fewer surround channels than the first multi-channel layout. Likewise, it cannot have fewer subwoofer channels, or fewer height channels, than the first multi-channel layout.

Furthermore, the numbers of surround, subwoofer, and height channels of the second multi-channel layout cannot all be equal to the corresponding numbers of the first multi-channel layout. That is, the second multi-channel layout cannot be identical to the first multi-channel layout in all of its channels.

As a specific example, if the first multi-channel layout is the 5.1.2 channel layout, the second multi-channel layout may be the 7.1.4 channel layout.
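The constraints on consecutive layouts (no count may decrease, and the layouts may not be identical) can be checked with a small Python sketch. The layout-string parsing and the function names are assumptions made for illustration; the rule itself is the one stated above.

```python
def parse_layout(layout: str) -> tuple[int, int, int]:
    """Split 'S.W.H' into (surround, subwoofer, height); '2' is read as 2.0.0."""
    parts = [int(p) for p in layout.split(".")] + [0, 0]
    return parts[0], parts[1], parts[2]


def is_valid_extension(lower: str, upper: str) -> bool:
    """True if `upper` may follow `lower` in the hierarchy:
    S, W and H never decrease, and the two layouts are not identical."""
    lo, up = parse_layout(lower), parse_layout(upper)
    no_decrease = all(u >= l for l, u in zip(lo, up))
    return no_decrease and lo != up


print(is_valid_extension("5.1.2", "7.1.4"))  # True
print(is_valid_extension("5.1.2", "5.1.2"))  # False: identical layouts
print(is_valid_extension("5.1.2", "7.1.0"))  # False: height count decreases
```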

The bitstream generator 280 may also generate metadata including the additional information.

As a result, the bitstream generator 280 may generate a bitstream including the base channel audio stream, the dependent channel audio stream, and the metadata.

The bitstream generator 280 may generate a bitstream in a form that allows the number of channels to be freely increased starting from the base channel group.

That is, the audio signal of the base channel group may be reconstructed from the base channel audio stream alone, and a multi-channel audio signal with more channels than the base channel group may be reconstructed from the base channel audio stream together with the dependent channel audio stream.

Meanwhile, the bitstream generator 280 may generate a file stream having a plurality of audio tracks. The bitstream generator 280 may generate an audio stream of a first audio track including at least one compressed audio signal of the base channel group, and an audio stream of a second audio track including the dependent channel audio signal identification information. The second audio track follows the first audio track, and the two tracks may be adjacent.

When a dependent channel audio signal corresponding to at least one audio signal of the base channel group exists, the bitstream generator 280 may generate the audio stream of the second audio track so that it includes at least one compressed audio signal of at least one dependent channel group.

On the other hand, when no dependent channel audio signal corresponding to at least one audio signal of the base channel group exists, the bitstream generator 280 may generate the audio stream of the second audio track so that it includes the audio signal of the next base channel group, following the audio signal of the base channel group in the first audio track.
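The track-ordering rule above can be sketched in Python as follows. This is a hypothetical model: a track is represented as a `(kind, payload)` tuple, and the `"dependent"` kind stands in for the dependent channel audio signal identification information. Real file formats carry this differently.

```python
def assign_tracks(base_groups: list[str], dependents: dict[int, list[str]]) -> list[tuple[str, str]]:
    """Order tracks so that each base group is immediately followed by its
    dependent streams, if any; otherwise the next base group follows directly.

    base_groups: payloads of successive base channel groups (hypothetical names).
    dependents:  index of a base group -> payloads of its dependent channel groups.
    """
    tracks = []
    for index, base in enumerate(base_groups):
        tracks.append(("base", base))
        for dependent in dependents.get(index, []):
            tracks.append(("dependent", dependent))  # marks a dependent-channel track
    return tracks


print(assign_tracks(["B0", "B1"], {0: ["D0"]}))
# [('base', 'B0'), ('dependent', 'D0'), ('base', 'B1')]
print(assign_tracks(["B0", "B1"], {}))
# [('base', 'B0'), ('base', 'B1')]
```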

FIG. 2C is a block diagram illustrating a configuration of a multi-channel audio signal processing unit according to an embodiment.

Referring to FIG. 2C, the multi-channel audio signal processing unit 260 includes a channel layout identification unit 261, a downmix channel audio generator 262, and an audio signal classifier 266.

The channel layout identification unit 261 may identify at least one channel layout from the original audio signal; the at least one channel layout may include a plurality of hierarchical channel layouts. The channel layout identification unit 261 may identify the channel layout of the original audio signal, and may also identify channel layouts lower than it. For example, when the original audio signal has the 7.1.4 channel layout, the channel layout identification unit 261 may identify the 7.1.4 channel layout as well as lower channel layouts such as the 5.1.2, 3.1.2, and 2-channel layouts. A higher channel layout is one that has more channels than a lower channel layout in at least one of the surround, subwoofer, and height categories. Which layout is higher may be determined first by the number of surround channels; when the numbers of surround channels are equal, by the number of subwoofer channels; and when the numbers of surround channels and of subwoofer channels are both equal, by the number of height channels.
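The higher/lower ordering described above (surround count first, then subwoofer count, then height count) is a lexicographic comparison, which a Python tuple key expresses directly. `layout_key` and `is_higher` are hypothetical helper names for this sketch.

```python
def layout_key(layout: str) -> tuple[int, ...]:
    """(surround, subwoofer, height) tuple; missing fields count as 0, so '2' -> (2, 0, 0)."""
    parts = [int(p) for p in layout.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def is_higher(a: str, b: str) -> bool:
    """True if layout `a` ranks above layout `b`: compare surround counts, then
    subwoofer counts on a tie, then height counts on a further tie."""
    return layout_key(a) > layout_key(b)


print(sorted(["7.1.4", "2", "5.1.2", "3.1.2", "5.1.4"], key=layout_key))
# ['2', '3.1.2', '5.1.2', '5.1.4', '7.1.4']
```

Python compares tuples element by element, so the second and third entries are consulted only when the earlier ones tie, exactly as the ordering rule requires.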

The identified channel layouts may also include a target channel layout, that is, the highest channel layout of the audio signal included in the finally output bitstream. The target channel layout may be the channel layout of the original audio signal or a channel layout lower than it.

Specifically, the channel layouts identified from the original audio signal may be determined hierarchically from the channel layout of the original audio signal. In this case, the channel layout identification unit 261 may identify at least one channel layout among predetermined channel layouts. For example, from the 7.1.4 channel layout of the original audio signal, the channel layout identification unit 261 may identify the 7.1.4, 5.1.4, 5.1.2, 3.1.2, and 2-channel layouts, which are some of the predetermined channel layouts.
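One way to pick the hierarchy from a set of predetermined layouts is to keep every layout whose surround, subwoofer, and height counts are each no larger than the original's, then list them from highest to lowest. This selection criterion and the predetermined list (including a 9.1.6 entry added only to show exclusion) are assumptions for this Python sketch; with them, the result matches the 7.1.4 example above.

```python
def counts(layout: str) -> tuple[int, int, int]:
    """(surround, subwoofer, height); '2' is read as (2, 0, 0)."""
    parts = [int(p) for p in layout.split(".")] + [0, 0]
    return parts[0], parts[1], parts[2]


def identify_layouts(original: str, predetermined: list[str]) -> list[str]:
    """Predetermined layouts that fit inside `original` on every axis, highest first."""
    top = counts(original)
    fitting = [l for l in predetermined if all(c <= t for c, t in zip(counts(l), top))]
    return sorted(fitting, key=counts, reverse=True)


PREDETERMINED = ["2", "3.1.2", "5.1.2", "5.1.4", "7.1.4", "9.1.6"]  # hypothetical set
print(identify_layouts("7.1.4", PREDETERMINED))
# ['7.1.4', '5.1.4', '5.1.2', '3.1.2', '2']
```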

Based on the identified channel layouts, the channel layout identification unit 261 may transmit a control signal to whichever of the first downmix channel audio generator 263, the second downmix channel audio generator 264, ..., and the N-th downmix channel audio generator 265 corresponds to each identified channel layout. The downmix channel audio generator 262 may then generate downmix channel audio from the original audio signal based on the at least one channel layout identified by the channel layout identification unit 261, using a downmixing matrix that includes at least one downmixing weight parameter.

For example, when the channel layout of the original audio signal is the n-th of the predetermined channel layouts in ascending order, the downmix channel audio generator 262 may generate, from the original audio signal, downmix channel audio of the (n-1)-th channel layout, the layout immediately below the original one. By repeating this process, the downmix channel audio generator 262 may generate downmix channel audio for each channel layout below the current one.

For example, the downmix channel audio generator 262 may include the first downmix channel audio generator 263, the second downmix channel audio generator 264, ..., and an (n-1)-th downmix channel audio generator (not shown), where n-1 may be less than or equal to N.

In this case, the (n-1)-th downmix channel audio generator (not shown) may generate an audio signal of the (n-1)-th channel layout from the original audio signal, and the (n-2)-th downmix channel audio generator (not shown) may generate an audio signal of the (n-2)-th channel layout from the original audio signal. Continuing in this way, the first downmix channel audio generator 263 may generate an audio signal of the first channel layout from the original audio signal. The audio signal of the first channel layout may be the audio signal of the base channel group.

Meanwhile, the downmix channel audio generators 263, 264, ..., 265 may be connected in cascade, so that the output of a higher downmix channel audio generator becomes the input of the next lower one. For example, the (n-1)-th downmix channel audio generator (not shown) may take the original audio signal as input and output an audio signal of the (n-1)-th channel layout; that signal may then be input to the (n-2)-th downmix channel audio generator (not shown), which generates the (n-2)-th downmix channel audio. Connected in this way, the downmix channel audio generators 263, 264, ..., 265 may output an audio signal for each channel layout.
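The cascade can be sketched in Python as repeated application of downmixing matrices, where each stage's output feeds the next stage. For brevity the sketch starts from 3.1.2 rather than 7.1.4. The matrix weights are hypothetical values chosen so the arithmetic is easy to follow; they are not the downmixing weight parameters of the described apparatus. A matrix is modeled as a mapping from each output channel to `{input channel: weight}`.

```python
Signal = dict[str, list[float]]          # channel name -> samples
Matrix = dict[str, dict[str, float]]     # output channel -> {input channel: weight}


def downmix(signal: Signal, matrix: Matrix) -> Signal:
    """Apply one downmixing matrix: each output is a weighted sum of input channels."""
    num_samples = len(next(iter(signal.values())))
    return {
        out_ch: [sum(w * signal[in_ch][i] for in_ch, w in weights.items())
                 for i in range(num_samples)]
        for out_ch, weights in matrix.items()
    }


def cascade(signal: Signal, matrices: list[Matrix]) -> list[Signal]:
    """Feed each stage's output into the next stage; return every layout's signal."""
    outputs = [signal]
    for matrix in matrices:
        outputs.append(downmix(outputs[-1], matrix))
    return outputs


# Hypothetical weights: 3.1.2 -> 2 channels -> mono.
TO_STEREO = {"L2": {"L3": 1.0, "C": 0.5, "Hfl3": 0.5},
             "R2": {"R3": 1.0, "C": 0.5, "Hfr3": 0.5}}
TO_MONO = {"M": {"L2": 0.5, "R2": 0.5}}

x_312 = {"L3": [1.0], "R3": [0.5], "C": [0.5], "LFE": [0.0], "Hfl3": [0.25], "Hfr3": [0.75]}
layouts = cascade(x_312, [TO_STEREO, TO_MONO])
print(layouts[1], layouts[2])  # {'L2': [1.375], 'R2': [1.125]} {'M': [1.25]}
```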

The audio signal classifier 266 may obtain the audio signal of the base channel group and the audio signal of the dependent channel groups based on the audio signal of at least one channel layout. In doing so, the audio signal classifier 266 may use the mixing unit 267 to mix the audio signals of at least one channel included in the audio signal of the at least one channel layout, and may classify the mixed audio signal as at least one of the audio signal of the base channel group and the audio signal of a dependent channel group.

FIG. 2D is a diagram for explaining an example of a specific operation of the audio signal classifier.

Referring to FIG. 2D, the downmix channel audio generator 262 of FIG. 2C may obtain, from the original audio signal of the 7.1.4 channel layout 290, the audio signals of the lower channel layouts: the 5.1.2 channel layout 291, the 3.1.2 channel layout 292, the 2-channel layout 293, and the mono channel layout 294. Because the downmix channel audio generators 263, 264, ..., 265 of the downmix channel audio generator 262 are connected in cascade, the audio signal of each lower channel layout is obtained sequentially from the current channel layout.

The audio signal classifier 266 of FIG. 2C may classify the audio signal of the mono channel layout 294 as the audio signal of the base channel group.

The audio signal classifier 266 may classify the audio signal of the L2 channel, one of the audio signals of the 2-channel layout 293, as the audio signal of dependent channel group #1. Because the audio signal of the mono channel layout 294 is generated by mixing the audio signals of the L2 and R2 channels, the audio decoding apparatus 300 or 500 can conversely reconstruct the audio signal of the R2 channel by demixing the audio signal of the mono channel layout 294 with the audio signal of the L2 channel. Therefore, the audio signal of the R2 channel need not be classified into a separate channel group.
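The recovery of R2 from the mono signal and L2 can be written as a two-term inversion: if M = a·L2 + b·R2 with known downmix weights a and b, then R2 = (1/b)·M − (a/b)·L2. This also illustrates how a demixing weight can be derived from a (down)mixing weight parameter. The Python sketch below assumes hypothetical weights a = b = 0.5; the actual weights are not given here.

```python
def demix_weights(a: float, b: float) -> tuple[float, float]:
    """Given M = a*X + b*Y, return (g_m, g_x) such that Y = g_m*M + g_x*X."""
    if b == 0:
        raise ValueError("Y does not contribute to the mix and cannot be recovered")
    return 1.0 / b, -a / b


def demix(mixed: list[float], known: list[float], a: float, b: float) -> list[float]:
    """Recover the unsent channel Y from the mix M and the transmitted channel X."""
    g_m, g_x = demix_weights(a, b)
    return [g_m * m + g_x * x for m, x in zip(mixed, known)]


M = [1.25, 0.0]     # mono signal (base channel group)
L2 = [1.375, 0.5]   # transmitted in dependent channel group #1
print(demix(M, L2, 0.5, 0.5))  # R2 = [1.125, -0.5]
```

The same two-term inversion applies wherever one signal is formed by mixing exactly two channels with known weights, such as recovering L3 from L2 and Hfl3 as described next.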

The audio signal classifier 266 may classify the audio signals of the Hfl3, C, LFE, and Hfr3 channels among the audio signals of the 3.1.2 channel layout 292 as the audio signal of dependent channel group #2. Because the audio signal of the L2 channel is generated by mixing the audio signals of the L3 and Hfl3 channels, the audio decoding apparatus 300 or 500 can conversely reconstruct the audio signal of the L3 channel by demixing the audio signal of the L2 channel in dependent channel group #1 with the audio signal of the Hfl3 channel in dependent channel group #2.

Accordingly, the audio signal of the L3 channel among the audio signals of the 3.1.2 channel layout 292 need not be classified into any particular channel group.

For the same reason, the R3 channel need not be classified into any particular channel group.

To transmit the audio signal of the 5.1.2 channel layout 291, the audio signal classifier 266 may transmit the audio signals of the L and R channels, which are some of the channels of the 5.1.2 channel layout 291, as the audio signal of dependent channel group #3. The audio signals of the Ls5, Hl5, Rs5, and Hr5 channels, although part of the 5.1.2 channel layout 291, are not classified into a separate dependent channel group. This is because these are not channels in front of the listener, and each is a mix of the audio signals of at least one of the front, side, and rear channels of the 7.1.4 channel layout 290. Rather than classifying and compressing such mixed signals as a dependent channel group, compressing the original audio signals of the channels in front of the listener as they are can improve the quality of those front channels. As a result, the listener may perceive the reproduced audio as having higher quality.

In some cases, however, Ls5 or Hl5 may be classified as an audio signal of dependent channel group #3 instead of L. Likewise, Rs5 or Hr5 may be classified as an audio signal of dependent channel group #3 instead of R.

The audio signal classification unit 266 may classify the audio signals of the Ls, Hfl, Rs, and Hfr channels of the 7.1.4 channel layout 290 as the audio signals of dependent channel group #4. In this case, Lb (instead of Ls), Hbl (instead of Hfl), Rb (instead of Rs), and Hbr (instead of Hfr) are not classified as audio signals of dependent channel group #4. One option is to classify and compress the audio signals of the channels behind the listener in the 7.1.4 channel layout 290. Compressing the original audio signals of the side channels closer to the front of the listener as-is can instead improve the sound quality of those side channels. As a result, the listener may perceive the reproduced audio as having better overall sound quality. In some cases, however, the audio signals of the Lb, Hbl, Rb, and Hbr channels may be classified as the audio signals of dependent channel group #4 in place of the Ls, Hfl, Rs, and Hfr channels, respectively.

In summary, the downmix channel audio generation unit 262 of FIG. 2C may generate audio signals of a plurality of lower channel layouts (downmix channel audio) based on the lower channel layouts identified from the original audio signal layout. The audio signal classification unit 266 of FIG. 2C may then classify the audio signals of the basic channel group and of dependent channel groups #1, #2, #3, and #4, based on the original audio signal and the audio signals of the lower layouts. Only some of the independent-channel audio signals in each channel layout are classified into channel groups. The audio decoding apparatuses 300 and 500 may reconstruct, through demixing, the audio signals that were not classified by the audio signal classification unit 266. If the audio signal of a channel on the listener's left is classified into a specific channel group, the audio signal of the corresponding right channel may also be classified into that channel group. That is, the audio signals of coupled channels may be classified into one channel group.

When the audio signal of the stereo channel layout is classified as the basic channel group, the audio signals of the coupled channels may all be classified into one channel group. However, as described above with reference to FIG. 2D, there is an exception when the audio signal of the mono channel layout is classified as the basic channel group. In that case, only one of the two stereo channel audio signals may be classified as dependent channel group #1. The method of classifying audio signals into channel groups is not limited to the one described with reference to FIG. 2D, and various methods may be used. Any classification works as long as the classified channel-group audio signals can be demixed, and the audio signals of the channels not classified into any group can be reconstructed from the demixed signals.
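The coupling rule above can be checked mechanically. The sketch below reports the channels in a group whose left/right partner is missing. It optionally allows the FIG. 2D exception, under which one stereo channel may travel alone in dependent channel group #1. The pair table and the function name are hypothetical. The label "L2"/"R2" for the stereo pair is an assumption used to keep it distinct from the 5.1.2 L/R pair.

```python
# Hypothetical coupled (left, right) pairs, using channel labels from the text.
# "L2"/"R2" is an assumed label for the stereo pair of the lowest layouts.
COUPLED_PAIRS = [
    ("L", "R"), ("L2", "R2"), ("L3", "R3"), ("Ls", "Rs"), ("Lb", "Rb"),
    ("Hfl", "Hfr"), ("Hbl", "Hbr"), ("Hfl3", "Hfr3"),
    ("Ls5", "Rs5"), ("Hl5", "Hr5"),
]

# Map every channel to its partner, in both directions.
PARTNER = {}
for left, right in COUPLED_PAIRS:
    PARTNER[left] = right
    PARTNER[right] = left


def uncoupled_channels(group, mono_base_stereo_exception=False):
    """Return (sorted) channels in `group` whose coupled partner is absent.

    Unpaired channels such as C or LFE are never reported. When
    `mono_base_stereo_exception` is True, a lone stereo channel is allowed,
    modelling the mono-basic-group case described for FIG. 2D.
    """
    members = set(group)
    missing = []
    for ch in sorted(members):
        partner = PARTNER.get(ch)
        if partner is None or partner in members:
            continue  # unpaired channel, or the pair is complete
        if mono_base_stereo_exception and ch in ("L2", "R2"):
            continue  # one stereo channel may be sent alone
        missing.append(ch)
    return missing
```

For example, the dependent channel group #4 described above, `["Ls", "Hfl", "Rs", "Hfr"]`, yields an empty list. A group holding only `["L2"]` is accepted only when the exception flag is set.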

FIG. 3A is a block diagram illustrating the configuration of a multi-channel audio decoding apparatus according to an embodiment.

The audio decoding apparatus 300 includes a memory 310 and a processor 330. The audio decoding apparatus 300 may be implemented as any device capable of audio processing, such as a server, a TV, a camera, a mobile phone, a tablet PC, or a notebook computer.

Although the memory 310 and the processor 330 are illustrated separately in FIG. 3A, they may be implemented as a single hardware module (e.g., a chip).

The processor 330 may be implemented as a dedicated processor for neural-network-based audio processing. Alternatively, the processor 330 may be implemented as a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU). A dedicated processor may include memory for implementing an embodiment of the present disclosure, or a memory processing unit for using an external memory.

The processor 330 may include a plurality of processors. In this case, it may be implemented as a combination of dedicated processors. It may also be implemented as a combination of software and a plurality of general-purpose processors such as APs, CPUs, or GPUs.

The memory 310 may store one or more instructions for audio processing. In one embodiment, the memory 310 may store a neural network. The neural network may not be stored in the memory 310 when it is implemented as a dedicated hardware chip for artificial intelligence. The same applies when it is implemented as part of an existing general-purpose processor (e.g., a CPU or an application processor) or a graphics-dedicated processor (e.g., a GPU). The neural network may also be implemented by an external device (e.g., a server). In this case, the audio decoding apparatus 300 may send a request to the external device and receive result information based on the neural network from it.

The processor 330 sequentially processes consecutive frames according to the instructions stored in the memory 310 to obtain consecutive reconstructed frames. Here, the consecutive frames may be the frames that make up the audio.

The processor 330 may receive a bitstream as input and perform audio processing operations to output a multi-channel audio signal. The bitstream may be structured in a scalable form, so that the number of channels can be increased starting from the basic channel group. For example, the processor 330 may obtain the compressed audio signal of the basic channel group from the bitstream. It may then decompress that signal to reconstruct the audio signal of the basic channel group (e.g., a stereo channel audio signal). In addition, the processor 330 may decompress the compressed audio signal of a dependent channel group from the bitstream to reconstruct the audio signal of that dependent channel group. The processor 330 may then reconstruct a multi-channel audio signal based on the audio signals of the basic channel group and the dependent channel group.

The processor 330 may also decompress the compressed audio signal of a first dependent channel group from the bitstream to reconstruct the audio signal of the first dependent channel group. Likewise, it may decompress the compressed audio signal of a second dependent channel group to reconstruct the audio signal of the second dependent channel group.

Based on the audio signals of the basic channel group and of the first and second dependent channel groups, the processor 330 may reconstruct a multi-channel audio signal with a larger number of channels. Similarly, the processor 330 may decompress the compressed audio signals of up to n dependent channel groups (where n is an integer greater than 2). Based on the audio signals of the basic channel group and the n dependent channel groups, it may then reconstruct a multi-channel audio signal with an even larger number of channels.

FIG. 3B is a block diagram illustrating the configuration of a multi-channel audio decoding apparatus according to an embodiment.

Referring to FIG. 3B, the audio decoding apparatus 300 may include an information obtaining unit 350 and a multi-channel audio decoding unit 360. The multi-channel audio decoding unit 360 may include a decompression unit 370 and a multi-channel audio signal restoration unit 380.

The audio decoding apparatus 300 may include the memory 310 and the processor 330 of FIG. 3A. Instructions for implementing each of the components 350, 360, 370, and 380 of FIG. 3B may be stored in the memory 310. The processor 330 may execute the instructions stored in the memory 310.

The information obtaining unit 350 may obtain the compressed audio signal of the basic channel group from the bitstream. That is, the information obtaining unit 350 may extract from the bitstream a basic channel audio stream containing at least one compressed audio signal of the basic channel group.

The information obtaining unit 350 may also obtain at least one compressed audio signal of at least one dependent channel group from the bitstream. That is, the information obtaining unit 350 may extract from the bitstream at least one dependent channel audio stream containing at least one compressed audio signal of a dependent channel group.

The bitstream may include a basic channel audio stream and a plurality of dependent channel audio streams. The plurality of dependent channel audio streams may include a first dependent channel audio stream and a second dependent channel audio stream.

The following describes the constraints on the channels of two multi-channel audio signals. The first multi-channel audio signal is reconstructed from the basic channel audio stream and the first dependent channel audio stream. The second multi-channel audio signal is reconstructed from the basic channel audio stream and the first and second dependent channel audio streams.

For example, let the first multi-channel layout, reconstructed from the basic channel audio stream and the first dependent channel audio stream, have $S_{n-1}$ surround channels, $W_{n-1}$ subwoofer channels, and $H_{n-1}$ height channels. Let the second multi-channel layout, reconstructed from the basic channel audio stream and the first and second dependent channel audio streams, have $S_n$ surround channels, $W_n$ subwoofer channels, and $H_n$ height channels. Then $S_{n-1} \le S_n$, $W_{n-1} \le W_n$, and $H_{n-1} \le H_n$. The case in which $S_{n-1} = S_n$, $W_{n-1} = W_n$, and $H_{n-1} = H_n$ all hold at the same time is excluded.

That is, at least one of the following must hold. The second multi-channel layout has more surround channels than the first multi-channel layout, or more subwoofer channels, or more height channels.

In addition, none of these counts may decrease. The number of surround channels of the second multi-channel layout cannot be smaller than that of the first multi-channel layout. Likewise, the number of subwoofer channels of the second multi-channel layout cannot be smaller than that of the first, and the number of height channels of the second multi-channel layout cannot be smaller than that of the first.

Furthermore, the second multi-channel layout cannot match the first multi-channel layout in all three counts at once: the same number of surround channels, the same number of subwoofer channels, and the same number of height channels. That is, the channels of the second multi-channel layout cannot all be identical to those of the first multi-channel layout.

As a specific example, if the first multi-channel layout is 5.1.2 channels, the second multi-channel layout may be 7.1.4 channels.
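The layout constraint above reduces to a simple comparison on (surround, subwoofer, height) counts: none may decrease, and they may not all stay the same. The sketch assumes layouts are written in the usual "S.W.H" label form (e.g. "7.1.4"), with missing parts treated as zero. The function names are hypothetical.

```python
def parse_layout(name):
    """Split an 'S.W.H' layout label (e.g. '7.1.4') into integer counts.

    Missing parts count as zero, so '5.1' -> (5, 1, 0) and '2' -> (2, 0, 0).
    """
    parts = [int(p) for p in name.split(".")]
    while len(parts) < 3:
        parts.append(0)
    s, w, h = parts[:3]
    return s, w, h


def is_valid_enhancement(first, second):
    """True if layout `second` may follow layout `first` in the hierarchy.

    Every count must be non-decreasing (S_{n-1} <= S_n, W_{n-1} <= W_n,
    H_{n-1} <= H_n), and the two layouts may not be identical.
    """
    prev, nxt = parse_layout(first), parse_layout(second)
    never_smaller = all(p <= n for p, n in zip(prev, nxt))
    return never_smaller and prev != nxt
```

With this check, "5.1.2" followed by "7.1.4" is accepted. "5.1.2" followed by itself is rejected, and so is "5.1.4" followed by "7.1.2", because the height count drops.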

The bitstream may also be structured as a file stream having a plurality of audio tracks, including a first audio track and a second audio track. The following describes how the information obtaining unit 350 obtains at least one compressed audio signal of at least one dependent channel group according to additional information included in the audio tracks.

The information obtaining unit 350 may obtain at least one compressed audio signal of the basic channel group from the first audio track.

The information obtaining unit 350 may obtain dependent channel audio signal identification information from a second audio track adjacent to the first audio track.

When the dependent channel audio signal identification information indicates that a dependent channel audio signal is present in the second audio track, the information obtaining unit 350 may obtain at least one audio signal of at least one dependent channel group from the second audio track.

When the dependent channel audio signal identification information indicates that no dependent channel audio signal is present in the second audio track, the information obtaining unit 350 may obtain the next audio signal of the basic channel group from the second audio track.
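The track-parsing rule above can be sketched as a single pass over the tracks. A track flagged as carrying dependent channel audio is attached to the most recent basic track; an unflagged track starts the next basic unit. The `AudioTrack` and `Unit` containers and the boolean `dependent_flag` (standing in for the dependent channel audio signal identification information) are hypothetical. Allowing several consecutive dependent tracks generalizes the adjacent-track case in the text and is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioTrack:
    # Hypothetical stand-in for the dependent channel audio signal
    # identification information carried with each track.
    dependent_flag: bool
    payload: bytes


@dataclass
class Unit:
    base: bytes                                   # basic channel group data
    dependents: List[bytes] = field(default_factory=list)


def group_tracks(tracks):
    """Group a flat sequence of tracks into basic + dependent units."""
    units = []
    for track in tracks:
        if track.dependent_flag:
            if not units:
                raise ValueError("dependent track with no preceding basic track")
            units[-1].dependents.append(track.payload)  # dependent channel group
        else:
            units.append(Unit(base=track.payload))      # next basic channel group
    return units
```

A decoder could then pass each unit's basic payload and dependent payloads to the decompression stage.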

The information obtaining unit 350 may obtain additional information related to the reconstruction of the multi-channel audio from the bitstream. That is, the information obtaining unit 350 may extract metadata containing the additional information from the bitstream and obtain the additional information from the extracted metadata.

The decompression unit 370 may decompress at least one compressed audio signal of the basic channel group to reconstruct the audio signal of the basic channel group.

The decompression unit 370 may decompress at least one compressed audio signal of at least one dependent channel group to reconstruct at least one audio signal of the at least one dependent channel group.

The decompression unit 370 may include separate first to n-th decompression units (not shown) for decoding the compressed audio signals of the respective channel groups (n channel groups). The first to n-th decompression units may operate in parallel with one another.
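Because each channel group is coded separately, the per-group decoders can run concurrently. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` to decode all groups in parallel while keeping their order. `decompress_group` is a hypothetical placeholder for a real codec call; it simply turns each payload into a list of samples.

```python
from concurrent.futures import ThreadPoolExecutor


def decompress_group(compressed):
    """Hypothetical per-group decoder; a real one would invoke an audio codec.

    Placeholder behaviour: treat the payload as an iterable of PCM samples.
    """
    return list(compressed)


def decompress_all(groups):
    """Decompress the basic group and n dependent groups in parallel.

    groups[0] is the basic channel group; the rest are dependent groups.
    Executor.map returns results in input order, so group indices are kept.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        return list(pool.map(decompress_group, groups))
```

Threads are used here only for illustration. A CPU-bound native decoder might instead release the GIL or run in separate processes.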

The multi-channel audio signal restoration unit 380 may reconstruct a multi-channel audio signal based on at least one audio signal of the basic channel group and at least one audio signal of at least one dependent channel group.

For example, when the audio signal of the basic channel group is a stereo channel audio signal, the multi-channel audio signal restoration unit 380 may reconstruct the audio signal of a three-dimensional audio channel layout in front of the listener. It does so based on the audio signals of the basic channel group and the first dependent channel group. For example, the three-dimensional audio channel layout in front of the listener may be 3.1.2 channels.

Alternatively, the multi-channel audio signal restoration unit 380 may reconstruct the audio signal of an audio channel layout surrounding the listener in all directions. It does so based on the audio signals of the basic channel group, the first dependent channel group, and the second dependent channel group. For example, this omnidirectional three-dimensional audio channel layout may be 5.1.2 channels or 7.1.4 channels.

The multi-channel audio signal restoration unit 380 may reconstruct the multi-channel audio signal based not only on the audio signals of the basic channel group and the dependent channel groups but also on additional information for reconstructing the multi-channel audio signal. The multi-channel audio signal restoration unit 380 may output at least one reconstructed multi-channel audio signal.

According to an embodiment, the multi-channel audio signal restoration unit 380 may generate a first audio signal of a three-dimensional audio channel layout in front of the listener. The first audio signal is generated from at least one audio signal of the basic channel group and at least one audio signal of the at least one dependent channel group. The multi-channel audio signal restoration unit 380 may then reconstruct a multi-channel audio signal that includes a second audio signal of the three-dimensional audio channel layout in front of the listener. This reconstruction is based on the first audio signal and an audio object signal of an audio channel in front of the listener. Here, the audio object signal may represent at least one of the audio signal, shape, area, position, and direction of an audio object (sound source), and may be obtained by the information obtaining unit 350.

The detailed operation of the multi-channel audio signal restoration unit 380 is described below with reference to FIG. 3C.

FIG. 3C is a block diagram illustrating the configuration of a multi-channel audio signal restoration unit according to an embodiment.

Referring to FIG. 3C, the multi-channel audio signal restoration unit 380 may include an upmix channel group audio generation unit 381 and a rendering unit 386.

The upmix channel group audio generation unit 381 may generate the audio signal of an upmix channel group based on the audio signals of the basic channel group and the dependent channel group. The audio signal of the upmix channel group may be a multi-channel audio signal. It may additionally be generated based on additional information (e.g., information on dynamic demixing weight parameters).

The upmix channel group audio generation unit 381 may generate the audio signal of an upmix channel by demixing the audio signal of the basic channel group together with some of the audio signals of the dependent channel group. For example, the audio signals L and R of the basic channel group may be demixed with C, one of the audio signals of the dependent channel group. This generates the audio signals L3 and R3 of demixed (or upmixed) channels.

The upmix channel group audio generation unit 381 may generate the audio signals of some channels of the multi-channel audio signal by bypassing the demixing operation for some of the audio signals of the dependent channel group. For example, it may bypass demixing for the C, LFE, Hfl3, and Hfr3 channel audio signals of the dependent channel group. These signals are then output directly as the C, LFE, Hfl3, and Hfr3 channels of the multi-channel audio signal.

In summary, the upmix channel group audio generation unit 381 may generate the audio signal of the upmix channel group from two sources: the upmix channel audio signals generated through demixing, and the dependent channel group audio signals for which demixing was bypassed. For example, it may take the L3 and R3 channel audio signals obtained by demixing and the C, LFE, Hfl3, and Hfr3 channel audio signals of the dependent channel group. From these, it may generate a 3.1.2-channel audio signal consisting of the L3, R3, C, LFE, Hfl3, and Hfr3 channels.
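The demix-plus-bypass step above can be sketched as follows. The section does not give the downmix equations, so this sketch assumes, hypothetically, that the encoder formed the stereo pair as L = L3 + w·C and R = R3 + w·C. Demixing then reduces to subtracting the weighted C signal. The weight value, the dictionary-of-lists signal format, and the "L2"/"R2" labels for the basic-group stereo channels are all assumptions. C, LFE, Hfl3, and Hfr3 are passed through unchanged, as the text describes.

```python
DOWNMIX_WEIGHT_C = 0.707  # hypothetical weight applied to C at the encoder


def demix_to_312(base, dep1, w=DOWNMIX_WEIGHT_C):
    """Rebuild a 3.1.2 layout from a stereo basic group and a dependent group.

    base: {'L2': [...], 'R2': [...]}   stereo pair (assumed labels)
    dep1: {'C': [...], 'LFE': [...], 'Hfl3': [...], 'Hfr3': [...]}
    Assumes the encoder formed L2 = L3 + w*C and R2 = R3 + w*C.
    """
    c = dep1["C"]
    out = {
        # Demixed channels: remove the C contribution from the stereo pair.
        "L3": [l - w * x for l, x in zip(base["L2"], c)],
        "R3": [r - w * x for r, x in zip(base["R2"], c)],
    }
    # Channels carried verbatim in the dependent group bypass demixing.
    for name in ("C", "LFE", "Hfl3", "Hfr3"):
        out[name] = list(dep1[name])
    return out
```

If the encoder also folded the height channels into the stereo pair, or used dynamic demixing weights, the subtraction would need those extra terms.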

The detailed operation of the upmix channel group audio generation unit 381 is described below with reference to FIG. 3D.

The rendering unit 386 may include a volume control unit 388 and a limiter 389. The multi-channel audio signal input to the rendering unit 386 may be a multi-channel audio signal of at least one channel layout, and it may be a pulse-code modulation (PCM) signal.

The loudness of the audio signal of each channel may be measured according to ITU-R BS.1770 and signaled through the additional information in the bitstream.

Based on the loudness information signaled through the bitstream, the volume control unit 388 may adjust the loudness of the audio signal of each channel to a target loudness (e.g., -24 LKFS) and output the result.
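Because loudness in LKFS is a logarithmic (dB-like) scale, normalizing to a target loudness amounts to applying a single gain equal to the difference between the target and the signalled value. That gain is then converted to an amplitude ratio. The minimal sketch below assumes the measured loudness is already available from the bitstream metadata. It does not implement the BS.1770 measurement itself, and the function names are hypothetical.

```python
def loudness_gain_db(measured_lkfs, target_lkfs=-24.0):
    """Gain in dB that moves a signal from its signalled loudness to the target."""
    return target_lkfs - measured_lkfs


def apply_gain(samples, gain_db):
    """Scale samples by a gain given in dB (amplitude ratio 10 ** (dB / 20))."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]
```

For example, a programme signalled at -18 LKFS would receive a gain of -6 dB to reach the -24 LKFS target.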

The true peak may likewise be measured according to ITU-R BS.1770.

After loudness control, the limiter 389 may limit the true peak level of the audio signal (e.g., to -1 dBTP).
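A very simplified peak limiter is sketched below. It converts the dBTP ceiling to a linear amplitude and, if the block's peak exceeds it, scales the whole block down by one static gain. This is a deliberate simplification. It uses the sample peak as a proxy for the true peak, whereas BS.1770 estimates true peak on an oversampled signal, so inter-sample peaks can be missed. A practical limiter would also apply a smoothly time-varying gain. The function name is hypothetical.

```python
def limit_peak(samples, ceiling_dbtp=-1.0):
    """Scale a block down so its sample peak does not exceed the ceiling.

    Simplifications: the sample peak stands in for the true peak (no
    oversampling), and a single static gain is applied to the whole block.
    """
    ceiling = 10.0 ** (ceiling_dbtp / 20.0)      # -1 dBTP is about 0.891
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= ceiling:
        return list(samples)                     # already within the limit
    scale = ceiling / peak
    return [s * scale for s in samples]
```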

The post-processing components 388 and 389 included in the rendering unit 386 have been described above, but the present disclosure is not limited thereto. At least one of the components may be omitted, and the order of the components may be changed in some cases.

The multi-channel audio signal output unit 390 may output at least one post-processed multi-channel audio signal. For example, the multi-channel audio signal output unit 390 may receive the post-processed multi-channel audio signal. According to the target channel layout, it may then output the audio signal of each channel to the audio output device corresponding to that channel. The audio output devices may include various types of speakers.

FIG. 3D is a block diagram illustrating the configuration of an upmix channel group audio generation unit according to an embodiment.

Referring to FIG. 3D, the upmix channel group audio generation unit 381 may include a demixing unit 382. The demixing unit 382 may include a first demixing unit 383, a second demixing unit 384, ..., and an N-th demixing unit 385.

The demixing unit 382 may obtain the audio signal of a new channel (an upmix or demix channel) from the audio signals of some of the decoded channels of the basic channel group and the dependent channel group. That is, the demixing unit 382 may obtain the audio signal of one upmix channel from at least one audio signal in which several channels are mixed. The demixing unit 382 may output an audio signal of a specific layout that includes both the upmix channel audio signals and the decoded channel audio signals.

For example, the audio signal of the base channel group may bypass the demixing operation of the demixing unit 382 and be output as the audio signal of the first channel layout.

The first demixing unit 383 may receive the audio signal of the base channel group and the audio signal of the first dependent channel group as inputs and demix the audio signals of some of the channels, thereby generating the audio signal of a demix channel (or upmix channel). For the remaining channels, the first demixing unit 383 may bypass the demixing operation, producing the audio signals of independent channels. The first demixing unit 383 may output the audio signal of the second channel layout, which includes the audio signal of the upmix channel and the audio signals of the independent channels.

The second demixing unit 384 may generate the audio signal of a demix channel (or upmix channel) by demixing the audio signals of some of the channels among the audio signal of the second channel layout and the audio signal of the second dependent channel group. For the remaining channels, the second demixing unit 384 may bypass the demixing operation, producing the audio signals of independent channels. The second demixing unit 384 may output the audio signal of the third channel layout, which includes the audio signal of the upmix channel and the audio signals of the independent channels.

Similarly to the second demixing unit 384, an n-th demixing unit (not shown) may output the audio signal of the n-th channel layout based on the audio signal of the (n-1)-th channel layout and the audio signal of the (n-1)-th dependent channel group, where n is less than or equal to N.

The N-th demixing unit 385 may output the audio signal of the N-th channel layout based on the audio signal of the (N-1)-th channel layout and the audio signal of the (N-1)-th dependent channel group.

Although FIG. 3D shows the audio signal of each lower channel layout being input directly to the demixing units 383, 384, ..., 385, the audio signal of a channel layout output through the rendering unit 386 of FIG. 3C may instead be input to them. That is, the post-processed audio signal of the lower channel layout may be input to each of the demixing units 383, 384, ..., 385.

With reference to FIG. 3D, it has been described that the demixing units 383, 384, ..., 385 are connected in a cascade to output the audio signal of each channel layout.

However, the audio signal of a specific layout may also be output from the audio signal of the base channel group and the audio signal of at least one dependent channel group without the demixing units 383, 384, ..., 385 being connected in a cascade.
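The cascade can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each stage inverts downmix rules of the form `mixed = target + weight * known`, where `known` is a channel already available at that stage; the channel names (`L2`, `L3`, `C`, ...), the rule tuples, and the functions `demix_stage` and `demix_cascade` are hypothetical and not taken from this description. Channels that no rule consumes are bypassed and pass through unchanged as independent channels.

```python
from typing import Dict, List, Sequence, Tuple

Signal = List[float]
Layout = Dict[str, Signal]
# Hypothetical rule (target, mixed_source, known_source, weight):
# the encoder formed mixed = target + weight * known, so target = mixed - weight * known.
Rule = Tuple[str, str, str, float]


def demix_stage(lower: Layout, dependent: Layout, rules: Sequence[Rule]) -> Layout:
    """One demixing stage: (n-1)-th layout + dependent group -> next layout."""
    available = {**lower, **dependent}
    out: Layout = {}
    consumed = set()
    for target, mixed, known, weight in rules:
        out[target] = [m - weight * k for m, k in zip(available[mixed], available[known])]
        consumed.add(mixed)  # the mixed channel is replaced by the demixed target
    # Bypass: channels not used as a mixed source become independent channels.
    for name, samples in available.items():
        if name not in consumed and name not in out:
            out[name] = list(samples)
    return out


def demix_cascade(base: Layout,
                  stages: Sequence[Tuple[Layout, Sequence[Rule]]]) -> List[Layout]:
    """Chain stages as in FIG. 3D; returns every layout produced, base layout first."""
    layouts = [dict(base)]  # the base group bypasses demixing (first layout)
    for dependent, rules in stages:
        layouts.append(demix_stage(layouts[-1], dependent, rules))
    return layouts
```

A non-cascaded implementation would instead map the base group and all dependent groups to the target layout in one step, for example with a single combined demixing matrix.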

In the audio encoding apparatuses 200 and 400, an audio signal generated by mixing the signals of several channels has its level lowered by a downmix gain to prevent clipping. For such a mixed signal, the audio decoding apparatuses 300 and 500 may restore the level of the audio signal to that of the original audio signal based on the corresponding downmix gain.

The downmix-gain-based operation described above may be performed per channel or per channel group. The audio encoding apparatuses 200 and 400 may signal information about the downmix gain of each channel or channel group in the side information of the bitstream. Accordingly, the audio decoding apparatuses 300 and 500 may obtain the per-channel or per-channel-group downmix gain information from the side information of the bitstream and perform the above operation based on it.
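A minimal sketch of the level restoration, assuming the signaled downmix gain is a linear factor g with 0 < g <= 1 that the encoder multiplied into the mixed signal. The text does not specify how the gain is represented (it could equally be signaled in dB), and the function and argument names are hypothetical.

```python
from typing import Dict, List

Signal = List[float]


def restore_downmix_level(signals: Dict[str, Signal],
                          channel_to_group: Dict[str, str],
                          group_gain: Dict[str, float]) -> Dict[str, Signal]:
    """Undo the encoder-side clipping-prevention attenuation.

    group_gain holds the (assumed linear) downmix gain signaled for each channel
    group; channels without a signaled gain are left unchanged. Per-channel
    signaling is the special case of one group per channel.
    """
    restored: Dict[str, Signal] = {}
    for name, samples in signals.items():
        gain = group_gain.get(channel_to_group.get(name, ""), 1.0)
        if gain <= 0.0:
            raise ValueError(f"downmix gain for {name!r} must be positive")
        restored[name] = [s / gain for s in samples]  # inverse of the encoder's scaling
    return restored
```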

The demixing unit 382 may perform the demixing operation based on dynamic demixing weight parameters of a demixing matrix (which correspond to the downmixing weight parameters of the downmixing matrix). The audio encoding apparatuses 200 and 400 may signal the dynamic demixing weight parameters, or the corresponding dynamic downmixing weight parameters, in the side information of the bitstream. Some demixing weight parameters may not be signaled and may instead have fixed values.

Accordingly, the audio decoding apparatuses 300 and 500 may obtain information about the dynamic demixing weight parameters (or about the dynamic downmixing weight parameters) from the side information of the bitstream and perform the demixing operation based on that information.
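As an illustration of combining signaled and fixed weights, the sketch below assumes a hypothetical downmix rule `mixed = target + w * known`, so the corresponding demixing weight is simply `-w`. Weight names such as `w_c` and `w_s` are invented for the example; the actual matrix entries are not given in this section.

```python
from typing import Dict, List


def resolve_demix_weights(fixed: Dict[str, float],
                          signaled: Dict[str, float]) -> Dict[str, float]:
    """Signaled (dynamic) weights override the fixed defaults; the rest stay fixed."""
    weights = dict(fixed)
    weights.update(signaled)
    return weights


def demix_channel(mixed: List[float], known: List[float], weight: float) -> List[float]:
    """Invert the assumed downmix rule mixed = target + weight * known."""
    if len(mixed) != len(known):
        raise ValueError("channel lengths differ")
    return [m - weight * k for m, k in zip(mixed, known)]
```

Because the weights are dynamic, `resolve_demix_weights` would be re-evaluated whenever new side information arrives (for example, per frame).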

FIG. 4A is a block diagram illustrating the configuration of an audio encoding apparatus according to another embodiment.

Referring to FIG. 4A, the audio encoding apparatus 400 may include a multi-channel audio encoder 450, a bitstream generator 480, and an error removal related information generator 490. The multi-channel audio encoder 450 may include a multi-channel audio signal processor 460 and a compression unit 470.

Each of the components 450, 460, 470, 480, and 490 of FIG. 4A may be implemented by the memory 210 and the processor 230 of FIG. 2A.

The operations of the multi-channel audio encoder 450, the multi-channel audio signal processor 460, the compression unit 470, and the bitstream generator 480 of FIG. 4A correspond, respectively, to those of the multi-channel audio encoder 250, the multi-channel audio signal processor 260, the compression unit 270, and the bitstream generator 280 of FIG. 2B, so their detailed description is replaced by the description of FIG. 2B.

The error removal related information generator 490 may be included in the side information generator 285 of FIG. 2B, but is not limited thereto and may also exist separately.

The error removal related information generator 490 may determine a factor for error removal (for example, a scaling factor) based on a first power value and a second power value. The first power value may be the power (energy) value of the audio signal of one channel of the original audio signal, or of one channel of an audio signal obtained by downmixing the original audio signal. The second power value may be the power value of the audio signal of one upmix channel among the audio signals of the upmix channel group. The audio signal of the upmix channel group may be obtained by demixing the base channel reconstruction signal and the dependent channel reconstruction signal.

The error removal related information generator 490 may determine the factor for error removal for each channel.

The error removal related information generator 490 may generate error removal related information that includes information about the determined factor. The bitstream generator 480 may generate a bitstream that further includes this error removal related information. The detailed operation of the error removal related information generator 490 is described below with reference to FIG. 4B.

FIG. 4B is a block diagram illustrating the configuration of a restoration unit according to an embodiment.

Referring to FIG. 4B, the error removal related information generator 490 may include a decompression unit 492, a demixing unit 494, an RMS value determination unit 496, and an error removal factor determination unit 498.

The decompression unit 492 may decompress the compressed audio signal of the base channel group to generate a base channel reconstruction signal. The decompression unit 492 may also decompress the compressed audio signal of the dependent channel group to generate a dependent channel reconstruction signal.

The demixing unit 494 may demix the base channel reconstruction signal and the dependent channel reconstruction signal to generate the audio signal of the upmix channel group. Specifically, the demixing unit 494 may demix the audio signals of some channels among the audio signals of the base and dependent channel groups to generate the audio signal of an upmix channel (or demix channel), and may bypass the demixing operation for the other audio signals of the base and dependent channel groups.

The demixing unit 494 may thus obtain an upmix channel group audio signal that includes the audio signal of the upmix channel and the audio signals for which demixing was bypassed.

The RMS value determination unit 496 may determine the RMS value of a first audio signal, which is the audio signal of one upmix channel in the upmix channel group. The RMS value determination unit 496 may also determine the RMS value of a second audio signal, which is the audio signal of one channel of the original audio signal or of one channel of an audio signal downmixed from the original audio signal. The channel of the first audio signal and the channel of the second audio signal denote the same channel in a predetermined channel layout.

The error removal factor determination unit 498 may determine the factor for error removal based on the RMS value of the first audio signal and the RMS value of the second audio signal. For example, the value obtained by dividing the RMS value of the first audio signal by the RMS value of the second audio signal may be used as the factor for error removal. The error removal factor determination unit 498 may generate information about the determined factor and output error removal related information that includes it.
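The factor computation can be sketched as follows. The sketch follows the example direction given above (RMS of the first, decoded upmix signal divided by RMS of the second, reference signal) and computes one factor per channel. The small `eps` is a hypothetical guard against division by zero for a silent reference channel; how silent frames, quantization of the factor, or per-frame windowing are handled is not specified here.

```python
import math
from typing import Dict, List


def rms(samples: List[float]) -> float:
    """Root-mean-square value of one channel's samples."""
    if not samples:
        raise ValueError("empty signal")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def error_removal_factors(upmix: Dict[str, List[float]],
                          reference: Dict[str, List[float]],
                          eps: float = 1e-12) -> Dict[str, float]:
    """Per-channel factor = RMS(first audio signal) / RMS(second audio signal).

    upmix     : decoded-and-demixed upmix channel group (first audio signals)
    reference : original, or downmixed original, channels (second audio signals)
    Only channels present in both (the same channel of the layout) get a factor.
    """
    factors: Dict[str, float] = {}
    for name in upmix.keys() & reference.keys():
        factors[name] = rms(upmix[name]) / max(rms(reference[name]), eps)
    return factors
```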

FIG. 5A is a block diagram illustrating the configuration of an audio decoding apparatus according to another embodiment.

Referring to FIG. 5A, the audio decoding apparatus 500 may include an information acquisition unit 550, a multi-channel audio decoder 560, a decompression unit 570, a multi-channel audio signal restoration unit 580, and an error removal related information acquisition unit 555. Each of the components 550, 555, 560, 570, and 580 of FIG. 5A may be implemented by the memory 310 and the processor 330 of FIG. 3A.

Instructions for implementing each of the components 550, 555, 560, 570, and 580 of FIG. 5A may be stored in the memory 310 of FIG. 3A, and the processor 330 may execute the instructions stored in the memory 310.

The operations of the information acquisition unit 550, the decompression unit 570, and the multi-channel audio signal restoration unit 580 of FIG. 5A include those of the information acquisition unit 350, the decompression unit 370, and the multi-channel audio signal restoration unit 380 of FIG. 3B, respectively, so the overlapping description is replaced by the description of FIG. 3B. Only the parts that do not overlap with FIG. 3B are described below.

The information acquisition unit 550 may obtain metadata from the bitstream.

The error removal related information acquisition unit 555 may obtain error removal related information from the metadata included in the bitstream. The information about the factor for error removal contained in this information may be a factor for removing errors from the audio signal of one upmix channel in the upmix channel group. The error removal related information acquisition unit 555 may be included in the information acquisition unit 550.

The multi-channel audio signal restoration unit 580 may generate the audio signal of the upmix channel group based on at least one audio signal of the base channel group and at least one audio signal of at least one dependent channel group. The audio signal of the upmix channel group may be a multi-channel audio signal. The multi-channel audio signal restoration unit 580 may restore the audio signal of one upmix channel included in the upmix channel group by applying the factor for error removal to that audio signal.

The multi-channel audio signal restoration unit 580 may output a multi-channel audio signal that includes the restored audio signal of that upmix channel.

FIG. 5B is a block diagram illustrating the configuration of a multi-channel audio signal restoration unit according to an embodiment.

The multi-channel audio signal restoration unit 580 may include an upmix channel group audio generator 581 and a rendering unit 583. The rendering unit 583 may include an error removal unit 584, a volume controller 585, a limiter 586, and a multi-channel audio signal output unit 587.

The upmix channel group audio generator 581, the volume controller 585, the limiter 586, and the multi-channel audio signal output unit 587 of FIG. 5B include the operations of the upmix channel group audio generator 381, the volume controller 388, the limiter 389, and the multi-channel audio signal output unit 390 of FIG. 3C, respectively, so the overlapping description is replaced by the description of FIG. 3C. Only the parts that do not overlap with FIG. 3C are described below.

The error removal unit 584 may restore an error-removed audio signal of a first channel based on the audio signal of a first upmix channel in the upmix channel group of the multi-channel audio signal and the factor for error removal of the first upmix channel. The factor may be a value based on the RMS value of the first-channel audio signal of the original audio signal (or of an audio signal downmixed from it) and the RMS value of the audio signal of the first upmix channel. The first channel and the first upmix channel may denote the same channel in a predetermined channel layout. Using the factor, the error removal unit 584 makes the RMS value of the first upmix channel's audio signal in the current upmix channel group equal to the RMS value of the first-channel audio signal of the original (or downmixed) audio signal, thereby removing the error introduced by encoding.
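On the decoder side, with the factor defined as in the encoder example (RMS of the decoded upmix channel divided by RMS of the reference channel), dividing the upmix channel by the factor brings its RMS to the reference value. This is a minimal sketch under that assumption; if the factor were defined the other way round, the decoder would multiply instead. The function names are hypothetical.

```python
import math
from typing import List


def rms(samples: List[float]) -> float:
    """Root-mean-square value of one channel's samples."""
    if not samples:
        raise ValueError("empty signal")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def remove_error(upmix_channel: List[float], factor: float) -> List[float]:
    """Rescale one upmix channel so its RMS matches the reference channel.

    Assumes factor = RMS(decoded upmix) / RMS(reference), so the decoder divides.
    """
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return [s / factor for s in upmix_channel]
```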

Adjacent audio frames may have different factors for error removal. In that case, the discontinuity in the factor between the end section of the previous frame and the beginning section of the next frame may cause the audio signal to jump abruptly (a glitch).

Accordingly, the error removal unit 584 may smooth the factor to determine the factor used in the sections adjacent to a frame boundary. These sections are the end section of the previous frame and the beginning section of the next frame on either side of the boundary. Each section may include a predetermined number of samples.

Here, smoothing refers to converting factors that are discontinuous between adjacent audio frames in the frame boundary sections into continuous factors.
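The text does not fix a particular smoothing method. A minimal sketch, assuming plain linear interpolation of the factor over the k samples on each side of the boundary (k being a hypothetical section length) and a decoder that divides samples by the factor:

```python
from typing import List, Tuple


def boundary_factors(prev_factor: float, next_factor: float, k: int) -> List[float]:
    """Linear ramp over the 2k samples straddling a frame boundary.

    The first k values cover the end section of the previous frame and the last
    k values the beginning section of the next frame; the endpoints equal the
    two frame factors, so the factor no longer jumps at the boundary.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    n = 2 * k
    return [prev_factor + (next_factor - prev_factor) * i / (n - 1) for i in range(n)]


def apply_smoothed(prev_frame: List[float], next_frame: List[float],
                   prev_factor: float, next_factor: float,
                   k: int) -> Tuple[List[float], List[float]]:
    """Divide each sample by its factor, using the ramp inside the boundary sections."""
    if k > len(prev_frame) or k > len(next_frame):
        raise ValueError("boundary section longer than a frame")
    ramp = boundary_factors(prev_factor, next_factor, k)
    prev_out = [s / prev_factor for s in prev_frame[:-k]]
    prev_out += [s / f for s, f in zip(prev_frame[-k:], ramp[:k])]
    next_out = [s / f for s, f in zip(next_frame[:k], ramp[k:])]
    next_out += [s / next_factor for s in next_frame[k:]]
    return prev_out, next_out
```

Other smooth transitions (for example a raised-cosine crossfade) would fit the same definition; linear interpolation is simply the shortest to show.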

The multi-channel audio signal output unit 587 may output a multi-channel audio signal that includes the error-removed audio signal of the one channel.

At least one of the post-processing components 585 and 586 included in the rendering unit 583 may be omitted, and the order of the post-processing components 584, 585, and 586, including the error removal unit 584, may be changed depending on the case.

As described above, the audio encoding apparatuses 200 and 400 may generate a bitstream and transmit it.

The bitstream may be generated in the form of a file stream. The audio decoding apparatuses 300 and 500 may receive the bitstream and reconstruct a multi-channel audio signal based on information obtained from it. The bitstream may be included in a predetermined file container, for example an MPEG-4 media container for storing various compressed multimedia digital data, such as an MP4 (MPEG-4 Part 14) container.

A file structure according to an embodiment is described below with reference to FIG. 6.

Referring to FIG. 6, a file 600 may include a metadata box 610 and a media data box 620.

For example, the metadata box 610 may be the moov box of an MP4 file container, and the media data box 620 may be the mdat box of an MP4 file container.
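Locating the moov and mdat boxes only requires reading the box headers defined by the ISO base media file format: a 32-bit big-endian size followed by a four-character type, where size 1 means a 64-bit size follows and size 0 means the box runs to the end of the enclosing range. A minimal walker under those rules (it ignores `uuid` extended types and does not interpret box contents; `iter_boxes` and `make_box` are hypothetical helper names):

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_offset, box_end) for each box in data[start:end]."""
    if end is None:
        end = len(data)
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit "largesize" follows the type field
            if pos + 16 > end:
                raise ValueError("truncated largesize header")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size header (for testing)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

Nested boxes such as the trak box inside moov can be visited by calling `iter_boxes` again on the payload range of the parent box.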

The metadata box 610 may be located in the header portion of the file 600 and may store the metadata of the media data. For example, the metadata box 610 may include the side information 615 described above.

The media data box 620 may store media data. For example, the media data box 620 may include a base channel audio stream or a dependent channel audio stream 625.

Of the streams 625, the base channel audio stream may include the compressed audio signal of the base channel group.

Of the streams 625, the dependent channel audio stream may include the compressed audio signal of the dependent channel group.

The media data box 620 may also include side information 630, which may be included in the header portion of the media data box 620. Without being limited thereto, the side information 630 may be included in the header portion of the base channel audio stream or the dependent channel audio stream 625, in particular in the header portion of the dependent channel audio stream.

The audio decoding apparatuses 300 and 500 may obtain the side information 615 and 630 contained in various parts of the file 600, and may reconstruct a multi-channel audio signal based on the audio signal of the base channel group, the audio signal of the dependent channel group, and the side information 615 and 630. Here, the audio signal of the base channel group may be obtained from the base channel audio stream, and the audio signal of the dependent channel group may be obtained from the dependent channel audio stream.

FIG. 7A is a diagram describing the specific structure of a file according to an embodiment.

Referring to FIG. 7A, a file 700 may include a metadata box 710 and a media data box 730. The metadata box 710 may include the metadata box of at least one audio track.

For example, the metadata box 710 may include the metadata box 715 of audio track #n (where n is an integer greater than or equal to 1), which may be the trak box of an MP4 container.

The metadata box 715 of audio track #n may include side information 720.

The media data box 730 may include the media data box of at least one audio track, for example the media data box 735 of audio track #n (where n is an integer greater than or equal to 1). Location information included in the metadata box 715 of audio track #n may indicate the location of the media data box 735 of audio track #n within the media data box 730, so the media data box 735 of audio track #n can be identified based on that location information.

The media data box 735 of audio track #n may include a base channel audio stream and a dependent channel audio stream 740, as well as side information 745. The side information 745 may be located in the header portion of the media data box of audio track #n, or it may be included in the header portion of at least one of the base channel audio stream and the dependent channel audio stream 740.

FIG. 7B is a flowchart of a method by which an audio decoding apparatus reproduces an audio signal according to the file structure of FIG. 7A.

In step S700, the audio decoding apparatuses 300 and 500 may obtain identification information of audio track #n from the side information included in the metadata.

In step S705, the audio decoding apparatuses 300 and 500 may identify whether the identification information of audio track #n indicates the audio signal of the base channel group or the audio signals of the base and dependent channel groups.

For example, the identification information of audio track #n in a file of the Opus audio format may be the Channel Mapping Family (CMF). When the CMF is 1, the audio decoding apparatuses 300 and 500 may identify that the current audio track contains the audio signal of the base channel group, which may be, for example, an audio signal of a stereo channel layout. When the CMF is 4, the audio decoding apparatuses 300 and 500 may identify that the current audio track contains both the audio signal of the base channel group and the audio signal of the dependent channel group.

In step S710, when the identification information of audio track #n indicates the audio signal of the base channel group, the audio decoding apparatuses 300 and 500 may obtain the compressed audio signal of the base channel group contained in the media data box of audio track #n and decompress it.

In step S720, the audio decoding apparatus 300 or 500 may reproduce the audio signal of the base channel group.

In step S730, when the identification information of audio track #n indicates the audio signals of the base/dependent channel groups, the audio decoding apparatus 300 or 500 may obtain the compressed audio signal of the base channel group contained in the media data box of audio track #n and decompress it.

In step S735, the audio decoding apparatus 300 or 500 may obtain the compressed audio signal of the dependent channel group contained in the media data box of audio track #n.

The audio decoding apparatus 300 or 500 may then decompress the obtained compressed audio signal of the dependent channel group.

In step S740, the audio decoding apparatus 300 or 500 may generate an audio signal of at least one upmix channel group based on the audio signal of the base channel group and the audio signal of the dependent channel group.

The audio decoding apparatus 300 or 500 may bypass the demixing operation for some of the audio signals of the base channel group and the dependent channel group, thereby generating an audio signal of at least one independent channel. The audio decoding apparatus 300 or 500 may then generate an audio signal of an upmix channel group that includes an audio signal of at least one upmix channel and the audio signal of the at least one independent channel.

In step S745, the audio decoding apparatus 300 or 500 may reproduce a multi-channel audio signal, which may be one of the audio signals of the at least one upmix channel group.

In step S750, the audio decoding apparatus 300 or 500 may identify whether the next audio track needs to be processed. If so, the apparatus returns to step S700, obtains the identification information of the next audio track #n+1, and performs the operations of steps S705 to S750 described above. That is, the audio decoding apparatus 300 or 500 may increment the variable n by 1, obtain the identification information of the new audio track #n, and perform steps S705 to S750 again.
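The per-track dispatch of FIG. 7B (steps S700 to S750) can be sketched as a loop that branches on the CMF value. This is a minimal illustration, not the patent's implementation: the class `AudioTrack`, the helpers `decompress` and `upmix`, and the string placeholders standing in for PCM audio are all hypothetical, and only the CMF values 1 and 4 come from the example in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# CMF values from the example in the text (hypothetical constant names).
CMF_BASE_ONLY = 1           # track carries the base channel group only
CMF_BASE_AND_DEPENDENT = 4  # track carries base and dependent channel groups


@dataclass
class AudioTrack:
    cmf: int                                   # identification information
    base_payload: bytes                        # compressed base channel group
    dependent_payload: Optional[bytes] = None  # compressed dependent group


def decompress(payload: bytes) -> str:
    # Placeholder for a real codec call; it only labels the payload.
    return f"pcm({payload.decode()})"


def upmix(base: str, dependent: str) -> str:
    # Placeholder for demixing/upmixing into a multi-channel layout.
    return f"upmix({base}, {dependent})"


def play_tracks(tracks: List[AudioTrack]) -> List[str]:
    """Return what would be reproduced for each track (S700-S750)."""
    played = []
    n = 0
    while n < len(tracks):                      # S750: more tracks to process?
        track = tracks[n]                       # S700: read identification info
        if track.cmf == CMF_BASE_ONLY:          # S705
            played.append(decompress(track.base_payload))  # S710, S720
        elif track.cmf == CMF_BASE_AND_DEPENDENT:
            if track.dependent_payload is None:
                raise ValueError(f"track {n} lacks a dependent channel group")
            base = decompress(track.base_payload)          # S730
            dep = decompress(track.dependent_payload)      # S735
            played.append(upmix(base, dep))                # S740, S745
        # Tracks with any other CMF value are skipped in this sketch.
        n += 1                                  # move on to track #n+1
    return played
```

For instance, a stereo-only track followed by a CMF 4 track yields one stereo rendering and one upmixed rendering.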

As described above with reference to FIGS. 7A and 7B, a single audio track containing both the compressed audio signal of the base channel group and the compressed audio signal of the dependent channel group may be generated. However, when the identification information of an audio track indicates the audio signals of the base/dependent channel groups, a conventional legacy audio decoding apparatus cannot extract only the compressed audio signal of the base channel group from that track. That is, the structure of FIGS. 7A and 7B does not support backward compatibility for the audio signal of the base channel group, such as a stereo audio signal.

FIG. 7C is a diagram for explaining a specific structure of a file according to another embodiment.

Referring to FIG. 7C, a file 750 may include a metadata box 760 and a media data box 780. The metadata box 760 may include a metadata box for each of at least one audio track. For example, the metadata box 760 may include a metadata box 765 of audio track #n (where n is an integer greater than or equal to 1) and a metadata box 770 of audio track #n+1. The metadata box 770 of audio track #n+1 may include additional information 775.

The media data box 780 may include a media data box 782 of audio track #n, which may contain a base channel audio stream 784.

The media data box 780 may also include a media data box 786 of audio track #n+1, which may contain a dependent channel audio stream 788 as well as the additional information 790 described above. The additional information 790 may be placed in the header portion of the media data box 786 of audio track #n+1, but is not limited thereto.

The location information included in the metadata box 765 of audio track #n may indicate the location of the media data box 782 of audio track #n within the media data box 780, so the media data box 782 can be identified on the basis of that location information. Likewise, the media data box 786 of audio track #n+1 can be identified on the basis of the location information included in the metadata box 770 of audio track #n+1.
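The role of the location information can be illustrated with a toy model in which each track's metadata box stores a single byte offset and size into the file-level media data box. This is a simplifying assumption: a real ISO base media file locates samples through sample tables such as the chunk offset (`stco`) and sample size (`stsz`) boxes, and the names `TrackMetadataBox` and `locate_media` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TrackMetadataBox:
    track_id: int
    offset: int  # hypothetical: byte offset of this track's media data
    size: int    # hypothetical: byte length of this track's media data


def locate_media(mdat: bytes, meta: TrackMetadataBox) -> bytes:
    """Slice one track's media data out of the file-level media data box."""
    return mdat[meta.offset:meta.offset + meta.size]


# Toy file: track #1 holds the base stream, track #2 the dependent stream.
mdat = b"BASEDEP"
boxes: Dict[int, TrackMetadataBox] = {
    1: TrackMetadataBox(1, offset=0, size=4),
    2: TrackMetadataBox(2, offset=4, size=3),
}
```

Here `locate_media(mdat, boxes[2])` returns only the dependent stream bytes, without reading or parsing the base track.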

FIG. 7D is a flowchart of a method by which an audio decoding apparatus reproduces an audio signal according to the file structure of FIG. 7C.

Referring to FIG. 7D, in step S750, the audio decoding apparatus 300 or 500 may obtain identification information of audio track #n from the additional information included in the metadata box.

In step S755, the audio decoding apparatus 300 or 500 may identify whether the obtained identification information of audio track #n indicates an audio signal of the base channel group or an audio signal of the dependent channel group.

In step S760, when the identification information of audio track #n indicates the audio signal of the base channel group, the audio decoding apparatus 300 or 500 may decompress the compressed audio signal of the base channel group of audio track #n.

In step S765, the audio decoding apparatus 300 or 500 may reproduce the audio signal of the base channel group.

In step S770, when the identification information of audio track #n indicates the audio signal of the dependent channel group, the audio decoding apparatus 300 or 500 may obtain the compressed audio signal of the dependent channel group of audio track #n and decompress it. The base channel group audio signal corresponding to this dependent channel group audio signal may be carried in audio track #n-1. That is, the compressed audio signal of the base channel group may be contained in an audio track preceding the audio track that carries the compressed dependent channel audio signal; specifically, in the preceding audio track immediately adjacent to it. Therefore, before step S770, the audio decoding apparatus 300 or 500 may obtain the compressed audio signal of the base channel group from audio track #n-1 and decompress it.

In step S775, the audio decoding apparatus 300 or 500 may generate an audio signal of at least one upmix channel group based on the audio signal of the base channel group and the audio signal of the dependent channel group.

In step S780, the audio decoding apparatus 300 or 500 may reproduce a multi-channel audio signal, which is one of the audio signals of the at least one upmix channel group.

In step S785, the audio decoding apparatus 300 or 500 may identify whether the next audio track needs to be processed. If so, in step S750 the apparatus obtains the identification information of the next audio track #n+1 and performs the operations of steps S755 to S785 described above. That is, the audio decoding apparatus 300 or 500 may increment the variable n by 1, obtain the identification information of the new audio track #n, and perform steps S755 to S785 again.
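The pairing rule of FIG. 7D, in which a dependent channel group in track #n is combined with the base channel group in the adjacent preceding track #n-1, can be sketched as follows. The identification strings `"base"` and `"dependent"` and the string placeholders for decoded audio are hypothetical. The sketch follows the flowchart literally, so a base track followed by its dependent track yields two entries: the stereo rendering (S765) and the upmixed rendering (S780).

```python
from typing import List, Tuple

BASE, DEPENDENT = "base", "dependent"  # hypothetical identification values


def play_split_tracks(tracks: List[Tuple[str, str]]) -> List[str]:
    """tracks[i] = (identification, compressed payload); steps S750-S785."""
    played = []
    for n, (kind, payload) in enumerate(tracks):
        if kind == BASE:                             # S755 -> S760
            played.append(f"pcm({payload})")         # S765
        elif kind == DEPENDENT:
            # The matching base channel group sits in adjacent track #n-1.
            if n == 0 or tracks[n - 1][0] != BASE:
                raise ValueError(f"track {n} has no preceding base track")
            base = f"pcm({tracks[n - 1][1]})"
            dep = f"pcm({payload})"                  # S770
            played.append(f"upmix({base}, {dep})")   # S775, S780
    return played
```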

As described above with reference to FIGS. 7C and 7D, an audio track containing the compressed audio signal of the dependent channel group may be generated separately from the audio track containing the compressed audio signal of the base channel group. When the identification information of an audio track indicates the audio signal of the dependent channel group, a conventional legacy audio decoding apparatus cannot obtain the compressed audio signal of the dependent channel group from that track. Unlike the case of FIGS. 7A and 7B, however, the legacy apparatus can still decompress the compressed audio signal of the base channel group contained in the preceding audio track and reproduce the audio signal of the base channel group.

Therefore, the structure of FIGS. 7C and 7D supports backward compatibility for a stereo audio signal (i.e., the audio signal of the base channel group).

At the same time, the audio decoding apparatus 300 or 500 may obtain the compressed audio signal of the base channel group and the compressed audio signal of the dependent channel group from their separate audio tracks. The audio decoding apparatus 300 or 500 may decompress the compressed audio signal of the base channel group obtained from a first audio track, decompress the compressed audio signal of the dependent channel group obtained from a second audio track, and reproduce a multi-channel audio signal based on the two decompressed signals.
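The backward-compatibility difference between the single-track layout of FIG. 7A and the split layout of FIG. 7C can be sketched by modelling a legacy decoder that understands only base-channel tracks and skips anything else. The track tuples, the identification strings, and the function `legacy_decode` are hypothetical simplifications.

```python
from typing import List, Optional, Tuple

# Hypothetical track description: (identification, base payload, dependent payload)
Track = Tuple[str, Optional[str], Optional[str]]

LAYOUT_7A: List[Track] = [("base+dependent", "ab", "tpq")]
LAYOUT_7C: List[Track] = [("base", "ab", None), ("dependent", None, "tpq")]


def legacy_decode(tracks: List[Track]) -> List[str]:
    """A legacy stereo-only decoder: it plays tracks identified as 'base'
    and skips any track whose identification it does not recognize."""
    return [f"pcm({base})" for kind, base, _ in tracks if kind == "base"]
```

Under these assumptions, `legacy_decode(LAYOUT_7A)` plays nothing, because the only track is unrecognized, while `legacy_decode(LAYOUT_7C)` still plays the stereo base channel group.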

Meanwhile, a plurality of dependent channel groups may correspond to one base channel group. In this case, a plurality of audio tracks, each containing the audio signal of at least one dependent channel group, may be generated. For example, audio track #n may contain the audio signal of dependent channel group #1, audio track #n+1 may contain the audio signal of dependent channel group #2, and audio track #n+2 may contain the audio signal of dependent channel group #3; in general, audio track #n+m-1 may contain the audio signal of dependent channel group #m. The audio decoding apparatus 300 or 500 may obtain the compressed audio signals of dependent channel groups #1, #2, ..., #m contained in audio tracks #n, #n+1, ..., #n+m-1, decompress them, and reconstruct a multi-channel audio signal based on the audio signal of the base channel group and the audio signals of dependent channel groups #1, #2, ..., #m.
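The track numbering for multiple dependent channel groups follows directly from the text: group #k sits in audio track #(n+k-1). The sketch below computes that mapping and then combines the groups with the base channel group. Applying the groups one after another is an assumption made for illustration, and the string concatenation merely stands in for the real demixing step; the function names are hypothetical.

```python
from typing import Dict, List


def dependent_track_numbers(n: int, m: int) -> Dict[int, int]:
    """Map dependent channel group #k (k = 1..m) to audio track #(n + k - 1)."""
    return {k: n + k - 1 for k in range(1, m + 1)}


def reconstruct(base: str, dependents: List[str]) -> str:
    """Combine the base group with dependent groups #1..#m in order.
    Concatenation is a placeholder for the actual demixing/upmixing."""
    signal = base
    for dep in dependents:
        signal = f"{signal}+{dep}"
    return signal
```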

Depending on the channel layouts it supports, the audio decoding apparatus 300 or 500 may obtain the compressed audio signals of only those audio tracks that contain audio signals of a supported channel layout, and may skip audio tracks containing audio signals of an unsupported channel layout. In other words, the apparatus may obtain the compressed audio signals of a subset of all the audio tracks and decompress the compressed audio signal of at least one dependent channel contained in that subset. Accordingly, the audio decoding apparatus 300 or 500 may reconstruct a multi-channel audio signal that matches a supported channel layout.
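Track selection by supported layout can be sketched under one assumption: the tracks are ordered hierarchically, so that each additional track enables the next richer layout (the coding-type examples for 3.1.2, 5.1.2 and 7.1.4, whose dependent signals nest as T,P,Q ⊂ T,P,Q,S ⊂ T,P,Q,S,U,V, are consistent with this). The table `TRACKS` and the function `tracks_to_fetch` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical: each track is tagged with the layout that becomes
# reconstructible once it, and every earlier track, has been decoded.
TRACKS: List[Tuple[int, str]] = [
    (1, "2.0"),    # base channel group (stereo)
    (2, "3.1.2"),  # dependent channel group #1
    (3, "5.1.2"),  # dependent channel group #2
    (4, "7.1.4"),  # dependent channel group #3
]


def tracks_to_fetch(supported: str) -> List[int]:
    """Return the track numbers needed for the supported layout; tracks
    for richer, unsupported layouts are never fetched or decompressed."""
    needed = []
    for track_no, layout in TRACKS:
        needed.append(track_no)
        if layout == supported:
            return needed
    raise ValueError(f"layout {supported} not available in this file")
```

A 5.1.2 renderer would fetch tracks 1 to 3 and ignore track 4.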

FIG. 8A is a diagram for explaining a file structure according to yet another embodiment.

Referring to FIG. 8A, the additional information 820 may be contained in the metadata box 810 of metadata container track #n+1 rather than in the metadata box of audio track #n+1 of FIG. 7C. Likewise, the dependent channel audio stream 840 may be contained in the media data box 830 of metadata container track #n+1 rather than in the media data box of audio track #n+1 of FIG. 7C. That is, the additional information 820 is carried in a metadata container track instead of an audio track. The metadata container track and the audio track may, however, be managed in the same track group, so when the base channel audio stream is in audio track number n, the dependent channel audio stream may be in metadata container track number n+1.

FIG. 8B is a flowchart of a method by which an audio decoding apparatus reproduces an audio signal according to the file structure of FIG. 8A.

The audio decoding apparatus 300 or 500 may identify the type of each track.

In step S800, the audio decoding apparatus 300 or 500 may identify whether a metadata container track (track #n+1) corresponding to an audio track (track #n) exists. That is, having identified that the n-th track is an audio track, the apparatus may examine the (n+1)-th track and identify whether it is a metadata container track corresponding to the n-th audio track.

In step S810, if no metadata container track (track #n+1) corresponding to the audio track (track #n) is found, the audio decoding apparatus 300 or 500 may decompress the compressed audio signal of the base channel group.

In step S820, the audio decoding apparatus 300 or 500 may reproduce the decompressed audio signal of the base channel group.

In step S830, if a metadata container track (track #n+1) corresponding to the audio track (track #n) is found, the audio decoding apparatus 300 or 500 may decompress the compressed audio signal of the base channel group of the audio track.

In step S840, the audio decoding apparatus 300 or 500 may decompress the compressed audio signal of the dependent channel group in the metadata container track.

In step S850, the audio decoding apparatus 300 or 500 may generate an audio signal of at least one upmix channel group based on the decompressed audio signal of the base channel group and the decompressed audio signal of the dependent channel group.

In step S860, the audio decoding apparatus 300 or 500 may reproduce a multi-channel audio signal, which is one of the audio signals of the at least one upmix channel group.

In step S870, the audio decoding apparatus 300 or 500 may identify whether the next audio track needs to be processed. If the metadata container track #n+1 corresponding to audio track #n exists, the apparatus identifies whether track #n+2 exists as the next track, and if it does, obtains the identification information of tracks #n+2 and #n+3 and performs the operations of steps S800 to S870 described above. That is, the audio decoding apparatus 300 or 500 may increase the variable n by 2, obtain the identification information of the new tracks #n and #n+1, and perform steps S800 to S870 again.

If no metadata container track #n+1 corresponding to audio track #n exists, the apparatus identifies whether track #n+1 exists as the next track, and if it does, obtains the identification information of tracks #n+1 and #n+2 and performs the operations of steps S800 to S870 described above. That is, the audio decoding apparatus 300 or 500 may increase the variable n by 1, obtain the identification information of the new tracks #n and #n+1, and perform steps S800 to S870 again.
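The track walk of FIG. 8B, which advances n by 2 when an audio track is followed by its metadata container track and by 1 otherwise, can be sketched as below. The type labels `"audio"` and `"meta"`, the string placeholders for audio, and skipping non-audio tracks that are not paired with an audio track are assumptions of this sketch.

```python
from typing import List, Tuple

# Hypothetical track list: each entry is (track type, payload).
Track = Tuple[str, str]


def play_with_containers(tracks: List[Track]) -> List[str]:
    """Walk the tracks as in FIG. 8B (steps S800-S870)."""
    played = []
    n = 0
    while n < len(tracks):
        kind, base_payload = tracks[n]
        if kind != "audio":
            n += 1  # assumption: an unpaired non-audio track is skipped
            continue
        # S800: is track #n+1 the metadata container track for track #n?
        has_container = n + 1 < len(tracks) and tracks[n + 1][0] == "meta"
        base = f"pcm({base_payload})"                     # S810 / S830
        if has_container:
            dep = f"pcm({tracks[n + 1][1]})"              # S840
            played.append(f"upmix({base}, {dep})")        # S850, S860
            n += 2                                        # skip the pair
        else:
            played.append(base)                           # S820
            n += 1
    return played
```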

FIG. 9A is a diagram for explaining a packet of an audio track according to the file structure of FIG. 7A.

As described above with reference to FIG. 7A, the media data box 735 of audio track #n may include a base channel audio stream or a dependent channel audio stream 740.

Referring to FIG. 9A, an audio track #n packet 900 may include a metadata header 910, a base channel audio packet 920, and a dependent channel audio packet 930. The base channel audio packet 920 may contain part of the base channel audio stream, and the dependent channel audio packet 930 may contain part of the dependent channel audio stream. The metadata header 910 may be located in the header portion of the packet 900 of audio track #n and may contain the additional information. However, the arrangement is not limited thereto; the additional information may instead be located in the header portion of the dependent channel audio packet 930.
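One way to serialize the three parts of the FIG. 9A packet (metadata header, base channel audio packet, dependent channel audio packet) is a simple length-prefixed framing. The patent does not specify any byte syntax; the 2-byte big-endian length prefix and the functions `build_packet` and `parse_packet` are hypothetical and serve only to show the ordering of the parts.

```python
import struct
from typing import Tuple


def build_packet(header: bytes, base: bytes, dependent: bytes) -> bytes:
    """Concatenate the three parts, each prefixed by a 2-byte length
    (hypothetical framing, not the patent's syntax)."""
    out = b""
    for part in (header, base, dependent):
        out += struct.pack(">H", len(part)) + part
    return out


def parse_packet(packet: bytes) -> Tuple[bytes, bytes, bytes]:
    """Split a packet back into (metadata header, base, dependent)."""
    parts = []
    pos = 0
    for _ in range(3):
        (length,) = struct.unpack_from(">H", packet, pos)
        pos += 2
        parts.append(packet[pos:pos + length])
        pos += length
    return parts[0], parts[1], parts[2]
```

Because the metadata header comes first, a parser can read the additional information before deciding whether to decode the dependent part at all.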

FIG. 9B is a diagram for explaining packets of audio tracks according to the file structure of FIG. 7C.

As described above with reference to FIG. 7C, the media data box 782 of audio track #n may contain the base channel audio stream 784, and the media data box 786 of audio track #n+1 may contain the dependent channel audio stream 788.

Referring to FIG. 9B, an audio track #n packet 940 may include a base channel audio packet 945. An audio track #n+1 packet 950 may include a metadata header 955 and a dependent channel audio packet 960. The metadata header 955 may be located in the header portion of the packet 950 of audio track #n+1 and may contain the additional information.

However, the structure is not limited thereto: there may be one or more dependent channel audio packets 960, and the additional information may be contained in the header portion of one or more of them.

FIG. 9C is a diagram for explaining packets of tracks according to the file structure of FIG. 8A.

As described above with reference to FIG. 8A, the media data box 850 of audio track #n may contain a base channel audio stream 860, and the media data box 830 of metadata container track #n+1 may contain the dependent channel audio stream 840.

FIGS. 9B and 9C are identical except that the audio track #n+1 packet 950 of FIG. 9B is replaced with the metadata container track #n+1 packet 980 of FIG. 9C; the description of FIG. 9B therefore applies to FIG. 9C.

FIG. 10 is a diagram for explaining additional information in a metadata header/metadata audio packet according to an embodiment.

Referring to FIG. 10, a metadata header/metadata audio packet 1000 may include at least one of: coding type information 1005, speech exist information 1010, speech norm information 1015, low frequency effects (LFE) presence information 1020, LFE gain information 1025, top audio presence information 1030, scale factor presence information 1035, scale factor information 1040, on-screen audio object presence information 1050, discontinuous channel audio stream presence information 1055, and continuous channel audio stream presence information 1060.
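The fields listed for FIG. 10 can be gathered into a single record. Since the header may carry "at least one" of them, every field below is optional. The field names, the Python types (integers, flags, floats), and the helper `present_fields` are assumptions; the patent does not give bit widths or value ranges here.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MetadataHeader:
    """Additional information of FIG. 10 (types are assumptions)."""
    coding_type: Optional[int] = None                  # 1005
    speech_exist: Optional[bool] = None                # 1010
    speech_norm: Optional[float] = None                # 1015
    lfe_exist: Optional[bool] = None                   # 1020
    lfe_gain: Optional[float] = None                   # 1025
    top_audio_exist: Optional[bool] = None             # 1030
    scale_factor_exist: Optional[bool] = None          # 1035
    scale_factor: Optional[float] = None               # 1040
    on_screen_object_exist: Optional[bool] = None      # 1050
    discontinuous_stream_exist: Optional[bool] = None  # 1055
    continuous_stream_exist: Optional[bool] = None     # 1060

    def present_fields(self) -> List[str]:
        """Names of the fields actually carried in this header."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]
```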

The coding type information 1005 may be information for identifying which audio signal is encoded in the media data associated with the metadata header/metadata audio packet 1000. That is, the coding type information 1005 may be information for identifying the coding structure of the base channel group and the coding structure of the dependent channel group.

For example, a coding type information 1005 value of 0x00 may indicate that the encoded audio signal is an audio signal of a 3.1.2 channel layout. In this case, the audio decoding apparatus 300 or 500 may identify that the compressed audio signal of the base channel group included in the encoded audio signal is the audio signal A/B of a 2-channel layout, and that the compressed audio signals of the remaining dependent channel groups are the T, P, and Q signals. A value of 0x01 may indicate that the encoded audio signal is an audio signal of a 5.1.2 channel layout. In this case, the audio decoding apparatus 300 or 500 may identify that the compressed audio signal of the base channel group is the 2-channel audio signal A/B, and that the compressed audio signals of the remaining dependent channel groups are the T, P, Q, and S signals.

A value of 0x02 may indicate that the encoded audio signal is an audio signal of a 7.1.4 channel layout. In this case, the audio decoding apparatus 300 or 500 may identify that the compressed audio signal of the base channel group is the 2-channel audio signal A/B, and that the compressed audio signals of the remaining dependent channel groups are the T, P, Q, S, U, and V signals.

A value of 0x03 may indicate that the encoded audio signal includes a 3.1.2 channel layout signal and an ambisonic audio signal. In this case, the audio decoding apparatus 300 or 500 may identify that the compressed audio signal of the base channel group is the 2-channel audio signal A/B, and that the compressed audio signals of the remaining dependent channel groups are the T, P, and Q signals and the W, X, Y, and Z signals.

A value of 0x04 for the coding type information 1005 may indicate that the encoded audio signal includes a 7.1.4-channel-layout signal and an ambisonic audio signal. In this case, the audio decoding apparatuses 300 and 500 may identify that the compressed audio signal of the base channel group included in the encoded audio signal is the two-channel-layout audio signal A/B, and that the compressed audio signals of the remaining dependent channel groups are the T, P, Q, S, U, and V signals and the W, X, Y, and Z signals.
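The five coding_type values above can be summarized as a lookup table. The following Python sketch is illustrative only. The table layout and the function name describe_coding_type are hypothetical and do not come from the bitstream definition. For each value, it records the channel layout and the signals that the text says are carried in the base and dependent channel groups.

```python
# Hypothetical lookup table built from the coding_type values described above.
# The base channel group always carries the two-channel signals A/B; the
# dependent channel groups carry the signals listed for each value.
CODING_TYPES = {
    0x00: ("3.1.2", ["T", "P", "Q"]),
    0x01: ("5.1.2", ["T", "P", "Q", "S"]),
    0x02: ("7.1.4", ["T", "P", "Q", "S", "U", "V"]),
    0x03: ("3.1.2 + ambisonics", ["T", "P", "Q", "W", "X", "Y", "Z"]),
    0x04: ("7.1.4 + ambisonics",
           ["T", "P", "Q", "S", "U", "V", "W", "X", "Y", "Z"]),
}

BASE_GROUP = ["A", "B"]


def describe_coding_type(coding_type: int) -> dict:
    """Return the layout and expected channel-group contents for a coding_type."""
    if coding_type not in CODING_TYPES:
        raise ValueError(f"unknown coding_type 0x{coding_type:02X}")
    layout, dependent = CODING_TYPES[coding_type]
    return {"layout": layout, "base": list(BASE_GROUP), "dependent": list(dependent)}


print(describe_coding_type(0x01))
# {'layout': '5.1.2', 'base': ['A', 'B'], 'dependent': ['T', 'P', 'Q', 'S']}
```

A decoder could use such a table to decide in advance how many dependent-group signals to expect before it starts decompressing.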

The speech presence information 1010 may be information for identifying whether dialogue is present in the center-channel audio signal included in the media data associated with the metadata header/metadata audio packet 1000. The speech norm information 1015 may indicate the norm value of the dialogue included in the center-channel audio signal. The audio decoding apparatuses 300 and 500 may adjust the volume of the speech signal based on the speech norm information 1015. That is, the audio decoding apparatuses 300 and 500 may adjust the volume levels of the ambient sound and the dialogue differently, so that clearer dialogue can be restored. In addition, based on the speech norm information 1015, the audio decoding apparatuses 300 and 500 may consistently match the volume level of the speech included in a plurality of audio signals to a target volume and reproduce the plurality of audio signals sequentially.
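The following is a minimal sketch of the dialogue normalization described above. It rests on three assumptions not stated in the text:

- speech_norm has already been decoded into a loudness value in dB.
- The gain is applied to the center channel only.
- The target level is -24 dB.

The 8-bit encoding of speech_norm is not specified here. The function names are hypothetical.

```python
from typing import List


def dialogue_gain(speech_norm_db: float, target_db: float) -> float:
    """Linear gain that moves the measured dialogue level to the target level."""
    return 10.0 ** ((target_db - speech_norm_db) / 20.0)


def normalize_dialogue(center: List[float], speech_norm_db: float,
                       target_db: float = -24.0) -> List[float]:
    """Scale only the center channel (assumed to carry the dialogue).

    Ambient channels are left untouched, so dialogue and ambience
    end up at different relative levels.
    """
    g = dialogue_gain(speech_norm_db, target_db)
    return [g * s for s in center]


# Several programs with different dialogue levels, all brought to the same target
# before being played back one after another.
programs = [([0.1, -0.2, 0.05], -31.0), ([0.4, -0.3, 0.2], -20.0)]
for center, norm in programs:
    print(norm, "dB ->", [round(v, 3) for v in normalize_dialogue(center, norm)])
```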

The low frequency effects (LFE) presence information 1020 may be information for identifying whether a low-frequency effect is present in the media data associated with the metadata header/metadata audio packet 1000.

An audio signal that produces the low-frequency effect is not allocated to the center channel and, depending on the intention of the content creator, may be included only in designated sections of the audio signal. Accordingly, the audio signal of the LFE channel may be restored only when the LFE presence information is on.

The LFE gain information 1025 indicates the gain of the LFE-channel audio signal when the LFE presence information is on. The audio decoding apparatuses 300 and 500 may output the LFE audio signal according to the LFE gain given by the LFE gain information 1025.
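The LFE presence flag and the LFE gain act together on the decoder side, as the hedged sketch below shows. It borrows the LFE_1 → LFE_2 naming used for the renderers of FIG. 14. It assumes that lfe_gain has already been decoded into a gain in dB, since the 8-bit encoding is not specified here. The function name render_lfe is hypothetical.

```python
from typing import List, Optional


def render_lfe(lfe_1: List[float], lfe_exist: bool,
               lfe_gain_db: float) -> Optional[List[float]]:
    """Return LFE_2 = gain * LFE_1, or None when no LFE is signalled.

    lfe_gain_db is assumed to be the already-decoded LFE gain in dB.
    """
    if not lfe_exist:
        return None  # LFE presence is off: nothing to restore for the LFE channel
    gain = 10.0 ** (lfe_gain_db / 20.0)
    return [gain * x for x in lfe_1]


print(render_lfe([0.2, -0.1], lfe_exist=True, lfe_gain_db=-6.0))
print(render_lfe([0.2, -0.1], lfe_exist=False, lfe_gain_db=-6.0))  # None
```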

The top audio presence information 1030 may indicate whether an audio signal of a top front channel is present in the media data associated with the metadata header/metadata audio packet 1000. Here, the top front channels may be the Hfl3 channel (top front left, TFL) and the Hfr3 channel (top front right, TFR) of the 3.1.2 channel layout.

The scale factor presence information 1035 and the scale factor information 1040 are included in the scale factor information of FIG. 5A. The scale factor presence information 1035 may indicate whether an RMS scale factor exists for the audio signal of a specific channel. When the scale factor presence information 1035 indicates that such an RMS scale factor exists, the scale factor information 1040 may indicate the value of the RMS scale factor for that channel.

The on-screen audio object presence information 1050 may indicate whether an audio object is present on the screen. When the on-screen audio object presence information 1050 is on, the audio decoding apparatuses 300 and 500 may identify that an audio object is on the screen. They may then convert the multi-channel audio signal restored from the audio signals of the base and dependent channel groups into an audio signal of a three-dimensional audio channel layout centered in front of the listener, and output the converted signal.

The discrete channel audio stream presence information 1055 indicates whether the media data associated with the metadata header/metadata audio packet 1000 includes an audio stream of discrete channels. Here, the discrete channels may be a 5.1.2 or 7.1.4 channel layout.

The continuous channel audio stream presence information 1060 indicates whether the media data associated with the metadata header/metadata audio packet 1000 includes an audio stream of continuous-channel audio signals (WXYZ values). Based on such an ambisonic-channel audio signal, the audio decoding apparatuses 300 and 500 may convert it into audio signals of various channel layouts, independent of any particular channel layout, and output them.

Alternatively, when the on-screen audio object presence information 1050 is on, the audio decoding apparatuses 300 and 500 may convert the WXYZ values to emphasize the on-screen audio signal within the audio signal of the 3.1.2 channel layout.

Table 8 below shows pseudocode (Pseudocode 1) for the audio data structure.

struct Metadata Header {
    metadata_version : 4 bits;
    metadata_header_length : 9 bits;
    speech_exist : 1 bit; // '1': dialogue audio is present in the center channel, '0': no dialogue audio
    if (speech_exist == 1) {
        speech_norm : 8 bits;
    }
    lfe_exist : 1 bit; // '1': LFE audio is present in the center channel, '0': no LFE audio
    if (lfe_exist == 1) {
        lfe_gain : 8 bits;
    }
    on_screen_audio_object_exist : 1 bit; // '1': an audio object is on screen, '0': no audio object on screen
    if (on_screen_audio_object_exist == 1) {
        object_S : 8 bits x 5; // mixing level of the object sound in each 3.1.2ch channel (L3, R3, C, Hfl3, Hfr3)
        object_G : 8 bits x 2; // size of the sound-source shape/area around the center of the object sound source
        object_V : 8 bits x 2; // moving vector of the object sound source (Cartesian dx, dy) in 1 audio frame (e.g., 980 samples)
        object_L : 8 bits x 2; // start location of the object sound source (Cartesian x, y) in 1 audio frame (e.g., 980 samples)
    }
    audio_metadata_exist : 3 bits; // first bit: discrete_audio_exist, second bit: continuous_audio_exist, third bit: reserved
    if (discrete_audio_exist == 1) {
        length_of_discrete_audio_stream : 16 bits;
    }
    if (continuous_audio_exist == 1) {
        length_of_continuous_audio_stream : 16 bits;
    }
    zero bit padding for byte alignment : N bits;
};

struct MetadataAudioPacket {
    coding_type : 8 bits;
    cancelation error ratio exist(3.1.2ch) : 1 bit;
    if (cancelation error ratio exist(3.1.2ch) == 1) {
        cancelation error ratio(3.1.2ch) : 8 bits x 2;
    }
    cancelation error ratio exist(5.1.2ch) : 1 bit;
    if (cancelation error ratio exist(5.1.2ch) == 1) {
        cancelation error ratio(5.1.2ch) : 8 bits x 4;
    }
    cancelation error ratio exist(7.1.4ch) : 1 bit;
    if (cancelation error ratio exist(7.1.4ch) == 1) {
        cancelation error ratio(7.1.4ch) : 8 bits x 4;
    }
    zero bit padding for byte alignment : N bits;
    if (discrete_audio_exist == 1) {
        base_channel_audio_data_length[N1] : 16 bits;
        dependent_channel_audio_data_length[N2] : 16 bits;
    }
    if (continuous_audio_exist == 1) {
        continuous_channel_audio_data_length[N3] : 16 bits;
    }
    base_channel_audio_data[N1];
    dependent_channel_audio_data[N2];
    continuous_channel_audio_data[N3];
};

The Metadata Header structure of Pseudocode 1 may sequentially include metadata_version [4 bits], metadata_header_length [9 bits], and so on. metadata_version may indicate the version of the metadata, and metadata_header_length may indicate the length of the metadata header.

speech_exist may indicate whether dialogue audio is present in the center channel. speech_norm may represent the norm value obtained by measuring the loudness of the dialogue audio. lfe_exist indicates whether an LFE-channel audio signal is present in the center channel. lfe_gain represents the gain of the LFE-channel audio signal.

on_screen_audio_object_exist indicates whether an audio object is present on the screen. object_S represents the mix level of the on-screen audio object in each channel of the 3.1.2 layout. object_G represents the area and shape that the object occupies on the screen, measured from the center of the on-screen audio object. object_V represents the motion vector (dx, dy) of the object on the screen within one audio frame. object_L represents the position coordinates (x, y) of the object on the screen within one audio frame.
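The sketch below shows how object_L and object_V might be combined to track an on-screen object inside one frame. It takes object_L as the start location (as the comment in Pseudocode 1 describes it) and object_V as the total displacement over the frame. Linear motion between those points is an assumption, not something the text specifies. The 980-sample frame length is the example value from Pseudocode 1. The function name is hypothetical.

```python
from typing import Tuple


def object_position(start: Tuple[float, float], vector: Tuple[float, float],
                    sample_index: int, frame_length: int = 980) -> Tuple[float, float]:
    """Linearly interpolated (x, y) of an on-screen object within one audio frame.

    start  = object_L (x, y) at the beginning of the frame
    vector = object_V (dx, dy) accumulated over the whole frame
    """
    t = sample_index / frame_length  # 0.0 at frame start, 1.0 at frame end
    return (start[0] + vector[0] * t, start[1] + vector[1] * t)


print(object_position((10, 20), (4, -2), 490))  # halfway through the frame
```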

audio_metadata_exist may be information indicating whether basic metadata exists, whether audio metadata of discrete channels exists, and whether audio metadata of continuous channels exists.

discrete_audio_metadata_offset indicates the address of the discrete-channel audio metadata when such metadata exists.

continuous_audio_metadata_offset indicates the address of the continuous-channel audio metadata when such metadata exists.
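The Metadata Header fields of Pseudocode 1 can be read with a simple MSB-first bit reader, as in the minimal Python sketch below. It makes these assumptions:

- Bits are packed most-significant-bit first.
- The "first bit" of audio_metadata_exist is its most significant bit.
- All fields are unsigned.

The function name parse_metadata_header and the output keys are hypothetical. The field widths follow Pseudocode 1.

```python
def parse_metadata_header(data: bytes) -> dict:
    """Parse the Metadata Header of Pseudocode 1 (MSB-first bit order assumed)."""
    bits = "".join(f"{b:08b}" for b in data)
    pos = 0

    def read(n: int) -> int:
        nonlocal pos
        if pos + n > len(bits):
            raise ValueError("truncated metadata header")
        value = int(bits[pos:pos + n], 2)
        pos += n
        return value

    h = {"metadata_version": read(4), "metadata_header_length": read(9)}
    h["speech_exist"] = read(1)
    h["speech_norm"] = read(8) if h["speech_exist"] else None
    h["lfe_exist"] = read(1)
    h["lfe_gain"] = read(8) if h["lfe_exist"] else None
    h["on_screen_audio_object_exist"] = read(1)
    if h["on_screen_audio_object_exist"]:
        h["object_S"] = [read(8) for _ in range(5)]  # L3, R3, C, Hfl3, Hfr3
        h["object_G"] = [read(8) for _ in range(2)]  # shape/area
        h["object_V"] = [read(8) for _ in range(2)]  # dx, dy (signedness unknown)
        h["object_L"] = [read(8) for _ in range(2)]  # x, y
    flags = read(3)  # audio_metadata_exist
    h["discrete_audio_exist"] = (flags >> 2) & 1    # first (most significant) bit
    h["continuous_audio_exist"] = (flags >> 1) & 1  # second bit; third is reserved
    h["length_of_discrete_audio_stream"] = (
        read(16) if h["discrete_audio_exist"] else None)
    h["length_of_continuous_audio_stream"] = (
        read(16) if h["continuous_audio_exist"] else None)
    pos = (pos + 7) // 8 * 8  # skip the zero-bit padding for byte alignment
    h["header_size_bytes"] = pos // 8
    return h


# Example: version 1, speech present (norm 200), no LFE, no on-screen object,
# discrete stream of length 1234, no continuous stream.
example_bits = ("0001" + f"{20:09b}" + "1" + f"{200:08b}" + "0" + "0"
                + "100" + f"{1234:016b}")
example_bits += "0" * (-len(example_bits) % 8)
header = parse_metadata_header(int(example_bits, 2).to_bytes(len(example_bits) // 8, "big"))
print(header)
```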

The Metadata Audio Packet structure of Pseudocode 1 may sequentially include coding_type [8 bits] and so on.

coding_type may indicate the type of coding structure of the audio signal.

Information such as cancelation error ratio exist may then be included sequentially.

The cancelation error ratio exist (3.1.2 channel) field may indicate whether a cancelation error ratio (CER) exists for the audio signal of the 3.1.2 channel layout. The cancelation error ratio (3.1.2 channel) field may represent the CER for the audio signal of the 3.1.2 channel layout. Similarly, cancelation error ratio exist (5.1.2 channel), cancelation error ratio (5.1.2 channel), cancelation error ratio exist (7.1.4 channel), and cancelation error ratio (7.1.4 channel) fields may exist.

discrete_audio_channel_data may indicate the audio channel data of discrete channels. The audio channel data of discrete channels includes base_audio_channel_data and dependent_audio_channel_data.

When the value of discrete_audio_level_audio_exist is 1, base_audio_channel_data_length, dependent_audio_channel_data_length, and so on may be included sequentially in the metadata audio packet.

base_audio_channel_data_length may indicate the length of the base audio channel data. dependent_audio_channel_data_length may indicate the length of the dependent audio channel data.

base_audio_channel_data may indicate the base audio channel data.

dependent_audio_channel_data may indicate the dependent audio channel data.

continuous_audio_channel_data may indicate the audio channel data of continuous channels.
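The following is a minimal, hypothetical Python sketch of reading the Metadata Audio Packet of Pseudocode 1 and slicing out its payloads. It makes these assumptions:

- Bits are packed MSB first.
- Each 16-bit length field gives the byte count N1, N2, or N3 of the corresponding payload.
- As for 3.1.2 and 5.1.2, a 1-bit existence flag precedes the 7.1.4 CER values.
- discrete_audio_exist and continuous_audio_exist come from the already-parsed Metadata Header.

The function and key names are illustrative.

```python
def parse_metadata_audio_packet(data: bytes, discrete_audio_exist: bool,
                                continuous_audio_exist: bool) -> dict:
    """Split a Metadata Audio Packet into coding_type, CER values and payloads."""
    bits = "".join(f"{b:08b}" for b in data)
    pos = 0

    def read(n: int) -> int:
        nonlocal pos
        value = int(bits[pos:pos + n], 2)
        pos += n
        return value

    packet = {"coding_type": read(8)}
    # (layout, number of 8-bit CER values) as listed in Pseudocode 1
    for layout, count in (("3.1.2", 2), ("5.1.2", 4), ("7.1.4", 4)):
        if read(1):  # cancelation error ratio exist flag
            packet[f"cer_{layout}"] = [read(8) for _ in range(count)]
    pos = (pos + 7) // 8 * 8  # zero-bit padding for byte alignment
    n1 = n2 = n3 = 0
    if discrete_audio_exist:
        n1, n2 = read(16), read(16)
    if continuous_audio_exist:
        n3 = read(16)
    offset = pos // 8  # payloads start on a byte boundary
    packet["base_channel_audio_data"] = data[offset:offset + n1]
    offset += n1
    packet["dependent_channel_audio_data"] = data[offset:offset + n2]
    offset += n2
    packet["continuous_channel_audio_data"] = data[offset:offset + n3]
    return packet


def _pack(bit_string: str) -> bytes:
    """Example helper: pad a bit string to whole bytes (MSB first)."""
    bit_string += "0" * (-len(bit_string) % 8)
    return int(bit_string, 2).to_bytes(len(bit_string) // 8, "big")


# coding_type 0x01, 3.1.2 CER values (10, 20), no 5.1.2/7.1.4 CER, N1=3, N2=2.
example = (_pack("00000001" + "1" + f"{10:08b}{20:08b}" + "0" + "0")
           + _pack(f"{3:016b}{2:016b}") + b"ABCde")
packet = parse_metadata_audio_packet(example, discrete_audio_exist=True,
                                     continuous_audio_exist=False)
print(packet)
```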

FIG. 11 is a diagram for describing an audio encoding apparatus according to an embodiment.

The audio encoding apparatuses 200 and 400 may include a demixing unit 1105, an audio signal classification unit 1110, a compression unit 1115, a decompression unit 1120, and a metadata generation unit 1130.

The demixing unit 1105 may demix the original audio signal to obtain an audio signal of a lower channel layout. Here, the original audio signal may be an audio signal of a 7.1.4 channel layout, and the audio signal of the lower channel layout may be an audio signal of a 3.1.2 channel layout.

The audio signal classification unit 1110 may classify, from the audio signals of at least one channel layout, the audio signals to be used for compression. Here, the mixing unit 1113 may mix the audio signals of some channels to generate the audio signal of a mixed channel, and the audio signal classification unit 1110 may output that mixed-channel audio signal.

For example, the mixing unit 1113 may mix C_1, the center-channel signal of the 3.1.2 channel layout, into the audio signals L3 and R3 of the 3.1.2 channel layout, thereby generating the audio signals A and B of new mixed channels. C_1 may be the center-channel signal C of the 3.1.2 channel layout after it has been compressed and then decompressed.

That is, the center-channel signal C of the 3.1.2 channel layout may be classified as the T signal. The second compression unit 1117 of the compression unit 1115 may compress the T signal to obtain a T compressed audio signal, and the decompression unit 1120 may decompress the T compressed audio signal to obtain C_1.

The compression unit 1115 may compress the audio signals of at least one channel classified by the audio signal classification unit 1110. The compression unit 1115 may include a first compression unit 1116, a second compression unit 1117, and a third compression unit 1118. The first compression unit 1116 may compress the audio signals A and B of the base channel group and generate a base channel audio stream 1142 including the compressed audio signals A and B. The second compression unit 1117 may compress the audio signals T, P, Q1, and Q2 of the first dependent channel group to generate a dependent channel audio stream 1144 including the compressed audio signals T, P, Q1, and Q2.

The third compression unit 1118 may compress the audio signals S1, S2, U1, U2, V1, and V2 of the second dependent channel group to generate a dependent channel audio stream 1144 including the compressed audio signals S1, S2, U1, U2, V1, and V2.

Here, the L, R, C, Lfe, Ls, Rs, Hfl, and Hfr channels are the channels of the 7.1.4 channel layout that are close to the screen. Classifying their audio signals as the audio signals S1, S2, U1, U2, V1, and V2 and compressing them may improve the sound quality of the screen-centered audio channels.

The metadata generation unit 1130 may generate metadata including additional information based on at least one of the audio signal and the compressed audio signal. The audio signal may include the original audio signal and an audio signal of a lower channel layout generated by downmixing the original audio signal. The metadata may be included in the metadata header 1146 of the bitstream 1140.

The mixing unit 1113 could generate the A and B signals by mixing the uncompressed audio signal C with L3 and R3. However, when the audio decoding apparatuses 300 and 500 obtained L3_1 and R3_1 by demixing audio signals A and B into which the uncompressed C had been mixed, the sound quality was considerably lower than that of the original audio signals L3 and R3.

Instead of C, the mixing unit 1113 may therefore generate the A and B signals by mixing C_1, the audio signal obtained by compressing and then decompressing C. In this case, the audio decoding apparatuses 300 and 500 generate L3_1 and R3_1 by demixing audio signals A and B into which C_1 has been mixed. The decoder then removes exactly the C_1 signal that the encoder mixed in, so the resulting L3_1 and R3_1 may have better sound quality than the L3_1 and R3_1 obtained when C was mixed.
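The benefit of mixing C_1 instead of C can be checked numerically with the hedged sketch below. It stands in for the real codecs with uniform quantization: a coarse step for the T (center) path and a fine step for the A/B path. It also uses a hypothetical mixing weight W = 0.707. Neither the weight nor the codecs come from the text; the sketch only illustrates why subtracting the same decoded center signal that was mixed in keeps the center-channel coding error out of L3_1.

```python
import math
from typing import List

W = 0.707  # hypothetical mixing weight for the center signal


def codec(x: List[float], step: float) -> List[float]:
    """Stand-in for a lossy compress/decompress round trip: uniform quantization."""
    return [round(v / step) * step for v in x]


def mix(side: List[float], center: List[float]) -> List[float]:
    """Encoder side: A = L3 + W * center (B is formed the same way from R3)."""
    return [s + W * c for s, c in zip(side, center)]


def demix(mixed: List[float], c_1: List[float]) -> List[float]:
    """Decoder side: L3_1 = A_1 - W * C_1, using the decoded center C_1."""
    return [m - W * c for m, c in zip(mixed, c_1)]


def max_err(x: List[float], y: List[float]) -> float:
    return max(abs(p - q) for p, q in zip(x, y))


n = 64
l3 = [math.sin(0.3 * i) for i in range(n)]
c = [0.8 * math.sin(0.11 * i + 1.0) for i in range(n)]
c_1 = codec(c, 0.25)  # C after a coarse compress/decompress round trip

# Open loop: the encoder mixes the original C.
err_open = max_err(demix(codec(mix(l3, c), 0.01), c_1), l3)
# Closed loop: the encoder mixes C_1, the same signal the decoder subtracts.
err_closed = max_err(demix(codec(mix(l3, c_1), 0.01), c_1), l3)
print(f"open loop: {err_open:.4f}, closed loop: {err_closed:.4f}")
```

In the closed-loop case, the only remaining error is the A-path quantization error, bounded here by half the fine step. In the open-loop case, W times the center-channel coding error is added on top.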

FIG. 12 is a diagram for describing a metadata generation unit according to an embodiment.

Referring to FIG. 12, the metadata generation unit 1200 may receive the original audio signal, the compressed audio signals A/B, and the compressed audio signals T/P/Q and S/U/V as inputs. From these, it may generate metadata 1250 such as factor information for error removal.

The decompression unit 1210 may decompress the compressed audio signals A/B, T/P/Q, and S/U/V. The upmixing unit 1215 may demix some of the audio signals A/B, T/P/Q, and S/U/V to restore an audio signal of a lower channel layout of the original channel audio signal. For example, an audio signal of a 5.1.4 channel layout may be restored.

The downmixing unit 1220 may mix the original audio signal to generate an audio signal of a lower channel layout. The generated signal may have the same channel layout as the audio signal restored by the upmixing unit 1215.

The RMS measurement unit 1230 may measure the RMS value of the audio signal of each upmix channel restored by the upmixing unit 1215. The RMS measurement unit 1230 may also measure the RMS value of the audio signal of each channel generated by the downmixing unit 1220.

For each channel, the RMS comparison unit 1235 may compare, one-to-one, the RMS value of the upmix-channel audio signal restored by the upmixing unit 1215 with the RMS value of the corresponding channel generated by the downmixing unit 1220. From this comparison it may generate an error-removal factor value for each upmix channel.

The metadata generation unit 1200 may generate metadata 1250 including the error-removal factor value of each upmix channel.
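The following is a minimal sketch of the per-channel RMS comparison and of applying the resulting factor on the decoder side (as in generating L3_3 from L3_2). It assumes that the "comparison" is the ratio RMS(downmixed reference) / RMS(restored upmix channel) and that the factor is applied as a plain gain. The text only says the two RMS values are compared one-to-one, so both the ratio form and the function names are assumptions. Quantization of the factor into the metadata is not shown.

```python
import math
from typing import Dict, List


def rms(x: List[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def error_removal_factors(reference: Dict[str, List[float]],
                          reconstructed: Dict[str, List[float]],
                          eps: float = 1e-12) -> Dict[str, float]:
    """Per-channel factor = RMS(downmixed original) / RMS(restored upmix channel)."""
    return {ch: rms(reference[ch]) / max(rms(reconstructed[ch]), eps)
            for ch in reference}


def apply_factors(reconstructed: Dict[str, List[float]],
                  factors: Dict[str, float]) -> Dict[str, List[float]]:
    """Decoder side: scale each restored channel by its factor (1.0 if none is sent)."""
    return {ch: [factors.get(ch, 1.0) * v for v in sig]
            for ch, sig in reconstructed.items()}


reference = {"L3": [1.0, -1.0, 1.0, -1.0]}       # from the downmixing unit
reconstructed = {"L3": [0.5, -0.5, 0.5, -0.5]}  # from the upmixing unit
factors = error_removal_factors(reference, reconstructed)
print(factors)                                  # {'L3': 2.0}
print(apply_factors(reconstructed, factors))    # {'L3': [1.0, -1.0, 1.0, -1.0]}
```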

The speech detection unit 1240 may identify whether speech is present in the center-channel audio signal C included in the original audio signal. The metadata generation unit 1200 may generate metadata 1250 including speech presence information based on the identification result of the speech detection unit 1240.

The speech measurement unit 1242 may measure the norm value of the speech in the center-channel audio signal C included in the original audio signal. The metadata generation unit 1200 may generate metadata 1250 including speech norm information based on the measurement result of the speech measurement unit 1242.

The LFE detection unit 1244 may detect a low-frequency effect in the LFE-channel audio signal included in the original audio signal. The metadata generation unit 1200 may generate metadata 1250 including LFE presence information based on the detection result of the LFE detection unit 1244.

The LFE amplitude measurement unit 1246 may measure the amplitude of the LFE-channel audio signal included in the original audio signal. The metadata generation unit 1200 may generate metadata 1250 including LFE gain information based on the measurement result of the LFE amplitude measurement unit 1246.

FIG. 13 is a diagram for describing an audio decoding apparatus according to an embodiment.

Referring to FIG. 13, the audio decoding apparatuses 300 and 500 may receive a bitstream 1300 as input and restore an audio signal of at least one channel layout.

The first decompression unit 1305 may decompress the compressed audio signal of the base channel audio 1301 included in the bitstream to restore the A_1 (L2_1) and B_1 (R2_1) signals. The two-channel audio rendering unit 1320 may restore the audio signals L2_1 and R2_1 of the two-channel (stereo) layout based on the restored A_1 and B_1 signals.

The second decompression unit 1310 may decompress the compressed audio signal of the dependent channel audio 1302 included in the bitstream to restore the C_1, LFE_1, Hfl3_1, and Hfr3_1 signals.

The audio decoding apparatuses 300 and 500 may generate the L3_2 signal by demixing the C_1 and A_1 signals, and the R3_2 signal by demixing the C_1 and B_1 signals.

The 3.1.2 channel audio rendering unit 1325 may receive the L3_2, R3_2, C_1, LFE_1, Hfl3_1, and Hfr3_1 signals as inputs and output the audio signal of the 3.1.2 channel layout. The 3.1.2 channel audio rendering unit 1325 may restore this audio signal based on the metadata included in the metadata header 1303.

The third decompression unit 1315 may decompress the compressed audio signal of the dependent channel audio 1302 included in the bitstream 1300 to restore the L_1 and R_1 signals.

The audio decoding apparatuses 300 and 500 may generate the Ls5_2 signal by demixing the L3_2 and L_1 signals.

The audio decoding apparatuses 300 and 500 may generate the Rs5_2 signal by demixing the R3_2 and R_1 signals.

The audio decoding apparatuses 300 and 500 may generate the Hl5_2 signal by demixing the Hfl3_1 and Ls5_2 signals, and the Hr5_2 signal by demixing the Hfr3_1 and Rs5_2 signals.

The 5.1.2 channel audio rendering unit 1330 may receive the C_1, LFE_1, L_1, R_1, Ls5_2, Rs5_2, Hl5_2, and Hr5_2 signals as inputs and output the audio signal of the 5.1.2 channel layout. The 5.1.2 channel audio rendering unit 1330 may restore this audio signal based on the metadata included in the metadata header 1303.

The third decompression unit 1315 may decompress the compressed audio signal of the dependent channel audio 1302 included in the bitstream to restore the Ls_1, Rs_1, Hfl_1, and Hfr_1 signals.

The audio decoding apparatuses 300 and 500 may generate:

- the Lb_2 signal by demixing the Ls5_2 and Ls_1 signals;
- the Rb_2 signal by demixing the Rs5_2 and Rs_1 signals;
- the Hbl_2 signal by demixing the Hl5_2 and Hfl_1 signals;
- the Hbr_2 signal by demixing the Hr5_2 and Hfr_1 signals.

The 7.1.4-channel audio rendering unit 1335 may receive the L_1, R_1, C_1, LFE_2, Ls_1, Rs_1, Hfl_1, Hfr_1, Lb_2, Rb_2, Hbl_2, and Hbr_2 signals as inputs and output an audio signal of the 7.1.4-channel layout.

The 7.1.4-channel audio rendering unit 1335 may restore the audio signal of the 7.1.4-channel layout based on the metadata included in the metadata header 1303.

FIG. 14 is a diagram for describing a 3.1.2-channel audio rendering unit 1410, a 5.1.2-channel audio rendering unit 1420, and a 7.1.4-channel audio rendering unit 1430, according to an embodiment.

Referring to FIG. 14, the 3.1.2-channel audio rendering unit 1410 may generate the L3_3 signal by using the L3_2 signal and the factor for removing the error of L3_2 (error removal factor, ERF) included in the metadata. The 3.1.2-channel audio rendering unit 1410 may generate the R3_3 signal by using the R3_2 signal and the factor for removing the error of R3_2 included in the metadata.

The 3.1.2-channel audio rendering unit 1410 may generate the LFE_2 signal by using the LFE_1 signal and the LFE gain included in the metadata.

The 3.1.2-channel audio rendering unit 1410 may restore a 3.1.2-channel audio signal including the L3_3, R3_3, C_1, LFE_2, Hfl3_1, and Hfr3_1 signals.

The 5.1.2-channel audio rendering unit 1420 may generate the Ls5_3 signal by using the Ls5_2 signal and the factor for removing the error of Ls5_2 included in the metadata.

The 5.1.2-channel audio rendering unit 1420 may generate the Rs5_3 signal by using the Rs5_2 signal and the factor for removing the error of Rs5_2 included in the metadata. It may generate the Hl5_3 signal by using the Hl5_2 signal and the factor for removing the error of Hl5_2 included in the metadata, and may generate the Hr5_3 signal by using the Hr5_2 signal and the factor for removing the error of Hr5_2.

The 5.1.2-channel audio rendering unit 1420 may restore a 5.1.2-channel audio signal including the Ls5_3, Rs5_3, Hl5_3, Hr5_3, L_1, R_1, C_1, and LFE_2 signals.

Meanwhile, the 7.1.4-channel audio rendering unit 1430 may generate the Lb_3 signal by using the Lb_2 signal and the factor for removing the error of Lb_2.

The 7.1.4-channel audio rendering unit 1430 may generate the Rb_3 signal by using the Rb_2 signal and the factor for removing the error of Rb_2.

The 7.1.4-channel audio rendering unit 1430 may generate the Hbl_3 signal by using the Hbl_2 signal and the factor for removing the error of Hbl_2.

The 7.1.4-channel audio rendering unit 1430 may generate the Hbr_3 signal by using the Hbr_2 signal and the factor for removing the error of Hbr_2.

The 7.1.4-channel audio rendering unit 1430 may restore a 7.1.4-channel audio signal including the Lb_3, Rb_3, Hbl_3, Hbr_3, L_1, R_1, C_1, LFE_2, Ls_1, Rs_1, Hfl_1, and Hfr_1 signals.
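The FIG. 14 renderers share one pattern: a demixed signal with a _2 suffix is multiplied by its factor for error removal to give the corresponding _3 signal, the LFE_1 signal is multiplied by the LFE gain to give LFE_2, and the remaining signals pass through unchanged. A minimal sketch of that pattern, assuming the metadata has already been parsed into plain Python dictionaries (the function name and dictionary layout are illustrative, not from the source):

```python
import numpy as np


def render_layout(signals: dict, erf: dict, lfe_gain: float) -> dict:
    """Apply per-channel factors for error removal and the LFE gain.

    signals: name -> samples, e.g. {"Lb_2": ..., "LFE_1": ..., "L_1": ...}
    erf:     name -> factor for error removal, e.g. {"Lb_2": 0.5}
    """
    out = {}
    for name, x in signals.items():
        if name == "LFE_1":
            out["LFE_2"] = lfe_gain * x           # LFE gain from metadata
        elif name in erf:
            base = name.rsplit("_", 1)[0]          # "Lb_2" -> "Lb"
            out[f"{base}_3"] = erf[name] * x       # error-removal scaling
        else:
            out[name] = x                          # passed through unchanged
    return out


frame = np.ones(4)
rendered = render_layout(
    {"L_1": frame, "LFE_1": frame, "Lb_2": frame, "Hbl_2": 2 * frame},
    erf={"Lb_2": 0.5, "Hbl_2": 1.0},
    lfe_gain=0.8,
)
# rendered keys: L_1, LFE_2, Lb_3, Hbl_3
```

The same function serves all three layouts; only the set of names that carry a factor changes (for example L3_2 and R3_2 for 3.1.2, or Ls5_2, Rs5_2, Hl5_2, Hr5_2 for 5.1.2).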

FIG. 15A is a flowchart illustrating a process in which the audio encoding apparatus 400 determines a factor for error removal, according to an embodiment.

In step S1502, the audio encoding apparatus 400 may determine whether the original signal strength of the first audio signal is less than a predetermined first value. Here, the original signal strength may mean the signal strength of the original audio signal or the signal strength of an audio signal downmixed from the original audio signal. That is, the first audio signal may be the audio signal of at least some channels of the original audio signal or of the audio signal downmixed from it.

In step S1504, when the original signal strength of the first audio signal is less than the predetermined first value (Yes), the audio encoding apparatus 400 may set the factor for error removal for the first audio signal to 0.

In step S1506, when the original signal strength of the first audio signal is greater than or equal to the predetermined first value (No), the audio encoding apparatus 400 may determine whether the ratio of the original signal strength of the first audio signal to that of the second audio signal is less than a predetermined second value.

In step S1508, when the signal strength ratio of the first audio signal to the second audio signal is less than the predetermined second value (Yes), the audio encoding apparatus 400 may determine the factor for error removal based on the original signal strength of the first audio signal and the signal strength of the first audio signal after decoding.

In step S1510, the audio encoding apparatus 400 may determine whether the value of the factor for error removal is greater than 1.

In step S1512, when the signal strength ratio of the first audio signal to the second audio signal is greater than or equal to the predetermined second value (No), the audio encoding apparatus 400 may set the factor for error removal for the first audio signal to 1.

Likewise, when the value of the factor for error removal is greater than 1 in step S1510 (Yes), the factor for error removal for the first audio signal may be set to 1.
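The FIG. 15A decision can be written as a small function. This is a sketch under stated assumptions: the thresholds (the "first value" and "second value") are left as parameters because the flowchart does not fix them, strengths are taken as linear magnitudes such as per-frame RMS values, and the step S1508 factor is taken as the ratio of original to decoded strength, which is the form FIG. 15B uses.

```python
def error_removal_factor(orig: float, orig_other: float, decoded: float,
                         first_value: float, second_value: float) -> float:
    """Return the factor for error removal for one frame of the first audio signal.

    orig:       original strength of the first audio signal
    orig_other: original strength of the second audio signal
    decoded:    strength of the first audio signal after decoding
    """
    # S1502 -> S1504: (near-)silent original, so suppress the decoded signal.
    if orig < first_value:
        return 0.0
    # S1506 -> S1512: ratio not small, so no masking error is assumed.
    # (A zero reference strength is treated as "not small" in this sketch.)
    if orig_other <= 0 or orig / orig_other >= second_value:
        return 1.0
    # S1508: ratio of original to decoded strength.
    if decoded <= 0:
        return 1.0  # guard added for this sketch: nothing to rescale
    factor = orig / decoded
    # S1510 -> S1512: never amplify the decoded signal (and the noise in it).
    return min(factor, 1.0)
```

For example, a weak first signal (strength 0.01) next to a strong second signal (1.0) whose decoded version came out at 0.05 yields a factor of about 0.2, while the same case with a decoded strength below the original is clamped to 1.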

FIG. 15B is a flowchart illustrating a process in which the audio encoding apparatus 400 determines the scale factor of the Ls5 signal, according to an embodiment.

Referring to FIG. 15B, in step S1514, the audio encoding apparatus 400 may determine whether the power of the Ls5 signal, 20log(RMS(Ls5)), is less than -80 dB. Here, the RMS value may be calculated on a frame basis. For example, one frame may include 960 samples of the audio signal, but the frame length is not limited thereto, and one frame may include a different number of samples. The RMS value RMS(X) of a signal X may be calculated by Equation 1 below, where N is the number of samples in the frame and x_n is the n-th sample.
$$\mathrm{RMS}(X) = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x_n^{2}}$$
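Equation 1 and the dB power used in steps S1514 and S1518 translate directly into code. A sketch with NumPy; the small floor eps is an assumption added here so that a silent frame maps to a very low dB value instead of minus infinity:

```python
import numpy as np


def frame_rms(x: np.ndarray) -> float:
    """Equation 1: square root of the mean of the squared samples of one frame."""
    return float(np.sqrt(np.mean(np.square(x))))


def power_db(x: np.ndarray, eps: float = 1e-12) -> float:
    """Frame power as used in FIG. 15B: 20*log10(RMS(X))."""
    return float(20.0 * np.log10(max(frame_rms(x), eps)))


frame = np.full(960, 0.1)   # one 960-sample frame of constant amplitude 0.1
print(round(power_db(frame), 2))  # -20.0
```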

In step S1516, when the power of the Ls5 signal is less than -80 dB, the audio encoding apparatus 400 may set the factor for error removal for the Ls5_2 signal to 0.

In step S1518, the audio encoding apparatus 400 may determine whether the ratio of the power of the Ls5 signal to the power of the L3 signal for one frame, 20log(RMS(Ls5)/RMS(L3)), is less than -6 dB.

In step S1520, when the ratio 20log(RMS(Ls5)/RMS(L3)) for one frame is less than -6 dB (Yes), the audio encoding apparatus 400 may generate the L3_2 signal. Specifically, the audio encoding apparatus 400 may downmix the original audio signal to obtain the C and L2 signals, compress them, and then decompress the compressed C and L2 signals to obtain the C_1 and L2_1 signals. The audio encoding apparatus 400 may generate the L3_2 signal by demixing the C_1 and L2_1 signals.

In step S1522, the audio encoding apparatus 400 may decompress the compressed L signal to obtain the L_1 signal.

In step S1524, the audio encoding apparatus 400 may generate the Ls5_2 signal based on the L3_2 signal and the L_1 signal.

In step S1526, the audio encoding apparatus 400 may determine the factor for error removal, RMS(Ls5)/RMS(Ls5_2), based on the RMS value of Ls5 (RMS(Ls5)) and the RMS value of Ls5_2 (RMS(Ls5_2)).

In step S1528, the audio encoding apparatus 400 may determine whether the value of the factor for error removal is greater than 1.

In step S1530, if the value of the factor for error removal is greater than 1 (Yes), the audio encoding apparatus 400 may set the factor for error removal to 1.

In step S1532, the audio encoding apparatus 400 may store and output the factor for error removal of the Ls5_2 signal. The audio encoding apparatus 400 may generate information related to error removal, including information on the factor for error removal, and may generate additional information including the information related to error removal. The audio encoding apparatus 400 may then generate and output a bitstream including the additional information.
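Putting steps S1514 to S1532 together over a sequence of frames gives the sketch below. It assumes the encoder has already produced the decoder-side Ls5_2 signal (steps S1520 to S1524, which need the actual codec and demixing rules) and simply receives it as an array. Where the flowchart leaves the S1518 "No" branch implicit, the sketch follows FIG. 15A and uses a factor of 1; a trailing partial frame is ignored for simplicity.

```python
import numpy as np

FRAME = 960          # example frame length from the description
SILENCE_DB = -80.0   # S1514 threshold
RATIO_DB = -6.0      # S1518 threshold


def _rms(x: np.ndarray) -> float:
    return float(np.sqrt(np.mean(np.square(x))))


def _db(value: float, eps: float = 1e-12) -> float:
    return float(20.0 * np.log10(max(value, eps)))


def ls5_factors(ls5: np.ndarray, l3: np.ndarray, ls5_2: np.ndarray,
                frame: int = FRAME) -> list:
    """Per-frame factor for error removal for Ls5_2 (FIG. 15B)."""
    factors = []
    for start in range(0, len(ls5) - frame + 1, frame):
        seg = slice(start, start + frame)
        r_ls5, r_l3, r_dec = _rms(ls5[seg]), _rms(l3[seg]), _rms(ls5_2[seg])
        if _db(r_ls5) < SILENCE_DB:                  # S1514 -> S1516
            factors.append(0.0)
        elif _db(r_ls5) - _db(r_l3) < RATIO_DB:      # S1518 -> S1526
            factor = r_ls5 / r_dec if r_dec > 0 else 1.0
            factors.append(min(factor, 1.0))        # S1528 -> S1530
        else:                                        # ratio >= -6 dB
            factors.append(1.0)
    return factors                                   # S1532: to side information
```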

FIG. 15C is a flowchart illustrating a process in which the audio decoding apparatus 500 generates the Ls5_3 signal based on the factor for error removal, according to an embodiment.

In step S1535, the audio decoding apparatus 500 may generate the L3_2 signal.

For example, the audio decoding apparatus 500 may decompress the compressed C and L2 signals to obtain the C_1 and L2_1 signals, and may generate the L3_2 signal by demixing the C_1 and L2_1 signals.

In step S1540, the audio decoding apparatus 500 may decompress the compressed L signal to obtain the L_1 signal.

In step S1545, the audio decoding apparatus 500 may generate the Ls5_2 signal based on the L3_2 signal and the L_1 signal, that is, by demixing the L3_2 and L_1 signals.

In step S1550, the audio decoding apparatus 500 may obtain the factor for error removal for the Ls5_2 signal.

In step S1555, the audio decoding apparatus 500 may generate the Ls5_3 signal by applying the factor for error removal to the Ls5_2 signal. The resulting Ls5_3 signal has an RMS value equal to the RMS value of Ls5_2 multiplied by the factor for error removal, that is, approximately the RMS value of Ls5.
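Step S1555 is a per-frame multiplication. Because RMS is homogeneous, RMS(a·x) = |a|·RMS(x), so multiplying Ls5_2 by RMS(Ls5)/RMS(Ls5_2) gives a signal whose RMS matches that of Ls5. A minimal sketch with synthetic data (the signal levels and the random frames are illustrative only):

```python
import numpy as np


def frame_rms(x: np.ndarray) -> float:
    return float(np.sqrt(np.mean(np.square(x))))


def apply_factor(ls5_2: np.ndarray, factor: float) -> np.ndarray:
    """S1555: Ls5_3 = factor * Ls5_2 for one frame."""
    return factor * ls5_2


rng = np.random.default_rng(1)
ls5 = 0.01 * rng.standard_normal(960)     # weak (masked) original frame
ls5_2 = 0.05 * rng.standard_normal(960)   # demixed frame carrying leaked noise

# Computed on the encoder side as in FIG. 15B and sent as side information.
factor = min(frame_rms(ls5) / frame_rms(ls5_2), 1.0)
ls5_3 = apply_factor(ls5_2, factor)
```

Note that only the frame energy is corrected: the waveform of Ls5_3 is still the scaled Ls5_2, so the leaked masker noise becomes quieter rather than disappearing.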

When audio signals of a plurality of audio channels are mixed and the mixed-channel audio signal is lossy-coded, errors may be introduced into the audio signal. Specifically, a coding error may occur during the quantization of the audio signal.

More specifically, a coding error may occur in the encoding process (quantization) when a model based on psychoacoustic properties is used. For example, when a strong sound and a weak sound occur simultaneously at adjacent frequencies, masking occurs: the listener cannot hear the weak sound. That is, the strong interfering sound at an adjacent frequency raises the threshold of audibility of the weak target sound.

Accordingly, when the audio encoding apparatus 400 quantizes the band of the weak sound using the psychoacoustic model, the audio signal in that band may not be encoded.

For example, suppose a masked sound (weak sound) is present in the Ls5 signal and a masker sound (strong sound) is present in the L signal. Because of the masking property, the L3_2 signal may then be a signal from which the masked sound has been substantially removed, relative to the signal in which the masked and masker sounds were mixed (the L3 signal).

When the Ls5_2 signal is then generated by demixing the L3_2 and L_1 signals, the Ls5_2 signal may contain a very low-energy component of the masker sound in the form of noise, caused by the coding error that results from masking.

The masker sound included in the Ls5_2 signal has much less energy than the original masker sound, but may still have more energy than the masked sound. In that case, the Ls5_2 channel, which should reproduce the masked sound, may instead reproduce a masker sound of greater energy. Therefore, to reduce noise in the Ls5_2 channel, the error due to lossy coding may be removed by scaling the Ls5_2 signal so that it has the same signal strength as the Ls5 signal containing the masked sound. The factor used for this scaling (the scale factor) may serve as the factor for error removal. The factor for error removal is expressed as the ratio of the original signal strength of the audio signal to its signal strength after decoding, and the audio decoding apparatus 500 may restore an audio signal having the same signal strength as the original by scaling the decoded signal with the scale factor.

Accordingly, because the energy of the masker sound output as noise on a particular channel is reduced, the listener can expect improved sound quality.

Meanwhile, the original signal strengths of the masked sound and the masker sound may be compared. When the ratio of the signal strength of the masked sound to that of the masker sound is less than a predetermined value, it may be determined that a coding error due to masking occurs, and the factor for error removal may be set to a value between 0 and 1. Specifically, the factor may be determined as the ratio of the original signal strength to the signal strength after decoding. However, if that ratio is greater than 1, the factor may be set to 1: a factor greater than 1 would increase the energy of the decoded signal, and since the decoded signal contains the masker sound inserted as noise, increasing its energy could make the noise louder.

Therefore, in such a case, the factor for error removal may be set to 1 so that the current energy of the decoded signal is maintained.

If the ratio of the signal strength of the masked sound to that of the masker sound is greater than or equal to the predetermined value, it may be determined that no coding error due to masking occurs, and the factor for error removal may be set to 1 to maintain the current energy of the decoded signal.

Accordingly, the audio encoding apparatus 200 may generate the factor for error removal based on the signal strength of the audio signal and transmit information about the factor to the audio decoding apparatus 300. Based on that information, the audio decoding apparatus 300 may apply the factor for error removal to the audio signal of the upmix channel, thereby reducing the energy of the noise-like masker sound to match the energy of the masked target sound.

FIG. 16A is a diagram for describing the configuration of a bitstream for channel layout extension, according to an embodiment.

Referring to FIG. 16A, a bitstream 1600 may include a base channel audio stream 1605, a dependent channel audio stream #1 1610, and a dependent channel audio stream #2 1615. The base channel audio stream 1605 may include an A signal and a B signal. The audio decoding apparatuses 300 and 500 may decompress the A and B signals included in the base channel audio stream 1605 and, based on the decompressed A and B signals, restore an audio signal of the 2-channel layout (L2 and R2 signals).

The dependent channel audio stream #1 1610 may include the audio signals (T, P, Q1, and Q2) of the remaining four channels of the 3.1.2 layout other than the two channels already restored. The audio decoding apparatuses 300 and 500 may decompress the audio signals T, P, Q1, and Q2 included in the dependent channel audio stream #1 1610 and, based on these and the previously decompressed A and B signals, restore an audio signal of the 3.1.2-channel layout (L3, R3, C, LFE, Hfl3, and Hfr3 signals).

Additionally, the dependent channel audio stream #2 1615 may include the audio signals (S1, S2, U1, U2, V1, and V2) of the remaining six channels of the 7.1.4 layout other than the restored 3.1.2 channels. Based on the audio signals S1, S2, U1, U2, V1, and V2 included in the dependent channel audio stream #2 1615 and the previously restored audio signal of the 3.1.2-channel layout, the audio decoding apparatuses 300 and 500 may restore an audio signal of the 5.1.2-channel layout (L5, R5, Ls5, Rs5, C, LFE, Hl5, and Hr5 signals).

As described above, the dependent channel audio stream #2 1615 may include audio signals of discrete channels. To extend the number of channels, one audio signal per added channel may be compressed and included in the dependent channel audio stream #2 1615. Therefore, the further the number of channels is extended, the larger the amount of data in the dependent channel audio stream #2 1615 may become.
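The cost of discrete extension can be tallied from the FIG. 16A description, since each substream adds one coded signal per added channel. A small sketch that records the substreams and accumulates the channel count (the dataclass is only an illustration of the layering, not a bitstream syntax):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substream:
    name: str
    signals: tuple  # coded signals carried by this substream


FIG_16A = (
    Substream("base channel audio stream 1605", ("A", "B")),
    Substream("dependent channel audio stream #1 1610", ("T", "P", "Q1", "Q2")),
    Substream("dependent channel audio stream #2 1615",
              ("S1", "S2", "U1", "U2", "V1", "V2")),
)


def cumulative_channels(streams) -> list:
    """Total number of coded signals after decoding each substream in order."""
    total, counts = 0, []
    for s in streams:
        total += len(s.signals)
        counts.append(total)
    return counts


print(cumulative_channels(FIG_16A))  # [2, 6, 12]
```

The counts 2, 6 and 12 match the 2-channel, 3.1.2 (3 + 1 + 2) and 7.1.4 (7 + 1 + 4) layouts; any further discrete extension keeps adding one coded signal per channel.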

FIG. 16B is a diagram for describing the configuration of a bitstream for channel layout extension, according to another embodiment.

Referring to FIG. 16B, a bitstream 1620 may include a base channel audio stream 1625, a dependent channel audio stream #1 1630, and a dependent channel audio stream #2 1635.

Unlike the dependent channel audio stream #2 1615 of FIG. 16A, the dependent channel audio stream #2 1635 of FIG. 16B may include the audio signals of the W, X, Y, and Z channels, which together form an ambisonic audio signal. An ambisonic audio signal is an audio stream of continuous channels: however many channels the layout is extended to, it can still be represented by the audio signals of the W, X, Y, and Z channels. Therefore, when the number of channels is greatly extended or audio signals of various channel layouts are to be restored, the dependent channel audio stream #2 1635 may include an ambisonic audio signal. As described above, the audio encoding apparatuses 200 and 400 may generate additional information including information indicating whether an audio stream of discrete channels (the dependent channel audio stream #2 1615 of FIG. 16A) is present and information indicating whether an audio stream of continuous channels (the dependent channel audio stream #2 1635 of FIG. 16B) is present. Accordingly, the audio encoding apparatuses 200 and 400 may selectively generate various types of bitstreams in consideration of the degree to which the number of channels is extended.
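One way to picture the choice recorded in the additional information is to compare coded-signal counts: a discrete stream needs one signal per added channel, while the first-order ambisonic stream needs the four W, X, Y, Z signals regardless of the target layout. The policy and field names below are hypothetical, chosen only to illustrate the two presence indicators; the text does not specify how the encoder decides.

```python
from dataclasses import dataclass

FOA_SIGNALS = 4  # W, X, Y, Z


@dataclass
class SideInfo:
    # Hypothetical names for the two presence indicators in the additional information.
    discrete_stream_present: bool    # FIG. 16A-style dependent stream #2
    ambisonic_stream_present: bool   # FIG. 16B-style dependent stream #2


def choose_extension(added_channels: int) -> SideInfo:
    """Hypothetical policy: pick whichever extension needs fewer coded signals."""
    use_ambisonic = added_channels > FOA_SIGNALS
    return SideInfo(discrete_stream_present=not use_ambisonic,
                    ambisonic_stream_present=use_ambisonic)


print(choose_extension(2))   # discrete stream: 2 coded signals, fewer than 4
print(choose_extension(6))   # ambisonic stream: 4 coded signals instead of 6
```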

FIG. 16C is a diagram for describing the configuration of a bitstream for channel layout extension, according to yet another embodiment.

Referring to FIG. 16C, a bitstream 1640 may include a base channel audio stream 1645, a dependent channel audio stream #1 1650, a dependent channel audio stream #2 1655, and a dependent channel audio stream #3 1660. The base channel audio stream 1645, the dependent channel audio stream #1 1650, and the dependent channel audio stream #2 1655 of FIG. 16C are configured identically to the base channel audio stream 1605, the dependent channel audio stream #1 1610, and the dependent channel audio stream #2 1615 of FIG. 16A. Accordingly, the audio decoding apparatuses 300 and 500 may restore an audio signal of the 7.1.4-channel layout based on the base channel audio stream 1645, the dependent channel audio stream #1 1650, and the dependent channel audio stream #2 1655.

Additionally, the audio encoding apparatuses 200 and 400 may generate the bitstream 1640 including the dependent channel audio stream #3 1660, which contains an ambisonic audio signal. From it, an audio signal of a free channel layout, that is, an audio signal independent of any particular channel layout, may be restored, and the restored free-channel-layout audio signal may then be converted into audio signals of various discrete channel layouts.

That is, by additionally generating a bitstream including the dependent channel audio stream #3 1660 containing the ambisonic audio signal, the audio encoding apparatuses 200 and 400 allow audio signals of various channel layouts to be freely restored.

FIG. 17 is a diagram for describing an ambisonic audio signal that is added to an audio signal of the 3.1.2-channel layout to extend the channel layout, according to an embodiment.

The audio encoding apparatuses 200 and 400 may compress an ambisonic audio signal and generate a bitstream including the compressed ambisonic audio signal. The ambisonic audio signal thus makes it possible to extend the channel layout beyond the 3.1.2-channel layout.

Specifically, referring to FIG. 17, the audio signal of the 3.1.2-channel layout consists of the audio signals of channels located in front of the listener 1700. The audio encoding apparatuses 200 and 400 may acquire an ambisonic audio signal, as the audio signal behind the listener, by using an ambisonic audio capture device such as an ambisonic microphone. Alternatively, the audio encoding apparatuses 200 and 400 may obtain the ambisonic audio signal, as the audio signal behind the listener, based on the audio signals of the channels located behind the listener.

For example, the Ls, Rs, Lb, Rb, Hbl, and Hbr signals may be defined by theta, phi, and an audio signal S as in Equation 2 below, where theta and phi are as shown in FIG. 17.

The audio encoding apparatuses 200 and 400 may generate the values of the signals W, X, Y, and Z based on Equation 3 below. Here, N1, N2, N3, and N4 may be normalization coefficients, and S_x = cos(theta) * cos(phi) * S, S_y = sin(theta) * cos(phi) * S, and S_z = sin(phi) * S.
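Equation 3 itself is not reproduced in this text, so the sketch below assumes the usual first-order form: each rear-channel signal S, placed at azimuth theta and elevation phi, contributes N1·S to W and N2·S_x, N3·S_y, N4·S_z to X, Y, Z, and the contributions of several channels are summed. The normalization values (N1 = 1/sqrt(2), N2 = N3 = N4 = 1, as in traditional B-format), the azimuth convention and the example angles are assumptions, not values from the source.

```python
import numpy as np

# Assumed normalization coefficients N1..N4 (traditional B-format style).
N_COEFFS = (1 / np.sqrt(2), 1.0, 1.0, 1.0)


def encode_foa(sources, n=N_COEFFS):
    """Encode (signal, theta, phi) tuples, angles in radians, into W, X, Y, Z."""
    n1, n2, n3, n4 = n
    w = x = y = z = 0.0
    for s, theta, phi in sources:
        s_x = np.cos(theta) * np.cos(phi) * s
        s_y = np.sin(theta) * np.cos(phi) * s
        s_z = np.sin(phi) * s
        w = w + n1 * s
        x = x + n2 * s_x
        y = y + n3 * s_y
        z = z + n4 * s_z
    return w, x, y, z


# Example: a rear-left surround channel (assumed azimuth 110 deg, elevation 0)
# and a top-back-left channel (assumed azimuth 135 deg, elevation 45 deg),
# with azimuth measured counterclockwise from the front.
rng = np.random.default_rng(2)
ls, hbl = rng.standard_normal(960), rng.standard_normal(960)
W, X, Y, Z = encode_foa([(ls, np.radians(110), 0.0),
                         (hbl, np.radians(135), np.radians(45))])
```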

The audio encoding apparatuses 200 and 400 may compress the ambisonic audio signals W, X, Y, and Z and generate a bitstream including the compressed W, X, Y, and Z signals.

The audio decoding apparatuses 300 and 500 may obtain a bitstream including the compressed audio signal of the 3.1.2-channel layout and the compressed ambisonic audio signal, and may generate an audio signal of the 5.1.2-channel layout based on the two.

The audio decoding apparatuses 300 and 500 may generate the audio signals of the channels behind the listener from the compressed ambisonic audio signal according to Equation 4 below.
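Equation 4 is not reproduced in this text, so the following is only a stand-in: a basic first-order "sampling" (projection) decoder, a generic ambisonics technique rather than the patent's own formula. It evaluates 0.5·(W + X·gx + Y·gy + Z·gz) at the direction of a rear loudspeaker. It assumes W, X, Y, Z were encoded with N1 = N2 = N3 = N4 = 1, and the function name `decode_foa` is hypothetical. The actual theta and phi of Ls, Rs, Lb, Rb, Hbl, and Hbr would come from FIG. 17, which is not available here.

```python
import math


def decode_foa(w, x, y, z, theta, phi):
    """Virtual-loudspeaker feed at (theta, phi) from first-order W, X, Y, Z.

    Basic sampling decoder: 0.5 * (W + X*gx + Y*gy + Z*gz), which yields a
    cardioid pickup pattern when W, X, Y, Z use unit normalization.
    """
    gx = math.cos(theta) * math.cos(phi)
    gy = math.sin(theta) * math.cos(phi)
    gz = math.sin(phi)
    return [0.5 * (wi + xi * gx + yi * gy + zi * gz)
            for wi, xi, yi, zi in zip(w, x, y, z)]
```

With this pattern, a source is reproduced at full level by a speaker pointing at it and is cancelled by a speaker pointing in the opposite direction.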

The audio decoding apparatuses 300 and 500 may use the C and LFE signals of the 3.1.2 channel layout as they are for the C and LFE signals of the 5.1.2 channel layout.

The audio decoding apparatuses 300 and 500 may generate the Hl5, Hr5, L, R, Ls5, and Rs5 signals of the 5.1.2 channel layout according to Equation 5.

Likewise, the audio decoding apparatuses 300 and 500 may use the C and LFE signals of the 3.1.2 channel layout as they are for the C and LFE signals of the 7.1.4 channel layout.

The audio decoding apparatuses 300 and 500 may generate the Ls, Rs, Lb, Rb, Hbl, and Hbr signals of the 7.1.4 channel layout by using Ls_1, Rs_1, Lb_1, Rb_1, Hbl_1, and Hbr_1. These signals are obtained from the compressed ambisonic audio signal, which is transmitted in addition to the compressed audio signal of the 3.1.2 channel layout.

The audio decoding apparatuses 300 and 500 may generate the Hfl, Hfr, L, and R signals of the 7.1.4 channel layout according to Equation 6.

In this way, by using the compressed ambisonic audio signal in addition to the compressed audio signal of the 3.1.2 channel layout, the audio decoding apparatuses 300 and 500 may restore the audio signal of a channel layout extended from the 3.1.2 channel layout.

FIG. 18 is a diagram for explaining a process in which the audio decoding apparatus 1800 generates an on-screen object audio signal based on an audio signal of the 3.1.2 channel layout and sound source object information.

The audio encoding apparatuses 200 and 400 may convert a spatial audio signal into an on-screen audio signal based on the sound source object information. Here, the sound source object information may include the mixing level signal of an on-screen object (object_S), the size/shape of the object (object_G), the position of the object (object_L), and the direction of the object (object_V).

The sound source object signal generator 1810 may generate the S, G, V, and L signals from the audio signals W, X, Y, Z, L3, R3, C, LFE, Hfl3, and Hfr3.

The sound source object signal generator 1810 may generate a regenerated signal for the on-screen sound source object based on the sound source object's S, G, V, and L signals (derived from the 3.1.2 channel layout) and the sound source object information.

The remixing unit 1820 may generate remixed object audio signals (on-screen audio signals) S11 to Snm based on the audio signals of the 3.1.2 channel layout (L3, R3, C, LFE, Hfl3, and Hfr3) and the regenerated signal for the on-screen sound source object.

That is, the sound source object signal generator 1810 and the remixing unit 1820 may generate the on-screen audio signal based on the sound source object information according to Equation 8 below.

The audio decoding apparatus 1800 regenerates the signal for the on-screen sound source object from the sound source object information and the S, G, V, and L signals. By remixing this signal with the restored audio signal of the 3.1.2 channel layout, the apparatus can improve the sound image of the on-screen sound source object.

FIG. 19 is a diagram for explaining the order and rules by which the audio encoding apparatuses 200 and 400 according to an embodiment transmit the audio streams within each channel group.

In the scalable format, the transmission order and rules for the audio streams within each channel group may be as follows.

The audio encoding apparatuses 200 and 400 may transmit the coupled streams first and then the uncoupled streams.

The audio encoding apparatuses 200 and 400 may transmit the coupled streams for the surround channels before the coupled streams for the height channels.

The audio encoding apparatuses 200 and 400 may transmit the coupled streams for the front channels before the coupled streams for the side or rear channels.

When transmitting the uncoupled streams, the audio encoding apparatuses 200 and 400 may transmit the stream for the center channel first, followed by the streams for the LFE channel and any other channels. Such an other channel may exist when the basic channel group includes only a mono channel signal. In that case, the other channel may be either the left channel L2 or the right channel R2 of the stereo pair.

In addition, the audio encoding apparatuses 200 and 400 may compress the audio signals of coupled channels as a pair and transmit a coupled stream containing the pair-compressed audio signal. Here, coupled channels are left-right symmetric channels such as L/R, Ls/Rs, Lb/Rb, Hfl/Hfr, and Hbl/Hbr.
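The ordering rules above can be expressed as a single sort key. The following is a minimal sketch in which the `Stream` descriptor and its field names are hypothetical. Coupled streams sort first, surround pairs come before height pairs, and front pairs come before side or rear pairs. Uncoupled streams follow in the order center, then LFE, then anything else. Placing side pairs before rear pairs is an assumption, because the rules above only state that front comes first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """Hypothetical stream descriptor (field names are illustrative)."""
    name: str                # e.g. "L/R" for a coupled pair, "C" for a mono stream
    coupled: bool            # True for a left/right symmetric pair
    height: bool = False     # True for height (top) channels
    position: str = "front"  # "front", "side" or "rear" (coupled streams only)


_POSITION_RANK = {"front": 0, "side": 1, "rear": 2}  # side before rear: assumption
_MONO_RANK = {"C": 0, "LFE": 1}  # any other mono stream (e.g. a lone L2) goes last


def transmission_key(s):
    if s.coupled:
        # coupled first; surround before height; front before side/rear
        return (0, int(s.height), _POSITION_RANK[s.position], s.name)
    # uncoupled: center, then LFE, then other channels
    return (1, _MONO_RANK.get(s.name, 2), 0, s.name)


def order_streams(streams):
    return sorted(streams, key=transmission_key)
```

Applied to the streams of dependent channel group #1 in Case 1 below, the sketch yields Hfl3/Hfr3, then C, then LFE. This matches the C2, M1, M2 order described for FIG. 19.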

The following describes the stream configuration of each channel group in the bitstream 1910 of Case 1, according to the transmission order and rules described above.

Referring to FIG. 19, for example, the audio encoding apparatuses 200 and 400 may compress the two-channel audio signals L1 and R1. The compressed L1 and R1 signals may be included in the C1 bitstream of the basic channel group (BCG).

After the basic channel group, the audio encoding apparatuses 200 and 400 may compress a four-channel audio signal as the audio signal of dependent channel group #1.

The audio encoding apparatuses 200 and 400 may compress the Hfl3 and Hfr3 signals. The compressed Hfl3 and Hfr3 signals may be included in the C2 bitstream of dependent channel group #1.

The audio encoding apparatuses 200 and 400 may compress the C signal. The compressed C signal may be included in the M1 bitstream of dependent channel group #1.

The audio encoding apparatuses 200 and 400 may compress the LFE signal. The compressed LFE signal may be included in the M2 bitstream of dependent channel group #1.

The audio decoding apparatuses 300 and 500 may reconstruct the audio signal of the 3.1.2 channel layout from the compressed audio signals of the basic channel group and dependent channel group #1.

After dependent channel group #1, the audio encoding apparatuses 200 and 400 may compress a six-channel audio signal as the audio signal of dependent channel group #2.

First, the audio encoding apparatuses 200 and 400 may compress the L and R signals. The compressed L and R signals may be included in the C3 bitstream of dependent channel group #2.

Next after the C3 bitstream, the audio encoding apparatuses 200 and 400 may compress the Ls and Rs signals. The compressed Ls and Rs signals may be included in the C4 bitstream of dependent channel group #2.

Next after the C4 bitstream, the audio encoding apparatuses 200 and 400 may compress the Hfl and Hfr signals. The compressed Hfl and Hfr signals may be included in the C5 bitstream of dependent channel group #2.

The audio decoding apparatuses 300 and 500 may reconstruct the audio signal of the 7.1.4 channel layout from the compressed audio signals of the basic channel group, dependent channel group #1, and dependent channel group #2.

The following describes the stream configuration of each channel group in the bitstream 1920 of Case 2, according to the transmission order and rules described above.

First, the audio encoding apparatuses 200 and 400 may compress the two-channel audio signals L2 and R2. The compressed L2 and R2 signals may be included in the C1 bitstream of the basic channel group.

After the basic channel group, the audio encoding apparatuses 200 and 400 may compress a six-channel audio signal as the audio signal of dependent channel group #1.

The audio encoding apparatuses 200 and 400 may compress the L and R signals. The compressed L and R signals may be included in the C2 bitstream of dependent channel group #1.

The audio encoding apparatuses 200 and 400 may compress the Ls and Rs signals. The compressed Ls and Rs signals may be included in the C3 bitstream of dependent channel group #1.

The audio encoding apparatuses 200 and 400 may compress the LFE signal. The compressed LFE signal may be included in the M2 bitstream of dependent channel group #1.

The audio decoding apparatuses 300 and 500 may reconstruct the audio signal of the 7.1.0 channel layout from the compressed audio signals of the basic channel group and dependent channel group #1.

After dependent channel group #1, the audio encoding apparatuses 200 and 400 may compress a four-channel audio signal as the audio signal of dependent channel group #2.

The audio encoding apparatuses 200 and 400 may compress the Hfl and Hfr signals. The compressed Hfl and Hfr signals may be included in the C4 bitstream of dependent channel group #2.

The audio encoding apparatuses 200 and 400 may compress the Hbl and Hbr signals. The compressed Hbl and Hbr signals may be included in the C5 bitstream of dependent channel group #2.

The following describes the stream configuration of each channel group in the bitstream 1930 of Case 3, according to the transmission order and rules described above.

After the basic channel group, the audio encoding apparatuses 200 and 400 may compress a ten-channel audio signal as the audio signal of dependent channel group #1.

The audio encoding apparatuses 200 and 400 may compress the Hfl and Hfr signals. The compressed Hfl and Hfr signals may be included in the C4 bitstream of dependent channel group #1.

The audio encoding apparatuses 200 and 400 may compress the Hbl and Hbr signals. The compressed Hbl and Hbr signals may be included in the C5 bitstream of dependent channel group #1.

The audio decoding apparatuses 300 and 500 may reconstruct the audio signal of the 7.1.4 channel layout from the compressed audio signals of the basic channel group and dependent channel group #1.

Meanwhile, the audio decoding apparatuses 300 and 500 may perform demixing in stages by using at least one upmixing unit. Demixing is performed based on the audio signals of the channels included in at least one channel group.

For example, the 1.x to 2.x upmixing unit (first upmixing unit) may demix the right-channel audio signal from the mono-channel audio signal into which the right channel was mixed.

Alternatively, the 2.x to 3.x upmixing unit (second upmixing unit) may demix the center-channel audio signal from the L2-channel and R2-channel audio signals, into which the center channel was mixed. Alternatively, the second upmixing unit may demix the L3-channel and R3-channel audio signals from the L2-channel, R2-channel, and C-channel audio signals, into which the L3 and R3 channels were mixed.

The 3.x to 5.x upmixing unit (third upmixing unit) may demix the Ls5-channel and Rs5-channel audio signals. It does so from the L3-channel and R3-channel audio signals, into which Ls5/Rs5 were mixed, together with the L5-channel and R5-channel audio signals.

The 5.x to 7.x upmixing unit (fourth upmixing unit) may demix the Lb-channel and Rb-channel audio signals. It does so from the Ls5-channel audio signal, into which Lb/Rb were mixed, together with the Ls7-channel and Rs7-channel audio signals.

The x.x.2(FH) to x.x.2(H) upmixing unit (fourth upmixing unit) may demix the Hl-channel and Hr-channel audio signals. It does so from the Hfl3-channel and Hfr3-channel audio signals, into which the Ls/Rs channels were mixed, together with the L3-channel, L5-channel, R3-channel, and R5-channel audio signals.

The x.x.2(H) to x.x.4 upmixing unit (fifth upmixing unit) may demix the Hbl-channel and Hbr-channel audio signals. It does so from the Hl-channel and Hr-channel audio signals, into which Hbl/Hbr were mixed, together with the Hfl-channel and Hfr-channel audio signals.

For example, the audio decoding apparatuses 300 and 500 may perform demixing to the 3.1.2 channel layout by using the first upmixing unit.

In addition, the audio decoding apparatuses 300 and 500 may perform demixing to the 7.1.4 channel layout. For this, they use the second and third upmixing units for the surround channels and the fourth and fifth upmixing units for the height channels.

Alternatively, the audio decoding apparatuses 300 and 500 may perform demixing to the 7.1.0 channel layout by using the first, second, and third upmixing units. In this case, the audio decoding apparatuses 300 and 500 may not perform demixing from the 7.1.0 channel layout to the 7.1.4 channel layout.

Alternatively, the audio decoding apparatuses 300 and 500 may perform demixing to the 7.1.4 channel layout by using the first, second, and third upmixing units. In this case, the audio decoding apparatuses 300 and 500 may not perform demixing on the height channels.

The following describes a rule by which the audio encoding apparatuses 200 and 400 generate channel groups. Consider a channel layout CLi of the scalable format, where i is an integer from 0 to n and CLi is written as Si.Wi.Hi. Then Si+Wi+Hi may be the number of channels for channel group #i. The number of channels for channel group #i may be greater than the number of channels for channel group #i-1.

Channel group #i may include as many of the original channels (display channels) of CLi as possible. The original channels may follow the priorities below.

If H_(i-1) is 0, the height channels may take priority over the other channels. The center channel and the LFE channel may take priority over the other channels.

The height front channels may take priority over the side channels and the height rear channels.

The side channels may take priority over the rear channels. In addition, the left channel may take priority over the right channel.

For example, suppose n is 4 and the layouts are CL0 = stereo, CL1 = 3.1.2, CL2 = 5.1.2, and CL3 = 7.1.4. The channel groups may then be generated as follows.

The audio encoding apparatuses 200 and 400 may generate a basic channel group including the A (L2) and B (R2) signals. They may generate dependent channel group #1 including the Q1 (Hfl3), Q2 (Hfr3), T (=C), and P (=LFE) signals. They may generate dependent channel group #2 including the S1 (=L) and S2 (=R) signals.

The audio encoding apparatuses 200 and 400 may generate dependent channel group #3 including the V1 (Hfl), V2 (Hfr), U1 (Ls), and U2 (Rs) signals.
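As a cross-check of the rule and the example above, here is a minimal sketch. It reads Si+Wi+Hi as the total channel count of layout CLi, which is an interpretation. Under that reading, channel group #i must carry count(CLi) - count(CL(i-1)) new channels. The helper names are hypothetical. For stereo, 3.1.2, 5.1.2, and 7.1.4, the sketch gives 2, 4, 2, and 4 channels, matching the groups listed above (A, B / Q1, Q2, T, P / S1, S2 / V1, V2, U1, U2).

```python
def channel_count(layout):
    """Total channels of an 'S.W.H' layout string, e.g. '7.1.4' -> 12.

    Missing fields (e.g. '2.0') are treated as zero.
    """
    parts = [int(p) for p in layout.split(".")]
    parts += [0] * (3 - len(parts))
    surround, woofer, height = parts
    return surround + woofer + height


def group_sizes(layouts):
    """New channels carried by each channel group so that decoding groups
    #0..#i yields layout CL_i (layouts must strictly grow)."""
    sizes, previous = [], 0
    for layout in layouts:
        total = channel_count(layout)
        if total <= previous:
            raise ValueError(f"layout {layout} does not add channels")
        sizes.append(total - previous)
        previous = total
    return sizes
```

Note that the per-group sizes need not increase (2, 4, 2, 4). Only the cumulative channel count grows from one layout to the next.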

Meanwhile, the audio decoding apparatuses 300 and 500 may reconstruct the 7.1.4-channel audio signal from the decompressed audio signals by using the downmixing matrix. The downmixing matrix may include, for example, the downmixing weight parameters shown in Table 2 below.

Table 2
| | L | R | C | LFE | Ls | Rs | Lb | Rb | Hfl | Hfr | Hbl | Hbr |
| A (L2/L3) | 1 | | cw | | δ*α | | δ*β | | | | | |
| B (R2/R3) | | 1 | cw | | | δ*α | | δ*β | | | | |
| T (C) | | | 1 | | | | | | | | | |
| P (LFE) | | | | 1 | | | | | | | | |
| Q1 (Hfl3) | | | | | w*δ*α | | w*δ*β | | 1 | | γ | |
| Q2 (Hfr3) | | | | | | w*δ*α | | w*δ*β | | 1 | | γ |
| S1 (L) | 1 | | | | | | | | | | | |
| S2 (R) | | 1 | | | | | | | | | | |
| U1 (Ls7) | | | | | 1 | | | | | | | |
| U2 (Rs7) | | | | | | 1 | | | | | | |
| V1 (Hfl3) | | | | | | | | | 1 | | | |
| V2 (Hfr3) | | | | | | | | | | 1 | | |

Here, cw is the center weight. It may be 0 when the channel layout of the basic channel group is the 3.1.2 channel layout, and 1 when it is a two-channel layout. w may be the surround-to-height mixing weight. α, β, γ, and δ are variable downmixing weight parameters. The audio encoding apparatuses 200 and 400 may generate a bitstream including downmixing weight parameter information such as α, β, γ, δ, and w. The audio decoding apparatuses 300 and 500 may then obtain this downmixing weight parameter information from the bitstream.

Meanwhile, the weight parameter information of the downmixing (or demixing) matrix may be in the form of an index. For example, it may be index information that indicates one of a plurality of downmixing (or demixing) weight parameter sets. The weight parameters corresponding to each set, that is, at least one of α, β, γ, δ, and w, may be predefined in a lookup table (LUT). Accordingly, the audio decoding apparatuses 300 and 500 may obtain the α, β, γ, δ, and w values that correspond to the indicated downmixing (or demixing) weight parameter set.

A matrix for downmixing from a first channel layout to an audio signal of a second channel layout may include a plurality of matrices. For example, it may include a first matrix for downmixing from the first channel layout to a third channel layout, and a second matrix for downmixing from the third channel layout to the second channel layout.

Specifically, for example, the matrix for downmixing from the 7.1.4 channel layout to the 3.1.2 channel layout may include two matrices. The first matrix downmixes from the 7.1.4 channel layout to the 5.1.4 channel layout. The second matrix downmixes from the 5.1.4 channel layout to the 3.1.2 channel layout.
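Applying the two matrices in sequence is equivalent to applying their product once. The following is a minimal sketch restricted to the left-side surround channels. It uses the δ convention of Table 2, where Ls5 feeds L3 with weight δ, and the numeric parameter values are examples only. Multiplying a first matrix (L, Ls, Lb to L, Ls5) by a second matrix (L, Ls5 to L3) reproduces the row 1, δ·α, δ·β that Table 2 lists for signal A.

```python
def matmul(a, b):
    """Plain-Python product of an m x k matrix and a k x n matrix (lists of rows)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


alpha, beta, delta = 1.0, 0.866, 0.707  # example values only

# First matrix (7.x -> 5.x), inputs [L, Ls, Lb]
first = [[1.0, 0.0, 0.0],      # L   = L
         [0.0, alpha, beta]]   # Ls5 = alpha*Ls + beta*Lb

# Second matrix (5.x -> 3.x), inputs [L, Ls5]
second = [[1.0, delta]]        # L3 = L + delta*Ls5

# Single-step 7.x -> 3.x row for L3: [1, delta*alpha, delta*beta]
combined = matmul(second, first)
```

Keeping the two stages separate, as the text describes, lets a decoder stop at the intermediate 5.x layout. The combined product is convenient when only the final downmix is needed.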

Tables 3 and 4 show the first and second matrices for downmixing from the 7.1.4 channel layout to the 3.1.2 channel layout, based on content-based downmix parameters and surround-to-height weights.

Table 3 (first matrix)
| | L | R | C | Lfe | Ls | Rs | Lb | Rb |
| Ls5 | | | | | α | | β | |
| Rs5 | | | | | | α | | β |

Table 4 (second matrix)
| | L | R | C | Lfe | Ls5 | Rs5 | Hfl | Hfr | Hbl | Hbr |
| L3 | 1 | 0 | 0 | 0 | γ | 0 | 0 | 0 | 0 | 0 |
| R3 | 0 | 1 | 0 | 0 | 0 | γ | 0 | 0 | 0 | 0 |
| C | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Lfe | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Hfl3 | 0 | 0 | 0 | 0 | γ*w | 0 | 0 | 0 | δ | 0 |
| Hfr3 | 0 | 0 | 0 | 0 | 0 | γ*w | 0 | 0 | 0 | δ |

Here, α, β, γ, and δ are downmixing parameters, and w may denote the surround-to-height weight.

Here, A, B, and C are downmixing parameters, and w may denote the surround-to-height weight.

For upmixing (or demixing) from the 5.x channel to the 7.x channel, the demixing weight parameters α and β may be used.

For upmixing from the x.x.2(H) channel to the x.x.4 channel, the demixing weight parameter γ may be used.

For upmixing from the 3.x channel to the 5.x channel, the demixing weight parameter δ may be used.

For upmixing from the x.x.2(FH) channel to the x.x.2(H) channel, the demixing weight parameters w and δ may be used.

For upmixing from the 2.x channel to the 3.x channel, a demixing weight parameter of -3 dB may be used. That is, this demixing weight parameter may be a fixed value and may not be signaled.

Likewise, for upmixing between the 1.x channel and the 2.x channel, a demixing weight parameter of -6 dB may be used. This demixing weight parameter may also be a fixed value and may not be signaled.

Meanwhile, the demixing weight parameters used for demixing may be the parameters included in one of a plurality of types. For example, the demixing weight parameters α, β, γ, and δ of Type 1 may be 0 dB, 0 dB, -3 dB, and -3 dB. Those of Type 2 may be -3 dB, -3 dB, -3 dB, and -3 dB. Those of Type 3 may be 0 dB, -1.25 dB, -1.25 dB, and -1.25 dB.

Type 1 may indicate that the audio signal is a general audio signal. Type 2 may indicate that the audio signal contains dialogue (dialogue type). Type 3 may indicate that the audio signal contains sound effects (sound-effect type).

The audio encoding apparatuses 200 and 400 may analyze the audio signal and determine one of the plurality of types according to the analysis. Using the demixing weight parameters of the determined type, the audio encoding apparatuses 200 and 400 may downmix the original audio to generate an audio signal of a lower channel layout.

The audio encoding apparatuses 200 and 400 may generate a bitstream including index information that indicates one of the plurality of types. The audio decoding apparatuses 300 and 500 may obtain the index information from the bitstream and identify the type based on it. Using the demixing weight parameters of the identified type, the audio decoding apparatuses 300 and 500 may then upmix the decompressed audio signals of the channel groups to restore an audio signal of a specific channel layout.
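The type-based parameter selection above amounts to a small lookup table plus a dB-to-linear conversion, gain = 10^(dB/20). The following is a minimal sketch. The dB values are those listed for Types 1 to 3. Keying the table directly by type number, and the names `DEMIX_TYPES_DB` and `weights_for_type`, are hypothetical, because the text does not specify how the signaled index maps to a type. Note that -1.25 dB converts to about 0.866 and -3 dB to about 0.708, which agrees with the linear values quoted for α, β, γ, and δ below.

```python
# (alpha, beta, gamma, delta) in dB for each type, as listed in the text.
DEMIX_TYPES_DB = {
    1: (0.0, 0.0, -3.0, -3.0),       # general audio
    2: (-3.0, -3.0, -3.0, -3.0),     # dialogue
    3: (0.0, -1.25, -1.25, -1.25),   # sound effects
}


def db_to_gain(db):
    """Convert an amplitude gain in dB to a linear factor."""
    return 10.0 ** (db / 20.0)


def weights_for_type(type_number):
    """Return linear demixing weights for a signalled type (hypothetical API)."""
    alpha, beta, gamma, delta = (db_to_gain(v) for v in DEMIX_TYPES_DB[type_number])
    return {"alpha": alpha, "beta": beta, "gamma": gamma, "delta": delta}
```

Storing the table in dB and converting on lookup keeps the table identical to the values in the text. A decoder could equally store precomputed linear gains.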

Alternatively, the audio signals generated by downmixing may be expressed by Equation 9. That is, downmixing is not limited to operations using a downmixing matrix. It may instead be performed with equations in the form of first-order polynomials, each of which generates one downmixed audio signal.

Here, p₁ may be 0.5 (i.e., -6 dB) and p₂ may be 0.707 (i.e., -3 dB). α and β may be values used when downmixing the surround channels from seven channels to five. For example, α or β may be one of 1 (i.e., 0 dB), 0.866 (i.e., -1.25 dB), or 0.707 (i.e., -3 dB).

γ may be a value used when downmixing the height channels from four channels to two. For example, γ may be either 0.866 or 0.707. δ may be a value used when downmixing the surround channels from five channels to three, and it may likewise be either 0.866 or 0.707. w' may be a value used when downmixing from H2 (e.g., the height channels of a 5.1.2 or 7.1.2 channel layout) to Hf2 (the height channels of a 3.1.2 channel layout).
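Equation 9 itself is not reproduced in this text. The Python sketch below therefore only illustrates the idea of a stepwise downmix built from first-order polynomials, using the parameters just described. Several details are assumptions made for illustration:

- which input channels each weight multiplies (for example, that α and β weight the side and back surround channels of a seven-channel surround bed);
- the channel names;
- the roles given to p₁ and p₂.

It should not be read as the exact form of Equation 9.

```python
from dataclasses import dataclass

P1 = 0.5    # -6 dB; assumed here to scale the 2-channel -> mono step
P2 = 0.707  # -3 dB; assumed here to fold the centre channel into L/R


@dataclass
class DownmixWeights:
    alpha: float    # surround 7 -> 5
    beta: float     # surround 7 -> 5
    gamma: float    # height 4 -> 2
    delta: float    # surround 5 -> 3
    w_prime: float  # height H2 -> Hf2


def downmix_left(ch: dict, wt: DownmixWeights) -> dict:
    """Stepwise downmix of the left half of one 7.1.4 sample (hypothetical mapping)."""
    ls5 = wt.alpha * ch["Ls"] + wt.beta * ch["Lb"]   # 7.1.4 -> 5.1.4 surround
    hl2 = ch["Hfl"] + wt.gamma * ch["Hbl"]           # 5.1.4 -> 5.1.2 height
    l3 = ch["L"] + wt.delta * ls5                    # 5.1.2 -> 3.1.2 surround
    hfl3 = hl2 + wt.w_prime * wt.delta * ls5         # H2 -> Hf2 height
    l2 = l3 + P2 * ch["C"]                           # 3.1.2 -> 2-channel
    return {"Ls5": ls5, "Hl2": hl2, "L3": l3, "Hfl3": hfl3, "L2": l2}


def downmix_mono(l2: float, r2: float) -> float:
    """2-channel -> 1-channel step."""
    return P1 * (l2 + r2)
```

Each line of `downmix_left` is a single first-order equation, so every intermediate layout (5.1.4, 5.1.2, 3.1.2, 2-channel) is available along the way. The right half would mirror the left.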

Similarly, the audio signals generated by demixing may be expressed by Equation 10. That is, demixing is not limited to operations using a demixing matrix. It may be performed step by step using equations in the form of first-order polynomials, where the computation of each equation corresponds to one demixing process, and each demixed audio signal is generated this way.

w' may be a value used when downmixing from H2 (e.g., the height channels of a 5.1.2 or 7.1.2 channel layout) to Hf2 (the height channels of a 3.1.2 channel layout). It may also be used when demixing from Hf2 back to H2.

The sum_w value, and the w' value corresponding to it, may be updated according to w. w may be -1 or 1 and may be transmitted for each frame.

For example, the initial sum_w value may be 0. For each frame in which w is 1, sum_w increases by 1, and for each frame in which w is -1, it decreases by 1. If an increment or decrement would take sum_w outside the range 0 to 10, sum_w is held at 0 or 10. Table 5 below shows the relationship between sum_w and w'. That is, the w' value is updated gradually from frame to frame and may be used when demixing from Hf2 to H2.

Table 5

| sum_w | w' |
|---|---|
| 0 | 0 |
| 1 | 0.0179 |
| 2 | 0.0391 |
| 3 | 0.0658 |
| 4 | 0.1038 |
| 5 | 0.25 |
| 6 | 0.3962 |
| 7 | 0.4342 |
| 8 | 0.4609 |
| 9 | 0.4821 |
| 10 | 0.5 |
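The sum_w update rule and Table 5 can be sketched in Python as follows. The table values are copied from Table 5. The function names are illustrative choices rather than details stated in the text, as is the assumption that sum_w is updated before w' is looked up for the current frame.

```python
# w' as a function of sum_w (0..10), copied from Table 5.
W_PRIME_TABLE = (0.0, 0.0179, 0.0391, 0.0658, 0.1038, 0.25,
                 0.3962, 0.4342, 0.4609, 0.4821, 0.5)
SUM_W_MIN, SUM_W_MAX = 0, 10


def update_sum_w(sum_w: int, w: int) -> int:
    """Step sum_w by the per-frame flag w (+1 or -1), clamped to 0..10."""
    if w not in (-1, 1):
        raise ValueError("w must be -1 or 1")
    return min(max(sum_w + w, SUM_W_MIN), SUM_W_MAX)


def w_prime_per_frame(w_flags, sum_w: int = 0) -> list:
    """Return the w' value used in each frame, starting from an initial sum_w."""
    values = []
    for w in w_flags:
        sum_w = update_sum_w(sum_w, w)  # assumed: update first, then look up
        values.append(W_PRIME_TABLE[sum_w])
    return values


print(w_prime_per_frame([1, 1, 1, -1]))  # [0.0179, 0.0391, 0.0658, 0.0391]
```

sum_w moves by at most one step per frame and is clamped. As a result, w' ramps between 0 and 0.5 over at least ten frames instead of jumping, which matches the gradual update described above.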

Demixing is not limited to this stepwise form, and several demixing process steps may be combined into one. For example, the Ls5 or Rs5 channel signal demixed from the two channels L2 and R2 may be expressed by Equation 11, which combines the second through fifth equations of Equation 10.

Likewise, the Hl or Hr channel signal demixed from the two channels L2 and R2 may be expressed by Equation 12, which combines the second, third, eighth, and ninth equations of Equation 10.
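Stepwise demixing can be folded into a single equation because each step is linear. The sketch below assumes, purely for illustration, the forward relations L2 = L3 + p₂·C and L3 = L5 + δ·Ls5; the actual Equations 10 and 11 are not reproduced here. It checks that the two-step and the combined forms of the Ls5 demix give the same result.

```python
P2 = 0.707  # -3 dB centre weight (assumed role)


def demix_ls5_stepwise(l2: float, c: float, l5: float, delta: float) -> float:
    """Two demixing steps: 2ch -> 3ch, then 3ch -> 5ch surround (hypothetical)."""
    l3 = l2 - P2 * c           # undo L2 = L3 + p2*C
    return (l3 - l5) / delta   # undo L3 = L5 + delta*Ls5


def demix_ls5_combined(l2: float, c: float, l5: float, delta: float) -> float:
    """The same two steps folded into one first-order equation."""
    return (l2 - P2 * c - l5) / delta


# Round trip: build L2 from known inputs using the assumed forward relations.
l5, c, ls5, delta = 0.3, 0.5, 0.8, 0.707
l2 = (l5 + delta * ls5) + P2 * c
print(round(demix_ls5_stepwise(l2, c, l5, delta), 6))  # 0.8
print(round(demix_ls5_combined(l2, c, l5, delta), 6))  # 0.8
```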

Meanwhile, the stepwise downmixing of the surround channels and the height channels may follow the mechanism shown in Table 6.

The downmixing-related information (or demixing-related information) may be index information indicating one of a plurality of modes. Each mode is based on a predetermined combination of the five downmixing weight parameters (or demixing weight parameters). For example, as shown in Table 7, the downmixing weight parameters corresponding to each mode may be predetermined.

Table 7

| Mode | Downmixing weight parameters (α, β, γ, δ, w) (or demixing weight parameters) |
|---|---|
| 1 | (1, 1, 0.707, 0.707, -1) |
| 2 | (0.707, 0.707, 0.707, 0.707, -1) |
| 3 | (1, 0.866, 0.866, 0.866, -1) |
| 4 | (1, 1, 0.707, 0.707, 1) |
| 5 | (0.707, 0.707, 0.707, 0.707, 1) |
| 6 | (1, 0.866, 0.866, 0.866, 1) |
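A decoder that receives the mode index could resolve it with a simple lookup, as in the Python sketch below. The type and function names are hypothetical; the values are taken from Table 7.

```python
from typing import NamedTuple


class MixWeights(NamedTuple):
    alpha: float
    beta: float
    gamma: float
    delta: float
    w: int


# Table 7: mode index -> (alpha, beta, gamma, delta, w).
MODES = {
    1: MixWeights(1.0, 1.0, 0.707, 0.707, -1),
    2: MixWeights(0.707, 0.707, 0.707, 0.707, -1),
    3: MixWeights(1.0, 0.866, 0.866, 0.866, -1),
    4: MixWeights(1.0, 1.0, 0.707, 0.707, 1),
    5: MixWeights(0.707, 0.707, 0.707, 0.707, 1),
    6: MixWeights(1.0, 0.866, 0.866, 0.866, 1),
}


def weights_for_mode(mode: int) -> MixWeights:
    """Look up the weight parameters signalled by a mode index."""
    try:
        return MODES[mode]
    except KeyError:
        raise ValueError(f"unknown downmix mode: {mode}") from None


# Modes k and k + 3 share alpha..delta and differ only in the sign of w.
for k in (1, 2, 3):
    assert weights_for_mode(k)[:4] == weights_for_mode(k + 3)[:4]
    assert weights_for_mode(k).w == -weights_for_mode(k + 3).w
```

The (α, β, γ, δ) values of Modes 1 to 3 are the Type 1 to Type 3 weights described earlier, expressed as linear gains. The sign of w then selects the direction in which sum_w moves.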

FIG. 20A is a flowchart of an audio processing method according to an embodiment.

In operation S2002, the audio decoding apparatus 500 may obtain at least one compressed audio signal of the basic channel group from the bitstream.

In operation S2004, the audio decoding apparatus 500 may obtain at least one compressed audio signal of at least one dependent channel group from the bitstream.

In operation S2006, the audio decoding apparatus 500 may obtain, from the bitstream, information about a factor for error removal for one upmix channel of the upmix channel group.

In operation S2008, the audio decoding apparatus 500 may decompress the at least one compressed audio signal of the basic channel group to reconstruct the audio signal of the basic channel group.

In operation S2010, the audio decoding apparatus 500 may decompress the at least one compressed audio signal of the at least one dependent channel group to reconstruct at least one audio signal of the at least one dependent channel group.

In operation S2012, the audio decoding apparatus 500 may generate the audio signals of an upmix channel group from the at least one audio signal of the basic channel group and the at least one audio signal of the at least one dependent channel group.

In operation S2014, the audio decoding apparatus 500 may reconstruct the audio signal of the one upmix channel of the upmix channel group based on that channel's audio signal and the factor for error removal.

The audio decoding apparatus 500 may reconstruct a multi-channel audio signal that includes two kinds of audio signals: the at least one audio signal of the one upmix channel, reconstructed by applying the factor for error removal, and the audio signals of the remaining channels of the upmix channel group. That is, the factor for error removal may not be applied to some of the audio signals of the remaining channels.
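Operation S2014 and the paragraph above can be sketched as follows: the factor is applied to one upmix channel while the other channels pass through unchanged. The text does not state how the factor is applied, so treating it as a plain per-frame gain on the samples is an assumption. The [0, 1] range check reflects the factor values recited in the claims. The channel names are hypothetical.

```python
from typing import Dict, List


def apply_error_removal(upmix: Dict[str, List[float]],
                        channel: str,
                        factor: float) -> Dict[str, List[float]]:
    """Scale one upmixed channel by the error-removal factor; copy the rest unchanged.

    Assumes the factor acts as a simple gain on the frame's samples.
    """
    if channel not in upmix:
        raise KeyError(f"channel {channel!r} not in upmix group")
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor expected in [0, 1]")
    restored = {name: list(samples) for name, samples in upmix.items()}
    restored[channel] = [factor * s for s in upmix[channel]]
    return restored


frame = {"L3": [0.2, -0.1], "Ls5": [0.4, 0.6]}
print(apply_error_removal(frame, "Ls5", 0.5))  # {'L3': [0.2, -0.1], 'Ls5': [0.2, 0.3]}
```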

FIG. 20B is a flowchart of an audio processing method according to another embodiment.

In operation S2022, the audio decoding apparatus 500 may obtain, from the bitstream, a second audio signal downmixed from at least one first audio signal.

In operation S2024, the audio decoding apparatus 500 may obtain information related to error removal for the first audio signal from the bitstream.

In operation S2026, the audio decoding apparatus 500 may reconstruct the first audio signal by applying the information related to error removal to the first audio signal upmixed from the second audio signal.

FIG. 20C is a flowchart of an audio processing method according to another embodiment.

In operation S2052, the audio encoding apparatus 400 may downmix the original audio signal based on a predetermined channel layout. This yields at least one audio signal of the basic channel group and the audio signals of at least one dependent channel group.

In operation S2054, the audio encoding apparatus 400 may compress the at least one audio signal of the basic channel group to generate at least one compressed audio signal of the basic channel group.

In operation S2056, the audio encoding apparatus 400 may compress the at least one audio signal of the at least one dependent channel group to generate at least one compressed audio signal of the at least one dependent channel group.

In operation S2058, the audio encoding apparatus 400 may decompress the at least one compressed audio signal of the basic channel group to generate a basic channel reconstruction signal.

In operation S2060, the audio encoding apparatus 400 may decompress the at least one compressed audio signal of the at least one dependent channel group to generate a dependent channel reconstruction signal.

In operation S2062, the audio encoding apparatus 400 may obtain a first audio signal of one upmix channel of the upmix channel group by upmixing the basic channel reconstruction signal and the dependent channel reconstruction signal.

In operation S2064, the audio encoding apparatus 400 may obtain a second audio signal of one channel, either directly from the original audio signal or by downmixing the original audio signal.

In operation S2066, the audio encoding apparatus 400 may obtain a scale factor for the one upmix channel based on the power value of the first audio signal and the power value of the second audio signal. Here, the upmix channel of the first audio signal and the channel of the second audio signal may represent the same channel in the predetermined channel layout.

In operation S2068, the audio encoding apparatus 400 may generate a bitstream that includes the following:

- the at least one compressed audio signal of the basic channel group;
- the at least one compressed audio signal of the at least one dependent channel group;
- information about error removal for the one upmix channel.

FIG. 20D is a flowchart of an audio processing method according to another embodiment.

In operation S2072, the audio encoding apparatus 400 may generate a second audio signal by downmixing at least one first audio signal.

In operation S2074, the audio encoding apparatus 400 may generate information related to error removal for the first audio signal. It may do so using at least one of the original signal strength of the second audio signal and the signal strength of the first audio signal after decoding.

In operation S2076, the audio encoding apparatus 400 may transmit the information related to error removal for the first audio signal and the downmixed second audio signal.
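Claims 2 to 6 below spell out how the factor for error removal may be derived from these signal strengths. A minimal Python sketch of those rules follows. Several parts are assumptions:

- measuring "signal strength" as the mean-square power of a frame;
- the guards for zero power;
- all names used in the code.

The two thresholds are left as parameters because the text does not give their values.

```python
from typing import Sequence


def mean_power(x: Sequence[float]) -> float:
    """Mean-square power of one frame (assumed definition of 'signal strength')."""
    return sum(s * s for s in x) / len(x) if x else 0.0


def error_removal_factor(orig_power: float,
                         decoded_power: float,
                         downmix_power: float,
                         first_value: float,
                         second_value: float) -> float:
    """Factor for error removal following the rules recited in claims 2 to 6.

    orig_power:    original strength of the first (pre-downmix) audio signal
    decoded_power: strength of the first audio signal after decoding
    downmix_power: original strength of the second (downmixed) audio signal
    first_value, second_value: the predetermined thresholds (values not given)
    """
    if orig_power <= first_value:
        return 0.0                                  # claim 2
    if downmix_power <= 0.0 or orig_power / downmix_power >= second_value:
        return 1.0                                  # claim 6
    if decoded_power <= 0.0:
        return 1.0                                  # unbounded ratio, capped (claim 5)
    return min(orig_power / decoded_power, 1.0)     # claims 3 to 5
```

The factor always stays in [0, 1]. As a result, applying it can only attenuate the upmixed channel, which fits its stated purpose of reducing the energy of noise-like masker sound.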

FIG. 21 is a diagram illustrating a process according to an embodiment. An audio encoding apparatus uses a first neural network to embed metadata in an LFE signal and transmits it, and an audio decoding apparatus uses a second neural network to obtain the metadata from the LFE signal.

Referring to FIG. 21, the audio encoding apparatus 2100 may use the downmixing unit 2105 to downmix the channel signals (L/R/C/Ls/Rs/Lb/Rb/Hfl/Hfr/Hbl/Hbr/W/X/Y/Z) based on mixing-related information (downmixing-related information). This produces the A/B/T/Q/S/U/V audio signals.

The audio encoding apparatus 2100 may obtain a P signal using the first neural network 2110, which takes the LFE signal and the metadata as input. That is, the first neural network may embed the metadata in the LFE signal. Here, the metadata may include speech norm information, information about a factor for error removal (e.g., CER: Cancelation Error Ratio), on-screen object information, and mixing-related information.

The audio encoding apparatus 2100 may use the first compression unit 2115 to generate compressed A/B/T/Q/S/U/V signals from the A/B/T/Q/S/U/V audio signals.

The audio encoding apparatus 2100 may use the second compression unit 2115 to generate a compressed P signal from the P signal.

The audio encoding apparatus 2100 may use the packetizer 2120 to generate a bitstream that includes the compressed A/B/T/Q/S/U/V signals and the compressed P signal. In this case, the bitstream may be packetized. The audio encoding apparatus 2100 may transmit the packetized bitstream to the audio decoding apparatus 2150.

The audio decoding apparatus 2150 may receive the packetized bitstream from the audio encoding apparatus 2100.

The audio decoding apparatus 2150 may use the depacketizer 2155 to obtain the compressed A/B/T/Q/S/U/V signals and the compressed P signal from the packetized bitstream.

The audio decoding apparatus 2150 may use the first decompression unit 2160 to obtain the A/B/T/Q/S/U/V signals from the compressed A/B/T/Q/S/U/V signals.

The audio decoding apparatus 2150 may use the second decompression unit 2165 to obtain the P signal from the compressed P signal.

The audio decoding apparatus 2150 may use the upmixing unit 2170 to reconstruct channel signals from the A/B/T/Q/S/U/V signals based on (de)mixing-related information. The channel signals may be at least one of the L/R/C/Ls/Rs/Lb/Rb/Hfl/Hfr/Hbl/Hbr/W/X/Y/Z signals. The (de)mixing-related information may be obtained using the second neural network 2180.

The audio decoding apparatus 2150 may use the low-pass filter 2175 to obtain the LFE signal from the P signal.

The audio decoding apparatus 2150 may use the high-frequency detection unit 2185 to obtain an activation signal from the P signal.

Based on the activation signal, the audio decoding apparatus 2150 may determine whether to use the second neural network 2180.

When it determines that the second neural network 2180 is to be used, the audio decoding apparatus 2150 may obtain the metadata from the P signal using the second neural network 2180. The metadata may include speech norm information, information about a factor for error removal (e.g., CER: Cancelation Error Ratio), on-screen object information, and (de)mixing-related information.
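The text does not specify the low-pass filter 2175 or the high-frequency detection unit 2185. One simple way to picture them, assumed here for illustration only, uses two components:

- a one-pole low-pass filter that recovers the LFE band;
- an energy test on the residual above that band, which serves as the activation signal.

The reasoning is that noticeable content above the LFE band would suggest metadata embedded by the first neural network. The filter type, cutoff, and threshold are all hypothetical.

```python
import math
from typing import List, Tuple


def one_pole_lowpass(x: List[float], cutoff_hz: float, fs: float) -> List[float]:
    """First-order IIR low-pass: y[n] = (1 - a) * x[n] + a * y[n - 1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, prev = [], 0.0
    for s in x:
        prev = (1.0 - a) * s + a * prev
        y.append(prev)
    return y


def split_p_signal(p: List[float], cutoff_hz: float, fs: float,
                   threshold: float) -> Tuple[List[float], bool]:
    """Return (LFE estimate, activation flag) for one frame of the P signal."""
    lfe = one_pole_lowpass(p, cutoff_hz, fs)
    residual = [s - l for s, l in zip(p, lfe)]           # content above the cutoff
    hf_energy = sum(r * r for r in residual) / max(len(residual), 1)
    return lfe, hf_energy > threshold                     # True -> run second network


fs = 48_000.0
alternating = [1.0 if n % 2 == 0 else -1.0 for n in range(480)]  # energy at fs/2
lfe, active = split_p_signal(alternating, cutoff_hz=120.0, fs=fs, threshold=0.1)
print(active)  # True
```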

The parameters of the first neural network 2110 and the second neural network 2180 may be obtained through separate training. They are not limited to this, however, and may instead be obtained through joint training. Parameter information for the first neural network 2110 and the second neural network 2180, pre-trained by a separate training apparatus, may be received. The first neural network 2110 and the second neural network 2180 may then each be configured based on that parameter information.

The first neural network 2110 and the second neural network 2180 may each use one parameter set selected from among a plurality of trained parameter sets. For example, the first neural network 2110 may be configured based on one parameter set selected from its plurality of trained parameter sets. The audio encoding apparatus 2100 may transmit index information indicating the selected parameter set of the first neural network 2110 to the audio decoding apparatus 2150. Based on that index information, the audio decoding apparatus 2150 may select one of the plurality of parameter sets of the second neural network 2180. The parameter set of the second neural network 2180 selected by the audio decoding apparatus 2150 may correspond to the parameter set of the first neural network 2110 selected by the audio encoding apparatus 2100.

The parameter sets of the first neural network 2110 and those of the second neural network 2180 may have a one-to-one correspondence. They are not limited to this and may instead have a one-to-many or many-to-one correspondence. In the one-to-many case, the audio encoding apparatus 2100 may transmit additional index information. Alternatively, the audio encoding apparatus 2100 may transmit index information indicating one parameter set of the second neural network 2180 instead of index information indicating one parameter set of the first neural network 2110.
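The parameter-set signaling described above can be sketched as an index lookup. The correspondence table in the sketch is entirely hypothetical. It simply shows three kinds of relationship between parameter sets of the first neural network 2110 and the second neural network 2180: one-to-one, one-to-many (which needs the additional index), and many-to-one.

```python
from typing import Dict, Optional, Tuple

# Hypothetical map: encoder (first network) parameter-set index ->
# candidate decoder (second network) parameter-set indices.
CORRESPONDENCE: Dict[int, Tuple[int, ...]] = {
    0: (0,),     # one-to-one
    1: (1, 2),   # one-to-many: an additional index is required
    2: (3,),     # many-to-one: encoder sets 2 and 3 share decoder set 3
    3: (3,),
}


def select_decoder_set(encoder_index: int, extra_index: Optional[int] = None) -> int:
    """Pick the second network's parameter set from the signalled index (or indices)."""
    candidates = CORRESPONDENCE[encoder_index]
    if len(candidates) == 1:
        return candidates[0]
    if extra_index is None:
        raise ValueError("one-to-many correspondence requires an additional index")
    return candidates[extra_index]
```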

FIG. 22A is a flowchart of an audio processing method according to an embodiment.

In operation S2205, the audio decoding apparatus 2150 may obtain, from the bitstream, a second audio signal downmixed from at least one first audio signal.

In operation S2210, the audio decoding apparatus 2150 may obtain an audio signal of a low-frequency effect channel from the bitstream.

In operation S2215, the audio decoding apparatus 2150 may obtain information related to error removal for the first audio signal. It does so by applying a neural network for obtaining additional information (e.g., the second neural network 2180) to the obtained audio signal of the low-frequency effect channel.

In operation S2220, the audio decoding apparatus 2150 may reconstruct the first audio signal by applying the information related to error removal to the first audio signal upmixed from the second audio signal.

FIG. 22B is a flowchart of an audio processing method according to an embodiment.

In operation S2255, the audio encoding apparatus 2100 may generate a second audio signal by downmixing at least one first audio signal.

In operation S2260, the audio encoding apparatus 2100 may generate information related to error removal for the first audio signal. It may do so using at least one of the original signal strength of the second audio signal and the signal strength of the first audio signal after decoding.

In operation S2265, the audio encoding apparatus 2100 may generate an audio signal of the low-frequency effect channel. It does so by applying a neural network for generating that audio signal (e.g., the first neural network 2110) to the information related to error removal.

In operation S2270, the audio encoding apparatus 2100 may transmit the downmixed second audio signal and the audio signal of the low-frequency effect channel.

According to an embodiment of the present disclosure, the audio encoding apparatus may generate a factor for error removal based on the signal strength of the audio signal and transmit information about that factor to the audio decoding apparatus. Based on this information, the audio decoding apparatus applies the factor to the audio signal of the upmix channel. This reduces the energy of the noise-like masker sound to match the energy of the masked sound of the target sound.

Meanwhile, the above-described embodiments of the present disclosure may be written as programs or instructions executable on a computer, and these programs or instructions may be stored in a storage medium.

A machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory storage medium" means only that the medium is a tangible device and does not include a signal (e.g., an electromagnetic wave). The term does not distinguish between data stored semi-permanently and data stored temporarily in the medium. For example, a "non-transitory storage medium" may include a buffer in which data is temporarily stored.

According to an embodiment, the methods according to the various embodiments disclosed herein may be provided as part of a computer program product. A computer program product may be traded as a commodity between a seller and a buyer. It may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read-only memory (CD-ROM)). It may also be distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product (e.g., a downloadable app) may be at least temporarily stored in, or temporarily generated on, a machine-readable storage medium. Examples include the memory of a manufacturer's server, an application store's server, or a relay server.

Meanwhile, the neural network models described above may be implemented as software modules. When implemented as a software module (e.g., a program module including instructions), a neural network model may be stored in a computer-readable recording medium.

In addition, a neural network model may be integrated in the form of a hardware chip and become part of the above-described apparatus or display device. For example, it may be built as a dedicated hardware chip for artificial intelligence. It may also be built as part of an existing general-purpose processor (e.g., a CPU or an application processor) or a graphics-dedicated processor (e.g., a GPU).

In addition, a neural network model may be provided as downloadable software. A computer program product may include a product in the form of a software program (e.g., a downloadable application) distributed electronically by a manufacturer or through an electronic market. For electronic distribution, at least part of the software program may be stored in a storage medium or generated temporarily. In this case, the storage medium may belong to a server of the manufacturer or the electronic market, or to a relay server.

The technical idea of the present disclosure has been described in detail above with reference to preferred embodiments. However, the technical idea of the present disclosure is not limited to these embodiments. Those of ordinary skill in the art may make various modifications and changes within its scope.

Claims

1. An audio processing method comprising:
downmixing at least one first audio signal to generate a second audio signal;
generating information related to error removal for the first audio signal by using at least one of an original signal strength (original signal power) of the first audio signal and a signal strength of the first audio signal after decoding; and
transmitting the information related to error removal for the first audio signal and the downmixed second audio signal.

2. The method of claim 1, wherein
the information related to error removal includes information about a factor for error removal, and
the generating of the information related to error removal for the first audio signal comprises,
when the original signal strength of the first audio signal is less than or equal to a predetermined first value,
generating information about the factor for error removal indicating that the value of the factor for error removal is 0.

제 1 항에 있어서,
상기 에러 제거와 관련된 정보는
상기 에러 제거를 위한 펙터에 관한 정보를 포함하고,
상기 제 1 오디오 신호에 대한 에러 제거와 관련된 정보를 생성하는 단계는,
상기 제 1 오디오 신호의 원 신호 세기와 상기 제 2 오디오 신호의 원 신호 세기의 비율이 소정의 제 2 값보다 작은 경우,
상기 제 1 오디오 신호의 원 신호 세기 및 상기 제 1 오디오 신호의 복호화후 신호 세기를 기초로, 상기 에러 제거를 위한 펙터에 관한 정보를 생성하는 단계를 포함하는 것을 특징으로 하는 오디오 처리 방법.
The method of claim 1,
wherein the information related to the error removal includes information about a factor for the error removal, and
the generating of the information related to error removal for the first audio signal comprises,
when the ratio of the original signal strength of the first audio signal to the original signal strength of the second audio signal is less than a predetermined second value, generating the information about the factor for the error removal based on the original signal strength of the first audio signal and the signal strength of the first audio signal after decoding.

The method of claim 3,
wherein the generating of the information about the factor for the error removal comprises
generating information about the factor indicating that the value of the factor is the ratio of the original signal strength of the first audio signal to the signal strength of the first audio signal after decoding.

The method of claim 4,
wherein the generating of the information about the factor for the error removal comprises,
when the ratio of the original signal strength of the first audio signal to the signal strength of the first audio signal after decoding is greater than 1, generating information about the factor indicating that the value of the factor is 1.

The method of claim 1,
wherein the information related to the error removal includes information about a factor for the error removal, and
the generating of the information related to error removal for the first audio signal comprises,
when the ratio of the original signal strength of the first audio signal to the original signal strength of the second audio signal is greater than or equal to a predetermined second value, generating information about the factor for the error removal indicating that the value of the factor is 1.

The method of claim 1,
wherein the information related to the error removal is generated for each frame of the second audio signal.
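The factor rules in the claims above (0 at or below a first threshold, 1 when the first signal is large relative to the downmix, otherwise the ratio of original to post-decoding strength clamped to 1, generated per frame) can be illustrated with a short Python sketch. It assumes that "signal strength" means the mean-square power of a frame, that the first-threshold test is applied before the ratio tests, and that the threshold defaults (`first_value=1e-8`, `second_value=0.5`) are placeholders; the claims fix none of these. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Mean-square power of one frame (assumed reading of "signal strength")."""
    return sum(x * x for x in frame) / len(frame) if len(frame) > 0 else 0.0


def error_removal_factor(p_orig_first: float, p_orig_second: float,
                         p_dec_first: float, first_value: float,
                         second_value: float) -> float:
    """Encoder-side error removal factor for one frame."""
    # Original first signal at or below the first threshold: factor 0.
    if p_orig_first <= first_value:
        return 0.0
    # First signal large relative to the downmix (or silent downmix): factor 1.
    ratio_to_downmix = (p_orig_first / p_orig_second
                        if p_orig_second > 0 else math.inf)
    if ratio_to_downmix >= second_value:
        return 1.0
    # Otherwise: original power over post-decoding power, clamped to 1.
    if p_dec_first <= 0:
        return 1.0
    return min(p_orig_first / p_dec_first, 1.0)


def per_frame_factors(orig_first: Sequence[float], dec_first: Sequence[float],
                      orig_second: Sequence[float], frame_len: int,
                      first_value: float = 1e-8,
                      second_value: float = 0.5) -> List[float]:
    """One factor per frame; the threshold defaults are hypothetical."""
    factors = []
    for start in range(0, len(orig_first), frame_len):
        end = start + frame_len
        factors.append(error_removal_factor(
            frame_power(orig_first[start:end]),
            frame_power(orig_second[start:end]),
            frame_power(dec_first[start:end]),
            first_value, second_value))
    return factors
```

For example, with a silent first frame and a weak second frame whose decoded power is four times its original power, `per_frame_factors` yields 0 for the first frame and approximately 0.25 for the second. In this reading the factor always lies in [0, 1], and a value below 1 marks a frame in which decoding added energy (such as coding noise) to a weak channel.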

The method of claim 1,
wherein the downmixed second audio signal includes
an audio signal of a base channel group and an audio signal of a dependent channel group,
the audio signal of the dependent channel group includes a first dependent channel audio signal, which includes an audio signal of an independent channel included in a three-dimensional audio channel in front of a listener, and
the audio signal of the three-dimensional audio channel at the sides of and behind the listener is an audio signal obtained by mixing the audio signal of the first dependent channel.

The method of claim 8,
wherein the audio signal of the base channel group includes an audio signal of a first channel and an audio signal of a second channel,
the audio signal of the first channel is a signal generated by mixing the audio signal of the left stereo channel with the decoded audio signal of the center channel in front of the listener, and
the audio signal of the second channel is a signal generated by mixing the audio signal of the right stereo channel with the decoded audio signal of the center channel in front of the listener.
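A minimal sketch of the base-channel mixing described above, assuming plain sample-wise addition with a single mixing gain applied to the decoded center channel. The gain value (`0.707` by default) and the function name are hypothetical; the claim states only that the decoded center channel is mixed into each stereo channel.

```python
from typing import List, Sequence, Tuple


def mix_base_channels(left: Sequence[float], right: Sequence[float],
                      center_decoded: Sequence[float],
                      gain: float = 0.707) -> Tuple[List[float], List[float]]:
    """Form the two base channel group signals from L, R and the decoded C.

    The mixing gain is a hypothetical parameter; the claims do not give it.
    """
    if not len(left) == len(right) == len(center_decoded):
        raise ValueError("all channels must have the same number of samples")
    # First channel: left stereo channel plus the gained decoded center.
    first = [l + gain * c for l, c in zip(left, center_decoded)]
    # Second channel: right stereo channel plus the same gained decoded center.
    second = [r + gain * c for r, c in zip(right, center_decoded)]
    return first, second
```

Mixing in the decoded center rather than the original one means that a decoder holding the same decoded center can subtract it exactly; this is an interpretation of the design, not a statement from the claims.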

The method of claim 1,
wherein the downmixed second audio signal includes
an audio signal of a base channel group and an audio signal of a dependent channel group,
the transmitting of the information related to error removal for the first audio signal and the downmixed second audio signal comprises:
generating a bitstream including the information related to the error removal and information about the downmixed second audio signal; and
transmitting the bitstream,
the bitstream is a file stream of a plurality of audio tracks,
the generating of the bitstream comprises:
generating an audio stream of a first audio track including a compressed audio signal of the base channel group; and
generating an audio stream of a second audio track including dependent channel audio signal identification information, the second audio track being adjacent to the first audio track,
when an audio signal of a dependent channel corresponding to the audio signal of the base channel group exists, the dependent channel audio signal identification information is generated to indicate that the audio signal of the dependent channel exists,
when the dependent channel audio signal identification information indicates that a dependent channel audio signal exists, the audio stream of the second audio track includes a compressed audio signal of the dependent channel group, and
when the dependent channel audio signal identification information indicates that no dependent channel audio signal exists, the audio stream of the second audio track includes the audio signal of the next track of the base channel group.
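The track arrangement above can be modeled as follows. This is a simplified in-memory sketch, not a real container format: each track is a payload plus a hypothetical boolean `dependent_flag` standing in for the dependent channel audio signal identification information, and, for uniform parsing, every track after the first carries that flag. The `Track` type and `build_tracks` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Track:
    payload: bytes                         # compressed audio of one channel group
    dependent_flag: Optional[bool] = None  # identification info; None on the first track


def build_tracks(groups: List[Tuple[bytes, Optional[bytes]]]) -> List[Track]:
    """Lay out (base, dependent-or-None) pairs as adjacent audio tracks."""
    tracks: List[Track] = []
    for base, dependent in groups:
        if not tracks:
            # The first track carries the first base channel group.
            tracks.append(Track(payload=base))
        else:
            # Flag False: this track carries the next base channel group.
            tracks.append(Track(payload=base, dependent_flag=False))
        if dependent is not None:
            # Flag True: this track carries the dependent channel group.
            tracks.append(Track(payload=dependent, dependent_flag=True))
    return tracks
```

For the groups `[(b"B0", b"D0"), (b"B1", None)]`, the sketch emits three tracks: the base `B0`, the flagged dependent `D0`, and `B1` with the flag cleared.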

The method of claim 1,
wherein the downmixed second audio signal includes
an audio signal of a base channel group and an audio signal of a dependent channel group,
the audio signal of the base channel group includes an audio signal of a stereo channel,
the transmitting of the information related to error removal for the first audio signal and the downmixed second audio signal comprises:
generating a bitstream including the information related to the error removal and information about the downmixed second audio signal; and
transmitting the bitstream,
the generating of the bitstream comprises:
generating a base channel audio stream including a compressed audio signal of the stereo channel; and
generating a plurality of dependent channel audio streams including a plurality of audio signals of a plurality of dependent channel groups,
the plurality of dependent channel audio streams include a first dependent channel audio stream and a second dependent channel audio stream, and
letting $S_{n-1}$, $W_{n-1}$, and $H_{n-1}$ be the numbers of surround channels, subwoofer channels, and height channels, respectively, of the multi-channel audio signal used to generate the base channel audio stream and the first dependent channel audio stream, and letting $S_n$, $W_n$, and $H_n$ be the numbers of surround channels, subwoofer channels, and height channels, respectively, of the multi-channel audio signal used to generate the base channel audio stream, the first dependent channel audio stream, and the second dependent channel audio stream,
$S_{n-1} \le S_n$, $W_{n-1} \le W_n$, and $H_{n-1} \le H_n$, with the restriction that $S_{n-1}$, $W_{n-1}$, and $H_{n-1}$ are not all equal to $S_n$, $W_n$, and $H_n$, respectively.
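The channel-count restriction reduces to a simple check: each count may stay the same or grow, but the triple $(S_n, W_n, H_n)$ must differ from $(S_{n-1}, W_{n-1}, H_{n-1})$. A minimal Python sketch, with hypothetical names:

```python
from typing import NamedTuple


class ChannelCounts(NamedTuple):
    surround: int   # S
    subwoofer: int  # W
    height: int     # H


def is_valid_extension(prev: ChannelCounts, curr: ChannelCounts) -> bool:
    """True if curr = (S_n, W_n, H_n) may follow prev = (S_{n-1}, W_{n-1}, H_{n-1})."""
    # Each count may stay the same or grow ...
    non_decreasing = all(p <= c for p, c in zip(prev, curr))
    # ... but the layout as a whole must change.
    return non_decreasing and prev != curr
```

Under this check, a 3.1.2 layout (S=3, W=1, H=2) may be extended to 5.1.2, but it may not be repeated or followed by a layout with fewer height channels.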

The method of claim 1,
further comprising generating an audio object signal of a three-dimensional audio channel in front of a listener, the audio object signal indicating at least one of an audio signal, a position, and a direction of an audio object (sound source),
wherein the transmitting of the information related to error removal for the first audio signal and the downmixed second audio signal comprises:
generating a bitstream including the information related to the error removal, the audio object signal of the three-dimensional audio channel in front of the listener, and information about the downmixed second audio signal; and
transmitting the bitstream.

An audio processing method comprising:
obtaining, from a bitstream, a second audio signal downmixed from at least one first audio signal;
obtaining, from the bitstream, information related to error removal for the first audio signal;
demixing the first audio signal from the downmixed second audio signal; and
reconstructing the first audio signal by applying the information related to the error removal to the demixed first audio signal,
wherein the information related to the error removal is information generated using at least one of an original signal strength of the first audio signal and a signal strength of the first audio signal after decoding.

The method of claim 13,
wherein the information related to the error removal includes information about a factor for the error removal, and
the factor for the error removal has a value greater than or equal to 0 and less than or equal to 1.

The method of claim 13,
wherein the reconstructing of the first audio signal comprises
reconstructing the first audio signal so that it has a signal strength equal to the signal strength of the demixed first audio signal multiplied by the factor for the error removal.
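The reconstruction step can be sketched as below. The sketch assumes that "signal strength" means mean-square power, so multiplying the strength by the factor corresponds to scaling each sample by the square root of the factor; if strength instead meant amplitude, the factor would be applied to the samples directly. The range check reflects the claimed bounds of 0 and 1. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def apply_error_removal(demixed: Sequence[float], factor: float) -> List[float]:
    """Scale a demixed frame so that its power becomes factor * its current power."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("error removal factor must lie in [0, 1]")
    # Power scales with the square of amplitude, so scale samples by sqrt(factor).
    amplitude_gain = math.sqrt(factor)
    return [amplitude_gain * x for x in demixed]


def frame_power(frame: Sequence[float]) -> float:
    """Mean-square power, used here to check the result."""
    return sum(x * x for x in frame) / len(frame) if len(frame) > 0 else 0.0
```

With a factor of 0.25, each sample is halved and the frame power drops to one quarter; a factor of 0 silences the frame, and a factor of 1 leaves it unchanged.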

The method of claim 13,
wherein the bitstream includes information about an audio signal of a base channel group and information about an audio signal of a dependent channel group,
the audio signal of the base channel group is an audio signal obtained by decoding the information about the audio signal of the base channel group included in the bitstream, without demixing with an audio signal of another channel group, and
the audio signal of the dependent channel group is an audio signal used to reconstruct, through demixing with the audio signal of the base channel group, an audio signal of an upmix channel group including at least one upmix channel.

The method of claim 16,
wherein the audio signal of the dependent channel group includes a first dependent channel audio signal and a second dependent channel audio signal,
the first dependent channel audio signal includes an audio signal of an independent channel in front of a listener, and
the second dependent channel audio signal includes an audio signal in which audio signals of channels at the sides of and behind the listener are mixed.

The method of claim 16,
wherein the audio signal of the base channel group includes an audio signal of a first channel and an audio signal of a second channel,
the audio signal of the first channel is a signal generated by mixing the audio signal of the left stereo channel with the decoded audio signal of the center channel in front of the listener, and the audio signal of the second channel is a signal generated by mixing the audio signal of the right stereo channel with the compressed-and-decompressed audio signal of the center channel in front of the listener.
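On the decoding side, the mixing described above can be undone by subtracting the gained decoded center channel from each base channel. A minimal sketch, assuming hypothetical sample-wise mixing with a single gain (`0.707` by default) that matches the encoder; the claims specify neither the gain nor the exact demixing matrix, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def demix_stereo(first: Sequence[float], second: Sequence[float],
                 center_decoded: Sequence[float],
                 gain: float = 0.707) -> Tuple[List[float], List[float]]:
    """Recover L and R by subtracting the gained decoded center channel.

    `gain` is hypothetical and must equal the value used by the encoder.
    """
    if not len(first) == len(second) == len(center_decoded):
        raise ValueError("all channels must have the same number of samples")
    # Left stereo channel: first base channel minus the gained decoded center.
    left = [m - gain * c for m, c in zip(first, center_decoded)]
    # Right stereo channel: second base channel minus the same term.
    right = [m - gain * c for m, c in zip(second, center_decoded)]
    return left, right
```

Because the decoder subtracts the same decoded center that the encoder added, the center channel's coding error does not leak into the recovered left and right channels in this idealized model.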

The method of claim 17,
wherein the base channel group includes a mono channel or a stereo channel, and
the at least one upmix channel is a discrete audio channel that is at least one channel, other than the channels of the base channel group, of the three-dimensional audio channel in front of the listener or the three-dimensional audio channel in all directions around the listener.

The method of claim 19,
wherein the three-dimensional audio channel in front of the listener is a 3.1.2 channel,
the 3.1.2 channel is a channel having three surround channels in front of the listener, one subwoofer channel in front of the listener, and two height channels,
the three-dimensional audio channel in all directions around the listener is at least one of a 5.1.2 channel and a 7.1.4 channel,
the 5.1.2 channel is a channel having three surround channels in front of the listener, two surround channels at the sides of and behind the listener, one subwoofer channel in front of the listener, and two height channels in front of the listener, and
the 7.1.4 channel is a channel having three surround channels in front of the listener, four surround channels at the sides of and behind the listener, one subwoofer channel in front of the listener, two height channels in front of the listener, and two height channels at the sides of and behind the listener.
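The layouts above can be written down as a small table. In this hypothetical Python representation, each layout records its channels by position, and the conventional S.W.H name (surround, subwoofer, height) is derived from those counts. Placing the two 3.1.2 height channels in front is an assumption, since that part of the claim does not state their position.

```python
from typing import Dict, NamedTuple


class Layout(NamedTuple):
    front_surround: int
    side_rear_surround: int
    subwoofer: int
    front_height: int
    side_rear_height: int

    def label(self) -> str:
        """Build the S.W.H name from the per-position counts."""
        surround = self.front_surround + self.side_rear_surround
        height = self.front_height + self.side_rear_height
        return f"{surround}.{self.subwoofer}.{height}"

    def total(self) -> int:
        """Total number of loudspeaker channels in the layout."""
        return sum(self)


LAYOUTS: Dict[str, Layout] = {
    # Height position for 3.1.2 is not stated in the claim; front is assumed.
    "3.1.2": Layout(3, 0, 1, 2, 0),
    "5.1.2": Layout(3, 2, 1, 2, 0),
    "7.1.4": Layout(3, 4, 1, 2, 2),
}
```

Each name matches the sum of its positional counts, giving 6, 8, and 12 channels for 3.1.2, 5.1.2, and 7.1.4, respectively.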

The method of claim 16,
wherein the demixed first audio signal includes an audio signal of at least one upmix channel and an audio signal of an independent channel, and
the audio signal of the independent channel includes some of the audio signals of the base channel group and of the dependent channel group.

The method of claim 13,
wherein the bitstream is a file stream of a plurality of audio tracks including a first audio track and a second audio track adjacent to each other,
an audio signal of a base channel group is obtained from the first audio track,
dependent channel audio signal identification information is obtained from the second audio track,
when the obtained dependent channel audio signal identification information indicates that a dependent channel audio signal exists in the second audio track, an audio signal of a dependent channel group is obtained from the second audio track, and
when the obtained dependent channel audio signal identification information indicates that no dependent channel audio signal exists in the second audio track, the audio signal of the next track of the base channel group is obtained from the second audio track.
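A minimal parser for a simplified track model illustrates the decoding rule above. It walks the tracks in order, starts a new channel-group pair at the first track and at every track whose flag says no dependent signal is present, and attaches flagged dependent tracks to the preceding base group. The `Track` and `ChannelGroups` types and the `dependent_flag` field are hypothetical stand-ins for the real container structures and identification information.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    payload: bytes
    dependent_flag: Optional[bool] = None  # absent on the first track


@dataclass
class ChannelGroups:
    base: bytes
    dependent: Optional[bytes] = None


def parse_tracks(tracks: List[Track]) -> List[ChannelGroups]:
    """Pair each base channel group with its dependent channel group, if any."""
    groups: List[ChannelGroups] = []
    for index, track in enumerate(tracks):
        if index == 0:
            # The first track always carries a base channel group.
            groups.append(ChannelGroups(base=track.payload))
        elif track.dependent_flag is True:
            if groups[-1].dependent is not None:
                raise ValueError(f"track {index}: second dependent group for one base")
            groups[-1].dependent = track.payload
        elif track.dependent_flag is False:
            # No dependent signal: this track holds the next base channel group.
            groups.append(ChannelGroups(base=track.payload))
        else:
            raise ValueError(f"track {index}: missing dependent-channel flag")
    return groups
```

For tracks `B0`, flagged `D0`, unflagged `B1`, the parser returns two pairs: `(B0, D0)` and `(B1, None)`.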

The method of claim 13,
wherein the bitstream includes a base channel audio stream and a plurality of dependent channel audio streams,
the plurality of dependent channel audio streams include a first dependent channel audio stream and a second dependent channel audio stream,
the base channel audio stream includes an audio signal of a stereo channel, and
letting $S_{n-1}$, $W_{n-1}$, and $H_{n-1}$ be the numbers of surround channels, subwoofer channels, and height channels, respectively, of a multi-channel first audio signal reconstructed from the base channel audio stream and the first dependent channel audio stream, and letting $S_n$, $W_n$, and $H_n$ be the numbers of surround channels, subwoofer channels, and height channels, respectively, of a multi-channel second audio signal reconstructed from the base channel audio stream, the first dependent channel audio stream, and the second dependent channel audio stream,
$S_{n-1} \le S_n$, $W_{n-1} \le W_n$, and $H_{n-1} \le H_n$, with the restriction that $S_{n-1}$, $W_{n-1}$, and $H_{n-1}$ are not all equal to $S_n$, $W_n$, and $H_n$, respectively.

The method of claim 16,
further comprising obtaining, from the bitstream, an audio object signal of a three-dimensional audio channel in front of a listener, the audio object signal indicating at least one of an audio signal, a position, and a direction of an audio object (sound source),
wherein the audio signal of the three-dimensional audio channel in front of the listener is reconstructed based on the audio signal of the three-dimensional audio channel in front of the listener generated from the audio signal of the base channel group and the audio signal of the dependent channel group, and on the audio object signal of the three-dimensional audio channel in front of the listener.

The method of claim 13,
further comprising obtaining multi-channel audio-related additional information from the bitstream,
wherein the multi-channel audio-related additional information includes at least one of: information on the total number of audio streams including a base channel audio stream and a dependent channel audio stream; downmix gain information; channel mapping table information; loudness information; low-frequency effect gain information; dynamic range control (DRC) information; channel layout rendering information; information on the number of coupled audio streams; information indicating the multi-channel layout; information on whether dialogue is present in the audio signal and on the dialogue level; information indicating whether the low-frequency effect is output; information on whether an on-screen audio object is present; information on whether a continuous channel audio signal or a discrete channel audio signal is present; and demixing information including at least one demixing parameter of a demixing matrix for generating the multi-channel audio signal.

An audio processing apparatus comprising at least one processor executing one or more instructions,
wherein the at least one processor is configured to: obtain, from a bitstream, a second audio signal downmixed from at least one first audio signal;
obtain, from the bitstream, information related to error removal for the first audio signal;
demix the first audio signal from the downmixed second audio signal; and
reconstruct the first audio signal by applying the information related to the error removal to the first audio signal demixed from the second audio signal,
wherein the information related to the error removal is information generated using at least one of an original signal strength of the first audio signal and a signal strength of the first audio signal after decoding.

A computer-readable recording medium having recorded thereon a program for implementing the audio processing method of claim 1.