KR20100085861A

KR20100085861A - A method for processing an audio signal and an apparatus for processing an audio signal

Info

Publication number: KR20100085861A
Application number: KR1020100004817A
Authority: KR
Inventors: 오현오; 정양원
Original assignee: 엘지전자 주식회사
Priority date: 2009-01-20
Filing date: 2010-01-19
Publication date: 2010-07-29
Also published as: CN102292768B; CN102292768A; KR101187075B1

Abstract

PURPOSE: A method and an apparatus for processing an audio signal are provided which, when a multichannel object downmixed to mono or stereo is included in a downmix signal, obtain spatial information corresponding to the multichannel object and thereby upmix the multichannel object into a multichannel signal. CONSTITUTION: A downmix signal including at least one normal object signal is received (S110). A bitstream including object information determined when the downmix signal was generated is received. An extension type identifier, which indicates whether the downmix signal further includes a multichannel object signal, is extracted from an extension part of the bitstream. If the extension type identifier indicates that the downmix signal includes a multichannel object signal, first spatial information is extracted from the bitstream (S130).

Description

A method for processing an audio signal and an apparatus for processing an audio signal

The present invention relates to an audio signal processing method and apparatus capable of encoding or decoding an audio signal.

In general, when a plurality of objects are downmixed into a mono or stereo signal, parameters are extracted from each object signal. These parameters can be used in a decoder, where the panning and gain of each object can be controlled according to the user's selection.

To control each object signal, each source included in the downmix must be appropriately positioned or panned.

In addition, for backward compatibility with channel-oriented decoding schemes, object parameters must be flexibly converted into multichannel parameters for upmixing.

The present invention was devised to solve the above problems, and an object thereof is to provide an audio signal processing method and apparatus that can output mono, stereo, and multichannel signals by controlling the gain and panning of objects.

Another object of the present invention is to provide an audio signal processing method and apparatus that, when both object-based general objects and a channel-based object (a multichannel object or multichannel background object) are included in a downmix signal, can obtain from the bitstream not only object information for controlling the objects but also spatial information for upmixing the channel-based object.

Still another object of the present invention is to provide an audio signal processing method and apparatus capable of identifying which of a plurality of objects included in a downmix signal is a multichannel object.

Still another object of the present invention is to provide an audio signal processing method and apparatus that, when the downmix signal includes a multichannel object downmixed to stereo, can identify which object is the left channel of the multichannel object.

Still another object of the present invention is to provide an audio signal processing method and apparatus that does not distort sound quality even when the gains of a normal object, such as a vocal signal, and a multichannel object, such as background music, are adjusted by a large amount.

To achieve the above objects, the present invention provides an audio signal processing method comprising: receiving a downmix signal including one or more normal object signals; receiving a bitstream including object information determined when the downmix signal is generated; extracting, from an extension part of the bitstream, an extension type identifier indicating whether the downmix signal further includes a multichannel object signal; extracting first spatial information from the bitstream when the extension type identifier indicates that the downmix signal further includes a multichannel object signal; and transmitting at least one of the first spatial information and second spatial information, wherein the first spatial information is determined when a multichannel source signal is downmixed into the multichannel object signal, and the second spatial information is generated using the object information and mix information.

According to the present invention, at least one of the first spatial information and the second spatial information may be transmitted according to mode information indicating whether the multichannel object is suppressed.

According to the present invention, the first spatial information may be transmitted when the mode information indicates that the multichannel object signal is not suppressed, and the second spatial information may be transmitted when the mode information indicates that the multichannel object signal is suppressed.
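The selection rule described here can be sketched in a few lines of Python. This is only an illustration of the stated rule; the function name `select_spatial_info` and its argument names are hypothetical and do not appear in the specification.

```python
def select_spatial_info(mbo_suppressed, first_spatial_info, second_spatial_info):
    """Pick which spatial information to transmit, based on the mode information.

    mbo_suppressed stands in for the mode information (hypothetical name):
    True means the multichannel object signal is suppressed.
    """
    if mbo_suppressed:
        # MBO suppressed: transmit the second spatial information, which is
        # generated from the object information and the mix information.
        return ("second", second_spatial_info)
    # MBO not suppressed: transmit the first spatial information, which was
    # determined when the multichannel source was downmixed into the MBO.
    return ("first", first_spatial_info)
```

For example, calling `select_spatial_info(False, a, b)` yields the first spatial information `a`, which the multichannel decoder then uses to upmix the multichannel object signal.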

According to the present invention, the method may further include, when the first spatial information is transmitted, generating a multichannel signal using the first spatial information and the multichannel object signal.

According to the present invention, the method may further include, when the second spatial information is generated, generating an output signal using the second spatial information and the normal object signal.

According to the present invention, the method may further include, when the second spatial information is transmitted: generating downmix processing information using the object information and the mix information; and generating a processed downmix signal by processing the normal object signal using the downmix processing information.

According to the present invention, the first spatial information may include spatial configuration information and spatial frame data.

According to another aspect of the present invention, there is provided an audio signal processing apparatus comprising: a receiving unit that receives a downmix signal including one or more normal object signals and receives a bitstream including object information determined when the downmix signal is generated; an extension type identifier extraction part that extracts, from an extension part of the bitstream, an extension type identifier indicating whether the downmix signal further includes a multichannel object signal; a first spatial information extraction part that extracts first spatial information from the bitstream when the extension type identifier indicates that the downmix signal further includes a multichannel object signal; and a multichannel object transcoder that transmits at least one of the first spatial information and second spatial information, wherein the first spatial information is determined when a multichannel source signal is downmixed into the multichannel object signal, and the second spatial information is generated using the object information and mix information.

According to the present invention, the apparatus may further include a multichannel decoder that, when the first spatial information is transmitted, generates a multichannel signal using the first spatial information and the multichannel object signal.

According to the present invention, the apparatus may further include a multichannel decoder that, when the second spatial information is generated, generates an output signal using the second spatial information and the normal object signal.

According to the present invention, the multichannel transcoder may include: an information generating part that, when the second spatial information is transmitted, generates downmix processing information using the object information and the mix information; and a downmix processing part that generates a processed downmix signal by processing the normal object signal using the downmix processing information.

According to still another aspect of the present invention, there is provided a computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations comprising: receiving a downmix signal including one or more normal object signals; receiving a bitstream including object information determined when the downmix signal is generated; extracting, from an extension part of the bitstream, an extension type identifier indicating whether the downmix signal further includes a multichannel object signal; extracting first spatial information from the bitstream when the extension type identifier indicates that the downmix signal further includes a multichannel object signal; and transmitting at least one of the first spatial information and second spatial information, wherein the first spatial information is determined when a multichannel source signal is downmixed into the multichannel object signal, and the second spatial information is generated using the object information and mix information.

The present invention provides the following effects and advantages.

First, the gain and panning of an object can be controlled without restriction.

Second, the gain and panning of an object can be controlled based on the user's selection.

Third, when a multichannel object downmixed to mono or stereo is included in the downmix signal, the mono or stereo multichannel object can be upmixed into a multichannel signal by obtaining the spatial information corresponding to the multichannel object.

Fourth, even when either the vocals or the background music is completely suppressed, distortion of sound quality due to gain adjustment can be prevented.

FIG. 1 is a block diagram of an encoder in an audio signal processing apparatus according to an embodiment of the present invention.
FIG. 2 is an example of a detailed configuration diagram of the multiplexer 130 of FIG. 1.
FIG. 3 is an example of syntax for an extension configuration.
FIG. 4 shows examples of syntax for spatial configuration information when the extension type identifier is x.
FIG. 5 is an example of syntax for spatial frame data when the extension type identifier is x.
FIG. 6 is another example of syntax for spatial frame data when the extension type identifier is x.
FIG. 7 is an example of syntax for spatial configuration information.
FIG. 8 is an example of syntax for spatial frame data.
FIG. 9 is another example of a detailed configuration diagram of the multiplexer 130 of FIG. 1.
FIG. 10 is an example of syntax for coupled object information when the extension type identifier is y.
FIG. 11 is an example of syntax for coupled object information.
FIG. 12 shows other examples of syntax for coupled object information.
FIG. 13 is a block diagram of a decoder in an audio signal processing apparatus according to an embodiment of the present invention.
FIG. 14 is a flowchart of a decoding operation in an audio signal processing method according to an embodiment of the present invention.
FIG. 15 is an example of a detailed configuration diagram of the demultiplexer 210 of FIG. 13.
FIG. 16 is another example of a detailed configuration diagram of the demultiplexer 210 of FIG. 13.
FIG. 17 is an example of a detailed configuration diagram of the MBO transcoder 220 of FIG. 13.
FIG. 18 is another example of a detailed configuration diagram of the MBO transcoder 220 of FIG. 13.
FIG. 19 shows examples of detailed configuration diagrams of the extraction unit 222 of FIGS. 17 and 18.
FIG. 20 is a schematic configuration diagram of a product in which an audio signal processing apparatus according to an embodiment of the present invention is implemented.
FIG. 21 is a diagram showing the relationship between products in which an audio signal processing apparatus according to an embodiment of the present invention is implemented.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Before that, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms in order to describe his or her own invention in the best way. Therefore, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing this application.

In the present invention, the following terms may be interpreted according to the following criteria, and terms not listed may be interpreted in the same spirit. Coding may be interpreted as encoding or decoding depending on the context. Information is a term that covers values, parameters, coefficients, elements, and the like; its meaning may be interpreted differently depending on the context, but the present invention is not limited thereto.

FIG. 1 is a diagram showing the configuration of an encoder in an audio signal processing apparatus according to an embodiment of the present invention. Referring to FIG. 1, the encoder 100 includes a spatial encoder 110, an object encoder 120, and a multiplexer 130.

The spatial encoder 110 downmixes a multichannel source signal (or multichannel sound source) in a channel-based manner, thereby generating a multichannel object (or multichannel background object; hereinafter, multichannel object) (MBO) downmixed to mono or stereo. Here, a multichannel source signal is a sound composed of three or more channels. For example, it may be the sound of a single instrument captured with a 5.1-channel microphone, or the sounds of multiple instruments and vocals, such as an orchestra, captured with a 5.1-channel microphone. It may also be a signal upmixed to 5.1 channels by applying various processing to a signal input through a mono or stereo microphone.

The multichannel source signal itself may be called a multichannel object (MBO), or the object signal obtained by downmixing the multichannel source signal to mono or stereo may be called a multichannel object (MBO). This specification follows the latter convention.

The multichannel object (MBO) generated here is input to the object encoder 120 as an object. When the MBO is mono, it is input as one object. When it is stereo, it is input as two objects, namely a left multichannel object and a right multichannel object.

Spatial information is extracted in this downmixing process. Spatial information is information for upmixing the downmix (DMX) into multiple channels and may include channel level information, channel correlation information, and the like. To distinguish it from the second spatial information generated later in the decoder, this spatial information is referred to as first spatial information. The first spatial information is input to the multiplexer 130.

The object encoder 120 generates a downmix signal (DMX) by downmixing the multichannel object (MBO) and the normal objects in an object-based manner. Downmixing the objects may produce a residual in addition to the downmix signal (DMX), but the present invention is not limited thereto.

Object information is also generated in this downmix process. The object information (OI) is information about the objects included in the downmix signal, and is the information needed to generate a plurality of object signals from the downmix signal (DMX). The object information may include object level information, object correlation information, and the like, but the present invention is not limited thereto. Furthermore, downmix gain information (DMG: DownMix Gain) and downmix channel level difference (DCLD: Downmix Channel Level Difference) may also be included in the object information. The downmix gain information (DMG) indicates the gain applied to each object before downmixing, and the downmix channel level difference (DCLD) indicates, when the downmix signal is stereo, the ratio in which each object is applied to the left channel and the right channel. The object information generated here is input to the multiplexer 130.
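As a rough illustration of how DMG and DCLD could describe one object's contribution to a stereo downmix, the sketch below converts a pair of per-channel gains into the two parameters and back. The decibel formulas are an assumption made for this example (a total-power gain and a left/right power ratio); the text only states what the parameters represent, not how they are computed, and the function names are hypothetical.

```python
import math

def downmix_params(g_left, g_right):
    """Map one object's left/right downmix gains to (DMG, DCLD) in dB.

    Assumed formulas (not taken from the text):
      DMG  = 10*log10(g_left^2 + g_right^2)   overall gain of the object
      DCLD = 20*log10(g_left / g_right)       left/right level ratio
    Both gains must be positive.
    """
    dmg_db = 10.0 * math.log10(g_left ** 2 + g_right ** 2)
    dcld_db = 20.0 * math.log10(g_left / g_right)
    return dmg_db, dcld_db

def gains_from_params(dmg_db, dcld_db):
    """Invert downmix_params: recover (g_left, g_right) from (DMG, DCLD)."""
    total_power = 10.0 ** (dmg_db / 10.0)   # g_left^2 + g_right^2
    power_ratio = 10.0 ** (dcld_db / 10.0)  # g_left^2 / g_right^2
    g_right = math.sqrt(total_power / (1.0 + power_ratio))
    g_left = math.sqrt(total_power * power_ratio / (1.0 + power_ratio))
    return g_left, g_right
```

Under these assumed formulas, an object mixed with gains 0.8 (left) and 0.6 (right) has unit total power, so its DMG is 0 dB, and a positive DCLD indicates that it sits toward the left channel.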

Meanwhile, the object encoder 120 may further generate stereo object information and pass it to the multiplexer 130. Here, a stereo object is an object signal in which one or more sound sources are captured with a stereo microphone.

Although FIG. 1 shows the spatial encoder 110 and the object encoder 120 as separate elements, the object encoder 120 may also incorporate the function of the spatial encoder 110, generating both the spatial information and the object information by downmixing the multichannel sound source and the normal objects.

The multiplexer 130 generates a bitstream using the object information generated by the object encoder 120. When a multichannel object (MBO) is present in the downmix signal (DMX), the multiplexer multiplexes not only the object information but also the first spatial information generated by the spatial encoder 110 into the bitstream.

There are two ways to perform this multiplexing. The first is to define the syntax of the object information bitstream so that it includes the first spatial information. The second is to create a new transport mechanism for the object information bitstream and the spatial information bitstream.

The first approach is described in more detail later with reference to FIGS. 3 to 8.

Meanwhile, the multiplexer 130 may generate coupled object information and include it in the bitstream. Here, coupled object information indicates whether a stereo object or a multichannel object is present among the two or more object signals downmixed by the object encoder 120, or whether only normal objects are present. If the first spatial information is present, a multichannel object is present. If stereo object information has been received from the object encoder 120, as mentioned above, a stereo object is present. If a multichannel object or a stereo object is included, the coupled object information may further include information indicating which object is the left object or the right object of the stereo object (or multichannel object). This is described in more detail later with reference to FIGS. 10 to 12.

FIG. 2 is a diagram showing an example of a detailed configuration of the multiplexer 130 of FIG. 1. Referring to FIG. 2, the multiplexer 130 includes an object information insertion part 132, an extension type identifier insertion part 134, and a first spatial information insertion part 136.

The object information insertion part 132 inserts the object information received from the object encoder 120 into the bitstream according to the syntax. The extension type identifier insertion part 134 determines the extension type identifier according to whether first spatial information is received from the spatial encoder 110, and inserts this extension type identifier into the bitstream.

FIG. 3 is an example of syntax for the extension configuration (SAOCExtensionConfig()). Row (A) of FIG. 3 shows that an extension type identifier (bsSaocExtType) indicating the type of the extension area is included. The extension type identifier identifies what type of information the extension area contains, and specifically indicates whether spatial information is present in the bitstream. Because the presence of spatial information may mean that the downmix signal includes a multichannel object (MBO), the extension type identifier also indicates whether the downmix signal includes a multichannel object (MBO). The following table shows an example of the extension type identifier (bsSaocExtType) and its meaning.

An example of the meaning of the extension type identifier

Extension type identifier (bsSaocExtType)    Meaning                    Extension frame data
0                                            Residual coding data       Present
1                                            Preset information         Present
x                                            MBO spatial information    Present
i                                            Metadata                   Not present

Here, x and i are arbitrary integers.

According to the table, an extension type identifier of x (where x is an arbitrary integer, preferably an integer of 15 or less) means that MBO spatial information is present, and when MBO spatial information is present, extension frame data is also included.

When the extension type identifier (bsSaocExtType) is x, row (B) of FIG. 3 shows that the extension configuration data corresponding to x (SAOCExtensionConfigData(x)) is called. This is described with reference to FIG. 4.
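The meaning table for the extension type identifier can be expressed as a small lookup, sketched below. The specification leaves x and i as arbitrary integers, so the concrete values 12 and 13 used here are placeholders chosen only to make the example runnable, and the names `ExtensionType`, `EXTENSION_TYPES`, and `downmix_has_mbo` are hypothetical.

```python
from dataclasses import dataclass

# Placeholder values: the text only says x and i are arbitrary integers
# (x preferably 15 or less). They are NOT taken from the specification.
X_MBO_SPATIAL = 12
I_METADATA = 13

@dataclass(frozen=True)
class ExtensionType:
    meaning: str
    has_frame_data: bool  # whether extension frame data is present

# bsSaocExtType -> meaning, mirroring the example table in the text.
EXTENSION_TYPES = {
    0: ExtensionType("residual coding data", True),
    1: ExtensionType("preset information", True),
    X_MBO_SPATIAL: ExtensionType("MBO spatial information", True),
    I_METADATA: ExtensionType("metadata", False),
}

def downmix_has_mbo(bs_saoc_ext_type):
    """True when the identifier signals MBO spatial information, which in
    turn implies that the downmix signal contains a multichannel object."""
    return bs_saoc_ext_type == X_MBO_SPATIAL
```

A decoder following this sketch would parse the extension configuration data for x only when `downmix_has_mbo` returns true, and would otherwise skip the MBO-specific fields.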

FIG. 4 shows examples of syntax for spatial configuration information when the extension type identifier is x, and FIGS. 5 and 6 show examples of syntax for spatial frame data when the extension type identifier is x. Referring to Table 2A of FIG. 4, the extension configuration data (SAOCExtensionConfigData(x)) includes MBO identification information (bsMBOIs) and spatial configuration information (SpatialSpecificConfig()).

The MBO identification information indicates which object is the MBO. If it is 0, the first object is the MBO; if it is 4, the fifth object is the MBO. The MBO may be stereo (that is, there may be two MBOs), and whether it is stereo can be determined from the spatial configuration information (SpatialSpecificConfig()). Accordingly, when the MBO is stereo, it can be agreed by convention that not only the object designated by the MBO identification information but also the next object is an MBO. For example, when the MBO identification information is 0 and the spatial configuration information indicates two MBOs, the first and second objects may be the MBOs.
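The indexing convention in this paragraph can be sketched as follows. The function name `mbo_object_indices` and the boolean flag are hypothetical; in the bitstream, the stereo decision would come from the spatial configuration information rather than from an explicit argument.

```python
def mbo_object_indices(bs_mbo_is, mbo_is_stereo):
    """Return the zero-based indices of the objects that carry the MBO.

    bs_mbo_is is the MBO identification value (0 means the first object).
    For a stereo MBO, the designated object and the one after it are both
    treated as MBO channels, following the convention in the text.
    """
    if bs_mbo_is < 0:
        raise ValueError("MBO identification value must be non-negative")
    if mbo_is_stereo:
        return [bs_mbo_is, bs_mbo_is + 1]
    return [bs_mbo_is]
```

With an identification value of 0 and a stereo MBO, the result covers the first and second objects, matching the example in the text; with a value of 4 and a mono MBO, it covers only the fifth object.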

Referring to Table 2B of FIG. 4, the MBO identification information (bsMBOIs) is carried in a variable number of bits (nBitsMBO) rather than a fixed number. As mentioned above, the MBO identification information indicates which of the objects included in the downmix signal is the MBO, so it never needs more bits than are required to index every object in the downmix signal. That is, when there are 10 objects in total, only enough bits to represent 0 to 9 (for example, 4 bits) are needed, and when there are N objects in total, only ceil(log₂N) bits are needed. Transmitting a variable number of bits that depends on the total number of objects therefore saves bits compared with transmitting a fixed number of bits (5 bits).
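The bit-width rule can be checked with a short calculation. This sketch uses Python's integer `bit_length`, which for N ≥ 1 gives the same result as ceil(log₂N) without floating-point rounding; the function name `mbo_id_bits` is hypothetical.

```python
def mbo_id_bits(num_objects):
    """Number of bits needed to index num_objects objects (0 .. N-1).

    Equivalent to ceil(log2(N)) for N >= 1, computed with integers only.
    """
    if num_objects < 1:
        raise ValueError("there must be at least one object")
    return (num_objects - 1).bit_length()

def bits_saved(num_objects, fixed_bits=5):
    """Bits saved per field compared with the fixed-width (5-bit) encoding."""
    return fixed_bits - mbo_id_bits(num_objects)
```

For 10 objects this gives 4 bits, one fewer than the 5-bit fixed field; the saving disappears only when the object count exceeds 16.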

Referring to Table 2C of FIG. 4, as in the preceding examples, the MBO identification information and the spatial configuration information (SpatialSpecificConfig()) are included, and when a frame is included in the header, the spatial frame data (SpatialFrame()) is included as well.

FIGS. 5 and 6 show examples of the syntax for the spatial frame data (SpatialFrame()) when the extension type identifier is x. Referring to Table 3A of FIG. 5, the extension frame data for extension type identifier x (SAOCExtensionFrame(x)) includes the spatial frame data (SpatialFrame()). Instead of the syntax shown in FIG. 5, the syntax may be defined as shown in FIG. 6.

Referring to Table 3B.1 of FIG. 6, the extension frame data for extension type identifier x (SAOCExtensionFrame(x)) includes an MBO frame (MBOFrame()). As shown in Table 3B.2, the MBO frame (MBOFrame()) includes the spatial frame data (SpatialFrame()).

FIG. 7 is an example of the syntax for the spatial configuration information, and FIG. 8 is an example of the syntax for the spatial frame data. FIG. 7 shows the detailed structure of the spatial configuration information (SpatialSpecificConfig()) included in Tables 2A to 2C of FIG. 4. The spatial configuration information contains the configuration information needed to upmix a mono or stereo channel into a plurality of channels. It includes a sampling frequency index (bsSamplingFrequencyIndex) indicating the sampling frequency, frame length information (bsFrameLength) indicating the frame length (the number of time slots), and tree configuration information (bsTreeConfig) designating one of the predetermined tree structures (the 5-1-5₁ tree configuration, the 5-2-5 configuration, the 7-2-7 tree configuration, and so on). The tree configuration information reveals whether the MBO is mono or stereo.
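The last point can be illustrated with a small sketch. It assumes the usual naming of these tree structures, in which the middle number of a name such as 5-1-5 or 5-2-5 is the number of downmix channels; it works on configuration names only, because the mapping from bsTreeConfig code values to names is not given in this description. The function names are hypothetical.

```python
def mbo_downmix_channels(tree_config_name: str) -> int:
    """Return the number of downmix channels implied by a name like '5-2-5'."""
    parts = tree_config_name.split("-")
    if len(parts) != 3:
        raise ValueError(f"unexpected tree configuration name: {tree_config_name!r}")
    # Assumption: in an N-M-N name, M is the number of downmix channels.
    return int(parts[1])


def mbo_is_stereo(tree_config_name: str) -> bool:
    """True when the tree configuration implies a stereo (two-channel) MBO."""
    return mbo_downmix_channels(tree_config_name) == 2


for name in ("5-1-5", "5-2-5", "7-2-7"):
    print(name, "stereo" if mbo_is_stereo(name) else "mono")
```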

FIG. 8 shows the detailed structure of the spatial frame data (SpatialFrame()) included in Table 2C of FIG. 4, in FIG. 5, and in Table 3B.2 of FIG. 6. The spatial frame data contains spatial parameters, such as the channel level difference (CLD), needed to upmix a mono or stereo channel into a plurality of channels. Specifically, it includes frame information (Frameinfo()), OTT information (OttData()), and the like. The frame information (Frameinfo()) may include the number of parameter sets and information on which time slot each parameter set applies to. The OTT information may include the parameters required by an OTT (One-To-Two) box, such as the channel level difference (CLD) and the channel correlation information (ICC).

In short, the multiplexer 130A shown in FIG. 2 determines, according to whether first spatial information exists, an extension frame type that can indicate the presence or absence of an MBO. When the extension frame type indicates that the first spatial information exists, the first spatial information is included in the bitstream. The syntax for including the first spatial information in the bitstream may be defined as shown in FIGS. 3 to 8.

FIG. 9 is another example of a detailed configuration of the multiplexer 130 of FIG. 1. Whereas the example 130A shown in FIG. 2 includes the first spatial information in the bitstream when the extension type identifier is x (that is, when an MBO is included), the other example 130B shown in FIG. 9 includes coupled object information (ObjectCoupledInformation()) in the bitstream when the extension type identifier is y. Here, the coupled object information indicates whether the two or more object signals downmixed by the object encoder 120 include a stereo object or a multichannel object, or only normal objects.

Referring to FIG. 9, the multiplexer 130B includes an object information insertion part 132B, an extension type identifier insertion part 134B, and a coupled object information insertion part 136B. The object information insertion part 132B performs the same function as the identically named component 132A in FIG. 2, so a detailed description is omitted.

The extension type identifier insertion part determines the extension type identifier according to whether a stereo object or a multichannel object (MBO) is present in the downmix (DMX), and includes it in the bitstream. Then, when the extension type identifier indicates that a stereo object or a multichannel object is present (for example, when it is y), the coupled object information is included in the bitstream. The extension type identifier (bsSaocExtType) may be included in the extension configuration shown in FIG. 3. The following table shows an example of the extension type identifier (bsSaocExtType) and its meaning.

[Table 2] An example of the meaning of the extension type identifier
Extension type identifier (bsSaocExtType) | Meaning | Extension frame data
0 | Residual coding data | Present
1 | Preset information | Present
x | MBO spatial information | Present
y | Coupled object information | Not present

Here, y is an arbitrary integer.

Table 2 indicates that, when the extension type identifier is y, coupled object information is included in the bitstream. Of course, Table 1 mentioned earlier and Table 2 above may also be merged.

FIG. 10 is an example of the syntax for the coupled object information when the extension type identifier is y, and FIGS. 11 and 12 are examples of the syntax for the coupled object information itself. Referring to FIG. 10, when the extension type identifier is y (bsSaocExtType is y), the extension configuration data (SAOCExtensionConfigData(y)) includes the coupled object information (ObjectCoupledInformation()).

Referring to FIG. 11, the coupled object information (ObjectCoupledInformation()) includes coupling object identification information (bsCoupledObject[i][j]), left channel information (bsObjectIsLeft), MBO information (bsObjectIsMBO), and the like.

The coupling object identification information (bsCoupledObject[i][j]) indicates which objects are parts of a stereo or multichannel object. That is, when bsCoupledObject[i][j] is 1, the i-th object and the j-th object are coupled to each other; when it is 0, they are unrelated. The following table shows an example of the coupling object identification information (bsCoupledObject[i][j]) when there are five objects in total and the third and fourth objects are coupled.

[Table 3] An example of the coupling object identification information (bsCoupledObject[i][j])
bsCoupledObject[i][j] | i = 0 | i = 1 | i = 2 | i = 3 | i = 4
j = 0 | 1 | 0 | 0 | 0 | 0
j = 1 | 0 | 1 | 1 | 0 | 0
j = 2 | 0 | 1 | 1 | 0 | 0
j = 3 | 0 | 0 | 0 | 1 | 0
j = 4 | 0 | 0 | 0 | 0 | 1

Here, there are five objects in total, and the third and fourth objects form a couple.
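The meaning of the coupling matrix can be sketched as follows, using the case stated in the text (five objects, with the third and fourth objects, 0-based indices 2 and 3, coupled). The helper names `build_coupling_matrix` and `coupled_pairs` are hypothetical, and setting the diagonal to 1 is an assumption taken from the layout of the example table rather than from a stated rule.

```python
def build_coupling_matrix(num_objects: int, pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Build a symmetric 0/1 matrix with 1 where two objects are coupled."""
    # Assumption: every object is marked as coupled with itself (diagonal of 1s).
    matrix = [[1 if i == j else 0 for i in range(num_objects)]
              for j in range(num_objects)]
    for a, b in pairs:
        matrix[a][b] = 1
        matrix[b][a] = 1
    return matrix


def coupled_pairs(matrix: list[list[int]]) -> list[tuple[int, int]]:
    """Recover the coupled pairs (i < j) from a coupling matrix."""
    n = len(matrix)
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if matrix[i][j] == 1 and matrix[j][i] == 1]


matrix = build_coupling_matrix(5, [(2, 3)])
print(coupled_pairs(matrix))
```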

The left channel information (bsObjectIsLeft) and the MBO information (bsObjectIsMBO) are included only for coupled objects (if (bsCoupledObject[i][j])). When the left channel information (bsObjectIsLeft) is 1, the object corresponds to the left channel of a stereo object; when it is 0, the object corresponds to the right channel. When the MBO information (bsObjectIsMBO) is 1, the object was generated from a multichannel object (MBO); when it is 0, it is not a multichannel object (MBO). In the example described with FIG. 2, the presence of an MBO is known from whether the first spatial information is included, whereas in this example the MBO information shows whether the objects include a multichannel object.

Meanwhile, FIG. 12 shows another example of the coupled object information. This example includes object type information (bsObjectType), left channel information (bsObjectIsLeft), MBO information (bsObjectIsMBO), couple target information (bsObjectIsCoupled), and the like.

Here, the object type information (bsObjectType) indicates, for each object, a stereo object (or a multichannel object) when it is 1 and a normal object when it is 0.

When there are five objects in total, the third and fourth objects are stereo objects (or multichannel objects), and the first, second, and fifth objects are normal objects, the object type information is as follows.

[Table 4] An example of the object type information (bsObjectType)
 | i = 0 | i = 1 | i = 2 | i = 3 | i = 4
bsObjectType | 0 | 0 | 1 | 1 | 0

When there are five objects in total, the first to fourth objects are stereo objects (or multichannel objects), and only the fifth object is a normal object, the object type information is as follows.

[Table 5] Another example of the object type information (bsObjectType)
 | i = 0 | i = 1 | i = 2 | i = 3 | i = 4
bsObjectType | 1 | 1 | 1 | 1 | 0

The left channel information (bsObjectIsLeft) and the MBO information (bsObjectIsMBO), as shown in FIG. 11, are included only when the object type information is 1 (if (bsObjectType == 1)). Meanwhile, the couple target information (bsObjectIsCoupled) indicates, when the object is stereo, which object is its pair or couple partner. When the couple target information is expressed with a fixed number of bits (5 bits), as shown in Table 7B.1 of FIG. 12, it is as shown in Table 6 below for the case of Table 4 above, and as shown in Table 7 below for the case of Table 5.

[Table 6] An example of the couple target information (bsObjectIsCoupled)
 | i = 0 | i = 1 | i = 2 | i = 3 | i = 4
bsObjectIsCoupled | - | - | 00011 | 00010 | -

[Table 7] Another example of the couple target information (bsObjectIsCoupled)
 | i = 0 | i = 1 | i = 2 | i = 3 | i = 4
bsObjectIsCoupled | 00001 | 00000 | 00011 | 00010 | -

First, it can be seen that the couple target information is not transmitted for normal objects.

In the case shown in Table 6, the couple target information of the third object (i = 2) is 00011 (i = 3), designating the fourth object (i = 3) as its partner, and that of the fourth object is 00010 (i = 2), designating the third object (i = 2) as its partner, so the two form one pair. In the case shown in Table 7, the first and second objects form one couple, and the third and fourth objects form another couple.

Meanwhile, the couple target information (bsObjectIsCoupled) may be expressed with a fixed number of bits as shown in Table 7B.1 of FIG. 12, but to save more bits it may also be expressed with a variable number of bits as shown in Table 7B.2. The reason and principle are the same as for expressing the MBO identification information (bsMBOIs) with a variable number of bits, as described with FIG. 4.

[Equation 1]

nBitsMBO = ceil(log₂(bsNumObjects))

where bsNumObjects is the total number of objects and ceil(x) is the smallest integer not less than x.

The cases shown in Tables 4 and 5 above have five objects in total, so the couple target information can be expressed with a variable 3 bits (= ceil(log₂5)) instead of a fixed 5 bits, as shown in Tables 8 and 9 below.

[Table 8] An example of the couple target information (bsObjectIsCoupled)
 | i = 0 | i = 1 | i = 2 | i = 3 | i = 4
bsObjectIsCoupled | - | - | 011 | 010 | -

[Table 9] An example of the couple target information (bsObjectIsCoupled)
 | i = 0 | i = 1 | i = 2 | i = 3 | i = 4
bsObjectIsCoupled | 001 | 000 | 011 | 010 | -

FIG. 13 is a diagram showing the configuration of a decoder in an audio signal processing apparatus according to an embodiment of the present invention. FIG. 14 is a flowchart showing the decoding operation in an audio signal processing method according to an embodiment of the present invention.

First, referring to FIG. 13, the decoder 200 includes a demultiplexer 210 and an MBO transcoder 220, and may further include a multichannel decoder 230. The function and operation of the decoder 200 are described below with reference to FIGS. 13 and 14 together.

A receiving unit (not shown) of the decoder 200 receives a downmix signal (DMX) and a bitstream, and may further receive a residual signal (step S110). The bitstream may contain the residual signal and may even contain the downmix signal (DMX), but the present invention is not limited thereto.

The demultiplexer 210 extracts the extension type identifier from the bitstream (more specifically, from the extension region of the bitstream) and, based on it, determines whether the downmix signal (DMX) includes a multichannel object (MBO). When it determines that the downmix (DMX) includes an MBO ('yes' in step S120), it extracts the first spatial information from the bitstream (step S130).

The MBO transcoder 220 separates the downmix (DMX) into the MBO and the normal objects using the residual, the object information, and the like. The MBO transcoder 220 determines a mode based on mix information (MXI); the mode is either a mode in which the MBO is upmixed or a mode in which the normal objects are controlled. The mode in which the MBO is upmixed leaves only the background and may therefore correspond to a karaoke mode, while the mode in which the normal objects are controlled may remove the background and leave only objects such as vocals, and may therefore correspond to a solo mode. The mix information (MXI) is described in more detail later with reference to FIGS. 17 and 18.

When the mode is one in which the MBO is not suppressed (that is, the MBO is upmixed), for example the karaoke mode ('yes' in step S140), the received first spatial information is passed to the multichannel decoder 230 (step S150). The multichannel decoder 230 then upmixes the mono or stereo multichannel object using the first spatial information in a channel-based manner to generate a multichannel signal (step S160).

When the mode is one in which the MBO is suppressed (that is, the normal objects are rendered), for example the solo mode ('no' in step S140), the received first spatial information is not used, and processing information is generated using the object information and the mix information (MXI) (step S170). The object information is information determined when the one or more object signals included in the downmix were downmixed and, as mentioned above, includes object level information and the like. The processing information includes at least one of downmix processing information and second spatial information. In the mode in which the output channels are generated directly by the MBO transcoder 220 without the multichannel decoder 230 (decoding mode), the processing information includes only the downmix processing information. Conversely, when the normal objects are passed to the multichannel decoder 230 (transcoding mode), the processing information may further include the second spatial information. The decoding mode and the transcoding mode are described in detail later with reference to FIGS. 17 and 18.

When the MBO transcoder 220 has generated the second spatial information in this way (transcoding mode), the multichannel decoder 230 upmixes the normal objects using the second spatial information to generate a multichannel signal (step S180).
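The branching in steps S110 to S180 can be summarised as a small control-flow sketch. It only lists which steps would run for a given combination of conditions; the function name, the flag names, and the assumption that a downmix without an MBO proceeds directly to step S170 are illustrative and are not confirmed by the flowchart.

```python
def decoding_steps(has_mbo: bool, mbo_upmix_mode: bool, transcoding: bool) -> list[str]:
    """List the decoding steps taken for the given conditions.

    has_mbo        -- extension type identifier indicates an MBO (S120)
    mbo_upmix_mode -- mix information selects the mode that upmixes the MBO (S140)
    transcoding    -- normal objects are passed on to the multichannel decoder
    """
    steps = ["S110"]                      # receive downmix and bitstream
    if has_mbo:
        steps.append("S130")              # extract first spatial information
    if has_mbo and mbo_upmix_mode:
        steps += ["S150", "S160"]         # pass first spatial info, upmix the MBO
    else:
        steps.append("S170")              # build processing information
        if transcoding:
            steps.append("S180")          # upmix normal objects with second spatial info
    return steps


print(decoding_steps(has_mbo=True, mbo_upmix_mode=True, transcoding=False))
print(decoding_steps(has_mbo=True, mbo_upmix_mode=False, transcoding=True))
```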

The detailed configuration of the demultiplexer 210 is described below with reference to FIGS. 15 and 16, and the detailed configuration of the MBO transcoder 220 is described with reference to FIGS. 17 and 18.

FIG. 15 is an example of a detailed configuration of the demultiplexer 210 of FIG. 13, and FIG. 16 is another example. The demultiplexer 210A shown in FIG. 15 corresponds to the multiplexer 130A of FIG. 2, and the demultiplexer 210B of FIG. 16 corresponds to the multiplexer 130B of FIG. 9. In short, the demultiplexer 210A shown in FIG. 15 extracts the first spatial information according to the extension type identifier, and the demultiplexer 210B shown in FIG. 16 extracts the coupled object information.

Referring to FIG. 15, the demultiplexer 210A includes an extension type identifier extraction part 212A, a first spatial information extraction part 214A, and an object information extraction part 216A. The extension type identifier extraction part 212A first extracts the extension type identifier from the bitstream. The extension type identifier (bsSaocExtType) may be obtained according to the syntax shown in FIG. 3 and interpreted according to Table 1 described above. When the extension type identifier indicates that the downmix signal includes an MBO (that is, that the bitstream includes spatial information), for example when bsSaocExtType is x, the bitstream is fed to the first spatial information extraction part 214A, which obtains the first spatial information from the bitstream. Conversely, when the extension type identifier indicates that the downmix does not include an MBO, the bitstream bypasses the first spatial information extraction part 214A and is passed directly to the object information extraction part 216A.

As described above, the first spatial information is information determined when a multichannel source signal is downmixed to a mono or stereo MBO, and is the spatial information needed to upmix the MBO to multichannel. The first spatial information may include the spatial configuration information defined in FIGS. 4 and 7 and the spatial frame data shown in FIGS. 5, 6, and 8.

The object information extraction part 216A extracts the object information from the bitstream regardless of the extension type identifier.

Referring to FIG. 16, the demultiplexer 210B includes an extension type identifier extraction part 212B, a coupled object information extraction part 214B, and an object information extraction part 216B.

The extension type identifier extraction part 212B extracts the extension type identifier from the bitstream. The extension type identifier may be obtained according to the syntax shown in FIG. 3 and interpreted according to Table 2 described above. When the extension type identifier indicates that the bitstream includes coupled object information (for example, when bsSaocExtType = y), the bitstream is fed to the coupled object information extraction part 214B; otherwise, it is passed directly to the object information extraction part 216B.
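The routing performed by the two demultiplexer variants (210A of FIG. 15 and 210B of FIG. 16) can be summarised in a brief sketch. The description only names the identifier values "x" and "y", so they are kept here as symbolic placeholders rather than numeric codes; the function name `extraction_parts` is hypothetical.

```python
# Symbolic placeholders: the description does not give numeric values for x and y.
EXT_TYPE_X = "x"  # MBO spatial information present
EXT_TYPE_Y = "y"  # coupled object information present


def extraction_parts(ext_type: str) -> list[str]:
    """Return, in order, the parts that handle the bitstream for an identifier."""
    parts = ["extension type identifier extraction"]
    if ext_type == EXT_TYPE_X:
        parts.append("first spatial information extraction")
    elif ext_type == EXT_TYPE_Y:
        parts.append("coupled object information extraction")
    # Object information is extracted regardless of the extension type identifier.
    parts.append("object information extraction")
    return parts


for ext in ("x", "y", "0"):
    print(ext, "->", extraction_parts(ext))
```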

Here, the coupled object information indicates whether the two or more object signals downmixed by the object encoder 120 include a stereo object or a multichannel object, or only normal objects. Furthermore, as described above with FIGS. 10 and 11, the coupled object information may include coupling object identification information (bsCoupledObject[i][j]), left channel information (bsObjectIsLeft), MBO information (bsObjectIsMBO), and the like. Using the coupled object information, the decoder can tell which objects are stereo objects (or multichannel objects). The properties and uses of the coupled object information are described below.

Although a stereo object (or a multichannel signal downmixed to stereo) consists of two object signals, the two signals are highly similar to each other because they are the left and right channels of one or more sound sources. That is, the left and right channels of the object behave as if they were a single object. For example, the inter-object cross correlation (IOC) may be very high. Therefore, when the decoder knows which of the objects included in the downmix signal correspond to a stereo object (or a multichannel object), it can exploit this similarity to render the objects more efficiently. For example, when the level or panning (position) of a specific object is controlled, the left and right channels of a stereo object, being treated as two objects, could each be controlled separately. Specifically, a user could render the left channel of the stereo object at maximum level to the left and right output channels, and the right channel of the stereo object at minimum level to the left and right output channels. Rendering the objects in this way, ignoring the characteristics of the stereo object, can degrade the sound quality considerably. However, when the decoder knows that a stereo object is present, it can prevent this degradation by controlling the left and right channels of that stereo object together. The decoder may estimate from the IOC value which objects are channels of a stereo object, but when coupled object information that explicitly indicates which objects form a stereo object is received, the decoder can use it when rendering the objects.
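Controlling both channels of a stereo object together, as described above, might look like the following sketch. It assumes the decoder already holds the list of coupled pairs and a per-object gain list; the function name `set_object_gain` and the use of decibel values are assumptions made for illustration.

```python
def set_object_gain(gains: list[float],
                    pairs: list[tuple[int, int]],
                    obj: int,
                    gain_db: float) -> list[float]:
    """Return new per-object gains with the gain applied to obj and its partner.

    gains   -- current gain per object (assumed here to be in dB)
    pairs   -- coupled object pairs, e.g. [(2, 3)] for a stereo object
    obj     -- index of the object the user adjusted
    gain_db -- requested gain
    """
    updated = list(gains)  # leave the caller's list untouched
    updated[obj] = gain_db
    for a, b in pairs:
        # Apply the same gain to the partner channel so the stereo object
        # is controlled as one unit.
        if obj == a:
            updated[b] = gain_db
        elif obj == b:
            updated[a] = gain_db
    return updated


print(set_object_gain([0.0] * 5, [(2, 3)], obj=2, gain_db=-6.0))
```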

Meanwhile, when the downmix signal includes a stereo-channel object, the decoder can tell from the MBO information described above whether it is an ordinary stereo object or a multichannel object (MBO) downmixed to stereo. Using the MBO information, the decoder may also determine whether the spatial information determined when the multichannel object (MBO) was downmixed (which may correspond to the first spatial information described with FIG. 15) is included in the bitstream. Furthermore, when an MBO is used at the decoder, it is often desired that it remain unchanged or, at most, be modified only by an overall gain.

In this way, the demultiplexer 210B shown in FIG. 16 can receive coupled object information: when the extension type identifier indicates that coupled object information is included, the demultiplexer 210B extracts it from the bitstream.

The object information extraction part 216, in turn, extracts the object information from the bitstream regardless of the extension type identifier and regardless of whether coupled object information is present.
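
The conditional extraction described above can be sketched as follows. The bitstream is modelled as an already-parsed dictionary, and the identifier values `EXT_TYPE_NONE` and `EXT_TYPE_COUPLED_OBJECT`, the key names, and the function `demultiplex` are hypothetical; the actual identifier codes and syntax are not given in this passage.

```python
from typing import Any, Dict, Optional

# Hypothetical identifier values; the real codes are not stated in this passage.
EXT_TYPE_NONE = 0
EXT_TYPE_COUPLED_OBJECT = 1


def demultiplex(bitstream: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Split an already-parsed bitstream (modelled as a dict) into its parts."""
    extension = bitstream.get("extension", {})
    ext_type = extension.get("type", EXT_TYPE_NONE)
    # Object information is always extracted, whatever the extension part says.
    result = {"object_info": bitstream["object_info"], "coupled_object_info": None}
    # Coupled object information is read only when the identifier announces it.
    if ext_type == EXT_TYPE_COUPLED_OBJECT:
        result["coupled_object_info"] = extension["payload"]
    return result


plain = demultiplex({"object_info": {"num_objects": 3}})
coupled = demultiplex({
    "object_info": {"num_objects": 3},
    "extension": {"type": EXT_TYPE_COUPLED_OBJECT, "payload": {"stereo_pairs": [(0, 1)]}},
})
```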

FIGS. 17 and 18 show examples of the detailed configuration of the MBO transcoder 220 of FIG. 13, and FIG. 19 shows examples of the detailed configuration of the extraction unit 222 of FIGS. 17 and 18.

The MBO transcoders (and multichannel decoders) shown in FIGS. 17 and 18 have the same components. FIG. 17, however, concerns a mode in which the normal objects other than the MBO are suppressed (e.g., karaoke mode), whereas FIG. 18 concerns a mode in which the MBO is suppressed and only the normal objects are rendered (e.g., solo mode). Referring first to FIG. 17, the MBO transcoder 220 includes an extraction unit 222, a rendering unit 224, a downmix processing unit 226, and an information generating unit 228, and may be connected to the multichannel decoder 230 as shown in FIG. 13.

The extraction unit 222 extracts the MBO or the normal objects from the downmix (DMX) using the residual (and the object information). Examples of the extraction unit 222 are shown in FIG. 19. Referring to FIG. 19(A), the OTN (One-To-N) module 222-1 generates an N-channel output signal from a one-channel input signal; for example, using two residual signals (residual₁, residual₂), it can extract a mono MBO (MBO_m) and two normal objects (Normal obj₁, Normal obj₂) from a mono downmix (DMX_m). The number of residual signals may equal the number of normal object signals. Referring to FIG. 19(B), the TTN (Two-To-N) module 222-2 generates an N-channel output signal from a two-channel input signal; for example, it can extract two MBO channels (MBO_L, MBO_R) and three normal objects (Normal obj₁, Normal obj₂, Normal obj₃) from a stereo downmix (DMX_L, DMX_R).

However, when the encoder generates the residual signal, it need not set only the MBO as the enhanced audio object (EAO) that forms the background in karaoke mode; it may instead set the EAO to include both the MBO and normal objects and generate the residual accordingly. When a residual generated in this way is used, a mono or stereo EAO (EAO_m, or EAO_L and EAO_R) is extracted, as shown in FIGS. 19(C) and 19(D), together with the regular objects (Regular obj_N), that is, the objects not included in the EAO.

The following describes the case in which the MBO constitutes the EAO in karaoke mode and solo mode, as shown in FIGS. 19(A) and 19(B).

Referring again to FIG. 17, the MBO and the normal objects extracted by the extraction unit 222 are fed to the rendering unit 224. The rendering unit 224 can suppress one or more of the MBO and the normal objects based on rendering information (RI). The rendering information (RI) may include mode information, which selects one of normal mode, karaoke mode, and solo mode. Normal mode means that neither karaoke mode nor solo mode is selected; karaoke mode suppresses the objects other than the MBO (or other than the EAO that includes the MBO); and solo mode suppresses the MBO. The rendering information (RI) may be the mix information (MXI) itself, or it may be information that the information generating unit 228 generates from the mix information (MXI), but the present invention is not limited to this. The mix information is described in detail with reference to FIG. 18.
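
The mode-dependent suppression performed by the rendering unit can be sketched as follows. Signals are represented only by string labels, and the names `Mode` and `suppress` are hypothetical; this is a minimal illustration of the three modes, not an implementation of the rendering unit 224.

```python
from enum import Enum
from typing import Dict, List


class Mode(Enum):
    NORMAL = "normal"
    KARAOKE = "karaoke"
    SOLO = "solo"


def suppress(extracted: Dict[str, List[str]], mode: Mode) -> Dict[str, List[str]]:
    """Return the signals that survive suppression for the given mode."""
    mbo = list(extracted["mbo"])
    normal = list(extracted["normal"])
    if mode is Mode.KARAOKE:
        # Karaoke mode: objects other than the MBO are suppressed.
        return {"mbo": mbo, "normal": []}
    if mode is Mode.SOLO:
        # Solo mode: the MBO is suppressed.
        return {"mbo": [], "normal": normal}
    # Normal mode: neither karaoke nor solo is selected, so nothing is suppressed.
    return {"mbo": mbo, "normal": normal}


extracted = {"mbo": ["MBO_L", "MBO_R"], "normal": ["obj1", "obj2", "obj3"]}
karaoke_out = suppress(extracted, Mode.KARAOKE)
solo_out = suppress(extracted, Mode.SOLO)
normal_out = suppress(extracted, Mode.NORMAL)
```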

When the rendering unit 224 suppresses the normal objects other than the MBO (karaoke mode), only the MBO is output to the multichannel decoder 230, and the information generating unit 228 generates neither the downmix processing information (DPI) nor the second spatial information. The downmix processing unit 226 may likewise remain inactive. The received first spatial information is forwarded to the multichannel decoder 230.

The multichannel decoder 230 can upmix the MBO into a multichannel signal using the received first spatial information. That is, in karaoke mode, the MBO transcoder 220 passes the received spatial information and the MBO extracted from the downmix signal to the multichannel decoder.

FIG. 18 shows the operation of the MBO transcoder 220 in solo mode. As before, the extraction unit 222 extracts the MBO and the normal objects from the downmix (DMX). In solo mode, the rendering unit 224 uses the rendering information (RI) to suppress the MBO and passes the normal objects to the downmix processing unit 226.

Meanwhile, the information generating unit 228 generates the downmix processing information (DPI) using the object information and the mix information (MXI). The mix information (MXI) is generated from object position information, object gain information, playback configuration information, and the like. The object position information and the object gain information are used to control the objects included in the downmix, where "object" may cover not only the normal objects described above but also the EAO.

Specifically, the object position information is input by the user to control the position or panning of each object, and the object gain information is input by the user to control the gain of each object. The object gain information may therefore include gain control information for the EAO as well as for the normal objects.

The object position information and the object gain information may also be one selected from among preset modes. A preset mode is a set of predetermined, time-dependent gains and positions for specific objects; the preset mode information may be received from another device or stored in the device. Which of the one or more preset modes is used (e.g., no preset mode, preset mode 1, preset mode 2) can be determined by user input. The playback configuration information includes the number of speakers, the positions of the speakers, ambient information (virtual positions of the speakers), and the like; it may be input by the user, stored in advance, or received from another device.

As described above, the mix information (MXI) may further include mode information for selecting one of normal mode, karaoke mode, and solo mode.
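
The contents of the mix information (MXI) listed above can be pictured as a small data structure. The sketch below is an assumption for illustration only: the class and field names, the panning range, the use of azimuth angles, and the simplification of a preset to a single static set of positions and gains (the text describes presets as time-dependent) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PlaybackConfig:
    num_speakers: int
    speaker_positions: List[float]  # assumed: azimuth in degrees per speaker
    virtual_positions: List[float] = field(default_factory=list)  # ambient information


@dataclass
class MixInformation:
    object_positions: Dict[int, float]  # assumed panning range: -1.0 (left) to 1.0 (right)
    object_gains: Dict[int, float]      # linear gain per object (normal objects and EAO)
    playback: PlaybackConfig
    mode: str = "normal"                # "normal", "karaoke" or "solo"


Preset = Tuple[Dict[int, float], Dict[int, float]]  # (positions, gains), static here


def from_preset(presets: Dict[int, Preset], preset_id: int,
                playback: PlaybackConfig, mode: str = "normal") -> MixInformation:
    """Build mix information from a stored preset selected by the user."""
    positions, gains = presets[preset_id]
    return MixInformation(dict(positions), dict(gains), playback, mode)


playback = PlaybackConfig(num_speakers=2, speaker_positions=[-30.0, 30.0])
presets = {1: ({0: -1.0, 1: 1.0}, {0: 1.0, 1: 0.5})}
mxi = from_preset(presets, 1, playback, mode="karaoke")
```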

In decoding mode, the information generating unit 228 may generate only the downmix processing information (DPI). In transcoding mode (that is, a mode that uses the multichannel decoder), it also generates the second spatial information using the object information and the mix information (MXI). Like the first spatial information, the second spatial information includes channel level differences, channel correlation information, and the like. The first spatial information, however, does not reflect any control of object position and level, whereas the second spatial information is generated from the mix information (MXI) and therefore reflects the user's per-object control of position and level.

When the output is multichannel and the input is a mono channel, the information generating unit 228 may not generate the downmix processing information (DPI); in that case the downmix processing unit 226 bypasses the input signal and passes it to the multichannel decoder 230.

The downmix processing unit 226 generates a processed downmix by processing the normal objects using the downmix processing information (DPI). This processing adjusts the gain and panning of the objects without changing the number of input channels or output channels. In decoding mode (when the output mode is a mono channel, a stereo channel, or a 3D stereo channel (binaural mode)), the downmix processing unit 226 outputs the time-domain processed downmix as the final output signal (not shown); that is, the processed downmix is not passed to the multichannel decoder 230. In transcoding mode (when the output mode is multichannel), by contrast, the downmix processing unit 226 passes the processed downmix to the multichannel decoder 230. The received first spatial information is not forwarded to the multichannel decoder 230.

The multichannel decoder 230 then upmixes the processed downmix into a multichannel signal using the second spatial information generated by the information generating unit 228.
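
The distinction between decoding mode and transcoding mode in solo-mode operation can be summarized in a small decision function. This is a sketch under assumptions: the names `Plan` and `plan_output` and the string labels are hypothetical, and the mono-input case is treated as always skipping the DPI, although the text only says the DPI "may not" be generated.

```python
from dataclasses import dataclass

MULTICHANNEL = "multichannel"
DECODING_OUTPUTS = {"mono", "stereo", "binaural"}


@dataclass(frozen=True)
class Plan:
    generate_dpi: bool                  # downmix processing information
    generate_second_spatial_info: bool
    bypass_downmix_processing: bool
    destination: str                    # "final_output" or "multichannel_decoder"


def plan_output(output_mode: str, input_channels: int) -> Plan:
    """Decide what is generated and where the (processed) downmix is sent."""
    if output_mode in DECODING_OUTPUTS:
        # Decoding mode: only the DPI is needed; the processed downmix is the final output.
        return Plan(True, False, False, "final_output")
    if output_mode == MULTICHANNEL:
        # Transcoding mode: second spatial information drives the multichannel decoder.
        mono_input = input_channels == 1
        # Assumption: a mono input is always bypassed without DPI.
        return Plan(not mono_input, True, mono_input, "multichannel_decoder")
    raise ValueError(f"unknown output mode: {output_mode}")


stereo_plan = plan_output("stereo", 2)
multi_plan = plan_output("multichannel", 2)
mono_in_plan = plan_output("multichannel", 1)
```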

<Application scenarios for karaoke mode>

In karaoke and solo modes, objects are classified as normal objects and EAOs. A lead vocal signal is a good example of a regular object, and a karaoke backing track can serve as the EAO. There is, however, no strict limitation on what counts as an EAO or a regular object. Thanks to the residual concept of the TTN module, up to six objects (two stereo EAOs and four regular objects) can be separated with high quality by the TTN module.

In karaoke and solo modes, a residual signal is needed for each EAO and each regular object to achieve good separation quality. The total bit rate therefore grows in proportion to the number of objects, and keeping the number of objects low requires grouping objects into EAOs and regular objects. The price of this bit saving is that objects grouped into an EAO or a normal object can no longer be controlled individually.

In some application scenarios, however, it may be desirable to have high-quality karaoke functionality and, at the same time, moderate control over each accompanying object. Consider a typical interactive music remix with five stereo objects (lead vocal, lead guitar, bass guitar, drums, and keyboard). In this case, the lead vocal forms the regular object, and the mixture of the remaining four stereo objects constitutes the EAO. The user can enjoy the producer's mix (the transmitted downmix), a karaoke version, and a solo (a cappella) version. The user cannot, however, boost the bass guitar or the drums for a preferred MegaBass mode.

In normal mode, every object in the downmix can be controlled individually with ordinary rendering parameters despite the small amount of side information (e.g., a bit rate of 3 kbps/object), but high separation quality is not achieved. In karaoke and solo modes, on the other hand, the normal objects can be separated almost completely, but fewer objects are controllable. An application may thus be forced to choose exclusively between normal mode and karaoke/solo mode. To meet the requirements of such application scenarios, we propose considering a combination of the advantages of normal mode and karaoke/solo mode.

In karaoke/solo mode, the TTN matrix is obtained in either prediction mode or energy mode. Prediction mode requires the residual signal, whereas energy mode can operate without it.

Setting aside the concept of karaoke/solo mode and the distinction between EAO and regular signals, the energy-based karaoke/solo mode arguably does not differ greatly from normal mode. The object parameters are the same in both processing modes; only the processed output differs. Normal mode outputs the final rendered signal, whereas the energy-based karaoke/solo mode outputs the separated objects and additionally requires a rendering post-processing unit. Consequently, assuming that the two approaches do not differ in output quality, there are two different descriptions for decoding the same object bitstream, which causes confusion in interpretation and implementation.

We propose clarifying this overlap between normal mode and the energy-based karaoke/solo mode and, where possible, unifying the two.

<Information on the residual signal>

The configuration of the residual signal is defined in ResidualConfig(), and the residual signal is transmitted through ResidualData(). There is, however, no information indicating which object a residual signal applies to. To remove this ambiguity and the risk of mismatching residuals and objects, additional information about the residual signal needs to be transmitted in the object bitstream. This information can be inserted into ResidualConfig(). We therefore propose carrying additional information about the residual signal, in particular information indicating which object signal each residual signal applies to.
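
One way to picture the proposed additional information is a per-residual object index carried alongside the residual configuration. The sketch below is purely illustrative: the class `ResidualConfigSketch`, its field `residual_object_ids`, and the function `map_residuals` are hypothetical and do not reflect the actual syntax of ResidualConfig().

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResidualConfigSketch:
    # Hypothetical field: for each residual signal, the index of the object it applies to.
    residual_object_ids: List[int]


def map_residuals(config: ResidualConfigSketch, residuals: List[str],
                  num_objects: int) -> Dict[int, str]:
    """Associate each residual with its object and reject mismatches."""
    if len(residuals) != len(config.residual_object_ids):
        raise ValueError("number of residuals does not match the configuration")
    mapping: Dict[int, str] = {}
    for obj_id, residual in zip(config.residual_object_ids, residuals):
        if not 0 <= obj_id < num_objects:
            raise ValueError(f"object index {obj_id} out of range")
        if obj_id in mapping:
            raise ValueError(f"object {obj_id} has more than one residual")
        mapping[obj_id] = residual
    return mapping


config = ResidualConfigSketch(residual_object_ids=[2, 0])
mapping = map_residuals(config, ["res_a", "res_b"], num_objects=3)
```

With an explicit index per residual, a decoder can detect a mismatch (wrong count, out-of-range index, or duplicate assignment) instead of silently applying a residual to the wrong object.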

The audio signal processing apparatus according to the present invention can be incorporated into a variety of products. These products fall broadly into a stand-alone group and a portable group. The stand-alone group may include TVs, monitors, set-top boxes, and the like, and the portable group may include PMPs, mobile phones, navigation devices, and the like.

FIG. 20 shows the relationship between products in which an audio signal processing apparatus according to an embodiment of the present invention is implemented. Referring to FIG. 20, the wired/wireless communication unit 310 receives a bitstream through a wired or wireless communication scheme. Specifically, the wired/wireless communication unit 310 may include one or more of a wired communication unit 310A, an infrared communication unit 310B, a Bluetooth unit 310C, and a wireless LAN communication unit 310D.

The user authentication unit 320 receives user information and performs user authentication. It may include one or more of a fingerprint recognition unit 320A, an iris recognition unit 320B, a face recognition unit 320C, and a voice recognition unit 320D, which respectively receive fingerprint, iris, facial contour, and voice information, convert it into user information, and perform user authentication by determining whether the user information matches previously registered user data.

The input unit 330 is an input device through which the user enters various kinds of commands. It may include one or more of a keypad unit 330A, a touchpad unit 330B, and a remote control unit 330C, but the present invention is not limited to these.

The signal coding unit 340 encodes or decodes the audio signal and/or video signal received through the wired/wireless communication unit 310 and outputs a time-domain audio signal. The signal coding unit 340 includes an audio signal processing apparatus 345, which corresponds to the embodiments of the present invention described above (that is, the encoder 100 and/or the decoder 200). The audio signal processing apparatus 345 and the signal coding unit that contains it may be implemented by one or more processors.

The control unit 350 receives input signals from the input devices and controls all processes of the signal coding unit 340 and the output unit 360. The output unit 360 outputs the output signals generated by the signal coding unit 340 and may include a speaker unit 360A and a display unit 360B. An audio output signal is output through the speaker, and a video output signal is output through the display.

FIG. 21 shows the relationship between products in which an audio signal processing apparatus according to an embodiment of the present invention is implemented, specifically between terminals corresponding to the product shown in FIG. 20 and a server. Referring to FIG. 21(A), a first terminal 300.1 and a second terminal 300.2 can exchange data or bitstreams bidirectionally through their wired/wireless communication units. Referring to FIG. 21(B), a server 500 and the first terminal 300.1 can likewise communicate with each other by wire or wirelessly.

The audio signal processing method according to the present invention can be implemented as a program to be executed on a computer and stored on a computer-readable recording medium, and multimedia data having the data structure according to the present invention can also be stored on a computer-readable recording medium. Computer-readable recording media include all kinds of storage devices that store data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as media implemented in the form of a carrier wave (for example, transmission over the Internet). In addition, a bitstream generated by the encoding method described above can be stored on a computer-readable recording medium or transmitted over a wired or wireless communication network.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to them. Those of ordinary skill in the art to which the present invention pertains can of course make various modifications and variations within the technical spirit of the invention and the scope of equivalents of the claims set out below.

The present invention can be applied to encoding and decoding audio signals.

Claims

An audio signal processing method comprising:
receiving a downmix signal that includes one or more normal object signals;
receiving a bitstream that includes object information determined when the downmix signal was generated;
extracting, from an extension part of the bitstream, an extension type identifier indicating whether the downmix signal further includes a multichannel object signal;
extracting first spatial information from the bitstream when the extension type identifier indicates that the downmix signal further includes a multichannel object signal; and
transmitting at least one of the first spatial information and second spatial information,
wherein the first spatial information is determined when a multichannel source signal is downmixed to the multichannel object signal, and
wherein the second spatial information is generated using the object information and mix information.

The method of claim 1,
wherein at least one of the first spatial information and the second spatial information is transmitted according to mode information indicating whether the multichannel object is suppressed.

The method of claim 1,
wherein the first spatial information is transmitted when the mode information indicates that the multichannel object signal is not suppressed, and
the second spatial information is transmitted when the mode information indicates that the multichannel object signal is suppressed.

The method of claim 1,
further comprising generating a multichannel signal using the first spatial information and the multichannel object signal when the first spatial information is transmitted.

The method of claim 1,
further comprising generating an output signal using the second spatial information and the normal object signal when the second spatial information is generated.

The method of claim 1, further comprising:
generating downmix processing information using the object information and the mix information when the second spatial information is transmitted; and
generating a processed downmix signal by processing the normal object signal using the downmix processing information.

The method of claim 1,
wherein the first spatial information includes spatial configuration information and spatial frame data.

An audio signal processing apparatus comprising:
a receiving unit that receives a downmix signal including one or more normal object signals and receives a bitstream including object information determined when the downmix signal was generated;
an extension type identifier extraction part that extracts, from an extension part of the bitstream, an extension type identifier indicating whether the downmix signal further includes a multichannel object signal;
a first spatial information extraction part that extracts first spatial information from the bitstream when the extension type identifier indicates that the downmix signal further includes a multichannel object signal; and
a multichannel object transcoder that transmits at least one of the first spatial information and second spatial information,
wherein the first spatial information is determined when a multichannel source signal is downmixed to the multichannel object signal, and
wherein the second spatial information is generated using the object information and mix information.

The apparatus of claim 8,
wherein at least one of the first spatial information and the second spatial information is transmitted according to mode information indicating whether the multichannel object is suppressed.

The apparatus of claim 9,
wherein the first spatial information is transmitted when the mode information indicates that the multichannel object signal is not suppressed, and
the second spatial information is transmitted when the mode information indicates that the multichannel object signal is suppressed.

The apparatus of claim 8,
further comprising a multichannel decoder that generates a multichannel signal using the first spatial information and the multichannel object signal when the first spatial information is transmitted.

제 8 항에 있어서,
상기 제2 공간 정보가 생성되는 경우, 상기 제2 공간 정보 및 상기 노멀 오브젝트 신호를 이용하여 출력 신호를 생성하는 멀티채널 디코더를 더 포함하는 것을 특징으로 하는 오디오 신호 처리 장치.
The apparatus of claim 8, further comprising a multichannel decoder that generates an output signal using the second spatial information and the normal object signal when the second spatial information is generated.

제 8 항에 있어서,
상기 멀티채널 트랜스코더는,
상기 제2 공간 정보가 전송되는 경우, 상기 오브젝트 정보 및 상기 믹스 정보를 이용하여 다운믹스 프로세싱 정보를 생성하는 정보 생성 파트; 및
상기 다운믹스 프로세싱 정보를 이용하여 상기 노멀 오브젝트 신호를 프로세싱함으로써 프로세싱된 다운믹스 신호를 생성하는 다운믹스 프로세싱 파트를 포함하는 것을 특징으로 하는 오디오 신호 처리 장치.
The apparatus of claim 8, wherein the multichannel transcoder comprises:
an information generation part that generates downmix processing information using the object information and the mix information when the second spatial information is transmitted; and
a downmix processing part that generates a processed downmix signal by processing the normal object signal using the downmix processing information.

제 8 항에 있어서,
상기 제1 공간 정보는 공간 컨피그레이션 정보 및 공간 프레임 데이터를 포함하는 것을 특징으로 하는 오디오 신호 처리 장치.
The apparatus of claim 8, wherein the first spatial information includes spatial configuration information and spatial frame data.

하나 이상의 노멀 오브젝트 신호를 포함하는 다운믹스 신호를 수신하는 단계;
상기 다운믹스 신호가 생성될 때 결정된 오브젝트 정보를 포함하는 비트스트림을 수신하는 단계;
상기 다운믹스 신호가 멀티채널 오브젝트 신호를 더 포함하는지 여부를 나타내는 확장 타입 식별자를 상기 비트스트림의 확장 파트로부터 추출하는 단계;
상기 다운믹스 신호가 멀티채널 오브젝트 신호를 더 포함하는 것을 상기 확장 타입 식별자가 지시하는 경우, 상기 비트스트림으로부터 제1 공간 정보를 추출하는 단계; 및
상기 제1 공간 정보 및 제2 공간 정보 중 하나 이상을 전송하는 단계를 포함하고,
상기 제1 공간 정보는 멀티채널 소스 신호가 상기 멀티채널 오브젝트 신호로 다운믹스될 때 결정되는 것이고,
상기 제2 공간 정보는 상기 오브젝트 정보 및 믹스 정보를 이용하여 생성되는 것을 특징으로 하는,
동작들을, 프로세서에 의해 실행될 때, 상기 프로세서가 수행하도록 하는 명령들이 저장되어 있는 컴퓨터로 읽을 수 있는 매체.
A computer-readable medium having stored thereon instructions which, when executed by a processor, cause the processor to perform operations comprising:
receiving a downmix signal including one or more normal object signals;
receiving a bitstream including object information determined when the downmix signal is generated;
extracting, from an extension part of the bitstream, an extension type identifier indicating whether the downmix signal further includes a multichannel object signal;
extracting first spatial information from the bitstream when the extension type identifier indicates that the downmix signal further includes a multichannel object signal; and
transmitting at least one of the first spatial information and second spatial information,
wherein the first spatial information is determined when a multichannel source signal is downmixed into the multichannel object signal, and
the second spatial information is generated using the object information and mix information.
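
The operations recited above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `Bitstream` and `FirstSpatialInfo` types, the identifier value `EXT_TYPE_MULTICHANNEL_OBJECT`, and the functions `extract_first_spatial_info`, `generate_second_spatial_info`, and `transcode` are all hypothetical. The actual bitstream syntax is not specified in the claims, and the rule used here to choose between the two kinds of spatial information borrows the suppress-mode condition from the dependent apparatus claims.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical identifier value; the real bitstream syntax is not given here.
EXT_TYPE_MULTICHANNEL_OBJECT = 1


@dataclass
class FirstSpatialInfo:
    """First spatial information: configuration plus per-frame data."""
    spatial_config: dict
    spatial_frame_data: dict


@dataclass
class Bitstream:
    """Hypothetical parsed bitstream with an optional extension part."""
    object_info: dict
    extension_type: int = 0
    extension_payload: Optional[FirstSpatialInfo] = None


def extract_first_spatial_info(bs: Bitstream) -> Optional[FirstSpatialInfo]:
    """Return the first spatial information only when the extension type
    identifier signals that a multichannel object signal is present."""
    if bs.extension_type != EXT_TYPE_MULTICHANNEL_OBJECT:
        return None
    return bs.extension_payload


def generate_second_spatial_info(object_info: dict, mix_info: dict) -> dict:
    """Placeholder: bundles the two inputs. A real transcoder would derive
    rendering parameters from them."""
    return {"object_info": object_info, "mix_info": mix_info}


def transcode(
    bs: Bitstream, mix_info: dict, suppressed: bool
) -> Tuple[str, Union[FirstSpatialInfo, dict]]:
    """Transmit one kind of spatial information (assumed selection rule)."""
    first = extract_first_spatial_info(bs)
    if first is not None and not suppressed:
        return ("first", first)
    return ("second", generate_second_spatial_info(bs.object_info, mix_info))
```

The extraction step is gated solely by the extension type identifier, so a bitstream without a multichannel object never yields first spatial information in this sketch.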