KR20140128181A

KR20140128181A - Rendering for exception channel signal

Info

Publication number: KR20140128181A
Application number: KR1020130047054A
Authority: KR
Inventors: 오현오; 이태규; 송명석; 송정욱
Original assignee: 인텔렉추얼디스커버리 주식회사
Priority date: 2013-04-27
Filing date: 2013-04-27
Publication date: 2014-11-05
Also published as: KR102058619B1

Abstract

The present invention relates to a method and an apparatus for processing an object audio signal and, more specifically, to an audio signal processing method comprising the steps of: receiving a bitstream including a normal channel signal and an exceptional channel signal; decoding the exceptional channel signal and the normal channel signal from the received bitstream; generating correlation information by using the decoded exceptional channel signal and the decoded normal channel signal; generating a gain value by using at least one of a first downmix scheme, which applies the same downmix gain value, and a second downmix scheme, which applies a time-varying gain value by using the correlation information; and outputting the exceptional channel signal as a plurality of channel signals by using the gain value.

Description

Rendering method for an exception channel signal {Rendering for exception channel signal}

The present invention relates to a method and apparatus for processing an object audio signal and, more particularly, to a method and apparatus for encoding and decoding an object audio signal or rendering it in three-dimensional space.

3D audio collectively refers to a series of signal processing, transmission, encoding, and reproduction technologies that provide sound with a sense of presence in a literally three-dimensional space by adding another axis (dimension) in the height direction to the horizontal-plane sound scene (2D) offered by existing surround audio. In particular, providing 3D audio calls for rendering technology that forms a sound image at a virtual position where no speaker exists, whether more speakers than before or only a small number of speakers are used.

3D audio is expected to become the audio solution for upcoming ultra-high-definition TVs (UHDTV), and to find a wide range of applications, including sound in vehicles, which are evolving into high-quality infotainment spaces, as well as theater sound, personal 3DTV, tablets, smartphones, and cloud gaming.

3D audio first requires transmitting signals of more channels than before, up to 22.2 channels, and this calls for a suitable compression and transmission technology. Conventional high-quality codecs such as MP3, AAC, DTS, and AC3 were optimized mainly for transmitting fewer than 5.1 channels.

In addition, reproducing a 22.2-channel signal requires a listening space equipped with a 24-speaker system, and such infrastructure is unlikely to spread through the market in the short term. Several techniques are therefore needed: a technique for effectively reproducing a 22.2-channel signal in a space with fewer speakers; conversely, a technique for reproducing existing stereo or 5.1-channel sources in environments with more speakers, such as 10.1 or 22.2 channels; further, a technique for delivering the sound scene of the original source even outside the prescribed speaker positions and listening-room environment; and a technique for enjoying 3D sound in a headphone listening environment. These techniques are collectively referred to herein as rendering and, specifically, as downmix, upmix, flexible rendering, and binaural rendering, respectively.

Meanwhile, an object-based signal transmission scheme is needed as an alternative for transmitting such sound scenes effectively. Depending on the sound source, object-based transmission can be more advantageous than channel-based transmission. Moreover, object-based transmission enables interactive listening, for example by letting the user freely control the playback level and position of the objects. Accordingly, an effective transmission method capable of compressing object signals at a high transmission rate is needed.

In addition, sound sources in which channel-based signals and object-based signals are mixed may exist, and these may provide a new kind of listening experience. Accordingly, a technique is also needed for effectively transmitting channel signals and object signals together and rendering them effectively.

Finally, depending on the particular nature of a channel and the speaker environment at the playback end, exception channels may arise that are difficult to reproduce with conventional methods. In this case, a technique is needed for effectively reproducing the exception channel based on the speaker environment at the playback end.

According to an aspect of the present invention, there may be provided an audio signal processing method comprising: receiving a bitstream including a normal channel signal and an exception channel signal; decoding the exception channel signal and the normal channel signal from the received bitstream; generating correlation information using the decoded exception channel signal and the decoded normal channel signal; generating a gain value, using the correlation information, through at least one of a first downmix method that applies the same downmix gain value and a second downmix method that applies a time-varying gain value; and outputting the exception channel signal as a plurality of channel signals using the gain value.

According to the present invention, when a channel at an exceptional position or with an exceptional function is absent, it can be reproduced effectively according to the characteristics of the sound source. A representative example of such an exception channel is the TpC channel located directly above the head, which serves the unique function of producing effects such as a voice heard from the sky directly overhead, like the voice of God. Because this channel, unlike others, produces a special effect, it must be possible to reproduce it effectively using other channels when it is absent. The present invention can effectively compensate for the absence of such an exception channel. The effects of the present invention are not limited to those described above, and effects not mentioned will be clearly understood by those of ordinary skill in the art to which the invention pertains from this specification and the accompanying drawings.

FIG. 1 is a diagram for explaining viewing angles according to image size at the same viewing distance.
FIG. 2 is a speaker layout diagram of 22.2ch as an example of multi-channel audio.
FIG. 3 is a conceptual diagram for explaining the process of downmixing an exception signal.
FIG. 4 is a flowchart of the downmixer selection unit.
FIG. 5 is a conceptual diagram for explaining a simplified method in the matrix-based downmixer.
FIG. 6 is a conceptual diagram of the matrix-based downmixer.
FIG. 7 is a conceptual diagram of the path-based downmixer.
FIG. 8 is a conceptual diagram of the virtual channel generator.

The embodiments described in this specification are intended to clearly explain the idea of the present invention to those of ordinary skill in the art to which the invention pertains. The invention is therefore not limited to the embodiments described herein, and its scope should be construed to include modifications or variations that do not depart from the spirit of the invention. The terms used in this specification and the accompanying drawings are intended to facilitate description of the invention, and the shapes shown in the drawings are exaggerated as needed to aid understanding, so the invention is not limited by the terms used herein or by the accompanying drawings. Where a detailed description of known configurations or functions related to the invention is judged to obscure the gist of the invention, it is omitted as needed. In the present invention, the following terms may be interpreted according to the following criteria, and terms not listed may be interpreted in the same spirit. Coding may be interpreted as encoding or decoding depending on context, and information is a term encompassing values, parameters, coefficients, elements, and the like, whose meaning may be interpreted differently depending on context; however, the invention is not limited thereto.

According to an aspect of the present invention, there may be provided an audio signal processing method comprising: receiving a bitstream including a normal channel signal and an exception channel signal; decoding the exception channel signal and the normal channel signal from the received bitstream; generating correlation information using the decoded exception channel signal and the decoded normal channel signal; generating a gain value, using the correlation information, through at least one of a first downmix method that applies the same downmix gain value and a second downmix method that applies a time-varying gain value; and outputting the exception channel signal as a plurality of channel signals using the gain value.

In addition, there may be provided an audio signal processing method in which the first downmix method applies the same downmix gain value to a plurality of channels.

In addition, there may be provided an audio signal processing method in which the first downmix method compensates the gain value and delay information using speaker position information.

In addition, there may be provided an audio signal processing method in which the first downmix method distributes the same gain value over equally divided spaces.

In addition, there may be provided an audio signal processing method in which the second downmix method estimates the movement path of the sound image based on the correlation information and varies the downmix gain value over time.

Hereinafter, a method and apparatus for processing an object audio signal according to an embodiment of the present invention will be described.

FIG. 1 is a diagram for explaining viewing angles according to image size (e.g., UHDTV and HDTV) at the same viewing distance. As display manufacturing technology advances, image sizes are growing in response to consumer demand. As shown in FIG. 1, a UHDTV image (7680*4320 pixels, 110) is about 16 times larger than an HDTV image (1920*1080 pixels, 120). When an HDTV is installed on a living-room wall and the viewer sits on the sofa at a given viewing distance, the viewing angle may be about 30 degrees. With a UHDTV installed at the same viewing distance, the viewing angle reaches about 100 degrees. When such a large, high-quality, high-resolution screen is installed, it may be desirable to provide sound with a correspondingly high sense of realism and presence. One or two surround channel speakers alone may not be enough to give viewers nearly the same environment as being on site. A multi-channel audio environment with more speakers and channels may therefore be required.
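The jump from about 30 degrees to about 100 degrees can be checked with simple geometry. The sketch below is only an illustration: it assumes a flat screen viewed on-axis and assumes that the UHDTV image is four times as wide as the HDTV image (7680/1920, i.e. equal pixel pitch), neither of which is stated explicitly in the text. The function name is hypothetical.

```python
import math

def scaled_angle_deg(base_angle_deg, width_ratio):
    """Viewing angle after scaling the screen width at a fixed viewing distance.

    For a flat screen seen on-axis, angle = 2 * atan(width / (2 * distance)),
    so scaling the width scales tan(angle / 2) by the same ratio.
    """
    half_tan = math.tan(math.radians(base_angle_deg) / 2)
    return math.degrees(2 * math.atan(width_ratio * half_tan))

# Assumption: HDTV subtends 30 degrees; UHDTV is 7680 / 1920 = 4 times wider.
hd_angle = 30.0
uhd_angle = scaled_angle_deg(hd_angle, 7680 / 1920)
print(round(uhd_angle, 1))
```

Under these assumptions the result is roughly 94 degrees, which is in line with the "about 100 degrees" quoted above.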

As described above, besides the home theater environment, applications may include personal 3D TV, smartphone TV, 22.2-channel audio programs, automobiles, 3D video, telepresence rooms, and cloud-based gaming.

FIG. 2 shows a 22.2ch speaker layout as an example of multi-channel audio. 22.2ch is one example of a multi-channel environment for enhancing the sound field, and the present invention is not limited to a specific number of channels or a specific speaker layout. Referring to FIG. 2, a total of nine channels may be provided in the top layer (210): three speakers at the front, three at the middle position, and three at the surround position. In the middle layer (220), five speakers may be placed at the front, two at the middle position, and three at the surround position. Of the five front speakers, the three at the center positions may be contained within the TV screen. In the bottom layer (230), a total of three channels and two LFE channels (240) may be installed at the front.

Transmitting and reproducing multi-channel signals of up to dozens of channels in this way may require a large amount of computation. A high compression ratio may also be required in view of the communication environment. Moreover, few ordinary households have a multi-channel (e.g., 22.2ch) speaker environment, and many listeners have a 2ch or 5.1ch setup. If the signal sent in common to all users encodes each of the multiple channels separately, then converting those channels back to 2ch or 5.1ch for playback is inefficient in terms of communication and also requires storing 22.2ch PCM signals, which can make memory management inefficient.

(Need for flexible rendering)

Among the technologies required for 3D audio, flexible rendering is one of the key problems to solve in order to maximize 3D audio quality. It is well known that the positions of 5.1-channel speakers are highly irregular, depending on the structure of the living room and the furniture layout. Even with speakers at such irregular positions, the sound scene intended by the content creator should be delivered. This requires knowing the speaker environment of each user's particular playback environment, together with a rendering technique that corrects for the deviation from the standard positions. In other words, the codec's role does not end with decoding the transmitted bitstream according to the decoding method; a series of techniques is also needed for optimizing and transforming the result to suit the user's playback environment.

(Flexible rendering)

Flexible rendering can be implemented relatively conveniently for object signals transmitted per object by using amplitude panning, which determines the direction of a sound source between two speakers based on signal level, or VBAP (Vector-Based Amplitude Panning), which is widely used to determine the direction of a sound source using three speakers in three-dimensional space. This is one of the advantages of transmitting object signals instead of channels.
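As a rough illustration of the panning idea mentioned above, the following sketch computes gains for a source between two speakers in the horizontal plane, using the vector formulation on which VBAP is based (the source direction is expressed as a weighted sum of the two speaker direction vectors). The function name, the angle convention, and the constant-power normalization are assumptions made for this example and are not taken from the specification; three-speaker VBAP in 3D follows the same pattern with a 3x3 system.

```python
import math

def pan_pair_2d(source_deg, spk_a_deg, spk_b_deg):
    """Return constant-power gains (g_a, g_b) for a source between two speakers.

    Angles are azimuths in degrees in the horizontal plane (hypothetical convention).
    """
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    px, py = unit(source_deg)
    ax, ay = unit(spk_a_deg)
    bx, by = unit(spk_b_deg)

    # Solve p = g_a * a + g_b * b for the two gains (Cramer's rule on a 2x2 system).
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    g_a = (px * by - py * bx) / det
    g_b = (ax * py - ay * px) / det

    # Normalize so that g_a**2 + g_b**2 == 1 (constant total power).
    # A negative gain would indicate a source outside the arc between the speakers.
    norm = math.hypot(g_a, g_b)
    return g_a / norm, g_b / norm

center = pan_pair_2d(0.0, 30.0, -30.0)   # source midway between the speakers
at_a = pan_pair_2d(30.0, 30.0, -30.0)    # source exactly at speaker A
```

For a source midway between the speakers both gains are equal; for a source at one speaker, all of the signal goes to that speaker.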

(Voice of God)

In a multi-channel audio system, the TpC (Top of center) channel, the speaker above the listener's head, is often called the "Voice of God". It earns this name because the most dramatic situation it can produce is the voice of God heard from the sky. The channel enables many other effects as well, for example an object falling from directly overhead, fireworks going off directly overhead, or a person shouting from the roof of a very tall building. It is likewise an essential channel in many scenes, such as an airplane passing from the front over the viewer's head and disappearing to the rear. In other words, using the TpC channel can give the user a realistic sound field in many dramatic situations that existing audio systems could not provide. For an exception channel such as TpC, when no speaker exists at the corresponding position, compensating for it in the same way as conventional flexible rendering is not effective, and little benefit can be expected. A method is therefore needed for effectively reproducing such an exception channel through a small number of output channels when it is absent.

When multi-channel content is reproduced through fewer output channels, it has so far been common to implement this with an M-N downmix matrix (M being the number of input channels and N the number of output channels). That is, when 5.1-channel content is reproduced in stereo, the downmix is performed by a given formula. Such downmix implementations generally work by applying relative downmix gains to spatially nearby speakers and summing. For example, TpFc of the top layer can be downmixed into Fc (or FRc, FLc) of the middle layer. That is, a virtual TpFc is generated using these speakers (Fc, FRc, FLc), so that sound corresponding to the position of the absent speaker (TpFc) can be reproduced. For the TpC speaker, however, front, back, left, and right are ambiguous relative to the listener, which makes it difficult to decide which middle-layer speakers are spatially close to it. In addition, when the signal assigned to the TpC speaker is downmix-rendered in an irregular speaker layout, it can be effective to vary the form of the downmix matrix flexibly in conjunction with flexible rendering.
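To make the M-N downmix matrix concrete, the sketch below applies a 2x6 matrix to one frame of 5.1 samples to obtain stereo. The channel order and the coefficients (center and surround channels weighted by about -3 dB, LFE dropped) follow a commonly used convention and are assumptions for illustration only; the specification does not give the formula it refers to.

```python
import math

# Assumed input channel order for this example (M = 6).
IN_CHANNELS = ["L", "R", "C", "LFE", "Ls", "Rs"]

A = 1 / math.sqrt(2)  # about -3 dB

# N x M downmix matrix (N = 2 outputs); each row holds the gains of one output channel.
DOWNMIX_5_1_TO_STEREO = [
    # L    R    C  LFE   Ls   Rs
    [1.0, 0.0,  A, 0.0,   A, 0.0],  # left output
    [0.0, 1.0,  A, 0.0, 0.0,   A],  # right output
]

def downmix(frame, matrix=DOWNMIX_5_1_TO_STEREO):
    """Map one frame of M input samples to N output samples."""
    if any(len(row) != len(frame) for row in matrix):
        raise ValueError("matrix width must match the number of input channels")
    return [sum(g * x for g, x in zip(row, frame)) for row in matrix]

# A signal present only in the center channel appears equally in both outputs.
center_only = downmix([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
```

The same structure extends to any M and N; the difficulty described above for TpC is that there is no obvious choice of which columns of the matrix should receive its gain.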

As one solution, if the sound source reproduced through TpC really is an object corresponding to the "voice of God", that is, an object reproduced only in TpC or centered on TpC, it is desirable to downmix it accordingly. However, if it is part of an object reproduced across the whole upper layer, or a moment such as an airplane passing across the sky from the TpFL position through TpC to TpBR, it is again desirable to apply a downmix method specialized for that case. Furthermore, unlike the two situations above, when only a small, limited number of speakers can be used because of the speaker positions, a rendering method for placing the sound source at various angles must be considered. There are cues by which humans perceive the height of a sound source (elevation spectral cues); as one example, depending on the height of the source, the outer shape of the human pinna produces notches and peaks in the high-frequency band. Artificially inserting such height cues can therefore effectively recreate the sound scene of the TpC channel.

(Overall VOG block diagram)

FIG. 3 is a conceptual diagram showing the process of downmixing an exception channel signal. The exception channel signal may be downmixed after analyzing a specific value in the transmitted bitstream or the characteristics of the signal. One example of an exception channel signal is the TpC signal located above the listener's head. First, for an ambient signal that is stationary above the head or has ambiguous directionality, it is reasonable to apply the same downmix gain to multiple channels. In this case the TpC signal can be downmixed using a conventional matrix-based downmixer (310). Second, for a TpC signal in a sound scene with motion, using the matrix-based downmixer (310) makes the dynamic sound scene intended by the content provider more static. To prevent this, the channel signals can be analyzed and a downmix with variable gain values performed. This is called the path-based downmixer (320). Finally, when the desired effect cannot be sufficiently obtained with nearby speakers alone, spectral cues by which humans perceive height can be applied to the output signals of a specific N speakers. This is called the virtual channel generator (330). The downmixer selection unit (340) decides which downmix method to use, either from input bitstream information or by analyzing the input channel signals. Depending on the selected downmix method, the output is determined as L, M, or N channel signals.

(Downmix determination unit)

FIG. 4 is a flowchart of the downmixer selection unit (340). First, the input bitstream is parsed to check whether a mode has been set by the content provider. If a mode is set, the downmix is performed using the parameters configured for that mode. If no mode is set, the current user's speaker layout is analyzed. This is because, as noted above, when the speaker layout is highly irregular, a downmix that merely adjusts the gains of nearby channels cannot adequately reproduce the sound scene intended by the content provider. To overcome this, the various cues by which humans perceive a highly elevated sound image must be used.

As one embodiment of analyzing the speaker layout, the analysis can use the sum of the distances between the position vectors of the upper-layer speakers in FIG. 2 and the position vectors of the upper-layer speakers at the playback end. Let Vi be the position vector of the i-th upper-layer speaker in FIG. 2, and Vi' the position vector of the i-th speaker at the playback end. If wi is a weight reflecting the positional importance of each speaker, the speaker position error Espk can be defined by Equation 1.
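Equation 1 itself is not reproduced in this text. One form consistent with the definitions above (a weighted sum of distances between reference and actual speaker positions) would be Espk = sum over i of wi * |Vi - Vi'|. The sketch below computes that quantity; the use of the Euclidean distance, the default weights of 1, and the function name are assumptions, and the original equation may differ (for example by using squared distances).

```python
import math

def speaker_position_error(reference, actual, weights=None):
    """Weighted sum of distances between reference and actual speaker positions.

    reference, actual: sequences of (x, y, z) position vectors, in matching order.
    weights: per-speaker importance weights wi (assumed to default to 1.0).
    """
    if len(reference) != len(actual):
        raise ValueError("reference and actual layouts must have the same length")
    if weights is None:
        weights = [1.0] * len(reference)
    if len(weights) != len(reference):
        raise ValueError("one weight per speaker is required")
    return sum(w * math.dist(v, v_actual)
               for w, v, v_actual in zip(weights, reference, actual))

# Hypothetical two-speaker layout; the second speaker is displaced by (3, 4, 0).
ref = [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
act = [(1.0, 0.0, 1.0), (3.0, 5.0, 1.0)]
e_spk = speaker_position_error(ref, act, weights=[1.0, 2.0])
```

A layout that matches the reference exactly gives an error of zero, and the error grows as speakers move away from their standard positions, weighted by how much each position matters.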

When the user's speaker layout is highly irregular, the speaker position error Espk takes a large value. Therefore, when Espk is at or above (or exceeds) a certain threshold, the virtual channel generator is selected. When the speaker position error is below (or at or below) the threshold, the matrix-based downmixer or the path-based downmixer is used. When the source to be downmixed is a channel signal, the downmix method may be selected according to the estimated width of the sound image of the channel signal. This is because, as discussed later, human localization blur in the median plane is much larger than in the horizontal plane, so precise sound image localization is unnecessary when the apparent source width is wide. One way to measure the width of the sound image across several channels is to use the interaural cross-correlation of the signals at the two ears. This requires very complex computation, however. If the cross-correlation between channels is assumed to be proportional to the interaural cross-correlation, the width of the sound image can be estimated with relatively little computation from the total sum of the cross-correlations between the TpC channel signal and each channel. When the total sum C of the cross-correlations between the TpC channel signal and the surrounding channel signals is at or above (or exceeds) a certain threshold, the sound image is wider than the reference and the matrix-based downmixer is used; otherwise the sound image is narrower than the reference and the more precise path-based downmixer is used.
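The selection logic described above can be sketched as follows. This is a minimal illustration under several assumptions that are not in the specification: the cross-correlation is taken as the zero-lag normalized correlation, both comparisons use "greater than or equal to", and the threshold values and all names are hypothetical.

```python
import math
from enum import Enum

class Downmixer(Enum):
    MATRIX = "matrix-based downmixer (310)"
    PATH = "path-based downmixer (320)"
    VIRTUAL = "virtual channel generator (330)"

def normalized_correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length sample lists."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0

def select_downmixer(e_spk, tpc, neighbors, e_threshold, c_threshold, preset_mode=None):
    """Choose a downmix method for the TpC signal.

    preset_mode: mode signalled by the content provider in the bitstream, if any.
    e_spk: speaker position error; tpc: TpC samples; neighbors: list of sample lists.
    """
    if preset_mode is not None:
        return preset_mode                      # a provider-set mode takes priority
    if e_spk >= e_threshold:
        return Downmixer.VIRTUAL                # layout too irregular for gain-only downmix
    # Sum of correlations C between TpC and the surrounding channels.
    c = sum(abs(normalized_correlation(tpc, ch)) for ch in neighbors)
    # Large C: wide sound image, equal gains suffice; small C: narrow image, track its path.
    return Downmixer.MATRIX if c >= c_threshold else Downmixer.PATH

tpc = [1.0, -1.0, 1.0, -1.0]
wide = select_downmixer(0.1, tpc, [list(tpc), list(tpc)], e_threshold=1.0, c_threshold=1.5)
narrow = select_downmixer(0.1, tpc, [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, -1.0, -1.0]],
                          e_threshold=1.0, c_threshold=1.5)
irregular = select_downmixer(5.0, tpc, [list(tpc)], e_threshold=1.0, c_threshold=1.5)
```

In practice the thresholds would need to be tuned, and the correlation would typically be computed per frame so that the choice can follow the content over time.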

(Static sound source downmixer / matrix-based downmixer)

According to various psychoacoustic experiments, sound image localization in the median plane behaves very differently from localization in the horizontal plane. The inaccuracy of sound image localization is quantified by the localization blur, which expresses as an angle the range within which the position of a sound image cannot be distinguished at a given location. According to these experiments, a speech signal has a localization blur of 9 to 17 degrees in the median plane. Considering that the corresponding value in the horizontal plane is 0.9 to 1.5 degrees, sound image localization in the median plane has very low accuracy. Because humans perceive the position of a sound image at high elevation with low accuracy, a matrix-based downmix is more effective than a precise localization method. Therefore, for a sound image whose position does not change much, the absent TpC channel can be effectively upmixed to a plurality of channels by distributing equal gain values to the top channels whose speakers are symmetrically distributed.

Assuming that the channel environment at the playback end matches the configuration of FIG. 2 in the top layer, except for the TpC channel, the channel gain values distributed to the top layer are all equal. However, it is well known that a regular channel environment such as that of FIG. 2 is difficult to achieve at the playback end. In an irregular channel environment, distributing a constant gain value to all of the channels mentioned above may cause the angle between the perceived sound image and the position intended by the content to exceed the localization blur. The user then perceives the sound image at the wrong position. To prevent this, an irregular channel environment must be compensated for.

Sound from a channel located in the top layer can be assumed to arrive at the listener's position as a plane wave, so the conventional downmix method of setting a constant gain value can be described as using the surrounding channels to reproduce the plane wave generated by the TpC channel. This is equivalent to the centroid of the polygon whose vertices are the speaker positions, on the plane containing the top layer, coinciding with the position of the TpC channel. Therefore, in an irregular channel environment, the gain value of each channel can be obtained from the equation stating that the centroid of the two-dimensional position vectors of the channels on the plane containing the top layer, weighted by the gain values, equals the position vector of the TpC channel.
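
The centroid condition stated in this paragraph can be written out as an equation. The notation is introduced here for illustration and is not taken from the original: M is the number of top-layer speakers, g_i is the gain of speaker i, p_i is its two-dimensional position vector on the plane containing the top layer, and p_TpC is the position vector of the TpC channel.

```latex
\frac{\sum_{i=1}^{M} g_i \,\mathbf{p}_i}{\sum_{i=1}^{M} g_i} = \mathbf{p}_{\mathrm{TpC}}
```

This condition fixes the gains only up to a common scale factor, and with more than three speakers it leaves further freedom, so an additional constraint (for example a power normalization) would be needed; the text does not state one.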

However, this formula-based approach requires a large amount of computation, and its performance differs little from that of the simplified method described here. The simplified method is as follows. First, the space around the TpC channel is divided into N regions of equal angle. The same gain value is assigned to each region, and if two or more speakers are located within one region, their gains are set so that the sum of the squares of the gains equals the square of the region's gain value, which preserves the power assigned to the region. As an example, assume a speaker layout as in FIG. 5, consisting of speakers 510 located on the plane containing the top layer, a TpC channel speaker 520, and a speaker 530 located outside the plane containing the top layer. When the space is divided into four regions of 90 degrees each around the TpC channel 520, each region is given a gain value of equal magnitude such that the sum of the squares is 1. Since there are four regions, the gain value of each region is 0.5. When two or more speakers are in one region, their gain values are likewise set so that the sum of their squares equals the square of the region's gain value. Accordingly, the gain value of each of the two speaker outputs in the lower-right region 540 is 0.3536 (that is, 0.5/√2). Finally, for the speaker 530 located outside the plane containing the top layer, the gain value obtained when the speaker is projected onto that plane is computed first, and the difference in distance between the plane and the speaker is then compensated using a gain value and a delay.
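
The simplified region-based method can be sketched as follows. This is an illustrative sketch, not the implementation of the invention: the function name `region_gains`, the use of azimuth angles measured around the TpC position, and the choice of region boundaries starting at 0 degrees are assumptions. Power is split equally among the speakers that share a region, which reproduces the 0.5 and 0.3536 values of the example. Regions that contain no speaker are not handled specially here, since the text does not say how to treat them.

```python
import math


def region_gains(azimuths_deg, n_regions=4):
    """Return one gain per speaker for the equal-angle region method.

    azimuths_deg: speaker azimuths (degrees) around the TpC position.
    n_regions:    number N of equal-angle regions.
    """
    # Each region gets the same gain; the squares over all regions sum to 1.
    region_gain = 1.0 / math.sqrt(n_regions)
    width = 360.0 / n_regions

    # Map each speaker to the index of the region that contains it.
    regions = [
        min(int((az % 360.0) // width), n_regions - 1) for az in azimuths_deg
    ]

    # Count how many speakers share each region.
    counts = {}
    for r in regions:
        counts[r] = counts.get(r, 0) + 1

    # Speakers in one region share its power equally.
    return [region_gain / math.sqrt(counts[r]) for r in regions]
```

With five hypothetical speakers at azimuths 45, 135, 225, 300 and 330 degrees, the first three each receive a gain of 0.5 and the two that share the last region each receive 0.5/√2.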

FIG. 6 is a conceptual diagram of the matrix-based downmixer 310. First, the parser 610 separates the input bitstream into the mode bit provided by the content provider and the channel signals. When the mode bit is set, the speaker determination unit 620 selects the corresponding speaker group; when the mode bit is not set, it uses the position information of the speakers currently used by the user to select the speaker group at the shortest distance. The gain and delay compensation unit 630 compensates the gain and delay of each speaker to account for the difference in distance between the selected speaker group and the user's actual speaker layout. Finally, the downmix matrix generation unit 640 applies the gains and delays output by the gain and delay compensation unit 630 and downmixes the channel output by the parser to the other channels.
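
The gain and delay compensation step can be illustrated with a short sketch. The text does not give the compensation formulas, so everything here is an assumption: the function name `distance_compensation`, the inverse-distance (1/r) amplitude law, the speed of sound of 343 m/s, and the convention of shifting all delays so that the smallest one is zero.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def distance_compensation(nominal_m, actual_m):
    """Return (gain, delay_seconds) per speaker.

    nominal_m: listener distances of the speakers in the selected group.
    actual_m:  listener distances of the user's actual speakers.
    """
    if any(d <= 0.0 for d in list(nominal_m) + list(actual_m)):
        raise ValueError("distances must be positive")

    # A speaker closer than nominal is louder, so it is attenuated (1/r law).
    gains = [a / n for n, a in zip(nominal_m, actual_m)]

    # A speaker closer than nominal arrives early, so it is delayed.
    raw = [(n - a) / SPEED_OF_SOUND_M_S for n, a in zip(nominal_m, actual_m)]

    # Shift so that every delay is non-negative and the smallest is zero.
    offset = min(raw)
    delays = [d - offset for d in raw]
    return list(zip(gains, delays))
```

For instance, if both nominal speakers are at 2 m and one actual speaker sits at 1 m, that speaker is attenuated to half amplitude and delayed by about 2.9 ms.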

(Dynamic sound source downmixer / path-based downmixer)

FIG. 7 is a conceptual diagram of the dynamic sound source downmixer 320. First, the parser 710 parses the input bitstream and passes the exception channel signal and a plurality of neighboring channel signals to the path estimation unit 720. The path estimation unit 720 estimates the correlation between the channel signals and takes the change over time of the highly correlated channels as the estimated path. The speaker selection unit 730 uses the path estimated by the path estimation unit 720 to select the speakers that lie within a certain distance of that path. The position information of the selected speakers is passed to the downmixer 740, which downmixes the signal to those speakers. One example of such a downmix method is vector base amplitude panning (VBAP).
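
Since VBAP is named as an example downmix method, the following sketch shows the standard two-dimensional pairwise form of VBAP for a single source direction between two speakers. It is only an illustration: the function name `vbap_pair_gains` and the use of azimuth angles in degrees are assumptions, and the text does not say how the estimated path is turned into a source direction.

```python
import math


def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Gains for two speakers that place a source at source_deg (2D VBAP)."""
    # Unit vectors of the source direction and the two speaker directions.
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    l1x, l1y = math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg))
    l2x, l2y = math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg))

    # Solve p = g1 * l1 + g2 * l2 with Cramer's rule.
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det

    # Normalize so that g1^2 + g2^2 = 1 (constant power).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

Evaluating the function repeatedly as the source direction moves along the estimated path would give a time-varying gain pair, which matches the role of the second downmix method in the claims.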

(Virtual channel generator)

FIG. 8 is a conceptual diagram of the virtual channel generator 330. The input bitstream is parsed by the parser 810 into the exception channel signal. For the delivered exception channel signal, the parameter extraction unit 820 extracts parameters using either a built-in generalized head-related transfer function or a provided personalized head-related transfer function. Examples of the parameters include the frequency and magnitude of a notch or peak in a particular spectrum, or the interaural level difference and interaural phase difference at a particular frequency. The virtual channel-based downmixer 830 performs the downmix based on the parameters passed to it. Examples of such a downmix include filtering with the head-related transfer function, or complex panning, in which the full frequency range is divided into bands and panning is performed per band.
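
Two of the parameters mentioned here, the interaural level difference and the interaural phase difference at one frequency, can be computed from a pair of complex head-related transfer function values as sketched here. The function name `interaural_cues`, the left-minus-right sign convention, and the assumption that the transfer function is available as one complex value per ear at the frequency of interest are all illustrative choices, not details from the original.

```python
import cmath
import math


def interaural_cues(h_left, h_right):
    """Return (level difference in dB, phase difference in radians).

    h_left, h_right: complex HRTF values for the two ears at one frequency.
    Both differences are taken as left minus right (assumed convention).
    """
    h_left, h_right = complex(h_left), complex(h_right)
    if h_left == 0 or h_right == 0:
        raise ValueError("HRTF values must be non-zero")

    # Level difference from the magnitude ratio.
    ild_db = 20.0 * math.log10(abs(h_left) / abs(h_right))

    # Phase difference from the angle of H_left * conj(H_right).
    ipd_rad = cmath.phase(h_left * h_right.conjugate())
    return ild_db, ipd_rad
```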

The audio signal processing method according to the present invention may be produced as a program to be executed on a computer and stored on a computer-readable recording medium, and multimedia data having the data structure according to the present invention may also be stored on a computer-readable recording medium. The computer-readable recording medium includes every kind of storage device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of carrier waves (for example, transmission over the Internet). In addition, a bitstream generated by the encoding method may be stored on a computer-readable recording medium or transmitted over a wired or wireless communication network.

As described above, although the present invention has been described with reference to limited embodiments and drawings, it is not limited to them, and those of ordinary skill in the art to which the invention pertains may make various modifications and variations within the technical idea of the invention and the scope of equivalents of the claims set forth below.

210: Upper layer
220: Middle layer
230: Bottom layer
240: LFE channel
310: Matrix-based downmixer
320: Path-based downmixer
330: Virtual channel generator
340: Downmixer selection unit
510: Speaker located on the plane containing the top layer
520: TpC channel
530: Speaker located outside the plane containing the top layer
610: Parser
620: Speaker determination unit
630: Gain and delay compensation unit
640: Downmix matrix generation unit
710: Parser
720: Path estimation unit
730: Speaker selection unit
740: Downmixer
810: Parser
820: Parameter extraction unit
830: Virtual channel-based downmixer

Claims

An audio signal processing method, comprising:
receiving a bitstream including a normal channel signal and an exception channel signal;
decoding the exception channel signal and the normal channel signal from the received bitstream;
generating correlation information using the decoded exception channel signal and the decoded normal channel signal;
generating a gain value, using the correlation information, through at least one of a first downmix method that applies the same downmix gain value and a second downmix method that applies a time-varying gain value; and
outputting the exception channel signal as a plurality of channel signals using the gain value.

The method of claim 1,
wherein the first downmix method applies the same downmix gain value to a plurality of channels.

The method of claim 2,
wherein the first downmix method compensates the gain value and delay information using position information of the speakers.

The method of claim 2,
wherein the first downmix method distributes the same gain value to equally divided regions of space.

The method of claim 1,
wherein the second downmix method estimates a movement path of the sound image based on the correlation information and varies the downmix gain value over time.