KR20130109615A

KR20130109615A - Virtual sound producing method and apparatus for the same

Info

Publication number: KR20130109615A
Application number: KR1020120031504A
Authority: KR
Inventors: 최병권; 노경식
Original assignee: 삼성전자주식회사
Priority date: 2012-03-28
Filing date: 2012-03-28
Publication date: 2013-10-08
Also published as: KR101901593B1

Abstract

PURPOSE: A method and an apparatus for generating virtual stereoscopic sound are provided to reduce errors in the reality and the directionality perceived by each user. CONSTITUTION: A receiving part (210) receives multi-channel voice signals acquired from a plurality of microphones including a reference microphone. A sound source direction estimating part (240) estimates the direction of the sound source based on the received multi-channel voice signals. A virtual sound source generating part (250) generates a plurality of virtual sound sources based on at least one voice signal selected from the multi-channel voice signals and a plurality of head related transfer function (HRTF) values determined according to the direction of the sound source. A virtual stereoscopic sound generating part (260) generates the virtual stereoscopic sound by averaging the plurality of virtual sound sources. [Reference numerals] (200) Voice output device; (210) Receiving part; (220) Decoder; (230) Sync correction part; (240) Sound source direction estimating part; (250) Virtual sound source generating part; (260) Virtual stereoscopic sound generating part; (270) Output part; (280) Storage part

Description

Virtual sound producing method and apparatus for the same

The present invention relates to a method and apparatus for generating virtual stereo sound, and more particularly to a method and apparatus that reduce errors in the realism and directionality of HRTF-based stereo sound, so that a user who operates a remote robot through a control device can manipulate the robot more effectively.

Three-dimensional virtual stereo sound places a sound source at a specific position in a virtual space, through headphones or speakers placed at specific positions, so that the sound the user hears conveys a sense of direction, distance, and space, as if it were actually coming from the position of the virtual sound source. Because the human eye can only view a limited area at a time, a person moves their gaze in response to the sounds they hear. A virtual reality system that has images but no sound forms a virtual space only in the limited area in front of the user that the eyes can see, whereas a virtual reality system with three-dimensional sound effects has the significant advantage of turning the user's entire surroundings into a virtual space.

Implementing stereo sound requires sound image localization, a technique that places a sound source at a specific position in space. The effect of making a sound seem to come from a specific position in three-dimensional space is called the positional sound effect. In two-channel playback, the positional sound effect is produced by convolution with a head-related transfer function (hereinafter, HRTF).
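As an illustration of the convolution step described above, the following is a minimal sketch, not the patent's implementation. It assumes the HRTF is available as a pair of time-domain impulse responses (one per ear); the names `convolve`, `binauralize`, `hrir_left`, and `hrir_right`, and the sample values, are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two non-empty finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Return the (left, right) ear signals for one source position."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Made-up impulse responses: the right ear receives the sound earlier and
# louder than the left ear, as it would for a source on the listener's right.
hrir_right = [1.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.5]
left, right = binauralize([1.0, 2.0], hrir_left, hrir_right)
```

The left-ear output is a delayed and attenuated copy of the input, which is the kind of interaural difference that lets a listener localize the source.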

A listener perceives the position of a sound source from the difference between the two signals arriving at the two ears. Because the HRTF captures this characteristic, applying an HRTF to a simple sound yields a stereo sound with added spatial information.

HRTFs can be obtained experimentally in the form of a database. Specifically, in an anechoic chamber, speakers arranged at various angles on a sphere centered on a dummy head emit an impulse signal such as white noise, and the impulse responses measured by microphones mounted in both ears of the dummy head are Fourier transformed to obtain an HRTF DB.
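The measurement procedure above can be sketched as follows. This is an illustrative example under stated assumptions: the impulse responses are made-up four-sample sequences, the database is a plain dictionary keyed by direction angle, and `dft` and `build_hrtf_db` are hypothetical helper names. A practical system would use an FFT library rather than this O(n^2) transform.

```python
import cmath


def dft(samples):
    """Discrete Fourier transform of a finite sequence (O(n^2), for clarity)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def build_hrtf_db(measurements):
    """Map each direction angle to the (left, right) spectra of its impulse responses."""
    return {
        angle: (dft(hrir_left), dft(hrir_right))
        for angle, (hrir_left, hrir_right) in measurements.items()
    }


# Made-up measurement for a single direction (+45 degrees):
# the left ear hears a delayed, attenuated impulse; the right ear a direct one.
measurements = {45: ([0.0, 0.5, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])}
hrtf_db = build_hrtf_db(measurements)
left_spectrum, right_spectrum = hrtf_db[45]
```

In this toy case the magnitude of each spectrum is flat (0.5 for the left ear, 1.0 for the right), and the one-sample delay of the left ear shows up only in the phase.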

Because an HRTF DB obtained experimentally in this way is a non-personalized database, two-channel virtual stereo sound generated from it suffers from large errors in the realism and directionality perceived by each user.

The problem to be solved by the present invention is to provide a virtual stereo sound generating method and apparatus that can reduce errors in realism and directionality when generating HRTF-based stereo sound.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned here will be clearly understood by those skilled in the art from the following description.

To solve the above problem, a virtual stereo sound generating method according to an embodiment of the present invention includes: receiving multi-channel voice signals acquired from a plurality of microphones including a reference microphone; estimating the direction of a sound source based on the received multi-channel voice signals; generating a plurality of virtual sound sources based on at least one voice signal selected from the multi-channel voice signals and a plurality of HRTF values determined according to the direction of the sound source; and generating virtual stereo sound by averaging the plurality of virtual sound sources.

Generating the plurality of virtual sound sources may include: generating a first virtual sound source by synthesizing the voice signal of the reference microphone with a reference HRTF value, which is the HRTF value corresponding to the direction of the sound source; generating a second virtual sound source by synthesizing the voice signal of a microphone located to the left of the reference microphone with an HRTF value one step smaller than the reference HRTF value; and generating a third virtual sound source by synthesizing the voice signal of a microphone located to the right of the reference microphone with an HRTF value one step larger than the reference HRTF value.

Generating the plurality of virtual sound sources may include: selecting at least two voice signals from the multi-channel voice signals; and generating a plurality of virtual sound sources by synthesizing each of the selected voice signals with a reference HRTF value, which is the HRTF value corresponding to the direction of the sound source, with an HRTF value one step smaller than the reference HRTF value, and with an HRTF value one step larger than the reference HRTF value.

The selected voice signals include the voice signal of the reference microphone.

Generating the plurality of virtual sound sources may include: selecting the voice signal of the reference microphone from the multi-channel voice signals; and generating a first virtual sound source, a second virtual sound source, and a third virtual sound source by synthesizing the voice signal of the reference microphone with, respectively, a reference HRTF value, which is the HRTF value corresponding to the direction of the sound source, an HRTF value one step smaller than the reference HRTF value, and an HRTF value one step larger than the reference HRTF value.

Generating the virtual stereo sound includes: obtaining the average of the left channel signals of the plurality of virtual sound sources; and obtaining the average of the right channel signals of the plurality of virtual sound sources.
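The per-channel averaging can be sketched as below. This is an illustrative example that assumes each virtual sound source is a (left, right) pair of equal-length sample lists; the function names and sample values are hypothetical.

```python
def average_channel(channels):
    """Sample-by-sample mean of several equal-length signals."""
    count = len(channels)
    return [sum(samples) / count for samples in zip(*channels)]


def mix_virtual_sources(sources):
    """Average the left channels and the right channels separately."""
    left = average_channel([left for left, _ in sources])
    right = average_channel([right for _, right in sources])
    return left, right


# Three made-up virtual sound sources, each a (left, right) pair.
sources = [
    ([1.0, 2.0], [0.0, 0.0]),
    ([3.0, 4.0], [2.0, 2.0]),
    ([5.0, 6.0], [4.0, 4.0]),
]
stereo_left, stereo_right = mix_virtual_sources(sources)
```

Keeping the two channels separate during averaging preserves the interaural differences that each virtual sound source carries.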

In addition, to solve the above problem, a virtual stereo sound generating apparatus according to an embodiment of the present invention includes: a receiver that receives multi-channel voice signals acquired from a plurality of microphones including a reference microphone; a sound source direction estimator that estimates the direction of a sound source based on the received multi-channel voice signals; a virtual sound source generator that generates a plurality of virtual sound sources based on at least one voice signal selected from the multi-channel voice signals and a plurality of HRTF values determined according to the direction of the sound source; and a virtual stereo sound generator that generates virtual stereo sound by averaging the plurality of virtual sound sources.

The virtual sound source generator may generate a first virtual sound source by synthesizing the voice signal of the reference microphone with a reference HRTF value, which is the HRTF value corresponding to the direction of the sound source; generate a second virtual sound source by synthesizing the voice signal of a microphone located to the left of the reference microphone with an HRTF value one step smaller than the reference HRTF value; and generate a third virtual sound source by synthesizing the voice signal of a microphone located to the right of the reference microphone with an HRTF value one step larger than the reference HRTF value.

The virtual sound source generator may select at least two voice signals from the multi-channel voice signals and generate a plurality of virtual sound sources by synthesizing each of the selected voice signals with a reference HRTF value, which is the HRTF value corresponding to the direction of the sound source, with an HRTF value one step smaller than the reference HRTF value, and with an HRTF value one step larger than the reference HRTF value.

The selected voice signals may include the voice signal of the reference microphone.

The virtual sound source generator may select the voice signal of the reference microphone from the multi-channel voice signals and generate a first virtual sound source, a second virtual sound source, and a third virtual sound source by synthesizing the voice signal of the reference microphone with, respectively, a reference HRTF value, which is the HRTF value corresponding to the direction of the sound source, an HRTF value one step smaller than the reference HRTF value, and an HRTF value one step larger than the reference HRTF value.

The virtual stereo sound generator generates the virtual stereo sound by obtaining the average of the left channel signals of the plurality of virtual sound sources and the average of the right channel signals of the plurality of virtual sound sources.

The virtual stereo sound generating method and apparatus according to the present invention provide the following effects.

When HRTF-based stereo sound is generated, errors in the realism and directionality perceived by each user can be reduced.

Because errors in the realism and directionality perceived by each user can be reduced, the user's ability to manipulate a remote robot can be improved.

FIG. 1 is a diagram illustrating the configuration of a remote control system to which a virtual stereo sound generating method according to an embodiment of the present invention can be applied.
FIG. 2 is a diagram illustrating the configuration of the voice input device shown in FIG. 1.
FIG. 3 is a diagram illustrating the configuration of the voice output device shown in FIG. 1.
FIG. 4 is a diagram for explaining an HRTF DB.
FIG. 5 is a flowchart illustrating a virtual stereo sound generating method according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the invention pertains. The present invention is defined only by the scope of the claims.

Hereinafter, a virtual stereo sound generating method and apparatus according to embodiments of the present invention will be described with reference to the accompanying drawings. In the drawings, like reference numerals denote like elements.

FIG. 1 is a diagram illustrating the configuration of a remote control system to which a virtual stereo sound generating method according to an embodiment of the present invention can be applied.

As shown in FIG. 1, the remote control system may include a robot 10 and a control device 20.

The robot 10 operates as the slave device in the remote control system. The robot 10 may be implemented as, for example, a walking robot 10. The walking robot 10 may have a head on the upper part of its torso, and a left arm and a right arm on the left and right sides of the torso, respectively. A left leg and a right leg may be provided on the lower part of the torso. The robot 10 may be equipped with a voice input device (see reference numeral 100 in FIG. 2). The voice input device 100 is described in more detail below with reference to FIG. 2.

The control device 20 operates as the master device in the remote control system. Using the control device 20, the user can remotely control the robot 10 and monitor its operation or state. To this end, the control device 20 may transmit control signals for controlling the robot 10 to the robot 10, and may receive and process data collected by the robot 10. The control device 20 may be connected to the robot 10 through a network for transmitting control signals and receiving data. The network may be a wired network, a wireless network, or a combination thereof. The control device 20 may be equipped with a voice output device (see reference numeral 200 in FIG. 3). The voice output device 200 is described in more detail below with reference to FIG. 3.

FIG. 2 is a diagram illustrating the configuration of a voice input device 100 according to an embodiment of the present invention.

The voice input device 100 may receive sound from a sound source and extract voice signals from the received sound. It may then transmit the extracted voice signals to the voice output device 200. To this end, the voice input device 100 may include a collector 110, an extractor 120, an encoder 130, and a transmitter 140.

The collector 110 may include a plurality of microphones 111, 112, 113, and 114. Here, a microphone is a device that receives sound waves or ultrasonic waves and generates an electrical signal according to their vibration. According to an embodiment of the present invention, the collector 110 may include two or more microphones. The microphones may be disposed on the front of the torso of the robot 10 or along the circumference of the torso. As one example, the microphones 111, 112, 113, and 114 may be disposed in a straight line on the front of the torso of the robot 10, at regular intervals in either the horizontal or the vertical direction. As another example, the microphones 111, 112, 113, and 114 may be arranged in a circle on the front of the torso of the robot 10. As yet another example, the microphones 111, 112, 113, and 114 may be disposed at regular intervals along the circumference of the torso of the robot 10.
The following description takes as an example the case in which four microphones 111, 112, 113, and 114 are disposed at regular intervals in the horizontal direction on the front of the torso of the robot 10. Starting from the microphone on the right side of the robot 10, the microphones are referred to, in order, as microphone 0 (111), microphone 1 (112), microphone 2 (113), and microphone 3 (114).

The extractor 120 may extract multi-channel voice signals from the multi-channel electrical signals output by the microphones 111, 112, 113, and 114. Specifically, the extractor 120 may extract a voice signal from each of the electrical signals output by the four microphones 111, 112, 113, and 114. One of the extracted multi-channel voice signals may be set as the reference voice signal. Here, the reference voice signal is the voice signal extracted from the electrical signal of the reference microphone. The reference microphone may be designated in advance. For example, of the two microphones located in the middle (microphone 1 and microphone 2), microphone 1 (112) may be set as the reference microphone.

The encoder 130 may encode each of the multi-channel voice signals provided by the extractor 120.

The transmitter 140 may transmit the encoded multi-channel voice signals provided by the encoder 130 to the voice output device 200.

FIG. 3 is a diagram illustrating the configuration of a voice output device 200 according to an embodiment of the present invention.

The voice output device 200 may receive the multi-channel voice signals transmitted by the voice input device 100 and estimate the direction of the sound source. The voice output device 200 may also generate a plurality of virtual sound sources based on a plurality of HRTF (Head-Related Transfer Function) values selected according to the estimated direction of the sound source and on at least one of the received multi-channel voice signals, and may output virtual stereo sound by averaging the generated virtual sound sources.

As shown in FIG. 3, the voice output device 200 may include a receiver 210, a decoder 220, a sync corrector 230, a sound source direction estimator 240, a virtual sound source generator 250, a virtual stereo sound generator 260, an output unit 270, and a storage unit 280.

The receiver 210 may receive the encoded multi-channel voice signals from the voice input device 100.

The decoder 220 may decode the encoded multi-channel voice signals. The decoded multi-channel voice signals may be provided to both the sync corrector 230 and the sound source direction estimator 240.

The sync corrector 230 may correct the synchronization of the decoded multi-channel voice signals. In doing so, the sync corrector 230 may align the remaining voice signals to the reference voice signal. More specifically, because the microphones 111, 112, 113, and 114 are spaced at regular intervals across the front of the body of the robot 10, the distance from the sound source differs from microphone to microphone. When these distances differ, the sound arrives later at microphones that are farther from the sound source. The voice signals of the remaining microphones therefore need to be corrected, relative to the voice signal of the reference microphone, for the delay caused by their distance from the sound source. The multi-channel voice signals whose synchronization has been corrected by the sync corrector 230 may be provided to the virtual sound source generator 250, described below.
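The patent does not specify how the delay is measured or removed. One common approach, shown here purely as an assumption, is to find the lag that maximizes the cross-correlation with the reference signal and then shift the signal by that lag. The names `estimate_lag` and `shift`, and the sample values, are hypothetical.

```python
def estimate_lag(reference, signal, max_lag):
    """Return the lag (in samples) by which `signal` trails `reference`.

    A positive result means `signal` arrives later than `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(signal):
                score += r * signal[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def shift(signal, lag):
    """Advance `signal` by `lag` samples (delay it if `lag` is negative).

    The result is zero-padded so that it keeps the original length.
    """
    n = len(signal)
    if lag >= 0:
        return signal[lag:] + [0.0] * min(lag, n)
    return [0.0] * min(-lag, n) + signal[:lag]


# Made-up signals: `late` is the reference delayed by two samples.
reference = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
late = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
lag = estimate_lag(reference, late, max_lag=3)
aligned = shift(late, lag)
```

In this example the estimated lag is two samples, and shifting by that amount makes the late channel line up with the reference channel.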

The sound source direction estimator 240 may estimate the direction of the sound source based on the decoded multi-channel voice signals. Here, the direction of the sound source means the position of the sound source relative to a reference position, and it may be expressed as a direction angle (azimuth) and an elevation angle. For convenience, the following description considers only the direction angle. The reference position is the front of the robot 10, and the direction of the sound source may be expressed as an angle from the reference position. For example, when the sound source is located 45° to the right of the reference position, its direction may be expressed as +45°. Conversely, when the sound source is located 45° to the left of the reference position, its direction may be expressed as -45°. Because methods of estimating the direction of a sound source are well known, a detailed description is omitted. The direction information estimated by the sound source direction estimator 240 may be provided to the virtual sound source generator 250.
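Since the patent leaves the estimation method open, the following is only one textbook possibility, stated as an assumption: a far-field estimate from the arrival-time difference between two microphones, using the sign convention above (right is positive). The function name, the speed-of-sound constant, and the microphone spacing are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at room temperature


def azimuth_from_delay(delay_s, mic_spacing_m):
    """Far-field direction angle in degrees from a two-microphone time difference.

    `delay_s` is the arrival time at the left microphone minus the arrival
    time at the right microphone, so a positive delay means the sound reached
    the right microphone first and the source lies to the right (positive angle).
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside asin's domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# A source 45 degrees to the right, seen by microphones 0.1 m apart.
spacing = 0.1
delay = spacing * math.sin(math.radians(45.0)) / SPEED_OF_SOUND
angle = azimuth_from_delay(delay, spacing)
```

With more than two microphones, the pairwise estimates could be combined, but how to do so is outside what the source describes.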

The virtual sound source generator 250 receives the sync-corrected multi-channel voice signals from the sync corrector 230 and the sound source direction information from the sound source direction estimator 240. The virtual sound source generator 250 generates a plurality of virtual sound sources based on at least one voice signal selected from the received multi-channel voice signals and a plurality of HRTF values selected according to the received direction of the sound source.

Before describing how the plurality of virtual sound sources is generated, the HRTF DB is briefly described with reference to FIG. 4.

As described above, an HRTF DB can be obtained by arranging speakers at various angles on a sphere centered on a dummy head in an anechoic chamber, emitting an impulse signal such as white noise through the speakers, and Fourier transforming the impulse responses measured by microphones (not shown) mounted in both ears of the dummy head. The resolution of the resulting HRTF DB is determined by the angular spacing of the speakers around the dummy head. Specifically, if the speakers are placed at 5-degree intervals around the dummy head, the resolution of the HRTF DB is 5 degrees. If the speakers are placed at 2-degree intervals, the resolution of the HRTF DB is 2 degrees. In such an HRTF DB, the direction angle of a speaker located directly in front of the dummy head may be expressed as 0°. The direction angle of a speaker located to the right of the front of the dummy head may be expressed as a positive angle, and the direction angle of a speaker located to the left may be expressed as a negative angle. For example, the direction angle of a speaker located 45° to the right of the front of the dummy head may be expressed as +45°.
The direction angle of a speaker located 45° to the left of the front of the dummy head may be expressed as -45°.
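Because the HRTF DB only holds entries at multiples of its resolution, an estimated direction generally has to be mapped to the nearest stored angle before lookup. The source does not describe this mapping; the sketch below assumes simple nearest-grid rounding with ties rounded upward, and `snap_to_grid` is a hypothetical helper.

```python
import math


def snap_to_grid(angle_deg, resolution_deg):
    """Nearest direction angle stored in an HRTF DB with the given resolution.

    Angles exactly halfway between two grid points are rounded upward.
    """
    steps = math.floor(angle_deg / resolution_deg + 0.5)
    return steps * resolution_deg


# With a 5-degree DB, an estimate of 47 degrees maps to the 45-degree entry;
# with a 2-degree DB, the same estimate maps to the 48-degree entry.
nearest_5 = snap_to_grid(47, 5)
nearest_2 = snap_to_grid(47, 2)
nearest_left = snap_to_grid(-44, 5)
```

A finer resolution reduces the snapping error but requires more measurements when the DB is built.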

Referring again to FIG. 3, the storage unit 280 may store the HRTF DB. The storage unit 280 may be implemented as a nonvolatile memory device such as a read only memory (ROM), a programmable read only memory (PROM), an erasable programmable read only memory (EPROM), or a flash memory; as a volatile memory device such as a random access memory (RAM); or as a storage medium such as a hard disk or an optical disk. However, the storage unit 280 is not limited to these examples and may of course be implemented in any other form known in the art.

The virtual sound source generator 250 generates a plurality of virtual sound sources based on at least one voice signal selected from the multi-channel voice signals and a plurality of HRTF values selected according to the direction of the sound source.

As one example, the virtual sound source generator 250 generates a first virtual sound source by synthesizing, from among the multi-channel voice signals, the voice signal of the reference microphone with the HRTF value corresponding to the direction of the sound source (hereinafter referred to as the 'reference HRTF value'). It generates a second virtual sound source by synthesizing the voice signal of the microphone located to the left of the reference microphone, that is, microphone No. 2 (113), with the HRTF value one step smaller than the reference HRTF value. It generates a third virtual sound source by synthesizing the voice signal of the microphone located to the right of the reference microphone, that is, microphone No. 0 (111), with the HRTF value one step larger than the reference HRTF value. For example, assume that the direction of the sound source is 45° to the right. In this case, the virtual sound source generator 250 generates the first virtual sound source by synthesizing the voice signal of the reference microphone with the HRTF value corresponding to a sound source direction of 45°. It generates the second virtual sound source by synthesizing the voice signal of the microphone located to the left of the reference microphone with the HRTF value corresponding to a sound source direction of 40°. It generates the third virtual sound source by synthesizing the voice signal of the microphone located to the right of the reference microphone with the HRTF value corresponding to a sound source direction of 50°.
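A minimal sketch of this first example is given below, under two assumptions that the description does not state: "synthesizing" a voice signal with an HRTF value is taken to mean time-domain convolution with the corresponding left-ear and right-ear impulse responses, and "one step" is taken to be the database resolution (5 degrees here). The function names and the `hrir_db` layout are hypothetical.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render(signal, hrir_pair):
    """Produce one two-channel virtual sound source as a (left, right) tuple."""
    left_ir, right_ir = hrir_pair
    return convolve(signal, left_ir), convolve(signal, right_ir)


def generate_three_sources(ref_sig, left_sig, right_sig, hrir_db, direction_deg, step_deg=5):
    """First example: reference, left, and right microphones paired with
    the reference angle, one step smaller, and one step larger."""
    return [
        render(ref_sig, hrir_db[direction_deg]),               # first virtual sound source
        render(left_sig, hrir_db[direction_deg - step_deg]),   # second virtual sound source
        render(right_sig, hrir_db[direction_deg + step_deg]),  # third virtual sound source
    ]
```

For a source estimated at +45°, the call `generate_three_sources(ref, left, right, hrir_db, 45)` would look up the entries for 45°, 40°, and 50°, matching the worked example in the text.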

As another example, the virtual sound source generator 250 may select two voice signals from among the multi-channel voice signals. For example, it may select the voice signal of the reference microphone and the voice signal of the microphone located to the right of the reference microphone. The virtual sound source generator 250 may then synthesize the reference HRTF value, the HRTF value one step smaller than the reference HRTF value, and the HRTF value one step larger than the reference HRTF value, each with the voice signal of the reference microphone, to generate a first virtual sound source, a second virtual sound source, and a third virtual sound source. The virtual sound source generator 250 may also synthesize the reference HRTF value, the HRTF value one step smaller than the reference HRTF value, and the HRTF value one step larger than the reference HRTF value, each with the voice signal of the microphone located to the right of the reference microphone, to generate a fourth virtual sound source, a fifth virtual sound source, and a sixth virtual sound source.

As yet another example, the virtual sound source generator 250 may select only the voice signal of the reference microphone from among the multi-channel voice signals. The virtual sound source generator 250 may then synthesize the reference HRTF value, the HRTF value one step smaller than the reference HRTF value, and the HRTF value one step larger than the reference HRTF value, each with the voice signal of the reference microphone, to generate a first virtual sound source, a second virtual sound source, and a third virtual sound source.
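The two-signal and single-signal examples differ only in how many voice signals are selected; each selected signal is paired with the same three HRTF angles. The pairing can be sketched as below. The function name, the string labels for microphones, and the 5-degree default step are assumptions made for illustration.

```python
from itertools import product


def plan_virtual_sources(selected_mics, direction_deg, step_deg=5):
    """List (microphone, HRTF angle) pairs in the order first, second, third, ...

    Each selected microphone is paired with the reference angle, the angle one
    step smaller, and the angle one step larger, so k selected signals yield
    3 * k virtual sound sources.
    """
    angles = [direction_deg, direction_deg - step_deg, direction_deg + step_deg]
    return list(product(selected_mics, angles))


# Two selected signals -> six virtual sound sources.
six = plan_virtual_sources(["reference", "right"], 45)
# One selected signal -> three virtual sound sources.
three = plan_virtual_sources(["reference"], 45)
```

Keeping the pairing separate from the rendering makes it straightforward to change which microphones are selected without touching the HRTF lookup.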

Each virtual sound source generated by the virtual sound source generator 250 is a two-channel virtual sound source that includes a left channel signal to be output to a left earphone or left speaker and a right channel signal to be output to a right earphone or right speaker.

The virtual stereo sound generator 260 may generate a two-channel virtual stereo sound by combining the plurality of virtual sound sources generated by the virtual sound source generator 250. Specifically, the virtual stereo sound generator 260 may generate the two-channel virtual stereo sound by averaging the left channel signals of the virtual sound sources and averaging the right channel signals of the virtual sound sources.
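The per-channel averaging can be sketched as follows, assuming each virtual sound source is represented as a `(left, right)` pair of equal-length sample lists. That representation and the function name are assumptions for illustration only.

```python
def average_channels(virtual_sources):
    """Average left channels and right channels separately, sample by sample."""
    count = len(virtual_sources)
    lefts = [left for left, _ in virtual_sources]
    rights = [right for _, right in virtual_sources]
    left_out = [sum(samples) / count for samples in zip(*lefts)]
    right_out = [sum(samples) / count for samples in zip(*rights)]
    return left_out, right_out


# Two hypothetical two-channel virtual sound sources, two samples each.
stereo = average_channels([
    ([1.0, 2.0], [0.0, 4.0]),
    ([3.0, 4.0], [2.0, 0.0]),
])
```

Because `zip` stops at the shortest input, sources of unequal length would be truncated to the shortest one in this sketch.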

The output unit 270 may output the two-channel virtual stereo sound generated by the virtual stereo sound generator 260. To this end, the output unit 270 may include a left speaker and a right speaker.

Next, FIG. 5 is a flowchart illustrating a virtual stereo sound generation method according to an embodiment of the present invention.

When encoded multi-channel voice signals are received from the voice input apparatus 100 (S510), the decoder 220 decodes each of the encoded multi-channel voice signals (S520).

Once the multi-channel voice signals have been decoded, sync correction of the decoded multi-channel voice signals and estimation of the direction of the sound source are performed (S530). In this step, the direction of the sound source is estimated by the sound source direction estimator 240, and the sync of the multi-channel voice signals is corrected by the sync correction unit 230. Here, the sync correction unit 230 corrects the sync of the remaining voice signals to match the reference voice signal among the decoded multi-channel voice signals.
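The description does not specify how the sync correction unit 230 aligns the remaining voice signals to the reference voice signal. One common approach, shown here purely as an assumption, is to find the lag that maximizes the cross-correlation with the reference signal and shift the other signal by that lag. The function names and the `max_lag` parameter are hypothetical.

```python
def best_lag(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`,
    i.e. the lag maximizing sum(reference[n] * other[n + lag])."""
    def score(lag):
        return sum(
            reference[n] * other[n + lag]
            for n in range(len(reference))
            if 0 <= n + lag < len(other)
        )
    return max(range(-max_lag, max_lag + 1), key=score)


def align_to_reference(reference, other, max_lag):
    """Shift `other` so that it lines up with `reference`, zero-padding the ends."""
    lag = best_lag(reference, other, max_lag)
    size = len(other)
    return [other[i + lag] if 0 <= i + lag < size else 0.0 for i in range(size)]


# Hypothetical signals: `delayed` is `ref` delayed by two samples.
ref = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
aligned = align_to_reference(ref, delayed, max_lag=3)
```

This brute-force search costs on the order of `max_lag` times the signal length; an implementation handling long frames would more likely use an FFT-based correlation.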

The virtual sound source generator 250 generates a plurality of virtual sound sources based on at least one voice signal selected from the sync-corrected multi-channel voice signals and a plurality of HRTF values selected according to the direction of the sound source (S540).

As one example, step S540 may include: selecting, from among the multi-channel voice signals, the voice signals of the reference microphone, the microphone located to the left of the reference microphone, and the microphone located to the right of the reference microphone; generating a first virtual sound source by synthesizing the voice signal of the reference microphone with the reference HRTF value, which is the HRTF value corresponding to the direction of the sound source; generating a second virtual sound source by synthesizing the voice signal of the microphone located to the left of the reference microphone with the HRTF value one step smaller than the reference HRTF value; and generating a third virtual sound source by synthesizing the voice signal of the microphone located to the right of the reference microphone with the HRTF value one step larger than the reference HRTF value.

As another example, step S540 may include: selecting the voice signal of the reference microphone and the voice signal of the microphone located to the right (or left) of the reference microphone; generating a first virtual sound source, a second virtual sound source, and a third virtual sound source by synthesizing the reference HRTF value, the HRTF value one step smaller than the reference HRTF value, and the HRTF value one step larger than the reference HRTF value, each with the voice signal of the reference microphone; and generating a fourth virtual sound source, a fifth virtual sound source, and a sixth virtual sound source by synthesizing the reference HRTF value, the HRTF value one step smaller than the reference HRTF value, and the HRTF value one step larger than the reference HRTF value, each with the voice signal of the microphone located to the right (or left) of the reference microphone.

As yet another example, step S540 may include: selecting the voice signal of the reference microphone from among the multi-channel voice signals; and generating a first virtual sound source, a second virtual sound source, and a third virtual sound source by synthesizing the reference HRTF value, the HRTF value one step smaller than the reference HRTF value, and the HRTF value one step larger than the reference HRTF value, each with the voice signal of the reference microphone.

Once the plurality of virtual sound sources have been generated by one of the methods described above, they are combined to generate a two-channel virtual stereo sound (S550). Step S550 may include averaging the left channel signals of the virtual sound sources and averaging the right channel signals of the virtual sound sources.

The generated two-channel virtual stereo sound is output through the output unit 270 (S560).

The virtual stereo sound generation method and apparatus according to an embodiment of the present invention have been described above. In the embodiment described above, the components constituting the virtual stereo sound generating apparatus may be implemented as a kind of 'module'. Here, a 'module' means software or a hardware component such as a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC), and a module performs certain roles. However, a module is not limited to software or hardware. A module may be configured to reside on an addressable storage medium and may be configured to execute on one or more processors.

Thus, by way of example, a module includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided by the components and modules may be combined into a smaller number of components and modules or further separated into additional components and modules. In addition, the components and modules may execute on one or more CPUs within a device.

In addition to the embodiments described above, embodiments of the present invention may be implemented through a medium, for example a computer-readable medium, that includes computer-readable code/instructions for controlling at least one processing element of the embodiments described above. The medium may correspond to any medium or media that enable storage and/or transmission of the computer-readable code.

The computer-readable code may be recorded on a medium or transmitted over the Internet. The medium may include, for example, recording media such as magnetic storage media (e.g., ROM, floppy disk, hard disk) and optical recording media (e.g., CD-ROM or DVD), and transmission media such as carrier waves. According to an embodiment of the present invention, the medium may also be a signal such as a composite signal or a bitstream. Because the media may be a distributed network, the computer-readable code may be stored, transmitted, and executed in a distributed manner. Furthermore, by way of example only, the processing element may include a processor or a computer processor, and the processing element may be distributed across and/or included in a single device.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

10: robot
20: control device
100: voice input apparatus
110: collection unit
120: extraction unit
130: encoder
140: transmission unit
200: voice output apparatus
210: receiving unit
220: decoder
230: sync correction unit
240: sound source direction estimator
250: virtual sound source generator
260: virtual stereo sound generator
270: output unit
280: storage unit

Claims

A virtual stereo sound generation method comprising:
receiving multi-channel voice signals obtained from a plurality of microphones including a reference microphone;
estimating a direction of a sound source based on the received multi-channel voice signals;
generating a plurality of virtual sound sources based on at least one voice signal selected from the multi-channel voice signals and a plurality of head-related transfer function (HRTF) values determined according to the direction of the sound source; and
generating a virtual stereo sound by averaging the plurality of virtual sound sources.

The method of claim 1, wherein generating the plurality of virtual sound sources comprises:
generating a first virtual sound source by synthesizing a voice signal of the reference microphone with a reference HRTF value, which is an HRTF value corresponding to the direction of the sound source;
generating a second virtual sound source by synthesizing a voice signal of a microphone located to the left of the reference microphone with an HRTF value one step smaller than the reference HRTF value; and
generating a third virtual sound source by synthesizing a voice signal of a microphone located to the right of the reference microphone with an HRTF value one step larger than the reference HRTF value.

The method of claim 1, wherein generating the plurality of virtual sound sources comprises:
selecting at least two voice signals from among the multi-channel voice signals; and
generating a plurality of virtual sound sources by synthesizing, with each of the selected at least two voice signals, a reference HRTF value, which is an HRTF value corresponding to the direction of the sound source, an HRTF value one step smaller than the reference HRTF value, and an HRTF value one step larger than the reference HRTF value.

The method of claim 3, wherein the selected voice signals include a voice signal of the reference microphone.

The method of claim 1, wherein generating the plurality of virtual sound sources comprises:
selecting a voice signal of the reference microphone from among the multi-channel voice signals; and
generating a first virtual sound source, a second virtual sound source, and a third virtual sound source by synthesizing the voice signal of the reference microphone with, respectively, a reference HRTF value, which is an HRTF value corresponding to the direction of the sound source, an HRTF value one step smaller than the reference HRTF value, and an HRTF value one step larger than the reference HRTF value.

The method of claim 1, wherein generating the virtual stereo sound comprises:
obtaining an average of left channel signals of the plurality of virtual sound sources; and
obtaining an average of right channel signals of the plurality of virtual sound sources.

A virtual stereo sound generating apparatus comprising:
a receiving unit configured to receive multi-channel voice signals obtained from a plurality of microphones including a reference microphone;
a sound source direction estimator configured to estimate a direction of a sound source based on the received multi-channel voice signals;
a virtual sound source generator configured to generate a plurality of virtual sound sources based on at least one voice signal selected from the multi-channel voice signals and a plurality of head-related transfer function (HRTF) values determined according to the direction of the sound source; and
a virtual stereo sound generator configured to generate a virtual stereo sound by averaging the plurality of virtual sound sources.

The apparatus of claim 7, wherein the virtual sound source generator:
generates a first virtual sound source by synthesizing a voice signal of the reference microphone with a reference HRTF value, which is an HRTF value corresponding to the direction of the sound source;
generates a second virtual sound source by synthesizing a voice signal of a microphone located to the left of the reference microphone with an HRTF value one step smaller than the reference HRTF value; and
generates a third virtual sound source by synthesizing a voice signal of a microphone located to the right of the reference microphone with an HRTF value one step larger than the reference HRTF value.

The apparatus of claim 7, wherein the virtual sound source generator:
selects at least two voice signals from among the multi-channel voice signals; and
generates a plurality of virtual sound sources by synthesizing, with each of the selected at least two voice signals, a reference HRTF value, which is an HRTF value corresponding to the direction of the sound source, an HRTF value one step smaller than the reference HRTF value, and an HRTF value one step larger than the reference HRTF value.

The apparatus of claim 9, wherein the selected voice signals include a voice signal of the reference microphone.

The apparatus of claim 7, wherein the virtual sound source generator:
selects a voice signal of the reference microphone from among the multi-channel voice signals; and
generates a first virtual sound source, a second virtual sound source, and a third virtual sound source by synthesizing the voice signal of the reference microphone with, respectively, a reference HRTF value, which is an HRTF value corresponding to the direction of the sound source, an HRTF value one step smaller than the reference HRTF value, and an HRTF value one step larger than the reference HRTF value.

The apparatus of claim 7, wherein the virtual stereo sound generator generates the virtual stereo sound by obtaining an average of left channel signals of the plurality of virtual sound sources and an average of right channel signals of the plurality of virtual sound sources.