JP2006066939A

JP2006066939A - Sound reproducing method and apparatus thereof

Info

Publication number: JP2006066939A
Application number: JP2004243477A
Authority: JP
Inventors: Atsushi Yoshimoto (善本淳)
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2004-08-24
Filing date: 2004-08-24
Publication date: 2006-03-09

Abstract

PROBLEM TO BE SOLVED: To provide a method and an apparatus for reproducing realistic sound that reflects the effect of movement of a sound source unit, such as the mouth, or a sound collection unit, such as the ear. This applies, for example, when the heads of both parties move during a conversation over remote communication.
SOLUTION: The method applies to communication between communicators that each have a sound source unit, which emits sound, and a sound collection unit, which collects sound. There are two sound collection modes. In the open-type mode, the sound collection unit is not fixed relative to the sound source unit. In the headset-type mode, it is fixed relative to the sound source unit. The method reproduces sound equivalent to the open-type mode even when at least one communicator uses the headset-type mode. When the speaking communicator moves spatially, the sound collected by that communicator's sound collection unit is converted with an inverse head-related transfer function. This function uses information describing the spatial movement of the speaker's sound source unit. From the change in source position, it derives the change in sound that a sound collection unit fixed at a virtual position would have observed.
COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a method and an apparatus for reproducing realistic sound. The reproduced sound reflects the effect of movement when a sound source unit, such as the mouth, or a sound collection unit, such as the ear, moves. A typical case is when the heads of both parties move during a conversation over remote communication.

Technology for reproducing realistic sound is advancing.
In general, sound emitted from a source reaches the listener only after being reflected and diffracted by the listener's face and pinnae. These effects are characterized by the positional relationship between the source and the listener's ears. The head-related transfer function (HRTF) describes the acoustic path from the source to the entrance of the listener's ear canal. It combines the direction of the source relative to the listener with the transfer characteristics from the source to the right ear and to the left ear.
For example, when stereo music is heard through headphones, the sound image is localized inside the head regardless of how the listener moves, which feels unnatural. This happens because the effect of the HRTF is not taken into account. If the HRTF can be reproduced accurately during playback, the sound image can be localized at any desired position.

Convolving the source signal with the head-related transfer function localizes the sound image outside the head, so that a three-dimensional sound field can be reproduced. Out-of-head localization is commonly evaluated with methods such as the mean opinion score (MOS) and paired comparisons. However, these methods depend heavily on the individual subject and give ambiguous results. Numerical evaluation of out-of-head localization using auditory masking has therefore been studied.
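A binaural signal is usually rendered by convolving the source signal, once per ear, with the time-domain form of the HRTF, the head-related impulse response (HRIR). The sketch below shows only that arithmetic. The two impulse responses are hypothetical toy values, not measured HRIRs. Real systems typically use FFT-based convolution for long responses.

```python
def convolve(signal, impulse_response):
    """Full linear convolution; output length is len(signal) + len(impulse_response) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Render a mono source to (left ear, right ear) by convolving with an HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy HRIRs for a source on the listener's left: the left ear receives
# the sound first and louder; the right ear gets it 2 samples later, attenuated.
HRIR_LEFT = [0.9, 0.1]
HRIR_RIGHT = [0.0, 0.0, 0.5, 0.1]

left, right = binauralize([1.0, 0.5], HRIR_LEFT, HRIR_RIGHT)
```

The interaural time and level differences encoded in the HRIR pair carry over directly into the two output channels.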

Binaural recording is another way to keep the sound image from being localized inside the head when stereo sources are heard through headphones.
Binaural recording uses a dummy head that imitates a human head, with a microphone placed in each ear. When such a recording is played back through headphones, the sound heard by each ear of the dummy head reaches the corresponding ear of the listener unchanged. Because the left and right signals do not mix, the listener gets a sense of presence.

However, some combinations of recording dummy head and playback headphones cannot reproduce the frequency characteristics accurately. In addition, the dummy head differs in shape and material from each listener's head and ears. This makes it difficult to adapt the recording to each listener's individual listening conditions.
In particular, the effect of head rotation is not reproduced. The dummy head stays fixed during recording, so the listener's head rotation is not reflected. As a result, the sound image rotates together with the head, which sounds unnatural.

To solve this problem, methods that record while reproducing the listener's head movement in real time have been studied. However, such recordings use a robot head with motors for movement and sensors for detecting orientation, and the robot's operating noise is inevitably recorded as well. This approach also makes it difficult to adapt to each listener's listening conditions, since head shapes differ.
Thus, with the prior art, including Non-Patent Document 1, it has been difficult to reproduce realistic sound that reflects the movement of the sound sources and sound collection units of both communicating parties.
Non-Patent Document 1: 戸嶋巌樹 (Iwaki Toshima), 植松尚 (Hisashi Uematsu), 平原達也 (Tatsuya Hirahara), "A dummy head that follows head movement," Proceedings of the Autumn Meeting of the Acoustical Society of Japan, pp. 439-440, 2002.

In view of this, an object of the present invention is to provide a method and an apparatus for reproducing realistic sound that reflects the effect of movement of a sound source unit, such as the mouth, or a sound collection unit, such as the ear. This applies, for example, when the heads of both parties move during a conversation over remote communication. In particular, the object is to reproduce sound with the same rich sense of presence as open-type sound collection, even when the sound collection unit is fixed to the sound source unit, as in headset-type collection.

To solve the above problem, the sound reproduction method of the present invention has the following configuration.
The method applies to communication between communicators, each having a sound source unit that emits sound and a sound collection unit that collects sound. It reproduces sound equivalent to the open-type collection mode, in which the sound collection unit is not fixed relative to the sound source unit. It does so even when at least one communicator uses the headset-type mode, in which the sound collection unit is fixed relative to the sound source unit. When the speaker-side communicator moves spatially, the method uses information describing the spatial movement of that communicator's sound source unit. With this information, it converts the sound collected by the speaker's sound collection unit using an inverse head-related transfer function. This function derives, from the change in position of the sound source unit, the change in sound that a sound collection unit fixed at a virtual position would have picked up.

When the listener-side communicator moves spatially, the sound transmitted from the speaker-side communicator may also be converted, using a head-related transfer function. This function uses information describing the spatial movement of the listener's sound collection unit. From the change in position of that unit, it derives the change in sound equivalent to pickup by a sound collection unit fixed at a virtual position.

The communicator may be a person. The spatial movement of the person's head may then be expressed by three angles:
- the yaw angle, for rotation in the approximately horizontal plane (turning the head);
- the pitch angle, for forward-backward rotation;
- the roll angle, for left-right rotation.

The sound reproduction apparatus of the present invention applies to communication between communicators, each having a sound source unit that emits sound and a sound collection unit that collects sound. It reproduces sound equivalent to the open-type collection mode, in which the sound collection unit is not fixed relative to the sound source unit. It does so even when at least one communicator uses the headset-type mode, in which the sound collection unit is fixed relative to the sound source unit. The apparatus comprises:
- sound source motion transmission means that, when the speaker-side communicator moves spatially, measures and outputs information describing the spatial movement of that communicator's sound source unit; and
- sound adjustment means that converts the sound collected by the speaker's sound collection unit with an inverse head-related transfer function. This function derives, from the change in position of the sound source unit, the change in sound that a sound collection unit fixed at a virtual position would have picked up.

The apparatus may further comprise:
- sound collection motion transmission means that, when the listener-side communicator moves spatially, measures and outputs information describing the spatial movement of that communicator's sound collection unit; and
- sound adjustment means that converts the sound transmitted from the speaker-side communicator with a head-related transfer function. This function derives, from the change in position of the sound collection unit, the change in sound equivalent to pickup by a sound collection unit fixed at a virtual position.
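On the listener side, a sound image can be kept fixed in the room while the listener turns. One common way is to subtract the listener's yaw from the source azimuth, then select the HRIR measured nearest to the resulting direction. The sketch below assumes:
- yaw-only motion;
- azimuths measured counter-clockwise from straight ahead, positive to the listener's left;
- a hypothetical nearest-neighbour HRIR table.

The patent does not prescribe this lookup scheme; it is one plausible realization.

```python
def relative_azimuth(source_azimuth_deg, listener_yaw_deg):
    """Direction of a room-fixed source as seen from the listener's turned head, in [-180, 180)."""
    return (source_azimuth_deg - listener_yaw_deg + 180.0) % 360.0 - 180.0


def pick_hrir(hrir_table, azimuth_deg):
    """Return the (left, right) HRIR pair measured closest to azimuth_deg, on the circle."""
    def circular_gap(measured):
        return abs((measured - azimuth_deg + 180.0) % 360.0 - 180.0)
    return hrir_table[min(hrir_table, key=circular_gap)]


# Hypothetical table: measured azimuth in degrees (positive = listener's left)
# -> (left HRIR, right HRIR). Values are toy numbers, not measurements.
HRIR_TABLE = {
    0: ([0.7], [0.7]),
    90: ([0.9], [0.0, 0.4]),
    -90: ([0.0, 0.4], [0.9]),
    180: ([0.5], [0.5]),
}

# The remote speaker is rendered straight ahead in the room (0 deg). When the listener
# turns 90 deg to the left, the voice should now come from the listener's right (-90 deg).
azimuth = relative_azimuth(0.0, 90.0)
left_ir, right_ir = pick_hrir(HRIR_TABLE, azimuth)
```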

According to the present invention, when the speaker-side communicator moves spatially, the sound is converted with the inverse head-related transfer function. This reproduces sound equivalent to open-type collection, in which the sound collection unit is not fixed relative to the sound source unit. When the listener-side communicator moves spatially, the sound can likewise be converted with the head-related transfer function before playback. Realistic sound can thus be reproduced even when the heads of both parties move during a conversation over remote communication.
Depending on how the system is used, it also enables arrangements that are impossible in the real world. For example, both parties can face the same direction side by side, with the other speaker A on the left of oneself, B, while at the same time B is on the left of A.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 illustrates the headset-type (intercom-type) sound collection mode, in which the sound collection unit is fixed to the sound source unit.
The microphone is attached to a roughly curved member, which mounts it on the head by hooking over the ear or head or by clamping the head from both sides. Because the microphone is fixed to the head, its position relative to the mouth (the sound source) does not change when the head moves. In particular, the picked-up voice level does not change.

The sound V0M obtained in this way can be converted into the sound V'0M that open-type collection would yield, with the sound collection unit separated from the sound source unit by free space. This conversion mainly requires correcting the voice level according to the speaker's head movement.
The correction uses the head-related transfer function in the reverse of its usual direction. From the change in head position, it computes how the sound would change if picked up by a sound collection device fixed at a virtual position. It takes into account the arrival-time difference, the sound pressure, the spectral change caused by angle-dependent shadowing by tissue around the mouth, and the frequency shift caused by head movement during speech. This function is called here the inverse head-related transfer function rHRTF.
The three rotation components are named and denoted as follows:
- yaw (ω): rotation in the approximately horizontal plane;
- pitch (χ): forward-backward rotation, perpendicular to yaw;
- roll (ψ): left-right rotation, perpendicular to both.
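The yaw ω, pitch χ and roll ψ can be combined into a single rotation matrix. That matrix tells where a world-fixed virtual microphone lies relative to the moving head. The patent states no axis or composition convention, so the sketch assumes both:
- a right-handed frame with x forward, y to the left and z up;
- Z-Y-X composition (yaw, then pitch, then roll).

```python
import math


def head_rotation(yaw, pitch, roll):
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians.

    Assumed frame: x forward, y to the left, z up. Under this convention a positive
    yaw turns the head to the left and a positive pitch tilts the face downward.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def apply(matrix, vector):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3)]


def mic_in_head_frame(mic_position, yaw, pitch, roll):
    """Express a world-fixed mic position (relative to the head centre) in head coordinates.

    The inverse of a rotation matrix is its transpose, so the position is multiplied by R^T.
    """
    r = head_rotation(yaw, pitch, roll)
    r_transposed = [list(row) for row in zip(*r)]
    return apply(r_transposed, mic_position)


# A mic 1 m straight ahead; after the speaker turns 90 degrees to the left,
# the mic lies on the speaker's right side (negative y in head coordinates).
m = mic_in_head_frame([1.0, 0.0, 0.0], math.pi / 2, 0.0, 0.0)
```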

A rough rHRTF can be created by simulation, without measurement. The simulation uses the head variables ωs, χs and ψs (yaw, pitch and roll of the speaker's head). It models the head as an oblate spheroid of typical size and gives the mouth a typical position and size.
To create the rHRTF accurately, a test subject acting as the speaker speaks at a constant level toward a sound collection device. The device is fixed at a specific three-dimensional offset (x, y, z) from the body, while the head's yaw, pitch and roll are varied step by step. To check that the level really is constant, the subject can wear a headset-type sound collection device at the same time for comparison.
If head movement is assumed not to affect the vocal cords, a second method can be used. A small loudspeaker that continuously emits basic test frequencies at a constant level is placed near the vocal cords at the back of the subject's mouth, and its output is observed.
In this way, an rHRTF for that subject is obtained. In practice it is desirable to create a dedicated rHRTF for each user. However, one person's rHRTF can also be applied to everyone as a typical example.
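A very rough version of the simulated rHRTF can be sketched with textbook free-field acoustics. The sketch goes beyond the text with several simplifying assumptions:
- a sphere instead of an oblate spheroid;
- yaw-only head rotation;
- a point mouth on the front of the head;
- 1/r spreading;
- a broadband first-order directivity pattern, with a hypothetical DIRECTIVITY_BETA, standing in for the frequency-dependent shadowing by the head and face.

It returns only a delay and a gain, that is, the arrival-time and sound-pressure parts of the correction.

```python
import math

SPEED_OF_SOUND = 343.0    # m/s, roughly air at 20 degrees C
HEAD_RADIUS = 0.0875      # m, assumed typical head radius (sphere, not an oblate spheroid)
DIRECTIVITY_BETA = 0.5    # hypothetical: 0 = omnidirectional, 1 = cardioid


def crude_rhrtf(yaw, mic, sample_rate=16000):
    """Return (delay_in_samples, gain) from the mouth to a world-fixed virtual microphone.

    yaw is in radians; mic is (x, y, z) in metres with the head centre at the origin,
    x forward and y to the left. The gain is relative to an on-axis source at 1 m.
    """
    facing = (math.cos(yaw), math.sin(yaw), 0.0)          # direction the mouth points
    mouth = tuple(HEAD_RADIUS * f for f in facing)        # point mouth on the sphere surface
    to_mic = tuple(m - p for m, p in zip(mic, mouth))
    r = math.sqrt(sum(d * d for d in to_mic))
    cos_theta = sum(f * d for f, d in zip(facing, to_mic)) / r
    directivity = (1.0 + DIRECTIVITY_BETA * cos_theta) / (1.0 + DIRECTIVITY_BETA)
    gain = directivity / r                                # directivity times 1/r spreading
    delay = r / SPEED_OF_SOUND * sample_rate              # propagation delay in samples
    return delay, gain


front = crude_rhrtf(0.0, (1.0, 0.0, 0.0))       # speaker faces the microphone
away = crude_rhrtf(math.pi, (1.0, 0.0, 0.0))    # speaker has turned away from it
```

The microphone stays fixed while the mouth turns. So when the speaker turns away, the gain falls and the delay grows, which is the change a head-mounted microphone never observes.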

The position of the sound collection device, fixed at an offset (x, y, z) from the speaker's body, can also be treated as a variable of the rHRTF. This increases the number of rHRTF measurements accordingly, but lets the user choose any position as the sound collection point.
If such positions are prepared in advance, the user can freely choose a collection position to suit the time, place and occasion (TPO). Examples include:
- a microphone as if clipped to the speaker's tie inside a narrow space such as a telephone booth;
- a microphone as if mounted 2 m high in a wide open space with no walls or ceiling;
- a microphone as if placed near the speaker's mouth in a 1.8 m square bathroom.

Further, assume that the headset is highly directional and does not pick up reflected sound. Preliminary microphone recordings can then be made separately in different rooms, such as an anechoic room with no reflections, a small reverberant room and a large reverberant room. In that case the rHRTF also includes the reverberant environment around the speaker, as if a somewhat less directional microphone had picked up the sound. The higher the fidelity of the communication device, the more faithfully the speaker's surroundings can be reproduced.
This also makes it possible to reproduce reflections from the floor, ceiling and walls that depend on the speaker's standing height. A tall speaker's mouth is far from the floor; a short speaker's mouth is close to it. These preliminary recordings are preferably made binaurally.
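The image-source method can illustrate how floor reflections depend on mouth height. This standard room-acoustics technique is swapped in here for illustration; the patent captures the effect by measurement instead. The sketch assumes a perfectly reflecting floor and free space otherwise. The heights and distance in the example are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly air at 20 degrees C


def floor_reflection(mouth_height, mic_height, horizontal_distance):
    """Return (direct_m, reflected_m, extra_delay_s, level_ratio) for a single floor bounce.

    Image-source model: a rigid floor acts as a mirror, so the reflection arrives as if
    emitted by an image of the mouth at -mouth_height.
    """
    direct = math.hypot(horizontal_distance, mouth_height - mic_height)
    reflected = math.hypot(horizontal_distance, mouth_height + mic_height)
    extra_delay = (reflected - direct) / SPEED_OF_SOUND
    level_ratio = direct / reflected   # 1/r spreading: reflection level relative to direct sound
    return direct, reflected, extra_delay, level_ratio


# Hypothetical example: microphone 1.0 m high and 1.0 m away from a tall and a short speaker.
tall = floor_reflection(mouth_height=1.7, mic_height=1.0, horizontal_distance=1.0)
short = floor_reflection(mouth_height=1.3, mic_height=1.0, horizontal_distance=1.0)
print(f"tall:  reflection {tall[2] * 1000:.2f} ms after the direct sound, level x{tall[3]:.2f}")
print(f"short: reflection {short[2] * 1000:.2f} ms after the direct sound, level x{short[3]:.2f}")
```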

Each rHRTF can be measured using known techniques as appropriate. Known techniques can likewise measure each party's head movement in real time and transmit the yaw, pitch and roll of the speaker's and listener's heads by electrical or magnetic means. The same applies to voice communication and related functions. The following description assumes that the remote speaker's voice and head state already reach this apparatus in real time by electrical or magnetic means. It also assumes that the head state of the listener, who is the user, reaches the apparatus in real time in the same way.

FIG. 2 illustrates the flow of sound conversion. FIG. 3 illustrates the open-type sound collection mode, showing an example in which the sound source unit moves relative to the sound collection unit.
The sound adjustment device S1 is the first stage of the rHRTF created as described above. It receives instructions from a controller, through which the user can select one of four operating states:
1. headset-type monaural collection, with speaker input V0M;
2. handset-type monaural collection, likewise with input V0M;
3. open-type monaural collection, with input V'0M;
4. open-type stereo collection, with inputs V'0R and V'0L.
The sound adjustment device S1 acts as an effector. It takes the headset-type or handset-type monaural sound V0M and outputs the sound V1M in real time. The output incorporates the effects of the speaker's head movement: yaw ωs, pitch χs and roll ψs.
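The routing among the four operating states can be sketched as a small dispatcher. Here s1 and s2 are placeholders for the two rHRTF stages. They are assumed to already have the speaker's head angles (χs, ψs, ωs), the time t and the environment e bound in. The names Mode, render, s1 and s2 are hypothetical. For the headset and handset modes, the dispatcher evaluates the nested form S2(S1(V0M, ...), ...).

```python
from enum import Enum


class Mode(Enum):
    HEADSET_MONO = 1   # input [V0M]
    HANDSET_MONO = 2   # input [V0M]
    OPEN_MONO = 3      # input [V'0M]
    OPEN_STEREO = 4    # inputs [V'0L, V'0R]


def render(mode, inputs, s1, s2, actual, virtual_left, virtual_right):
    """Return (V2L, V2R) for one block of the speaker's audio.

    actual: [(x0l, y0l, z0l)] for the mono modes, or
            [(x0l, y0l, z0l), (x0r, y0r, z0r)] for OPEN_STEREO.
    s1(signal, pos): stage one, headset sound -> open-type sound at pos.
    s2(signal, from_pos, to_pos): stage two, move the pickup point from from_pos to to_pos.
    """
    if mode in (Mode.HEADSET_MONO, Mode.HANDSET_MONO):
        v1m = s1(inputs[0], actual[0])
        return s2(v1m, actual[0], virtual_left), s2(v1m, actual[0], virtual_right)
    if mode is Mode.OPEN_MONO:
        return (s2(inputs[0], actual[0], virtual_left),
                s2(inputs[0], actual[0], virtual_right))
    # OPEN_STEREO: each channel is moved from its own actual pickup position.
    return (s2(inputs[0], actual[0], virtual_left),
            s2(inputs[1], actual[1], virtual_right))


# Tracing stubs that only record which stages were applied, to show the routing.
def trace_s1(signal, pos):
    return ("S1", signal)


def trace_s2(signal, from_pos, to_pos):
    return ("S2", signal, to_pos)


left, right = render(Mode.HEADSET_MONO, ["V0M"], trace_s1, trace_s2,
                     [(0.0, 0.5, 0.0)], "L", "R")
```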

The head movement adjustment filter S1, the first stage of the rHRTF, can be described as follows.
This stage applies in the first and second modes, headset-type and handset-type monaural collection. The monaural sound V0M picked up by the speaker's sound collection device is converted by S1. The result is output as V1M (monaural) [point 1M], the sound that an open-type device at position (x0l, y0l, z0l) would have picked up.

Mono input, mono output:
V1M = S1(V0M, χs, ψs, ωs, t, x0l, y0l, z0l, e)
The variables are:
- (x0l, y0l, z0l): the offset from a reference point, for example the center of the speaker's head, to the virtual sound collection device;
- χs, ψs, ωs: the speaker's head variables (pitch, roll and yaw angles);
- t: time;
- e: a variable describing the environment, such as the room the speaker is in;
- V0M: the sound from the headset worn by the speaker.
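The time dependence of S1 could be realized in real time by looking up, frame by frame, a gain measured for the speaker's current head pose, then applying it to the headset signal. The sketch below assumes yaw only, a hypothetical measured gain table for a single virtual microphone position, and nearest-neighbour lookup. A fuller S1 would also:
- vary with pitch and roll;
- add delay and spectral shaping;
- crossfade between frames to avoid audible clicks at frame boundaries.

```python
def circular_gap(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180) % 360 - 180)


def s1_framewise(v0m, yaw_track, gain_table, frame_len):
    """Scale each frame of headset audio V0M by a gain chosen from the speaker's yaw.

    v0m        : headset samples
    yaw_track  : speaker yaw in degrees, one value per frame (omega_s sampled over t)
    gain_table : hypothetical measured {yaw_deg: linear gain} for one virtual mic position
    frame_len  : samples per frame
    """
    out = []
    for n, yaw in enumerate(yaw_track):
        # nearest measured yaw, using circular distance so that 179 and -179 are neighbours
        nearest = min(gain_table, key=lambda a: circular_gap(a, yaw))
        frame = v0m[n * frame_len:(n + 1) * frame_len]
        out.extend(gain_table[nearest] * x for x in frame)
    return out


GAINS = {0: 1.0, 90: 0.6, 180: 0.3, -90: 0.6}   # hypothetical values, not measurements
v1m = s1_framewise([1.0] * 6, yaw_track=[0, 80, 170], gain_table=GAINS, frame_len=2)
```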

(x0l, y0l, z0l) can take infinitely many values, but in practice it is sufficient to prepare a number of realistically expected points. Conversion may be needed at an unrecorded point M, for example one lying between recorded points A and B. The result for M can then be synthesized from the results at A and B.
In summary, the first stage S1 of the rHRTF is a function that makes full use of information about the speaker and the speaker's surroundings. It converts the sound V0M, picked up by a headset or similar device worn by the speaker, into the sound V1M. V1M sounds as though a microphone at some position (x0l, y0l, z0l) away from the speaker had picked it up.
The handset (monaural pickup) and loudspeaker combination in common use today corresponds to this headset type. In practice there are differences; for example, a handset is less affected by the vibration of nodding. Even so, a handset-specific rHRTF can be measured by a similar method.
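The simplest way to synthesize an unrecorded point M from recorded points A and B is a distance-weighted crossfade of the two measured impulse responses. The sketch below uses that naive approach for illustration only; the patent does not say how the results are combined. It projects M onto the segment AB and blends the responses sample by sample.

```python
def interpolate_response(pos_a, ir_a, pos_b, ir_b, pos_m):
    """Estimate the response at point M from responses measured at points A and B.

    M is projected onto segment AB; the weight w is 0 at A and 1 at B.
    """
    ab = [b - a for a, b in zip(pos_a, pos_b)]
    am = [m - a for a, m in zip(pos_a, pos_m)]
    w = sum(p * q for p, q in zip(ab, am)) / sum(c * c for c in ab)
    w = min(1.0, max(0.0, w))                      # stay on the segment
    n = max(len(ir_a), len(ir_b))
    ir_a = list(ir_a) + [0.0] * (n - len(ir_a))    # zero-pad to a common length
    ir_b = list(ir_b) + [0.0] * (n - len(ir_b))
    return [(1.0 - w) * a + w * b for a, b in zip(ir_a, ir_b)]


# Hypothetical measurements at two pickup points A and B; M lies a quarter of the way from A to B.
ir_m = interpolate_response((1.0, 0.0, 0.0), [1.0, 0.0],
                            (1.0, 1.0, 0.0), [0.0, 1.0],
                            (1.0, 0.25, 0.0))
```

Plain time-domain blending of responses whose arrival times differ yields two attenuated arrivals instead of one shifted arrival. Practical systems therefore often interpolate delay and magnitude separately.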

The first stage S1 yields the open-type-equivalent sound V1M (monaural) [point 1M]. The second stage S2 then converts it, and it is output as V2R / V2L (stereo) [point 2R / point 2L].
Stereo output sounds as though it had been picked up at two different points. For this case, the sound collection position adjustment filter S2, the second stage of the rHRTF, can be described, for example, as follows.
Mono input, stereo output (right channel):
V2R = S2(V1M, χs, ψs, ωs, t, x0l, y0l, z0l, xr, yr, zr, e)
Mono input, stereo output (left channel):
V2L = S2(V1M, χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
Here, (xr, yr, zr) is the position of the virtual sound collection device for the right channel, and (xl, yl, zl) is that for the left channel.
The right and left sound collection devices are preferably matched to the listener, or placed at the positions of the left and right ears of an average person. A corresponding sense of presence is then obtained even if the listener uses ordinary headphones. The second stage S2 processes the two channels in parallel in time.
When position (x0l, y0l, z0l) equals position (xl, yl, zl), the sound collection position adjustment filter S2 is bypassed, so that V2L = V1M.
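Below is a minimal free-field sketch of the relocation performed by S2, including the bypass rule. It assumes the mouth is a point source at a known position. It models only the 1/r level change and the extra propagation delay between the actual and the virtual pickup points. Head angles, shadowing and the environment e are ignored, and the function name s2_relocate is hypothetical. Stereo output is the same function applied twice in parallel, once toward (xl, yl, zl) and once toward (xr, yr, zr).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s


def s2_relocate(signal, mouth, actual_pos, virtual_pos, sample_rate=16000):
    """Re-express a signal picked up at actual_pos as if it had been picked up at virtual_pos."""
    if actual_pos == virtual_pos:
        return list(signal)                       # bypass: output equals input
    r0 = math.dist(mouth, actual_pos)
    r1 = math.dist(mouth, virtual_pos)
    gain = r0 / r1                                # 1/r spreading
    shift = round((r1 - r0) / SPEED_OF_SOUND * sample_rate)
    scaled = [gain * x for x in signal]
    if shift >= 0:
        return [0.0] * shift + scaled             # farther away: arrives later
    return scaled[-shift:]                        # closer: advance, needs look-ahead buffering


mouth = (0.0, 0.0, 0.0)
v1m = [1.0, 0.5, 0.25]
x0l = (1.0, 0.0, 0.0)
v2l = s2_relocate(v1m, mouth, x0l, (2.0, 0.2, 0.0))
v2r = s2_relocate(v1m, mouth, x0l, (2.0, -0.2, 0.0))
```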

In the open-type mode illustrated in FIG. 3, the microphone does not follow the movement of the speaker's head; it always stays still at one point. This can be regarded as the most natural way to pick up sound.
In the third mode, open-type monaural collection, the speaker's open-type monaural device picks up the monaural sound V'0M. The second stage S2 converts it. The output is either the monaural V2M, the open-type-equivalent sound at position (xl, yl, zl) [point 2L], or the stereo V2R / V2L [point 2R / point 2L], the open-type-equivalent sounds at (xr, yr, zr) and (xl, yl, zl). The conversion is as follows.

Mono input, mono output:
V2M = S2(V'0M, χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
Here, (x0l, y0l, z0l) is the position where the sound was actually picked up, and (xl, yl, zl) is the virtual sound collection position.
Thus, the sound collection position adjustment filter S2, the second stage of the rHRTF, also makes full use of information about the speaker and the speaker's surroundings. It converts the sound V'0M, picked up by a device fixed away from the speaker, into the sound V2M. V2M sounds as though a microphone at (xl, yl, zl) had picked it up, rather than one at the actual device position (x0l, y0l, z0l).

For stereo output, two instances of the above equation run in parallel, as before:
Mono input, stereo output (right channel):
V2R = S2(V'0M, χs, ψs, ωs, t, x0l, y0l, z0l, xr, yr, zr, e)
Mono input, stereo output (left channel):
V2L = S2(V'0M, χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
Here, (x0l, y0l, z0l) is the actual pickup position, (xl, yl, zl) is the virtual left pickup position, and (xr, yr, zr) is the virtual right pickup position.

In the fourth mode, open-type stereo collection, the speaker's open-type stereo device picks up the left and right sounds V'0L and V'0R. The second stage S2 converts them, and they are output as the optimized stereo sound V2R / V2L [point 2R / point 2L], as follows.
Right channel:
V2R = S2(V'0R, χs, ψs, ωs, t, x0r, y0r, z0r, xr, yr, zr, e)
Left channel:
V2L = S2(V'0L, χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
Here, (x0r, y0r, z0r) and (x0l, y0l, z0l) are the actual pickup positions of the right and left channels. (xr, yr, zr) and (xl, yl, zl) are the virtual pickup positions of the right and left channels after conversion.

If position (x0l, y0l, z0l) equals position (xl, yl, zl) and position (x0r, y0r, z0r) equals position (xr, yr, zr), the sound collection position adjustment filter S2 is bypassed, so that V2R = V'0R and V2L = V'0L.
Suppose the virtual left and right pickup positions coincide with the actual loudspeaker positions on the listener's side and open stereo reproduction is used. In that case, realistic sound can already be reproduced at this stage.
The pickup itself is preferably performed in the same way as binaural recording.
Reproduction through stereo loudspeakers installed at suitable positions away from the listener is referred to as open stereo reproduction.
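
In the open stereo case, each channel is moved from its own actual pickup position to its own virtual position. S2 is skipped when a channel's actual and virtual positions coincide. The sketch below uses the same free-field assumption as before: only level and delay change, and the environment term e is not modelled. It also assumes the mouth position has already been computed from χs, ψs and ωs. The function names and numbers are hypothetical.

```python
import numpy as np

C = 343.0   # speed of sound in m/s (assumed)
FS = 16000  # sample rate in Hz (assumed)


def shift(x, n):
    """Delay x by n samples (advance it if n < 0), zero-filled, same length."""
    y = np.zeros_like(x)
    if abs(n) >= len(x):
        return y
    if n >= 0:
        y[n:] = x[:len(x) - n]
    else:
        y[:n] = x[-n:]
    return y


def s2_channel(v, mouth, p_actual, p_virtual):
    """Free-field stand-in for S2 on one channel.

    mouth is the speaker's mouth position, assumed to have been derived
    beforehand from the head angles chi_s, psi_s, omega_s.
    """
    p_actual = np.asarray(p_actual, float)
    p_virtual = np.asarray(p_virtual, float)
    if np.array_equal(p_actual, p_virtual):
        return v                                  # S2 bypassed: V2 = V'0
    mouth = np.asarray(mouth, float)
    r0 = np.linalg.norm(mouth - p_actual)
    r = np.linalg.norm(mouth - p_virtual)
    lag = int(round((r - r0) / C * FS))           # extra travel time in samples
    return (r0 / r) * shift(np.asarray(v, float), lag)


def s2_open_stereo(v0r, v0l, mouth, p0r, p0l, pr, pl):
    """Open stereo pickup: each channel keeps its own actual position."""
    v2r = s2_channel(v0r, mouth, p0r, pr)
    v2l = s2_channel(v0l, mouth, p0l, pl)
    return v2r, v2l
```

Skipping S2 per channel, rather than only when both channels match, also covers a setup where only one virtual position differs from its real microphone.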

The above can be summarized as follows.
Headset type (monaural pickup):
Monaural pickup V0M, monaural output V1M: first stage S1 of the inverse head-related transfer function rHRTF
Monaural pickup V0M, stereo output V2R, V2L: first stage S1 and second stage S2 of the inverse head-related transfer function rHRTF
V1M = S1(V0M, χs, ψs, ωs, t, x0l, y0l, z0l, e)
V2R = S2(V1M, χs, ψs, ωs, t, x0l, y0l, z0l, xr, yr, zr, e)
V2L = S2(V1M, χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
Expanded, this becomes:
V2R = S2(S1(V0M, χs, ψs, ωs, t, x0l, y0l, z0l, e), χs, ψs, ωs, t, x0l, y0l, z0l, xr, yr, zr, e)
V2L = S2(S1(V0M, χs, ψs, ωs, t, x0l, y0l, z0l, e), χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
The inputs are:
the headset sound V0M,
the speaker's head movements χs, ψs, ωs,
the time t,
the virtual surrounding environment e,
the virtual primary pickup position (x0l, y0l, z0l), and
the virtual left and right pickup positions (xl, yl, zl) and (xr, yr, zr).
From these, one obtains V2L and V2R as if the sound had been picked up at the virtual left position (xl, yl, zl) and the virtual right position (xr, yr, zr).
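
In the headset case the microphone moves with the mouth. The recorded signal therefore carries none of the distance changes caused by head movement. As an illustration only, S1 can be modelled as moving a microphone held at a constant, hypothetical 5 cm from the mouth onto the fixed primary position (x0l, y0l, z0l). S2 then moves that fixed pickup to the virtual left and right positions. The free-field model, mouth offset and all constants are assumptions. Mouth directivity and the environment e are ignored.

```python
import numpy as np

C, FS = 343.0, 16000                 # speed of sound (m/s), sample rate (Hz); assumed
D_HEADSET = 0.05                     # hypothetical mouth-to-headset-mic distance, m
MOUTH = np.array([0.1, 0.0, 0.0])    # hypothetical mouth offset from head centre, m


def rotation(yaw, pitch, roll):
    """Rz(yaw) @ Ry(pitch) @ Rx(roll); frame assumed x forward, y left, z up."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx


def shift(x, n):
    """Delay x by n samples (advance it if n < 0), zero-filled, same length."""
    y = np.zeros_like(x)
    if abs(n) >= len(x):
        return y
    if n >= 0:
        y[n:] = x[:len(x) - n]
    else:
        y[:n] = x[-n:]
    return y


def mouth_position(chi, psi, omega, head=(0.0, 0.0, 0.0)):
    """Mouth position for pitch chi, roll psi and yaw omega (radians)."""
    return np.asarray(head, float) + rotation(omega, chi, psi) @ MOUTH


def retarget(v, r_from, r_to):
    """Free-field change from a mic at distance r_from to one at distance r_to."""
    lag = int(round((r_to - r_from) / C * FS))
    return (r_from / r_to) * shift(np.asarray(v, float), lag)


def s1(v0m, chi, psi, omega, p0):
    """S1 sketch: headset signal -> as if picked up at the fixed position p0."""
    r1 = np.linalg.norm(mouth_position(chi, psi, omega) - np.asarray(p0, float))
    return retarget(v0m, D_HEADSET, r1)


def s2(v1m, chi, psi, omega, p0, p):
    """S2 sketch: fixed pickup at p0 -> fixed pickup at p."""
    m = mouth_position(chi, psi, omega)
    return retarget(v1m, np.linalg.norm(m - np.asarray(p0, float)),
                    np.linalg.norm(m - np.asarray(p, float)))


def headset_to_stereo(v0m, chi, psi, omega, p0l, pl, pr):
    """Return (V2R, V2L) = S2(S1(V0M)) for the right and left targets."""
    v1m = s1(v0m, chi, psi, omega, p0l)
    return s2(v1m, chi, psi, omega, p0l, pr), s2(v1m, chi, psi, omega, p0l, pl)
```

When the virtual left position equals the primary position, this S2 leaves V1M unchanged on the left channel.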

Suppose the virtual primary pickup position (x0l, y0l, z0l) equals the virtual left pickup position (xl, yl, zl). Then the right channel receives both head movement correction and position correction, while the left channel receives only head movement correction. The left channel no longer needs the sound collection position adjustment filter S2, and the following equations suffice:
V2R = S2(S1(V0M, χs, ψs, ωs, t, x0l, y0l, z0l, e), χs, ψs, ωs, t, x0l, y0l, z0l, xr, yr, zr, e)
V2L = S1(V0M, χs, ψs, ωs, t, x0l, y0l, z0l, e)
Now suppose the pickup is of the open type, performed in stereo at positions (x0l, y0l, z0l) and (x0r, y0r, z0r). In that case the head movement adjustment filter S1 step can be omitted and only the sound collection position adjustment filter S2 step used, which reduces the amount of computation.

Open type (monaural pickup at (x0l, y0l, z0l)):
Monaural pickup V'0M, monaural output V2M (= V2L): second stage S2 of the inverse head-related transfer function rHRTF
V2M = S2(V'0M, χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
Monaural pickup V'0M, stereo output V2R, V2L: second stage S2 of the inverse head-related transfer function rHRTF
V2R = S2(V'0M, χs, ψs, ωs, t, x0l, y0l, z0l, xr, yr, zr, e)
V2L = S2(V'0M, χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
Open type (stereo pickup at (x0l, y0l, z0l) and (x0r, y0r, z0r)):
Stereo pickup V'0R, V'0L, stereo output V2R, V2L: second stage S2 of the inverse head-related transfer function rHRTF
V2R = S2(V'0R, χs, ψs, ωs, t, x0r, y0r, z0r, xr, yr, zr, e)
V2L = S2(V'0L, χs, ψs, ωs, t, x0l, y0l, z0l, xl, yl, zl, e)
By combining the head movement adjustment filter S1 and the sound collection position adjustment filter S2 to suit the pickup system in use, realistic sound V2R and V2L, or V2M, can be obtained.
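
The summary above amounts to a routing rule. Which of S1 and S2 runs depends on the pickup form, and S2 can be skipped on any channel whose virtual position equals its actual one. The sketch below implements that rule and assumes S1 and S2 are supplied as callables with the head angles, t and e already bound. The pickup labels and signatures are hypothetical. They are not claimed to match the four positions of the pickup-type switch.

```python
from typing import Callable, Optional, Sequence, Tuple

import numpy as np

Pos = Tuple[float, float, float]
# Hypothetical filter signatures: head angles, t and e are assumed to be
# bound inside these callables already.
S1Fn = Callable[[np.ndarray, Pos], np.ndarray]          # (V0M, p0) -> V1M
S2Fn = Callable[[np.ndarray, Pos, Pos], np.ndarray]     # (V, p_from, p_to) -> V2


def route(pickup: str, s1: S1Fn, s2: S2Fn, signals: Sequence[np.ndarray],
          p0l: Pos, pl: Pos, pr: Optional[Pos] = None, p0r: Optional[Pos] = None):
    """Combine S1/S2 by pickup type; returns (V2R, V2L) or a 1-tuple (mono)."""
    def place(v, p_from, p_to):
        # S2 is skipped when the virtual position equals the actual one.
        return v if p_from == p_to else s2(v, p_from, p_to)

    if pickup == "headset":                  # mono headset: S1, then S2
        v1m = s1(signals[0], p0l)
        if pr is None:
            return (v1m,)                    # mono output V1M
        return place(v1m, p0l, pr), place(v1m, p0l, pl)
    if pickup == "open_mono":                # S1 omitted, S2 only
        if pr is None:
            return (place(signals[0], p0l, pl),)     # V2M (= V2L)
        return place(signals[0], p0l, pr), place(signals[0], p0l, pl)
    if pickup == "open_stereo":              # per-channel S2, own actual positions
        v0r, v0l = signals
        return place(v0r, p0r, pr), place(v0l, p0l, pl)
    raise ValueError(f"unknown pickup type: {pickup}")
```

Passing S1 and S2 in as callables keeps the routing independent of how the filters themselves are realized, whether measured responses or a geometric model.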

FIG. 4 is an explanatory diagram showing head movement on the listener's side.
Assume a face-to-face conversation in which the speaker talks while keeping his or her head still. If the listener turns his or her head sideways, the listener hears the sound change differently than when the head is still. Naturally, if the listener turns to the right, the listener's left ear moves closer to the speaker's mouth. The left ear then picks up a louder sound and, conversely, the right ear picks up a quieter one. Not only the volume but also the spectrum and the arrival time difference change. This is generally described by the head-related transfer function HRTF.
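
The time-difference part of this effect can be estimated with Woodworth's spherical-head approximation, ITD = (a / c)(θ + sin θ). This standard textbook model serves here as a stand-in for a measured HRTF. The head radius, the speed of sound and the sign convention (azimuth and yaw positive toward the listener's left) are assumptions. The example reproduces the scenario in the text, with the speaker straight ahead and the listener turning 30° to the right.

```python
import math

HEAD_RADIUS = 0.0875  # m, a commonly used average head radius (assumption)
C = 343.0             # speed of sound in m/s (assumed)


def relative_azimuth(source_azimuth: float, listener_yaw: float) -> float:
    """Source azimuth seen from the listener's head after a yaw rotation.

    Angles in radians, positive toward the listener's left; wrapped to (-pi, pi].
    """
    a = source_azimuth - listener_yaw
    return math.atan2(math.sin(a), math.cos(a))


def woodworth_itd(azimuth: float) -> float:
    """Interaural time difference in seconds for a distant source.

    Positive result: the sound reaches the left ear first.
    """
    theta = abs(azimuth)
    if theta > math.pi / 2:       # fold rear sources onto the front hemisphere
        theta = math.pi - theta
    itd = HEAD_RADIUS / C * (theta + math.sin(theta))
    return math.copysign(itd, azimuth)


# Speaker straight ahead; listener turns 30 degrees to the right (negative yaw).
az = relative_azimuth(0.0, math.radians(-30.0))
print(f"ITD = {woodworth_itd(az) * 1e3:.3f} ms (left ear leads)")
```

With these constants, the 30° turn gives an ITD of roughly a quarter of a millisecond in favour of the left ear, matching the qualitative description above.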

The sound adjustment device H applies the head-related transfer function HRTF for the listener's head movement. It takes into account the listener's yaw ωh, pitch χh and roll ψh, and changes the volume and timbre of the speaker's sounds V2R / V2L, which the preceding sound adjustment devices S1/S2 have already converted as needed. Device H, too, acts as a kind of sound effector. On the device H side, the speaker's yaw ωs, pitch χs and roll ψs are completely independent of the listener's angles.
As with the inverse head-related transfer function rHRTF, either a listener-specific HRTF or an average human HRTF can be used. At this stage, the listener (the user) can also change the virtual position from which the speaker is talking, freely and in real time.
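
A measured HRTF also models head shadowing and pinna filtering, which are out of scope here. The sketch below is a rough, hypothetical stand-in for device H. It rotates the listener's ears by (χh, ψh, ωh), then scales and delays V2R / V2L according to each ear's free-field distance from a virtual talker position. Because that position is a plain argument, it can be changed on every call, which mirrors the real-time repositioning mentioned above. The ear spacing, the coordinate frame and the constants are assumptions.

```python
import numpy as np

C, FS = 343.0, 16000   # speed of sound (m/s) and sample rate (Hz), assumed
EAR_OFFSET = 0.0875    # hypothetical distance from head centre to each ear, m


def rotation(yaw, pitch, roll):
    """Rz(yaw) @ Ry(pitch) @ Rx(roll); frame assumed x forward, y left, z up."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx


def shift(x, n):
    """Delay x by n samples (advance it if n < 0), zero-filled, same length."""
    y = np.zeros_like(x)
    if abs(n) >= len(x):
        return y
    if n >= 0:
        y[n:] = x[:len(x) - n]
    else:
        y[:n] = x[-n:]
    return y


def h_filter(v2r, v2l, talker, chi_h, psi_h, omega_h, listener=(0.0, 0.0, 0.0)):
    """Free-field stand-in for device H (no head shadow, no pinna cues).

    Each channel is scaled and delayed relative to the listener's head centre
    according to the distance from `talker` to the rotated ear.
    """
    rot = rotation(omega_h, chi_h, psi_h)
    centre = np.asarray(listener, float)
    talker = np.asarray(talker, float)
    ref = np.linalg.norm(talker - centre)        # distance to the head centre
    out = []
    for v, side in ((v2r, -1.0), (v2l, 1.0)):    # right ear at -y, left ear at +y
        ear = centre + rot @ np.array([0.0, side * EAR_OFFSET, 0.0])
        r = np.linalg.norm(talker - ear)
        lag = int(round((r - ref) / C * FS))     # negative lag: arrives earlier
        out.append((ref / r) * shift(np.asarray(v, float), lag))
    return out[0], out[1]                        # (V3R, V3L)
```

In this model, a listener who turns to the right (negative ωh) with the talker 50 cm ahead hears the left channel louder and earlier, as the FIG. 4 description states.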

The sound adjustment device H operates on instructions from the controller, and the user can select between two operating states.
The first is open stereo reproduction. It does not use the effect of the listener's head movement and plays the output through loudspeakers or similar devices placed away from the listener.
The second is headset stereo reproduction. It takes the listener's head movement into account and uses the head-related transfer function HRTF to maximize the sense of presence, playing the output through headphones or similar devices.
In open stereo reproduction, placing the two parties too close to each other makes playback unnatural and weakens the sense of presence, so headphone reproduction is generally preferable.
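
The two operating states of device H can be expressed as a simple switch. Open stereo reproduction passes V2R / V2L through without using the listener's head movement, while headset stereo reproduction applies the HRTF stage with the listener's angles. The sketch below uses a hypothetical enum and an injected H callable. The names are invented for illustration, and the two modes correspond to the controller's two-position reproduction-device switch.

```python
from enum import Enum
from typing import Callable, Tuple

import numpy as np

Stereo = Tuple[np.ndarray, np.ndarray]
Angles = Tuple[float, float, float]          # (chi_h, psi_h, omega_h)


class Playback(Enum):
    OPEN_STEREO = "open"          # loudspeakers placed away from the listener
    HEADSET_STEREO = "headset"    # headphones, listener head movement used


def reproduce(v2: Stereo, mode: Playback,
              h: Callable[[Stereo, Angles], Stereo], angles: Angles) -> Stereo:
    """Apply or skip the HRTF stage H according to the playback mode."""
    if mode is Playback.OPEN_STEREO:
        return v2                 # listener head movement is not used
    return h(v2, angles)          # headset: HRTF with the listener's angles
```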

After passing through the multi-stage sound adjustment devices S1/S2 and H described above, the speaker's voice reaches the listener with a greater sense of presence, reflecting the conditions of both speaker and listener.
Ideally, to communicate while preserving the presence of sound, both parties would talk using open-type stereo pickup with multiple pickup devices and open-type stereo reproduction with multiple reproduction devices such as loudspeakers. However, an open setup occupies space and leaks sound to other people nearby, so this approach is hard to implement easily.
Therefore, the inverse head-related transfer function rHRTF is applied when either party, or both, does not use the combination of open-type stereo pickup and open-type stereo reproduction. The rHRTF converts a speaker's headset speech into a state as if it had been picked up with the open type. Corrections are applied as far as possible with inexpensive and simple means. The result approaches an actual face-to-face conversation, that is, the state in which both speaker and listener use open-type stereo pickup and open-type stereo reproduction.

FIG. 5 is an explanatory diagram showing a specific usage example that combines the inverse head-related transfer function rHRTF and the head-related transfer function HRTF.
In ordinary voice communication, the listener simply hears the speaker's voice without any rHRTF conversion. This apparatus instead supplies several inputs to the sound adjustment devices S1/S2 (rHRTF execution means) and to the sound adjustment device H (HRTF execution means): the speaker's headset voice, the head movements of speaker and listener, their virtual positional relationship, and the surrounding environment. The result is a spoken dialogue with a strong sense of presence. In practice, this takes the form shown in FIG. 6.

FIG. 6 is an explanatory diagram illustrating the controller.
The controller integrates the sound adjustment devices S1/S2 and H. It has input terminals for the speaker's sound (V1, etc.), the speaker's head movements χs, ψs, ωs and the listener's head movements χh, ψh, ωh. It also has a loudspeaker, a headphone terminal and similar outputs for the sounds V3R / V3L.
A loudspeaker that outputs V3R / V3L can also be mounted in the housing, in which case a volume switch is also provided. An external input/output terminal may also be provided for exchanging personal information, the individual inverse head-related transfer functions rHRTF and the head-related transfer function HRTF with other devices or a PC. Personal information, such as the center position of each of the listener's movements and the positions of the left and right ears, can also be corrected while not communicating. This can be done, for example, through the built-in interactive interface operated with the multipurpose switch.

During communication, the user can control the following:
the type of the remote speaker's pickup device (four types), with the four-position switch (12);
the type of the user's (listener's) reproduction device (two types), with the two-position switch (13);
the volume, with the rotary knob (14);
a typical positional relationship between the two parties, with the four-position switch (15).
Typical positional relationships can be set in advance while not communicating, so that they can be selected easily. Examples include:
facing each other at the same height, about 50 cm apart;
sitting side by side, for example on the same bench;
each talking right next to the other's left ear;
talking back to back.
Configurations that would be topologically impossible in reality are also possible.
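
The preset positional relationships listed above can be held in a small lookup table keyed by switch position. In this sketch only the roughly 50 cm face-to-face distance comes from the text. The other coordinates, the facing angles and the mapping to the four switch positions are hypothetical. Nothing in the table checks whether a placement is physically plausible, consistent with the remark that arrangements impossible in reality are allowed.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Placement:
    """Virtual talker placement in the listener's head frame.

    Assumed frame: x forward, y left, z up (metres); facing_yaw is the
    direction the talker faces, in radians (0 = same way as the listener).
    """
    position: Vec3
    facing_yaw: float


# Only the ~50 cm face-to-face distance comes from the text; the other
# numbers and the mapping to switch positions are hypothetical.
PRESETS: Dict[int, Placement] = {
    1: Placement((0.50, 0.0, 0.0), math.pi),        # face to face, same height
    2: Placement((0.0, 0.60, 0.0), 0.0),            # side by side on a bench
    3: Placement((0.0, 0.15, 0.0), -math.pi / 2),   # talking at the left ear
    4: Placement((-0.30, 0.0, 0.0), math.pi),       # back to back
}


def select_preset(switch_position: int) -> Placement:
    """Return the placement for a switch position, rejecting unknown ones."""
    try:
        return PRESETS[switch_position]
    except KeyError:
        raise ValueError(f"no preset for switch position {switch_position}") from None
```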

Changed settings are transmitted to each of the sound adjustment devices S1/S2 and H and applied in real time. For example, the 16-position switch (16), which includes a numeric keypad, selects the type of processing performed by each sound adjustment device S1/S2 and H during communication.
The display (11) shows the current positional relationship between the two parties and reproduces the remote party's and the user's own head movements with a simple agent.

The sound reproduction method and apparatus of the present invention can fully reproduce the sense of presence produced by head movement, for example during a face-to-face conversation between remote locations. This holds even when a headset monaural pickup device and a headset stereo reproduction device are used. The invention therefore has a wide range of applications and is highly useful industrially.

FIG. 1: Explanatory drawing showing the headset-type pickup form, in which the sound collection unit is fixed to the sound source unit.
FIG. 2: Explanatory diagram showing the flow of converting audio.
FIG. 3: Explanatory drawing showing the open-type pickup form, in which the sound collection unit is separated from the sound source unit by space and is independent of it.
FIG. 4: Explanatory diagram showing head movement on the listener's side.
FIG. 5: Explanatory drawing showing the main configuration of the apparatus that converts audio.
FIG. 6: Explanatory diagram illustrating the controller.

Explanation of symbols

11 Display
12 to 16 Switches

Claims

A sound reproduction method for communication between communication bodies, each provided with a sound source unit that emits sound and a sound collection unit that collects sound. The method reproduces sound equivalent to pickup in an open-type pickup form, in which the sound collection unit is not fixed relative to the sound source unit. It does so even when at least one of the communication bodies uses a headset-type pickup form, in which the sound collection unit is fixed relative to the sound source unit. The method is characterized in that,
when the speaker-side communication body moves spatially,
information indicating the spatial movement of the sound source unit of the speaker-side communication body is used, and
the sound collected by the sound collection unit of the speaker-side communication body is converted by an inverse head-related transfer function that derives, from the change in position of the sound source unit, the change in sound corresponding to pickup by a sound collection unit fixed at a virtual position.

The sound reproduction method according to claim 1, wherein
when the listener-side communication body moves spatially,
information indicating the spatial movement of the sound collection unit of the listener-side communication body is used, and
the sound transmitted from the speaker-side communication body is converted by a head-related transfer function that derives, from the change in position of the sound collection unit, the change in sound corresponding to pickup by a sound collection unit fixed at a virtual position.

The sound reproduction method according to claim 1 or 2, wherein
the communication body is a person, and
the spatial movement is expressed by three angles of the person's head:
a horizontal rotation angle (yaw angle), for rotation in a substantially horizontal direction;
a front-back rotation angle (pitch angle), for rotation in the front-back direction;
a left-right rotation angle (roll angle), for rotation in the left-right direction.

A sound reproduction apparatus for communication between communication bodies, each provided with a sound source unit that emits sound and a sound collection unit that collects sound. The apparatus reproduces sound equivalent to pickup in an open-type pickup form, in which the sound collection unit is not fixed relative to the sound source unit. It does so even when at least one of the communication bodies uses a headset-type pickup form, in which the sound collection unit is fixed relative to the sound source unit. The apparatus is characterized by comprising:
sound source unit movement transmission means that, when the speaker-side communication body moves spatially, measures and outputs information indicating the spatial movement of the sound source unit of the speaker-side communication body; and
sound adjustment means that converts the sound collected by the sound collection unit of the speaker-side communication body by an inverse head-related transfer function that derives, from the change in position of the sound source unit, the change in sound corresponding to pickup by a sound collection unit fixed at a virtual position.

The sound reproduction apparatus according to claim 4, further comprising:
sound collection unit movement transmission means that, when the listener-side communication body moves spatially, measures and outputs information indicating the spatial movement of the sound collection unit of the listener-side communication body; and
sound adjustment means that converts the sound transmitted from the speaker-side communication body by a head-related transfer function that derives, from the change in position of the sound collection unit, the change in sound corresponding to pickup by a sound collection unit fixed at a virtual position.