JP6087856B2

JP6087856B2 - Sound field recording and reproducing apparatus, system, method and program

Info

Publication number: JP6087856B2
Application number: JP2014047024A
Authority: JP
Inventors: 翔一小山; 島内　末廣; 末廣島内; 仲大室
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-03-11
Filing date: 2014-03-11
Publication date: 2017-03-01
Anticipated expiration: 2034-03-11
Also published as: JP2015171111A

Description

The present invention relates to wave field synthesis, a technique in which sound signals are picked up by a microphone array installed in a sound field and that sound field is reproduced by a speaker array using the picked-up signals.

Wave field synthesis is a technique for virtually reproducing the sound field of a remote location using a plurality of microphones and speakers. Applications such as telecommunication systems require real-time recording and reproduction, so the sound pressure picked up by an ordinary microphone array must be uniquely convertible into sound field reproduction signals that can be output by an ordinary speaker array.

The method described in Non-Patent Document 1 is known as a conventional wave field synthesis technique.

Koyama, Furuya, Hiwasaki, Haneda, "Spatio-Temporal Frequency Domain Signal Conversion Method for Sound Field Recording and Reproduction", Proceedings of the Acoustical Society of Japan Autumn Meeting, pp. 635-636, 2011

However, in the conventional technique, when a sound field captured by a microphone array is reproduced with a speaker array, errors arise from spatial aliasing, which depends on the spacing of the microphones and of the speakers. This is particularly pronounced when the number of microphones is smaller than the number of speakers. Errors caused by spatial aliasing degrade the sound quality and localization in the reproduced sound field.

An object of the present invention is to provide a sound field recording and reproducing apparatus, system, method, and program capable of recording and reproducing a sound field with high accuracy using ordinary microphones and speakers.

To solve the above problem, a sound field recording and reproducing apparatus according to a first aspect of the invention includes a signal decomposition processing unit that separates time-frequency domain signals, generated from signals picked up by a plurality of microphones, into a first signal, which is the component originating from point sound sources, and a second signal, which is the component other than the first signal. The separation is based on latent point sound source positions, a set of positions at which point sound sources are assumed to potentially exist.

A sound field recording and reproducing apparatus according to a second aspect of the invention includes a drive signal synthesis unit that combines a point sound source drive signal and a residual drive signal to generate speaker drive signals for reproducing the wavefront of the signals picked up by the microphones. The point sound source drive signal is generated from a first signal obtained by separating the component originating from point sound sources out of the signals picked up by a plurality of microphones, and the residual drive signal is generated from a second signal different from the first signal.

A sound field recording and reproducing system according to a third aspect of the invention includes a sound field recording device and a sound field reproducing device. The sound field recording device includes: a signal decomposition processing unit that separates time-frequency domain signals, generated from signals picked up by a plurality of microphones, into a first signal originating from point sound sources and a second signal consisting of the remaining component, based on latent point sound source positions, a set of positions at which point sound sources are assumed to potentially exist; a point sound source drive signal calculation unit that obtains a point sound source drive signal corresponding to a sound pressure gradient based on the first signal and the latent point sound source positions; and a residual drive signal calculation unit that obtains a residual drive signal corresponding to a sound pressure gradient by applying a wavefront reconstruction filter to the second signal. The sound field reproducing device includes a drive signal synthesis unit that combines the point sound source drive signal and the residual drive signal to generate speaker drive signals for reproducing the wavefront of the signals picked up by the microphones.

According to the sound field recording and reproducing technique of the invention, errors due to spatial aliasing can be reduced, which mitigates the degradation of sound quality and localization in the reproduced sound field.

FIG. 1 is a diagram for explaining an example of the arrangement of microphones and speakers.
FIG. 2 is a diagram illustrating the functional configuration of the sound field recording and reproducing apparatus of the first embodiment.
FIG. 3 is a diagram for explaining an example of setting the latent point sound source positions.
FIG. 4 is a diagram illustrating the processing flow of the sound field recording and reproducing method of the first embodiment.
FIG. 5 is a diagram illustrating the functional configuration of the sound field recording and reproducing apparatus of the second embodiment.
FIG. 6 is a diagram illustrating the processing flow of the sound field recording and reproducing method of the second embodiment.

Embodiments of the invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and redundant description is omitted.

In the following description, symbols such as "⁻", "~", and "^" used in the text should properly be written directly above the preceding character, but because of the limitations of text notation they are written immediately after that character. In equations, these symbols are written in their proper positions. Processing performed element by element on a vector or matrix applies to all elements of that vector or matrix unless otherwise noted.

The sound field recording and reproducing technique of the invention decomposes the sound pressure picked up by the microphone array into a component originating from point sound sources and a residual component, and converts each of them into drive signals separately, thereby reducing errors due to spatial aliasing. The first embodiment is an apparatus and method for the case where the positions at which point sound sources actually exist are not given in advance. The second embodiment is an apparatus and method for the case where those positions are given in advance.

[First Embodiment]
As shown in FIG. 1, the sound field recording and reproducing apparatus and method of the first embodiment use a microphone array consisting of M microphones 1_1, …, 1_M placed in a first space on the recording side and a speaker array consisting of L speakers 2_1, …, 2_L placed in a second space on the reproduction side. With these, the sound field formed in the first space by sound emitted from a sound source S in the first space is reproduced in the second space. The first space and the second space may be different spaces or the same space. In FIG. 1, the sound source reproduced in the second space by the L speakers 2_1, …, 2_L is denoted as sound source S' (hereinafter also called "virtual sound source S'"). FIG. 1 shows a single sound source S in the first space, but there may be several. In that case, the number of virtual sound sources S' in the second space equals the number of sound sources S.

The M microphones 1_1, …, 1_M may be arranged arbitrarily, for example on a line, a plane, a circle, or a sphere. Likewise, the L speakers 2_1, …, 2_L may be arranged arbitrarily, for example on a line, a plane, a circle, or a sphere. The microphone arrangement and the speaker arrangement need not be the same. For example, as shown in FIG. 1, sound may be picked up by microphones 1_1, …, 1_M arranged in a planar array and reproduced by speakers 2_1, …, 2_L arranged in a linear array in which the vertical dimension has been dropped.

As shown in FIG. 2, the sound field recording and reproducing apparatus 10 of the first embodiment includes, for example, a latent point sound source position setting unit 11, a frequency conversion unit 12, a signal decomposition processing unit 13, a point sound source drive signal calculation unit 14, a residual drive signal calculation unit 15, a drive signal synthesis unit 16, and a frequency inverse conversion unit 17, and performs the processing of each step illustrated in FIG. 4.

The sound field recording and reproducing apparatus 10 is, for example, a special apparatus configured by loading a special program into a known or dedicated computer having a central processing unit (CPU), a main storage device (random access memory, RAM), and the like. The apparatus 10 executes each process, for example, under the control of the central processing unit. Data input to the apparatus 10 and data obtained in each process are stored, for example, in the main storage device, and are read out as needed for use in other processes.

In step S11, the latent point sound source position setting unit 11 provides in advance the positions r_ps ∈ R^(3×N) at which point sound sources are assumed to potentially exist (hereinafter called latent point sound source positions). Here, R is the set of real numbers and N is the number of latent point sound source positions. The latent point sound source positions r_ps are three-dimensional positions relative to the microphone array. Any coordinate system may be used for the three-dimensional positions, for example a Cartesian coordinate system with three axes, or a polar coordinate system expressed by a radius and angles, such as a cylindrical or spherical coordinate system. The latent point sound source positions r_ps are set, for example, at positions obtained by sampling the first space on the recording side on a grid, as shown in FIG. 3.

The latent point sound source positions can be set at arbitrary positions. There is a trade-off: if the positions are set densely, the signal decomposition becomes difficult, and if they are set sparsely, the residual signal, that is, the part not explained by the point sound sources, becomes large. Concretely, the latent point sound source positions are set, for example, as a grid with a spacing of about 10 cm.
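A grid of latent point sound source positions of this kind could be generated as in the following Python sketch. The function name `make_latent_grid`, the Cartesian axis ranges, and the use of NumPy are illustrative assumptions; only the 3×N layout and the roughly 10 cm spacing come from the description.

```python
import numpy as np

def make_latent_grid(x_range, y_range, z_range, spacing=0.1):
    """Return candidate point-source positions as a 3 x N array (metres).

    Each *_range is a (min, max) pair relative to the microphone array.
    The helper name and argument layout are hypothetical.
    """
    # Half a step is added to the upper bound so that it is included
    # despite floating-point rounding in np.arange.
    xs = np.arange(x_range[0], x_range[1] + spacing / 2, spacing)
    ys = np.arange(y_range[0], y_range[1] + spacing / 2, spacing)
    zs = np.arange(z_range[0], z_range[1] + spacing / 2, spacing)
    gx, gy, gz = np.meshgrid(xs, ys, zs, indexing="ij")
    # Stack the flattened coordinates so that column n is position r_ps[n].
    return np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=0)

# Example: a horizontal 2 m x 2 m region in front of the array, 10 cm spacing.
r_ps = make_latent_grid((-1.0, 1.0), (0.5, 2.5), (0.0, 0.0))
```

A denser grid increases N and makes the later decomposition harder, which is the trade-off noted above.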

The latent point sound source positions may also be set outside the recording space. For example, when recording in a room, latent point sound source positions may be set inside the walls of the room or in the corridor outside it. Latent point sound sources set in this way can serve as virtual sources of reflected sound when reflections from the walls are taken into account.

The microphones 1_1, …, 1_M placed in the first space pick up the sound emitted by the sound source S in the first space and generate time-domain signals. The generated time-domain signals are sent to the frequency conversion unit 12. The time-domain signal at time t picked up by the m-th microphone 1_m (m = 1, …, M) is denoted p_m(t).

In step S12, the frequency conversion unit 12 converts the time-domain signal p_m(t) picked up by microphone 1_m into a time-frequency domain signal p_m⁽ⁱ⁾(ω) by a Fourier transform. The generated time-frequency domain signal p_m⁽ⁱ⁾(ω) is sent to the signal decomposition processing unit 13. Here, i is the index of the time frame and ω is the temporal frequency. The time-frequency domain signal p_m⁽ⁱ⁾(ω) is generated, for example, by a short-time discrete Fourier transform. It may of course be generated by other existing methods, and a method such as overlap-add may also be used. When the input signal is long, or when signals are input continuously as in real-time processing, processing is performed frame by frame, for example every 10 milliseconds. The time-frequency domain signal p_m⁽ⁱ⁾(ω) is defined, for example, as follows.

Here, j in the argument of the exp function is the imaginary unit.
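The frame-wise conversion of step S12 can be illustrated with the following Python sketch of a short-time discrete Fourier transform. The Hann window, the 20 ms frame length, and the function name `stft_frames` are assumptions made for illustration; the description only states that frames are processed, for example, every 10 milliseconds.

```python
import numpy as np

def stft_frames(p, frame_len, hop):
    """Short-time DFT of multichannel signals.

    p: real array of shape (M, T), one row per microphone.
    Returns a complex array of shape (M, n_frames, frame_len // 2 + 1),
    indexed by microphone m, frame i, and frequency bin.
    """
    window = np.hanning(frame_len)  # assumed analysis window
    n_frames = 1 + (p.shape[1] - frame_len) // hop
    frames = np.stack(
        [p[:, i * hop:i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=-1)

# Example: 4 microphones, 0.1 s of signal at 16 kHz, 10 ms hop.
fs = 16000
p = np.random.default_rng(0).standard_normal((4, fs // 10))
spec = stft_frames(p, frame_len=320, hop=160)
```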

In step S13, the signal decomposition processing unit 13 separates the time-frequency domain signal p⁽ⁱ⁾(ω) ∈ C^M of the signals acquired by the microphone array into a first signal q⁽ⁱ⁾(ω) ∈ C^N (hereinafter also called the point sound source signal), which is the component originating from point sound sources assumed to exist at the latent point sound source positions r_ps, and a second signal h⁽ⁱ⁾(ω) ∈ C^M (hereinafter also called the residual signal), which is the component other than the assumed point sound source signal. The separation is based on the latent point sound source positions r_ps ∈ R^(3×N), the set of positions at which point sound sources are assumed to potentially exist. The generated point sound source signal q⁽ⁱ⁾(ω) is sent to the point sound source drive signal calculation unit 14. The generated residual signal h⁽ⁱ⁾(ω) is sent to the residual drive signal calculation unit 15. Here, C is the set of complex numbers, R is the set of real numbers, M is the number of microphones, and N is the number of latent point sound source positions.

The point sound source signal q⁽ⁱ⁾(ω) and the residual signal h⁽ⁱ⁾(ω) are obtained by decomposing the time-frequency domain signal p⁽ⁱ⁾(ω) so as to satisfy the following equation.

$$p^{(i)}(\omega) = D\,q^{(i)}(\omega) + h^{(i)}(\omega)$$

Here, D ∈ C^(M×N) is a matrix whose elements are the transfer functions of point sound sources from each latent point sound source position to each microphone position. For example, as shown in FIG. 3, the (m,n)-th element D_mn of the matrix D is the normalized transfer function from the n-th latent point sound source position r_ps[n] to the m-th microphone position r_m[m]. The element D_mn of the matrix D is defined by the following equation.

Here, k is the wave number, defined as k = ω/c using the temporal frequency ω and the speed of sound c. The wave number corresponds to what is called the spatial frequency or angular spectrum.
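As a rough illustration of how a matrix such as D might be assembled, the following Python sketch uses the free-field Green's function of a point source with wave number k = ω/c. The sign of the exponent, the 1/(4πr) factor, the per-column normalization, and the name `transfer_matrix` are all assumptions; they are not taken from the definition of D_mn in the description.

```python
import numpy as np

def transfer_matrix(r_mic, r_ps, omega, c=340.0):
    """Assumed free-field transfer matrix D of shape (M, N).

    r_mic: (3, M) microphone positions, r_ps: (3, N) latent source positions.
    omega: angular frequency in rad/s, c: speed of sound in m/s.
    """
    k = omega / c  # wave number
    # Pairwise distances between every microphone and every latent position.
    dist = np.linalg.norm(r_mic[:, :, None] - r_ps[:, None, :], axis=0)
    D = np.exp(-1j * k * dist) / (4.0 * np.pi * dist)
    # Assumed normalization: each column is scaled to unit l2 norm.
    return D / np.linalg.norm(D, axis=0, keepdims=True)
```

Normalizing the columns keeps distant candidate positions from being penalized merely because their transfer functions have small amplitude; whether this matches the normalization intended in the description is not confirmed.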

Various algorithms can be used for the signal decomposition. In general, of the latent point sound source positions set in advance, only a very small number actually contain point sound source components, so most elements of the vector q⁽ⁱ⁾(ω) are zero and only a few elements have non-zero values. The vector q⁽ⁱ⁾(ω) is then said to be sparse, and it is desirable to perform the decomposition with an algorithm that exploits this sparsity. The decomposition may be performed on the data of each time frame separately, or on the data of several time frames at once. Here, the case of applying a method called M-FOCUSS is described. M-FOCUSS is a signal decomposition algorithm that exploits the fact that the vector q⁽ⁱ⁾(ω) is sparse and that, over a short time interval, the positions of the non-zero elements hardly change across time frames. For details of M-FOCUSS, see, for example, S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, “Sparse Solutions to Linear Inverse Problems with Multiple Measurement Vectors”, IEEE Transactions on Signal Processing, vol. 53, no. 7, pp. 2477-2488, 2005 (Reference 1).

Collecting the data of several time frames i ∈ {1, …, Γ} into matrices P = [p⁽¹⁾(ω), …, p^(Γ)(ω)], Q = [q⁽¹⁾(ω), …, q^(Γ)(ω)], and H = [h⁽¹⁾(ω), …, h^(Γ)(ω)], the decomposition can be written as follows.

$$P = DQ + H$$

In the M-FOCUSS algorithm, an estimate Q^ of Q is obtained based on the following criterion.

$$\hat{Q} = \arg\min_{Q} \; \|P - DQ\|_F^2 + \lambda J^{(\rho)}(Q)$$

Here,

$$J^{(\rho)}(Q) = \sum_{n=1}^{N} \left( \|Q[n]\|_2 \right)^{\rho}$$

where ρ is a parameter set in the range 0 ≤ ρ ≤ 1, and Q[n] denotes the n-th row of the matrix Q. λ is a parameter for adjusting the sparsity of the estimate Q^. ||·||_F denotes the Frobenius norm of a matrix, and ||·||_2 denotes the l₂ norm of a vector. A specific example of the algorithm is as follows. In the ζ-th iteration, the following calculations are performed.

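1. Calculate the following W_ζ+1.

$$W_{\zeta+1} = \mathrm{diag}\left( c_{\zeta}[n]^{\,1-\rho/2} \right)$$

Here,

$$c_{\zeta}[n] = \|Q_{\zeta}[n]\|_2$$

2. Calculate the following B_ζ+1.

$$B_{\zeta+1} = A_{\zeta+1}^{H}\left( A_{\zeta+1}A_{\zeta+1}^{H} + \lambda I \right)^{-1} P$$

Here,

$$A_{\zeta+1} = D\,W_{\zeta+1}$$

and ·^H denotes the Hermitian transpose.

3. Update the estimate Q^ as follows.

$$Q_{\zeta+1} = W_{\zeta+1} B_{\zeta+1}$$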

Repeating the above yields the optimal Q^. The iteration ends after a preset number of iterations, or when the update amount ||Q_ζ+1 − Q_ζ||_F has become sufficiently small.
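The iteration can be sketched in Python as follows. This is a minimal sketch that assumes the regularized M-FOCUSS update of Reference 1 (a row-norm weighting W, a regularized least-squares step on A = DW, then Q = WB). The function name `m_focuss`, the initialization by a regularized minimum-norm solution, and the default parameter values are illustrative assumptions.

```python
import numpy as np

def m_focuss(D, P, lam=1e-3, rho=0.8, n_iter=50, tol=1e-6):
    """Row-sparse estimate of Q in P = D Q + H (sketch, assumed update rule).

    D: (M, N) complex dictionary, P: (M, Gamma) observations.
    lam: sparsity/regularization weight, rho: norm parameter in [0, 1].
    """
    M = D.shape[0]
    eye = np.eye(M)
    # Assumed initialization: regularized minimum-norm solution.
    Q = D.conj().T @ np.linalg.solve(D @ D.conj().T + lam * eye, P)
    for _ in range(n_iter):
        c = np.linalg.norm(Q, axis=1)      # l2 norm of each row Q[n]
        w = c ** (1.0 - rho / 2.0)         # diagonal of W
        A = D * w[None, :]                 # A = D W
        B = A.conj().T @ np.linalg.solve(A @ A.conj().T + lam * eye, P)
        Q_new = w[:, None] * B             # Q = W B
        converged = np.linalg.norm(Q_new - Q) < tol  # Frobenius norm of update
        Q = Q_new
        if converged:
            break
    return Q
```

Rows whose norm shrinks receive a smaller weight in the next iteration, which is what drives the estimate toward a solution with few non-zero rows shared across the time frames.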

Another, simpler signal decomposition method is decomposition based on the minimum-norm solution under the l₂ norm criterion, given by the following equation.

$$\hat{Q} = D^{H}\left( DD^{H} + \beta I \right)^{-1} P$$

Here, β is a regularization parameter, an arbitrary positive real number for adjusting the stability of the numerical calculation. I is the identity matrix, and ·^H denotes the Hermitian transpose.

The residual signal H is obtained by the following equation.

Alternatively, it is obtained by the following equation.
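The minimum-norm decomposition and the residual can be illustrated with the following Python sketch. It assumes the regularized minimum-norm form D^H(DD^H + βI)⁻¹P for the estimate and takes the residual as what remains after subtracting the point sound source part, H = P − DQ^, which follows from P = DQ + H. The function name and the default β are hypothetical.

```python
import numpy as np

def min_norm_decompose(D, P, beta=1e-3):
    """Split P into a point-source part D @ Q_hat and a residual H (sketch).

    D: (M, N) transfer matrix, P: (M, Gamma) observations,
    beta: positive regularization parameter.
    """
    M = D.shape[0]
    # Regularized minimum-norm estimate of the point-source signals.
    Q_hat = D.conj().T @ np.linalg.solve(D @ D.conj().T + beta * np.eye(M), P)
    # Assumed residual definition: whatever D @ Q_hat does not explain.
    H = P - D @ Q_hat
    return Q_hat, H
```

Unlike a sparsity-based method, this solution spreads energy over many candidate positions, so it is simple but generally not sparse.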

Signal decomposition by M-FOCUSS or by the minimum-norm solution is only an example. There are also several variants of signal decomposition by M-FOCUSS. Other sparsity-based algorithms that can be applied besides M-FOCUSS include, for example, OMP, FOCUSS, and Basis Pursuit.

In step S14, the point sound source drive signal calculation unit 14 calculates the point sound source drive signal q~⁽ⁱ⁾(ω), which corresponds to a sound pressure gradient, based on the point sound source signal q⁽ⁱ⁾(ω) and the latent point sound source positions r_ps. The generated point sound source drive signal q~⁽ⁱ⁾(ω) is sent to the drive signal synthesis unit 16. Methods for this conversion include a method using Wave Field Synthesis (see S. Spors, R. Rabenstein, and J. Ahrens, “The Theory of Wave Field Synthesis Revisited”, Proceedings of AES 124th Convention, 2008 (Reference 2)) and a method using the Spectral Division Method (see J. Ahrens and S. Spors, “Sound Field Reproduction Using Planar and Linear Arrays of Loudspeakers”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 8, pp. 2038-2050, 2010 (Reference 3)). Here, as an example, the method using Wave Field Synthesis for a linear speaker array is described.

First, a matrix G ∈ C^(M×N) is defined. The (m,n)-th element G_mn of the matrix G is defined by the following equation using the n-th latent point sound source position r_ps[n] and the m-th microphone position r_m[m].

Here, y_m and y_ps are the y-coordinates of r_m and r_ps (the coordinate in the direction orthogonal to the microphone array), and G₀ is an arbitrary complex number for adjusting the timbre.

The point sound source drive signal q~⁽ⁱ⁾(ω) is then obtained by the following equation.

$$\tilde{q}^{(i)}(\omega) = G\,q^{(i)}(\omega)$$

H₁⁽²⁾ in Equation (16) is the first-order Hankel function of the second kind. The Hankel function of the second kind H_n⁽²⁾ is defined as follows using the Bessel function of the first kind J_n(x) and the Bessel function of the second kind Y_n(x).

$$H_n^{(2)}(x) = J_n(x) - jY_n(x)$$

In step S15, the residual drive signal calculation unit 15 outputs the residual drive signal h~⁽ⁱ⁾(ω), which corresponds to a sound pressure gradient, based on the residual signal h⁽ⁱ⁾(ω). The generated residual drive signal h~⁽ⁱ⁾(ω) is sent to the drive signal synthesis unit 16. One method for this conversion uses a wavefront reconstruction filter (Wave Field Reconstruction filter, also called WFR filter). For details of the wavefront reconstruction filter, see Non-Patent Document 1.

The wavefront reconstruction filter is designed based on the spatio-temporal spectrum obtained by expanding the wavefront of the recorded sound field into plane waves, on the arrangement and relative positions of the microphone array and the speaker array, and on the physical equations of sound wave propagation.

For example, when the microphone array and the speaker array are both linear, the residual signal h⁽ⁱ⁾(ω) is first Fourier transformed in the spatial direction to obtain the spatio-temporal frequency domain signal h⁻⁽ⁱ⁾(ω). The m-th element of the spatio-temporal frequency domain signal h⁻⁽ⁱ⁾(ω) is then multiplied by the following coefficient.

Here, k_x,m is the spatial frequency in the x direction corresponding to the m-th element, y_ref is a parameter that sets the position at which the amplitude is matched, and f₀ is an arbitrary complex number for adjusting the timbre. H₀⁽²⁾ is the zeroth-order Hankel function of the second kind. Finally, an inverse Fourier transform is performed in the spatial direction to obtain the residual drive signal h~⁽ⁱ⁾(ω).
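The three-stage structure of this filtering for a linear array (spatial Fourier transform, per-bin multiplication, inverse spatial Fourier transform) can be sketched in Python as follows. The coefficient values themselves are not computed here: `coeff` is a placeholder argument to be filled with the filter coefficients, and the function names and the uniform microphone spacing `dx` are assumptions.

```python
import numpy as np

def spatial_frequencies(M, dx):
    """Spatial angular frequencies k_x,m (rad/m) of an M-point spatial DFT
    for microphones spaced dx metres apart (uniform spacing assumed)."""
    return 2.0 * np.pi * np.fft.fftfreq(M, d=dx)

def apply_wavenumber_filter(h, coeff):
    """Filter a residual signal in the spatial-frequency domain.

    h: complex array whose last axis runs over the M microphones.
    coeff: (M,) coefficients, one per spatial-frequency bin (placeholder).
    """
    h_bar = np.fft.fft(h, axis=-1)              # spatial Fourier transform
    return np.fft.ifft(h_bar * coeff, axis=-1)  # back to the array domain
```

In practice `coeff` would be evaluated at the values returned by `spatial_frequencies` for each temporal frequency ω; with all coefficients equal to one, the signal passes through unchanged.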

For example, when the microphone array and the speaker array are both planar, the residual signal h⁽ⁱ⁾(ω) is first subjected to a two-dimensional Fourier transform in the spatial directions to obtain the spatio-temporal frequency domain signal h⁻⁽ⁱ⁾(ω). The m-th element of the spatio-temporal frequency domain signal h⁻⁽ⁱ⁾(ω) is then multiplied by the following coefficient.

Here, k_x,m and k_y,m are the spatial frequencies in the x and y directions corresponding to the m-th element, and f₀ is an arbitrary complex number for adjusting the timbre. Finally, a two-dimensional inverse Fourier transform is performed in the spatial directions to obtain the residual drive signal h~⁽ⁱ⁾(ω).

The wavefront reconstruction filters above are only examples. Any wavefront reconstruction filter may be used as long as it is designed based on the spatio-temporal spectrum obtained by expanding the wavefront of the recorded sound field into plane waves, on the arrangement and relative positions of the microphone array and the speaker array, and on the physical equations of sound wave propagation.

In step S16, the drive signal synthesis unit 16 combines the point sound source drive signal q~⁽ⁱ⁾(ω) and the residual drive signal h~⁽ⁱ⁾(ω) and outputs the speaker drive signal d⁽ⁱ⁾(ω) for reproducing the wavefront of the signals picked up by the microphone array. The generated speaker drive signal d⁽ⁱ⁾(ω) is sent to the frequency inverse conversion unit 17. The speaker drive signal d⁽ⁱ⁾(ω) is obtained by adding the input point sound source drive signal q~⁽ⁱ⁾(ω) and residual drive signal h~⁽ⁱ⁾(ω), as follows.

$$d^{(i)}(\omega) = \tilde{q}^{(i)}(\omega) + \tilde{h}^{(i)}(\omega)$$

In step S17, the frequency inverse conversion unit 17 converts the frequency-domain signal d⁽ⁱ⁾(ω) into time-domain signals d_p(t) by an inverse Fourier transform. Here, p is the index of the speaker from which the time-domain signal is reproduced, with p = 1, …, L. The time-domain signals d_p(t) obtained for each frame by the inverse Fourier transform are shifted appropriately and summed to form a continuous time-domain signal. An existing method such as the short-time inverse discrete Fourier transform may be used for the inverse Fourier transform. The time-domain signal d_p(t) is sent to speaker 2_p.
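The shift-and-sum reconstruction in step S17 corresponds to overlap-add synthesis, which can be sketched in Python as follows. The function name `overlap_add`, the one-sided spectrum layout, and the absence of a synthesis window are assumptions for illustration.

```python
import numpy as np

def overlap_add(d_freq, frame_len, hop):
    """Inverse short-time DFT by overlap-add.

    d_freq: complex array of shape (L, n_frames, frame_len // 2 + 1),
            indexed by speaker p, frame i, and frequency bin.
    Returns a real array of shape (L, (n_frames - 1) * hop + frame_len).
    """
    n_speakers, n_frames, _ = d_freq.shape
    # Inverse DFT of every frame back to the time domain.
    frames = np.fft.irfft(d_freq, n=frame_len, axis=-1)
    out = np.zeros((n_speakers, (n_frames - 1) * hop + frame_len))
    for i in range(n_frames):
        # Shift frame i by i * hop samples and accumulate.
        out[:, i * hop:i * hop + frame_len] += frames[:, i, :]
    return out
```

With an analysis window whose shifted copies sum to one (for example a periodic Hann window at 50% overlap), the fully overlapped part of an unmodified signal is recovered exactly.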

スピーカ２₁,…,２_Lは、時間領域信号d_p(t)に基づいて音を再生する。具体的には、p=1,…,Lとして、p番目のスピーカ２_pが時間領域信号d_p(t)に基づいて音を再生する。これにより、第一の空間の音場を第二の空間に再現することができる。 The speakers 2 ₁ ,..., 2 _L reproduce sound based on the time domain signal d _p (t). Specifically, with p = 1,..., L, the p-th speaker 2 _p reproduces sound based on the time domain signal d _p (t). Thereby, the sound field of the first space can be reproduced in the second space.

［第二実施形態］ [Second Embodiment]
第二実施形態の音場収音再生装置及び方法は、第一実施形態と同様に、収音側となる第一の空間に配置されているＭ個のマイクロホン１₁,…,１_Mで構成されるマイクロホンアレイと、再生側となる第二の空間に配置されているＬ個のスピーカ２₁,…,２_Lで構成されるスピーカアレイとを用いて、第一の空間の音源Ｓで発生した音によって形成された第一の空間の音場を第二の空間で再現する。 As in the first embodiment, the sound field recording and reproducing apparatus and method of the second embodiment use a microphone array composed of M microphones 1₁, …, 1_M arranged in a first space on the recording side and a speaker array composed of L speakers 2₁, …, 2_L arranged in a second space on the reproduction side, and reproduce in the second space the sound field of the first space formed by the sound generated by the sound source S in the first space.

第一の空間に配置されたマイクロホン及び第二の空間に配置されたスピーカの配置は第一実施形態と同様である。 The arrangement of the microphones arranged in the first space and the speakers arranged in the second space is the same as in the first embodiment.

第二実施形態の音場収音再生装置２０は、図５に示すように、点音源位置設定部２１、周波数変換部１２、信号分解処理部２３、点音源駆動信号計算部１４、残差駆動信号計算部１５、駆動信号合成部１６及び周波数逆変換部１７を例えば含み、図６に例示された各ステップの処理を行う。第一実施形態の音場収音再生装置１０と第二実施形態の音場収音再生装置２０との相違点は、潜在点音源位置設定部１１の代わりに点音源位置設定部２１を備え、信号分解処理部の処理が異なることである。 As shown in FIG. 5, the sound field recording and reproducing apparatus 20 of the second embodiment includes, for example, a point sound source position setting unit 21, a frequency conversion unit 12, a signal decomposition processing unit 23, a point sound source drive signal calculation unit 14, a residual drive signal calculation unit 15, a drive signal synthesis unit 16, and a frequency inverse transform unit 17, and performs the processing of each step illustrated in FIG. 6. The apparatus 20 of the second embodiment differs from the sound field recording and reproducing apparatus 10 of the first embodiment in that it includes the point sound source position setting unit 21 in place of the latent point sound source position setting unit 11, and in that the signal decomposition processing unit performs different processing.

以下、第二実施形態の音場収音再生装置２０の行う処理について、第一実施形態と異なる部分を中心に説明する。 Hereinafter, the processing performed by the sound field sound collecting / reproducing apparatus 20 according to the second embodiment will be described focusing on differences from the first embodiment.

ステップＳ２１において、点音源位置設定部２１は、実際に点音源が存在する位置r’_ps∈R^3×N’（以下、点音源位置と言う）を事前に与える。ここで、N’は実在する点音源の数である。点音源位置r’_psは、マイクロホンアレイからの相対的な三次元位置である。 In step S21, the point sound source position setting unit 21 gives in advance a position r ′ _ps ∈ R ^{3 × N ′} (hereinafter referred to as a point sound source position) where the point sound source actually exists. Here, N ′ is the number of actual point sound sources. The point sound source position r ′ _ps is a relative three-dimensional position from the microphone array.

ステップＳ２３において、信号分解処理部２３は、マイクロホンアレイで取得した信号の時間周波数領域信号p⁽ⁱ⁾(ω)∈C^Mを、実際に点音源が存在する位置の集合である点音源位置r’_ps∈R^3×N’に基づいて、実在する点音源に由来する成分である点音源信号q⁽ⁱ⁾(ω)∈C^N’と点音源信号以外の成分である残差信号h⁽ⁱ⁾(ω)∈C^Mとに分離する。生成された点音源信号q⁽ⁱ⁾(ω)は、点音源駆動信号計算部１４へ送られる。生成された残差信号h⁽ⁱ⁾(ω)は、残差駆動信号計算部１５へ送られる。 In step S23, the signal decomposition processing unit 23 separates the time-frequency domain signal p⁽ⁱ⁾(ω) ∈ C^M of the signals acquired by the microphone array into a point sound source signal q⁽ⁱ⁾(ω) ∈ C^{N′}, which is the component derived from the actual point sound sources, and a residual signal h⁽ⁱ⁾(ω) ∈ C^M, which is the component other than the point sound source signal. The separation is based on the point sound source positions r′_ps ∈ R^{3×N′}, the set of positions where point sound sources actually exist. The generated point sound source signal q⁽ⁱ⁾(ω) is sent to the point sound source drive signal calculation unit 14. The generated residual signal h⁽ⁱ⁾(ω) is sent to the residual drive signal calculation unit 15.

点音源信号q⁽ⁱ⁾(ω)と残差信号h⁽ⁱ⁾(ω)は、以下の式を満たすように、時間周波数領域信号p⁽ⁱ⁾(ω)を分解することで得る。

The point source signal q ⁽ⁱ⁾ (ω) and the residual signal h ⁽ⁱ⁾ (ω) are obtained by decomposing the time frequency domain signal p ⁽ⁱ⁾ (ω) so as to satisfy the following equation.
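The equation referred to here is not reproduced in this text. Assuming it has the same form as the relation p⁽ⁱ⁾(ω) = Dq⁽ⁱ⁾(ω) + h⁽ⁱ⁾(ω) stated in claim 2, with the transfer-function matrix D′ of the second embodiment in place of D, it would read:

```latex
\mathbf{p}^{(i)}(\omega) = \mathbf{D}'\,\mathbf{q}^{(i)}(\omega) + \mathbf{h}^{(i)}(\omega)
```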

ここで、D’∈C^M×N’は、各点音源位置からマイクロホン位置までの点音源の伝達関数を要素に持つ行列である。行列D’の要素D’_mnは次式のように定義する。

Here, D′ ∈ C^{M×N′} is a matrix whose elements are the point-source transfer functions from each point sound source position to each microphone position. The element D′_mn of the matrix D′ is defined by the following equation.
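The defining equation for D′_mn is not reproduced in this text. As an illustration only, the sketch below builds such a matrix under the assumption that each element is the free-field transfer function of a point source, exp(jkr)/(4πr), with wavenumber k = ω/c and distance r between source n and microphone m. The sign of the exponent depends on the time convention and is an assumption, as are the function name, the speed of sound and the example positions.

```python
import cmath
import math

def transfer_matrix(mic_positions, source_positions, omega, c=343.0):
    """Return an M x N' nested list; entry [m][n] is the assumed free-field
    transfer function from source n to microphone m."""
    k = omega / c  # wavenumber
    matrix = []
    for mic in mic_positions:
        row = []
        for src in source_positions:
            r = math.dist(mic, src)  # Euclidean distance between the two points
            row.append(cmath.exp(1j * k * r) / (4.0 * math.pi * r))
        matrix.append(row)
    return matrix

# Hypothetical geometry: 3 microphones on the x-axis, 2 sources in front of them,
# with positions given relative to the microphone array.
mics = [(-0.1, 0.0, 0.0), (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
sources = [(0.0, 1.0, 0.0), (0.5, 2.0, 0.0)]
D_prime = transfer_matrix(mics, sources, omega=2.0 * math.pi * 1000.0)
```

Each column of the result corresponds to one point sound source position and each row to one microphone, matching the M × N′ shape given in the text.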

信号分解では、N'≦Mの場合には最小二乗解に基づく方法を用いる。

In the signal decomposition, when N ′ ≦ M, a method based on a least square solution is used.

また、N'>Mの場合には最小ノルム解に基づく方法を用いる。

In the case of N ′> M, a method based on the minimum norm solution is used.
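The two solutions are likewise not reproduced in this text. As a sketch only, assuming the standard Tikhonov-regularized forms (which use the same symbols β, I and Hermitian transpose that claim 5 lists) and writing P and Q for the matrices of stacked time frames, they would be:

```latex
\begin{aligned}
\hat{\mathbf{Q}} &= \left(\mathbf{D}'^{H}\mathbf{D}' + \beta\mathbf{I}\right)^{-1}\mathbf{D}'^{H}\,\mathbf{P} && (N' \le M,\ \text{least squares}) \\
\hat{\mathbf{Q}} &= \mathbf{D}'^{H}\left(\mathbf{D}'\mathbf{D}'^{H} + \beta\mathbf{I}\right)^{-1}\mathbf{P} && (N' > M,\ \text{minimum norm})
\end{aligned}
```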

ここでβは正則化パラメータであり、数値計算の安定性を調整するための任意の正の実数である。 Here, β is a regularization parameter, and is an arbitrary positive real number for adjusting the stability of numerical calculation.

残差信号Hは、次式により得られる。

The residual signal H is obtained by the following equation.
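The residual equation is also missing from this text; given the decomposition model, it is presumably H = P − D′Q. The sketch below illustrates the whole decomposition under that assumption, together with the assumption that the least-squares and minimum-norm solutions take their usual Tikhonov-regularized forms. All function names and the example values are hypothetical, and only the standard library is used to keep the block self-contained; a numerical linear algebra library would normally replace the helper routines.

```python
def herm(a):
    """Hermitian (conjugate) transpose of a nested-list matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def add_beta(a, beta):
    """Return a + beta * I for a square matrix a."""
    return [[v + (beta if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(a)]

def solve(a, b):
    """Solve a x = b (b may have several columns) by Gaussian elimination
    with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    x = [None] * n
    for i in reversed(range(n)):
        x[i] = [(aug[i][n + j] - sum(aug[i][k] * x[k][j] for k in range(i + 1, n)))
                / aug[i][i] for j in range(len(b[0]))]
    return x

def decompose(D, P, beta=1e-3):
    """Split P (M x Gamma) into source signals Q and the residual H = P - D Q."""
    M, N = len(D), len(D[0])
    Dh = herm(D)
    if N <= M:  # regularized least squares: (D^H D + beta I)^-1 D^H P
        Q = solve(add_beta(matmul(Dh, D), beta), matmul(Dh, P))
    else:       # regularized minimum norm: D^H (D D^H + beta I)^-1 P
        Q = matmul(Dh, solve(add_beta(matmul(D, Dh), beta), P))
    DQ = matmul(D, Q)
    H = [[p - d for p, d in zip(prow, drow)] for prow, drow in zip(P, DQ)]
    return Q, H

# Hypothetical example: M = 3 microphones, N' = 2 sources, one time frame.
D_example = [[1, 0], [0, 1], [1, 1]]
Q_true = [[2 + 1j], [-1j]]
P_example = matmul(D_example, Q_true)
Q_est, H_est = decompose(D_example, P_example, beta=1e-9)
```

Because the example signal is generated entirely by the two sources, the estimated residual is close to zero. With real recordings the residual carries reverberation and other components that the point sources do not explain.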

［変形例］ [Modification]
音場収音再生装置は、収音側の空間に配置された音場収音装置と再生側の空間に配置された音場再生装置とを含む音場収音再生システムとして構成してもよい。音場収音再生装置を構成する各部は、音場収音装置と音場再生装置のいずれに備えられていてもよい。換言すれば、潜在点音源位置設定部１１（もしくは点音源位置設定部２１）、周波数変換部１２、信号分解処理部１３（もしくは信号分解処理部２３）、点音源駆動信号計算部１４、残差駆動信号計算部１５、駆動信号合成部１６及び周波数逆変換部１７のそれぞれの処理は、収音側の空間に配置された音場収音装置で実行されてもよいし、再生側の空間に配置された音場再生装置で実行されてもよい。音場収音装置で生成された信号は、音場再生装置に送信される。このとき送信信号は任意の方法で符号化されていてもよい。 The sound field recording and reproducing apparatus may be configured as a sound field recording and reproducing system that includes a sound field sound collection device arranged in the space on the recording side and a sound field reproduction device arranged in the space on the reproduction side. Each unit of the sound field recording and reproducing apparatus may be provided in either the sound field sound collection device or the sound field reproduction device. In other words, the processing of each of the latent point sound source position setting unit 11 (or the point sound source position setting unit 21), the frequency conversion unit 12, the signal decomposition processing unit 13 (or the signal decomposition processing unit 23), the point sound source drive signal calculation unit 14, the residual drive signal calculation unit 15, the drive signal synthesis unit 16, and the frequency inverse transform unit 17 may be executed by the sound field sound collection device in the recording-side space or by the sound field reproduction device in the reproduction-side space. The signal generated by the sound field sound collection device is transmitted to the sound field reproduction device. The transmitted signal may be encoded by any method.

例えば、第一の空間に配置された音場収音装置は潜在点音源設定部１１、周波数変換部１２及び信号分解処理部１３を備え、第二の空間に配置された音場再生装置は点音源駆動信号計算部１４、残差駆動信号計算部１５、駆動信号合成部１６及び周波数逆変換部１７を備え、信号分解処理部１３の出力する点音源信号q⁽ⁱ⁾(ω)及び残差信号h⁽ⁱ⁾(ω)が音場収音装置から音場再生装置へ符号化して送信されるように構成することができる。また、例えば、第一の空間に配置された音場収音装置は潜在点音源設定部１１、周波数変換部１２、信号分解処理部１３、点音源駆動信号計算部１４及び残差駆動信号計算部１５を備え、第二の空間に配置された音場再生装置は駆動信号合成部１６及び周波数逆変換部１７を備え、点音源駆動信号計算部１４の出力する点音源駆動信号q~⁽ⁱ⁾(ω)及び残差駆動信号計算部１５の出力する残差駆動信号h~⁽ⁱ⁾(ω)が音場収音装置から音場再生装置へ符号化して送信されるように構成してもよい。 For example, the sound field sound collection device arranged in the first space may include the latent point sound source position setting unit 11, the frequency conversion unit 12, and the signal decomposition processing unit 13, while the sound field reproduction device arranged in the second space includes the point sound source drive signal calculation unit 14, the residual drive signal calculation unit 15, the drive signal synthesis unit 16, and the frequency inverse transform unit 17. In this configuration, the point sound source signal q⁽ⁱ⁾(ω) and the residual signal h⁽ⁱ⁾(ω) output by the signal decomposition processing unit 13 are encoded and transmitted from the sound field sound collection device to the sound field reproduction device. Alternatively, the sound field sound collection device arranged in the first space may include the latent point sound source position setting unit 11, the frequency conversion unit 12, the signal decomposition processing unit 13, the point sound source drive signal calculation unit 14, and the residual drive signal calculation unit 15, while the sound field reproduction device arranged in the second space includes the drive signal synthesis unit 16 and the frequency inverse transform unit 17. In this configuration, the point sound source drive signal q~⁽ⁱ⁾(ω) output by the point sound source drive signal calculation unit 14 and the residual drive signal h~⁽ⁱ⁾(ω) output by the residual drive signal calculation unit 15 are encoded and transmitted from the sound field sound collection device to the sound field reproduction device.

この発明は上述の実施形態に限定されるものではなく、この発明の趣旨を逸脱しない範囲で適宜変更が可能であることはいうまでもない。上記実施形態において説明した各種の処理は、記載の順に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されてもよい。 The present invention is not limited to the embodiments described above, and it goes without saying that modifications can be made as appropriate without departing from the spirit of the invention. The various processes described in the above embodiments need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the apparatus that executes them or as needed.

［プログラム、記録媒体］ [Program, Recording Medium]
上記実施形態で説明した各装置における各種の処理機能をコンピュータによって実現する場合、各装置が有すべき機能の処理内容はプログラムによって記述される。そして、このプログラムをコンピュータで実行することにより、上記各装置における各種の処理機能がコンピュータ上で実現される。 When the various processing functions of each device described in the above embodiments are realized by a computer, the processing contents of the functions that each device should have are described by a program. By executing this program on a computer, the various processing functions of each device are realized on the computer.

この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよい。 The program describing the processing contents can be recorded on a computer-readable recording medium. As the computer-readable recording medium, for example, any recording medium such as a magnetic recording device, an optical disk, a magneto-optical recording medium, and a semiconductor memory may be used.

また、このプログラムの流通は、例えば、そのプログラムを記録したＤＶＤ、ＣＤ−ＲＯＭ等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。 The program is distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of the server computer and transferring the program from the server computer to another computer via a network.

このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の記憶装置に格納する。そして、処理の実行時、このコンピュータは、自己の記録媒体に格納されたプログラムを読み取り、読み取ったプログラムに従った処理を実行する。また、このプログラムの別の実行形態として、コンピュータが可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することとしてもよく、さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるＡＳＰ（Application Service Provider）型のサービスによって、上述の処理を実行する構成としてもよい。なお、本形態におけるプログラムには、電子計算機による処理の用に供する情報であってプログラムに準ずるもの（コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等）を含むものとする。 A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or the program transferred from a server computer in its own storage device. When executing the processing, the computer reads the program stored in its own recording medium and executes processing according to the read program. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may execute processing according to the received program each time the program is transferred to it from the server computer. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and the acquisition of results. Note that the program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

また、この形態では、コンピュータ上で所定のプログラムを実行させることにより、本装置を構成することとしたが、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 In this embodiment, the present apparatus is configured by executing a predetermined program on a computer. However, at least a part of these processing contents may be realized by hardware.

１マイクロホン
２スピーカ
１０、２０音場収音再生装置
１１潜在点音源位置設定部
１２周波数変換部
１３、２３信号分解処理部
１４点音源駆動信号計算部
１５残差駆動信号計算部
１６駆動信号合成部
１７周波数逆変換部
２１点音源位置設定部
DESCRIPTION OF SYMBOLS
1 Microphone
2 Speaker
10, 20 Sound field recording and reproducing apparatus
11 Latent point sound source position setting unit
12 Frequency conversion unit
13, 23 Signal decomposition processing unit
14 Point sound source drive signal calculation unit
15 Residual drive signal calculation unit
16 Drive signal synthesis unit
17 Frequency inverse transform unit
21 Point sound source position setting unit

Claims

複数のマイクロホンで収音された信号に基づいて生成された時間周波数領域信号を、潜在的に点音源が存在すると仮定する位置の集合である潜在点音源位置に基づいて、上記点音源に由来する成分である第一の信号と上記第一の信号以外の成分である第二の信号とに分離する信号分解処理部を含む
音場収音再生装置。 A sound field recording and reproducing apparatus including a signal decomposition processing unit that separates a time-frequency domain signal, generated based on signals collected by a plurality of microphones, into a first signal that is a component derived from a point sound source and a second signal that is a component other than the first signal, based on latent point sound source positions, which are a set of positions at which point sound sources are assumed to potentially exist.

請求項１に記載の音場収音再生装置であって、
ωを時間周波数とし、p⁽ⁱ⁾(ω)を上記時間周波数領域信号のi番目の時間フレームとし、q⁽ⁱ⁾(ω)を上記第一の信号のi番目の時間フレームとし、h⁽ⁱ⁾(ω)を上記第二の信号のi番目の時間フレームとし、Dを上記潜在点音源位置から上記マイクロホンまでの伝達関数を要素にもつ行列とし、
上記信号分解処理部は、p⁽ⁱ⁾(ω)=Dq⁽ⁱ⁾(ω)+h⁽ⁱ⁾(ω)を満たすように、上記時間周波数領域信号を上記第一の信号と上記第二の信号とに分離するものである
音場収音再生装置。 The sound field recording and reproducing device according to claim 1,
where ω is the temporal frequency, p⁽ⁱ⁾(ω) is the i-th time frame of the time-frequency domain signal, q⁽ⁱ⁾(ω) is the i-th time frame of the first signal, h⁽ⁱ⁾(ω) is the i-th time frame of the second signal, and D is a matrix whose elements are the transfer functions from the latent point sound source positions to the microphones,
wherein the signal decomposition processing unit separates the time-frequency domain signal into the first signal and the second signal so as to satisfy p⁽ⁱ⁾(ω) = Dq⁽ⁱ⁾(ω) + h⁽ⁱ⁾(ω).

請求項１又は２に記載の音場収音再生装置であって、
上記信号分解処理部は、信号のスパース性を利用した信号分解アルゴリズムにより、上記時間周波数領域信号を上記第一の信号と上記第二の信号とに分離するものである
音場収音再生装置。 The sound field sound collecting / reproducing apparatus according to claim 1 or 2,
The sound field collecting and reproducing apparatus, wherein the signal decomposition processing unit separates the time-frequency domain signal into the first signal and the second signal by a signal decomposition algorithm using signal sparsity.

請求項１から３のいずれかに記載の音場収音再生装置であって、
ωを時間周波数とし、p⁽ⁱ⁾(ω)を上記時間周波数領域信号のi番目の時間フレームとし、q⁽ⁱ⁾(ω)を上記第一の信号のi番目の時間フレームとし、h⁽ⁱ⁾(ω)を上記第二の信号のi番目の時間フレームとし、P=[p⁽¹⁾(ω),…,p^(Γ)(ω)]を上記時間周波数領域信号の1番目からΓ番目までの時間フレームからなる行列とし、Q=[q⁽¹⁾(ω),…,q^(Γ)(ω)]を上記第一の信号の1番目からΓ番目までの時間フレームからなる行列とし、H=[h⁽¹⁾(ω),…,h^(Γ)(ω)]を上記第二の信号の1番目からΓ番目までの時間フレームからなる行列とし、Dを上記潜在点音源位置から上記マイクロホンまでの伝達関数を要素にもつ行列とし、ρは0≦ρ≦1として設定するパラメータであり、λは第一の信号のスパース性の程度を調整するパラメータであり、||・||_Fは行列のフロベニウスノルムであり、||・||₂はベクトルのl₂ノルムであり、Q[n]は行列Qのn行目を表し、Nは上記潜在点音源位置の数であり、

であり、
上記信号分解処理部は、次式により上記第一の信号を求め、

次式により、上記第二の信号を求めるものである

音場収音再生装置。 A sound field sound collecting / reproducing apparatus according to any one of claims 1 to 3,
where ω is the temporal frequency, p⁽ⁱ⁾(ω) is the i-th time frame of the time-frequency domain signal, q⁽ⁱ⁾(ω) is the i-th time frame of the first signal, h⁽ⁱ⁾(ω) is the i-th time frame of the second signal, P = [p⁽¹⁾(ω), …, p^(Γ)(ω)] is the matrix consisting of the first to Γ-th time frames of the time-frequency domain signal, Q = [q⁽¹⁾(ω), …, q^(Γ)(ω)] is the matrix consisting of the first to Γ-th time frames of the first signal, H = [h⁽¹⁾(ω), …, h^(Γ)(ω)] is the matrix consisting of the first to Γ-th time frames of the second signal, D is a matrix whose elements are the transfer functions from the latent point sound source positions to the microphones, ρ is a parameter set in the range 0 ≦ ρ ≦ 1, λ is a parameter that adjusts the degree of sparsity of the first signal, ||·||_F is the Frobenius norm of a matrix, ||·||₂ is the l₂ norm of a vector, Q[n] denotes the n-th row of the matrix Q, and N is the number of latent point sound source positions, with the following definition:

wherein the signal decomposition processing unit obtains the first signal by the following equation,

and obtains the second signal by the following equation.

Sound field recording and playback device.

請求項１から３のいずれかに記載の音場収音再生装置であって、
ωを時間周波数とし、p⁽ⁱ⁾(ω)を上記時間周波数領域信号のi番目の時間フレームとし、q⁽ⁱ⁾(ω)を上記第一の信号のi番目の時間フレームとし、h⁽ⁱ⁾(ω)を上記第二の信号のi番目の時間フレームとし、P=[p⁽¹⁾(ω),…,p^(Γ)(ω)]を上記時間周波数領域信号の1番目からΓ番目までの時間フレームからなる行列とし、βは正則化パラメータであり、Iは単位行列であり、・^Hはエルミート転置であり、
上記信号分解処理部は、次式により上記第一の信号を求め、

次式により上記第二の信号を求めるものである

音場収音再生装置。 A sound field sound collecting / reproducing apparatus according to any one of claims 1 to 3,
where ω is the temporal frequency, p⁽ⁱ⁾(ω) is the i-th time frame of the time-frequency domain signal, q⁽ⁱ⁾(ω) is the i-th time frame of the first signal, h⁽ⁱ⁾(ω) is the i-th time frame of the second signal, P = [p⁽¹⁾(ω), …, p^(Γ)(ω)] is the matrix consisting of the first to Γ-th time frames of the time-frequency domain signal, β is a regularization parameter, I is the identity matrix, and ·^H denotes the Hermitian transpose,
wherein the signal decomposition processing unit obtains the first signal by the following equation,

and obtains the second signal by the following equation.

Sound field recording and playback device.

請求項１から５のいずれかに記載の音場収音再生装置であって、
上記第一の信号及び上記潜在点音源位置に基づいて音圧勾配に相当する点音源駆動信号を求める点音源駆動信号計算部と、
上記第二の信号に波面再構成フィルタを適用して音圧勾配に相当する残差駆動信号を求める残差駆動信号計算部と、
をさらに含む音場収音再生装置。 A sound field sound collecting and reproducing device according to any one of claims 1 to 5,
further comprising:
a point sound source drive signal calculation unit that obtains a point sound source drive signal corresponding to a sound pressure gradient, based on the first signal and the latent point sound source positions; and
a residual drive signal calculation unit that applies a wavefront reconstruction filter to the second signal to obtain a residual drive signal corresponding to a sound pressure gradient.

複数のマイクロホンで収音された信号から点音源に由来する成分を分離した第一の信号に基づいて生成された点音源駆動信号と、上記第一の信号と異なる第二の信号に基づいて生成された残差駆動信号とを合成して、上記マイクロホンで収音された信号の波面を再現するためのスピーカ駆動信号を生成する駆動信号合成部を含む
音場収音再生装置。 Generated based on a point sound source drive signal generated based on a first signal obtained by separating components derived from a point sound source from signals collected by a plurality of microphones, and a second signal different from the first signal A sound field sound collecting / reproducing apparatus including a drive signal synthesizing unit that synthesizes the generated residual drive signal and generates a speaker drive signal for reproducing the wavefront of the signal picked up by the microphone.

請求項７に記載の音場収音再生装置であって、
上記第二の信号は、上記マイクロホンで収音された信号から上記第一の信号を分離した残りの成分である
音場収音再生装置。 The sound field sound collecting and reproducing device according to claim 7,
The second signal is a remaining component obtained by separating the first signal from a signal collected by the microphone.

音場収音装置と音場再生装置を含む音場収音再生システムであって、
上記音場収音装置は、
複数のマイクロホンで収音された信号に基づいて生成された時間周波数領域信号を、潜在的に点音源が存在すると仮定する位置の集合である潜在点音源位置に基づいて、上記点音源に由来する成分である第一の信号と上記第一の信号以外の成分である第二の信号とに分離する信号分解処理部と、
上記第一の信号及び上記潜在点音源位置に基づいて音圧勾配に相当する点音源駆動信号を求める点音源駆動信号計算部と、
上記第二の信号に波面再構成フィルタを適用して音圧勾配に相当する残差駆動信号を求める残差駆動信号計算部と、
を含み、
上記音場再生装置は、
上記点音源駆動信号と上記残差駆動信号とを合成して上記マイクロホンで収音された信号の波面を再現するためのスピーカ駆動信号を生成する駆動信号合成部を含む
音場収音再生システム。 A sound field sound collection and reproduction system including a sound field sound collection device and a sound field reproduction device,
wherein the sound field sound collection device includes:
a signal decomposition processing unit that separates a time-frequency domain signal, generated based on signals collected by a plurality of microphones, into a first signal that is a component derived from a point sound source and a second signal that is a component other than the first signal, based on latent point sound source positions, which are a set of positions at which point sound sources are assumed to potentially exist;
a point sound source drive signal calculation unit that obtains a point sound source drive signal corresponding to a sound pressure gradient, based on the first signal and the latent point sound source positions; and
a residual drive signal calculation unit that applies a wavefront reconstruction filter to the second signal to obtain a residual drive signal corresponding to a sound pressure gradient,
and wherein the sound field reproduction device includes
a drive signal synthesis unit that synthesizes the point sound source drive signal and the residual drive signal to generate a speaker drive signal for reproducing the wavefront of the signals collected by the microphones.

信号分解処理部が、複数のマイクロホンで収音された信号に基づいて生成された時間周波数領域信号を、潜在的に点音源が存在すると仮定する位置の集合である潜在点音源位置に基づいて、上記点音源に由来する成分である第一の信号と上記第一の信号以外の成分である第二の信号とに分離する信号分解処理ステップを含む
音場収音再生方法。 A sound field recording and reproducing method including a signal decomposition processing step in which a signal decomposition processing unit separates a time-frequency domain signal, generated based on signals collected by a plurality of microphones, into a first signal that is a component derived from a point sound source and a second signal that is a component other than the first signal, based on latent point sound source positions, which are a set of positions at which point sound sources are assumed to potentially exist.

駆動信号合成部が、複数のマイクロホンで収音された信号から点音源に由来する成分を分離した第一の信号に基づいて生成された点音源駆動信号と、上記第一の信号と異なる第二の信号に基づいて生成された残差駆動信号とを合成して、上記マイクロホンで収音された信号の波面を再現するためのスピーカ駆動信号を生成する駆動信号合成ステップを含む
音場収音再生方法。 A sound field recording and reproducing method including a drive signal synthesis step in which a drive signal synthesis unit synthesizes a point sound source drive signal, generated based on a first signal obtained by separating a component derived from a point sound source from signals collected by a plurality of microphones, and a residual drive signal, generated based on a second signal different from the first signal, to generate a speaker drive signal for reproducing the wavefront of the signals collected by the microphones.

信号分解処理部が、複数のマイクロホンで収音された信号に基づいて生成された時間周波数領域信号を、潜在的に点音源が存在すると仮定する位置の集合である潜在点音源位置に基づいて、上記点音源に由来する成分である第一の信号と上記第一の信号以外の成分である第二の信号とに分離する信号分解処理ステップと、
点音源駆動信号計算部が、上記第一の信号及び上記潜在点音源位置に基づいて音圧勾配に相当する点音源駆動信号を求める点音源駆動信号計算ステップと、
残差駆動信号計算部が、上記第二の信号に波面再構成フィルタを適用して音圧勾配に相当する残差駆動信号を求める残差駆動信号計算ステップと、
駆動信号合成部が、上記点音源駆動信号と上記残差駆動信号とを合成して上記マイクロホンで収音された信号の波面を再現するためのスピーカ駆動信号を生成する駆動信号合成ステップと、
を含む音場収音再生方法。 A sound field recording and reproducing method including:
a signal decomposition processing step in which a signal decomposition processing unit separates a time-frequency domain signal, generated based on signals collected by a plurality of microphones, into a first signal that is a component derived from a point sound source and a second signal that is a component other than the first signal, based on latent point sound source positions, which are a set of positions at which point sound sources are assumed to potentially exist;
a point sound source drive signal calculation step in which a point sound source drive signal calculation unit obtains a point sound source drive signal corresponding to a sound pressure gradient, based on the first signal and the latent point sound source positions;
a residual drive signal calculation step in which a residual drive signal calculation unit applies a wavefront reconstruction filter to the second signal to obtain a residual drive signal corresponding to a sound pressure gradient; and
a drive signal synthesis step in which a drive signal synthesis unit synthesizes the point sound source drive signal and the residual drive signal to generate a speaker drive signal for reproducing the wavefront of the signals collected by the microphones.

請求項１から８のいずれかに記載の音場収音再生装置もしくは請求項９に記載の音場収音装置又は音場再生装置としてコンピュータを機能させるためのプログラム。 A program for causing a computer to function as the sound field recording and reproducing apparatus according to any one of claims 1 to 8, or as the sound field sound collection device or the sound field reproduction device according to claim 9.