JP2019213109A

JP2019213109A - Sound field signal estimation device, sound field signal estimation method, program

Info

Publication number: JP2019213109A
Application number: JP2018109188A
Authority: JP
Inventors: Akira Emura (江村 暁)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2018-06-07
Filing date: 2018-06-07
Publication date: 2019-12-12
Also published as: WO2019235193A1

Abstract

PROBLEM TO BE SOLVED: To provide a sound field signal estimation device capable of generating an ambisonic signal focused in a specified direction.

SOLUTION: A sound field signal estimation device comprises: a sparse wavefront decomposition unit that, with S being an arbitrary natural number, calculates an S-dimensional sparse complex vector representing the amplitude and phase of each plane-wave wavefront, on the assumption that the signal picked up by a spherical microphone array is decomposed into S plane waves arriving from S directions assumed in advance; a target wavefront extraction unit that extracts, from the complex vector, a target vector representing the amplitude and phase of each wavefront of the target plane waves determined from a predetermined focus direction; a virtual microphone signal generation unit that generates the output signals of four virtual microphones on the basis of the target vector and the virtual three-dimensional positions of the four virtual microphones; and an ambisonic signal generation unit that generates four ambisonic signals on the basis of the output signals.

SELECTED DRAWING: Figure 1

Description

The present invention relates to sound field estimation technology, and more particularly to a sound field signal estimation device, a sound field signal estimation method, and a program for estimating a sound field signal for a playback device from signals picked up by a spherical microphone array.

In recent years, the number of channels and speakers used for audio playback has increased from 2 to 5.1 and further to 22.1 in order to heighten the sense of presence. Ambisonics is often used as a signal format common to such multichannel playback systems (Non-Patent Document 1).

A method using a spherical microphone array has been presented for obtaining an ambisonic signal from actually recorded signals (Non-Patent Document 2). In this method, a spherical microphone array is placed in the sound field and sound is picked up by the multiple microphones on the array. The resulting multichannel signal is converted into an ambisonic signal, which is then decoded by an ambisonic decoder and reproduced from multiple speakers.

Non-Patent Document 1: Ryuichi Nishimura, "Special Feature: Stereophonic Technology, Chapter 5: Ambisonics," Journal of the Institute of Image Information and Television Engineers, vol. 68, no. 8, pp. 616-620, 2014.
Non-Patent Document 2: S. Moreau, J. Daniel, and S. Bertet, "3D Sound Field Recording with Higher Order Ambisonics - Objective Measurements and Validation of a 4th Order Spherical Microphone Array," 120th AES Convention, May 2006.

A sound field usually consists of sound waves radiated from multiple sound sources. Conventional ambisonic signal generation methods treat the sound waves from every source equally when generating the ambisonic signal. In practice, however, the sources are not equally important to the listener. For example, when audio is combined with video, the object in focus in the video is an important sound source, whereas sound from other sources may matter less.

Accordingly, an object of the present invention is to provide a sound field signal estimation device, a sound field signal estimation method, and a program that can generate an ambisonic signal focused in a specified direction.

The sound field signal estimation device of the present invention includes a sparse wavefront decomposition unit, a target wavefront extraction unit, a virtual microphone signal generation unit, and an ambisonic signal generation unit.

The sparse wavefront decomposition unit, with S being an arbitrary natural number, calculates an S-dimensional sparse complex vector representing the amplitude and phase of each plane-wave wavefront, on the assumption that the signal picked up by the spherical microphone array is decomposed into S plane waves arriving from S directions assumed in advance. The target wavefront extraction unit extracts, from the complex vector, a target vector representing the amplitude and phase of each wavefront of the target plane waves determined from a predetermined focus direction. The virtual microphone signal generation unit generates the output signals of four virtual microphones on the basis of the target vector and the virtual three-dimensional positions of the four virtual microphones. The ambisonic signal generation unit generates four ambisonic signals on the basis of the output signals.

According to the sound field signal estimation device of the present invention, an ambisonic signal focused in a specified direction can be generated.

FIG. 1 is a block diagram showing the configuration of the sound field signal estimation device of Embodiment 1 and its modifications.
FIG. 2 is a flowchart showing the operation of the sound field signal estimation device of Embodiment 1 and its modifications.
FIG. 3 is a diagram showing the virtual three-dimensional positions of the four virtual microphones.
FIG. 4 is a block diagram showing the configuration of the sound field signal estimation device of Embodiment 2 and its modifications.
FIG. 5 is a flowchart showing the operation of the sound field signal estimation device of Embodiment 2 and its modifications.

Embodiments of the present invention are described in detail below. Components having the same function are given the same reference numeral, and duplicate descriptions are omitted.

The configuration of the sound field signal estimation device of this embodiment (Embodiment 1) is described below with reference to FIG. 1. As shown in the figure, the sound field signal estimation device 100 of this embodiment includes a short-time Fourier transform unit 110, a decomposition/extraction/conversion unit 120, and a short-time inverse Fourier transform unit 190. The decomposition/extraction/conversion unit 120 includes a sparse wavefront decomposition unit 121, a target wavefront extraction unit 123, a virtual microphone signal generation unit 126, and an ambisonic signal generation unit 128. The sound field signal estimation device 100 estimates an ambisonic signal for playback from the signals picked up by a spherical microphone array 901. A rigid-sphere array is normally used as the spherical microphone array 901. The operation of each component is described below with reference to FIG. 2.

≪Short-time Fourier transform unit 110≫
The short-time Fourier transform unit 110 transforms the signals picked up by the spherical microphone array 901 into the frequency domain (S110).

≪Decomposition/extraction/conversion unit 120≫
The decomposition/extraction/conversion unit 120 performs signal processing on the picked-up signals transformed into the frequency domain (S120). Step S120 is described in detail below.
<Sparse wavefront decomposition unit 121>
The sparse wavefront decomposition unit 121, with S being an arbitrary natural number, calculates an S-dimensional sparse complex vector representing the amplitude and phase of each plane-wave wavefront, on the assumption that the signal picked up by the spherical microphone array 901 is decomposed into S plane waves arriving from S directions assumed in advance (S121).

The specific processing performed by the sparse wavefront decomposition unit 121 is as follows. First, let r be the radius of the rigid spherical microphone array 901, and consider a plane wave of wave number k incident on the array from the direction Ω_s=(θ_s, φ_s). The wave number k is related to frequency by angular frequency = speed of sound × k. θ_s is the elevation angle and φ_s is the azimuth angle.

The sound pressure at the position Ω' on a sphere of radius r is expressed in terms of the following functions. j_l( ) is the spherical Bessel function of order l, and j'_l( ) denotes the derivative of j_l( ). h_l⁽¹⁾(kr) is the spherical Hankel function of the first kind of order l. P_l(cosΘ_{Ω_s,Ω'}) is the Legendre polynomial of degree l, and Θ_{Ω_s,Ω'} is the angle between the direction Ω_s and the direction Ω'.
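The displayed equation for the sound pressure is not preserved in this text. As a hedged reference only, one commonly used form of the pressure on a rigid sphere for a unit-amplitude plane wave, written with the functions defined above, is sketched below. The symbol b_l and the factors i^l and (2l+1) are assumptions taken from the standard textbook expression, not from the source; sign and phase conventions differ between references, and in practice the series is truncated at a finite order.

```latex
p(kr,\Omega_s,\Omega') = \sum_{l=0}^{\infty} i^{\,l}\,(2l+1)\, b_l(kr)\, P_l\!\left(\cos\Theta_{\Omega_s,\Omega'}\right),
\qquad
b_l(kr) = j_l(kr) - \frac{j'_l(kr)}{h_l^{(1)\prime}(kr)}\, h_l^{(1)}(kr)
```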

When the positions of the M microphones on the sphere are given by Ω'_m (1≦m≦M), the responses of the M microphones to a wave incident from the direction Ω_s can be written as an M-dimensional vector p(Ω_s).

Now assume that the incident wave consists of S plane waves arriving from S directions assumed in advance (S ranges from several hundred to several thousand). The M microphone signals p^_1(k) to p^_M(k) at wave number k are then related to the plane waves through an S-dimensional complex vector a(k), which holds the amplitude and phase of the wavefront of each of the S plane waves.

If this complex vector can be assumed to be sparse, that is, only a small fraction of its components are nonzero, then the vector a(k) of wavefront amplitudes and phases can be obtained even from a few tens of microphone signals.

The sparse wavefront decomposition unit 121 solves an optimization problem over a(k) to compute the sparse complex vector a(k) (S121). In this problem, ||a||₁ denotes the L1 norm of the vector a, and D(k) is called the dictionary matrix. A problem of this form is called a square-root LASSO. The parameter λ in the problem can be determined from D(k) by the method of Reference Non-Patent Document 1 (Reference Non-Patent Document 1: F. Bunea, J. Lederer, and Y. She, "The Group Square-Root Lasso: Theoretical Properties and Fast Algorithms," IEEE Transactions on Information Theory, vol. 60, no. 2, pp. 1313-1325, 2014).
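The following is a minimal sketch of the cost function behind step S121, assuming the problem takes the usual square-root LASSO form, that is, minimizing ||p^(k) - D(k)a||₂ + λ||a||₁ over a. It only evaluates the objective and is not a solver. The function names, the toy dictionary, and the value of λ are hypothetical and chosen for illustration.

```python
import math

def matvec(D, a):
    """Multiply an M x S matrix (list of rows) by a length-S vector."""
    return [sum(d * x for d, x in zip(row, a)) for row in D]

def sqrt_lasso_objective(p_hat, D, a, lam):
    """Return ||p_hat - D a||_2 + lam * ||a||_1 for complex-valued inputs."""
    residual = [p - q for p, q in zip(p_hat, matvec(D, a))]
    l2 = math.sqrt(sum(abs(r) ** 2 for r in residual))
    l1 = sum(abs(x) for x in a)
    return l2 + lam * l1

# Toy dictionary (made-up values): M = 2 microphones, S = 3 candidate directions.
D = [[1 + 0j, 0j, 1 + 0j],
     [0j, 1 + 0j, 1 + 0j]]
p_hat = [1 + 0j, 1 + 0j]

sparse = [0j, 0j, 1 + 0j]      # one active plane wave
dense = [1 + 0j, 1 + 0j, 0j]   # two active plane waves with the same fit

obj_sparse = sqrt_lasso_objective(p_hat, D, sparse, lam=0.5)
obj_dense = sqrt_lasso_objective(p_hat, D, dense, lam=0.5)
```

Both candidates reproduce the microphone signals exactly, so the L1 term alone decides between them and the sparser vector gets the lower cost.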

<Target wavefront extraction unit 123>
The target wavefront extraction unit 123 extracts, from the complex vector, a target vector representing the amplitude and phase of each wavefront of the target plane waves determined from a predetermined focus direction (S123).

The specific processing performed by the target wavefront extraction unit 123 is as follows. The unit extracts the target plane waves on the basis of a predetermined direction to focus on, Ω'' (the focus direction). Specifically, it extracts from the complex vector a(k) described above a target vector representing the amplitude and phase of each wavefront of the target plane waves. For example, from the S directions assumed in advance, the target wavefront extraction unit 123 extracts every direction whose angular difference from the focus direction Ω'' is δ or less. δ may be set to a value of, for example, 1° to 30°. Let S' be the number of extracted indices and b(1) to b(S') the indices themselves.
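The index selection in step S123 can be sketched as follows. This is an illustration only: it assumes each direction is an (elevation, azimuth) pair in radians with elevation measured up from the horizontal plane, and it measures the "difference" between two directions as the angle between their unit vectors. The function names and the example directions are hypothetical.

```python
import math

def unit_vector(elevation, azimuth):
    """Unit vector for (elevation, azimuth) in radians (assumed convention)."""
    return (math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation))

def angle_between(d1, d2):
    """Angle in radians between two directions."""
    u, v = unit_vector(*d1), unit_vector(*d2)
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error

def split_indices(directions, focus, delta_deg):
    """Return (target, non_target) index lists for a focus direction and threshold."""
    delta = math.radians(delta_deg)
    target, non_target = [], []
    for s, d in enumerate(directions):
        (target if angle_between(d, focus) <= delta else non_target).append(s)
    return target, non_target

# Four assumed directions at 0, 10, 90 and 80 degrees from the focus direction.
directions = [(0.0, 0.0), (0.0, math.radians(10)),
              (0.0, math.radians(90)), (math.radians(80), 0.0)]
target, non_target = split_indices(directions, focus=(0.0, 0.0), delta_deg=30)
```

Here `target` plays the role of the indices b(1) to b(S'), and `non_target` holds the remaining indices that Modification 1 uses.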

In this case, the microphone signals consisting of the plane waves in the extracted directions can be estimated as the sum of p(Ω_b(s')) a_b(s')(k) over s'=1, …, S'. Here, p(Ω_b(s')) is the response of the M microphones to a target plane wave, and a_b(s')(k) is the target vector.
<Virtual microphone signal generation unit 126>
The virtual microphone signal generation unit 126 generates the output signals of four virtual microphones on the basis of the target vector and the virtual three-dimensional positions of the four virtual microphones (S126).

The specific processing performed by the virtual microphone signal generation unit 126 is as follows. From the plane wave decomposition result a(k) and the indices b(s') above, that is, from the target vector a_b(s')(k), the unit obtains the picked-up signals (output signals) of the virtual microphones. As shown in FIG. 3, four virtual microphones (o, x, y, z) are used for ambisonic pickup, and the virtual three-dimensional position of each virtual microphone is given by the vectors r_o, r_x, r_y, r_z. The virtual microphone signal generation unit 126 generates the output signals p_o(k), p_x(k), p_y(k), p_z(k) of the virtual microphones on the basis of the target vector a_b(s')(k) and the virtual three-dimensional positions r_o, r_x, r_y, r_z of the four virtual microphones.
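The generating equation for step S126 is not preserved in this text. The sketch below shows one plausible reading, in which each target plane wave contributes its complex amplitude times a free-field phase factor exp(j k u·r) at the virtual microphone position r, where u is the unit vector of the arrival direction. The sign of the phase, the angle convention, the microphone spacing d, and all names are assumptions made for illustration.

```python
import cmath
import math

def arrival_unit_vector(elevation, azimuth):
    """Unit vector of an arrival direction (assumed angle convention)."""
    return (math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation))

def virtual_mic_signal(amplitudes, directions, position, k):
    """Sum the target plane waves at one virtual microphone position.

    Each wave contributes a_s * exp(j * k * u_s . r); the phase sign is an
    assumed convention, not taken from the source.
    """
    total = 0j
    for a, (elev, azim) in zip(amplitudes, directions):
        u = arrival_unit_vector(elev, azim)
        phase = k * sum(ui * ri for ui, ri in zip(u, position))
        total += a * cmath.exp(1j * phase)
    return total

d = 0.01  # assumed spacing of the virtual microphones from the origin
positions = {"o": (0.0, 0.0, 0.0), "x": (d, 0.0, 0.0),
             "y": (0.0, d, 0.0), "z": (0.0, 0.0, d)}

target_amplitudes = [1 + 0j]       # one target plane wave of unit amplitude
target_directions = [(0.0, 0.0)]   # arriving along the x axis
k = 1.0

p = {name: virtual_mic_signal(target_amplitudes, target_directions, r, k)
     for name, r in positions.items()}
```

With a single wave arriving along the x axis, only the microphone displaced along x sees a phase shift relative to the microphone at the origin.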

<Ambisonic signal generation unit 128>
The ambisonic signal generation unit 128 generates four ambisonic signals on the basis of the output signals (S128).

The specific processing performed by the ambisonic signal generation unit 128 is as follows. The unit obtains the zeroth-order and first-order ambisonic signals from the output signals p_o(k), p_x(k), p_y(k), p_z(k) of the virtual microphones. Specifically, it obtains the four ambisonic signals q_o(k), q_x(k), q_y(k), q_z(k) as
q_o(k)=p_o(k)
q_x(k)=p_x(k)-p_o(k)
q_y(k)=p_y(k)-p_o(k)
q_z(k)=p_z(k)-p_o(k)
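The four relations of step S128 translate directly into code. The sketch below applies them to one frequency bin; the function name and the input values are hypothetical.

```python
def first_order_ambisonics(p_o, p_x, p_y, p_z):
    """Return (q_o, q_x, q_y, q_z) from the four virtual microphone outputs."""
    return p_o, p_x - p_o, p_y - p_o, p_z - p_o

# Made-up virtual microphone outputs for a single wave number k.
q_o, q_x, q_y, q_z = first_order_ambisonics(1 + 0j, 1 + 0.01j, 1 + 0j, 1 - 0.02j)
```

The differences p_x - p_o, p_y - p_o, and p_z - p_o can be read as finite differences of the pressure along each axis, which is why they serve as the first-order components.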

≪Short-time inverse Fourier transform unit 190≫
The short-time inverse Fourier transform unit 190 transforms the processed frequency-domain signals into the time domain (S190).

[Modification 1]
A sound field signal estimation device 100A according to Modification 1 of Embodiment 1 is described below. The sound field signal estimation device 100 of Embodiment 1 extracts the plane waves in the focus direction and removes the plane waves in other directions. In contrast, the sound field signal estimation device 100A of this modification keeps the plane waves in other directions instead of removing them, and emphasizes the target-direction component. In the sound field signal estimation device 100A, the virtual microphone signal generation unit 126 of Embodiment 1 is replaced with a virtual microphone signal generation unit 126A.

The virtual microphone signal generation unit 126A generates the output signals of the four virtual microphones from two terms: a first term generated on the basis of the target vector and the virtual three-dimensional positions of the four virtual microphones, and a second term generated on the basis of the non-target vector (the part of the complex vector not extracted as the target vector), the virtual three-dimensional positions of the four virtual microphones, and a weighting coefficient smaller than 1 (S126A).

Specifically, let S'' be the number of indices not extracted in step S123, let b_n(1) to b_n(S'') be those indices, and let a_bn(s'')(k) be the non-target vector. Using a weighting coefficient α<1, the virtual microphone signal generation unit 126A generates the output signals p_o(k), p_x(k), p_y(k), p_z(k) of the virtual microphones as the sum of two terms.

The first term is based on the target vector a_b(s')(k) and the virtual three-dimensional positions r_o, r_x, r_y, r_z of the four virtual microphones. The second term is based on the non-target vector a_bn(s'')(k), the virtual three-dimensional positions r_o, r_x, r_y, r_z of the four virtual microphones, and the weighting coefficient α, which is smaller than 1. Executing step S128 on this four-channel signal yields an ambisonic signal in which the components from other directions are multiplied by α (<1).
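A minimal sketch of the two-term output of step S126A follows. It assumes that each plane wave contributes its amplitude times a phase factor exp(j k u·r) at the virtual microphone position, as in a free-field plane wave model; that phase convention, the angle convention, and all names and values are assumptions for illustration, since the displayed equation is not preserved in this text.

```python
import cmath
import math

def plane_wave_sum(amplitudes, directions, position, k):
    """Sum a_s * exp(j * k * u_s . r) over the given plane waves (assumed model)."""
    total = 0j
    for a, (elev, azim) in zip(amplitudes, directions):
        u = (math.cos(elev) * math.cos(azim),
             math.cos(elev) * math.sin(azim),
             math.sin(elev))
        total += a * cmath.exp(1j * k * sum(ui * ri for ui, ri in zip(u, position)))
    return total

def weighted_virtual_mic_signal(a, directions, target_idx, non_target_idx,
                                position, k, alpha):
    """First term: target waves. Second term: non-target waves scaled by alpha < 1."""
    if not alpha < 1:
        raise ValueError("alpha must be smaller than 1")
    first = plane_wave_sum([a[s] for s in target_idx],
                           [directions[s] for s in target_idx], position, k)
    second = plane_wave_sum([a[s] for s in non_target_idx],
                            [directions[s] for s in non_target_idx], position, k)
    return first + alpha * second

# Made-up decomposition result: one target wave and one non-target wave.
a = [1 + 0j, 2 + 0j]
directions = [(0.0, 0.0), (0.0, math.pi / 2)]
p_o = weighted_virtual_mic_signal(a, directions, [0], [1],
                                  position=(0.0, 0.0, 0.0), k=1.0, alpha=0.25)
```

At the origin every phase factor equals 1, so the output reduces to the target amplitude plus α times the non-target amplitude.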

[Modification 2]
A sound field signal estimation device 100B according to Modification 2 of Embodiment 1 is described below. In Embodiment 1, the signals picked up by the microphones of the spherical microphone array 901 are treated as a single vector at one point in time. This modification instead handles the picked-up signals as multiple vectors for multiple points in time, that is, it uses the picked-up signals at multiple times. Taking the picked-up signals at multiple times as input allows the plane wave decomposition of the sound field to be obtained more accurately. In the sound field signal estimation device 100B, the sparse wavefront decomposition unit 121 of Embodiment 1 is replaced with a sparse wavefront decomposition unit 121B.

On the basis of an optimization problem that includes a time parameter, the sparse wavefront decomposition unit 121B calculates S-dimensional sparse complex vectors for multiple times such that all of the signals picked up by the spherical microphone array at those times are reproduced (S121B).

Specifically, let p^(k,t) be the M-dimensional vector of the signals p^_1(k,t) to p^_M(k,t) picked up by the spherical microphone array 901 at time t, and suppose that T picked-up signal vectors p^(k,1)…p^(k,T) are given. Let a(k,t) be the S-dimensional complex vector with elements a_1(k,t) to a_S(k,t) at time t, and consider having the sparse wavefront decomposition unit 121B obtain
A(k)=[a(k,1)…a(k,T)]
When the multiple vectors are collected into matrices, this problem can be expressed as the following optimization problem, which includes the time parameter.
A(k)=argmin||[p^(k,1)…p^(k,T)]-D(k)A(k)||_F+λ||A(k)||_{1,2}
Here ||A||_F denotes the Frobenius norm of the matrix A, and ||A(k)||_{1,2} denotes the mixed norm of the matrix A(k),
||A(k)||_{1,2}=Σ_{s=1}^{S}||[a_s(k,1)…a_s(k,T)]||₁
This norm is the sum of the L1 norms of the row vectors of the matrix A(k).
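The two matrix norms in this problem can be sketched as below. The text defines the mixed norm as a sum of per-row L1 norms; the mixed norm conventionally used for group sparsity sums the L2 norm of each row instead, so the sketch computes both for comparison. The function names and the example matrix are hypothetical.

```python
import math

def frobenius_norm(A):
    """Square root of the sum of squared magnitudes of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in A for x in row))

def sum_of_row_norms(A, order):
    """Sum over rows of the L1 (order=1) or L2 (order=2) norm of each row."""
    if order == 1:
        return sum(sum(abs(x) for x in row) for row in A)
    if order == 2:
        return sum(math.sqrt(sum(abs(x) ** 2 for x in row)) for row in A)
    raise ValueError("order must be 1 or 2")

# Made-up A(k) with S = 2 directions (rows) and T = 2 time frames (columns).
A = [[3 + 0j, 4j],
     [0j, 0j]]

fro = frobenius_norm(A)             # Frobenius norm
row_l1 = sum_of_row_norms(A, 1)     # per-row L1 norms, as stated in the text
row_l2 = sum_of_row_norms(A, 2)     # per-row L2 norms, the usual group-sparsity form
```

Each row of A(k) holds the amplitudes of one direction across all T frames, so a row-wise penalty acts on each direction as a group over time.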

On the basis of A(k), obtained by solving the above optimization problem, which includes the time parameter and is expressed in matrix form, the sound field signal estimation device 100B executes steps S123, S126, and S128 at each time to obtain the ambisonic signal at each time.

Embodiment 1 showed a method of obtaining the zeroth-order and first-order ambisonic signals from the output signals of the spherical microphone array. Embodiment 2 obtains higher-order ambisonic signals of second order or above from the output signals of the spherical microphone array. The configuration of the sound field signal estimation device of this embodiment is described below with reference to FIG. 4. As shown in the figure, the sound field signal estimation device 200 of this embodiment includes the short-time Fourier transform unit 110, a decomposition/extraction/conversion unit 220, and the short-time inverse Fourier transform unit 190. The decomposition/extraction/conversion unit 220 includes the sparse wavefront decomposition unit 121, the target wavefront extraction unit 123, a virtual spherical microphone signal generation unit 127, and a higher-order ambisonic signal generation unit 129. In this embodiment, the virtual microphone signal generation unit 126 of Embodiment 1 is replaced with the virtual spherical microphone signal generation unit 127, and the ambisonic signal generation unit 128 of Embodiment 1 is replaced with the higher-order ambisonic signal generation unit 129.

The operation of the virtual spherical microphone signal generation unit 127 and the higher-order ambisonic signal generation unit 129, the components that differ from Embodiment 1, is described below with reference to FIG. 5.

<Virtual spherical microphone signal generation unit 127>
The virtual spherical microphone signal generation unit 127 generates the output signal of a virtual spherical microphone array on the basis of the target vector (S127). Alternatively, the virtual spherical microphone signal generation unit 127 generates the output signal of the virtual spherical microphone array from a first term generated on the basis of the target vector and a second term generated on the basis of the non-target vector (the part of the complex vector not extracted as the target vector) and a weighting coefficient smaller than 1 (S127).

As described above, in step S123 the plane wave decomposition result a(k) is divided into the indices b(1) to b(S') that fall within the focus direction (that is, the target vector a_b(s')(k)) and the indices b_n(1) to b_n(S'') that do not (that is, the non-target vector a_bn(s'')(k)). The virtual spherical microphone signal generation unit 127 estimates the spherical microphone signal consisting of the plane wave components in the focus direction as the sum of p(Ω_b(s')) a_b(s')(k) over s'=1, …, S'. It estimates the spherical microphone signal consisting of the out-of-focus plane wave components as the sum of p(Ω_bn(s'')) a_bn(s'')(k) over s''=1, …, S''.

The virtual spherical microphone signal generation unit 127 outputs, as the output signal of the virtual spherical microphone array, the focus-direction signal plus α times the out-of-focus signal (where α<1). Executing step S129, described below, on this output signal yields a higher-order ambisonic signal in which part of the out-of-focus component remains. Setting α=0 and executing step S129 yields a higher-order ambisonic signal focused in the specified direction.
<Higher-order ambisonic signal generation unit 129>
The higher-order ambisonic signal generation unit 129 transforms the output signal of the virtual spherical microphone array into the spherical harmonic domain and generates a higher-order ambisonic signal (S129).

The higher-order ambisonic signal generation unit 129 uses the method of Non-Patent Document 2 as is. Non-Patent Document 2 proposes a technique that generates a higher-order ambisonic signal by transforming the frequency-domain signals picked up by a spherical microphone array into the spherical harmonic domain and processing them there.

As in Modification 2 of Embodiment 1, Embodiment 2 can also use the picked-up signals at multiple times.

<Supplementary note>
The device of the present invention has, for example as a single hardware entity, an input unit to which a keyboard or the like can be connected, an output unit to which a liquid crystal display or the like can be connected, a communication unit to which a communication device (for example, a communication cable) capable of communicating outside the hardware entity can be connected, a CPU (Central Processing Unit, which may include cache memory, registers, and the like), RAM and ROM as memory, an external storage device such as a hard disk, and a bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged among them. If necessary, the hardware entity may also be provided with a device (drive) that can read from and write to a recording medium such as a CD-ROM. A general-purpose computer is one example of a physical entity having such hardware resources.

The external storage device of the hardware entity stores the program needed to realize the functions described above and the data needed for processing by that program. The program is not limited to the external storage device and may, for example, be stored in ROM, which is a read-only storage device. Data obtained through the processing of these programs is stored as appropriate in RAM, the external storage device, or the like.

In the hardware entity, each program stored in the external storage device (or ROM or the like) and the data needed for its processing are read into memory as needed and interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes the prescribed functions (the components referred to above as "... unit", "... means", and so on).

The present invention is not limited to the embodiments described above and can be modified as appropriate without departing from the spirit of the invention. The processes described in the above embodiments need not be executed in time series in the order described; they may also be executed in parallel or individually, depending on the processing capability of the apparatus that executes them or as needed.

As already described, when the processing functions of the hardware entity (the apparatus of the present invention) described in the above embodiments are realized by a computer, the processing content of the functions that the hardware entity should have is described by a program. Executing this program on the computer realizes the processing functions of the hardware entity on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape, or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), DVD-RAM (Random Access Memory), CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable) / RW (ReWritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) or the like as the semiconductor memory.

This program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. Alternatively, the program may be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer temporarily in its own storage device. When executing processing, the computer reads the program stored in its own recording medium and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it, or, each time a program is transferred from the server computer to this computer, it may successively execute processing according to the received program. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to this computer and the processing functions are realized only through execution instructions and retrieval of results. The program in this embodiment includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has properties that define the computer's processing).

In this embodiment, the hardware entity is configured by executing a predetermined program on a computer, but at least part of this processing content may instead be realized in hardware.

Claims

A sound field signal estimation device comprising:
a sparse wavefront decomposition unit that, where S is an arbitrary natural number, calculates an S-dimensional sparse complex vector representing the amplitude and phase of each wavefront of plane waves, on the assumption that a pickup signal of a spherical microphone array is decomposed into S plane waves arriving from S directions assumed in advance;
a target wavefront extraction unit that extracts, from the complex vector, a target vector representing the amplitude and phase of each wavefront of the target plane waves determined based on a predetermined focus direction;
a virtual microphone signal generation unit that generates output signals of four virtual microphones based on the target vector and virtual three-dimensional positions of the four virtual microphones; and
an ambisonic signal generation unit that generates four ambisonic signals based on the output signals.
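The signal flow of the claim above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation disclosed in the specification: the angular threshold used to select target wavefronts, the tetrahedral microphone layout, the phase convention exp(j k d.r), the conventional A-format to B-format sum/difference matrix used for the four first-order signals, and all function names are assumptions introduced here.

```python
import cmath
import math


def dot(u, v):
    """Dot product of two 3-D vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def extract_target(a, directions, focus, max_angle_rad):
    """Keep the entries of the sparse vector `a` whose assumed arrival
    direction lies within `max_angle_rad` of the focus direction and
    zero the rest (hypothetical selection rule)."""
    cos_thr = math.cos(max_angle_rad)
    return [a_s if dot(d, focus) >= cos_thr else 0j
            for a_s, d in zip(a, directions)]


def tetrahedral_positions(radius):
    """Assumed virtual layout: front-left-up, front-right-down,
    back-left-down, back-right-up, all at distance `radius`."""
    c = radius / math.sqrt(3.0)
    return [(c, c, c), (c, -c, -c), (-c, c, -c), (-c, -c, c)]


def virtual_mic_signals(a_target, directions, positions, k):
    """Superpose the target plane waves at each virtual position.
    Assumed model: a wave from unit direction d contributes
    a_s * exp(j * k * d.r) at position r, with wavenumber k."""
    return [sum(a_s * cmath.exp(1j * k * dot(d, r))
                for a_s, d in zip(a_target, directions))
            for r in positions]


def to_first_order(p):
    """Combine the four virtual outputs into four first-order signals
    (W, X, Y, Z) with the conventional sum/difference matrix; the
    conversion in the specification may differ."""
    flu, frd, bld, bru = p
    return (flu + frd + bld + bru,
            flu + frd - bld - bru,
            flu - frd + bld - bru,
            flu - frd - bld + bru)
```

With a single target wave arriving along the x axis, only the W and X outputs of this sketch are non-zero, which is the behavior one would expect from a first-order description focused in that direction.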

The sound field signal estimation device according to claim 1, wherein
the virtual microphone signal generation unit
generates the output signals of the four virtual microphones from a first term generated based on the target vector and the virtual three-dimensional positions of the four virtual microphones, and a second term generated based on an out-of-target vector, which consists of the components of the complex vector that were not extracted as the target vector, the virtual three-dimensional positions of the four virtual microphones, and a weighting factor smaller than 1.
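The two-term structure of this claim can be illustrated as follows. The sketch assumes the same plane-wave model exp(j k d.r) as a stand-in for whatever propagation model the specification uses; the boolean mask that marks target components and the names `plane_wave_sum` and `focused_mic_signals` are hypothetical.

```python
import cmath


def plane_wave_sum(a, directions, r, k):
    """Pressure at position r from plane waves with complex amplitudes
    `a` and unit arrival directions `directions` (assumed model)."""
    return sum(a_s * cmath.exp(1j * k * sum(di * ri for di, ri in zip(d, r)))
               for a_s, d in zip(a, directions))


def focused_mic_signals(a, target_mask, directions, positions, k, g):
    """First term: target components at full level.
    Second term: out-of-target components scaled by g < 1."""
    if g >= 1.0:
        raise ValueError("the weighting factor must be smaller than 1")
    a_target = [a_s if m else 0j for a_s, m in zip(a, target_mask)]
    a_other = [0j if m else a_s for a_s, m in zip(a, target_mask)]
    return [plane_wave_sum(a_target, directions, r, k)
            + g * plane_wave_sum(a_other, directions, r, k)
            for r in positions]
```

Setting g to 0 reduces this to the target-only signals, while values closer to 1 retain more of the sound arriving from outside the focus direction.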

A sound field signal estimation device comprising:
a sparse wavefront decomposition unit that, where S is an arbitrary natural number, calculates an S-dimensional sparse complex vector representing the amplitude and phase of each wavefront of plane waves, on the assumption that a pickup signal of a spherical microphone array is decomposed into S plane waves arriving from S directions assumed in advance;
a target wavefront extraction unit that extracts, from the complex vector, a target vector representing the amplitude and phase of each wavefront of the target plane waves determined based on a predetermined focus direction;
a virtual spherical microphone signal generation unit that generates output signals of a virtual spherical microphone array based on the target vector; and
a higher-order ambisonic signal generation unit that converts the output signals of the virtual spherical microphone array into the spherical harmonic domain and generates higher-order ambisonic signals.
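The conversion into the spherical harmonic domain named in this claim amounts to a discrete spherical harmonic transform of the virtual array outputs. The sketch below shows only that step, under simplifying assumptions that are not taken from the specification: six virtual sensors at the vertices of an octahedron with equal quadrature weights, orthonormal real spherical harmonics truncated at order 1 in ACN channel order, and no radial (mode-strength) equalization for the sphere. A higher-order version would need more sensors and higher-degree harmonics.

```python
import math

# Assumed virtual sensor directions: the six octahedron vertices.
NODES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Equal quadrature weights that sum to the sphere's surface measure 4*pi.
WEIGHT = 4.0 * math.pi / len(NODES)


def real_sh_order1(d):
    """Orthonormal real spherical harmonics up to order 1 at unit
    direction d, in ACN order: (0,0), (1,-1), (1,0), (1,1)."""
    x, y, z = d
    c0 = math.sqrt(1.0 / (4.0 * math.pi))
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]


def sh_transform(samples):
    """Project one complex sample per node onto the four harmonics
    by weighted summation (discrete spherical harmonic transform)."""
    coeffs = [0j, 0j, 0j, 0j]
    for p, node in zip(samples, NODES):
        for i, y_val in enumerate(real_sh_order1(node)):
            coeffs[i] += WEIGHT * p * y_val
    return coeffs
```

The octahedron is chosen because its equal-weight quadrature integrates products of two order-1 harmonics exactly, so a field built from these four harmonics is recovered without error.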

The sound field signal estimation device according to claim 3, wherein
the virtual spherical microphone signal generation unit
generates the output signals of the virtual spherical microphone array from a first term generated based on the target vector, and a second term generated based on an out-of-target vector, which consists of the components of the complex vector that were not extracted as the target vector, and a weighting factor smaller than 1.

The sound field signal estimation device according to any one of claims 1 to 4, wherein
the sparse wavefront decomposition unit
calculates S-dimensional sparse complex vectors for a plurality of times, based on an optimization problem that includes a time parameter, such that all of the pickup signals of the spherical microphone array at the plurality of times are reproduced.
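The claim does not fix the form of the optimization problem. One common way to couple several time frames, used here purely as an assumed stand-in and not taken from the specification, is to penalize a mixed l2,1 norm of the S-by-T coefficient matrix so that the same few directions stay active across all T frames. Solvers for such problems typically rely on row-wise group soft-thresholding, sketched below; the function name and the threshold `tau` are hypothetical.

```python
import math


def group_soft_threshold(rows, tau):
    """Row-wise group soft-thresholding (proximal operator of
    tau times the l2,1 norm). Each row holds the complex amplitudes
    of one assumed direction over several time frames. Rows whose
    l2 norm is at most tau are zeroed; the others are shrunk toward
    zero by tau while keeping their phase pattern."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(abs(v) ** 2 for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        out.append([scale * v for v in row])
    return out
```

Applied inside an iterative solver that also enforces agreement with the pickup signals at every frame, this step is what would switch off whole directions rather than isolated time-direction entries.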

A sound field signal estimation method executed by a sound field signal estimation device, the method comprising:
a step of calculating, where S is an arbitrary natural number, an S-dimensional sparse complex vector representing the amplitude and phase of each wavefront of plane waves, on the assumption that a pickup signal of a spherical microphone array is decomposed into S plane waves arriving from S directions assumed in advance;
a step of extracting, from the complex vector, a target vector representing the amplitude and phase of each wavefront of the target plane waves determined based on a predetermined focus direction;
a step of generating output signals of four virtual microphones based on the target vector and virtual three-dimensional positions of the four virtual microphones; and
a step of generating four ambisonic signals based on the output signals.

A sound field signal estimation method executed by a sound field signal estimation device, the method comprising:
a step of calculating, where S is an arbitrary natural number, an S-dimensional sparse complex vector representing the amplitude and phase of each wavefront of plane waves, on the assumption that a pickup signal of a spherical microphone array is decomposed into S plane waves arriving from S directions assumed in advance;
a step of extracting, from the complex vector, a target vector representing the amplitude and phase of each wavefront of the target plane waves determined based on a predetermined focus direction;
a step of generating output signals of a virtual spherical microphone array based on the target vector; and
a step of converting the output signals of the virtual spherical microphone array into the spherical harmonic domain and generating higher-order ambisonic signals.

A program for causing a computer to function as the sound field signal estimation device according to any one of claims 1 to 5.