JP6815956B2 - Filter coefficient calculator, its method, and program

Publication number: JP6815956B2
Application number: JP2017175898A
Authority: JP
Inventors: 江村　暁
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-09-13
Filing date: 2017-09-13
Publication date: 2021-01-20
Anticipated expiration: 2037-09-13
Also published as: JP2019054344A

Description

The present invention relates to a sound collecting device that uses a beamforming technique for forming a beam with a plurality of microphones, a filter coefficient calculation device that calculates the filter coefficients used in the sound collecting device, a method therefor, and a program.

In recent years there has been a growing need for technology that installs a plurality of microphones in a sound field to acquire multichannel microphone signals and extracts from them the target speech or sound (hereinafter also referred to as the target sound) clearly, while removing noise and other sounds (hereinafter also referred to as non-target sound) as far as possible. For this purpose, beamforming technology that forms a beam using a plurality of microphones has been actively researched and developed in recent years.

In beamforming technology, as shown in FIG. 1, a filter is applied in the filtering unit 92-n to each microphone signal y_n(t) picked up by the N microphones 91-n (where n=1,2,…,N). Here t is an index indicating time. Next, the addition unit 93 sums the output values of the filtering units 92-n. The resulting sum is output as the output signal z(t) of the sound collecting device. With this configuration, noise is greatly reduced and the target sound can be extracted more clearly. The minimum variance distortionless response method (MVDR method) is often used to obtain such a beamforming filter (see Non-Patent Document 1).

The MVDR method will be described with reference to FIG. 2. In the following, the response a(f) of each microphone to the target sound source at frequency f (the steering vector) is assumed to be known. Non-Patent Document 2 shows a method of estimating a(f) in relative form from the microphone signals, so this assumption is reasonable.

The N-channel microphone signals y_n(t) (1≦n≦N) from the microphone array are short-time Fourier transformed frame by frame in the short-time Fourier transform unit 107. The transform results Y_n(f,l) at frequency f and frame l are handled as a vector:
y(f,l)=[Y_1(f,l) Y_2(f,l) … Y_N(f,l)]^T (1)
This N-channel microphone signal y(f,l) is written as
y(f,l)=x(f,l)+v(f,l) (2)
that is, it consists of the multichannel signal x(f,l) of the direct wave of the target sound and the multichannel signal v(f,l) of the reflected and reverberant components and noise.

The correlation matrix calculation unit 101 calculates the spatial correlation matrix of the N-channel microphone signal y(f,l) at frequency f by
R(f,l)=E[y(f,l)y^H(f,l)] (3)
where E[ ] denotes taking the expected value, and y^H(f,l) is the vector obtained by transposing y(f,l) and taking its complex conjugate. In actual processing, a short-time average is usually used in place of E[ ].
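As an illustration of the short-time average that replaces E[ ] in Eq. (3), the following is a minimal NumPy sketch. The function name `spatial_correlation`, the array layout (channels by frames) and the random test data are assumptions made for this example and do not come from the patent.

```python
import numpy as np

def spatial_correlation(Y):
    """Sample estimate of R(f,l) = E[y y^H] at one frequency bin.

    Y is a complex array of shape (N, L): column l holds the N-channel
    vector y(f,l) of one frame. The expectation is replaced by an
    average over the L available frames (hypothetical helper).
    """
    n_frames = Y.shape[1]
    return (Y @ Y.conj().T) / n_frames

# Synthetic stand-in for L = 200 frames of a 4-channel signal.
rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200))
R = spatial_correlation(Y)
```

Because only a finite number of frames is averaged, the entries of `R` fluctuate around their true values, which is the estimation error discussed later in the description.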

The steering vector determination unit 102 determines the steering vector a(f) by a method such as that of Non-Patent Document 2. When a(f) estimated as in Non-Patent Document 2 is used, the sound of the target sound source as picked up by the reference microphone becomes the target sound.

The array filter estimation unit 104 solves the following constrained optimization problem to obtain the filter coefficient vector h(f,l) (that is, an N-dimensional complex vector).
h(f,l)=arg min h^H(f,l)R(f,l)h(f,l) (4)
Constraint
h^H(f,l)a(f)=1
This optimization problem finds the filter coefficient vector h(f,l) that minimizes the power of the value obtained when h(f,l) is applied to the N-channel microphone signal y(f,l), under the constraint that the target sound is output without distortion at frequency f.
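The constrained problem of Eq. (4) has the well-known closed-form solution h = R^-1 a / (a^H R^-1 a). The sketch below evaluates it with NumPy and compares the output power with that of a plain delay-and-sum filter. The helper name `mvdr_weights`, the small diagonal loading term, and the synthetic steering vector and data are assumptions of this example, not details given in the patent.

```python
import numpy as np

def mvdr_weights(R, a, diag_load=1e-6):
    """Closed-form MVDR filter h = R^-1 a / (a^H R^-1 a).

    diag_load adds a small multiple of the identity for numerical
    stability; it is an assumption of this sketch.
    """
    n = R.shape[0]
    R_loaded = R + diag_load * np.real(np.trace(R)) / n * np.eye(n)
    Ri_a = np.linalg.solve(R_loaded, a)
    return Ri_a / (a.conj() @ Ri_a)

rng = np.random.default_rng(1)
N, L = 4, 300
a = np.exp(-1j * np.pi * np.arange(N) * np.sin(0.4))       # assumed target steering vector
a_i = np.exp(-1j * np.pi * np.arange(N) * np.sin(-1.0))    # assumed interferer
s = rng.standard_normal(L) + 1j * rng.standard_normal(L)
u = 2.0 * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
noise = 0.1 * (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L)))
Y = np.outer(a, s) + np.outer(a_i, u) + noise

R = (Y @ Y.conj().T) / L              # short-time average in place of E[ ]
h = mvdr_weights(R, a)
h_das = a / np.vdot(a, a).real        # delay-and-sum filter, also satisfies h^H a = 1

power_mvdr = np.real(h.conj() @ R @ h)
power_das = np.real(h_das.conj() @ R @ h_das)
```

Both filters pass the target direction with unit gain; the MVDR filter additionally steers low gain toward the interferer, so `power_mvdr` does not exceed `power_das`.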

The array filtering unit 105 applies the estimated filter coefficient vector h(f,l) to the microphone signal y(f,l) by the following equation.
z(f,l)=h^H(f,l)y(f,l) (5)
In this way, components other than the target sound are suppressed as far as possible. The target sound is then extracted by applying the short-time inverse Fourier transform to the processing results at all frequencies in the short-time inverse Fourier transform unit 108.

Non-Patent Document 1: D. H. Johnson, D. E. Dudgeon, "Array Signal Processing", Prentice Hall, 1993.
Non-Patent Document 2: S. Araki, H. Sawada, and S. Makino, "Blind speech separation in a meeting situation with maximum SNR beamformer", in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP 2007), 2007, pp. 41-44.

For the target sound to be extracted clearly by the MVDR method described above, the correlation matrix must be estimated with high accuracy. However, the correlation matrix actually used in processing is calculated from a finite number of samples using a short-time average. As a result, the value of each element of the correlation matrix does not exactly match its true value and has a certain variation (variance). Statistically, this variance can be reduced by increasing the number of samples, but it cannot be made zero.

If the filter coefficient vector could be obtained from the true correlation matrix, an ideal sound collecting beam could be formed. In practice, however, the filter coefficient vector of the microphone array is obtained from a correlation matrix that contains estimation error. The sound collecting beam formed by these filter coefficients deviates from the ideal beam, and the suppression of noise and reverberation becomes weaker than it should be. The target sound therefore cannot be extracted as clearly as with the ideal beam. In this case, when the output signal of the filter is viewed on a spectrogram, reverberation and noise components are superimposed even where there is no target sound component.

An object of the present invention is therefore to provide a filter coefficient calculation device that obtains a filter coefficient vector for extracting the target sound from microphone signals more clearly than before, a sound collecting device that uses the obtained filter coefficient vector, a method therefor, and a program.

To solve the above problem, according to one aspect of the present invention, a filter coefficient calculation device, where N is any integer of 2 or greater, obtains a filter coefficient vector based on a steering vector for the sound source direction in a microphone array consisting of N microphones, under the constraint that the target sound is output without distortion, such that the values obtained by applying the filter coefficient vector to the frequency-domain N-channel microphone signal of the microphone array become sparse.

To solve the above problem, according to another aspect of the present invention, a filter coefficient calculation method, where N is any integer of 2 or greater, obtains a filter coefficient vector based on a steering vector for the sound source direction in a microphone array consisting of N microphones, under the constraint that the target sound is output without distortion, such that the values obtained by applying the filter coefficient vector to the frequency-domain N-channel microphone signal of the microphone array become sparse.

According to the present invention, the target sound can be extracted from microphone signals more clearly than before.

FIG. 1: Functional block diagram of a sound collecting device according to the prior art.
FIG. 2: Diagram for explaining the MVDR method according to the prior art.
FIG. 3: Functional block diagram of the sound collecting device according to the first, second and third embodiments.
FIG. 4: Diagram showing an example of the processing flow of the sound collecting device according to the first, second and third embodiments.
FIG. 5: Functional block diagram of the steering vector determination unit according to the second embodiment.
FIG. 6: Diagram showing an example of the processing flow of the steering vector determination unit according to the second embodiment.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and duplicate description is omitted. In the following description, processing performed on individual elements of a vector or matrix is applied to all elements of that vector or matrix unless otherwise specified.

<First Embodiment>
FIG. 3 shows a functional block diagram of the sound collecting device according to the first embodiment, and FIG. 4 shows its processing flow.

The following describes a sound collecting device that takes as its input the output signals y_n(t) of a microphone array consisting of N microphones 91-n (n=1,2,…,N) (hereinafter also referred to as N-channel microphone signals) and extracts the target sound from these N-channel microphone signals y_n(t). In this example, each microphone 91-n consists of an omnidirectional microphone element.

The sound collecting device 200 takes the N-channel microphone signals y_n(t) as input, extracts the target sound, and outputs it as the output signal z(t).

The sound collecting device 200 includes a short-time Fourier transform unit 107, a steering vector determination unit 102, an array filter estimation unit 204, an array filtering unit 105, and a short-time inverse Fourier transform unit 108.

The sound collecting device 200 is, for example, a special device configured by loading a special program into a known or dedicated computer having a central processing unit (CPU), a main storage device (RAM: Random Access Memory), and the like. The sound collecting device 200 executes each process under the control of the central processing unit, for example. Data input to the sound collecting device 200 and data obtained in each process are stored, for example, in the main storage device, and the data stored in the main storage device is read out to the central processing unit as needed and used for other processing. At least part of each processing unit of the sound collecting device 200 may be implemented in hardware such as an integrated circuit. Each storage unit of the sound collecting device 200 can be implemented, for example, by a main storage device such as RAM (Random Access Memory), by an auxiliary storage device composed of a hard disk, an optical disk, or a semiconductor memory element such as flash memory, or by middleware such as a relational database or a key-value store.

<Short-time Fourier transform unit 107>
The short-time Fourier transform unit 107 takes the N-channel time-domain microphone signals y_n(t) as input, short-time Fourier transforms them frame by frame (frame index l, lowercase L) into frequency-domain microphone signals Y_n(f,l) (S107), and outputs them. The transform results at frequency f and frame l are handled as a vector:
y(f,l)=[Y_1(f,l) Y_2(f,l) … Y_N(f,l)]^T (1)
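The framing and vectorization performed by the short-time Fourier transform unit can be sketched as follows. The frame length, hop size, Hann window and the output layout `(F, N, L)` are assumptions chosen for this example; the patent does not specify them.

```python
import numpy as np

def stft_multichannel(y, frame_len=512, hop=256):
    """Short-time Fourier transform of an N-channel signal.

    y has shape (N, T). The result has shape (F, N, L), so that
    result[f, :, l] is the vector y(f,l) of stacked Y_n(f,l).
    Window type and sizes are illustrative assumptions.
    """
    n_ch, n_samples = y.shape
    window = np.hanning(frame_len)
    n_frames = 1 + (n_samples - frame_len) // hop
    frames = np.stack(
        [y[:, i * hop:i * hop + frame_len] * window for i in range(n_frames)],
        axis=-1,
    )                                   # shape (N, frame_len, L)
    spec = np.fft.rfft(frames, axis=1)  # shape (N, F, L)
    return np.transpose(spec, (1, 0, 2))

rng = np.random.default_rng(2)
y_time = rng.standard_normal((3, 4096))   # 3 channels of synthetic audio
Y_fl = stft_multichannel(y_time)
```

With this layout, `Y_fl[f]` is an N-by-L matrix whose columns are the vectors y(f,l) that the later processing stages operate on one frequency at a time.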

<Steering vector determination unit 102>
The steering vector determination unit 102 obtains the steering vector a(f) (S102) and outputs it. Various known techniques can be used to determine the steering vector. For example, the steering vector determination unit 102 takes the N-channel frequency-domain microphone signal y(f,l) as input and obtains the steering vector a(f) by the method of Non-Patent Document 2.

<Array filter estimation unit 204>
The array filter estimation unit 204 takes the N-channel frequency-domain microphone signal y(f,l) and the steering vector a(f) as inputs and, based on the steering vector a(f), solves the following constrained optimization problem to obtain the filter coefficient vector h(f,l) (S204), which it outputs.
h(f,l)=arg min |[h^H(f,l)y(f,l-L+1) h^H(f,l)y(f,l-L+2) … h^H(f,l)y(f,l)]|_1
Constraint
h^H(f,l)a(f)=1
Here |・|_1 denotes the sum of the absolute values of the components of a vector, that is, the L1 norm. Conventional optimization has often used cost functions based on the L2 norm. In this embodiment, the cost function is optimized using the L1 norm instead of the L2 norm, which yields a sparse vector, that is, a vector containing many zeros. Optimizing a cost function using the L1 norm has become known in recent years in the field of compressed sensing (see Reference 1).
(Reference 1) Toshiyuki Tanaka, "Mathematics of Compressed Sensing", IEICE Fundamentals Review, vol. 4, no. 1, pp. 39-47, 2010

In the above optimization problem, for each frequency f, the filter coefficient vector h(f,l) is obtained under the constraint that the target sound (the sound of the target sound source as picked up by the reference microphone) is output without distortion, such that the values obtained by applying h(f,l) to L frames of the N-channel frequency-domain microphone signal y(f,l-L+1), y(f,l-L+2), …, y(f,l) (for example, h^H(f,l)y(f,l-L+1), h^H(f,l)y(f,l-L+2), …, h^H(f,l)y(f,l)) become sparse. Making the signal components sparse on the spectrogram makes it possible to suppress noise and reverberation components. The number of frames L is any integer of 1 or greater, and is preferably set to a value of around 100, for example. The filter coefficient vector h(f,l) may be estimated and output at any appropriate timing; for example, L frames of the N-channel frequency-domain microphone signal y(f,l) may be accumulated and the filter coefficient vector h(f,l) obtained every 0.5 to 1 second.
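The patent does not state which solver is used for the L1-norm problem. One possible approach, shown here purely as an assumption, is iteratively reweighted least squares (IRLS): each iteration solves a weighted MVDR-like problem whose weights are the reciprocals of the current smoothed output magnitudes. The function name `l1_distortionless_filter`, the smoothing constant `eps`, the diagonal loading and the synthetic data are all hypothetical.

```python
import numpy as np

def l1_distortionless_filter(Y, a, n_iter=30, eps=1e-6, diag_load=1e-9):
    """IRLS sketch for: minimise sum_l |h^H y_l| subject to h^H a = 1.

    Y has shape (N, L) (L frames at one frequency), a has shape (N,).
    IRLS is a swapped-in technique, not the solver named by the patent.
    """
    n_ch, n_frames = Y.shape
    w = np.ones(n_frames)                      # first pass equals plain MVDR
    for _ in range(n_iter):
        # Weighted correlation matrix sum_l w_l y_l y_l^H / L
        Rw = (Y * w) @ Y.conj().T / n_frames + diag_load * np.eye(n_ch)
        Ri_a = np.linalg.solve(Rw, a)
        h = Ri_a / (a.conj() @ Ri_a)           # enforces h^H a = 1
        z = h.conj() @ Y                       # outputs h^H y_l for all frames
        w = 1.0 / np.sqrt(np.abs(z) ** 2 + eps ** 2)
    return h

rng = np.random.default_rng(0)
N, L = 4, 200
a = np.exp(-1j * np.pi * np.arange(N) * np.sin(0.3))       # assumed target direction
a_int = np.exp(-1j * np.pi * np.arange(N) * np.sin(-0.9))  # assumed interferer
active = rng.random(L) < 0.2                               # target is sparse in time
s = active * (rng.standard_normal(L) + 1j * rng.standard_normal(L))
u = rng.standard_normal(L) + 1j * rng.standard_normal(L)
noise = 0.1 * (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L)))
Y = np.outer(a, s) + np.outer(a_int, u) + noise

def l1_cost(h):
    return np.abs(h.conj() @ Y).sum()

h_l2 = l1_distortionless_filter(Y, a, n_iter=1)    # L2 (MVDR-like) starting point
h_l1 = l1_distortionless_filter(Y, a, n_iter=30)   # after reweighting
```

Comparing `l1_cost(h_l1)` with `l1_cost(h_l2)` indicates how much sparser the output has become while the distortionless constraint is still met. A general-purpose convex solver could be used instead of IRLS.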

<Array filtering unit 105>
The array filtering unit 105 takes the N-channel frequency-domain microphone signal y(f,l) and the filter coefficient vector h(f,l) as inputs, applies the filter coefficient vector h(f,l) to the N-channel microphone signal y(f,l) as in the following equation to obtain the frequency-domain output signal z(f,l) (S105), and outputs it.
z(f,l)=h^H(f,l)y(f,l) (5)
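Eq. (5) can be applied to all frequency bins and frames at once. The sketch below assumes, for simplicity, one filter vector per frequency held fixed over the frames being processed, and an array layout `(F, N, L)` for the microphone spectra; both are assumptions of this example.

```python
import numpy as np

def apply_array_filter(H, Y):
    """Compute Z[f, l] = h(f)^H y(f,l) for every bin and frame.

    H has shape (F, N) (one filter vector per frequency),
    Y has shape (F, N, L). Names and layout are illustrative.
    """
    return np.einsum('fn,fnl->fl', H.conj(), Y)

rng = np.random.default_rng(3)
F, N, L = 5, 4, 10
A = np.exp(1j * rng.uniform(0, 2 * np.pi, (F, N)))                  # assumed steering vectors a(f)
S = rng.standard_normal((F, L)) + 1j * rng.standard_normal((F, L))  # target spectrum
Y = A[:, :, None] * S[:, None, :]                                   # target-only microphone signals
H = A / np.sum(np.abs(A) ** 2, axis=1, keepdims=True)               # satisfies h^H a = 1
Z = apply_array_filter(H, Y)
```

Because each `H[f]` satisfies the distortionless constraint, a target-only input is passed through unchanged, so `Z` reproduces `S`.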

<Short-time inverse Fourier transform unit 108>
The short-time inverse Fourier transform unit 108 takes the frequency-domain output signal z(f,l) as input, applies the short-time inverse Fourier transform to the processing results at all frequencies (S108), and obtains and outputs the time-domain output signal z(t).

<Effect>
With the above configuration, the target sound can be extracted from the N-channel microphone signal more clearly than before.

<Modification>
This embodiment has been described as a sound collecting device, but a filter coefficient calculation device consisting only of the array filter estimation unit can also obtain filter coefficients for extracting the target sound from microphone signals more clearly than before. The sound collecting device may also be configured to include only the array filter estimation unit and the array filtering unit. In that case it may receive the N-channel frequency-domain microphone signal y(f,l) and the steering vector a(f) from another device, obtain the frequency-domain output signal z(f,l), and output it to another device.

In this embodiment, the microphone signals y_n(t) of a microphone array consisting of N omnidirectional microphones are used as input, but the microphones need not be omnidirectional. The directivity of one microphone element may be the same as or different from that of another microphone element.

<Second Embodiment>
The description focuses on the parts that differ from the first embodiment.

FIG. 3 shows a functional block diagram of the sound collecting device according to the second embodiment, and FIG. 4 shows its processing flow.

The sound collecting device 300 includes a steering vector determination unit 302 in place of the steering vector determination unit 102.

<Steering vector determination unit 302>
FIG. 5 shows a functional block diagram of the steering vector determination unit 302, and FIG. 6 shows an example of its processing flow.

The steering vector determination unit 302 takes the N-channel frequency-domain microphone signal y(f,l) as input, obtains the steering vector a(f) (S302), and outputs it. In this embodiment, the steering vector determination unit 302 includes a noise and arrival-wave decomposition unit 3021 and a target direction determination unit 3022.

<Noise and arrival-wave decomposition unit 3021>
The noise and arrival-wave decomposition unit 3021 estimates, from the spatial correlation matrix R(f,l) of the N-channel microphone signal y(f,l), the intensities of a plurality of arriving waves at frequency f and the noise power of each microphone (S3021), and outputs them.

For example, the noise and arrival-wave decomposition unit 3021 takes the frequency-domain microphone signal y(f,l) as input and calculates its spatial correlation matrix R(f,l) from the microphone signal y(f,l) at frequency f and frame l, for example by the following equation.
R(f,l)=E[y(f,l)y(f,l)^H] (22)

Here E[ ] denotes taking the expected value, and y(f,l)^H is the vector obtained by transposing y(f,l) and taking its complex conjugate. In actual processing, a short-time average is usually used in place of E[ ].

The noise and arrival-wave decomposition unit 3021 then obtains, from the spatial correlation matrix R(f,l), the estimates p_k(f,l) of the intensity of the waves arriving from K directions and the estimates q_n(f,l) of the noise power contained in each microphone signal Y_n(f,l) (S3021), and outputs the diagonal matrix V(f,l) whose diagonal components are p_k(f,l) and q_n(f,l). Here k is the index of the arrival direction, K directions are assumed as the possible arrival directions of plane waves, and k=1,2,…,K. The diagonal matrix V(f,l) is therefore expressed as
V(f,l)=diag(p_1(f,l), …, p_K(f,l), q_1(f,l), …, q_N(f,l))
where K>N. The method of Reference 2, for example, can be used to estimate the intensities p_k(f,l) and the noise powers q_n(f,l).
(Reference 2) P. Stoica, P. Babu, and J. Li, "SPICE: A sparse covariance-based estimation method for array processing", IEEE Transactions on Signal Processing, vol. 59, no. 2, 2011, pp. 629-638.

In this method, K directions (K>N) are assumed in advance as the possible arrival directions of plane waves. At frequency f, when a plane wave of amplitude 1 reaches the microphone array from the k-th direction, the responses (output signals) of the microphones are written a_k(f)=[a_{k,1}(f) a_{k,2}(f) … a_{k,N}(f)]^T. a_k(f) is also called the steering vector for the k-th direction. a_{k,n}(f) represents the response (output signal) of the n-th microphone, at frequency f, to a plane wave of amplitude 1 arriving from the k-th direction. a_k(f) is obtained in advance, prior to sound collection; it may be obtained by experiment (actual measurement) or simulation, or a theoretical value obtained by calculation may be used. Using the matrix
A(f)^H=[a_1(f) a_2(f) … a_K(f) I_N] (23)
composed of the K response vectors a_k(f) and the N×N identity matrix I_N, Reference 2 decomposes the matrix R(f,l) in the form
R(f,l)=A(f)^H V(f,l) A(f) (24)
that is, into the product of the matrix A(f)^H, the diagonal matrix V(f,l), and the matrix A(f). This decomposition yields the estimate p_k(f,l) of the intensity of the plane wave from the k-th direction and the estimate q_n(f,l) of the noise power of the n-th microphone 91-n, both contained in the diagonal matrix V(f,l). In practice, the above decomposition corresponds to finding the diagonal matrix V(f,l) that minimizes
||(A(f)^H V(f,l) A(f))^{-1/2} (R(f,l)-A(f)^H V(f,l) A(f)) R(f,l)^{-1/2}||^2 (25)
In Eq. (25), ||x|| denotes the Frobenius norm of the matrix x.
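The structure of Eqs. (23) to (25) can be illustrated with a small forward model. The sketch below builds A(f)^H for a uniform linear array, composes R = A^H V A from given intensities and noise powers, and evaluates the fitting criterion of Eq. (25). It does not implement the SPICE iterations of Reference 2. The linear-array geometry, microphone spacing, speed of sound and all function names are assumptions for illustration only.

```python
import numpy as np

def steering_dictionary(freq_hz, n_mics, angles_rad, spacing_m=0.05, c=343.0):
    """Plane-wave responses a_k(f) of an assumed uniform linear array, shape (N, K)."""
    n = np.arange(n_mics)[:, None]
    delays = spacing_m * n * np.sin(angles_rad)[None, :] / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def model_covariance(steer, p, q):
    """R = A^H V A with A^H = [a_1 ... a_K  I_N] and V = diag(p, q)."""
    AH = np.hstack([steer, np.eye(steer.shape[0])])   # N x (K+N)
    V = np.diag(np.concatenate([p, q]))
    return AH @ V @ AH.conj().T

def inv_sqrt(M):
    """Inverse square root of a Hermitian positive definite matrix."""
    w, U = np.linalg.eigh(M)
    return (U / np.sqrt(w)) @ U.conj().T

def spice_cost(R, R_model):
    """Criterion of Eq. (25): ||R_model^-1/2 (R - R_model) R^-1/2||_F^2."""
    E = inv_sqrt(R_model) @ (R - R_model) @ inv_sqrt(R)
    return np.linalg.norm(E, 'fro') ** 2

N, K = 4, 9                                   # K > N candidate directions
angles = np.linspace(-np.pi / 2, np.pi / 2, K)
steer = steering_dictionary(300.0, N, angles)
p = np.zeros(K)
p[6] = 2.0                                    # one active direction
q = np.full(N, 0.1)                           # per-microphone noise power
R_model = model_covariance(steer, p, q)
R_other = model_covariance(steer, np.roll(p, 2), q)
```

When the measured matrix equals the model, `spice_cost` is zero; a model with the active direction in the wrong place (`R_other`) gives a positive cost, which is what the minimization over V exploits.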

<Target direction determination unit 3022>
The target direction determination unit 3022 obtains the estimate k_t of the arrival direction of the target sound (S3022) and outputs the steering vector a(f)=a_{k_t}(f) corresponding to the estimate k_t. For example, the target direction determination unit 3022 takes the diagonal matrix V(f,l) as input and, using the intensity estimates p_k(f,l) for each arrival direction k contained in V(f,l), determines a direction whose intensity is greater than a predetermined value to be the arrival direction of the target sound, and obtains the determination result (the estimate of the arrival direction) k_t. In this example, the target direction determination unit 3022 obtains the estimate k_t using the intensity estimates p_k(f,l) in the band of 100 to 500 Hz, where speech power is concentrated. The intensity of each arrival direction k in this band is denoted b(k,l) and is computed from p_k(f,l) over the frequencies from f_0 to f_1, where f_0 corresponds to 100 Hz and f_1 to 500 Hz. The directions k_peak at which b(k,l) takes a peak are the candidates for the arrival direction of the target sound in frame l (hereinafter also referred to as the target sound direction). Let b_max be the maximum value of b(k,l); the positions of the peaks whose peak value is b_max×α or greater are extracted as the target sound directions k_t. There may be one target sound direction k_t or several. As α, for example, a value in the range of -12 dB to -6 dB may be used. The unit outputs the steering vector a(f)=a_{k_t}(f) corresponding to the estimate k_t of the arrival direction of the target sound, that is, the k_t-th column of the matrix A(f)^H=[a_1(f) a_2(f) … a_K(f) I_N].
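The peak-picking rule described for the target direction determination unit can be sketched as follows. Several details are assumptions of this example rather than statements of the patent: b(k,l) is taken as the sum of p_k(f,l) over the band, α is interpreted as a power ratio (10^(dB/10)), the direction grid is treated as a non-circular sequence, and the function name and test data are hypothetical.

```python
import numpy as np

def target_directions(P, freqs, f0=100.0, f1=500.0, alpha_db=-6.0):
    """Pick target direction indices k_t for one frame.

    P has shape (F, K): intensity estimates p_k(f,l) per frequency and
    direction. freqs has shape (F,) in Hz. Returns (indices, b).
    """
    band = (freqs >= f0) & (freqs <= f1)
    b = P[band].sum(axis=0)                  # assumed: band sum of p_k(f,l)
    left = np.r_[-np.inf, b[:-1]]
    right = np.r_[b[1:], -np.inf]
    is_peak = (b > left) & (b >= right)      # local maxima along the direction grid
    alpha = 10.0 ** (alpha_db / 10.0)        # assumed: dB value is a power ratio
    keep = is_peak & (b >= b.max() * alpha)
    return np.flatnonzero(keep), b

freqs = np.arange(0.0, 1001.0, 50.0)         # 21 frequency bins, 0 to 1000 Hz
K = 12
P = np.full((freqs.size, K), 0.01)           # weak background in every direction
P[:, 3] += 1.0                               # strong source
P[:, 8] += 0.5                               # second source within 6 dB of the first
P[:, 10] += 0.05                             # weak peak, below the threshold
k_t, b = target_directions(P, freqs)
```

In this synthetic frame, directions 3 and 8 are both retained, which corresponds to the case where several target sound directions k_t are found and one filter coefficient vector is estimated per direction.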

The array filter estimation unit 204 performs the same processing as in the first embodiment. When there are a plurality of target sound directions k_t, however, a filter coefficient vector h_{k_t}(f,l) is obtained for each target sound direction k_t.

<Effect>
With this configuration, the same effect as in the first embodiment can be obtained. In this embodiment a spatial correlation matrix is calculated and used, but it is used to estimate the arrival direction of the target sound, not in the optimization problem that determines the filter coefficient vector. Consequently, the problem in which the filter coefficient vector of the microphone array is obtained from a correlation matrix containing estimation error, the beam deviates from the ideal sound collecting beam, and the suppression of noise and reverberation becomes weaker than it should be does not arise.

<Third Embodiment>
The description focuses on the parts that differ from the first embodiment.

The array filter estimation unit 204 of the first embodiment may estimate a filter coefficient vector h(f,l) with a large norm. Such a filter coefficient vector h(f,l) is highly sensitive to fluctuations in the characteristics of the target sound and the noise, so the noise suppression performance and the non-target sound suppression performance may deteriorate for actual signals.

To prevent this deterioration, a mechanism is introduced that keeps the norm of the filter coefficient vector h(f,l) at or below a fixed amount.
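As a brief illustration of why the norm matters, consider sensor noise n that is uncorrelated across the N channels with equal variance σ² per channel. This noise model and the symbols n and σ² are assumptions introduced here for illustration and do not appear in the source. Under that assumption, the noise power at the array output grows with the squared norm of the filter:

```latex
\mathrm{E}\left[\,\lvert h^{H}(f,l)\,n \rvert^{2}\right]
  = h^{H}(f,l)\,\mathrm{E}\left[n\,n^{H}\right]h(f,l)
  = \sigma^{2}\,\lVert h(f,l) \rVert^{2}
```

Bounding the norm of h(f,l) therefore bounds how much such uncorrelated noise, and by the same reasoning small channel mismatches, can be amplified.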

FIG. 3 shows a functional block diagram of the sound collecting device according to the third embodiment, and FIG. 4 shows its processing flow.

The sound collecting device 400 includes an array filter estimation unit 404 in place of the array filter estimation unit 204.

<Array filter estimation unit 404>
The array filter estimation unit 404 receives the N-channel frequency-domain microphone signal y(f,l) and the steering vector a(f) as inputs, solves the following constrained optimization problem based on the steering vector a(f) to obtain the filter coefficient vector h(f,l) (S404), and outputs it.

Constraint
h^H(f,l)a(f) = 1

Here, ||h(f,l)|| denotes the L2 norm of the vector h(f,l), and λ is a parameter specified in advance, set to a value of around 10. In other words, for each frequency f, the above optimization problem obtains the filter coefficient vector h(f,l) so that the values obtained by applying h(f,l) to the N-channel frequency-domain microphone signals y(f,l-L+1), y(f,l-L+2), …, y(f,l) of L frames become sparse, under the constraints that the target sound is output without distortion and that the norm of the filter coefficient vector h(f,l) is no greater than a predetermined value.
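One way to write the problem described in this paragraph is given below. It is a reading assembled from the surrounding text and the claims (L1-norm cost over the L output values, distortionless constraint, bounded filter norm), not a reproduction of the original equations. In particular, writing the norm bound as ||h(f,l)|| ≤ λ, rather than as a bound on the squared norm, is an assumption.

```latex
\min_{h(f,l)} \;
  \left\lvert \left[\, h^{H}(f,l)\,y(f,l-L+1),\;
                      h^{H}(f,l)\,y(f,l-L+2),\; \ldots,\;
                      h^{H}(f,l)\,y(f,l) \,\right] \right\rvert_{1}
\quad \text{subject to} \quad
h^{H}(f,l)\,a(f) = 1, \qquad \lVert h(f,l) \rVert \le \lambda
```

Dropping the second constraint gives the corresponding problem without a norm bound, as recited in the claims that impose only the distortionless constraint.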

<Effect>
With such a configuration, the same effect as that of the first embodiment can be obtained. In addition, estimation of a filter coefficient vector h(f,l) with a large norm is prevented, which suppresses deterioration of the noise suppression performance and the non-target sound suppression performance. This embodiment may be combined with the second embodiment.
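The norm-bounded estimation performed by the array filter estimation unit 404 can be sketched numerically as follows. The source does not say which solver is used; this sketch swaps in iteratively reweighted least squares (IRLS) for the L1 cost and handles the norm bound in each weighted step by diagonal loading with bisection. The function names, the small ridge term, the iteration counts, and the reading of the bound as ||h|| ≤ λ are all assumptions made for illustration.

```python
import numpy as np

def capon_with_norm_bound(R, a, lam, n_bisect=50):
    """Minimize h^H R h subject to h^H a = 1 and ||h|| <= lam (diagonal loading)."""
    n = a.shape[0]
    if lam * np.linalg.norm(a) <= 1.0:
        raise ValueError("lam must exceed 1/||a||, otherwise no filter is feasible")

    def solve(mu):
        # Distortionless solution for the loaded matrix R + mu * I.
        x = np.linalg.solve(R + mu * np.eye(n), a)
        return x / np.vdot(a, x)           # np.vdot(a, x) computes a^H x

    h = solve(0.0)
    if np.linalg.norm(h) <= lam:
        return h                           # the norm bound is inactive
    lo, hi = 0.0, max(np.trace(R).real / n, 1e-12)
    while np.linalg.norm(solve(hi)) > lam: # the norm shrinks as the loading grows
        hi *= 2.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(solve(mid)) > lam:
            lo = mid
        else:
            hi = mid
    return solve(hi)                       # hi always satisfies the norm bound

def sparse_output_filter(Y, a, lam, n_irls=30, eps=1e-6):
    """IRLS sketch for min sum_i |h^H y_i| s.t. h^H a = 1, ||h|| <= lam.

    Y : (N, L) complex array whose columns are y(f, l-L+1), ..., y(f, l)
    a : (N,) complex steering vector a(f)
    """
    n = Y.shape[0]
    h = a / np.vdot(a, a)                  # start from the minimum-norm distortionless filter
    for _ in range(n_irls):
        z = h.conj() @ Y                   # array outputs h^H y_i for the L frames
        w = 1.0 / np.maximum(np.abs(z), eps)   # IRLS weights approximating the L1 cost
        R = (Y * w) @ Y.conj().T           # weighted sum of y_i y_i^H
        R += 1e-9 * np.trace(R).real / n * np.eye(n)  # tiny ridge (assumption) for L < N
        h = capon_with_norm_bound(R, a, lam)
    return h
```

Each IRLS step is a weighted minimum-variance problem, so the sketch reduces to a sequence of distortionless filters whose loading is raised only when the norm bound would otherwise be violated. A convex-optimization solver could be used instead; the choice here merely keeps the example dependent on NumPy alone.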

<Other modifications>
The present invention is not limited to the above embodiments and modifications. For example, the various processes described above may be executed not only in chronological order as described, but also in parallel or individually, depending on the processing capacity of the device executing them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The various processing functions of each device described in the above embodiments and modifications may be realized by a computer. In that case, the processing content of the functions that each device should have is described by a program. By executing this program on a computer, the various processing functions of each device are realized on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example, a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage unit. When executing the processing, the computer reads the program stored in its own storage unit and executes processing according to the program it has read. As another form of execution, the computer may read the program directly from the portable recording medium and execute processing according to it. Alternatively, each time the program is transferred from the server computer to the computer, the computer may successively execute processing according to the received program. The above processing may also be executed by a so-called ASP (Application Service Provider) type service, in which the program is not transferred from the server computer to the computer and the processing functions are realized only through execution instructions and acquisition of results. The program here includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the processing of the computer).

Although each device is configured by executing a predetermined program on a computer, at least part of the processing content may instead be realized in hardware.

Claims

A filter coefficient calculation device comprising an array filter estimation unit that, where N is any integer of 2 or more, obtains a filter coefficient vector based on a steering vector for a sound source direction in a microphone array consisting of N microphones, under a constraint that a target sound is output without distortion, using as a cost function the L1 norm of a vector of frequency-domain microphone array output signals, so that the vector of microphone array output signals obtained by applying the filter coefficient vector to the frequency-domain N-channel microphone signals of the microphone array becomes sparse,
wherein the array filter estimation unit, where f is a frequency index, l is a frame index, L is any integer of 1 or more, y(f,l) is the frequency-domain N-channel microphone signal, h(f,l) is the filter coefficient vector, h^H(f,l) is the conjugate transpose of h(f,l), a(f) is the steering vector, and |A|₁ is the L1 norm of a vector A,
under the constraint
h^H(f,l)a(f) = 1,
finds

to obtain the filter coefficient vector h(f,l).

The filter coefficient calculation device according to claim 1,
wherein the constraint includes the condition that the target sound is output without distortion and the norm of the filter coefficient vector is less than or equal to a predetermined value.

The filter coefficient calculation device according to claim 2,
wherein the array filter estimation unit, where f is a frequency index, l is a frame index, L is any integer of 1 or more, y(f,l) is the frequency-domain N-channel microphone signal, h(f,l) is the filter coefficient vector, h^H(f,l) is the conjugate transpose of h(f,l), a(f) is the steering vector, |A|₁ is the L1 norm of a vector A, ||A|| is the L2 norm of the vector A, and λ is a predetermined parameter,
under the constraints
h^H(f,l)a(f) = 1
and

finds

to obtain the filter coefficient vector h(f,l).

A filter coefficient calculation method comprising an array filter estimation step of, where N is any integer of 2 or more, obtaining a filter coefficient vector based on a steering vector for a sound source direction in a microphone array consisting of N microphones, under a constraint that a target sound is output without distortion, using as a cost function the L1 norm of a vector of frequency-domain microphone array output signals, so that the vector of microphone array output signals obtained by applying the filter coefficient vector to the frequency-domain N-channel microphone signals of the microphone array becomes sparse,
wherein, in the array filter estimation step, where f is a frequency index, l is a frame index, L is any integer of 1 or more, y(f,l) is the frequency-domain N-channel microphone signal, h(f,l) is the filter coefficient vector, h^H(f,l) is the conjugate transpose of h(f,l), a(f) is the steering vector, and |A|₁ is the L1 norm of a vector A,
under the constraint
h^H(f,l)a(f) = 1,

is found to obtain the filter coefficient vector h(f,l).

A program for causing a computer to function as the filter coefficient calculation device according to any one of claims 1 to 3.