JP4444345B2

JP4444345B2 - Sound source separation system

Info

Publication number: JP4444345B2
Application number: JP2008133175A
Authority: JP
Inventors: 弘史中島; 一博中臺; 雄二長谷川; 広司辻野
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2007-06-08
Filing date: 2008-05-21
Publication date: 2010-03-31
Anticipated expiration: 2028-05-21
Also published as: DE602008000475D1; JP2008306712A

Abstract

PROBLEM TO BE SOLVED: To provide a system capable of separating sound source signals with high precision while improving the convergence rate and convergence precision.

SOLUTION: A process of updating the current separation matrix $W_k$ to the next separation matrix $W_{k+1}$, such that the next value $J(W_{k+1})$ of a cost function is closer to the minimum value $J(W_0)$ than the current value $J(W_k)$, is repeated. The update amount $\Delta W_k$ of the separation matrix is increased as the current value $J(W_k)$ of the cost function increases, and is decreased as the current gradient $\partial J(W_k)/\partial W$ of the cost function becomes steeper. On the basis of the input signals $x$ from a plurality of microphones $M_i$ and the optimal separation matrix $W_0$, the sound source signals $y\ (= W_0 x)$ can be separated with high precision while improving the convergence rate and convergence precision.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a sound source separation system.

Blind source separation (BSS) has been proposed as a technique for separating sound sources without information about the transfer system, for example by separation methods based on inverse filters (see Non-Patent Documents 1 to 4). Known BSS methods include sound source separation based on decorrelation (DSS: Decorrelation-based Source Separation), independent component analysis (ICA: Independent Component Analysis) and higher-order decorrelation (HDSS: Higher-order DSS), as well as variants that add geometric information to each of these methods (GSS: Geometric-constrained Source Separation, GICA: Geometric-constrained ICA, GHDSS: Geometric-constrained HDSS). An outline of BSS is given below.

Let the frequency characteristics of the M sound source signals be $s(\omega) = [s_1(\omega), s_2(\omega), \ldots, s_N(\omega)]^T$ ("T" denotes transposition). The characteristics of the input signals at the N (≦ M) microphones, $x(\omega) = [x_1(\omega), x_2(\omega), \ldots, x_N(\omega)]^T$, are then expressed by Equation (1) using the transfer function matrix $H(\omega)$. The element $H_{ij}$ of the transfer function matrix $H(\omega)$ represents the transfer function from sound source i to microphone j.

$$x(\omega) = H(\omega)\,s(\omega) \tag{1}$$

The sound source separation problem is expressed by Equation (2) using a separation matrix $W(\omega)$.

$$y(\omega) = W(\omega)\,x(\omega) \tag{2}$$
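As a quick numerical check of Equations (1) and (2), the sketch below builds one frequency bin of the mixing model and separates it in the idealized case where $H(\omega)$ is known, using the pseudo-inverse of $H$ as $W$. The sizes (two sources, two microphones), the random values and all variable names are assumptions made for illustration; they do not come from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 sources, 2 microphones, a single frequency bin.
n_src, n_mic = 2, 2

# Random complex transfer-function matrix H(omega) and source spectrum s(omega).
H = rng.normal(size=(n_mic, n_src)) + 1j * rng.normal(size=(n_mic, n_src))
s = rng.normal(size=n_src) + 1j * rng.normal(size=n_src)

x = H @ s                # Equation (1): signals observed at the microphones
W = np.linalg.pinv(H)    # separation matrix for the known-H case
y = W @ x                # Equation (2): separated output

recovered = np.allclose(y, s)
print("sources recovered:", recovered)
```

In practice $H(\omega)$ is not available, which is why the BSS methods described in this document estimate $W(\omega)$ from the observations alone.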

The sound source separation process is formulated as finding a separation matrix $W(\omega)$ such that $y(\omega) = s(\omega)$. If the transfer function matrix $H(\omega)$ is known, the separation matrix $W(\omega)$ is calculated using the pseudo-inverse matrix $H^{+}(\omega)$. In practice, however, $H(\omega)$ is rarely known. BSS is a technique for obtaining $W(\omega)$ when $H(\omega)$ is unknown.

1. BSS (offline processing)

The general BSS approach is described by Equation (3) as a process of finding the separation matrix whose output $y$ minimizes a cost function $J(y)$ that evaluates the degree of separation.

$$W_{BSS} = \arg\min_W [J(y)] = \arg\min_W [J(Wx)] \tag{3}$$

The cost function $J(y)$ differs from method to method. In DSS it is calculated according to Equation (4) from the correlation matrix $R_{yy} = E[yy^H]$ of $y$, using the squared Frobenius norm (the sum of the squared absolute values of all elements of the matrix).

$$J_{DSS}(W) = \| R_{yy} - \mathrm{Diag}[R_{yy}] \|^2 \tag{4}$$
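The DSS cost of Equation (4) can be sketched in a few lines. This is an illustrative sketch only: the expectation $E[\cdot]$ is approximated by an average over frames, and the mixing matrix `A`, the frame count and the function name `dss_cost` are hypothetical choices rather than anything specified in the patent.

```python
import numpy as np

def dss_cost(W, X):
    """Equation (4): squared Frobenius norm of the off-diagonal part of R_yy.

    W : (n_out, n_mic) separation matrix
    X : (n_mic, n_frames) input spectra at one frequency bin; E[.] is
        approximated by the average over frames (an assumption).
    """
    Y = W @ X
    R_yy = (Y @ Y.conj().T) / X.shape[1]
    E = R_yy - np.diag(np.diag(R_yy))      # keep only cross-correlations
    return float(np.sum(np.abs(E) ** 2))

rng = np.random.default_rng(0)
S = rng.normal(size=(2, 5000))             # two independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])     # hypothetical mixing matrix
X = A @ S

cost_mixed = dss_cost(np.eye(2), X)             # no separation applied
cost_unmixed = dss_cost(np.linalg.inv(A), X)    # ideal separation
print(cost_mixed, cost_unmixed)
```

The cost is large while the outputs are still correlated and falls close to zero once the outputs are decorrelated, which is what makes it usable as a separation measure.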

In ICA using the Kullback-Leibler (K-L) divergence, the cost function $J$ is calculated according to Equation (5) from the joint PDF (probability density function) $p(y)$ of $y$ and the product of the marginal PDFs of $y$, $q(y) = \prod_k p(y_k)$ (see Non-Patent Document 5).

$$J_{ICA}(W) = \int dy\; p(y)\,\log\frac{p(y)}{q(y)} \tag{5}$$

The $W$ satisfying Equation (3) is determined iteratively by the gradient method of Equation (6), using a matrix $J'(W_k)$ that represents the direction in which $J(W)$ has the steepest gradient in the neighborhood of $W_k$ (k is the iteration count) and a step-size parameter $\mu$.

$$W_{k+1} = W_k - \mu J'(W_k) \tag{6}$$

The matrix $J'(W_k)$ is calculated by a complex gradient method or the like (see Non-Patent Document 6). For DSS, the matrix $J'(W)$ is given by Equation (7).

$$J'_{DSSoff}(W) = 2\,[R_{yy} - \mathrm{Diag}[R_{yy}]]\,W R_{xx} \tag{7}$$
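Equations (6) and (7) together describe a fixed-step gradient descent. The following minimal sketch runs it on synthetic data; the mixing matrix, the step size `mu = 0.01`, the iteration count and the identity initialization are all assumptions chosen for the illustration, and the expectation is again replaced by a frame average.

```python
import numpy as np

def offdiag(R):
    """Return R with its diagonal set to zero."""
    return R - np.diag(np.diag(R))

def dss_offline(X, mu=0.01, n_iter=200):
    """Fixed-step offline DSS: Equation (6) with the gradient of Equation (7)."""
    n = X.shape[0]
    R_xx = (X @ X.conj().T) / X.shape[1]   # sample estimate of E[x x^H]
    W = np.eye(n, dtype=X.dtype)           # hypothetical initial separation matrix
    costs = []
    for _ in range(n_iter):
        R_yy = W @ R_xx @ W.conj().T
        E = offdiag(R_yy)
        costs.append(float(np.sum(np.abs(E) ** 2)))   # Equation (4)
        grad = 2.0 * E @ W @ R_xx                     # Equation (7)
        W = W - mu * grad                             # Equation (6)
    return W, costs

rng = np.random.default_rng(0)
S = rng.normal(size=(2, 5000))             # two independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])     # hypothetical mixing matrix
W, costs = dss_offline(A @ S)
print(costs[0], costs[-1])
```

With a fixed `mu` the same step-size parameter is applied whether the cost is still large or already small, which is the behavior the later sections of this document set out to improve.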

For ICA, the matrix $J'(W)$ is given by Equation (8) in terms of the matrix $R_{\phi(y)y} = E[\phi(y)y^T]$ and the function $\phi(y)$ defined by Equations (9) and (10).

$$J'_{ICAoff}(W) = [R_{\phi(y)y} - I]\,[W^{-1}]^T \tag{8}$$

$$\phi(y) = [\phi(y_1), \phi(y_2), \ldots, \phi(y_N)]^T \tag{9}$$

$$\phi(y_i) = -\frac{\partial}{\partial y_i}\log p(y_i) \tag{10}$$

2. Adaptive BSS

In adaptive BSS, the expectation in the iterative processing is generally omitted and instantaneous data are used instead. Specifically, $E[yy^H]$ is replaced by $yy^H$. The update rule is the same as Equation (6), except that the iteration index k also carries the meaning of time. In offline processing, accuracy can be raised by using a small step size and many iterations; if the same approach is taken in adaptive processing, however, the adaptation time becomes long and performance degrades. Adjusting the step-size parameter $\mu$ is therefore more important in adaptive BSS than in offline BSS. The matrices $J'$ for DSS and ICA in adaptive BSS are given by Equations (11) and (12), respectively. The ICA form is written using an update based on the natural gradient, following an approach that considers only the off-diagonal elements of the correlation matrix (see Non-Patent Document 7).

$$J'_{DSS}(W) = 2\,[yy^H - \mathrm{Diag}[yy^H]]\,W x x^H \tag{11}$$

$$J'_{ICA}(W) = [\phi(y)y^H - \mathrm{Diag}[\phi(y)y^H]]\,W \tag{12}$$

3. BSS with constraints based on geometric information (GBSS)

Methods have been proposed that use geometric information (the positions of the microphones and the sound sources) to solve the permutation problem and the scaling problem that arise in ICA (see Non-Patent Documents 8 to 11). In GSS, a value combining the geometric constraint error and the separation error is used as the cost function. For example, the cost function $J(W)$ is defined according to Equation (13) from the linear constraint error $J_{LC}(W)$ based on geometric information, the separation error $J_{SS}(W)$, and a normalization coefficient $\lambda$.

$$J(W) = J_{LC}(W) + \lambda J_{SS}(W) \tag{13}$$

As the linear constraint error $J_{LC}(W)$, either the difference $J_{LCDS}(W)$ from the coefficients of delay-and-sum beamforming, given by Equation (14), or the difference $J_{LCNULL}(W)$ from the coefficients of null beamforming, given by Equation (15), is adopted.

$$J_{LCDS}(W) = \| \mathrm{Diag}[WD - I] \|^2 \tag{14}$$

$$J_{LCNULL}(W) = \| WD - I \|^2 \tag{15}$$
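The two linear-constraint errors of Equations (14) and (15) differ only in whether the off-diagonal elements of $WD - I$ are penalized. The sketch below makes that difference concrete. The text does not define $D$; here it is assumed to be a (microphones × sources) matrix of direct-path transfer functions (steering vectors), and the random `D`, the sizes and the function names are hypothetical.

```python
import numpy as np

def j_lc_ds(W, D):
    """Equation (14): penalize only the diagonal of W D - I."""
    M = W @ D - np.eye(W.shape[0])
    return float(np.sum(np.abs(np.diag(M)) ** 2))

def j_lc_null(W, D):
    """Equation (15): penalize every element of W D - I."""
    M = W @ D - np.eye(W.shape[0])
    return float(np.sum(np.abs(M) ** 2))

rng = np.random.default_rng(0)
n_mic, n_src = 4, 2
# Assumed steering matrix: one column per source, one row per microphone.
D = rng.normal(size=(n_mic, n_src)) + 1j * rng.normal(size=(n_mic, n_src))

# Delay-and-sum style weights: unit gain toward each source, no nulls.
W_ds = D.conj().T / np.sum(np.abs(D) ** 2, axis=0)[:, None]
# Null-beamforming style weights: unit gain and nulls toward the other sources.
W_null = np.linalg.pinv(D)

print(j_lc_ds(W_ds, D), j_lc_null(W_ds, D), j_lc_null(W_null, D))
```

The delay-and-sum weights satisfy Equation (14) exactly but leave a residual under Equation (15), because they do not suppress the other source directions.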

In GSS, $J_{DSS}(W)$ of Equation (4) is adopted as the separation error $J_{SS}(W)$ (see Non-Patent Document 12). Alternatively, $J_{ICA}(W)$ of Equation (5) can be adopted as the separation error $J_{SS}(W)$, which yields adaptive ICA with a linear constraint based on geometric information (GICA). This adaptive GICA imposes a weak constraint that tolerates errors in the linear constraint, and differs from the strong-constraint approach described in Non-Patent Document 11, which uses the linear constraint as an absolute condition.

Non-Patent Document 1: L. Parra and C. Spence, Convolutive blind source separation of non-stationary sources, IEEE Trans. on Speech and Audio Processing, vol. 8, no. 3, 2000, pp. 320-327
Non-Patent Document 2: F. Asano, S. Ikeda, M. Ogawa, H. Asoh and N. Kitawaki, Combined Approach of Array Processing and Independent Component Analysis for Blind Separation of Acoustic Signals, IEEE Trans. on Speech and Audio Processing, vol. 11, no. 3, 2003, pp. 204-215
Non-Patent Document 3: M. Miyoshi and Y. Kaneda, Inverse Filtering of Room Acoustics, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-36, no. 2, 1988, pp. 145-152
Non-Patent Document 4: H. Nakajima, M. Miyoshi and M. Tohyama, Sound field control by Indefinite MINT Filters, IEICE Trans., Fundamentals, vol. E-80A, no. 5, 1997, pp. 821-824
Non-Patent Document 5: S. Ikeda and M. Murata, A method of ICA in time-frequency domain, Proc. Workshop Indep. Compon. Anal. Signal Sep., 1999, pp. 365-370
Non-Patent Document 6: D. H. Brandwood, A complex gradient operator and its application in adaptive array theory, IEE Proc., vol. 130, Pts. F and H, no. 1, 1983, pp. 11-16
Non-Patent Document 7: S. Amari, Natural gradient works efficiently in learning, Neural Comput., vol. 10, 1988, pp. 251-276
Non-Patent Document 8: L. Parra and C. Alvino, Geometric Source Separation: Merging Convolutive Source Separation with Geometric Beamforming, IEEE Trans. on Speech and Audio Processing, vol. 10, no. 6, 2002, pp. 352-362
Non-Patent Document 9: R. Mukai, H. Sawada, S. Araki and S. Makino, Blind Source Separation of many signals in the frequency domain, in Proc. of ICASSP 2006, vol. V, 2006, pp. 969-972
Non-Patent Document 10: H. Saruwatari, T. Kawamura, T. Nishikawa, A. Lee and K. Shikano, Blind Source Separation Based on a Fast Convergence Algorithm Combining ICA and Beamforming, IEEE Trans. on Speech and Audio Processing, vol. 14, no. 2, 2006, pp. 666-678
Non-Patent Document 11: M. Knaak, S. Araki and S. Makino, Geometrically Constrained Independent Component Analysis, IEEE Trans. on Speech and Audio Processing, vol. 15, no. 2, 2007, pp. 715-726
Non-Patent Document 12: J. Valin, J. Rouat and F. Michaud, Enhanced Robot Audition Based on Microphone Array Source Separation with Post-Filter, Proc. of 2004 IEEE/RSJ IROS, 2004, pp. 2123-2128

However, in the conventional methods the step-size parameter $\mu$ (see Equation (6)) is fixed, which causes two problems from the viewpoint of convergence of the cost function $J(W)$ to its minimum value $J(W_0)$ ($W_0$: optimal separation matrix).

The first problem is that the update amount $\Delta W\ (= \mu J'(W_k))$ of the separation matrix $W$ is determined independently of the current value $J(W_k)$ of the cost function, so the update amount can be inappropriate for improving convergence speed and convergence accuracy. To illustrate the first problem, refer to FIG. 10(a), which conceptually shows how the cost function $J(W)$ changes with the separation matrix $W$. When convergence is insufficient (that is, when $J(W)$ is far from the minimum value $J(W_0)$), the update amount $\Delta W$ may be too small from the viewpoint of convergence speed. Conversely, when convergence is sufficient (that is, when $J(W)$ is close to the minimum value $J(W_0)$), the update amount $\Delta W$ may be too large from the viewpoint of convergence accuracy.

The second problem is that the update amount $\Delta W$ is proportional to the derivative $J'(W)$ of the cost function $J(W)$, so the update amount can again be inappropriate for improving convergence speed and convergence accuracy. To illustrate the second problem, refer to FIG. 10(b), which conceptually shows how two different cost functions $J_1(W)$ and $J_2(W)$ change with the separation matrix $W$. For the cost function $J_1(W)$, the derivative $J'(W)$ changes sensitively with $W$, so the update amount may be too large from the viewpoint of convergence accuracy. For the cost function $J_2(W)$, by contrast, the derivative $J'(W)$ does not change sensitively with $W$, so the update amount may be too small from the viewpoint of convergence speed.

Accordingly, an object of the present invention is to provide a system capable of separating sound source signals with high accuracy while improving convergence speed and convergence accuracy.

A sound source separation system according to a first aspect of the invention comprises a plurality of microphones and separates a plurality of sound source signals on the basis of input signals from each of the plurality of microphones. The system comprises a first processing element that recognizes a cost function which is defined by a separation matrix representing the correlation between the input signals and the sound source signals and which evaluates the degree of separation of the sound source signals, and a second processing element. The second processing element repeats a process of determining the next separation matrix by updating the current separation matrix such that the next value of the cost function recognized by the first processing element is closer to the minimum value than the current value, and thereby recognizes the separation matrix at which the cost function takes its minimum value as the optimal separation matrix. The second processing element also adjusts the update amount from the current value to the next value of the separation matrix so that it increases as the current value of the cost function becomes larger and decreases as the current gradient of the cost function becomes steeper.

According to the sound source separation system of the first aspect, the process of updating the current separation matrix to the next separation matrix is repeated such that the next value of the cost function (its value at the current separation matrix) is closer to the minimum value than the current value (its value at the previous separation matrix). The update amount of the separation matrix is adjusted so that it increases as the current value of the cost function becomes larger and decreases as the current gradient of the cost function becomes steeper. The following four states can therefore be distinguished.

In the "first state", in which the current value of the cost function has not sufficiently converged and the current gradient of the cost function is gentle, the update amount of the separation matrix is made suitably large from the viewpoint of convergence speed. In the "second state", in which the current value of the cost function has not sufficiently converged and the current gradient is steep, the update amount is made suitably large from the viewpoint of convergence speed, as in the first state, but smaller than in the first state from the viewpoint of convergence accuracy. In the "third state", in which the current value of the cost function has sufficiently converged and the current gradient is steep, the update amount is made suitably small from the viewpoint of convergence accuracy. In the "fourth state", in which the current value of the cost function has sufficiently converged and the current gradient is gentle, the update amount is made suitably small from the viewpoint of convergence accuracy, as in the third state, but larger than in the third state from the viewpoint of convergence speed.

By repeating this process, the optimal separation matrix (the separation matrix at which the cost function takes its minimum value) is recognized. The sound source signals can therefore be separated with high accuracy on the basis of the input signals from each of the plurality of microphones and the optimal separation matrix, while improving convergence speed and convergence accuracy.

Here, a component of the sound source separation system "recognizing" information means performing any information processing that prepares the information for an operation that requires it, such as reading the information from a storage device, retrieving it from a database, receiving it, calculating, estimating, setting or determining it on the basis of underlying information, or saving calculated information in a storage device.

A sound source separation system according to a second aspect of the invention is the sound source separation system of the first aspect in which the second processing element adjusts the update amount of the separation matrix according to the multidimensional Newton method.

According to the sound source separation system of the second aspect, the update amount of the separation matrix is adjusted according to the Newton method so that it increases as the current value of the cost function becomes larger and decreases as the current gradient of the cost function becomes steeper. The sound source signals can therefore be separated with high accuracy on the basis of the input signals from each of the plurality of microphones and the optimal separation matrix, while improving convergence speed and convergence accuracy.

An embodiment of the sound source separation system of the present invention will be described with reference to the drawings.

The sound source separation system shown in FIG. 1 comprises a plurality of microphones $M_i$ (i = 1, 2, ..., n) and an electronic control unit 10 (composed of electronic circuits such as a CPU, ROM, RAM, I/O circuit and A/D conversion circuit).

The electronic control unit 10 separates a plurality of sound source signals on the basis of the input signals from each of the microphones $M_i$. The electronic control unit 10 includes a first processing element 11 and a second processing element 12. The first processing element 11 and the second processing element 12 may be implemented on the same CPU or on different CPUs. The first processing element 11 recognizes a cost function $J(W)$ that is defined by a separation matrix $W$ representing the correlation between the input signals from the microphones $M_i$ and the sound source signals, and that evaluates the degree of separation of the sound source signals. The second processing element 12 executes a process of determining the next separation matrix $W_{k+1}$ by updating the current separation matrix $W_k$ such that the next value $J(W_{k+1})$ of the cost function recognized by the first processing element 11 is closer to the minimum value $J(W_0)$ than the current value $J(W_k)$. By repeating this process, the second processing element 12 recognizes the separation matrix at which the cost function takes its minimum value as the optimal separation matrix $W_0$. The second processing element 12 adjusts the update amount $\Delta W_k$ from the current separation matrix $W_k$ to the next separation matrix $W_{k+1}$ according to the magnitude of the current value $J(W_k)$ of the cost function and the steepness of the current gradient $\partial J(W_k)/\partial W$.

As shown in FIG. 2, for example, the microphones $M_i$ are arranged four on each of the left and right sides of the head P1 of a robot R on which the electronic control unit 10 is mounted. The microphones $M_1$ to $M_4$ are arranged at the front upper part, rear upper part, front lower part and rear lower part, respectively, of the right side of the head P1. The microphones $M_5$ to $M_8$ are arranged at the front upper part, rear upper part, front lower part and rear lower part, respectively, of the left side of the head P1. Besides the robot R, the sound source separation system can be mounted on a vehicle (four-wheeled automobile) or on any machine or device exposed to an environment in which a plurality of sound sources exist. The number and arrangement of the microphones $M_i$ may be changed arbitrarily.

The robot R is a legged mobile robot. Like a human, it has a base P0, a head P1 arranged above the base P0, left and right arms P2 extending from both sides of the upper part of the base P0, hands P3 connected to the tips of the left and right arms P2, left and right legs P4 extending downward from the lower part of the base P0, and feet P5 connected to the left and right legs P4. The base P0 is composed of an upper part and a lower part connected one above the other so that they can rotate relative to each other about the yaw axis. The head P1 can move relative to the base P0, for example by rotating about the yaw axis. Each arm P2 has one to three rotational degrees of freedom at each of the shoulder joint mechanism, the elbow joint mechanism and the wrist joint mechanism. Each hand P3 has five finger mechanisms extending from the palm and corresponding to the human thumb, index finger, middle finger, ring finger and little finger, and is configured to be capable of actions such as grasping an object. Each leg P4 has one to three rotational degrees of freedom at each of the hip joint mechanism, the knee joint mechanism and the ankle joint mechanism. On the basis of the sound source separation result, the robot R can perform appropriate actions such as moving by operating the left and right legs P4.

前記構成の音源分離システムの機能について説明する。電子制御ユニット１０により分離行列Ｗの更新回数を表わす指数ｋが「１」に設定され（図３／Ｓ００１）、各マイクロホンＭ_iからの入力信号が取得される（図３／Ｓ００２）。第１処理要素１１により各音源信号の分離度を評価するためのコスト関数Ｊ（Ｗ）が定義または認識される（図３／Ｓ００４（式（４）（５）参照））。第２処理要素１２により分離行列Ｗの更新回数を表わす指数ｋが「１」に設定され（図３／Ｓ００６）、適応調整法（ＡＳ（ＡｄａｐｔｉｖｅＳｔｅｐ−ｓｉｚｅ））によって分離行列の今回更新量Ｗ_kが調節される（図３／Ｓ００８）。具体的には、コスト関数の今回値Ｊ（Ｗ_k）の近くのコスト関数Ｊ（Ｗ）が複素勾配演算法にしたがって式（１６）で表わされるように線形近似される。 The function of the sound source separation system having the above configuration will be described. Electrons by the control unit 10 index k representing the number of updates of the separating matrix W is set to "1" (FIG. 3 / S001), the input signals from the microphones M _i is obtained (FIG. 3 / S002). The first processing element 11 defines or recognizes a cost function J (W) for evaluating the degree of separation of each sound source signal (see FIG. 3 / S004 (see equations (4) and (5)). The index k representing the number of updates of the separation matrix W is set to “1” by the second processing element 12 (FIG. 3 / S006), and the current update amount W of the separation matrix by the adaptive adjustment method (AS (Adaptive Step-size)). _k is adjusted (FIG. 3 / S008). Specifically, the cost function J (W) near the current value J (W _k ) of the cost function is linearly approximated as expressed by the equation (16) according to the complex gradient calculation method.

Ｊ（Ｗ）≒Ｊ（Ｗ_k）＋２ＭＡ［∂Ｊ（Ｗ_k）／∂Ｗ，Ｗ−Ｗ_k］，
ＭＡ［Ａ，Ｂ］≡Ｒｅ［Σ_ijａ_ijｂ_ij］ ‥（１６） J (W) ≒ J (W k) + 2MA [∂J (W k) / ∂W, W-W k],
MA [A, B] ≡Re [Σ _ij a _ij b _ij ] (16)

図４に概念的に示されているように分離行列Ｗに応じてコスト関数Ｊ（Ｗ）が変化する場合、コスト関数の今回値Ｊ（Ｗ_k）を通り、コスト関数Ｊ（Ｗ）の今回勾配∂Ｊ（Ｗ_k）／∂Ｗだけ傾いた線形関数（一点鎖線、二点鎖線、三点鎖線参照）としてコスト関数Ｊ（Ｗ）が近似される。 As conceptually shown in FIG. 4, when the cost function J (W) changes according to the separation matrix W, the current value J (W _k ) of the cost function passes through the current value of the cost function J (W). The cost function J (W) is approximated as a linear function tilted by the gradient ∂J (W _k ) / ∂W (see the one-dot chain line, two-dot chain line, and three-dot chain line).

また、多次元のニュートン法にしたがって最適な今回ステップサイズパラメータμ_Kが、近似コスト関数Ｊ（Ｗ）が０（＝コスト関数の最小値Ｊ（Ｗ₀））となるように、関係式Ｗ＝Ｗ_k−μＪ’（Ｗ_k）に基づいて算出される。最適な今回ステップサイズパラメータμ_kは式（１７）で表わされる。そして、分離行列Ｗの今回更新量ΔＷ_kがμ_kＪ’（Ｗ_k）に決定される。 Further, according to the multi-dimensional Newton method, the current step size parameter μ _K has a relational expression W = so that the approximate cost function J (W) becomes 0 (= minimum value J (W ₀ ) of the cost function). It is calculated based on W _k −μJ ′ (W _k ). The optimum current step size parameter μ _k is expressed by equation (17). Then, the current update amount ΔW _k of the separation matrix W is determined to be μ _k J ′ (W _k ).

$$
\mu_k = \frac{J(W_k)}{2\,\mathrm{MA}\!\left[\dfrac{\partial J(W_k)}{\partial W},\; J'(W_k)\right]} \tag{17}
$$

Further, the second processing element 12 adjusts the current separation matrix W_k by the current update amount ΔW_k, thereby determining the next separation matrix W_{k+1} (= W_k − ΔW_k) (FIG. 3 / S010). As a result, the separation matrix W_k is updated successively so that the value J(W_k) of the cost function gradually approaches the minimum value J(W₀) (= 0), as indicated by the arrows in FIG. 4.
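The step-size rule of equations (16) and (17) and the update W_{k+1} = W_k − μ_k J'(W_k) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the system: the helper names `ma`, `adaptive_step`, and `update` are hypothetical, matrices are plain nested lists, the caller is assumed to supply the gradient ∂J(W_k)/∂W and the direction J'(W_k), and returning a zero step when the denominator vanishes is an added assumption.

```python
def ma(a, b):
    """MA[A, B] = Re[sum_ij a_ij * b_ij], as defined with equation (16)."""
    return sum(complex(x * y).real
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def adaptive_step(cost, grad, direction):
    """Step size mu_k of equation (17): J(W_k) / (2 MA[dJ/dW, J'])."""
    denom = 2.0 * ma(grad, direction)
    # Assumption (not in the source): no update when the denominator is zero.
    return cost / denom if denom != 0.0 else 0.0


def update(w, cost, grad, direction):
    """W_{k+1} = W_k - mu_k * J'(W_k); returns the new matrix and mu_k."""
    mu = adaptive_step(cost, grad, direction)
    w_next = [[wij - mu * dij for wij, dij in zip(rw, rd)]
              for rw, rd in zip(w, direction)]
    return w_next, mu


# Toy 1x1 case, assumed for illustration: J(w) = w**2 at w = 3.
# In the convention of (16), dJ/dW = w; the direction is taken as 2*w.
w_next, mu = update([[3.0]], cost=9.0, grad=[[3.0]], direction=[[6.0]])
print(mu, w_next)
```

In this toy case the linearized cost 9 + 2·3·(w − 3) vanishes at w = 1.5, which is the point the update moves to. A larger cost yields a larger step, and a steeper gradient a smaller one.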

As described next, the adaptive step-size method can be applied to various BSS methods.
1. Adaptive step-size DSS (DSS-AS)
The algorithm obtained by applying this method to DSS is defined by equations (101) to (105).

$$
\begin{aligned}
y &= W_k x && (101)\\
E &= y y^{H} - \mathrm{Diag}[y y^{H}] && (102)\\
J' &= 2 E W_k x x^{H} && (103)\\
\mu &= \frac{\lVert E \rVert^{2}}{2 \lVert J' \rVert^{2}} && (104)\\
W_{k+1} &= W_k - \mu J' && (105)
\end{aligned}
$$
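Equations (101) to (105) translate almost line by line into code. The following is a minimal sketch under stated assumptions, not the implementation of the system: the function names are hypothetical, matrices are nested lists, and a single observation vector x stands in for the signal statistics, as in the equations as written.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def outer_h(u, v):
    """Outer product u v^H of two vectors."""
    return [[ui * complex(vj).conjugate() for vj in v] for ui in u]


def fro2(a):
    """Squared Frobenius norm."""
    return sum(abs(x) ** 2 for row in a for x in row)


def dss_as_step(w, x):
    """One DSS-AS update; returns (W_{k+1}, ||E||^2 before the update)."""
    y = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]      # (101)
    e = outer_h(y, y)
    for i in range(len(y)):
        e[i][i] = 0.0                                                # (102)
    jp = [[2 * v for v in row]
          for row in matmul(matmul(e, w), outer_h(x, x))]            # (103)
    cost, g = fro2(e), fro2(jp)
    # Assumption: skip the update when the gradient norm is zero.
    mu = cost / (2 * g) if g > 0 else 0.0                            # (104)
    w_next = [[wij - mu * jij for wij, jij in zip(rw, rj)]
              for rw, rj in zip(w, jp)]                              # (105)
    return w_next, cost


w = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0 + 0.0j, 1.0 + 0.0j]
w1, cost0 = dss_as_step(w, x)
_, cost1 = dss_as_step(w1, x)
print(cost0, cost1)  # the off-diagonal correlation energy shrinks
```

No step-size constant has to be tuned by hand: μ is recomputed from ‖E‖² and ‖J'‖² at every update.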

2. Adaptive step-size ICA (ICA-AS)
The algorithm obtained by applying this method to ICA is defined by equations (201) to (208).

$$
\begin{aligned}
y &= W_k x && (201)\\
E &= \varphi(y) y^{H} - \mathrm{Diag}[\varphi(y) y^{H}] && (202)\\
J_{ICA}' &= E W_k && (203)\\
J' &= [E \tilde{\varphi}(y) x^{H}]^{*} && (204)\\
\tilde{\varphi}(y) &= [\tilde{\varphi}(y_1), \tilde{\varphi}(y_2), \ldots, \tilde{\varphi}(y_N)]^{T} && (205)\\
\tilde{\varphi}(y_i) &= \varphi(y_i) + y_i \frac{\partial \varphi(y_i)}{\partial y_i} && (206)\\
\mu &= \frac{\lVert E \rVert^{2}}{2\,\mathrm{MA}[J', J_{ICA}']} && (207)\\
W_{k+1} &= W_k - \mu J' && (208)
\end{aligned}
$$

3. Adaptive step-size higher-order DSS (HDSS-AS)
The algorithm obtained by applying this method to higher-order DSS is defined by equations (301) to (305).

$$
\begin{aligned}
y &= W_k x && (301)\\
E &= \varphi(y) y^{H} - \mathrm{Diag}[\varphi(y) y^{H}] && (302)\\
J' &= [E \tilde{\varphi}(y) x^{H}]^{*} && (303)\\
\mu &= \frac{\lVert E \rVert^{2}}{2 \lVert J' \rVert^{2}} && (304)\\
W_{k+1} &= W_k - \mu J' && (305)
\end{aligned}
$$

4. Adaptive step-size GSS (GSS-AS)
The algorithm obtained by applying this method to GSS is defined by equations (401) to (408).

$$
\begin{aligned}
y &= W_k x && (401)\\
E_{ss} &= y y^{H} - \mathrm{Diag}[y y^{H}] && (402)\\
J_{ss}' &= 2 E_{ss} W_k x x^{H} && (403)\\
\mu_{ss} &= \frac{\lVert E_{ss} \rVert^{2}}{2 \lVert J_{ss}' \rVert^{2}} && (404)\\
E_{LC} &= W_k D - I && (405)\\
J_{LC}' &= E_{LC} D^{H} && (406)\\
\mu_{LC} &= \frac{\lVert E_{LC} \rVert^{2}}{2 \lVert J_{LC}' \rVert^{2}} && (407)\\
W_{k+1} &= W_k - \mu_{LC} J_{LC}' - \mu_{ss} J_{ss}' && (408)
\end{aligned}
$$
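The geometric-constraint part of GSS-AS, equations (405) to (407), receives its own adaptive step size. A minimal sketch of that term alone follows; the separation term of (402) to (404) is left out, the function names are hypothetical, and W (sources × microphones) and the transfer function matrix D (microphones × sources) are assumed to be nested lists.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def conj_t(a):
    """Conjugate transpose A^H."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def fro2(a):
    """Squared Frobenius norm."""
    return sum(abs(x) ** 2 for row in a for x in row)


def lc_update(w, d):
    """Returns (mu_LC * J_LC', ||E_LC||^2) for the current W and D."""
    wd = matmul(w, d)
    e_lc = [[wd[i][j] - (1.0 if i == j else 0.0)
             for j in range(len(wd[0]))] for i in range(len(wd))]   # (405)
    j_lc = matmul(e_lc, conj_t(d))                                  # (406)
    err, g = fro2(e_lc), fro2(j_lc)
    # Assumption: no update when the gradient norm is zero.
    mu_lc = err / (2 * g) if g > 0 else 0.0                         # (407)
    return [[mu_lc * v for v in row] for row in j_lc], err


w = [[1.0, 0.0], [0.0, 1.0]]
d = [[2.0, 0.0], [0.0, 2.0]]
delta, err0 = lc_update(w, d)
w1 = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(w, delta)]
_, err1 = lc_update(w1, d)
print(err0, err1)  # the constraint error ||W D - I||^2 decreases
```

In (408) this term is simply subtracted alongside μ_ss J_ss', so each term is scaled by its own step size.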

5. Adaptive step-size GICA (GICA-AS)
The algorithm obtained by applying this method to GICA is defined by equations (501) to (509).

$$
\begin{aligned}
y &= W_k x && (501)\\
E_{ICA} &= \varphi(y) y^{H} - \mathrm{Diag}[\varphi(y) y^{H}] && (502)\\
J_{ICA}' &= E_{ICA} W_k && (503)\\
J' &= [E_{ICA} \tilde{\varphi}(y) x^{H}]^{*} && (504)\\
\mu_{ICA} &= \frac{\lVert E_{ICA} \rVert^{2}}{2\,\mathrm{MA}[J', J_{ICA}']} && (505)\\
E_{LC} &= W_k D - I && (506)\\
J_{LC}' &= E_{LC} D^{H} && (507)\\
\mu_{LC} &= \frac{\lVert E_{LC} \rVert^{2}}{2 \lVert J_{LC}' \rVert^{2}} && (508)\\
W_{k+1} &= W_k - \mu_{LC} J_{LC}' - \mu_{ICA} J_{ICA}' && (509)
\end{aligned}
$$

6. Adaptive step-size GHDSS (GHDSS-AS)
The algorithm obtained by applying this method to GHDSS is defined by replacing, among equations (401) to (408) defining GSS-AS, the cost function E_ss expressed by equation (402) with the cost function E_ICA expressed by equation (502) defining GICA-AS.

It is then determined whether the next separation matrix W_{k+1} coincides with the optimal separation matrix W₀, that is, whether the norm (Frobenius norm) of the deviation between W_{k+1} and W₀ is less than a tolerance eps (FIG. 3 / S012). If the determination result is negative (FIG. 3 / S012: NO), the second processing element 12 increments the index k by "1" (FIG. 3 / S014), and the processing described above, namely acquisition of the input signals from the microphones, evaluation of the cost function J(W), adjustment of the update amount ΔW_k, and determination of the next separation matrix W_{k+1}, is executed again (see FIG. 3 / S002, S004, S008, S010, S012). If, on the other hand, the determination result is affirmative (FIG. 3 / S012: YES), the next separation matrix W_{k+1} is determined as the optimal separation matrix W₀ (FIG. 3 / S016). The sound source signals y (= W₀·x) are then separated based on the optimal separation matrix W₀ and the input signals x.
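The loop of FIG. 3 (S006 to S016) can be illustrated with a scalar toy problem. The sketch below is an assumption-laden illustration, not the system itself: the cost J(w) = (w − w_opt)² is chosen only because its minimum value 0 and its minimizer are known, so the stopping test against W₀ can be written exactly as described; `run_until_converged`, `max_updates`, and the zero-denominator guard are hypothetical additions.

```python
def run_until_converged(w_start, w_opt, eps=1e-3, max_updates=1000):
    """Toy scalar loop for J(w) = (w - w_opt)**2, following S006 to S016."""
    w, k = w_start, 1                                  # S006: k = 1
    while k <= max_updates:
        cost = (w - w_opt) ** 2                        # S004: J(W_k)
        grad = w - w_opt                               # dJ/dW, convention of (16)
        direction = 2.0 * grad                         # J'(W_k), assumed
        denom = 2.0 * grad * direction
        mu = cost / denom if denom != 0.0 else 0.0     # S008: equation (17)
        w_next = w - mu * direction                    # S010: W_{k+1}
        if abs(w_next - w_opt) < eps:                  # S012: deviation < eps?
            return w_next, k                           # S016: accept as W_0
        w, k = w_next, k + 1                           # S014: k += 1
    return w, k


w_final, updates = run_until_converged(10.0, 2.0)
print(w_final, updates)
```

For this quadratic cost the rule gives μ = 1/4 at every update, so the deviation from the minimizer halves each time. In a real separation task W₀ is not known in advance, so the stopping test would need a substitute that this sketch does not supply.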

According to the sound source separation system exhibiting the above functions, the process of updating the current separation matrix W_k to the next separation matrix W_{k+1} such that the next value J(W_{k+1}) of the cost function is closer to the minimum value than the current value J(W_k) is repeated (see FIG. 3 / S008, S010, S012, S014 and the arrows in FIG. 4). The update amount ΔW_k of the separation matrix W is adjusted so that it increases as the current value J(W_k) of the cost function is larger and decreases as the current gradient ∂J(W_k)/∂W of the cost function is steeper (see FIG. 4).

Consequently, in a "first state" in which the current value J(W_k) of the cost function has not sufficiently converged and the current gradient ∂J(W_k)/∂W is gentle, the update amount ΔW_k of the separation matrix is adjusted to be suitably large from the viewpoint of improving the convergence speed. In a "second state" in which the current value J(W_k) has not sufficiently converged and the current gradient ∂J(W_k)/∂W is steep, the update amount ΔW_k is adjusted to be suitably large from the viewpoint of improving the convergence speed, as in the first state, but smaller than in the first state from the viewpoint of improving the convergence accuracy. In a "third state" in which the current value J(W_k) has sufficiently converged and the current gradient ∂J(W_k)/∂W is steep, the update amount ΔW_k is adjusted to be suitably small from the viewpoint of improving the convergence accuracy. In a "fourth state" in which the current value J(W_k) has sufficiently converged and the current gradient ∂J(W_k)/∂W is gentle, the update amount ΔW_k is adjusted to be suitably small from the viewpoint of improving the convergence accuracy, as in the third state, but larger than in the third state from the viewpoint of improving the convergence speed.

By repeating the above process, the optimal separation matrix W₀ (the separation matrix at which the cost function takes its minimum value) is recognized. Therefore, based on the input signals x from each of the plurality of microphones M_i (see FIGS. 1 and 2) and the optimal separation matrix W₀, the sound source signals y (= W₀·x) can be separated with high precision while the convergence speed and the convergence accuracy are improved.

The results of a performance experiment on the sound source separation system will now be described. The input signal x_i(t) to the microphone M_i was synthesized, as expressed by equation (18), from the impulse response h_ji(t) from the j-th sound source to the microphone M_i, the sound source signal s_j(t) of the j-th sound source, and the background noise n_i(t) of the microphone M_i.

$$
x_i(t) = \sum_{j} h_{ji}(t)\, s_j(t) + n_i(t) \tag{18}
$$

In the experiment, two clean speech signals were used as the sound source signals s_j(t). Specifically, the male voice shown in FIG. 5(a) was used as the first sound source signal and the female voice shown in FIG. 5(b) as the second sound source signal. Values actually measured in a laboratory were adopted as the impulse responses h_ji(t). The laboratory measures 4.0 [m] long, 7.0 [m] wide, and 3.0 [m] high, and its reverberation time is about 0.2 [s]. One wall of the laboratory is glass, which produces strong reflections. The measured values shown in FIG. 5(c), likewise taken in the laboratory, were adopted as the background noise n_i(t). FIG. 5(d) shows the synthesized input signal x_i(t). FIG. 6 shows the frequency characteristics of each signal. The background noise is about 10 to 20 dB below the level of the sound sources. The separation result was evaluated by the SNR calculated in accordance with equation (19) from the separated signal y, the noise signal n^# contained in the signal y, and the separated signal s^# obtained for the input signal when only the target sound source is present. A higher SNR means that the sound sources are separated with higher precision.

$$
\mathrm{SNR}\,[\mathrm{dB}] = 10 \log_{10}\!\left[\frac{1}{T}\sum_{t=1}^{T}\frac{|y(t)|^{2}}{|n^{\#}(t)|^{2}}\right],
\qquad
n^{\#} \equiv y - s^{\#} \tag{19}
$$
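Equation (19) can be evaluated directly from the two separated signals. The sketch below follows the equation literally, averaging the per-sample ratio |y(t)|²/|n^#(t)|², and assumes equal-length sequences in which n^#(t) is never exactly zero; `snr_db` is a hypothetical name.

```python
import math


def snr_db(y, s_sharp):
    """SNR of equation (19); y and s_sharp are equal-length sequences."""
    noise = [yi - si for yi, si in zip(y, s_sharp)]       # n# = y - s#
    ratios = [abs(yi) ** 2 / abs(ni) ** 2                 # |y|^2 / |n#|^2
              for yi, ni in zip(y, noise)]
    return 10.0 * math.log10(sum(ratios) / len(ratios))


# Toy signals (assumed): the residual noise has half the amplitude of y.
print(snr_db([2.0, 2.0], [1.0, 1.0]))
```

With these toy signals every per-sample ratio equals 4, so the result is 10·log₁₀(4), about 6.02 dB.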

The separation results were further evaluated in the time-frequency domain by the average correlation coefficient CC calculated in accordance with equation (20). A lower average correlation coefficient CC means that the sound sources are separated with higher precision.

$$
\begin{aligned}
\mathrm{CC}\,[\mathrm{dB}] &= 10 \log_{10}\!\left[\frac{1}{F}\sum_{f=1}^{F} \mathrm{CC}_{\omega}(2\pi f)\right],\\
\mathrm{CC}_{\omega}(\omega) &\equiv \frac{\left|\sum_{t=1}^{T} y_1^{*}(\omega, t)\, y_2(\omega, t)\right|}{Y_1(\omega)\, Y_2(\omega)},\\
Y_1(\omega) &\equiv \Bigl(\sum_{t=1}^{T} |y_1(\omega, t)|^{2}\Bigr)^{1/2},\\
Y_2(\omega) &\equiv \Bigl(\sum_{t=1}^{T} |y_2(\omega, t)|^{2}\Bigr)^{1/2}
\end{aligned} \tag{20}
$$
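The average correlation coefficient of equation (20) can be computed as follows. This is a sketch under assumptions: the two separated signals are given as time-frequency arrays indexed `[f][t]` (nested lists of complex values), every frequency bin carries nonzero energy, and the averaged correlation is nonzero so that the logarithm is defined; `cc_db` is a hypothetical name.

```python
import math


def cc_db(y1, y2):
    """Average correlation coefficient of equation (20), in dB."""
    per_bin = []
    for bin1, bin2 in zip(y1, y2):                       # loop over frequency
        num = abs(sum(complex(a).conjugate() * b
                      for a, b in zip(bin1, bin2)))      # |sum_t y1* y2|
        norm1 = math.sqrt(sum(abs(a) ** 2 for a in bin1))    # Y1(omega)
        norm2 = math.sqrt(sum(abs(b) ** 2 for b in bin2))    # Y2(omega)
        per_bin.append(num / (norm1 * norm2))
    return 10.0 * math.log10(sum(per_bin) / len(per_bin))


same = [[1.0 + 0.0j, 1.0j], [2.0 + 0.0j, 1.0 + 0.0j]]
other = [[1.0 + 0.0j, 0.0j], [2.0 + 0.0j, 0.0j]]
print(cc_db(same, same), cc_db(same, other))
```

Identical outputs give about 0 dB (full correlation), and less correlated outputs give a negative value, in line with the statement that a lower CC indicates better separation.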

The separation matrix W was initialized in accordance with equation (21) using the transfer function matrix D, whose elements are the transfer functions of the direct sound components.

$$
W_{DS} = \mathrm{Diag}[D^{H} D]^{-1} D^{H} \tag{21}
$$

The separation matrix W may instead be initialized in accordance with equation (22) or equation (23) in place of equation (21).

$$
\begin{aligned}
W_{I} &= I && (22)\\
W_{NULL} &= D^{+} \;\bigl(= [D^{H} D]^{-1} D^{H}\bigr) && (23)
\end{aligned}
$$
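The initializations (21) and (22) are simple to write out. The sketch below reads Diag[·] in (21) as keeping only the diagonal entries, so row j of W_DS is the conjugate of column j of D divided by that column's squared norm. It assumes D is a nested list (microphones × sources) with no all-zero column; the function names are hypothetical, and W_NULL of (23) is omitted because it requires a general matrix inverse.

```python
def init_w_ds(d):
    """W_DS = Diag[D^H D]^{-1} D^H, equation (21)."""
    n_mics, n_src = len(d), len(d[0])
    w = []
    for j in range(n_src):
        col = [complex(d[i][j]) for i in range(n_mics)]
        norm2 = sum(abs(c) ** 2 for c in col)   # j-th diagonal entry of D^H D
        w.append([c.conjugate() / norm2 for c in col])
    return w


def init_w_identity(n):
    """W_I = I, equation (22)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


d = [[1.0, 1.0], [1.0, -1.0]]     # toy transfer function matrix (assumed)
print(init_w_ds(d))
print(init_w_identity(2))
```

For this toy D the columns are orthogonal, so W_DS D equals the identity; in general W_DS D has a unit diagonal only.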

W_DS means that the coefficients of the minimum-norm weighted delay-and-sum beamformer (BF) are used as the initial value, and W_NULL means that the coefficients of the null-steering (blind-spot) BF are used as the initial value. W_NULL gives a higher initial degree of separation than W_DS but is less robust to variations. Therefore, when the reverberation is strong or the error in the geometric information is large, W_DS provides the better-performing initial value.

For the methods without geometric constraints, the scaling problem was resolved by normalizing the magnitude of each row vector of the separation matrix. The permutation problem was regarded as being resolved by the initial value, and additional processing was omitted. The normalization coefficient λ required in the conventional BSS methods with geometric constraints was set to "‖x^H x‖^{-2}" for GSS and GHDSS in accordance with Reference 12 cited above, and to "1" for GICA because it is already normalized by the natural gradient. The nonlinear function φ(y_i) used in the methods other than DSS was defined by equation (24) based on a scaling parameter η (set to "1" in this experiment).

$$
\varphi(y_i) \equiv \tanh(\eta\,|y_i|)\, \exp\bigl(j\,\theta(y_i)\bigr) \tag{24}
$$
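The nonlinear function of equation (24) compresses the magnitude of a complex sample with tanh while leaving its direction in the complex plane unchanged. A minimal sketch follows, assuming that θ(y_i) denotes the phase angle (argument) of y_i, which this passage does not state explicitly; `phi` is a hypothetical name.

```python
import cmath
import math


def phi(y_i, eta=1.0):
    """phi(y_i) = tanh(eta * |y_i|) * exp(j * theta(y_i)), equation (24)."""
    magnitude = math.tanh(eta * abs(y_i))        # bounded in [0, 1)
    theta = cmath.phase(y_i)                     # assumed meaning of theta
    return magnitude * cmath.exp(1j * theta)


sample = 3.0 + 4.0j
print(abs(phi(sample)), cmath.phase(phi(sample)), cmath.phase(sample))
```

The output magnitude never exceeds 1, which limits the influence of large-amplitude samples on E in equations (202), (302), and (502).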

FIG. 7 shows the SNR, and FIG. 8 the CC, of the sound source signals separated by each of the BSS methods DSS, ICA, HDSS, GSS, GICA, and GHDSS, both with the step-size parameter μ fixed at "0.001", "0.01", and "0.1" and with the adaptive step-size (AS) method of the present invention applied. FIG. 9 shows the waveforms separated by GSS-AS. As is clear from FIG. 7, AS markedly improved the SNR in DSS. The average SNR also improved for ICA and HDSS. Whereas the correlation coefficient CC is about −3 dB with the conventional methods, with AS it is markedly lower, at −7 dB or below, in all BSS methods. This shows that AS is an effective technique for decorrelation. The reason the SNR of GSS and HDSS is not markedly improved by AS is presumed to be the error in the geometric constraints.

Instead of the multidimensional Newton's method, any method may be adopted in which the step-size parameter μ, and hence the update amount ΔW_k from the current value W_k of the separation matrix to the next value W_{k+1}, is adjusted dynamically so as to increase as the current value J(W_k) of the cost function is larger and to decrease as the current gradient ∂J(W_k)/∂W of the cost function is steeper.

FIG. 1: Diagram illustrating the configuration of the sound source separation system of the present invention
FIG. 2: Diagram illustrating an example of mounting the sound source separation system of the present invention on a robot
FIG. 3: Flowchart showing the functions of the sound source separation system of the present invention
FIG. 4: Diagram explaining the improvement of convergence speed and convergence accuracy according to the cost function
FIG. 5: (a) Waveform of the first sound source signal (male voice); (b) waveform of the second sound source signal (female voice); (c) waveform of the background noise; (d) waveform of the synthesized input signal
FIG. 6: Frequency characteristics of each signal
FIG. 7: Comparison of the SNR obtained by each method in the sound source separation experiment
FIG. 8: Comparison of the CC obtained by each method in the sound source separation experiment
FIG. 9: Waveforms of the sound source signals separated by GSS-AS
FIG. 10: Diagram explaining the problem of convergence speed and convergence accuracy according to the cost function

Explanation of symbols

10: electronic control unit, 11: first processing element, 12: second processing element, M_i: microphone

Claims

A sound source separation system comprising a plurality of microphones and separating a plurality of sound source signals based on input signals from each of the plurality of microphones, the system comprising:
a first processing element that recognizes a cost function which is defined by a separation matrix representing the correlation between the input signals and the sound source signals and which is used to evaluate the degree of separation of the sound source signals; and
a second processing element that recognizes, as an optimal separation matrix, the separation matrix at which the cost function takes its minimum value, by repeating a process of determining the next separation matrix by updating the current separation matrix such that the next value of the cost function recognized by the first processing element is closer to the minimum value than the current value, and that adjusts the update amount from the current value to the next value of the separation matrix so that the update amount increases as the current value of the cost function is larger and decreases as the current gradient of the cost function is steeper.