JP3927701B2

JP3927701B2 - Sound source signal estimation device

Info

Publication number: JP3927701B2
Application number: JP26787798A
Authority: JP
Inventors: 靖茂中山; 哲夫梅田; 隆司西; 悟小泉
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 1998-09-22
Filing date: 1998-09-22
Publication date: 2007-06-13
Anticipated expiration: 2018-09-22
Also published as: JP2000097758A

Description

【０００１】
【発明の属する技術分野】
本発明は、複数の音源信号が相互に混在して複数のチャンネルを介して入力されたときに、その複数の音源信号を音源毎に推定する技術に関する。
【０００２】
【従来の技術】
複数の音源信号が相互に混在して複数のチャンネルを介して入力されたときに、その音源信号を音源毎に推定することは一般にはできない。その理由は、音源信号が未知である場合、いったん混合されて入力した信号であるチャンネル信号のみから混合過程を一意に決定することが不可能であるからである。そこで、音源信号が互いに統計的に独立であると仮定したうえで音源信号の混合過程をモデル化し、音源信号の推定、分離を行うことが試みられている。従来のこの種推定、分離手法として、文献 C.Jutten et al.“Blind separation of sources, Part １：An adaptive algorithm based on neuromimetic architecture, ”Signal Process. 24, 1-10 (1991) に記載されたＩＣＡ（Independent Component Analysis) の手法がある。
【０００３】
ＩＣＡの手法は、複数の音源信号の混合過程をモデル化し、かつ原音源信号が統計的に独立であることを利用する音源信号の推定、分離手法である。その複数の音源の混合、分離過程の原理を音源信号、チャンネル信号および推定信号がそれぞれ２つある場合を例に図４に示す。入力信号としての２つのチャンネル信号（第１チャンネル信号、第２チャンネル信号）はそれぞれ、ある時刻における複数の連続するサンプル値の集合としてのベクトルで与えられる。図４においては、連続するサンプル値の数はｍ（ｍ＞１）である。そのｍ個のサンプル値からなる２つのチャンネル信号ベクトル
【外１】

が音源分離過程の入力端に入力されたとき、それを音源分離処理して同じくｍ個のサンプル値からなる混合過程モデルにおける２つの音源信号ベクトル
【外２】

を２つの推定信号ベクトル
【外３】

として推定している。図４の場合、いずれの信号（チャンネル信号、推定信号）もその数は２であるが、チャンネル信号、推定信号の数をそれぞれＮ，Ｍ（Ｎ＞２，Ｎ≧Ｍ＞２）に拡張しても一般性は失われない。ただし、音源数がチャンネル数より多い場合であってもチャンネル数を超える数の音源推定をすることはできない。
【０００４】
図４に示す混合過程モデルは、それぞれｍ個のサンプル値からなる次の２つの音源信号ベクトル
【数１】

がそれぞれ個別の混合係数ベクトル
【外４】

と内積演算されて、他方の音源信号に加算される混合過程モデルを示している。その混合された２つの信号がチャンネル信号ベクトル
【数２】

として図４に示す音源分離過程に入力される。ここで、同図に示す混合係数ベクトル
【外５】

はｋ番目の音源信号がｎ番目の音源信号に混入される際の混合係数ベクトルを示している。
【０００５】
すなわち、このＩＣＡの手法は、図４に示す混合過程モデルに基づいてｎ番目の入力チャンネル信号が次式で表現できることを前提に、音源信号ベクトルの推定、分離を行う手法である。
【数３】

ここで、
【外６】

の内積、すなわち
【数４】

を表わしている。（１）式に基づき他の音源信号が混在したチャンネル信号から所定の音源信号を推定信号ベクトル
【外７】

として分離するには、以下に説明する分離係数ベクトル
【外８】

を定義し、他の音源信号に相当するこの推定信号ベクトルをそれぞれの分離係数ベクトルで内積演算した結果を、音源信号を分離しようとしているチャンネル信号から減算するようにすればよい。
【０００６】
いま、ｎ番目の入力チャンネル信号からｋ番目の入力チャンネルに対応した音源信号を除去してｎ番目の入力チャンネルに対応した音源信号を推定する場合の分離係数ベクトルを〔外８〕とすれば、ｎ番目の推定信号
【外９】

は
【数５】

で表されるから、この（２）式に（１）式を代入し、推定信号が音源信号にかなり近いとの仮定のもとに
【外１０】

として、さらに式を整理すると
【数６】

となる。推定信号と音源信号とが完全に一致する場合には、混合係数ベクトル〔外５〕と分離係数ベクトル〔外８〕は理論上一致する筈である。このため、
【外１１】

であれば完全に音源信号が推定できたことになるが、実際には、混合係数ベクトル〔外５〕は混合過程モデルで定義した混合係数ベクトルであり、その値は未知なので次のような期待値
【外１２】

を分離の指標として考える。
【０００７】
つまり、各音源信号間が無相関と仮定すればいわゆるクロス項が０となり
【数７】

で表わされる。ここで、
【外１３】

はベクトルのノルムであり
【数８】

である。この（４）式より
【外１４】

がゼロベクトルのとき、すなわち、
【外１５】

のとき、期待値〔外１２〕が最小になる。そこで、〔外１２〕を分離の指標と見なし、これを最小化する分離係数ベクトル〔外８〕を１つ前の時刻における〔外８〕を用いて逐次修正しながら推定していく。
【０００８】
ここで、ｋ番目の入力チャンネルに対応した音源信号はｋ番目以外の入力チャンネルに対応した音源信号にとっては雑音である。従って、この雑音としてのｋ番目の入力チャンネルに対応した音源信号にのみ注目し、ｋ番目以外には音源信号がないと仮定する。例えば、図４に示した例において、ｋ番目の入力チャンネルに対応した音源信号以外の音源信号はないと仮定すれば、図４は、図５のように書き替えることができる。
図５において、ある時刻ｊにおける分離係数ベクトル〔外８〕を
【外１６】

と表記した。混合係数ベクトル〔外５〕も同様に
【外１７】

と表記した。同様に、ｋ番目の音源信号ベクトル
【外１８】

、ｎ番目のチャンネル信号ベクトル
【外１９】

、ｋ番目の推定信号ベクトル
【外２０】

についても時刻ｊにおけるものであることを明確にするために、それぞれ
【外２１】

と表記した。
【０００９】
いま、図５において分離係数ベクトル〔外１６〕が混合係数ベクトル〔外１７〕に等しくなるように〔外１６〕を修正するには、同図に示す残差
【外２２】

がなるべく小さくなるようにすればよい。ｋ番目の音源信号ベクトル
【外２３】

を雑音として除去するために分離係数ベクトル〔外１６〕を逐次修正して残差〔外２２〕を最小化する過程は次のように記述することができる。
１．初期設定として分離係数ベクトル
【外２４】

は任意の初期値とする。ただし、一般的にはゼロベクトルとすることが多い。
２．図５では、ｎ番目の音源信号ベクトルはゼロベクトルの場合を仮定しているので
【外２５】

となり、
【数９】

であるから残差ｅ_jは
【数１０】

として求まる。次に修正ベクトル
【外２６】

を
【数１１】

とおいて、これ（〔外２６〕）を
【数１２】

のようにμ倍したうえで、その時刻の分離係数ベクトル〔外１６〕に加算することで逐次修正を行って、次の時刻における分離係数ベクトル
【外２７】

を生成する。
【００１０】
図５に示す
【外２８】

は時刻ｊと時刻（ｊ＋１）との差の時間に相当する遅延回路を意味し、図５全体としてはこの逐次修正過程を実現する回路構成を示している。図中、μは収束係数であらかじめ定められた値（０＜μ≦１）である。そして逐次修正された分離係数ベクトル〔外２７〕を用いて、上述の（２）式により次の時刻（ｊ＋１）における音源信号の推定を行う。現実には上記仮定と異なり、ｋ番目以外の音源信号が存在し、しかもそれはｋ番目のチャンネル信号にも混在しているので、上記のような逐次処理によっては完全な音源信号の推定はできないが、図５中の残差〔外２２〕に相当する第ｎ推定信号を最も小さくする場合の修正ベクトル〔外２６〕が〔外１６〕を〔外１７〕に最も近づける〔外２６〕になることから、従来よりこのような逐次修正による考え方を用いて音源信号の推定を行っている。
【００１１】
【発明が解決しようとする課題】
上述したように、従来のＩＣＡの手法においては、ある時刻ｊにおける分離係数ベクトルの修正ベクトル〔外２６〕の修正方向は（７）式によって時刻ｊにおける推定信号ベクトル
【外２９】

に基づいて決められる。しかし、修正ベクトル〔外２６〕の大きさは推定信号ベクトル〔外２９〕の大きさにも依存するため、分離係数ベクトル〔外１６〕を最適な値に向けて逐次修正していく際に、推定信号ベクトル〔外２９〕のパワー変動によって修正ベクトル〔外２６〕の大きさが大きく変動し、結果的に逐次修正が不安定になってしまう。従来、それを避けるために収束係数μを必要以上に小さな値としていたことから逐次修正の収束が遅くなるという問題があった。さらに、収束速度が入力信号の大きさに依存するため、収束係数μを必要以上に小さくしても動作が不安定となる現象が残るという問題もあった。
【００１２】
本発明の目的は、複数（例えば、Ｍ個）の音源信号が相互に混在してディジタル信号の形式で複数（例えば、Ｎ個）のチャンネル信号として入力されたとの仮定のもとで、前記チャンネル信号に基づき前記音源信号として推定されたＭ個の推定信号を２信号間で演算してチャンネル信号から混在信号を分離するための分離係数ベクトルを逐次修正しながら求め、その分離係数ベクトルと前記混在信号としての前記推定信号ベクトルとを内積演算して得られた結果を前記チャンネル信号から減じて、前記音源信号を推定する音源信号推定装置において、従来のように、逐次修正が不安定になったり、逐次修正の収束が遅くなることのない音源信号推定装置を提供することにある。
【００１３】
【課題を解決するための手段】
上記目的を達成するために、本発明においては、時刻ｊにおける推定信号ベクトルを、時刻ｊにおける分離係数ベクトルの修正ベクトルの修正方向の決定にのみ用いるように、分離係数ベクトルの修正ベクトル〔外２６〕を推定信号ベクトル〔外２９〕のノルムの自乗で正規化する正規化処理手段を設けたことを特徴としている。
【００１４】
すなわち、本発明による音源信号推定装置は、Ｍ（Ｍ≧２）個の音源信号が相互に混在してディジタル信号の形式でＮ（Ｎ≧Ｍ≧２）個のチャンネル信号として入力されたとの仮定のもとで、前記Ｎ個のチャンネル信号に基づき前記音源信号として推定されたＭ個の推定信号を２信号間で演算してチャンネル信号から混在信号を分離するための分離係数ベクトルを逐次修正しながら求め、その分離係数ベクトルと前記混在信号としての前記推定信号ベクトルとを内積演算して得られた結果を前記チャンネル信号から減じて、前記音源信号を推定する音源信号推定装置において、該装置は、前記混在信号としての推定信号と前記音源信号として推定された推定信号とが入力され、前記分離係数ベクトルを逐次修正するためのベクトルを線形演算により生成して出力する修正ベクトル生成手段と、該修正ベクトル生成手段の出力ベクトルを前記分離係数ベクトルに加算して次の時点における分離係数ベクトルとして出力することにより前記分離係数ベクトルの逐次修正を行う逐次修正手段と、該逐次修正手段によって逐次修正された分離係数ベクトルと前記混在信号としての推定信号ベクトルとを内積演算して得られた結果を前記チャンネル信号から減じて、前記音源信号として推定された推定信号を出力する減算手段とを含む音源信号推定手段を少なくとも１組具えるとともに、前記修正ベクトル生成手段は、前記混在信号としての推定信号ベクトルのノルムの自乗で正規化演算して、前記音源信号として推定された推定信号の大きさを有するとともに前記混在信号としての推定信号ベクトルの方向を有するベクトルを修正ベクトルとして生成し、該修正ベクトルを所定倍して出力する修正ベクトル生成手段であることを特徴とするものである。
【００１５】
【発明の実施の形態】
以下に添付図面を参照し、発明の実施の形態に基づいて本発明を詳細に説明する。
上述したように、本発明は、上記ＩＣＡの手法において、分離係数ベクトル〔外１６〕の修正ベクトル〔外２６〕を推定信号ベクトル〔外２９〕で正規化するようにしたものであり、そのための正規化回路を含んでいる本発明装置の一実施形態の回路構成を図１に示す。
ここに、図示の正規化回路における正規化の処理は次式
【数１３】

で表わされる。
【００１６】
また、図１においては、２つの入力チャンネルおよび２つの音源信号が存在する場合を想定していて、２つの音源信号に対する各残差として他の推定信号そのものが入力されている。また、図１では、雑音としての推定信号ベクトル〔外２９〕をそのノルムの自乗で正規化して残差ｅ_jとの乗算を行い分離係数ベクトル〔外１６〕の修正ベクトル〔外２６〕を算出する回路構成例を示しているが、図１の正規化回路に代わり、各推定信号ベクトルと残差とを乗算して求めた、従来技術における（７）式の修正ベクトルに
【外３０】

を乗算する乗算回路を収束係数乗算器の直前に配置することにより修正ベクトルの正規化の処理を実現してもよく、また収束係数乗算後に正規化の処理を施してもよい。さらに、正規化の処理は、上記正規化の処理を行うハードウエアと同様の動作をソフトウェアで実現することも可能である。
【００１７】
図１に示す実施形態の動作を説明する。
まず、２つの入力チャンネル信号（第１チャンネル信号ベクトル
【外３１】

、第２チャンネル信号ベクトル
【外３２】

）は、ある時刻ｊにおけるｍ個の連続するサンプル値の集合として装置に並列に入力される。いま、第１チャンネル信号に注目すれば、第１チャンネル信号から、混在信号としての推定信号と位置付けられる第２推定信号ベクトルと以下に説明するようにして求めた分離係数ベクトル
【外３３】

との間で内積演算した結果が減算器（図１中、記号
【外３４】

で示す）で減じられて音源信号としての第１推定信号ベクトル
【外３５】

として出力される。図１中、記号
【外３６】

は内積演算回路を示している。
【００１８】
上記において、内積演算すべき一方のベクトルとしての分離係数ベクトル〔外３３〕は、次のように逐次修正して求める。
すなわち、雑音としての第２推定信号をベクトル表現したときの方向成分のみを抽出するために正規化回路で第２推定信号ベクトル
【外３７】

をそのノルムの自乗で正規化し、その結果と第１推定信号とを乗算器（図１中、記号
【外３８】

で示す）で乗算して修正ベクトル
【外３９】

を求め、さらに収束係数μ（０＜μ≦１）を乗算したうえで時刻ｊにおける分離係数ベクトル〔外３３〕（図１中、逐次修正手段を構成し、記号〔外２８〕で示される時刻ｊと時刻（ｊ＋１）との差の時間に相当する遅延回路の出力）と加算して、図１中、記号
【外４０】

で示される加算器の出力を時刻（ｊ＋１）における分離係数ベクトルとしている。
【００１９】
本発明は、上述したC. Jutten 他の文献に見られる手法において、最適な逐次修正を行うにはどうしたらよいかを理論的に検討した結果生まれたものであるため、次に、その理論的根拠を簡単に説明する。
図２は、分離係数ベクトル〔外１６〕の修正方向と修正量をμ＝１（μは収束係数）として幾何学的に示し、とくに、図５の場合におけるある時刻ｊの音源ベクトル
【外４１】

と、それを除外するために用いる分離係数ベクトル〔外１６〕の集合と、〔外１６〕を修正ベクトル〔外２６〕により修正して得られる次の時刻ｊ＋１における分離係数ベクトル
【外４２】

の集合を示している。
【００２０】
図２において、ある時刻ｊにおける音源ベクトル〔外４１〕を除去するために用いた分離係数ベクトル〔外１６〕がベクトル空間上で図示の位置に存在したとすると、次の時刻（ｊ＋１）において用いる分離係数ベクトル〔外４２〕は分離係数ベクトル〔外１６〕を修正ベクトル〔外２６〕により修正したものとなる。ここで、修正ベクトル〔外２６〕は時刻ｊにおける推定信号ベクトル〔外２９〕に基づきその方向が決まるが、
【外４３】

を仮定しているので、修正ベクトル〔外２６〕の方向は〔外４１〕の方向と一致するものと考えてよい。
【００２１】
従って、いま、混合過程モデルにおける混合係数ベクトル〔外１７〕がベクトル空間上で図２中の図示の位置にあるとすれば、同図において、
【外４４】

ベクトル（混合係数ベクトル−分離係数ベクトル）と修正ベクトル〔外２６〕ベクトルとを含む面は図３に示され、理論的に最適な修正ベクトル〔外２６〕は、〔外４４〕ベクトルを〔外４１〕ベクトル上に射影したものとなる。ここで面π_jを、図５の第ｋ音源信号ベクトル
【外４５】

に対し（５）式で求めたｚ_jが第ｎチャンネル信号ベクトルｙ_n,jに等しくなるような分離係数ベクトル〔外１６〕の集合とすると、面π_jは、次式のようにｍ次元ユークリッド空間内の平面をなす。
【数１４】

【００２２】
また、面π_j+1は面π_j上の分離係数ベクトル〔外１６〕を修正ベクトル〔外２６〕により修正して得られた時刻（ｊ＋１）における分離係数ベクトル〔外４２〕の集合であるから、理論的には、逐次修正された分離係数ベクトル〔外４２〕のベクトル空間上の位置は、ベクトル空間上に占める分離係数ベクトル〔外１６〕の位置より面π_j+1に下ろした垂線の足となる。すなわち、図３より、最適な修正ベクトル〔外２６〕は
【数１５】

と求まる。
ここで、残差ｅ_jは（６）式を変形すると
【数１６】

となるから、これを用いて（１０）式を変形すると次式を得る。
【数１７】

（１１）式は、まさしく従来手法における修正ベクトル〔外２６〕を表す（７）式を推定信号ベクトル〔外２９〕のノルムの自乗で正規化したものに他ならない。
【００２３】
すなわち、図３から明らかなように、（１１）式の修正ベクトル〔外２６〕は、分離係数ベクトル〔外１６〕を推定信号ベクトル〔外２９〕のベクトル方向に修正する修正量として最適な修正量である。従って、分離係数ベクトル〔外１６〕を推定信号ベクトル〔外２９〕のベクトル方向に修正する場合、（１１）式の修正ベクトル〔外２６〕は、修正後の分離係数ベクトル〔外４２〕と目標とする混合係数ベクトル〔外１７〕との差が最小となる修正量であるため、逐次修正したときに推定信号ベクトル〔外２９〕の大きさによらずに単調に収束することが保証される。
【００２４】
なお、本実施形態では、チャンネル信号、推定信号の数がそれぞれ２の場合を例に説明したが、分離係数ベクトルは２信号間の演算により求められるため、チャンネル信号、推定信号の数が２を超える場合であっても本発明を同様に適用できることは言うまでもない。
【００２５】
【発明の効果】
本発明によれば、ある音声信号にそれ以外の複数の楽音や音声信号または雑音が混入し、それらが相互に混在している信号からそれぞれの信号を推定、分離するに際し、それぞれの信号パワー変動による推定、分離への影響を軽減することができ、さらに、収束係数を大きくすることができることから安定かつ高速の信号分離が可能となる。
【図面の簡単な説明】
【図１】本発明による音源信号推定装置の一実施形態を示している。
【図２】分離係数ベクトル〔外１６〕を逐次修正して求めるにあたり、分離係数ベクトルの修正方向と修正量を幾何学的に示している。
【図３】図２における〔外４４〕ベクトル（混合係数ベクトル−分離係数ベクトル）と修正ベクトル〔外２６〕ベクトルとを含む面を示している。
【図４】ＩＣＡの手法によって混合している音源信号を推定、分離するにあたって、その複数の音源信号の混合、分離過程の原理を示している。
【図５】ｋ番目の入力チャンネルに対応する音源信号以外の音源信号はないと仮定して、図１を書き替えたものである。
[0001]
[Technical field of the invention]
The present invention relates to a technique for estimating each of a plurality of sound source signals, source by source, when those signals are mixed with one another and input via a plurality of channels.
[0002]
[Prior art]
When a plurality of sound source signals are mixed with one another and input via a plurality of channels, it is generally not possible to estimate the sound source signal of each source. The reason is that, when the sound source signals are unknown, the mixing process cannot be determined uniquely from the channel signals alone, which are the signals as input after mixing. Attempts have therefore been made to estimate and separate the sound source signals by modeling the mixing process on the assumption that the sound source signals are statistically independent of one another. One conventional estimation and separation technique of this kind is the ICA (Independent Component Analysis) technique described in C. Jutten et al., “Blind separation of sources, Part 1: An adaptive algorithm based on neuromimetic architecture,” Signal Process. 24, 1-10 (1991).
[0003]
The ICA technique estimates and separates sound source signals by modeling the process by which a plurality of sound source signals are mixed and by exploiting the statistical independence of the original sound source signals. FIG. 4 shows the principle of the mixing and separation processes for the case of two sound source signals, two channel signals, and two estimated signals. Each of the two channel signals serving as inputs (the first channel signal and the second channel signal) is given as a vector, that is, a set of consecutive sample values at a certain time. In FIG. 4, the number of consecutive sample values is m (m > 1). When the two channel signal vectors [Outside 1], each consisting of m sample values, are input to the sound source separation process, that process estimates the two sound source signal vectors [Outside 2] of the mixing process model, likewise consisting of m sample values each, as the two estimated signal vectors [Outside 3]. In FIG. 4 the number of each kind of signal (channel signals, estimated signals) is 2, but generality is not lost if the numbers of channel signals and estimated signals are extended to N and M (N > 2, N ≥ M > 2), respectively. However, even when there are more sound sources than channels, no more sound sources than there are channels can be estimated.
[0004]
The mixing process model shown in FIG. 4 is one in which each of the following two sound source signal vectors, each consisting of m sample values,
[Equation 1]
undergoes an inner product operation with its own mixing coefficient vector [Outside 4], and the result is added to the other sound source signal. The two signals mixed in this way are input to the sound source separation process of FIG. 4 as the channel signal vectors
[Equation 2]
Here, the mixing coefficient vector [Outside 5] shown in the figure denotes the mixing coefficient vector with which the k-th sound source signal is mixed into the n-th sound source signal.
[0005]
That is, the ICA technique estimates and separates sound source signal vectors on the premise that, based on the mixing process model shown in FIG. 4, the n-th input channel signal can be expressed by the following equation.
[Equation 3]
Here, the notation represents the inner product of [Outside 6], that is,
[Equation 4]
To separate a given sound source signal, as the estimated signal vector [Outside 7], from a channel signal in which other sound source signals are mixed on the basis of equation (1), it suffices to define the separation coefficient vector [Outside 8] described below, take the inner product of each estimated signal vector corresponding to another sound source signal with its separation coefficient vector, and subtract the results from the channel signal from which the sound source signal is to be separated.
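The mixing and separation steps described above can be sketched as follows. This is a minimal illustration and not the patent's own notation: the names `a_nk` (mixing coefficient vector), `w_nk` (separation coefficient vector), and the treatment of each channel value as one scalar sample formed from m-sample vectors are assumptions made here, since the original equation images are not available.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (coefficient vector, signal vector)


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(ui * vi for ui, vi in zip(u, v))


def mix_channel(s_n: float, others: Sequence[Pair]) -> float:
    """Channel sample: own source sample plus each other source's vector
    weighted by its mixing coefficient vector (assumed form of the model)."""
    return s_n + sum(dot(a_nk, s_k) for a_nk, s_k in others)


def separate_channel(y_n: float, others: Sequence[Pair]) -> float:
    """Estimated sample: channel sample minus the inner product of each
    separation coefficient vector with the matching estimated signal vector."""
    return y_n - sum(dot(w_nk, s_hat_k) for w_nk, s_hat_k in others)


# Illustrative numbers only: two sources, m = 3 samples per vector.
a_12 = [0.5, 0.25, 0.0]   # hypothetical mixing coefficients (source 2 into channel 1)
s_2 = [1.0, 2.0, 4.0]     # samples of source 2
y_1 = mix_channel(3.0, [(a_12, s_2)])

# If the separation vector equals the mixing vector and the estimate of
# source 2 is exact, the leak is cancelled completely.
s_hat_1 = separate_channel(y_1, [(a_12, s_2)])
```

In this idealized case `s_hat_1` equals the original source sample; in practice the separation vector has to be learned, which is what the successive correction in the following paragraphs addresses.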
[0006]
Now let [Outside 8] be the separation coefficient vector used when the sound source signal corresponding to the k-th input channel is removed from the n-th input channel signal in order to estimate the sound source signal corresponding to the n-th input channel. The n-th estimated signal [Outside 9] is then given by
[Equation 5]
Substituting equation (1) into equation (2), assuming that the estimated signals are quite close to the sound source signals, that is, [Outside 10], and rearranging gives
[Equation 6]
If the estimated signals and the sound source signals coincide completely, the mixing coefficient vector [Outside 5] and the separation coefficient vector [Outside 8] should in theory coincide. The sound source signal would therefore be estimated perfectly if [Outside 11] held. In practice, however, the mixing coefficient vector [Outside 5] is defined by the mixing process model and its value is unknown, so the following expected value [Outside 12] is taken as the index of separation.
[0007]
That is, if the sound source signals are assumed to be mutually uncorrelated, the so-called cross terms vanish and the expected value is expressed as
[Equation 7]
Here, [Outside 13] is the norm of a vector, namely
[Equation 8]
From equation (4), the expected value [Outside 12] is minimized when [Outside 14] is the zero vector, that is, when [Outside 15] holds. [Outside 12] is therefore regarded as the index of separation, and the separation coefficient vector [Outside 8] that minimizes it is estimated by successively correcting it, starting from the value of [Outside 8] at the previous time.
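Because the equation images are missing from this text, the relations referred to as equations (1) to (4) can only be reconstructed from the surrounding prose. The block below is one such reconstruction under assumed symbols: $\mathbf{a}_{nk}$ for the mixing coefficient vector, $\mathbf{w}_{nk}$ for the separation coefficient vector, $\mathbf{s}_k$ and $\hat{\mathbf{s}}_k$ for the k-th source and estimated signal vectors. The original typography, and the exact form of equation (4) in particular, may differ.

```latex
\begin{align}
y_n &= s_n + \sum_{k \neq n} \mathbf{a}_{nk}^{\top} \mathbf{s}_k \tag{1} \\
\hat{s}_n &= y_n - \sum_{k \neq n} \mathbf{w}_{nk}^{\top} \hat{\mathbf{s}}_k \tag{2} \\
\hat{s}_n &\approx s_n + \sum_{k \neq n} \left(\mathbf{a}_{nk} - \mathbf{w}_{nk}\right)^{\top} \mathbf{s}_k
  \qquad (\hat{\mathbf{s}}_k \approx \mathbf{s}_k) \tag{3} \\
E\!\left[\hat{s}_n^{2}\right] &= E\!\left[s_n^{2}\right]
  + \sum_{k \neq n} E\!\left[\left(\left(\mathbf{a}_{nk} - \mathbf{w}_{nk}\right)^{\top} \mathbf{s}_k\right)^{2}\right] \tag{4}
\end{align}
```

Every term in the sum of (4) is non-negative and vanishes when $\mathbf{a}_{nk} - \mathbf{w}_{nk} = \mathbf{0}$, which matches the statement that the expected value is minimized when the separation coefficient vector equals the mixing coefficient vector.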
[0008]
Here, the sound source signal corresponding to the k-th input channel is noise for the sound source signals corresponding to the input channels other than the k-th. Attention is therefore paid only to the sound source signal corresponding to the k-th input channel, regarded as that noise, and it is assumed that there is no sound source signal other than the k-th. For example, if it is assumed in the example of FIG. 4 that there is no sound source signal other than the one corresponding to the k-th input channel, FIG. 4 can be redrawn as FIG. 5.
In FIG. 5, the separation coefficient vector [Outside 8] at a certain time j is written as [Outside 16]. The mixing coefficient vector [Outside 5] is likewise written as [Outside 17]. Similarly, the k-th sound source signal vector [Outside 18], the n-th channel signal vector [Outside 19], and the k-th estimated signal vector [Outside 20] are written as [Outside 21], respectively, to make clear that they are the values at time j.
[0009]
To correct [Outside 16] in FIG. 5 so that the separation coefficient vector [Outside 16] becomes equal to the mixing coefficient vector [Outside 17], the residual [Outside 22] shown in the figure should be made as small as possible. The process of successively correcting the separation coefficient vector [Outside 16] so as to minimize the residual [Outside 22], and thereby remove the k-th sound source signal vector [Outside 23] as noise, can be described as follows.
1. As the initial setting, the separation coefficient vector [Outside 24] is given an arbitrary initial value. In general, the zero vector is often used.
2. FIG. 5 assumes that the n-th sound source signal vector is the zero vector, so [Outside 25] holds and
[Equation 9]
so the residual e_j is obtained as
[Equation 10]
Next, the correction vector [Outside 26] is set to
[Equation 11]
and, as in
[Equation 12]
it ([Outside 26]) is multiplied by μ and added to the separation coefficient vector [Outside 16] at that time, which performs the successive correction and generates the separation coefficient vector [Outside 27] for the next time.
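The two steps above can be sketched in a few lines. This is an illustration under assumptions, not the patent's own code: the function name `lms_step` and the variable names are hypothetical, and the forms of equations (5) to (7) are inferred from how the text later refers to them (z_j from equation (5), the residual e_j from equation (6), the correction vector from equation (7)).

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def lms_step(w: Sequence[float], s_hat_k: Sequence[float],
             y_n: float, mu: float) -> Tuple[List[float], float]:
    """One conventional (unnormalized) successive correction of the
    separation coefficient vector w. Returns (next w, residual e_j)."""
    z = dot(w, s_hat_k)                  # z_j, cf. equation (5)
    e = y_n - z                          # residual e_j, cf. equation (6)
    delta = [e * x for x in s_hat_k]     # correction vector, cf. equation (7)
    w_next = [wi + mu * di for wi, di in zip(w, delta)]  # mu-scaled update
    return w_next, e


# Illustrative numbers: zero initial vector (step 1), one correction (step 2).
w0 = [0.0, 0.0]
s_hat = [1.0, 2.0]
y = 1.0          # e.g. mixing vector [0.5, 0.25] applied to s_hat
w1, e0 = lms_step(w0, s_hat, y, mu=0.1)
```

Note that the correction `delta` grows with both the residual and the estimated signal vector, so its size depends on the signal level.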
[0010]
[Outside 28] in FIG. 5 denotes a delay circuit whose delay corresponds to the time difference between time j and time (j + 1), and FIG. 5 as a whole shows a circuit configuration that implements this successive correction process. In the figure, μ is the convergence coefficient, a predetermined value (0 < μ ≤ 1). The successively corrected separation coefficient vector [Outside 27] is then used in equation (2) above to estimate the sound source signal at the next time (j + 1). In reality, contrary to the above assumption, sound source signals other than the k-th exist and are also mixed into the k-th channel signal, so this successive processing cannot estimate the sound source signals perfectly. Nevertheless, the correction vector [Outside 26] that minimizes the n-th estimated signal, which corresponds to the residual [Outside 22] in FIG. 5, is the [Outside 26] that brings [Outside 16] closest to [Outside 17], and for this reason sound source signals have conventionally been estimated using this successive-correction approach.
[0011]
[Problems to be solved by the invention]
As described above, in the conventional ICA technique the direction of the correction vector [Outside 26] for the separation coefficient vector at a certain time j is determined by equation (7) on the basis of the estimated signal vector [Outside 29] at time j. However, the magnitude of the correction vector [Outside 26] also depends on the magnitude of the estimated signal vector [Outside 29]. Consequently, as the separation coefficient vector [Outside 16] is successively corrected toward its optimum value, power fluctuations of the estimated signal vector [Outside 29] cause the magnitude of the correction vector [Outside 26] to fluctuate widely, and the successive correction becomes unstable. To avoid this, the convergence coefficient μ has conventionally been set smaller than necessary, which slows the convergence of the successive correction. Moreover, because the convergence speed depends on the magnitude of the input signal, operation can remain unstable even when μ is made smaller than necessary.
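The dependence on signal level can be seen in a deliberately tiny example. The sketch below assumes a one-tap case (m = 1) with made-up numbers; the function name and values are hypothetical and chosen only to show the effect described above.

```python
from typing import Tuple


def residual_before_and_after(scale: float, mu: float = 0.1) -> Tuple[float, float]:
    """Apply one unnormalized correction in a one-tap case and return the
    residual before and after the correction."""
    a = 0.5                 # hypothetical mixing coefficient
    s_hat = scale * 1.0     # estimated (noise) sample at the given level
    w = 0.0                 # initial separation coefficient
    y = a * s_hat           # channel sample when only the noise source is active
    e_before = y - w * s_hat
    w_next = w + mu * e_before * s_hat   # unnormalized update
    e_after = y - w_next * s_hat
    # Algebraically e_after = e_before * (1 - mu * s_hat**2), so in this
    # scalar case the residual shrinks only while mu * s_hat**2 < 2.
    return e_before, e_after


quiet = residual_before_and_after(scale=1.0)    # residual shrinks
loud = residual_before_and_after(scale=10.0)    # same mu, residual grows
```

With the same μ, the quiet input reduces the residual while the input ten times larger overshoots, which is why a fixed μ has to be chosen conservatively in the unnormalized scheme.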
[0012]
An object of the present invention is to provide a sound source signal estimation device that is free of the instability and slow convergence of successive correction found in the prior art. The device concerned is one that, on the assumption that a plurality of (for example, M) sound source signals are mixed with one another and input in digital form as a plurality of (for example, N) channel signals, obtains, while successively correcting it, a separation coefficient vector for separating a mixed-in signal from a channel signal by operating pairwise on the M estimated signals estimated as the sound source signals from the channel signals, and estimates the sound source signal by subtracting from the channel signal the result of the inner product of that separation coefficient vector and the estimated signal vector serving as the mixed-in signal.
[0013]
[Means for Solving the Problems]
To achieve the above object, the present invention is characterized by providing normalization means that normalizes the correction vector [Outside 26] of the separation coefficient vector by the square of the norm of the estimated signal vector [Outside 29], so that the estimated signal vector at time j is used only to determine the direction of the correction vector of the separation coefficient vector at time j.
[0014]
That is, the sound source signal estimation device according to the present invention is a device that, on the assumption that M (M ≥ 2) sound source signals are mixed with one another and input in digital form as N (N ≥ M ≥ 2) channel signals, obtains, while successively correcting it, a separation coefficient vector for separating a mixed-in signal from a channel signal by operating pairwise on the M estimated signals estimated as the sound source signals from the N channel signals, and estimates the sound source signal by subtracting from the channel signal the result of the inner product of that separation coefficient vector and the estimated signal vector serving as the mixed-in signal. The device comprises at least one set of sound source signal estimating means that includes: correction vector generation means that receives the estimated signal serving as the mixed-in signal and the estimated signal estimated as the sound source signal, and generates by linear operation and outputs a vector for successively correcting the separation coefficient vector; successive correction means that successively corrects the separation coefficient vector by adding the output vector of the correction vector generation means to the separation coefficient vector and outputting the sum as the separation coefficient vector for the next time point; and subtraction means that subtracts from the channel signal the result of the inner product of the separation coefficient vector successively corrected by the successive correction means and the estimated signal vector serving as the mixed-in signal, and outputs the estimated signal estimated as the sound source signal. The correction vector generation means performs normalization by the square of the norm of the estimated signal vector serving as the mixed-in signal, generates as the correction vector a vector that has the magnitude of the estimated signal estimated as the sound source signal and the direction of the estimated signal vector serving as the mixed-in signal, and outputs that correction vector multiplied by a predetermined factor.
[0015]
[Embodiments of the invention]
The present invention is described in detail below on the basis of an embodiment, with reference to the accompanying drawings.
As described above, the present invention normalizes, within the ICA technique described above, the correction vector [Outside 26] of the separation coefficient vector [Outside 16] by the estimated signal vector [Outside 29]. FIG. 1 shows the circuit configuration of one embodiment of the device of the present invention, which includes a normalization circuit for that purpose.
The normalization performed by the normalization circuit shown in the figure is expressed by the following equation.
[Equation 13]
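A minimal sketch of the normalized correction follows. It assumes the form stated in the text (the conventional correction vector divided by the squared norm of the estimated signal vector); the function name `nlms_step`, the variable names, and the small threshold `eps` that skips the update for a silent input are assumptions added here and do not come from the patent.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def nlms_step(w: Sequence[float], s_hat_k: Sequence[float], y_n: float,
              mu: float, eps: float = 1e-12) -> Tuple[List[float], float]:
    """One normalized successive correction. Returns (next w, residual)."""
    e = y_n - dot(w, s_hat_k)            # residual
    power = dot(s_hat_k, s_hat_k)        # squared norm of the estimated vector
    if power <= eps:
        # Assumption: leave w unchanged when the input is (almost) silent,
        # to avoid dividing by zero. The patent text does not cover this case.
        return list(w), e
    delta = [e * x / power for x in s_hat_k]   # normalized correction vector
    return [wi + mu * di for wi, di in zip(w, delta)], e


# Illustrative numbers: the same situation at two signal levels.
w_low, _ = nlms_step([0.0, 0.0], [3.0, 4.0], 2.5, mu=1.0)
w_high, _ = nlms_step([0.0, 0.0], [300.0, 400.0], 250.0, mu=1.0)
```

Both calls produce the same corrected vector, because scaling the estimated signal scales the residual and the squared norm in a way that cancels. With μ = 1 the residual for the current input is removed in a single step.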
[0016]
FIG. 1 assumes two input channels and two sound source signals, and the other estimated signal itself is input as the residual for each of the two sound source signals. FIG. 1 shows an example circuit configuration that computes the correction vector [Outside 26] of the separation coefficient vector [Outside 16] by normalizing the estimated signal vector [Outside 29], regarded as noise, by the square of its norm and multiplying it by the residual e_j. Instead of the normalization circuit of FIG. 1, however, the normalization of the correction vector may be realized by placing, immediately before the convergence coefficient multiplier, a multiplication circuit that multiplies the prior-art correction vector of equation (7), obtained by multiplying each estimated signal vector by the residual, by [Outside 30]. The normalization may also be applied after multiplication by the convergence coefficient. Furthermore, the same operation as the hardware that performs the normalization can be realized in software.
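The three placements of the normalization mentioned above differ only in the order of scalar multiplications, so they yield the same scaled correction vector up to rounding. The short check below illustrates this with made-up values; all names and numbers are hypothetical, and the factor applied in the second and third variants is assumed to be the reciprocal of the squared norm.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


s_hat: List[float] = [0.5, -1.5, 2.0]   # estimated signal vector (illustrative)
e = 0.8                                  # residual (illustrative)
mu = 0.5                                 # convergence coefficient
power = dot(s_hat, s_hat)                # squared norm

# (a) Normalize the estimated vector first, then multiply by the residual
#     and by mu (the arrangement described for FIG. 1).
variant_a = [mu * (e * (x / power)) for x in s_hat]

# (b) Form the conventional correction vector e * s_hat, then scale it by
#     1 / power just before the mu multiplier.
variant_b = [mu * ((e * x) * (1.0 / power)) for x in s_hat]

# (c) Multiply by mu first and normalize afterwards.
variant_c = [(mu * (e * x)) / power for x in s_hat]
```

Which placement is preferable is a hardware question (number and position of multipliers) rather than a mathematical one.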
[0017]
The operation of the embodiment shown in FIG. 1 is as follows.
First, the two input channel signals (the first channel signal vector [Outside 31] and the second channel signal vector [Outside 32]) are input to the device in parallel, each as a set of m consecutive sample values at a certain time j. Considering the first channel signal, the result of the inner product of the second estimated signal vector, which is regarded as the estimated signal serving as the mixed-in signal, and the separation coefficient vector [Outside 33], obtained as described below, is subtracted from the first channel signal by a subtractor (denoted by the symbol [Outside 34] in FIG. 1), and the difference is output as the first estimated signal vector [Outside 35] serving as the sound source signal. The symbol [Outside 36] in FIG. 1 denotes an inner product circuit.
[0018]
The separation coefficient vector [Outside 33], one of the two vectors entering the inner product above, is obtained by successive correction as follows.
To extract only the directional component of the second estimated signal, regarded as noise and expressed as a vector, the normalization circuit normalizes the second estimated signal vector [Outside 37] by the square of its norm. A multiplier (denoted by the symbol [Outside 38] in FIG. 1) multiplies the result by the first estimated signal to obtain the correction vector [Outside 39]. This is further multiplied by the convergence coefficient μ (0 < μ ≤ 1) and added to the separation coefficient vector [Outside 33] at time j. In FIG. 1, that vector is the output of the delay circuit which constitutes the successive correction means, is denoted by the symbol [Outside 28], and has a delay corresponding to the time difference between time j and time (j + 1). The output of the adder, denoted by the symbol [Outside 40] in FIG. 1, is taken as the separation coefficient vector at time (j + 1).
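The two-channel operation described above might be sketched in software as follows. This is an interpretation under stated assumptions rather than the circuit of FIG. 1 itself: the names `w12` and `w21`, the use of the m most recent past estimates as each estimated signal vector (chosen here so that the two estimates do not depend on each other within the same sample), the zero initial state, and the `eps` guard are all assumptions.

```python
from collections import deque
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def corrected(w: Sequence[float], noise_vec: Sequence[float], residual: float,
              mu: float, eps: float) -> List[float]:
    """Normalized correction of one separation coefficient vector."""
    power = dot(noise_vec, noise_vec)
    if power <= eps:              # assumption: no update on silent input
        return list(w)
    return [wi + mu * residual * x / power for wi, x in zip(w, noise_vec)]


def separate_two_channels(y1: Sequence[float], y2: Sequence[float],
                          m: int = 4, mu: float = 0.5, eps: float = 1e-12
                          ) -> Tuple[List[float], List[float]]:
    """Cross-coupled estimation of two sources from two channel signals."""
    w12 = [0.0] * m               # removes estimated source 2 from channel 1
    w21 = [0.0] * m               # removes estimated source 1 from channel 2
    hist1 = deque([0.0] * m, maxlen=m)   # past estimates of source 1, newest first
    hist2 = deque([0.0] * m, maxlen=m)   # past estimates of source 2, newest first
    out1: List[float] = []
    out2: List[float] = []
    for c1, c2 in zip(y1, y2):
        v1, v2 = list(hist1), list(hist2)
        s1 = c1 - dot(w12, v2)    # subtractor and inner product for channel 1
        s2 = c2 - dot(w21, v1)    # subtractor and inner product for channel 2
        # Each estimate serves as the residual for the other pair's update.
        w12 = corrected(w12, v2, s1, mu, eps)
        w21 = corrected(w21, v1, s2, mu, eps)
        hist1.appendleft(s1)
        hist2.appendleft(s2)
        out1.append(s1)
        out2.append(s2)
    return out1, out2


est1, est2 = separate_two_channels([1.0, 0.5, -0.25, 2.0], [0.5, 1.0, 0.0, -1.0])
```

The first output samples equal the channel samples because the separation vectors and histories start at zero. Whether such a cross-coupled loop separates real mixtures well depends on the signals and on m and μ, which this sketch does not attempt to show.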
[0019]
The present invention resulted from a theoretical study of how to perform the optimum successive correction in the technique of the above-mentioned document by C. Jutten et al., so its theoretical basis is briefly explained next.
FIG. 2 shows geometrically the correction direction and correction amount of the separation coefficient vector [Outside 16] for μ = 1 (μ being the convergence coefficient). Specifically, for the case of FIG. 5 it shows the sound source vector [Outside 41] at a certain time j, the set of separation coefficient vectors [Outside 16] used to remove it, and the set of separation coefficient vectors [Outside 42] at the next time j + 1 obtained by correcting [Outside 16] with the correction vector [Outside 26].
[0020]
In FIG. 2, if the separation coefficient vector [Outside 16] used to remove the sound source vector [Outside 41] at a certain time j lies at the position shown in the vector space, the separation coefficient vector [Outside 42] used at the next time (j + 1) is [Outside 16] corrected by the correction vector [Outside 26]. The direction of the correction vector [Outside 26] is determined by the estimated signal vector [Outside 29] at time j, but since [Outside 43] is assumed, the direction of the correction vector [Outside 26] may be regarded as coinciding with the direction of [Outside 41].
[0021]
Accordingly, if the mixing coefficient vector [Outside 17] of the mixing process model lies at the position shown in FIG. 2 in the vector space, the plane containing the vector [Outside 44] (mixing coefficient vector minus separation coefficient vector) and the correction vector [Outside 26] is as shown in FIG. 3, and the theoretically optimum correction vector [Outside 26] is the projection of the vector [Outside 44] onto the vector [Outside 41]. Let the plane π_j be the set of separation coefficient vectors [Outside 16] for which z_j, obtained from equation (5) for the k-th sound source signal vector [Outside 45] of FIG. 5, equals the n-th channel signal vector y_{n,j}. The plane π_j then forms a plane in m-dimensional Euclidean space, as in the following equation.
[Equation 14]

[0022]
The plane π_{j+1} is the set of separation coefficient vectors [Outside 42] at time (j + 1) obtained by correcting the separation coefficient vectors [Outside 16] on the plane π_j with the correction vector [Outside 26]. Theoretically, therefore, the position in the vector space of the successively corrected separation coefficient vector [Outside 42] is the foot of the perpendicular dropped onto the plane π_{j+1} from the position of the separation coefficient vector [Outside 16]. That is, from FIG. 3, the optimum correction vector [Outside 26] is obtained as
[Equation 15]
Here, transforming equation (6) gives the residual e_j as
[Equation 16]
and using this to transform equation (10) yields the following equation.
[Equation 17]
Equation (11) is nothing other than equation (7), which represents the correction vector [Outside 26] of the conventional technique, normalized by the square of the norm of the estimated signal vector [Outside 29].
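Since the equation images are not available, the step from equation (10) to equation (11) can be retraced as follows. The symbols are assumptions made for this sketch: $\mathbf{a}_j$ for the mixing coefficient vector, $\mathbf{w}_j$ for the separation coefficient vector, $\hat{\mathbf{s}}_j$ for the estimated signal vector of the noise source, and $\Delta\mathbf{w}_j$ for the correction vector. It also uses the FIG. 5 assumption that only the k-th source is present and that its estimate equals the source.

```latex
\begin{align}
\Delta\mathbf{w}_j
  &= \frac{(\mathbf{a}_j - \mathbf{w}_j)^{\top}\hat{\mathbf{s}}_j}
          {\lVert \hat{\mathbf{s}}_j \rVert^{2}}\,\hat{\mathbf{s}}_j
  && \text{(projection of } \mathbf{a}_j - \mathbf{w}_j \text{ onto } \hat{\mathbf{s}}_j\text{)} \tag{10} \\
e_j
  &= y_{n,j} - \mathbf{w}_j^{\top}\hat{\mathbf{s}}_j
   = \mathbf{a}_j^{\top}\hat{\mathbf{s}}_j - \mathbf{w}_j^{\top}\hat{\mathbf{s}}_j
   = (\mathbf{a}_j - \mathbf{w}_j)^{\top}\hat{\mathbf{s}}_j \notag \\
\Delta\mathbf{w}_j
  &= \frac{e_j\,\hat{\mathbf{s}}_j}{\lVert \hat{\mathbf{s}}_j \rVert^{2}} \tag{11}
\end{align}
```

As a consistency check, with μ = 1 the corrected vector satisfies $(\mathbf{w}_j + \Delta\mathbf{w}_j)^{\top}\hat{\mathbf{s}}_j = y_{n,j}$, and $\Delta\mathbf{w}_j$ is parallel to $\hat{\mathbf{s}}_j$, the normal of the plane $\{\mathbf{w} : \mathbf{w}^{\top}\hat{\mathbf{s}}_j = y_{n,j}\}$. The corrected vector is therefore the foot of the perpendicular from $\mathbf{w}_j$ onto that plane, in line with the geometric description.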
[0023]
That is, as is clear from FIG. 3, the correction vector [Outside 26] of equation (11) is the optimum correction amount for correcting the separation coefficient vector [Outside 16] in the direction of the estimated signal vector [Outside 29]. When [Outside 16] is corrected in that direction, the correction vector of equation (11) is the amount that minimizes the difference between the corrected separation coefficient vector [Outside 42] and the target mixing coefficient vector [Outside 17], so the successive correction is guaranteed to converge monotonically regardless of the magnitude of the estimated signal vector [Outside 29].
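The monotonic behavior can be checked numerically in the idealized single-noise case of FIG. 5. The sketch below is an illustration under assumptions: the mixing vector, the pseudo-random test signal, the seed, and all names are made up, and the estimate of the noise source is taken to be exact.

```python
import math
import random
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def distances_to_target(scale: float, steps: int = 200, mu: float = 1.0,
                        seed: int = 0) -> List[float]:
    """Run the normalized correction and record the distance between the
    separation vector and the (hypothetical) mixing vector at every step."""
    rng = random.Random(seed)
    a = [0.5, -0.25, 0.125]                 # hypothetical mixing vector
    m = len(a)
    w = [0.0] * m
    vec = [scale * rng.gauss(0.0, 1.0) for _ in range(m)]
    dists = [math.sqrt(sum((ai - wi) ** 2 for ai, wi in zip(a, w)))]
    for _ in range(steps):
        # Shift in one new sample of the noise source at the given level.
        vec = [scale * rng.gauss(0.0, 1.0)] + vec[:-1]
        y = dot(a, vec)                     # channel sample (noise source only)
        e = y - dot(w, vec)                 # residual
        power = dot(vec, vec)
        if power > 0.0:
            w = [wi + mu * e * x / power for wi, x in zip(w, vec)]
        dists.append(math.sqrt(sum((ai - wi) ** 2 for ai, wi in zip(a, w))))
    return dists


d_small = distances_to_target(scale=1.0)
d_large = distances_to_target(scale=1000.0)
```

In both runs the recorded distance never increases (up to rounding) and ends close to zero, even though the input levels differ by a factor of 1000. Real mixtures, where other sources are also active, would not behave this cleanly.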
[0024]
This embodiment has been described for the case of two channel signals and two estimated signals. Because each separation coefficient vector is obtained by an operation between two signals, it goes without saying that the present invention applies equally when the numbers of channel signals and estimated signals exceed two.
[0025]
[Effects of the invention]
According to the present invention, when a speech signal is mixed with a plurality of other musical sounds, speech signals, or noise, and the individual signals are to be estimated and separated from the resulting mixture, the influence of fluctuations in the power of each signal on estimation and separation can be reduced. In addition, because the convergence coefficient can be made larger, stable and fast signal separation becomes possible.
[Brief description of the drawings]
FIG. 1 shows an embodiment of a sound source signal estimation apparatus according to the present invention.
FIG. 2 shows geometrically the correction direction and correction amount of the separation coefficient vector when the separation coefficient vector [Outside 16] is obtained by successive correction.
FIG. 3 shows the plane in FIG. 2 that contains the vector [Outside 44] (mixing coefficient vector minus separation coefficient vector) and the correction vector [Outside 26].
FIG. 4 shows the principle of the mixing and separation processes for a plurality of sound source signals when mixed sound source signals are estimated and separated by the ICA technique.
FIG. 5 is a redrawing of FIG. 1 on the assumption that there is no sound source signal other than the one corresponding to the k-th input channel.

Claims

Ｍ（Ｍ≧２）個の音源信号が相互に混在してディジタル信号の形式でＮ（Ｎ≧Ｍ≧２）個のチャンネル信号として入力されたとの仮定のもとで、前記Ｎ個のチャンネル信号に基づき前記音源信号として推定されたＭ個の推定信号を２信号間で演算してチャンネル信号から混在信号を分離するための分離係数ベクトルを逐次修正しながら求め、その分離係数ベクトルと前記混在信号としての前記推定信号ベクトルとを内積演算して得られた結果を前記チャンネル信号から減じて、前記音源信号を推定する音源信号推定装置において、該装置は、前記混在信号としての推定信号と前記音源信号として推定された推定信号とが入力され、前記分離係数ベクトルを逐次修正するためのベクトルを線形演算により生成して出力する修正ベクトル生成手段と、
該修正ベクトル生成手段の出力ベクトルを前記分離係数ベクトルに加算して次の時点における分離係数ベクトルとして出力することにより前記分離係数ベクトルの逐次修正を行う逐次修正手段と、
該逐次修正手段によって逐次修正された分離係数ベクトルと前記混在信号としての推定信号ベクトルとを内積演算して得られた結果を前記チャンネル信号から減じて、前記音源信号として推定された推定信号を出力する減算手段と
を含む音源信号推定手段を少なくとも１組具えるとともに、
前記修正ベクトル生成手段は、前記混在信号としての推定信号ベクトルのノルムの自乗で正規化演算して、前記音源信号として推定された推定信号の大きさを有するとともに前記混在信号としての推定信号ベクトルの方向を有するベクトルを修正ベクトルとして生成し、該修正ベクトルを所定倍して出力する修正ベクトル生成手段である
ことを特徴とする音源信号推定装置。
A sound source signal estimation device that, on the assumption that M (M ≥ 2) sound source signals are mixed with one another and input in digital form as N (N ≥ M ≥ 2) channel signals, obtains, while successively correcting it, a separation coefficient vector for separating a mixed-in signal from a channel signal by operating pairwise on the M estimated signals estimated as the sound source signals from the N channel signals, and estimates the sound source signal by subtracting from the channel signal the result of the inner product of the separation coefficient vector and the estimated signal vector serving as the mixed-in signal, the device comprising at least one set of sound source signal estimating means that includes:
correction vector generation means that receives the estimated signal serving as the mixed-in signal and the estimated signal estimated as the sound source signal, and generates by linear operation and outputs a vector for successively correcting the separation coefficient vector;
successive correction means that successively corrects the separation coefficient vector by adding the output vector of the correction vector generation means to the separation coefficient vector and outputting the sum as the separation coefficient vector for the next time point; and
subtraction means that subtracts from the channel signal the result of the inner product of the separation coefficient vector successively corrected by the successive correction means and the estimated signal vector serving as the mixed-in signal, and outputs the estimated signal estimated as the sound source signal,
wherein the correction vector generation means performs normalization by the square of the norm of the estimated signal vector serving as the mixed-in signal, generates as the correction vector a vector that has the magnitude of the estimated signal estimated as the sound source signal and the direction of the estimated signal vector serving as the mixed-in signal, and outputs the correction vector multiplied by a predetermined factor.