JP2003271168A

JP2003271168A - Method, device and program for extracting signal, and recording medium recorded with the program

Info

Publication number: JP2003271168A
Application number: JP2002072111A
Authority: JP
Inventors: Akiko Araki; 章子荒木; Shoji Makino; 昭二牧野; Makoto Mukai; 良向井; Hiroshi Sawada; 宏澤田; Hiroshi Saruwatari; 洋猿渡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2002-03-15
Filing date: 2002-03-15
Publication date: 2003-09-25

Abstract

PROBLEM TO BE SOLVED: To provide a signal extraction method, a signal extraction device, and a signal extraction program that estimate a separation filter long enough for separation while preserving the assumption of signal independence in each band, by applying subband analysis and synthesis to a plurality of mixed signals to obtain output signals corresponding to the original signals, and to provide a recording medium on which the program is recorded.

SOLUTION: The signal extraction method separates and extracts the original signals, on the basis of their independence, from a plurality of mixed signals observed through paths having long impulse responses. The mixed signals are input to a subband analysis unit 51 and each is analyzed into N bands (N is a positive integer); the subband-analyzed signal of each band is input to the time-domain BSS unit 53 of the corresponding band and source separation is performed in each band; and the separated signals are input to a subband synthesis unit 55 to obtain output signals corresponding to the original signals.

COPYRIGHT: (C)2003, JPO

Description

[Detailed Description of the Invention]

[0001]

[Technical Field of the Invention] The present invention relates to a signal extraction method, a signal extraction device, a signal extraction program, and a recording medium on which the program is recorded. In particular, it relates to a technique for estimating an original signal of interest when that signal cannot be observed directly and is instead observed with noise and other signals superimposed on it. For example, it relates to a signal extraction method, a signal extraction device, a signal extraction program, and a recording medium on which the program is recorded that can extract the target speaker's voice, and so allow speech recognition with a high recognition rate, even when the input microphone of a speech recognition device is located away from the speaker and the microphone also picks up sounds other than the target speaker's voice.

[0002]

[Prior Art] The problem of estimating signals that have been linearly mixed, without using any knowledge of the original signals or of the mixing process, is called blind source separation (BSS), and the invention of this application belongs to this field. The technique of separating linearly mixed signals on the basis of the statistical independence between the signals is called independent component analysis (ICA). Signals that are linearly mixed after the impulse response of the recording device has been convolved with them, as when sound is picked up in a real sound field, are expressed as

[0003]

[Equation 1]

[0004] where x_j is the signal observed at sensor j, s_i is the signal of source i, and h_ji is the P-tap impulse response (a linear system) from source i to microphone j. When blind source separation is put to practical use, the impulse response length P is often large. For example, in mixed speech observed in a real environment, reverberation is convolved with the speech. Even for the reverberation time of an ordinary conference room, about 150 to 300 ms, the impulse response length P reaches several thousand taps.
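The equation image for the mixing model is not reproduced in this text. As a minimal sketch, assuming the usual convolutive form in which each sensor signal is the sum over sources of the source convolved with its P-tap path, the mixing can be simulated as follows. The function names, the array layout, and the 8 kHz sampling rate used in the usage note are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def convolutive_mix(sources, impulse_responses):
    """Mix sources through FIR paths (assumed convolutive model).

    sources:           array (N, L), one row per source s_i(n)
    impulse_responses: array (M, N, P), h[j, i] is the P-tap path
                       from source i to sensor j
    returns:           array (M, L), one row per sensor x_j(n)
    """
    sources = np.asarray(sources, dtype=float)
    h = np.asarray(impulse_responses, dtype=float)
    num_sensors, num_sources, _ = h.shape
    length = sources.shape[1]
    mixed = np.zeros((num_sensors, length))
    for j in range(num_sensors):
        for i in range(num_sources):
            # Causal convolution, truncated to the input length.
            mixed[j] += np.convolve(sources[i], h[j, i])[:length]
    return mixed

def taps_for_reverberation(reverb_seconds, sample_rate_hz):
    """Number of taps needed to span a given reverberation time."""
    return int(round(reverb_seconds * sample_rate_hz))
```

Under the assumed 8 kHz sampling rate, `taps_for_reverberation(0.3, 8000)` gives 2400 taps, which is in line with the "several thousand" order of magnitude stated above.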

[0005] In independent component analysis, the signals emitted from the N sources are assumed to be statistically independent of one another, and they are separated and extracted using the observed signals obtained in the form of equation (1) and a separation system consisting of separation filters w_ij of length Q taps. The signal y_i(n) obtained by separation with these filters w_ij is expressed as

[0006]

[Equation 2]

[0007] FIG. 1 illustrates the above mixing and separation process for the case N = M = 2 (two sources and two sensors). The separation system is estimated with a learning rule of the form

w_ij^(k+1) = w_ij^k + Δw_ij^k   (3)

so that the outputs y_i become mutually independent, where k is the number of learning updates. Because this is the complicated problem of convolutive mixing, it is difficult to obtain the separation filters directly. A common approach is therefore to transform the problem into the frequency domain by the discrete Fourier transform (DFT); this is called frequency-domain blind source separation (frequency-domain BSS). First, equation (1) is transformed into the frequency domain by the DFT.

[0008]

X(ω, m) = H(ω) S(ω, m)   (4)

This expresses the convolutive mixing problem as an instantaneous mixing problem at each frequency and so simplifies it. Estimating the separation process then reduces to estimating, at each frequency, a separation matrix W(ω), which is 2 × 2 when N = M = 2, such that the output signals Y_1(ω, m) and Y_2(ω, m) are mutually independent:

Y(ω, m) = W(ω) X(ω, m)   (5)

When the impulse response length P is large, as in a real environment with a reverberation time of about 150 to 300 ms or more, a separation filter of comparable length must be obtained. To obtain the separation matrix W(ω) in frequency-domain BSS, the DFT analysis must therefore use a frame length T longer than the room impulse response length P, which increases the number of frequency bins. However, when training data of a fixed length are analyzed with long frames, the number of data points at each frequency decreases, and the statistical properties of the data at each frequency deteriorate.
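Equations (4) and (5) amount to a batch of small matrix products, one per frequency bin. The following is a minimal sketch of applying a given set of separation matrices; the function name and the array layout (bins, channels, frames) are assumptions made for illustration, and the estimation of W(ω) itself is not shown.

```python
import numpy as np

def separate_per_bin(W, X):
    """Apply Y(w, m) = W(w) X(w, m) at every frequency bin.

    W: array (F, N, M), one separation matrix per bin
    X: array (F, M, T), sensor spectra for T frames
    returns: array (F, N, T), separated spectra
    """
    return np.einsum("fnm,fmt->fnt", W, X)
```

If W(ω) happened to equal the inverse of H(ω) at every bin, the output would reproduce S(ω, m) exactly; a blind method can in general recover the sources only up to scaling and ordering in each bin.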

[0009] This trade-off is explained with reference to FIG. 2. When the frame length is short, as in FIG. 2(a), the number of data points at each frequency is sufficient, so the statistical properties at each frequency are adequately guaranteed. However, the separation filter that can be estimated is short and therefore inadequate. When the frame length is long, as in FIG. 2(b), a long separation filter can be provided, but because the number of data points at each frequency is small, the statistical properties of the data deteriorate. This deterioration of the statistical properties is explained with reference to FIG. 3.
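The trade-off can be made concrete by counting how many frames, and hence how many data points per frequency bin, a recording of fixed length yields for a given frame length. This is a small sketch; the 50% frame overlap and the 3-second, 8 kHz recording used in the usage note are assumptions for illustration only.

```python
def frames_per_bin(num_samples, frame_len, hop=None):
    """Number of DFT frames (data points per frequency bin) for a fixed recording."""
    if hop is None:
        hop = frame_len // 2  # assumed 50% overlap
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop
```

For 24000 samples, a frame length of 32 gives 1499 frames per bin, whereas a frame length of 2048 gives only 22.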

[0010] FIG. 3 shows the average, over all frequencies, of the correlation coefficient γ_ω (ω is a subscript),

[0011]

[Equation 3]

[0012] It can be seen that when the frame length T is large, the correlation between the signals becomes high and the independence assumption required by independent component analysis breaks down. Thus frequency-domain BSS needs a DFT analysis with a frame length T longer than the room impulse response length P, and hence more frequency bins, but doing so degrades the statistical properties of the data at each frequency and makes separation difficult.

[0013]

[Problem to Be Solved by the Invention] In conventional frequency-domain BSS, using a large frame length T to cope with long reverberation raised the correlation between the signals, so that the independence assumption required by independent component analysis broke down and high performance could not be obtained. The present invention therefore provides, particularly for BSS in which the impulse response length P is as large as several thousand taps, a signal extraction method, a signal extraction device, a signal extraction program, and a recording medium on which the program is recorded, which apply subband analysis and synthesis to a plurality of mixed signals to obtain output signals corresponding to the original signals, and thereby estimate separation filters long enough for separation while preserving the independence assumption of the signals in each band.

[0014]

[Means for Solving the Problem] A signal extraction method is provided for separating and extracting original signals, on the basis of their independence, from a plurality of mixed signals observed through paths having long impulse responses. In this method, the mixed signals are input to a subband analysis unit and each is subband-analyzed into N bands (N: an integer); the subband-analyzed signal of each band is input to the time-domain BSS unit of the corresponding band and source separation is performed band by band; and the separated signals are input to a subband synthesis unit to obtain output signals corresponding to the original signals.

[0015] A signal extraction device is also provided for separating and extracting original signals, on the basis of their independence, from a plurality of mixed signals observed through paths having long impulse responses. The device comprises a subband analysis unit that receives the mixed signals and analyzes each into N bands (N: an integer), a time-domain BSS unit that performs source separation on the subband-analyzed signal of each band, and a subband synthesis unit that receives the separated signals and obtains output signals corresponding to the original signals. In addition, a signal extraction program is provided that causes the following to be executed: subband-analyzing each of a plurality of mixed signals into N bands (N: an integer), performing source separation band by band on the subband-analyzed signals, and obtaining output signals corresponding to the original signals from the signals separated in each band.

[0016] Further, a recording medium is provided on which is recorded a signal extraction program that causes the following to be executed: subband-analyzing each of a plurality of mixed signals into N bands (N: an integer), performing source separation band by band on the subband-analyzed signals, and obtaining output signals corresponding to the original signals from the signals separated in each band.

[0017]

[Embodiments of the Invention] The present invention uses subband analysis and performs signal separation in each band. This is called subband BSS. Because the number of subbands can be chosen freely in subband BSS, a number of band divisions can be selected for which the statistical properties in each subband are adequately satisfied, as shown in FIG. 2(c). Moreover, whereas frequency-domain BSS can estimate only a one-tap filter at each frequency, subband BSS allows a long filter in each band, as shown in FIG. 2(c), so a sufficiently long filter in full-band terms can be estimated even when the number of divisions is small.

[0018] For these two reasons, subband BSS makes it possible to estimate a separation filter long enough to cope with reverberation while preserving the statistical properties of the data in each band, even when the impulse response length P is long.

[0019]

[Example] An embodiment of the invention is described with reference to the example of FIG. 4. FIG. 4 shows the overall structure of subband BSS.

(1) Subband analysis stage: First, the input observed signals x_1(n) and x_2(n) are input to subband analysis units 51_1 and 51_2, respectively, and subband-analyzed.

(2) Source separation stage: Next, with N as the number of band divisions, the components of the observed signals x_1(n) and x_2(n) analyzed into each band are input to the time-domain BSS units 53_1, ..., 53_N of the corresponding bands and source-separated.

[0020] (3) Subband synthesis stage: Finally, the signal components separated in each band are input from the time-domain BSS units 53_1, ..., 53_N to subband synthesis units 55_1 and 55_2, which synthesize and output the signals y_1 and y_2 corresponding to the signals s_1(n) and s_2(n). As described above, the signal extraction device according to the invention, which separates and extracts original signals on the basis of their independence from a plurality of mixed signals observed through paths having long impulse responses, consists of a subband analysis unit 51 that receives the mixed signals and analyzes each into N bands (N: an integer), a time-domain BSS unit 53 that performs source separation on the subband-analyzed signal of each band, and a subband synthesis unit 55 that receives the separated signals and obtains output signals corresponding to the original signals.
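The three stages can be sketched as a small driver that leaves each stage pluggable. This is only an organizational sketch: the function name `subband_bss` and the three callables are hypothetical placeholders, and the actual analysis, separation, and synthesis operations are those described in the paragraphs that follow.

```python
def subband_bss(mixtures, analyze, separate_band, synthesize):
    """Three-stage flow: analysis -> per-band separation -> synthesis.

    mixtures:      list of observed signals x_j(n)
    analyze:       callable, signal -> list of N band signals
    separate_band: callable, (band index, list of per-sensor band signals)
                   -> list of per-source band signals
    synthesize:    callable, list of N band signals -> full-band signal
    """
    # (1) Analysis: banded[j][k] is band k of sensor j.
    banded = [analyze(x) for x in mixtures]
    num_bands = len(banded[0])

    # (2) Separation: one independent BSS problem per band.
    separated = [
        separate_band(k, [sensor_bands[k] for sensor_bands in banded])
        for k in range(num_bands)
    ]

    # (3) Synthesis: regroup by source, then recombine the bands.
    num_sources = len(separated[0])
    return [
        synthesize([separated[k][i] for k in range(num_bands)])
        for i in range(num_sources)
    ]
```

Keeping the per-band separation behind a single callable mirrors the N parallel time-domain BSS units 53_1, ..., 53_N, each of which sees only the signals of its own band.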

[0021] Next, the details are described with reference to FIG. 5, beginning with the subband analysis stage. (1) The subband analysis stage consists of the subband analysis units 51_1 and 51_2 and the SSB modulation units 52_1 and 52_2. With N as the number of band divisions and M as the decimation rate, the decimated signal in the k-th band of x_j(n) is calculated as

[0022]

[Equation 4]

[0023] where W_N = exp(j2π/N), and h(n) is the lowpass filter with passband [−π/N, π/N] used for analysis, for which

h(n) = {sin(n/N)} / (n/N)   (8)

is used. X(k, m) is obtained as a complex number. If the time-domain BSS units 53_1, ..., 53_N of the subsequent source separation stage use a real-valued algorithm, an SSB (single sideband) modulation unit 52, for example, can be used so that the signal in each band can be handled as a real number. In a subband scheme that uses the SSB modulation unit 52, the decimation rate M is set to M = N/4 for N band divisions in order to avoid aliasing in the frequency domain.
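The analysis equation itself is not reproduced in this text. As a hedged sketch, the following assumes the common DFT filter bank structure: shift band k to baseband by multiplying with W_N^(−kn), lowpass-filter with h(n), and keep every M-th sample. The Hann window, the filter length, the factor π inside the sinc (as supplied by `np.sinc`), the 1/N gain normalization, and the function names are all assumptions for illustration.

```python
import numpy as np

def analysis_lowpass(num_bands, half_len_in_periods=4):
    """Hann-windowed sinc lowpass with cutoff pi/num_bands (assumed design)."""
    half = half_len_in_periods * num_bands
    n = np.arange(-half, half + 1)
    # np.sinc(t) = sin(pi t) / (pi t); dividing by num_bands gives roughly unit DC gain.
    return np.sinc(n / num_bands) * np.hanning(n.size) / num_bands

def analyze_band(x, k, num_bands, decimation, h):
    """Band k of x: shift the band centre 2*pi*k/N to DC, lowpass, decimate by M."""
    n = np.arange(len(x))
    shifted = x * np.exp(-2j * np.pi * k * n / num_bands)  # x(n) * W_N^(-k n)
    filtered = np.convolve(shifted, h, mode="same")
    return filtered[::decimation]
```

With N = 16 and M = N/4 = 4, a cosine at the centre frequency of band 5 appears in band 5 as a nearly constant complex value of magnitude about 0.5 and is strongly attenuated in the other bands, apart from the mirror band at N − 5 that any real signal occupies.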

[0024] Writing the real-valued signal obtained by SSB modulation as X_j^ssb(k, m), it is given by

X_j^ssb(k, m) = Re[X_j(k, m)] cos(mπ/2) + Im[X_j(k, m)] sin(mπ/2)   (9)

Next, the source separation stage is described. (2) The source separation stage consists of N time-domain BSS units 53_1, ..., 53_N. As one example of a time-domain BSS algorithm for use in each band, an algorithm derived from a cost function based on the nonstationarity of the signals is shown here. For simplicity, the output signal Y_i^ssb(k, n) is abbreviated as y_i(n).

[0025] Consider a non-negative cost function that takes its minimum value of 0 when the cross-correlation of the output signals is 0 in every time block,

[0026]

[Equation 5]

[0027] where y(n) = [y_1(n), y_2(n)]^T is the output signal, R_y^b(τ) is the covariance matrix of the output signal, R_y^b(τ) = <y(n) y^T(n − τ)>_b, and <x>_b denotes the time average over block b (b = 1, ..., B). The update rule for the separation filters w_ij is obtained from the natural gradient calculated by differentiating this cost function Q with respect to w(k), further taking into account, in order to improve performance, not only R_y^b(0) but also the time-lagged correlations R_y^b(τ), as follows.

[0028]

[Equation 6]

[0029] The derivation of equation (11) is described in detail in T. Nishikawa, H. Saruwatari, K. Shikano, "Comparison of blind source separation methods based on time-domain ICA using nonstationarity and multistage ICA," IEICE Tech. Rep., Jan. 2002. The separated signals are obtained by applying equation (2) with the w_ij (the elements of w) found by this update rule. These separated signals are Y_i^ssb(k, n) in FIG. 5. The method above uses second-order statistics, but a method that uses higher-order statistics can also be adopted.
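The block-wise covariance R_y^b(τ) defined above can be computed directly. The cost function image is not reproduced in this text, so the sketch below additionally assumes one well-known cost with the stated properties (non-negative, and zero exactly when the block covariances are diagonal): the difference between the log-determinant of the diagonal part of R_y^b(0) and the log-determinant of R_y^b(0), averaged over blocks. That choice, the equal-length blocks, and the function names are assumptions; the update rule of equation (11) is not implemented.

```python
import numpy as np

def block_covariance(y, num_blocks, lag=0):
    """R_y^b(lag) = <y(n) y(n - lag)^T>_b for each of B equal-length blocks.

    y: array (channels, samples); returns array (B, channels, channels).
    """
    y = np.asarray(y, dtype=float)
    block_len = y.shape[1] // num_blocks
    covs = []
    for b in range(num_blocks):
        start = b * block_len + lag  # skip `lag` samples so n - lag stays in the block
        stop = (b + 1) * block_len
        cur = y[:, start:stop]
        past = y[:, start - lag:stop - lag]
        covs.append(cur @ past.T / cur.shape[1])
    return np.array(covs)

def decorrelation_cost(y, num_blocks):
    """Assumed cost: (1/2B) * sum_b [log det diag R_b(0) - log det R_b(0)] >= 0."""
    R = block_covariance(y, num_blocks, lag=0)
    log_det_diag = np.sum(np.log(np.diagonal(R, axis1=1, axis2=2)), axis=1)
    _, log_det = np.linalg.slogdet(R)
    return float(np.mean(log_det_diag - log_det) / 2.0)
```

For two channels the per-block term reduces to −log(1 − r²)/2, where r is the correlation coefficient of the block, so the cost is near zero for uncorrelated outputs and grows as the outputs become correlated.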

[0030] Finally, the subband synthesis stage is described. (3) The subband synthesis stage consists of the SSB demodulation units 54_1 and 54_2 and the subband synthesis units 55_1 and 55_2. First, the signals Y_i^ssb(k, m) separated in each band are SSB-demodulated by the SSB demodulation units 54_1 and 54_2:

Re[Y_i(k, m)] = Y_i^ssb(k, m) cos(mπ/2)
Im[Y_i(k, m)] = Y_i^ssb(k, m) sin(mπ/2)

The demodulated signal Y_i(k, m) is input to the subband synthesis unit 55, and the synthesized signal y_i(n) is obtained by

[0031]

[Equation 7]

[0032] where f(n) is the lowpass filter with passband [−π/M, π/M] used for synthesis, for which

f(n) = {sin(n/M)} / (n/M)   (13)

is used. The operation flowchart of the embodiment is shown in FIG. 7. The signal extraction device described above may be built around a computer as its main component. The invention can also be carried out by installing on that computer a program loaded from a CD or other storage medium or downloaded over a communication line.
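The SSB modulation of equation (9) and the demodulation equations of the synthesis stage can be written down directly, because cos(mπ/2) and sin(mπ/2) only take the values 0, +1, and −1. The sketch below uses exact lookup tables instead of floating-point trigonometry; the function names are illustrative, and the synthesis filtering of equation (12) is not included.

```python
import numpy as np

def _quarter_turn_tables(num_frames):
    """Exact values of cos(m*pi/2) and sin(m*pi/2) for m = 0 .. num_frames - 1."""
    m = np.arange(num_frames)
    cos_tab = np.array([1.0, 0.0, -1.0, 0.0])[m % 4]
    sin_tab = np.array([0.0, 1.0, 0.0, -1.0])[m % 4]
    return cos_tab, sin_tab

def ssb_modulate(X):
    """Equation (9): real signal Re[X] cos(m pi/2) + Im[X] sin(m pi/2)."""
    X = np.asarray(X, dtype=complex)
    cos_tab, sin_tab = _quarter_turn_tables(X.shape[-1])
    return X.real * cos_tab + X.imag * sin_tab

def ssb_demodulate(Y_ssb):
    """Re[Y] = Y_ssb cos(m pi/2), Im[Y] = Y_ssb sin(m pi/2)."""
    Y_ssb = np.asarray(Y_ssb, dtype=float)
    cos_tab, sin_tab = _quarter_turn_tables(Y_ssb.shape[-1])
    return Y_ssb * cos_tab + 1j * (Y_ssb * sin_tab)
```

Applied back to back, the pair returns the real part of the original sequence at even m and the imaginary part at odd m, with zeros elsewhere.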

[0033]

[Effects of the Invention] As described above, according to the present invention a long filter can be estimated in each band while the statistical properties of each band are preserved, so high separation performance for the original signals can be expected. FIG. 6 shows the effect of this subband BSS. Here the number of band divisions was N = 64, the decimation rate was M = 16, and the separation filter length in each band was Q_sub = 64. In terms of the decimation rate M, this corresponds to the "32" case of frequency-domain BSS, and in terms of filter length it corresponds to a 1024-tap separation filter in the full band. Experiments were carried out for two room reverberation times, 150 ms and 300 ms.
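The stated settings can be checked with a line of arithmetic: each tap of a subband filter spans M full-band samples, so M × Q_sub gives the full-band-equivalent length. The helper below is a hypothetical convenience function, not part of the patent.

```python
def check_subband_settings(num_bands, decimation, taps_per_band):
    """Return the full-band-equivalent filter length for a subband configuration."""
    # The SSB-based scheme described above sets M = N/4.
    if decimation * 4 != num_bands:
        raise ValueError("expected decimation rate M = N/4")
    # Each subband tap spans M full-band samples.
    return decimation * taps_per_band
```

With N = 64, M = 16, and Q_sub = 64, the function returns 16 × 64 = 1024, matching the 1024-tap figure given above.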

[0034] In FIG. 6, the numbers on the horizontal axis are the frame lengths (= filter lengths) used in frequency-domain BSS, and the corresponding results are those of frequency-domain BSS with that frame length. "SUB" denotes the result of subband BSS. With frequency-domain BSS, performance degraded when a long separation filter of length 1024 was sought, whereas subband BSS was able to obtain a separation filter of length 1024 and achieved high separation performance. The value of the correlation coefficient of equation (6) for the subband-divided original signals was 0.028 for two male voices, 0.018 for a male and a female voice, and 0.020 for two female voices. The independence assumption can therefore be considered sufficiently well maintained.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the signal separation model.

FIG. 2 is a diagram explaining the statistical properties and filter length at each frequency and in each band.

FIG. 3 is a diagram showing that the independence assumption breaks down at large frame sizes in frequency-domain BSS.

FIG. 4 is a diagram explaining an embodiment of subband BSS.

FIG. 5 is a diagram explaining the details of the embodiment of subband BSS.

FIG. 6 is a diagram explaining the effect of the embodiment.

FIG. 7 is a flowchart of the embodiment.

[Explanation of Symbols]

51 Subband analysis unit
53 Time-domain BSS unit
55 Subband synthesis unit

Continued from front page

(72) Inventor: Ryo Mukai, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
(72) Inventor: Hiroshi Sawada, c/o Nippon Telegraph and Telephone Corporation, 2-3-1 Otemachi, Chiyoda-ku, Tokyo
(72) Inventor: Hiroshi Saruwatari, 8916-5 D-307 Takayama, Ikoma, Nara
F-term (reference): 5D015 EE05

Claims

[Claims]

[Claim 1] A signal extraction method for separating and extracting original signals, on the basis of their independence, from a plurality of mixed signals observed through paths having long impulse responses, characterized in that the plurality of mixed signals are input to a subband analysis unit and each is subband-analyzed into N bands (N: an integer), the subband-analyzed signal of each band is input to the time-domain BSS unit of the corresponding band and source separation is performed for each band, and the source-separated signals are input to a subband synthesis unit to obtain output signals corresponding to the original signals.

[Claim 2] A signal extraction device for separating and extracting original signals, on the basis of their independence, from a plurality of mixed signals observed through paths having long impulse responses, characterized by comprising a subband analysis unit that receives the plurality of mixed signals and analyzes each into N bands (N: an integer), a time-domain BSS unit that performs source separation on the subband-analyzed signal of each band, and a subband synthesis unit that receives the source-separated signals and obtains output signals corresponding to the original signals.

[Claim 3] A signal extraction program that causes the following to be executed: subband-analyzing each of a plurality of mixed signals into N bands (N: an integer); performing source separation band by band on the subband-analyzed signal of each band; and obtaining output signals corresponding to the original signals from the signals source-separated in each band.

[Claim 4] A recording medium on which is recorded a signal extraction program that causes the following to be executed: subband-analyzing each of a plurality of mixed signals into N bands (N: an integer); performing source separation band by band on the subband-analyzed signal of each band; and obtaining output signals corresponding to the original signals from the signals source-separated in each band.