JP3355598B2 - Sound source separation method, apparatus and recording medium

Info

Publication number: JP3355598B2
Application number: JP25231297A
Authority: JP
Inventors: 真理子青木; 茂明青木; 弘行松井; 豊西野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1996-09-18
Filing date: 1997-09-17
Publication date: 2002-12-09
Anticipated expiration: 2017-09-17
Also published as: JPH10313497A

Abstract

PROBLEM TO BE SOLVED: To separate sound accurately into the components of the individual sound sources and to enable real-time processing. SOLUTION: The difference Δτ between the times at which an acoustic signal from a sound source arrives at microphones 1 and 2 is detected from the output channel signals L and R of the microphones. The signals L and R are divided by Fourier transform into frequency bands L(f1) to L(fn) and R(f1) to R(fn), and for each pair of corresponding bands the arrival time difference Δτi (i = 1, 2, ..., n) at microphones 1 and 2 and the signal level difference ΔLi are detected. The bands are classified as low (fi < 1/(2Δτ)), middle (1/(2Δτ) < fi < 1/Δτ), or high (fi > 1/Δτ), and the sound source from which L(fi) and R(fi) arrive is determined from Δτi in the low band, from ΔLi and Δτi in the middle band, and from ΔLi in the high band. The result is output for each sound source, and the output for each source is inverse Fourier transformed and synthesized.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a method for separating and extracting the signal of at least one sound source from a signal in which a plurality of acoustic signals are mixed, the acoustic signals being emitted from a plurality of sound sources such as speech signal sources and various environmental sound sources; to a sound source separation apparatus that uses the method; and to a recording medium on which a program for executing the method on a computer is recorded.

[0002] This type of sound source separation apparatus is applied to a variety of uses, for example a sound pickup device for video conferencing, a sound pickup device for transmitting speech uttered in a noisy environment, and a sound pickup device for an apparatus that identifies the type of a sound source. Conventional sound source separation techniques estimate the fundamental frequency of each signal in the frequency domain and extract the harmonic structure, thereby collecting and synthesizing the components that come from the same sound source.

[0003] This method, however, has the following problems: (1) the signals that can be separated are limited to those with a harmonic structure, such as the vowels of speech or musical tones; (2) estimating the fundamental frequency generally requires a long processing time, so it is difficult to separate sound sources in real time; and (3) because of errors in estimating the harmonic structure and similar causes, frequency components of other sound sources are mixed into the extracted signal and perceived as noise, so the separation accuracy is insufficient.

[0004]

[Problem to be solved by the invention] An object of the present invention is to provide a method, an apparatus, and a program recording medium that can separate and extract even the acoustic signal of a sound source that has no harmonic structure, that is, that enable sound source separation independent of the type of sound source, and that enable sound source separation in real time.

[0005] Another object of the present invention is to provide a sound source separation method, apparatus, and program recording medium with high separation accuracy and little contamination by noise.

[0006]

[Means for solving the problem] The sound source separation method of the present invention uses a plurality of microphones placed apart from one another. In a band division step, each output channel signal of the microphones is divided into a plurality of frequency bands such that each band contains mainly the signal component of only one sound source. For each identical band of the divided output channel signals, the difference in the value of a parameter of the acoustic signal reaching the microphones that varies with the positions of the microphones, namely the level (power) or the arrival time, is detected as a band-specific inter-channel parameter value difference. In a sound source signal determination step, based on the band-specific inter-channel parameter value difference of each band, it is determined which of the band-divided output channel signals of that band was input from which sound source. Based on that determination, in a sound source signal selection step, at least one signal input from the same sound source is selected from the band-divided output channel signals. In a sound source synthesis step, the plurality of band signals selected as signals from the same sound source are synthesized as a sound source signal.

[0007] According to an embodiment of the sound source separation method of the present invention, the band-specific level of each output channel signal divided in the band division step is detected, and a sound source that is not emitting sound is detected based on the result of comparing the detected band-specific levels between channels for the same band. Using the detection signal for that silent sound source, the synthesized signal corresponding to the silent sound source is suppressed among the sound source signals synthesized in the sound source synthesis step.

[0008] According to another embodiment of the sound source separation method of the present invention, the difference in arrival time at the microphones of each output channel signal divided in the band division step is detected for each identical band, and a sound source that is not emitting sound is detected based on the result of comparing the detected band-specific arrival time differences between channels for the same band. Using the detection signal for that silent sound source, the synthesized signal corresponding to the silent sound source is suppressed among the sound source signals synthesized in the sound source synthesis step.

[0009]

[Embodiments of the invention] FIG. 1 shows an embodiment of the present invention. Microphones 1 and 2 are placed apart at a spacing of, for example, about 20 cm, and each collects the acoustic signals from sound sources A and B and converts them into electrical signals. The output of microphone 1 is called the L-channel signal and the output of microphone 2 the R-channel signal. The L-channel and R-channel signals are supplied to an inter-channel time difference / level difference detection unit 3 and to a band division unit 4. The band division unit 4 divides each of them into a plurality of frequency band signals, which are supplied to a band-specific inter-channel time difference / level difference detection unit 5 and to a sound source determination signal selection unit 6. According to the detection outputs of detection units 3 and 5, the selection unit 6 selects, for each band, one of the channel signals as the A component or the B component. The selected per-band A-component and B-component signals are synthesized in sound source signal synthesis units 7A and 7B respectively and output separately as the sound source A signal and the sound source B signal. If sound source A is closer to microphone 1 than to microphone 2, the signal SA1 reaching microphone 1 from sound source A arrives earlier, and at a higher level, than the signal SA2 reaching microphone 2 from sound source A. If sound source B is closer to microphone 2 than to microphone 1, then of the signals SB1 and SB2 reaching microphones 1 and 2 from sound source B, the latter arrives earlier, at microphone 2, and at a higher level. The invention thus exploits the variation, caused by the position of a sound source relative to microphones 1 and 2, in the acoustic signals reaching the two microphones; in this example, the arrival time difference and the level difference between the two signals.

[0010] The apparatus shown in FIG. 1 operates as follows. As shown in FIG. 2, signals from the two sound sources A and B are captured by microphones 1 and 2 (S01). The inter-channel time difference / level difference detection unit 3 detects the inter-channel time difference or level difference from the L-channel and R-channel signals. The case described here uses the cross-correlation function between the L-channel and R-channel signals as the parameter for detecting the time difference. As shown in FIG. 3, samples L(t) and R(t) of the L-channel and R-channel signals are first read (S02), and the cross-correlation function between these samples is calculated (S03). In this calculation, the cross-correlation of the two channel signals is obtained for the same sample time, then with one channel signal shifted relative to the other by one sample time, by two sample times, and so on, which together give the cross-correlation function. Many such cross-correlations are obtained, and a histogram of them normalized by power is created (S04). Next, the sample-time differences Δα₁ and Δα₂ that rank first and second in cumulative frequency in the histogram are obtained (S05). These sample-time differences Δα₁ and Δα₂ are converted by the following equations into the inter-channel time differences Δτ₁ and Δτ₂, which are output (S06).

[0011]

$$\Delta\tau_1 = 1000 \times \Delta\alpha_1 / F \tag{1}$$

$$\Delta\tau_2 = 1000 \times \Delta\alpha_2 / F \tag{2}$$

Here F is the sampling frequency; the factor of 1000 is used to make the values somewhat larger for convenience of computation. The time differences Δτ₁ and Δτ₂ are the inter-channel time differences between the L-channel and R-channel signals for the signals of sound sources A and B respectively.
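The lag search and the conversion of equations (1) and (2) can be sketched as follows. This is a simplified illustration, not the patent's procedure: it finds only the single lag with the largest cross-correlation, whereas the text builds a power-normalized histogram over many correlations and takes its two top-ranked lags. The function names and the `max_lag` parameter are hypothetical.

```python
def cross_correlation_lag(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation
    of two equal-length sample sequences. Hypothetical helper."""
    n = len(left)
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum left[t] * right[t + lag] over the samples where both exist.
        value = sum(left[t] * right[t + lag]
                    for t in range(max(0, -lag), min(n, n - lag)))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def lag_to_time_difference(lag_samples, sampling_rate_hz):
    """Equations (1)/(2): delta_tau = 1000 * delta_alpha / F.
    With F in Hz the result is in milliseconds."""
    return 1000.0 * lag_samples / sampling_rate_hz
```

With an 8 kHz sampling rate, for instance, a lag of two samples corresponds to 0.25 ms.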

[0012] Returning to FIGS. 1 and 2, the band division unit 4 divides the L-channel and R-channel signals into signals L(f1), L(f2), ..., L(fn) and R(f1), R(f2), ..., R(fn) of the respective frequency bands (S04). This division is performed, for example, by applying a discrete Fourier transform to each channel signal to convert it into a frequency-domain signal and then dividing that signal into frequency bands. The division is made fine enough that, given the difference in frequency characteristics between the signals of sound sources A and B, each band contains mainly the signal component of only one sound source; for speech signals, for example, a bandwidth of 20 Hz is used. The power spectrum of sound source A is obtained as shown, for example, in FIG. 4A and that of sound source B as shown in FIG. 4B, and the division uses a bandwidth Δf narrow enough to separate the individual spectral lines. In that case, as the corresponding spectral lines drawn with broken lines illustrate, the spectrum of one sound source can be ignored relative to the spectrum of the other. As FIGS. 4A and 4B also show, the division may instead use a bandwidth of 2Δf; that is, each band need not contain only a single spectral line. The discrete Fourier transform is performed, for example, every 20 to 40 ms.
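A minimal sketch of band division by discrete Fourier transform follows. It assumes, for illustration only, that each DFT bin is treated as one band, so the bandwidth equals the sampling frequency divided by the frame length; the patent does not fix these details, and the function names are hypothetical. A naive O(N²) transform is used to keep the example dependency-free.

```python
import cmath


def dft(frame):
    """Naive DFT of a real frame; returns bins 0 .. N/2 (non-negative frequencies)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def bin_frequencies(frame_length, sampling_rate_hz):
    """Centre frequency of each bin; adjacent bins are F / N apart."""
    return [k * sampling_rate_hz / frame_length for k in range(frame_length // 2 + 1)]
```

Applying `dft` to the L-channel and R-channel frames gives the pairs L(fi), R(fi) that the later steps compare band by band.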

[0013] Next, the band-specific inter-channel time difference / level difference detection unit 5 detects the band-specific inter-channel time difference or level difference between the channels of each pair of corresponding band signals, for example L(f1) and R(f1), ..., L(fn) and R(fn) (S05). The band-specific inter-channel time difference is determined uniquely by using the inter-channel time differences Δτ₁ and Δτ₂ detected by the inter-channel time difference detection unit 3. The equations used for this detection are as follows.

[0014]

$$\Delta\tau_1 - \left\{ \frac{\Delta\phi_i}{2\pi f_i} + \frac{k_{i1}}{f_i} \right\} = \varepsilon_{i1} \tag{3}$$

$$\Delta\tau_2 - \left\{ \frac{\Delta\phi_i}{2\pi f_i} + \frac{k_{i2}}{f_i} \right\} = \varepsilon_{i2} \tag{4}$$

Here i = 1, 2, ..., n, and Δφi is the phase difference between signal L(fi) and signal R(fi). The integers ki1 and ki2 are chosen so that ε_i1 and ε_i2 in these equations are minimized. The minimized ε_i1 and ε_i2 are then compared, and the inter-channel time difference Δτ_j (j = 1, 2) that gives the smaller value is taken as the inter-channel time difference Δτ_ij of band i, that is, as the inter-channel time difference of one of the sound source signals in that band.
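The selection in equations (3) and (4) can be sketched as below. Two assumptions are made that the text does not state: "minimal ε" is read as minimal |ε|, and Δτ₁, Δτ₂ and the phase-derived delay are expressed in the same unit (seconds here), even though equations (1) and (2) carry a factor of 1000 that equations (3) and (4) as printed do not. The function names are hypothetical.

```python
import math


def residual(delta_tau, delta_phi, freq_hz):
    """Choose the integer k that minimizes |delta_tau - (delta_phi/(2*pi*f) + k/f)|
    and return (|epsilon|, k)."""
    base = delta_phi / (2.0 * math.pi * freq_hz)
    k = round((delta_tau - base) * freq_hz)  # nearest whole number of periods
    epsilon = delta_tau - (base + k / freq_hz)
    return abs(epsilon), k


def band_source_index(delta_tau_1, delta_tau_2, delta_phi, freq_hz):
    """Return j (1 or 2): which candidate time difference better explains band i."""
    eps1, _ = residual(delta_tau_1, delta_phi, freq_hz)
    eps2, _ = residual(delta_tau_2, delta_phi, freq_hz)
    return 1 if eps1 <= eps2 else 2
```

Because k absorbs whole periods, the comparison still works in bands where the measured phase difference has wrapped around.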

[0015] The sound source determination signal selection unit 6 uses the band-specific inter-channel time differences Δτ_1j to Δτ_nj detected by the band-specific inter-channel time difference / level difference detection unit 5, and its sound source signal determination unit 601 determines which of each pair of corresponding band signals L(f1) to L(fn) and R(f1) to R(fn) to select (S06). The explanation here takes the case in which, of the time differences Δτ₁ and Δτ₂ calculated by the inter-channel time difference / level difference detection unit 3, Δτ₁ is the inter-channel time difference of the signal from sound source A, which is near the L-side microphone, and Δτ₂ is the inter-channel time difference of the signal from sound source B, which is near the R-side microphone.

[0016] In this case, for a band i in which the time difference Δτ_ij calculated by the band-specific inter-channel time difference / level difference detection unit 5 is Δτ₁, the sound source signal determination unit 601 opens gate 602Li so that the L-side input signal L(fi) is output unchanged as SA(fi), and closes gate 602Ri so that SB(fi) is output as 0 for the R-side input signal R(fi) of band i. Conversely, for a band i in which Δτ_ij is Δτ₂, SA(fi) = 0 is output on the L side for the signal L(fi), and on the R side the input signal R(fi) is output unchanged as SB(fi). That is, as shown in FIG. 1, the band signals L(f1) to L(fn) are supplied to the sound source signal synthesis unit 7A through gates 602L1 to 602Ln respectively, and the band signals R(f1) to R(fn) are supplied to the sound source signal synthesis unit 7B through gates 602R1 to 602Rn respectively. The sound source signal determination unit 601 in the sound source determination signal selection unit 6 receives Δτ_1j to Δτ_nj. For a band i in which Δτ_ij is determined to be Δτ₁, it generates gate control signals CLi = 1 and CRi = 0, so that the corresponding gate 602Li is opened and gate 602Ri is closed. For a band i in which Δτ_ij is determined to be Δτ₂, it generates gate control signals CLi = 0 and CRi = 1, so that the corresponding gate 602Li is closed and gate 602Ri is opened. The above describes the functional configuration; in practice the processing is performed by, for example, a digital signal processor.
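The gating described above amounts to multiplying each band signal by a 0/1 control value. A minimal sketch, assuming the per-band decision is already available as a list of source indices (1 for Δτ₁, 2 for Δτ₂); the function name and argument layout are hypothetical:

```python
def gate_bands(left_bands, right_bands, band_sources):
    """Apply gate control signals CLi / CRi to each band.

    band_sources[i] is 1 when band i was attributed to source A (delta_tau_1)
    and 2 when it was attributed to source B (delta_tau_2).
    Returns the lists SA(f1..fn) and SB(f1..fn).
    """
    sa, sb = [], []
    for l_val, r_val, source in zip(left_bands, right_bands, band_sources):
        cl, cr = (1, 0) if source == 1 else (0, 1)  # gate 602Li / gate 602Ri
        sa.append(l_val * cl)
        sb.append(r_val * cr)
    return sa, sb
```

Each band therefore appears in exactly one of the two outputs, which is why no frequency band is dropped as long as the sources do not overlap within a band.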

[0017] The sound source signal synthesis unit 7A synthesizes the signals SA(f1) to SA(fn); in the band division example above they are inverse Fourier transformed, and the result is output as signal SA at output terminal t_A. Likewise, the sound source signal synthesis unit 7B synthesizes the signals SB(f1) to SB(fn) and outputs the result as signal SB at output terminal t_B. As is clear from the above, the apparatus of the invention determines, for each finely divided band component of each channel signal, which sound source it comes from, and outputs all of the components so determined. That is, provided the frequency components of the signals of sound sources A and B do not overlap, processing is performed without dropping any particular frequency band, so the signals of sound sources A and B can be separated with higher sound quality than the conventional method, which extracts only the harmonic structure.

[0018] In the description so far, the determination condition in the sound source signal determination unit 601 was set using only the inter-channel time difference and the band-specific inter-channel time differences detected by the inter-channel time difference / level difference detection unit 3 and the band-specific inter-channel time difference / level difference detection unit 5. Next, an embodiment that sets the determination condition using the inter-channel level difference is described. In this embodiment, as shown in FIG. 5, the L-channel and R-channel signals are captured from microphones 1 and 2 (S02), and the inter-channel level difference ΔL between them is detected by the inter-channel time difference / level difference detection unit 3 (FIG. 1) (S03). As in step S04 of FIG. 2, the L-channel and R-channel signals are each divided into n band-specific channel signals L(f1) to L(fn) and R(f1) to R(fn) (S04), and the band-specific inter-channel level differences ΔL1, ΔL2, ..., ΔLn are detected for the corresponding bands, that is, for L(f1) and R(f1), L(f2) and R(f2), ..., L(fn) and R(fn) (S05).

[0019] Human speech can be regarded as stationary over a span of about 20 to 40 ms. The sound source signal determination unit 601 (FIG. 1) therefore calculates, every 20 to 40 ms, in what proportion of all bands the sign of the logarithm of the inter-channel level difference ΔL and the sign of the logarithm of the band-specific inter-channel level difference ΔLi are the same (both + or both −). If the two have the same sign in at least a predetermined proportion of the bands, for example 80% (S06, S07), the determination for the following 20 to 40 ms is made using only the inter-channel level difference ΔL (S08). Otherwise, the determination for the following 20 to 40 ms is made band by band using the band-specific inter-channel level difference ΔLi (S09). When the whole band is determined by the inter-channel level difference ΔL, the determination is as follows: if ΔL is positive, the L-channel signal L(t) is output unchanged as signal SA, and signal SB = 0 is output for the R-channel signal R(t). Conversely, if ΔL is 0 or less, signal SA = 0 is output for the L-channel signal L(t), and the R-channel signal R(t) is output unchanged as signal SB. This description assumes that the inter-channel level difference is the L-side value minus the R-side value. When the determination is made band by band using ΔLi, then for each band fi, if ΔLi is positive, the L-side divided signal L(fi) is output unchanged as signal SA(fi), and signal SB(fi) = 0 is output for the R-side divided signal R(fi). Conversely, if ΔLi is 0 or less, signal SA(fi) = 0 is output for the L-side divided signal L(fi), and the R-side divided signal R(fi) is output as signal SB(fi). In this way the sound source signal determination unit 601 outputs gate control signals CL1 to CLn and CR1 to CRn, which control gates 602L1 to 602Ln and 602R1 to 602Rn respectively. As before, this description assumes that the band-specific inter-channel level difference is the L-side value minus the R-side value. The signals SA(f1) to SA(fn) and SB(f1) to SB(fn) are synthesized as in the previous embodiment and output as signals SA and SB at output terminals t_A and t_B respectively (S10).
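The sign-agreement vote and the two decision modes can be sketched as follows. The sketch assumes the level differences are already signed quantities (for example a difference of log levels in dB, L side minus R side), so that their sign plays the role of "the sign of the logarithm" in the text; that reading, the function name, and the `threshold` parameter are assumptions.

```python
def select_by_level(delta_l_overall, delta_l_bands, left_bands, right_bands,
                    threshold=0.8):
    """Decide per frame whether to use the overall level difference or the
    per-band level differences, then route each band to SA or SB."""
    overall_positive = delta_l_overall > 0
    agree = sum((d > 0) == overall_positive for d in delta_l_bands)
    use_overall = agree / len(delta_l_bands) >= threshold  # e.g. 80% of bands

    sa, sb = [], []
    for d, l_val, r_val in zip(delta_l_bands, left_bands, right_bands):
        decision = delta_l_overall if use_overall else d
        if decision > 0:      # L side is louder: attribute the band to source A
            sa.append(l_val)
            sb.append(0)
        else:                 # 0 or less: attribute the band to source B
            sa.append(0)
            sb.append(r_val)
    return use_overall, sa, sb
```

When the vote passes, every band in the frame goes to the same source, which matches the whole-band case where L(t) or R(t) is passed through unchanged.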

[0020] In the embodiments above, only one of the arrival time difference and the level difference is used as the determination condition in the sound source signal determination unit 601. When only the level difference is used, however, the levels of L(fi) and R(fi) may be nearly equal in low frequency bands, in which case it is difficult to obtain the level difference accurately. When only the time difference is used, it may be difficult to calculate the time difference correctly in high frequency bands because the phase wraps around. For these reasons, using the time difference for the determination in low frequency bands and the level difference in high frequency bands may be more advantageous than using a single parameter over the whole band.
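The phase-wrapping limit can be made concrete with a short worked relation. It assumes an idealized pure delay Δτ between the two channels, which the text does not state explicitly; under that assumption the phase difference at frequency f is unambiguous only while it stays within half a cycle.

```latex
\Delta\phi = 2\pi f\,\Delta\tau, \qquad
|\Delta\phi| < \pi \iff f < \frac{1}{2\,|\Delta\tau|}
% Example: |\Delta\tau| = 0.5\,\mathrm{ms} \Rightarrow f < 1\,\mathrm{kHz}
```

The same 1/(2Δτ) boundary appears in paragraph [0023] as the upper limit of the low band.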

[0021] An embodiment in which the sound source signal determination unit 601 uses both the band-specific inter-channel time difference and the band-specific inter-channel level difference is therefore described with reference to FIG. 6 and the subsequent drawings. The functional block configuration of this embodiment is the same as in FIG. 1, but the processing in the inter-channel time difference / level difference detection unit 3, the band-specific inter-channel time difference / level difference detection unit 5, and the sound source signal determination unit 601 differs as follows. The inter-channel time difference / level difference detection unit 3 outputs a single time difference Δτ, for example the mean of the absolute values of the detected time differences Δτ₁ and Δτ₂ or, if Δτ₁ and Δτ₂ are relatively close in value, only one of them. Although the inter-channel time differences Δτ₁, Δτ₂, and Δτ are calculated here before the channel signals L(t) and R(t) are divided into bands on the frequency axis, they can also be calculated after the band division.

[0022] As shown in FIG. 5, the L-channel signal L(t) and the R-channel signal R(t) are read frame by frame (for example every 20 to 40 ms) (S02), and the band division unit 4 divides each of them into a plurality of frequency bands. In this example, a Hanning window is applied to each of the L-channel signal L(t) and the R-channel signal R(t) (S03), and each is then Fourier transformed to obtain the divided signals L(f1) to L(fn) and R(f1) to R(fn) (S04).
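The windowing step can be sketched as follows. The symmetric form of the Hanning (Hann) window is used here; the patent does not say which variant is intended, and the function names are hypothetical.

```python
import math


def hann_window(length):
    """Symmetric Hann window of the given length (length must be at least 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame):
    """Taper one frame of L(t) or R(t) before it is Fourier transformed."""
    return [x * w for x, w in zip(frame, hann_window(len(frame)))]
```

Tapering the frame edges reduces spectral leakage between neighbouring bands, which helps keep each band dominated by a single source.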

[0023] Next, the band-specific inter-channel time difference / level difference detection unit 5 checks whether the frequency fi of a divided signal lies in the band at or below 1/(2Δτ), where Δτ is the inter-channel time difference (hereinafter the low band) (S05); if so, it outputs the band-specific inter-channel phase difference Δφi (S08). It then checks whether the frequency fi lies in the band above 1/(2Δτ) and below 1/Δτ (hereinafter the middle band) (S06); if so, it outputs the band-specific inter-channel phase difference Δφi and level difference ΔLi (S09). It then checks whether the frequency fi lies in the band at or above 1/Δτ (hereinafter the high band) (S07); if so, it outputs the band-specific inter-channel level difference ΔLi (S10).
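The three-way classification of steps S05 to S07 can be sketched as below. The sketch assumes Δτ is positive and given in seconds, so that 1/Δτ is in Hz; the text's Δτ values carry a factor of 1000, so a unit conversion would be needed in practice. The function name is hypothetical.

```python
def classify_band(freq_hz, delta_tau_s):
    """Return 'low', 'mid' or 'high' for a band centred at freq_hz,
    given a positive inter-channel time difference in seconds."""
    low_limit = 1.0 / (2.0 * delta_tau_s)   # at or below: phase is unambiguous
    high_limit = 1.0 / delta_tau_s          # at or above: use level difference
    if freq_hz <= low_limit:
        return "low"
    if freq_hz < high_limit:
        return "mid"
    return "high"
```

For Δτ = 0.5 ms the boundaries fall at 1 kHz and 2 kHz, so a 500 Hz band is low, a 1.5 kHz band is middle, and a 3 kHz band is high.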

[0024] The sound source signal determination unit 601 uses the band-specific inter-channel phase differences and level differences detected by the band-specific inter-channel time difference / level difference detection unit 5 to determine which of L(f1) to L(fn) and R(f1) to R(fn) to output. In this example, both the phase difference Δφi and the level difference ΔL are calculated by subtracting the R-side value from the L-side value.

[0025] For signals L(fi) and R(fi) determined to be in the low band, as shown in FIG. 7, it is first checked whether the phase difference Δφi is π or more (S15); if so, Δφi minus 2π is taken as Δφi (S17). If Δφi is not π or more in step S15, it is checked whether it is −π or less (S16); if so, Δφi plus 2π is taken as Δφi (S18), and if it is not −π or less in step S16, Δφi is used unchanged (S19). The band-specific inter-channel phase difference Δφi obtained in step S17, S18, or S19 is converted into a time difference Δσi by equation (5) (S20).
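The wrapping of steps S15 to S19 can be sketched as a single function; the name is hypothetical, and the input is assumed to lie within one turn of the principal range, as in the text.

```python
import math


def wrap_low_band_phase(delta_phi):
    """Steps S15-S19: bring the phase difference back into (-pi, pi)."""
    if delta_phi >= math.pi:        # S15 -> S17
        return delta_phi - 2.0 * math.pi
    if delta_phi <= -math.pi:       # S16 -> S18
        return delta_phi + 2.0 * math.pi
    return delta_phi                # S19: already in range
```

In the low band the true phase difference cannot exceed half a cycle, so the wrapped value is the only consistent candidate.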

【００２６】 Δσｉ＝１０００・Δφｉ／２πｆｉ（５）分割された信号Ｌ（ｆｉ），Ｒ（ｆｉ）が中域と判定さ
れた場合は図８に示すように帯域別チャネル間レベル差
ΔＬ（ｆｉ）を利用して、位相差Δφｉを一意に決定す
る。即ちΔＬ（ｆｉ）が正かを調べ（Ｓ２３）、正であ
れば、その帯域別チャネル間位相差Δφｉが正であるか
を調べ（Ｓ２４）、正であればそのΔφｉをそのまま出
力し（Ｓ２６）、ステップＳ２４で正でなければΔφｉ
に２πを加算した値をΔφｉとして出力する（Ｓ２
７）。ステップＳ２３でΔＬ（ｆｉ）が正でなければ、
その帯域別チャネル間位相差Δφｉが負であるかを調べ
（Ｓ２５）、負であれば、そのΔφｉをそのままΔφｉ
として出力し（Ｓ２８）、ステップＳ２５で負でなけれ
ばΔφｉから２πを減算した値をΔφｉとして出力する
（Ｓ２９）。これらステップＳ２６〜Ｓ２９の何れかの
Δφｉが次式によりその帯域別チャネル間時間差Δσｉ
として演算される（Ｓ３０）。
Δσi = 1000·Δφi / (2π·fi)    (5)
When the divided signals L(fi) and R(fi) are determined to be in the middle band, the phase difference Δφi is uniquely determined using the per-band inter-channel level difference ΔL(fi), as shown in FIG. 8. That is, it is checked whether ΔL(fi) is positive (S23). If it is positive, it is checked whether the per-band inter-channel phase difference Δφi is positive (S24); if so, Δφi is output as it is (S26), and if it is not positive in step S24, the value obtained by adding 2π to Δφi is output as Δφi (S27). If ΔL(fi) is not positive in step S23, it is checked whether Δφi is negative (S25); if so, Δφi is output as it is (S28), and if it is not negative in step S25, the value obtained by subtracting 2π from Δφi is output as Δφi (S29). The Δφi resulting from whichever of steps S26 to S29 applies is converted into the per-band inter-channel time difference Δσi by the following equation (S30).
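The branch selection of FIGS. 7 and 8 and the conversion Δσi = 1000·Δφi / (2π·fi) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, and treating fi as a frequency in Hz (so that the factor 1000 would give milliseconds) is an assumption made here rather than something the text states.

```python
import math


def wrap_low_band(dphi):
    """Low band (FIG. 7, S15-S19): shift the phase difference by one turn
    of 2*pi when it lies at or beyond +pi or -pi, otherwise keep it."""
    if dphi >= math.pi:          # S15 -> S17
        return dphi - 2 * math.pi
    if dphi <= -math.pi:         # S16 -> S18
        return dphi + 2 * math.pi
    return dphi                  # S19


def resolve_mid_band(dphi, dlevel):
    """Middle band (FIG. 8, S23-S29): the sign of the level difference
    decides which 2*pi branch of the phase difference is used."""
    if dlevel > 0:                                         # S23
        return dphi if dphi > 0 else dphi + 2 * math.pi    # S24 -> S26 / S27
    return dphi if dphi < 0 else dphi - 2 * math.pi        # S25 -> S28 / S29


def phase_to_time(dphi, freq):
    """Time difference 1000 * dphi / (2 * pi * f); freq assumed to be in Hz."""
    return 1000.0 * dphi / (2 * math.pi * freq)
```

For instance, a middle-band phase difference of −π/2 with a positive level difference is moved to 3π/2 before conversion, so the resulting time difference has the same sign as the level difference.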

【００２７】 Δσｉ＝１０００・Δφｉ／２πｆｉ（６）以上のようにして低域、中域における帯域別チャネル間
時間差Δσｉと、高域における帯域別チャネル間レベル
差ΔＬ（ｆｉ）が得られ、これらに応じて音源信号の判
別が次のようになされる。図９に示すように低域と中域
においては位相差Δφｉを、高域においてはレベル差Δ
Ｌｉを利用して両チャネルの各周波数成分を該当するど
ちらかの音源の信号として判別する。具体的には、低域
と中域においては図７、８でそれぞれ求められた帯域別
チャネル間時間差Δσｉが正であるかを調べ（Ｓ３
４）、正であれば、その帯域ｉのＬ側チャネル信号Ｌ
（ｆｉ）を信号ＳＡ（ｆｉ）として出力し、Ｒ側帯域チ
ャネル信号Ｒ（ｆｉ）を０の信号ＳＢ（ｆｉ）として出
力する（Ｓ３６）。ステップＳ３４で帯域別チャネル時
間差Δσｉが正でない場合は逆にＳＡ（ｆｉ）として０
を出力し、ＳＢ（ｆｉ）としてＲ側チャネル信号Ｒ（ｆ
ｉ）を出力する（Ｓ３７）。
Δσi = 1000·Δφi / (2π·fi)    (6)
In this way, the per-band inter-channel time difference Δσi is obtained for the low and middle bands and the per-band inter-channel level difference ΔL(fi) for the high band, and the sound source signals are discriminated from these values as follows. As shown in FIG. 9, each frequency component of both channels is assigned to one of the sound sources using the phase difference Δφi in the low and middle bands and the level difference ΔLi in the high band. Specifically, in the low and middle bands, it is checked whether the per-band inter-channel time difference Δσi obtained in FIG. 7 or FIG. 8 is positive (S34). If it is positive, the L-side channel signal L(fi) of band i is output as the signal SA(fi), and 0 is output as the signal SB(fi) in place of the R-side band channel signal R(fi) (S36). Conversely, if Δσi is not positive in step S34, 0 is output as SA(fi) and the R-side channel signal R(fi) is output as SB(fi) (S37).

【００２８】また、高域においては、図６中のステップ
Ｓ１０で検出した帯域別チャネル間レベル差ΔＬ（ｆ
ｉ）が正であるかを調べ（Ｓ３５）、正であれば信号Ｓ
Ａ（ｆｉ）としてＬ側チャネル信号Ｌ（ｆｉ）を出力
し、ＳＢ（ｆｉ）として０を出力する（Ｓ３８）。ステ
ップＳ３５でレベル差ΔＬｉが正でなければＳＡ（ｆ
ｉ）として０を出力し、ＳＢ（ｆｉ）としてＲ側帯域チ
ャネル信号Ｒ（ｆｉ）を出力する（Ｓ３９）。
In the high band, it is checked whether the per-band inter-channel level difference ΔL(fi) detected in step S10 of FIG. 6 is positive (S35). If it is positive, the L-side channel signal L(fi) is output as the signal SA(fi) and 0 is output as SB(fi) (S38). If the level difference ΔLi is not positive in step S35, 0 is output as SA(fi) and the R-side band channel signal R(fi) is output as SB(fi) (S39).
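The per-band assignment of FIG. 9 (steps S34 to S39) amounts to a sign test on one parameter per band. The sketch below illustrates this; the function name, the string labels "low", "mid", and "high", and the argument layout are hypothetical choices made for this example only.

```python
def assign_band(band, l_bin, r_bin, dsigma=None, dlevel=None):
    """Return (SA(fi), SB(fi)) for one frequency band.

    band   : "low", "mid" or "high" (hypothetical labels)
    l_bin  : L-side band signal L(fi)
    r_bin  : R-side band signal R(fi)
    dsigma : per-band inter-channel time difference (low and middle bands)
    dlevel : per-band inter-channel level difference (high band)
    """
    if band in ("low", "mid"):
        goes_to_a = dsigma > 0   # S34: time difference decides
    elif band == "high":
        goes_to_a = dlevel > 0   # S35: level difference decides
    else:
        raise ValueError("unknown band label: %r" % (band,))
    # S36/S38: the L signal goes to source A; S37/S39: the R signal goes to source B.
    return (l_bin, 0) if goes_to_a else (0, r_bin)
```

Each band therefore contributes to exactly one of the two outputs, with 0 written to the other.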

【００２９】以上のようにして各帯域についてＬ側又は
Ｒ側が出力され、音源信号合成部７Ａ，７Ｂでそれぞれ
判別した各周波数成分を全帯域に渡り加算し（Ｓ４
０）、かつ、加算した各信号を逆フーリエ変換し（Ｓ４
１）、その変換した信号ＳＡ，ＳＢを出力する（Ｓ４
２）。以上説明したように、この実施例においては、周
波数帯域毎に音源分離に有利なパラメータを用いること
により、全帯域に渡り単一のパラメータを用いる場合に
比べてより分離性能の高い音源分離を実現することが可
能である。
As described above, either the L side or the R side is output for each band. The sound source signal combining units 7A and 7B each add the frequency components assigned to them over all bands (S40), apply an inverse Fourier transform to the summed signal (S41), and output the resulting signals SA and SB (S42). As explained above, this embodiment uses, for each frequency band, the parameter that is advantageous for sound source separation, and can therefore achieve higher separation performance than when a single parameter is used over the entire band.
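Steps S40 to S42 can be illustrated with a naive inverse discrete Fourier transform over the bins assigned to one source. This is only a sketch: it assumes each band is a single DFT bin, that bins assigned to the other source have been set to 0, and it uses a direct O(n²) sum rather than an FFT. The function names are hypothetical.

```python
import cmath


def inverse_dft(spectrum):
    """Naive inverse DFT: x[t] = (1/n) * sum_k X[k] * exp(+2j*pi*k*t/n)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def synthesize(assigned_bins):
    """Combine the bins assigned to one source (zeros elsewhere) into a
    time-domain frame and return its real part."""
    return [sample.real for sample in inverse_dft(assigned_bins)]
```

Because the transform is linear, the frames synthesized for SA and SB add up to the inverse transform of the combined spectrum of the bins that were kept.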

【００３０】この発明は音源の数が３個以上でも適用で
きる。例として、音源数が３、マイクロホン数が２であ
る場合でマイクロホンへの到達時間差を利用して音源分
離する場合を説明する。この場合、チャネル間時間差／
レベル差検出部３で各音源についてＬチャネル信号、Ｒ
チャネル信号のチャネル間時間差を算出する際に、図３
に示したように相互相関のパワーで正規化したヒストグ
ラムの、累積度数（ピーク値）第一位から第三位までを
とる各時点を求めることによって各音源信号についての
チャネル間時間差Δτ₁，Δτ₂，Δτ ₃を算出する。
そして、帯域別チャネル間時間差／レベル差検出部５に
おいても、各帯域の帯域別チャネル間時間差をΔτ₁か
らΔτ₃のどれかに決定する。この決定の仕方は、前記
実施例で述べた計算式（３），（４）と同様である。音
源信号判定部６０１では、例として、Δτ₁＞０、Δτ
₂＞０、Δτ₃＜０である場合で説明する。ここで、Δ
τ₁，Δτ₂，Δτ₃はそれぞれ、音源Ａ，Ｂ，Ｃ各信
号のチャネル間時間差と仮定し、さらに、これらの値は
Ｌ側からＲ側の値を引いて算出した値と仮定する。この
場合、音源ＡはＬ側のマイクロホン１に近く、音源Ｂは
Ｒ側のマイクロホン２の近くにある。よって、Ｌチャネ
ルの信号から、帯域別チャネル間時間差がΔτ₁となる
帯域の信号を加算して音源Ａの信号を、またΔτ₂とな
る帯域を加算して、音源Ｂの信号をそれぞれ分離するこ
とが可能である。また、Ｒチャネル信号から、帯域別チ
ャネル間時間差がΔτ₃となる帯域の信号を加算して出
力することにより、音源Ｃの信号を分離する。
The present invention is also applicable when there are three or more sound sources. As an example, consider separating the sources using the difference in arrival time at the microphones when there are three sound sources and two microphones. In this case, when the inter-channel time difference / level difference detection unit 3 calculates the inter-channel time difference between the L-channel and R-channel signals for each sound source, it finds the time points with the first to third highest cumulative frequencies (peak values) in the histogram normalized by the cross-correlation power, as shown in FIG. 3, and thereby obtains the inter-channel time differences Δτ₁, Δτ₂, Δτ₃ of the respective sound source signals. The per-band inter-channel time difference / level difference detection unit 5 likewise sets the per-band inter-channel time difference of each band to one of Δτ₁ to Δτ₃, in the same manner as formulas (3) and (4) of the preceding embodiment. The operation of the sound source signal determination unit 601 is explained for the case Δτ₁ > 0, Δτ₂ > 0, Δτ₃ < 0. Here Δτ₁, Δτ₂, Δτ₃ are assumed to be the inter-channel time differences of the signals of sound sources A, B, and C, respectively, and to have been computed by subtracting the R-side value from the L-side value. In this case, sound sources A and B are closer to the L-side microphone 1, and sound source C is closer to the R-side microphone 2. Accordingly, the signal of sound source A can be separated by summing, from the L-channel signal, the signals of the bands whose per-band inter-channel time difference is Δτ₁, and the signal of sound source B by summing the bands whose time difference is Δτ₂. The signal of sound source C is separated by summing and outputting, from the R-channel signal, the signals of the bands whose per-band inter-channel time difference is Δτ₃.
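The three-source, two-microphone case can be sketched as follows. Formulas (3) and (4) are not reproduced in this passage, so the example substitutes a simple nearest-candidate rule for deciding which of Δτ₁ to Δτ₃ a band belongs to; that rule, the function name, and the dictionary layout are assumptions made for illustration.

```python
def separate_three_sources(l_bins, r_bins, band_dtau, source_dtau):
    """Assign each band to one source by its inter-channel time difference.

    l_bins, r_bins : per-band signals of the L and R channels
    band_dtau      : measured time difference of each band (L minus R)
    source_dtau    : e.g. {"A": dtau1, "B": dtau2, "C": dtau3}
    """
    n = len(l_bins)
    out = {name: [0] * n for name in source_dtau}
    for i in range(n):
        # Assumed stand-in for formulas (3)/(4): pick the closest candidate.
        name = min(source_dtau, key=lambda s: abs(band_dtau[i] - source_dtau[s]))
        # A positive candidate means the source is on the L side, so take the
        # L-channel band; otherwise take the R-channel band.
        out[name][i] = l_bins[i] if source_dtau[name] > 0 else r_bins[i]
    return out
```

With Δτ₁ > 0, Δτ₂ > 0, and Δτ₃ < 0, sources A and B are built from L-channel bands and source C from R-channel bands, as in the description.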

【００３１】上述では音源信号を分離し、分離された各
音源信号ＳＡ，ＳＢを各別に出力した。しかし、例えば
一方の音源Ａは発話者による音声であり、他方の音源Ｂ
は騒音のような場合、騒音と混合された音源Ａの信号音
を分離抽出し、騒音を抑圧するためにもこの発明を適用
することができる。その場合は図１において音源信号合
成部７Ａを残し、１点鎖線で示す枠９中の音源信号合成
部７Ｂ、ゲート６０２Ｒ１〜６０２Ｒｎを省略すればよ
い。
In the above description, the sound source signals are separated and the separated sound source signals SA and SB are output individually. However, when, for example, one sound source A is a speaker's voice and the other sound source B is noise, the present invention can also be applied to separate and extract the signal of sound source A from its mixture with the noise, thereby suppressing the noise. In that case, the sound source signal combining unit 7A in FIG. 1 is retained, and the sound source signal combining unit 7B and the gates 602R1 to 602Rn inside the frame 9 indicated by the dash-dot line may be omitted.

【００３２】一方の音源Ａが他方の音源Ｂより周波数帯
域が広い場合でその各周波数帯域が予め知られている場
合は、図１０に示すように図１において帯域分離部１０
において、両音源信号の重なっていない周波数帯域を分
離する。例えば音源Ａの信号Ａ（ｔ）の周波数帯域はｆ
１〜ｆｎであるが音源Ｂの信号Ｂ（ｔ）の周波数帯域は
ｆ１〜ｆｎ（ｆｎ＞ｆｍ）の場合、重なっていない帯域
ｆｍ＋１〜ｆｎの信号をマイクロホン１，２の出力から
分離し、この帯域ｆｍ＋１〜ｆｎの信号については、音
源信号判定部６０１の判定処理、場合によっては帯域別
チャネル間時間差／レベル差検出部５の処理を行わず、
音源信号判定部６０１は、音源Ｂの信号として選出する
チャネル信号ＳＢ（ｔ）として選出するＲの分割された
帯域チャネル信号Ｒ（ｆｍ＋１）〜Ｒ（ｆｎ）をそれぞ
れＳＢ（ｆｍ＋１）〜ＳＢ（ｆｎ）として出力し、ＳＡ
（ｆｍ＋１）〜ＳＡ（ｆｎ）は０を出力させるように音
源信号選択部６０２を制御する。即ちゲート６０２Ｌｍ
＋１〜６０２Ｌｎは常閉とし、ゲート６０２Ｒｍ＋１〜
６０２Ｒｎは常開とする。
When one sound source A has a wider frequency band than the other sound source B and the respective frequency bands are known in advance, a band separation unit 10 is added to the configuration of FIG. 1, as shown in FIG. 10, to separate out the frequency band in which the two sound source signals do not overlap. For example, when the frequency band of the signal A(t) of sound source A is f1 to fn while that of the signal B(t) of sound source B is f1 to fm (fn > fm), the signals of the non-overlapping band fm+1 to fn are separated from the outputs of microphones 1 and 2. For the signals in this band fm+1 to fn, the determination processing of the sound source signal determination unit 601, and in some cases the processing of the per-band inter-channel time difference / level difference detection unit 5, is not performed. Instead, the sound source signal determination unit 601 controls the sound source signal selection unit 602 so that the divided R-side band channel signals R(fm+1) to R(fn), selected as the channel signal SB(t) for sound source B, are output as SB(fm+1) to SB(fn), respectively, and 0 is output for SA(fm+1) to SA(fn). That is, the gates 602Lm+1 to 602Ln are kept normally closed and the gates 602Rm+1 to 602Rn are kept normally open.

【００３３】上述では各帯域別チャネル間時間差Δσ
ｉ、正か負かにより、また各帯域別チャネル間レベル差
ΔＬｉが正か負かにより、つまり、いずれも０をしきい
値として、その帯域信号が何れのマイクロホンに近いか
を判別した。これはマイクロホン１として結ぶ線の２等
分線に対して音源Ａと音源Ｂと左右対称に位置している
場合である。この関係にない場合は判別しきい値を以下
のように決めればよい。
In the above description, the microphone to which each band signal is closer was determined by whether the per-band inter-channel time difference Δσi is positive or negative and by whether the per-band inter-channel level difference ΔLi is positive or negative, that is, using 0 as the threshold in both cases. This applies when sound source A and sound source B are positioned symmetrically about the bisector of the line connecting microphones 1 and 2. When this relationship does not hold, the discrimination thresholds may be determined as follows.

【００３４】音源Ａの信号がマイクロホン１、マイクロ
ホン２に到達する帯域別チャネル間レベル差をΔＬ_A、
到達する帯域別チャネル間時間差をΔτ_A、音源Ｂの信
号がマイクロホン１、マイクロホン２に到達する帯域別
チャネル間レベル差をΔＬ_B、到達する帯域別チャネル
間時間差をΔτ_Bとそれぞれする。このとき、帯域別チ
ャネル間レベル差のしきい値ΔＬthは ΔＬth＝（ΔＬ_A＋ΔＬｉ）／２とし、帯域別チャネル間時間差のしきい値Δτthは Δτth＝（Δτ_A＋Δτ_B）／２とすればよい。先に述べた実施例ではΔＬ_B＝−Δ
Ｌ_A、Δτ_B＝−Δτ_Aの場合でΔＬth＝０、Δτth＝
０となる。音源Ａ，Ｂを分離できるように、二つの音源
をマイクロホン１，２に対し、互いに異なる側となるよ
うに、マイクロホン１，２を位置させ、マイクロホン
１，２に対する距離、方向は必ずしも正しくはわかって
いない場合があり、しきい値ΔＬth，Δτthを可変とし
て、分離がよく行われるようにΔＬth，Δτthを調整可
能としてもよい。
Let ΔL_A and Δτ_A be the per-band inter-channel level difference and time difference with which the signal of sound source A reaches microphones 1 and 2, and let ΔL_B and Δτ_B be the corresponding values for the signal of sound source B. The threshold ΔLth for the per-band inter-channel level difference may then be set to
ΔLth = (ΔL_A + ΔL_B) / 2
and the threshold Δτth for the per-band inter-channel time difference to
Δτth = (Δτ_A + Δτ_B) / 2.
The embodiment described above corresponds to the case ΔL_B = −ΔL_A and Δτ_B = −Δτ_A, which gives ΔLth = 0 and Δτth = 0. Microphones 1 and 2 are positioned so that the two sound sources lie on different sides of them, so that sound sources A and B can be separated; however, the distances and directions of the sources relative to microphones 1 and 2 are not always known accurately. The thresholds ΔLth and Δτth may therefore be made variable so that they can be adjusted to give good separation.
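The midpoint thresholds ΔLth = (ΔL_A + ΔL_B) / 2 and Δτth = (Δτ_A + Δτ_B) / 2 can be illustrated with a small helper. The function names are hypothetical, and deciding the side of source A by comparing the two reference values is an assumption made for this sketch; it reduces to the sign test of the earlier embodiment when the two reference values are equal and opposite.

```python
def midpoint_threshold(value_a, value_b):
    """Threshold halfway between the values expected for sources A and B."""
    return (value_a + value_b) / 2


def closer_to_source_a(value, value_a, value_b):
    """True if a measured per-band value falls on source A's side of the
    midpoint threshold (assumed decision rule, see lead-in)."""
    threshold = midpoint_threshold(value_a, value_b)
    if value_a > value_b:
        return value > threshold
    return value < threshold
```

In the symmetric case, for example a level difference of +6 for source A and −6 for source B, the threshold is 0 and the test becomes the positive/negative check used earlier.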

【００３５】前記実施例では部屋の残響や回折の影響に
より、帯域別チャネル間時間差や帯域別チャネル間レベ
ル差に誤りが生じ、各音源信号を精度よく分離すること
ができない場合がある。このような問題を改善した実施
例を次に述べる。図１１に示すように、マイクロホンＭ
１，Ｍ２，Ｍ３は、例えば１辺が２０ｃｍの正三角形の
頂点の位置に配置されている。マイクロホンＭ１〜Ｍ３
の指向特性に基づいて空間が分割して設定され、その各
分割された空間を音源ゾーンと呼ぶ。全てのマイクロホ
ンＭ１〜Ｍ３が無指向で同じ特性を有する場合には、例
えば図１２に示すように、ゾーンＺ１〜Ｚ６のように６
個に分割される。つまり、各マイクロホンＭ１，Ｍ２，
Ｍ３と、その中心点Ｃp をそれぞれ通る直線により、中
心点Ｃpを中心に等角間隔で６分割された６つのゾーン
Ｚ１〜Ｚ６が形成される。音源ＡはゾーンＺ３に、音源
ＢはゾーンＺ４に位置している。つまり、１個の音源ゾ
ーンには１個の音源が属するよう、マイクロホンＭ１〜
Ｍ３の配置や特性に基づいて各音源ゾーンを決定する。
In the embodiment described above, reverberation and diffraction in the room can introduce errors into the per-band inter-channel time differences and level differences, so that the sound source signals cannot always be separated accurately. An embodiment that mitigates this problem is described next. As shown in FIG. 11, microphones M1, M2, and M3 are arranged, for example, at the vertices of an equilateral triangle with sides of 20 cm. Space is divided on the basis of the directional characteristics of microphones M1 to M3, and each division is called a sound source zone. When all microphones M1 to M3 are omnidirectional and have the same characteristics, the space is divided into six zones Z1 to Z6, for example as shown in FIG. 12. That is, the straight lines passing through each of the microphones M1, M2, M3 and their center point Cp form six zones Z1 to Z6 that divide the space at equal angular intervals around Cp. Sound source A is located in zone Z3 and sound source B in zone Z4. In other words, the sound source zones are determined from the arrangement and characteristics of microphones M1 to M3 so that each zone contains one sound source.

【００３６】図１１において、帯域分割部４１は、マイ
クロホンＭ１で収音した第１チャネルの音響信号Ｓ１を
ｎ個の周波数帯域信号Ｓ１（ｆ１）〜Ｓ１（ｆｎ）に分
割し、分割部４２でマイクロホンＭ２で収音した第２チ
ャネルの音響信号Ｓ２をｎ個の周波数帯域信号Ｓ２（ｆ
１）〜Ｓ２（ｆｎ）に分割し、帯域分割部４３は、マイ
クロホンＭ３で収音した第３チャネルの音響信号Ｓ３を
ｎ個の周波数帯域信号Ｓ３（ｆ１）〜Ｓ３（ｆｎ）に分
割する。これら各帯域ｆ１〜ｆｎは帯域分割部４１〜４
３で共通であり、このような帯域分割は離散的フーリエ
変換器を利用することができる。
In FIG. 11, the band division unit 41 divides the acoustic signal S1 of the first channel, picked up by microphone M1, into n frequency band signals S1(f1) to S1(fn); the band division unit 42 divides the acoustic signal S2 of the second channel, picked up by microphone M2, into n frequency band signals S2(f1) to S2(fn); and the band division unit 43 divides the acoustic signal S3 of the third channel, picked up by microphone M3, into n frequency band signals S3(f1) to S3(fn). The bands f1 to fn are common to the band division units 41 to 43, and a discrete Fourier transformer can be used for this band division.

【００３７】音源分離部８０は図１乃至図１０を参照し
て説明した手法を用いて音源信号を分離するものであ
る。ただし図１１ではマイクロホンが３つであるから、
この３つのチャネルの信号の各２つの組合せについて同
様な処理を行う。従って音源分離部８０内の帯域分割部
と帯域分割部４１〜４３を兼用することもできる。帯域
別レベル（パワー）検出部Ｓ１で帯域分割部４１で得ら
れた各帯域の信号Ｓ１（ｆ１）〜Ｓ１（ｆｎ）のレベル
（パワー）信号Ｐ（Ｓ１ｆ１）〜Ｐ（Ｓ１ｆｎ）が検出
され、同様に帯域別レベル検出部５２，５３でそれぞれ
帯域分割部４２，４３で得られた各帯域信号Ｓ２（ｆ
１）〜Ｓ２（ｆｎ），Ｓ３（ｆ１）〜Ｓ３（ｆｎ）の各
Ｐ（Ｓ２ｆ１）〜Ｐ（Ｓ２ｆｎ），Ｐ（Ｓ３ｆ１）〜Ｐ
（Ｓ３ｆｎ）がそれぞれ検出される。これら帯域別レベ
ル検出もフーリエ変換器で実現できる。つまり各チャネ
ル信号を離散的フーリエ変換によりスペクトルに分解
し、その各スペクトルの電力を求めればよい。従って、
各チャネル信号について、パワースペクトルを求め、そ
のパワースペクトルを帯域分割してもよい。各マイクロ
ホンＭ１〜Ｍ３の各チャネル信号を、帯域別レベル検出
部４００で各帯域に分割すると共にそのレベル（パワ
ー）を出力することになる。
The sound source separation unit 80 separates the sound source signals using the technique described with reference to FIGS. 1 to 10. Since FIG. 11 has three microphones, however, the same processing is performed for each pairwise combination of the three channel signals. The band division units 41 to 43 can therefore also serve as the band division unit inside the sound source separation unit 80. The per-band level (power) detection unit 51 detects the level (power) signals P(S1f1) to P(S1fn) of the band signals S1(f1) to S1(fn) obtained by the band division unit 41; likewise, the per-band level detection units 52 and 53 detect the levels P(S2f1) to P(S2fn) and P(S3f1) to P(S3fn) of the band signals S2(f1) to S2(fn) and S3(f1) to S3(fn) obtained by the band division units 42 and 43, respectively. This per-band level detection can also be realized with a Fourier transformer: each channel signal is decomposed into a spectrum by a discrete Fourier transform, and the power of each spectral component is obtained. Accordingly, a power spectrum may be obtained for each channel signal and then divided into bands. In effect, the per-band level detection unit 400 divides the channel signal of each of the microphones M1 to M3 into bands and outputs the level (power) of each band.
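Per-band level detection by discrete Fourier transform can be sketched as below. The example treats each DFT bin as one band and uses the squared magnitude as the level (power); both choices, and the function name, are assumptions made for illustration. A direct O(n²) sum is used so that the block needs only the standard library.

```python
import cmath


def band_powers(frame):
    """Return the power |X[k]|^2 of DFT bins k = 0 .. n//2 of one frame."""
    n = len(frame)
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)
        )
        powers.append(abs(coeff) ** 2)
    return powers
```

Applying this to the frames of the three channels gives the values P(S1fi), P(S2fi), and P(S3fi) that are compared band by band in the following paragraphs.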

【００３８】一方全帯域レベル検出部６１でマイクロホ
ンＭ１で収音された第１チャネルの音響信号Ｓ１の全周
波数成分のレベル（パワー）Ｐ（Ｓ１）が検出され、全
帯域レベル検出部６２，６３でそれぞれマイクロホンＭ
２，Ｍ３でそれぞれ収音された第２、第３チャネル２，
３の各音響信号Ｓ２，Ｓ３の全周波数成分のレベルＰ
（Ｓ２），Ｐ（Ｓ３）が検出される。
Meanwhile, the full-band level detection unit 61 detects the level (power) P(S1) of all frequency components of the first-channel acoustic signal S1 picked up by microphone M1, and the full-band level detection units 62 and 63 detect the levels P(S2) and P(S3) of all frequency components of the second- and third-channel acoustic signals S2 and S3 picked up by microphones M2 and M3, respectively.

【００３９】音源状態判定部７０では、コンピュータ処
理により、音響を発していない音源ゾーンを判定する。
まず、帯域別レベル検出部５０により得られる帯域別レ
ベルＰ（Ｓ１ｆ１）〜Ｐ（Ｓ１ｆｎ）、Ｐ（Ｓ２ｆ１）
〜Ｐ（Ｓ２ｆｎ）、Ｐ（Ｓ３ｆ１）〜Ｐ（Ｓ３ｆｎ）
を、同一の帯域の信号について相互に比較する。そして
各帯域ｆ１〜ｆｎ毎に、最も大きなレベルのチャネルを
特定する。
The sound source state determination unit 70 determines, by computer processing, which sound source zones are not emitting sound. First, the per-band levels P(S1f1) to P(S1fn), P(S2f1) to P(S2fn), and P(S3f1) to P(S3fn) obtained by the per-band level detection unit 50 are compared with one another for signals of the same band. The channel with the highest level is then identified for each of the bands f1 to fn.

【００４０】帯域分割の数ｎを所定数以上にすることに
より、前述したように、１つの帯域には１個の音源の音
響信号しか含まれないと見なせるようにすることができ
るので、同一帯域ｆｉのレベルＰ（Ｓ１ｆｉ），Ｐ
（Ｓ２ｆｉ），Ｐ（Ｓ３ｆｉ）は、同一音源からの音響
のレベルと見なすことができる。よって、第１〜第３チ
ャネルについて同一の帯域のレベルＰ（Ｓ１ｆｉ），Ｐ
（Ｓ２ｆｉ），Ｐ（Ｓ３ｆｉ）に差があるときは、音源
に最も近いマイクロホンのチャネルの帯域のレベルが最
も大きくなる。
By making the number n of band divisions at least a predetermined number, each band can, as described earlier, be regarded as containing the acoustic signal of only one sound source, so the levels P(S1fi), P(S2fi), and P(S3fi) of the same band fi can be regarded as levels of sound from the same source. Therefore, when the levels P(S1fi), P(S2fi), and P(S3fi) of the same band differ among the first to third channels, the level is highest in the channel of the microphone closest to the sound source.

【００４１】前記処理の結果、各帯域ｆ１〜ｆｎについ
て、最もレベルの大きなチャネルがそれぞれ割り当てら
れる。ｎ個の帯域中で第１〜第３各チャネルについて、
最もレベルが大きな帯域の合計数χ１，χ２，χ３を算
出する。この合計数の値が大きいチャネルのマイクロホ
ンほど、音源に近いとみなすことができる。合計数値が
例えば９０ｎ／１００以上程度であればそのチャネルの
マイクロホンに音源が近いと判定することができる。し
かし、最もレベルが大きい帯域の合計数が５３ｎ／１０
０、次に合計値が大きい値が４９ｎ／１００の場合はそ
のそれぞれの対応マイクロホンに音源が近いか明確では
ない。従って当該合計数が予め設定した基準値ＴｈＰ、
例えばｎ／３程度を越えたとき、当該合計数と対応する
チャネルのマイクロホンにその音源が最も近いと判定す
る。
As a result of this processing, the channel with the highest level is assigned to each of the bands f1 to fn. For each of the first to third channels, the total number χ1, χ2, χ3 of bands, among the n bands, in which that channel has the highest level is calculated. The larger this total, the closer the microphone of that channel can be regarded as being to a sound source. If the total is, for example, about 90n/100 or more, the sound source can be judged to be close to the microphone of that channel. However, if the largest total is 53n/100 and the next largest is 49n/100, it is not clear whether the sound source is close to either of the corresponding microphones. Therefore, when a total exceeds a preset reference value ThP, for example about n/3, the sound source is judged to be closest to the microphone of the channel corresponding to that total.
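Counting, for each channel, the bands in which it has the highest level gives the totals χ1, χ2, χ3, which are then compared with ThP. The sketch below illustrates this; the function names and the list-of-lists layout are hypothetical, and resolving ties in favor of the lowest-numbered channel is an assumption, since the text does not say how ties are handled.

```python
def count_loudest_bands(levels):
    """levels[c][i] is the level of band i in channel c.
    Returns [chi_1, chi_2, ...]: per channel, the number of bands in which
    that channel has the highest level."""
    n_channels = len(levels)
    n_bands = len(levels[0])
    counts = [0] * n_channels
    for i in range(n_bands):
        column = [levels[c][i] for c in range(n_channels)]
        # Ties go to the lowest-numbered channel (assumption).
        counts[column.index(max(column))] += 1
    return counts


def channels_over_threshold(counts, th_p):
    """Indices of channels whose band count exceeds the reference value ThP."""
    return [c for c, chi in enumerate(counts) if chi > th_p]
```

A channel returned by `channels_over_threshold` corresponds to a microphone judged to be closest to a sounding source.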

【００４２】また、この音源状態判定部７０には、全帯
域レベル検出部６０で検出された各チャネルのレベルＰ
（Ｓ１）〜Ｐ（Ｓ３）も入力されていて、そのレベルの
全てが予め設定した基準値ＴｈＲ以下の場合には、何れ
のゾーンにも、音源がないと判定する。この音源状態判
定部７０による判定結果に基づき、制御信号を発生し
て、音源分離部８０で分割された音響信号Ａ，Ｂに対す
る抑圧を信号抑圧部９０で行う。つまり制御信号ＳＡｉ
により音響信号ＳＡを抑圧（減衰ないし削除）し、制御
信号ＳＢｉにより音響信号ＳＢを抑圧し、制御信号ＳＡ
Ｂｉにより両音響信号ＳＡ，ＳＢを抑圧する。例えば信
号抑圧部９０内に常閉スイッチ９Ａ，９Ｂが設けられ、
音源分離部８０の出力端子ｔ_A，ｔ_Bが常閉スイッチ９
Ａ，９Ｂを通じて、出力端子ｔ_A′，ｔ_B′に接続さ
れ、制御信号ＳＡｉによりスイッチ９Ａが開とされ、制
御信号ＳＢｉによりスイッチ９Ｂが開とされ、制御信号
ＳＡＢｉによりスイッチ９Ａ，９Ｂが共に開にされる。
当然のことであるが、音源分離部８０で行う分離処理す
るフレームの信号と、信号抑圧部９０での抑圧に用いる
制御信号を得るフレームの信号とは同一のものを用い
る。抑圧（制御）信号ＳＡｉ，ＳＢｉ，ＳＡＢｉの発生
についてわかり易く説明する。
The levels P(S1) to P(S3) of the channels detected by the full-band level detection unit 60 are also input to the sound source state determination unit 70; when all of these levels are at or below a preset reference value ThR, it is determined that there is no sound source in any zone. Based on the determination result of the sound source state determination unit 70, control signals are generated, and the signal suppression unit 90 suppresses the acoustic signals A and B separated by the sound source separation unit 80. Specifically, the control signal SAi suppresses (attenuates or removes) the acoustic signal SA, the control signal SBi suppresses the acoustic signal SB, and the control signal SABi suppresses both acoustic signals SA and SB. For example, normally closed switches 9A and 9B are provided in the signal suppression unit 90, and the output terminals t_A and t_B of the sound source separation unit 80 are connected to the output terminals t_A′ and t_B′ through the switches 9A and 9B; switch 9A is opened by the control signal SAi, switch 9B by the control signal SBi, and both switches by the control signal SABi. Naturally, the frame of signal on which the sound source separation unit 80 performs separation and the frame from which the control signal used for suppression in the signal suppression unit 90 is derived are the same frame. The generation of the suppression (control) signals SAi, SBi, and SABi is explained below.

【００４３】いま、図１２に示すように音源Ａ，Ｂが位
置している時マイクロホンＭ１〜Ｍ３を図に示したよう
に配置し、ゾーンＺ１〜Ｚ６を決定し、音源ＡとＢが別
個のゾーンＺ３，Ｚ４にそれぞれ位置するようにする。
この時、音源ＡのマイクロホンＭ１〜Ｍ３に対する距離
ＳＡ１，ＳＡ２，ＳＡ３は、ＳＡ２＜ＳＡ３＜ＳＡ１と
なる。また、音源Ｂの各マイクロホンＭ１〜Ｍ３に対す
る距離ＳＢ１，ＳＢ２，ＳＢ３は、ＳＢ３＜ＳＢ２＜Ｓ
Ｂ１となる。
Suppose that sound sources A and B are located as shown in FIG. 12, that microphones M1 to M3 are arranged as shown in the figure, and that zones Z1 to Z6 are determined so that sound sources A and B lie in the separate zones Z3 and Z4, respectively. The distances SA1, SA2, SA3 from sound source A to microphones M1 to M3 then satisfy SA2 < SA3 < SA1, and the distances SB1, SB2, SB3 from sound source B to microphones M1 to M3 satisfy SB3 < SB2 < SB1.

【００４４】全帯域レベル検出部６０の検出信号Ｐ（Ｓ
１）〜Ｐ（Ｓ３）のすべてが基準値ＴｈＲよりも小さい
とき、音源Ａ，Ｂは発音、例えば発話していないと見な
し、制御信号ＳＡＢｉにより、両音響信号ＳＡ，ＳＢを
抑圧する。このとき、出力音響信号ＳＡ，ＳＢは無音信
号となる（図１３の１０１，１０２）。音源Ａのみが発
音しているときは、その音響信号のすべての帯域の周波
数成分がマイクロホンＭ２へ一番大きな音圧レベル（パ
ワー）で到達するので、このマイクロホンＭ２のチャネ
ルの合計帯域数χ２が最も多くなる。
When all of the detection signals P(S1) to P(S3) of the full-band level detection unit 60 are smaller than the reference value ThR, sound sources A and B are regarded as not producing sound (for example, not speaking), and both acoustic signals SA and SB are suppressed by the control signal SABi. The output acoustic signals SA and SB are then silent (101, 102 in FIG. 13). When only sound source A is producing sound, the frequency components of all bands of its acoustic signal reach microphone M2 at the highest sound pressure level (power), so the total band count χ2 of the channel of microphone M2 is the largest.

【００４５】また、音源Ｂのみが発音しているときは、
その音響信号のすべての帯域の周波数成分がマイクロホ
ンＭ３へ一番大きな音圧レベルで到達するので、このマ
イクロホンＭ３のチャネルの合計帯域数χ３が最も多く
なる。さらに、音源Ａ，Ｂが共に発音している場合に
は、音響信号が最も大きな音圧レベルで到達する帯域数
がマイクロホンＭ２とＭ３で拮抗する。
When only sound source B is producing sound, the frequency components of all bands of its acoustic signal reach microphone M3 at the highest sound pressure level, so the total band count χ3 of the channel of microphone M3 is the largest. When both sound sources A and B are producing sound, the numbers of bands in which the acoustic signal arrives at the highest sound pressure level are roughly balanced between microphones M2 and M3.

【００４６】したがって、前記した基準値ＴｈＰによ
り、音響信号があるマイクロホンへ最も大きな音圧レベ
ルで到達する合計帯域数が、当該基準値ＴｈＰを越えた
場合、当該マイクロホンが司るゾーンに音源が存在する
と判定することにより、発音している音源ゾーンを検出
することができる。上記の例では、音源Ａのみが発音し
ているときは、χ２のみが基準値ＴｈＰを越えて、発音
している音源が存在するのはマイクロホンＭ２が司るゾ
ーンＺ３であると検出されるので、制御信号ＳＢｉによ
り音声信号ＳＢを抑制して、音響信号ＳＡのみを出力さ
せる（図１３の１０３，１０４）。
Therefore, using the reference value ThP described above, when the total number of bands in which the acoustic signal reaches a given microphone at the highest sound pressure level exceeds ThP, a sound source is judged to be present in the zone covered by that microphone, and the zone of the sounding source can thus be detected. In the example above, when only sound source A is producing sound, only χ2 exceeds ThP and the sounding source is detected to be in zone Z3, which is covered by microphone M2; the acoustic signal SB is therefore suppressed by the control signal SBi and only the acoustic signal SA is output (103, 104 in FIG. 13).

【００４７】また、音源Ｂのみが発音しているときは、
χ３のみが基準値ＴｈＰを越えて、発音している音源が
存在するのは、マイクロホンＭ３が司るゾーンＺ４であ
ると検出されるので、制御信号ＳＡｉにより音響信号Ｓ
Ａを抑制して、音響信号ＳＢのみを出力させる（図１３
の１０５，１０６）。さらに、音源Ａ，Ｂが共に発音し
ていて、χ２，χ３ともに基準値ＴｈＰを越えるとき
は、例えば音源Ａに優先度を与えて、音源Ａのみが発音
していると処理することができる。図１３の処理手順は
そのようにしてある。また、χ２，χ３が共に基準値Ｔ
ｈＰに達していない場合は、レベルＰ（Ｓ１）〜Ｐ（Ｓ
３）が基準値ＴｈＲを越えている限り、両音源Ａ，Ｂと
もに発音していると判断し、制御信号ＳＡｉ，ＳＢｉ，
ＳＡＢｉの何れも出力せず、音声抑圧部９０では合成信
号ＳＡ，ＳＢに対する抑圧は行われない（図１３の１０
７）。
When only sound source B is producing sound, only χ3 exceeds ThP and the sounding source is detected to be in zone Z4, which is covered by microphone M3, so the acoustic signal SA is suppressed by the control signal SAi and only the acoustic signal SB is output (105, 106 in FIG. 13). When both sound sources A and B are producing sound and both χ2 and χ3 exceed ThP, priority can be given to, for example, sound source A, and the case treated as if only sound source A were sounding; the procedure of FIG. 13 is arranged in this way. When neither χ2 nor χ3 reaches ThP, then as long as the levels P(S1) to P(S3) exceed ThR, both sound sources A and B are judged to be producing sound, none of the control signals SAi, SBi, SABi is output, and the signal suppression unit 90 applies no suppression to the synthesized signals SA and SB (107 in FIG. 13).
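The two-source decision procedure described for FIG. 13 can be sketched as a short function that returns the name of the control signal to issue. The function name and return convention are hypothetical; the sketch checks χ2 before χ3 so that sound source A takes priority when both counts exceed ThP, as the description suggests.

```python
def suppression_two_sources(full_levels, chi2, chi3, th_r, th_p):
    """Return "SABi", "SBi", "SAi", or None (no suppression).

    full_levels : full-band levels P(S1), P(S2), P(S3)
    chi2, chi3  : band counts of the channels of microphones M2 and M3
    """
    if all(level <= th_r for level in full_levels):
        return "SABi"   # no source is sounding: suppress SA and SB
    if chi2 > th_p:
        return "SBi"    # source A sounding (priority to A): suppress SB
    if chi3 > th_p:
        return "SAi"    # only source B sounding: suppress SA
    return None         # both sounding: no suppression
```

Returning `None` corresponds to the case in which none of the control signals is output and both separated signals pass through unchanged.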

【００４８】以上のようにして、音源分離部８０で分離
された音源信号ＳＡ，ＳＢは、音源状態判定部７０によ
って発音していないと判定された音源に対応するもの
が、信号抑圧部９０で抑圧され、不要音が抑圧されるよ
うになる。図１２に示した状態に対して、図１４に示す
ように音源ＣをゾーンＺ６に加えた場合は、図示しない
が音源分離部８０からは、音源Ａに対応する信号ＳＡ、
音源Ｂに対応する信号ＳＢの他に、音源Ｃに対応する信
号ＳＣを出力する。
In this way, of the sound source signals SA and SB separated by the sound source separation unit 80, the one corresponding to a source judged by the sound source state determination unit 70 not to be sounding is suppressed by the signal suppression unit 90, so that unwanted sound is suppressed. When a sound source C is added in zone Z6, as shown in FIG. 14, to the situation of FIG. 12, the sound source separation unit 80 outputs (not illustrated) a signal SC corresponding to sound source C in addition to the signal SA corresponding to sound source A and the signal SB corresponding to sound source B.

【００４９】また、信号抑圧部９０に対して、音源状態
判定部７０から、信号ＳＡを抑圧する制御信号ＳＡｉ、
信号ＳＢを抑圧する制御信号ＳＢｉの他に、信号ＳＣを
抑圧する制御信号ＳＣｉが出力する。また、信号ＳＡと
ＳＢを抑圧する制御信号ＳＡＢｉの他に、信号ＳＢとＳ
Ｃを抑圧する制御信号ＳＢＣｉ、信号ＳＣとＳＡを抑圧
する制御信号ＳＣＡｉ、信号ＳＡとＳＢとＳＣの全部を
抑圧する制御信号ＳＡＢＣｉが出力する。この音源状態
判定部７０は、図１５に示すような処理を行う。
In addition to the control signal SAi for suppressing the signal SA and the control signal SBi for suppressing the signal SB, the sound source state determination unit 70 outputs to the signal suppression unit 90 a control signal SCi for suppressing the signal SC. Likewise, in addition to the control signal SABi for suppressing the signals SA and SB, it outputs a control signal SBCi for suppressing the signals SB and SC, a control signal SCAi for suppressing the signals SC and SA, and a control signal SABCi for suppressing all of the signals SA, SB, and SC. The sound source state determination unit 70 performs the processing shown in FIG. 15.

【００５０】まず、レベルＰ（Ｓ１）〜Ｐ（Ｓ３）の全
部が基準値ＴｈＲを越えていない場合は、いずれの音源
Ａ〜Ｃも発音していないものと判断して、音源状態判定
部７０からＳＡＢＣｉを出力して、信号ＳＡ，ＳＢ，Ｓ
Ｃのいずれもが抑圧される（図１５の２０１〜２０
２）。次に、音源Ａ，Ｂ，Ｃがそれぞれ単独で発音して
いる場合は、Ｐ（Ｓ１）〜Ｐ（Ｓ３）の何れかはＴｈＲ
より大となり、前記した音源が２個の場合と同様に、そ
の音源に最も近いマイクロホンのチャネルのレベルが最
も大きくなるので、そのチャネルの帯域数χ１，χ２，
χ３のいずれかが基準値ＴｈＰを越える。そして、音源
Ｃのみが発音している場合は、χ１がＴｈＰを越え、制
御信号ＳＡＢｉを出力して信号ＳＡ，ＳＢが抑圧される
（図１５の２０３，２０４）。また、音源Ａのみが発音
している場合は、制御信号ＳＢＣｉが出力して信号Ｓ
Ｂ，ＳＣが抑圧される。さらに、音源Ａのみが発音して
いる場合は、制御信号ＳＢＣｉが出力して信号ＳＢ，Ｓ
Ｃが抑圧される（図１５の２０５〜２０８）。
First, when none of the levels P(S1) to P(S3) exceeds the reference value ThR, it is judged that none of the sound sources A to C is producing sound, the sound source state determination unit 70 outputs SABCi, and all of the signals SA, SB, and SC are suppressed (201, 202 in FIG. 15). Next, when one of the sound sources A, B, C is sounding alone, at least one of P(S1) to P(S3) is greater than ThR and, as in the two-source case, the channel of the microphone closest to that source has the highest level, so one of the band counts χ1, χ2, χ3 of that channel exceeds the reference value ThP. When only sound source C is sounding, χ1 exceeds ThP, the control signal SABi is output, and the signals SA and SB are suppressed (203, 204 in FIG. 15). When only sound source A is sounding, the control signal SBCi is output and the signals SB and SC are suppressed. When only sound source B is sounding, the control signal SCAi is output and the signals SC and SA are suppressed (205 to 208 in FIG. 15).

【００５１】次に、３つの音源Ａ〜Ｃのうちのいずれか
２つが発音する場合は、発音していない音源に対応する
ゾーンにあるマイクロホンのレベルが最も大きくなる帯
域数が、他のマイクロホンのものに比べて小さくなる。
例えば、音源Ｃのみが発音していない場合には、マイク
ロホンＭ１のレベルが最も大きくなる帯域数χ１が、他
の２個のマイクロホンＭ２，Ｍ３の帯域数χ２，χ３に
比べて小さくなる。
Next, when any two of the three sound sources A to C are sounding, the number of bands in which the microphone in the zone of the non-sounding source has the highest level is smaller than that of the other microphones. For example, when only sound source C is not sounding, the number of bands χ1 in which microphone M1 has the highest level is smaller than the band counts χ2 and χ3 of the other two microphones M2 and M3.

【００５２】よって、予めある基準値ＴｈＱ（＜Ｔｈ
Ｐ）を設定し、χ１がその基準値ＴｈＱ以下になる場合
は、マイクロホンＭ１とマイクロホンＭ３で空間を２分
割したゾーンＺ５，Ｚ６の内、マイクロホンＭ１に近い
ゾーンＺ６では、音源は信号を発していないと判定す
る。さらに、マイクロホンＭ１とＭ２で空間を２分割し
たゾーンＺ１，Ｚ２のうちマイクロホンＭ１に近いゾー
ンＺ１では音源は信号を発していないと判定する。
Therefore, a reference value ThQ (< ThP) is set in advance, and when χ1 is at or below ThQ, it is judged that no sound source is emitting a signal in zone Z6, the one nearer to microphone M1 of the zones Z5 and Z6 obtained by bisecting the space between microphones M1 and M3. It is likewise judged that no sound source is emitting a signal in zone Z1, the one nearer to microphone M1 of the zones Z1 and Z2 obtained by bisecting the space between microphones M1 and M2.

【００５３】すなわち、ゾーンＺ１，Ｚ６にある音源は
信号を発していないと判定するのである。これらのゾー
ンにある音源は音源Ｃであることから、音源Ｃが信号を
発していないと判定される。つまり、音源Ａ，Ｂのみが
信号を発していると判定され、制御信号ＳＣｉを生成
し、信号ＳＣが抑圧される。図１４に示した状態で３つ
の音源Ａ〜Ｃのうち１つのみが発音していない場合は通
常は何れのマイクロホンについても最大となる帯域数χ
１，χ２，χ３は基準値ＴｈＰ以下となるため、図１５
においてステップ２０３，２０５，２０７を通過し、ス
テップ２０９で、χ１が基準値ＴｈＱ以下かを調べ、音
源Ｃのみが発音していなければ、χ１＜ＴｈＱとなり、
制御信号ＳＣｉが生成される（図１５の２１０）。ステ
ップ２０９でχ１がＴｈＱ以下でなければχ２，χ３に
ついても同様にＴｈＱ以下であるかが順次調べられ、Ｔ
ｈＱ以下であれば音源Ａのみ、又は音源Ｂのみが発音し
ていないと推定され、それぞれ制御信号ＳＡｉ又はＳＢ
ｉが抑圧される（図１５の２１１〜２１４）。
That is, the sound sources in zones Z1 and Z6 are judged not to be emitting a signal. Since the sound source in these zones is sound source C, sound source C is judged not to be emitting a signal. In other words, only sound sources A and B are judged to be emitting signals, the control signal SCi is generated, and the signal SC is suppressed. In the situation of FIG. 14, when only one of the three sound sources A to C is not sounding, the band counts χ1, χ2, χ3 for which each microphone has the highest level are normally all at or below the reference value ThP, so processing passes through steps 203, 205, and 207 of FIG. 15, and step 209 checks whether χ1 is at or below the reference value ThQ. If only sound source C is not sounding, χ1 < ThQ holds and the control signal SCi is generated (210 in FIG. 15). If χ1 is not at or below ThQ in step 209, χ2 and χ3 are checked in turn in the same way; if one of them is at or below ThQ, it is inferred that only sound source A or only sound source B is not sounding, and the control signal SAi or SBi, respectively, is generated (211 to 214 in FIG. 15).

【００５４】ステップ２１３でχ３がＴｈＱ以下でない
と判定されると、音源Ａ，Ｂ，Ｃは全て発音していると
判定され、何れの制御信号も生成されない（図１５の２
１５）。この場合基準値ＴｈＰは２ｎ／３〜３ｎ／４程
度基準値ＴｈＱはｎ／２〜２ｎ／３程度、つまり例えば
ＴｈＰを２ｎ／３程度にすると、ＴｈＱはｎ／２程度に
する。
If step 213 determines that χ3 is not at or below ThQ, all of the sound sources A, B, and C are judged to be sounding and no control signal is generated (215 in FIG. 15). In this case, the reference value ThP is about 2n/3 to 3n/4 and the reference value ThQ is about n/2 to 2n/3; for example, if ThP is set to about 2n/3, ThQ is set to about n/2.
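The three-source procedure described for FIG. 15 can be sketched as a chain of threshold tests. The function name and return convention are hypothetical. Two points are assumptions rather than statements of the text: the order in which χ2 and χ3 are tested in steps 205 and 207, and the control signal returned when only sound source B is sounding, which is taken here to be SCAi by symmetry with the other single-source cases.

```python
def suppression_three_sources(full_levels, chi, th_r, th_p, th_q):
    """Return the name of the control signal to issue, or None.

    full_levels : full-band levels P(S1), P(S2), P(S3)
    chi         : (chi1, chi2, chi3), band counts of microphones M1, M2, M3
    """
    chi1, chi2, chi3 = chi
    if all(level <= th_r for level in full_levels):
        return "SABCi"   # steps 201-202: no source sounding
    if chi1 > th_p:
        return "SABi"    # steps 203-204: only C sounding
    if chi2 > th_p:
        return "SBCi"    # only A sounding
    if chi3 > th_p:
        return "SCAi"    # only B sounding (assumed by symmetry)
    if chi1 <= th_q:
        return "SCi"     # steps 209-210: C is the silent source
    if chi2 <= th_q:
        return "SAi"     # A is the silent source
    if chi3 <= th_q:
        return "SBi"     # B is the silent source
    return None          # step 215: all three sounding
```

The single-source tests against ThP come first, so the ThQ tests are reached only when no channel dominates the band counts.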

【００５５】なお、以上の例では、ゾーンをＺ１〜Ｚ６
の６つに分けたが、図１６に示すように、中心点Ｃp か
ら各マイクロホン間の中点を通る点線により３つのゾー
ンＺ１〜Ｚ３に分けても同様に音源状態を判定できる。
この場合は、例えば、音源Ａのみが発音している場合
は、マイクロホンＭ２のチャネルの帯域数χ２が最も大
きくなるので、そのマイクロホンＭ２の司るゾーンＺ２
に音源があると判定される。また、音源Ｂのみが発音し
ている場合はχ３が最も大きくなり、ゾーンＺ３に音源
があると判定される。また、χ１が予め設定した値Ｔｈ
Ｑ以下である場合には、マイクロホンＭ１とＭ２および
Ｍ３とそれぞれ２分したうちのゾーンＺ１にある音源は
発音していないと判定する。以上の処理により、ゾーン
を３分割しても、６分割のときと同様に音源の状態を判
定できる。In the above example the space was divided into six zones Z1 to Z6, but the sound source state can be determined in the same way when the space is divided into three zones Z1 to Z3 by dotted lines running from the center point Cp through the midpoints between the microphones, as shown in FIG. 16. In this case, for example, when only sound source A is sounding, the band count χ2 of the microphone M2 channel is the largest, so it is determined that a sound source exists in zone Z2, which microphone M2 covers. Likewise, when only sound source B is sounding, χ3 is the largest and it is determined that a sound source exists in zone Z3. When χ1 is at or below the preset value ThQ, it is determined that the sound source in zone Z1, the zone on the M1 side of the divisions separating microphone M1 from M2 and from M3, is not sounding. With this processing, the sound source state can be determined with three zones just as with six.

【００５６】また、基準値ＴｈＲ，ＴｈＰ，ＴｈＱは、
全てのマイクロホンＭ１〜Ｍ３で同一値を用いた場合で
説明したが、マイクロホン毎に適宜変更してもよい。ま
た、以上の説明では、音源が３個でマイクロホンが３個
の場合についてであったが、マイクロホンの個数は音源
の個数と同数以上であれば、同様に音源ゾーンを検出す
ることができる。The description above used the same reference values ThR, ThP, and ThQ for all of the microphones M1 to M3, but they may be changed for each microphone as appropriate. The description also assumed three sound sources and three microphones, but sound source zones can be detected in the same way as long as the number of microphones is equal to or greater than the number of sound sources.

【００５７】例えば、音源が４個の場合には、４個のマ
イクロホンにより、個々のチャネルのマイクロホンが１
個の音源を司るように、図１６の分割方法と同様に４個
のゾーンに空間を分割する。このときの音源状態判定
は、図１５のステップ２０１〜２０８と同様な処理によ
り、４個全部の音源が無音か、いずれか１個が発音して
いるかを判定する。それらいずれでもないとき、図１５
のステップ２０９〜２１４と同様な処理により、４個の
内の１個が無音かを判定し、１個の無音もないとき図１
５のステップ２１５と同じ処理により全部の音源が発音
していると判定する。また、４個の内の３個の音源が発
音しているとき（１個が無音のとき）は、そのままとし
ても良いが、その３個の内のより無音に近い１個を選別
するには、次のようにより細かく制御する。すなわち、
基準値をＴｈＱからＴｈＳ（ＴｈＰ＞ＴｈＳ＞ＴｈＱ）
に換え、図１５の各ステップ２１０，２１２，２１４の
各々の次段に図１５のステップ２０９〜２１４と同様な
処理部分を設けて、３個の内から１個の無音に近い音源
を判定する。For example, with four sound sources, four microphones are used and the space is divided into four zones in the same manner as the division of FIG. 16, so that the microphone of each channel covers one sound source. The sound source state is then determined by processing similar to steps 201 to 208 of FIG. 15, which decides whether all four sound sources are silent or exactly one of them is sounding. If neither is the case, processing similar to steps 209 to 214 of FIG. 15 decides whether one of the four is silent, and if none is silent, all sound sources are judged to be sounding by the same processing as step 215 of FIG. 15. When three of the four sound sources are sounding (one is silent), the result may be left as it is, but to pick out the one of those three that is closest to silence, finer control is applied as follows. The reference value is changed from ThQ to ThS (ThP > ThS > ThQ), and a processing stage similar to steps 209 to 214 of FIG. 15 is placed after each of steps 210, 212, and 214 of FIG. 15 to identify the one near-silent sound source among the three.

【００５８】このように、音源の数が多くなるほど、図
１５のステップ２０９〜２１４の処理内容を繰り返すこ
とにより、無音又は無音に近い音源を２以上判定するこ
とができる。ただし、判定基準値ＴｈＳは処理の繰り返
しが増えるほど、ＴｈＰに近付ける。以上の処理動作手
順マイクロホンが４個、音源が４個の場合について図１
７に示すようになる。まずマイクロホンＭ１〜Ｍ４より
第１〜第４チャネル信号Ｓ１〜Ｓ４を取込み（Ｓ０
１）、これらチャネル信号Ｓ１〜Ｓ４のレベルＰ（Ｓ
１）〜Ｐ（Ｓ４）をそれぞれ検出し（Ｓ０２）、これら
レベルＰ（Ｓ１）〜Ｐ（Ｓ４）の何れもが基準値ＴｈＲ
以下であるかを調べ（Ｓ０３）、基準値以下であれば制
御信号ＳＡＢＣＤｉを生成して合成信号ＳＡ，ＳＢ，Ｓ
Ｃ（Ｓ１）の出力を抑圧する（Ｓ０４）。ステップＳ０
３で何れかが基準値ＴｈＲ以下でなければ、各チャネル
信号Ｓ１〜Ｓ４をｎ帯域に分割すると共にその各帯域の
レベルＰ（Ｓ１ｆｉ），Ｐ（Ｓ２ｆｉ），Ｐ（Ｓ３ｆ
ｉ），Ｐ（Ｓ４ｆｉ）（ｉ＝１，…，ｎ）を求める（Ｓ
０５）。各チャネル間で同一帯域ｆｉのレベル中の最
大のチャネルｆｉＭ（Ｍは１，２，３，４の何れか）を
各帯域について決定し（Ｓ０６）、全帯域（ｎ個）中で
ｆｉ１，ｆｉ２，ｆｉ３，ｆｉ４の各合計値χ１，χ
２，χ３，χ４を求める（Ｓ０７）。χ１，χ２，χ
３，χ４中の最大のものχ_Mを求め（Ｓ０８）、χ_Mが
基準値ＴｈＰ１（例えばｎ／３）以上であるかを調べ
（Ｓ０９）、ＴｈＰ１以上であればチャネルＭと対応し
て選出した音源信号、音源Ａの信号であれば分離された
チャネルＭ以外の分離されたチャネルの分離音響信号を
抑圧する制御信号ＳＢＣＤｉを生成する（Ｓ０１０）。
ステップＳ０８から直ちにステップＳ０１０へ移っても
よい。In this way, as the number of sound sources increases, two or more silent or near-silent sound sources can be identified by repeating the processing of steps 209 to 214 of FIG. 15. The decision threshold ThS is, however, moved closer to ThP as the number of repetitions increases. FIG. 17 shows this procedure for four microphones and four sound sources. First, the first to fourth channel signals S1 to S4 are acquired from the microphones M1 to M4 (S01), the levels P(S1) to P(S4) of these channel signals are detected (S02), and it is checked whether all of the levels P(S1) to P(S4) are at or below the reference value ThR (S03). If they are, the control signal SABCDi is generated and the output of the combined signals SA, SB, SC (S1) is suppressed (S04). If any level is not at or below ThR in step S03, each of the channel signals S1 to S4 is divided into n bands and the band levels P(S1fi), P(S2fi), P(S3fi), P(S4fi) (i = 1, ..., n) are obtained (S05). For each band fi, the channel fiM (M is 1, 2, 3, or 4) with the highest level among the channels is determined (S06), and the totals χ1, χ2, χ3, χ4 of fi1, fi2, fi3, fi4 over all n bands are obtained (S07). The largest of χ1, χ2, χ3, χ4, denoted χ_M, is found (S08), and it is checked whether χ_M is at or above the reference value ThP1 (for example n/3) (S09). If it is, a control signal is generated that suppresses the separated sound signals of all channels other than channel M; if the sound source signal selected for channel M is that of sound source A, this is the control signal SBCDi (S010). The procedure may also go directly from step S08 to step S010.

【００５９】ステップＳ０９で基準値以上でなければχ
_Mが基準値ＴｈＱ以下のチャネルＭがあるかを調べる
（Ｓ０１１）。ＴｈＱ以下のものがなければ、全ての音
源が発音しているとみなして、何れの制御信号も発生し
ない（Ｓ０１２）。ステップＳ０１１でχ_MがＴｈＱ以
下のチャネルＭがあれば、これと対応するチャネルＭと
して分離された音源信号を抑圧する制御信号ＳＭｉを
生成する（Ｓ０１３）。If χ_M is not at or above the reference value in step S09, it is checked whether there is a channel M whose χ_M is at or below the reference value ThQ (S011). If there is none, all sound sources are regarded as sounding and no control signal is generated (S012). If step S011 finds a channel M whose χ_M is at or below ThQ, a control signal SMi is generated that suppresses the sound source signal separated for that channel M (S013).
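
Steps S03 to S013 described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, channels are numbered 0 to 3 for the microphones M1 to M4, the return value is the set of channel indices whose separated signals would be suppressed, ties between channels go to the lowest index, and the ThQ escalation loop of steps S014 to S016 is left out.

```python
def band_winner_counts(band_levels):
    """Count, per channel, the bands in which that channel is loudest (S06-S07).

    band_levels[ch][i] is the level of band i in channel ch.
    """
    n_channels = len(band_levels)
    n_bands = len(band_levels[0])
    chi = [0] * n_channels
    for i in range(n_bands):
        # Assumption: a tie goes to the lowest channel index.
        loudest = max(range(n_channels), key=lambda ch: band_levels[ch][i])
        chi[loudest] += 1
    return chi


def channels_to_suppress(levels, band_levels, th_r, th_p1, th_q):
    """Return the set of channel indices to suppress, following S03-S013."""
    n_channels = len(levels)
    everyone = set(range(n_channels))
    # S03 -> S04: every channel level at or below ThR, so suppress everything.
    if all(p <= th_r for p in levels):
        return everyone
    chi = band_winner_counts(band_levels)                  # S05-S07
    top = max(range(n_channels), key=lambda ch: chi[ch])   # S08
    # S09 -> S010: one channel dominates, so suppress all the others.
    if chi[top] >= th_p1:
        return everyone - {top}
    # S011 -> S013: suppress channels whose count is at or below ThQ.
    # An empty set corresponds to S012 (all sources regarded as sounding).
    return {ch for ch in range(n_channels) if chi[ch] <= th_q}
```

With 12 bands and illustrative thresholds `th_p1 = 8` and `th_q = 1` (values chosen for the example, not taken from the text), band counts of 5, 6, 1, and 0 lead to the last two channels being suppressed.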

【００６０】制御信号ＳＭｉで抑圧された以外の分離さ
れた音源信号中の無音又は無音に近いものを抑圧するに
は、Ｓを＋１し（Ｓ０１４）（Ｓは予め０に初期化して
おく）、ＳがＭ−１（Ｍは音源の数）と一致したかを調
べ（Ｓ０１５）、一致していなければ、ＴｈＱを＋ΔＱ
だけ大としてステップＳ０１１に戻る（Ｓ０１６）。Ｓ
がＭ−１になるまでＴｈＱをＴｈＰを越えない範囲でΔ
Ｑづつ増加させステップＳ０１１を実行する。ステップ
Ｓ０１５でＭ−１＝Ｓであれば、その時のＴｈＱ以下の
各χ_Mの各チャネルＭと対応する分離された音源信号を
抑圧する各制御信号ＳＭｉを生成する（Ｓ０１３）。必
要に応じてステップＳ０１５でＭ−１＝Ｓになる前にス
テップＳ０１３に移ってもよい。To suppress silent or near-silent signals among the separated sound source signals other than those already suppressed by the control signal SMi, S is incremented by 1 (S014; S is initialized to 0 beforehand) and it is checked whether S equals M - 1, where M is the number of sound sources (S015). If not, ThQ is increased by ΔQ and the procedure returns to step S011 (S016). Step S011 is thus repeated with ThQ increased in steps of ΔQ, without exceeding ThP, until S reaches M - 1. When S = M - 1 in step S015, control signals SMi are generated that suppress the separated sound source signals corresponding to every channel M whose χ_M is at or below the current ThQ (S013). If necessary, the procedure may move to step S013 before S reaches M - 1 in step S015.

【００６１】ステップＳ０７でχ１〜χ４を計算した
後、これらでＴｈＰ２（例えば２ｎ／３）以上のものが
あるかを調べ、あればステップＳ０１０に移り、なけれ
ばステップＳ０１１に移るようにしてもよい（Ｓ０１
７）。上述では音源分離の精度を上げるため、マイクロ
ホンＭ１〜Ｍ３のチャネル信号Ｓ１〜Ｓ３の帯域間レベ
ル差を利用して信号抑圧部９０に対する制御信号を生成
したが、帯域間時間差を利用して制御信号を生成するこ
ともできる。After χ1 to χ4 are calculated in step S07, it may instead be checked whether any of them is at or above ThP2 (for example 2n/3), moving to step S010 if so and to step S011 if not (S017). In the above, to improve the accuracy of sound source separation, the control signals for the signal suppression unit 90 were generated using the per-band level differences between the channel signals S1 to S3 of the microphones M1 to M3, but the control signals can also be generated using the per-band time differences.

【００６２】この例を図１８に、図１１と対応する部分
に同一符号を付けて示す。この実施例では帯域分割部４
１で得られた各帯域ｆ１〜ｆｎの信号Ｓ１（ｆ１）〜Ｓ
１（ｆｎ）から到達時間差信号Ａｎ（Ｓ１ｆ１）〜Ａｎ
（Ｓ１ｆｎ）が帯域別時間差検出部１０１で検出され、
同様に帯域分割部４２，４３でそれぞれ得られた各帯域
の信号Ｓ２（ｆ１）〜Ｓ２（ｆｎ）、Ｓ３（ｆ１）〜Ｓ
３（ｆｎ）からそれぞれ到達時間差信号Ａｎ（Ｓ２ｆ
１）〜Ａｎ（Ｓ２ｆｎ），Ａｎ（Ｓ３ｆ１）〜Ａｎ（Ｓ
３ｆｎ）が帯域別時間差検出部１０２，１０３で検出さ
れる。FIG. 18 shows this example, with parts corresponding to those in FIG. 11 given the same reference numerals. In this embodiment, the band-specific time difference detection unit 101 detects arrival time difference signals An(S1f1) to An(S1fn) from the signals S1(f1) to S1(fn) of the bands f1 to fn obtained by the band division unit 41. Similarly, the band-specific time difference detection units 102 and 103 detect arrival time difference signals An(S2f1) to An(S2fn) and An(S3f1) to An(S3fn) from the band signals S2(f1) to S2(fn) and S3(f1) to S3(fn) obtained by the band division units 42 and 43, respectively.

【００６３】これらの到達時間差信号を得る処理は、例
えば、フーリエ変換により各帯域の信号の位相（あるい
は群遅延）を算出し、同一の帯域ｆｉの信号Ｓ１（ｆ
ｉ），Ｓ２（ｆｉ），Ｓ３（ｆｉ）（ｉ＝１，２，…，
ｎ）の位相を相互に比較することで、同一音源信号の到
達時間差と対応した信号を得ることができる。この場合
も帯域分割部４０での分割は、１つの帯域には１つの音
源信号成分しか存在しないとみなせる程度に小さく行
う。These arrival time difference signals can be obtained, for example, by calculating the phase (or group delay) of the signal in each band with a Fourier transform and comparing the phases of the signals S1(fi), S2(fi), S3(fi) (i = 1, 2, ..., n) of the same band fi with one another, which yields a signal corresponding to the arrival time difference of the same sound source signal. In this case too, the band division unit 40 divides the signal finely enough that each band can be regarded as containing only one sound source signal component.
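
The phase comparison described above can be illustrated with a short sketch. It assumes that a single DFT bin stands for one band, that the true delay is shorter than half a period of the band frequency (so the phase does not wrap), and that a positive result means the second channel receives the signal later than the reference channel. The function names and the naive DFT are illustrative choices, not part of the patent text.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT coefficient X[k] of a real sequence x (naive summation)."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def arrival_time_difference(x_ref, x_other, k, fs):
    """Arrival time of x_other relative to x_ref in DFT bin k, in seconds.

    Positive means x_other arrives later than x_ref. Requires k >= 1 and a
    true delay below half a period of the bin frequency.
    """
    n = len(x_ref)
    f = k * fs / n  # centre frequency of bin k in Hz
    # Phase of X_ref * conj(X_other) equals the phase lag of x_other.
    dphi = cmath.phase(dft_bin(x_ref, k) * dft_bin(x_other, k).conjugate())
    return dphi / (2 * math.pi * f)
```

For a 500 Hz tone sampled at 8 kHz and delayed by 0.25 ms in the second channel, the function recovers that delay to within floating-point error. A fast Fourier transform would replace `dft_bin` in practice.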

【００６４】この到達時間差の表現方法は、例えば、マ
イクロホンＭ１〜Ｍ３のいずれかを基準にしてその基準
マイクロホンに対する到達時間差を０に設定しておけ
ば、他のマイクロホンに対する到達時間差はその基準マ
イクロホンに対して速く到達したか遅く到達したかで判
定できるので、正又は負の極性を付した数値で表すこと
ができる。この場合、基準マイクロホンを例えばＭ１と
すると、到達時間差信号Ａｎ（Ｓ１ｆ１）〜Ａｎ（Ｓ１
ｆｎ）は全て０となる。The arrival time difference can be expressed, for example, by taking one of the microphones M1 to M3 as a reference and setting its arrival time difference to 0. The arrival time difference for each other microphone is then determined by whether the signal arrived earlier or later than at the reference microphone, so it can be expressed as a number with a positive or negative sign. If the reference microphone is M1, for example, the arrival time difference signals An(S1f1) to An(S1fn) are all 0.

【００６５】音源状態判定部１１０では、コンピュータ
処理により音声を発していない音源を判定する。まず、
帯域別時間差検出部１００により得られる到達時間差信
号Ａｎ（Ｓ１ｆ１）〜Ａｎ（Ｓ１ｆｎ），Ａｎ（Ｓ２ｆ
１）〜Ａｎ（Ｓ２ｆｎ），Ａｎ（Ｓ３ｆ１）〜Ａｎ（Ｓ
３ｆｎ）を、同一の帯域の信号について相互に比較す
る。これにより各帯域ｆ１〜ｆｎ毎に、最も信号が速く
到達するチャネルが決定できる。The sound source state determination unit 110 determines by computer processing which sound sources are not emitting sound. First, the arrival time difference signals An(S1f1) to An(S1fn), An(S2f1) to An(S2fn), and An(S3f1) to An(S3fn) obtained by the band-specific time difference detection unit 100 are compared with one another for signals of the same band. This determines, for each of the bands f1 to fn, the channel at which the signal arrives earliest.

【００６６】そこで、各チャネルについて信号が最も速
く到達すると判定された帯域の合計数を算出して、それ
をチャネル間で比較する。この結果、この合計帯域数の
値が大きいチャネルのマイクロホンほど、音源に近いと
みなすことができる。そして、あるチャネルについて、
当該合計帯域数が予め設定した基準値ＴｈＰを越えたと
き、当該のチャネルのマイクロホンが司るゾーンに音源
があると判定する。Next, for each channel, the total number of bands in which that channel was judged to receive the signal earliest is calculated, and the totals are compared across channels. The larger this band count, the closer the microphone of that channel can be regarded as being to a sound source. When the band count of a channel exceeds a preset reference value ThP, it is determined that a sound source exists in the zone covered by the microphone of that channel.
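
The band counting just described might look like the following sketch. It assumes that arrival time differences are signed numbers relative to a reference microphone with smaller values meaning earlier arrival, and that ties go to the lowest channel index; both conventions, and the function names, are assumptions made for illustration.

```python
def fastest_band_counts(arrival):
    """Count, per channel, the bands in which the signal arrives earliest.

    arrival[ch][i] is the arrival time difference of band i in channel ch,
    measured against a common reference (smaller means earlier).
    """
    n_channels = len(arrival)
    n_bands = len(arrival[0])
    chi = [0] * n_channels
    for i in range(n_bands):
        # Assumption: a tie goes to the lowest channel index.
        earliest = min(range(n_channels), key=lambda ch: arrival[ch][i])
        chi[earliest] += 1
    return chi


def zones_with_source(chi, th_p):
    """Channels whose band count exceeds ThP, i.e. zones judged to hold a source."""
    return [ch for ch, count in enumerate(chi) if count > th_p]
```

With channel 0 as the reference (all zeros) and four bands, a second channel that is earliest in three bands gets a count of 3 and is the only one reported for `th_p = 2`.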

【００６７】また、この音源状態判定部１１０には、全
帯域レベル検出部６０で検出された各チャネルのレベル
Ｐ（Ｓ１）〜Ｐ（Ｓ３）も入力され、あるチャネルのレ
ベルが予め設定した基準値ＴｈＲ以下の場合には、その
チャネルのマイクロホンが司るゾーンには、音源がない
と判定する。いま図１２に示したように音源Ａ，Ｂに対
し、マイクロホンＭ１〜Ｍ３を配置したとする。またマ
イクロホンＭ１のチャネルに対する前記した合計帯域数
をχ１、マイクロホンＭ２，Ｍ３の各チャネルに対する
合計帯域数をそれぞれχ２，χ３とする。The levels P(S1) to P(S3) of the channels detected by the all-band level detection unit 60 are also input to the sound source state determination unit 110, and when the level of a channel is at or below a preset reference value ThR, it is determined that there is no sound source in the zone covered by the microphone of that channel. Suppose now that the microphones M1 to M3 are arranged with respect to the sound sources A and B as shown in FIG. 12. Let χ1 be the total band count described above for the channel of microphone M1, and let χ2 and χ3 be the total band counts for the channels of microphones M2 and M3, respectively.

【００６８】この場合も図１３に示した処理手順と同様
にすればよい。即ち、まず、全帯域レベル検出部６０の
検出信号Ｐ（Ｓ１）〜Ｐ（Ｓ３）のすべてが基準値Ｔｈ
Ｒよりも小さいとき（１０１）、音源Ａ，Ｂは発音して
いないと見なし、制御信号ＳＡＢｉを生成して（１０
２）、両音源信号ＳＡ，ＳＢを抑圧する。このとき、出
力信号ＳＡ′，ＳＢ′は無音信号となる。In this case too, the processing procedure shown in FIG. 13 can be followed. First, when all of the detection signals P(S1) to P(S3) of the all-band level detection unit 60 are smaller than the reference value ThR (101), the sound sources A and B are regarded as not sounding, and the control signal SABi is generated (102) to suppress both sound source signals SA and SB. The output signals SA′ and SB′ are then silent.

【００６９】音源Ａのみが発音しているときは、その音
源信号のすべての帯域の周波数成分がマイクロホンＭ２
へ一番速く到達するので、このマイクロホンＭ２のチャ
ネルの合計帯域数χ２が最も多くなる。また、音源Ｂの
みが発音しているときは、その音源信号のすべての帯域
の周波数成分がマイクロホンＭ３へ一番速く到達するの
で、このマイクロホンＭ３のチャネルの合計帯域数χ３
が最も多くなる。When only sound source A is sounding, the frequency components of all bands of its signal reach microphone M2 earliest, so the total band count χ2 of the microphone M2 channel is the largest. Likewise, when only sound source B is sounding, the frequency components of all bands of its signal reach microphone M3 earliest, so the total band count χ3 of the microphone M3 channel is the largest.

【００７０】さらに、音源Ａ，Ｂが共に発音している場
合には、音源信号が最も速く到達する帯域数がマイクロ
ホンＭ２とＭ３で拮抗する。したがって、前記した基準
値ＴｈＰにより、音源信号があるマイクロホンへ最も速
く到達する合計帯域数が、当該設定値ＴｈＰを越えた場
合、当該マイクロホンが司るゾーンに音源が存在し、そ
の音源が発音していると判定する。Further, when sound sources A and B are both sounding, the numbers of bands in which the signal arrives earliest are roughly balanced between microphones M2 and M3. Therefore, using the reference value ThP described above, when the total number of bands in which the signal reaches a given microphone earliest exceeds ThP, it is determined that a sound source exists in the zone covered by that microphone and that this sound source is sounding.

【００７１】上記の例では、音源Ａのみが発音している
ときは、χ２のみが基準値ＴｈＰを越えて（図３の１０
３）、音響を発生している音源が存在するのはマイクロ
ホンＭ２が司るゾーンＺ３であると検出されるので、制
御信号ＳＢｉが生成され（１０４）、音響信号ＳＢが抑
制され、信号ＳＡのみが出力される。また、音源Ｂのみ
が発音しているときは、χ３のみが基準値ＴｈＰを越え
（１０５）、音を発している音源が存在するのは、マイ
クロホンＭ３が司るゾーンＺ４であると検出されるの
で、制御信号ＳＡｉが生成され（１０６）信号ＳＡが抑
制されて、信号ＳＢのみが出力される。In the above example, when only sound source A is sounding, only χ2 exceeds the reference value ThP (103 in FIG. 3), and the sounding source is detected to be in zone Z3, which microphone M2 covers. The control signal SBi is therefore generated (104), the sound signal SB is suppressed, and only the signal SA is output. When only sound source B is sounding, only χ3 exceeds the reference value ThP (105), and the sounding source is detected to be in zone Z4, which microphone M3 covers. The control signal SAi is therefore generated (106), the signal SA is suppressed, and only the signal SB is output.

【００７２】この例ではＴｈＰは例えばｎ／３程度に設
定され、音源Ａ，Ｂが共に発音していて、χ２，χ３と
もに基準値ＴｈＰを越えることがある。この場合は図１
３の処理手順に示すように一方の音源、この例ではＡを
優先させ、音源Ａへ分離信号のみを出力させることもで
きる。また、χ２，χ３が共に基準値ＴｈＰに達してい
ない場合は、レベルＰ（Ｓ１）〜Ｐ（Ｓ３）が基準値Ｔ
ｈＲを越えている限り、両音源Ａ，Ｂともに発音してい
ると判断し、制御信号ＳＡｉ，ＳＢｉ，ＳＡＢｉは出力
せず（図３の１０７）音声抑圧部９０では音声信号Ｓ
Ａ，ＳＢに対する抑圧は行われない。In this example ThP is set to about n/3, for instance, so when sound sources A and B are both sounding, χ2 and χ3 may both exceed the reference value ThP. In that case, as shown in the processing procedure of FIG. 13, one sound source (A in this example) may be given priority so that only the separated signal for sound source A is output. When neither χ2 nor χ3 reaches the reference value ThP, both sound sources A and B are judged to be sounding as long as the levels P(S1) to P(S3) exceed the reference value ThR; none of the control signals SAi, SBi, SABi is output (107 in FIG. 3), and the suppression unit 90 does not suppress the signals SA and SB.

【００７３】図１２に示した状態に対して図１４に示す
ように音源ＣをゾーンＺ６に加えた場合、図示しないが
音源分離部８０からは、音源Ａに対応する信号ＳＡ、音
源Ｂに対応する信号ＳＢの他に、音源Ｃに対応する信号
ＳＣが出力する。これと対応して音源状態判定部１１０
から、信号ＳＡを抑圧する制御信号ＳＡｉ、信号ＳＢを
抑圧する制御信号ＳＢｉの他に、信号ＳＣを抑圧する制
御信号ＳＣｉが出力し、また、信号ＳＡとＳＢを抑圧す
る制御信号ＳＡＢｉの他に、信号ＳＢとＳＣを抑圧する
制御信号ＳＢＣｉ、信号ＳＣとＳＡを抑圧する制御信号
ＳＣＡｉ、信号ＳＡ，ＳＢ，ＳＣの全部を抑圧する制御
信号ＳＡＢＣｉが出力する。そして、この音源状態判定
部１１０は先に述べた図１５に示したと同様の処理を行
う。When a sound source C is added in zone Z6 as shown in FIG. 14, relative to the state shown in FIG. 12, the sound source separation unit 80 outputs (not illustrated) a signal SC corresponding to sound source C in addition to the signal SA corresponding to sound source A and the signal SB corresponding to sound source B. Correspondingly, the sound source state determination unit 110 outputs a control signal SCi that suppresses the signal SC in addition to the control signal SAi that suppresses the signal SA and the control signal SBi that suppresses the signal SB. In addition to the control signal SABi that suppresses the signals SA and SB, it also outputs a control signal SBCi that suppresses the signals SB and SC, a control signal SCAi that suppresses the signals SC and SA, and a control signal SABCi that suppresses all of the signals SA, SB, and SC. The sound source state determination unit 110 then performs the same processing as shown in FIG. 15 described earlier.

【００７４】まず、レベルＰ（Ｓ１）〜Ｐ（Ｓ３）の全
部が基準値ＴｈＲを越えていない場合は、いずれの音源
Ａ〜Ｃも発音していないものと判断して、音源状態判定
部１１０からはＳＡＢＣｉが出力して、信号ＳＡ，Ｓ
Ｂ，ＳＣのいずれもが抑圧される。次に、音源Ａ，Ｂ，
Ｃがそれぞれ単独で発音している場合には、前記した音
源が２個の場合と同様に、その音源に最も近いマイクロ
ホンのチャネルの到達時間が最も速くなるので、そのチ
ャネルの帯域数χ１，χ２，χ３のいずれかが基準値Ｔ
ｈＰを越える。そして、音源Ｃのみが発音している場合
は、制御信号ＳＡＢｉが出力して信号ＳＡ，ＳＢが抑圧
される。また、音源Ａのみが発音している場合は、制御
信号ＳＢＣｉが出力して信号ＳＢ，ＳＣが抑圧される。
さらに、音源Ｂのみが鳴っている場合は、制御信号ＳＡ
Ｃｉが出力して信号ＳＡ，ＳＣが抑圧される（図１５の
２０３〜２０８）。First, when none of the levels P(S1) to P(S3) exceeds the reference value ThR, it is judged that none of the sound sources A to C is sounding, the sound source state determination unit 110 outputs SABCi, and all of the signals SA, SB, and SC are suppressed. Next, when one of the sound sources A, B, C is sounding alone, the signal arrives earliest at the channel of the microphone closest to that sound source, as in the two-source case described above, so one of the band counts χ1, χ2, χ3 exceeds the reference value ThP. When only sound source C is sounding, the control signal SABi is output and the signals SA and SB are suppressed. When only sound source A is sounding, the control signal SBCi is output and the signals SB and SC are suppressed. When only sound source B is sounding, the control signal SACi is output and the signals SA and SC are suppressed (203 to 208 in FIG. 15).

【００７５】次に、３つの音源Ａ〜Ｃのうちのいずれか
２つが発音している場合は、発音していない音源に対応
するゾーンにあるマイクロホンの到達時間の最も速い帯
域数が、他のマイクロホンのものに比べて小さくなる。
例えば、音源Ｃのみが鳴っていない場合には、マイクロ
ホンＭ１への到達時間が最も速い帯域数χ１が、他の２
個のマイクロホンＭ２，Ｍ３の帯域数χ２，χ３に比べ
て小さくなる。Next, when any two of the three sound sources A to C are sounding, the number of bands in which the signal arrives earliest at the microphone in the zone of the non-sounding source is smaller than the corresponding counts of the other microphones. For example, when only sound source C is not sounding, the number of bands χ1 in which the signal arrives earliest at microphone M1 is smaller than the band counts χ2 and χ3 of the other two microphones M2 and M3.

【００７６】よって、予めある基準値ＴｈＱ（＜Ｔｈ
Ｐ）を設定し、χ１がその基準値ＴｈＱ以下になる場合
は、マイクロホンＭ１とマイクロホンＭ３で空間を２分
割したゾーンＺ５，Ｚ６の内、マイクロホンＭ１に近い
ゾーンＺ６では、音源は信号を発していないと判定し、
さらに、マイクロホンＭ１とＭ２で空間を２分割したゾ
ーンＺ１，Ｚ２のうちマイクロホンＭ１に近いゾーンＺ
１では音源は信号を発していないと判定する。Therefore, a reference value ThQ (< ThP) is set in advance. When χ1 is at or below ThQ, it is determined that no sound source is emitting a signal in zone Z6, the one closer to microphone M1 of the zones Z5 and Z6 obtained by dividing the space in two between microphones M1 and M3, and likewise that no sound source is emitting a signal in zone Z1, the one closer to microphone M1 of the zones Z1 and Z2 obtained by dividing the space in two between microphones M1 and M2.

【００７７】すなわち、ゾーンＺ１，Ｚ６にある音源は
信号を発していないと判定するのである。これらのゾー
ンにある音源は音源Ｃであることから、音源Ｃが信号を
発していないと判定される。つまり、音源Ａ，Ｂのみが
信号を発していると判定され、制御信号ＳＣｉが生成さ
れて信号ＳＣが抑圧される（図１５の２０９〜２１
０）。音源Ａのみ、音源Ｂのみがそれぞれ信号を発して
いないゾーンも、同様に判定される（図１５の２１１〜
２１４）。That is, it is determined that the sound sources in zones Z1 and Z6 are not emitting signals. Since the sound source in these zones is sound source C, sound source C is judged not to be emitting a signal. In other words, only sound sources A and B are judged to be emitting signals, the control signal SCi is generated, and the signal SC is suppressed (209 to 210 in FIG. 15). The cases in which only sound source A, or only sound source B, is not emitting a signal are determined in the same way (211 to 214 in FIG. 15).
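
The three-source decision sequence of FIG. 15, as described in the preceding paragraphs, can be summarized in a short sketch. It is an illustration under stated assumptions rather than the patent's implementation: the function name is hypothetical, the return value is the set of separated signals to suppress (named by sound source), and the mapping of χ1, χ2, χ3 to sound sources C, A, B follows the microphone arrangement described in the text.

```python
def fig15_signals_to_suppress(levels, chi, th_r, th_p, th_q):
    """Return the set of sources whose separated signals are suppressed.

    levels: (P(S1), P(S2), P(S3)); chi: (chi1, chi2, chi3).
    """
    source_of = ("C", "A", "B")  # chi1 -> M1 -> C, chi2 -> M2 -> A, chi3 -> M3 -> B
    all_sources = {"A", "B", "C"}
    # No channel level exceeds ThR: nothing is sounding (control signal SABCi).
    if all(p <= th_r for p in levels):
        return all_sources
    # Steps 203-208: one count exceeds ThP, so only that source is sounding.
    for count, src in zip(chi, source_of):
        if count > th_p:
            return all_sources - {src}
    # Steps 209-214: a count at or below ThQ marks the one silent source.
    for count, src in zip(chi, source_of):
        if count <= th_q:
            return {src}
    # Step 215: all three sources are sounding, no control signal.
    return set()
```

For example, with illustrative thresholds `th_p = 8` and `th_q = 2` (not values from the text), counts of (1, 6, 5) suppress only SC, which corresponds to the control signal SCi.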

【００７８】また、χ１，χ２，χ３がともに基準値Ｔ
ｈＱ以下でないと判定されると、音源Ａ，Ｂ，Ｃはその
全てが信号を発していると判定される（図１５の２１
５）。なお、以上の例では、ゾーンをＺ１〜Ｚ６の６つ
に分けたが、図１６に示したように、３つに分けても同
様に音源状態を判定できる。この場合は、例えば、音源
Ａのみが発音している場合は、マイクロホンＭ２のチャ
ネルの帯域数χ２が最も大きくなるので、そのマイクロ
ホンＭ２の司るゾーンＺ２に音源があると判定される。
また、音源Ｂのみが発音している場合はχ３が最も大き
くなり同様にゾーンＺ３に音源があると判定される。ま
た、χ１が予め設定した値ＴｈＱ以下である場合には、
マイクロホンＭ１とＭ３で空間を２分したうちのゾーン
Ｚ１にある音源は発音していないと判定し、同じくマイ
クロホンＭ１とＭ２で空間を分割したうちのゾーンＺ１
にある音源は信号を発していないと判定する。以上の処
理により、ゾーンを３分割しても、６分割したときと同
様に音源の状態を判定できる。When χ1, χ2, and χ3 are all found not to be at or below the reference value ThQ, all of the sound sources A, B, and C are judged to be emitting signals (215 in FIG. 15). In the above example the space was divided into six zones Z1 to Z6, but the sound source state can be determined in the same way with three zones, as shown in FIG. 16. In this case, for example, when only sound source A is sounding, the band count χ2 of the microphone M2 channel is the largest, so it is determined that a sound source exists in zone Z2, which microphone M2 covers. When only sound source B is sounding, χ3 is the largest and it is likewise determined that a sound source exists in zone Z3. When χ1 is at or below the preset value ThQ, it is determined that the sound source in zone Z1, on the M1 side of the division between microphones M1 and M3, is not sounding, and likewise that the sound source in zone Z1, on the M1 side of the division between microphones M1 and M2, is not emitting a signal. With this processing, the sound source state can be determined with three zones just as with six.

【００７９】以上の場合の基準値ＴｈＰ，ＴｈＱの設定
は、先の帯域レベルを利用する場合と同様に行えばよ
い。また、基準値ＴｈＲ，ＴｈＰ，ＴｈＱは、全てのマ
イクロホンＭ１〜Ｍ３で同一値を用いた場合で説明した
が、マイクロホン毎に適宜変更してもよい。また、以上
の説明では、音源が３個でマイクロホンが３個の場合に
ついてであったが、マイクロホンの個数は音源の個数と
同数以上であれば、同様に音源ゾーンを検出することが
できる。その処理手順は先に述べた帯域レベルを利用す
る場合と同様である。従って、例えば音源が４個の場合
に４個の内の３個の音源が発音しているとき（１個が無
音のとき）は、そのままとしても良いが、その３個の内
のより無音に近い１個も選別するには、基準値をＴｈＱ
からＴｈＳ（ＴｈＰ＞ＴｈＳ＞ＴｈＱ）に換え、図１５
の２１０，２１２，２１４の各々の次段に図１５の２０
９〜２１４と同様な処理部分を設けて、３個の内から１
個の無音の音源を判定することも同様である。In this case the reference values ThP and ThQ may be set in the same way as when band levels are used. The description used the same reference values ThR, ThP, and ThQ for all of the microphones M1 to M3, but they may be changed for each microphone as appropriate. The description also assumed three sound sources and three microphones, but sound source zones can be detected in the same way as long as the number of microphones is equal to or greater than the number of sound sources. The processing procedure is the same as when band levels are used, as described earlier. Accordingly, when there are four sound sources and three of them are sounding (one is silent), the result may be left as it is, but to also pick out the one of those three that is closest to silence, the reference value is changed from ThQ to ThS (ThP > ThS > ThQ) and a processing stage similar to 209 to 214 of FIG. 15 is placed after each of 210, 212, and 214 of FIG. 15 to identify the one silent sound source among the three, just as before.

【００８０】図１７に示した処理において、そのレベル
の代りに時間差を用いれば、図１８に示した到達時間差
を利用した不要信号の抑圧に、図１７に示した処理手順
も適用できる。上述においては各マイクロホンの出力チ
ャネル信号をまず帯域分割したが、帯域別レベルを利用
する場合はまず各チャネルのパワースペクトルを求めた
後、帯域分割してもよい。その例を図１９に図１、図１
１と対応する部分と同一符号を付けて示し、これらと異
なる部分のみを説明する。この例ではマイクロホン１，
２よりの各チャネル信号は、パワースペクトル分解部３
００により、例えば高速フーリエ変換によりパワースペ
クトルに変換され、その後、各チャネルごとに帯分割部
４で各帯域に分割され、各帯域ではほぼ１つの音源信号
のみが主として含まれるようにして帯域別レベルを得
る。この場合、音源信号選択部６０２へ供給する各帯域
別レベルは、その原スペクトルの位相成分も供給し、音
源信号合成部７で音源信号が再生できるようにする。If time differences are used in place of levels in the processing shown in FIG. 17, that procedure can also be applied to the suppression of unwanted signals using arrival time differences shown in FIG. 18. In the above, the output channel signal of each microphone was first divided into bands, but when band-specific levels are used, the power spectrum of each channel may be obtained first and then divided into bands. FIG. 19 shows such an example, with parts corresponding to those in FIG. 1 and FIG. 11 given the same reference numerals; only the differences are described. In this example, the channel signals from the microphones 1 and 2 are converted into power spectra by the power spectrum analysis unit 300, for example by a fast Fourier transform, and are then divided into bands for each channel by the band division unit 4 so that each band mainly contains essentially only one sound source signal, yielding the band-specific levels. In this case, the phase components of the original spectrum are supplied to the sound source signal selection unit 602 together with the band-specific levels, so that the sound source signal synthesis unit 7 can reproduce the sound source signals.

【００８１】また各帯域別レベルは帯域別チャネル間レ
ベル差検出部５と音源状態判定部７０とへ供給され、こ
れらの部分で図１、図１１で説明したように処理される
その他の動作は図１又は図１１の場合と同一である。図
２を参照して説明した実施例において、チャネル間時間
差を用いずに、各帯域分割信号ごとに、対応帯域別チャ
ネル間時間差のみを用いて、何れの音源から到来したか
を判定してもよい。また図５を参照して説明した実施例
において、チャネル間レベル差を用いずに、各帯域分割
信号ごとに、対応帯域別チャネル間レベル差のみを用い
て、何れの音源から到来したかを判定してもよい。図５
を参照した実施例におけるチャネル間レベル差の検出
は、対数レベルに変換する前のレベルを用いてもよい。
図１中の帯域分割部４、図１１、図１８中の各帯域分割
部４０、図２０中の帯域分割部２３３、図２１中の帯域
分割部２４１における各周波数帯域の分割は必ずしも同
一とする必要はない。要求される精度に応じて、これら
の分割数を互いに異ならせてもよい。図２０中の帯域分
割部２３３はその後の処理のために、その入力信号のパ
ワースペクトルを先ず求め、その後、複数の周波数帯域
に分割してもよい。The band-specific levels are also supplied to the band-specific inter-channel level difference detection unit 5 and the sound source state determination unit 70, where they are processed as described for FIG. 1 and FIG. 11; the other operations are the same as in FIG. 1 or FIG. 11. In the embodiment described with reference to FIG. 2, the source from which each band-divided signal arrived may be determined using only the band-specific inter-channel time difference of the corresponding band, without using the inter-channel time difference. Similarly, in the embodiment described with reference to FIG. 5, the source from which each band-divided signal arrived may be determined using only the band-specific inter-channel level difference of the corresponding band, without using the inter-channel level difference. The inter-channel level difference in the embodiment of FIG. 5 may be detected using the levels before conversion to logarithmic levels. The frequency band divisions in the band division unit 4 of FIG. 1, the band division units 40 of FIG. 11 and FIG. 18, the band division unit 233 of FIG. 20, and the band division unit 241 of FIG. 21 need not be identical; the numbers of bands may differ from one another according to the required accuracy. The band division unit 233 of FIG. 20 may first obtain the power spectrum of its input signal for the subsequent processing and then divide it into a plurality of frequency bands.

【００８２】以下に図６〜９に示したこの発明を適用し
た実験例を示す。図２０に示す３種類の２音源信号の組
み合わせにこの発明を適用し、その際に帯域分割部４で
与える周波数分解能を変化させ、分離信号を物理的、及
び主観的に評価した。分離処理前の混合信号は、チャネ
ル間時間差及びレベル差のみを計算機上で与えて加算す
ることにより作成した。与えたチャネル間時間差、レベ
ル差はそれぞれ、０．４７ｍｓ、２ｄＢである。An experimental example applying the invention shown in FIGS. 6 to 9 is described below. The invention was applied to the three combinations of two sound source signals shown in FIG. 20, the frequency resolution of the band division unit 4 was varied, and the separated signals were evaluated both physically and subjectively. The mixed signals before separation were created on a computer by applying only an inter-channel time difference and level difference and adding the signals. The applied inter-channel time difference and level difference were 0.47 ms and 2 dB, respectively.
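
One way such a two-channel mixture could be generated is sketched below. The text gives only the time difference (0.47 ms) and level difference (2 dB); everything else here is an assumption made for illustration: the function name, the sampling rate passed as `fs`, rounding the delay to whole samples, treating source A as nearer the left channel and source B as nearer the right, and applying the delay and attenuation to the far channel.

```python
def mix_two_sources(src_a, src_b, fs, delay_s=0.47e-3, level_db=2.0):
    """Build (left, right) channel signals from two equal-length source signals.

    Each source reaches its far channel delayed by delay_s and attenuated
    by level_db; the near channel receives it unchanged.
    """
    d = round(delay_s * fs)              # delay in whole samples (assumption)
    g = 10.0 ** (-level_db / 20.0)       # 2 dB attenuation is a gain of about 0.794
    n = len(src_a)

    def far(x, t):
        # Delayed, attenuated copy; zero before the delayed signal starts.
        return g * x[t - d] if t >= d else 0.0

    left = [src_a[t] + far(src_b, t) for t in range(n)]
    right = [far(src_a, t) + src_b[t] for t in range(n)]
    return left, right
```

At `fs = 100000`, for example, the delay is 47 samples, so a unit impulse in source A appears at sample 0 of the left channel and, scaled by about 0.794, at sample 47 of the right channel.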

【００８３】帯域分割部４の周波数分解能は、約５Ｈ
ｚ，１０Ｈｚ，２０Ｈｚ，４０Ｈｚ，８０Ｈｚの５種類
とした。これらの分解能で分離した信号と、原信号（Ｏ
Ｓ）の計６種類の信号について評価した。なお、信号帯
域は約５ｋＨｚである。定量的評価を次のように行っ
た。混合された信号の分離が完全に行われた場合、原信
号と分離信号が等しくなる。すなわち、相関係数が１と
なる。そこで、分離度を計る物理量として、各音につい
て原信号と処理後の信号との相関係数を算出した。The frequency resolution of the band division unit 4 was set to five values: about 5 Hz, 10 Hz, 20 Hz, 40 Hz, and 80 Hz. A total of six kinds of signal were evaluated: the signals separated at these resolutions and the original signal (OS). The signal bandwidth is about 5 kHz. The quantitative evaluation was performed as follows. If the mixed signal is separated perfectly, the separated signal equals the original signal, that is, the correlation coefficient is 1. The correlation coefficient between the original signal and the processed signal was therefore calculated for each sound as a physical measure of the degree of separation.
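
The text does not say exactly how the correlation coefficient was computed. A minimal sketch, assuming the ordinary zero-lag Pearson correlation between two equal-length sample sequences, is:

```python
import math


def correlation_coefficient(original, separated):
    """Pearson correlation between two equal-length, non-constant signals."""
    n = len(original)
    mean_x = sum(original) / n
    mean_y = sum(separated) / n
    # Covariance and variances, each without the common 1/n factor.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(original, separated))
    sxx = sum((x - mean_x) ** 2 for x in original)
    syy = sum((y - mean_y) ** 2 for y in separated)
    return sxy / math.sqrt(sxx * syy)
```

A separated signal that is an exact scaled copy of the original gives a value of 1, matching the statement that perfect separation corresponds to a correlation coefficient of 1.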

【００８４】結果を、図２２に破線で示す。音声は、い
ずれの組み合わせについても、周波数分解能が８０Ｈｚ
になると相関値がかなり低くなるが、それ以外の分解能
の場合は顕著な差が見られなかった。鳥の鳴き声につい
ては今回用いた周波数分解能の間に顕著な差は見られな
かった。主観評価を次のように行った。The results are shown by the broken lines in FIG. 22. For speech, in every combination the correlation dropped considerably at a frequency resolution of 80 Hz, while no marked difference was seen among the other resolutions. For the bird call, no marked difference was seen among the frequency resolutions used here. The subjective evaluation was performed as follows.

【００８５】被験者は、正常な聴力を持つ２０代から３
０代の日本人５人とした。各音源について、５種類の周
波数分解能の分離音と原音をランダムにヘッドホンでダ
イオティックに提示し、音質について５段階で評価させ
た。一つの音の提示時間は約４秒間であった。結果を、
図２２に実線で示す。分離音Ｓ１については周波数分解
能１０Ｈｚの場合が一番評価が高い。また、全ての条件
に対する評価の間に有意差（α＜０．０５）が存在し
た。分離音Ｓ２〜４、６については周波数分解能２０Ｈ
ｚの評価が最も高いが、２０Ｈｚと１０Ｈｚとの間には
有意差はなかった。また、２０Ｈｚの音と５Ｈｚ，４０
Ｈｚ，８０Ｈｚの間にはそれぞれ有意差が存在した。こ
れらの結果から、音声については分離する組み合わせの
種類によらず、最適な周波数分解能が存在することが分
かった。この実験の場合は２０Ｈｚもしくは１０Ｈｚ程
度が最適な値である。分離音Ｓ５（鳥の鳴き声）につい
ては４０Ｈｚの場合が最も評価が高いが有意差は４０Ｈ
ｚと５Ｈｚ，２０Ｈｚと５Ｈｚの間にしか存在しなかっ
た。なお、いずれの場合についても、分離処理後の音と
原音の間には有意差が存在した。The subjects were five Japanese listeners in their twenties and thirties with normal hearing. For each sound source, the separated sounds at the five frequency resolutions and the original sound were presented diotically over headphones in random order, and the subjects rated the sound quality on a five-point scale. Each sound was presented for about 4 seconds. The results are shown by the solid lines in FIG. 22. For the separated sound S1, the rating was highest at a frequency resolution of 10 Hz, and there were significant differences (α < 0.05) between the ratings for all conditions. For the separated sounds S2 to S4 and S6, the rating was highest at a frequency resolution of 20 Hz, but there was no significant difference between 20 Hz and 10 Hz, while there were significant differences between 20 Hz and each of 5 Hz, 40 Hz, and 80 Hz. These results show that, for speech, an optimum frequency resolution exists regardless of the combination being separated; in this experiment the optimum is about 20 Hz or 10 Hz. For the separated sound S5 (bird call), the rating was highest at 40 Hz, but significant differences existed only between 40 Hz and 5 Hz and between 20 Hz and 5 Hz. In every case there was a significant difference between the separated sound and the original sound.

【００８６】図２１、図２３にこの発明の効果を示す。
図２１は、分離処理前の男声と女声の混合音声のスペク
トル２０１とこの発明による分離処理後の男声Ｓ１、女
声Ｓ２の各スペクトル２０２，２０３を表す。図２３
は、分離処理前の男声Ｓ１、女声Ｓ２の各原音声の各波
形をＡ，Ｂに、混合音声波形をＣに、分離処理後の男声
Ｓ１、女声Ｓ２の各波形をＤ，Ｅにそれぞれ示す。図２
１からは、不要な成分が抑圧されていることが分かる。
さらに、図２３からは、分離処理後の音声が原音声と同
程度の品質で復元されていることが分かる。FIGS. 21 and 23 show the effect of the present invention. FIG. 21 shows the spectrum 201 of the mixed male and female voices before separation, and the spectra 202 and 203 of the male voice S1 and the female voice S2 after separation according to the present invention. FIG. 23 shows the waveforms of the original male voice S1 and female voice S2 before separation in A and B, the mixed voice waveform in C, and the waveforms of the male voice S1 and female voice S2 after separation in D and E, respectively. FIG. 21 shows that unnecessary components are suppressed. FIG. 23 further shows that the separated voices are restored with about the same quality as the original voices.

【００８７】帯域分割の分解能は音声の場合、１０〜２
０Ｈｚ程度が好ましく、５Ｈｚ以下、５０Ｈｚ以上は好
ましくない。帯域分割の手法はフーリエ変換に限らず、
帯域フィルタにより分割してもよい。次に図１１に示し
たレベル差を利用して音源状態を判定して信号抑圧部９
０で信号抑圧を行う場合の実験例を示す。２個のマイク
ロホンを用い、２つの音源Ａ，Ｂをダミーヘッドから距
離１．５ｍ、角度差９０度（２つのマイクロホンの中点
に対し右４５度、左４５度）の位置に置き、同一の音圧
レベルで、残響時間０．２ｓ（５００Ｈｚ）の可変残響
室内で収音した。用いた混合音と分離音の組み合せは図
２２中のＳ１〜Ｓ４である。For speech, the resolution of the band division is preferably about 10 to 20 Hz; 5 Hz or less and 50 Hz or more are not preferable. The band division method is not limited to the Fourier transform, and the division may also be performed with band-pass filters. Next, an experimental example is described in which the sound source state is determined using the level difference as shown in FIG. 11 and signal suppression is performed by the signal suppression unit 90. Using two microphones, two sound sources A and B were placed 1.5 m from the dummy head with an angular separation of 90 degrees (45 degrees to the right and 45 degrees to the left of the midpoint between the two microphones), and sound was picked up at the same sound pressure level in a variable reverberation room with a reverberation time of 0.2 s (at 500 Hz). The combinations of mixed sounds and separated sounds used are S1 to S4 in FIG. 22.

【００８８】分離音声Ｓ１〜Ｓ４について、無音と判定
されたフレームの個数と、原音の無音フレームの個数の
比率を算した。その結果は次の通り９０％以上正しく検
出された。男（S1）女(S2) 女声１(S3) 女声２(S4) 検出率９９％９３％９２％９５％図６〜９に示した基本方法と図１１に示した改良方法と
のそれぞれで分離した音をランダムにヘッドホンでダイ
オティックに提示し、雑音の交じり具合の少なさと不連
続感の少なさについて評価させた。用いた分離音は前記
Ｓ１〜Ｓ４であり、被験者は正常な聴力を持つ２０代か
ら３０代の日本人５名である。一つの音の提示時間は約
４秒間、各音の試行回数は３回である。その結果、雑音
の交じり具合が少ないと評価した率は改良方法が９１．
７％、基本方法は８．３％で、改良方法が少ないと判断
した回答が格段と多かった。一方不連続感が少ないにつ
いては改良方法は２０．０％、基本方法が８０．０％で
基本方法の方が少ないと判断する回答が多かったが、改
良方法との間に有意な差は見られなかった。For the separated voices S1 to S4, the ratio of the number of frames judged to be silent to the number of silent frames in the original sound was calculated. As shown below, more than 90% were detected correctly in every case.

Sound: Male (S1) | Female (S2) | Female voice 1 (S3) | Female voice 2 (S4)
Detection rate: 99% | 93% | 92% | 95%

The sounds separated by the basic method shown in FIGS. 6 to 9 and by the improved method shown in FIG. 11 were presented diotically over headphones in random order, and the subjects were asked to judge which had less noise contamination and which had less sense of discontinuity. The separated sounds used were S1 to S4 above, and the subjects were five Japanese people in their twenties and thirties with normal hearing. Each sound was presented for about 4 seconds, with three trials per sound. As a result, for less noise contamination, the improved method was chosen in 91.7% of responses and the basic method in 8.3%, so far more responses judged the improved method to have less noise. For less sense of discontinuity, the improved method was chosen in 20.0% of responses and the basic method in 80.0%, so more responses judged the basic method to have less discontinuity, but no significant difference from the improved method was found.

【００８９】次に分離性能を相対評価を行うため、以下
の５種類の音の分離度の比較を主観評価により行った。（１）原音（２）基本法（計算機）：チャネル間時間差（０．４７
ｍｓ）、レベル差（２ｄＢ）を与えて計算機上で加算し
た混合信号を、基本方法で分離した音。（３）改良法（実環境）：先の無音区間検出率の実験に
用いた条件で収音した混合音を改良方法で分離した音。（４）基本法（実環境）：先の無音区間検出率の実験に
用いた条件で収音した混合音を基本方法で分離した音。（５）混合音：先の無音区間検出率の実験に用いた条件
で収音した混合音。Next, to evaluate the separation performance in relative terms, the degree of separation of the following five types of sound was compared by subjective evaluation.
(1) Original sound.
(2) Basic method (computer): a sound obtained by applying an inter-channel time difference (0.47 ms) and level difference (2 dB), summing on a computer to form a mixed signal, and separating that signal by the basic method.
(3) Improved method (real environment): a sound obtained by separating, with the improved method, a mixed sound picked up under the conditions used in the preceding silent-section detection rate experiment.
(4) Basic method (real environment): a sound obtained by separating, with the basic method, a mixed sound picked up under the conditions used in the preceding silent-section detection rate experiment.
(5) Mixed sound: a mixed sound picked up under the conditions used in the preceding silent-section detection rate experiment.
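As an illustration of condition (2), the following sketch builds a two-channel mixture on a computer from two source signals by applying an inter-channel time difference and level difference and summing. It is only a sketch under assumptions not stated in the text: the delay is rounded to whole samples, each source is assumed to arrive earlier and louder at "its own" microphone by the same amounts, and the function name `make_two_channel_mix` and the sampling rate argument `fs` are hypothetical.

```python
def make_two_channel_mix(src_a, src_b, fs, dt=0.47e-3, dl_db=2.0):
    """Mix two sources into (left, right) channel sample lists.

    Source A reaches the left channel first and louder; source B reaches
    the right channel first and louder (a symmetry assumed for this sketch).
    """
    delay = round(dt * fs)              # time difference in whole samples
    gain = 10.0 ** (-dl_db / 20.0)      # attenuation of the far channel
    n = max(len(src_a), len(src_b)) + delay
    left = [0.0] * n
    right = [0.0] * n
    for k, v in enumerate(src_a):
        left[k] += v                    # direct path to the near microphone
        right[k + delay] += gain * v    # delayed, attenuated path to the far one
    for k, v in enumerate(src_b):
        right[k] += v
        left[k + delay] += gain * v
    return left, right
```

Because the mixture is generated rather than recorded, it contains no reverberation, which is consistent with this condition being rated above the real-environment conditions in the results that follow.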

【００９０】図２０中の最初の２つの混合音に対し、
“原音”上記（１）〜（４）の方法で処理した音、“混
合音”の計２０種類をランダムにヘッドホンでダイオテ
ィックに提示し、分離度について７段階で評価させた。
つまり「最も分離されている」を７点、「最も分離され
ていない」を１点とした。被験者、音の提示時間及び試
行回数は、前記雑音の交じり具合の少なさの評価の場合
と同一である。For the first two mixed sounds in FIG. 20, a total of 20 sounds, consisting of the "original sound", the sounds processed by methods (1) to (4) above, and the "mixed sound", were presented diotically over headphones in random order, and the degree of separation was rated on a seven-point scale, with "most separated" given 7 points and "least separated" given 1 point. The subjects, the presentation time of each sound, and the number of trials were the same as in the evaluation of noise contamination described above.

【００９１】この結果を図２４中で、全音源（Ｓ０）を
Ａに、男声（Ｓ１）をＢに、女声（Ｓ２）をＣに、女声
１（Ｓ３）をＤに、女声２（Ｓ４）をＥにそれぞれ示
す。全音源について分析した結果（Ｓ０）と、音源の種
類毎に分析した結果（Ｓ１）〜（Ｓ４）とは、ほぼ同じ
傾向を示した。Ｓ０〜Ｓ４全ての場合について、
“（１）原音”、“（２）基本法（計算機）”、
“（３）改良法（実環境）”、“（４）基本法（実環
境）”、“（５）混合音”の順に分離精度が高い。つま
り実環境では改良方法の方が基本方法より優れている。The results are shown in FIG. 24: all sound sources (S0) in A, the male voice (S1) in B, the female voice (S2) in C, female voice 1 (S3) in D, and female voice 2 (S4) in E. The result of the analysis over all sound sources (S0) and the results of the analyses for each type of sound source (S1) to (S4) showed almost the same tendency. In all cases S0 to S4, the separation accuracy was highest for "(1) Original sound" and decreased in the order "(2) Basic method (computer)", "(3) Improved method (real environment)", "(4) Basic method (real environment)", and "(5) Mixed sound". That is, in a real environment the improved method is superior to the basic method.

【００９２】[0092]

【発明の効果】以上述べたようにこの発明によれば複数
のマイクロホンからの各チャネル信号を、主な成分が１
つの音源信号の成分のみからなる程度に複数の帯域に分
割し、これら各同一帯域について、レベル、到達時間を
検出し、これらから、各帯域ごとに何れの音源信号かを
判定分離することにより、各音源信号を正しく分離する
ことができ、しかも実時間での処理が可能である。[Effects of the Invention] As described above, according to the present invention, each channel signal from a plurality of microphones is divided into a plurality of bands finely enough that the main component of each band consists of the component of only one sound source signal, the level and arrival time are detected for each same band, and from these it is determined for each band which sound source signal the band belongs to, and the bands are separated accordingly. Each sound source signal can thereby be separated correctly, and processing in real time is possible.

【００９３】特に発音していない音源を検出し、その成
分を抑圧することにより、部屋内のような回り込みや、
残響がある場所でも、正確に分離することができる。In particular, by detecting a sound source that is not producing sound and suppressing its components, accurate separation is possible even in places where sound wraps around or reverberates, such as inside a room.

【図面の簡単な説明】[Brief description of the drawings]

【図１】この発明の音源分離装置の実施例の機能構成を
示すブロック図。FIG. 1 is a block diagram showing a functional configuration of an embodiment of a sound source separation device of the present invention.

【図２】この発明の音源分離方法の実施例の処理手順を
示す流れ図。FIG. 2 is a flowchart showing a processing procedure of an embodiment of a sound source separation method according to the present invention.

【図３】図２中のチャネル間時間差Δτ₁，Δτ₂を求
める処理手順の例を示す流れ図。FIG. 3 is a flowchart showing an example of a processing procedure for obtaining the inter-channel time differences Δτ₁ and Δτ₂ in FIG. 2.

【図４】Ａ，Ｂはそれぞれ二つの音源信号のスペクトル
の例を示す図である。FIGS. 4A and 4B are diagrams illustrating examples of spectra of two sound source signals, respectively.

【図５】この発明の音源分離方法で、チャネル間レベル
差を利用して音源分離を行う実施例の処理手順を示す流
れ図。FIG. 5 is a flowchart showing a processing procedure of an embodiment in which sound source separation is performed using a level difference between channels in the sound source separation method of the present invention.

【図６】この発明音源分離方法で、チャネル間レベル差
と、チャネル間到達時間差を利用する実施例の処理手順
の一部を示す流れ図。FIG. 6 is a flowchart showing a part of a processing procedure of an embodiment using an inter-channel level difference and an inter-channel arrival time difference in the sound source separation method of the present invention.

【図７】図６中のステップＳ０８の続きを示す流れ図。FIG. 7 is a flowchart showing a continuation of step S08 in FIG. 6;

【図８】図６中のステップＳ０９の続きを示す流れ図。FIG. 8 is a flowchart showing a continuation of step S09 in FIG. 6;

【図９】図６中のステップＳ１０、図７、図８中のステ
ップＳ２０，Ｓ３０の続きを示す流れ図。FIG. 9 is a flowchart showing a continuation of step S10 in FIG. 6 and steps S20 and S30 in FIGS. 7 and 8;

【図１０】周波数帯域が異なる音源信号を分離する実施
例の機能構成を示すブロック図。FIG. 10 is a block diagram showing a functional configuration of an embodiment for separating sound source signals having different frequency bands.

【図１１】レベル差を利用して不要音源信号を抑圧する
構成を付加したこの発明の音源分離装置の実施例の機能
構成を示すブロック図。FIG. 11 is a block diagram showing a functional configuration of an embodiment of the sound source separation apparatus according to the present invention to which a structure for suppressing an unnecessary sound source signal by using a level difference is added.

【図１２】３つのマイクロホンとその受けもつゾーン
と、２つの音源の配置例を示す図。FIG. 12 is a diagram showing an example of the arrangement of three microphones, their zones, and two sound sources.

【図１３】発音している音源が１つの場合の音源ゾーン
の検出と、抑圧制御信号の生成処理手順の例を示す流れ
図。FIG. 13 is a flowchart showing an example of a procedure for detecting a sound source zone and generating a suppression control signal when only one sound source is sounding;

【図１４】３つのマイクロホンと、その受けもつゾーン
と、３つの音源の配置例を示す図。FIG. 14 is a diagram showing an example of the arrangement of three microphones, their zones, and three sound sources.

【図１５】音源が３つの場合の発音音源のゾーン検出
と、抑圧制御信号の生成処理手順の例を示す流れ図。FIG. 15 is a flowchart illustrating an example of a procedure for detecting a zone of a sound source and generating a suppression control signal when there are three sound sources;

【図１６】３つのマイクロホンによりゾーンを３つに分
割した例と、音源の配置例を示す図。FIG. 16 is a diagram showing an example in which a zone is divided into three by three microphones, and an example of arrangement of sound sources.

【図１７】この発明の音源分離装置において、発音して
いない合成音源信号を抑圧する制御信号を生成するため
の処理手順の例を示す流れ図。FIG. 17 is a flowchart showing an example of a processing procedure for generating a control signal for suppressing a synthesized sound source signal that is not sounding in the sound source separation device of the present invention.

【図１８】到達時間差を利用して不要音源信号を抑圧す
る構成を付加したこの発明の音源分離装置の実施例の機
能構成を示すブロック図。FIG. 18 is a block diagram showing a functional configuration of an embodiment of a sound source separation apparatus according to the present invention to which a structure for suppressing an unnecessary sound source signal by using a difference in arrival time is added.

【図１９】この発明音源分離装置で、パワースペクトル
を求めた後、帯域分割を行う場合の実施例の機能構成を
示すブロック図。FIG. 19 is a block diagram showing a functional configuration of an embodiment in which band division is performed after obtaining a power spectrum in the sound source separation device of the present invention.

【図２０】この発明の実験に用いた音源の種類を示す
図。FIG. 20 is a diagram showing types of sound sources used in an experiment of the present invention.

【図２１】図６〜図９に示した実施例の方法による処理
前と、処理後の音声スペクトルを示す図。FIG. 21 is a diagram showing a speech spectrum before and after processing according to the method of the embodiment shown in FIGS. 6 to 9;

【図２２】図６〜図９に示した実施例の方法を用いた主
観評価実験の結果を示す図。FIG. 22 is a diagram showing the results of a subjective evaluation experiment using the method of the embodiment shown in FIGS. 6 to 9;

【図２３】図６〜図９に示した実施例の方法により処理
した処理後の音声波形と、その原音声波形を示す図。FIG. 23 is a diagram showing a processed audio waveform processed by the method of the embodiment shown in FIGS. 6 to 9 and its original audio waveform.

【図２４】図６〜図９に示した音源分離方法と図１１に
示した音源分離装置とについての実験結果を示す図。FIG. 24 is a view showing experimental results of the sound source separation method shown in FIGS. 6 to 9 and the sound source separation device shown in FIG. 11;

フロントページの続き (51)Int.Cl.⁷ 識別記号 ＦＩ Ｇ１０Ｌ 21/02 Ｇ１０Ｌ 9/00 Ｈ Ｈ０４Ｓ 7/00 (72)発明者 西野豊 東京都新宿区西新宿三丁目19番２号 日本電信電話株式会社内 (56)参考文献 特開平７−168586（ＪＰ，Ａ) 特開平５−344011（ＪＰ，Ａ) 米国特許5610991（ＵＳ，Ａ) (58)調査した分野(Int.Cl.⁷，ＤＢ名) H04R 3/00 320 G01S 5/18 G10L 11/00 G10L 15/20 G10L 19/00 G10L 21/02 H04S 7/00
Continued from the front page
(51) Int. Cl.⁷, identification code, FI: G10L 21/02; G10L 9/00 H; H04S 7/00
(72) Inventor: Yutaka Nishino, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(56) References: JP-A-7-168586 (JP, A); JP-A-5-344011 (JP, A); U.S. Pat. No. 5,610,991 (US, A)
(58) Fields searched (Int. Cl.⁷, DB name): H04R 3/00 320; G01S 5/18; G10L 11/00; G10L 15/20; G10L 19/00; G10L 21/02; H04S 7/00

Claims

(57)【特許請求の範囲】(57) [Claims]

【請求項１】互いに離して設けられた複数のマイクロ
ホンを用いて、複数の音源から少なくとも１つの音源を
分離する音源分離方法であって、上記各マイクロホンの各出力チャネル信号を、複数の周
波数帯域に分割する帯域分割過程と、上記帯域分割過程で分割された各出力チャネル信号の各
同一帯域ごとに、上記複数のマイクロホンの位置に起因
して変化する、マイクロホンに到達する音響信号のパラ
メータの値の差を、帯域別チャネル間パラメータ値差と
して検出する帯域別チャネル間パラメータ値差検出過程
と、上記各帯域の帯域別チャネル間パラメータ値差にもとづ
き、その帯域の上記帯域分割された各出力チャネル信号
の何れがいずれの音源から入力された信号であるかを判
定する音源信号判定過程と、上記音源信号判定過程の判定にもとづき、上記帯域分割
された各出力チャネル信号から、同一音源から入力され
た信号を少なくとも１つ選択する音源信号選択過程と、上記音源信号選択過程で同一音源からの信号として選択
された複数の帯域信号を音源信号として合成する音源合
成過程とを有することを特徴とする音源分離方法。1. A sound source separation method for separating at least one sound source from a plurality of sound sources by using a plurality of microphones provided apart from each other, the method comprising: a band division step of dividing each output channel signal of the microphones into a plurality of frequency bands; a band-wise inter-channel parameter value difference detection step of detecting, for each same band of the output channel signals divided in the band division step, a difference in the value of a parameter of the acoustic signal reaching the microphones, which varies depending on the positions of the plurality of microphones, as a band-wise inter-channel parameter value difference; a sound source signal determination step of determining, based on the band-wise inter-channel parameter value difference of each band, which of the band-divided output channel signals of that band is a signal input from which sound source; a sound source signal selection step of selecting, based on the determination in the sound source signal determination step, at least one signal input from the same sound source from the band-divided output channel signals; and a sound source synthesis step of synthesizing, as a sound source signal, a plurality of band signals selected as signals from the same sound source in the sound source signal selection step.
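The sequence of steps in claim 1 can be sketched for a single frame of two channels as follows. This is an illustrative reading, not the claimed implementation: it assumes the parameter is the signal level (as in claim 7), uses a naive discrete Fourier transform for band division, treats each frequency bin as one band, and assumes each source is louder at its nearer microphone. The names `dft`, `idft`, and `separate_two_sources` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (band division into frequency bins)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse transform, returning the real part (source synthesis)."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def separate_two_sources(left, right):
    """Return estimates of the source nearer the left and nearer the right microphone."""
    spec_l, spec_r = dft(left), dft(right)
    out_l = [0j] * len(spec_l)
    out_r = [0j] * len(spec_r)
    for k, (a, b) in enumerate(zip(spec_l, spec_r)):
        # Band-wise inter-channel level comparison decides the source of this band.
        if abs(a) >= abs(b):
            out_l[k] = a        # band attributed to the source nearer the left microphone
        else:
            out_r[k] = b        # band attributed to the source nearer the right microphone
    return idft(out_l), idft(out_r)
```

The sketch works only to the extent that each band is dominated by one source, which is the condition claim 2 places on the band division.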

【請求項２】請求項１記載の方法において、上記帯域分割過程は各出力チャネル信号の各分割された
帯域信号は、主として１つの音源の音響信号の成分より
なる程度に、小さく分割することを特徴とする音源分離
方法。2. The method according to claim 1, wherein, in the band division step, each output channel signal is divided finely enough that each divided band signal consists mainly of a component of the acoustic signal of one sound source.

【請求項３】請求項１又は２記載の方法において、上記帯域別チャネル間パラメータ値差検出過程における
上記パラメータ値は音源からの音響信号が上記各マイク
ロホンに到達するまでの時間であり、上記帯域別チャネ
ル間パラメータ値差は各マイクロホンに到達するまでの
時間のマイクロホン間の差である帯域別チャネル間時間
差であることを特徴とする音源分離方法。3. The method according to claim 1 or 2, wherein the parameter value in the band-wise inter-channel parameter value difference detection step is the time taken for the acoustic signal from a sound source to reach each of the microphones, and the band-wise inter-channel parameter value difference is a band-wise inter-channel time difference, which is the difference between microphones in the time taken to reach each microphone.

【請求項４】請求項３記載の方法において、上記音響信号が各マイクロホンに到達するまでの時間の
マイクロホン間の差をチャネル間時間差として各マイク
ロホンの出力チャネル信号から検出するチャネル時間差
検出過程を有し、上記音源信号判定過程は、上記各帯域別チャネル間時間
差について、上記各チャネル間時間差を照合して、その
帯域の上記分割された各出力チャネル信号がいずれの音
源から入力された信号であるかを判定することを特徴と
する音源分離方法。4. The method according to claim 3, further comprising a channel time difference detection step of detecting, from the output channel signals of the microphones, the difference between microphones in the time taken for the acoustic signal to reach each microphone as an inter-channel time difference, wherein the sound source signal determination step checks each band-wise inter-channel time difference against the inter-channel time differences to determine from which sound source each of the divided output channel signals of that band was input.

【請求項５】請求項４記載の方法において、上記チャネル時間差検出過程は各出力チャネル信号間の
相互相関を求め、相互相関の各ピークとなるその出力チ
ャネル信号間の各時間差として上記各チャネル間時間差
を求めることを特徴とする音源分離方法。5. The method according to claim 4, wherein the channel time difference detection step obtains the cross-correlation between the output channel signals and obtains the inter-channel time differences as the time differences between the output channel signals at which the cross-correlation peaks.
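A minimal sketch of the cross-correlation approach in claim 5 follows. It assumes sampled signals, a direct time-domain correlation over a limited lag range, and peaks taken as local maxima; the names `cross_correlation` and `peak_lags` and the `max_lag` parameter are hypothetical. The returned lags are in samples and would be divided by the sampling rate to give time differences.

```python
def cross_correlation(left, right, max_lag):
    """Return {lag: sum over t of left[t] * right[t + lag]} for |lag| <= max_lag."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(len(left)):
            u = t + lag
            if 0 <= u < len(right):
                total += left[t] * right[u]
        result[lag] = total
    return result

def peak_lags(corr, count):
    """Lags of the `count` largest local maxima of the cross-correlation."""
    lags = sorted(corr)
    peaks = [lag for i, lag in enumerate(lags[1:-1], start=1)
             if corr[lag] > corr[lags[i - 1]] and corr[lag] >= corr[lags[i + 1]]]
    peaks.sort(key=lambda lag: corr[lag], reverse=True)
    return peaks[:count]
```

With two sources on opposite sides of the microphone pair, the two largest peaks fall at lags of opposite sign, one per source.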

【請求項６】請求項５記載の方法において、上記帯域別チャネル間時間差は、上記各チャネル間時間
差中の、上記分割された各出力チャネルの同一帯域の成
分の位相差と対応する時間と最も近いものを求めて、そ
の帯域別チャネル間時間差とすることを特徴とする音源
分離方法。6. The method according to claim 5, wherein the band-wise inter-channel time difference is obtained by finding, among the inter-channel time differences, the one closest to the time corresponding to the phase difference between the same-band components of the divided output channels, and taking it as the band-wise inter-channel time difference for that band.
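The matching in claim 6 can be sketched as below. The sketch assumes a sign convention in which a positive time difference means the right channel lags the left, so that the phase of L(f) times the conjugate of R(f) equals 2πfΔτ modulo 2π, and it compares phases after wrapping so that the 2π ambiguity at higher frequencies is tolerated. The function name and argument names are hypothetical.

```python
import cmath
import math

def bandwise_time_difference(spec_l, spec_r, freq_hz, channel_time_diffs):
    """Pick the inter-channel time difference that best explains one band's phase.

    spec_l, spec_r: complex spectrum values of the same band in each channel.
    channel_time_diffs: candidate time differences in seconds (for example,
    those found from cross-correlation peaks).
    """
    dphi = cmath.phase(spec_l * spec_r.conjugate())   # measured phase difference

    def mismatch(tau):
        err = dphi - 2.0 * math.pi * freq_hz * tau
        err = (err + math.pi) % (2.0 * math.pi) - math.pi   # wrap to [-pi, pi)
        return abs(err) / (2.0 * math.pi * freq_hz)         # expressed as a time

    return min(channel_time_diffs, key=mismatch)
```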

【請求項７】請求項１又は２記載の方法において、上記帯域別チャネル間パラメータ値差検出過程における
上記パラメータ値は音源からの音響信号が上記各マイク
ロホンに到達した時の信号レベルであり、上記帯域別チ
ャネル間パラメータ値差は各分割された出力チャネル信
号の対応帯域間のレベル差である帯域別チャネル間レベ
ル差であることを特徴とする音源分離方法。7. The method according to claim 1 or 2, wherein the parameter value in the band-wise inter-channel parameter value difference detection step is the signal level when the acoustic signal from a sound source reaches each of the microphones, and the band-wise inter-channel parameter value difference is a band-wise inter-channel level difference, which is the level difference between corresponding bands of the divided output channel signals.

【請求項８】請求項７記載の方法において、上記各マイクロホンの出力チャネル信号間のレベル差
を、チャネル間レベル差として検出するチャネル間レベ
ル差検出過程と、上記チャネル間レベル差と、対応する帯域別チャネル間
レベル差の全てと比較する比較過程と、その比較過程で分割帯域の所定数以上が同様の関係にあ
れば、上記チャネル間レベル差にもとづき、対応する出
力チャネル信号の全帯域について同一の音源から入力さ
れた信号であると判定し、上記比較過程で所定値以上が
同様の関係になければ、上記帯域別にいずれの音源から
入力された信号であるかを判定する上記音源信号判定過
程を実行することを特徴とする音源分離方法。8. The method according to claim 7, comprising: an inter-channel level difference detection step of detecting the level difference between the output channel signals of the microphones as an inter-channel level difference; and a comparison step of comparing the inter-channel level difference with all of the corresponding band-wise inter-channel level differences, wherein, if at least a predetermined number of the divided bands show the same relationship in the comparison step, all bands of the corresponding output channel signal are determined, based on the inter-channel level difference, to be signals input from the same sound source, and if fewer than the predetermined number show the same relationship in the comparison step, the sound source signal determination step of determining band by band from which sound source the signal was input is executed.

【請求項９】請求項１又は２記載の方法において、上記パラメータ値は音源からの音響信号が上記マイクロ
ホンに到達するまでの時間と、その音響信号が到達した
時の信号レベルであり、上記帯域別チャネル間パラメー
タ値差として帯域別チャネル間時間差と、帯域別チャネ
ル間レベル差が求められ、各音源からの音響信号が上記各マイクロホンに到達する
までの時間のマイクロホン間の差を、各マイクロホンの
出力チャネル信号から、チャネル時間差として検出する
チャネル間時間差検出過程と、上記チャネル間時間差を基準にして上記分割された各出
力チャネル信号を、低域、中域、高域の３つの周波数領
域に分け領域分割過程とを有し、上記音源信号判定過程は、上記分割された低域の周波数帯域については、上記帯域
別チャネル間時間差を利用して対応する帯域の分割され
た各出力チャネル信号の何れがいずれの音源からの入力
信号であるか判定する過程と、上記分割された中域の周波数帯域については、上記帯域
別チャネル間レベル差と、上記帯域別チャネル間時間差
を利用して、対応する帯域の分割された各出力チャネル
信号の何れがいずれの音源からの入力信号であるか判定
する過程と、上記分割された高域の周波数帯域については、上記帯域
別チャネル間レベル差を利用して、対応する帯域の分割
された各出力チャネル信号の何れかがいずれの音源から
の入力信号であるか判定する過程とからなることを特徴
とする音源分離方法。9. The method according to claim 1 or 2, wherein the parameter values are the time taken for the acoustic signal from a sound source to reach the microphones and the signal level of the acoustic signal on arrival, and a band-wise inter-channel time difference and a band-wise inter-channel level difference are obtained as the band-wise inter-channel parameter value differences, the method further comprising: an inter-channel time difference detection step of detecting, from the output channel signals of the microphones, the difference between microphones in the time taken for the acoustic signal from each sound source to reach each of the microphones as a channel time difference; and a region division step of dividing the divided output channel signals into three frequency regions, a low range, a middle range, and a high range, with reference to the inter-channel time difference, wherein the sound source signal determination step comprises: a step of determining, for the divided frequency bands in the low range, which of the divided output channel signals of the corresponding band is an input signal from which sound source by using the band-wise inter-channel time difference; a step of making that determination, for the divided frequency bands in the middle range, by using the band-wise inter-channel level difference and the band-wise inter-channel time difference; and a step of making that determination, for the divided frequency bands in the high range, by using the band-wise inter-channel level difference.
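The region division in claim 9 can be sketched using the boundaries given in the abstract: low range below 1/(2Δτ), middle range between 1/(2Δτ) and 1/Δτ, and high range above 1/Δτ. The sketch assumes a single reference time difference Δτ in seconds and assigns a band lying exactly on a boundary to the upper region, a case the abstract leaves open. The names `frequency_region` and `CUES_BY_REGION` are hypothetical.

```python
def frequency_region(freq_hz, delta_tau_s):
    """Classify a band centre frequency relative to the inter-channel time difference."""
    low_edge = 1.0 / (2.0 * delta_tau_s)   # phase difference reaches pi here
    high_edge = 1.0 / delta_tau_s          # phase difference reaches 2*pi here
    if freq_hz < low_edge:
        return "low"
    if freq_hz < high_edge:
        return "mid"
    return "high"

# Cues used for the source determination in each region, as listed in claim 9.
CUES_BY_REGION = {
    "low": ("time difference",),
    "mid": ("level difference", "time difference"),
    "high": ("level difference",),
}
```

For example, with Δτ = 0.5 ms the boundaries fall at 1 kHz and 2 kHz. Below 1 kHz the phase difference maps to a time difference without ambiguity, which is presumably why the time cue alone is used there.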

【請求項１０】請求項１〜９の何れかに記載の方法に
おいて、上記帯域別チャネル間パラメータ値差検出過程におい
て、その互いに差をとるべき、もとのチャネル信号の周
波数帯域が異なる場合は、その周波数帯域が互いに重な
らない周波数帯域は、上記帯域別チャネル間パラメータ
値差検出過程を実行せず、上記音源信号判定過程ではそ
の信号がある帯域を予め知られている広い帯域の音源か
らの入力信号と判定することを特徴とする音源分離方
法。10. The method according to any one of claims 1 to 9, wherein, when the original channel signals whose difference is to be taken in the band-wise inter-channel parameter value difference detection step have different frequency bands, the band-wise inter-channel parameter value difference detection step is not executed for frequency bands in which those frequency bands do not overlap, and in the sound source signal determination step the bands in which the signal is present are determined to be input signals from the sound source known in advance to have the wider band.

【請求項１１】互いに離して設けられた複数のマイク
ロホンを用いて、複数の音源から少なくとも１つの音源
を分離する音源分離方法であって、上記各マイクロホンの各出力チャネル信号のパワースペ
クトルを求めるスペクトル分解過程と、上記各チャネルごとのパワースペクトルを、主としてほ
ぼ１つの音源の成分が含まれるように複数の周波数帯域
に分割する帯域分割過程と、上記各同一帯域ごとに、各チャネル間で分割されたパワ
ースペクトル差を、帯域別チャネル間レベル差として検
出する帯域別チャネル間レベル差検出過程と、上記各帯域の帯域別チャネル間レベル差にもとづき、そ
の帯域の信号が上記出力チャネル信号の何れであるかを
判定する音源信号判定過程と、上記音源信号判定過程の判定にもとづき、上記分割され
たパワースペクトルから、同一音源からの信号を少なく
とも１つ選択する音源信号選択過程と、上記音源信号選択過程で同一音源からのものとして選択
されたスペクトルを音源信号として合成する音源合成過
程とを有することを特徴とする音源分離方法。11. A sound source separation method for separating at least one sound source from a plurality of sound sources by using a plurality of microphones provided apart from each other, the method comprising: a spectrum decomposition step of obtaining the power spectrum of each output channel signal of the microphones; a band division step of dividing the power spectrum of each channel into a plurality of frequency bands such that each band mainly contains the component of substantially one sound source; a band-wise inter-channel level difference detection step of detecting, for each same band, the difference between the divided power spectra of the channels as a band-wise inter-channel level difference; a sound source signal determination step of determining, based on the band-wise inter-channel level difference of each band, to which of the output channel signals the signal of that band belongs; a sound source signal selection step of selecting, based on the determination in the sound source signal determination step, at least one signal from the same sound source from the divided power spectra; and a sound source synthesis step of synthesizing, as a sound source signal, the spectra selected as being from the same sound source in the sound source signal selection step.

【請求項１２】請求項１１記載の方法において、上記各マイクロホンの出力チャネル信号間のレベル差を
チャネル間レベル差として検出するチャネル間レベル差
検出過程と、上記チャネル間レベル差と、対応する上記帯域別チャネ
ル間レベル差の全てとを比較する比較過程と、その比較過程で分割帯域の所定数以上が同様の関係であ
れば、上記チャネル間レベル差にもとづき、対応する出
力チャネル信号の全帯域について同一音源から入力され
た信号であると判定し、上記比較過程で所定値以上が同
様の関係になければ、上記音源信号判定過程を実行する
ことを特徴とする音源分離方法。12. The method according to claim 11, comprising: an inter-channel level difference detection step of detecting the level difference between the output channel signals of the microphones as an inter-channel level difference; and a comparison step of comparing the inter-channel level difference with all of the corresponding band-wise inter-channel level differences, wherein, if at least a predetermined number of the divided bands show the same relationship in the comparison step, all bands of the corresponding output channel signal are determined, based on the inter-channel level difference, to be signals input from the same sound source, and if fewer than the predetermined number show the same relationship in the comparison step, the sound source signal determination step is executed.

【請求項１３】請求項１乃至１２の何れかの方法にお
いて、上記各マイクロホンの出力チャネル信号を、各帯域が主
として１つの音源信号成分になる程度に、複数の周波数
帯域に分割する第２帯域分割過程と、上記第２帯域分割過程で分割された各出力チャネル信号
の帯域別レベルをそれぞれ検出する帯域別レベル検出過
程と、その帯域別レベル検出過程で検出された各帯域別レベル
を同一帯域についてチャネル間で比較した結果にもとづ
き発音をしていない音源を検出する音源状態判定過程
と、その音源状態判定過程で得た発音をしていない音源の検
出信号により、上記音源合成過程で合成された音源信号
のうち、上記発音していない音源と対応する合成信号を
抑圧する信号抑圧過程とを有することを特徴とする音源
分離方法。13. The method according to any one of claims 1 to 12, comprising: a second band division step of dividing the output channel signal of each of the microphones into a plurality of frequency bands finely enough that each band consists mainly of one sound source signal component; a band-wise level detection step of detecting the band-wise level of each output channel signal divided in the second band division step; a sound source state determination step of detecting a sound source that is not producing sound, based on the result of comparing, between channels for the same band, the band-wise levels detected in the band-wise level detection step; and a signal suppression step of suppressing, in accordance with the detection signal for the non-sounding sound source obtained in the sound source state determination step, the synthesized signal corresponding to that non-sounding sound source among the sound source signals synthesized in the sound source synthesis step.

【請求項１４】請求項１３の方法において、上記音源状態判定過程は、上記各帯域別レベルのチャネ
ル間での比較で、最も大きいチャネルを帯域ごとに決定
する過程と、各チャネルごとに最もレベルが大きい帯域の数を求める
過程と、上記最もレベルが大きい帯域の数が第１基準値を越える
か否か判定する第１判定過程と、その第１判定過程で第１基準値を越えると判定すると、
その越えた最もレベルが大きい帯域の数と対応するチャ
ネルのマイクロホン位置から、発音している１個の音源
を推定する過程と、その推定された音源以外の音源を発音していないものと
して検出する過程とを有することを特徴とする音源分離
方法。14. The method according to claim 13, wherein the sound source state determination step comprises: a step of determining, for each band, the channel with the highest level by comparing the band-wise levels between channels; a step of obtaining, for each channel, the number of bands in which that channel has the highest level; a first determination step of determining whether the number of bands with the highest level exceeds a first reference value; a step of estimating, when the first determination step determines that the first reference value is exceeded, the one sound source that is producing sound from the microphone position of the channel corresponding to the number of highest-level bands that exceeded it; and a step of detecting the sound sources other than the estimated sound source as not producing sound.
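The counting procedure of claim 14 can be sketched as follows. It assumes the band-wise levels are already available as one list per channel, and it only identifies the dominant channel; mapping that channel to a sound source from the microphone position is left out. The function name and the `first_reference` parameter are hypothetical.

```python
def detect_single_active_channel(band_levels, first_reference):
    """Return (channel index or None, per-channel count of highest-level bands).

    band_levels[ch][band] is the level of band `band` in channel `ch`.
    A channel index is returned only if that channel has the highest level
    in more than `first_reference` bands.
    """
    n_channels = len(band_levels)
    n_bands = len(band_levels[0])
    wins = [0] * n_channels
    for band in range(n_bands):
        # Channel with the highest level in this band.
        loudest = max(range(n_channels), key=lambda ch: band_levels[ch][band])
        wins[loudest] += 1
    for ch, count in enumerate(wins):
        if count > first_reference:
            return ch, wins     # one source near this channel's microphone is sounding
    return None, wins           # no single channel dominates
```

When a channel index is returned, the sources associated with the other channels would be treated as not producing sound, and their synthesized signals suppressed as in claim 13.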

【請求項１５】請求項１〜１２の何れかに記載の方法
において、上記各マイクロホンの各出力チャネル信号のパワースペ
クトルを求めるスペクトル分解過程と、上記各チャネルごとのパワースペクトルを、主としてほ
ぼ１つの音源の成分が含まれるように周波数帯域を分割
して帯域別レベルをそれぞれ検出する帯域別レベル検出
過程と、これら各帯域別レベルを同一帯域について比較し、最大
レベルのチャネルを各帯域ごとに決定する過程と、各チャネルごとの最大レベルの帯域の数を求める過程
と、その帯域の数が第１基準値を越えたか否かを判定する第
１判定過程と、その第１判定過程で第１基準値を越える数と判定する
と、その越えたチャネルのマイクロホンが受けもつ、ゾ
ーンから発音している１個の音源を推定する過程と、その推定された音源以外の音源は発音していないと判定
する過程と、上記音源合成過程で合成された音源信号のうち、上記発
音していないと判定された音源と対応する信号を、抑圧
する信号抑圧過程とを有することを特徴とする音源分離
方法。15. The method according to any one of claims 1 to 12, comprising: a spectrum decomposition step of obtaining the power spectrum of each output channel signal of the microphones; a band-wise level detection step of dividing the power spectrum of each channel into frequency bands such that each band mainly contains the component of substantially one sound source, and detecting the band-wise levels; a step of comparing the band-wise levels for the same band and determining the channel with the maximum level for each band; a step of obtaining, for each channel, the number of bands in which that channel has the maximum level; a first determination step of determining whether that number of bands exceeds a first reference value; a step of estimating, when the first determination step determines that the number exceeds the first reference value, the one sound source that is producing sound from the zone covered by the microphone of the channel that exceeded it; a step of determining that sound sources other than the estimated sound source are not producing sound; and a signal suppression step of suppressing, among the sound source signals synthesized in the sound source synthesis step, the signals corresponding to the sound sources determined not to be producing sound.

【請求項１６】請求項１４又は１５の方法において、上記第１判定過程で、第１基準値を越えるものがないと
判定されると、上記最もレベルが大きい帯域の数が、上
記第１基準値よりも小さい第２基準値以下か否かを判定
する第２判定過程と、その第２判定過程で、第２基準値より小さいと判定され
ると、その小さいと判定された最もレベルが大きい帯域
の数と対応するチャネルのマイクロホン位置から、発音
していない１個の音源として検出する過程とを有するこ
とを特徴とする音源分離方法。16. The method according to claim 14 or 15, comprising: a second determination step of determining, when the first determination step determines that no channel exceeds the first reference value, whether the number of bands with the highest level is equal to or less than a second reference value smaller than the first reference value; and a step of detecting, when the second determination step determines that the number is smaller than the second reference value, one sound source that is not producing sound from the microphone position of the channel corresponding to the number of highest-level bands determined to be smaller.

【請求項１７】請求項１乃至１２の何れかの方法にお
いて、上記各マイクロホンの出力チャネル信号を、各帯域が主
として１つの音源信号成分になる程度に、複数の周波数
帯域に分割する第２帯域分割過程と、上記第２帯域分離過程で分割された各出力チャネル信号
のそのマイクロホンへの到達時間差を同一帯域ごとに検
出する帯域別時間差検出過程と、この帯域別時間差検出過程で検出された各帯域別到達時
間差を、同一帯域についてチャネル間で比較した結果に
もとづき発音をしていない音源を検出する音源状態判定
過程と、その音源状態判定過程で得た発音をしていない音源の検
出信号により、上記音源合成過程で合成された音源信号
のうち、上記発音していない音源と対応する合成信号を
抑圧する信号抑圧過程とを有することを特徴とする音源
分離方法。17. The method according to any one of claims 1 to 12, comprising: a second band division step of dividing the output channel signal of each microphone into a plurality of frequency bands such that each band consists mainly of the component of one sound source signal; a band-wise time difference detection step of detecting, for each identical band, the difference in arrival time at the microphones of the output channel signals divided in the second band division step; a sound source state determination step of detecting a sound source that is not sounding, based on the result of comparing the detected band-wise arrival time differences between channels for the same band; and a signal suppression step of suppressing, in accordance with the detection signal of the non-sounding sound source obtained in the sound source state determination step, the synthesized signal corresponding to that sound source among the sound source signals synthesized in the sound source synthesis step.

【請求項１８】請求項３の方法において、上記帯域別チャネル間時間差を、同一帯域についてチャ
ネル間で比較した結果にもとづき発音をしていない音源
を検出する音源状態判定過程と、その音源状態判定過程で得た発音をしていない音源を検
出信号により、上記音源合成過程で合成された音源信号
のうち、上記発音していない音源と対応する信号を抑圧
する信号抑圧過程とを有することを特徴とする音源分離
方法。18. The method according to claim 3, comprising: a sound source state determination step of detecting a sound source that is not sounding, based on the result of comparing the band-wise inter-channel time differences between channels for the same band; and a signal suppression step of suppressing, in accordance with the detection signal of the non-sounding sound source obtained in the sound source state determination step, the signal corresponding to that sound source among the sound source signals synthesized in the sound source synthesis step.

【請求項１９】請求項１７又は１８の方法において、上記音源状態判定過程は、上記各帯域別到達時間差比較
で最も速く音源信号が到達したチャネルを帯域ごとに決
定する過程と、各チャネルごとに最も速く到達した帯域の数が第１基準
値を越えるか否かを判定する第１判定過程と、その第１判定過程が第１基準値を越えると判定すると、
その越えた最も速く到達した帯域数と対応するチャネル
のマイクロホン位置から発音している１個の音源を推定
する過程と、その推定された音源以外の音源を発音していないものと
して検出する過程とを有することを特徴とする音源分離
方法。19. The method according to claim 17 or 18, wherein the sound source state determination step comprises: a step of determining, for each band, the channel at which the sound source signal arrived earliest in the band-wise arrival time difference comparison; a first determination step of determining, for each channel, whether the number of bands in which the signal arrived earliest exceeds a first reference value; a step of, when the first determination step determines that the first reference value is exceeded, estimating one sounding sound source from the microphone position of the channel whose number of earliest-arrival bands exceeded the reference value; and a step of detecting the sound sources other than the estimated one as not sounding.

【請求項２０】請求項１９の方法において、上記第１判定過程で、第１基準値を越えるものがないと
判定されると、上記最も速く到達する帯域の数が、上記
第１基準値よりも小さい第２基準値より小さいか否かを
判定する第２判定過程と、その第２判定過程で、第２基準値より小さいと判定され
ると、その小さいと判定された最も速い到達時間の帯域
数と対応するチャネルのマイクロホン位置から、発音し
ていない１個の音源として検出する過程とを有すること
を特徴とする音源分離方法。20. The method according to claim 19, comprising: a second determination step of determining, when the first determination step determines that no channel exceeds the first reference value, whether the number of earliest-arrival bands is smaller than a second reference value that is smaller than the first reference value; and a step of, when the second determination step determines that the number is smaller than the second reference value, detecting one sound source as not sounding from the microphone position of the channel corresponding to that number of earliest-arrival bands.

【請求項２１】請求項１６又は２０の方法において、音源が４個以上の場合で、上記第２判定過程で、第２基
準値より小さいと判定されると、上記第２基準値を上記
第１基準値を越えない範囲内で、順次大きくして、上記
第２判定過程と同じ判定を、（Ｍ−２）回以内、Ｍは音
源の数、繰返す過程を有することを特徴とする音源分離
方法。21. The method according to claim 16 or 20, comprising, when there are four or more sound sources and the second determination step determines that the number is smaller than the second reference value, a step of successively increasing the second reference value within a range not exceeding the first reference value and repeating the same determination as the second determination step no more than (M-2) times, where M is the number of sound sources.
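
One possible reading of the repeated second determination in claim 21 is sketched below. The claim does not state the increment or the exact stopping rule, so the fixed `step`, the treatment of the number of channels as the number of sound sources M, and the function name `relax_second_reference` are assumptions for illustration only.

```python
def relax_second_reference(counts, ref1, ref2, step=1):
    """Hypothetical sketch: raise ref2 stepwise, at most (M - 2) times.

    counts[ch] is the number of bands "won" by channel ch (highest level
    or earliest arrival). Returns the set of channels judged silent.
    """
    m = len(counts)  # assumption: one channel per sound source
    silent = {ch for ch, c in enumerate(counts) if c <= ref2}

    # The repetition applies only with four or more sources and only
    # after the second determination has already found a silent source.
    if m < 4 or not silent:
        return silent

    for _ in range(m - 2):
        ref2 += step
        if ref2 > ref1:  # stay within the first reference value
            break
        silent |= {ch for ch, c in enumerate(counts) if c <= ref2}
    return silent
```

With `counts = [9, 2, 3, 6]`, `ref1 = 10` and `ref2 = 2`, channel 1 is found first and channel 2 is added once the threshold has been raised to 3.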

【請求項２２】請求項１３〜２１の何れかに記載の方
法において、各出力チャネル信号の全周波数成分のレベルをそれぞれ
検出する全帯域レベル検出過程と、その全帯域レベル検出過程で検出した各チャネルの全周
波数成分レベルの何れもが第３基準値以下であるかを判
定し、何れかが第３基準値以下でないと判定すると上記
音源状態判定過程に移る第３判定過程とを有することを
特徴とする音源分離方法。22. The method according to any one of claims 13 to 21, comprising: an all-band level detection step of detecting the level of all frequency components of each output channel signal; and a third determination step of determining whether the all-frequency-component levels of all channels detected in the all-band level detection step are equal to or less than a third reference value and, when any of them is determined not to be equal to or less than the third reference value, proceeding to the sound source state determination step.

【請求項２３】請求項２２の方法において、上記第３判定過程が第３基準値以下であると判定される
と、上記音源合成過程で合成された各音源信号のすべて
を抑圧する過程を有することを特徴とする音源分離方
法。23. The method according to claim 22, comprising a step of suppressing all of the sound source signals synthesized in the sound source synthesis step when the third determination step determines that the levels are equal to or less than the third reference value.
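
The all-band gate of claims 22 and 23 amounts to a simple check made before the state determination. The sketch below assumes mean power as the "level" of all frequency components, which the claims do not specify; `full_band_level` and `third_determination` are hypothetical names.

```python
def full_band_level(samples):
    """Assumed level measure: mean power of one channel's samples."""
    return sum(x * x for x in samples) / len(samples)


def third_determination(channels, ref3):
    """Hypothetical sketch of the third determination step.

    Returns "suppress_all" when every channel is at or below ref3
    (claim 23), otherwise "judge_state" to hand over to the sound
    source state determination step (claim 22).
    """
    levels = [full_band_level(ch) for ch in channels]
    if all(level <= ref3 for level in levels):
        return "suppress_all"
    return "judge_state"
```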

【請求項２４】請求項１３〜２３の何れかの方法にお
いて、上記帯域分割過程と上記第２帯域分割過程は同一過程と
して行われることを特徴とする音源分離方法。24. The method according to any one of claims 13 to 23, wherein the band division step and the second band division step are performed as one and the same step.

【請求項２５】互いに離して設けられた複数のマイク
ロホンを用いて、複数の音源から少なくとも１つの音源
を分離する音源分離装置であって、上記各マイクロホンの各出力チャネル信号を、主として
１つの音源の音響信号の成分のみが含まれる程度に複数
の周波数帯域に分割すると共に、これら分割された各出
力チャネル信号の各同一帯域ごとに、上記複数のマイク
ロホンの位置に起因して変化する、マイクロホンに到達
する音響信号のパラメータの値の差を、帯域別チャネル
間パラメータ値差として検出する帯域別チャネル間パラ
メータ値差検出手段と、上記各帯域の帯域別チャネル間パラメータ値差にもとづ
き、その帯域の上記帯域分割された各出力チャネル信号
の何れがいずれの音源から入力された信号であるかを判
定する音源信号判定手段と、上記音源信号判定過程の判定にもとづき、上記帯域分割
された各出力チャネル信号から、同一音源から入力され
た信号を少なくとも１つ選択する音源信号選択手段と、上記音源信号選択過程で同一音源からの信号として選択
された、複数の帯域信号を音源信号として合成する音源
合成手段とを具備することを特徴とする音源分離装置。25. A sound source separation apparatus for separating at least one sound source from a plurality of sound sources by using a plurality of microphones placed apart from one another, comprising: band-wise inter-channel parameter value difference detection means for dividing each output channel signal of the microphones into a plurality of frequency bands such that each band mainly contains the component of the acoustic signal of only one sound source, and for detecting, for each identical band of the divided output channel signals, the difference between the values of a parameter of the acoustic signal reaching the microphones, which varies with the positions of the microphones, as a band-wise inter-channel parameter value difference; sound source signal determination means for determining, based on the band-wise inter-channel parameter value difference of each band, which of the band-divided output channel signals in that band is a signal input from which sound source; sound source signal selection means for selecting, based on the determination by the sound source signal determination means, at least one signal input from the same sound source from the band-divided output channel signals; and sound source synthesis means for synthesizing, as a sound source signal, the plurality of band signals selected by the sound source signal selection means as signals from the same sound source.
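
The chain of means in claim 25 (band division, per-band difference detection, source determination, selection, synthesis) can be illustrated end to end with a toy two-microphone example. The sketch assumes the level difference is the detected parameter, uses a plain DFT as the band divider with one bin per band, and assigns each bin to whichever channel is louder; all names and the test signal are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (band division, one bin per band)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT (source synthesis); returns the real part."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def separate_by_level(left, right):
    """Assign each band to the source nearer the louder microphone."""
    spec_l, spec_r = dft(left), dft(right)
    n = len(spec_l)
    src_a = [0j] * n  # bands judged to come from the source near mic L
    src_b = [0j] * n  # bands judged to come from the source near mic R
    for k in range(n):
        if abs(spec_l[k]) >= abs(spec_r[k]):  # per-band level difference
            src_a[k] = spec_l[k]
        else:
            src_b[k] = spec_r[k]
    return idft(src_a), idft(src_b)


# Toy mixture: each tone is stronger at the microphone nearer its source.
N = 16
s1 = [math.cos(2 * math.pi * 2 * t / N) for t in range(N)]
s2 = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]
mic_l = [a + 0.3 * b for a, b in zip(s1, s2)]
mic_r = [0.3 * a + b for a, b in zip(s1, s2)]
est_a, est_b = separate_by_level(mic_l, mic_r)
```

Because the two tones occupy different bins, `est_a` and `est_b` recover `s1` and `s2` up to rounding error. Real speech overlaps in frequency, which is why the claims require bands narrow enough that each mainly contains one source.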

【請求項２６】請求項２５の装置において、上記帯域別チャネル間パラメータ値差検出手段における
上記パラメータ値は音源からの音響信号が上記各マイク
ロホンに到達するまでの時間であり、上記帯域別チャネ
ル間パラメータ値差は各マイクロホンに到達するまでの
時間のマイクロホン間の差である帯域別チャネル間時間
差であることを特徴とする音源分離装置。26. The apparatus according to claim 25, wherein the parameter value in the band-wise inter-channel parameter value difference detection means is the time taken for the acoustic signal from a sound source to reach each microphone, and the band-wise inter-channel parameter value difference is a band-wise inter-channel time difference, that is, the difference between microphones in the time taken to reach each microphone.

【請求項２７】請求項２５の装置において、上記音響信号が各マイクロホンに到達するまでの時間の
マイクロホン間の差をチャネル間時間差として各マイク
ロホンの出力チャネル信号から検出するチャネル時間差
検出手段を有し、上記音源信号判定手段は、上記各帯域別チャネル間時間
差について、上記各チャネル間時間差を照合して、その
帯域の上記分割された各出力チャネル信号がいずれの音
源から入力された信号であるかを判定する手段であるこ
とを特徴とする音源分離装置。27. The apparatus according to claim 25, comprising inter-channel time difference detection means for detecting, from the output channel signals of the microphones, the difference between microphones in the time taken for the acoustic signal to reach each microphone, as an inter-channel time difference, wherein the sound source signal determination means is means for checking each band-wise inter-channel time difference against the inter-channel time differences to determine from which sound source each divided output channel signal in that band was input.

【請求項２８】請求項２５の装置において、上記帯域別チャネル間パラメータ値差検出手段における
上記パラメータ値は音源からの音響信号が上記各マイク
ロホンに到達した時の信号レベルであり、上記帯域別チ
ャネル間パラメータ値差は各分割された出力チャネル信
号の対応帯域間のレベル差である帯域別チャネル間レベ
ル差であることを特徴とする音源分離装置。28. The apparatus according to claim 25, wherein the parameter value in the band-wise inter-channel parameter value difference detection means is the signal level of the acoustic signal from a sound source when it reaches each microphone, and the band-wise inter-channel parameter value difference is a band-wise inter-channel level difference, that is, the level difference between corresponding bands of the divided output channel signals.

【請求項２９】請求項２８の装置において、上記各マイクロホンの出力チャネル信号間のレベル差
を、チャネル間レベル差として検出するチャネル間レベ
ル差検出手段と、上記チャネル間レベル差と、対応する帯域別チャネル間
レベル差の全てと比較する比較手段と、その比較手段で
分割帯域の所定数以上が同様の関係にあれば、上記チャ
ネル間レベル差にもとづき、対応する出力チャネル信号
の全帯域について同一の音源から入力された信号である
と判定し、上記比較手段で所定値以上が同様の関係にな
ければ、上記帯域別にいずれの音源から入力された信号
であるかを判定する上記音源信号判定手段を実行する手
段を含むことを特徴とする音源分離装置。29. The apparatus according to claim 28, comprising: inter-channel level difference detection means for detecting the level difference between the output channel signals of the microphones as an inter-channel level difference; comparison means for comparing the inter-channel level difference with all of the corresponding band-wise inter-channel level differences; and means for, when the comparison means finds that a predetermined number or more of the divided bands have the same relation, determining based on the inter-channel level difference that all bands of the corresponding output channel signal are signals input from the same sound source, and, when the predetermined number or more do not have the same relation, executing the sound source signal determination means, which determines band by band from which sound source the signal was input.
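
The shortcut in claim 29 can be read as a vote across bands. The sketch below interprets "the same relation" as "the band-wise level difference has the same sign as the overall inter-channel level difference"; that interpretation, like the name `choose_assignment_mode`, is an assumption and is not stated in the claim.

```python
def choose_assignment_mode(channel_level_diff, band_level_diffs, min_agreeing):
    """Hypothetical sketch of the comparison means.

    channel_level_diff: overall level difference between two channels.
    band_level_diffs:   the same difference measured per band.
    Returns "whole_channel" when at least min_agreeing bands agree in
    sign with the overall difference, otherwise "per_band".
    """
    agreeing = sum(1 for d in band_level_diffs if d * channel_level_diff > 0)
    return "whole_channel" if agreeing >= min_agreeing else "per_band"
```

When the result is `"whole_channel"`, every band of that channel is attributed to one source and the per-band determination is skipped, which saves work when only one source dominates.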

【請求項３０】請求項２５の装置において、上記パラメータ値は音源からの音響信号が上記マイクロ
ホンに到達するまでの時間と、その音響信号が到達した
時の信号レベルであり、上記帯域別チャネル間パラメー
タ値差として帯域別チャネル間時間差と、帯域別チャネ
ル間レベル差が求められ、各音源からの音響信号が上記各マイクロホンに到達する
までの時間のマイクロホン間の差と、各マイクロホンの
出力チャネル信号から、チャネル時間差として検出する
チャネル間時間差検出手段と、上記チャネル間時間差を基準にして、上記分割された各
出力チャネル信号を、低域、中域、高域の３つの周波数
領域に分ける領域分割手段とを有し、上記音源信号判定手段は、上記分割された低域の周波数帯域については、上記帯域
別チャネル間時間差を利用して対応する帯域の分割され
た各出力チャネル信号の何れがいずれの音源からの入力
信号であるか判定する手段と、上記分割された中域の周波数帯域については、上記帯域
別チャネル間レベル差と、上記帯域別チャネル間時間差
を利用して、対応する帯域の分割された各出力チャネル
信号の何れがいずれの音源からの入力信号であるか判定
する手段と、上記分割された高域の周波数帯域については、上記帯域
別チャネル間レベル差を利用して、対応する帯域の分割
された各出力チャネル信号の何れかがいずれの音源から
の入力信号であるか判定する手段とからなることを特徴
とする音源分離装置。30. The apparatus according to claim 25, wherein the parameter values are the time taken for the acoustic signal from a sound source to reach the microphones and the signal level of the acoustic signal when it arrives, and a band-wise inter-channel time difference and a band-wise inter-channel level difference are obtained as the band-wise inter-channel parameter value differences, the apparatus comprising: inter-channel time difference detection means for detecting, from the output channel signals of the microphones, the difference between microphones in the time taken for the acoustic signal from each sound source to reach each microphone, as an inter-channel time difference; and region division means for dividing the divided output channel signals into three frequency regions, a low range, a middle range, and a high range, with the inter-channel time difference as the reference, wherein the sound source signal determination means comprises: means for determining, for the frequency bands in the low range, which of the divided output channel signals of the corresponding band is an input signal from which sound source by using the band-wise inter-channel time difference; means for determining, for the frequency bands in the middle range, which of the divided output channel signals of the corresponding band is an input signal from which sound source by using the band-wise inter-channel level difference and the band-wise inter-channel time difference; and means for determining, for the frequency bands in the high range, which of the divided output channel signals of the corresponding band is an input signal from which sound source by using the band-wise inter-channel level difference.
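
The three-region split of claim 30 can be sketched using the boundaries given in the abstract: low range f < 1/(2Δτ), middle range 1/(2Δτ) < f < 1/Δτ, high range f > 1/Δτ. The function name `cues_for_band` and the handling of frequencies exactly on a boundary are assumptions for the example.

```python
def cues_for_band(freq_hz, delta_tau_s):
    """Return which cue(s) decide the source for a band at freq_hz.

    delta_tau_s is the inter-channel time difference in seconds.
    Boundary frequencies are assigned to the upper region here, which
    is an arbitrary choice the source does not specify.
    """
    delta_tau_s = abs(delta_tau_s)
    if delta_tau_s == 0.0:
        raise ValueError("inter-channel time difference must be non-zero")
    low_edge = 1.0 / (2.0 * delta_tau_s)
    high_edge = 1.0 / delta_tau_s
    if freq_hz < low_edge:
        return ("time",)            # low range: time difference only
    if freq_hz < high_edge:
        return ("level", "time")    # middle range: both cues
    return ("level",)               # high range: level difference only
```

For Δτ = 0.5 ms the edges fall at 1 kHz and 2 kHz. Below 1/(2Δτ) the phase difference between channels stays under half a cycle and is unambiguous, which is the usual motivation for relying on time there and on level higher up.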

【請求項３１】請求項２５乃至３０の何れかの装置に
おいて、上記帯域分割された各出力チャネル信号の帯域別レベル
をそれぞれ検出する帯域別レベル検出手段と、その帯域別レベル検出手段が検出された各帯域別レベル
を同一帯域についてチャネル間で比較した結果にもとづ
き発音をしていない音源を検出する音源状態判定手段
と、その音源状態判定手段で得た発音をしていない音源の検
出信号により、上記音源合成手段で合成された音源信号
のうち、上記発音していない音源と対応する信号を抑圧
する信号抑圧手段とを有することを特徴とする音源分離
装置。31. The apparatus according to any one of claims 25 to 30, comprising: band-wise level detection means for detecting the level of each band of the band-divided output channel signals; sound source state determination means for detecting a sound source that is not sounding, based on the result of comparing the band-wise levels detected by the band-wise level detection means between channels for the same band; and signal suppression means for suppressing, in accordance with the detection signal of the non-sounding sound source obtained by the sound source state determination means, the signal corresponding to that sound source among the sound source signals synthesized by the sound source synthesis means.

【請求項３２】請求項３１の装置において、上記音源状態判定手段は、上記各帯域別レベルのチャネ
ル間での比較で、最も大きいチャネルを帯域ごとに決定
する手段と、各チャネルごとに最もレベルが大きい帯域の数を求める
手段と、上記最もレベルが大きい帯域の数が第１基準値を越える
か否か判定する第１判定手段と、その第１判定手段で第１基準値を越えると判定すると、
その越えた最もレベルが大きい帯域の数と対応するチャ
ネルのマイクロホン位置から、発音している１個の音源
を推定する手段と、その推定された音源以外の音源を発音していないものと
して検出する手段とを有することを特徴とする音源分離
装置。32. The apparatus according to claim 31, wherein the sound source state determination means comprises: means for determining, for each band, the channel with the highest level by comparing the band-wise levels between channels; means for obtaining, for each channel, the number of bands in which that channel has the highest level; first determination means for determining whether the number of highest-level bands exceeds a first reference value; means for, when the first determination means determines that the first reference value is exceeded, estimating one sounding sound source from the microphone position of the channel whose number of highest-level bands exceeded the reference value; and means for detecting the sound sources other than the estimated one as not sounding.

【請求項３３】請求項３２の装置において、上記第１判定手段で、第１基準値を越えるものがないと
判定されると、上記最もレベルが大きい帯域の数が、上
記第１基準値よりも小さい第２基準値以下か否かを判定
する第２判定手段と、その第２判定手段で、第２基準値より小さいと判定され
ると、その小さいと判定された最もレベルが大きい帯域
の数と対応するチャネルのマイクロホン位置から、発音
していない１個の音源として検出する手段とを有するこ
とを特徴とする音源分離装置。33. The apparatus according to claim 32, comprising: second determination means for determining, when the first determination means determines that no channel exceeds the first reference value, whether the number of highest-level bands is equal to or less than a second reference value that is smaller than the first reference value; and means for, when the second determination means determines that the number is smaller than the second reference value, detecting one sound source as not sounding from the microphone position of the channel corresponding to that number of highest-level bands.

【請求項３４】請求項２５乃至３０の何れかの装置に
おいて、上記帯域分割された各出力チャネル信号のそのマイクロ
ホンへの到達時間差を同一帯域ごと検出する帯域別時間
差検出手段と、この帯域別時間差検出手段で検出された各帯域別到達時
間差を、同一帯域についてチャネル間で比較した結果に
もとづき発音をしていない音源を検出する音源状態判定
手段と、その音源状態判定手段で得た発音をしていない音源を検
出信号により、上記音源合成手段で合成された音源信号
のうち、上記発音していない音源と対応する信号を抑圧
する信号抑圧手段とを有することを特徴とする音源分離
装置。34. The apparatus according to any one of claims 25 to 30, comprising: band-wise time difference detection means for detecting, for each identical band, the difference in arrival time at the microphones of the band-divided output channel signals; sound source state determination means for detecting a sound source that is not sounding, based on the result of comparing the band-wise arrival time differences detected by the band-wise time difference detection means between channels for the same band; and signal suppression means for suppressing, in accordance with the detection signal of the non-sounding sound source obtained by the sound source state determination means, the signal corresponding to that sound source among the sound source signals synthesized by the sound source synthesis means.

【請求項３５】請求項３４の装置において、上記音源状態判定手段は、上記各帯域別到達時間差比較
で最も速く音源信号が到達したチャネルを帯域ごとに決
定する手段と、各チャネルごとに最も速く到達した帯域の数が第１基準
値を越えるか否かを判定する第１判定手段と、その第１判定手段が第１基準値を越えると判定すると、
その越えた最も速く到達した帯域数と対応するチャネル
のマイクロホン位置から発音している１個の音源を推定
する手段と、その推定された音源以外の音源を発音していないものと
して検出する手段とを有することを特徴とする音源分離
装置。35. The apparatus according to claim 34, wherein the sound source state determination means comprises: means for determining, for each band, the channel at which the sound source signal arrived earliest in the band-wise arrival time difference comparison; first determination means for determining, for each channel, whether the number of bands in which the signal arrived earliest exceeds a first reference value; means for, when the first determination means determines that the first reference value is exceeded, estimating one sounding sound source from the microphone position of the channel whose number of earliest-arrival bands exceeded the reference value; and means for detecting the sound sources other than the estimated one as not sounding.

【請求項３６】請求項３５の装置において、上記第１判定手段で、第１基準値を越えるものがないと
判定されると、上記最も速く到達する帯域の数が、上記
第１基準値よりも小さい第２基準値以下か否かを判定す
る第２判定手段と、その第２判定手段で、第２基準値より小さいと判定され
ると、その小さいと判定された最も速い到達時間の帯域
数と対応するチャネルのマイクロホン位置から、発音し
ていない１個の音源として検出する手段とを有すること
を特徴とする音源分離装置。36. The apparatus according to claim 35, comprising: second determination means for determining, when the first determination means determines that no channel exceeds the first reference value, whether the number of earliest-arrival bands is equal to or less than a second reference value that is smaller than the first reference value; and means for, when the second determination means determines that the number is smaller than the second reference value, detecting one sound source as not sounding from the microphone position of the channel corresponding to that number of earliest-arrival bands.

【請求項３７】請求項３１〜３６の何れかに記載の装
置において、各出力チャネル信号の全周波数成分のレベルをそれぞれ
検出する全帯域レベル検出手段と、その全帯域レベル検出手段で検出した各チャネルの全周
波数成分レベルの何れもが第３基準値以下であるかを判
定し、何れかが第３基準値以下でないと判定すると、上
記音源状態判定手段に移る第３判定手段とを有すること
を特徴とする音源分離装置。37. The apparatus according to any one of claims 31 to 36, comprising: all-band level detection means for detecting the level of all frequency components of each output channel signal; and third determination means for determining whether the all-frequency-component levels of all channels detected by the all-band level detection means are equal to or less than a third reference value and, when any of them is determined not to be equal to or less than the third reference value, handing over to the sound source state determination means.

【請求項３８】互いに離して設けられた複数のマイク
ロホンを用いて、複数の音源から少なくとも１つの音源
を分離する下記過程を有する音源分離方法のプログラム
を記録した記録媒体であって、上記各マイクロホンの各出力チャネル信号を、主に１つ
の音源の音響信号の成分のみを含む程度に複数の周波数
帯域に分割すると共にこれら分割された各出力チャネル
信号の各同一帯域ごとに、上記複数のマイクロホンの位
置に起因して変化する、マイクロホンに到達する音響信
号のパラメータの値の差を、帯域別チャネル間パラメー
タ値差として検出する帯域別チャネル間パラメータ値差
検出過程と、上記各帯域の帯域別チャネル間パラメータ値差にもとづ
き、その帯域の上記帯域分割された各出力チャネル信号
の何れがいずれの音源から入力された信号であるかを判
定する音源信号判定過程と、上記音源信号判定過程の判定にもとづき、上記帯域分割
された各出力チャネル信号から、同一音源から入力され
た信号を少なくとも１つ選択する音源信号選択過程と、上記音源信号選択過程で同一音源からの信号として選択
された複数の帯域信号を音源信号として合成する音源合
成過程とを有するコンピュータにより読出し可能な記録
媒体。38. A computer-readable recording medium on which is recorded a program of a sound source separation method for separating at least one sound source from a plurality of sound sources by using a plurality of microphones placed apart from one another, the method comprising: a band-wise inter-channel parameter value difference detection step of dividing each output channel signal of the microphones into a plurality of frequency bands such that each band mainly contains the component of the acoustic signal of only one sound source, and detecting, for each identical band of the divided output channel signals, the difference between the values of a parameter of the acoustic signal reaching the microphones, which varies with the positions of the microphones, as a band-wise inter-channel parameter value difference; a sound source signal determination step of determining, based on the band-wise inter-channel parameter value difference of each band, which of the band-divided output channel signals in that band is a signal input from which sound source; a sound source signal selection step of selecting, based on the determination in the sound source signal determination step, at least one signal input from the same sound source from the band-divided output channel signals; and a sound source synthesis step of synthesizing, as a sound source signal, the plurality of band signals selected in the sound source signal selection step as signals from the same sound source.

【請求項３９】請求項３８の記録媒体において、上記帯域別チャネル間パラメータ値差検出過程における
上記パラメータ値は音源からの音響信号が上記各マイク
ロホンに到達するまでの時間であり、上記帯域別チャネ
ル間パラメータ値差は各マイクロホンに到達するまでの
時間のマイクロホン間の差である帯域別チャネル間時間
差であって、上記プログラムは上記音響信号が各マイクロホンに到達
するまでの時間のマイクロホン間の差をチャネル間時間
差として各マイクロホンの出力チャネル信号から検出す
るチャネル時間差検出過程を有し、上記音源信号判定過程は、上記各帯域別チャネル間時間
差について、上記各チャネル間時間差を照合して、その
帯域の上記分割された各出力チャネル信号がいずれの音
源から入力された信号であるかを判定することを特徴と
する記録媒体。39. The recording medium according to claim 38, wherein the parameter value in the band-wise inter-channel parameter value difference detection step is the time taken for the acoustic signal from a sound source to reach each microphone, the band-wise inter-channel parameter value difference is a band-wise inter-channel time difference, that is, the difference between microphones in the time taken to reach each microphone, the program comprises an inter-channel time difference detection step of detecting, from the output channel signals of the microphones, the difference between microphones in the time taken for the acoustic signal to reach each microphone, as an inter-channel time difference, and the sound source signal determination step checks each band-wise inter-channel time difference against the inter-channel time differences to determine from which sound source each divided output channel signal in that band was input.

【請求項４０】請求項３９の記録媒体において、上記チャネル時間差検出過程は各出力チャネル信号間の
相互相関を求め、相互相関の各ピークとなる、その出力
チャネル信号間の各時間差を上記各チャネル間時間差と
して求めることを特徴とする記録媒体。40. The recording medium according to claim 39, wherein the inter-channel time difference detection step obtains the cross-correlation between the output channel signals and takes, as the inter-channel time differences, the time differences between the output channel signals at which the cross-correlation peaks.

【請求項４１】請求項４０の記録媒体において、上記帯域別チャネル間時間差は、上記各チャネル間時間
差中の、上記分割された各出力チャネルの同一帯域の成
分の位相差と対応する時間と最も近いものを求めて、そ
の帯域別チャネル間時間差とすることを特徴とする記録
媒体。41. The recording medium according to claim 40, wherein the band-wise inter-channel time difference is obtained by finding, among the inter-channel time differences, the one closest to the time corresponding to the phase difference between the same-band components of the divided output channel signals, and taking it as the band-wise inter-channel time difference of that band.
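
Claims 40 and 41 describe two steps that can be sketched together: finding the inter-channel time differences as cross-correlation peaks, then matching each band to the nearest candidate through its phase difference. The sketch below is illustrative only; the peak threshold `rel_threshold`, the sign convention (a positive lag means the right channel is delayed), and the function names are assumptions, and phase wrapping beyond one cycle is not handled.

```python
import math


def cross_correlation_peaks(left, right, max_lag, rel_threshold=0.5):
    """Lags (in samples) at which the cross-correlation has a peak."""
    def corr(lag):
        return sum(left[n] * right[n + lag]
                   for n in range(len(left))
                   if 0 <= n + lag < len(right))

    lags = list(range(-max_lag, max_lag + 1))
    values = [corr(lag) for lag in lags]
    floor = rel_threshold * max(values)  # assumed rule to ignore minor peaks
    return [lags[i] for i in range(1, len(lags) - 1)
            if values[i] > values[i - 1]
            and values[i] >= values[i + 1]
            and values[i] >= floor]


def band_time_difference(candidates_s, phase_diff_rad, freq_hz):
    """Pick the candidate closest to the time implied by the band's phase."""
    t_phase = phase_diff_rad / (2.0 * math.pi * freq_hz)
    return min(candidates_s, key=lambda tau: abs(tau - t_phase))


# The right channel is the left channel delayed by 3 samples.
left = [0, 0, 1.0, 0.5, 0, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1.0, 0.5, 0, 0, 0]
peaks = cross_correlation_peaks(left, right, max_lag=5)
```

With several sources the cross-correlation shows one peak per source, so `peaks` would hold several lags; dividing each by the sampling rate gives the candidate list passed to `band_time_difference`.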

【請求項４２】請求項３８の記録媒体において、上記帯域別チャネル間パラメータ値差検出過程における
上記パラメータ値は音源からの音響信号が上記各マイク
ロホンに到達した時の信号レベルであり、上記帯域別チ
ャネル間パラメータ値差は各分割された出力チャネル信
号の対応帯域間のレベル差である帯域別チャネル間レベ
ル差であって、上記プログラムは上記各マイクロホンの出力チャネル信
号間のレベル差を、チャネル間レベル差として検出する
チャネル間レベル差検出過程と、上記チャネル間レベル差と、対応する帯域別チャネル間
レベル差の全てと比較する比較過程と、その比較過程で
分割帯域の所定数以上が同様の関係にあれば、上記チャ
ネル間レベル差にもとづき、対応する出力チャネル信号
の全帯域について同一の音源から入力された信号である
と判定し、上記比較過程で所定値以上が同様の関係にな
ければ、上記帯域別にいずれの音源から入力された信号
であるかを判定する、上記音源信号判定過程と実行する
過程とを有することを特徴とする記録媒体。42. The recording medium according to claim 38, wherein the parameter value in the band-wise inter-channel parameter value difference detection step is the signal level of the acoustic signal from a sound source when it reaches each microphone, the band-wise inter-channel parameter value difference is a band-wise inter-channel level difference, that is, the level difference between corresponding bands of the divided output channel signals, and the program comprises: an inter-channel level difference detection step of detecting the level difference between the output channel signals of the microphones as an inter-channel level difference; a comparison step of comparing the inter-channel level difference with all of the corresponding band-wise inter-channel level differences; and a step of, when the comparison step finds that a predetermined number or more of the divided bands have the same relation, determining based on the inter-channel level difference that all bands of the corresponding output channel signal are signals input from the same sound source, and, when the predetermined number or more do not have the same relation, executing the sound source signal determination step, which determines band by band from which sound source the signal was input.

【請求項４３】請求項３８の記録媒体において、上記プログラムは上記パラメータ値は音源からの音響信
号が上記マイクロホンに到達するまでの時間と、その音
響信号が到達した時の信号レベルであり、上記帯域別チ
ャネル間パラメータ値差として帯域別チャネル間時間差
と、帯域別チャネル間レベル差が求められ、各音源からの音響信号が上記各マイクロホンに到達する
までの時間のマイクロホン間の差を、各マイクロホンの
出力チャネル信号から、チャネル時間差として検出する
チャネル間時間差検出過程と、上記チャネル間時間差を基準にして、上記分割された各
出力チャネル信号を、低域、中域、高域の３つの周波数
領域に分ける領域分割過程とを有し、上記音源信号判定過程は、上記分割された低域の周波数帯域については、上記帯域
別チャネル間時間差を利用して対応する帯域の分割され
た各出力チャネル信号の何れがいずれの音源からの入力
信号であるか判定する過程と、上記分割された中域の周波数帯域については、上記帯域
別チャネル間レベル差と、上記帯域別チャネル間時間差
を利用して、対応する帯域の分割された各出力チャネル
信号の何れがいずれの音源からの入力信号であるか判定
する過程と、上記分割された高域の周波数帯域については、上記帯域
別チャネル間レベル差を利用して、対応する帯域の分割
された各出力チャネル信号の何れがいずれの音源からの
入力信号であるか判定する過程とからなることを特徴と
する記録媒体。43. The recording medium according to claim 38, wherein, in the program, the parameter values are the time taken for the acoustic signal from a sound source to reach the microphones and the signal level of the acoustic signal when it arrives, and a band-wise inter-channel time difference and a band-wise inter-channel level difference are obtained as the band-wise inter-channel parameter value differences, the program comprising: an inter-channel time difference detection step of detecting, from the output channel signals of the microphones, the difference between microphones in the time taken for the acoustic signal from each sound source to reach each microphone, as an inter-channel time difference; and a region division step of dividing the divided output channel signals into three frequency regions, a low range, a middle range, and a high range, with the inter-channel time difference as the reference, wherein the sound source signal determination step comprises: a step of determining, for the frequency bands in the low range, which of the divided output channel signals of the corresponding band is an input signal from which sound source by using the band-wise inter-channel time difference; a step of determining, for the frequency bands in the middle range, which of the divided output channel signals of the corresponding band is an input signal from which sound source by using the band-wise inter-channel level difference and the band-wise inter-channel time difference; and a step of determining, for the frequency bands in the high range, which of the divided output channel signals of the corresponding band is an input signal from which sound source by using the band-wise inter-channel level difference.

【請求項４４】請求項３８乃至４３の何れかの記録媒
体において、上記プログラムは上記帯域分割された各出力チャネル信
号の帯域別レベルをそれぞれ検出する帯域別レベル検出
過程と、その帯域別レベル検出過程が検出された各帯域別レベル
を同一帯域についてチャネル間で比較した結果にもとづ
き発音をしていない音源を検出する音源状態判定過程
と、その音源状態判定過程で得た発音をしていない音源の検
出信号により、上記音源合成過程で合成された音源信号
のうち、上記発音していない音源と対応する信号を抑圧
する信号抑圧過程とを有することを特徴とする記録媒
体。44. The recording medium according to any one of claims 38 to 43, wherein the program comprises: a band-wise level detection step of detecting the level of each band of the band-divided output channel signals; a sound source state determination step of detecting a sound source that is not sounding, based on the result of comparing the band-wise levels detected in the band-wise level detection step between channels for the same band; and a signal suppression step of suppressing, in accordance with the detection signal of the non-sounding sound source obtained in the sound source state determination step, the signal corresponding to that sound source among the sound source signals synthesized in the sound source synthesis step.

【請求項４５】請求項４４の記録媒体において、上記音源状態判定過程は、上記各帯域別レベルのチャネ
ル間での比較で、最も大きいチャネルを帯域ごとに決定
する過程と、各チャネルごとに最もレベルが大きい帯域の数を求める
過程と、上記最もレベルが大きい帯域の数が第１基準値を越える
か否か判定する第１判定過程と、その第１判定過程で第１基準値を越えると判定すると、
その越えた最もレベルが大きい帯域の数と対応するチャ
ネルのマイクロホン位置から、発音している１個の音源
を推定する過程と、その推定された音源以外の音源を発音していないものと
して検出する過程とを有することを特徴とする記録媒
体。45. The recording medium according to claim 44, wherein the sound source state determination step comprises: a step of determining, for each band, the channel with the highest level by comparing the band-wise levels between channels; a step of obtaining, for each channel, the number of bands in which that channel has the highest level; a first determination step of determining whether the number of highest-level bands exceeds a first reference value; a step of, when the first determination step determines that the first reference value is exceeded, estimating one sounding sound source from the microphone position of the channel whose number of highest-level bands exceeded the reference value; and a step of detecting the sound sources other than the estimated one as not sounding.

【請求項４６】請求項４５の記録媒体において、上記プログラムは上記第１判定過程で、第１基準値を越
えるものがないと判定されると、上記最もレベルが大き
い帯域の数が、上記第１基準値よりも小さい第２基準値
以下か否かを判定する第２判定過程と、その第２判定過程で、第２基準値より小さいと判定され
ると、その小さいと判定された最もレベルが大きい帯域
の数と対応するチャネルのマイクロホン位置から、発音
していない１個の音源として検出する過程とを有するこ
とを特徴とする記録媒体。46. The recording medium according to claim 45, wherein the program comprises: a second determination step of determining, when the first determination step determines that no channel exceeds the first reference value, whether the number of highest-level bands is equal to or less than a second reference value that is smaller than the first reference value; and a step of, when the second determination step determines that the number is smaller than the second reference value, detecting one sound source as not sounding from the microphone position of the channel corresponding to that number of highest-level bands.

【請求項４７】請求項３８乃至４３の何れかの記録媒
体において、上記プログラムは上記帯域分割された各出力チャネル信
号のそのマイクロホンへの到達時間差を同一帯域ごと検
出する帯域別時間差検出過程と、この帯域別時間差検出過程で検出された各帯域別到達時
間差を、同一帯域についてチャネル間で比較した結果に
もとづき、発音をしていない音源を検出する音源状態判
定過程と、その音源状態判定過程で得た発音をしていない音源を検
出した検出信号により、上記音源合成過程で合成された
音源信号のうち、上記発音していない音源と対応する信
号を抑圧する信号抑圧過程とを有することを特徴とする
記録媒体。47. The recording medium according to any one of claims 38 to 43, wherein the program comprises: a band-wise time difference detection step of detecting, for each identical band, the difference in arrival time at the microphones of the band-divided output channel signals; a sound source state determination step of detecting a sound source that is not sounding, based on the result of comparing the band-wise arrival time differences detected in the band-wise time difference detection step between channels for the same band; and a signal suppression step of suppressing, in accordance with the detection signal produced when the sound source state determination step detects a non-sounding sound source, the signal corresponding to that sound source among the sound source signals synthesized in the sound source synthesis step.

48. The recording medium according to claim 47, wherein the sound source state determination step comprises: a step of determining, for each band, the channel at which the sound source signal arrived earliest, by the band-by-band arrival time difference comparison; a first determination step of determining, for each channel, whether the number of bands in which that channel arrived earliest exceeds a first reference value; a step of estimating, when the first determination step determines that the first reference value is exceeded, one sound source that is sounding from the microphone position of the channel corresponding to that number of earliest-arrival bands; and a step of detecting the sound sources other than the estimated sound source as not sounding.
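The first determination of claim 48 can be sketched as follows, assuming one candidate sound source near each microphone and a table of per-band arrival times for each channel. The function names and data layout are hypothetical and only illustrate the counting logic.

```python
from collections import Counter


def earliest_channel_per_band(arrival_times):
    """Return, for each band, the index of the channel that the sound reached first.

    arrival_times[ch][band] is the arrival time of that band at channel ch.
    Ties go to the lowest channel index (an arbitrary choice for this sketch).
    """
    n_channels = len(arrival_times)
    n_bands = len(arrival_times[0])
    return [
        min(range(n_channels), key=lambda ch: arrival_times[ch][band])
        for band in range(n_bands)
    ]


def first_determination(arrival_times, first_reference):
    """Estimate the sounding source and flag all others as not sounding.

    Returns (sounding_channel, silent_channels). If no channel's count of
    earliest-arrival bands exceeds first_reference, returns (None, set()).
    """
    counts = Counter(earliest_channel_per_band(arrival_times))
    n_channels = len(arrival_times)
    for ch in range(n_channels):
        if counts[ch] > first_reference:
            silent = {other for other in range(n_channels) if other != ch}
            return ch, silent
    return None, set()
```

For example, if channel 0 is earliest in three of four bands and the first reference value is 2, the source at microphone 0 is taken as sounding and the source at microphone 1 is flagged as not sounding.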

49. The recording medium according to claim 48, wherein the program further comprises: a second determination step of determining, when the first determination step determines that no channel exceeds the first reference value, whether the number of earliest-arrival bands is smaller than a second reference value that is itself smaller than the first reference value; and a step of detecting, when the second determination step determines that the number is smaller than the second reference value, one sound source that is not sounding from the microphone position of the channel corresponding to that number of earliest-arrival bands.
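The second determination of claim 49 can be sketched in the same way. The input is assumed to be the per-channel count of earliest-arrival bands. When several channels fall below the second reference value, this sketch picks the one with the smallest count, which is an assumption rather than something the claim specifies.

```python
def second_determination(band_counts, first_reference, second_reference):
    """Return the channel judged not sounding, or None if none qualifies.

    band_counts[ch] is the number of bands in which channel ch arrived first.
    The step applies only when no count exceeds first_reference.
    """
    if second_reference >= first_reference:
        raise ValueError("second_reference must be smaller than first_reference")
    if any(count > first_reference for count in band_counts):
        # The first determination already succeeded; this step is skipped.
        return None
    below = [ch for ch, count in enumerate(band_counts) if count < second_reference]
    if not below:
        return None
    # Assumption: report the channel with the fewest earliest-arrival bands.
    return min(below, key=lambda ch: band_counts[ch])
```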

50. The recording medium according to claim 46 or 49, wherein, where there are four or more sound sources, the program comprises a step of, when the second determination step determines that the number is smaller than the second reference value, successively increasing the second reference value within a range not exceeding the first reference value and repeating the same determination as in the second determination step up to (M-2) times, where M is the number of sound sources.
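The repeated determination of claim 50 can be sketched as a loop over a rising threshold. The fixed increment `step`, and the choice to flag one channel per pass, are assumptions made for illustration, since the claim states only that the second reference value is successively increased without exceeding the first.

```python
def repeated_second_determination(band_counts, first_reference, second_reference, step):
    """Flag channels as not sounding while raising the second reference value.

    band_counts[ch] is the number of bands won by channel ch (highest level
    or earliest arrival, depending on the parent claim). The initial
    determination is followed by at most (M - 2) repeats, where M is the
    number of sound sources, here taken equal to len(band_counts).
    """
    n_sources = len(band_counts)
    remaining = set(range(n_sources))
    silent = []
    threshold = second_reference
    for _ in range(1 + max(n_sources - 2, 0)):
        if threshold > first_reference:
            break  # the second reference value may not exceed the first
        below = [ch for ch in sorted(remaining) if band_counts[ch] < threshold]
        if not below:
            break  # nothing fell below the threshold, so stop repeating
        channel = min(below, key=lambda ch: band_counts[ch])
        silent.append(channel)
        remaining.discard(channel)
        threshold += step
    return silent
```

With four sources, counts `[10, 1, 3, 6]`, a first reference value of 8, a second reference value of 2, and a step of 2, the loop flags channel 1, then channel 2, and then stops because no remaining count is below 6.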

51. The recording medium according to any one of claims 44 to 50, wherein the program comprises: a full-band level detection step of detecting the level of all frequency components of each output channel signal; and a third determination step of determining whether the all-frequency-component levels of all the channels detected in the full-band level detection step are equal to or less than a third reference value and, when any of them is determined not to be equal to or less than the third reference value, proceeding to the sound source state determination step.
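The gating described in claim 51 can be sketched as below. The full-band level is assumed here to be the RMS of the time-domain samples of each channel. The claim does not fix the level measure, so this choice and the function names are hypothetical.

```python
import math


def full_band_level(samples):
    """Return the RMS level of one channel over all frequency components."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def should_run_state_determination(channels, third_reference):
    """Third determination: proceed only if some channel exceeds the reference.

    When every channel level is at or below third_reference, all sources are
    treated as silent and the sound source state determination is skipped.
    """
    return any(full_band_level(channel) > third_reference for channel in channels)
```

This keeps the band-counting logic from running on frames that contain only background noise, where the per-band comparisons would be unreliable.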

52. The recording medium according to any one of claims 47 to 50, wherein, in the program, the channel time difference detection step also serves as the band-by-band time difference detection step.