1262433

the energy of the source signal, finally comparing the energy at the beamformer output to determine the angle of the sound source. Although this method can be used in noisy environments, it still cannot handle a source blocked by an object, or sources in the same direction but at different distances, and it can only be used when the microphones are matched to one another.

4. U.S. Patent No. 6,243,471, by Brandstein et al., proposes using two or more microphones as a group and applying simple geometric relations to recover three-dimensional spatial information; with a plurality of such groups, a plurality of three-dimensional estimates can be produced, so the sound-source direction can be estimated without the problems of blocking or of sources in the same direction but at different distances.
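The "simple geometric relations" step just described can be illustrated with a minimal far-field sketch: an inter-microphone time delay implies an arrival angle through θ = arccos(c·τ/d). This is a generic illustration, not the patent's actual procedure, and the function name and parameters are hypothetical:

```python
import numpy as np

def angle_from_delay(tau_s, spacing_m, c=343.0):
    """Far-field geometric relation: a delay tau_s (seconds) between
    two microphones spacing_m apart implies an arrival angle
    theta = arccos(c * tau / d), returned here in degrees.
    Illustrative only -- not the patent's method."""
    # Clamp for numerical safety before arccos.
    cos_theta = np.clip(c * tau_s / spacing_m, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

# A source broadside to the pair arrives simultaneously (zero delay).
print(angle_from_delay(0.0, 0.1))  # → 90.0
```

With several such pairs (groups), the individual angle estimates can be intersected to recover a three-dimensional position, which is the geometric idea the patent builds on.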
However, when this method is used in a complicated environment, the number of microphone arrays required becomes considerable. Moreover, because it relies on time-difference-of-arrival (TDOA) angle localization, angular errors arise under strong reflection and during transients, and solving with simple geometric relations magnifies those inaccuracies. Although computing the variance under a Gaussian-distribution assumption can be used to build different weights and reduce the error, once noise is present the Gaussian assumption cannot hold for noise and source at the same time, so the estimated angle is in error; the method also requires the microphones to be matched to one another.

5. U.S. Patent No. 5,778,082, by Chu et al., proposes roughly classifying the sound to be estimated in order to find the noise-only segments, estimating the cross-correlation matrix of the noise in advance, and then subtracting that matrix from the cross-correlation matrix of the sound source to be estimated, so as to cancel the influence of the noise. However, because this method is not designed around speech detection, it cannot reliably follow the occurrence of speech; moreover, if the noise cross-correlation matrix is estimated inaccurately, the result will be in error. It also cannot distinguish a blocked source and must use matched microphones.

6. U.S. Patent No. 5,465,302 proposes using a plurality of microphones, computing the time delay between each pair, and then applying a non-planar-wave assumption to calculate the position of the sound source relative to the microphone array.
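The pairwise time-delay computation just described can be sketched as the lag of the cross-correlation peak between two channels. This is a generic textbook estimate, not the patent's exact procedure; the function name and signal values are illustrative:

```python
import numpy as np

def pairwise_delay(x, y, fs):
    """Estimate the delay of channel y relative to channel x
    (in seconds) as the lag of their cross-correlation peak.
    Illustrative sketch of the pairwise time-delay step."""
    corr = np.correlate(y, x, mode="full")
    lag = np.argmax(corr) - (len(x) - 1)  # convert index to signed lag
    return lag / fs

# Synthetic check: y is x delayed by 5 samples at 8 kHz.
fs = 8000
rng = np.random.default_rng(1)
x = rng.normal(size=1024)
y = np.roll(x, 5)
print(pairwise_delay(x, y, fs))  # → 0.000625
```

In practice the delays from all microphone pairs are combined (here under the non-planar-wave assumption) to solve for the source position relative to the array.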
Although this method can estimate the position of the sound source, it requires the microphones to be matched to one another, and it cannot cope with environmental noise, reflections, or obstacles between the sound source and the microphone array.

7. U.S. Patent No. 4,333,170, by Mathews et al., proposes using the phase difference slope to calculate the angular relation between the sound source and the microphone array, using the magnitude of the signal spectrum to find suitable frequencies for grouping. Because this method produces errors when the sound and the noise occupy the same frequency band, the estimation result is inaccurate; it likewise requires matched microphones and cannot distinguish a blocked sound source, or sources at the same angle but at different distances.
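The phase-difference-slope idea rests on the fact that, for a far-field source, the inter-microphone phase difference grows linearly with frequency, Δφ(f) = 2πfτ, so the time delay τ falls out of the slope of a line fit. The sketch below is a generic illustration under that assumption, not the patent's implementation:

```python
import numpy as np

def delay_from_phase_slope(freqs_hz, phase_diff_rad):
    """Least-squares slope of the inter-microphone phase difference
    versus frequency; since dphi/df = 2*pi*tau, dividing the slope
    by 2*pi gives the time delay tau. Illustrative sketch only."""
    slope = np.polyfit(freqs_hz, phase_diff_rad, 1)[0]
    return slope / (2 * np.pi)

# Synthetic check: a 0.2 ms delay yields phase differences 2*pi*f*tau.
f = np.linspace(300, 3000, 20)
tau_true = 2e-4
print(delay_from_phase_slope(f, 2 * np.pi * f * tau_true))  # ≈ 2e-4
```

The delay recovered this way then maps to an angle exactly as in the geometric relation above; the weakness the text notes is that noise sharing the source's frequency band corrupts the measured phase differences and hence the fitted slope.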
8. In "Robust Joint Audio-Video Localization in Video Conferencing Using Reliability Information," published by Lo et al. in IEEE Transactions on Instrumentation and Measurement, Vol. 53, No. 4, August 2004, image and sound signals are used together to localize the speaker. The audio-localization part uses only a simple delay-and-sum beamformer to compute the sound energy of different sections and takes the maximum-energy section as the probable speaker position, which is then fused with the image-based judgment to localize the speaker. However, the method requires a circular microphone array placed in the middle of the speakers in order to separate
the sound energy of the different sections; since those sections are divided radially from the center of the microphone array, the arrangement is inconvenient to use, and achieving more stable results requires integration with the image, so the system architecture is complicated and expensive, which discourages its use.

9. In "A Unified Neural-Network-Based Speaker Localization Technique," published by Guner Arslan et al. in IEEE Transactions on Neural Networks, Vol. 11, No. 4, July 2000, a neural-network-based technique is used for sound-source localization. When the signal-to-noise ratio (SNR) exceeds 20 dB it performs well even at large angles, and it can be applied in both near-field and far-field situations. However, it cannot be used when the ambient noise is strong, it cannot solve the problems of blocked objects or of sources in the same direction but at different distances, and it requires the microphones to be matched to one another.

10. In "Array Optimization Applied in the Near Field of a Microphone Array," published by James G. Ryan et al. in IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 2, March 2000, it is shown that the best results are obtained when the inter-microphone spacing equals half a wavelength (d = λ/2), so the case d = λ/2 is treated separately from other spacings when localizing. This method works best when the sound source is in the near field and the noise is in the far field; it likewise cannot solve the problems of blocked objects or of sources in the same direction but at different distances, and it requires matched microphones.
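The half-wavelength design rule d = λ/2 is a one-line computation from a chosen design frequency. The sketch below is illustrative only (the 1 kHz design frequency is an assumption, not a value from the paper):

```python
def half_wavelength_spacing(freq_hz, c=343.0):
    """Microphone spacing d = lambda / 2 = c / (2 * f) for a given
    design frequency. Illustrative helper, not from the paper."""
    return c / (2.0 * freq_hz)

# For a design frequency of 1 kHz this gives about 17 cm spacing.
print(half_wavelength_spacing(1000.0))  # → 0.1715
```

Spacings larger than λ/2 at the frequency of interest introduce spatial aliasing (ambiguous steering directions), which is why the half-wavelength spacing is the preferred operating point.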
11. In "A Biomimetic System for Localization and Separation of Multiple Sound Sources," published by Huang et al. in IEEE Transactions on Instrumentation and Measurement, Vol. 44, No. 3, June 1995, a method based on Arrival Temporal Disparities (ATD) is proposed to calculate the sound-source angle. The method must detect the onset of the sound source and use that onset