TWI262433B - Voice locating system - Google Patents

Voice locating system

Info

Publication number
TWI262433B
Authority
TW
Taiwan
Prior art keywords
voice
module
statistical
signal
update
Prior art date
Application number
TW94110648A
Other languages
Chinese (zh)
Other versions
TW200636561A (en)
Inventor
Jwu-Sheng Hu
Wei-Han Liou
Jie-Cheng Jeng
Original Assignee
Univ Nat Chiao Tung
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Chiao Tung
Priority to TW94110648A
Application granted
Publication of TWI262433B
Publication of TW200636561A

Landscapes

  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention relates to a voice locating system. A microphone array receives a sound signal and passes it to a voice detection subsystem, which determines the operating-state flow of the system. The sound signal is then passed to an environmental parameter training subsystem, which simulates a training signal according to the additive principle and derives statistical parameters from that training signal by extracting features and describing their distribution. A voice location detection subsystem then uses the sound signal captured by the microphone array, together with the statistical parameters, to detect the location of the speaker.
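The additive principle mentioned in the abstract can be illustrated with a minimal sketch: a stored speaker reference signal and a stored environmental noise signal are summed sample by sample to form a simulated training signal. The function name `mix_additive` and the scaling of the noise to a chosen signal-to-noise ratio are assumptions made for illustration; the patent text only states that the signals are combined additively.

```python
import math

def mix_additive(speech, noise, snr_db):
    """Return speech + scaled noise so that the mix has the requested SNR in dB.

    Hypothetical helper: the SNR-based scaling is an assumption, not taken
    from the patent, which only describes an additive combination.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        # Nothing to add: the training signal is the clean reference.
        return list(speech)
    # Choose the gain so that p_speech / (gain^2 * p_noise) == 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

In a multi-microphone setting the same mix would be applied per channel, so that the phase relationships between microphones are preserved in both the reference and the noise recordings.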

Description

... the energy of the source signal; finally, the energies obtained after the beamformer are compared to determine the angle of the sound source. Although this method can be used in noisy environments, it still cannot handle a source that is blocked by an object, or sources that lie in the same direction at different distances, and it can only be used when the microphones are matched to one another.

4. U.S. Patent No. 6,243,471 by Brandstein et al. proposes treating two or more microphones as a group and using simple geometric relations to obtain three-dimensional spatial information. With several groups, several pieces of three-dimensional information can be produced, so the position of the sound source can be estimated without the problems of blocking objects or of sources in the same direction at different distances. In a complex environment, however, the number of microphone arrays required is considerable. Because the method relies on time-difference-of-arrival (TDOA) angle estimation, angular errors arise under strong reflections and during transients, and solving with simple geometric relations amplifies these inaccuracies into larger errors. The variance can be combined with a Gaussian distribution assumption to produce different weights and reduce the error, but when noise is present the Gaussian assumption does not hold for noise and the sound source together, so the estimated angle is inaccurate. The method also requires matched microphones.

5. U.S. Patent No. 5,778,082 by Chu et al. proposes using a simple sound source detector to decide whether the sound to be estimated is present, in order to find the noise segments and estimate the cross-correlation matrix of the noise in advance. This matrix is then subtracted from the cross-correlation matrix of the sound source to be estimated, which cancels the influence of the noise. Because the method is not designed for speech detection, it cannot reliably detect when speech occurs. In addition, if the noise cross-correlation matrix is estimated inaccurately, the estimation result is in error. The method cannot distinguish blocked objects and requires matched microphones.

6. U.S. Patent No. 5,465,302 proposes computing the time delay for each pair among a plurality of microphones and then using a non-planar-wave assumption to compute the position of the sound source relative to the microphone array. The method can estimate the position of the sound source, but it requires matched microphones and cannot deal with environmental noise, reflections, or obstacles between the sound source and the microphone array.

7. U.S. Patent No. 4,333,170 by Mathews et al. proposes using the phase difference slope, and processing it to compute the angle of the sound source relative to the microphone array, while using the intensity of the signal spectrum to select suitable frequencies for grouping. Errors occur when the sound and the noise occupy the same frequency band, which makes the estimate inaccurate. The method also requires matched microphones and cannot distinguish a blocked sound source, or sound sources at the same angle but different distances.
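The time-difference-of-arrival approach discussed in the prior art above can be sketched for a single microphone pair. This is an illustrative far-field example, not the method of any cited patent: the function names, the speed-of-sound constant, and the brute-force cross-correlation search are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def estimate_delay(x, y, max_lag):
    """Lag in samples by which y trails x, taken at the cross-correlation peak."""
    best_lag, best_score = 0, float("-inf")
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_to_angle(lag, sample_rate, spacing):
    """Far-field arrival angle in degrees from broadside for one microphone pair."""
    s = SPEED_OF_SOUND * lag / (sample_rate * spacing)
    s = max(-1.0, min(1.0, s))  # clamp against rounding and measurement error
    return math.degrees(math.asin(s))
```

The sketch also shows why the prior-art discussion stresses reflections: a reflected copy of the signal adds a second peak to the cross-correlation, and a wrong peak choice changes the lag and therefore the angle.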

8. Lo et al., in "Robust Joint Audio-Video Localization in Video Conferencing Using Reliability Information," IEEE Transactions on Instrumentation and Measurement, vol. 53, no. 4, August 2004, propose a method that uses image and sound signals together to locate the speaker. The audio part of the method uses only a simple delay-and-sum beamformer to compute the sound energy of different sections and takes the section with the maximum energy as the likely speaker position, which is then fused with the decision from the image part. The method requires a circular microphone array placed in the middle of the speakers in order to separate the sound energy of the different sections, which are divided radially from the center of the array, so it is inconvenient to use. Stable results require integration with the image part, and the system architecture is complex and expensive.

9. Guner Arslan et al., in "A Unified Neural-Network-Based Speaker Localization Technique," IEEE Transactions on Neural Networks, vol. 11, no. 4, July 2000, use a neural-network-based technique for sound source localization. When the signal-to-noise ratio (SNR) exceeds 20 dB, localization works well even at large angles, and the technique can be applied in both the near field and the far field. It cannot be used when the ambient noise is loud, it cannot handle blocking objects or sources in the same direction at different distances, and it requires matched microphones.

10. James G. Ryan et al., in "Array Optimization Applied in the Near Field of a Microphone Array," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 2, March 2000, report that the best performance is obtained when the distance between microphones equals half a wavelength (d = λ/2), so two cases are discussed separately when performing localization. The method works best when the sound source is in the near field and the noise is in the far field. It cannot handle blocking objects or sources in the same direction at different distances, and it requires matched microphones.
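The delay-and-sum energy scan described in item 8 can be sketched as follows. This is a simplified illustration, not the implementation of Lo et al.: it assumes a uniform linear array rather than a circular one, a far-field source, and integer-sample delays, and all names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def steering_delays(num_mics, spacing, angle_deg, sample_rate):
    """Integer sample delays per microphone for a far-field source at angle_deg."""
    step = spacing * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND * sample_rate
    return [round(m * step) for m in range(num_mics)]

def beam_energy(channels, delays):
    """Energy of the delay-and-sum output after undoing each channel's delay."""
    n = len(channels[0])
    energy = 0.0
    for t in range(n):
        total = 0.0
        for ch, d in zip(channels, delays):
            k = t + d  # channel with delay d holds sample t of the source at t + d
            if 0 <= k < n:
                total += ch[k]
        energy += total * total
    return energy

def scan(channels, spacing, sample_rate, angles):
    """Return the candidate angle whose steered beam has the largest energy."""
    return max(
        angles,
        key=lambda a: beam_energy(
            channels, steering_delays(len(channels), spacing, a, sample_rate)
        ),
    )
```

When the steering delays match the true propagation delays, the channels add coherently and the output energy peaks, which is why the maximum-energy section is taken as the likely speaker direction.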

11. Huang et al., in "A Biomimetic System for Localization and Separation of Multiple Sound Sources," IEEE Transactions on Instrumentation and Measurement, vol. 44, no. 3, June 1995, propose computing the angle of the sound source from "arrival temporal disparities" (ATD). The method must detect the onset of the sound source and use that onset ...

Claims (1)

X. Scope of the patent application:

1. A voice locating system, comprising: a microphone array for receiving sound signals; a voice detection subsystem for determining the operating-state flow of the system; an environmental parameter training subsystem for simulating a training signal from the sound signals according to the additive principle, and for producing a statistical parameter from the training signal by extracting features and describing the distribution of those features; and a voice position detection subsystem for detecting the position of the speaker from the sound signal captured by the microphone array and the statistical parameter, and for using the estimation result to update the feature statistical parameterization module through a statistical parameter update module.

2. The voice locating system according to claim 1, wherein the microphone array comprises at least two microphones.

3. The voice locating system according to claim 1, wherein the environmental parameter training subsystem comprises a memory for a speaker reference signal, a memory for an environmental noise signal, a combiner, a phase-difference feature extraction module, and a feature statistical parameterization module.

4. The voice locating system according to claim 3, wherein the feature statistical parameterization module is one of a Gaussian mixture model (GMM) and a kernel-based model.

5. The voice locating system according to claim 1, wherein the voice position detection subsystem comprises a position detection module and a statistical parameter update module.

[Table: operating states of the statistical parameter update module, the position detection module, the feature statistical parameterization module, and the phase-difference feature extraction module, with entries marked "execute", "update", or "X"; the column layout is not recoverable from the source.]

[Table: microphone pair combinations per frequency band. Columns: band number; microphone pair combination; number of microphone pair combinations; frequency range of the band. Recoverable entries:
  Band 1 (b = 1): (m, m + M - 1), where m = 1
  Band 2 (b = 2): (m, m + M - 2)
  Band M - 1 (b = M - 1): (m, m + 1), where 1 ≤ m ≤ M - 1
The entries for the number of combinations and the frequency ranges are not recoverable from the source.]

[Table: comparison of the present invention with the prior art (US 6,826,284; US 2004/0013275; US 6,449,593; US 6,243,471; US 5,778,082; US 5,465,302; US 4,333,170; Lo et al.; Guner Arslan et al.; James G. Ryan et al.; Huang et al.) on criteria that include robustness in noisy environments and applicability to the near field and the far field; the individual entries are not recoverable from the source.]
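The band table in the claims section assigns microphone pairs of the form (m, m + M - b) to band b, with m = 1 for band 1 and 1 ≤ m ≤ M - 1 for band M - 1. A minimal sketch of that assignment follows. The range 1 ≤ m ≤ b for the intermediate bands is an assumption inferred from the two legible rows, and the function name is hypothetical.

```python
def pairs_for_band(band, num_mics):
    """Return the 1-based microphone pairs (m, m + M - b) assigned to band b.

    Assumes 1 <= m <= b for every band, which matches the legible rows for
    band 1 and band M - 1 but is not confirmed for the bands in between.
    """
    if not 1 <= band <= num_mics - 1:
        raise ValueError("band must be between 1 and num_mics - 1")
    offset = num_mics - band  # spacing between the two microphones, in array positions
    return [(m, m + offset) for m in range(1, band + 1)]
```

Under this reading, band 1 uses the single widest pair and band M - 1 uses all adjacent pairs, so the microphone spacing shrinks as the band index grows.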
TW94110648A 2005-04-01 2005-04-01 Voice locating system TWI262433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW94110648A TWI262433B (en) 2005-04-01 2005-04-01 Voice locating system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW94110648A TWI262433B (en) 2005-04-01 2005-04-01 Voice locating system

Publications (2)

Publication Number Publication Date
TWI262433B TWI262433B (en) 2006-09-21
TW200636561A TW200636561A (en) 2006-10-16

Family

ID=37987726

Family Applications (1)

Application Number Title Priority Date Filing Date
TW94110648A TWI262433B (en) 2005-04-01 2005-04-01 Voice locating system

Country Status (1)

Country Link
TW (1) TWI262433B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI406266B (en) * 2011-06-03 2013-08-21 Univ Nat Chiao Tung Speech recognition device and a speech recognition method thereof

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI736117B (en) * 2020-01-22 2021-08-11 瑞昱半導體股份有限公司 Device and method for sound localization

Also Published As

Publication number Publication date
TW200636561A (en) 2006-10-16

Similar Documents

Publication Publication Date Title
CN107102296B (en) Sound source positioning system based on distributed microphone array
KR101659712B1 (en) Estimating a sound source location using particle filtering
JP5728094B2 (en) Sound acquisition by extracting geometric information from direction of arrival estimation
CN104041075B (en) Audio source location is estimated
CN111474521B (en) Sound source positioning method based on microphone array in multipath environment
KR100877914B1 (en) sound source direction detecting system by sound source position-time difference of arrival interrelation reverse estimation
CN108352818A (en) Audio-signal processing apparatus for enhancing voice signal and method
JP6467736B2 (en) Sound source position estimating apparatus, sound source position estimating method, and sound source position estimating program
RU2012155185A (en) ESTIMATION OF THE DISTANCE USING AUDIO SIGNALS
CN108828501B (en) Method for real-time tracking and positioning of mobile sound source in indoor sound field environment
Griffin et al. Localizing multiple audio sources from DOA estimates in a wireless acoustic sensor network
CN105607042A (en) Method for locating sound source through microphone array time delay estimation
Zhao et al. Real-time sound source localization using hybrid framework
Paulose et al. Acoustic source localization
TWI262433B (en) Voice locating system
Parisi et al. Source localization in reverberant environments by consistent peak selection
KR20090128221A (en) Method for sound source localization and system thereof
Brutti et al. Classification of acoustic maps to determine speaker position and orientation from a distributed microphone network
Svaizer et al. Environment aware estimation of the orientation of acoustic sources using a line array
TW201323838A (en) Method for visualizing sound source energy distribution in reverberant environment
Heckmann et al. Auditory inspired binaural robust sound source localization in echoic and noisy environments
Pasha et al. Informed source location and DOA estimation using acoustic room impulse response parameters
Salvati et al. A real-time system for multiple acoustic sources localization based on ISP comparison
Himawan et al. Clustering of ad-hoc microphone arrays for robust blind beamforming
KR20060124443A (en) Sound source localization method using head related transfer function database

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees