JP2007085734A - Sound source direction detection device and method - Google Patents


Info

Publication number: JP2007085734A
Application number: JP2005271227A
Authority: JP (Japan)
Prior art keywords: sound source, source direction, learning, probability, signal
Legal status: Withdrawn
Other languages: Japanese (ja)
Inventors: Tomoko Matsui (松井知子), Kunihito Tanabe (田邉國士), Toshio Irino (入野俊夫)
Current and original assignee: Research Organization of Information and Systems
Application filed by Research Organization of Information and Systems, with priority to JP2005271227A

Landscapes

  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a sound source direction detection device and method based on a principle quite different from the conventional approach, which uses cross-correlation coefficients of sample data.

SOLUTION: A signal from a sound source 1 is received by both pseudo ears. The received sound signals are converted nonlinearly; for example, an inner ear model is applied to the signals received by the pseudo ears to convert them into inner-ear signals. The sound source direction is then detected by a dual Penalized Logistic Regression Machine (dPLRM) using linear-combination information between a weight matrix W and the inner-ear signal X. The weight matrix W is learned with the dPLRM from a number of training samples, each expressed as a pair of an inner-ear signal and its correct direction.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a sound source direction detection device and a sound source direction detection method. Specifically, it relates to a device and method in which a dual Penalized Logistic Regression Machine (dPLRM), the dual form of a penalized logistic regression machine and a kind of learning machine, is trained on sound source data whose directions are known (training data); the trained learning machine then computes the probability of each sound source direction for unknown sound source data and estimates the direction.

A sound source direction can be physically determined from the time differences between signals received by multiple microphones (a microphone array). Conventional engineering approaches to sound source direction detection rest on this principle: a microphone array is deployed, the cross-correlation coefficients of the received signals are computed, and the direction is estimated from the estimated time differences. Indeed, several detection methods based on parametric models, such as linear prediction models using the correlation matrix of the received signals, have been proposed. All of these methods require computing the sample cross-correlation coefficients or the correlation matrix of the received signals (see Non-Patent Document 1).

With two ears, humans perceive not only the direction of a sound source but also the spatial spread of the sound. The engineering detection methods above cannot handle the sense of spread. Elucidating humans' excellent direction perception is therefore useful for sound source direction detection by machines.

In recent years, methods have been developed that imitate human learning: mathematical models called learning machines, such as neural networks (NN) and support vector machines (SVM), learn their parameters from training data and then make predictions. These methods are widely known to be effective not only for speech but for signal processing and data analysis in general (see Non-Patent Documents 2 and 3).

Penalized Logistic Regression Machines (PLRM), a kind of learning machine, and their dual form (dual Penalized Logistic Regression Machines; dPLRM) differ from NN and SVM in that they can make probabilistic predictions (decisions), and they have been shown to be effective for tasks such as speaker recognition (see Non-Patent Documents 4-9).

Non-Patent Document 1: Juro Ohga, Yoshio Yamazaki, Yutaka Kaneda, "Acoustic Systems and Digital Signal Processing", IEICE, 1995.
Non-Patent Document 2: Hideki Aso, "Neural Network Information Processing", Sangyo Tosho, 1988.
Non-Patent Document 3: Hideki Aso, Koji Tsuda, Noboru Murata, "Statistics of Pattern Recognition and Learning: New Concepts and Methods", Iwanami Shoten, 2003.
Non-Patent Document 4: K. Tanabe, "Penalized logistic regression machines: New methods for statistical prediction 1", ISM Cooperative Research Report 143, pp. 163-194, March 2001.
Non-Patent Document 5: K. Tanabe, "Penalized logistic regression machines: New methods for statistical prediction 2", Proceedings of the 4th Workshop on Information-Based Induction Sciences (IBIS2001), pp. 71-76, July 2001.
Non-Patent Document 6: K. Tanabe, "Penalized logistic regression machines and related linear numerical algebra", RIMS Kokyuroku 1320, Research Institute for Mathematical Sciences, Kyoto University, pp. 239-249, 2003.
Non-Patent Document 7: T. Matsui and K. Tanabe, "Speaker Identification with Dual Penalized Logistic Regression Machine", Proceedings of Odyssey, pp. 363-366, 2004.
Non-Patent Document 8: T. Matsui and K. Tanabe, "Probabilistic Speaker Identification with Dual Penalized Logistic Regression Machine", Proceedings of ICSLP, pp. III-1797-1800, 2004.
Non-Patent Document 9: T. Matsui and K. Tanabe, "Speaker Recognition without Feature Extraction Process", Proceedings of the Workshop on Statistical Modeling Approach for Speech Recognition: Beyond HMM, pp. 79-84, 2004.

However, computing the sample cross-correlation coefficients or the correlation matrix of the received signals demands comparatively large computational resources and time. There has also been growing demand for sound source direction detection by learning machines that come closer to humans' excellent direction perception.

The present invention arises from the finding that dPLRM can be applied to the sound source direction detection problem, and that the direction can be detected using even the simplest first-order polynomial kernel function. Seen from the principle on which the conventional methods described above rely, this is a surprising result. The present invention is provided against the background of this discovery.

Based on the experimental finding that a dPLRM can detect the sound source direction using only linear-combination information of signals obtained by transforming the signals from multiple receivers, it is an object of the present invention to provide a sound source direction detection device and a sound source direction detection method founded on a principle entirely different from the conventional approach that uses cross-correlation coefficients of sample data.

To solve the above problem, the sound source direction detection device 2 according to claim 1 comprises, as shown for example in FIG. 1: a receiving unit 3 that receives signals from a sound source 1 with a plurality of receivers; a signal conversion unit 4 that applies a time-frequency analysis, such as an inner ear model, to the signals received by the receiving unit 3 to convert them into amplitude time-series signals X_L and X_R for each frequency band; and a learning/detection unit 5 comprising learning means 6 that learns and updates the weight matrices W_L^k and W_R^k using learning data, probability calculation means 7 that computes the probability of each sound source direction using linear-combination information of the weight matrices W_L^k, W_R^k and the amplitude time-series signals X_L, X_R, and sound source direction determination means 8 that determines the sound source direction based on the probabilities computed by the probability calculation means 7.

Here, a neural firing pattern is typically used as the amplitude time-series signal for each frequency band, but this is not restrictive; the output of a band-pass filter bank obtained by filter-bank analysis, for example, may be used instead. The receiving unit typically has two receivers, but three or more may be used; correspondingly, the linear combination is not limited to combining two amplitude time-series signals with two weight matrices, and three or more amplitude time-series signals and weight matrices may be combined. With this configuration, a sound source direction detection device can be provided that detects the direction solely from linear-combination information of the signals obtained from multiple receivers, without computing cross-correlations of the sample data.

The invention according to claim 2 is the sound source direction detection device of claim 1 in which, as shown for example in FIG. 1, the learning/detection unit 5 uses a dual Penalized Logistic Regression Machine (dPLRM). With this configuration, using the dPLRM greatly reduces the amount of computation, so the sound source direction can be detected particularly efficiently.

The sound source direction detection method according to claim 3 comprises, as shown for example in FIG. 2: a receiving step (S001) of receiving signals from a sound source with a plurality of receivers; a signal conversion step (S002) of applying a time-frequency analysis, such as an inner ear model, to the received signals to convert them into amplitude time-series signals X_L and X_R for each frequency band; a learning step (S003) of learning and updating the weight matrices W_L^k and W_R^k using training data; a probability calculation step (S004) of computing the probability of each sound source direction using linear-combination information of the weight matrices W_L^k, W_R^k and the amplitude time-series signals X_L, X_R; and a sound source direction determination step (S005) of determining the sound source direction based on the probabilities computed in the probability calculation step.

With this configuration, a sound source direction detection method can be provided that detects the direction solely from linear-combination information of the signals obtained from multiple receivers, without computing cross-correlations of the sample data. When three or more received sound signals are available, the direction can be determined in the same way using their linear combination.

The invention according to claim 4 is the sound source direction detection method of claim 3 in which, as shown for example in FIG. 2, a dual Penalized Logistic Regression Machine (dPLRM) is used in the learning step (S003), the probability calculation step (S004), and the sound source direction determination step (S005). With this configuration, the learning capability of the dPLRM allows the sound source direction to be detected particularly efficiently.

The program according to claim 5 is a program for causing a computer to execute the sound source direction detection method according to claim 3 or claim 4.

According to the present invention, a sound source direction detection device and a sound source direction detection method can be provided that detect the direction solely from linear-combination information of the signals obtained from multiple receivers, without computing cross-correlations of the sample data.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 shows a block diagram of an example configuration of a sound source direction detection device according to an embodiment of the present invention. In FIG. 1, 1 is a sound source and 2 is the sound source direction detection device. The device 2 comprises: a receiving unit 3 that receives the signal from the sound source 1 with a pair of pseudo ears serving as receivers; a signal conversion unit 4 that applies time-frequency analysis to the signals received by the pseudo ears and converts them into inner-ear signals X, i.e., amplitude time-series signals for each frequency band; a learning/detection unit 5 that learns the weight matrices W and detects the sound source direction; a storage unit 9 that stores the detection results of the learning/detection unit 5; a display unit 10 that displays those results; and a control unit 11 that controls each part of the device 2 to carry out the sound source direction detection. The learning/detection unit 5 is implemented as a dual Penalized Logistic Regression Machine (dPLRM). It consists of: learning means 6 that takes training data as input and learns and updates the parameter weight matrices W_L^k and W_R^k; probability calculation means 7 that computes the probability of each sound source direction using linear-combination information of the weight matrices W_L^k, W_R^k and the inner-ear signals X_L, X_R (the amplitude time-series signals for each frequency band); and sound source direction determination means 8 that determines the direction based on the probabilities computed by the probability calculation means 7.

FIG. 2 shows the processing flow of the sound source direction detection method in this embodiment, and FIG. 3 schematically shows the signal flow. The sound source 1 lies somewhere between -90 and +90 degrees around the receiving unit and emits sound such as speech or other audio. The receiving unit 3 receives the signal from the sound source with microphones simulating the two ears (S001). An inner ear model is then applied to simulate the inner-ear output of both ears (the neural firing pattern, or neural activity pattern; NAP), generating the inner-ear signals X (X_L for the left ear, X_R for the right ear) (S002).

FIG. 4 shows an example of the inner-ear signal X. Specifically, X_L and X_R are the neural firing patterns of the left and right inner ears, respectively. Each is a time-frequency representation (a T x C matrix) with C channels and T time samples. The vertical axis represents sound power; the other two axes represent channel C (frequency) and time T (96 ticks correspond to 2 ms).

In the human auditory periphery, frequency analysis is performed in the inner ear (cochlea), and the result is converted into firings of the auditory nerve at locations corresponding to frequencies from several tens of Hz to 20 kHz. This frequency analysis is realized by mechanical vibration of the basilar membrane and can be approximated by a system in which many relatively narrow-band filters are arranged side by side (a filter bank). As inner ear models expressing this, the gammachirp auditory filter and the Meddis inner hair cell model, for example, are often used (see T. Irino and R. D. Patterson, J. Acoust. Soc. Am. 109(5), pp. 2008-2022, 2001, and R. Meddis, J. Acoust. Soc. Am. 83(3), pp. 1056-1063, 1988).
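As a rough illustration of the filter-bank idea only, the sketch below splits a signal into per-band amplitude envelopes with an FFT-domain Gaussian filter bank. This is a crude stand-in, not the patent's model: the gammachirp filters and Meddis hair-cell stage cited above are far more faithful, and the function name, channel count, and bandwidth choice here are all assumptions of this example.

```python
import numpy as np

def filterbank_envelopes(signal, fs, n_channels=8, f_lo=100.0, f_hi=1000.0):
    """Split a signal into per-band amplitude envelopes using an FFT-domain
    Gaussian filter bank.  Returns a T x C array (T = len(signal),
    C = n_channels), analogous to the time-frequency representation in the text."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    centres = np.geomspace(f_lo, f_hi, n_channels)  # log-spaced, like auditory filters
    out = np.empty((len(signal), n_channels))
    for c, fc in enumerate(centres):
        bw = fc / 4.0                               # ad-hoc constant-Q bandwidth
        gain = np.exp(-0.5 * ((freqs - fc) / bw) ** 2)
        band = np.fft.irfft(spectrum * gain, n=len(signal))
        out[:, c] = np.abs(band)                    # rough envelope proxy
    return out

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)                   # a pure tone at 440 Hz
nap = filterbank_envelopes(x, fs)
print(nap.shape)                                    # (8000, 8)
```

For the pure tone, the channels whose centre frequencies lie nearest 440 Hz carry the most energy, which is the kind of place coding the paragraph above describes.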

In the learning/detection unit 5, the learning means 6 first uses a large number of training samples (whose correct directions are known) to learn and update the parameter weight matrices W_L^k and W_R^k based on the dPLRM, so as to increase the penalized likelihood of a dPLRM with a first-order polynomial kernel function (S003). As expressed in (Equation 1) and (Equation 2) below, the weight matrices W represent the weights of the contributions of the inner-ear signals X_L and X_R to the predicted probabilities. Here k is a parameter indicating the direction: for example, dividing -90 to +90 degrees into 10-degree steps yields 19 directions, so k runs from 1 to 19. L and R denote left and right. The weight matrices are therefore represented by 19 T x C matrices. Note that the inner-ear signals (X_L, X_R) are T x C matrices representing neural firing patterns.

Next, the probability calculation means 7 computes the probability of each sound source direction using linear-combination information of the weight matrices W_L^k, W_R^k and the inner-ear signals X_L, X_R (S004). For inner-ear signals (X_L, X_R), the probability that the sound source direction is k is defined by the logistic transform of the linear-combination scores:

[Equations 1 and 2 appear in the original only as images.]

Here W_L^k and W_R^k are the learned T x C weight matrices, and t denotes the transpose.
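Since Equations 1 and 2 survive only as images, the exact formula cannot be reproduced; presumably the logistic transform maps per-direction linear scores, obtained by summing the elementwise product of each weight matrix with the corresponding inner-ear signal over the whole T x C plane, to probabilities via a softmax. A hedged sketch under that assumption (the function name and the exact score form are mine; the shapes follow the text, K = 19 directions, T x C signals):

```python
import numpy as np

def direction_probabilities(WL, WR, XL, XR):
    """Softmax ("logistic transform") over per-direction linear scores.
    WL, WR: assumed learned weights of shape (K, T, C), one T x C matrix
    per direction class k.  XL, XR: inner-ear signals of shape (T, C).
    The score for direction k sums W * X elementwise over the T x C
    time-frequency plane (equivalently, trace of W^t X)."""
    scores = (np.tensordot(WL, XL, axes=([1, 2], [0, 1]))
              + np.tensordot(WR, XR, axes=([1, 2], [0, 1])))
    scores -= scores.max()              # for numerical stability
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.default_rng(0)
K, T, C = 19, 96, 50                    # 19 direction categories, as in the text
WL, WR = rng.normal(size=(2, K, T, C))
XL, XR = rng.normal(size=(2, T, C))
p = direction_probabilities(WL, WR, XL, XR)
print(p.shape)                          # (19,)
```

The output is a probability vector over the 19 direction categories, which is exactly what the direction determination step then inspects.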

As shown in the bottom row of FIG. 3, the probabilities computed by the logistic transform are obtained as probabilities for each direction (-90° to 90°). The sound source direction determination means 8 then determines the sound source direction based on the probabilities computed in the probability calculation step (S005). The calculation results of the probability calculation step and the determination results of the sound source direction determination step are stored in the storage unit 9 and displayed on the display unit 10.

FIG. 5 shows an example of the weight matrices, reduced by time-averaging, after learning with pulse sounds: W_L (top: left ear) and W_R (bottom: right ear). The horizontal axis represents angle, the vertical axis frequency, and brightness corresponds to value (white: 0 to black: 1). In the pulse-sound learning, data segments were extracted at randomly chosen start times, so temporal information could not be exploited and the direction could only be discriminated from frequency information; indeed, no temporal variation was observed in the weight matrices. The direction appears to be estimated from the integral/derivative information of the amplitude-frequency characteristics represented in the weight matrices, which vary continuously with angle.

Next, the learning machine dPLRM, which is designed to perform general discrimination and prediction, is described.

Consider, as an example, the problem of judging the presence or absence of a disease from blood test data in a health examination. Let x_j be the (n-dimensional) vector of subject j's results over n test items, and code the diagnosis as c_j = 1 if healthy, c_j = 2 if undecidable, and c_j = 3 if diseased.

By feeding the data of a large number of subjects (say N) whose diagnoses are known (called the training data set) into the learning machine dPLRM, the dPLRM performs learning internally and generates a discrimination rule. That is, the trained dPLRM yields a probability vector (function) p*(x) ≡ (p_1(x), p_2(x), p_3(x))^t, so that for the test-data vector x (n-dimensional) of an undiagnosed person, the triple of probabilities — healthy p_1(x), undecidable p_2(x), diseased p_3(x) — can be computed as a prediction. In this example the number of classes is 3, but in general it may be any natural number K.

Before explaining the mechanism by which the probability vector (function) p*(x) ≡ (p_1(x), p_2(x), p_3(x))^t is generated, some notation is introduced for convenience of mathematical manipulation. In the example above, healthy (c = 1), undecidable (c = 2), and diseased (c = 3) are coded as the unit vectors (1, 0, 0)^t, (0, 1, 0)^t, and (0, 0, 1)^t, respectively. In general, the N vectors coding the judgments of the training data are arranged side by side as a K x N matrix, defined so that its j-th column vector is the judgment code of subject j; the judgment results of the training data set are stored in this matrix.
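The unit-vector coding above can be sketched directly; the function name is mine, since the patent gives the coding matrix only as an image:

```python
import numpy as np

def code_matrix(labels, n_classes):
    """Build the K x N coding matrix described above: column j is the
    unit vector for subject j's class (labels are 1-based)."""
    Y = np.zeros((n_classes, len(labels)))
    for j, c in enumerate(labels):
        Y[c - 1, j] = 1.0
    return Y

# healthy = 1, undecidable = 2, diseased = 3, for five subjects:
Y = code_matrix([1, 3, 2, 1, 3], n_classes=3)
print(Y.shape)  # (3, 5)
```

Each column sums to 1, with the single 1 marking the subject's class, so the matrix stores exactly the judgment results of the training set.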

In the dPLRM, the probability vector (function) p*(x) ≡ (p_1(x), p_2(x), p_3(x))^t is generally defined as follows. Using the linear map from R^N to R^K

    F(x) = V k(x)    (Equation 4)

and its logistic transform

    p_c(x) = exp(F_c(x)) / Σ_{c'=1}^{K} exp(F_{c'}(x)),   c = 1, ..., K,

p*(x) is defined (modeled). Here the K x N matrix V is the parameter matrix to be estimated from the training data by learning, and k(x) is the linear or nonlinear map from R^n to R^N defined by

    k(x) = (K(x, x_1), ..., K(x, x_N))^t,

where any positive definite kernel function can be used as K(x, y).
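The forward computation just described can be sketched as follows. This is a sketch under stated assumptions, not the patent's implementation: the dual form F(x) = V k(x) followed by a softmax is taken from the definitions above, and the first-order polynomial (linear) kernel of the invention is used as the positive definite kernel.

```python
import numpy as np

def kernel_vector(x, train_X, kernel):
    """k(x) = (K(x, x_1), ..., K(x, x_N))^t over the N training inputs."""
    return np.array([kernel(x, xj) for xj in train_X])

def predict_proba(x, V, train_X, kernel):
    """F(x) = V k(x), followed by the logistic (softmax) transform."""
    F = V @ kernel_vector(x, train_X, kernel)   # shape (K,)
    F -= F.max()                                # numerical stability
    e = np.exp(F)
    return e / e.sum()

def linear_kernel(x, y):
    """The first-order polynomial kernel highlighted by the invention."""
    return float(x @ y)

rng = np.random.default_rng(1)
train_X = rng.normal(size=(6, 4))   # N = 6 training inputs of dimension n = 4
V = rng.normal(size=(3, 6))         # K x N parameter matrix, K = 3 classes
p = predict_proba(rng.normal(size=4), V, train_X, linear_kernel)
print(p.shape)                      # (3,)
```

Any other positive definite kernel (e.g. an RBF) could be passed in place of `linear_kernel` without changing the rest of the model.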

Under the above model definition for the probability vector p*(x), given the training data set {(x_j, c_j) : j = 1, ..., N}, minimizing with respect to V the convex negative log-likelihood function -Σ_{j=1}^{N} log p_{c_j}(x_j) (that is, maximum likelihood estimation) yields the maximum likelihood estimate of V. In general, however, this estimate fits the training data excessively (overlearning), and its predictive discrimination ability on undiscriminated data deteriorates severely. To avoid this phenomenon, the dPLRM introduces a penalty function: V is estimated by minimizing the penalized negative log-likelihood function PL(V), which adds to the negative log-likelihood a penalty term (given in the original only as an image), and learning with excellent predictive performance is thereby realized. Weights are also set, for example, so as to correct the bias caused by the differing sizes of the classes in the training data (the specific expressions appear in the original only as images).

The main computational task in the learning machine dPLRM is to numerically find the V that gives the minimum of the penalized negative log-likelihood function PL(V). This reduces to numerically solving the nonlinear matrix equation obtained by setting the derivative of PL(V) with respect to V to zero (the equation appears in the original only as an image). This is accomplished by the following iterative algorithm (Algorithm dPLRM-2; see Non-Patent Document 4): starting from an initial value V_0 (a K x N matrix), the sequence {V_i} is computed iteratively by updating V_i with an increment ΔV_i, where ΔV_i is the solution of a linear matrix equation (both the update formula and the linear equation appear in the original only as images). For details of the algorithm, see Non-Patent Documents 4, 5, and 6.
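The patent's Newton-type update equations survive only as images, so as a rough stand-in that shows the same objective being minimized, here is plain gradient descent on a softmax negative log-likelihood with a simple quadratic penalty. The penalty form lam/2 * ||V||^2, the learning rate, and the descent scheme are assumptions of this sketch, not the patent's Algorithm dPLRM-2.

```python
import numpy as np

def fit_dplrm_sketch(G, Y, lam=0.1, lr=0.001, n_iter=1000):
    """Gradient descent on a penalized softmax negative log-likelihood.
    G: N x N Gram matrix of kernel values K(x_i, x_j);
    Y: K x N one-hot coding matrix of the training judgments."""
    K, N = Y.shape
    V = np.zeros((K, N))
    for _ in range(n_iter):
        F = V @ G                              # class scores, K x N
        F -= F.max(axis=0, keepdims=True)      # numerical stability
        P = np.exp(F)
        P /= P.sum(axis=0, keepdims=True)      # predicted class probabilities
        grad = (P - Y) @ G.T + lam * V         # gradient of loss + quadratic penalty
        V -= lr * grad
    return V

# Two separable 2-D clusters as toy training data.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 0.3, (10, 2)), rng.normal(+1, 0.3, (10, 2))])
labels = np.array([1] * 10 + [2] * 10)
Y = np.zeros((2, 20))
Y[labels - 1, np.arange(20)] = 1.0
G = X @ X.T                                    # first-order polynomial kernel
V = fit_dplrm_sketch(G, Y)
acc = float(np.mean((V @ G).argmax(axis=0) + 1 == labels))
print(acc)
```

On this separable toy problem the trained dual parameters classify the training points correctly; the patent's linear-equation Newton step converges much faster than this first-order descent.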

The learning machine dPLRM can represent a broad range of (hidden) structures in the training data and has very high inductive power. The dPLRM was originally introduced as the dual machine of the Penalized Logistic Regression Machine (PLRM). In the PLRM, the map F(x) is defined, in place of Equation 4, as

    F(x) = W φ(x),

where each element of φ(x) = (φ_1(x), ..., φ_m(x))^t is an arbitrary nonlinear function of x, and W is a K x m parameter matrix. In the PLRM, the parameter W is estimated by minimizing a penalized negative log-likelihood function (given in the original only as an image), in which Σ is a positive definite matrix. When φ(x), Σ, and K(x, y) are chosen appropriately, the dPLRM and the PLRM are equivalent models: for a given training data set, both yield the same probability prediction vector p*(x). The learning machines PLRM and dPLRM not only provide the probability prediction p*(x) but also make deterministic predictions, by selecting the class with the largest predicted probability (the rule appears in the original only as an image); see Non-Patent Documents 4, 5, and 6.

dPLRM was adopted for the sound source direction detection method of the present invention because its computational cost is greatly reduced compared with PLRM. In particular, the greatest feature of the proposed invention is the experimental discovery that the sound source direction can be detected using only the first-order polynomial kernel function

[Equation image: first-order polynomial kernel K(x, y)]

which is the simplest kernel function and the easiest to implement; the sound source direction detection device was invented on the basis of this discovery.
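In the dual form, each class score is a kernel-weighted sum over the training points, so with a first-order polynomial kernel the predictor remains a plain linear model. A minimal sketch, assuming the common dual parameterization F(x) = A k(x) with k_i(x) = K(x_i, x) and the kernel K(x, y) = xᵗy (the exact kernel expression in the equation image is not reproduced):

```python
import numpy as np

def linear_kernel(x, y):
    """First-order polynomial kernel, assumed here to be K(x, y) = x . y."""
    return x @ y

def dual_scores(A, X_train, x):
    """Dual-form class scores F(x).

    A is a K x n matrix of dual coefficients (one row per direction
    category, one column per training example); X_train is the n x d
    matrix of training vectors. Each score is a kernel-weighted sum
    over all n training points.
    """
    k = X_train @ x            # kernel values K(x_i, x) for all i
    return A @ k
```

With this kernel, A @ X_train collapses to a single K x d weight matrix, which is why only linear-combination information of the input is needed at detection time.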

Finally, a detection experiment using the above sound source direction detection device is described.

In the experiments, simulation data were used to simulate pulse sounds or pure tones arriving from within the frontal horizontal plane [−90° to 90°]. The sampling frequency was 48 kHz. To restrict attention to the interaural time difference (ITD) alone, a gammachirp auditory filterbank with a frequency band of 100 Hz to 1 kHz and 50 channels was used together with Meddis's inner hair cell model, and the resulting neural firing pattern was taken as the output signal of the inner ear model.
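As an illustration of this auditory front end, a 4th-order gammatone filter is a common simplified stand-in for the gammachirp filter named above (the gammachirp adds a level-dependent chirp term omitted here), with the equivalent rectangular bandwidth given by the Glasberg–Moore formula. Everything except the 48 kHz sampling rate is an illustrative assumption, not taken from the patent:

```python
import numpy as np

FS = 48_000  # sampling frequency used in the experiment (48 kHz)

def erb_bandwidth(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, duration=0.025, order=4, b=1.019):
    """Impulse response of an order-4 gammatone filter at center
    frequency fc: a gamma-shaped envelope times a tone carrier."""
    t = np.arange(int(duration * FS)) / FS
    env = t ** (order - 1) * np.exp(-2 * np.pi * b * erb_bandwidth(fc) * t)
    return env * np.cos(2 * np.pi * fc * t)
```

A 50-channel bank would convolve the input with 50 such responses at center frequencies spread over 100 Hz to 1 kHz, one channel per row of the inner ear signal.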

The correct directions were taken every 10° within the frontal horizontal plane, giving a total of 19 directions (categories), and training and testing were performed on these directions. In training, a probability of 1 was assigned only to the direction category of a signal from the correct direction. In addition, to give the weight matrix W continuity with respect to angle, data within ±9° of the correct direction were also supplied. For pulse sounds, a triangular window centered on the correct direction was applied over the data in that range, and the windowed data were used as data for that direction category. For pure tones, the assigned probability value was apportioned to the neighboring category in proportion to the angular offset.
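The apportioning rule for pure tones can be sketched as follows; the experiment is not specified beyond the description above, so this is one plausible reading, with hypothetical helper names:

```python
import numpy as np

DIRECTIONS = np.arange(-90, 91, 10)   # the 19 direction categories

def pure_tone_targets(true_angle):
    """Soft target vector for a pure tone at true_angle degrees.

    Probability mass is split between the two nearest 10-degree
    categories in proportion to the angular offset, so an exact
    category angle gets probability 1 and e.g. 13 degrees gives
    0.7 to the 10-degree category and 0.3 to the 20-degree one.
    """
    p = np.zeros(len(DIRECTIONS))
    lo = int(np.floor((true_angle + 90) / 10))
    frac = ((true_angle + 90) % 10) / 10.0
    p[lo] = 1.0 - frac
    if frac > 0 and lo + 1 < len(DIRECTIONS):
        p[lo + 1] = frac
    return p
```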

A 100 Hz pulse sound was used. For training, 100 neural firing patterns (NAPs) of 2 ms (96 × 50 = 4800 dimensions for each ear) with randomly drawn start times were used. For testing, 7 similarly randomly drawn 2 ms NAPs were used. Since there are 19 correct directions, the total number of tests was 133 (= 7 × 19).
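The segment bookkeeping above (2 ms at 48 kHz = 96 samples × 50 channels = 4800 dimensions per ear, 9600 for the pair) can be sketched as a random-window extractor; the NAP array layout (time samples × channels) is an assumption:

```python
import numpy as np

FS = 48_000
SEG = int(0.002 * FS)   # a 2 ms segment = 96 samples

def sample_nap_segments(nap_left, nap_right, n_segments, rng):
    """Draw n_segments random 2 ms windows from a left/right NAP pair
    (each shaped [n_samples, 50]) and flatten each binaural window to
    a 2 x 96 x 50 = 9600-dimensional feature vector."""
    n = nap_left.shape[0]
    starts = rng.integers(0, n - SEG, size=n_segments)
    return np.stack([
        np.concatenate([nap_left[s:s + SEG].ravel(),
                        nap_right[s:s + SEG].ravel()])
        for s in starts
    ])
```

Calling this with n_segments=100 for each of the 19 directions reproduces the size of the training set described above.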

The pure tones were created at 25 frequencies that equally divide the ERB axis (equivalent rectangular bandwidth axis) corresponding to 100 to 1000 Hz. A synchronized NAP was created for each pure tone using its phase onset time. The duration was 20 ms or 2 ms. The total number of tests was 475 (= 19 × 25).
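Dividing the ERB axis equally between 100 Hz and 1 kHz can be sketched with the ERB-rate (Cam) scale; the Glasberg–Moore formula is assumed here, since the patent does not give the formula itself:

```python
import numpy as np

def erb_rate(f):
    """ERB-number (Cam) scale: frequency in Hz to ERB-rate units."""
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

def inv_erb_rate(e):
    """Inverse mapping: ERB-rate units back to frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def tone_frequencies(fmin=100.0, fmax=1000.0, n=25):
    """n frequencies equally spaced on the ERB axis between fmin and fmax."""
    return inv_erb_rate(np.linspace(erb_rate(fmin), erb_rate(fmax), n))
```

Equal spacing on the ERB axis packs the 25 test tones more densely at low frequencies than a linear spacing would, matching the resolution of the auditory filterbank.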

Table 1 shows, for pulse sounds, the direction accuracy of the present method, which uses only the linear combination, and of the case where the interaural cross-correlation coefficients of the NAPs (2 ms) are used as the training and test data for dPLRM. The table shows that equivalent performance is obtained without using the cross-correlation information.

[Table 1 image: not reproduced]

Table 2 shows the direction accuracy for pure tones of different durations, for data lengths of 20 ms and 2 ms. Even for pure tones of 500 Hz or lower, whose period exceeds 2 ms, the direction can be estimated to some extent from 2 ms of data.

[Table 2 image: not reproduced]

As described above, it was confirmed that dPLRM can localize the sound source direction for pulse sounds and pure tones based on the linear combination information of the auditory filter outputs. This confirms that probability computation based on dPLRM with a first-order polynomial kernel function is effective for sound source direction detection.

The present invention can also be realized as a program for causing a computer to execute the sound source direction detection method of the embodiment, and as a computer-readable recording medium on which that program is recorded. The program may be recorded in a ROM built into the computer, recorded on a recording medium such as an FD, CD-ROM, or an internal or external magnetic disk and read by the computer, or downloaded to the computer via the Internet.

Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and it is obvious that various modifications can be made to the embodiments without departing from the spirit of the present invention.

For example, in the above embodiment a neural firing pattern was used as the amplitude time-series signal for each frequency band, but the invention is not limited to this; the output of a band filterbank obtained by filterbank analysis, for example, may also be used. Although an example was described in which the receiving unit has two receivers and two amplitude time-series signals are combined with two weight matrices, three or more receivers may be used, with three or more amplitude time-series signals and weight matrices combined correspondingly. A mathematical model other than dPLRM, such as an SVM, may be used as the learning machine; in the case of dPLRM, speech recognition and speaker recognition can also be performed in addition to sound source direction detection. Furthermore, although the above embodiment uses the linear combination information of the weight matrix and the amplitude time-series signal, a certain effect can also be obtained using nonlinear combination information. The sizes of the amplitude time-series signal and the weight matrix may also be changed as appropriate.

The present invention can be used in information and communication equipment, electrical appliances, robots, and the like equipped with acoustic sensors.

Block diagram of a configuration example of a sound source direction detection device according to an embodiment of the present invention. Diagram showing the processing flow of the sound source direction detection method in the embodiment of the present invention. Diagram schematically showing the signal flow in the embodiment of the present invention. Diagram showing an example of an inner ear signal. Diagram showing an example of a weight matrix learned from pulse sounds.

Explanation of symbols

1 Sound source
2 Sound source direction detection device
3 Receiving unit
4 Signal conversion unit
5 Learning/detection unit
6 Learning means
7 Probability calculation means
8 Sound source direction determination means
9 Storage unit
10 Display unit
11 Control unit
W Weight matrix
X Inner ear signal

Claims (5)

1. A sound source direction detection device comprising:
a receiving unit that receives a signal from a sound source with a plurality of receivers;
a signal conversion unit that applies a time-frequency analysis, such as an inner ear model, to the signal received by the receiving unit to convert it into an amplitude time-series signal for each frequency band; and
a learning/detection unit having learning means that updates a weight matrix while learning it from training data, probability calculation means that calculates the probabilities of sound source directions using linear combination information of the weight matrix and the amplitude time-series signal, and sound source direction determination means that determines the sound source direction on the basis of the probabilities calculated by the probability calculation means.
2. The sound source direction detection device according to claim 1, wherein the learning/detection unit uses a dual Penalized Logistic Regression Machine (dPLRM).
3. A sound source direction detection method comprising:
a receiving step of receiving a signal from a sound source with a plurality of receivers;
a signal conversion step of applying a time-frequency analysis, such as an inner ear model, to the signal received in the receiving step to convert it into an amplitude time-series signal for each frequency band;
a learning step of updating a weight matrix while learning it from training data;
a probability calculation step of calculating the probabilities of sound source directions using linear combination information of the weight matrix and the amplitude time-series signal; and
a sound source direction determination step of determining the sound source direction on the basis of the probabilities calculated in the probability calculation step.
4. The sound source direction detection method according to claim 3, wherein a dual Penalized Logistic Regression Machine (dPLRM) is used in the learning step, the probability calculation step, and the sound source direction determination step.
5. A program for causing a computer to execute the sound source direction detection method according to claim 3 or 4.

JP2005271227A 2005-09-16 2005-09-16 Sound source direction detection device and method Withdrawn JP2007085734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2005271227A JP2007085734A (en) 2005-09-16 2005-09-16 Sound source direction detection device and method


Publications (1)

Publication Number Publication Date
JP2007085734A true JP2007085734A (en) 2007-04-05

Family

ID=37972869

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2005271227A Withdrawn JP2007085734A (en) 2005-09-16 2005-09-16 Sound source direction detection device and method

Country Status (1)

Country Link
JP (1) JP2007085734A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009271183A (en) * 2008-05-01 2009-11-19 Nippon Telegr & Teleph Corp <Ntt> Multiple signal sections estimation device and its method, and program and its recording medium
CN101419801B (en) * 2008-12-03 2011-08-17 武汉大学 Method for subband measuring correlation sensing characteristic between ears and device thereof
KR20210046416A (en) * 2019-10-18 2021-04-28 한국과학기술원 Audio classification method based on neural network for waveform input and analyzing apparatus
KR102281676B1 (en) 2019-10-18 2021-07-26 한국과학기술원 Audio classification method based on neural network for waveform input and analyzing apparatus
KR20210116066A (en) * 2020-03-17 2021-09-27 성균관대학교산학협력단 A method for inferring of generating direction of sound using deep network and an apparatus for the same
KR102329353B1 (en) 2020-03-17 2021-11-22 성균관대학교산학협력단 A method for inferring of generating direction of sound using deep network and an apparatus for the same


Legal Events

Date Code Title Description
A300 Application deemed to be withdrawn because no request for examination was validly filed

Free format text: JAPANESE INTERMEDIATE CODE: A300

Effective date: 20081202