JP2007085734A - Sound source direction detection device and method - Google Patents


Info

Publication number: JP2007085734A
Application number: JP2005271227A
Authority: JP (Japan)
Prior art keywords: sound source, source direction, learning, probability, signal
Legal status: Withdrawn
Other languages: Japanese (ja)
Inventors: Tomoko Matsui (松井知子), Kunihito Tanabe (田邉國士), Toshio Irino (入野俊夫)
Current and original assignee: Research Organization of Information and Systems
Application filed by Research Organization of Information and Systems, with priority to JP2005271227A

Landscapes

  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

PROBLEM TO BE SOLVED: To provide a sound source direction detection device and method based on a principle quite different from the conventional approach, which uses cross-correlation coefficients of sample data.

SOLUTION: A signal from a sound source 1 is received by both pseudo ears. The received sound signals are converted nonlinearly; for example, an inner ear model is applied to the signals received by the pseudo ears to convert them into inner-ear signals. The sound source direction is then detected by a dual Penalized Logistic Regression Machine (dPLRM) using linear-combination information between a weight matrix W and the inner-ear signal X. The weight matrix W is learned with the dPLRM from a number of training samples, each expressed as a pair of an inner-ear signal and its correct direction.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a sound source direction detection device and a sound source direction detection method. Specifically, it relates to a device and method in which a dual Penalized Logistic Regression Machine (dPLRM), the dual form of a penalized logistic regression machine and a kind of learning machine, is trained on sound source data whose directions are known (training data); the trained learning machine then computes the probability of each sound source direction for unknown sound source data and estimates the direction.

A sound source direction can be physically determined from the time differences between signals received by multiple microphones (a microphone array). Conventional engineering approaches to sound source direction detection rest on this principle: a microphone array is deployed, the cross-correlation coefficients of the received signals are computed, and the direction is estimated from the estimated time differences. Indeed, several detection methods based on parametric models, such as linear prediction models using the correlation matrix of the received signals, have been proposed. All of these methods require computing the sample cross-correlation coefficients or the correlation matrix of the received signals (see Non-Patent Document 1).

With two ears, humans perceive not only the direction of a sound source but also the spatial spread of the sound. The engineering detection methods above cannot handle the sense of spread. Elucidating humans' excellent direction perception is therefore useful for sound source direction detection by machines.

In recent years, methods have been developed that imitate human learning: mathematical models called learning machines, such as neural networks (NN) and support vector machines (SVM), learn their parameters from training data and then make predictions. These methods are widely known to be effective not only for speech but for signal processing and data analysis in general (see Non-Patent Documents 2 and 3).

Penalized Logistic Regression Machines (PLRM), a kind of learning machine, and their dual form (dual Penalized Logistic Regression Machines; dPLRM) differ from NN and SVM in that they can make probabilistic predictions (decisions), and they have been shown to be effective for tasks such as speaker recognition (see Non-Patent Documents 4-9).

Non-Patent Document 1: Juro Ohga, Yoshio Yamazaki, Yutaka Kaneda, "Acoustic Systems and Digital Signal Processing", IEICE, 1995.
Non-Patent Document 2: Hideki Aso, "Neural Network Information Processing", Sangyo Tosho, 1988.
Non-Patent Document 3: Hideki Aso, Koji Tsuda, Noboru Murata, "Statistics of Pattern Recognition and Learning: New Concepts and Methods", Iwanami Shoten, 2003.
Non-Patent Document 4: K. Tanabe, "Penalized logistic regression machines: New methods for statistical prediction 1", ISM Cooperative Research Report 143, pp. 163-194, March 2001.
Non-Patent Document 5: K. Tanabe, "Penalized logistic regression machines: New methods for statistical prediction 2", Proceedings of the 4th Workshop on Information-Based Induction Sciences (IBIS2001), pp. 71-76, July 2001.
Non-Patent Document 6: K. Tanabe, "Penalized logistic regression machines and related linear numerical algebra", RIMS Kokyuroku 1320, Research Institute for Mathematical Sciences, Kyoto University, pp. 239-249, 2003.
Non-Patent Document 7: T. Matsui and K. Tanabe, "Speaker Identification with Dual Penalized Logistic Regression Machine", Proceedings of Odyssey, pp. 363-366, 2004.
Non-Patent Document 8: T. Matsui and K. Tanabe, "Probabilistic Speaker Identification with Dual Penalized Logistic Regression Machine", Proceedings of ICSLP, pp. III-1797-1800, 2004.
Non-Patent Document 9: T. Matsui and K. Tanabe, "Speaker Recognition without Feature Extraction Process", Proceedings of the Workshop on Statistical Modeling Approach for Speech Recognition: Beyond HMM, pp. 79-84, 2004.

However, computing the sample cross-correlation coefficients or the correlation matrix of the received signals demands comparatively large computational resources and time. There has also been growing demand for sound source direction detection by learning machines that come closer to humans' excellent direction perception.

The present invention arises from the finding that dPLRM can be applied to the sound source direction detection problem, and that the direction can be detected using even the simplest first-order polynomial kernel function. Seen from the principle on which the conventional methods described above rely, this is a surprising result. The present invention is provided against the background of this discovery.

Based on the experimental finding that a dPLRM can detect the sound source direction using only linear-combination information of signals obtained by transforming the signals from multiple receivers, it is an object of the present invention to provide a sound source direction detection device and a sound source direction detection method founded on a principle entirely different from the conventional approach that uses cross-correlation coefficients of sample data.

To solve the above problem, the sound source direction detection device 2 according to claim 1 comprises, as shown for example in FIG. 1: a receiving unit 3 that receives signals from a sound source 1 with a plurality of receivers; a signal conversion unit 4 that applies a time-frequency analysis, such as an inner ear model, to the signals received by the receiving unit 3 to convert them into amplitude time-series signals X_L and X_R for each frequency band; and a learning/detection unit 5 comprising learning means 6 that learns and updates the weight matrices W_L^k and W_R^k using learning data, probability calculation means 7 that computes the probability of each sound source direction using linear-combination information of the weight matrices W_L^k, W_R^k and the amplitude time-series signals X_L, X_R, and sound source direction determination means 8 that determines the sound source direction based on the probabilities computed by the probability calculation means 7.

Here, a neural firing pattern is typically used as the amplitude time-series signal for each frequency band, but this is not restrictive; the output of a band-pass filter bank obtained by filter-bank analysis, for example, may be used instead. The receiving unit typically has two receivers, but three or more may be used; correspondingly, the linear combination is not limited to combining two amplitude time-series signals with two weight matrices, and three or more amplitude time-series signals and weight matrices may be combined. With this configuration, a sound source direction detection device can be provided that detects the direction solely from linear-combination information of the signals obtained from multiple receivers, without computing cross-correlations of the sample data.

The invention according to claim 2 is the sound source direction detection device of claim 1 in which, as shown for example in FIG. 1, the learning/detection unit 5 uses a dual Penalized Logistic Regression Machine (dPLRM). With this configuration, using the dPLRM greatly reduces the amount of computation, so the sound source direction can be detected particularly efficiently.

The sound source direction detection method according to claim 3 comprises, as shown for example in FIG. 2: a receiving step (S001) of receiving signals from a sound source with a plurality of receivers; a signal conversion step (S002) of applying a time-frequency analysis, such as an inner ear model, to the received signals to convert them into amplitude time-series signals X_L and X_R for each frequency band; a learning step (S003) of learning and updating the weight matrices W_L^k and W_R^k using training data; a probability calculation step (S004) of computing the probability of each sound source direction using linear-combination information of the weight matrices W_L^k, W_R^k and the amplitude time-series signals X_L, X_R; and a sound source direction determination step (S005) of determining the sound source direction based on the probabilities computed in the probability calculation step.

With this configuration, a sound source direction detection method can be provided that detects the direction solely from linear-combination information of the signals obtained from multiple receivers, without computing cross-correlations of the sample data. When three or more received sound signals are available, the direction can be determined in the same way using their linear combination.

The invention according to claim 4 is the sound source direction detection method of claim 3 in which, as shown for example in FIG. 2, a dual Penalized Logistic Regression Machine (dPLRM) is used in the learning step (S003), the probability calculation step (S004), and the sound source direction determination step (S005). With this configuration, the learning capability of the dPLRM allows the sound source direction to be detected particularly efficiently.

The program according to claim 5 is a program for causing a computer to execute the sound source direction detection method according to claim 3 or claim 4.

According to the present invention, a sound source direction detection device and a sound source direction detection method can be provided that detect the direction solely from linear-combination information of the signals obtained from multiple receivers, without computing cross-correlations of the sample data.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 shows a block diagram of an example configuration of a sound source direction detection device according to an embodiment of the present invention. In FIG. 1, 1 is a sound source and 2 is the sound source direction detection device. The device 2 comprises: a receiving unit 3 that receives the signal from the sound source 1 with a pair of pseudo ears serving as receivers; a signal conversion unit 4 that applies time-frequency analysis to the signals received by the pseudo ears and converts them into inner-ear signals X, i.e., amplitude time-series signals for each frequency band; a learning/detection unit 5 that learns the weight matrices W and detects the sound source direction; a storage unit 9 that stores the detection results of the learning/detection unit 5; a display unit 10 that displays those results; and a control unit 11 that controls each part of the device 2 to carry out the sound source direction detection. The learning/detection unit 5 is implemented as a dual Penalized Logistic Regression Machine (dPLRM). It consists of: learning means 6 that takes training data as input and learns and updates the parameter weight matrices W_L^k and W_R^k; probability calculation means 7 that computes the probability of each sound source direction using linear-combination information of the weight matrices W_L^k, W_R^k and the inner-ear signals X_L, X_R (the amplitude time-series signals for each frequency band); and sound source direction determination means 8 that determines the direction based on the probabilities computed by the probability calculation means 7.

FIG. 2 shows the processing flow of the sound source direction detection method in this embodiment, and FIG. 3 schematically shows the signal flow. The sound source 1 lies somewhere between -90 and +90 degrees around the receiving unit and emits sound such as speech or other audio. The receiving unit 3 receives the signal from the sound source with microphones simulating the two ears (S001). An inner ear model is then applied to simulate the inner-ear output of both ears (the neural firing pattern, or neural activity pattern; NAP), generating the inner-ear signals X (X_L for the left ear, X_R for the right ear) (S002).

FIG. 4 shows an example of the inner-ear signal X. Specifically, X_L and X_R are the neural firing patterns of the left and right inner ears, respectively. Each is a time-frequency representation (a T x C matrix) with C channels and T time samples. The vertical axis represents sound power; the other two axes represent channel C (frequency) and time T (96 ticks correspond to 2 ms).

In the human auditory periphery, frequency analysis is performed in the inner ear (cochlea), and the result is converted into firings of the auditory nerve at locations corresponding to frequencies from several tens of Hz to 20 kHz. This frequency analysis is realized by mechanical vibration of the basilar membrane and can be approximated by a system in which many relatively narrow-band filters are arranged side by side (a filter bank). As inner ear models expressing this, the gammachirp auditory filter and the Meddis inner hair cell model, for example, are often used (see T. Irino and R. D. Patterson, J. Acoust. Soc. Am. 109(5), pp. 2008-2022, 2001, and R. Meddis, J. Acoust. Soc. Am. 83(3), pp. 1056-1063, 1988).
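As a rough illustration of the filter-bank idea only, the sketch below splits a signal into per-band amplitude envelopes with an FFT-domain Gaussian filter bank. This is a crude stand-in, not the patent's model: the gammachirp filters and Meddis hair-cell stage cited above are far more faithful, and the function name, channel count, and bandwidth choice here are all assumptions of this example.

```python
import numpy as np

def filterbank_envelopes(signal, fs, n_channels=8, f_lo=100.0, f_hi=1000.0):
    """Split a signal into per-band amplitude envelopes using an FFT-domain
    Gaussian filter bank.  Returns a T x C array (T = len(signal),
    C = n_channels), analogous to the time-frequency representation in the text."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    centres = np.geomspace(f_lo, f_hi, n_channels)  # log-spaced, like auditory filters
    out = np.empty((len(signal), n_channels))
    for c, fc in enumerate(centres):
        bw = fc / 4.0                               # ad-hoc constant-Q bandwidth
        gain = np.exp(-0.5 * ((freqs - fc) / bw) ** 2)
        band = np.fft.irfft(spectrum * gain, n=len(signal))
        out[:, c] = np.abs(band)                    # rough envelope proxy
    return out

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)                   # a pure tone at 440 Hz
nap = filterbank_envelopes(x, fs)
print(nap.shape)                                    # (8000, 8)
```

For the pure tone, the channels whose centre frequencies lie nearest 440 Hz carry the most energy, which is the kind of place coding the paragraph above describes.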

In the learning/detection unit 5, the learning means 6 first uses a large number of training samples (whose correct directions are known) to learn and update the parameter weight matrices W_L^k and W_R^k based on the dPLRM, so as to increase the penalized likelihood of a dPLRM with a first-order polynomial kernel function (S003). As expressed in (Equation 1) and (Equation 2) below, the weight matrices W represent the weights of the contributions of the inner-ear signals X_L and X_R to the predicted probabilities. Here k is a parameter indicating the direction: for example, dividing -90 to +90 degrees into 10-degree steps yields 19 directions, so k runs from 1 to 19. L and R denote left and right. The weight matrices are therefore represented by 19 T x C matrices. Note that the inner-ear signals (X_L, X_R) are T x C matrices representing neural firing patterns.

Next, the probability calculation means 7 computes the probability of each sound source direction using linear-combination information of the weight matrices W_L^k, W_R^k and the inner-ear signals X_L, X_R (S004). For inner-ear signals (X_L, X_R), the probability that the sound source direction is k is defined by the logistic transform of the linear-combination scores:

[Equations 1 and 2 appear in the original only as images.]

Here W_L^k and W_R^k are the learned T x C weight matrices, and t denotes the transpose.
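Since Equations 1 and 2 survive only as images, the exact formula cannot be reproduced; presumably the logistic transform maps per-direction linear scores, obtained by summing the elementwise product of each weight matrix with the corresponding inner-ear signal over the whole T x C plane, to probabilities via a softmax. A hedged sketch under that assumption (the function name and the exact score form are mine; the shapes follow the text, K = 19 directions, T x C signals):

```python
import numpy as np

def direction_probabilities(WL, WR, XL, XR):
    """Softmax ("logistic transform") over per-direction linear scores.
    WL, WR: assumed learned weights of shape (K, T, C), one T x C matrix
    per direction class k.  XL, XR: inner-ear signals of shape (T, C).
    The score for direction k sums W * X elementwise over the T x C
    time-frequency plane (equivalently, trace of W^t X)."""
    scores = (np.tensordot(WL, XL, axes=([1, 2], [0, 1]))
              + np.tensordot(WR, XR, axes=([1, 2], [0, 1])))
    scores -= scores.max()              # for numerical stability
    e = np.exp(scores)
    return e / e.sum()

rng = np.random.default_rng(0)
K, T, C = 19, 96, 50                    # 19 direction categories, as in the text
WL, WR = rng.normal(size=(2, K, T, C))
XL, XR = rng.normal(size=(2, T, C))
p = direction_probabilities(WL, WR, XL, XR)
print(p.shape)                          # (19,)
```

The output is a probability vector over the 19 direction categories, which is exactly what the direction determination step then inspects.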

As shown in the bottom row of FIG. 3, the probabilities computed by the logistic transform are obtained as probabilities for each direction (-90° to 90°). The sound source direction determination means 8 then determines the sound source direction based on the probabilities computed in the probability calculation step (S005). The calculation results of the probability calculation step and the determination results of the sound source direction determination step are stored in the storage unit 9 and displayed on the display unit 10.

FIG. 5 shows an example of the weight matrices, reduced by time-averaging, after learning with pulse sounds: W_L (top: left ear) and W_R (bottom: right ear). The horizontal axis represents angle, the vertical axis frequency, and brightness corresponds to value (white: 0 to black: 1). In the pulse-sound learning, data segments were extracted at randomly chosen start times, so temporal information could not be exploited and the direction could only be discriminated from frequency information; indeed, no temporal variation was observed in the weight matrices. The direction appears to be estimated from the integral/derivative information of the amplitude-frequency characteristics represented in the weight matrices, which vary continuously with angle.

Next, the learning machine dPLRM, which is designed to perform general discrimination and prediction, is described.

Consider, as an example, the problem of judging the presence or absence of a disease from blood test data in a health examination. Let x_j be the (n-dimensional) vector of subject j's results over n test items, and code the diagnosis as c_j = 1 if healthy, c_j = 2 if undecidable, and c_j = 3 if diseased.

By feeding the data of a large number of subjects (say N) whose diagnoses are known (called the training data set) into the learning machine dPLRM, the dPLRM performs learning internally and generates a discrimination rule. That is, the trained dPLRM yields a probability vector (function) p*(x) ≡ (p_1(x), p_2(x), p_3(x))^t, so that for the test-data vector x (n-dimensional) of an undiagnosed person, the triple of probabilities — healthy p_1(x), undecidable p_2(x), diseased p_3(x) — can be computed as a prediction. In this example the number of classes is 3, but in general it may be any natural number K.

Before explaining the mechanism by which the probability vector (function) p*(x) ≡ (p_1(x), p_2(x), p_3(x))^t is generated, some notation is introduced for convenience of mathematical manipulation. In the example above, healthy (c = 1), undecidable (c = 2), and diseased (c = 3) are coded as the unit vectors (1, 0, 0)^t, (0, 1, 0)^t, and (0, 0, 1)^t, respectively. In general, the N vectors coding the judgments of the training data are arranged side by side as a K x N matrix, defined so that its j-th column vector is the judgment code of subject j; the judgment results of the training data set are stored in this matrix.
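The unit-vector coding above can be sketched directly; the function name is mine, since the patent gives the coding matrix only as an image:

```python
import numpy as np

def code_matrix(labels, n_classes):
    """Build the K x N coding matrix described above: column j is the
    unit vector for subject j's class (labels are 1-based)."""
    Y = np.zeros((n_classes, len(labels)))
    for j, c in enumerate(labels):
        Y[c - 1, j] = 1.0
    return Y

# healthy = 1, undecidable = 2, diseased = 3, for five subjects:
Y = code_matrix([1, 3, 2, 1, 3], n_classes=3)
print(Y.shape)  # (3, 5)
```

Each column sums to 1, with the single 1 marking the subject's class, so the matrix stores exactly the judgment results of the training set.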

In the dPLRM, the probability vector (function) p*(x) ≡ (p_1(x), p_2(x), p_3(x))^t is generally defined as follows. Using the linear map from R^N to R^K

    F(x) = V k(x)    (Equation 4)

and its logistic transform

    p_c(x) = exp(F_c(x)) / Σ_{c'=1}^{K} exp(F_{c'}(x)),   c = 1, ..., K,

p*(x) is defined (modeled). Here the K x N matrix V is the parameter matrix to be estimated from the training data by learning, and k(x) is the linear or nonlinear map from R^n to R^N defined by

    k(x) = (K(x, x_1), ..., K(x, x_N))^t,

where any positive definite kernel function can be used as K(x, y).
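The forward computation just described can be sketched as follows. This is a sketch under stated assumptions, not the patent's implementation: the dual form F(x) = V k(x) followed by a softmax is taken from the definitions above, and the first-order polynomial (linear) kernel of the invention is used as the positive definite kernel.

```python
import numpy as np

def kernel_vector(x, train_X, kernel):
    """k(x) = (K(x, x_1), ..., K(x, x_N))^t over the N training inputs."""
    return np.array([kernel(x, xj) for xj in train_X])

def predict_proba(x, V, train_X, kernel):
    """F(x) = V k(x), followed by the logistic (softmax) transform."""
    F = V @ kernel_vector(x, train_X, kernel)   # shape (K,)
    F -= F.max()                                # numerical stability
    e = np.exp(F)
    return e / e.sum()

def linear_kernel(x, y):
    """The first-order polynomial kernel highlighted by the invention."""
    return float(x @ y)

rng = np.random.default_rng(1)
train_X = rng.normal(size=(6, 4))   # N = 6 training inputs of dimension n = 4
V = rng.normal(size=(3, 6))         # K x N parameter matrix, K = 3 classes
p = predict_proba(rng.normal(size=4), V, train_X, linear_kernel)
print(p.shape)                      # (3,)
```

Any other positive definite kernel (e.g. an RBF) could be passed in place of `linear_kernel` without changing the rest of the model.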

Under the above model definition for the probability vector p*(x), given the training data set {(x_j, c_j) : j = 1, ..., N}, minimizing with respect to V the convex negative log-likelihood function -Σ_{j=1}^{N} log p_{c_j}(x_j) (that is, maximum likelihood estimation) yields the maximum likelihood estimate of V. In general, however, this estimate fits the training data excessively (overlearning), and its predictive discrimination ability on undiscriminated data deteriorates severely. To avoid this phenomenon, the dPLRM introduces a penalty function: V is estimated by minimizing the penalized negative log-likelihood function PL(V), which adds to the negative log-likelihood a penalty term (given in the original only as an image), and learning with excellent predictive performance is thereby realized. Weights are also set, for example, so as to correct the bias caused by the differing sizes of the classes in the training data (the specific expressions appear in the original only as images).

The main computational task in the learning machine dPLRM is to numerically find the V that gives the minimum of the penalized negative log-likelihood function PL(V). This reduces to numerically solving the nonlinear matrix equation obtained by setting the derivative of PL(V) with respect to V to zero (the equation appears in the original only as an image). This is accomplished by the following iterative algorithm (Algorithm dPLRM-2; see Non-Patent Document 4): starting from an initial value V_0 (a K x N matrix), the sequence {V_i} is computed iteratively by updating V_i with an increment ΔV_i, where ΔV_i is the solution of a linear matrix equation (both the update formula and the linear equation appear in the original only as images). For details of the algorithm, see Non-Patent Documents 4, 5, and 6.
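The patent's Newton-type update equations survive only as images, so as a rough stand-in that shows the same objective being minimized, here is plain gradient descent on a softmax negative log-likelihood with a simple quadratic penalty. The penalty form lam/2 * ||V||^2, the learning rate, and the descent scheme are assumptions of this sketch, not the patent's Algorithm dPLRM-2.

```python
import numpy as np

def fit_dplrm_sketch(G, Y, lam=0.1, lr=0.001, n_iter=1000):
    """Gradient descent on a penalized softmax negative log-likelihood.
    G: N x N Gram matrix of kernel values K(x_i, x_j);
    Y: K x N one-hot coding matrix of the training judgments."""
    K, N = Y.shape
    V = np.zeros((K, N))
    for _ in range(n_iter):
        F = V @ G                              # class scores, K x N
        F -= F.max(axis=0, keepdims=True)      # numerical stability
        P = np.exp(F)
        P /= P.sum(axis=0, keepdims=True)      # predicted class probabilities
        grad = (P - Y) @ G.T + lam * V         # gradient of loss + quadratic penalty
        V -= lr * grad
    return V

# Two separable 2-D clusters as toy training data.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 0.3, (10, 2)), rng.normal(+1, 0.3, (10, 2))])
labels = np.array([1] * 10 + [2] * 10)
Y = np.zeros((2, 20))
Y[labels - 1, np.arange(20)] = 1.0
G = X @ X.T                                    # first-order polynomial kernel
V = fit_dplrm_sketch(G, Y)
acc = float(np.mean((V @ G).argmax(axis=0) + 1 == labels))
print(acc)
```

On this separable toy problem the trained dual parameters classify the training points correctly; the patent's linear-equation Newton step converges much faster than this first-order descent.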

The learning machine dPLRM can represent a broad range of (hidden) structures in the training data and has very high inductive power. The dPLRM was originally introduced as the dual machine of the Penalized Logistic Regression Machine (PLRM). In the PLRM, the map F(x) is defined, in place of Equation 4, as

    F(x) = W φ(x),

where each element of φ(x) = (φ_1(x), ..., φ_m(x))^t is an arbitrary nonlinear function of x, and W is a K x m parameter matrix. In the PLRM, the parameter W is estimated by minimizing a penalized negative log-likelihood function (given in the original only as an image), in which Σ is a positive definite matrix. When φ(x), Σ, and K(x, y) are chosen appropriately, the dPLRM and the PLRM are equivalent models: for a given training data set, both yield the same probability prediction vector p*(x). The learning machines PLRM and dPLRM not only provide the probability prediction p*(x) but also make deterministic predictions, by selecting the class with the largest predicted probability (the rule appears in the original only as an image); see Non-Patent Documents 4, 5, and 6.

dPLRM was adopted for the sound source direction detection method of the present invention because its computational cost is greatly reduced compared with PLRM. In particular, the greatest feature of the proposed invention is the experimental discovery that the sound source direction can be detected using only the first-order polynomial kernel function

[Equation image: first-order polynomial kernel K(x, y)]

which is the simplest kernel function and the easiest to implement; the sound source direction detection device was invented on the basis of this discovery.
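In the dual form, each class score is a kernel-weighted sum over the training points, so with a first-order polynomial kernel the predictor remains a plain linear model. A minimal sketch, assuming the common dual parameterization F(x) = A k(x) with k_i(x) = K(x_i, x) and the kernel K(x, y) = xᵗy (the exact kernel expression in the equation image is not reproduced):

```python
import numpy as np

def linear_kernel(x, y):
    """First-order polynomial kernel, assumed here to be K(x, y) = x . y."""
    return x @ y

def dual_scores(A, X_train, x):
    """Dual-form class scores F(x).

    A is a K x n matrix of dual coefficients (one row per direction
    category, one column per training example); X_train is the n x d
    matrix of training vectors. Each score is a kernel-weighted sum
    over all n training points.
    """
    k = X_train @ x            # kernel values K(x_i, x) for all i
    return A @ k
```

With this kernel, A @ X_train collapses to a single K x d weight matrix, which is why only linear-combination information of the input is needed at detection time.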

Finally, a detection experiment using the above sound source direction detection device is described.

In the experiments, simulation data were used to simulate pulse sounds or pure tones arriving from within the frontal horizontal plane [−90° to 90°]. The sampling frequency was 48 kHz. To restrict attention to the interaural time difference (ITD) alone, a gammachirp auditory filterbank with a frequency band of 100 Hz to 1 kHz and 50 channels was used together with Meddis's inner hair cell model, and the resulting neural firing pattern was taken as the output signal of the inner ear model.
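As an illustration of this auditory front end, a 4th-order gammatone filter is a common simplified stand-in for the gammachirp filter named above (the gammachirp adds a level-dependent chirp term omitted here), with the equivalent rectangular bandwidth given by the Glasberg–Moore formula. Everything except the 48 kHz sampling rate is an illustrative assumption, not taken from the patent:

```python
import numpy as np

FS = 48_000  # sampling frequency used in the experiment (48 kHz)

def erb_bandwidth(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, duration=0.025, order=4, b=1.019):
    """Impulse response of an order-4 gammatone filter at center
    frequency fc: a gamma-shaped envelope times a tone carrier."""
    t = np.arange(int(duration * FS)) / FS
    env = t ** (order - 1) * np.exp(-2 * np.pi * b * erb_bandwidth(fc) * t)
    return env * np.cos(2 * np.pi * fc * t)
```

A 50-channel bank would convolve the input with 50 such responses at center frequencies spread over 100 Hz to 1 kHz, one channel per row of the inner ear signal.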

The correct directions were taken every 10° within the frontal horizontal plane, giving a total of 19 directions (categories), and training and testing were performed on these directions. In training, a probability of 1 was assigned only to the direction category of a signal from the correct direction. In addition, to give the weight matrix W continuity with respect to angle, data within ±9° of the correct direction were also supplied. For pulse sounds, a triangular window centered on the correct direction was applied over the data in that range, and the windowed data were used as data for that direction category. For pure tones, the assigned probability value was apportioned to the neighboring category in proportion to the angular offset.
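The apportioning rule for pure tones can be sketched as follows; the experiment is not specified beyond the description above, so this is one plausible reading, with hypothetical helper names:

```python
import numpy as np

DIRECTIONS = np.arange(-90, 91, 10)   # the 19 direction categories

def pure_tone_targets(true_angle):
    """Soft target vector for a pure tone at true_angle degrees.

    Probability mass is split between the two nearest 10-degree
    categories in proportion to the angular offset, so an exact
    category angle gets probability 1 and e.g. 13 degrees gives
    0.7 to the 10-degree category and 0.3 to the 20-degree one.
    """
    p = np.zeros(len(DIRECTIONS))
    lo = int(np.floor((true_angle + 90) / 10))
    frac = ((true_angle + 90) % 10) / 10.0
    p[lo] = 1.0 - frac
    if frac > 0 and lo + 1 < len(DIRECTIONS):
        p[lo + 1] = frac
    return p
```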

A 100 Hz pulse sound was used. For training, 100 neural firing patterns (NAPs) of 2 ms (96 × 50 = 4800 dimensions for each ear) with randomly drawn start times were used. For testing, 7 similarly randomly drawn 2 ms NAPs were used. Since there are 19 correct directions, the total number of tests was 133 (= 7 × 19).
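The segment bookkeeping above (2 ms at 48 kHz = 96 samples × 50 channels = 4800 dimensions per ear, 9600 for the pair) can be sketched as a random-window extractor; the NAP array layout (time samples × channels) is an assumption:

```python
import numpy as np

FS = 48_000
SEG = int(0.002 * FS)   # a 2 ms segment = 96 samples

def sample_nap_segments(nap_left, nap_right, n_segments, rng):
    """Draw n_segments random 2 ms windows from a left/right NAP pair
    (each shaped [n_samples, 50]) and flatten each binaural window to
    a 2 x 96 x 50 = 9600-dimensional feature vector."""
    n = nap_left.shape[0]
    starts = rng.integers(0, n - SEG, size=n_segments)
    return np.stack([
        np.concatenate([nap_left[s:s + SEG].ravel(),
                        nap_right[s:s + SEG].ravel()])
        for s in starts
    ])
```

Calling this with n_segments=100 for each of the 19 directions reproduces the size of the training set described above.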

The pure tones were created at 25 frequencies that equally divide the ERB axis (equivalent rectangular bandwidth axis) corresponding to 100 to 1000 Hz. A synchronized NAP was created for each pure tone using its phase onset time. The duration was 20 ms or 2 ms. The total number of tests was 475 (= 19 × 25).
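Dividing the ERB axis equally between 100 Hz and 1 kHz can be sketched with the ERB-rate (Cam) scale; the Glasberg–Moore formula is assumed here, since the patent does not give the formula itself:

```python
import numpy as np

def erb_rate(f):
    """ERB-number (Cam) scale: frequency in Hz to ERB-rate units."""
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

def inv_erb_rate(e):
    """Inverse mapping: ERB-rate units back to frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def tone_frequencies(fmin=100.0, fmax=1000.0, n=25):
    """n frequencies equally spaced on the ERB axis between fmin and fmax."""
    return inv_erb_rate(np.linspace(erb_rate(fmin), erb_rate(fmax), n))
```

Equal spacing on the ERB axis packs the 25 test tones more densely at low frequencies than a linear spacing would, matching the resolution of the auditory filterbank.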

Table 1 shows, for pulse sounds, the direction accuracy of the present method, which uses only the linear combination, and of the case where the interaural cross-correlation coefficients of the NAPs (2 ms) are used as the training and test data for dPLRM. The table shows that equivalent performance is obtained without using the cross-correlation information.

[Table 1 image: not reproduced]

Table 2 shows the direction accuracy for pure tones of different durations, for data lengths of 20 ms and 2 ms. Even for pure tones of 500 Hz or lower, whose period exceeds 2 ms, the direction can be estimated to some extent from 2 ms of data.

[Table 2 image: not reproduced]

As described above, it was confirmed that dPLRM can localize the sound source direction for pulse sounds and pure tones based on the linear combination information of the auditory filter outputs. This confirms that probability computation based on dPLRM with a first-order polynomial kernel function is effective for sound source direction detection.

The present invention can also be realized as a program for causing a computer to execute the sound source direction detection method of the embodiment, and as a computer-readable recording medium on which that program is recorded. The program may be recorded in a ROM built into the computer, recorded on a recording medium such as an FD, CD-ROM, or an internal or external magnetic disk and read by the computer, or downloaded to the computer via the Internet.

Although embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and it is obvious that various modifications can be made to the embodiments without departing from the spirit of the present invention.

For example, in the above embodiment a neural firing pattern was used as the amplitude time-series signal for each frequency band, but the invention is not limited to this; the output of a band filterbank obtained by filterbank analysis, for example, may also be used. Although an example was described in which the receiving unit has two receivers and two amplitude time-series signals are combined with two weight matrices, three or more receivers may be used, with three or more amplitude time-series signals and weight matrices combined correspondingly. A mathematical model other than dPLRM, such as an SVM, may be used as the learning machine; in the case of dPLRM, speech recognition and speaker recognition can also be performed in addition to sound source direction detection. Furthermore, although the above embodiment uses the linear combination information of the weight matrix and the amplitude time-series signal, a certain effect can also be obtained using nonlinear combination information. The sizes of the amplitude time-series signal and the weight matrix may also be changed as appropriate.

The present invention can be used in information and communication equipment, electrical appliances, robots, and the like equipped with acoustic sensors.

Block diagram of a configuration example of a sound source direction detection device according to an embodiment of the present invention. Diagram showing the processing flow of the sound source direction detection method in the embodiment of the present invention. Diagram schematically showing the signal flow in the embodiment of the present invention. Diagram showing an example of an inner ear signal. Diagram showing an example of a weight matrix learned from pulse sounds.

Explanation of symbols

1 Sound source
2 Sound source direction detection device
3 Receiving unit
4 Signal conversion unit
5 Learning/detection unit
6 Learning means
7 Probability calculation means
8 Sound source direction determination means
9 Storage unit
10 Display unit
11 Control unit
W Weight matrix
X Inner ear signal

Claims (5)

1. A sound source direction detection device comprising:
a receiving unit that receives a signal from a sound source with a plurality of receivers;
a signal conversion unit that applies a time-frequency analysis, such as an inner ear model, to the signal received by the receiving unit to convert it into an amplitude time-series signal for each frequency band; and
a learning/detection unit having learning means that updates a weight matrix while learning it from training data, probability calculation means that calculates the probabilities of sound source directions using linear combination information of the weight matrix and the amplitude time-series signal, and sound source direction determination means that determines the sound source direction on the basis of the probabilities calculated by the probability calculation means.
2. The sound source direction detection device according to claim 1, wherein the learning/detection unit uses a dual Penalized Logistic Regression Machine (dPLRM).
3. A sound source direction detection method comprising:
a receiving step of receiving a signal from a sound source with a plurality of receivers;
a signal conversion step of applying a time-frequency analysis, such as an inner ear model, to the signal received in the receiving step to convert it into an amplitude time-series signal for each frequency band;
a learning step of updating a weight matrix while learning it from training data;
a probability calculation step of calculating the probabilities of sound source directions using linear combination information of the weight matrix and the amplitude time-series signal; and
a sound source direction determination step of determining the sound source direction on the basis of the probabilities calculated in the probability calculation step.
4. The sound source direction detection method according to claim 3, wherein a dual Penalized Logistic Regression Machine (dPLRM) is used in the learning step, the probability calculation step, and the sound source direction determination step.
5. A program for causing a computer to execute the sound source direction detection method according to claim 3 or 4.

JP2005271227A 2005-09-16 2005-09-16 Sound source direction detection device and method Withdrawn JP2007085734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2005271227A JP2007085734A (en) 2005-09-16 2005-09-16 Sound source direction detection device and method


Publications (1)

Publication Number Publication Date
JP2007085734A true JP2007085734A (en) 2007-04-05

Family

ID=37972869

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2005271227A Withdrawn JP2007085734A (en) 2005-09-16 2005-09-16 Sound source direction detection device and method

Country Status (1)

Country Link
JP (1) JP2007085734A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009271183A (en) * 2008-05-01 2009-11-19 Nippon Telegr & Teleph Corp <Ntt> Multiple signal sections estimation device and its method, and program and its recording medium
CN101419801B (en) * 2008-12-03 2011-08-17 武汉大学 Method for subband measuring correlation sensing characteristic between ears and device thereof
KR20210046416A (en) * 2019-10-18 2021-04-28 한국과학기술원 Audio classification method based on neural network for waveform input and analyzing apparatus
KR102281676B1 (en) 2019-10-18 2021-07-26 한국과학기술원 Audio classification method based on neural network for waveform input and analyzing apparatus
KR20210116066A (en) * 2020-03-17 2021-09-27 성균관대학교산학협력단 A method for inferring of generating direction of sound using deep network and an apparatus for the same
KR102329353B1 (en) 2020-03-17 2021-11-22 성균관대학교산학협력단 A method for inferring of generating direction of sound using deep network and an apparatus for the same


Legal Events

Date Code Title Description
A300 Application deemed to be withdrawn because no request for examination was validly filed

Free format text: JAPANESE INTERMEDIATE CODE: A300

Effective date: 20081202