JP5689844B2

JP5689844B2 - SPECTRUM ESTIMATION DEVICE, METHOD THEREOF, AND PROGRAM

Info

Publication number: JP5689844B2
Application number: JP2012060159A
Authority: JP
Inventors: Tomohiro Nakatani (中谷 智広)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-03-16
Filing date: 2012-03-16
Publication date: 2015-03-25
Anticipated expiration: 2032-03-16
Also published as: JP2013195511A

Description

The present invention relates to a spectrum estimation technique for estimating the spectrum of a signal from signals obtained by frequency-dividing a one-dimensional time-series signal.

In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and redundant description is omitted. In the following description, symbols such as "^" used in the text should properly be written directly above the immediately preceding character, but because of the limitations of text notation they are written immediately after that character. In the equations, these symbols are written in their proper positions. Processing performed element-wise on a vector or matrix applies to all elements of that vector or matrix unless otherwise noted.

Let n be the short-time frame index and k (= 1 to N_k) the frequency index used when the observed signal is frequency-divided; the frequency-divided signal in short-time frame n is denoted x_{n,k}. The vector collecting x_{n,k} over all frequencies is written x_n = [x_{n,1}, x_{n,2}, ..., x_{n,N_k}]^T and is hereinafter called the frequency signal of short-time frame n. N_k is the number of frequency divisions. ^T denotes the non-conjugate transpose of a vector or matrix.

FIG. 1 shows a functional block diagram of a conventional spectrum estimation apparatus 9 disclosed in Non-Patent Document 1 and elsewhere. In each short-time frame n, the spectrum estimation apparatus 9 receives the frequency signal x_n and, based on the maximum likelihood method, estimates the spectrum σ_n = [σ_{n,1}, σ_{n,2}, ..., σ_{n,N_k}]^T of the frequency signal x_n. More specifically, Non-Patent Document 1 uses the short-time Fourier transform for frequency division of the signal. When an estimate of the short-time Fourier transform of the dereverberated signal is given as x_n, the maximum likelihood spectrum estimation unit 91 obtains the variance σ_{n,k} (= the spectrum value) by the maximum likelihood method, under the assumption that x_{n,k} follows a complex normal distribution with mean 0 and variance σ_{n,k}. That is, the conditional probability density function p(x_{n,k} | σ_{n,k}) of x_{n,k} is modeled by the following equation.
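The equation referred to here is not reproduced in this text. For a zero-mean circular complex normal variable with variance σ_{n,k}, the density has the standard form below. It is given as a reconstruction under that assumption, not as the patent's verbatim equation (1).

```latex
p(x_{n,k} \mid \sigma_{n,k})
  = \frac{1}{\pi\,\sigma_{n,k}}
    \exp\!\left( -\frac{|x_{n,k}|^{2}}{\sigma_{n,k}} \right)
```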

Then the spectrum σ_n = [σ_{n,1}, σ_{n,2}, ..., σ_{n,N_k}]^T is estimated as follows, as the value that maximizes the log-likelihood function L(σ_n) = Σ_k log p(x_{n,k} | σ_{n,k}).
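The maximum likelihood step can be sketched as follows, assuming the zero-mean complex normal model described above. Under that model the log-likelihood of each frequency bin is maximized by the observed power |x_{n,k}|². The function names and the small `floor` guard against zero power are illustrative assumptions, not part of the patent.

```python
import numpy as np

def log_likelihood(x, sigma):
    """L(sigma_n) = sum_k log p(x_{n,k} | sigma_{n,k}) for a zero-mean
    complex normal model with variance sigma_{n,k}."""
    return np.sum(-np.log(np.pi * sigma) - np.abs(x) ** 2 / sigma)

def ml_spectrum(x, floor=1e-12):
    """Maximum likelihood spectrum: the observed power in each bin.
    `floor` is a hypothetical guard that keeps the variance positive."""
    return np.maximum(np.abs(x) ** 2, floor)

# Usage: one short-time frame with two frequency bins.
x_n = np.array([1.0 + 2.0j, 0.5j])
sigma_hat = ml_spectrum(x_n)
```

Because the estimate depends only on the observed frame, any error in x_n passes directly into the spectrum estimate.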

To distinguish an estimate from the variable being estimated, estimates are marked with ^ and written as σ^_n and so on.

On the other hand, as detailed in Non-Patent Document 2 and elsewhere, a method has been described that, in addition to equation (1), introduces a prior probability density function p(σ_n; Θ) defining the values the variance σ_{n,k} can take, and obtains the value of σ_n by maximum a posteriori (hereinafter also "MAP") estimation given the frequency signal x_n. Here, Θ denotes the model parameters of the prior probability density function. FIG. 2 shows a functional block diagram of the spectrum estimation apparatus 8 for this case. MAP estimation is defined as follows.

The posterior probability maximization spectrum estimation unit 81 retrieves the model parameters Θ of the prior probability density function from the spectrum prior distribution storage unit 82 and obtains σ_n by equation (4). By taking the prior probability density function p(σ_n; Θ) of σ_n into account in this way, the range of values that σ_n tends to take can be constrained to some extent. It is known that efficient computation is possible when a natural conjugate distribution for the variance of a Gaussian distribution, such as the inverse gamma distribution, is used as the prior probability density function p(σ_n; Θ).
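As a minimal sketch of why a conjugate prior is efficient, assume an inverse gamma prior on σ_{n,k} with shape `alpha` and scale `beta` (hypothetical names standing in for Θ) together with the zero-mean complex normal likelihood. The posterior is then again inverse gamma, and its mode is available in closed form.

```python
import numpy as np

def map_variance_inverse_gamma(x, alpha, beta):
    """MAP estimate of sigma_{n,k} under an assumed InvGamma(alpha, beta) prior.

    log posterior = -(alpha + 2) * log(sigma) - (beta + |x|^2) / sigma + const,
    whose maximizer is (beta + |x|^2) / (alpha + 2).
    """
    return (beta + np.abs(x) ** 2) / (alpha + 2.0)

# Usage: the prior pulls a zero-power bin away from a zero variance estimate.
x_n = np.array([1.0 + 1.0j, 0.0])
sigma_map = map_variance_inverse_gamma(x_n, alpha=1.0, beta=3.0)
```

No iteration is needed here, which is the efficiency the text refers to. The limitation is that such a prior cannot be chosen freely to match the actual distribution of speech spectra.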

Non-Patent Document 1: Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, Biing-Hwang Juang, "Dereverberation of speech signals based on maximum likelihood estimation using a short-time Fourier transform representation" (in Japanese), Acoustical Society of Japan Spring Meeting, March 2008, pp. 733-736.
Non-Patent Document 2: C. M. Bishop, translated by Hiroshi Motoda, Takio Kurita, Tomoyuki Higuchi, and Yuji Matsumoto, "Pattern Recognition and Machine Learning, Vol. 1: Statistical Prediction by Bayesian Theory" (Japanese edition), Springer Japan, 2007, pp. 95-100.

Non-Patent Document 1 uses an estimate of the short-time Fourier transform as the frequency signal, but an estimate generally always contains estimation error. Likewise, when an observed signal picked up by a microphone is used as the frequency signal, the observed signal generally always contains some noise. Consequently, when spectrum estimation is performed by the conventional method on such frequency signals, the estimation is not necessarily accurate because of the influence of errors and noise. In particular, when the estimation of the short-time Fourier transform and the estimation of the spectrum are repeated alternately while depending on each other, as in Non-Patent Document 1, the repetition may amplify the influence of the error and degrade the estimate.

On the other hand, as in Non-Patent Document 2, introducing a prior probability density function p(σ_n; Θ) of the variance and obtaining the variance by MAP estimation restricts the values the variance can take and weakens the influence of errors to some extent. However, MAP estimation can be optimized efficiently only for a very limited class of prior probability density functions p(σ_n; Θ), such as natural conjugate distributions, so a prior that accurately represents the distribution of σ_n cannot necessarily be used. In particular, Gaussian mixture distributions and the like over the logarithmic spectrum (including Gaussian distributions, Gaussian mixture distributions, and hidden Markov models with Gaussian output probability distributions), which closely resemble the probability distributions used as acoustic models in automatic speech recognition systems, are considered to represent the spectral distribution of speech signals accurately, but no efficient optimization method is known for the case where they are used as the prior probability density function p(σ_n; Θ) in equation (4).

The present invention has been made in view of this problem, and its object is to provide a technique that can estimate the spectrum accurately and efficiently, even when the frequency signal contains errors, by using a Gaussian mixture distribution or the like over the logarithmic spectrum as the prior probability density function of the spectrum.

To solve the above problem, according to a first aspect of the present invention, a spectrum estimation apparatus estimates the spectrum value σ_n of the frequency signal x_n in each short-time frame n. The spectrum estimation apparatus includes a storage unit, a spectrum state estimation unit, and a posterior probability maximization spectrum estimation unit. The storage unit stores a spectrum state model Θ_θ, which is the model parameters of the prior probability density function p(θ_n; Θ_θ) of the state parameter θ_n representing the state of the logarithmic spectrum ρ_n of the frequency signal x_n, and a state-dependent spectrum model Θ_ρ, which is the model parameters of the conditional probability density function p(ρ_n | θ_n; Θ_ρ) of the logarithmic spectrum ρ_n given that the state parameter θ_n is known. The spectrum state estimation unit estimates the log-likelihood weight w_{θn} using the logarithmic spectrum estimate ρ^_n, the spectrum state model Θ_θ, and the state-dependent spectrum model Θ_ρ. The posterior probability maximization spectrum estimation unit estimates the logarithmic spectrum ρ_n that maximizes an objective function, using the frequency signal x_n, the log-likelihood weight w_{θn}, and the state-dependent spectrum model Θ_ρ. The processing in the spectrum state estimation unit and the posterior probability maximization spectrum estimation unit is repeated until a convergence condition is satisfied.

To solve the above problem, according to a second aspect of the present invention, a spectrum estimation method estimates the spectrum value σ_n of the frequency signal x_n in each short-time frame n. The spectrum estimation method includes a spectrum state estimation step and a posterior probability maximization spectrum estimation step. A spectrum state model Θ_θ, which is the model parameters of the prior probability density function p(θ_n; Θ_θ) of the state parameter θ_n representing the state of the logarithmic spectrum ρ_n of the frequency signal x_n, and a state-dependent spectrum model Θ_ρ, which is the model parameters of the conditional probability density function p(ρ_n | θ_n; Θ_ρ) of the logarithmic spectrum ρ_n given that the state parameter θ_n is known, are stored in advance. The spectrum state estimation step estimates the log-likelihood weight w_{θn} using the logarithmic spectrum estimate ρ^_n, the spectrum state model Θ_θ, and the state-dependent spectrum model Θ_ρ. The posterior probability maximization spectrum estimation step estimates the logarithmic spectrum ρ_n that maximizes an objective function, using the frequency signal x_n, the log-likelihood weight w_{θn}, and the state-dependent spectrum model Θ_ρ. The processing in the spectrum state estimation step and the posterior probability maximization spectrum estimation step is repeated until a convergence condition is satisfied.

According to the present invention, the spectrum value can be estimated efficiently even when a latent-variable-dependent Gaussian distribution over the logarithmic spectrum, which can represent the spectral distribution with high accuracy, is used as the prior probability density function of the spectrum. As a result, the spectrum can be estimated efficiently and accurately even when the frequency signal contains errors.

FIG. 1: Functional block diagram of a conventional spectrum estimation apparatus.
FIG. 2: Functional block diagram of a conventional spectrum estimation apparatus.
FIG. 3: Functional block diagram of the spectrum estimation apparatus of the first embodiment.
FIG. 4: Processing flow of the spectrum estimation apparatus of the first embodiment.
FIG. 5: Functional block diagram of the posterior probability maximization spectrum estimation unit of the first embodiment.
FIG. 6: Processing flow of the spectrum state estimation unit and the posterior probability maximization spectrum estimation unit of the first embodiment.
FIG. 7: Functional block diagram of the spectrum state estimation unit of a modification of the first embodiment.
FIG. 8: Processing flow of the spectrum state estimation unit of the modification of the first embodiment.
FIG. 9: Functional block diagram of the spectrum state estimation unit of the second embodiment.
FIG. 10: Processing flow of the spectrum state estimation unit of the second embodiment.
FIG. 11: Comparison results for the prior art (maximum likelihood method), the first embodiment, and the second embodiment.

Embodiments of the present invention are described below.

<First embodiment>
FIG. 3 shows a functional block diagram of the spectrum estimation apparatus 10, and FIG. 4 shows its processing flow. The spectrum estimation apparatus 10 includes a spectrum state model storage unit 101, a state-dependent spectrum model storage unit 102, a spectrum state estimation unit 104, and a posterior probability maximization spectrum estimation unit 106.

In each short-time frame n, the spectrum estimation apparatus 10 receives the frequency signal x_n and outputs the estimate σ^_n of its spectrum.

First, the logarithmic spectrum of the frequency signal x_n is written ρ_n = [ρ_{n,1}, ρ_{n,2}, ..., ρ_{n,N_k}]^T, where ρ_{n,k} = log σ_{n,k}.

The spectrum state model storage unit 101 stores the model parameters of the prior probability density function p(θ_n; Θ_θ) of the state parameter θ_n representing the state of the logarithmic spectrum ρ_n of the frequency signal x_n. Hereinafter, these model parameters are called the spectrum state model Θ_θ.

The state-dependent spectrum model storage unit 102 stores the model parameters of the conditional probability density function p(ρ_n | θ_n; Θ_ρ) of the logarithmic spectrum ρ_n given that the state parameter θ_n is known. Hereinafter, these model parameters are called the state-dependent spectrum model Θ_ρ.

The spectrum state estimation unit 104 receives the logarithmic spectrum estimate ρ^_n estimated by the posterior probability maximization spectrum estimation unit 106 described later, receives the spectrum state model Θ_θ and the state-dependent spectrum model Θ_ρ from the spectrum state model storage unit 101 and the state-dependent spectrum model storage unit 102 respectively, estimates the log-likelihood weight w_{θn} (where the subscript θn stands for θ_n) (s1), and outputs it.

The posterior probability maximization spectrum estimation unit 106 receives the frequency signal x_n, the log-likelihood weight w_{θn}, and the state-dependent spectrum model Θ_ρ, estimates the logarithmic spectrum estimate ρ^_n = [ρ^_{n,1}, ρ^_{n,2}, ..., ρ^_{n,N_k}]^T that maximizes an objective function described later (s2), and outputs it. The processing in the spectrum state estimation unit 104 and the posterior probability maximization spectrum estimation unit 106 (s1 and s2) is repeated until a convergence condition is satisfied (s3). Examples of the convergence condition include (1) the number of iterations exceeding a predetermined number, and (2) the difference between the logarithmic spectrum estimate obtained in the previous iteration and that obtained in the current iteration being no greater than a threshold. When the convergence condition is satisfied, the spectrum estimate σ^_n = [σ^_{n,1}, σ^_{n,2}, ..., σ^_{n,N_k}]^T is obtained from the logarithmic spectrum estimate ρ^_n at that point and output, where σ^_{n,k} = exp(ρ^_{n,k}). Since the estimate σ^_{n,k} follows once the estimate ρ^_{n,k} is obtained, only the estimation method for ρ^_{n,k} is described below.
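The control flow s1 to s3 can be sketched as a generic alternating loop. This is a skeleton under stated assumptions: `e_step` and `m_step` are hypothetical callables standing in for units 104 and 106, and the "difference" in convergence condition (2) is taken here to be the maximum absolute change in ρ^_n, which the text does not specify.

```python
import numpy as np

def alternate_until_converged(rho_init, e_step, m_step, max_iter=50, tol=1e-6):
    """Repeat s1 (state estimation) and s2 (log-spectrum update) until s3 holds.

    Stops when the iteration count reaches max_iter (condition 1) or when the
    largest change in the log-spectrum estimate is at most tol (condition 2).
    Returns the spectrum estimate exp(rho), rho itself, and the iteration count.
    """
    rho = np.asarray(rho_init, dtype=float)
    n_done = 0
    for n_done in range(1, max_iter + 1):
        w = e_step(rho)        # s1: log-likelihood weights from current estimate
        rho_new = m_step(w)    # s2: log-spectrum estimate from the weights
        converged = np.max(np.abs(rho_new - rho)) <= tol  # s3, condition (2)
        rho = rho_new
        if converged:
            break
    return np.exp(rho), rho, n_done

# Usage with toy stand-ins whose fixed point is rho = 2 in every bin.
sigma_hat, rho_hat, n_done = alternate_until_converged(
    np.zeros(3), e_step=lambda rho: rho, m_step=lambda w: 0.5 * w + 1.0
)
```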

<Key points of the first embodiment>
The spectrum estimation apparatus 10 introduces a prior probability density function p(ρ_n; Θ_θ, Θ_ρ) defining the values the logarithmic spectrum ρ_n can take, and obtains the value of ρ_n by maximum a posteriori (MAP) estimation given the frequency signal x_n. That is, it is obtained as follows.

As a result, the logarithmic spectrum ρ_n is estimated taking into account not only the conditional probability density function p(x_{n,k} | σ_{n,k}) of the frequency signal x_n defined by equation (1) but also the prior probability density function p(ρ_n; Θ_θ, Θ_ρ) of ρ_n. This enables spectrum estimation that is relatively insensitive to errors contained in the frequency signal x_n. As in the conventional maximum likelihood method, p(x_n | ρ_n) in equation (5) can be factorized as p(x_n | ρ_n) = Π_k p(x_{n,k} | ρ_{n,k}) and is assumed to be defined as follows, based on equation (1) and the relation σ_{n,k} = exp(ρ_{n,k}).
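The equation that follows this sentence is not reproduced in this text. Assuming the zero-mean complex normal form for equation (1), substituting σ_{n,k} = exp(ρ_{n,k}) would give the expressions below. They are a reconstruction under that assumption, not the patent's verbatim equation.

```latex
p(x_{n,k} \mid \rho_{n,k})
  = \frac{1}{\pi \exp(\rho_{n,k})}
    \exp\!\left( -|x_{n,k}|^{2} \exp(-\rho_{n,k}) \right),
\qquad
\log p(x_{n,k} \mid \rho_{n,k})
  = -\log \pi - \rho_{n,k} - |x_{n,k}|^{2} \exp(-\rho_{n,k})
```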

In addition, the spectrum estimation apparatus 10 of the first embodiment introduces the following three assumptions in order to achieve accurate and efficient estimation.

Assumption (1): The prior probability density function p(ρ_n; Θ_θ, Θ_ρ) of the logarithmic spectrum ρ_n of the frequency signal x_n is modeled by the following equation, which has the state parameter θ_n as a latent variable.

In the above equation, the state parameter θ_n is assumed to take discrete values, and the sum over all states is taken to marginalize it. The present invention also covers the case where the state parameter θ_n takes continuous values. In that case, the marginalization over θ_n is defined, as follows, as an integral over the entire range of values that θ_n can take.

In this embodiment, the state parameter θ_n is described as taking discrete values. For the continuous case, it suffices to read each sum over the state parameter θ_n as an integral over the entire range of values that θ_n can take, so a separate description is omitted.

Assumption (2): The conditional probability density function p(ρ_n | θ_n; Θ_ρ) of the logarithmic spectrum ρ_n given the state parameter θ_n follows a multivariate Gaussian distribution. Hereinafter, a distribution satisfying assumptions (1) and (2) is called a latent-variable-dependent Gaussian distribution.

Assumption (3): Furthermore, the conditional probability density function p(ρ_n | θ_n; Θ_ρ) can be factorized into a product of the conditional probability density functions p(ρ_{n,k} | θ_n; Θ_ρ) for the logarithmic spectrum ρ_{n,k} of each frequency k.

When assumption (3) holds, the conditional probability density function p(ρ_n | θ_n; Θ_ρ) is said to be frequency-decomposable. By assumption (2), the right-hand side of the above equation can be further rewritten as follows.

Here, N(x; μ, ξ) denotes the probability density function of a one-dimensional Gaussian distribution with mean μ and variance ξ. For example, in equation (8), if the state parameter θ_n takes only a single state, the prior probability density function p(ρ_n; Θ_θ, Θ_ρ) coincides with a Gaussian distribution. If the state parameter θ_n takes one of a finite number of states, the prior probability density function p(ρ_n; Θ_θ, Θ_ρ) coincides with a Gaussian mixture distribution. Further, if the transition of the state parameter θ_n between adjacent short-time frames is assumed to follow a state transition probability, the model becomes a hidden Markov model over the logarithmic spectrum ρ_n. For simplicity, in the assumptions above and in what follows, the prior probability density function p(ρ_n; Θ_θ, Θ_ρ) of the logarithmic spectrum ρ_n is treated as an independent distribution for each short-time frame n. How to introduce a state transition process between short-time frames into this embodiment is evident from known techniques for hidden Markov models, so its description is omitted.
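Under assumptions (1) to (3) with a finite number of states, the log prior of one frame can be evaluated as a log-sum-exp over states of per-frequency one-dimensional Gaussians. The sketch below assumes array layouts chosen for illustration (`beta` of shape (N_θ,), `mu` and `xi` of shape (N_θ, N_k)); the function names are hypothetical.

```python
import numpy as np

def log_gaussian(rho, mu, xi):
    """log N(rho; mu, xi) for a one-dimensional Gaussian with variance xi."""
    return -0.5 * (np.log(2.0 * np.pi * xi) + (rho - mu) ** 2 / xi)

def log_prior(rho, beta, mu, xi):
    """log p(rho_n) = log sum_i beta_i * prod_k N(rho_{n,k}; mu[i,k], xi[i,k])."""
    per_state = np.log(beta) + np.sum(log_gaussian(rho, mu, xi), axis=1)
    m = np.max(per_state)  # subtract the maximum for numerical stability
    return m + np.log(np.sum(np.exp(per_state - m)))

# Usage: with a single state the prior reduces to a plain Gaussian.
rho_n = np.zeros(2)
single = log_prior(rho_n, np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))
```

Splitting the single state into two identical states with mixing ratios 0.5 each leaves the value unchanged, which is a quick consistency check on the mixture form.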

The solution of equation (5) can be obtained with the expectation maximization (hereinafter "EM") algorithm (and related optimization methods), treating the state parameter θ_n as a hidden variable. The auxiliary function Q(ρ_n | ρ^_n) is then defined as follows.

Here, assuming that the frequency signal x_n is independent of the state parameter θ_n when the logarithmic spectrum ρ_n is known, the probability density function p(x_n, ρ_n, θ_n; Θ_θ, Θ_ρ) of the complete data contained in the right-hand side above can be expanded as follows.

Therefore, omitting terms that do not depend on ρ_n and expanding equation (11) further gives the following.

where

Therefore, in the EM algorithm, MAP estimation is realized by alternately repeating the following two steps until convergence.
1. E-step: The spectrum state estimation unit 104 updates the log-likelihood weight w_{θn} according to equation (18) (s1).
2. M-step: The posterior probability maximization spectrum estimation unit 106 updates the logarithmic spectrum estimate ρ^_{n,k} to the ρ_{n,k} that maximizes equation (15) (s2).
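Equation (18) itself is not reproduced in this text. For a Gaussian mixture prior, the E-step weight of standard EM is the posterior probability of each state given the current estimate ρ^_n, and the sketch below assumes that is what (18) computes. The array layout (`beta` of shape (N_θ,), `mu` and `xi` of shape (N_θ, N_k)) and the function name are illustrative.

```python
import numpy as np

def e_step(rho_hat, beta, mu, xi):
    """Assumed form of the E-step: w_i = p(theta_n = i | rho_hat_n).

    Computed in the log domain and normalized so the weights sum to one.
    """
    log_joint = np.log(beta) + np.sum(
        -0.5 * (np.log(2.0 * np.pi * xi) + (rho_hat - mu) ** 2 / xi), axis=1
    )
    log_joint -= np.max(log_joint)  # guard against underflow before exp
    w = np.exp(log_joint)
    return w / np.sum(w)

# Usage: two well-separated states; the estimate sits on the first one.
beta = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [10.0, 10.0]])
xi = np.ones((2, 2))
w = e_step(np.zeros(2), beta, mu, xi)
```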

When the state parameter θ_n takes continuous values, equation (18) becomes a continuous function of θ_n. Of the two steps above, the one most likely to increase the computational cost is the M-step, which finds the logarithmic spectrum ρ_n that maximizes the auxiliary function Q(ρ_n | ρ^_n). In this embodiment, however, assumptions (1) to (3) above make low-cost processing possible; that is, the cost is kept down when the prior probability density function p(ρ_n; Θ_θ, Θ_ρ) of the logarithmic spectrum ρ_n follows a latent-variable-dependent Gaussian distribution and its conditional probability density function p(ρ_n | θ_n; Θ_ρ) is frequency-decomposable. More specifically, the following two points keep the computational cost down.

Point (1): Equation (15) is a scalar function of the single variable ρ_{n,k}, confined to each time-frequency point. That is, the update of ρ_n decomposes into updates of the logarithmic spectrum ρ_{n,k} at each time-frequency point (n, k).

Point (2): Furthermore, the function obtained by differentiating Q_k(ρ_{n,k} | ρ^_{n,k}), the function to be maximized at each time-frequency point (n, k), with respect to ρ_{n,k} has the following simple form.
f(z) = exp(z) + z + a (19)

Therefore, the ρ_{n,k} that maximizes equation (15) can be obtained from equation (20) after finding the z for which f(z) = 0 in equation (19). Equation (19) is a convex function of one variable whose shape is determined solely by the scalar constant a, and efficient methods exist for solving f(z) = 0. For example, if the solutions of f(z) = 0 are computed in advance for each value of a and stored in a lookup table, an approximate solution can be obtained simply by consulting the table. Moreover, a closer examination of equation (19) shows that it can be roughly approximated as f(z) ≈ exp(z) + a for a > −1/2 and f(z) ≈ z + a for a ≤ −1/2. From this, the following approximate solution can also be obtained.
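The lookup-table idea can be sketched without any root solver, because f(z) = 0 is equivalent to a = −(exp(z) + z), which is strictly decreasing in z. Sweeping z and recording the matching a yields the table directly. The grid range, grid size, and function names below are illustrative assumptions, and `numpy.interp` clamps queries that fall outside the tabulated range of a.

```python
import numpy as np

def build_table(z_min=-50.0, z_max=12.0, num=20001):
    """Tabulate (a, z) pairs satisfying exp(z) + z + a = 0.

    a decreases as z increases, so both arrays are reversed to give the
    increasing a-grid that numpy.interp requires.
    """
    z = np.linspace(z_min, z_max, num)
    a = -(np.exp(z) + z)
    return a[::-1], z[::-1]

def lookup_root(a, table):
    """Approximate root of f(z) = exp(z) + z + a by linear interpolation."""
    a_grid, z_grid = table
    return np.interp(a, a_grid, z_grid)

# Usage: build once, then query per time-frequency point.
table = build_table()
z0 = lookup_root(0.0, table)
```

The table is independent of the signal and the model, so it can be built once and shared across all frames and frequencies.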

Furthermore, using these approximate solutions as initial estimates of the solution of f(z) = 0 and then performing a numerical search with a gradient method such as Newton's method improves the accuracy of the solution. Because f(z) is a convex function of one variable, the gradient-based search can be carried out very efficiently and effectively.
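A minimal Newton sketch for f(z) = exp(z) + z + a follows. The approximate-solution equation is not reproduced in this text, so the initial guess is an assumption: it is chosen so that each approximate equation has a real root for f as written in (19), using exp(z) + a = 0 (z = log(−a)) for a ≤ −1/2 and z + a = 0 (z = −a) otherwise. With either guess the choice mainly affects the iteration count. The function names, iteration cap, and tolerance are hypothetical.

```python
import math

def initial_guess(a):
    """Assumed rough solution of exp(z) + z + a = 0 (see lead-in)."""
    return math.log(-a) if a <= -0.5 else -a

def solve_f(a, max_iter=50, tol=1e-12):
    """Newton's method on f(z) = exp(z) + z + a, with f'(z) = exp(z) + 1 > 0."""
    z = initial_guess(a)
    for _ in range(max_iter):
        ez = math.exp(z)
        step = (ez + z + a) / (ez + 1.0)
        z -= step
        if abs(step) <= tol:
            break
    return z

# Usage: a = 0 gives the root of exp(z) + z = 0.
z_root = solve_f(0.0)
```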

(Prior probability density function p(ρ_n; Θ_θ, Θ_ρ) of the logarithmic spectrum)
In this embodiment, the prior probability density function p(ρ_n; Θ_θ, Θ_ρ) of the logarithmic spectrum is modeled by a Gaussian mixture distribution. In each short-time frame n, the state parameter θ_n takes one state i out of N_θ finite states numbered 1 to N_θ. The prior probability density function p(ρ_n; Θ_θ, Θ_ρ) is defined as follows.

where

p(θ_n = i; Θ_θ) = β^i (25)
The spectrum state model Θ_θ consists of the mixture ratios β^i for all states i, and the state-dependent spectrum model Θ_ρ consists of the means μ^i_k and covariance matrices ξ^i_k for all states i and all frequencies k. These models are assumed to have been trained in advance on training data for the signals subject to spectrum estimation. Methods based on the EM algorithm, among others, are known for training the model parameters of a Gaussian mixture distribution.

The details of each unit are described below.

<Details of the posterior probability maximizing spectrum estimation unit 106 and the spectrum state estimation unit 104>
The posterior probability maximizing spectrum estimation unit 106 works with a nonlinear equation defined by the sum of one scalar variable z, the exponential function exp(z) of that variable, and one scalar constant a (for example, equation (19)). It determines the scalar constant a depending on the frequency signal x_{n,k} for each frequency k in each short-time frame n, the log-likelihood weight w_{θn}, and the state-dependent spectrum model Θ_ρ (for example, equation (21)). It then finds the value of the scalar variable z at which the nonlinear equation equals 0 (for example, equations (19) and (22)), and updates the estimate ρ^_n of the logarithmic spectrum based on the obtained scalar variable z, the frequency signal x_{n,k}, the log-likelihood weight w_{θn}, and the state-dependent spectrum model Θ_ρ (for example, equation (20)).

FIG. 5 shows a functional block diagram of the posterior probability maximizing spectrum estimation unit 106, and FIG. 6 shows the processing flow of the spectrum state estimation unit 104 and the posterior probability maximizing spectrum estimation unit 106.

The posterior probability maximizing spectrum estimation unit 106 includes an initial value setting unit 106a, a scalar constant calculation unit 106b, a scalar variable calculation unit 106c, a logarithmic spectrum calculation unit 106d, a convergence determination unit 106e, and a spectrum calculation unit 106f.

The initial value setting unit 106a receives the frequency signal x_n and obtains the initial value of the logarithmic spectrum estimate ρ^_n by the conventional maximum likelihood method, as in equation (3') (s21).

In addition to the logarithmic spectrum estimate ρ^_n, the spectrum state estimation unit 104 receives the mixture ratios β^i, which constitute the spectrum state model Θ_θ, and the means μ^i_k and covariance matrices ξ^i_k, which constitute the state-dependent spectrum model Θ_ρ, as defined in equations (24) and (25), and obtains the log-likelihood weights w_i based on equation (18) as follows (s1).
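The E-step above can be sketched as follows. Equation (18) is not reproduced in this text, so the sketch assumes that the weight w_i is the posterior probability of state i given the current log-spectrum estimate, proportional to β^i times a product of per-frequency Gaussians with mean μ^i_k and variance ξ^i_k. The function and argument names are hypothetical.

```python
import math


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def state_posteriors(rho_hat, beta, mu, xi):
    """Posterior probability of each state i given a log-spectrum estimate.

    rho_hat: list of N_k log-spectrum values for one frame
    beta:    list of N_theta mixture ratios
    mu, xi:  N_theta x N_k nested lists of per-frequency means / variances
    """
    log_joint = [
        math.log(b) + sum(log_gauss(r, m, v) for r, m, v in zip(rho_hat, mu_i, xi_i))
        for b, mu_i, xi_i in zip(beta, mu, xi)
    ]
    peak = max(log_joint)  # subtract the peak before exponentiating
    unnorm = [math.exp(lj - peak) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Working in the log domain and subtracting the largest term keeps the normalization stable when N_k is large and the per-state likelihoods differ by many orders of magnitude.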

Further, for each frequency k, the logarithmic spectrum estimate ρ^_{n,k} is updated by the following procedure.

The scalar constant calculation unit 106b receives the frequency signal x_n, the log-likelihood weights w_i, and the means μ^i_k and covariance matrices ξ^i_k for all states i and all frequencies k, which constitute the state-dependent spectrum model Θ_ρ, and obtains the scalar constant a by equation (21) (s22).

The scalar variable calculation unit 106c receives the scalar constant a and (approximately) obtains the scalar variable z that satisfies f(z) = 0 in equation (19) (s23).

The logarithmic spectrum calculation unit 106d receives the frequency signal x_n, the log-likelihood weights w_i, the covariance matrices ξ^i_k for all states i and all frequencies k, which are part of the state-dependent spectrum model Θ_ρ, and the scalar variable z. It obtains the logarithmic spectrum ρ_{n,k} that satisfies equation (20) and takes it as the estimate ρ^_{n,k} (s24).

The processing in the spectrum state estimation unit 104 is the E-step and the processing in the posterior probability maximizing spectrum estimation unit 106 is the M-step; following the EM algorithm, s1 to s24 are repeated until a convergence condition is satisfied. To this end, the convergence determination unit 106e receives the logarithmic spectrum estimate ρ^_{n,k} and determines whether the convergence condition is satisfied (s3). If it is not, the unit outputs the logarithmic spectrum estimate ρ^_n to the spectrum state estimation unit 104 and outputs a control signal to each unit so that the processing is repeated. If it is, the unit outputs the logarithmic spectrum estimate ρ^_n to the spectrum calculation unit 106f.
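The control flow of this outer loop can be sketched as below. The two callables stand in for the E-step (unit 104) and M-step (unit 106) and are placeholders, and the stopping rule (an iteration cap or a small change in the estimate) is an assumption of this sketch, since the convergence condition for this loop is not specified here.

```python
def alternate_until_converged(rho0, e_step, m_step, tol=1e-6, max_iter=100):
    """Alternate e_step and m_step until the estimate stops changing.

    e_step(rho) -> weights; m_step(weights) -> new rho (list of floats).
    Both callables and the stopping rule are placeholders of this sketch.
    """
    rho = list(rho0)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        weights = e_step(rho)        # corresponds to s1
        new_rho = m_step(weights)    # corresponds to s22 to s24
        change = max(abs(n - o) for n, o in zip(new_rho, rho))
        rho = new_rho
        if change <= tol:            # corresponds to the check in s3
            break
    return rho, iteration
```

After the loop ends, the spectrum itself would be obtained by exponentiating each element of the returned log-spectrum estimate.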

The spectrum calculation unit 106f receives the logarithmic spectrum estimate ρ^_n, obtains the spectrum estimate σ^_{n,k} at each frequency k as σ^_{n,k} = exp(ρ^_{n,k}) (s26), and outputs the spectrum estimate σ^_n as the output of the spectrum estimation device 10.

<Effect>
With this configuration, spectral values can be estimated efficiently by using, as the prior probability density function of the spectrum, a latent-variable-dependent Gaussian distribution over the logarithmic spectrum that can represent the distribution of the spectrum with high accuracy. As a result, the spectrum can be estimated efficiently and accurately even when the frequency signal contains errors.

<Modification>
As a modification of the first embodiment, an example is described in which, in the E-step of the EM algorithm, the state giving the maximum posterior probability is selected instead of obtaining the posterior probability p(θ_n | ρ^_n; Θ_ρ) of each state. This is an approximation often introduced to reduce computation in estimation with Gaussian mixture distributions or hidden Markov models. More precisely, whereas the first embodiment treats θ_n as a hidden variable, this modification corresponds to obtaining θ_n together with ρ_n by maximum a posteriori estimation. That is, it corresponds to solving the following problem.

As for the concrete processing procedure, only the processing of s1 in the procedure of the first embodiment is modified as described below; everything else is the same as in the first embodiment.

The processing (E-step, s1) in the spectrum state estimation unit 104 of the modification is described below. FIG. 7 shows a functional block diagram of the spectrum state estimation unit 104 of the modification, and FIG. 8 shows its processing flow. The spectrum state estimation unit 104 includes a state number estimation unit 104a and a log-likelihood weight setting unit 104b.

The state number estimation unit 104a receives the logarithmic spectrum estimate ρ^_n, the mixture ratios β^i, which constitute the spectrum state model Θ_θ, and the means μ^i_k and covariance matrices ξ^i_k for all states i and all frequencies k, which constitute the state-dependent spectrum model Θ_ρ, and obtains the estimate i^ of the state number that maximizes the posterior probability (s104a).

The log-likelihood weight setting unit 104b receives the estimate i^ and sets the log-likelihood weights w_i accordingly (s104b).
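A minimal sketch of steps s104a and s104b follows. The formulas for both steps are not reproduced in this text, so the sketch assumes per-frequency Gaussians with mean μ^i_k and variance ξ^i_k, and assumes that the weights are set to 1 for the selected state and 0 for all others. The function name and the 0-based state index are choices of this sketch.

```python
import math


def hard_state_weights(rho_hat, beta, mu, xi):
    """Return (i_hat, weights) with a one-hot weight on the MAP state."""
    def log_joint(i):
        # log of beta^i times the per-frequency Gaussian likelihoods
        ll = math.log(beta[i])
        for r, m, v in zip(rho_hat, mu[i], xi[i]):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (r - m) ** 2 / v)
        return ll

    i_hat = max(range(len(beta)), key=log_joint)  # 0-based state index
    weights = [1.0 if i == i_hat else 0.0 for i in range(len(beta))]
    return i_hat, weights
```

Because the posterior is only compared across states, the normalizing sum over states never has to be computed, which is where the saving in computation comes from.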

<Second embodiment>
Only the parts that differ from the first embodiment are described. The second embodiment covers the case where the state parameter θ_n takes continuous values.

The spectrum state model stored in the spectrum state model storage unit 101, the state-dependent spectrum model stored in the state-dependent spectrum model storage unit 102, and the processing of each unit differ from those of the first embodiment.

(Definition of the state parameter)
In this embodiment, the Mel-frequency cepstral coefficients (hereinafter "MFCC") c_n corresponding to the frequency signal are used as the state parameter θ_n. The MFCC c_n is assumed to be expressed as a vector with N_c elements c_{n,m}, one for each order. Thus c_n = [c_{n,1}, c_{n,2}, ..., c_{n,Nc}]^T, where the subscript Nc stands for N_c. Now let c_n = H(ρ_n) be the function that converts the logarithmic spectrum ρ_n of the signal into the MFCC. H(ρ_n) corresponds to first applying the inverse of the logarithmic transformation (exp(·)) to each element of the logarithmic spectrum ρ_n, then applying mel filter bank processing (denoted mfb(·)), then applying the logarithmic transformation (log(·)) to each vector element, and finally applying the discrete cosine transform (D(·)). That is, H(ρ_n) is expressed by the following transformation process.
H(ρ_n) = D(log(mfb(exp(ρ_n))))
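The transformation chain H can be sketched with NumPy as below. The filter bank is represented as a non-negative matrix and the DCT as an unnormalized DCT-II matrix; both representations, and all function names, are assumptions of this sketch rather than details given in the text.

```python
import numpy as np


def dct_matrix(n_out: int, n_in: int) -> np.ndarray:
    """Unnormalized DCT-II matrix mapping n_in values to n_out coefficients."""
    m = np.arange(n_out)[:, None]
    q = np.arange(n_in)[None, :]
    return np.cos(np.pi * m * (q + 0.5) / n_in)


def log_spectrum_to_mfcc(rho: np.ndarray, mel_fb: np.ndarray, dct: np.ndarray) -> np.ndarray:
    """H(rho) = D(log(mfb(exp(rho)))) for one frame."""
    spectrum = np.exp(rho)              # undo the logarithm
    mel_energies = mel_fb @ spectrum    # mel filter bank as a matrix product
    return dct @ np.log(mel_energies)   # logarithm, then DCT


# Toy example: 6 frequency bins, 3 rectangular bands, 2 cepstral coefficients.
toy_fb = np.array([[1, 1, 0, 0, 0, 0],
                   [0, 0, 1, 1, 0, 0],
                   [0, 0, 0, 0, 1, 1]], dtype=float)
toy_mfcc = log_spectrum_to_mfcc(np.zeros(6), toy_fb, dct_matrix(2, 3))
```

Each band sums several bins, so different log spectra can produce the same coefficients; in the toy filter bank, any two bins of one band can trade energy without changing the output.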

(Definition of the spectrum state model)
In this embodiment, a Gaussian mixture distribution over the MFCC is used as the prior probability density function p(θ_n; Θ_θ) of the state parameter θ_n. With j denoting the index of a Gaussian component, it is modeled as follows.

Here, γ^j is the mixture ratio of component j, and μ^j and Σ^j are the mean and covariance matrix of the Gaussian distribution of component j. The spectrum state model Θ_θ is therefore the set of γ^j, μ^j, and Σ^j for all j.

(Definition of the state-dependent spectrum model)
In this embodiment, the conditional probability density function p(ρ_n | c_n; Θ_ρ) of the logarithmic spectrum ρ_n given a known MFCC c_n, the state parameter, is modeled as the inverse of the transformation c_n = H(ρ_n) described above. In general, c_n = H(ρ_n) is a many-to-one transformation, so its inverse is not uniquely determined and there is freedom in how it is defined. One example is given here. First, a pseudo-inverse ρ^_n = G(c_n) of c_n = H(ρ_n) is defined using linear regression as follows.
G(c) = Ac + b (35)
where A is an N_k × N_c matrix and b is an N_k × 1 vector. The values of the matrix A and the vector b are assumed to be learned in advance from a database of acoustic signals or learned from the observed signal. That is, given pairs consisting of the logarithmic spectra ρ_n corresponding to a plurality of frequency signals x_n from the training database (or the observed signal) and the corresponding MFCCs c_n = H(ρ_n), the matrix A and the vector b are determined as follows.

In addition, the inverse transformation error e = ρ_n − ρ^_n = ρ_n − G(H(ρ_n)) is assumed to follow a Gaussian distribution with mean 0 and covariance matrix Ξ. That is,
p(e) = N(e; 0, Ξ) (37)
Accordingly, the conditional probability density function p(ρ_n | c_n; Θ_ρ) is defined as follows.
p(ρ_n | c_n; Θ_ρ) = N(ρ_n; G(c_n), Ξ) (38)

In this embodiment, the conditional probability density function p(ρ_n | c_n; Θ_ρ) is assumed to factorize into a product over frequencies, so the covariance matrix Ξ is a diagonal matrix with diagonal elements ξ_k, that is, Ξ = diag(ξ_k). Writing the k-th element of G(c) as G_k(c), each ξ_k is assumed to be learned in advance as the mean squared regression error E{|ρ_{n,k} − G_k(c_n)|^2}. The conditional probability density function can then be written as follows.

Therefore, the conditional probability density function p(ρ_n | c_n; Θ_ρ) is fully specified if the state-dependent spectrum model Θ_ρ contains the matrix A and the vector b, which are the coefficients of equation (36), and the variances ξ_k of the inverse transformation error at all frequencies k.
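The training of this model can be sketched as follows. Equation (36) is not reproduced in this text, so the sketch assumes an ordinary least-squares fit of G(c) = Ac + b over paired training frames, followed by the per-frequency mean squared regression error for ξ_k. The function name and array layout are hypothetical.

```python
import numpy as np


def fit_pseudo_inverse(C: np.ndarray, R: np.ndarray):
    """Least-squares fit of G(c) = A c + b and per-frequency error variance.

    C: (N, N_c) array of MFCC vectors c_n, one row per frame
    R: (N, N_k) array of the matching log spectra rho_n
    Returns A (N_k, N_c), b (N_k,), xi (N_k,).
    """
    design = np.hstack([C, np.ones((C.shape[0], 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(design, R, rcond=None)   # shape (N_c + 1, N_k)
    A, b = coef[:-1].T, coef[-1]
    residual = R - (C @ A.T + b)
    xi = np.mean(residual ** 2, axis=0)                 # mean squared regression error
    return A, b, xi
```

Fitting all N_k outputs in one `lstsq` call is equivalent to N_k independent regressions that share the same design matrix.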

(Optimization function)
In this embodiment, as in the modification of the first embodiment, both the logarithmic spectrum ρ_n and the MFCC c_n, the state parameter, are estimated by MAP estimation.

Therefore, as in the modification of the first embodiment, the ρ^_n and c^_n that maximize the above expression are obtained by updating ρ^_n and c^_n alternately.

With the logarithmic spectrum estimate ρ^_n held fixed, the spectrum state estimation unit 104 obtains the estimate c^_n of the MFCC c_n that maximizes the above expression. This can be done, for example, with an EM algorithm that treats the component number j of the Gaussian mixture distribution p(c_n; Θ_θ) as a hidden variable. (In other words, within the E-step of the EM algorithm carried out by the spectrum state estimation unit 104 and the posterior probability maximizing spectrum estimation unit 106, the expected value calculation unit 204b and the state parameter calculation unit 204c, described later, run an EM algorithm of their own.) The auxiliary function for this can be defined as follows.

where

Therefore, the EM algorithm obtains the c_n that maximizes equation (43) by repeating the following steps until convergence. The result is taken as the estimate c^_n of the state parameter, the MFCC.
1. E-step: Update the value of E{j | c^_n} by equation (44).
2. M-step: Update c^_n to the value of c_n that maximizes equation (43). Specifically, the following formula is calculated.

As described above, this embodiment treats the state parameter c_n not as a latent variable but as a parameter to be obtained by MAP estimation. The log-likelihood weight for the estimate c^_n of the state parameter, obtained as a definite value, is therefore expressed using the Dirac delta function δ(·) as follows.

Meanwhile, the posterior probability maximizing spectrum estimation unit 106 receives w_{cn} (where the subscript cn stands for c_n), obtains the ρ_n that maximizes equation (15), and takes it as ρ^_n. Equation (15) can be rewritten as follows.

Since the above expression has the same form as equation (16), it can be maximized efficiently in this embodiment. For example, the procedure of the spectrum state estimation unit 104 according to this embodiment is as follows.

<Details of the spectrum state estimation unit 104>
FIG. 9 shows a functional block diagram of the spectrum state estimation unit 104 of the second embodiment, and FIG. 10 shows its processing flow.

The spectrum state estimation unit 104 includes an initial value calculation unit 204a, an expected value calculation unit 204b, a state parameter calculation unit 204c, a convergence determination unit 204e, and a log-likelihood weight calculation unit 204f.

The initial value calculation unit 204a receives the logarithmic spectrum estimate ρ^_n and sets the initial value of the state parameter to c^_n = H(ρ^_n) (s204a) (see equation (31)).

The expected value calculation unit 204b receives the state parameter estimate c^_n and the mixture ratios γ^j, means μ^j, and covariance matrices Σ^j that constitute the spectrum state model Θ_θ, and obtains the expected value E{j | c^_n} by equation (44) (s204b, E-step).

The state parameter calculation unit 204c receives the logarithmic spectrum estimate ρ^_n, the expected value E{j | c^_n}, the means μ^j and covariance matrices Σ^j of the spectrum state model Θ_θ, and the matrix A, vector b, and covariance matrix Ξ that constitute the state-dependent spectrum model Θ_ρ, and obtains the estimate c^_n of the state parameter, the MFCC, by equation (45) (s204c, M-step).

The processing in the expected value calculation unit 204b is the E-step and the processing in the state parameter calculation unit 204c is the M-step; following the EM algorithm, s204b and s204c are repeated until a convergence condition is satisfied. To this end, the convergence determination unit 204e receives the state parameter estimate c^_n and determines whether the convergence condition is satisfied (s204e). If it is not, the unit outputs the state parameter estimate c^_n to the expected value calculation unit 204b and outputs a control signal to each unit so that the processing is repeated. If it is, the unit outputs the state parameter estimate c^_n to the log-likelihood weight calculation unit 204f. Examples of the convergence condition are (1) the number of iterations exceeding a predetermined number, and (2) the difference between the state parameter estimate from the previous iteration and that from the current iteration being no greater than a threshold.
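The inner EM loop over s204b and s204c can be sketched as below. Equations (43) to (45) are not reproduced in this text, so the sketch assumes that the objective for c is log N(ρ^_n; Ac + b, Ξ) + log Σ_j γ^j N(c; μ^j, Σ^j) with Ξ = diag(ξ_k). Under that assumption the auxiliary function is quadratic in c and the M-step reduces to one linear solve. All names and the stopping rule are hypothetical.

```python
import numpy as np


def estimate_mfcc(rho_hat, c0, gamma, mu, Sigma, A, b, xi, tol=1e-8, max_iter=100):
    """MAP estimate of c for a fixed log spectrum, by EM over the component j.

    gamma: (J,) mixture ratios; mu: (J, N_c) means; Sigma: (J, N_c, N_c)
    A: (N_k, N_c), b: (N_k,), xi: (N_k,) regression-error variances
    """
    c = np.asarray(c0, dtype=float)
    Sigma_inv = np.linalg.inv(Sigma)                    # batched inverse
    _, logdet = np.linalg.slogdet(Sigma)
    lik_prec = A.T @ (A / xi[:, None])                  # A^T Xi^-1 A
    lik_lin = A.T @ ((rho_hat - b) / xi)                # A^T Xi^-1 (rho - b)
    for _ in range(max_iter):
        # E-step: responsibilities E{j | c} of each Gaussian component.
        diff = c - mu
        maha = np.einsum("ji,jik,jk->j", diff, Sigma_inv, diff)
        log_r = np.log(gamma) - 0.5 * (logdet + maha)
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
        # M-step: the auxiliary function is quadratic in c, so solve directly.
        prec = lik_prec + np.einsum("j,jik->ik", r, Sigma_inv)
        lin = lik_lin + np.einsum("j,jik,jk->i", r, Sigma_inv, mu)
        c_new = np.linalg.solve(prec, lin)
        done = np.max(np.abs(c_new - c)) <= tol
        c = c_new
        if done:
            break
    return c
```

The starting value `c0` would be H(ρ^_n) as in s204a; the loop stops on either of the two example conditions in the text, an iteration cap or a small change between successive estimates.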

The log-likelihood weight calculation unit 204f receives the state parameter estimate c^_n, obtains the log-likelihood weight w_{cn} by equation (46) (s204f), and outputs it to the posterior probability maximizing spectrum estimation unit 106.

The posterior probability maximizing spectrum estimation unit 106 receives the log-likelihood weight w_{cn}, the frequency signal x_n, and the state-dependent spectrum model Θ_ρ, obtains the logarithmic spectrum ρ_{n,k} at each frequency k that maximizes equation (47), and updates the logarithmic spectrum estimate ρ^_{n,k}.

As in the preceding examples, equation (47) can be maximized by rewriting it in the form of equation (19), finding the scalar variable z for which f(z) = 0, and then obtaining the logarithmic spectrum ρ_n corresponding to that z.

Finally, the spectrum calculation unit 106f of the posterior probability maximizing spectrum estimation unit 106 obtains the spectrum estimate σ^_{n,k} at each frequency k as σ^_{n,k} = exp(ρ^_{n,k}) and outputs the spectrum estimate σ^_n as the output of the spectrum estimation device 10.

<Effect>
This configuration provides the same effects as the first embodiment.

(Simulation results)
A confirmation experiment was conducted to evaluate the spectrum estimation device 10 of the present invention. Within the dereverberation method described in Non-Patent Document 1, the first and second embodiments were used in the step that estimates the spectrum from the estimate of the dereverberated frequency signal.

The dereverberation algorithm of Non-Patent Document 1 is as follows.
1. Initialize the estimate x^_n of the dereverberated frequency signal to the observed signal.
2. Obtain the spectrum estimate σ^_n from the frequency-signal estimate x^_n by the maximum likelihood method.
3. Repeat the following until convergence.
(a) Update the reverberation prediction coefficients from the observed signal and the spectrum estimate σ^_n.
(b) Obtain the estimate x^_n of the dereverberated frequency signal from the observed signal and the reverberation prediction coefficients.
(c) Obtain the spectrum estimate σ^_n from the frequency-signal estimate x^_n by the maximum likelihood method.
4. Convert the resulting frequency-signal estimate x^_n to a time-domain signal and output it as the dereverberated signal.

In this experiment, step 3(c) above was run with the first and second embodiments in place of the maximum likelihood method and compared with the original method. FIG. 11 shows the results. The three graphs show, from left to right, the results for three sets of observed signals of different lengths (average lengths of 1.15 s, 2.3 s, and 4.6 s, respectively). The horizontal axis of each graph is the number of iterations of the dereverberation algorithm above; iteration 0 represents the observed signal. The vertical axis is the cepstral distortion (CD) of the dereverberated signal. The two-dot chain line is the dereverberation method of Non-Patent Document 1, the one-dot chain line is spectrum estimation by the first embodiment, and the solid line is spectrum estimation by the second embodiment. In every case, from the second iteration onward, spectrum estimation by the first and second embodiments yields lower cepstral distortion than the dereverberation method of Non-Patent Document 1. Note that in the dereverberation algorithm above, dereverberation based on the spectrum estimated by the first and second embodiments takes place only from the second iteration onward. In the first iteration, therefore, the cepstral distortion is the same whether or not the first and second embodiments are used.

These results confirm that the first and second embodiments improve spectrum estimation accuracy by introducing a latent-variable-dependent Gaussian distribution as the prior distribution of the logarithmic spectrum and obtaining the logarithmic spectrum by maximum a posteriori estimation.

<Other modifications>
The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The spectrum estimation device described above can also be implemented by a computer. In that case, a program for causing the computer to function as the target device (a device having the functional block diagram shown in the drawings of the embodiments), or a program for causing the computer to execute each step of the processing procedure (as shown in the embodiments), is loaded into the computer from a recording medium such as a CD-ROM, a magnetic disk, or a semiconductor storage device, or downloaded via a communication line, and then executed.

The present invention can be used in various kinds of processing that use the spectral values of the frequency signal in each short-time frame.

DESCRIPTION OF SYMBOLS
10 Spectrum estimation device
101 Spectrum state model storage unit
102 State-dependent spectrum model storage unit
104 Spectrum state estimation unit
104a State number estimation unit
104b Setting unit
106 Posterior probability maximizing spectrum estimation unit
106a Initial value setting unit
106b Scalar constant calculation unit
106c Scalar variable calculation unit
106d Logarithmic spectrum calculation unit
106e Convergence determination unit
106f Spectrum calculation unit
204a Initial value calculation unit
204b Expected value calculation unit
204c State parameter calculation unit
204e Convergence determination unit
204f Calculation unit

Claims

A spectrum estimation device for estimating a spectrum value σ_n of a frequency signal x_n in each short-time frame n, comprising:
a storage unit that stores a spectrum state model Θ_θ, which is a model parameter of the prior probability density function p(θ_n; Θ_θ) of a state parameter θ_n representing the state of the log spectrum ρ_n of the frequency signal x_n, and a state-dependent spectrum model Θ_ρ, which is a model parameter of the conditional probability density function p(ρ_n | θ_n; Θ_ρ) of the log spectrum ρ_n given that the state parameter θ_n is known;
a spectrum state estimation unit that estimates a log-likelihood weight w_θn using an estimate ρ^_n of the log spectrum ρ_n, the spectrum state model Θ_θ, and the state-dependent spectrum model Θ_ρ; and
a posterior probability maximization spectrum estimation unit that, using the frequency signal x_n, the log-likelihood weight w_θn, and the state-dependent spectrum model Θ_ρ, obtains the log spectrum that maximizes an objective function as the estimate ρ^_n,
wherein the processing in the spectrum state estimation unit and the posterior probability maximization spectrum estimation unit is repeated until a convergence condition is satisfied.
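The alternation recited in this claim can be sketched as a simple fixed-point loop. The following is a minimal illustration only: the function name `alternate_until_converged`, the two callables, and the convergence test (largest change in the estimate below a tolerance) are assumptions made for the example and are not specified by the claim.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def alternate_until_converged(
    x: Sequence[float],
    rho_init: Vector,
    estimate_weights: Callable[[Vector], Vector],
    maximize_posterior: Callable[[Sequence[float], Vector], Vector],
    tol: float = 1e-6,
    max_iter: int = 100,
) -> Vector:
    """Alternate the two estimation stages until the estimate stops changing.

    estimate_weights   : stands in for the spectrum state estimation unit
                         (maps the current log-spectrum estimate to weights).
    maximize_posterior : stands in for the posterior probability maximization
                         spectrum estimation unit (maps signal and weights to
                         a new log-spectrum estimate).
    Both callables and the stopping rule are hypothetical placeholders.
    """
    rho = list(rho_init)
    for _ in range(max_iter):
        w = estimate_weights(rho)            # stage 1: weights from current estimate
        rho_new = maximize_posterior(x, w)   # stage 2: new estimate from weights
        change = max(abs(a - b) for a, b in zip(rho_new, rho))
        rho = rho_new
        if change < tol:                     # assumed convergence condition
            break
    return rho
```

With toy stand-ins such as `estimate_weights = lambda rho: [0.5 * r for r in rho]` and `maximize_posterior = lambda x, w: [xi + wi for xi, wi in zip(x, w)]`, the loop settles on the fixed point of the two maps; in the claimed device the two stages would instead use the stored models Θ_θ and Θ_ρ.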

The spectrum estimation device according to claim 1,
wherein the posterior probability maximization spectrum estimation unit, with respect to a nonlinear equation defined by the sum of one scalar variable z, the exponential function exp(z) of that scalar variable, and one scalar constant a, determines the scalar constant a depending on the frequency signal x_{n,k} for each frequency k in each short-time frame n, the log-likelihood weight w_θn, and the state-dependent spectrum model Θ_ρ, obtains the value of the scalar variable z for which the nonlinear equation equals zero, and updates the estimate ρ^_n based on the obtained scalar variable z, the frequency signal x_{n,k}, the log-likelihood weight w_θn, and the state-dependent spectrum model Θ_ρ.
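The nonlinear equation in this claim has the form z + exp(z) + a = 0. Its left-hand side is strictly increasing and convex in z, so it has exactly one root for any real a. The sketch below finds that root with Newton's method; the function name `solve_z_plus_exp`, the starting point, and the tolerances are assumptions for illustration, and the claim does not say which solver is used or how a is computed.

```python
import math


def solve_z_plus_exp(a: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    """Return z such that z + exp(z) + a = 0, using Newton's method.

    The start point is chosen so that f(z0) > 0. Because f is increasing and
    convex, the iterates then decrease monotonically toward the root, and
    exp() is never evaluated at a value larger than z0.
    """
    # f(-a) = exp(-a) > 0 for a > 0; f(log(1 - a)) = log(1 - a) + 1 > 0 for a <= 0.
    z = -a if a > 0.0 else math.log1p(-a)
    for _ in range(max_iter):
        ez = math.exp(z)
        step = (z + ez + a) / (1.0 + ez)  # f(z) / f'(z)
        z -= step
        if abs(step) < tol:
            break
    return z
```

For example, `solve_z_plus_exp(-1.0)` returns a value very close to 0, since 0 + exp(0) - 1 = 0. The same root can also be written in closed form as z = -a - W(exp(-a)), where W is the Lambert W function; that form is a mathematical identity and is not taken from the source text.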

The spectrum estimation device according to claim 1 or 2,
wherein the prior probability density function p(ρ_n; Θ_θ, Θ_ρ), which defines the values that the log spectrum ρ_n can take, follows a Gaussian mixture distribution, the state parameter θ_n takes one of N_θ finite states in the short-time frame n, the spectrum state model Θ_θ consists of the mixing ratios β^i for all states i, and the state-dependent spectrum model Θ_ρ consists of the means μ^i_k and the variances ξ^i_k for all frequencies k and all states i, and
the spectrum state estimation unit estimates the log-likelihood weight w_i as

[equation not reproduced in this text]

or

[equation not reproduced in this text].
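The two expressions for w_i are not reproduced in this text, so the following is only a sketch under a stated assumption: that the two alternatives correspond to the standard soft assignment (posterior state probability) and hard assignment (one-hot at the most probable state) for a Gaussian mixture with per-frequency variances. The function names are hypothetical; `beta`, `mu`, and `xi` stand for β^i, μ^i_k, and ξ^i_k.

```python
import math
from typing import List, Sequence


def state_log_scores(
    rho: Sequence[float],
    beta: Sequence[float],
    mu: Sequence[Sequence[float]],
    xi: Sequence[Sequence[float]],
) -> List[float]:
    """For each state i: log(beta^i) + sum_k log N(rho_k; mu^i_k, xi^i_k)."""
    scores = []
    for b, mu_i, xi_i in zip(beta, mu, xi):
        s = math.log(b)
        for r, m, v in zip(rho, mu_i, xi_i):
            s += -0.5 * (math.log(2.0 * math.pi * v) + (r - m) ** 2 / v)
        scores.append(s)
    return scores


def soft_weights(scores: Sequence[float]) -> List[float]:
    """Assumed soft variant: normalized posterior probability of each state."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_weights(scores: Sequence[float]) -> List[float]:
    """Assumed hard variant: weight 1 for the best-scoring state, 0 elsewhere."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]
```

Working with log scores and subtracting the maximum before exponentiating keeps the normalization stable when N_k is large and the per-state likelihoods differ by many orders of magnitude.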

The spectrum estimation device according to claim 1 or 2,
wherein the state parameter θ_n is a mel-frequency cepstral coefficient c_n corresponding to the frequency signal, a Gaussian mixture distribution of the mel-frequency cepstral coefficient c_n is used as the prior probability density function p(c_n; Θ_θ) of the state parameter, j is the index of a Gaussian distribution, the spectrum state model Θ_θ is the set of the mixing ratios γ^j, the means μ^j, and the covariance matrices Σ^j for all j, and the state-dependent spectrum model Θ_ρ consists of a matrix A and a vector b that define a pseudo-inverse transformation from the mel-frequency cepstral coefficient c_n to the estimate ρ^_n, and a covariance matrix Ξ for the case where the inverse transformation error e is assumed to follow a Gaussian distribution,
the spectrum state estimation unit includes:
an expected value calculation unit that, using the estimate c^_n of the state parameter, which is the mel-frequency cepstral coefficient, the mixing ratio γ^j, the mean μ^j, and the covariance matrix Σ^j, calculates the expected value E{j | c^_n} as

[equation not reproduced in this text];

a state parameter update unit that, using the estimate ρ^_n, the expected value E{j | c^_n}, the matrix A, the vector b, the covariance matrix Ξ, the mean μ^j, and the covariance matrix Σ^j, updates the estimate c^_n of the mel-frequency cepstral coefficient, which is the state parameter, as

[equation not reproduced in this text]; and

a log-likelihood weight calculation unit that, with δ being the Dirac delta function and using the estimate c^_n of the state parameter, which is the mel-frequency cepstral coefficient, calculates the log-likelihood weight w_cn for that estimate c^_n as

[equation not reproduced in this text],

and the processing in the expected value calculation unit and the state parameter update unit is repeated until a convergence condition is satisfied.

A spectrum estimation method for estimating a spectrum value σ_n of a frequency signal x_n in each short-time frame n,
wherein a spectrum state model Θ_θ, which is a model parameter of the prior probability density function p(θ_n; Θ_θ) of a state parameter θ_n representing the state of the log spectrum ρ_n of the frequency signal x_n, and a state-dependent spectrum model Θ_ρ, which is a model parameter of the conditional probability density function p(ρ_n | θ_n; Θ_ρ) of the log spectrum ρ_n given that the state parameter θ_n is known, are stored in advance,
the method comprising:
a spectrum state estimation step of estimating a log-likelihood weight w_θn using an estimate ρ^_n of the log spectrum ρ_n, the spectrum state model Θ_θ, and the state-dependent spectrum model Θ_ρ; and
a posterior probability maximization spectrum estimation step of obtaining, using the frequency signal x_n, the log-likelihood weight w_θn, and the state-dependent spectrum model Θ_ρ, the log spectrum that maximizes an objective function as the estimate ρ^_n,
wherein the processing in the spectrum state estimation step and the posterior probability maximization spectrum estimation step is repeated until a convergence condition is satisfied.

The spectrum estimation method according to claim 5,
wherein the posterior probability maximization spectrum estimation step, with respect to a nonlinear equation defined by the sum of one scalar variable z, the exponential function exp(z) of that scalar variable, and one scalar constant a, determines the scalar constant a depending on the frequency signal x_{n,k} for each frequency k in each short-time frame n, the log-likelihood weight w_θn, and the state-dependent spectrum model Θ_ρ, obtains the value of the scalar variable z for which the nonlinear equation equals zero, and updates the estimate ρ^_n based on the obtained scalar variable z, the frequency signal x_{n,k}, the log-likelihood weight w_θn, and the state-dependent spectrum model Θ_ρ.

The spectrum estimation method according to claim 5 or 6,
wherein the prior probability density function p(ρ_n; Θ_θ, Θ_ρ), which defines the values that the log spectrum ρ_n can take, follows a Gaussian mixture distribution, the state parameter θ_n takes one of N_θ finite states in the short-time frame n, the spectrum state model Θ_θ consists of the mixing ratios β^i for all states i, and the state-dependent spectrum model Θ_ρ consists of the means μ^i_k and the variances ξ^i_k for all frequencies k and all states i, and
in the spectrum state estimation step, the log-likelihood weight w_i is estimated as

[equation not reproduced in this text]

or

[equation not reproduced in this text].

The spectrum estimation method according to claim 5 or 6,
wherein the state parameter θ_n is a mel-frequency cepstral coefficient c_n corresponding to the frequency signal, a Gaussian mixture distribution of the mel-frequency cepstral coefficient c_n is used as the prior probability density function p(c_n; Θ_θ) of the state parameter, j is the index of a Gaussian distribution, the spectrum state model Θ_θ is the set of the mixing ratios γ^j, the means μ^j, and the covariance matrices Σ^j for all j, and the state-dependent spectrum model Θ_ρ consists of a matrix A and a vector b that define a pseudo-inverse transformation from the mel-frequency cepstral coefficient c_n to the estimate ρ^_n, and a covariance matrix Ξ for the case where the inverse transformation error e is assumed to follow a Gaussian distribution,
the spectrum state estimation step includes:
an expected value calculation step of calculating, using the estimate c^_n of the state parameter, which is the mel-frequency cepstral coefficient, the mixing ratio γ^j, the mean μ^j, and the covariance matrix Σ^j, the expected value E{j | c^_n} as

[equation not reproduced in this text];

a state parameter update step of updating, using the estimate ρ^_n, the expected value E{j | c^_n}, the matrix A, the vector b, the covariance matrix Ξ, the mean μ^j, and the covariance matrix Σ^j, the estimate c^_n of the mel-frequency cepstral coefficient, which is the state parameter, as

[equation not reproduced in this text]; and

a log-likelihood weight calculation step of calculating, with δ being the Dirac delta function and using the estimate c^_n of the state parameter, which is the mel-frequency cepstral coefficient, the log-likelihood weight w_cn for that estimate c^_n as

[equation not reproduced in this text],

and the processing in the expected value calculation step and the state parameter update step is repeated until a convergence condition is satisfied.

A program for causing a computer to function as the spectrum estimation device according to any one of claims 1 to 4.