JP6536322B2

JP6536322B2 - Noise estimation device, program and method, and voice processing device

Info

Publication number: JP6536322B2
Application number: JP2015191944A
Authority: JP
Inventors: 大藤枝
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2015-09-29
Filing date: 2015-09-29
Publication date: 2019-07-03
Anticipated expiration: 2035-09-29
Also published as: JP2017067951A

Description

The present invention relates to a noise estimation device, program, and method, and to a voice processing device. It can be applied, for example, to a device that suppresses the noise component superimposed on an input signal by using an estimate of the noise component contained in that signal.

Noise is present everywhere in natural environments, so speech observed in the real world generally contains noise from various sources. Many noise suppression methods have been developed to enhance only the speech in an input signal observed together with noise. Most of them consist of a method for estimating the noise to be suppressed and a method for computing a filter that suppresses it. Some conventional voice processing devices that suppress noise in an input signal estimate the noise power in the frequency domain.

The simplest conventional noise estimation method averages the input spectrum over segments in which no speech is present. This approach requires the non-speech segments to be identified in advance. Voice activity detection (VAD), which estimates the segments in which speech is present, has therefore been actively developed, but perfect VAD has not yet been achieved. If the speech segments are misjudged during noise estimation, the estimated noise contains the target speech, which distorts the enhanced speech and the residual noise. In addition, because such a method estimates noise only in noise segments, it cannot follow changes in the noise during a long speech segment.

Against this background, the techniques described in Non-Patent Document 1, Non-Patent Document 2, and Patent Document 1 are known as noise estimation methods that continue estimating noise even during speech segments. All of these documents relate to noise suppression methods (also called speech enhancement methods).

The conventional noise estimation method described in Non-Patent Document 1 is based on the finding that peaks of the input power along the time axis indicate the presence of target speech, whereas valleys can be used to estimate the smoothed noise power. Specifically, the minimum of the input power from the present back to a predetermined time (T seconds) in the past is taken as a first noise power estimate. The first estimate is biased, however, and tends to be smaller than the true noise power. The bias is estimated from the expected value of the first noise power estimate, and the resulting bias estimate is used to correct the first estimate, yielding a second noise power estimate (the final estimate).
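As a minimal sketch of the first (uncorrected) estimate only, the sliding minimum can be written as follows. The function name `sliding_minimum` and the use of a frame count `window` in place of T seconds are assumptions made for illustration; the bias correction of Non-Patent Document 1 is not reproduced here.

```python
from collections import deque


def sliding_minimum(powers, window):
    """Return, for each frame, the minimum input power over the last `window` frames.

    `powers` is a sequence of per-frame input powers for one frequency band.
    `window` (hypothetical) is the number of frames that corresponds to T seconds.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` powers
    estimates = []
    for p in powers:
        recent.append(p)
        estimates.append(min(recent))
    return estimates


if __name__ == "__main__":
    # A noise level that steps up from 1 to 5: the estimate stays at the old
    # level until the old frames leave the window, then jumps.
    print(sliding_minimum([1, 1, 1, 5, 5, 5, 5], window=3))
```

The delayed jump in this toy sequence is the behavior that the later discussion of Non-Patent Document 1 identifies as a source of abrupt changes in residual noise.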

The conventional noise estimation method described in Non-Patent Document 2 assumes that the complex spectra of the target speech and of the noise both follow zero-mean complex normal distributions, and takes the maximum likelihood estimate of the variance of the noise complex spectrum as the noise power estimate. Under this assumption, the complex spectrum of the input signal follows a zero-mean complex normal distribution whose variance is the sum of the variance of the speech complex spectrum and the variance of the noise complex spectrum. By introducing a hidden variable that indicates whether the current input is degraded speech or noise, and applying an online EM algorithm with a forgetting factor, the maximum likelihood estimate for the complex spectrum of the noise can be computed.

The conventional noise estimation method described in Patent Document 1 multiplies the input power by an appropriate weighting factor, stores the resulting weighted input power for a predetermined time (T seconds), and takes the average of the stored weighted input powers as the noise power estimate. The weighting factor is computed from the a posteriori SNR (signal-to-noise ratio), which is the current input power divided by the immediately preceding noise power estimate. Specifically, the weighting factor is 1 when the a posteriori SNR is at or below a predetermined value G1, is inversely proportional to the a posteriori SNR when it is at or above G1, and is 0 when it is at or above a predetermined value G2. When the weighting factor is 0, the weighted input power is not stored.
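The weighting rule described above can be sketched as follows. The proportionality constant is not given in this text; the sketch assumes the weight is G1 divided by the a posteriori SNR so that it is continuous at G1. The function name and argument names are hypothetical.

```python
def weighting_factor(post_snr, g1, g2):
    """Weight applied to the input power as a function of the a posteriori SNR.

    Assumption: between G1 and G2 the weight is g1 / post_snr, which is
    inversely proportional to the SNR and equals 1 at post_snr == g1.
    """
    if post_snr >= g2:
        return 0.0          # treated as speech: the weighted power is not stored
    if post_snr <= g1:
        return 1.0          # treated as noise: the input power is used as is
    return g1 / post_snr    # in between: de-emphasized in inverse proportion


if __name__ == "__main__":
    for snr in (1.0, 4.0, 10.0):
        print(snr, weighting_factor(snr, g1=2.0, g2=10.0))
```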

Patent Document 1: Japanese Patent Laid-Open No. 2002-204175

Non-Patent Document 1: R. Martin, "Spectral Subtraction Based on Minimum Statistics," in Proceedings of the 7th European Signal Processing Conference, 1994, pp. 1182-1185.

Non-Patent Document 2: M. Souden, M. Delcroix, K. Kinoshita, T. Yoshioka, and T. Nakatani, "Noise Power Spectral Density Tracking: A Maximum Likelihood Perspective," IEEE Signal Processing Letters, Vol. 19, No. 8, 2012, pp. 495-498.

However, the conventional noise estimation methods have the following problems.

The method of Non-Patent Document 1 leaves unpleasant residual noise after the downstream noise suppression when the noise suddenly becomes louder. Specifically, the noise power estimate stays small for a predetermined time after the noise increases, and then rises sharply once that time has elapsed. When a noise suppression method is driven by such an estimate, the residual noise jumps at the moment the noise increases and drops abruptly after the predetermined time. These abrupt changes in the level of the residual noise are unpleasant to the listener.

The method of Non-Patent Document 2 overestimates or underestimates the noise power when the noise level changes. The online EM algorithm it uses trades tracking speed against the stability of the maximum likelihood estimate: a larger forgetting factor improves stability but slows tracking, and a smaller forgetting factor speeds up tracking but reduces stability. As a result, the noise power estimate is inaccurate whether the forgetting factor is large or small, which increases either the distortion of the enhanced speech or the residual noise produced by the downstream noise suppression method.

With the method of Patent Document 1, the noise power estimate relatively rarely follows speech by mistake or becomes unstable by following non-stationary noise, and yet it can follow changes in the noise relatively quickly. However, in a noise segment that comes after a run of speech segments in which the weighting factor does not become 0, the noise power estimate drops sharply about T seconds after the switch to the noise segment. When the downstream noise suppression method is driven by such an estimate, the residual noise increases sharply in that noise segment, and the enhanced speech sounds unnatural.

Furthermore, none of the conventional noise estimation methods described above can adapt its noise estimation parameters to the input signal, so the accuracy of noise estimation degrades when the noise characteristics (noise level or noise type) change.

As described above, conventional noise estimation methods suffer from unstable noise power estimates, abrupt changes in the noise power estimate, and an inability to adapt to the noise characteristics.

In view of these problems, there is a need for a noise estimation device, program, and method, and a voice processing device, that can estimate the noise power of input speech stably and adaptively.

A first aspect of the present invention is a noise estimation device that estimates noise in a predetermined frequency band contained in input speech, comprising: (1) input power normalization means that normalizes the band input power of the predetermined frequency band constituting the input speech by a predetermined value to obtain a normalized input power; (2) posterior probability maximization means that estimates, based on the normalized input power, the current normalized noise power that maximizes the posterior probability; (3) noise power denormalization means that denormalizes the normalized noise power to obtain a denormalized noise power; and (4) estimation result output means that outputs a value based on the denormalized noise power as the estimate of the noise power of the predetermined frequency band contained in the input speech.

A noise suppression program according to a second aspect of the present invention causes a computer mounted in a noise estimation device that estimates noise in a predetermined frequency band contained in input speech to function as: (1) input power normalization means that normalizes the band input power of the predetermined frequency band constituting the input speech by a predetermined value to obtain a normalized input power; (2) posterior probability maximization means that estimates, based on the normalized input power, the current normalized noise power that maximizes the posterior probability; (3) noise power denormalization means that denormalizes the normalized noise power to obtain a denormalized noise power; and (4) estimation result output means that outputs a value based on the denormalized noise power as the estimate of the noise power of the predetermined frequency band contained in the input speech.

A third aspect of the present invention is a noise estimation method for estimating noise in a predetermined frequency band contained in input speech, wherein: (1) input power normalization means, posterior probability maximization means, noise power denormalization means, and estimation result output means are provided; (2) the input power normalization means normalizes the band input power of the predetermined frequency band constituting the input speech by a predetermined value to obtain a normalized input power; (3) the posterior probability maximization means estimates, based on the normalized input power, the current normalized noise power that maximizes the posterior probability; (4) the noise power denormalization means denormalizes the normalized noise power to obtain a denormalized noise power; and (5) the estimation result output means outputs a value based on the denormalized noise power as the estimate of the noise power of the predetermined frequency band contained in the input speech.

A fourth aspect of the present invention is a voice processing device that suppresses noise contained in input speech, comprising: (1) noise estimation means that estimates the noise power of each band input speech obtained by dividing the input speech into bands; and (2) noise suppression means that suppresses noise in each band input speech using the noise power estimated by the noise estimation means, wherein (3) the noise estimation device of the first aspect of the present invention is applied as each of the noise estimation means.

According to the present invention, the noise power of input speech can be estimated stably and adaptively.

A block diagram showing the functional configuration of the noise estimation means (noise estimation device) according to the first embodiment.

A block diagram showing the functional configuration of the voice processing device according to the first embodiment.

A block diagram showing the functional configuration of the noise estimation means (noise estimation device) according to the second embodiment.

A block diagram showing the functional configuration of the noise estimation means (noise estimation device) according to the third embodiment.

A block diagram showing the functional configuration of the noise estimation means (noise estimation device) according to the fourth embodiment.

(A) First Embodiment
A first embodiment of the noise estimation device, program, and method, and of the voice processing device, according to the present invention is described in detail below with reference to the drawings. The first embodiment describes an example in which the noise estimation device, program, and method of the present invention are applied to noise estimation means.

(A-1) Configuration of the First Embodiment
FIG. 2 is a block diagram showing the overall configuration of the voice processing device 100 of the first embodiment. The reference numerals in parentheses in FIG. 2 are those used in the second to fourth embodiments described later.

The voice processing device 100 performs noise suppression on an input signal x, which is a time-domain audio signal containing speech, and generates a suppressed signal y (a time-domain output signal).

The voice processing device 100 has frequency analysis means 101, K band processing means 102-1 to 102-K, and waveform restoration means 103.

The band processing means 102-1 to 102-K each process a different frequency band. The suffixes 1 to K attached to the band processing means 102-1 to 102-K are identifiers (numbers) of the respective frequency bands.

The frequency analysis means 101 divides the input signal x (input speech) into K bands using any frequency analysis technique, such as the Fourier transform, or any band division technique, such as a filter bank. It then supplies the resulting K band input signals X_1 to X_K to the band processing means 102-1 to 102-K, respectively (the subscript indicating the frequency band number is omitted below where appropriate).

The band processing means 102 performs the same processing for each of the K bands. The band processing means 102 has power calculation means 104, noise estimation means 105, and noise suppression means 106.

The power calculation means 104 calculates the power of the band input signal X (the band input power) and supplies the resulting input power Px to the noise estimation means 105.

The noise estimation means 105 estimates the noise power for each band and supplies the resulting noise power Pn to the noise suppression means 106.

The noise suppression means 106 uses the band input signal X and the noise power Pn to enhance the speech component in the band input signal X, and supplies the resulting band-suppressed signal Y to the waveform restoration means 103.

In the following, the band-suppressed signals generated by the band processing means 102-1 to 102-K are denoted Y_1 to Y_K.

The waveform restoration means 103 reconstructs a time waveform from the band-suppressed signals Y_1 to Y_K using the waveform restoration technique that corresponds to the frequency analysis technique or band division technique used by the frequency analysis means 101, and outputs the resulting suppressed signal y.
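The data flow through means 101 to 106 for a single frame can be sketched as below. This is only an illustration under stated assumptions: a naive discrete Fourier transform stands in for the frequency analysis, each DFT bin is treated as one band, and the suppression rule (a gain of max(0, 1 - Pn/Px)) is assumed here because this text does not specify one. All function names are hypothetical.

```python
import cmath


def dft(frame):
    """Naive DFT standing in for frequency analysis means 101."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT standing in for waveform restoration means 103."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def process_frame(frame, estimate_noise):
    """Run one frame through analysis, per-band processing, and restoration.

    `estimate_noise(k, px)` is a caller-supplied stand-in for noise estimation
    means 105 of band k; it receives the band input power and returns Pn.
    """
    band_signals = dft(frame)                      # X_1 .. X_K
    suppressed = []
    for k, x_k in enumerate(band_signals):
        px = abs(x_k) ** 2                         # power calculation means 104
        pn = estimate_noise(k, px)                 # noise estimation means 105
        # Noise suppression means 106: assumed gain rule, not taken from the source.
        gain = max(0.0, 1.0 - pn / px) if px > 0.0 else 0.0
        suppressed.append(gain * x_k)              # Y_1 .. Y_K
    return idft(suppressed)                        # suppressed signal y


if __name__ == "__main__":
    frame = [1.0, 0.5, -0.25, 0.0]
    print(process_frame(frame, lambda k, px: 0.0))  # zero noise estimate
```

With a noise estimate of zero the frame passes through unchanged, and with a noise estimate equal to the input power every band is fully suppressed, which makes the two extremes easy to check.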

Next, an outline is given of the processing by which the noise estimation means 105 estimates the noise power of the input signal x stably and adaptively.

The most important point in noise estimation is to keep the target speech out of the noise estimate. If the noise estimate contains the target speech, the enhanced speech obtained by the downstream noise suppression method is distorted or attenuated, and the purpose of noise suppression, namely improving the clarity and word intelligibility of the enhanced speech, is defeated.

On the other hand, a noise estimation method is sometimes also required to estimate non-stationary noise. Because non-stationary noise is difficult to distinguish from speech, there is a trade-off between the ability to estimate non-stationary noise and the ability to keep speech out of the noise estimate. Conventional methods that estimate stationary and non-stationary noise at the same time therefore let speech leak into the noise estimate, which reduces stability.

The noise estimation means 105 of the first embodiment therefore achieves a more stable noise estimation method by restricting the estimation target to stationary noise. To this end, it uses the framework of maximum a posteriori (MAP) estimation.

To estimate the current noise power Pn by exploiting stationarity, consider the problem of estimating Pn from the immediately preceding noise power Pn'. To cancel the degrees of freedom introduced by the sound-collection environment and the microphone sensitivity, the average noise power P̄n is introduced, and the normalized noise power is defined as ν (nu) = Pn / P̄n. Likewise, the immediately preceding normalized noise power is ν' = Pn' / P̄n. The problem is then to maximize the posterior probability p(ν | ν') of the normalized noise power ν given that the preceding normalized noise power ν' has been observed. Maximizing this posterior probability yields the normalized noise power ν.

First, expanding the posterior probability p(ν | ν') according to Bayes' theorem gives equation (1) below. In equation (1), ν' has already been observed and is fixed, so the denominator can be omitted. Because it is often easier to maximize the log posterior probability than the posterior probability itself, the evaluation function J(ν) to be maximized is defined by equation (2) below.

Next, the likelihood function p(ν' | ν) and the prior probability p(ν) of the evaluation function are designed. When stationary noise follows a zero-mean normal distribution, the noise amplitude of each element of the noise spectrum obtained by frequency analysis of the noise is known to follow a Rayleigh distribution. Since the square of a Rayleigh-distributed random variable follows an exponential distribution, the noise power obtained by squaring the noise amplitude follows the exponential distribution of equation (3) below, where μ is the mean of the random variable (normalized noise power) ν. The prior probability p(ν) is given by equation (3).

Seen from the standpoint of ν', the likelihood function p(ν' | ν) is the probability that ν' is observed given that ν has been observed. The probability density function of the ratio ν'/ν is therefore used as this likelihood function. Assuming that ν' and ν both follow equation (3) with the same μ, the likelihood function p(ν' | ν) is given by equation (4) below.

Substituting equations (3) and (4) into equation (2) gives the evaluation function J(ν) of equation (5) below. To obtain the ν that maximizes J(ν), the equation that sets the derivative of J(ν) with respect to ν to zero is solved, which gives equation (6) below.
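Equations (1) to (6) are not reproduced in this text. The following is a reconstruction from the surrounding description, offered as an assumption rather than as the published equations. It takes the prior to be exponential with mean μ and the likelihood to be the density of the ratio of two independent exponential variables with the same mean.

```latex
\begin{align}
p(\nu \mid \nu') &= \frac{p(\nu' \mid \nu)\, p(\nu)}{p(\nu')} \tag{1} \\
J(\nu) &= \ln p(\nu' \mid \nu) + \ln p(\nu) \tag{2} \\
p(\nu) &= \frac{1}{\mu} \exp\!\left(-\frac{\nu}{\mu}\right) \tag{3} \\
p(\nu' \mid \nu) &= \frac{1}{\left(1 + \nu'/\nu\right)^{2}}
                  = \frac{\nu^{2}}{(\nu + \nu')^{2}} \tag{4} \\
J(\nu) &= 2\ln\nu - 2\ln(\nu + \nu') - \ln\mu - \frac{\nu}{\mu} \tag{5} \\
\nu &= \frac{-\nu' + \sqrt{\nu'^{2} + 8\mu\nu'}}{2} \tag{6}
\end{align}
```

Setting the derivative of (5) to zero gives 2/ν - 2/(ν + ν') - 1/μ = 0, that is, ν² + ν'ν - 2μν' = 0, whose non-negative root is (6). This root tends to 2μ as ν' grows, which agrees with the bound 0 ≤ ν ≤ 2μ stated in the description, and it equals μ when ν' = μ.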

According to equation (6), 0 ≤ ν ≤ 2μ, so the noise power estimate is guaranteed to be no more than 2μ times the average noise power P̄n. Using equation (6) therefore allows the noise power to be estimated stably. For example, when the input power contains components of the target speech or of non-stationary noise, the input power is larger than the true noise power, but the noise power estimate remains at or below 2μ times the average noise power P̄n, so the target speech and non-stationary noise components are not mistakenly estimated as noise power.
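The bound can be checked numerically. The sketch below assumes that equation (6), which is not reproduced in this text, has the closed form ν = (-ν' + sqrt(ν'² + 8μν')) / 2; this form is reconstructed from the description and is consistent with the stated bound 0 ≤ ν ≤ 2μ. The code evaluates the algebraically equivalent expression 4μν' / (ν' + sqrt(ν'² + 8μν')) to avoid cancellation for large ν'. The function name is hypothetical.

```python
import math


def map_normalized_noise(nu_prev, mu=1.0):
    """MAP estimate of the normalized noise power from the previous value.

    Assumed closed form (reconstructed, not quoted from the source):
        nu = (-nu_prev + sqrt(nu_prev**2 + 8*mu*nu_prev)) / 2
    evaluated here in the equivalent, numerically stable form.
    """
    if nu_prev < 0.0 or mu <= 0.0:
        raise ValueError("nu_prev must be non-negative and mu must be positive")
    if nu_prev == 0.0:
        return 0.0
    root = math.sqrt(nu_prev * nu_prev + 8.0 * mu * nu_prev)
    return 4.0 * mu * nu_prev / (nu_prev + root)


if __name__ == "__main__":
    # The estimate grows with the previous value but saturates below 2 * mu.
    for nu_prev in (0.0, 0.5, 1.0, 10.0, 1e3, 1e6):
        print(nu_prev, map_normalized_noise(nu_prev, mu=1.0))
```

Under this form, a previous value equal to μ maps to itself, so a stationary input settles at the average, while an arbitrarily large previous value maps to just under 2μ.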

Next, the internal configuration of the noise estimation means 105 is described with reference to FIG. 1.

FIG. 1 is a block diagram showing an example of a functional configuration that implements the noise estimation method described above.

As shown in FIG. 1, the noise estimation means 105 of the first embodiment has input power storage means 201, input power normalization means 202, posterior probability maximization means 203, noise power denormalization means 204, noise power averaging means 205, and average noise power storage means 206.

The input power storage means 201 stores the input power Px and supplies it to the input power normalization means 202 one unit time later. In other words, the input power storage means 201 acts as a delay element.

The input power normalization means 202 divides the input power Px' of one unit time earlier by the average noise power P̄n' of one unit time earlier, which is supplied by the average noise power storage means 206 described later, and supplies the resulting normalized input power ξ' to the posterior probability maximization means 203.

The posterior probability maximization means 203 estimates the current normalized noise power ν that maximizes the posterior probability based on the normalized input power ξ' of one unit time earlier, and supplies the resulting ν to the noise power denormalization means 204. It treats ξ' as the normalized noise power ν' of one unit time earlier and substitutes it into equation (6) to estimate ν. As noted above, even when ξ' is used in place of ν', ν does not exceed 2μ, so components other than the stationary noise being estimated (target speech and non-stationary noise) are not mistakenly estimated as noise power. In the first embodiment, the parameter μ is preferably set to μ = 1.

The noise power denormalization means 204 multiplies the normalized noise power ν by the average noise power P̄n' of one unit time earlier, supplies the resulting noise power Pn (the denormalized noise power) to the noise power averaging means 205, and also provides it as the output of the noise estimation means 105. The frequency analysis means 101 of the first embodiment outputs the noise power Pn obtained by the noise power denormalization means 204 as the estimation result. Accordingly, in the noise estimation means 105 of the first embodiment, the noise power denormalization means 204 functions as the estimation result output means that outputs the result of noise estimation.

The noise power averaging means 205 computes the average of the noise power Pn (a value based on a predetermined number of past noise powers Pn) and supplies the resulting average noise power P̄n to the average noise power storage means 206. It may simply average a predetermined number of past noise powers Pn, or it may compute a weighted average of them. For example, it may compute the average using a time-constant filter (also known as leaky integration) or a moving average. A time-constant filter is the preferred choice for the noise power averaging means 205.
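The two averaging options named above can be sketched as follows. The smoothing constant `alpha` and the window length are hypothetical parameters; this text does not give values for them.

```python
from collections import deque


def leaky_average(prev_avg, pn, alpha=0.99):
    """Time-constant filter (leaky integration): blend the old average with Pn.

    `alpha` close to 1 means a long time constant (slow, smooth average).
    """
    return alpha * prev_avg + (1.0 - alpha) * pn


class MovingAverage:
    """Plain average of the most recent `length` noise powers."""

    def __init__(self, length):
        self.values = deque(maxlen=length)

    def update(self, pn):
        self.values.append(pn)
        return sum(self.values) / len(self.values)


if __name__ == "__main__":
    avg = 1.0
    ma = MovingAverage(length=4)
    for pn in (2.0, 2.0, 2.0, 2.0, 2.0):
        avg = leaky_average(avg, pn, alpha=0.9)
        print(avg, ma.update(pn))
```

The leaky form needs only one stored value per band, which fits the single-value storage described for the average noise power storage means 206.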

The average noise power storage means 206 stores the average noise power P̄n and supplies it, one unit time later, to the input power normalization means 202 and the noise power denormalization means 204. In other words, the average noise power storage means 206 acts like a delay element.

(A-2) Operation of the First Embodiment
Next, the operation of the speech processing apparatus 100 of the first embodiment configured as described above (the speech processing method according to the embodiment) will be described.

First, the overall operation of the speech processing apparatus 100 will be described with reference to FIG. 1.

The frequency analysis means 101 obtains K band input signals X_1 to X_K from the input signal x, for example by a Fourier transform, and supplies the band input signals X_1 to X_K to the band processing means 102-1 to 102-K, respectively.

The band processing means 102-1 to 102-K then estimate the noise power for the band input signals X_1 to X_K, respectively. Based on the estimated noise power, each of them performs noise suppression on its band input signal, generating the band-suppressed signals Y_1 to Y_K, which are supplied to the waveform restoration means 103.

The waveform restoration means 103 then reconstructs a time-domain waveform from the band-suppressed signals Y_1 to Y_K and outputs the resulting suppressed signal y.
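The overall flow just described (frequency analysis, per-band processing, waveform restoration) can be illustrated for a single frame as below. This is a rough sketch under several assumptions: a naive DFT stands in for the frequency analysis, there is no windowing or frame overlap, and the gain rule (power spectral subtraction with a floor) is chosen only for illustration, since this passage does not specify the suppression rule. All names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (stand-in for frequency analysis)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT (stand-in for waveform restoration); returns real parts."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def process_frame(x, noise_power, floor=0.1):
    """Split one frame into bands, attenuate each band, rebuild the waveform.

    noise_power[k] is the estimated noise power for band k. The gain rule
    below is an assumption of this sketch, not taken from the text.
    """
    X = dft(x)
    Y = []
    for Xk, pn in zip(X, noise_power):
        px = abs(Xk) ** 2  # band input power
        gain = max(1.0 - pn / px, floor) ** 0.5 if px > 0 else 0.0
        Y.append(gain * Xk)
    return idft(Y)


frame = [1.0, 0.0, -1.0, 0.0]
clean = process_frame(frame, [0.0] * 4)         # no noise estimated
suppressed = process_frame(frame, [100.0] * 4)  # noise estimate dominates
```

With a zero noise estimate the frame passes through unchanged; with a dominant noise estimate every band is attenuated down to the assumed floor.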

Next, the internal operation of each band processing means 102 will be described. The band processing means 102-1 to 102-K operate identically; they differ only in the frequency component each one processes.

The power calculation means 104 calculates the power of the band input signal X and supplies the resulting input power Px to the noise estimation means 105.

Next, the internal operation of the noise estimation means 105 of the first embodiment will be described with reference to FIG. 1.

The input power storage means 201 stores the input power Px and supplies it to the input power normalization means 202 one unit time later.

The input power normalization means 202 divides the input power Px' from one unit time earlier by the average noise power P̄n' from one unit time earlier, and supplies the resulting normalized input power ξ' to the posterior probability maximizing means 203.

The posterior probability maximizing means 203 estimates, from the normalized input power ξ' of one unit time earlier, the current normalized noise power ν that maximizes the posterior probability, and supplies the resulting ν to the noise power denormalization means 204.

The noise power denormalization means 204 multiplies the normalized noise power ν by the average noise power P̄n' from one unit time earlier, supplies the resulting noise power Pn to the noise power averaging means 205, and also provides it as the output of the noise estimation means 105.

The noise power averaging means 205 calculates the average of the noise power Pn and supplies the resulting average noise power P̄n to the average noise power storage means 206.

The average noise power storage means 206 stores the average noise power P̄n and supplies it, one unit time later, to the input power normalization means 202 and the noise power denormalization means 204.
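The data flow through means 201 to 206 can be sketched as a small per-band update loop. Equation (6) is not reproduced in this passage, so the posterior maximization is passed in as a callable; the `clip_stand_in` function used here is only a placeholder that respects the stated range 0 ≤ ν ≤ 2μ and is not the patent's equation (6). The initial values and the leaky-average coefficient `alpha` are likewise assumptions.

```python
class NoiseEstimator:
    """One-band estimator loop following the first embodiment's data flow."""

    def __init__(self, map_estimate, init_avg=1.0, alpha=0.9):
        self.map_estimate = map_estimate  # hypothetical callable: xi' -> nu
        self.prev_px = init_avg           # storage 201 (assumed initial value)
        self.prev_avg = init_avg          # storage 206 (assumed initial value)
        self.alpha = alpha                # assumed smoothing coefficient (205)

    def step(self, px):
        xi = self.prev_px / self.prev_avg  # input power normalization (202)
        nu = self.map_estimate(xi)         # posterior maximization (203)
        pn = nu * self.prev_avg            # denormalization (204)
        avg = self.alpha * self.prev_avg + (1 - self.alpha) * pn  # averaging (205)
        self.prev_px, self.prev_avg = px, avg  # both stores delay by one unit time
        return pn                          # output of the estimator


def clip_stand_in(xi, mu=1.0):
    """Placeholder for equation (6): clamps xi into [0, 2*mu]. NOT the real MAP rule."""
    return min(max(xi, 0.0), 2.0 * mu)


est = NoiseEstimator(clip_stand_in, init_avg=1.0, alpha=0.5)
first = est.step(4.0)   # uses the stored initial power, so the burst is not seen yet
second = est.step(1.0)  # now the burst from the previous step is processed
```

Note that the input burst of 4.0 affects the output only at the second call, which reflects the one-unit-time delay introduced by the input power storage means 201.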

(A-3) Effects of the First Embodiment
The first embodiment provides the following effects.

In the noise estimation means 105 of the first embodiment, the posterior probability maximizing means 203 estimates the current normalized noise power ν that maximizes the posterior probability, based on the normalized input power ξ' obtained by normalizing the input power Px with the average noise power P̄n'. The noise power denormalization means 204 then denormalizes the normalized noise power ν to obtain the noise power Pn as the estimation result. In the first embodiment, the posterior probability maximizing means 203 estimates ν using equation (6) above. As noted earlier, equation (6) yields 0 ≤ ν ≤ 2μ, so the estimated noise power is guaranteed to be at most 2μ times the average noise power P̄n. The noise estimation means 105 (posterior probability maximizing means 203) can therefore estimate the noise power stably.
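The stability bound can be written out explicitly by combining the range of ν with the denormalization step. The only added assumption is that the stored average noise power is nonnegative, which holds for any power value.

```latex
\[
P_n = \nu\,\bar{P}_n' , \qquad 0 \le \nu \le 2\mu , \qquad \bar{P}_n' \ge 0
\;\Longrightarrow\;
0 \le P_n \le 2\mu\,\bar{P}_n' .
\]
```

For example, with μ = 1 a single estimate can never exceed twice the average noise power of one unit time earlier, regardless of how large the input power is.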

In addition, because the posterior probability maximizing means 203 estimates the current normalized noise power ν that maximizes the posterior probability based on the normalized input power ξ', it can obtain ν without any approximation. This is also evident from the fact that no approximate expression is used in deriving equation (6) (the steps from equation (1) through equation (6) above). As a result, the noise estimation means 105 (posterior probability maximizing means 203) can estimate the noise power accurately, with little estimation error.

(B) Second Embodiment
A second embodiment of the noise estimation device, program, and method, and of the speech processing apparatus, according to the present invention will now be described in detail with reference to the drawings. The second embodiment describes an example in which the noise estimation device, program, and method of the present invention are applied to the noise estimation means.

(B-1) Configuration and Operation of the Second Embodiment
The overall configuration of the speech processing apparatus 100A of the second embodiment can also be illustrated by FIG. 2.

The following describes how the speech processing apparatus 100A of the second embodiment differs from the first embodiment.

The speech processing apparatus 100A of the second embodiment differs from the first embodiment in that the noise estimation means 105 is replaced with a noise estimation means 105A.

The noise estimation means 105 of the first embodiment outputs the instantaneous estimate of the noise power as the noise power, but this estimate contains a non-negligible estimation error. To reduce the influence of that error, the noise estimation means 105A of the second embodiment outputs the smoothed noise power, that is, the average of the noise power, as the noise power.

FIG. 3 is a block diagram showing an example of the internal configuration of the noise estimation means 105A of the second embodiment; parts identical or corresponding to those already described are given the same or corresponding reference signs.

The noise estimation means 105A of the second embodiment differs from the first embodiment in that the noise power denormalization means 204 and the noise power averaging means 205 are replaced with a noise power denormalization means 204A and a noise power averaging means 205A.

The noise power denormalization means 204A multiplies the normalized noise power ν by the average noise power P̄n' from one unit time earlier and supplies the resulting noise power Pn to the noise power averaging means 205A.

The noise power averaging means 205A calculates the average of the noise power Pn, supplies the resulting average noise power P̄n to the average noise power storage means 206, and also provides it as the output of the noise estimation means 105A. That is, the noise estimation means 105A of the second embodiment is configured to output the output of the noise power averaging means 205A (the average noise power P̄n) as the estimation result. Accordingly, in the noise estimation means 105A of the second embodiment, the noise power averaging means 205A functions as the estimation result output means that outputs the result of the noise estimation.
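The difference between the two output choices can be seen by running the denormalization and averaging steps on a short sequence of normalized noise powers and recording both candidate outputs. This sketch assumes a leaky average with a hypothetical coefficient `alpha` and takes the ν values as given rather than computing them from equation (6).

```python
def run(nus, alpha=0.9, init_avg=1.0):
    """Return (instantaneous, smoothed) outputs for a sequence of nu values."""
    avg = init_avg
    instantaneous, smoothed = [], []
    for nu in nus:
        pn = nu * avg                          # denormalization (204A)
        avg = alpha * avg + (1 - alpha) * pn   # averaging (205A)
        instantaneous.append(pn)               # what the first embodiment outputs
        smoothed.append(avg)                   # what the second embodiment outputs
    return instantaneous, smoothed


inst, smooth = run([1.0, 2.0, 1.0], alpha=0.5)
```

In this toy run the instantaneous output jumps to 2.0 when ν doubles, while the smoothed output rises only to 1.5, which illustrates the reduced sensitivity to a single noisy estimate.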

(B-2) Effects of the Second Embodiment
Compared with the first embodiment, the second embodiment provides the following effects.

The noise estimation means 105A of the second embodiment outputs the output of the noise power averaging means 205A (the average noise power P̄n) as the estimation result. The noise estimation means 105A of the second embodiment can thereby estimate the noise power more stably while reducing the influence of estimation error.

(C) Third Embodiment
A third embodiment of the noise estimation device, program, and method, and of the speech processing apparatus, according to the present invention will now be described in detail with reference to the drawings. The third embodiment describes an example in which the noise estimation device, program, and method of the present invention are applied to the noise estimation means.

(C-1) Configuration and Operation of the Third Embodiment
The overall configuration of the speech processing apparatus 100B of the third embodiment can also be illustrated by FIG. 2.

The following describes how the speech processing apparatus 100B of the third embodiment differs from the first embodiment.

The speech processing apparatus 100B of the third embodiment differs from the first embodiment in that the noise estimation means 105 is replaced with a noise estimation means 105B.

In the speech processing apparatus 100 of the first embodiment, μ is a fixed value, so the apparatus cannot adapt when the noise characteristics change partway through. The speech processing apparatus 100B of the third embodiment therefore updates the value of μ. In conventional noise estimation means, estimating distribution parameters often requires either an enormous number of samples or a complicated estimation procedure. In the noise estimation means of the present invention, by contrast, μ is a mean (a first-order statistic), so it can be estimated accurately from a small number of samples with simple averaging.

FIG. 4 is a block diagram showing an example of the internal configuration of the noise estimation means 105B of the third embodiment; parts identical or corresponding to those already described are given the same or corresponding reference signs.

Compared with the noise estimation means 105 of the first embodiment, the noise estimation means 105B of the third embodiment additionally includes a normalized noise power averaging means 208 and an average normalized noise power storage means 209. It also differs from the first embodiment in that the posterior probability maximizing means 203 is replaced with a posterior probability maximizing means 203B.

The posterior probability maximizing means 203B estimates the current normalized noise power ν that maximizes the posterior probability, based on the normalized input power ξ' from one unit time earlier and the average normalized noise power ν̄' from one unit time earlier, and supplies the resulting ν to the noise power denormalization means 204 and the normalized noise power averaging means 208. To do so, the posterior probability maximizing means 203B treats ξ' as the normalized noise power ν' of one unit time earlier and ν̄' as the parameter μ, and substitutes these values of ν' and μ into equation (6) to estimate ν.

The normalized noise power averaging means 208 calculates an average of the normalized noise power ν (a value based on a predetermined number of past normalized noise powers ν) and supplies the resulting average normalized noise power ν̄ to the average normalized noise power storage means 209. The normalized noise power averaging means 208 may take a simple average of the predetermined number of past normalized noise powers ν, or a weighted average of them. Specifically, it may compute the average using, for example, a time-constant filter (also known as leaky integration) or a moving average. A time-constant filter is the preferred choice for the normalized noise power averaging means 208.

The average normalized noise power storage means 209 stores the average normalized noise power ν̄ and supplies it to the posterior probability maximizing means 203B one unit time later. In other words, the average normalized noise power storage means 209 acts like a delay element.
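The feedback of the averaged ν into the parameter μ can be sketched as follows. Because equation (6) is not reproduced in this passage, the MAP step is again a caller-supplied function of (ξ', μ); the `clip_stand_in` placeholder only honors the range 0 ≤ ν ≤ 2μ and is not the actual equation. The class name, the coefficient `beta`, and the initial μ are assumptions.

```python
class AdaptiveMu:
    """Tracks the average normalized noise power (208/209) and feeds it back as mu."""

    def __init__(self, init_mu=1.0, beta=0.9):
        self.stored = init_mu  # value held by storage means 209 (assumed start)
        self.beta = beta       # assumed smoothing coefficient for means 208

    def step(self, xi_prev, map_estimate):
        mu = self.stored                  # average nu from one unit time earlier
        nu = map_estimate(xi_prev, mu)    # posterior maximization (203B)
        # Leaky average of nu (208), stored for the next unit time (209).
        self.stored = self.beta * self.stored + (1 - self.beta) * nu
        return nu


def clip_stand_in(xi, mu):
    """Placeholder for equation (6): clamps xi into [0, 2*mu]. NOT the real MAP rule."""
    return min(max(xi, 0.0), 2.0 * mu)


tracker = AdaptiveMu(init_mu=1.0, beta=0.5)
nu1 = tracker.step(5.0, clip_stand_in)  # limited by 2 * 1.0
nu2 = tracker.step(5.0, clip_stand_in)  # limit has grown with the updated mu
```

The upper limit on ν follows the running mean, so a sustained change in the normalized power gradually widens or narrows the admissible range instead of being clipped against a fixed constant.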

(C-2) Effects of the Third Embodiment
In addition to the effects of the first embodiment, the third embodiment provides the following effects.

The noise estimation means 105B of the third embodiment updates the value of μ. Specifically, when estimating ν with equation (6) above, the noise estimation means 105B treats ξ' as the normalized noise power ν' of one unit time earlier and ν̄' as the parameter μ, and substitutes them into equation (6). Since the parameter μ is the mean of the normalized noise power ν, in theory μ = 1. However, when the stationary noise follows a probability distribution other than a normal distribution, the prior distribution of the noise power may deviate slightly from an exponential distribution. By averaging the normalized noise power ν to update the parameter μ, as the noise estimation means 105B of the third embodiment does, the prior can be brought closer to the true distribution and the noise power can be estimated appropriately. The noise estimation means 105B of the third embodiment can therefore estimate the noise power more stably while adapting to the noise characteristics of the input signal.

(D) Fourth Embodiment
A fourth embodiment of the noise estimation device, program, and method, and of the speech processing apparatus, according to the present invention will now be described in detail with reference to the drawings. The fourth embodiment describes an example in which the noise estimation device, program, and method of the present invention are applied to the noise estimation means.

(D-1) Configuration of the Fourth Embodiment
The overall configuration of the speech processing apparatus 100C of the fourth embodiment can also be illustrated by FIG. 2.

The following describes how the speech processing apparatus 100C of the fourth embodiment differs from the first embodiment.

The speech processing apparatus 100C of the fourth embodiment differs from the first embodiment in that the noise estimation means 105 is replaced with a noise estimation means 105C.

FIG. 5 is a block diagram showing an example of the internal configuration of the noise estimation means 105C of the fourth embodiment; parts identical or corresponding to those already described are given the same or corresponding reference signs.

The noise estimation means 105C of the fourth embodiment differs from the noise estimation means 105 of the first embodiment in that the input power storage means 201 is omitted. In addition, the input power normalization means 202 is replaced with an input power normalization means 202C.

In the first embodiment, the input power normalization means 202 receives the input power Px' from one unit time earlier. In the fourth embodiment, by contrast, the input power Px is supplied directly to the input power normalization means 202C.
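The effect of removing the input power storage means can be illustrated by computing the normalized input power both ways on the same data. This sketch assumes the stored average noise power is held constant for clarity; the function name, the flag, and the initial stored power are hypothetical.

```python
def normalized_inputs(px_seq, prev_avg_seq, use_input_delay, init_px=1.0):
    """Normalized input power per unit time.

    prev_avg_seq[t] is the average noise power from one unit time before t
    (the value held by storage means 206). With use_input_delay=True the
    numerator is the previous input power (first embodiment, via 201);
    otherwise it is the current input power (fourth embodiment, 202C).
    """
    out = []
    prev_px = init_px  # assumed initial content of storage means 201
    for px, prev_avg in zip(px_seq, prev_avg_seq):
        numerator = prev_px if use_input_delay else px
        out.append(numerator / prev_avg)
        prev_px = px
    return out


px = [1.0, 4.0, 1.0]   # a burst at the second unit time
avg = [1.0, 1.0, 1.0]  # average noise power held fixed for the illustration
delayed = normalized_inputs(px, avg, use_input_delay=True)
direct = normalized_inputs(px, avg, use_input_delay=False)
```

The burst reaches the normalizer one unit time earlier in the direct configuration, which is the quicker response attributed to the fourth embodiment.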

(D-2) Effects of the Fourth Embodiment
In addition to the effects of the first embodiment, the fourth embodiment provides the following effects.

In the noise estimation means 105C of the fourth embodiment, the input power Px is supplied directly to the input power normalization means 202C. The noise estimation means 105C of the fourth embodiment can thereby estimate the noise power stably and with faster response.

(E) Other Embodiments
The present invention is not limited to the embodiments described above; modified embodiments such as the following are also possible.

(E-1) In each of the above embodiments, the noise estimation means serving as the noise estimation device is built as part of a speech processing apparatus, but the noise estimation device may instead be built as a standalone device. Also, each of the above embodiments describes one noise estimation device (noise estimation means) as estimating the noise power of one frequency band, but the noise estimation device of the present invention may be built as a device that estimates the noise power of a plurality of frequency bands. That is, a device including a plurality of the noise estimation means described in the above embodiments may be built as the noise estimation device of the present invention.

(E-2) The noise estimation means 105A of the second embodiment may be configured, as in the third embodiment, to add the normalized noise power averaging means 208 and the average normalized noise power storage means 209 so that μ is updated dynamically. The noise estimation means 105A of the second embodiment may also be configured, as in the fourth embodiment, to omit the input power storage means 201.

(E-3) The noise estimation means 105B of the third embodiment may be configured, as in the second embodiment, to output the average noise power P̄n obtained by the noise power averaging means 205. The noise estimation means 105B of the third embodiment may also be configured, as in the fourth embodiment, to omit the input power storage means 201.

DESCRIPTION OF SYMBOLS 100 ... speech processing apparatus, 101 ... frequency analysis means, 102-1 to 102-K, 102 ... band processing means, 103 ... waveform restoration means, 104 ... power calculation means, 105 ... noise estimation means, 106 ... noise suppression means, 201 ... input power storage means, 202 ... input power normalization means, 203 ... posterior probability maximizing means, 204 ... noise power denormalization means, 205 ... noise power averaging means, 206 ... average noise power storage means.

Claims

A noise estimation device for estimating noise in a predetermined frequency band contained in input speech, comprising:
input power normalization means for normalizing, by a predetermined value, the band input power of the predetermined frequency band constituting the input speech to obtain a normalized input power;
posterior probability maximizing means for estimating, based on the normalized input power, the current normalized noise power that maximizes the posterior probability;
noise power denormalization means for denormalizing the normalized noise power to obtain a denormalized noise power; and
estimation result output means for outputting a value based on the denormalized noise power as the result of estimating the noise power of the predetermined frequency band contained in the input speech.

The noise estimation device according to claim 1, further comprising noise power averaging means for obtaining an average noise power by averaging a plurality of denormalized noise powers obtained in the past by the noise power denormalization means,
wherein the input power normalization means applies the average noise power as the predetermined value.

The noise estimation device according to claim 1 or 2, wherein the estimation result output means outputs the denormalized noise power as the estimation result.

The noise estimation device according to claim 2, wherein the estimation result output means outputs the average noise power as the estimation result.

The noise estimation device according to any one of claims 1 to 4, wherein, when estimating the current normalized noise power that maximizes the posterior probability, the posterior probability maximizing means performs the calculation with the parameter corresponding to the mean of the random variable replaced by a predetermined constant.

The noise estimation device according to any one of claims 1 to 4, further comprising average normalized noise power averaging means for obtaining an average normalized noise power by averaging a plurality of the normalized noise powers obtained in the past by the input power normalization means,
wherein, when estimating the current normalized noise power that maximizes the posterior probability, the posterior probability maximizing means performs the calculation with the parameter corresponding to the mean of the random variable replaced by the average normalized noise power.

A noise estimation program causing a computer mounted on a noise estimation device that estimates noise in a predetermined frequency band contained in input speech to function as:
input power normalization means for normalizing, by a predetermined value, the band input power of the predetermined frequency band constituting the input speech to obtain a normalized input power;
posterior probability maximizing means for estimating, based on the normalized input power, the current normalized noise power that maximizes the posterior probability;
noise power denormalization means for denormalizing the normalized noise power to obtain a denormalized noise power; and
estimation result output means for outputting a value based on the denormalized noise power as the result of estimating the noise power of the predetermined frequency band contained in the input speech.

A noise estimation method for estimating noise in a predetermined frequency band contained in input speech,
the method comprising input power normalization means, posterior probability maximizing means, noise power denormalization means, and estimation result output means, wherein:
the input power normalization means normalizes, by a predetermined value, the band input power of the predetermined frequency band constituting the input speech to obtain a normalized input power;
the posterior probability maximizing means estimates, based on the normalized input power, the current normalized noise power that maximizes the posterior probability;
the noise power denormalization means denormalizes the normalized noise power to obtain a denormalized noise power; and
the estimation result output means outputs a value based on the denormalized noise power as the result of estimating the noise power of the predetermined frequency band contained in the input speech.

A speech processing apparatus for suppressing noise contained in input speech, comprising:
noise estimation means for estimating the noise power of each band input speech obtained by dividing the input speech into bands; and
noise suppression means for suppressing noise in each band input speech using the noise power estimated by the noise estimation means,
wherein the noise estimation device according to any one of claims 1 to 6 is applied as each of the noise estimation means.