JP5107956B2

JP5107956B2 - Noise suppression method, apparatus, and program

Info

Publication number: JP5107956B2
Application number: JP2009085662A
Authority: JP
Inventors: 俊治堀内
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2012-12-26
Anticipated expiration: 2029-03-31
Also published as: JP2010239424A

Description

The present invention relates to a noise suppression method, apparatus, and program, and in particular to a noise suppression method, apparatus, and program that extract a target sound by separating the target sound from interfering sound on the basis of the received sound signals output by two microphones.

In noisy environments such as streets, vehicle interiors, and station platforms, other voices and ambient noise (interfering sound) can mix into the desired speech (the target sound) even when a microphone placed close to the mouth, as in a handset or headset, is used. To solve this problem, various sound source separation and noise suppression methods have been proposed. These methods fall broadly into those that use a single microphone and those that use multiple microphones. Methods that use multiple microphones can achieve higher sound source separation or noise suppression performance than methods that use a single microphone.

Among the methods that use multiple microphones is a technique called binary masking, or time-frequency masking, which is based on a model of human hearing. This technique imitates the auditory masking phenomenon and rests on the assumption that a stronger signal masks a weaker one. The processing is performed in the time-frequency domain: of the received sound signal X1(f,t) output by microphone X1, frequency components that contain the desired speech (the target sound) are output unchanged, and frequency components that contain other voices or noise (interfering sound) are masked, which yields the target sound Y1(f,t). Specifically, this masking process is defined by Equation 1 below. Here, m1(f,t) is called the mask pattern, and C1 is the cluster of frequency components that contain the desired speech.
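Equation 1 itself is not reproduced in this text. From the definitions of Y1(f,t), m1(f,t), and C1 given above, it presumably takes the following form; this is a reconstruction from the prose, not the original typesetting.

```latex
Y_1(f,t) = m_1(f,t)\,X_1(f,t), \qquad
m_1(f,t) =
\begin{cases}
1 & \text{if } (f,t) \in C_1 \\
0 & \text{otherwise}
\end{cases}
```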

Non-Patent Documents 1 to 3 below disclose binary masking (time-frequency masking) techniques that, in determining the mask pattern m1(f,t) and the cluster C1, use as parameters the power spectra of the time-frequency components of the received sound signals output by the microphones at a given time t, and the differences between those power spectra. These techniques are highly effective for separating speech from speech and for extracting speech. This is because speech is localized in the time-frequency domain, that is, it is sparse. Sparseness here means that the energy of a speech signal is concentrated in part of the time-frequency domain and is nearly zero elsewhere.

Non-Patent Document 1: P. J. Bloom, "Evaluation of two-input speech dereverberation techniques," in Proc. ICASSP, 1982.
Non-Patent Document 2: R. F. Lyon, "A computational model of binaural localization and separation," in Proc. ICASSP, 1983.
Non-Patent Document 3: M. Bodden, "Modeling human sound-source localization and the cocktail-party-effect," Acta Acoustica, vol. 1, pp. 43-55, 1993.

However, when the aim is to separate or extract speech from noise, for example, the noise often occupies the entire time-frequency domain and is rarely sparse. As a result, separating speech from noise, or extracting speech, by binary masking (time-frequency masking) cannot achieve sufficient sound source separation or noise suppression performance.

The present invention has been made in view of the above problem of the prior art. Its object is to provide a noise suppression method, apparatus, and program that have a simple configuration and whose sound source separation or noise suppression performance does not degrade even when sparseness does not hold between the sound signals received by a plurality of microphones.

To achieve the above object, the noise suppression apparatus of the present invention is characterized by comprising: conversion means, provided in the signal paths of the main and sub microphones, for converting the received sound signals output by the main and sub microphones into time-frequency components; multiplication means, provided in the signal path of the sub microphone, for multiplying by a multiplication value either the received sound signal before conversion into time-frequency components or the time-frequency components after conversion; power spectrum calculation means for generating the power spectrum of the time-frequency components of the main microphone and the power spectrum of the time-frequency components of the sub microphone after multiplication by the multiplication means; mask pattern generation means for generating a mask pattern from the power spectrum of the time-frequency components of the main microphone and the power spectrum of the multiplied time-frequency components of the sub microphone, both generated by the power spectrum calculation means; masking processing means for masking the time-frequency components of the main microphone using the mask pattern generated by the mask pattern generation means; and synthesis means for synthesizing the time-frequency components of the main microphone output from the masking processing means.

The present invention is further characterized in that the multiplication value applied by the multiplication means is set to a constant value or a frequency-dependent value such that, for interfering sound, a power spectrum difference arises between the time-frequency components of the received sound signals output by the main and sub microphones, and, for the target sound, the magnitude relationship between the power spectra of the time-frequency components of the received sound signals output by the main and sub microphones is not reversed.

The present invention is further characterized by comprising: multiplication means, provided in the signal path of the main microphone, for multiplying the time-frequency components after conversion by a multiplication value; power spectrum calculation means for generating the power spectrum of the time-frequency components after multiplication by the multiplication means and the power spectrum of the converted time-frequency components of the sub microphone; mask pattern generation means for generating a mask pattern from the power spectrum of the multiplied time-frequency components of the main microphone and the power spectrum of the time-frequency components of the sub microphone, both generated by the power spectrum calculation means; division means for dividing the multiplied time-frequency components of the main microphone by the multiplication value; and masking processing means for masking the divided time-frequency components using the mask pattern generated by the mask pattern generation means.

The present invention is characterized not only as a noise suppression apparatus but also as a noise suppression method specified by the procedure for processing the received sound signals, and as a program that causes a computer to implement the sound source separation and interfering sound suppression functions.

In the present invention, a time-frequency component multiplication unit is provided in at least one of the signal paths of the main and sub microphones, and the received sound signal passing through that path, or the power spectrum of its time-frequency components, is multiplied by a multiplication value. This produces a power spectrum difference between the time-frequency components of the received sound signals output by the main and sub microphones, and the mask pattern is generated afterwards. Consequently, even when the received sound signals output by the main and sub microphones are not sparse and there is no energy difference between their time-frequency components, sound source separation and interfering sound suppression performance does not degrade, and noise is suppressed well.

Generating the mask pattern with past mask patterns taken into account suppresses noise even more effectively.

FIG. 1 is a block diagram showing an embodiment of the noise suppression apparatus according to the present invention.
FIG. 2 is an explanatory diagram of one method of mask pattern generation.
FIG. 3 is an explanatory diagram of the multiplication value assignment operation in the time-frequency component multiplication unit.
FIG. 4 is a block diagram showing another embodiment of the noise suppression apparatus according to the present invention.
FIG. 5 is a block diagram showing still another embodiment of the noise suppression apparatus according to the present invention.
FIG. 6 is a block diagram showing a modification of the noise suppression apparatus according to the present invention.

The present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of one embodiment of the noise suppression apparatus according to the present invention. The present invention can be realized not only as a noise suppression apparatus but also as a noise suppression method specified by the procedure for processing the received sound signals, and as a program that causes a computer to implement the noise suppression function. Each unit of the noise suppression apparatus can be realized in hardware or in software.

The noise suppression apparatus of FIG. 1 comprises time-frequency conversion units 11 and 12, a time-frequency component multiplication unit 13, a power spectrum calculation unit 14, a mask pattern generation unit 15, a masking processing unit 16, and a time-frequency synthesis unit 17. In this embodiment, the time-frequency conversion unit 11, the masking processing unit 16, and the time-frequency synthesis unit 17 form the signal path of the main microphone, and the time-frequency conversion unit 12 and the time-frequency component multiplication unit 13 form the signal path of the sub microphone.

The time-frequency conversion units 11 and 12 analyze the received sound signals output by the main and sub microphones, respectively, in the time-frequency domain and output the resulting time-frequency components. The time-frequency component multiplication unit 13 multiplies each time-frequency component of the received sound signal input to it by a predetermined value.

The power spectrum calculation unit 14 calculates the power spectrum of the time-frequency components output by the time-frequency conversion unit 11 and of those output by the time-frequency component multiplication unit 13.

The mask pattern generation unit 15 generates, from the power spectra calculated by the power spectrum calculation unit 14, a mask pattern for masking the time-frequency components output by the time-frequency conversion unit 11. The masking processing unit 16 masks the time-frequency components output by the time-frequency conversion unit 11 according to the mask pattern generated by the mask pattern generation unit 15. The time-frequency synthesis unit 17 synthesizes the time-frequency components output by the masking processing unit 16.

Next, the operation of the noise suppression apparatus of FIG. 1 is described.

The received sound signal x1(t) output by the main microphone is input to the time-frequency conversion unit 11. In a mobile terminal (for example, a mobile phone), the target sound is the speech of a call. The main microphone is placed, for example, on the front of the mobile terminal so as to receive the target sound at a high level. The main microphone also receives interfering sound, such as ambient noise that interferes with the target sound, though at a lower level than the target sound. The received sound signal x1(t) is therefore a mixture of high-level target sound and low-level interfering sound. The time-frequency conversion unit 11 converts the received sound signal x1(t) into time-frequency components X1(f,t).

The received sound signal x2(t) output by the sub microphone is input to the time-frequency conversion unit 12. The sub microphone is placed, for example, on the back of the mobile terminal so as to receive interfering sound. The sub microphone receives the target sound at a lower level than the main microphone does, together with interfering sound. The target sound received by the sub microphone is at a considerably lower level than that received by the main microphone, whereas the interfering sound received by the sub microphone is at the same level as that received by the main microphone. The received sound signal x2(t) is thus a mixture of interfering sound and low-level target sound. The time-frequency conversion unit 12 converts the received sound signal x2(t) into time-frequency components X2(f,t).

The time-frequency component multiplication unit 13 applies to the time-frequency components X2(f,t) a multiplication value Gf calculated in advance from the spatial relationship between the main and sub microphones, the nature of the interfering sound, and so on, and outputs the resulting time-frequency components Gf·X2(f,t). The multiplication value Gf is set, for example per frequency component, taking into account the energy difference between the received sound signals output by the main and sub microphones for the target sound, and also the general property that the received signal of interfering sound is at a high level in the low-frequency range and at a low level in the high-frequency range. The multiplication value Gf is preferably a frequency-dependent value greater than 1.
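As a rough illustration of a frequency-dependent multiplication value, the sketch below builds a gain curve that is greater than 1 everywhere and falls off toward high frequencies, then applies it to one frame of X2(f,t). The linear shape, the end values 4.0 and 1.5, and the names `make_gain_curve` and `apply_gain` are assumptions made for this example; the text only states that Gf is calculated in advance and is preferably greater than 1.

```python
def make_gain_curve(num_bins, g_low=4.0, g_high=1.5):
    """Hypothetical Gf: gain falling linearly from g_low (bin 0) to g_high (last bin)."""
    if num_bins < 2:
        return [g_low] * num_bins
    step = (g_high - g_low) / (num_bins - 1)
    return [g_low + step * k for k in range(num_bins)]


def apply_gain(frame, gains):
    """Multiply each time-frequency component X2(f, t) of one frame by Gf."""
    return [g * x for g, x in zip(gains, frame)]


# One illustrative frame of sub-microphone components (made-up values).
x2_frame = [1 + 1j, 0.5j, -0.25, 0.1 - 0.1j, 0.05]
gf = make_gain_curve(len(x2_frame))
scaled = apply_gain(x2_frame, gf)
```

In practice the curve would come from measurements of the kind the description mentions rather than from a fixed linear ramp.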

The energy difference between the received sound signals output by the main and sub microphones for the target sound can be obtained by measuring the impulse responses of the main and sub microphones in advance. This energy difference depends on the distance between the target sound source and each microphone, on where the microphones are mounted in the housing of the mobile terminal, and so on. The nature of the received signal of interfering sound can be obtained by measuring the received signals of various ambient sound sources and calculating, from their frequency characteristics, the per-frequency energy of the received signal of an average ambient sound source.

The multiplication value Gf applied by the time-frequency component multiplication unit 13 need only be such that, for interfering sound, an energy difference arises between the time-frequency components of the received sound signals output by the main and sub microphones, and, for the target sound, the magnitude relationship between the energies of those time-frequency components is not reversed.
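The two conditions on the multiplication value can be turned into a simple range check. The sketch below assumes that interfering sound arrives at equal power at both microphones (as described for the sub microphone above) and that the gain multiplies the component before the power is taken, so the compared quantity is |Gf·X2|^2. Under those assumptions, any gain above 1 creates a difference for interfering sound, and the target-sound ordering is preserved while the gain stays below the square root of the target power ratio. The function names and the numbers are hypothetical.

```python
import math


def admissible_gain_range(target_power_main, target_power_sub):
    """Return (lower, upper) bounds for the gain at one frequency.

    lower = 1: above this, equal-level interfering sound looks stronger
               on the sub-microphone side after multiplication.
    upper = sqrt(P_main / P_sub): below this, the target sound is still
               stronger on the main-microphone side after multiplication.
    """
    upper = math.sqrt(target_power_main / target_power_sub)
    return 1.0, upper


def is_admissible(gain, target_power_main, target_power_sub):
    lower, upper = admissible_gain_range(target_power_main, target_power_sub)
    return lower < gain < upper


# Example: target power 100 at the main microphone, 4 at the sub microphone.
bounds = admissible_gain_range(100.0, 4.0)
ok_gain = is_admissible(2.0, 100.0, 4.0)
too_large = is_admissible(6.0, 100.0, 4.0)
no_effect = is_admissible(1.0, 100.0, 4.0)
```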

The power spectrum calculation unit 14 calculates the power spectra of the time-frequency components X1(f,t) and Gf·X2(f,t). The power spectra of the time-frequency components X1(f,t) and X2(f,t) are |X1(f,t)|^2 and |X2(f,t)|^2, respectively.

The mask pattern generation unit 15 generates the cluster C1(f,t) and the mask pattern m1(f,t) according to Equation 2 below. The mask pattern m1(f,t) generated by the mask pattern generation unit 15 is output to the masking processing unit 16. As Equation 2 shows, the mask pattern generation unit 15 generates a cluster C1(f,t) that is 1 when |X1(f,t)|^2 > |Gf·X2(f,t)|^2 and 0 otherwise, and a mask pattern m1(f,t) that is 1 when the time-frequency component X1(f,t) of the received sound signal x1(t) of the main microphone belongs to the cluster C1(f,t) and 0 otherwise.
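The comparison described for Equation 2 can be sketched in a few lines. The example below builds the binary mask for one frame and shows the effect of the multiplication value on a bin where interfering sound has almost equal power at both microphones. It takes the power of the multiplied component, |Gf·X2(f,t)|^2, which is one reading of the rule; the function names and sample values are made up for illustration.

```python
def power(x):
    """Power of one (complex) time-frequency component."""
    return abs(x) ** 2


def binary_mask(x1_frame, x2_frame, gains):
    """m1(f, t) = 1 where |X1|^2 > |Gf * X2|^2, else 0, for one frame."""
    return [
        1 if power(a) > power(g * b) else 0
        for a, b, g in zip(x1_frame, x2_frame, gains)
    ]


# Bin 0: target sound (strong at main mic, weak at sub mic).
# Bin 1: interfering sound at nearly equal level at both mics.
# Bin 2: interfering sound slightly stronger at the sub mic.
x1 = [1.0 + 0j, 0.55 + 0j, 0.2 + 0.2j]
x2 = [0.1 + 0j, 0.5 + 0j, 0.3j]

mask_without_gain = binary_mask(x1, x2, [1.0, 1.0, 1.0])
mask_with_gain = binary_mask(x1, x2, [2.0, 2.0, 2.0])
```

Without the gain, bin 1 passes the mask even though it holds only interfering sound; with a gain of 2 it is blocked, while the target-sound bin still passes.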

The mask pattern generation unit 15 can also generate the mask pattern for the time-frequency components at time t that contain the desired speech (the target sound), or for the time-frequency components at time t of other voices and noise (interfering sound), from the power spectrum of the time-frequency components at time t together with the power spectra of the speech frames at past times t-1, t-2, ..., t-N.

Specifically, as shown in FIG. 2, the mask pattern m_{1,t}(f,t) can be generated, as given by Equation 3 below, by averaging the mask pattern m_{1,t-1}(f,t) at time t-1, the mask pattern m_{1,t-2}(f,t) at time t-2, ..., and the mask pattern m_{1,t-N}(f,t) at time t-N.

As a variation of Equation 3, the mask pattern m_{1,t}(f,t) may be generated by multiplying the past mask patterns by a forgetting factor, so that older mask patterns contribute less, and then averaging. The optimal mask pattern m1(f,t) at time t may also be generated by applying logical operations such as logical OR or logical AND.
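The averaging over past mask patterns and its forgetting-factor variant can be sketched as below. The list layout (most recent mask first), the geometric weights `forgetting ** n`, and the normalization by the sum of the weights are assumptions for this example; the text does not give the exact form of Equation 3 or of the weighting.

```python
def smooth_mask(past_masks, forgetting=1.0):
    """Average past mask patterns bin by bin.

    past_masks[0] is the mask at time t-1, past_masks[1] at t-2, and so on.
    forgetting = 1.0 gives a plain mean; 0 < forgetting < 1 gives older
    masks a smaller weight (hypothetical weighting scheme).
    """
    weights = [forgetting ** n for n in range(len(past_masks))]
    total = sum(weights)
    num_bins = len(past_masks[0])
    return [
        sum(w * m[k] for w, m in zip(weights, past_masks)) / total
        for k in range(num_bins)
    ]


past = [
    [1, 0, 1],  # t-1
    [1, 0, 0],  # t-2
    [0, 0, 1],  # t-3
]
plain = smooth_mask(past)                     # plain mean over N = 3 frames
weighted = smooth_mask(past, forgetting=0.5)  # older frames count less
```

The result is a value between 0 and 1 per bin; how it is turned back into a pass/block decision is left open here because the text does not specify it.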

The masking processing unit 16 masks the time-frequency components X1(f,t) input from the time-frequency conversion unit 11, passing a component where the mask pattern m1(f,t) is 1 and blocking it where the mask pattern is 0. The masking processing unit 16 therefore outputs only the components Y1(f,t): those time-frequency components X1(f,t) of the received sound signal x1(t) output by the main microphone in which that signal is dominant.

The time-frequency synthesis unit 17 synthesizes only the components Y1(f,t), those time-frequency components X1(f,t) of the received sound signal x1(t) output by the main microphone in which that signal is dominant, and sends out the output signal y1(t).
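The chain of units 11 to 17 can be illustrated end to end on a single frame. The sketch below is a simplification under stated assumptions: it uses a plain DFT over one 16-sample frame in place of whatever time-frequency transform the apparatus uses (no windowing or overlap-add), a constant gain of 2, and synthetic signals in which the target tone is ten times weaker at the sub microphone while the interfering tone has the same level at both. All names and values are illustrative.

```python
import cmath
import math


def dft(x):
    """Forward DFT of one frame (stand-in for units 11 and 12)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, real part only (stand-in for unit 17)."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def suppress_frame(x1, x2, gain):
    big_x1, big_x2 = dft(x1), dft(x2)
    # Units 13 to 15: scale the sub-microphone components, compare powers.
    mask = [
        1 if abs(a) ** 2 > abs(gain * b) ** 2 else 0
        for a, b in zip(big_x1, big_x2)
    ]
    # Unit 16: pass components where the mask is 1, block the rest.
    y1_components = [m * a for m, a in zip(mask, big_x1)]
    return idft(y1_components)


N = 16
target = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
noise = [0.5 * math.cos(2 * math.pi * 7 * t / N + 0.3) for t in range(N)]
x1 = [s + v for s, v in zip(target, noise)]        # main microphone
x2 = [0.1 * s + v for s, v in zip(target, noise)]  # sub microphone
y1 = suppress_frame(x1, x2, gain=2.0)

max_error = max(abs(a - b) for a, b in zip(y1, target))
```

With these inputs the interfering tone is blocked and `y1` matches the target tone up to rounding error. A real implementation would process overlapping frames and would not normally see target and interference in separate bins this cleanly.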

FIG. 3 is an explanatory diagram of the multiplication value assignment operation in the time-frequency component multiplication unit 13. Parts (a) and (b) of the figure show, for a given time, the energy of each frequency component of the received sound signals output by the main and sub microphones, respectively. The white portions are frequency components of the received target sound, and the black portions are frequency components of the received interfering sound.

For example, the frequency components near f_1 consist only of received target sound; there is a fairly large energy difference between the received sound signals output by the main and sub microphones, and this difference can be used to separate the target sound. The frequency components near f_2, however, consist only of received interfering sound, and the energies of the received sound signals output by the main and sub microphones are nearly equal. Because which of the two is larger changes with circumstances, the frequency components near f_2 are sometimes separated as target sound and sometimes as interfering sound.

Therefore, as shown in FIG. 3(c), the multiplication value Gf is applied to each frequency component X2(f,t) of the received sound signal output by the sub microphone. An energy difference then arises between the frequency components of the received sound signals output by the main and sub microphones even when both microphones output only received interfering sound, so that this sound is not separated as target sound. Here, the multiplication value Gf is lowered in the high-frequency range so that target sound in that range is separated more easily.

The multiplication value Gf need only be such that, for interfering sound, an energy difference arises between the frequency components of the received sound signals output by the main and sub microphones, and, for the target sound, the energy difference between the frequency components of the received sound signals output by the main and sub microphones is not cancelled; that is, the magnitude relationship between the two is not reversed for the target sound.

However, the multiplication value Gf can also be chosen so that the target sound at specific frequency components, or target sound on which interfering sound is superimposed, is not separated. For example, in FIG. 3(b), if the multiplication value Gf for the frequency components near f_3 is made extremely large, those frequency components are not separated, target sound included. The multiplication value Gf may be made adjustable or selectable.

FIG. 4 is a block diagram showing another embodiment of the noise suppression apparatus according to the present invention; parts identical or equivalent to those in FIG. 1 are denoted by the same reference numerals. In this embodiment, the time-frequency conversion unit 12 and the time-frequency component multiplication unit 13 are arranged in the reverse order from FIG. 1. Here too, the time-frequency conversion unit 11, the masking processing unit 16, and the time-frequency synthesis unit 17 form the signal path of the main microphone, and the time-frequency component multiplication unit 13 and the time-frequency conversion unit 12 form the signal path of the sub microphone.

The time-frequency conversion unit 11 receives the received sound signal x1(t) output by the main microphone and converts it into time-frequency components X1(f,t).

The time-frequency component multiplication unit 13 applies to the received sound signal x2(t) output by the sub microphone a multiplication value G calculated in advance from the spatial relationship between the main and sub microphones, the nature of the interfering sound, and so on, and outputs the resulting received sound signal G·x2(t). The multiplication value G is a constant greater than 1.

The received sound signal x2(t) output by the sub microphone is input to the time-frequency conversion unit 12 by way of the time-frequency component multiplication unit 13. The time-frequency conversion unit 12 therefore converts the gain-adjusted received sound signal G·x2(t) into time-frequency components G·X2(f,t).
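Applying a constant gain before the transform gives the same components as applying it afterwards, provided the transform is linear. The short check below demonstrates this with a plain DFT; the choice of the DFT, the gain value, and the sample signal are assumptions for illustration, since the text does not name the transform.

```python
import cmath


def dft(x):
    """Forward DFT of one frame (assumed stand-in for unit 12)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


G = 2.0
x2 = [0.0, 1.0, 0.5, -0.25, 0.0, -1.0, 0.25, 0.5]

gain_then_transform = dft([G * s for s in x2])  # arrangement of FIG. 4
transform_then_gain = [G * c for c in dft(x2)]  # FIG. 1 with a constant gain

max_diff = max(
    abs(a - b) for a, b in zip(gain_then_transform, transform_then_gain)
)
```

This equivalence holds only for a constant G; a frequency-dependent Gf has to be applied after the transform, or as a filter in the time domain.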

The power spectrum calculation unit 14 calculates the power spectra of the time-frequency components X1(f,t) and G·X2(f,t), and the mask pattern generation unit 15 generates the cluster C1(f,t) and the mask pattern m1(f,t) from the calculated power spectra according to Equation 2 above. That is, it generates a mask pattern m1(f,t) that masks every time-frequency component X1(f,t) of the received sound signal x1(t) output by the main microphone except those in which that signal is dominant. The mask pattern m1(f,t) generated by the mask pattern generation unit 15 is output to the masking processing unit 16.

The masking processing unit 16 masks the time-frequency components X1(f,t) input from the time-frequency conversion unit 11 with the mask pattern m1(f,t). The masking processing unit 16 therefore outputs only the components Y1(f,t): those time-frequency components X1(f,t) of the received sound signal x1(t) output by the main microphone in which that signal is dominant.

FIG. 5 is a block diagram showing still another embodiment of the noise suppression apparatus according to the present invention. Parts identical or equivalent to those in FIG. 1 are denoted by the same reference numerals.

In this embodiment, the time-frequency conversion unit 11, the time-frequency component multiplication unit 13, the time-frequency component division unit 18, the masking processing unit 16, and the time-frequency synthesis unit 17 constitute the signal path of the main microphone, and the time-frequency conversion unit 12 constitutes the signal path of the sub microphone.

The received sound signal x1(t) output from the main microphone is input to the time-frequency conversion unit 11. The time-frequency conversion unit 11 converts the received sound signal x1(t) into the time-frequency component X1(f,t).

The time-frequency component multiplication unit 13 applies to the time-frequency component X1(f,t) a multiplication value Gf calculated in advance from factors such as the spatial positional relationship between the main and sub microphones and the nature of the interference sound, and outputs the resulting time-frequency component Gf·X1(f,t). The multiplication value Gf is a frequency-dependent value smaller than 1.
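The claims state that the multiplication value should create a power spectrum difference between the two microphones for the interference sound without reversing their power ordering for the target sound. One way to read that condition for a single frequency bin, with Gf applied to the main-microphone amplitude, is sketched below. The function name, its arguments, and the power values are hypothetical, and the inequalities are an assumed interpretation rather than the procedure actually used to derive Gf.

```python
def gain_is_valid(gf, target_main, target_sub, noise_main, noise_sub):
    """Check one frequency bin for a candidate gain gf on the main-mic amplitude.

    target_* are powers measured for target-only sound, noise_* for
    interference-only sound (hypothetical calibration measurements).
    """
    scaled = gf ** 2  # an amplitude gain scales power by gf squared
    target_ok = scaled * target_main > target_sub  # target ordering not reversed
    noise_ok = scaled * noise_main < noise_sub     # interference weaker on main path
    return target_ok and noise_ok

# Target is close to the main microphone; interference reaches both equally.
candidates = [1.0, 0.8, 0.4]
valid = [gf for gf in candidates
         if gain_is_valid(gf, target_main=1.0, target_sub=0.25,
                          noise_main=1.0, noise_sub=1.0)]
```

With these illustrative powers, a gain of 1.0 leaves the interference powers equal and a gain of 0.4 reverses the target ordering, so only 0.8 passes both checks.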

Meanwhile, the received sound signal x2(t) output from the sub microphone is input to the time-frequency conversion unit 12. The time-frequency conversion unit 12 converts the received sound signal x2(t) into the time-frequency component X2(f,t).
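Both conversion units turn a time-domain signal into time-frequency components indexed by frequency f and frame t. This passage does not say which transform is used; the sketch below assumes a short-time Fourier transform with a Hann window, and the function name, `frame_len`, and `hop` are illustrative choices.

```python
import cmath
import math

def stft(x, frame_len=8, hop=4):
    """Return X[t][f]: Hann-windowed DFT of each frame (one-sided spectrum)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        spectrum = [sum(seg[n] * cmath.exp(-2j * math.pi * f * n / frame_len)
                        for n in range(frame_len))
                    for f in range(frame_len // 2 + 1)]
        frames.append(spectrum)
    return frames

# A sinusoid that falls exactly on bin 2 of an 8-sample frame.
x = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
X = stft(x)
peak_bin = max(range(len(X[0])), key=lambda f: abs(X[0][f]))
```

The same routine would be applied to x1(t) and x2(t) so that the two sets of components share the same frequency and frame grid.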

The power spectrum calculation unit 14 calculates the power spectra of the time-frequency components Gf·X1(f,t) and X2(f,t). Based on the calculated power spectra, the mask pattern generation unit 15 generates the cluster C1(f,t) and the mask pattern m1(f,t) according to equation (2) above. That is, it generates a mask pattern m1(f,t) that masks every time-frequency component X1(f,t) of the received sound signal x1(t) output from the main microphone except those in which x1(t) is dominant. The mask pattern m1(f,t) generated by the mask pattern generation unit 15 is output to the masking processing unit 16.

The time-frequency component division unit 18 applies to the time-frequency component Gf·X1(f,t) the inverse of the processing performed by the time-frequency component multiplication unit 13, and outputs the time-frequency component X1(f,t) to the masking processing unit 16. The time-frequency component division unit 18 is provided to eliminate the distortion of the output signal y1(t) caused by the multiplication in the time-frequency component multiplication unit 13, but it can be omitted when that distortion is acceptable. The time-frequency component division unit 18 may also be placed on the output side of the masking processing unit 16.

The masking processing unit 16 masks the time-frequency component X1(f,t) input from the time-frequency component division unit 18 with the mask pattern m1(f,t). Consequently, of the time-frequency components X1(f,t) of the received sound signal x1(t) output from the main microphone, the masking processing unit 16 outputs only the components Y1(f,t) in which x1(t) is dominant.
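The main-microphone path of this embodiment (multiply by Gf, compare power spectra, divide by Gf, then mask) can be sketched as below. The function name and the sample values are hypothetical, and the mask rule again assumes that equation (2) compares the two power spectra bin by bin.

```python
def suppress_main_path(X1, X2, Gf):
    """X1, X2: lists of frames, each a list of complex bins; Gf: one gain per bin."""
    Y1 = []
    for frame1, frame2 in zip(X1, X2):
        scaled = [g * a for g, a in zip(Gf, frame1)]         # multiplication unit 13
        p1 = [abs(a) ** 2 for a in scaled]                   # power spectra (unit 14)
        p2 = [abs(b) ** 2 for b in frame2]
        mask = [1 if a > b else 0 for a, b in zip(p1, p2)]   # unit 15, assumed rule
        restored = [a / g for g, a in zip(Gf, scaled)]       # division unit 18
        Y1.append([m * a for m, a in zip(mask, restored)])   # masking unit 16
    return Y1

Gf = [0.5, 0.8]              # hypothetical frequency-dependent gains, all below 1
X1 = [[2 + 0j, 1.1 + 0j]]    # one frame, two bins (illustrative)
X2 = [[0.5 + 0j, 1 + 0j]]
Y1 = suppress_main_path(X1, X2, Gf)
```

In bin 1 the unscaled main-microphone power (1.21) would exceed the sub-microphone power (1.0), but after scaling by 0.8 it does not, so that bin is masked. Bin 0 passes and is restored to its original value by the division.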

Although embodiments have been described above, the present invention is not limited to them and can be modified in various ways. For example, the time-frequency component multiplication unit is provided in the signal path of the sub microphone in the embodiments of FIGS. 1 and 4 and in the signal path of the main microphone in the embodiment of FIG. 5, but time-frequency component multiplication units may instead be provided in both the main and sub microphone signal paths, with their multiplication values adjusted accordingly. However, when a multiplication value Gf (a frequency-dependent value) is applied in the signal path of the main microphone, the received sound signal output from the main microphone is altered by Gf, so it is preferable to provide a time-frequency component division unit as in the embodiment of FIG. 5.

The embodiments above separate and output the received sound signal of the target sound. In addition to this, the received sound signal of the interference sound may also be separated and output, or only the received sound signal of the interference sound may be separated and output. The received sound signal of the interference sound can be used, for example, to measure ambient noise or to extract speech arriving from behind a mobile terminal.

FIG. 6 is a block diagram showing a modification in which the received sound signals of the target sound and the interference sound are each separated and output. In the figure, the time-frequency conversion units 11 and 12, the time-frequency component multiplication unit 13, the power spectrum calculation unit 14, the masking processing unit 16, and the time-frequency synthesis unit 17 are the same as those in FIG. 1. The mask pattern generation unit 15, however, generates not only the mask pattern m1(f,t) but also its inverse, the mask pattern m2(f,t), according to equation (4).
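Equation (4) is not reproduced in this text. Assuming that m1(f,t) is the binary mask described for equation (1), taking the value 1 inside the cluster C1 and 0 elsewhere, one plausible form of the inverted mask is the following. It is a reconstruction under that assumption, not the published equation.

```latex
m_2(f,t) = 1 - m_1(f,t) =
\begin{cases}
0, & (f,t) \in C_1 \\
1, & \text{otherwise}
\end{cases}
```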

The masking processing unit 19 masks the time-frequency component Gf·X2(f,t) input from the time-frequency component multiplication unit 13 with the mask pattern m2(f,t). Consequently, the masking processing unit 19 outputs only the components of Gf·X2(f,t) that remain after removing those in which the received sound signal x1(t) output from the main microphone is dominant, that is, only the time-frequency components of the noise.

The time-frequency synthesis unit 20 synthesizes only the components Gf·Y2(f,t) of Gf·X2(f,t) that remain after removing those in which the received sound signal x1(t) output from the main microphone is dominant, that is, only the noise time-frequency components Gf·Y2(f,t), and outputs the output signal Gf·y2(t).
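The two-output variant can be sketched for one frame as follows. The function name and sample values are hypothetical; the mask rule and the inversion `1 - m1` are assumptions about equations (2) and (4), which are not reproduced in this passage.

```python
def split_target_and_noise(X1, GX2):
    """One frame: X1 holds main-mic bins, GX2 holds scaled sub-mic bins."""
    # Assumed rule: a bin belongs to the target when the main-mic power is larger.
    m1 = [1 if abs(a) ** 2 > abs(b) ** 2 else 0 for a, b in zip(X1, GX2)]
    m2 = [1 - m for m in m1]                  # inverted mask (assumed form)
    Y1 = [m * a for m, a in zip(m1, X1)]      # target components (masking unit 16)
    GY2 = [m * b for m, b in zip(m2, GX2)]    # noise components (masking unit 19)
    return m1, m2, Y1, GY2

X1 = [3 + 0j, 0.5 + 0j, 1j]     # illustrative main-microphone bins
GX2 = [1 + 0j, 2 + 0j, 0.5j]    # illustrative scaled sub-microphone bins
m1, m2, Y1, GY2 = split_target_and_noise(X1, GX2)
```

Because the two masks are complementary, every bin is assigned to exactly one of the two outputs. Each output would then be passed to its own synthesis stage (units 17 and 20).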

Description of reference numerals: 11, 12 ... time-frequency conversion unit, 13 ... time-frequency component multiplication unit, 14 ... power spectrum calculation unit, 15 ... mask pattern generation unit, 16, 19 ... masking processing unit, 17, 20 ... time-frequency synthesis unit, 18 ... time-frequency component division unit

Claims

A noise suppression method for separating and outputting at least a target sound component from received sound signals output from main and sub microphones, the method comprising:
a first step of converting, in the signal paths of the main and sub microphones, the received sound signals output from the main and sub microphones into time-frequency components, respectively;
a second step of multiplying, in the signal path of the sub microphone, either the received sound signal before conversion into a time-frequency component or the time-frequency component after conversion by a multiplication value;
a third step of calculating the power spectrum of the time-frequency component of the main microphone converted in the first step and the power spectrum of the time-frequency component of the sub microphone multiplied by the multiplication value in the second step;
a fourth step of generating a mask pattern from the power spectra of the main and sub microphones calculated in the third step;
a fifth step of masking the time-frequency component of the main microphone converted in the first step, using the mask pattern generated in the fourth step; and
a sixth step of synthesizing the time-frequency components of the main microphone output from the fifth step.

A noise suppression method for separating and outputting at least a target sound component from received sound signals output from main and sub microphones, the method comprising:
a first step of converting, in the signal paths of the main and sub microphones, the received sound signals output from the main and sub microphones into time-frequency components, respectively;
a second step of multiplying, in the signal path of the main microphone, the time-frequency component obtained by the conversion by a multiplication value;
a third step of calculating the power spectrum of the time-frequency component of the main microphone multiplied by the multiplication value in the second step and the power spectrum of the time-frequency component of the sub microphone converted in the first step;
a fourth step of generating a mask pattern from the power spectra of the main and sub microphones calculated in the third step;
a fifth step of dividing the time-frequency component multiplied by the multiplication value in the second step by that multiplication value;
a sixth step of masking the time-frequency component divided in the fifth step, using the mask pattern generated in the fourth step; and
a seventh step of synthesizing the time-frequency components output from the sixth step.

A noise suppression apparatus for separating and outputting at least a target sound component from received sound signals output from main and sub microphones, the apparatus comprising:
conversion means, provided in the signal paths of the main and sub microphones, for converting the received sound signals output from the main and sub microphones into time-frequency components, respectively;
multiplication means, provided in the signal path of the sub microphone, for multiplying either the received sound signal before conversion into a time-frequency component or the time-frequency component after conversion by a multiplication value;
power spectrum calculation means for generating the power spectrum of the time-frequency component of the main microphone and the power spectrum of the time-frequency component of the sub microphone after multiplication by the multiplication means;
mask pattern generation means for generating a mask pattern from the power spectrum of the time-frequency component of the main microphone and the power spectrum of the time-frequency component of the sub microphone multiplied by the multiplication value, both generated by the power spectrum calculation means;
masking processing means for masking the time-frequency component of the main microphone using the mask pattern generated by the mask pattern generation means; and
synthesis means for synthesizing the time-frequency components of the main microphone output from the masking processing means.

The noise suppression apparatus according to claim 3, further comprising:
masking processing means for masking the time-frequency component of the sub microphone using the mask pattern generated by the mask pattern generation means; and
synthesis means for synthesizing the time-frequency components of the sub microphone output from the masking processing means.

A noise suppression apparatus for separating and outputting at least a target sound component from received sound signals output from main and sub microphones, the apparatus comprising:
conversion means, provided in the signal paths of the main and sub microphones, for converting the received sound signals output from the main and sub microphones into time-frequency components, respectively;
multiplication means, provided in the signal path of the main microphone, for multiplying the time-frequency component obtained by the conversion by a multiplication value;
power spectrum calculation means for generating the power spectrum of the time-frequency component multiplied by the multiplication value by the multiplication means and the power spectrum of the converted time-frequency component of the sub microphone;
mask pattern generation means for generating a mask pattern from the power spectrum of the time-frequency component of the main microphone multiplied by the multiplication value and the power spectrum of the time-frequency component of the sub microphone, both generated by the power spectrum calculation means;
division means for dividing the time-frequency component of the main microphone multiplied by the multiplication value by the multiplication means by that multiplication value;
masking processing means for masking the time-frequency component divided by the division means, using the mask pattern generated by the mask pattern generation means; and
synthesis means for synthesizing the time-frequency components of the main microphone output from the masking processing means.

The noise suppression apparatus according to any one of claims 3 to 5, wherein the multiplication value applied by the multiplication means is set to a constant value or a frequency-dependent value such that, for an interference sound, a power spectrum difference arises between the time-frequency components of the received sound signals output from the main and sub microphones, and, for the target sound, the magnitude relationship between the power spectra of the time-frequency components of the received sound signals output from the main and sub microphones is not reversed.

The noise suppression apparatus according to any one of claims 3 to 6, wherein the mask pattern generation means generates the mask pattern by averaging past mask patterns, or by averaging them after multiplying by forgetting factors such that older mask patterns contribute less.

A program for realizing a function of separating and outputting at least a target sound component from received sound signals output from main and sub microphones, the program causing a computer to execute:
a first function of converting, in the signal paths of the main and sub microphones, the received sound signals output from the main and sub microphones into time-frequency components, respectively;
a second function of multiplying, in the signal path of the sub microphone, either the received sound signal before conversion into a time-frequency component or the time-frequency component after conversion by a multiplication value;
a third function of calculating the power spectrum of the time-frequency component of the main microphone converted by the first function and the power spectrum of the time-frequency component of the sub microphone multiplied by the multiplication value by the second function;
a fourth function of generating a mask pattern from the power spectra of the main and sub microphones calculated by the third function;
a fifth function of masking the time-frequency component of the main microphone converted by the first function, using the mask pattern generated by the fourth function; and
a sixth function of synthesizing the time-frequency components of the main microphone output by the fifth function.

A program for realizing a function of separating and outputting at least a target sound component from received sound signals output from main and sub microphones, the program causing a computer to execute:
a first function of converting, in the signal paths of the main and sub microphones, the received sound signals output from the main and sub microphones into time-frequency components, respectively;
a second function of multiplying, in the signal path of the main microphone, the time-frequency component obtained by the conversion by a multiplication value;
a third function of calculating the power spectrum of the time-frequency component of the main microphone multiplied by the multiplication value by the second function and the power spectrum of the time-frequency component of the sub microphone converted by the first function;
a fourth function of generating a mask pattern from the power spectra of the main and sub microphones calculated by the third function;
a fifth function of dividing the time-frequency component multiplied by the multiplication value by the second function by that multiplication value;
a sixth function of masking the time-frequency component divided by the fifth function, using the mask pattern generated by the fourth function; and
a seventh function of synthesizing the time-frequency components output by the sixth function.