JP6204312B2

JP6204312B2 - Sound collector

Info

Publication number: JP6204312B2
Application number: JP2014173523A
Authority: JP
Inventors: 健太丹羽; 和則小林
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-08-28
Filing date: 2014-08-28
Publication date: 2017-09-27
Anticipated expiration: 2034-08-28
Also published as: JP2016048872A

Description

The present invention relates to a headset-type sound collection device.

The product described in Non-Patent Document 1 is known as a set of noise-canceling headphones. In Non-Patent Document 1, a microphone built into the headphones picks up noise, a sound of opposite phase that cancels the noise is generated, and that sound is reproduced together with the audio signal. This lets the user hear the audio signal more clearly.

"Product Information > Headphones > Lineup > MDR-1RNC > Product Features", [online], Sony Corporation, Sony Marketing (Japan) Inc., [retrieved July 24, 2014], Internet <URL:http://www.sony.jp/headphone/products/MDR-1RNC/feature_1.html>

However, the conventional technology assumes that, when a speech or acoustic signal picked up or recorded in another environment is reproduced through headphones, noise around the headphones is canceled with respect to the reproduced signal. It does not assume canceling noise around a headset with respect to the signal picked up by the headset's microphone. Here, headphones are a device that converts an electrical signal output from a playback device or receiver into sound waves (audible sound) using a sound-emitting element (such as a speaker) placed close to the ear (eardrum), and a headset is a generic term for a microphone (sound collection device) worn on the head.

An object of the present invention is to provide a headset-type sound collection device that picks up a target sound and external noise sound using a plurality of microphones in a predetermined positional relationship, and performs signal processing so that the target sound becomes clearer than before.

To solve the above problem, according to one aspect of the present invention, the sound collection device is a headset type. The sound collection device includes: one first microphone, arranged near the mouth of the wearer of the sound collection device, for picking up a target sound, which is the voice uttered by the wearer; two second microphones, arranged at positions separated from the first microphone in the form of the headset, for picking up external noise sound; and a signal processing unit that uses the signal picked up by the first microphone and the signals picked up by the second microphones to generate an output signal in which (i) the target sound is emphasized and/or (ii) the external noise sound is suppressed. The first microphone is unidirectional toward the wearer's mouth, and the two second microphones are arranged near the wearer's two ears, respectively, and have outward directivity.

According to the present invention, signal processing can be applied to the picked-up signal so that the target sound is clearer than with a conventional headset-type sound collection device.

A diagram showing the configuration of the sound collection device according to the first embodiment.
A functional block diagram of the sound collection device according to the first embodiment.
A diagram showing an example of the processing flow of the sound collection device according to the first embodiment.
A block diagram of the filter estimation unit according to the second embodiment.
A diagram showing an example of the processing flow of the filter estimation unit according to the second embodiment.
A diagram for explaining an example of gain shaping.
A block diagram of the sound collection device according to the third embodiment.
A diagram showing an example of the processing flow of the sound collection device according to the third embodiment.
A block diagram of the filter estimation unit according to a modification of the third embodiment.
A diagram showing an example of the processing flow of the filter estimation unit according to the modification of the third embodiment.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and redundant description is omitted. In the following description, symbols such as "^" used in the text should properly be written directly above the character that immediately follows them, but because of the limitations of text notation they are written immediately before that character. In the equations, these symbols are written in their proper positions. Processing performed on each element of a vector or matrix applies to all elements of that vector or matrix unless otherwise noted.

<First embodiment>
FIG. 1 is a diagram showing the configuration of a sound collection device 100 according to the first embodiment, FIG. 2 is its functional block diagram, and FIG. 3 is a diagram showing its processing flow.

The sound collection device 100 includes a first microphone 110, second microphones 120-1 and 120-2, and a signal processing unit 140.

From the signals picked up by the first microphone 110 and the second microphones 120-1 and 120-2, the sound collection device 100 generates and outputs an output signal z in which (i) the target sound is emphasized and/or (ii) the external noise sound is suppressed.

<Headset shape, and the first and second microphones>
The sound collection device 100 is a headset type. As stated above, a headset is a generic term for a microphone (sound collection device) worn on the head. Its structure may be one in which a fixing band 101 passes over the top of the head of the wearer 9 and the device is supported on both ears of the wearer 9 via ear pads 102 or the like; one in which the device is supported by both ears and the back of the head of the wearer 9 via the ear pads 102 and the fixing band 101; one in which it is supported by the back of the head and the temples of the wearer 9 via the fixing band 101 and temple supports (not shown); and so on. Any other form may be used as long as it can be worn on the head and can carry the first and second microphones described below.

The first microphone is a microphone for picking up the target sound, which is the voice uttered by the wearer 9. In this embodiment there is one first microphone (first microphone 110); it is placed near the mouth of the wearer 9 and is unidirectional toward the mouth of the wearer 9 (see FIG. 1). With this configuration, it picks up the target sound, the voice uttered by the wearer 9.

The second microphones are microphones for picking up external noise sound. In this embodiment there are two second microphones (second microphones 120-1 and 120-2); they are placed near the two ears of the wearer 9 and have outward directivity (see FIG. 1). With this configuration, they pick up the external noise sound.

With a headset, the sound to be picked up can be assumed to be limited to the voice coming from the mouth of the wearer 9. In this embodiment, therefore, the first microphone is placed near the mouth of the wearer 9, and a directional microphone is chosen so that it picks up the sound at the mouth with as much emphasis as possible. The second microphones, which pick up the external noise sound with emphasis, are placed away from the first microphone and have outward directivity.

<Processing of the signal processing unit 140>
The signal processing unit 140 receives the signal picked up by the first microphone and the signals picked up by the second microphones, uses these values to generate an output signal in which (i) the target sound is emphasized and/or (ii) the external noise sound is suppressed, and outputs it as the output value of the sound collection device 100.

For example, the signal processing unit 140 includes square units 141 and 142, a filter estimation unit 143, and a second filtering unit 144.

The signal processing unit 140 receives a frequency-domain picked-up signal X₀, obtained by converting the time-domain signal picked up by the first microphone 110 into the frequency domain, and frequency-domain picked-up signals X₁ and X₂, obtained by converting the time-domain signals picked up by the second microphones 120-1 and 120-2, respectively, into the frequency domain, and outputs the output signal z.

<Square unit 141>
The square unit 141 receives the picked-up signals X₁ and X₂ of the second microphones 120-1 and 120-2, calculates (S141) the square of the sum of these values, or the sum of the squares of these values, or a weighted sum of the squares of these values, and outputs the result. Here, ^φ_N is an estimate of the power spectral density of the noise area. For example, ^φ_N(ω,τ) is defined by equation (1), equation (2), or equation (3), where K is the number of second microphones (so K=2 in this embodiment), ω is the frequency, τ is the frame index, and g_k(ω) is a preset constant (weight). The noise area is described in detail later.
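The three alternatives can be written out as follows. This is a reconstruction from the description of the square unit and from the advantage stated for each alternative (the analog-addition property identifies (2) as the square of the sum). The absence of a 1/K or other normalization factor is an assumption.

```latex
\begin{align}
\hat{\phi}_N(\omega,\tau) &= \sum_{k=1}^{K} \left| X_k(\omega,\tau) \right|^2 \tag{1}\\
\hat{\phi}_N(\omega,\tau) &= \left| \sum_{k=1}^{K} X_k(\omega,\tau) \right|^2 \tag{2}\\
\hat{\phi}_N(\omega,\tau) &= \sum_{k=1}^{K} g_k(\omega) \left| X_k(\omega,\tau) \right|^2 \tag{3}
\end{align}
```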

Using (1) has the advantage that the power spectral density of the noise area can be estimated more accurately than with (2).

Using (2) gives a larger error in the estimated power spectral density of the noise area than (1), but the addition can be performed by an analog circuit. Even when there are two or more second microphones, the hardware can be built with a single A/D converter, which has the advantage of an inexpensive hardware configuration.

Using (3) makes it possible to assign a larger weight to a second microphone whose signal contains more external noise sound relative to the target sound, which has the advantage that the power spectral density of the noise area can be estimated more accurately. For example, giving the largest weight to the second microphone farthest from the mouth increases the weight of the second microphone with the least target-sound leakage and improves the accuracy of the noise-area power spectral density estimate.
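The trade-off between the three alternatives can be illustrated with a minimal Python sketch for a single frequency bin and frame. The function names and the example values are hypothetical, and the unnormalized forms (sum of squares, square of the sum, weighted sum of squares) are an assumed reading of (1) to (3).

```python
def noise_psd_sum_of_squares(x):
    """Assumed form of (1): sum over microphones of |X_k|^2."""
    return sum(abs(xk) ** 2 for xk in x)


def noise_psd_square_of_sum(x):
    """Assumed form of (2): |sum_k X_k|^2 (the sum could be formed in analog)."""
    return abs(sum(x)) ** 2


def noise_psd_weighted(x, g):
    """Assumed form of (3): sum over microphones of g_k * |X_k|^2."""
    if len(x) != len(g):
        raise ValueError("one weight per second microphone is required")
    return sum(gk * abs(xk) ** 2 for xk, gk in zip(x, g))


# Two second microphones whose noise components are out of phase in this bin.
x = [1 + 0j, -1 + 0j]
print(noise_psd_sum_of_squares(x))          # 2.0
print(noise_psd_square_of_sum(x))           # 0.0
print(noise_psd_weighted(x, [0.25, 0.75]))  # 1.0
```

In this example the square of the sum collapses to zero because the two signals cancel before squaring, which is one way the larger estimation error of (2) can arise.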

<Square unit 142>
The square unit 142 receives the picked-up signal X₀ of the first microphone 110, calculates its squared magnitude ^φ_S(ω,τ)=|X₀(ω,τ)|² (S142), and outputs it. Here, ^φ_S is an estimate of the power spectral density of the target area. The target area is described in detail later.

<Filter estimation unit 143>
The filter estimation unit 143 receives ^φ_N(ω,τ) and ^φ_S(ω,τ), estimates a filter G that suppresses the external noise sound (S143), and outputs it.

As an example, a post-filter design method based on Reference 1 is described.

(Reference 1) Y. Hioka et al., “Underdetermined sound source separation using power spectrum density estimated by combination of directivity gain,” IEEE Trans. Audio, Speech, Language Proc., 21, 1240-1250, 2013.
Reference 1 proposes a method of designing a post-filter based on the power spectral density (PSD) of each area estimated using a plurality of beamformers. This method is hereinafter called the LPSD method (Local PSD-based post-filter design). The processing flow of the LPSD method is described with reference to FIG. 2.

For example, when the post-filter is designed based on the Wiener method, the filter G(ω,τ) is calculated as follows.
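The usual Wiener gain, written in terms of the target-area power spectral density φ_S and the noise-area power spectral density φ_N, is sketched below. It is the standard textbook form and is assumed, not confirmed, to match the expression intended here.

```latex
G(\omega,\tau) = \frac{\phi_S(\omega,\tau)}{\phi_S(\omega,\tau) + \phi_N(\omega,\tau)}
```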

Here, φ_S(ω,τ) denotes the power spectral density of the target area. When the surroundings of the microphones are divided in advance into a plurality of areas, the target area is the area that contains the sound source to be picked up, that is, the source emitting the target sound; for a headset, it is the area containing the wearer's mouth. φ_N(ω,τ) denotes the power spectral density of the noise area, which is an area containing sources that emit external noise sound and is set so as to differ from the target area; for a headset, it consists of all of the above areas other than the target area, that is, all areas that do not contain the wearer's mouth. The power spectral density of an area means the power spectral density of the sound arriving from that area. For example, the power spectral density of the target area is the power spectral density of the sound arriving from the target area, and the power spectral density of the noise area is the power spectral density of the sound arriving from the noise area.

As described later, in a headset the first and second microphones are in a predetermined positional relationship (the second microphones are placed at positions separated from the first microphone in the form of the headset), but that relationship is not always the same. In other words, it varies from wearer to wearer (depending on head size and mouth position), each time the headset is put on, and over time. As a result, the phase and amplitude differences arising between the microphones also vary. Beamforming exploits those phase and amplitude differences, so its performance is sensitive to the positional relationship between the microphones, and beamforming with the first and second microphones is therefore not a suitable means of estimating the levels of the target sound and the external noise sound. Instead, the first microphone is made directional toward the wearer's mouth and is placed close to the mouth, so that its picked-up signal contains mainly the target sound. The second microphones are made directional outward. In addition, to estimate the level of the external noise sound robustly with respect to microphone position, the second microphones are placed away from the first microphone near the mouth, so that their picked-up signals contain mainly the external noise sound. Consequently, in the filter estimation unit 143, the estimate ^φ_S of the target-area power spectral density is obtained from the first microphone's signal X₀(ω,τ) as ^φ_S=|X₀(ω,τ)|², and the estimate ^φ_N of the noise-area power spectral density is obtained from the second microphones' signals X_k(ω,τ) using, for example, equations (1) to (3).

The filter estimation unit 143 estimates the filter G(ω,τ) by, for example, the following equation.
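A minimal Python sketch of this estimation step follows, assuming the Wiener form with the estimates ^φ_S and ^φ_N substituted for the true power spectral densities. The function name `wiener_gain` and the handling of bins where both estimates are zero are assumptions made for illustration.

```python
def wiener_gain(phi_s, phi_n):
    """Per-bin gain phi_s / (phi_s + phi_n) for one frame.

    phi_s, phi_n: lists of non-negative PSD estimates, one entry per
    frequency bin. A bin with no energy at all gets gain 0.0 (assumption).
    """
    if len(phi_s) != len(phi_n):
        raise ValueError("phi_s and phi_n must have the same number of bins")
    gains = []
    for s, n in zip(phi_s, phi_n):
        total = s + n
        gains.append(s / total if total > 0.0 else 0.0)
    return gains


# Bin 0: target only, bin 1: noise dominates, bin 2: silent.
print(wiener_gain([4.0, 1.0, 0.0], [0.0, 3.0, 0.0]))  # [1.0, 0.25, 0.0]
```

The gain stays between 0 and 1, so bins dominated by the second-microphone estimate are attenuated while bins dominated by the first-microphone estimate pass almost unchanged.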

<Second filtering unit 144>
The second filtering unit 144 receives the filter G and uses it to filter the picked-up signal X₀ (S144). To suppress the external noise sound contained in X₀(ω,τ), X₀(ω,τ) is multiplied by the post-filter G(ω,τ), giving Z(ω,τ)=G(ω,τ)X₀(ω,τ).

Finally, the output signal z is obtained by applying the inverse fast Fourier transform (IFFT) to Z(ω,τ).
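The chain from picked-up frames to output frame can be sketched end to end for a single frame as below. This is an illustrative sketch only: it uses a naive DFT instead of an FFT, omits windowing and overlap-add, assumes the sum-of-squares noise estimate and the Wiener gain, and all function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each output sample."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * cmath.pi * f * t / n)
                for f in range(n)).real / n
            for t in range(n)]


def process_frame(x0, noise_frames):
    """x0: frame from the first microphone; noise_frames: one frame per
    second microphone, all of the same length."""
    X0 = dft(x0)
    Xk = [dft(frame) for frame in noise_frames]
    Z = []
    for w in range(len(X0)):
        phi_s = abs(X0[w]) ** 2                   # target-area PSD estimate
        phi_n = sum(abs(X[w]) ** 2 for X in Xk)   # noise-area PSD estimate
        total = phi_s + phi_n
        g = phi_s / total if total > 0.0 else 0.0
        Z.append(g * X0[w])                       # Z = G * X0
    return idft(Z)


frame = [0.0, 1.0, 0.0, -1.0]
quiet = process_frame(frame, [[0.0] * 4, [0.0] * 4])
loud = process_frame(frame, [[0.0, 10.0, 0.0, -10.0]])
print([round(v, 6) for v in quiet])
print([round(v, 6) for v in loud])
```

With silent second microphones the frame passes through essentially unchanged, and with a strong second-microphone signal in the same bins the output is heavily attenuated.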

<Effect>
With this configuration, signal processing can be performed so that the target sound is clearer than with a conventional headset-type sound collection device. In particular, the intelligibility of a target sound picked up in a high-noise environment can be improved, which makes telephone calls and speech recognition possible under high noise.

Note that Non-Patent Document 1 assumes noise cancellation applied to the audio signal of the headphones, so there is no need to pick up the voice uttered by the wearer, and accordingly there is no microphone for that purpose. Even if the headphones of Non-Patent Document 1 were combined with a conventional headset (a microphone for picking up the wearer's voice), the signal subject to noise cancellation would be the headphone audio signal, not the signal picked up by the microphone for the wearer's voice. The problem therefore does not arise in the conventional technology, and the positional relationship between a microphone for picking up the wearer's voice and a microphone for picking up external noise sound has not even been considered.

Another conceivable approach is to place a plurality of microphones at the position of the first microphone (near the mouth) so that their positional relationship does not change, and to perform beamforming; however, this increases the computational cost of the filtering. Moreover, space is limited at the tip of a headset's flexible pipe or similar part, which makes it unsuitable for mounting a plurality of microphones.

In this embodiment, by using directional microphones and otherwise designing the device so that the target sound and the other, external noise sounds are picked up as separately as possible, the level of the external noise sound can be estimated without beamforming.

<Modifications>
Conventional noise-canceling headphones may be combined with the headset of this embodiment. In that case, the signals picked up by the second microphones may also be used to apply noise cancellation to the headphone audio signal.

The first microphone only needs to be configured to pick up the target sound and, as far as possible, not the external noise sound; it is not limited to the configuration of the first embodiment. For example, the sound collection device 100 may include a plurality of first microphones. When the first microphone is placed close to the mouth of the wearer 9 so that its picked-up signal contains mainly the target sound, it need not be unidirectional and may have any directivity. Nor does it have to be placed near the mouth of the wearer 9; for example, it may be placed near the ear, be superdirective, and pick up the target sound. Even in these configurations, the signal X₀(ω,τ) picked up by the first microphone contains mainly the target sound, so the estimate ^φ_S(ω,τ) of the target-area power spectral density can be obtained as ^φ_S(ω,τ)=|X₀(ω,τ)|².

The second microphone only needs to be configured to pick up the external noise sound and, as far as possible, not the target sound; it is not limited to the configuration of the first embodiment. For example, the sound collection device 100 may include only one second microphone, or three or more. The second microphones need not be placed near both ears of the wearer 9; they may be placed anywhere that allows them to pick up the external noise sound while picking up as little of the target sound as possible. For example, they may be placed at positions A to F in FIG. 1 (A: near the left cheek; B: left back of the head (1) or left side of the head; C: left back of the head (2) or left top of the head; D: back of the head or top of the head; E: right back of the head (2) or right top of the head; F: right back of the head (1) or right side of the head), have directivity that avoids the target sound as far as possible, and pick up the external noise sound. Placing the second microphones physically apart from the first microphone 110 in the form of the headset makes it easier to pick up the external noise sound while avoiding the target sound as far as possible. In particular, placing them near both ears of the wearer 9, as in the first embodiment, makes it easy to pick up the external noise sound without the target sound. When such a positional relationship makes this possible, the second microphones need not have outward directivity and may have any directivity. Even in these configurations, the signals X_k(ω,τ) picked up by the second microphones contain mainly the external noise sound, so the estimate ^φ_N(ω,τ) of the noise-area power spectral density can be obtained by equations (1) to (3).

The filter estimated by the filter estimation unit 143 may be any filter that (i) emphasizes the target sound and/or (ii) suppresses the external noise sound; it is not limited to the filter of the first embodiment.

<Second embodiment>
The description focuses on the differences from the first embodiment. The processing in the filter estimation unit 143 differs from that of the first embodiment. In this embodiment, "external noise sound" is also referred to as "noise".

The LPSD method formulates the problem on the assumption that the target sound and interference noise are mixed. In practice, however, not only coherent interference noise but also strongly incoherent stationary noise (air-conditioning noise, microphone self-noise, and so on) is often present. In that case the estimation errors of φ_S(ω,τ) and φ_N(ω,τ) grow, and noise suppression performance can deteriorate.

The filter estimation unit 143 described below extends the LPSD method so as to estimate the post-filter robustly across a variety of noise environments. Specifically, by estimating the power spectral density separately for each type of noise, it reduces the error in the estimated ratio between the power of the target sound and the power of the other noise.

FIG. 4 shows a block diagram of an example of the filter estimation unit 143.

As shown in FIG. 4, the filter estimation unit 143 includes, for example, a first stationary/non-stationary component extraction unit 143A, a second stationary/non-stationary component extraction unit 143B, a diverse-noise gain calculation unit 143C, a time-frequency averaging unit 143D, and a gain shaping unit 143E.

FIG. 5 shows the steps of the signal processing realized, for example, by the filter estimation unit 143 of this sound collection device.

The filter estimation unit 143 of the sound collection device and the embodiment of the method are described in detail below. The basic signal processing framework, definitions of terms, and so on are the same as those given in the background art and first embodiment sections, so their description is not repeated.

In this embodiment, it is assumed that the target sound is picked up by the first microphone 110 and the external noise sound is picked up by the second microphones 120-1 and 120-2.

＜第一定常／非定常成分抽出部１４３Ａ＞
例えば次式により定義される^φ_S(ω,τ)には、ターゲットエリアから到来する音に由来する非定常成分^φ_S ^(A)(ω,τ)及びインコヒーレントな雑音に由来する定常成分^φ_S ^(B)(ω,τ)が含まれる。 <First Steady / Unsteady Component Extraction Unit 143A>
For example, ^ φ _S (ω, τ) defined by the following equation has a nonstationary component ^ φ _S ^(A) (ω, τ) derived from the sound arriving from the target area and a steady state derived from incoherent noise. The component ^ φ _S ^(B) (ω, τ) is included.

なお、Kは第二マイクロホンの個数を表す。ここで、雑音には、干渉雑音とインコヒーレントな雑音との２種類の雑音がある。干渉雑音とは、雑音エリアに配置された雑音音源から発せられた雑音のことである。インコヒーレントな雑音とは、ターゲットエリア及び雑音エリアから発せられたものに限らず、雑音エリア、及び、これらのエリア以外の場所から発せられ、定常的に存在している雑音のことである。 K represents the number of second microphones. Here, there are two types of noise, interference noise and incoherent noise. Interference noise is noise generated from a noise source arranged in a noise area. Incoherent noise is not limited to noise emitted from the target area and the noise area, but is noise that is emitted from the noise area and places other than these areas and exists constantly.

The first stationary/non-stationary component extraction unit 143A therefore extracts, by time-averaging processing, the non-stationary component ^φ_S^(A)(ω,τ) derived from sound arriving from the target area and the stationary component ^φ_S^(B)(ω,τ) derived from incoherent noise from the power spectral density ^φ_S(ω,τ) of the target area (S143A).

The extracted non-stationary component ^φ_S^(A)(ω,τ) and stationary component ^φ_S^(B)(ω,τ) are output to the multi-noise-compatible gain calculation unit 143C.

For example, the first stationary/non-stationary component extraction unit 143A computes ^φ_S^(B)(ω,τ) from ^φ_S(ω,τ) by exponential moving average processing, as in Equations (11) and (12).

Here, α_S is a smoothing coefficient and is a predetermined positive real number, for example 0 < α_S < 1. It may also be set so that the time constant is about 150 ms. Υ_S is the set of frame indices in a specific interval. The interval is set, for example, to about 3 to 4 seconds.
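As a rough illustration of how these parameters might be chosen, the following sketch converts a time constant into a smoothing coefficient and an interval length into a number of frames. It assumes the common first-order recursion y[t] = a·x[t] + (1 − a)·y[t−1], for which a = 1 − exp(−hop / time constant); the text does not state which convention Equations (11) and (12) use. The 16 ms frame shift and the function names are hypothetical.

```python
import math

def ema_coefficient(time_constant_ms: float, hop_ms: float) -> float:
    """Coefficient a in y[t] = a*x[t] + (1 - a)*y[t-1] for a given time constant."""
    return 1.0 - math.exp(-hop_ms / time_constant_ms)

def frames_in_interval(interval_ms: float, hop_ms: float) -> int:
    """Number of frame indices needed to cover the interval."""
    return math.ceil(interval_ms / hop_ms)

HOP_MS = 16.0  # hypothetical frame shift, not stated in the text

alpha_s = ema_coefficient(150.0, HOP_MS)          # time constant of about 150 ms
upsilon_len = frames_in_interval(3000.0, HOP_MS)  # interval of about 3 s
print(round(alpha_s, 3), upsilon_len)
```

With a different frame shift both values change, so they would need to be recomputed for the actual analysis settings.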

The first stationary/non-stationary component extraction unit 143A then computes ^φ_S^(A)(ω,τ) by subtracting ^φ_S^(B)(ω,τ) from ^φ_S(ω,τ), as in Equation (13).

Here, β_S(ω) is a weighting coefficient and is a predetermined positive real number. β_S(ω) is set, for example, to a real number of about 1 to 3.

Note that ^φ_S^(A)(ω,τ) may be floored so that it satisfies ^φ_S^(A)(ω,τ) ≥ 0. This flooring is performed, for example, by the first stationary/non-stationary component extraction unit 143A.
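The extraction described above can be sketched for a single frequency bin as follows. This is only one possible reading, not a reproduction of Equations (11) to (13), which do not appear in this text: it assumes that the stationary component is the minimum of the exponentially smoothed power spectral density over the most recent frames (the set Υ_S), and that the non-stationary component is the input minus β_S times that stationary component, floored at zero. The function and argument names are hypothetical.

```python
from collections import deque

def split_stationary(psd, alpha, beta, window):
    """Split per-frame PSD values of one frequency bin into two components.

    Assumed reading: stationary = minimum of the smoothed PSD over the last
    `window` frames; non-stationary = max(psd - beta * stationary, 0).
    """
    smoothed = None
    recent = deque(maxlen=window)  # smoothed values inside the interval
    stationary, nonstationary = [], []
    for value in psd:
        # Exponential moving average of the input PSD.
        smoothed = value if smoothed is None else alpha * value + (1 - alpha) * smoothed
        recent.append(smoothed)
        floor_level = min(recent)
        stationary.append(floor_level)
        # Weighted subtraction followed by flooring at zero.
        nonstationary.append(max(value - beta * floor_level, 0.0))
    return stationary, nonstationary

# A constant noise floor of 1.0 with one burst of 9.0 at frame 5.
psd = [1.0] * 5 + [9.0] + [1.0] * 5
stat, nonstat = split_stationary(psd, alpha=0.5, beta=1.0, window=4)
```

In this toy input the burst ends up in the non-stationary output while the constant floor stays in the stationary output. The same routine would apply to ^φ_N(ω,τ) with α_N, β_N(ω), and Υ_N.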

<Second Stationary / Non-stationary Component Extraction Unit 143B>
For example, ^φ_N(ω,τ), defined by Equation (10), contains a non-stationary component ^φ_N^(A)(ω,τ) derived from interference noise and a stationary component ^φ_N^(B)(ω,τ) derived from incoherent noise.

The second stationary/non-stationary component extraction unit 143B therefore receives the power spectral density ^φ_N(ω,τ) of the noise area as input and extracts from it, by time-averaging processing, the non-stationary component ^φ_N^(A)(ω,τ) derived from interference noise and the stationary component ^φ_N^(B)(ω,τ) derived from incoherent noise (S143B).

The extracted non-stationary component ^φ_N^(A)(ω,τ) and stationary component ^φ_N^(B)(ω,τ) are output to the multi-noise-compatible gain calculation unit 143C.

For example, the second stationary/non-stationary component extraction unit 143B computes ^φ_N^(B)(ω,τ) from ^φ_N(ω,τ) by exponential moving average processing, as in Equations (14) and (15).

Here, α_N is a smoothing coefficient and is a predetermined positive real number, for example 0 < α_N < 1. It may also be set so that the time constant is about 150 ms. Υ_N is the set of frame indices in a specific interval. The interval is set, for example, to about 3 to 4 seconds.

The second stationary/non-stationary component extraction unit 143B then computes ^φ_N^(A)(ω,τ) by subtracting ^φ_N^(B)(ω,τ) from ^φ_N(ω,τ), as in Equation (16).

Here, β_N(ω) is a weighting coefficient and is a predetermined positive real number. β_N(ω) is set, for example, to a real number of about 1 to 3.

Note that ^φ_N^(A)(ω,τ) may be floored so that it satisfies ^φ_N^(A)(ω,τ) ≥ 0. This flooring is performed, for example, by the second stationary/non-stationary component extraction unit 143B.

α_N may be the same as or different from α_S. Υ_N may be the same as or different from Υ_S. β_N(ω) may be the same as or different from β_S(ω).

<Multi-noise-compatible Gain Calculation Unit 143C>
The multi-noise-compatible gain calculation unit 143C receives as input the non-stationary component ^φ_S^(A)(ω,τ) derived from sound arriving from the target area, the stationary component ^φ_S^(B)(ω,τ) derived from incoherent noise, the non-stationary component ^φ_N^(A)(ω,τ) derived from interference noise, and the stationary component ^φ_N^(B)(ω,τ) derived from incoherent noise, and uses them to compute a filter ~G(ω,τ) that emphasizes the stationary component of the sound arriving from the target area (S143C).

The computed filter ~G(ω,τ) is output to the time-frequency averaging unit 143D.

Because the power spectral density has been estimated for each type of noise (in other words, separately for incoherent noise and coherent noise), the multi-noise-compatible gain calculation unit 143C computes, for example, the post-filter ~G(ω,τ) defined by Equation (17).

If the values of ^φ_S^(B)(ω,τ) and ^φ_N^(B)(ω,τ) behave differently and the incoherence assumption no longer holds, the multi-noise-compatible gain calculation unit 143C may instead compute the post-filter ~G(ω,τ) defined by Equation (18).
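Equations (17) and (18) do not appear in this text, so the following is not a reproduction of them. It is a minimal sketch of a generic Wiener-type gain built from the four components, assuming that the gain is the ratio of the target component to the sum of the target component and the noise components. Whether ^φ_N^(B)(ω,τ) enters the denominator is made optional here purely as an illustration of having two variants. The function name and arguments are hypothetical.

```python
def wiener_type_gain(s_a, s_b, n_a, n_b=None):
    """Generic Wiener-type gain for one time-frequency point (assumed form).

    s_a: non-stationary component of the target-area PSD
    s_b: stationary component of the target-area PSD
    n_a: non-stationary component of the noise-area PSD
    n_b: stationary component of the noise-area PSD (optional extra term)
    """
    noise = s_b + n_a
    if n_b is not None:
        noise += n_b
    denominator = s_a + noise
    # Return zero gain when nothing is present, avoiding division by zero.
    return s_a / denominator if denominator > 0.0 else 0.0

gain_basic = wiener_type_gain(3.0, 1.0, 0.0)
gain_extra = wiener_type_gain(3.0, 1.0, 0.0, n_b=2.0)
```

A ratio of this kind always lies between 0 and 1, which is convenient for the smoothing and gain shaping stages that follow.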

<Time-frequency Averaging Unit 143D>
The time-frequency averaging unit 143D receives the filter ~G(ω,τ) and smooths it in at least one of the time direction and the frequency direction (S143D).

The smoothed filter ~G(ω,τ) is output to the gain shaping unit 143E.

When smoothing in the time direction, with τ₀ and τ₁ being integers greater than or equal to 0, the time-frequency averaging unit 143D may, for example, take the arithmetic mean of ~G(ω,τ-τ₀), …, ~G(ω,τ+τ₁), which are the filters neighboring ~G(ω,τ) in the time direction. The time-frequency averaging unit 143D may instead take a weighted sum of ~G(ω,τ-τ₀), …, ~G(ω,τ+τ₁).

When smoothing in the frequency direction, with ω₀ and ω₁ being real numbers greater than or equal to 0, the time-frequency averaging unit 143D may, for example, take the arithmetic mean of ~G(ω-ω₀,τ), …, ~G(ω+ω₁,τ), which are the filters neighboring ~G(ω,τ) in the frequency direction. The time-frequency averaging unit 143D may instead take a weighted sum of ~G(ω-ω₀,τ), …, ~G(ω+ω₁,τ).
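The arithmetic-mean variant of this smoothing can be sketched as below. The sketch assumes that the filter is stored as a list of frames, each a list of gains per discrete frequency bin, so the frequency offsets ω₀ and ω₁ are expressed as numbers of bins, and that the averaging range is simply truncated at the edges. Both choices, and the function names, are assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def smooth_time(gain, back, fwd):
    """Average gain[t][k] over frames t-back .. t+fwd (truncated at the edges)."""
    frames, bins = len(gain), len(gain[0])
    return [
        [mean([gain[u][k] for u in range(max(0, t - back), min(frames, t + fwd + 1))])
         for k in range(bins)]
        for t in range(frames)
    ]

def smooth_frequency(gain, below, above):
    """Average gain[t][k] over bins k-below .. k+above (truncated at the edges)."""
    return [
        [mean(row[max(0, k - below):min(len(row), k + above + 1)])
         for k in range(len(row))]
        for row in gain
    ]

# A single isolated peak, the kind of outlier that tends to cause musical noise.
gain = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
by_time = smooth_time(gain, back=1, fwd=1)
by_freq = smooth_frequency(gain, below=1, above=1)
```

Note that a non-zero forward range in time requires future frames, so in a real-time setting it would add that many frames of delay.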

<Gain Shaping Unit 143E>
The gain shaping unit 143E receives the smoothed filter ~G(ω,τ), generates a filter G(ω,τ) by applying gain shaping to it, and outputs G(ω,τ) (S143E). The gain shaping unit 143E generates, for example, the filter G(ω,τ) defined by Equation (19).

Here, γ is a weighting coefficient and is a positive real number. For example, γ may be set to about 1 to 1.3.

The gain shaping unit 143E may floor the filter G(ω,τ) so that it satisfies A ≤ G(ω,τ) ≤ 1. A is a real number from 0 to 0.3 and is usually set to about 0.1. If G(ω,τ) is greater than 1, the signal may be over-emphasized, and if G(ω,τ) is too small, musical noise may occur. Appropriate flooring prevents both this over-emphasis and the musical noise.

Consider a function f whose domain and range are the real numbers. The function f is, for example, a non-decreasing function. Gain shaping means the operation of obtaining the output value when the filter ~G(ω,τ) before gain shaping is input to the function f. In other words, the output value obtained when ~G(ω,τ) is input to f is G(ω,τ). Equation (19) is one example of the function f. The function f given by Equation (19) is f(x) = γ(x - 0.5) + 0.5.
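A minimal sketch of this gain shaping, combining f(x) = γ(x - 0.5) + 0.5 with the flooring to A ≤ G(ω,τ) ≤ 1, is shown below. The default values γ = 1.2 and A = 0.1 are picked from the ranges mentioned above, and the function name is hypothetical.

```python
def shape_gain(g_tilde, gamma=1.2, floor=0.1):
    """Apply f(x) = gamma*(x - 0.5) + 0.5, then limit the result to [floor, 1]."""
    shaped = gamma * (g_tilde - 0.5) + 0.5
    return min(max(shaped, floor), 1.0)

low = shape_gain(0.0)   # pushed below zero by f, then raised to the floor
mid = shape_gain(0.5)   # 0.5 is the fixed point of f
high = shape_gain(1.0)  # pushed above one by f, then limited to one
```

With γ greater than 1 the slope steepens around 0.5, so gains near 0 and 1 are driven toward the limits while the midpoint is left unchanged.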

Another example of the function f is described with reference to FIG. 6. Indices are omitted in FIG. 6. That is, G in FIG. 6 means G(ω,τ), and ~G means ~G(ω,τ). In this example, the slope of the graph of f is first changed, as shown from FIG. 6(A) to FIG. 6(B). Flooring is then applied so that 0 ≤ G(ω,τ) ≤ 1, as shown from FIG. 6(B) to FIG. 6(C). The function specified by the graph drawn with a bold line in FIG. 6(C) is another example of f.

The graph of f is not limited to the one shown in FIG. 6(C). For example, the graph of f in FIG. 6(C) consists of straight lines, but it may instead consist of curves. For example, f may be a hyperbolic tangent function to which flooring has been applied.

<Effect>
With this configuration, the same effects as in the first embodiment can be obtained. In addition, the filter estimation unit 143 makes it possible to design a noise-suppressing post-filter that is robust in environments containing noise with diverse properties. Such a post-filter can also be designed with processing that runs in real time.

<Modification>
The processing of the time-frequency averaging unit 143D and the gain shaping unit 143E is performed to suppress so-called musical noise. This processing may be omitted.

The computation of ^φ_S^(B)(ω,τ) and ^φ_S^(A)(ω,τ) by exponential moving average processing is one example of the processing of the first stationary/non-stationary component extraction unit 143A. The first stationary/non-stationary component extraction unit 143A may extract ^φ_S^(B)(ω,τ) and ^φ_S^(A)(ω,τ) by other processing.

Similarly, the computation of ^φ_N^(B)(ω,τ) and ^φ_N^(A)(ω,τ) by exponential moving average processing is one example of the processing of the second stationary/non-stationary component extraction unit 143B. The second stationary/non-stationary component extraction unit 143B may extract ^φ_N^(B)(ω,τ) and ^φ_N^(A)(ω,τ) by other processing.

<Third Embodiment>
The description focuses on the differences from the first embodiment.

Although the first microphone is close to the wearer 9, external noise is still mixed into the signal picked up by the first microphone. Likewise, although the second microphones are placed away from the first microphone, the target sound is mixed into the signals they pick up. In this embodiment, the accuracy of the filter G is improved by correcting for this leakage.

In this embodiment, therefore, the estimated power spectral density ^φ_S(ω,τ) of the target area and the estimated power spectral density ^φ_N(ω,τ) of the noise area are corrected. The correction formulas are as follows.
^φ'_S(ω,τ) = ^φ_S(ω,τ) - α^φ_N(ω,τ) (21)
^φ'_N(ω,τ) = ^φ_N(ω,τ) - γ^φ_S(ω,τ) (22)
The filter estimation unit 143 computes the post-filter G(ω,τ) using ^φ'_S(ω,τ) and ^φ'_N(ω,τ) in place of ^φ_S(ω,τ) and ^φ_N(ω,τ) in Equation (5).
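Equations (21) and (22) can be sketched for a single time-frequency point as follows. The flooring of both results at zero is an assumption added here so that the corrected values remain valid as power spectral densities; the text does not state it for these two equations. The function name and the numeric values are hypothetical.

```python
def correct_leakage(phi_s, phi_n, alpha, gamma):
    """Apply Equations (21) and (22) to one time-frequency point.

    alpha: share of the noise-area PSD assumed to leak into the first microphone
    gamma: share of the target-area PSD assumed to leak into the second microphones
    Both results are floored at zero (an assumption, see the lead-in).
    """
    phi_s_corrected = max(phi_s - alpha * phi_n, 0.0)  # Equation (21)
    phi_n_corrected = max(phi_n - gamma * phi_s, 0.0)  # Equation (22)
    return phi_s_corrected, phi_n_corrected

speech_frame = correct_leakage(10.0, 4.0, alpha=0.5, gamma=0.1)
noise_frame = correct_leakage(1.0, 4.0, alpha=0.5, gamma=0.1)
```

Both corrections use the uncorrected inputs, matching the form of the two equations, so the order in which they are evaluated does not matter.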

Note that α and γ may be set in advance in consideration of the arrangement and directivity of the first and second microphones, or they may be set adaptively.

When they are set in advance, the functional block diagram of the signal processing unit 140 is as shown in FIG. 2, and the filter estimation unit 143 stores α and γ in advance.

FIG. 7 shows a functional block diagram of the signal processing unit 140 for the case where α and γ are set adaptively, and FIG. 8 shows an example of its processing flow. In this case, the signal processing unit 140 includes a level ratio estimation unit 145.

<Level Ratio Estimation Unit 145>
The level ratio estimation unit 145 receives the estimated power spectral density ^φ_S(ω,τ) of the target area and the estimated power spectral density ^φ_N(ω,τ) of the noise area, uses these values to obtain α and γ (S145), and outputs them to the filter estimation unit 143.

α is the level difference of the noise component between the first microphone and the second microphone, so it is obtained by computing (level of the first microphone) / (level of the second microphone) in an interval in which there is no target sound and only noise is present (a noise interval). First,
α = ^φ_S(ω,τ) / ^φ_N(ω,τ) (23)
is computed. Because the target sound is picked up strongly by the first microphone and weakly by the second microphone, observing the level difference between the microphones, (level of the first microphone) / (level of the second microphone), makes it possible to distinguish intervals in which the target sound is present from noise-only intervals in which it is absent (noise intervals). When the level difference (level of the first microphone) / (level of the second microphone) is at or below a preset threshold, the interval is judged to be a noise interval and the α at that time is output. For example, the target sound picked up by the first microphone is expected to be about 10 to 20 dB higher in level than at the second microphone. The preset threshold is therefore preferably set between 1 and 10.

γ is the level difference of the target sound between the second microphone and the first microphone, so it is obtained by computing (level of the second microphone) / (level of the first microphone) in a speech interval. First,
γ = ^φ_N(ω,τ) / ^φ_S(ω,τ) (24)
is computed. Because the target sound is picked up strongly by the first microphone and weakly by the second microphone, observing the level difference between the microphones, (level of the second microphone) / (level of the first microphone), makes it possible to detect speech intervals. When the level difference (level of the second microphone) / (level of the first microphone) is at or below a preset threshold, the interval is judged to be a speech interval and the γ at that time is output. For example, the level difference between the first microphone and the second microphone is expected to be about 10 to 20 dB in a speech interval. The preset threshold is therefore preferably set between 1 and 10.

The level ratio estimation unit 145 obtains α and γ at regular intervals, which allows the filter estimation unit 143 to set α and γ adaptively.
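The adaptive update of α and γ can be sketched as below for one time-frequency point, following Equations (23) and (24) and the two threshold tests. The sketch keeps the previous value whenever the corresponding test fails, which is an assumption about what happens between updates. The threshold values in the usage lines are hypothetical and chosen only so that the two cases are clearly separated; in practice they would be tuned to the microphone arrangement. The function and argument names are also hypothetical.

```python
def update_level_ratios(phi_s, phi_n, alpha, gamma, noise_threshold, speech_threshold):
    """Return updated (alpha, gamma) for one time-frequency point.

    alpha is refreshed when phi_s / phi_n is at or below noise_threshold
    (judged a noise interval); gamma is refreshed when phi_n / phi_s is at or
    below speech_threshold (judged a speech interval). Otherwise the previous
    values are kept (an assumption).
    """
    if phi_n > 0.0 and phi_s / phi_n <= noise_threshold:
        alpha = phi_s / phi_n  # Equation (23)
    if phi_s > 0.0 and phi_n / phi_s <= speech_threshold:
        gamma = phi_n / phi_s  # Equation (24)
    return alpha, gamma

# Hypothetical thresholds and PSD values for two frames.
alpha, gamma = 0.0, 0.0
alpha, gamma = update_level_ratios(3.0, 2.0, alpha, gamma, 2.0, 0.1)    # noise-like frame
alpha, gamma = update_level_ratios(100.0, 2.0, alpha, gamma, 2.0, 0.1)  # speech-like frame
```

After these two frames, α comes from the noise-like frame and γ from the speech-like frame, and each is left untouched by the other frame.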

<Effect>
With this configuration, the same effects as in the first embodiment can be obtained. In addition, the accuracy of the filter G can be improved by taking into account the external noise mixed into the first microphone and the target sound mixed into the second microphones.

<Modification>
This embodiment may be combined with the second embodiment.

Before the processing of the filter estimation unit 143 of the second embodiment, ^φ'_S(ω,τ) and ^φ'_N(ω,τ) may be computed by the method of the third embodiment and then used in the processing of the filter estimation unit 143 of the second embodiment. In this modification, however, the non-stationary component ^φ_S^(A)(ω,τ) derived from sound arriving from the target area and the non-stationary component ^φ_N^(A)(ω,τ) derived from environmental noise are corrected. The correction formulas are as follows.
^φ'_S^(A)(ω,τ) = ^φ_S^(A)(ω,τ) - β^φ_N^(A)(ω,τ) (25)
^φ'_N^(A)(ω,τ) = ^φ_N^(A)(ω,τ) - κ^φ_S^(A)(ω,τ) (26)
The multi-noise-compatible gain calculation unit 143C computes the post-filter ~G(ω,τ) using ^φ'_S^(A)(ω,τ) and ^φ'_N^(A)(ω,τ) in place of ^φ_S^(A)(ω,τ) and ^φ_N^(A)(ω,τ) in Equation (17) or Equation (18).

Note that β and κ may be set in advance in consideration of the arrangement and directivity of the first and second microphones, or they may be set adaptively.

When they are set in advance, the functional block diagram of the filter estimation unit 143 is as shown in FIG. 4, and the multi-noise-compatible gain calculation unit 143C stores β and κ in advance.

FIG. 9 shows a functional block diagram of the filter estimation unit 143 for the case where β and κ are set adaptively, and FIG. 10 shows an example of its processing flow. In this case, the filter estimation unit 143 includes a level ratio estimation unit 143F.

The level ratio estimation unit 143F performs the same processing as described for the level ratio estimation unit 145, using the non-stationary component ^φ_S^(A)(ω,τ) derived from sound arriving from the target area and the non-stationary component ^φ_N^(A)(ω,τ) derived from environmental noise in place of the estimated power spectral density ^φ_S(ω,τ) of the target area and the estimated power spectral density ^φ_N(ω,τ) of the noise area, and obtains and outputs β and κ in place of α and γ (S143F). Thus,
β = ^φ_S^(A)(ω,τ) / ^φ_N^(A)(ω,τ)
κ = ^φ_N^(A)(ω,τ) / ^φ_S^(A)(ω,τ)
and β and κ are each output when they are at or below their respective preset thresholds.

The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually, depending on the processing capability of the device that executes them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and Recording Medium>
The various processing functions of each device described in the above embodiments and modifications may be implemented by a computer. In that case, the processing content of the functions that each device should have is described by a program. Executing this program on a computer implements the various processing functions of each device on the computer.

The program describing this processing content can be recorded on a computer-readable recording medium. The computer-readable recording medium may be of any kind, for example a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers over a network.

A computer that executes such a program first stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage unit. When executing the processing, the computer reads the program stored in its own storage unit and executes processing according to the program it has read. As another form of executing the program, the computer may read the program directly from the portable recording medium and execute processing according to it. Alternatively, each time a program is transferred from the server computer to this computer, the computer may execute processing according to the received program in sequence. The processing described above may also be executed by a so-called ASP (Application Service Provider) type service, which does not transfer the program from the server computer to this computer and implements the processing functions only through execution instructions and result acquisition. The program here includes information that is provided for processing by an electronic computer and is equivalent to a program (such as data that is not a direct command to the computer but has the property of defining the computer's processing).

In the above, each device is configured by executing a predetermined program on a computer, but at least part of the processing content may be implemented in hardware.

Speech recognition has come into general use as a means of command input for smartphones. In noisy environments such as inside a car or a factory, demand is expected to be high for hands-free operation of equipment and for calls with remote locations.

The present invention can be used, for example, in such cases.

Claims

A headset-type sound collection device comprising:
one first microphone arranged near the mouth of a wearer of the sound collection device, for picking up a target sound that is a voice uttered by the wearer;
two second microphones arranged at positions apart from the first microphone in the form of the headset, for picking up external noise; and
a signal processing unit that uses the signal picked up by the first microphone and the signals picked up by the second microphones to generate an output signal in which (i) the target sound is emphasized and/or (ii) the external noise is suppressed,
wherein the first microphone is unidirectional toward the mouth of the wearer,
the two second microphones are each arranged near a respective ear of the wearer and have directivity facing outward, and
the signal processing unit includes:
a filter estimation unit that uses the signal picked up by the first microphone and the signals picked up by the second microphones to estimate a filter that suppresses the external noise in the signal picked up by the first microphone; and
a second filtering unit that filters the signal picked up by the first microphone using the filter.