JP2018170718A

JP2018170718A - Sound collecting device, program, and method

Info

Publication number: JP2018170718A
Application number: JP2017068515A
Authority: JP
Inventors: Masaru Fujieda; Kazuhiro Katagiri
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2017-03-30
Filing date: 2017-03-30
Publication date: 2018-11-01
Anticipated expiration: 2037-03-30
Also published as: JP6863004B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound collection device, program, and method that emphasize only the target area sound with less distortion. SOLUTION: For each of a plurality of microphone arrays, each composed of two microphones, the sound collection device calculates an arrival direction feature. This feature varies with the arrival direction of sound: it takes a large value for sound arriving from the direction of the target area and a small value for sound arriving from other directions. For each frequency component, the device obtains an area feature by integrating the arrival direction features of the individual microphone arrays. It then uses the area feature to extract the target area sound from a signal based on the captured signals output by the microphone arrays. SELECTED DRAWING: Figure 1

Description

The present invention relates to a sound collection device, a program, and a method. It can be applied, for example, to emphasizing only the sound in a specific area while suppressing sound in other areas.

One technique for emphasizing sound arriving from a specific direction and suppressing other sounds is a beamformer using a microphone array. Here "sound" covers both speech and other acoustic sound; hereinafter the two are referred to collectively as "sound". A beamformer forms directivity and blind spots (nulls) by exploiting the time differences between the signals reaching each microphone (see Non-Patent Documents 1 and 2).

However, simply pointing the directivity of a beamformer at the area from which sound is to be collected (hereinafter, the "target area") causes a problem when noise sources exist around the target area. The beamformer then picks up not only the sound source in the target area (hereinafter, "target area sound") but also, at the same time, noise sources outside the target area (hereinafter, "non-target area sound").

To address this problem, a method has been proposed in which a plurality of microphone arrays point their directivities at the target area from different directions so that the beams intersect there, and the target area sound is collected (Patent Document 1). In the method described in Patent Document 1, the target area sound is extracted by jointly processing the beamformer outputs of the microphone arrays.

FIG. 6 is an explanatory diagram showing an example of conventional sound collection processing using a plurality of microphone arrays.

FIG. 6 shows an example in which the directivities of two microphone arrays MA (MA_1 and MA_2) are directed at the target area.

FIG. 6(a) shows the positional relationship between the microphone arrays MA_1 and MA_2, whose directivities are directed at the target area, and the sound source of the target area sound. FIG. 6(a) also shows the directivities Z1 and Z2 (beamformer directivities) of the microphone arrays MA_1 and MA_2. In the example of FIG. 6(a), sound sources of non-target area sound exist around the sound source in the target area. Consequently, the beamformer outputs of both MA_1 and MA_2 contain two kinds of sound: the target area sound from the source in the target area, and the non-target area sound from sources in the non-target area that lie along the same beam direction.

FIGS. 6(b) and 6(c) show the frequency components of the beamformer outputs of the two microphone arrays MA_1 and MA_2, respectively. If speech is assumed to be sparse, each frequency component contains only one sound source (either target area sound or non-target area sound), as shown in FIGS. 6(b) and 6(c). The target area lies within the directivity of every microphone array. Therefore, the frequency components of the target area sound appear in all beamformer outputs in the same proportion and with the same distribution. The frequency components of the non-target area sound, by contrast, differ from one beamformer output to another. Frequency components common to all beamformer outputs can therefore be attributed to the target area sound. The conventional target area sound collection method described in Patent Document 1 and elsewhere is built on this principle.
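The toy sketch below illustrates the sparsity argument. It is not the method of Patent Document 1. It assumes two hypothetical magnitude spectra in which each bin is occupied by a single source. Bins whose magnitudes agree in both beamformer outputs are treated as target area bins. The function name `common_bins` and the tolerance are illustrative only.

```python
def common_bins(out1, out2, rel_tol=1e-6):
    """Return indices of bins that are non-zero and equal in both outputs.

    Under the sparsity assumption, a bin occupied by the target area sound
    appears identically in every beamformer output, while a bin occupied by
    a non-target source appears in only one of them.
    """
    common = []
    for k, (a, b) in enumerate(zip(out1, out2)):
        if a > 0 and b > 0 and abs(a - b) <= rel_tol * max(a, b):
            common.append(k)
    return common


# Hypothetical magnitudes: bins 0 and 2 hold the target area sound,
# bin 1 holds noise seen only by MA_2, bin 3 holds noise seen only by MA_1.
bf_ma1 = [1.0, 0.0, 0.7, 0.3]
bf_ma2 = [1.0, 0.4, 0.7, 0.0]
print(common_bins(bf_ma1, bf_ma2))  # [0, 2]
```

Real beamformer outputs never match exactly, so a practical method needs a softer criterion than exact equality.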

FIG. 7 is a block diagram showing the functional configuration of a sound collection device 10 to which the conventional sound collection method is applied.

The conventional sound collection device 10 shown in FIG. 7 includes the following units:
- a data input unit 2;
- a frequency domain conversion unit 3;
- a directivity forming unit 4;
- a propagation delay difference correction unit 5;
- a power correction unit 6;
- a first subtraction unit 7;
- a second subtraction unit 8.

The captured signals from the microphone arrays MA_1 and MA_2 are processed in four steps.

1. The data input unit 2 converts them from analog to digital signals (data).
2. The frequency domain conversion unit 3 converts them from the time domain to the frequency domain, yielding the captured signal groups X_1 and X_2.
3. The directivity forming unit 4 applies beamformers with directivities such as Z1 and Z2 in FIG. 6(a), yielding the beamformer output signals X_ma1(f) and X_ma2(f).
4. The propagation delay difference correction unit 5 delays one of X_ma1(f) and X_ma2(f) so that their timing matches, yielding the delay-corrected signals X'_ma1(f) and X'_ma2(f). The delay is based on the distance between each microphone array and the target area, which is known.
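The sketch below illustrates the propagation delay difference correction. The source does not say how the delay is implemented. Here we assume it is applied in the frequency domain by multiplying each bin by a linear phase term exp(-j2πfτ). This approximation holds when the delay is short compared with the analysis frame. The following are all hypothetical:
- the function names;
- the distances `r1` and `r2`;
- the speed of sound of 340 m/s.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed value


def delay_spectrum(spectrum, freqs_hz, delay_s):
    """Delay a frequency-domain signal by delay_s seconds.

    A time delay tau corresponds to multiplying bin f by exp(-j*2*pi*f*tau).
    """
    return [x * cmath.exp(-2j * math.pi * f * delay_s)
            for x, f in zip(spectrum, freqs_hz)]


def align_outputs(x_ma1, x_ma2, freqs_hz, r1, r2):
    """Align two beamformer outputs given array-to-target distances r1, r2 (m).

    The array closer to the target receives the sound earlier, so its output
    is delayed by the propagation-time difference (hypothetical helper).
    """
    tau = abs(r1 - r2) / SPEED_OF_SOUND
    if r1 < r2:
        return delay_spectrum(x_ma1, freqs_hz, tau), list(x_ma2)
    return list(x_ma1), delay_spectrum(x_ma2, freqs_hz, tau)
```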

The power correction unit 6 calculates the amplitude correction coefficient α_ma1 (alpha) by equation (1). The coefficient compensates for the amplitude difference caused by the distance between each microphone array and the target area, and it also adapts to the orientation of the talker in the target area. The operator mode_f(A(f)) in equation (1) returns the most frequently occurring value (the mode) of the function A(f), whose value varies with the variable f. The median may be used instead of the mode, as in equation (2). The operator median_f(A(f)) in equation (2) returns the median of A(f) over f.
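Equations (1) and (2) are not reproduced in this text, so the sketch below rests on an assumption. We assume A(f) is the per-bin amplitude ratio |X'_ma1(f)| / |X'_ma2(f)|. This ratio scales X'_ma2 to match X'_ma1, as required by the subtraction that follows. Under sparsity, bins holding the common target sound share one ratio, so the mode or median over f should pick it out. Estimating the mode of continuous values by rounding into bins of width 0.05 is also a hypothetical choice.

```python
import statistics
from collections import Counter


def amplitude_ratios(xp_ma1, xp_ma2, eps=1e-12):
    """Per-bin ratio |X'_ma1(f)| / |X'_ma2(f)| (assumed form of A(f))."""
    return [abs(a) / (abs(b) + eps) for a, b in zip(xp_ma1, xp_ma2)]


def alpha_median(xp_ma1, xp_ma2):
    """Median over frequency of the amplitude ratio, in the spirit of eq. (2)."""
    return statistics.median(amplitude_ratios(xp_ma1, xp_ma2))


def alpha_mode(xp_ma1, xp_ma2, bin_width=0.05):
    """Mode over frequency, estimated by quantizing ratios into bins (eq. (1) idea)."""
    ratios = amplitude_ratios(xp_ma1, xp_ma2)
    counts = Counter(round(r / bin_width) for r in ratios)
    most_common_bin, _ = counts.most_common(1)[0]
    return most_common_bin * bin_width
```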

The first subtraction unit 7 then performs a spectral subtraction. It first corrects the amplitude of the delay-corrected signal X'_ma2(f) of microphone array MA_2 by the coefficient α_ma1. It then subtracts the result from the delay-corrected signal X'_ma1(f) of microphone array MA_1. This cancels the target area sound components that overlap in both beamformer outputs and extracts the non-target area sound component N_ma1(f) contained in X'_ma1(f). Equation (3) is a calculation formula that broadly follows this idea.
$$N_{ma1}(f) = X'_{ma1}(f) - \alpha_{ma1}\, X'_{ma2}(f) \qquad (3)$$

The second subtraction unit 8 then spectrally subtracts the non-target area sound component N_ma1(f) from the delay-corrected signal X'_ma1(f) of microphone array MA_1, thereby extracting the target area sound Y_ma1(f). Equation (4) is a calculation formula that broadly follows this idea. In equation (4), β_ma1 (beta) is a constant coefficient that sets how strongly the non-target area sound is removed.
$$Y_{ma1}(f) = X'_{ma1}(f) - \beta_{ma1}\, N_{ma1}(f) \qquad (4)$$
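The two-stage subtraction of equations (3) and (4) can be sketched per frequency bin as follows. Three details are assumptions, since the equations leave them open:
- both subtractions act on magnitudes;
- negative results are floored at zero;
- the phase of X'_ma1(f) is reused for the output.

The function name is hypothetical.

```python
import cmath


def conventional_target_area_sound(xp_ma1, xp_ma2, alpha, beta):
    """Two-stage spectral subtraction following equations (3) and (4).

    xp_ma1, xp_ma2: delay-corrected complex spectra X'_ma1(f), X'_ma2(f).
    alpha: amplitude correction coefficient alpha_ma1.
    beta: removal-strength coefficient beta_ma1.
    Magnitude-domain subtraction, zero flooring, and reuse of the phase of
    X'_ma1(f) are assumptions made for this sketch.
    """
    y_ma1 = []
    for a, b in zip(xp_ma1, xp_ma2):
        n_mag = max(abs(a) - alpha * abs(b), 0.0)        # equation (3)
        y_mag = max(abs(a) - beta * n_mag, 0.0)          # equation (4)
        y_ma1.append(cmath.rect(y_mag, cmath.phase(a)))  # restore phase
    return y_ma1
```

With `xp_ma1 = [3, 1]`, `xp_ma2 = [3, 0]` and `alpha = beta = 1`, the two bins are handled differently. Bin 0 is present equally in both outputs and survives with magnitude 3. Bin 1 is present only in MA_1's output and is removed.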

As described above, the conventional sound collection method can collect only the target area sound even when non-target area sound sources exist around the target area.

Patent Document 1: JP 2014-72708 A

Non-Patent Document 1: Futoshi Asano, "Array Signal Processing of Sound: Localization, Tracking, and Separation of Sound Sources," The Acoustical Society of Japan, Corona Publishing, February 25, 2011.
Non-Patent Document 2: Takashi Yazu, Makoto Morito, Kei Yamada, and Tetsuji Ogawa, "Sound Source Separation Technology Using a Square Microphone Array (Special Feature: Efforts toward Practical Application of Speech Recognition Technology)," Information Processing Society of Japan, IPSJ Magazine (Joho Shori), Vol. 51, No. 11, pp. 1410-1416, 2010.

However, the conventional sound collection method performs spectral subtraction twice to collect only the target area sound. As a result, the extracted target area sound may suffer from degraded sound quality.

Spectral subtraction is a method for estimating the amplitude or power of a target sound. It requires an observed signal containing a mixture of target sound and noise, together with a noise component estimated by some suitable method. For each frequency component, the amplitude or power of the estimated noise is subtracted from that of the observed signal. In a real environment, the estimated noise component inevitably contains estimation errors. Spectral subtraction therefore has two problems:
- In frequency components where the noise is overestimated, it also attenuates the target sound component, which distorts the target sound.
- In frequency components where the noise is underestimated, it cannot fully attenuate the noise, which leaves residual noise.

Moreover, for each frequency component, the sum of the amplitudes or powers of the true target sound and the true noise does not necessarily equal the amplitude or power of the observed signal. Therefore, even if the estimated noise contained no error, spectral subtraction would still distort the target sound and leave residual noise.
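The sketch below shows these failure modes for a single hypothetical frequency bin. The target component has magnitude 1 and the noise has magnitude 0.5. It uses plain magnitude subtraction floored at zero, one common form of spectral subtraction. The values are illustrative, not taken from the patent.

```python
def spectral_subtract(observed, noise_mag_estimate):
    """Magnitude spectral subtraction for one frequency bin, floored at zero."""
    return max(abs(observed) - noise_mag_estimate, 0.0)


target = 1.0 + 0.0j  # hypothetical target-sound component, |target| = 1

# Case 1: noise in phase with the target, so magnitudes happen to add.
noise_in_phase = 0.5 + 0.0j
observed_a = target + noise_in_phase
over_in_phase = spectral_subtract(observed_a, 0.75)   # noise overestimated
under_in_phase = spectral_subtract(observed_a, 0.25)  # noise underestimated

# Case 2: noise 90 degrees out of phase, so |target + noise| != |target| + |noise|.
noise_quadrature = 0.5j
observed_b = target + noise_quadrature
exact_quadrature = spectral_subtract(observed_b, abs(noise_quadrature))

print(over_in_phase, under_in_phase, exact_quadrature)
```

The three results show the three problems in turn:
- An overestimated noise leaves less than the true target magnitude of 1, so the target is distorted.
- An underestimated noise leaves more than 1, so noise remains.
- In the quadrature case, subtracting the exact noise magnitude still does not return 1.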

The residual noise is perceived as an extremely unpleasant sound called musical noise. Musical noise is widely known as the greatest problem of spectral subtraction; it is noise whose components have been strongly distorted.

The conventional sound collection method applies spectral subtraction twice, despite the problems described above. As a result, the emphasized target area sound can be distorted.

There is therefore a demand for a sound collection device, program, and method that emphasize only the target area sound with less distortion.

A sound collection device according to a first aspect of the present invention includes the following:
(1) feature calculation means for calculating an arrival direction feature for each of a plurality of microphone arrays, each composed of two microphones. The feature varies with the arrival direction of sound: it takes a large value for sound arriving from the direction of a target area and a small value for sound arriving from other directions.
(2) feature integration means for obtaining, for each frequency component, an area feature by integrating the arrival direction features of the respective microphone arrays.
(3) target area sound extraction means for using the area feature to extract the target area sound from a signal based on the captured signals output by the microphone arrays.

A sound collection program according to a second aspect of the present invention causes a computer to function as the following:
(1) feature calculation means for calculating an arrival direction feature for each of a plurality of microphone arrays, each composed of two microphones. The feature varies with the arrival direction of sound: it takes a large value for sound arriving from the direction of a target area and a small value for sound arriving from other directions.
(2) feature integration means for obtaining, for each frequency component, an area feature by integrating the arrival direction features of the respective microphone arrays.
(3) target area sound extraction means for using the area feature to extract the target area sound from a signal based on the captured signals output by the microphone arrays.

A sound collection method according to a third aspect of the present invention is characterized as follows:
(1) The method uses feature calculation means, feature integration means, and target area sound extraction means.
(2) The feature calculation means calculates an arrival direction feature for each of a plurality of microphone arrays, each composed of two microphones. The feature varies with the arrival direction of sound: it takes a large value for sound arriving from the direction of a target area and a small value for sound arriving from other directions.
(3) The feature integration means obtains, for each frequency component, an area feature by integrating the arrival direction features of the respective microphone arrays.
(4) The target area sound extraction means uses the area feature to extract the target area sound from a signal based on the captured signals output by the microphone arrays.

The present invention can provide a sound collection device, program, and method that emphasize only the target area sound with less distortion.

FIG. 1 is a block diagram showing the functional configuration of the sound collection device according to the embodiment.
FIG. 2 is an explanatory diagram showing an example of the first arrival direction feature according to the embodiment.
FIG. 3 is an explanatory diagram showing an example of the second arrival direction feature according to the embodiment.
FIG. 4 is an explanatory diagram showing an example of the area feature according to the embodiment.
FIG. 5 is an explanatory diagram showing an example of the target area determination result obtained by the sound collection device according to the embodiment.
FIG. 6 is an explanatory diagram showing an example of a conventional sound collection method.
FIG. 7 is a block diagram showing the functional configuration of a conventional sound collection device.

(A) Main Embodiment
An embodiment of the sound collection device, program, and method according to the present invention is described in detail below with reference to the drawings.

(A-1) Configuration of the Embodiment
FIG. 1 is a block diagram showing the functional configuration of the sound collection device 100 of this embodiment.

The sound collection device 100 performs target area sound collection processing. It collects the target area sound from a sound source in the target area using acoustic signals supplied from M microphone arrays MA (MA_1 to MA_M).

Each microphone array MA is placed in the space containing the target area, at a location from which it can be directed toward the target area. Each microphone array MA consists of two microphones 1 (1_1 and 1_2). The acoustic signals based on the sound captured by microphones 1_1 and 1_2 are supplied to the data input unit 102.

The internal configuration of the sound collection device 100 is described next with reference to FIG. 1.

As shown in FIG. 1, the sound collection device 100 according to this embodiment includes the following units:
- a data input unit 102;
- a frequency domain conversion unit 103;
- a feature calculation unit 104;
- a feature integration unit 105;
- a target area sound extraction unit 106.

Each component of the sound collection device 100 is described in detail later.

The processing that follows conversion to digital signals may be implemented by a computer equipped with a processor, memory, and so on, running a program (including the sound collection program according to the embodiment). Even in that case, the functional configuration can be represented by FIG. 1.

(A-2) Operation of the Embodiment
This section describes the operation of the sound collection device 100 configured as above, which is also the sound collection method of this embodiment.

The data input unit 102 converts the acoustic signals captured by the microphone arrays MA_1 to MA_M from analog to digital signals (data), microphone by microphone. It passes the resulting captured signals to the frequency domain conversion unit 103.

Hereinafter, the signals captured by the microphones are denoted as follows:
- x_{1,1}(t) to x_{M,1}(t): signals captured by microphone 1_1 of the microphone arrays MA_1 to MA_M;
- x_{1,2}(t) to x_{M,2}(t): signals captured by microphone 1_2 of the same arrays.

The frequency domain conversion unit 103 converts each of the captured signals x_{1,1}(t) to x_{M,1}(t) and x_{1,2}(t) to x_{M,2}(t) from the time domain to the frequency domain.

Hereinafter, the frequency-domain versions of the captured signals x_{1,1}(t) to x_{M,1}(t) and x_{1,2}(t) to x_{M,2}(t) are denoted X_{1,1}(f) to X_{M,1}(f) and X_{1,2}(f) to X_{M,2}(f).

The frequency domain conversion unit 103 supplies the resulting frequency-domain captured signals X_{1,1}(f) to X_{M,1}(f) and X_{1,2}(f) to X_{M,2}(f) to the feature calculation unit 104 and the target area sound extraction unit 106.

The frequency domain conversion unit 103 may use a fast Fourier transform (FFT), a wavelet transform, a filter bank, or the like, but the FFT is the most suitable. Any of various window functions, such as a Hamming window, may be applied when performing the FFT.
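A minimal sketch of a windowed short-time transform is shown below. It uses a Hamming window with a naive O(N^2) DFT so that it needs only the standard library; a real implementation would use an FFT. The frame length and hop size are hypothetical, and only the non-negative frequency bins are kept.

```python
import cmath
import math


def hamming(n_len):
    """Symmetric Hamming window of length n_len."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1))
            for n in range(n_len)]


def dft(frame):
    """Naive DFT returning bins 0..N/2 (stand-in for an FFT)."""
    n_len = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                for n in range(n_len))
            for k in range(n_len // 2 + 1)]


def stft(signal, frame_len=256, hop=128):
    """Split a time-domain signal into windowed frames and transform each one."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        frames.append(dft(segment))
    return frames
```

For example, a cosine at exactly bin 8 of a 64-sample frame produces its largest magnitude at bin 8 in every frame.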

The feature calculation unit 104 calculates the arrival direction features D_1(f) to D_M(f), one per microphone array MA, from the captured signals X_{1,1}(f) to X_{M,1}(f) and X_{1,2}(f) to X_{M,2}(f). It supplies the resulting features to the feature integration unit 105.

The feature calculation unit 104 uses the same method for every microphone array MA. The description below therefore covers only the i-th microphone array MA_i (where i is any of 1 to M): its captured signals X_{i,1}(f) and X_{i,2}(f) and its arrival direction feature D_i(f).

The arrival direction feature D_i(f) should preferably take a large value for sound arriving from the direction of the target area and a small value for sound arriving from other directions. Any calculation method that gives D_i(f) this property may be used. When the target area lies in the front direction of every microphone array MA, equation (5), for example, is suitable.
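Equation (5) is not reproduced in this text. The sketch below is therefore one feature with the stated property, not necessarily equation (5). It computes the normalized real part of the cross-spectrum, Re{X_{i,1}(f) X_{i,2}(f)*} / (|X_{i,1}(f)| |X_{i,2}(f)|), which is the cosine of the phase difference between the two microphones. A source in the front (broadside) direction reaches both microphones at the same time, so its value is 1. Other directions give smaller values. The function names are hypothetical.

```python
def direction_feature(x1, x2, eps=1e-12):
    """Cosine of the inter-microphone phase difference for one frequency bin.

    x1, x2: complex spectra X_{i,1}(f) and X_{i,2}(f) at the same bin.
    Returns about 1 for zero phase difference (front direction) and smaller
    values as the phase difference grows.
    """
    cross = x1 * x2.conjugate()
    return cross.real / (abs(x1) * abs(x2) + eps)


def direction_features(X1, X2):
    """Apply direction_feature bin by bin to two spectra of equal length."""
    return [direction_feature(a, b) for a, b in zip(X1, X2)]
```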

The captured signals X_{i,1}(f) and X_{i,2}(f) contain a mixture of target area sound and non-target area sound. If speech is assumed to be sparse, however, each frequency component contains only one of the two. Let θ (theta) denote the angle at which a given sound source arrives at a microphone array MA. Equation (5) can then be expanded as equation (6), where c is the speed of sound and d is the distance between the two microphones 1_1 and 1_2 constituting the microphone array. Under the same sparsity assumption, D_i(f) can also be computed by a method that explicitly estimates the arrival direction, as in equation (7). The expression inside the absolute value in equation (7) equals the sine of the arrival direction θ (sin θ).
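Equations (6) and (7) are not reproduced in this text, so the following is a hedged sketch. Assume far-field (plane-wave) propagation. The phase difference between microphones 1_1 and 1_2 for a source at angle θ is then 2πf·d·sinθ/c, with c and d as defined above. If the arrival direction feature is taken to be the cosine of that phase difference (our assumption), it reduces to cos(2πf·d·sinθ/c). This equals 1 at the front (θ = 0). The same phase difference also gives an explicit estimate of sinθ, the quantity the text places inside the absolute value of equation (7). The mapping 1 - |sinθ| used below to turn that estimate into a feature is hypothetical. So are the speed of sound of 340 m/s and the function names.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed


def ideal_feature(freq_hz, theta_rad, d_m, c=SPEED_OF_SOUND):
    """cos of the plane-wave phase difference 2*pi*f*d*sin(theta)/c."""
    return math.cos(2 * math.pi * freq_hz * d_m * math.sin(theta_rad) / c)


def estimate_sin_theta(x1, x2, freq_hz, d_m, c=SPEED_OF_SOUND):
    """Estimate sin(theta) from the phase difference of one bin.

    Assumes no spatial aliasing, i.e. the true phase difference lies in (-pi, pi].
    """
    phase_diff = cmath.phase(x1 * x2.conjugate())
    return c * phase_diff / (2 * math.pi * freq_hz * d_m)


def explicit_feature(x1, x2, freq_hz, d_m, c=SPEED_OF_SOUND):
    """Hypothetical mapping: 1 at the front, decreasing with |sin(theta)|."""
    s = estimate_sin_theta(x1, x2, freq_hz, d_m, c)
    return 1.0 - min(abs(s), 1.0)
```

For example, take a 1 kHz source at θ = 30° and a spacing of 5 cm. If microphone 1_2 receives the sound later by d·sinθ/c, `estimate_sin_theta` recovers sinθ ≈ 0.5, and `explicit_feature` returns about 0.5.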

Specific examples of the arrival direction feature D_i(f) are described next with reference to FIGS. 2 and 3.

FIGS. 2(a) and 3(a) are three-dimensional graphs (vertical, horizontal, height). They show examples of the arrival direction features D_1(f) and D_2(f) of the microphone arrays MA_1 and MA_2, respectively, obtained using equation (5).

The graphs in FIGS. 2(a) and 3(a) use three axes:
- vertical axis: the vertical position, i.e., the distance from microphone array MA_1;
- horizontal axis: the horizontal position, i.e., the distance from microphone array MA_2;
- height axis: the value of the arrival direction feature D_1(f) or D_2(f).

The graphs show the values of D_1(f) and D_2(f) at f = 3 kHz when target area sound or non-target area sound arrives from various vertical and horizontal positions.

FIG. 2(b) shows the value of the arrival direction feature D_1(f) at each of the positions P411 to P416 shown in FIG. 2(a). As shown in FIG. 2(b), the values of D_1(f) at P411 to P416 are -0.13, 1, -0.13, 0.72, 1, and 0.72, respectively.

FIG. 2(a) plots the arrival direction feature D_1(f) of microphone array MA_1 for sound at f = 3 kHz. In this example, MA_1 is installed at a horizontal position of 1.5 m and a vertical position of 0 m. As shown in FIGS. 2(a) and 2(b), D_1(f) peaks in the front direction of microphone array MA_1 (where the horizontal position is 1.5 m).

FIG. 3(b) shows the value of the arrival direction feature D_2(f) at each of the positions P421 to P426 shown in FIG. 3(a). As shown in FIG. 3(b), the values of D_2(f) at P421 to P426 are 0.72, -0.13, 1, -0.13, 0.72, and 1, respectively.

FIG. 3(a) plots the arrival direction feature D_2(f) of microphone array MA_2 at f = 3 kHz. In this example, MA_2 is installed at a horizontal position of 0 m and a vertical position of 1.5 m. As shown in FIGS. 3(a) and 3(b), D_2(f) peaks in the front direction of microphone array MA_2 (where the vertical position is 1.5 m).

For each frequency component, the feature integration unit 105 integrates the arrival direction features D_1(f) to D_M(f) to calculate the area feature E(f). The resulting E(f) is passed to the target area sound extraction unit 106.

Any calculation (integration) method may be used for the area feature E(f), provided that E(f) becomes large when all of the arrival direction features D_1(f) to D_M(f) are large. For example, equation (8) selects, for each frequency component, the smallest of D_1(f) to D_M(f) across all microphone arrays as the area feature E(f).
$$E(f) = \min\left[D_1(f), \dots, D_M(f)\right] \qquad (8)$$
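Equation (8) takes a minimum over microphone arrays, bin by bin. A minimal sketch is shown below; the function name and the list-of-lists input layout (one list of per-bin features per array) are illustrative assumptions.

```python
def area_feature(direction_features_per_array):
    """E(f) = min over arrays of D_m(f), computed independently for each bin.

    direction_features_per_array: [[D_1(f0), D_1(f1), ...],
                                   [D_2(f0), D_2(f1), ...], ...]
    """
    return [min(values) for values in zip(*direction_features_per_array)]
```

Because the minimum is taken, E(f) is large only where every array's feature is large. For example, `area_feature([[1.0, 0.72, -0.13], [0.72, 1.0, 1.0]])` gives `[0.72, 0.72, -0.13]`.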

A specific example of the area feature E(f) is described next with reference to FIG. 4.

FIG. 4(a) is a three-dimensional graph (vertical, horizontal, height) showing an example of the area feature E(f) obtained using equation (8).

FIG. 4(a) shows an example in which the microphone arrays MA_1 and MA_2 (each consisting of two microphones 1) are arranged as shown in FIG. 6, the arrival direction feature amounts D_1(f) and D_2(f) are each calculated by equation (5), and the calculated D_1(f) and D_2(f) are applied to equation (8) to obtain the area feature amount E(f). In other words, FIG. 4(a) shows the area feature amount E(f) obtained by integrating, according to equation (8), the arrival direction feature amounts D_1(f) and D_2(f) shown in FIGS. 2(a) and 3(a).

In the graph of FIG. 4(a), the vertical position (vertical axis of the graph) is the distance from the microphone array MA_1, the lateral position (horizontal axis of the graph) is the distance from the microphone array MA_2, and the height (height axis of the graph) is the value of the area feature amount E(f). FIG. 4(a) shows the values of E(f) at various vertical and lateral positions for f = 3 kHz.

FIG. 4(b) shows the value of the area feature amount E(f) at each of the positions P51 to P59 marked in FIG. 4(a). As shown in FIG. 4(b), the values of E(f) at P51 through P59 are -0.13, 0.36, -0.13, 0.36, -0.13, 0.36, 0.72, 0.36, and 1, respectively.

As shown in FIGS. 4(a) and 4(b), the area feature amount E(f) takes large values in the front direction of both microphone arrays MA_1 and MA_2 (around the point where both the lateral and vertical positions are 1.5 m).

The target area sound extraction unit 106 calculates the target area emphasized sound Y(f) based on the captured signals X_{1,1}(t) to X_{M,1}(t) and X_{1,2}(t) to X_{M,2}(t) and on the area feature amount E(f). The target area sound extraction unit 106 then supplies (outputs) the obtained target area emphasized sound Y(f) to the next stage.

The target area sound extraction unit 106 may freely choose which captured signal (any of X_{1,1}(t) to X_{M,1}(t) and X_{1,2}(t) to X_{M,2}(t)) the target area sound is extracted (emphasized) from. For example, it may use the first signal X_{1,1}(f) or the captured signal of the microphone closest to the target area. Alternatively, it may apply a delay-and-sum beamformer to the group of captured signals of the microphone array MA closest to the target area, which yields a signal with the target area sound slightly emphasized (referred to as an integrated captured signal). Hereinafter, the selected captured signal or integrated captured signal is referred to as the extraction target signal X′(f).
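The integrated captured signal relies on a delay-and-sum beamformer. The sketch below is a generic frequency-domain version, not the patent's own formulation. It assumes that each microphone n receives the target with a known delay τ_n relative to a reference, so its spectrum is S(f)·e^{-j2πfτ_n}. It also assumes the spectra come from the frequency domain conversion unit as an (N_mics, F) complex array. The function and parameter names are hypothetical.

```python
import numpy as np


def delay_and_sum(X: np.ndarray, freqs: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Frequency-domain delay-and-sum over the microphones of one array.

    X: complex spectra, shape (N_mics, F).
    freqs: bin centre frequencies in Hz, shape (F,).
    delays: arrival delay of the target at each microphone, in seconds, shape (N_mics,).
    Each channel is phase-advanced by its own delay so that the target
    components line up in phase, and the channels are then averaged.
    """
    X = np.asarray(X, dtype=complex)
    # e^{+j 2 pi f tau_n} undoes the propagation phase e^{-j 2 pi f tau_n}.
    steer = np.exp(2j * np.pi * np.outer(delays, freqs))
    return (X * steer).mean(axis=0)


# Illustrative check: a target spectrum S observed with a 0.1 ms offset.
freqs = np.linspace(0.0, 8000.0, 5)
S = np.array([1 + 0j, 0.5 - 0.5j, 2j, -1.0, 0.25])
delays = np.array([0.0, 1e-4])
X = S * np.exp(-2j * np.pi * np.outer(delays, freqs))
X_prime = delay_and_sum(X, freqs, delays)
```

If the target lies in the array's broadside direction, both delays are zero and the beamformer simply averages the two channels. The target adds coherently, while sound from other directions partially cancels, which is the "slight" emphasis the text describes.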

In the target area sound extraction unit 106, the target area sound is extracted (emphasized) by attenuating those frequency components of the extraction target signal X′(f) that do not belong to the target area sound. Because the area feature amount E(f) becomes larger the closer the source is to the target area, the target area sound extraction unit 106 can extract (emphasize) the target area sound by attenuating X′(f) according to the magnitude of E(f). For example, as in equation (9), a threshold F(f) is determined in advance for each frequency component. Whenever E(f) is smaller than F(f), the corresponding frequency component of X′(f) is attenuated (for example, set to zero). This yields a target area emphasized sound Y(f) in which only the frequency components of the target area sound remain.
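Equation (9) itself is not reproduced in this text, so the following sketch follows only the prose description. It keeps a frequency bin when E(f) reaches the threshold F(f) and otherwise scales it by a factor `floor`, where `floor = 0` zeroes the bin. Treating E(f) = F(f) as "keep" and the `floor` parameter are assumptions made for illustration.

```python
import numpy as np


def extract_target_area(X_prime, E, F, floor=0.0):
    """Binary-mask extraction in the spirit of equation (9) (hypothetical form).

    X_prime: complex spectrum of the extraction target signal, shape (F_bins,).
    E: area feature amount per bin, shape (F_bins,).
    F: per-bin threshold, same shape as E (or a scalar).
    floor: gain applied to rejected bins; 0.0 sets them to zero.
    """
    X_prime = np.asarray(X_prime, dtype=complex)
    keep = np.asarray(E, dtype=float) >= np.asarray(F, dtype=float)
    return np.where(keep, X_prime, floor * X_prime)


Y = extract_target_area([1 + 1j, 2, 3], E=[0.5, 0.1, 0.9], F=[0.3, 0.3, 0.3])
```

A small non-zero `floor` is one common way to soften binary masking, but the source only specifies attenuation "for example, to zero".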

The threshold F(f) may be a constant value independent of frequency. In that case, however, the extent of the extracted (emphasized) area varies with frequency, because under a fixed threshold the target area is wider at lower frequencies and narrower at higher frequencies. The target area sound extraction unit 106 can therefore keep the target area (the range in which frequency components are not attenuated) at a fixed extent regardless of frequency by defining a separate threshold F(f) for each frequency component, for example as in equation (10). In equation (10), φ (phi) is the width (angle) of the target area as seen from each microphone array.
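Equation (10) and the feature definition in equation (5) are not reproduced in this section, so the sketch below is hypothetical. It assumes that a two-microphone array's feature behaves like D(f, θ) = cos(2πfd·sinθ/c), where d is the microphone spacing, θ is the arrival angle from the front (broadside) direction, and c = 340 m/s. Under that model, setting F(f) = D(f, θ_b) at a chosen boundary angle θ_b makes the pass region |θ| ≤ θ_b the same at every frequency within the main lobe. Whether θ_b should be φ or φ/2 depends on how equation (10) defines φ, which this text does not state. All names and values here are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 340.0  # m/s, assumed value


def frequency_dependent_threshold(freqs, mic_spacing, boundary_angle, c=SPEED_OF_SOUND):
    """Hypothetical per-bin threshold F(f) under an assumed cosine feature model.

    Assumes D(f, theta) = cos(2*pi*f*d*sin(theta)/c). Choosing
    F(f) = D(f, boundary_angle) keeps the accepted angular range fixed
    across frequency (inside the main lobe).
    """
    freqs = np.asarray(freqs, dtype=float)
    phase = 2.0 * np.pi * freqs * mic_spacing * np.sin(boundary_angle) / c
    # Past pi the cosine rises again (spatial aliasing); clip so F(f) stays monotone.
    phase = np.clip(phase, 0.0, np.pi)
    return np.cos(phase)


freqs = np.linspace(0.0, 8000.0, 257)
F = frequency_dependent_threshold(freqs, mic_spacing=0.03, boundary_angle=np.pi / 10)
```

The threshold falls as frequency rises. This compensates for the feature dropping off faster with angle at high frequencies, which is the effect the text says a constant threshold would suffer from.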

Like FIGS. 2 to 4, FIG. 5 uses the arrangement of the microphone arrays MA_1 and MA_2 shown in FIG. 6. It shows the range determined to be the target area when φ = π/10.

In FIG. 5, the black-filled region is the range determined, based on the threshold, not to be the target area. The remaining (unfilled) region is the range determined, based on the threshold, to be the target area.

As shown in FIG. 5, a range of approximately 1 to 2 m in both the vertical and lateral directions is determined, based on the threshold, to be the target area.
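As a back-of-envelope check on the 1 to 2 m figure, the arithmetic below rests on three assumptions that the text does not state. First, the target point (1.5 m, 1.5 m) is 1.5 m from each array. Second, φ = π/10 is the angular offset of the area boundary from each array's front direction. Third, the boundary along a line through the target is roughly 1.5·tan(φ) off-axis. The helper name is hypothetical.

```python
import math


def pass_band_on_axis(distance, boundary_angle, centre):
    """Interval accepted along a line through the target, at the given distance from an array."""
    half_width = distance * math.tan(boundary_angle)
    return centre - half_width, centre + half_width


lo, hi = pass_band_on_axis(distance=1.5, boundary_angle=math.pi / 10, centre=1.5)
print(f"{lo:.2f} m to {hi:.2f} m")  # 1.01 m to 1.99 m
```

Under these assumptions the roughly ±0.49 m band agrees with the range read off FIG. 5. If φ instead denotes the full angular width, the band would be about half as wide.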

(A-3) Effects of the Embodiment
This embodiment provides the following effects.

Because the sound collection device 100 of this embodiment does not perform spectral subtraction, it can emphasize only the target area sound with little distortion, even when the target area is surrounded by sound sources outside the target area.

(B) Other Embodiments
The present invention is not limited to the embodiment described above. Modified embodiments such as those illustrated below are also possible.

(B-1) In the feature amount calculation unit 104, equation (11) or equation (12) may also be used to calculate the arrival direction feature amount D_i(f).

Likewise, in the feature amount integration unit 105, equation (13) or equation (14) may also be used to calculate the area feature amount E(f), that is, to integrate the arrival direction feature amounts D_1(f) to D_M(f).

(B-2) In the target area sound extraction unit 106, the threshold F(f) may be held at a constant value for frequency components below a certain frequency (for example, 250 Hz). For example, if the value of F(f) at 250 Hz is used for all frequencies below 250 Hz, the range judged to be the target area becomes wider for components below 250 Hz. Low-frequency components are then less prone to distortion, and a target area emphasized sound Y(f) with less distortion is obtained.
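The low-frequency hold in (B-2) amounts to replacing F(f) below a cutoff with its value at the cutoff. The sketch below assumes a per-bin threshold array on an increasing frequency grid. If 250 Hz does not fall exactly on a bin, it linearly interpolates the threshold there, which is an assumption. The function name is hypothetical.

```python
import numpy as np


def hold_low_frequency_threshold(freqs, F, f_hold=250.0):
    """Replace F(f) for f < f_hold with F(f_hold), linearly interpolated if needed.

    freqs must be increasing (a requirement of np.interp).
    """
    freqs = np.asarray(freqs, dtype=float)
    F = np.asarray(F, dtype=float)
    F_hold = np.interp(f_hold, freqs, F)
    return np.where(freqs < f_hold, F_hold, F)


F_held = hold_low_frequency_threshold([0.0, 125.0, 250.0, 500.0], [1.0, 0.99, 0.97, 0.9])
```

When F(f) falls as frequency rises, holding it at F(250 Hz) lowers the threshold for the lowest bins. More of those bins then pass the E(f) ≥ F(f) test, which matches the wider low-frequency target range described above.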

The target area sound extraction unit 106 may also prepare two thresholds F_1(f) and F_2(f), calculate an area emphasis gain G(f) from them, and multiply the extraction target signal X′(f) by G(f) to obtain the target area emphasized sound Y(f). For example, the two thresholds F_1(f) and F_2(f) may be calculated according to equation (15) with φ_1 = π/9 and φ_2 = π/11, and the area emphasis gain may then be calculated by equation (16). This makes the attenuation gentler for those frequency components of X′(f) that originate from sound sources near the boundary between the target area and the non-target area, so a target area emphasized sound Y(f) with less distortion is obtained.
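Equations (15) and (16) are not reproduced here, so the gain below is a hypothetical stand-in that shows the idea of a soft transition between two thresholds. G(f) is 0 when E(f) is at or below the lower threshold, 1 when it is at or above the upper one, and linear in between. It assumes that the wider angle φ_1 = π/9 yields the lower threshold, as it does when thresholds decrease with angle, so F_1(f) is passed as `F_low` and F_2(f) as `F_high`. The names and the linear ramp are illustrative choices.

```python
import numpy as np


def area_emphasis_gain(E, F_low, F_high):
    """Hypothetical soft area emphasis gain G(f) with a linear ramp.

    Returns 0 where E <= F_low, 1 where E >= F_high, and a linear ramp in between.
    """
    E = np.asarray(E, dtype=float)
    F_low = np.asarray(F_low, dtype=float)
    F_high = np.asarray(F_high, dtype=float)
    if np.any(F_high <= F_low):
        raise ValueError("F_high must exceed F_low in every bin")
    return np.clip((E - F_low) / (F_high - F_low), 0.0, 1.0)


G = area_emphasis_gain([0.1, 0.5, 0.9], F_low=0.3, F_high=0.7)
X_prime = np.array([1 + 1j, 2.0, 3.0])
Y = G * X_prime  # target area emphasized sound, bin by bin
```

Compared with a single hard threshold, sources just inside or just outside the boundary are scaled partially instead of being switched on or off. This is the gentler attenuation the text attributes to the two-threshold variant.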

DESCRIPTION OF SYMBOLS
100: sound collection device; 102: data input unit; 103: frequency domain conversion unit; 104: feature amount calculation unit; 105: feature amount integration unit; 106: target area sound extraction unit.

Claims

A sound collection device comprising:
feature amount calculation means for calculating, for each of a plurality of microphone arrays each consisting of two microphones, an arrival direction feature amount that changes according to the arrival direction of sound, taking a large value for sound arriving from the direction of a target area and a small value for sound arriving from directions other than the direction of the target area;
feature amount integration means for obtaining, for each frequency component, an area feature amount by integrating the arrival direction feature amounts of the respective microphone arrays; and
target area sound extraction means for extracting a target area sound, using the area feature amount, from a signal based on captured signals output by the microphone arrays.

The sound collection device according to claim 1, wherein the feature amount integration means obtains, for each frequency component, an area feature amount that takes a large value when all of the arrival direction feature amounts are large and a small value when any of the arrival direction feature amounts is small.

The sound collection device according to claim 2, wherein the feature amount integration means obtains, for each frequency component, the minimum of the arrival direction feature amounts of all the microphone arrays as the area feature amount.

The sound collection device according to any one of claims 1 to 3, wherein the target area sound extraction means extracts the target area sound from the signal based on the captured signals according to the magnitude of the area feature amount.

The sound collection device according to claim 4, wherein the target area sound extraction means holds a threshold for each frequency component in advance, and extracts the target area sound by attenuating, in the signal based on the captured signals, frequency components whose area feature amount is smaller than the threshold.

The sound collection device according to claim 5, wherein the threshold for each frequency component is set so that the range in which frequency components are not attenuated is constant regardless of frequency.

A sound collection program causing a computer to function as:
feature amount calculation means for calculating, for each of a plurality of microphone arrays each consisting of two microphones, an arrival direction feature amount that changes according to the arrival direction of sound, taking a large value for sound arriving from the direction of a target area and a small value for sound arriving from directions other than the direction of the target area;
feature amount integration means for obtaining, for each frequency component, an area feature amount by integrating the arrival direction feature amounts of the respective microphone arrays; and
target area sound extraction means for extracting a target area sound, using the area feature amount, from a signal based on captured signals output by the microphone arrays.

A sound collection method performed by a device comprising feature amount calculation means, feature amount integration means, and target area sound extraction means, wherein:
the feature amount calculation means calculates, for each of a plurality of microphone arrays each consisting of two microphones, an arrival direction feature amount that changes according to the arrival direction of sound, taking a large value for sound arriving from the direction of a target area and a small value for sound arriving from directions other than the direction of the target area;
the feature amount integration means obtains, for each frequency component, an area feature amount by integrating the arrival direction feature amounts of the respective microphone arrays; and
the target area sound extraction means extracts a target area sound, using the area feature amount, from a signal based on captured signals output by the microphone arrays.