JP2011072043A - Sound collecting device

Info

Publication number: JP2011072043A
Application number: JP2011003075A
Authority: JP
Inventors: Makoto Tanaka; 田中　　良
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2011-01-11
Filing date: 2011-01-11
Publication date: 2011-04-07
Anticipated expiration: 2025-06-29
Also published as: JP5273162B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound collecting device that detects the direction of a sound source, achieves sound collection directivity in an arbitrary direction toward that source, and continues collecting sound while tracking the source when it moves. SOLUTION: A voice signal processing section 2 forms sound-collecting beams Xc1 to Xc4, one for each partial region, from the time-delayed signals output by digital filters 21A to 21M. The voice signal processing section 2 selects the beam with the greatest signal strength among the sound-collecting beams Xc1 to Xc4 and thereby detects the partial region in which the sound source is present. When the selected beam changes as the sound source moves, the voice signal processing section 2 fades out the synthesized voice signal forming the pre-movement beam while fading in the synthesized voice signal forming the post-movement beam. COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a sound collecting device including a microphone array, and more particularly to a sound collecting device that detects the direction of a sound source from received audio signals and increases reception directivity in that direction.

Various devices and systems have been disclosed that use a plurality of microphones to collect sound from a sound source, such as a speaker, while distinguishing it from noise and other sounds.

Patent Document 1 discloses a voice-operated switching device that determines which of a plurality of directional microphones is receiving the strongest audio signal and outputs the audio signal received by that microphone with a heavier weighting than the audio signals received by the other microphones.

Patent Document 2 discloses an orthogonal circular microphone array system and a three-dimensional sound source direction detection method using it. The sound source direction is detected from audio signals received by array microphones arranged in the longitude direction and array microphones arranged in the latitude direction, and a separate superdirective microphone is mechanically pointed in the detected direction.

Patent Document 1: JP-A-8-70494
Patent Document 2: JP 2003-304589 A

However, because the device described in Patent Document 1 uses directional microphones, it can collect sound only from specific directions and cannot achieve sound collection directivity in an arbitrary direction. It therefore cannot track the sound source direction when the sound source moves. In the device described in Patent Document 2, achieving sound collection directivity in an arbitrary direction requires mechanically rotating the superdirective microphone toward the target direction. Furthermore, to detect the target direction, the device of Patent Document 2 has to use array microphones made up of three-dimensionally arranged microphones and perform complicated audio signal processing.

Accordingly, an object of the present invention is to provide a sound collecting device that, using a relatively simple structure and relatively simple audio signal processing, detects the sound source direction, achieves sound collection directivity in an arbitrary direction toward the sound source, and collects sound while tracking the sound source when it moves.

The sound collecting device of the present invention comprises: a microphone array configured by arranging a plurality of microphones in a predetermined pattern; a beam forming unit that forms a sound collecting beam with strong directivity in a specific direction by delaying the audio signals collected by the microphones by respective predetermined delay times and combining them, the beam forming unit forming sound collecting beams in a plurality of directions simultaneously by setting independent delay times in parallel; and a signal selection unit that selects, from among the plurality of sound collecting beams formed by the beam forming unit, the audio signal collected by the sound collecting beam with the highest signal level and outputs it to a subsequent stage. The beam forming unit forms the sound collecting beams at every predetermined timing. When the signal selection unit selects the audio signal using different sound collecting beams at successive timings, it cross-fades the synthesized audio signal based on the audio collected at the earlier timing with the synthesized audio signal based on the audio collected at the later timing and outputs the result.

In this configuration, when a sound source such as a speaker is present at some position in the region in front of the microphone array and emits sound, each microphone of the array collects that sound. The sound reaches each microphone after a propagation time that depends on the distance between that microphone and the sound source, so the same sound is collected at a different time by each microphone.

FIG. 9(A) shows the relationship between the positions of the sound sources and the microphones and the delays with which sound from a source is collected by each microphone. FIGS. 9(B) and 9(C) show the concept of forming delay correction amounts based on the delays of the collected audio signals. In this description, a microphone array in which the microphones are arranged in a straight line is used as the predetermined arrangement pattern.

Specifically, when the sound source 100A is at the position shown in FIG. 9(A), sound from the sound source 100A is collected first by the nearest microphone 10A. It is then collected by the microphones 10B to 10I in order of their distance from the sound source 100A, and last by the farthest microphone 10J. Conversely, when the sound source 100B is at the position shown in FIG. 9(A), sound from the sound source 100B is collected first by the nearest microphone 10J, then by the microphones 10I to 10B in order of their distance from the sound source 100B, and last by the farthest microphone 10A. In this way, sound from a source is collected by each microphone with a delay that depends on the distance between the source and that microphone.

The beam forming unit uses this relationship to form a sound collecting beam. For the sound source 100A, for example, the audio signals collected by the microphones 10A to 10J are delayed as shown in FIG. 9(B); that is, delay correction amounts corresponding to the delay times are set so as to compensate for the delays shown in FIG. 9(A). For the sound source 100B, the audio signals collected by the microphones 10A to 10J are delayed as shown in FIG. 9(C); again, the delay correction amounts are set so as to compensate for the delays shown in FIG. 9(A).

Combining the audio signals delayed in this way forms a sound collecting beam. With the correction shown in FIG. 9(B), the beam has strong directivity in the direction of the sound source 100A; with the correction shown in FIG. 9(C), the beam has strong directivity in the direction of the sound source 100B.
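As a minimal illustration of the delay correction concept of FIG. 9, the sketch below computes the arrival delay at each microphone of a linear array and the correction that aligns all channels. The function names, the 5 cm spacing, the source position, and the speed of sound are assumptions chosen for illustration and are not taken from the disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def arrival_delays(mic_x, source, c=SPEED_OF_SOUND):
    """Propagation time in seconds from the source to each microphone.

    Microphones lie on the x-axis at the positions in mic_x (metres);
    source is an (x, y) point in front of the array.
    """
    sx, sy = source
    return [math.hypot(sx - x, sy) / c for x in mic_x]


def delay_corrections(mic_x, source, c=SPEED_OF_SOUND):
    """Extra delay per microphone that aligns every channel with the
    latest arrival, so that summing the channels reinforces the source."""
    delays = arrival_delays(mic_x, source, c)
    latest = max(delays)
    return [latest - d for d in delays]


# Hypothetical geometry: ten microphones (10A to 10J) spaced 5 cm apart,
# with the source nearer the 10A end, as for source 100A in FIG. 9(A).
mic_x = [0.05 * i for i in range(10)]
corrections = delay_corrections(mic_x, source=(-0.5, 1.0))
```

In this geometry the nearest microphone receives the largest correction and the farthest microphone receives none, which mirrors the shape of the correction in FIG. 9(B).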

The beam forming unit forms sound collecting beams in a plurality of directions by combining the delayed versions of the audio signals collected by the microphones in a different combination for each direction.

The signal selection unit compares the signal strengths of the plurality of sound collecting beams formed by the beam forming unit and selects the audio signal collected by the beam with the highest signal level. Alternatively, the signal selection unit compares the signal strengths of the beams and selects the audio signals collected by the beam with the highest signal level and by the beams adjacent to it. The direction in which the beam with the highest signal level is directed corresponds to the sound source direction. The sound source direction is thereby detected, and strong sound collection directivity toward it is achieved.
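The selection rule can be sketched as follows. The mean-square level measure and the function names are assumptions; the text only requires that beam signal levels be compared, with the adjacent beams optionally included.

```python
def beam_level(samples):
    """Mean-square level of one beam (one possible measure of signal level)."""
    return sum(s * s for s in samples) / len(samples)


def select_beams(beams, include_adjacent=False):
    """Return the index of the strongest beam, optionally with its neighbours.

    beams is a list of sample lists ordered by beam direction, so that
    neighbouring indices correspond to adjacent sound collecting beams.
    """
    levels = [beam_level(b) for b in beams]
    best = max(range(len(beams)), key=levels.__getitem__)
    if not include_adjacent:
        return [best]
    return [i for i in (best - 1, best, best + 1) if 0 <= i < len(beams)]
```

For four beams in which the third is strongest, `select_beams(beams)` gives `[2]` and `select_beams(beams, include_adjacent=True)` gives `[1, 2, 3]`.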

In this configuration, the beam forming unit forms the sound collecting beams at every preset timing, and the signal selection unit compares the signal strengths at each such timing and switches the selected beam, that is, the selected audio signal. Optimal sound collection directivity is thereby achieved at each timing.

In this configuration, the selected sound collecting beam is switched as the sound source moves. At that time, the synthesized audio signal forming the pre-movement beam is faded out while the synthesized audio signal forming the post-movement beam is faded in. This smooths the change in sound collection directivity that accompanies the movement of the sound source, and with it the switching of the output synthesized audio signal, so the transition sounds smooth to a listener.
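A cross-fade of this kind might look like the sketch below. The linear gain ramp and the function name are assumptions; the text does not specify the shape or length of the fade.

```python
def crossfade(old_beam, new_beam):
    """Fade out old_beam while fading in new_beam over one block of samples.

    A linear ramp is assumed: the output starts as the old beam signal
    and ends as the new beam signal.
    """
    n = len(old_beam)
    out = []
    for i, (a, b) in enumerate(zip(old_beam, new_beam)):
        g = i / (n - 1) if n > 1 else 1.0  # fade-in gain from 0 to 1
        out.append((1.0 - g) * a + g * b)
    return out
```

Because the two gains always sum to one, a signal that is identical in both beams passes through unchanged, and only the difference between the beams is blended.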

According to the present invention, the sound source direction can be detected, and strong sound collection directivity can be given in that direction, using only a simple structure and simple signal processing: time delay processing and combination processing of the audio signals collected by a microphone array. Noise other than the sound from the sound source is thereby suppressed, and the sound from the source can be collected and output clearly.

Moreover, according to the present invention, the sound source direction can be tracked with a simple structure and simple processing even when the sound source moves. Sound from a moving source can therefore continue to be collected and output clearly.

FIG. 1 is a block diagram showing the configuration of the main part of a sound collecting device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the schematic configuration of the audio signal processing unit 2 (first half).
FIG. 3 is a block diagram showing the schematic configuration of the audio signal processing unit 2 (second half).
FIG. 4 is a block diagram showing a partial configuration of the audio signal processing unit 2 for the case where a plurality of sound collecting beams are selected.
FIG. 5 is a conceptual diagram of the sound source direction detection method (detection of the partial region).
FIG. 6 is a conceptual diagram of the sound source direction detection method (detection of the local region).
FIG. 7 is a flowchart showing the processing flow of the audio signal processing unit 2.
FIG. 8 is a configuration diagram of a microphone array in which the microphones are arranged in a circle.
FIG. 9 shows the relationship between the positions of the sound sources and the microphones and the delays with which sound from a source is collected by each microphone, together with the concept of forming delay correction amounts based on the delays of the collected audio signals.

A sound collecting device according to an embodiment of the present invention will be described with reference to FIGS. 1 to 8.
The following description covers a configuration that detects the sound source 100 by dividing the entire sound source detection region into four partial regions and further dividing each partial region into four local regions. The number of divisions may be set as appropriate according to the device specifications and the desired resolution of the sound source detection direction. The microphone array 1 is described as one in which the microphones 10A to 10M are arranged in a straight line.

FIG. 1 is a block diagram showing the configuration of the main part of the sound collecting device of this embodiment.
FIGS. 2 and 3 are block diagrams showing the configuration of the audio signal processing unit 2 of this embodiment. FIG. 2 shows the portion that forms and selects the sound collecting beams, and FIG. 3 shows the portion that selects and outputs the sound collecting beam for the target local region. FIG. 2 shows a detailed block diagram only for the digital filter 21A among the digital filters 21A to 21M, and only the digital filter 21A is described in detail; the other digital filters 21B to 21M have the same structure.

FIG. 4 is a partial block diagram showing another configuration of the audio signal processing unit 2 (the portion corresponding to FIG. 3).

FIGS. 5 and 6 are conceptual diagrams of the sound source direction detection method. FIG. 5 shows the concept of detection by the portion from the digital filters 21A to 21M to the level determination unit 27 shown in FIG. 2. FIG. 6 shows the concept of detection by the portion from the digital filters 21A to 21M shown in FIG. 2 to the BPF 28 and the level determination unit 29 shown in FIG. 3.

FIG. 7 is a flowchart showing the processing flow of the audio signal processing unit 2.

The sound collecting device of this embodiment includes a microphone array 1 having a plurality of microphones 10A to 10M arranged in a straight line in a predetermined direction. The microphones 10A to 10M are not directional microphones with directivity in a specific direction; all of them are identical omnidirectional microphones. Each of the microphones 10A to 10M is arranged so that it faces the sound collection region. The number of microphones in the microphone array 1 and their spacing are set as appropriate according to the size of the sound collection region, the detection accuracy (azimuth resolution) of the sound source direction described later, the frequency band of the sound to be collected, and similar factors.

The microphones 10A to 10M of the microphone array 1 are connected to the digital filters 21A to 21M of the audio signal processing unit 2, respectively, via A/D converters (not shown).

When sound is emitted from the sound source, the microphones 10A to 10M collect it, convert it into electrical audio signals, and output them. The audio signals output from the microphones 10A to 10M are sampled by the A/D converters at a predetermined sampling period. The resulting digital audio signals x1(n), x2(n) to xm(n) are input to the digital filters 21A, 21B to 21M, respectively (S1).

The digital filter 21A includes a delay buffer memory 22A having a predetermined number of stages corresponding to a preset sampling period. The sampling period, which corresponds to the delay amount of each stage of the delay buffer memory 22A, is set from the arrangement of the microphones 10A to 10M in the microphone array 1 and the distance from the region in which sound source detection is performed to the microphone array 1. The delay buffer memory 22A has eight outputs, which are input to the FIR filters 231A to 238A, respectively.

In step with the sampling period, the delay buffer memory 22A stores in its stages versions of the input audio signal x1(n) that have been delayed by different amounts. It first outputs the stored delayed signals to the FIR filters 231A to 234A (S2). It then outputs stored delayed signals to the FIR filters 235A to 238A according to the detection result of the level determination unit 27. The delay buffer memory 22A can be implemented with a shift register, for example.
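A shift-register style delay buffer with several read taps can be sketched as follows. The class and method names are hypothetical, and the buffer length is chosen only for illustration.

```python
from collections import deque


class DelayBuffer:
    """Shift-register delay line: each stage holds the input delayed by
    one more sampling period than the previous stage."""

    def __init__(self, length):
        self._buf = deque([0.0] * length, maxlen=length)

    def push(self, sample):
        # The newest sample enters at stage 0; the oldest is discarded.
        self._buf.appendleft(sample)

    def tap(self, delay):
        """Return the sample that was pushed `delay` sampling periods ago."""
        return self._buf[delay]

    def taps(self, delays):
        """Read several stages at once, one per sound collecting beam."""
        return [self.tap(d) for d in delays]
```

Reading four stages with `taps` corresponds to supplying one differently delayed copy of the same input to each of four FIR filters.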

The outputs of the delay buffer memory 22A that are input to the FIR filters 231A to 234A are set in advance. Specifically, the signal output to the FIR filter 231A is the delayed signal used to form the sound collecting beam Xc1 corresponding to the partial region 101, and the signal output to the FIR filter 232A is the delayed signal used to form the sound collecting beam Xc2 corresponding to the partial region 102. Likewise, the signal output to the FIR filter 233A is the delayed signal used to form the sound collecting beam Xc3 corresponding to the partial region 103, and the signal output to the FIR filter 234A is the delayed signal used to form the sound collecting beam Xc4 corresponding to the partial region 104.

The signal output to the FIR filter 235A is the delayed signal used to form the sound collecting beam Xc5 corresponding to the local region 141, one of the regions into which the partial region detected by the level determination unit 27 described later (the partial region 104 in the case of FIGS. 5 and 6) is subdivided. The signal output to the FIR filter 236A is the delayed signal used to form the sound collecting beam Xc6 corresponding to the local region 142. Likewise, the signal output to the FIR filter 237A is the delayed signal used to form the sound collecting beam Xc7 corresponding to the local region 143, and the signal output to the FIR filter 238A is the delayed signal used to form the sound collecting beam Xc8 corresponding to the local region 144. The local regions 141 to 144 are obtained by subdividing whichever of the partial regions 101 to 104 contains the direction of the sound source 100, and they change according to the detection result of the level determination unit 27. The delayed signals used to form the sound collecting beams Xc5 to Xc8 for these local regions are therefore set, and updated, based on the detection result.

These delayed signals are set according to the relationship between the sound collecting beam and the delay processing shown in FIG. 9.

The FIR filters 231A to 238A all have the same configuration. Each applies FIR processing to the delayed signal input to it, and together they output the delayed signals X1S1 to X1S8. FIR processing in the FIR filters 231A to 238A makes it possible to realize fine delays that fall between sampling periods, which the delay buffer memory 22A alone cannot provide. That is, when the sampling period of the delay buffer memory 22A provides the integer part of a delay time, the fractional part of that delay time can be realized by setting the internal sampling period and the number of taps of the FIR filter to the desired values. The digital filter 21A can therefore apply a finer delay to the input audio signal x1(n) than a delay that is simply a multiple of the sampling period, and generate the delayed signals X1S1 to X1S8.
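The split of a delay into an integer part and a fractional part can be illustrated with a windowed-sinc fractional-delay FIR filter. This is a commonly used technique substituted here for illustration; the text does not state how the coefficients of the FIR filters 231A to 238A are designed, and the function names and the 21-tap length are assumptions.

```python
import math


def split_delay(delay_samples):
    """Split a delay into whole sampling periods (handled by the delay
    buffer) and a fractional remainder (handled by the FIR filter)."""
    whole = int(delay_samples)
    return whole, delay_samples - whole


def fractional_delay_taps(frac, num_taps=21):
    """Hann-windowed sinc coefficients for a fractional delay.

    The filter delays its input by (num_taps - 1) / 2 + frac samples;
    the constant (num_taps - 1) / 2 part is the same for every channel.
    num_taps is assumed to be odd and at least 3.
    """
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - center - frac
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalise to unity gain at DC
```

For example, a delay of 3.25 sampling periods would be split into three buffer stages plus a filter designed with `fractional_delay_taps(0.25)`.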

The delayed signals X1S1 to X1S8 output from the FIR filters 231A to 238A are amplified by the amplifiers 241A to 248A, respectively, and input to the adders 25A to 25H, respectively.

The other digital filters 21B to 21M have the same structure as the digital filter 21A. According to the delayed-signal selection conditions preset for each of them, they generate, from the input audio signals x2(n) to xm(n), the delayed signals X2S1 to X2S8, ..., XMS1 to XMS8 used to form the sound collecting beams Xc1 to Xc8, and output them to the adders 25A to 25H.

The adder 25A combines the delayed signals X1S1, X2S1, ..., XMS1 input from the digital filters 21A to 21M to generate the sound collecting beam Xc1, and the adder 25B combines X1S2, X2S2, ..., XMS2 to generate the sound collecting beam Xc2. The adder 25C combines the delayed signals X1S3, X2S3, ..., XMS3 input from the digital filters 21A to 21M to generate the sound collecting beam Xc3, and the adder 25D combines X1S4, X2S4, ..., XMS4 to generate the sound collecting beam Xc4 (S3).
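The role of each adder can be sketched as a delay-and-sum over the channels. The sketch uses whole-sample delays only and omits the FIR and amplifier stages; the function name and the three-channel example are assumptions.

```python
def delay_and_sum(channels, delays):
    """Delay each channel by its own number of samples and add the results.

    channels is a list of equal-length sample lists (one per microphone);
    delays gives the whole-sample delay applied to each channel.
    """
    n = len(channels[0])
    beam = [0.0] * n
    for x, d in zip(channels, delays):
        for i in range(d, n):
            beam[i] += x[i - d]
    return beam


# An impulse that reaches three microphones at samples 0, 1 and 2.
channels = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
aligned = delay_and_sum(channels, [2, 1, 0])     # delays matched to the source
misaligned = delay_and_sum(channels, [0, 1, 2])  # delays for another direction
```

With matching delays the three impulses add at one sample, whereas with mismatched delays they stay spread out; this difference in level is what separates a beam pointing at the source from the others.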

The band-pass filter BPF 26 filters the input sound collecting beams Xc1 to Xc4 and outputs them to the level determination unit 27. Because the frequency band in which a beam is formed depends on the width of the microphone array 1 (its length in the arrangement direction) and on the spacing of the microphones 10A to 10M, the pass band of the BPF 26 is set to the frequency band of the sound to be collected by the sound collecting beams Xc1 to Xc4.
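The dependence of the usable band on the array geometry can be estimated roughly as below. The upper limit is the usual half-wavelength spacing condition for avoiding spatial aliasing; the lower limit, where the wavelength equals the array length, is only a rule of thumb assumed here. Neither formula appears in the text, and the function name and dimensions are hypothetical.

```python
def beam_band(spacing_m, aperture_m, c=343.0):
    """Rough frequency band (Hz) in which a linear array forms a useful beam.

    c is an assumed speed of sound in m/s.
    """
    f_low = c / aperture_m           # wavelength comparable to the array length
    f_high = c / (2.0 * spacing_m)   # microphone spacing equal to half a wavelength
    return f_low, f_high


# Hypothetical array: 13 microphones spaced 5 cm apart (60 cm long).
f_low, f_high = beam_band(spacing_m=0.05, aperture_m=0.60)
```

For this hypothetical array the estimate is roughly 570 Hz to 3.4 kHz, which indicates why a pass band matched to the array would be chosen before comparing beam levels.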

The level determination unit 27 compares the signal strengths of the sound collecting beams Xc1 to Xc4 and selects the beam with the highest signal strength (S4). A beam having the highest signal strength means that the sound source is present in the region from which that beam collects sound. This makes it possible to detect, among the four partial regions into which the entire sound source detection region is divided, the partial region in which the sound source is present. In the example of FIG. 5, the signal strength of the sound collecting beam Xc4 corresponding to the partial region 104 is higher than that of the other beams Xc1 to Xc3, so the level determination unit 27 selects the beam Xc4 and detects that the sound source 100 is present in the partial region 104.

The level determination unit 27 generates a selection signal corresponding to the local regions obtained by subdividing the detected partial region and outputs it to the digital filters 21A to 21M. In the example of FIGS. 5 and 6, the level determination unit 27 selects the delayed signals to be output by the digital filters 21A to 21M in order to form the sound collecting beams Xc5 to Xc8 for the local regions 141 to 144, which divide the partial region 104 into four in the azimuth direction. It then outputs a selection signal containing this selection to each of the digital filters 21A to 21M.
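The two-stage search, first over partial regions and then over the local regions of the strongest one, can be sketched as follows. Representing regions as azimuth ranges in degrees, the 120 degree detection range, and the stand-in `level_of` function are all assumptions for illustration.

```python
def subdivide(region, parts=4):
    """Split an azimuth range (lo, hi) in degrees into equal sub-ranges."""
    lo, hi = region
    step = (hi - lo) / parts
    return [(lo + k * step, lo + (k + 1) * step) for k in range(parts)]


def coarse_to_fine(full_region, level_of, parts=4):
    """Pick the strongest partial region, then the strongest local region
    inside it. level_of(region) returns the level of the beam formed
    toward that region."""
    partial = max(subdivide(full_region, parts), key=level_of)
    local = max(subdivide(partial, parts), key=level_of)
    return partial, local


def level_of(region, source_azimuth=50.0):
    """Stand-in for a measured beam level: higher when the centre of the
    region is closer to a hypothetical source azimuth."""
    return -abs((region[0] + region[1]) / 2.0 - source_azimuth)


partial, local = coarse_to_fine((-60.0, 60.0), level_of)
```

Only eight beams are evaluated (four partial plus four local) to reach a resolution of one sixteenth of the detection range, compared with sixteen beams if every local region were searched directly.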

According to this selection signal, each of the digital filters 21A to 21M selects and sets the delayed signals to be output from its delay buffer memory toward the adders 25E to 25H. For example, the digital filter 21A outputs the selected delayed signals from the delay buffer memory 22A to the FIR filters 235A to 238A according to the selection signal input from the level determination unit 27. The FIR filters 235A to 238A apply FIR processing to the delayed signals input to them and output the results to the adders 25E to 25H via the amplifiers 245A to 248A (S5).

The adder 25E combines the delayed signals X1S5, X2S5, ..., XMS5 input from the digital filters 21A to 21M to generate the sound collecting beam Xc5, and the adder 25F combines X1S6, X2S6, ..., XMS6 to generate the sound collecting beam Xc6. The adder 25G combines the delayed signals X1S7, X2S7, ..., XMS7 input from the digital filters 21A to 21M to generate the sound collecting beam Xc7, and the adder 25H combines X1S8, X2S8, ..., XMS8 to generate the sound collecting beam Xc8 (S6).

The sound collecting beams Xc5 to Xc8 output from the adders 25E to 25H are each split into two paths: one is input to the band-pass filter BPF 28 and the other is input to the selector 30.

The band-pass filter BPF 28 filters the input sound collection beams Xc5 to Xc8 and outputs them to the level determination unit 29. The BPF 28 exploits the fact that the frequency band that can be formed into a beam depends on the width of the microphone array 1 (its length in the array direction) and on the spacing of the microphones 10A to 10M; its pass band is set to the frequency band corresponding to the sound to be collected by the sound collection beams Xc5 to Xc8.

The level determination unit 29 compares the signal intensities of the sound collection beams Xc5 to Xc8 and selects the beam with the highest signal intensity (S7). The beam with the highest signal intensity is the one whose collection region contains the sound source. This makes it possible to detect which of the four local regions of the partial region contains the sound source. For example, in FIG. 6, the signal intensity of the sound collection beam Xc5, which corresponds to the local region 141 of the partial region 104, is higher than that of the other sound collection beams Xc6 to Xc8, so the level determination unit 29 selects the sound collection beam Xc5 and detects that the sound source 100 is in the local region 141.

The level determination unit 29 generates a selection signal corresponding to the detected local region and outputs it to the selector 30. In accordance with the input selection signal, the selector 30 selects and outputs the sound collection beam corresponding to the detected local region. In the example of FIG. 6, the selector 30 selects the sound collection beam Xc5 corresponding to the local region 141 and outputs the audio signal formed by this beam (S8).
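Steps S7 and S8 amount to picking the beam with the largest signal intensity. The sketch below assumes mean squared amplitude as the intensity measure, which the description does not specify. The names `beam_power` and `select_strongest` and the sample values are hypothetical.

```python
def beam_power(beam):
    """Mean squared amplitude of one beam (an assumed measure of signal intensity)."""
    return sum(x * x for x in beam) / len(beam)


def select_strongest(beams):
    """Return the index of the beam with the highest intensity."""
    powers = [beam_power(b) for b in beams]
    return max(range(len(beams)), key=powers.__getitem__)


# Toy buffers standing in for Xc5..Xc8; the first one is the loudest.
xc5 = [0.9, -0.8, 0.7, -0.9]
xc6 = [0.4, -0.3, 0.4, -0.2]
xc7 = [0.1, -0.1, 0.1, -0.1]
xc8 = [0.05, 0.0, -0.05, 0.0]

selected = select_strongest([xc5, xc6, xc7, xc8])  # index 0 corresponds to Xc5
```

In the FIG. 6 example, index 0 would stand for the local region 141, and the selector would pass the buffer at that index to the output.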

With this configuration and processing, only the sound arriving from the sound source direction is collected and output. Noise other than the sound from a sound source such as a person speaking is thereby suppressed, and only the sound from the sound source is collected. The synthesized signal output in this way is emitted externally as sound by the speaker 5.

As described above, the configuration and processing of this embodiment make it possible to detect the sound source direction, increase the sound collection directivity toward the detected direction, suppress noise, and collect only the sound arriving from the sound source direction. Because a microphone array is used and the microphone direction is never moved mechanically, the sound source direction can be detected, and sound can be collected with strong directivity toward it, using a simpler structure and simpler processing than before.

In addition, performing sound source detection in two stages of increasing detail requires less processing than detecting the detailed sound source region in a single pass. For example, if the local region division shown in FIGS. 5 and 6 were processed in one pass, there would be 16 local regions and sound collection beams would have to be formed in 16 directions. With the configuration and processing method of this embodiment, the sound source region can be detected by forming sound collection beams in only eight directions (four for the partial regions and four for the local regions).
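The saving can be checked with simple arithmetic: a single pass needs one beam per finest region (the product of the division counts), whereas staged detection needs only the sum of the division counts. The helper names below are hypothetical and serve only to illustrate this count.

```python
def beams_single_pass(divisions):
    """Beams needed when every finest region is scanned at once (product of divisions)."""
    total = 1
    for d in divisions:
        total *= d
    return total


def beams_staged(divisions):
    """Beams needed when each stage scans only the region chosen by the previous stage."""
    return sum(divisions)


two_stage = [4, 4]  # four partial regions, each split into four local regions
single = beams_single_pass(two_stage)  # 16
staged = beams_staged(two_stage)       # 8
```

The same count extends to three or more stages: for example, three stages of four divisions each would need 12 beams in total instead of 64.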

The description above covers the case in which the level determination unit 29 for the local regions selects a single sound collection beam. However, a plurality of sound collection beams may be selected and processed.

FIG. 4 is a block diagram showing part of the configuration of the audio signal processing unit 2 for the case where a plurality of sound collection beams are selected. The configuration from the digital filters 21A to 21M to the adders 25E to 25H is the same as that shown in FIG. 2, so its illustration and description are omitted; only the part that differs (the part corresponding to FIG. 3) is shown.

The level determination unit 29 compares the signal intensities of the sound collection beams Xc5 to Xc8 and outputs to the selector 30 a selection signal S that selects two beams: the beam with the highest signal intensity and the beam with the second-highest signal intensity. At the same time, the level determination unit 29 outputs the signal intensities of these two sound collection beams to the weight setting unit 31.

In accordance with the input selection signal, the selector 30 selects the two sound collection beams Xca and Xcb corresponding to the two detected local regions. The selector 30 outputs the sound collection beam Xca to the variable amplifier 321 of the signal synthesis unit 32 and the sound collection beam Xcb to the variable amplifier 322 of the signal synthesis unit 32.

The weight setting unit 31 sets weighting coefficients w1 and w2 from the signal intensities of the two sound collection beams. The weights are chosen so that the beam with the higher signal intensity is emphasized. For example, if the ratio of the signal intensities of the two beams is 7:3, the weighting coefficients for the two beams are set to 0.7 and 0.3, respectively, so that the amplitude of the combined signal is normalized.

The signal synthesis unit 32 includes the variable amplifiers 321 and 322 and the adder 323. The variable amplifier 321 amplifies the synthesized signal of the sound collection beam Xca by the given weighting coefficient w1, and the variable amplifier 322 amplifies the synthesized signal of the sound collection beam Xcb by the given weighting coefficient w2. The adder 323 combines the output signals of the variable amplifiers 321 and 322 and outputs the combined signal to the speaker 5.

Taking FIG. 6 as an example, the level determination unit 29 selects the sound collection beams Xc5 and Xc6 on the basis of signal intensity. That is, it detects that the sound source 100 is in the region made up of the local regions 141 and 142. The selector 30 selects the two beams Xc5 and Xc6 from the input sound collection beams Xc5 to Xc8, outputs the synthesized signal of the beam Xc5 to the variable amplifier 321, and outputs the synthesized signal of the beam Xc6 to the variable amplifier 322. The weight setting unit 31 sets the weighting coefficient w1 for the beam Xc5 and the weighting coefficient w2 for the beam Xc6 from the ratio of the amplitude intensities of the two beams, and supplies w1 to the variable amplifier 321 and w2 to the variable amplifier 322. The variable amplifier 321 amplifies the synthesized signal of the beam Xc5 by the weighting coefficient w1, and the variable amplifier 322 amplifies the synthesized signal of the beam Xc6 by the weighting coefficient w2. The adder 323 then combines the synthesized signal of the beam Xc5 weighted by w1 with the synthesized signal of the beam Xc6 weighted by w2, and outputs the result to the speaker 5.
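The weighting and mixing stage can be sketched as follows. This is an illustrative sketch only: it assumes the weights are simply the two intensities divided by their sum, which reproduces the 7:3 example (0.7 and 0.3), and it falls back to equal weights when both intensities are zero. The names `weights_from_intensity` and `mix` and the sample values are hypothetical.

```python
def weights_from_intensity(p_a, p_b):
    """Return (w1, w2) proportional to the two intensities, with w1 + w2 == 1."""
    total = p_a + p_b
    if total == 0:
        return 0.5, 0.5  # assumed fallback when both beams are silent
    return p_a / total, p_b / total


def mix(beam_a, beam_b, w1, w2):
    """Scale the two beams (variable amplifiers 321, 322) and add them (adder 323)."""
    return [w1 * a + w2 * b for a, b in zip(beam_a, beam_b)]


w1, w2 = weights_from_intensity(7.0, 3.0)  # intensity ratio 7:3
xca = [1.0, 1.0, 1.0]
xcb = [0.0, 0.0, 0.0]
out = mix(xca, xcb, w1, w2)
```

Because the two weights sum to one, mixing two identical beams returns the beam unchanged. This is one simple reading of the statement that the amplitude of the combined signal is normalized.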

With this configuration and processing, sound can be collected and output from a predetermined range consisting of the region containing the sound source direction and a region adjacent to it. As a result, even if the sound source is near the boundary between local regions set as described above, the sound from a sound source such as a person speaking is emphasized and other noise is suppressed.

The description above assumes a stationary sound source, but the sound source can be tracked by performing the sound source direction detection and synthesis described above at a predetermined timing cycle. If the sound source moves, the sound source direction detected at each timing changes accordingly. By changing the sound collection directivity in response to this change in the detected direction, the device in effect tracks the sound source while collecting sound. Because this processing involves no mechanical operation, the configuration of the device can be simplified.

At this time, if the detected sound source direction differs between adjacent timings of the timing cycle, the audio signal processing unit 2 may cross-fade the audio signal output earlier and the audio signal output later. That is, the signal intensity of the audio signal being output earlier is gradually attenuated at the output stage while the signal intensity of the audio signal to be output later is gradually increased at the output stage, and the two are combined and output. With this processing, the shift in sound accompanying the movement of the sound source is perceived as smooth, and the listener does not experience any unnaturalness. If the sound source does not move completely from one partial region to another but stops at the boundary between partial regions, the output signal need not be switched over completely as described above. In that case, the ratio of the sound collection beams may be changed gradually after the crossfade starts so that the beams are finally combined at a ratio of 0.5 : 0.5.
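The crossfade can be sketched as a gain ramp applied to the two beam signals. The description only says that the levels change gradually, so the linear ramp here is an assumption. The function name `crossfade` and its parameter `final_new_gain` are hypothetical.

```python
def crossfade(old, new, final_new_gain=1.0):
    """Linearly fade from `old` to `new` over the length of the buffers.

    final_new_gain=1.0 switches over completely to `new`.
    final_new_gain=0.5 ends at the 0.5 : 0.5 mix described for a source
    that stops on a region boundary.
    """
    n = len(old)
    out = []
    for t in range(n):
        # Gain of the new beam rises from 0 to final_new_gain; the old beam gets the rest.
        g = final_new_gain * t / (n - 1) if n > 1 else final_new_gain
        out.append((1.0 - g) * old[t] + g * new[t])
    return out


old_beam = [1.0] * 5   # signal of the beam selected at the earlier timing
new_beam = [0.0] * 5   # signal of the beam selected at the later timing

full_switch = crossfade(old_beam, new_beam)         # fades the old beam out entirely
boundary_mix = crossfade(old_beam, new_beam, 0.5)   # settles at an equal mix
```

Keeping the two gains complementary (they always sum to one) avoids a level jump during the transition. Other ramp shapes, such as an equal-power curve, would fit the description equally well.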

In the description above, the entire region in which the sound source may be present is divided into four partial regions, and the partial region containing the sound source is further divided into four local regions for sound source detection. These division counts can be set freely according to the specifications of the device and the environment in which it is used (for example, the size of the sound source detection region).

In the description above, sound source detection is performed in two stages, but it may be performed in one stage or in three or more stages. Each of the processes described above may be implemented with a combination of physical circuit elements or as software processing on an integrated circuit such as a DSP.

The description above uses a microphone array in which the microphones are arranged in a straight line, but the microphones may instead be arranged in a matrix, or in the arrangements shown in FIG. 8.

FIG. 8(A) shows the configuration of a microphone array in which the microphones are arranged on a single circle (one-dimensional), and FIG. 8(B) shows the configuration of a microphone array in which the microphones are arranged on a plurality of circles (multi-dimensional). In FIG. 8, reference signs are given only to representative microphones and are omitted for the others.
In the microphone array 1 shown in FIG. 8(A), the microphones 10 are arranged at predetermined intervals on one circumference. In the microphone array 1 shown in FIG. 8(B), the microphones 10 are arranged on a plurality of circumferences.
The configuration and processing described above can also be applied to a microphone array 1 in which the microphones 10 are arranged in this way, so that the sound source direction can be detected and sound can be collected from the detected direction.

Furthermore, the microphone arrangements shown here are only examples. The configuration and processing described above can be implemented, and the effects described above obtained, with any microphone array in which the microphones are arranged in a predetermined pattern.

1: microphone array; 10A to 10M: microphones; 2: audio signal processing unit; 21A to 21M: digital filters; 22A: delay buffer memory; 231A to 238A: FIR filters; 241A to 248A: amplifiers; 25A to 25H: adders; 26, 28: BPF; 27, 29: level determination units; 30: selector; 31: weight setting unit; 32: signal synthesis unit; 321, 322: variable amplifiers; 323: adder; 5: speaker; 100: sound source; 101 to 104: partial regions; 141 to 144: local regions

Claims

A sound collecting device comprising:
a microphone array configured by arranging a plurality of microphones in a predetermined pattern;
a beam forming unit that forms a sound collection beam having strong directivity in a specific direction by delaying the audio signals collected by the plurality of microphones by respective predetermined delay times and combining them, the beam forming unit forming sound collection beams in a plurality of directions simultaneously by setting mutually independent delay times in parallel; and
a signal selection unit that selects, from among the plurality of sound collection beams formed by the beam forming unit, the audio signal collected by the sound collection beam having the highest signal level, and outputs it to a subsequent stage,
wherein the beam forming unit forms the sound collection beams at every predetermined timing, and
wherein, when the signal selection unit selects the audio signal to be collected using different sound collection beams at successive timings, it cross-fades the synthesized audio signal based on the audio signal collected at the earlier timing and the synthesized audio signal based on the audio signal collected at the later timing, and outputs the result.