WO2008047804A1

WO2008047804A1 - Voice conference device and voice conference system

Info

Publication number: WO2008047804A1
Application number: PCT/JP2007/070195
Authority: WO
Inventors: Toshiaki Ishibashi; Ryo Tanaka; Satoshi Ukai
Original assignee: Yamaha Corporation
Priority date: 2006-10-17
Filing date: 2007-10-16
Publication date: 2008-04-24
Also published as: CN101513056B; JP2008103824A; JP5028944B2; CN101513056A

Abstract

Provided is a teleconference system that collects the voices of conference participants over a wide range while capturing video of the main speaker. The voice conference device (1) uses a microphone array formed by arranging a plurality of microphones (MIC) to collect sound both over wide ranges and over ranges divided into narrow spots. The voice signals (MB1, MB2) collected over the wide ranges are used to produce the voice signal (MB0) for voice collection. The voice collection direction (DS) is detected from the highest-level signal among the voice signals (MB11 to MB14, MB21 to MB24) collected over the narrow ranges, and the imaging direction of a camera (7) is controlled according to the voice collection direction (DS).

Description

Specification

Audio conference apparatus and audio conference system

Technical field

[0001] The present invention relates to an audio conference apparatus and an audio conference system that detect the direction of a speaker from sound collected by a microphone array composed of a plurality of microphones, and that steer the shooting direction of a camera toward the speaker.

Background art

[0002] Conventionally, a common way to hold a conference between remote locations is to install a conference system with a camera function at each site, connect the systems over a network or the like, and exchange video and audio data. Various audio conference systems have been devised for such conferences.

The conference imaging apparatus of Patent Document 1 detects the position of a speaker from audio signals picked up by directional microphones arranged for each participant, and captures video in the direction of that speaker with a camera.

Patent Document 1: Japanese Patent Laid-Open No. 61-198891

Disclosure of the invention

Problems to be solved by the invention

[0003] However, the invention of Patent Document 1 requires a directional microphone for each participant, so directional microphones must be prepared according to the number of conference participants.

In addition, the same microphone beam serves both for sound collection and for detecting the speaker's position. If the beam covers a wide range, the speaker cannot be identified. If it covers a narrow range, the speaker can be identified, but when two or more people speak at the same time only one of them is picked up.

Means for solving the problem

[0004] The present invention has been made in view of the above circumstances. The audio conference apparatus comprises:

a microphone array having a plurality of microphones arranged in a predetermined pattern;

an area sound collection beam forming unit that forms, based on the plurality of sound signals collected by the microphones of the microphone array, a first sound collection beam for which a first sound collection range around the apparatus is set;

a spot sound collection beam forming unit that forms, based on the plurality of sound signals collected by the microphones of the microphone array, second sound collection beams for which a second sound collection range narrower than the first sound collection range is set; and

a shooting direction detection unit that detects a speaker direction from the plurality of second sound collection beams formed by the spot sound collection beam forming unit and detects that speaker direction as the shooting direction.

[0005] In this configuration, the audio conference apparatus collects sound with a microphone array composed of a plurality of microphones. From the collected sound signals it forms an area sound collection beam covering a wide area and spot sound collection beams covering a plurality of narrow spots. The apparatus generates and outputs audio data based on the area sound collection beam, and controls the shooting direction of the camera based on the spot sound collection beams. As a result, the apparatus can output audio data collected over a wide range while pointing the camera toward the main speaker. Furthermore, when the main speaker changes, the apparatus changes the shooting direction of the camera automatically, so the camera is always directed at the main speaker.

[0006] The spot sound collection beam forming unit forms the sound collection beams using only the high-frequency components of the collected sound signals.

The audio conference apparatus further comprises a communication unit that is connected to another audio conference apparatus via a network and communicates with it, and a control unit that generates audio data based on the first sound collection beam formed by the area sound collection beam forming unit and transmits the audio data to the other audio conference apparatus via the communication unit.

[0007] In this configuration, the sound collection beams used to control the shooting direction of the camera are formed from high-frequency components only, which strengthens their directivity.

Because only the beams used for camera control are given this stronger directivity, the audio conference apparatus can detect the position of the speaker more accurately.

[0008] The audio conference system comprises:

a microphone array having a plurality of microphones arranged in a predetermined pattern;

an area sound collection beam forming unit that forms, based on the plurality of sound signals collected by the microphones of the microphone array, a first sound collection beam for which a first sound collection range around the apparatus is set;

a shooting direction detection unit that detects a speaker direction from the plurality of second sound collection beams formed by the spot sound collection beam forming unit and detects that speaker direction as the shooting direction; and

a shooting unit that shoots in the shooting direction detected by the shooting direction detection unit of the audio conference apparatus and generates video data.

[0009] In this configuration, the audio conference system has an audio conference apparatus and a camera. The audio conference apparatus generates audio data collected over a wide range and controls the camera so that it shoots toward the main speaker. The camera shoots in the direction indicated by the audio conference apparatus and generates captured data.

As a result, the audio conference system can point the camera at the main speaker while collecting sound over a wide range. Furthermore, when the main speaker changes, the system changes the shooting direction of the camera automatically, so the camera always captures the main speaker.

Effect of the invention

[0010] As described above, according to the present invention, the main speaker can be captured on camera while the speech of the conference participants is collected over a wide range.

Brief description of the drawings

[0011] FIG. 1 is an explanatory diagram of an audio conference system that holds an audio conference with a remote location.

FIG. 2 is a three-view drawing of the audio conference apparatus 1 according to the present embodiment.

FIG. 3 is a three-view drawing showing the audio conference apparatus 1 according to the present embodiment.

FIG. 4 is a block diagram showing the functional configuration of the audio conference system according to the present embodiment.

FIG. 5 is an explanatory diagram of the sound collection areas.

FIG. 6 is an explanatory diagram of another way of using the audio conference apparatus according to the present embodiment.

FIG. 7 is a block diagram showing the functional configuration of an audio conference system according to another embodiment.

FIG. 8 is a block diagram of an audio conference system according to another embodiment.

Explanation of symbols

1 Audio conference apparatus

2 Housing

3 Legs

4 Operation unit

5 Light emitting unit

6 Lower grill

7 Camera

8 Display terminal

9 Video communication device

10 Control unit

11 Input/output connector panel

12 Input/output interface

13 Sound emission directivity control unit

14 D/A converter

15 Sound emission amplifier

16 Sound collection amplifier

17 A/D converter

19 Sound collection beam selection unit

20 Echo cancellation unit

21 Adaptive echo canceller

22 Camera control unit

71 Imaging unit

72, 82 Connection terminal units

81 Display unit

91 Input/output interface

92 Video codec

100 Network

110 Input/output connector

181, 182 Sound collection beam generation units

191 Speaker position detection unit

211 Adaptive filter

212 Post-processor

MIC101 to MIC116, MIC201 to MIC216 Microphones

SP1 to SP16 Speakers

Best mode for carrying out the invention

[0013] An audio conference system according to an embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is an explanatory diagram of an audio conference system that holds a video conference with a remote location.

As shown in FIG. 1, the audio conference system of the present invention comprises an audio conference apparatus 1, a camera 7, a display terminal 8, and a video communication device 9. The camera 7 is connected to the audio conference apparatus 1, the video communication device 9 is connected to the camera 7, and the display terminal 8 is connected to the video communication device 9. When an audio conference is held between remote locations, the audio conference apparatus 1 and the video communication device 9 are connected via the network 100 to an audio conference system at the remote location.

[0014] Next, the configurations of the camera 7, the display terminal 8, the video communication device 9, and the audio conference apparatus 1 that make up the audio conference system will be described.

[0015] The camera 7 photographs the conference participants and comprises an imaging unit 71 and a connection terminal unit 72. It receives an input signal (the sound collection direction DS described later) from the audio conference apparatus 1 via the connection terminal unit 72 and rotates the imaging unit 71 vertically and horizontally (for example, about 120 degrees vertically and about 200 degrees horizontally) to shoot in the direction indicated by the audio conference apparatus 1. The camera 7 outputs the captured data to the video communication device 9 via the connection terminal unit 72. The connection terminal unit 72 includes a video output terminal, a multi-connector, a power terminal, and the like.

[0016] The display terminal 8 displays video data received over the network 100 from the video conference system at the remote location, and comprises a display unit 81 and a connection terminal unit 82. It receives an input signal from the video communication device 9 via the connection terminal unit 82 and displays it on the display unit 81. The display terminal 8 is a projector, a liquid crystal display, or the like.

[0017] The video communication device 9 compresses and decompresses video data, performs protocol control, and transmits and receives video data via the network 100. Specifically, the video communication device 9 compresses the captured data input from the camera 7, packetizes it, and outputs it to the network 100. When video data is input from the network 100, the video communication device 9 arranges the packetized video data in time series and outputs it sequentially to form a bitstream, then decompresses it and outputs it to the display terminal 8.
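
The receive path in paragraph [0017] (arrange packets in time series, then output them sequentially as a bitstream) can be sketched as follows. This is a minimal illustration only: the `(sequence_number, payload)` packet layout and the function name `packets_to_bitstream` are assumptions, since the specification does not name a packet format or protocol.

```python
def packets_to_bitstream(packets):
    """Reassemble a bitstream from packets that may arrive out of order.

    packets: iterable of (sequence_number, payload_bytes) tuples.
    The tuple layout is a hypothetical stand-in for the real packet format.
    """
    # Arrange the packets in time series using their sequence numbers.
    ordered = sorted(packets, key=lambda packet: packet[0])
    # Output the payloads sequentially as one contiguous bitstream.
    return b"".join(payload for _, payload in ordered)
```

The resulting byte string would then be handed to the decompression stage (the video codec 92 in the symbol list) before display.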

[0018] Next, the configuration of the audio conference apparatus 1 will be described with reference to FIGS. 2 and 3. The audio conference apparatus 1 according to the present embodiment uses a microphone array composed of a plurality of microphones arranged in a straight line. Sound collection directivity is formed by delaying the sound collected by each microphone and then combining the delayed signals. The directivity formed in this way is called a sound collection beam. There are two kinds of beam setting: a narrow-range setting in which the beam is aimed at a specific sound collection spot, and a wide-range setting in which sound generated within an area of some extent (for example, the area on each side of the audio conference apparatus 1, a speech area) is collected with high gain while sound (noise) generated in other areas is suppressed.
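
The delay-and-combine step in paragraph [0018] corresponds to what is commonly called delay-and-sum beamforming. The sketch below is one possible reading, not the apparatus's actual implementation: the speed of sound, the integer-sample delays, the averaging step, and the function names `steering_delays` and `delay_and_sum` are all assumptions introduced for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_x, focus, fs):
    """Per-microphone delays (in samples) that align a wavefront from `focus`.

    mic_x: x-coordinates (m) of microphones on a straight line at y = 0.
    focus: (x, y) position (m) of the sound collection spot.
    fs:    sample rate in Hz.
    """
    dists = [math.hypot(focus[0] - x, focus[1]) for x in mic_x]
    farthest = max(dists)
    # The nearest microphone hears the sound first, so it is delayed the most.
    return [round((farthest - d) / SPEED_OF_SOUND * fs) for d in dists]


def delay_and_sum(signals, delays):
    """Delay each channel by its sample count, then average across channels."""
    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        for i in range(d, n):
            out[i] += sig[i - d]
    return [v / len(signals) for v in out]
```

Sound arriving from the chosen spot adds coherently after the delays, while sound from other positions is spread over several samples and attenuated. Calling `steering_delays` with several different `focus` points on the same microphone signals would give several simultaneous spot beams.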

[0019] FIG. 2 is a three-view drawing of the audio conference apparatus. FIG. 2(A) is a plan view, FIG. 2(B) is a front view, and FIG. 2(C) is a right side view. FIG. 3 shows the speaker arrangement and microphone arrangement of the audio conference apparatus shown in FIG. 2: FIG. 3(A) shows the microphone arrangement on the front, FIG. 3(B) shows the speaker arrangement on the bottom, and FIG. 3(C) shows the microphone arrangement on the back.

In the following description, the surface shown in FIG. 2(B) is called the front, and the top, bottom, left, and right of the apparatus are defined with reference to this figure.

[0020] The audio conference apparatus 1 has an external appearance consisting of a housing 2 and legs 3. The housing 2 includes an operation unit 4, a light emitting unit 5, and an input/output connector panel 11. The housing 2 is a substantially rectangular parallelepiped that is long in the left-right direction, and legs 3 that raise the bottom surface of the housing 2 a predetermined distance above the installation surface are provided at its left and right ends.

[0021] An operation unit 4 having operation buttons such as a numeric keypad and a display screen is provided at the right end of the upper surface of the housing 2. The operation unit 4 is connected to a control unit 10 installed in the housing 2. The operation unit 4 accepts operation input from participants and outputs it to the control unit 10, and, under the control of the control unit 10, shows the operation content, the execution mode, and the like on the display screen.

[0022] A light emitting unit 5, consisting of light emitting elements such as LEDs arranged radially around the approximate center of the housing 2, is installed at the center of the upper surface of the housing 2. The light emitting unit 5 emits light under the light emission control of the control unit 10. The control unit 10 inputs to the light emitting unit 5 a light emission control signal that lights the LED corresponding to the sound collection direction.

[0023] An input/output connector panel 11 having a LAN interface, an analog audio input terminal, an analog audio output terminal, a digital audio input/output terminal, a serial terminal, and the like is installed on the right side surface of the housing 2. The connectors of this input/output connector panel 11 (hereinafter, input/output connector 110) are connected to an input/output interface 12 installed inside the housing 2. The input/output connector panel 11 is also provided with a DC jack through which power is supplied.

[0024] Sixteen speakers SP1 to SP16 of identical specification are installed on the lower surface of the housing 2. The speakers SP1 to SP16 are arranged in a straight line at regular intervals along the longitudinal direction of the housing 2 and together constitute a speaker array. Microphones MIC101 to MIC116 and microphones MIC201 to MIC216, all of identical specification, are installed on the front and the back of the housing 2, respectively. The microphones MIC101 to MIC116 and MIC201 to MIC216 are arranged in straight lines along the longitudinal direction and constitute microphone arrays.

A lower grill 6, U-shaped in cross-section and formed like a trough along the longitudinal direction, is attached to the lower surface, front, and back of the housing 2 so as to cover the speaker array and the microphone arrays. The lower grill 6 is made of a punched-mesh metal plate. It protects the speakers SP1 to SP16 and the microphones MIC101 to MIC116 and MIC201 to MIC216 while letting emitted and collected sound pass through.

The microphones MIC101 to MIC116 and the sound collection beam generation unit 181 form the sound collection beams on the front side, and the microphones MIC201 to MIC216 and the sound collection beam generation unit 182 form the sound collection beams on the back side.

[0025] In the present embodiment, the speaker array has 16 speakers and each microphone array has 16 microphones, but the numbers are not limited to these and may be set as appropriate for the specification. The spacing of the elements in the speaker array and the microphone arrays is also arbitrary: the elements may be placed at regular intervals, or placed densely at the center and more sparsely toward both ends. Furthermore, although the microphone array in the present embodiment is a line array, it is not limited to a line array and may be an array arranged in a matrix.

[0026] Next, the functions of the audio conference system will be described with reference to FIGS. 4 and 5. FIG. 4 is a block diagram showing the functional configuration of the audio conference system. FIG. 5 is an explanatory diagram of the sound collection areas. FIG. 5(A) shows the sound collection areas used for voice collection, and FIG. 5(B) shows the sound collection areas used for position detection.

[0027] Functionally, the audio conference system includes the control unit 10, the input/output connector 110, the input/output interface 12 of the audio conference apparatus 1, the sound emission directivity control unit 13, D/A converters 14, sound emission amplifiers 15, the speaker array (speakers SP1 to SP16), the microphone arrays (microphones MIC101 to MIC116 and MIC201 to MIC216), sound collection amplifiers 16, A/D converters 17, the sound collection beam generation units 181 and 182, the sound collection beam selection unit 19, the echo cancellation unit 20, the camera control unit 22, the camera 7, the display terminal 8, the input/output interface 91 of the video communication device 9, the video codec 92, and the operation unit 4.

[0028] The control unit 10 controls the sound emission directivity control unit 13 in response to input from the operation unit 4, and controls the camera control unit 22 in response to input from the speaker position detection unit 191. Details of this control are described later.

[0029] The input/output interface 12 packetizes the audio signal input from the echo cancellation unit 20 and outputs it to the network 100. It also converts an audio signal input via the input/output connector 110 into a bitstream digital audio signal S1 and outputs it. The digital audio signal S1 is supplied to the sound emission directivity control unit 13 via the echo cancellation unit 20.

More specifically, when an audio signal is input via the network 100 and the LAN connector, the input/output interface 12 arranges the packetized audio signal in time series and outputs it sequentially, thereby converting it into a bitstream that is output to the sound emission directivity control unit 13. When an analog signal is input via the analog audio input terminal, the input/output interface 12 digitizes the signal and outputs it to the sound emission directivity control unit 13.

[0030] The sound emission directivity control unit 13 is a functional unit that, as instructed by the control unit 10, generates from the audio signal supplied by the input/output interface 12 the individual sound emission signals to be supplied to the speakers SP1 to SP16 of the speaker array. It generates these individual signals so that the speaker array emits a sound emission beam, that is, sound shaped into a beam. To do so, it applies a predetermined delay, a predetermined amplitude adjustment, and the like to the input audio signal for each speaker. Two kinds of sound emission beam are available, one that emits sound over a narrow range and one that emits sound over a wide range, and a participant can switch between them by setting the mode through the operation unit 4.

The sound emission directivity control unit 13 outputs the generated individual sound emission signals to the D/A converters 14 provided for each of the speakers SP1 to SP16. Each D/A converter 14 converts its individual sound emission signal into analog form and outputs it to the corresponding sound emission amplifier 15, and each sound emission amplifier 15 amplifies the signal and supplies it to the corresponding speaker SP1 to SP16.

[0031] Each of the speakers SP1 to SP16 of the speaker array converts the supplied individual sound emission signal into sound and emits it. Because the speakers SP1 to SP16 are installed facing downward on the lower surface of the housing 2, the emitted sound is reflected by the surface of the desk on which the audio conference apparatus 1 is placed and propagates from the sides of the apparatus, where the participants are, diagonally upward.

[0032] The microphones MIC101 to MIC116 and MIC201 to MIC216 of the microphone arrays collect sound on the front side and the back side of the audio conference apparatus 1, respectively, convert it into electrical audio signals, and output these signals to the sound collection amplifiers 16. Each sound collection amplifier 16 amplifies its audio signal and supplies it to an A/D converter 17, which converts the analog audio signal into a digital signal and outputs it to the sound collection beam generation units 181 and 182. The sound collection beam generation unit 181 receives the front-side audio signals collected by the microphones MIC101 to MIC116 installed on the front, and the sound collection beam generation unit 182 receives the back-side audio signals collected by the microphones MIC201 to MIC216 installed on the back.

[0033] 収音ビーム生成部 181 , 182は、音声収音用の広範囲の収音ビーム及びカメラ 7制御用の狭範囲の収音ビームを形成するべぐ各マイク MIC10；!〜 MIC116, MIC2 0；!〜 MIC216が収音した音声信号に対して遅延処理を行う。 [0033] The sound collecting beam generating units 181 and 182 are each of the microphones MIC10 to form a wide sound collecting beam for sound collecting and a narrow sound collecting beam for controlling the camera 7;! To MIC116, MIC20. ; ~~ Delay processing is performed on the audio signal picked up by the MIC216.

具体的には、広範囲で音声を収音するために、図 5 (A)に示すように、正面側，背面側ともに 1つのエリアを設定して、これらエリアをそれぞれ収音する収音ビーム MB 1 , MB2を形成し、収音ビーム選択部 19に出力する。 Specifically, in order to pick up sound over a wide range, as shown in Fig. 5 (A), one area is set on both the front and back sides, and a sound collecting beam that picks up each area. MB 1 and MB 2 are formed and output to the collected sound beam selector 19.

また、主な発言者に対してカメラ 7を向けるよう制御するために、図 5 (B)に示すように、同時に複数スポット（図 5 (B)では正面側，背面側のそれぞれ 4スポット）に対する収音ビーム MB1；!〜 MB14、 MB2；!〜 MB24を形成し、収音ビーム選択部 19に出力する。 Also, in order to control the camera 7 to be directed to the main speaker, as shown in Fig. 5 (B), multiple spots (4 spots on the front side and back side in Fig. 5 (B)) are simultaneously applied. Collected sound beams MB1;! To MB14, MB2;! To MB24 are formed and output to the collected sound beam selector 19.

[0034] When generating the narrow-range sound collection beams for controlling the camera 7, sound quality need not be considered, unlike when picking up speech. The collected audio signals may therefore be passed through a high-pass filter so that the sound collection beams MB11 to MB14 and MB21 to MB24 are generated using only the high-frequency components of about 1 kHz to 3 kHz, where directivity is strong.
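As a rough illustration of the high-pass filtering step, the sketch below applies a first-order high-pass filter to one channel. The filter order, the 1 kHz cutoff, and the 16 kHz sample rate are assumptions chosen for the example; the specification does not state which filter design is used, and a practical design would likely be steeper and would also limit the upper band edge.

```python
import math


def high_pass(samples, cutoff_hz=1000.0, sample_rate_hz=16000.0):
    """First-order RC-style high-pass filter (illustrative only)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


# A constant (0 Hz) input decays toward zero, while a rapidly alternating
# input passes with little attenuation.
dc_response = high_pass([1.0] * 200)
fast_response = high_pass([1.0 if n % 2 == 0 else -1.0 for n in range(200)])
```

Each microphone channel would be filtered this way before the spot beams are formed, while the wide-range beams used for speech would be formed from the unfiltered signals.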

[0035] In this embodiment four spots are formed on each of the front and back sides, but the number is not limited to four; any plural number of spots may be used.

[0036] In the sound collection beam selection unit 19, the speaker position detection unit 191 treats the highest-level signal among the eight spot audio signals picked up by the eight sound collection beams MB11 to MB14 and MB21 to MB24 as the target audio signal (that is, speech by a conference participant rather than noise), detects the sound collection direction DS of that highest-level signal, and outputs the sound collection direction DS to the control unit 10.

The sound collection beam selection unit 19 also selects, from the two sound collection beams MB1 and MB2, the beam that contains the sound collection direction DS and outputs it as the audio signal MB0 to the echo cancellation unit 20 in the subsequent stage.
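The selection logic of this paragraph can be sketched as follows. The use of RMS as the level measure, the function names, and the rule that a spot beam name such as MB12 belongs to the area beam MB1 are assumptions made for illustration; the specification states only that the highest-level spot signal determines the sound collection direction DS.

```python
import math


def rms(samples):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def detect_direction(spot_beams):
    """Return the name of the spot beam with the highest level.

    spot_beams maps a beam name (for example "MB12") to its samples.
    The returned name stands in for the sound collection direction DS.
    """
    return max(spot_beams, key=lambda name: rms(spot_beams[name]))


def area_for_spot(spot_name):
    """Map a spot beam to the area beam covering it (assumed naming rule:
    MB11 to MB14 lie inside MB1, MB21 to MB24 lie inside MB2)."""
    return spot_name[:3]


# Hypothetical block of samples for three of the eight spot beams.
spots = {
    "MB11": [0.01, -0.01, 0.02, -0.02],
    "MB12": [0.50, -0.40, 0.45, -0.50],
    "MB21": [0.10, 0.10, -0.10, -0.10],
}
ds = detect_direction(spots)   # spot with the loudest signal
mb0_source = area_for_spot(ds)  # area beam to forward as MB0
```

In this example the loudest spot is MB12, so the front-side area beam MB1 would be forwarded as MB0.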

[0037] The echo cancellation unit 20 is a functional unit that prevents the echo phenomenon in which an audio signal input from the input/output interface 12 is emitted from the speakers SP1 to SP16, picked up again by the microphones MIC101 to MIC116 and MIC201 to MIC216, and output once more from the input/output interface 12. The echo cancellation unit 20 uses the adaptive filter 211 to estimate the return sound along this path and suppresses the echo by subtracting the estimated return sound from the audio signal picked up by the microphones.

Specifically, the echo cancellation unit 20 includes an adaptive echo canceller 21, which in turn includes an adaptive filter 211 and a post processor 212. The adaptive filter 211 estimates, from the audio signal supplied to the speakers SP, the audio signal component that returns to the microphones MIC and generates a pseudo return sound signal. The post processor 212 removes the echo component by subtracting the pseudo return sound signal for the input audio signal S1 from the audio signal MB0 output by the sound collection beam selection unit 19. The audio signal MB0 with the echo component removed is input to the input/output interface 12.
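The estimate-and-subtract structure described above can be sketched with a normalized LMS (NLMS) adaptive filter. NLMS is used here only as a common, well-understood choice; the specification does not say which adaptation algorithm the adaptive filter 211 uses. The function name, tap count, step size, and the simulated echo path are all assumptions.

```python
import random


def cancel_echo(far_end, mic, taps=4, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic`.

    far_end: samples sent to the speakers (the input audio signal).
    mic:     samples of the selected beam signal, containing the echo.
    Returns the residual signal after echo removal.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Pseudo return sound: filter output for the current far-end history.
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # subtraction step
        power = sum(h * h for h in history) + eps
        # NLMS update: move the weights toward reducing the residual.
        weights = [w + step * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual


# Simulated room: the microphone hears the far-end signal one sample late
# at 60% of its level, with no local speech.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(500)]
mic = [0.0] + [0.6 * x for x in far_end[:-1]]
residual = cancel_echo(far_end, mic)
```

Once the filter has converged, the residual in this noise-free simulation is far smaller than the echo itself. With local speech present, the residual would contain that speech, which is the signal passed on to the input/output interface.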

[0038] This echo cancellation processing accurately predicts and removes the audio signal that returns from the speakers SP to the microphones MIC, so that only the audio signal picked up by the microphones MIC, free of the return sound, is output from the input/output interface 12.

[0039] When the sound collection direction DS is input from the control unit 10, the camera control unit 22 controls the direction of the imaging unit 71 of the camera 7 so that the sound collection direction DS is at the center of the shooting direction. The camera 7 thus determines its shooting direction according to the sound collection direction DS input from the audio conference device 1, which allows the speaker to be filmed automatically. The captured image data from the camera 7 is output to the video codec 92.

[0040] The video codec 92 compresses the captured image data input from the camera 7 and outputs it to the input/output interface 91. The video codec 92 also decompresses the video signal P1 input from the input/output interface 91 and outputs it to the display terminal 8.

[0041] The input/output interface 91 packetizes the captured image data input from the video codec 92 and outputs it to the network 100. The input/output interface 91 also converts a video signal input from the network 100 into a bitstream digital video signal P1 and outputs it. The digital video signal P1 is supplied to the display terminal 8 via the video codec 92. More specifically, when a video signal is input via the network 100, the input/output interface 91 arranges the packetized video signal in time series and outputs it sequentially, thereby converting it into a bitstream that is output to the display terminal 8.
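The conversion from packets to a bitstream can be sketched as below. The packet layout, a pair of sequence number and payload bytes, is hypothetical; the specification says only that the packetized video signal is arranged in time series and output sequentially.

```python
def packets_to_bitstream(packets):
    """Arrange packets in time order and concatenate their payloads.

    packets: iterable of (sequence_number, payload_bytes) pairs, possibly
             received out of order (assumed packet layout).
    """
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


# Hypothetical packets arriving out of order from the network.
received = [(2, b"\x03\x04"), (0, b"\x00"), (1, b"\x01\x02")]
bitstream = packets_to_bitstream(received)
```

A real receiver would also have to deal with lost and duplicated packets, which this sketch ignores.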

[0042] As described above, the audio conference system of this embodiment generates two different kinds of sound collection beam: one for picking up speech and one for detecting the speaker position. The beam for picking up speech collects only the sound on the main speaker's side of the audio conference device, not the sound on the opposite side, which makes the main speaker's speech clearer. The beams for detecting the speaker position identify where the main speaker is, so that the camera 7 can be pointed at the main speaker. When the main speaker changes, the direction of the camera 7 is switched automatically.

[0043] As shown in FIG. 6, the audio conference system of the present invention can also be used as a loudspeaker system for a conference without the video communication device 9. In this case, the camera 7 is connected to the audio conference device 1, and the display terminal 8 is connected to the camera 7. The audio conference device 1 amplifies and emits the collected sound. The camera 7 determines its shooting direction according to the sound collection direction DS input from the audio conference device 1, shoots, and generates captured image data, which it outputs to the display terminal 8 for display. The speaker's speech is thus amplified and emitted while the main speaker is filmed by the camera 7 and shown on the display terminal 8. Even in a conference held in a large conference room, participants can therefore hear the speaker easily, and because the main speaker is shown on the display terminal 8 as the conference proceeds, participants can readily tell who the main speaker is.

[0044] The configuration is not limited to this embodiment. As shown in FIG. 7, the sound collection beam selection unit 19 may combine the two sound collection beams MB1 and MB2, regardless of the sound collection direction of the audio signal, to generate the audio signal MB0, and output this audio signal MB0 to the echo cancellation unit 20 in the subsequent stage.

Because the audio signal MB0 is generated by combining the two sound collection beams MB1 and MB2, sound is picked up over a wide range, not only on the main speaker's side, so the speech of all participants can be collected effectively while the camera 7 still reliably films the main speaker.
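Combining the two area beams can be as simple as a sample-by-sample mix. The equal-weight average below is an assumption; the specification does not state how MB1 and MB2 are weighted when they are combined.

```python
def mix_beams(mb1, mb2):
    """Combine two equal-length area beams into one signal by averaging."""
    if len(mb1) != len(mb2):
        raise ValueError("beams must have the same length")
    return [(a + b) / 2.0 for a, b in zip(mb1, mb2)]


# Hypothetical blocks: speech on the front side, a quieter reply on the back side.
mb1 = [0.5, -0.5, 0.25, 0.0]
mb2 = [0.0, 0.25, 0.25, -0.5]
mb0 = mix_beams(mb1, mb2)
```

Averaging rather than summing keeps the mixed signal within the range of its inputs, at the cost of halving the level of a talker who appears in only one beam.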

[0045] Furthermore, the configuration is not limited to this embodiment: as shown in FIG. 8, the audio conference device 1 may be provided with an audio and video communication unit, through which a teleconference can be held with the other party's audio conference device. In this case, the captured image data from the camera 7 and the audio data picked up by the microphones are output to the network 100 via the audio conference device 1, and a video signal input via the network 100 from another audio conference device at a remote location is displayed on the display terminal 8 via the audio conference device 1. As for the captured image data and audio data transmitted to the other audio conference device, the transmitted image data is captured by the camera 7 whose shooting direction is controlled to the sound collection direction corresponding to the high-level audio signal detected by the plurality of narrow-range sound collection beams, and the transmitted audio data is generated from the wide-range sound collection beam that contains the sound collection direction detected by the narrow-range sound collection beams. In this case, the video signal input/output interface 91 may be integrated with the audio signal input/output interface 12 and connected to the network 100 via a common input/output connector 110.

FIG. 8 shows the audio conference device 1 of FIG. 4 with a video communication unit added, but the configuration is not limited to this; a video communication unit may likewise be added to the audio conference device 1 of FIG. 7.

Claims


[1] An audio conference device comprising:

a microphone array having a plurality of microphones arranged in a predetermined pattern.

[2] The audio conference device according to claim 1, wherein the spot sound collection beam forming unit forms the sound collection beams using only a high-frequency component of the collected sound signal.

[3] The audio conference device according to claim 1, further comprising:

a communication unit that is connected to another audio conference device via a network and communicates with the other audio conference device; and

a control unit that generates audio data based on the first sound collection beam formed by the area sound collection beam forming unit and transmits the audio data to the other audio conference device via the communication unit.

[4] An audio conference system comprising:

a microphone array having a plurality of microphones arranged in a predetermined pattern;

a shooting direction detection unit that detects a speaker direction from the plurality of second sound collection beams formed by the spot sound collection beam forming unit and sets the speaker direction as the shooting direction; and

an imaging unit that shoots in the shooting direction detected by the shooting direction detection unit of the audio conference device and generates video data.

[5] The audio conference system according to claim 4, wherein the spot sound collection beam forming unit forms the sound collection beams using only a high-frequency component of the collected sound signal.

[6] The audio conference system according to claim 4, further comprising a communication unit that is connected to another audio conference device via a network and communicates with the other audio conference device.