WO2008047804A1 - Voice conference device and voice conference system - Google Patents

Voice conference device and voice conference system

Info

Publication number
WO2008047804A1
WO2008047804A1 PCT/JP2007/070195 JP2007070195W WO2008047804A1 WO 2008047804 A1 WO2008047804 A1 WO 2008047804A1 JP 2007070195 W JP2007070195 W JP 2007070195W WO 2008047804 A1 WO2008047804 A1 WO 2008047804A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
audio
sound collection
unit
microphone
Prior art date
Application number
PCT/JP2007/070195
Other languages
French (fr)
Japanese (ja)
Inventor
Toshiaki Ishibashi
Ryo Tanaka
Satoshi Ukai
Original Assignee
Yamaha Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation
Priority to CN2007800321284A (patent CN101513056B)
Publication of WO2008047804A1

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R27/00: Public address systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/56: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/56: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities; audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • H04M3/569: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities; audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants; using the instant speaker's algorithm
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M2203/00: Aspects of automatic or semi-automatic exchanges
    • H04M2203/50: Aspects of automatic or semi-automatic exchanges related to audio conference
    • H04M2203/5072: Multiple active speakers

Definitions

  • the present invention relates to an audio conference apparatus and an audio conference system that detect a speaker direction using sound collected from a microphone array including a plurality of microphones and control a shooting direction of a camera in the speaker direction.
  • in the conference imaging apparatus of Patent Document 1, the position of a speaker is detected based on an audio signal picked up by a directional microphone arranged for each participant, and a camera captures an image in the direction of that speaker's position.
  • Patent Document 1: Japanese Patent Laid-Open No. 61-198891
  • in the invention of Patent Document 1, a directional microphone must be arranged for each participant, so directional microphones must be prepared according to the number of participants in the conference.
  • because the same microphone beam serves both for collecting sound and for detecting the speaker's position, widening the beam to collect sound over a broad range makes it impossible to identify the speaker, while narrowing it allows the speaker to be identified but means that only one speaker can be picked up when two or more people speak at the same time.
  • an audio conference apparatus includes:
  • a microphone array having a plurality of microphones arranged in a predetermined pattern;
  • an area sound collection beam forming unit that forms a first sound collection beam in which a first sound collection range around the device is set;
  • a spot sound collection beam forming unit that forms a second sound collection beam in which a second sound collection range narrower than the first sound collection range is set; and
  • a shooting direction detection unit that detects a speaker direction from a plurality of second sound collection beams formed by the spot sound collection beam forming unit, and detects the speaker direction as a shooting direction.
  • the audio conference apparatus collects sound using a microphone array including a plurality of microphones.
  • the audio conference apparatus forms an area sound collection beam corresponding to a wide area and a spot sound collection beam corresponding to a plurality of spots in a narrow range from the collected sound signal. Then, the audio conference apparatus generates and outputs audio data based on the area sound collection beam.
  • the audio conference apparatus controls the shooting direction of the camera based on the spot pickup beam. As a result, the audio conference apparatus can output audio data collected over a wide range.
  • the audio conference apparatus can set the direction of the main speaker as the shooting direction of the camera. Furthermore, since the audio conferencing apparatus of the present invention can automatically change the shooting direction of the camera when the main speaker changes, the main speaker can always be designated as the shooting direction.
  • the spot sound collecting beam forming unit forms a sound collecting beam using only a high frequency component of the sound collecting sound signal.
  • the audio conference apparatus further includes a communication unit that is connected to another audio conference apparatus via a network and communicates with the other audio conference apparatus, and
  • a control unit that generates voice data based on the first sound collection beam formed by the area sound collection beam forming unit and transmits the voice data to the other audio conference apparatus via the communication unit.
  • the sound collection beam used for controlling the shooting direction of the camera is formed using only the high-frequency component of the sound signal, which enhances its directivity.
  • the voice conference apparatus can increase the directivity of only the sound collection beam used for controlling the shooting direction of the camera, and thus the position of the speaker can be detected more accurately.
  • the audio conference system includes: a microphone array having a plurality of microphones arranged in a predetermined pattern;
  • an area sound collection beam forming unit that forms a first sound collection beam in which a first sound collection range around the device is set;
  • a spot sound collection beam forming unit that forms a second sound collection beam in which a second sound collection range narrower than the first sound collection range is set;
  • a shooting direction detection unit that detects a speaker direction from the plurality of second sound collection beams formed by the spot sound collection beam forming unit, and detects the speaker direction as a shooting direction; and
  • a shooting unit that shoots in the shooting direction detected by the shooting direction detection unit of the audio conference device and generates video data.
  • the audio conference system includes an audio conference apparatus and a camera.
  • the audio conferencing device generates audio data collected over a wide range and controls the camera with the main speaker as the shooting direction.
  • the camera shoots the shooting direction specified by the audio conference device and generates shooting data.
  • the voice conference system can pick up the main speaker as the camera shooting direction while collecting voice over a wide range. Furthermore, since the audio conference system of the present invention can automatically change the shooting direction of the camera when the main speaker changes, the camera can always capture the main speaker.
  • FIG. 1 is an explanatory diagram of a voice conference system that performs a voice conference with a remote place.
  • FIG. 2 is a three-sided view of the audio conference apparatus 1 according to the present embodiment.
  • FIG. 3 is a diagram showing the speaker arrangement and the microphone arrangement of the audio conference apparatus 1 according to the present embodiment.
  • FIG. 4 is a block diagram showing a functional configuration of the audio conference system according to the present embodiment.
  • FIG. 5 is an explanatory diagram of a sound collection area.
  • FIG. 8 is a block diagram of an audio conference system according to another embodiment.
  • FIG. 1 is an explanatory diagram of an audio conference system for teleconferencing with remote locations.
  • the audio conference system of the present invention includes an audio conference device 1, a camera 7, a display terminal 8, and a video communication device 9.
  • a camera 7 is connected to the audio conference apparatus 1.
  • a video communication device 9 is connected to the camera 7.
  • a display terminal 8 is connected to the video communication device 9.
  • the camera 7 is used to photograph the participants of the conference, and includes an imaging unit 71 and a connection terminal unit 72.
  • the camera 7 receives an input signal (the sound collection direction DS described later) from the audio conference device 1 via the connection terminal unit 72.
  • the imaging unit 71 is rotated up, down, left, and right (for example, about 120 degrees up and down, and about 200 degrees left and right), and shooting in the direction designated by the audio conference apparatus 1 is performed.
  • the camera 7 outputs the shooting data to the video communication device 9 via the connection terminal unit 72.
  • the connection terminal unit 72 includes a video output terminal, a multi-connector, a power supply terminal, and the like.
  • the display terminal 8 is used to display video data received from a remote video conference system via the network 100.
  • the display terminal 8 includes a display unit 81 and a connection terminal unit 82.
  • the input signal is received via the connection terminal section 82 and displayed on the display section 81.
  • the display terminal 8 is a projector, a liquid crystal display, or the like.
  • the video communication device 9 is a device that performs compression / decompression of video data and protocol control, and transmits / receives video data via the network 100. Specifically, the video communication device 9 compresses the shooting data input from the camera 7, packetizes it, and outputs the packetized data to the network 100. In addition, when video data is input from the network 100, the video communication device 9 arranges the packetized video data in time series and sequentially outputs them to form a bit stream, which is then decompressed and output to the display terminal 8.
  • the audio conference apparatus 1 uses a microphone array including a plurality of microphones arranged linearly. Then, sound collection directivity is formed by delaying and synthesizing the sound collected by each microphone. This formed sound collection directivity is called a sound collection beam.
  • the types of sound collection beams include a narrow-range beam whose tip is set to a specific sound collection spot, and an area beam that collects with high gain the sounds generated over an area with a certain extent (for example, each side direction of the audio conference device 1 (speech area)) while suppressing the sounds (noise) generated in other areas.
  • FIG. 2 is a three-sided view showing the audio conference apparatus.
  • FIG. 2(A) is a plan view
  • FIG. 2(B) is a front view
  • FIG. 2(C) is a right side view.
  • FIG. 3 is a diagram showing the speaker arrangement and the microphone arrangement of the audio conference apparatus shown in FIG. 2,
  • FIG. 3(A) shows the front microphone arrangement
  • FIG. 3(B) shows the bottom speaker arrangement.
  • FIG. 3(C) shows the rear microphone arrangement.
  • the surface shown in FIG. 2(B) is called the front, and the upper, lower, left, and right sides of the device are defined based on this figure.
  • the audio conference apparatus 1 has an external appearance including a housing 2 and legs 3, and the housing 2 includes an operation unit 4, a light emitting unit 5, and an input / output connector panel 11.
  • the housing 2 has a substantially rectangular parallelepiped shape that is long in the left-right direction, and legs 3 that lift the bottom surface of the housing 2 a predetermined distance above the installation surface are provided at the left and right ends of the housing 2.
  • An operation unit 4 having operation buttons such as a numeric keypad and a display screen is provided at the upper right end of the housing 2.
  • the operation unit 4 is connected to a control unit 10 installed in the housing 2.
  • the operation unit 4 receives an operation input from a participant and outputs the operation input to the control unit 10, and displays an operation content, an execution mode, and the like on the display screen under the control of the control unit 10.
  • a light emitting unit 5 made of light emitting elements such as LEDs and the like arranged radially around the substantially center of the housing 2 is installed at the center of the upper surface of the housing 2.
  • the light emitting unit 5 emits light according to the light emission control from the control unit 10.
  • the control unit 10 inputs a light emission control signal for lighting the LED in the sound collection direction to the light emitting unit 5.
  • an input / output connector panel 11 having a LAN interface, an analog audio input terminal, an analog audio output terminal, a digital audio input / output terminal, a serial terminal, and the like is installed.
  • the connectors on the connector panel 11 (the input / output connector 110) are connected to the input / output interface 12 installed in the housing 2.
  • the input / output connector panel 11 is also provided with a DC jack to which power is supplied.
  • a lower surface grill 6 that is U-shaped in cross-section and covers the speaker array and the microphone array and is formed in a longitudinal shape is attached.
  • the bottom grill 6 is made of a punched-mesh metal plate; it protects the speakers SP1 to SP16 and the microphones MIC101 to MIC116 and MIC201 to MIC216 while allowing the emitted and collected sound to pass through.
  • the microphones MIC101 to MIC116 and the collected sound beam generation unit 181 form a sound collecting beam on the front side, and the microphones MIC201 to MIC216 and the collected sound beam generating unit 182 form a sound collecting beam on the rear side.
  • the number of speakers in the speaker array is 16 and the number of microphones in each microphone array is 16.
  • the number of speakers and the number of microphones is not limited to this. What is necessary is just to set suitably.
  • the spacing of the elements in the speaker array and the microphone array is arbitrary. For example, the elements may be arranged at regular intervals, or they may be arranged densely in the central part and more sparsely toward both ends.
  • the microphone array is configured as a line array, but the microphone array is not limited to a line array, and may be an array arranged in a matrix.
  • FIG. 4 is a block diagram showing a functional configuration of the audio conference system.
  • Fig. 5 is an explanatory diagram of the sound collection area.
  • Figure 5 (A) shows the sound collection area for sound collection
  • Figure 5 (B) shows the sound collection for position detection.
  • the audio conference system functionally includes a control unit 10, an input / output connector 110, an input / output interface 12 of the audio conference device 1, a sound output directivity control unit 13, a D / A converter 14, a sound output amplifier 15, a speaker array (speakers SP1 to SP16), microphone arrays (microphones MIC101 to MIC116 and MIC201 to MIC216), a sound collecting amplifier 16, an A / D converter 17, sound collecting beam generators 181 and 182, a collected sound beam selection unit 19, an echo cancellation unit 20, a camera control unit 22, a camera 7, a display terminal 8, an input / output interface 91 of the video communication device 9, a video codec 92, and an operation unit 4.
  • the control unit 10 receives an input from the operation unit 4, controls the sound output directivity control unit 13, receives an input from the speaker position detection unit 191, and controls the camera control unit 22. Details of the control will be described later.
  • the input / output interface 12 packetizes the input audio signal from the echo cancel unit 20 and outputs the packet to the network 100. Also, the audio signal input via the input / output connector 110 is converted into a digital audio signal S1 of a bit stream and output. The digital audio signal S 1 is supplied to the sound emission directivity control unit 13 via the echo cancellation unit 20.
  • audio signals are input via the network 100 and LAN connector.
  • the input / output interface 12 arranges the packetized audio signals in time series and sequentially outputs them, thereby converting them into bit streams and outputting them to the sound output directivity control unit 13.
  • the input / output interface 12 digitizes this signal and outputs it to the sound output directivity control unit 13.
  • the sound emission directivity control unit 13 is a functional unit that, according to an instruction from the control unit 10, generates the individual sound emission signals supplied to the speakers SP1 to SP16 of the speaker array from the audio signal supplied from the input / output interface 12.
  • the sound emission directivity control unit 13 generates the individual sound emission signals to be supplied to the speakers SP1 to SP16 so that a sound emission beam, that is, sound formed into a beam, is emitted from the speaker array. For this purpose, the sound emission directivity control unit 13 performs predetermined delay processing, predetermined amplitude processing, and the like on the input sound signal to generate the individual sound emission signals.
  • the sound emission directivity control unit 13 outputs the generated individual sound emission signals to the D / A converters 14.
  • Each D / A converter 14 converts the individual sound emission signal into an analog format and outputs it to each sound emission amplifier 15, and each sound emission amplifier 15 amplifies the individual sound emission signal and applies it to the speakers SP1 to SP16.
  • Each of the speakers SP1 to SP16 of the speaker array converts the supplied individual sound emission signal into sound and emits the sound to the outside. The speakers SP1 to SP16 are installed facing downward on the lower surface of the housing 2, so the emitted sound is reflected by the surface of the desk on which the audio conference device 1 is installed and propagates diagonally upward from the sides of the device toward the participants.
  • Each of the microphones MIC101 to MIC116 and MIC201 to MIC216 of the microphone arrays picks up the sound on the front side or the back side of the audio conference device 1 and converts it into an audio signal, which is an electrical signal. This audio signal is output to each sound collecting amplifier 16.
  • Each of the sound collecting amplifiers 16 amplifies the sound signal and supplies the amplified signal to the A / D converter 17, and the A / D converter 17 converts the analog sound signal into a digital signal and outputs it to the sound collecting beam generating units 181 and 182.
  • the sound collecting beam generation unit 181 receives the sound signals from the front side collected by the microphones MIC101 to MIC116 installed on the front surface, and the sound collecting beam generation unit 182 receives the sound signals from the back side collected by the microphones MIC201 to MIC216 installed on the back surface.
  • the sound collecting beam generating units 181 and 182 perform delay processing on the audio signals picked up by the microphones MIC101 to MIC116 and MIC201 to MIC216 to form a wide sound collecting beam for sound collection and narrow sound collecting beams for controlling the camera 7.
  • for sound collection, one area is set on each of the front and back sides, and sound collecting beams MB1 and MB2 that pick up the respective areas are formed and output to the collected sound beam selector 19.
  • to control the camera 7 so that it is directed at the main speaker, as shown in Fig. 5 (B), multiple spots (4 spots on each of the front side and the back side in Fig. 5 (B)) are set, and the collected sound beams MB11 to MB14 and MB21 to MB24 for these spots are formed simultaneously and output to the collected sound beam selector 19.
  • the collected sound beams MB11 to MB14 and MB21 to MB24 may be generated using only signals in a band of about 1 kHz to 3 kHz, where directivity is high.
  • in this embodiment, four spots are formed on each of the front side and the back side.
  • the present invention is not limited to this, and any plural number of spots may be used.
  • using the speaker position detection unit 191, the sound collection beam selection unit 19 assumes that the highest-level audio signal among the eight spot audio signals collected by the eight sound collection beams MB11 to MB14 and MB21 to MB24 is the target audio signal (that is, the speech of a conference participant rather than noise), detects the sound collection direction DS of that highest-level audio signal, and outputs the sound collection direction DS to the control unit 10.
  • the sound collection beam selection unit 19 selects a sound collection beam including the sound collection direction DS from the two sound collection beams MB1 and MB2, and outputs the selected sound collection beam as an audio signal MB0 to the subsequent echo cancellation unit 20.
  • the echo canceling unit 20 is a functional unit that prevents the echo phenomenon in which the audio signal input from the input / output interface 12 is emitted from the speakers SP1 to SP16, the emitted sound is picked up by the microphones MIC101 to MIC116 and MIC201 to MIC216, and the picked-up signal is output from the input / output interface 12 again.
  • the echo cancel unit 20 estimates the regression sound of the above path using the adaptive filter 211, and suppresses the echo by subtracting the estimated regression sound from the voice signal collected by the microphone.
  • the echo cancellation unit 20 includes an adaptive echo canceller 21.
  • the adaptive echo canceller 21 includes an adaptive filter 211 and a post processor 212.
  • the adaptive filter 211 estimates a sound signal component that returns to the microphone MIC based on the sound signal supplied to the speaker SP, and generates a pseudo-regression sound signal.
  • the post processor 212 removes the echo component by subtracting the pseudo regression sound signal corresponding to the input sound signal S1 from the sound signal MB0 output by the sound collection beam selection unit 19.
  • the audio signal obtained by removing the echo component from the audio signal MB 0 is input to the input / output interface 12.
  • the audio signal returning from the speaker SP to the microphone MIC can thus be accurately predicted and removed, and only the audio signal collected by the microphone MIC can be output to the input / output interface 12.
  • the camera control unit 22 controls the direction of the imaging unit 71 of the camera 7 so that the sound collection direction DS is the center of the imaging direction.
  • the camera 7 determines the shooting direction according to the sound collection direction DS input from the audio conference apparatus 1. This makes it possible to automatically photograph the speaker.
  • the shooting data of the camera 7 is output to the video codec 92.
  • the video codec 92 compresses the shooting data input from the camera 7 and outputs the compressed data to the input / output interface 91. Also, the video codec 92 decompresses the video signal P1 input from the input / output interface 91 and outputs it to the display terminal 8.
  • the input / output interface 91 converts the shooting data input from the video codec 92 into packets and outputs them to the network 100.
  • the input / output interface 91 converts the video signal input from the network 100 into a digital video signal P1 of a bit stream and outputs it.
  • the digital video signal P1 is supplied to the display terminal 8 via the video codec 92. More specifically, when the video signal is input via the network 100, the input / output interface 91 arranges the packetized video signals in time series and sequentially outputs them, thereby converting them into a bit stream and outputting it to the display terminal 8.
  • As described above, in the audio conference system according to the present embodiment, two different kinds of sound collection beams are generated, one for sound collection and one for speaker position detection. Using the sound collecting beam for sound collection, the sound on the main speaker's side is collected effectively without collecting the sound on the opposite side of the audio conference apparatus from the main speaker, which makes the speech of the main speaker clearer. Furthermore, by identifying the position of the main speaker using the sound collection beams for speaker position detection, the camera 7 can be pointed at the main speaker for shooting. In addition, if the main speaker changes, the direction of the camera 7 is switched automatically.
  • the audio conference system of the present invention can be used as a loudspeaker for a conference without using the video communication device 9.
  • a camera 7 is connected to the audio conference apparatus 1, and a display terminal 8 is connected to the camera 7.
  • the audio conference apparatus 1 amplifies the collected sound and emits it.
  • the camera 7 determines the shooting direction according to the sound collection direction DS input from the audio conference apparatus 1, performs shooting, and generates shooting data.
  • the camera 7 outputs the generated shooting data to the display terminal 8 and displays the shooting data on the display terminal 8.
  • because the main speaker is displayed on the display terminal 8 while the conference proceeds, the conference participants can easily tell who the main speaker is.
  • the sound collection beam selection unit 19 may instead synthesize the two sound collection beams MB1 and MB2, regardless of the sound collection direction, to generate the audio signal MB0, and output this audio signal MB0 to the echo cancellation unit 20 at the subsequent stage.
  • because the audio signal MB0 is generated by synthesizing the two sound collecting beams MB1 and MB2, the main speaker can be captured with the camera 7 while the speech of not only the main speaker but of all participants over a wide range is collected effectively.
  • the audio conference apparatus 1 may be provided with an audio and video communication unit. This communication unit can be used to hold a conference with the other party's audio conference device.
  • In this case, the shooting data shot by the camera 7 and the voice data collected by the microphones are output to the network 100 via the audio conference device 1. Then, a video signal input from another audio conference device at a remote location via the network 100 is displayed on the display terminal 8 via the audio conference device 1.
  • the camera 7 is controlled so that the sound collection direction corresponding to the high-level audio signal detected by the plurality of narrow-range sound collection beams becomes the shooting direction, and the shooting data shot by the camera 7 is transmitted.
  • the video signal input / output interface 91 may be integrated with the audio signal input / output interface 12 and connected to the network 100 via the common input / output connector 110.
  • in the above description the audio conference device 1 in FIG. 4 is further provided with a video communication unit; however, the invention is not limited thereto, and the audio conference device 1 in FIG. 7 may likewise be further provided with a video communication unit.
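
The echo cancellation described above (the adaptive filter 211 estimates the sound returning from the speaker to the microphone, and the post processor 212 subtracts that estimate from the collected signal) can be illustrated with a short Python sketch. The text does not say which adaptation algorithm is used, so the normalized LMS (NLMS) update below is a swapped-in, commonly used technique and not necessarily the one in the device; the function name `nlms_echo_cancel` and the parameters `taps`, `mu`, and `eps` are hypothetical.

```python
from typing import List, Sequence


def nlms_echo_cancel(far_end: Sequence[float], mic: Sequence[float],
                     taps: int = 8, mu: float = 0.5,
                     eps: float = 1e-8) -> List[float]:
    """Remove the estimated speaker echo from the microphone signal.

    far_end: signal supplied to the speaker (S1 in the text).
    mic:     collected signal containing the echo (MB0 in the text).
    NLMS is an assumed adaptation rule; the source names no algorithm.
    """
    weights = [0.0] * taps   # current estimate of the speaker-to-mic path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Role of adaptive filter 211: pseudo echo (estimated returning sound).
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        # Role of post processor 212: subtract the estimate from the mic signal.
        e = d - echo_estimate
        # NLMS coefficient update, normalized by the far-end signal energy.
        energy = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / energy for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

With a far-end signal and a microphone signal that contains only its echo, the residual returned by this sketch shrinks toward zero as the coefficients adapt; in a real conference the residual would also carry the participants' speech.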

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Provided is a teleconference system for collecting a wide range of voices of participants in a conference while imaging a main speaker. The voice conference device (1) collects a wide range of voices and voices divided into narrow ranges by using a microphone array formed by arranging a plurality of microphones MIC. Voice signals (MB1, MB2) collected in a wide range are used as a voice signal (MB0) for voice collection. Moreover, the voice collection direction (DS) is detected by using the voice signal of the highest level detected from the voice signals (MB11 to MB14, MB21 to MB24) which have been collected by dividing a range into narrow ranges, and the imaging direction of a camera (7) is controlled according to the voice collection direction (DS).
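
The highest-level selection summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not say how the signal level is measured, so the root-mean-square over one frame is an assumed measure, and the function names `rms_level`, `detect_sound_collection_direction`, and `select_area_beam` are hypothetical. The mapping from a spot beam to its area beam relies on the naming used in the text (MB11 to MB14 belong to MB1, MB21 to MB24 belong to MB2).

```python
import math
from typing import Dict, Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square level of one frame (assumed level measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_sound_collection_direction(
        spot_beams: Dict[str, Sequence[float]]) -> str:
    """Return the name of the spot beam with the highest level (direction DS)."""
    if not spot_beams:
        raise ValueError("at least one spot beam is required")
    return max(spot_beams, key=lambda name: rms_level(spot_beams[name]))


def select_area_beam(direction: str) -> str:
    """Map a spot beam name to the area beam covering it, e.g. MB12 -> MB1."""
    return direction[:3]


# One short frame per spot beam; the values are made up for illustration.
frames = {
    "MB11": [0.01, -0.01, 0.01, -0.01],
    "MB12": [0.50, -0.50, 0.50, -0.50],
    "MB21": [0.10, -0.10, 0.10, -0.10],
}
ds = detect_sound_collection_direction(frames)
area = select_area_beam(ds)
```

In this made-up frame the beam MB12 carries the strongest signal, so it is taken as the direction DS and the front-side area beam MB1 would be used for voice collection.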

Description

Specification
Audio conference device and audio conference system
Technical field
[0001] The present invention relates to an audio conference apparatus and an audio conference system that detect a speaker direction using sound collected by a microphone array including a plurality of microphones and control the shooting direction of a camera toward the speaker direction.
Background art
[0002] Conventionally, a widely used method of holding a conference between remote locations is to place a conference system with a photographing function at each site, connect these systems via a network or the like, and transmit and receive video data and audio data. Various audio conference systems have been devised for use in such conferences.
In the conference imaging apparatus of Patent Document 1, the position of a speaker is detected based on an audio signal picked up by a directional microphone arranged for each participant, and a camera captures an image in the direction of that speaker's position.
Patent Document 1: Japanese Patent Laid-Open No. 61-198891
Disclosure of the invention
Problems to be solved by the invention
[0003] However, in the invention of Patent Document 1, a directional microphone must be arranged for each participant, so directional microphones must be prepared according to the number of participants in the conference.
In addition, because the same microphone beam serves both for sound collection and for detecting the speaker's position, widening the beam to collect sound over a broad range makes it impossible to identify the speaker, while narrowing it allows the speaker to be identified but means that only one person's speech can be collected when two or more people speak at the same time.
Means for Solving the Problems
[0004] The present invention has been made in view of the above circumstances. An audio conference apparatus comprises:
a microphone array having a plurality of microphones arranged in a predetermined pattern;
an area sound collection beam forming unit that forms, based on a plurality of collected audio signals collected by the respective microphones of the microphone array, a first sound collection beam for which a first sound collection range around the apparatus is set;
a spot sound collection beam forming unit that forms, based on the plurality of collected audio signals collected by the respective microphones of the microphone array, second sound collection beams for which a second sound collection range narrower than the first sound collection range is set; and
a shooting direction detection unit that detects a speaker direction from the plurality of second sound collection beams formed by the spot sound collection beam forming unit, and detects the speaker direction as a shooting direction.
[0005] In this configuration, the audio conference apparatus collects sound using a microphone array composed of a plurality of microphones. From the collected audio signals, the apparatus forms an area sound collection beam corresponding to a wide area and spot sound collection beams corresponding to a plurality of narrow spots. The apparatus generates and outputs audio data based on the area sound collection beam, and controls the shooting direction of the camera based on the spot sound collection beams. The apparatus can therefore output audio data collected over a wide range while setting the direction of the main speaker as the shooting direction of the camera. Furthermore, when the main speaker changes, the apparatus can automatically change the shooting direction of the camera, so the main speaker can always be designated as the shooting direction.
[0006] The spot sound collection beam forming unit forms its sound collection beams using only the high-frequency components of the collected audio signals.
The audio conference apparatus further comprises a communication unit that is connected to another audio conference apparatus via a network and communicates with that apparatus, and a control unit that generates audio data based on the first sound collection beam formed by the area sound collection beam forming unit and transmits the audio data to the other audio conference apparatus via the communication unit.
[0007] In this configuration, the sound collection beams used to control the shooting direction of the camera are formed with stronger directivity by using only the high-frequency components of the audio signals.
As a result, the apparatus can strengthen the directivity of only the sound collection beams used to control the shooting direction of the camera, and can therefore detect the speaker's position more accurately.
[0008] An audio conference system comprises:
a microphone array having a plurality of microphones arranged in a predetermined pattern;
an area sound collection beam forming unit that forms, based on a plurality of collected audio signals collected by the respective microphones of the microphone array, a first sound collection beam for which a first sound collection range around the apparatus is set;
a spot sound collection beam forming unit that forms, based on the plurality of collected audio signals collected by the respective microphones of the microphone array, second sound collection beams for which a second sound collection range narrower than the first sound collection range is set;
a shooting direction detection unit that detects a speaker direction from the plurality of second sound collection beams formed by the spot sound collection beam forming unit, and detects the speaker direction as a shooting direction; and
a shooting unit that shoots in the shooting direction detected by the shooting direction detection unit of the audio conference apparatus and generates video data.
[0009] In this configuration, the audio conference system includes an audio conference apparatus and a camera. The audio conference apparatus generates audio data collected over a wide range and controls the camera with the direction of the main speaker as the shooting direction. The camera shoots in the shooting direction indicated by the audio conference apparatus and generates shooting data.
As a result, the audio conference system can point the camera at the main speaker while collecting sound over a wide range. Furthermore, when the main speaker changes, the system can automatically change the shooting direction of the camera, so the camera can always shoot the main speaker.
Effects of the Invention
[0010] As described above, according to the present invention, the main speaker can be shot while the speech of the conference participants is collected over a wide range.
Brief Description of the Drawings
[0011] FIG. 1 is an explanatory diagram of an audio conference system that holds an audio conference with a remote location.
FIG. 2 is a three-view drawing of the audio conference apparatus 1 according to the present embodiment.
FIG. 3 is a three-view drawing showing the audio conference apparatus 1 according to the present embodiment.
FIG. 4 is a block diagram showing the functional configuration of the audio conference system according to the present embodiment.
FIG. 5 is an explanatory diagram of the sound collection areas.
FIG. 6 is an explanatory diagram of another way of using the audio conference apparatus according to the present embodiment.
FIG. 7 is a block diagram showing the functional configuration of an audio conference system according to another embodiment.
FIG. 8 is a block diagram of an audio conference system according to another embodiment.
Explanation of Reference Numerals
1 Audio conference apparatus
2 Housing
3 Leg
4 Operation unit
5 Light emitting unit
6 Lower-surface grille
7 Camera
8 Display terminal
9 Video communication device
10 Control unit
11 Input/output connector panel
12 Input/output interface
13 Sound emission directivity control unit
14 D/A converter
15 Sound emission amplifier
16 Sound collection amplifier
17 A/D converter
19 Sound collection beam selection unit
20 Echo cancellation unit
21 Adaptive echo canceller
22 Camera control unit
71 Imaging unit
72, 82 Connection terminal unit
81 Display unit
91 Input/output interface
92 Video codec
100 Network
110 Input/output connector
181, 182 Sound collection beam generation unit
191 Speaker position detection unit
211 Adaptive filter
212 Post processor
MIC101 to MIC116, MIC201 to MIC216 Microphones
SP1 to SP16 Loudspeakers
Best Mode for Carrying Out the Invention
[0013] An audio conference system according to an embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 is an explanatory diagram of an audio conference system that holds a video conference with a remote location.
As shown in FIG. 1, the audio conference system of the present invention is composed of an audio conference apparatus 1, a camera 7, a display terminal 8, and a video communication device 9. The camera 7 is connected to the audio conference apparatus 1. The video communication device 9 is connected to the camera 7. The display terminal 8 is connected to the video communication device 9. When an audio conference is held between remote locations, the audio conference apparatus 1 and the video communication device 9 are connected via a network 100 to the audio conference system at the remote location.
[0014] Next, the configurations of the camera 7, the display terminal 8, the video communication device 9, and the audio conference apparatus 1 that make up the audio conference system will be described.
[0015] The camera 7 shoots the conference participants and is composed of an imaging unit 71 and a connection terminal unit 72. It receives an input signal (the sound collection direction DS described later) from the audio conference apparatus 1 via the connection terminal unit 72, and rotates the imaging unit 71 vertically and horizontally (for example, about 120 degrees vertically and about 200 degrees horizontally) to shoot in the direction indicated by the audio conference apparatus 1. The camera 7 outputs the shooting data to the video communication device 9 via the connection terminal unit 72. The connection terminal unit 72 includes a video output terminal, a multi-connector, a power supply terminal, and the like.
[0016] The display terminal 8 displays video data received via the network 100 from the video conference system at the remote location, and is composed of a display unit 81 and a connection terminal unit 82. It receives an input signal from the video communication device 9 via the connection terminal unit 82 and displays it on the display unit 81. The display terminal 8 is a projector, a liquid crystal display, or the like.
[0017] The video communication device 9 compresses and decompresses video data, performs protocol control, and transmits and receives video data via the network 100. Specifically, the video communication device 9 compresses the shooting data input from the camera 7, packetizes it, and outputs it to the network 100. When video data is input from the network 100, the video communication device 9 arranges the packetized video data in time series and outputs it sequentially to form a bitstream, then decompresses the bitstream and outputs it to the display terminal 8.
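For illustration only, the packetizing and time-series reassembly performed by the video communication device 9 might be sketched in Python as follows. The function names, the fixed payload size, and the use of an integer sequence number to recover the time-series order are assumptions of this sketch; the embodiment does not specify a packet format or protocol.

```python
def packetize(bitstream: bytes, payload_size: int):
    """Split a compressed bitstream into (sequence_number, payload) packets.

    The sequence number is a hypothetical field used here to restore order.
    """
    return [
        (seq, bitstream[start:start + payload_size])
        for seq, start in enumerate(range(0, len(bitstream), payload_size))
    ]


def reassemble(packets):
    """Arrange packets in time-series order and join them into one bitstream."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)
```

In this sketch, packets that arrive out of order from the network are simply sorted before concatenation; compression and decompression would be applied before `packetize` and after `reassemble`, respectively.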
[0018] Next, the configuration of the audio conference apparatus 1 will be described with reference to FIGS. 2 and 3. The audio conference apparatus 1 according to the present embodiment uses microphone arrays each composed of a plurality of microphones arranged in a straight line. Sound collection directivity is formed by delaying the sound collected by each microphone and combining the delayed signals. The sound collection directivity formed in this way is called a sound collection beam. Two types of sound collection beam are provided: a narrow-range setting in which the beam is aimed at a specific sound collection spot, and a wide-range setting that collects, with high gain, sound generated in an area with some extent (for example, the direction of each side face of the audio conference apparatus 1 (a speech area)) while suppressing sound (noise) generated in other areas.
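As a minimal sketch of the delay-and-combine operation described in paragraph [0018], the following Python code computes a delay for each microphone of a line array and then averages the delayed channels. The function names, the speed-of-sound constant, the coordinate convention (microphones on the x axis), and the rounding of delays to whole samples are all assumptions made for illustration, not details taken from the embodiment.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_x, focus, sample_rate):
    """Per-microphone delays (in samples) that align sound arriving from `focus`.

    mic_x: x coordinates (m) of microphones placed on a line along the x axis.
    focus: (x, y) position (m) of the sound collection spot.
    """
    dists = [math.hypot(focus[0] - x, focus[1]) for x in mic_x]
    farthest = max(dists)
    # The microphone nearest the spot hears the sound first, so it is delayed most.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in dists]


def delay_and_sum(channels, delays):
    """Delay each channel by its integer sample count and average the results."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, delay in zip(channels, delays):
        for i in range(delay, n):
            out[i] += samples[i - delay]
    return [value / len(channels) for value in out]
```

Sound from the chosen spot adds coherently after the delays, while sound from other positions is misaligned and partially cancels, which is what gives the beam its directivity.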
[0019] FIG. 2 is a three-view drawing of the audio conference apparatus. FIG. 2(A) is a plan view, FIG. 2(B) is a front view, and FIG. 2(C) is a right side view. FIG. 3 shows the loudspeaker arrangement and the microphone arrangement of the audio conference apparatus shown in FIG. 2. FIG. 3(A) shows the microphone arrangement on the front face, FIG. 3(B) shows the loudspeaker arrangement on the bottom face, and FIG. 3(C) shows the microphone arrangement on the rear face.
In the following description, the face shown in FIG. 2(B) is called the front, and the top, bottom, left, and right of the apparatus are defined with respect to this figure.
[0020] The audio conference apparatus 1 has an exterior composed of a housing 2 and legs 3. The housing 2 includes an operation unit 4, a light emitting unit 5, and an input/output connector panel 11. The housing 2 has a substantially rectangular parallelepiped shape elongated in the left-right direction, and legs 3 that raise the bottom face of the housing 2 a predetermined distance above the installation surface are provided at the left and right ends of the housing 2.
[0021] An operation unit 4 having operation buttons such as a numeric keypad and a display screen is provided at the right end of the top face of the housing 2. The operation unit 4 is connected to a control unit 10 installed inside the housing 2. The operation unit 4 accepts operation input from participants and outputs it to the control unit 10, and, under the control of the control unit 10, displays the operation content, the execution mode, and the like on the display screen.
[0022] A light emitting unit 5 composed of light emitting elements such as LEDs, arranged radially around the approximate center of the housing 2, is installed at the center of the top face of the housing 2. The light emitting unit 5 emits light in accordance with light emission control from the control unit 10. The control unit 10 inputs to the light emitting unit 5 a light emission control signal that lights the LED in the sound collection direction.
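One possible way to choose which LED to light for a given sound collection direction is sketched below in Python. The embodiment does not state the number of LEDs or how a direction is mapped to an LED, so the evenly spaced layout, the angle convention (degrees, with LED 0 at 0 degrees), and the function name are assumptions.

```python
def led_index_for_direction(angle_deg, led_count):
    """Index of the radially arranged LED closest to `angle_deg`.

    Assumes `led_count` LEDs evenly spaced over 360 degrees, LED 0 at 0 degrees.
    """
    step = 360.0 / led_count
    return round((angle_deg % 360.0) / step) % led_count
```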
[0023] An input/output connector panel 11 having a LAN interface, an analog audio input terminal, an analog audio output terminal, a digital audio input/output terminal, a serial terminal, and the like is installed on the right side face of the housing 2. Each connector of the input/output connector panel 11 (hereinafter, input/output connector 110) is connected to an input/output interface 12 installed inside the housing 2. The input/output connector panel 11 is also provided with a DC jack through which power is supplied.
[0024] Sixteen loudspeakers SP1 to SP16 of the same specification are installed on the lower surface of the housing 2. The loudspeakers SP1 to SP16 are arranged in a straight line at regular intervals along the longitudinal direction of the housing 2 and together constitute a loudspeaker array. Microphones MIC101 to MIC116 and microphones MIC201 to MIC216 of the same specification are installed on the front and rear faces of the housing 2, respectively. The microphones MIC101 to MIC116 and the microphones MIC201 to MIC216 are arranged in straight lines along the longitudinal direction and constitute microphone arrays.
A lower-surface grille 6, which has a U-shaped cross section and is formed as a trough along the longitudinal direction, is attached to the lower, front, and rear sides of the housing 2 so as to cover the loudspeaker array and the microphone arrays. The lower-surface grille 6 is made of a punched-mesh metal plate. It protects the loudspeakers SP1 to SP16 and the microphones MIC101 to MIC116 and MIC201 to MIC216 while letting emitted and collected sound pass through.
The microphones MIC101 to MIC116 and the sound collection beam generation unit 181 form the sound collection beams on the front side, and the microphones MIC201 to MIC216 and the sound collection beam generation unit 182 form the sound collection beams on the rear side.
[0025] In the present embodiment, the loudspeaker array has 16 loudspeakers and each microphone array has 16 microphones, but the numbers are not limited to these and may be set as appropriate to the specification. The spacing within the loudspeaker array and the microphone arrays is also arbitrary. That is, the elements may be spaced at regular intervals, or placed densely in the central portion and more sparsely toward both ends. Furthermore, although the microphone arrays in the present embodiment are line arrays, a microphone array is not limited to a line array and may be an array arranged in a matrix.
[0026] Next, the functions of the audio conference system will be described with reference to FIGS. 4 and 5. FIG. 4 is a block diagram showing the functional configuration of the audio conference system. FIG. 5 is an explanatory diagram of the sound collection areas. FIG. 5(A) shows the sound collection areas used for collecting speech, and FIG. 5(B) shows the sound collection areas used for position detection.
[0027] Functionally, the audio conference system includes the control unit 10, the input/output connector 110, the input/output interface 12 of the audio conference apparatus 1, a sound emission directivity control unit 13, D/A converters 14, sound emission amplifiers 15, the loudspeaker array (loudspeakers SP1 to SP16), the microphone arrays (microphones MIC101 to MIC116 and MIC201 to MIC216), sound collection amplifiers 16, A/D converters 17, sound collection beam generation units 181 and 182, a sound collection beam selection unit 19, an echo cancellation unit 20, a camera control unit 22, the camera 7, the display terminal 8, an input/output interface 91 of the video communication device 9, a video codec 92, and the operation unit 4.
[0028] The control unit 10 controls the sound emission directivity control unit 13 in response to input from the operation unit 4, and controls the camera control unit 22 in response to input from the speaker position detection unit 191. Details of this control are described later.
[0029] The input/output interface 12 packetizes the audio signal input from the echo cancellation unit 20 and outputs it to the network 100. It also converts an audio signal input via the input/output connector 110 into a bitstream digital audio signal S1 and outputs it. The digital audio signal S1 is supplied to the sound emission directivity control unit 13 via the echo cancellation unit 20.
More specifically, when an audio signal is input via the network 100 and the LAN connector, the input/output interface 12 arranges the packetized audio signal in time series and outputs it sequentially, thereby converting it into a bitstream that is output to the sound emission directivity control unit 13. When an analog signal is input via the analog audio input terminal, the input/output interface 12 digitizes the signal and outputs it to the sound emission directivity control unit 13.
[0030] The sound emission directivity control unit 13 is a functional unit that, as instructed by the control unit 10, generates from the audio signal supplied by the input/output interface 12 the individual sound emission signals to be supplied to the loudspeakers SP1 to SP16 of the loudspeaker array. It generates the individual sound emission signals so that the loudspeaker array emits a sound emission beam, that is, sound formed into a beam. To do so, the sound emission directivity control unit 13 applies predetermined delay processing, predetermined amplitude processing, and the like to the input audio signal for each loudspeaker. The sound emission beam may be one that emits sound over a narrow range or one that emits sound over a wide range, and the two can be switched by a mode setting made by a participant through the operation unit 4.
The sound emission directivity control unit 13 outputs the generated individual sound emission signals to the D/A converters 14 provided for the respective loudspeakers SP1 to SP16. Each D/A converter 14 converts its individual sound emission signal into analog form and outputs it to the corresponding sound emission amplifier 15, and each sound emission amplifier 15 amplifies the individual sound emission signal and supplies it to the corresponding loudspeaker SP1 to SP16.
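The generation of individual sound emission signals by delay and amplitude processing can be illustrated with the following Python sketch. The function name and the representation of delays as whole samples and of amplitude processing as a single gain per loudspeaker are assumptions; the embodiment does not give the actual delay or gain values for the narrow and wide modes.

```python
def individual_emission_signals(signal, delays, gains):
    """Return one delayed, scaled copy of `signal` per loudspeaker.

    delays: per-loudspeaker delay in samples (hypothetical values).
    gains:  per-loudspeaker amplitude factor (hypothetical values).
    """
    length = len(signal)
    outputs = []
    for delay, gain in zip(delays, gains):
        kept = max(length - delay, 0)
        # Prepend silence for the delay and keep the output the same length.
        delayed = ([0.0] * delay + list(signal[:kept]))[:length]
        outputs.append([gain * sample for sample in delayed])
    return outputs
```

Each returned list corresponds to the signal that would be passed to one D/A converter 14; different delay and gain sets would yield a narrower or wider sound emission beam.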
[0031] Each loudspeaker SP1 to SP16 of the loudspeaker array converts the supplied individual sound emission signal into sound and emits it. Because the loudspeakers SP1 to SP16 are installed facing downward on the lower surface of the housing 2, the emitted sound is reflected by the surface of the desk on which the audio conference apparatus 1 is placed, and propagates obliquely upward from the sides of the apparatus, where the participants are.
[0032] The microphones MIC101 to MIC116 and MIC201 to MIC216 of the microphone arrays collect sound on the front side and the rear side of the audio conference apparatus 1, respectively, convert it into electrical audio signals, and output the audio signals to the respective sound collection amplifiers 16. Each sound collection amplifier 16 amplifies its audio signal and supplies it to the corresponding A/D converter 17, and each A/D converter 17 converts the analog audio signal into a digital signal and outputs it to the sound collection beam generation unit 181 or 182. The front-side audio signals collected by the microphones MIC101 to MIC116 installed on the front face are input to the sound collection beam generation unit 181, and the rear-side audio signals collected by the microphones MIC201 to MIC216 installed on the rear face are input to the sound collection beam generation unit 182.
[0033] The sound collection beam generation units 181 and 182 apply delay processing to the audio signals collected by the microphones MIC101 to MIC116 and MIC201 to MIC216 in order to form wide-range sound collection beams for collecting speech and narrow-range sound collection beams for controlling the camera 7.
Specifically, to collect speech over a wide range, one area is set on each of the front side and the rear side as shown in FIG. 5(A), and sound collection beams MB1 and MB2 that collect sound from these respective areas are formed and output to the sound collection beam selection unit 19.
To direct the camera 7 toward the main speaker, sound collection beams MB11 to MB14 and MB21 to MB24 for a plurality of spots (four spots on each of the front side and the rear side in FIG. 5(B)) are formed simultaneously as shown in FIG. 5(B) and output to the sound collection beam selection unit 19.
[0034] When the narrow-range sound collection beams for controlling the camera 7 are generated, sound quality need not be considered, unlike when speech is collected. The collected audio signals may therefore be passed through a high-pass filter, and the sound collection beams MB11 to MB14 and MB21 to MB24 may be generated using only the strongly directional high-frequency band of about 1 kHz to 3 kHz.
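The filtering step mentioned in paragraph [0034] could, for example, be approximated by a simple first-order high-pass filter applied to each microphone signal before beam formation, as in the Python sketch below. The filter order, the discretized RC form, and the function name are assumptions; the embodiment states only that a high-pass filter is used and that a band of about 1 kHz to 3 kHz is retained. This sketch implements only the lower cutoff and does not limit the band at 3 kHz.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter (discretized RC circuit)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        # Pass rapid changes in the input; let slow components decay.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out
```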
[0035] In the present embodiment, four spots are formed on each of the front side and the rear side, but the number is not limited to this and any plural number of spots may be used.
[0036] In the sound collection beam selection unit 19, the speaker position detection unit 191 treats the highest-level signal among the eight spot audio signals collected by the eight sound collection beams MB11 to MB14 and MB21 to MB24 as the target audio signal (that is, speech from a conference participant rather than noise), detects the sound collection direction DS of that highest-level audio signal, and outputs the sound collection direction DS to the control unit 10.
The sound collection beam selection unit 19 also selects, from the two sound collection beams MB1 and MB2, the one that contains the sound collection direction DS, and outputs it as an audio signal MB0 to the echo cancellation unit 20 in the following stage.
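The selection logic of paragraph [0036] can be sketched in Python as follows. The data layout (dictionaries keyed by beam name), the function names, and the use of mean-square power as the signal level are assumptions for illustration; the embodiment does not state how the level is measured.

```python
def signal_level(samples):
    """Mean-square level of a block of samples (an assumed level measure)."""
    return sum(s * s for s in samples) / len(samples)


def select_beams(spot_beams, area_beams):
    """Pick the loudest spot beam and the area beam that contains its direction.

    spot_beams: dict mapping a spot name such as "MB13" to (area name, samples).
    area_beams: dict mapping an area name such as "MB1" to samples.
    Returns (sound collection direction DS, selected area samples MB0).
    """
    direction = max(spot_beams, key=lambda name: signal_level(spot_beams[name][1]))
    area_name = spot_beams[direction][0]
    return direction, area_beams[area_name]
```

In this sketch the spot name stands in for the sound collection direction DS that is passed to the control unit 10, and the returned samples stand in for the audio signal MB0 passed to the echo cancellation unit 20.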
[0037] エコーキャンセル部 20は、「入出力インタフェース 12から入力された音声信号がス ピー力 SP;!〜 SP16から放音され、この放音された音声信号がマイク MIC10;!〜 Ml CI 16、 MIC20;!〜 MIC216に回帰して再び入出力インタフェース 12から出力され る」というエコー現象を防ぐための機能部である。エコーキャンセル部 20は、適応型フ ィルタ 211を用いて上記経路の回帰音を推定し、推定した回帰音をマイクが収音した 音声信号から減算することによりエコーを抑制するものである。 [0037] The echo canceling unit 20 indicates that “the audio signal input from the input / output interface 12 is emitted from the sound power SP ;! to SP16, and the emitted audio signal is output from the microphone MIC10 ;! to Ml. It is a functional unit to prevent the echo phenomenon of “CI 16, MIC20;! ~ MIC216 and output from I / O interface 12 again”. The echo cancel unit 20 estimates the regression sound of the above path using the adaptive filter 211, and suppresses the echo by subtracting the estimated regression sound from the voice signal collected by the microphone.
具体的に、エコーキャンセル部 20は、適応型エコーキャンセラ 21を備えている。適 応型エコーキャンセラ 21は、適応型フィルタ 211とポストプロセッサ 212とを備えてい る。適応型フィルタ 211は、スピーカ SPに供給される音声信号に基づき、マイク MIC に回帰する音声信号成分を推定して擬似回帰音信号を生成する。ポストプロセッサ 2 12は、収音ビーム選択部 19が出力した音声信号 MB0から、入力音声信号 S1に対 する擬似回帰音信号を減算することによりエコー成分を除去する。この音声信号 MB 0からエコー成分を除去した音声信号は入出力インタフェース 12に入力される。  Specifically, the echo cancellation unit 20 includes an adaptive echo canceller 21. The adaptive echo canceller 21 includes an adaptive filter 211 and a post processor 212. The adaptive filter 211 estimates a sound signal component that returns to the microphone MIC based on the sound signal supplied to the speaker SP, and generates a pseudo-regression sound signal. The post processor 212 removes the echo component by subtracting the pseudo regression sound signal corresponding to the input sound signal S1 from the sound signal MB0 output by the sound collection beam selection unit 19. The audio signal obtained by removing the echo component from the audio signal MB 0 is input to the input / output interface 12.
[0038] This echo cancellation processing makes it possible to accurately predict and remove the audio signal that returns from the speakers SP to the microphones MIC, so that only the audio picked up by the microphones MIC, free of the returned component, is output from the input/output interface 12.
[0039] When the sound collection direction DS is input from the control unit 10, the camera control unit 22 controls the orientation of the imaging unit 71 of the camera 7 so that the sound collection direction DS becomes the center of the imaging direction. The camera 7 thus determines its imaging direction according to the sound collection direction DS input from the audio conference apparatus 1, which allows the speaker to be captured automatically. The data captured by the camera 7 is output to the video codec 92.
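As a rough illustration of centering the imaging direction on DS, the sketch below computes the shortest signed pan rotation from the current camera angle. The text does not describe the camera interface, so the angle convention (degrees, 0 to 360), the function `pan_delta`, and the class `CameraController` are assumptions made for this example.

```python
def pan_delta(current_deg: float, ds_deg: float) -> float:
    """Shortest signed rotation in degrees that centers the camera on ds_deg.

    The result lies in the range [-180, 180).
    """
    return (ds_deg - current_deg + 180.0) % 360.0 - 180.0


class CameraController:
    """Hypothetical stand-in for the camera control unit 22."""

    def __init__(self, pan_deg: float = 0.0) -> None:
        self.pan_deg = pan_deg

    def on_direction(self, ds_deg: float) -> float:
        """Re-center on a new sound collection direction; return the rotation applied."""
        delta = pan_delta(self.pan_deg, ds_deg)
        self.pan_deg = (self.pan_deg + delta) % 360.0
        return delta
```

Taking the shortest rotation means a camera at 350 degrees turns 20 degrees forward to reach 10 degrees rather than sweeping 340 degrees backward.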
[0040] The video codec 92 compresses the captured data input from the camera 7 and outputs it to the input/output interface 91. The video codec 92 also decompresses the video signal P1 input from the input/output interface 91 and outputs it to the display terminal 8.
[0041] The input/output interface 91 packetizes the captured data input from the video codec 92 and outputs it to the network 100. The input/output interface 91 also converts a video signal input from the network 100 into a bitstream digital video signal P1 and outputs it. The digital video signal P1 is supplied to the display terminal 8 via the video codec 92. More specifically, when a video signal is input via the network 100, the input/output interface 91 arranges the packetized video signal in time series and outputs it sequentially, thereby converting it into a bitstream that is output to the display terminal 8.
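The packetizing and time-series reassembly described above can be sketched as follows. The packet format is not given in the text; the `Packet` type with a sequence number `seq`, the fixed chunk size, and the functions `packetize` and `to_bitstream` are assumptions for illustration only.

```python
from typing import Iterable, List, NamedTuple


class Packet(NamedTuple):
    seq: int        # position of the packet in the time series (assumed field)
    payload: bytes  # slice of the compressed video data


def packetize(data: bytes, size: int) -> List[Packet]:
    """Split compressed data into numbered packets of at most `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [Packet(i // size, data[i:i + size]) for i in range(0, len(data), size)]


def to_bitstream(packets: Iterable[Packet]) -> bytes:
    """Arrange packets in time-series order and concatenate their payloads."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)
```

Sorting by sequence number restores the original order even if the network delivers packets out of order. Handling of lost or duplicated packets is outside the scope of this sketch.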
[0042] As described above, the audio conference system of this embodiment generates two different kinds of sound collection beams: one for picking up speech and one for detecting the speaker's position. Using the beam for picking up speech, the system effectively collects only the sound on the main speaker's side of the audio conference apparatus, without collecting sound from the opposite side, which makes the main speaker's speech clearer. Using the beam for position detection, the system identifies the position of the main speaker, so that the camera 7 can be pointed at and capture the main speaker. When the main speaker changes, the direction of the camera 7 can be switched automatically.
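One simple way to identify the main speaker's direction from the position-detection beams is to compare their signal levels and take the loudest. The sketch below uses the RMS level per beam; this criterion, the mapping from direction to samples, and the names `rms` and `detect_speaker_direction` are assumptions for illustration, not the method stated in the text.

```python
import math
from typing import Mapping, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speaker_direction(spot_beams: Mapping[float, Sequence[float]]) -> float:
    """Return the direction (degrees) of the narrow beam with the highest level.

    spot_beams maps each beam's direction to its current block of samples.
    """
    if not spot_beams:
        raise ValueError("at least one spot beam is required")
    return max(spot_beams, key=lambda direction: rms(spot_beams[direction]))
```

The returned direction would then serve as the sound collection direction DS passed to the camera control. A real system would likely smooth the decision over time to avoid switching on short noises.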
[0043] As shown in Fig. 6, the audio conference system of the present invention can also be used in a conference as a public-address device, without the video communication device 9. In this case, the camera 7 is connected to the audio conference apparatus 1, and the display terminal 8 is connected to the camera 7. The audio conference apparatus 1 amplifies the collected sound and emits it. The camera 7 determines its imaging direction according to the sound collection direction DS input from the audio conference apparatus 1, captures images, and generates captured data. The camera 7 outputs the captured data to the display terminal 8, which displays it. The speaker's speech is thereby amplified and emitted, while the main speaker is captured by the camera 7 and shown on the display terminal 8. Even in a conference held in a large conference room, participants can therefore easily hear what the speaker says. Because the conference can proceed with the main speaker shown on the display terminal 8, participants can also easily tell who the main speaker is.
[0044] The invention is not limited to this embodiment. As shown in Fig. 7, the sound collection beam selection unit 19 may combine the two sound collection beams MB1 and MB2 to generate the audio signal MB0, regardless of the sound collection direction of the audio signal, and output this audio signal MB0 to the downstream echo cancellation unit 20.
Because the audio signal MB0 is generated by combining the two sound collection beams MB1 and MB2, sound is collected over a wide range rather than only on the main speaker's side. The speech of all participants can thus be collected effectively while the main speaker is still reliably captured by the camera 7.
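The combining variant can be sketched as a sample-wise mix of the two beams. The text does not specify how MB1 and MB2 are combined; equal-gain averaging, the function name `mix_beams`, and the `gain` parameter are assumptions made for this example.

```python
from typing import List, Sequence


def mix_beams(mb1: Sequence[float], mb2: Sequence[float], gain: float = 0.5) -> List[float]:
    """Combine two beams sample by sample into a single signal (the role of MB0).

    A gain of 0.5 averages the beams so the mix cannot exceed the input range.
    """
    if len(mb1) != len(mb2):
        raise ValueError("beams must have the same number of samples")
    return [gain * (a + b) for a, b in zip(mb1, mb2)]
```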
[0045] Furthermore, the invention is not limited to this embodiment. As shown in Fig. 8, the audio conference apparatus 1 may be provided with an audio and video communication unit, through which a remote conference can be held with a counterpart audio conference apparatus. In this case, the data captured by the camera 7 and the audio data picked up by the microphones are output to the network 100 via the audio conference apparatus 1. A video signal input via the network 100 from another audio conference apparatus at a remote location is displayed on the display terminal 8 via the audio conference apparatus 1. Of the data transmitted to the other audio conference apparatus, the captured data is shot by the camera 7 with its imaging direction controlled to the sound collection direction that corresponds to the high-level audio signal detected by the plurality of narrow-range sound collection beams. The transmitted audio data is generated based on the wide-range sound collection beam that contains the sound collection direction detected by the narrow-range sound collection beams. In this case, the video signal input/output interface 91 may be integrated with the audio signal input/output interface 12 and connected to the network 100 via a common input/output connector 110.
In Fig. 8, a video communication unit is added to the audio conference apparatus 1 of Fig. 4, but this is not limiting; a video communication unit may likewise be added to the audio conference apparatus 1 of Fig. 7.

Claims

[1] An audio conference apparatus comprising:
a microphone array having a plurality of microphones arranged in a predetermined pattern;
an area sound collection beam forming unit that forms, based on a plurality of collected audio signals picked up by the respective microphones of the microphone array, a first sound collection beam for which a first sound collection range around the apparatus is set;
a spot sound collection beam forming unit that forms, based on the plurality of collected audio signals picked up by the respective microphones of the microphone array, a second sound collection beam for which a second sound collection range narrower than the first sound collection range is set; and
an imaging direction detection unit that detects a speaker direction from a plurality of second sound collection beams formed by the spot sound collection beam forming unit, and detects the speaker direction as an imaging direction.
[2] The audio conference apparatus according to claim 1, wherein the spot sound collection beam forming unit forms the sound collection beam using only a high-frequency component of the collected audio signals.
[3] The audio conference apparatus according to claim 1, further comprising:
a communication unit that is connected to another audio conference apparatus via a network and communicates with the other audio conference apparatus; and
a control unit that generates audio data based on the first sound collection beam formed by the area sound collection beam forming unit, and transmits the audio data to the other audio conference apparatus via the communication unit.
[4] An audio conference system comprising:
a microphone array having a plurality of microphones arranged in a predetermined pattern;
an area sound collection beam forming unit that forms, based on a plurality of collected audio signals picked up by the respective microphones of the microphone array, a first sound collection beam for which a first sound collection range around the apparatus is set;
a spot sound collection beam forming unit that forms, based on the plurality of collected audio signals picked up by the respective microphones of the microphone array, a second sound collection beam for which a second sound collection range narrower than the first sound collection range is set;
an imaging direction detection unit that detects a speaker direction from a plurality of second sound collection beams formed by the spot sound collection beam forming unit, and detects the speaker direction as an imaging direction; and
an imaging unit that captures images in the imaging direction detected by the imaging direction detection unit of the audio conference apparatus and generates video data.
[5] The audio conference system according to claim 4, wherein the spot sound collection beam forming unit forms the sound collection beam using only a high-frequency component of the collected audio signals.
[6] The audio conference system according to claim 4, further comprising:
a communication unit that is connected to another audio conference apparatus via a network and communicates with the other audio conference apparatus; and
a control unit that generates audio data based on the first sound collection beam formed by the area sound collection beam forming unit, and transmits the audio data to the other audio conference apparatus via the communication unit.
PCT/JP2007/070195 2006-10-17 2007-10-16 Voice conference device and voice conference system WO2008047804A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007800321284A CN101513056B (en) 2006-10-17 2007-10-16 Audio conference apparatus and audio conference system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2006282565A JP5028944B2 (en) 2006-10-17 2006-10-17 Audio conference device and audio conference system
JP2006-282565 2006-10-17

Publications (1)

Publication Number Publication Date
WO2008047804A1 true WO2008047804A1 (en) 2008-04-24

Family

ID=39314031

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/070195 WO2008047804A1 (en) 2006-10-17 2007-10-16 Voice conference device and voice conference system

Country Status (3)

Country Link
JP (1) JP5028944B2 (en)
CN (1) CN101513056B (en)
WO (1) WO2008047804A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9615059B2 (en) 2015-07-28 2017-04-04 Ricoh Company, Ltd. Imaging apparatus, medium, and method for imaging
CN111866421A (en) * 2019-04-30 2020-10-30 陈筱涵 Conference recording system and conference recording method

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100970609B1 (en) * 2008-12-01 2010-07-16 박철우 camera place control unit with sensing the sound
CN102742261A (en) * 2010-05-24 2012-10-17 联发科技(新加坡)私人有限公司 Method for generating multimedia data to be displayed on display apparatus and associated multimedia player
CN102404663A (en) * 2010-09-10 2012-04-04 中兴通讯股份有限公司 Microphone array device, conference system and intelligent terminal
US9565493B2 (en) 2015-04-30 2017-02-07 Shure Acquisition Holdings, Inc. Array microphone system and method of assembling the same
US9554207B2 (en) 2015-04-30 2017-01-24 Shure Acquisition Holdings, Inc. Offset cartridge microphones
JP6547496B2 (en) 2015-08-03 2019-07-24 株式会社リコー Communication apparatus, communication method, program and communication system
JP6631166B2 (en) * 2015-08-03 2020-01-15 株式会社リコー Imaging device, program, and imaging method
JP6551155B2 (en) 2015-10-28 2019-07-31 株式会社リコー Communication system, communication apparatus, communication method and program
CN106911484A (en) * 2015-12-23 2017-06-30 卡讯电子股份有限公司 Microphone speech system control method
CN106101885A (en) * 2016-08-05 2016-11-09 上海柏莱特视听设备服务有限公司 Meeting mike
US10367948B2 (en) 2017-01-13 2019-07-30 Shure Acquisition Holdings, Inc. Post-mixing acoustic echo cancellation systems and methods
CN112335261B (en) 2018-06-01 2023-07-18 舒尔获得控股公司 Patterned microphone array
US11297423B2 (en) 2018-06-15 2022-04-05 Shure Acquisition Holdings, Inc. Endfire linear array microphone
US11310596B2 (en) 2018-09-20 2022-04-19 Shure Acquisition Holdings, Inc. Adjustable lobe shape for array microphones
JP2022526761A (en) 2019-03-21 2022-05-26 シュアー アクイジッション ホールディングス インコーポレイテッド Beam forming with blocking function Automatic focusing, intra-regional focusing, and automatic placement of microphone lobes
WO2020191354A1 (en) 2019-03-21 2020-09-24 Shure Acquisition Holdings, Inc. Housings and associated design features for ceiling array microphones
US11558693B2 (en) 2019-03-21 2023-01-17 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality
CN114051738A (en) 2019-05-23 2022-02-15 舒尔获得控股公司 Steerable speaker array, system and method thereof
TW202105369A (en) 2019-05-31 2021-02-01 美商舒爾獲得控股公司 Low latency automixer integrated with voice and noise activity detection
US11297426B2 (en) 2019-08-23 2022-04-05 Shure Acquisition Holdings, Inc. One-dimensional array microphone with improved directivity
US11552611B2 (en) 2020-02-07 2023-01-10 Shure Acquisition Holdings, Inc. System and method for automatic adjustment of reference gain
WO2021243368A2 (en) 2020-05-29 2021-12-02 Shure Acquisition Holdings, Inc. Transducer steering and configuration systems and methods using a local positioning system
CN116918351A (en) 2021-01-28 2023-10-20 舒尔获得控股公司 Hybrid Audio Beamforming System

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09163334A (en) * 1995-12-14 1997-06-20 Fujitsu Ltd Speaker detection circuit and video conference system
JPH10191290A (en) * 1996-12-27 1998-07-21 Kyocera Corp Video camera with built-in microphone

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10145763A (en) * 1996-11-15 1998-05-29 Mitsubishi Electric Corp Conference system
JP2002186084A (en) * 2000-12-14 2002-06-28 Matsushita Electric Ind Co Ltd Directive sound pickup device, sound source direction estimating device and system
JP3739673B2 (en) * 2001-06-22 2006-01-25 日本電信電話株式会社 Zoom estimation method, apparatus, zoom estimation program, and recording medium recording the program
CN1411278A (en) * 2002-11-25 2003-04-16 北京邮电通信设备厂 IP network TV conference system
JP4138680B2 (en) * 2004-02-27 2008-08-27 株式会社東芝 Acoustic signal processing apparatus, acoustic signal processing method, and adjustment method
CN2701199Y (en) * 2004-06-18 2005-05-18 陈荣 Desktop automatic controlled video-audio conference control device

Also Published As

Publication number Publication date
CN101513056B (en) 2011-12-14
JP2008103824A (en) 2008-05-01
JP5028944B2 (en) 2012-09-19
CN101513056A (en) 2009-08-19

Similar Documents

Publication Publication Date Title
JP5028944B2 (en) Audio conference device and audio conference system
EP2007168B1 (en) Voice conference device
JP2008288785A (en) Video conference apparatus
JP2007274463A (en) Remote conference apparatus
JP4816221B2 (en) Sound pickup device and audio conference device
JP4747949B2 (en) Audio conferencing equipment
JP2008312002A (en) Television conference apparatus
WO2007058130A1 (en) Teleconference device and sound emission/collection device
JP2009094682A (en) Audio processing system
JP2012147420A (en) Image processing device and image processing system
JP2007318550A (en) Sound emission/pickup apparatus
JP2007274462A (en) Video conference apparatus and video conference system
JP2005184386A (en) Sound collecting/video recording device
JP2008294690A (en) Voice conference device and voice conference system
JP2009212927A (en) Sound collecting apparatus
JP4479227B2 (en) Audio pickup / video imaging apparatus and imaging condition determination method
JP5028833B2 (en) Sound emission and collection device
JP5055987B2 (en) Audio conference device and audio conference system
JP2008017126A (en) Voice conference system
JP4929673B2 (en) Audio conferencing equipment
JP2007318521A (en) Sound emission/pickup apparatus
JP2009010808A (en) Loudspeaker device

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 200780032128.4

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07829929

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07829929

Country of ref document: EP

Kind code of ref document: A1