JP5598394B2

JP5598394B2 - Conference terminal device, conference terminal control method, and conference terminal control program

Info

Publication number: JP5598394B2
Application number: JP2011063692A
Authority: JP
Inventors: 愛秦
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2011-03-23
Filing date: 2011-03-23
Publication date: 2014-10-01
Anticipated expiration: 2031-03-23
Also published as: WO2012128033A1; JP2012199851A

Description

The present invention relates to a conference terminal device, a conference terminal control method, and a conference terminal control program, and more specifically to a conference terminal device, control method, and control program that can set the output volume to suit both the positional relationship with the user and the usage situation.

Conventionally, video conference apparatuses are known that connect multiple sites via a network and bidirectionally transmit and receive the captured images and audio information acquired at each site, thereby realizing a conference between people at remote locations.

Furthermore, a technique has been developed that calculates the distance to a conference listener from an image captured by a camera and outputs audio from a loudspeaker at a volume corresponding to that distance (Patent Document 1). With this technique, audio is output at a high volume when the listener is far from the loudspeaker and at a low volume when the listener is near, so the listener can hear the loudspeaker output at an optimum volume from any position.

Patent Document 1: JP 2009-218950 A

However, although the above technique outputs audio at a volume suited to the distance between the loudspeaker and the listener, it does not take environmental noise into account. For example, when the noise around the listener is loud, the audio is hard for the listener to hear even if it is output at the volume determined from the distance between the listener and the loudspeaker.

Accordingly, an object of the present invention is to provide a conference terminal device, a conference terminal control method, and a conference terminal control program that can set the output volume to suit both the positional relationship with the user and the usage situation.

To achieve the above object, the conference terminal device according to claim 1 is a conference terminal device that transmits and receives captured images or audio information, via communication means, to and from an information processing device that processes various kinds of information. It comprises: audio information input means for inputting, as audio information, the voice of a user at the site where the conference terminal device is installed; audio information output means for receiving and outputting audio information of another site transmitted from the information processing device; measuring means for measuring distance information to the user; volume information detecting means for detecting the volume of the audio information produced by the user's speech and input through the audio information input means; and output volume information control means for controlling the volume of the other site's audio information output from the audio information output means, based on the volume information detected by the volume information detecting means and the distance information measured by the measuring means.

The conference terminal device according to claim 2, in addition to the configuration of claim 1, further comprises speaker identifying means for identifying, as the speaker, the user who uttered the audio information detected by the volume information detecting means. The measuring means measures the distance to the speaker identified by the speaker identifying means as speaker distance information, and the output volume information control means controls the volume of the other site's audio information output from the audio information output means, based on the speaker distance information and the volume information detected by the volume information detecting means.

The conference terminal device according to claim 3, in addition to the configuration of claim 2, further comprises utterance position volume calculating means for calculating, from the volume information detected by the volume information detecting means and based on the speaker distance information, the volume information of the audio information at a position a predetermined distance from the position where the speaker spoke. The output volume information control means controls the volume of the other site's audio information output from the audio information output means so that it is equivalent to the volume information calculated by the utterance position volume calculating means.

The conference terminal device according to claim 4, in addition to the configuration of claim 3, further comprises elapsed time determining means for determining whether a predetermined time has elapsed since the speaker identifying means identified the user who spoke. While the elapsed time determining means determines that the predetermined time has not elapsed, the output volume information control means controls the volume of the other site's audio information output from the audio information output means, based on the speaker distance information and the volume information detected by the volume information detecting means.

The conference terminal device according to claim 5, in addition to the configuration of any one of claims 1 to 4, further comprises imaging information acquiring means for acquiring a captured image of the user as imaging information. The measuring means measures the distance information to the user based on the imaging information acquired by the imaging information acquiring means.

In the conference terminal device according to claim 6, in addition to the configuration of claim 5, when the measuring means measures the distances to a plurality of users based on the imaging information acquired by the imaging information acquiring means, it determines the largest distance information. The output volume information control means specifies a minimum volume for the other site's audio information output from the audio information output means, based on the largest distance information determined by the measuring means, and performs control so that the audio is output at or above that minimum volume.

The conference terminal control method according to claim 7 is a method performed in a conference terminal device that transmits and receives captured images and audio information, via communication means, to and from an information processing device that processes various kinds of information. The method comprises: an audio information input step of inputting, as audio information, the voice of a user at the site where the conference terminal device is installed; an audio information output step of receiving and outputting audio information of another site transmitted from the information processing device; a measuring step of measuring distance information to the user; a volume information detecting step of detecting the volume of the audio information produced by the user's speech in the audio information input step; and an output volume information control step of controlling the volume of the other site's audio information output in the audio information output step, based on the volume information detected in the volume information detecting step and the distance information measured in the measuring step.

The conference terminal control program according to claim 8 is a program executed by a conference terminal device that transmits and receives captured images and audio information, via communication means, to and from an information processing device that processes various kinds of information. The program causes the conference terminal device to execute: an audio information input step of inputting, as audio information, the voice of a user at the site where the conference terminal device is installed; an audio information output step of receiving and outputting audio information of another site transmitted from the information processing device; a measuring step of measuring distance information to the user; a volume information detecting step of detecting the volume of the audio information produced by the user's speech in the audio information input step; and an output volume information control step of controlling the volume of the other site's audio information output in the audio information output step, based on the volume information detected in the volume information detecting step and the distance information measured in the measuring step.

The conference terminal device according to claim 1 transmits and receives captured images or audio information, via communication means, to and from an information processing device that processes various kinds of information. The conference terminal device inputs the voice of a user at the site where it is installed as audio information. It receives and outputs the audio information of another site transmitted from the information processing device. It also measures distance information to the user. It detects the volume of the input audio information produced by the user's speech and, based on the detected volume information and the measured distance information, controls the volume of the other site's audio information to be output. The conference terminal device installed at a site thus measures the distance to a user participating in the conference and detects the volume of the user's voice, so it can output the other site's audio information at an optimum volume based on the distance to the user and the user's speaking volume. Because the user speaks at a volume that already takes the surrounding noise into account, the voices of the conference participants at the other site are naturally output from the conference terminal device at an appropriate volume.

The conference terminal device according to claim 2, in addition to the effect of the configuration of claim 1, identifies the user who uttered the detected audio information as the speaker. It measures the distance to the identified speaker as speaker distance information and controls the volume of the other site's audio information to be output, based on the speaker distance information and the detected volume information. The conference terminal device can thus identify, among the users at the site, the user who spoke, and output the audio information from the other site at a volume controlled according to the distance to that speaker and the speaker's volume. The voices of the conference participants at the other site are therefore output from the conference terminal device at a volume suited to the user who spoke.

The conference terminal device according to claim 3, in addition to the effect of the configuration of claim 2, calculates from the detected volume information, based on the speaker distance information, the volume information of the audio information at a position a predetermined distance from the position where the speaker spoke. It then controls the volume of the other site's audio information to be output so that it is equivalent to the calculated volume information. The voices of the conference participants at the other site are thus output from the conference terminal device at a volume equivalent to what would be heard at the predetermined distance from the speaking position, so the speaker and the person on the other side of the network can converse at a natural volume, as if they were talking near each other.
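The calculation of the volume at a predetermined distance from the speaker can be sketched as follows. This is a minimal illustration, not the method disclosed in the patent: it assumes free-field inverse-square attenuation (about 6 dB per doubling of distance), levels expressed in dB, and a hypothetical function name `level_at_reference_distance`. The text above does not say which attenuation model the device uses.

```python
import math


def level_at_reference_distance(measured_db: float,
                                speaker_distance_m: float,
                                reference_distance_m: float = 1.0) -> float:
    """Estimate the speech level at reference_distance_m from the speaker.

    measured_db is the level detected at the microphone, which is assumed
    to sit speaker_distance_m away from the person speaking.
    Assumption: free-field inverse-square attenuation (hypothetical model).
    """
    if speaker_distance_m <= 0 or reference_distance_m <= 0:
        raise ValueError("distances must be positive")
    # Moving closer to the source raises the level by 20*log10(ratio) dB.
    return measured_db + 20.0 * math.log10(speaker_distance_m / reference_distance_m)


# Speech measured at 54 dB from 4 m away corresponds to about 66 dB at 1 m.
target_db = level_at_reference_distance(54.0, 4.0, reference_distance_m=1.0)
```

Under these assumptions, `target_db` is the level that the other site's audio would be matched to.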

The conference terminal device according to claim 4, in addition to the effect of the configuration of claim 3, determines whether a predetermined time has elapsed since the user who spoke was identified. While it determines that the predetermined time has not elapsed, it controls the volume of the other site's audio information to be output, based on the speaker distance information and the detected volume information. Therefore, once the user who spoke has been identified, the voices of the conference participants at the other site continue to be output from the conference terminal device at the same volume for the predetermined time, even after the utterance has ended.
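The hold behaviour described above can be sketched as a small timer object. This is an illustrative sketch only: the class name `SpeakerHold`, the `hold_seconds` parameter, and the fallback level used after the time has elapsed are all hypothetical, since the text above does not state what the device does once the predetermined time expires. Timestamps are passed in explicitly so the logic is easy to test.

```python
from typing import Optional


class SpeakerHold:
    """Keep the speaker-derived output level for a fixed time (hypothetical)."""

    def __init__(self, hold_seconds: float) -> None:
        self.hold_seconds = hold_seconds
        self._identified_at: Optional[float] = None
        self._held_level_db: Optional[float] = None

    def on_speaker_identified(self, level_db: float, now: float) -> None:
        # Record when the speaker was identified and the level derived for them.
        self._identified_at = now
        self._held_level_db = level_db

    def current_level(self, fallback_db: float, now: float) -> float:
        # While the predetermined time has not elapsed, keep the held level.
        if (self._identified_at is not None
                and self._held_level_db is not None
                and now - self._identified_at < self.hold_seconds):
            return self._held_level_db
        # Assumption: after the hold expires, fall back to some other level.
        return fallback_db


hold = SpeakerHold(hold_seconds=5.0)
hold.on_speaker_identified(level_db=66.0, now=100.0)
level_during_hold = hold.current_level(fallback_db=50.0, now=103.0)
level_after_hold = hold.current_level(fallback_db=50.0, now=105.0)
```

In a real device the `now` values would come from a monotonic clock such as `time.monotonic()`.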

The conference terminal device according to claim 5, in addition to the effect of the configuration of any one of claims 1 to 4, acquires a captured image of the user as imaging information and measures the distance information to the user based on the acquired imaging information. The distance between the conference terminal device and the user can therefore be measured accurately by analyzing the captured image, without any special measuring equipment.
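One common way to estimate distance from a single captured image is the pinhole camera model, sketched below. This is a stand-in for illustration and is not taken from the patent: the function name `estimate_distance_m`, the focal length in pixels, and the assumed real face width of 0.16 m are all hypothetical values, and the device described here may derive distance in a different way.

```python
def estimate_distance_m(face_width_px: float,
                        focal_length_px: float,
                        assumed_face_width_m: float = 0.16) -> float:
    """Estimate camera-to-user distance with the pinhole camera model.

    distance = focal_length_px * real_width_m / width_in_image_px
    All parameter values are illustrative assumptions.
    """
    if face_width_px <= 0 or focal_length_px <= 0 or assumed_face_width_m <= 0:
        raise ValueError("all inputs must be positive")
    return focal_length_px * assumed_face_width_m / face_width_px


# A face 80 px wide seen through a lens with a 1000 px focal length.
distance_m = estimate_distance_m(face_width_px=80.0, focal_length_px=1000.0)
```

The smaller the face appears in the image, the larger the estimated distance, which is the only property the volume control relies on.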

The conference terminal device according to claim 6, in addition to the effect of the configuration of claim 5, determines the largest distance information when it has measured the distances to a plurality of users based on the acquired imaging information. Based on that largest distance information, it specifies a minimum volume for the other site's audio information to be output and performs control so that the audio is output at or above the minimum volume. The voices of the conference participants at the other site can thus be output at or above the volume suited to the user located farthest from the conference terminal device among the users participating in the conference. Users can therefore reliably hear the other site's audio at or above the minimum audible volume wherever they are positioned.
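The minimum-volume rule can be sketched as a floor applied to whatever level the controller requests. This is a hedged illustration: the names `minimum_output_level_db` and `apply_floor`, the 50 dB audibility target at the listener, and the inverse-square attenuation model are assumptions, not values from the patent. Output levels are taken to be referenced at 1 m from the loudspeaker.

```python
import math
from typing import Sequence


def minimum_output_level_db(distances_m: Sequence[float],
                            audible_db_at_listener: float = 50.0,
                            reference_distance_m: float = 1.0) -> float:
    """Lowest output level that still reaches the farthest user audibly.

    Assumption: free-field attenuation and a hypothetical audibility target.
    """
    if not distances_m:
        raise ValueError("at least one measured distance is required")
    farthest_m = max(distances_m)  # the largest distance information
    if farthest_m <= 0:
        raise ValueError("distances must be positive")
    return audible_db_at_listener + 20.0 * math.log10(farthest_m / reference_distance_m)


def apply_floor(requested_db: float, distances_m: Sequence[float]) -> float:
    """Output at the requested level, but never below the minimum volume."""
    return max(requested_db, minimum_output_level_db(distances_m))


user_distances_m = [1.0, 2.0, 4.0]
raised = apply_floor(55.0, user_distances_m)     # below the floor, so raised
unchanged = apply_floor(70.0, user_distances_m)  # already above the floor
```

Using `max` keeps the speaker-derived level whenever it is already loud enough and only intervenes when the farthest user would otherwise be unable to hear.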

The conference terminal control method according to claim 7 transmits and receives captured images or audio information, via communication means, to and from an information processing device that processes various kinds of information. The voice of a user at the site where the conference terminal device is installed is input as audio information. The audio information of another site transmitted from the information processing device is received and output. Distance information to the user is also measured. The volume of the input audio information produced by the user's speech is detected, and the volume of the other site's audio information to be output is controlled based on the detected volume information and the measured distance information. The conference terminal device installed at a site thus measures the distance to a user participating in the conference and detects the volume of the user's voice, so it can output the other site's audio information at an optimum volume based on the distance to the user and the user's speaking volume. Because the user speaks at a volume that already takes the surrounding noise into account, the voices of the conference participants at the other site are naturally output from the conference terminal device at an appropriate volume.

The conference terminal control program according to claim 8 transmits and receives captured images or audio information, via communication means, to and from an information processing device that processes various kinds of information. The voice of a user at the site where the conference terminal device is installed is input as audio information. The audio information of another site transmitted from the information processing device is received and output. Distance information to the user is also measured. The volume of the input audio information produced by the user's speech is detected, and the volume of the other site's audio information to be output is controlled based on the detected volume information and the measured distance information. The conference terminal device installed at a site thus measures the distance to a user participating in the conference and detects the volume of the user's voice, so it can output the other site's audio information at an optimum volume based on the distance to the user and the user's speaking volume. Because the user speaks at a volume that already takes the surrounding noise into account, the voices of the conference participants at the other site are naturally output from the conference terminal device at an appropriate volume.

FIG. 1 is a schematic configuration diagram of the video conference system 100.
FIG. 2 is an electrical configuration diagram of the video conference system 100.
FIG. 3 is an explanatory diagram of the configuration of the conference terminal device 1.
FIG. 4 is an explanatory diagram of the distance calculation database 1152A.
FIG. 5 is an explanatory diagram of the user distance information database 1152B.
FIG. 6 is an explanatory diagram of the volume database 1152C.
FIG. 7 is a flowchart of the main process executed by the CPU 111.
FIG. 8 is a flowchart of the audio detection process executed by the CPU 111.
FIG. 9 is a flowchart of the output volume change process executed by the CPU 111.

An embodiment of a conference terminal device embodying the present invention is described below with reference to the drawings. The drawings are used to explain technical features that the present invention may adopt. The device configurations, flowcharts of the various processes, and so on shown in them are merely illustrative examples and are not intended to limit the invention.

First, the schematic configuration of the video conference system 100 of this embodiment, and the electrical configuration of its components, the conference terminal device 1 and the communication terminal device 120, are described in order with reference to FIGS. 1 and 2. As shown in FIG. 1, the video conference system 100 includes a conference terminal device 1 and a communication terminal device 120 provided at each site (hereinafter referred to as 10A, 10B, and 10C), all of which are connected to a network 200. The network 200 may be, for example, an IP (Internet Protocol) network or an ISDN (Integrated Services Digital Network) network. Although FIG. 1 shows only three sites, the number of sites is not limited to three. The sites may, for example, be located within the same premises of one company, or be spread across different offices, regions, or countries.

The communication terminal device 120 is a general-purpose device, for example a well-known personal computer. It has, among other things, a communication function for holding a video conference by transmitting and receiving users' captured images and audio information between multiple sites connected via the network 200. The conference terminal device 1 includes a camera that acquires captured images of the video conference users, a microphone that acquires audio information, and a speaker that outputs audio information. When electrically connected to the communication terminal device 120, the conference terminal device 1 functions as equipment for holding a video conference with other sites via the network 200. The conference terminal device 1 and the communication terminal device 120 are described in detail later.

A user who wishes to join a video conference operates one of the communication terminal devices 120 and logs in to the video conference by registering ID information and the like. While the user is participating in the video conference, the users taking part are imaged by, for example, the camera of the conference terminal device 1 (see FIG. 3). The image thus captured (hereinafter "captured image") is transmitted from the conference terminal device 1 to the communication terminal device 120 and then on to the communication terminal devices 120 at the other sites. A communication terminal device 120 at another site that receives the captured image displays it on the display device 125 together with the image captured at its own site. When there are multiple captured images, each is displayed in its own window. Users participating in the video conference can thereby share and view the captured images of the participants at every site.

Next, the electrical configuration of the conference terminal device 1 and the communication terminal device 120 is described with reference to FIG. 2. The conference terminal device 1 includes a CPU 111, and a ROM 112 and a RAM 113 each connected to the CPU 111. An input/output (I/O) interface 114 is also connected to the CPU 111. Connected to the I/O interface 114 are a camera 6, a microphone 4, a speaker 10, an image input processing unit 106, a sound input processing unit 104, a sound output processing unit 110, a storage device 115, and an external connection device 116.

The CPU 111 controls the conference terminal device 1 as a whole. The ROM 112 stores various programs for operating the conference terminal device 1, including the BIOS, and their setting values. The CPU 111 controls the operation of the conference terminal device 1 according to the programs stored in the ROM 112 and in the storage device 115 described later. The RAM 113 is a storage device for temporarily storing various data.

The camera 6, the microphone 4, and the speaker 10 are connected to the image input processing unit 106, the sound input processing unit 104, and the sound output processing unit 110, respectively. The camera 6 is an imaging device that captures images of the site where the conference terminal device 1 is installed, and the image input processing unit 106 processes the captured images input from the camera 6. The microphone 4 acquires sound information at the site where the conference terminal device 1 is installed, and the sound input processing unit 104 processes the sound information input from the microphone 4. The speaker 10 outputs the sound information of another site transmitted from the device at that site, and the sound output processing unit 110 performs output processing of the sound information to the speaker 10 according to the output conditions.

The external connection device 116 is a general-purpose USB device or the like. It is a communication device that connects to the communication terminal device 120 to transmit and receive captured images and sound information.

The storage device 115 has multiple storage areas, including a program information storage area 1151 and a user information storage area 1152. The program information storage area 1151 stores various programs (not shown in detail) that cause the conference terminal device 1 to perform functions such as transmitting and receiving captured images and sound information to and from the communication terminal device 120. The user information storage area 1152 stores, as described in detail later, the distance calculation database 1152A shown in FIG. 4, the user distance information database 1152B shown in FIG. 5, the volume database 1152C shown in FIG. 6, and so on.

Next, the electrical configuration of the communication terminal device 120 will be described. The communication terminal device 120 includes a CPU 121, and a ROM 122 and a RAM 123 that are each connected to the CPU 121. An input/output (I/O) interface 124 is also connected to the CPU 121. A display device 125, an image output processing unit 126, an input device 127, a communication device 133, a storage device 135, an external connection device 150, and a CD-ROM drive 140 are connected to the I/O interface 124.

The CPU 121 governs overall control of the communication terminal device 120. The ROM 122 stores various programs for operating the communication terminal device 120, including the BIOS, together with their setting values. The CPU 121 controls the operation of the communication terminal device 120 in accordance with programs stored in the ROM 122 and in the storage device 135 described later. The RAM 123 is a storage device for temporarily storing various data.

The display device 125 is connected to the image output processing unit 126. The display device 125 displays captured images acquired from other sites, captured images taken at the local site, and the like; although not illustrated, it is a general-purpose display such as a liquid crystal monitor or a projector. The image output processing unit 126 processes the output of captured images to the display device 125. The input device 127 is used by the user to enter information into the communication terminal device 120, and is a general-purpose input device such as a keyboard or a pen tablet.

The storage device 135 includes a plurality of storage areas, including a program information storage area 1351. Although not shown in detail, the storage device 135 stores various programs that cause the communication terminal device 120 to run a video conference application, which conducts a video conference by transmitting and receiving captured images and sound information to and from the conference terminal device 1 and the communication terminal devices 120 at other sites.

The CD-ROM drive 140 reads data recorded on a CD-ROM 141. The CD-ROM 141 stores the video conference application and other programs used to conduct a video conference on the communication terminal device 120. When the CD-ROM 141 is installed, these programs are copied from the CD-ROM 141 into the program information storage area 1351. The external connection device 150 is a general-purpose USB device or the like, and is a communication device that transmits and receives captured images and sound information when connected to the conference terminal device 1. The communication device 133 connects to the network 200 and transmits and receives various data, such as captured images and sound information, to and from other communication terminal devices 120.

Next, the external configuration of the conference terminal device 1 according to the present embodiment will be described with reference to FIG. 3. As shown in FIG. 3, the conference terminal device 1 includes a first housing 2 and a second housing 3. The first housing 2 and the second housing 3 are connected via a rotation shaft 5 and can be folded by rotating about the rotation shaft 5. A camera 6, rotatably supported on the rotation shaft 5, is provided at the upper center of the conference terminal device 1. A lens unit 7 is provided at the center of the camera 6. With the rotation shaft 5 facing upward and the first housing 2 and the second housing 3 opened to the maximum angle, the conference terminal device 1 can be placed on a horizontal surface such as a desk and stand on its own. Installation of the conference terminal device 1 is completed by rotating the camera 6 while the device is standing and pointing the lens unit 7 at the person to be imaged.

Next, the structure of the first housing 2 will be described. As shown in FIG. 3, the first housing 2 is formed in a vertically long, substantially rectangular parallelepiped shape. Various electronic components such as an electronic board are housed inside the first housing 2. The first housing 2 includes a front portion 21, a right side portion 22, a left side portion 23, an upper surface portion 24, a bottom portion 25, and a back portion 26. The speaker 10 is provided at the approximate center of the front portion 21. The speaker 10 outputs sound information and the like transmitted from the communication terminal device 120 at another site. Microphones 4 (11, 12) are provided on the left and right sides of the lower part of the front portion 21. The microphones 4 acquire sound information from users at the local site and output it to the connected communication terminal device 120 at the local site.

The upper surface portion 24 is formed in a semicircular arc extending rearward from the top of the front portion 21. Its terminal end is located at the uppermost part of the upper surface portion 24 and is cut in a direction orthogonal to the rotation shaft 5. This terminal end is provided with a locked portion 56, with which a locking portion 66 (described later) provided at the upper end of the second housing 3 engages when the first housing 2 and the second housing 3 are opened. As shown in FIG. 3, engagement of the locking portion 66 with the locked portion 56 limits the rotation of the first housing 2 and the second housing 3.

A recess 14, cut out in a substantially rectangular shape, is provided at the longitudinal center of the upper surface portion 24. The camera 6, rotatably supported about the rotation shaft 5, is provided inside the recess 14. When the camera 6 is used, it is rotated to expose the lens unit 7 to the outside, as shown in FIG. 3. When the camera 6 is not used, it is rotated to house the lens unit 7 inside the recess 14.

A + (plus) button 51 for raising the volume of the speaker 10 and a − (minus) button 52 for lowering it are arranged one above the other at approximately the longitudinal center of the right side portion 22. A microphone prohibition button 53, which temporarily disables sound input from the microphones 4, is provided on the lower part of the right side portion 22.

A stepped portion 22A, one step lower than the right side portion 22, is provided along the edge of the right side portion 22 on the back portion 26 side. The stepped portion 22A is the part that is covered by a side wall portion 32 (described later) of the second housing 3 when the conference terminal device 1 is closed.

The left side portion 23 is provided with a camera interruption button (not shown) that temporarily suspends image capture by the camera 6, and with an external connection device unit (not shown), such as a general-purpose USB connector, for attaching the communication wiring used to input and output captured images and sound information to and from the communication terminal device 120.

The bottom portion 25 consists of a non-grounding portion 27 on the back portion 26 side and a grounding portion 28 on the front portion 21 side. The grounding portion 28 protrudes below the non-grounding portion 27 and is therefore the part that rests on a horizontal surface such as a desk. As shown in FIG. 3, the grounding portion 28 is inclined obliquely upward at a predetermined angle from the front portion 21 side toward the back portion 26 side. The inclined surface of the grounding portion 28 is formed so that it becomes horizontal when the conference terminal device 1 is opened to the maximum angle and stood on a horizontal surface such as a desk.

Next, the structure of the second housing 3 will be described. As shown in FIG. 3, the second housing 3 is formed as a lid with a concave cross section. The second housing 3 includes a main body portion 31, which is a thin plate. The upper end of the main body portion 31 is rotatably connected to the rotation shaft 5. Rib-shaped side wall portions 32 that protrude toward the first housing 2 are provided at the left and right ends of the main body portion 31 (the side wall portion at the left end is not shown). In addition, a rib-shaped grounding portion 33, which protrudes toward the first housing 2 and rests on the desk, is provided at the lower end of the main body portion 31. A hollow portion 37, which fits over the back portion 26 side of the first housing 2, is formed in the space enclosed by the main body portion 31, the side wall portions 32, and the grounding portion 33.

Next, the distance calculation database 1152A stored in the user information storage area 1152 will be described with reference to FIG. 4. The distance calculation database 1152A associates two items: the size, measured in pixels, of a person's face region detected in the image captured by the camera 6, and the distance to the person that corresponds to that size. Several methods for detecting a human face in a captured image have already been devised. For example, pattern data representing the contours of the eyes, nose, and mouth of a human face may be stored in advance, and the presence or absence of a face may be determined by whether a contour shape similar to the pattern can be detected in the captured image. For the size item, the size of the detected face region is measured; it may be measured, for example, from the color distribution of the detected face region, although the method is not limited to this. In the distance calculation database 1152A shown in FIG. 4, a detected size of 60, for example, is associated with a distance of 2.5 m between the person and the device, and a detected size of 120 is associated with a distance of 1.0 m. Because the relationship between face size and distance in the distance calculation database 1152A depends on the imaging performance of the camera, its settings, and similar factors, the distance to the user can be measured more accurately if that relationship is measured in advance and the database is updated accordingly.
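The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the names `DISTANCE_TABLE` and `estimate_distance` are hypothetical, only the two rows (60 px, 2.5 m) and (120 px, 1.0 m) come from the description, and linear interpolation between rows is an assumption added here; the description itself only states that a size is associated with a distance.

```python
from bisect import bisect_left

# Hypothetical calibration table: (face size in pixels, distance in metres).
# Only these two rows appear in the description; a real table would be
# measured for the specific camera and settings.
DISTANCE_TABLE = [(60, 2.5), (120, 1.0)]


def estimate_distance(face_size_px, table=DISTANCE_TABLE):
    """Return a distance for a face size, interpolating between table rows.

    Sizes outside the table are clamped to the nearest row. Interpolation is
    an assumption of this sketch, not something the description specifies.
    """
    rows = sorted(table)
    sizes = [size for size, _ in rows]
    if face_size_px <= sizes[0]:
        return rows[0][1]
    if face_size_px >= sizes[-1]:
        return rows[-1][1]
    i = bisect_left(sizes, face_size_px)
    (s0, d0), (s1, d1) = rows[i - 1], rows[i]
    t = (face_size_px - s0) / (s1 - s0)
    return d0 + t * (d1 - d0)
```

Re-measuring the table for a different camera, as the paragraph above suggests, would amount to replacing `DISTANCE_TABLE` with newly measured rows.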

Next, the user distance information database 1152B will be described with reference to FIG. 5. The user distance information database 1152B is written during the processing of step S303 of the main process (FIG. 7) described later. It stores an item identifying each user in association with a distance item. The user item holds identification information that is numbered automatically for each detected person. The distance item holds the distance calculated in step S303 on the basis of the distance calculation database 1152A.

Next, the volume database 1152C will be described with reference to FIG. 6. The volume database 1152C is generated and stored in step S405 of the voice detection process of FIG. 8, described later. It stores the identification information of each user together with the volume values detected for that user.

Next, the main process performed by the CPU 111 of the conference terminal device 1 will be described with reference to FIG. 7. FIG. 7 is a flowchart of the main process, which starts when the conference terminal device 1 is powered on. First, the CPU 111 determines whether a USB connection has been established with the communication terminal device 120 (S300). Specifically, it determines whether the external connection device 116 of the conference terminal device 1 and the external connection device 150 of the communication terminal device 120 have been electrically connected by USB or the like. If the CPU 111 determines in step S300 that there is no USB connection (S300: NO), step S300 is repeated.

If the CPU 111 determines in step S300 that a USB connection has been established (S300: YES), it next determines whether the connected communication terminal device 120 is in a video conference (S301). Specifically, it determines whether the video conference application is running and a two-way communication connection has been established with the communication terminal device 120 at another site. If the CPU 111 determines in step S301 that no video conference is in progress (S301: NO), it repeats step S301. If the CPU 111 determines in step S301 that a video conference is in progress (S301: YES), it next determines whether a person is present in front of the device (S302). While the communication terminal device 120 is in a video conference, it receives captured images from the other site and transmits captured images of the local site to the other site. Whether a person is present is determined from the captured image of the local site using the face detection method described above. If the CPU 111 determines in step S302 that no person is present (S302: NO), the process proceeds to step S308. If the CPU 111 determines in step S302 that a person is present (S302: YES), it measures the distance to the user (S303). Specifically, in step S303 a human face is detected in the image captured by the camera 6, and the distance to the user is determined by referring to the distance calculation database 1152A. The measurement result is then stored in the user distance information database 1152B. If a plurality of users are detected in the captured image in step S303, the same processing is carried out for each user and the results are added to the user distance information database 1152B.

Next, the CPU 111 executes processing to determine the minimum volume (S304). In step S304, the largest distance is identified from the user distance information database 1152B stored in step S303. In the user distance information database 1152B of the present embodiment, the 3 m distance of user C is determined to be the largest. The minimum volume corresponding to that maximum distance is then calculated. In the present embodiment, for example, the conference terminal device 1 stores in advance a setting value at which the output volume has been adjusted so that a predetermined volume is heard at a position 1 m from the conference terminal device 1, and the minimum volume is calculated from this 1 m reference using a formula for the attenuation that sound undergoes over the distance to the user. With the attenuation denoted P, the formula may be, for example, P = 20 log10(distance / 1). This formula takes the volume at 1 m from the sound source as the reference and converts it into the attenuation at the actual distance to the user. In the present embodiment, 50 dB at the listener's position is registered as the predetermined volume that a person can hear without difficulty. The attenuation P over the maximum distance of 3 m is calculated as 20 log10(3 / 1) ≈ 9 dB, so the minimum volume is set to the output volume that is heard as 59 dB at the 1 m position.
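The minimum-volume calculation of step S304 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, user B's distance is an invented placeholder, and only the 50 dB target, the 1 m reference, and the distances of users A (2 m) and C (3 m) come from the description. Note that the exact attenuation at 3 m is about 9.54 dB, which the description rounds to 9 dB.

```python
import math

REFERENCE_DISTANCE_M = 1.0  # distance at which the output volume is calibrated
TARGET_LEVEL_DB = 50.0      # level to be heard at the listener (from the description)


def attenuation_db(distance_m):
    """Attenuation P = 20 log10(distance / 1) relative to the 1 m reference."""
    return 20.0 * math.log10(distance_m / REFERENCE_DISTANCE_M)


def minimum_volume_db(user_distances_m):
    """Level required at 1 m so that the farthest user still hears the target level."""
    farthest = max(user_distances_m.values())
    return TARGET_LEVEL_DB + attenuation_db(farthest)


# Stand-in for the user distance information database 1152B.
# The 1.5 m value for user B is a hypothetical placeholder.
user_distances_m = {"A": 2.0, "B": 1.5, "C": 3.0}
minimum_db = minimum_volume_db(user_distances_m)  # about 59.5 dB before rounding
```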

Next, the CPU 111 executes the voice detection process (S305), which is described with reference to the flowchart of FIG. 8. First, the CPU 111 determines whether the input volume is equal to or greater than a predetermined value (S401). In step S401, it determines whether the volume of the user's speech acquired from the microphones 4 is equal to or greater than the predetermined value. This check distinguishes speech from background noise. If the CPU 111 determines in step S401 that the input volume is less than the predetermined value (S401: NO), the process proceeds to step S407. If the CPU 111 determines in step S401 that the input volume is equal to or greater than the predetermined value (S401: YES), it starts a timer if this is the first such determination (S402).

The CPU 111 then detects movement of the user's mouth (S403). Specifically, in step S403 the determination is based on whether the contour shape of the mouth of the user detected in step S302 of FIG. 7 has changed by a predetermined amount. Next, the CPU 111 determines whether movement of the user's mouth was detected (S404). If the CPU 111 determines in step S404 that mouth movement was detected (S404: YES), it identifies the user whose mouth moved as the speaking user and stores the input volume at that time in the volume database 1152C shown in FIG. 6 (S405). If the CPU 111 determines in step S404 that no mouth movement was detected (S404: NO), it determines that there is no speaking user and does not store the volume in the volume database 1152C (S406). If the CPU 111 detects mouth movement of a plurality of users at the same time in step S404, it cannot identify which user spoke, so this case is also treated as S404: NO.

After step S401, S405, or S406, the CPU 111 determines whether a predetermined time has elapsed (S407). If the CPU 111 determines in step S407 that the predetermined time has not elapsed (S407: NO), the process returns to step S401. If the CPU 111 determines in step S407 that the predetermined time has elapsed (S407: YES), the voice detection process ends.
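The voice detection loop of FIG. 8 can be sketched as below. This is a simplified illustration under several assumptions: the timer of steps S402 and S407 is replaced by iterating over a finite list of pre-recorded frames, each frame is assumed to carry the input level and the set of users whose mouth contour changed, and the 40 dB noise threshold and all names are hypothetical.

```python
NOISE_THRESHOLD_DB = 40.0  # hypothetical value for the "predetermined value" of S401


def detect_speech(frames, threshold_db=NOISE_THRESHOLD_DB):
    """Collect input levels per speaking user over one detection window.

    frames: iterable of (input_level_db, users_with_mouth_movement) pairs,
            where the second element is a set of user identifiers.
    Returns a dict standing in for the volume database 1152C.
    """
    volume_db = {}
    for level_db, moving_users in frames:  # window ends when frames run out (S407)
        if level_db < threshold_db:        # S401: NO, treated as noise
            continue
        if len(moving_users) != 1:         # S404: NO, nobody or ambiguous (S406)
            continue
        (speaking_user,) = moving_users    # S405: exactly one mouth moved
        volume_db.setdefault(speaking_user, []).append(level_db)
    return volume_db


frames = [
    (30.0, {"A"}),       # below threshold: ignored
    (56.0, {"A"}),       # stored for user A
    (55.0, set()),       # loud but no mouth movement: ignored
    (58.0, {"A", "B"}),  # two mouths moving: speaking user cannot be identified
    (57.0, {"A"}),       # stored for user A
]
detected = detect_speech(frames)
```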

Returning to the main process of FIG. 7, after the voice detection process (S305) described above, the CPU 111 determines whether anyone has spoken (S306). In step S306, the determination is based on whether volume data for a speaking user was stored in the volume database 1152C in step S405 described with reference to FIG. 8. If the CPU 111 determines in step S306 that someone has spoken (S306: YES), it executes the output volume change process (S307). In step S306, speech is judged to be present only when the volume database 1152C holds at least a predetermined number of volume data entries (for example, three or more) for the same user. This prevents sudden sounds other than the user's utterances from being treated as speech. If the CPU 111 determines in step S306 that no one has spoken (S306: NO), it determines whether a certain time has elapsed since speech was last judged to be present (S308).
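The step S306 check can be sketched as a small predicate. The dictionary layout (user identifier mapped to a list of stored levels) and the name `has_speech` are assumptions of this sketch; the threshold of three entries is the example value given in the description.

```python
MIN_ENTRIES = 3  # example threshold from the description ("three or more")


def has_speech(volume_db, min_entries=MIN_ENTRIES):
    """Return True if any single user has at least min_entries stored levels."""
    return any(len(levels) >= min_entries for levels in volume_db.values())
```

With this rule, one or two isolated loud frames attributed to a user (for example, a cough) would not trigger the output volume change process.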

The flowchart of the output volume change process will now be described with reference to FIG. 9. First, the CPU 111 determines whether the volume database 1152C contains volume data for a speaking user (S501). If the CPU 111 determines in step S501 that there is no volume data (S501: NO), the process proceeds to step S506. If the CPU 111 determines in step S501 that volume data exists (S501: YES), it identifies the user for whom the most utterances were detected as the speaking user and calculates the output volume X corresponding to that user (S502).

The output volume X calculated in step S502 will now be described. The volume at which the speaking user's voice is heard at a predetermined distance from that user is calculated, and the output volume X is the output volume of the conference terminal device 1 at which the sound transmitted from another site reaches the speaking user at about that same volume. As described above, sound attenuates over the distance between the conference terminal device 1 and the speaking user, so the attenuation is added to the detected volume of the speaking user to obtain the volume heard at the predetermined distance from that user, and the output volume X is derived from it. For example, in the volume database 1152C shown in FIG. 6, the average of the volume data for user A, who is 2 m from the conference terminal device 1, is 56 dB, so at a distance of 1 m from user A the voice of user A is heard at 56 − 20 log10(1 / 2) ≈ 62 dB. For the sound output from the conference terminal device 1 to reach user A at 62 dB, let X be the volume heard 1 m from the conference terminal device 1; then X − 20 log10(2 / 1) = 62, which gives X ≈ 68 dB. The conference terminal device 1 therefore outputs sound at a volume that is heard as 68 dB at a distance of 1 m.
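The worked example for step S502 can be reproduced with a short sketch. The function name and argument layout are hypothetical; the 56 dB average, the 2 m distance, and the 1 m reference come from the description, which rounds the intermediate and final results to 62 dB and 68 dB.

```python
import math


def output_volume_x(levels_db, distance_m, reference_m=1.0):
    """Return the level X required at the reference distance from the terminal.

    levels_db:  levels of the speaking user as measured at the terminal.
    distance_m: distance between the terminal and the speaking user.
    """
    mean_at_terminal = sum(levels_db) / len(levels_db)
    spread_db = 20.0 * math.log10(distance_m / reference_m)
    # Level of the user's voice at the reference distance from the user.
    level_near_user = mean_at_terminal + spread_db
    # Terminal output must overcome the same path loss to arrive at that level.
    return level_near_user + spread_db


x_db = output_volume_x([56.0], 2.0)  # about 68.04 dB
```

Because the same path loss is applied once in each direction, X exceeds the measured average by twice the attenuation over the terminal-to-user distance.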

After step S502, the CPU 111 determines whether the output volume X is smaller than the minimum volume (S503). The minimum volume in step S503 is the volume determined in step S304 (FIG. 7). If the CPU 111 determines in step S503 that the output volume X is not smaller than the minimum volume (S503: NO), it changes the volume output from the speaker 10 to the output volume X (S505). If the CPU 111 determines in step S503 that the output volume X is smaller than the minimum volume (S503: YES), it sets the volume output from the speaker 10 to the minimum volume (S504). Next, the CPU 111 deletes the volume data stored in the volume database 1152C (S506) and ends the output volume change process.
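Steps S503 to S506 amount to clamping X from below and clearing the collected data. The sketch below assumes a dictionary stands in for the volume database 1152C; the function name is hypothetical.

```python
def apply_output_volume(x_db, minimum_db, volume_db):
    """Return the new speaker level and clear the collected volume data.

    x_db:       output volume X from step S502.
    minimum_db: minimum volume from step S304.
    volume_db:  dict standing in for the volume database 1152C (cleared in place).
    """
    # S503-S505: use X unless it is below the minimum, in which case use the minimum.
    new_level_db = max(x_db, minimum_db)
    # S506: discard the stored volume data so the next window starts empty.
    volume_db.clear()
    return new_level_db
```

The lower bound is what keeps the farthest user able to hear the remote site even when a nearby user speaks quietly.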

Returning to the main process of FIG. 7, after the output volume change process (S307) described with reference to FIG. 9, the CPU 111 determines whether a certain time has elapsed since speech was last judged to be present (S308). If the CPU 111 determines in step S308 that the time has not elapsed (S308: NO), the process returns to step S305. If it determines that the time has elapsed (S308: YES), the CPU 111 determines whether the USB connection has been terminated (S309). If the CPU 111 determines in step S309 that the USB connection has not been terminated (S309: NO), it determines whether the video conference has ended (S310). If it determines in step S310 that the video conference has ended (S310: YES), the process returns to step S301. If the CPU 111 determines in step S310 that the video conference has not ended (S310: NO), the process returns to step S302 and the subsequent steps are executed again. If the CPU 111 determines in step S309 that the USB connection has been terminated (S309: YES), the process returns to step S300 and the subsequent steps are executed again. The main process runs continuously until the conference terminal device 1 is powered off, and ends when the power is turned off.
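The branching of the main process can be summarized as a transition table. This is only a reading aid built from the step descriptions above; the table layout and the name `next_step` are assumptions, and the work performed inside each step is omitted.

```python
# (step, decision result) -> next step, for the decision steps of FIG. 7.
TRANSITIONS = {
    ("S300", False): "S300", ("S300", True): "S301",  # USB connected?
    ("S301", False): "S301", ("S301", True): "S302",  # in a video conference?
    ("S302", False): "S308", ("S302", True): "S303",  # person present?
    ("S306", False): "S308", ("S306", True): "S307",  # speech present?
    ("S308", False): "S305", ("S308", True): "S309",  # hold time elapsed?
    ("S309", False): "S310", ("S309", True): "S300",  # USB connection terminated?
    ("S310", False): "S302", ("S310", True): "S301",  # video conference ended?
}

# Steps that always continue to the same successor.
UNCONDITIONAL = {"S303": "S304", "S304": "S305", "S305": "S306", "S307": "S308"}


def next_step(step, result=None):
    """Return the step that follows `step`, given its YES/NO result if any."""
    if step in UNCONDITIONAL:
        return UNCONDITIONAL[step]
    return TRANSITIONS[(step, bool(result))]
```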

By executing the processing described above, the conference terminal device 1 installed at a site measures the distance to a user participating in the conference and detects the volume of the user's speech, and can thereby output the audio information from other sites at an optimum volume based on that distance and speech volume. Because users naturally speak at a volume that takes ambient noise into account, the voices of conference participants at other sites are output from the conference terminal device 1 at an appropriate volume without any deliberate adjustment. The conference terminal device 1 also identifies, from among the users at the site, the user who has spoken as the speaker, and controls the volume of the audio information from other sites based on the distance to the speaker and the speaker's volume. The voices of participants at other sites can therefore be output at a volume suited to the user who spoke.
Furthermore, the voices of participants at other sites can be output at a volume equivalent to the volume at which the speaker's own voice would be heard at a predetermined distance from the position where the speaker spoke, so the speaker and the speaker on the far side of the network can converse at a natural volume, as if they were talking close to each other. Even after the speech has ended, the voices of participants at other sites continue to be output at the same volume for a predetermined time after the speaker was identified. In addition, the distance to the user is measured based on the acquired imaging information, so the distance between the conference terminal device 1 and the user can be measured accurately by analyzing the captured image, without any special measuring equipment. Finally, the voices of participants at other sites can be output at or above the volume suited to the user located farthest from the conference terminal device 1 among the users participating in the conference. Users can therefore reliably hear the audio from other sites at or above the minimum audible volume, wherever they are positioned.
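As a rough illustration of the volume control summarized above, the following Python sketch derives an output level from the detected speech level, the distance to the speaker, and the distance to the farthest user. It is a sketch under stated assumptions, not the method of the embodiment: it substitutes a free-field point-source model (level changes by 20·log10 of the distance ratio) for whatever the volume database 1152C holds, expresses all levels in dB, and references the loudspeaker level to 1 m from the terminal. The function names, the 1 m conversation distance, and the 50 dB floor are all hypothetical.

```python
import math


def level_at(level_db: float, measured_at_m: float, wanted_at_m: float) -> float:
    """Convert a level measured at one distance from a source to another distance.

    Assumes free-field point-source spreading (about 6 dB per doubling).
    """
    return level_db + 20.0 * math.log10(measured_at_m / wanted_at_m)


def output_level_db(mic_level_db: float, speaker_dist_m: float,
                    farthest_dist_m: float, talk_dist_m: float = 1.0,
                    floor_db: float = 50.0) -> float:
    """Return a loudspeaker level, referenced to 1 m from the terminal."""
    # The speaker's voice as it would be heard talk_dist_m away from the speaker.
    voice_db = level_at(mic_level_db, speaker_dist_m, talk_dist_m)
    # Level needed at 1 m so that the speaker, speaker_dist_m away, hears voice_db.
    wanted_db = level_at(voice_db, speaker_dist_m, 1.0)
    # Lower bound: the farthest user must still receive at least floor_db.
    minimum_db = level_at(floor_db, farthest_dist_m, 1.0)
    return max(wanted_db, minimum_db)


# Example: speech measured at 60 dB from a speaker 2 m away; farthest user at 4 m.
level = output_level_db(60.0, 2.0, 4.0)
```

Taking the maximum of the two values mirrors the text: the output follows the speaker's own volume but never drops below what the farthest user needs.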

Needless to say, the present invention is not limited to the embodiment described above, and various modifications are possible. For example, in the above embodiment the distance to the user and the volume are measured after the process of step S301 in FIG. 7 is performed, but the invention is not limited to this; the measurement may instead be performed after the conference terminal device 1 is powered on. Likewise, in the process of step S302 the distance to the user is calculated from the image captured by the camera 6, but the invention is not limited to this; the distance may instead be measured using, for example, a general-purpose non-contact distance sensor.
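For readers unfamiliar with image-based ranging, the following Python sketch shows one common way a distance could be estimated from a captured image. It is an assumption for illustration, not the calculation of the embodiment, which refers to a distance calculation database 1152A whose contents are not reproduced here. The sketch uses a pinhole camera model with a detected face width in pixels; the function names, the focal length, and the assumed real face width of 0.16 m are hypothetical.

```python
def estimate_distance_m(face_width_px: float, focal_length_px: float,
                        real_face_width_m: float = 0.16) -> float:
    """Pinhole model: distance = focal length (px) * real width (m) / image width (px)."""
    if face_width_px <= 0:
        raise ValueError("face width must be positive")
    return focal_length_px * real_face_width_m / face_width_px


def farthest_user_m(face_widths_px, focal_length_px: float) -> float:
    """Return the largest estimated distance among all detected users."""
    return max(estimate_distance_m(w, focal_length_px) for w in face_widths_px)


# Example: three detected faces, camera focal length of 800 px (hypothetical values).
farthest = farthest_user_m([160.0, 80.0, 40.0], 800.0)
```

The smallest face in the image yields the largest distance, which is the value a minimum-volume rule for the farthest user would need.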

1 Conference terminal device
10A Video conference site
10B Video conference site
10C Video conference site
100 Video conference system
120 Communication terminal device
1152A Distance calculation database
1152B User distance information database
1152C Volume database

Claims

A conference terminal device that transmits and receives captured images or audio information, via communication means, to and from an information processing device that processes various kinds of information, the conference terminal device comprising:
audio information input means for inputting, as audio information, the voice of a user at the site where the conference terminal device is installed;
audio information output means for receiving and outputting audio information of another site transmitted from the information processing device;
measurement means for measuring distance information indicating the distance to the user;
volume information detection means for detecting, as volume information, the volume of the audio information produced by the user's speech and input by the audio information input means; and
output volume information control means for controlling the volume of the audio information of the other site output from the audio information output means, based on the volume information detected by the volume information detection means and the distance information measured by the measurement means.

The conference terminal device according to claim 1, further comprising speaker identification means for identifying, as a speaker, the user who spoke the audio information detected by the volume information detection means,
wherein the measurement means measures the distance to the speaker identified by the speaker identification means as speaker distance information, and
the output volume information control means controls the volume of the audio information of the other site output from the audio information output means, based on the speaker distance information and the volume information detected by the volume information detection means.

The conference terminal device according to claim 2, further comprising speech position volume calculation means for calculating, from the volume information detected by the volume information detection means and based on the speaker distance information, the volume information of the audio information at a position a predetermined distance from the position where the speaker spoke,
wherein the output volume information control means controls the volume of the audio information of the other site output from the audio information output means so as to be equivalent to the volume information calculated by the speech position volume calculation means.

The conference terminal device according to claim 3, further comprising elapsed time determination means for determining whether a predetermined time has elapsed since the speaker identification means identified the user who spoke,
wherein, while the elapsed time determination means determines that the predetermined time has not elapsed, the output volume information control means controls the volume of the audio information of the other site output from the audio information output means, based on the speaker distance information and the volume information detected by the volume information detection means.

The conference terminal device according to any one of claims 1 to 4, further comprising imaging information acquisition means for acquiring a captured image of the user as imaging information,
wherein the measurement means measures the distance information indicating the distance to the user based on the imaging information acquired by the imaging information acquisition means.

The conference terminal device according to claim 5, wherein, when distances to a plurality of the users are measured based on the imaging information acquired by the imaging information acquisition means, the measurement means determines the largest distance information, and
the output volume information control means specifies, based on the largest distance information measured by the measurement means, a minimum volume for the audio information of the other site output from the audio information output means, and controls the output so that it is at or above the minimum volume.

A conference terminal control method performed in a conference terminal device that transmits and receives captured images and audio information, via communication means, to and from an information processing device that processes various kinds of information, the method comprising:
an audio information input step of inputting, as audio information, the voice of a user at the site where the conference terminal device is installed;
an audio information output step of receiving and outputting audio information of another site transmitted from the information processing device;
a measurement step of measuring distance information indicating the distance to the user;
a volume information detection step of detecting, as volume information, the volume of the audio information produced by the user's speech in the audio information input step; and
an output volume information control step of controlling the volume of the audio information of the other site output in the audio information output step, based on the volume information detected in the volume information detection step and the distance information measured in the measurement step.

A conference terminal control program executed in a conference terminal device that transmits and receives captured images and audio information, via communication means, to and from an information processing device that processes various kinds of information, the program causing the conference terminal device to execute:
an audio information input step of inputting, as audio information, the voice of a user at the site where the conference terminal device is installed;
an audio information output step of receiving and outputting audio information of another site transmitted from the information processing device;
a measurement step of measuring distance information indicating the distance to the user;
a volume information detection step of detecting, as volume information, the volume of the audio information produced by the user's speech in the audio information input step; and
an output volume information control step of controlling the volume of the audio information of the other site output in the audio information output step, based on the volume information detected in the volume information detection step and the distance information measured in the measurement step.