JP3616511B2 - Multipoint video conference system and method for controlling multipoint video conference system

Info

Publication number: JP3616511B2
Application number: JP29969498A
Authority: JP
Inventors: 健樹降矢
Original assignee: 日本電気エンジニアリング株式会社
Priority date: 1998-10-21
Filing date: 1998-10-21
Publication date: 2005-02-02
Anticipated expiration: 2018-10-21
Also published as: JP2000134596A

Description

[0001]
[Technical field of the invention]
The present invention relates to a multipoint video conference system, and to a control method for such a system, in which, for example, a plurality of video conference terminal devices at remote locations are connected to a multipoint video conference control device via a public network or dedicated lines, and image, audio, and data signals are multiplexed and distributed to control the operation of a multipoint video conference.
[0002]
[Prior art]
Conventionally, this type of video conference system has a configuration in which a plurality of video conference terminals 54 are connected to a multipoint video conference control device 58 via a public network or dedicated lines, as shown in FIG. 7. Audio input to the microphone 51 in conference room 1 passes through the amplifier 52 and is compression-encoded by the video conference terminal 54, combined with the image signal and other signals encoded at the same time, and transmitted to the line. Conference rooms 2 and 3 transmit similar signals to the line; because their configuration is the same, its description is omitted.
[0003]
The multipoint video conference control device 58 receives these line signals and separates them into their components (image, audio, etc.) via the line interface units 55-1 to 55-3 and the MUX/DMUX 56-1 to 56-3. The separated audio signals are decoded by the audio decoding units 60-1 to 60-3 and summed by the adder 64. While summing, the adder 64 detects the level of the audio signal from each video conference terminal 54 and, when it detects a signal whose level exceeds a prescribed threshold, notifies the control unit 57. The control unit 57 operates the image switching unit 59 according to this information to select the image source to be transmitted to each video conference terminal 54.
[0004]
Because the audio signal output from the adder 64 is the sum of the audio signals from all sites, the own-station audio subtraction units 63-1 to 63-3 subtract each site's received audio component to prevent it from being returned to that site. The result is encoded by the audio encoding units 61-1 to 61-3, combined in the MUX/DMUX 56-1 to 56-3 with the image signal selected by the image switching unit 59, and transmitted to each video conference terminal 54 via the line interface units 55-1 to 55-3 and the lines. Each video conference terminal 54 thus hears the summed audio signal excluding its own.
[0005]
In this type of video conference system, the image signal from a video conference terminal 54 whose audio exceeds the prescribed threshold is transmitted to the other video conference terminals 54. Consequently, when the audio signal levels from the video conference terminals 54 vary, image signals cannot be switched smoothly. To address this, the multipoint video conference system disclosed in Japanese Patent Laid-Open No. 7-162828 provides, as shown in FIG. 7, a reference sound generator 50 in each of conference rooms 1 to 3 and audio signal gain correction circuits 62-1 to 62-3 in the multipoint video conference control device 58. The reference sound generator 50 transmits a reference audio signal of constant frequency; the audio signal gain correction circuits 62-1 to 62-3 receive it, calculate its error from a preset reference value, and attenuate the audio signal by that amount. This smooths the average audio level of the signals received from the video conference terminals 54 and is intended to eliminate imbalances in image switching and the like.
[0006]
[Problems to be solved by the invention]
However, the conventional multipoint video conference system disclosed in the above-mentioned publication has the following problems.
[0007]
(1) The reference sound generator 50 must be installed in each conference room, and the cost increases as the number of conference points increases.
[0008]
(2) The method using a reference audio signal is considered effective in absorbing differences in the reverberation characteristics of the conference rooms. In practice, however, factors that can change in real time, such as differences in individual speakers' voice volume and the distance between speaker and microphone, have a greater influence on variations in audio level, and the conventional multipoint video conference system has difficulty eliminating them.
[0009]
(3) When the difference between the maximum audio level of each conference room and the threshold for switching images is very small, image switching control is not performed unless the speaker speaks loudly. When this occurs in the conventional multipoint video conference system, the reference value held by the multipoint video conference control device 58 must be lowered, and that value must be retested and readjusted every time the conference room environment changes, which is laborious.
[0010]
Accordingly, an object of the present invention is to provide a multipoint video conference system, and a control method for a multipoint video conference system, capable of smoothing the level of audio received from each conference room and performing appropriate image switching without depending on the environment of the conference room or on the video conference terminal.
[0011]
[Means for Solving the Problems]
The present invention is applied to a multipoint video conference system comprising a plurality of video conference terminal devices and a multipoint video conference control device that sends, to the other video conference terminal devices, the video output signal from a video conference terminal device whose audio output signal has a value exceeding a predetermined threshold. The above object is achieved by providing the multipoint video conference control device with audio level detection means that detects the maximum value of the audio output signal from each video conference terminal device, and audio level maximum value adjusting means that adjusts the gain of the audio output signal according to the difference between that detected maximum value and the threshold so that the difference falls within a certain range, and by having the multipoint video conference control device determine which video conference terminal device's video output signal to send based on the audio output signals after their gain has been adjusted by the audio level maximum value adjusting means.
[0012]
Here, the audio level detection means preferably detects the maximum value of the audio output signal from each video conference terminal device within a predetermined time range. The audio level maximum value adjusting means preferably adjusts the gain of the audio output signal after giving the audio output signal from each video conference terminal device a time delay of at least that predetermined time. In addition, when the maximum value of the audio output signal from any video conference terminal device exceeds a predetermined audio maximum reference value, the audio level maximum value adjusting means preferably adjusts the gain of the audio output signal so that this maximum value becomes equal to or less than the audio maximum reference value. Furthermore, the multipoint video conference control device is preferably provided with threshold setting means that sets a new, lowered threshold when the maximum value of the audio output signal from each video conference terminal device detected by the audio level detection means is substantially equal to the threshold, and the audio level maximum value adjusting means preferably adjusts the gain of the audio output signal using the new threshold set by the threshold setting means.
[0013]
The present invention is also applied to a control method for a multipoint video conference system that comprises a plurality of video conference terminal devices and sends, to the other video conference terminal devices, the video output signal from a video conference terminal device whose audio output signal has a value exceeding a predetermined threshold. The above object is likewise achieved by detecting the maximum value of the audio output signal from each video conference terminal device, adjusting the gain of the audio output signal according to the difference between each detected maximum value and the threshold so that the difference falls within a certain range, and determining which video conference terminal device's video output signal to send based on the audio output signals after the gain adjustment.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a block diagram showing a circuit configuration of a multipoint video conference system according to an embodiment of the present invention.
In the multipoint video conference system shown in this figure, the video conference terminals 12 in three conference rooms are connected to the multipoint video conference control device 13 via public lines.
[0015]
The voice of a conference participant in conference room 1 is sent to the video conference terminal 12 via the microphone 10 and the amplifier 11, compression-encoded, multiplexed with an image signal obtained from an imaging device (not shown), and transmitted to the public line. This multiplexed signal is received by the line interface unit 14-1 of the multipoint video conference control device 13, and the MUX/DMUX 15-1 separates it into the audio signal and the image signal. The separated audio signal is decoded by the audio decoding unit 17-1 and distributed to the control unit 16 and the audio signal attenuation unit 20-1. Audio from conference rooms 2 and 3 is likewise distributed to the control unit 16 and the audio signal attenuation units 20-2 and 20-3 via the line interface units 14-2 and 14-3, the MUX/DMUX 15-2 and 15-3, and the audio decoding units 17-2 and 17-3.
[0016]
FIG. 2 is a block diagram showing the detailed configuration of the control unit 16. For each audio signal input to the control unit 16, the audio level detection units 30-1, 30-2, and 30-3 measure the level for a predetermined fixed protection time and determine the maximum audio level in that period.
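The peak measurement over a protection time can be illustrated with a minimal Python sketch. It assumes the decoded audio is available as a sequence of numeric samples and that the protection time corresponds to a fixed number of samples; the function names `peak_level` and `window_peaks` are hypothetical and do not come from the patent.

```python
def peak_level(samples):
    """Return the largest absolute sample value in one protection window."""
    return max((abs(s) for s in samples), default=0.0)


def window_peaks(signal, window):
    """Split `signal` into consecutive windows of `window` samples
    (an assumed stand-in for the protection time) and return each peak."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [peak_level(signal[i:i + window])
            for i in range(0, len(signal), window)]
```

In this sketch each audio level detection unit would run `window_peaks` on its own room's signal and pass the resulting peak to the level maximum value adjusting unit.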
[0017]
A signal representing the measured maximum audio level is sent to the level maximum value adjusting unit 31, which calculates its difference from the preset image switching threshold. As an example, when the audio level of conference room 1 varies as shown in FIG. 4A and that of conference room 2 as shown in FIG. 4B, the difference between the maximum audio level in conference room 1 and the image switching threshold a is Δt1, and similarly the difference between the maximum audio level in conference room 2 and the image switching threshold a is Δt2. The level maximum value adjusting unit 31 compares Δt1 with Δt2 and calculates the audio level attenuation amount that makes the larger value, Δt1, equal to Δt2.
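As a rough illustration of this comparison, the following Python sketch computes the per-room differences from the threshold and the attenuation that brings every difference down to the smallest one. It assumes levels are expressed in decibels, so that attenuation is a simple subtraction, and it extends the two-room example to any number of rooms; the patent states neither the units nor this generalization, and the name `attenuation_plan` is hypothetical. Rooms whose peak lies below the threshold are not treated specially here.

```python
def attenuation_plan(peaks_db, threshold_db):
    """Return (attenuation per room in dB, expected difference after attenuation).

    peaks_db maps a room id to its measured maximum level in dB (assumed unit).
    """
    # Difference between each room's peak and the image switching threshold a.
    diffs = {room: peak - threshold_db for room, peak in peaks_db.items()}
    # The smallest difference plays the role of delta-t2 in the two-room example.
    target = min(diffs.values())
    # Each louder room is attenuated by the excess of its difference over the target.
    attenuation = {room: d - target for room, d in diffs.items()}
    return attenuation, target
```

For peaks of -6 dB, -12 dB, and -9 dB against a threshold of -20 dB, the sketch yields attenuations of 6 dB, 0 dB, and 3 dB and an expected difference of 8 dB.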
[0018]
The calculated audio level attenuation amount and the expected difference value Δt2 after attenuation are reported to the control circuit 32, which then controls the attenuation of the audio signal attenuation unit 20-1 via the attenuation control unit 33.
[0019]
FIG. 3 is a block diagram showing a detailed configuration of the audio signal attenuation unit 20-1.
In the audio signal attenuation unit 20-1, the delay control unit 40 delays the audio signal from the audio decoding unit 17-1 by the above-described fixed protection time plus α. The attenuation unit 41 can therefore change its attenuation amount according to the control signal from the attenuation control unit 33 in time for the difference Δt1 to be corrected to a value substantially equal to Δt2 before the signal enters the audio adding unit 22.
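The delay-then-attenuate arrangement can be sketched in Python as a fixed-length delay line followed by a gain stage. This is only an illustration under assumptions: the delay is expressed in samples, the attenuation is given in dB and converted to a linear amplitude gain, and the class name `DelayedAttenuator` is hypothetical.

```python
from collections import deque


class DelayedAttenuator:
    """Delay each sample by a fixed number of samples, then apply a gain."""

    def __init__(self, delay_samples):
        # Pre-fill with silence so the first outputs are the delayed zeros.
        self._line = deque([0.0] * delay_samples)
        self.gain = 1.0

    def set_attenuation_db(self, att_db):
        # Convert an attenuation in dB to a linear amplitude gain.
        self.gain = 10.0 ** (-att_db / 20.0)

    def process(self, sample):
        # Push the newest sample and emit the oldest one, scaled by the gain.
        self._line.append(sample)
        return self._line.popleft() * self.gain
```

Because the output lags the input, a gain computed from the peak of a window can be applied to the very samples that produced that peak, which is the purpose of the delay described above.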
[0020]
The same comparison and attenuation processing is also applied to the audio signal handled by the audio level detection unit 30-3, so that the audio levels from conference rooms 1 to 3 are smoothed.
[0021]
Returning to FIG. 1, the audio signals smoothed in the audio signal attenuating units 20-1, 20-2 and 20-3 are sent to the audio adding unit 22 and added there. At the time of addition, it is measured which of the audio signals exceeds the image switching threshold value a (see FIG. 4), and the information is sent to the control circuit 32 of the control unit 16. The control circuit 32 determines the source of the distributed image signal based on this information and controls the image switching unit 19. The image switching unit 19 selects an image signal to be supplied to each video conference terminal 12 and supplies it to the MUX / DMUX 15-1, 15-2, 15-3.
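The source selection step might be sketched as follows in Python. The text only says that the rooms exceeding the threshold are identified and that the control circuit chooses the image source from that information; picking the loudest room when several exceed the threshold, and keeping the current source when none does, are assumptions made here for illustration. The function name `select_image_source` is hypothetical.

```python
def select_image_source(levels_db, threshold_db, current=None):
    """Return the room whose image should be distributed.

    levels_db maps a room id to its smoothed audio level in dB (assumed unit).
    """
    # Rooms whose smoothed level exceeds the image switching threshold a.
    over = {room: lvl for room, lvl in levels_db.items() if lvl > threshold_db}
    if not over:
        # Nobody is speaking loudly enough: keep the current source (assumption).
        return current
    # Several rooms may exceed the threshold; pick the loudest (assumption).
    return max(over, key=over.get)
```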
[0022]
The audio summed by the audio adding unit 22 has each station's own audio subtracted by the audio signal subtraction units 21-1, 21-2, and 21-3, to prevent audio from being returned to its source, and is then encoded by the audio encoding units 18-1, 18-2, and 18-3. The encoded audio signals are multiplexed in the MUX/DMUX 15-1, 15-2, and 15-3 with the image signals selected by the image switching unit 19 and transmitted to the public lines via the line interface units 14-1, 14-2, and 14-3.
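The sum-then-subtract step is what is often called a mix-minus. A minimal Python sketch, assuming each room contributes a frame of equal length and using the hypothetical name `mix_minus`:

```python
def mix_minus(frames):
    """Sum all rooms' frames, then return for each room the sum minus its own frame.

    frames maps a room id to a list of samples; all lists have equal length.
    """
    # Sample-by-sample sum over all rooms (the role of the audio adding unit).
    total = [sum(column) for column in zip(*frames.values())]
    # Subtract each room's own contribution (the role of the subtraction units).
    return {room: [t - s for t, s in zip(total, frame)]
            for room, frame in frames.items()}
```

Summing once and subtracting per room needs one addition pass plus one subtraction per output, rather than a separate partial sum for every destination.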
[0023]
The multiplexed signal is received by the video conference terminals 12 in the conference rooms 1 to 3, decoded, and output to a speaker or a headset (not shown). By processing the audio signal according to the above procedure, the audio signal level from each of the conference rooms 1 to 3 is smoothed, and an image switching operation that follows the transition of the speaker can be realized.
[0024]
Therefore, according to this embodiment, the level of the audio received from each of conference rooms 1 to 3 can be smoothed and appropriate image switching can be performed automatically, without depending on the environment of the conference room or on the video conference terminal.
[0025]
The above-described control unit 16 performs the following processing in addition to the above-described difference value calculation for the purpose of smoothing.
[0026]
(1) Maximum value suppression processing
In the level maximum value adjusting unit 31, when the input audio level from any of the audio level detection units 30-1, 30-2, and 30-3 exceeds the audio maximum reference value b by a difference Δtmax, as shown in FIG. 5, an attenuation amount that brings the maximum audio level to or below the audio maximum reference value b is calculated, without comparison with the audio from the other conference rooms, in order to suppress the abnormal audio, and is reported to the control circuit 32. The control circuit 32 controls the attenuation of the corresponding audio signal based on the received attenuation amount. This prevents a sudden audio signal of abnormal level from being added.
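Under the assumption that levels are expressed in dB, the suppression amount is simply the excess of the peak over the reference value b. A minimal Python sketch with the hypothetical name `suppression_db`:

```python
def suppression_db(peak_db, max_ref_db):
    """Attenuation in dB needed to bring peak_db down to max_ref_db, or 0 if none is needed."""
    # The excess over the reference value b corresponds to delta-tmax in FIG. 5.
    return max(0.0, peak_db - max_ref_db)
```

A peak of -1 dB against a reference of -3 dB gives 2 dB of attenuation, while a peak of -10 dB gives none.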
[0027]
(2) Image switching threshold control
When, in the level maximum value adjusting unit 31, the difference between the image switching threshold a and the maximum level of the audio signals from the audio level detection units 30-1, 30-2, and 30-3 during the above-described fixed protection time is Δtmin ≈ 0, as shown in FIG. 6, the audio level in each conference room does not exceed the image switching threshold a unless a participant speaks quite loudly, and the image of the conference room in which someone is speaking may not be selected as the distributed image by the image switching unit 19. Because the control circuit 32 receives from the level maximum value adjusting unit 31 both the attenuation amount for the audio signal to be attenuated and the expected difference value after attenuation (Δt2 in the example of FIG. 4), when the expected difference value is close to 0 it sends a lowered image switching threshold a' to the level maximum value adjusting unit 31 and the audio adding unit 22. Thereafter, the level maximum value adjusting unit 31 and the audio adding unit 22 use a' as the new image switching threshold. Appropriate image switching control thus continues without manual changes to the image switching threshold.
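The threshold update can be sketched in Python as follows. The patent does not say how close to 0 the expected difference must be, nor by how much the threshold is lowered, so the parameters `margin_db` and `step_db` and their default values are purely hypothetical, as is the function name `updated_threshold`.

```python
def updated_threshold(threshold_db, expected_diff_db, margin_db=1.0, step_db=3.0):
    """Return a lowered threshold a' when the expected difference is close to 0."""
    # "Close to 0" is modeled as lying within an assumed margin.
    if abs(expected_diff_db) <= margin_db:
        # Lower the threshold by an assumed fixed step.
        return threshold_db - step_db
    return threshold_db
```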
[0028]
The details of the multipoint video conference system and the control method of the multipoint video conference system of the present invention are not limited to the above-described embodiment, and various modifications are possible.
[0029]
[Effects of the invention]
As described above in detail, according to the present invention, the level of sound received from each conference room is smoothed and an appropriate image switching process is automatically performed without depending on the environment of the conference room or the video conference terminal. A multipoint video conference system and a control method for the multipoint video conference system can be realized.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a circuit configuration of a multipoint video conference system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a detailed configuration of a control unit according to an embodiment.
FIG. 3 is a block diagram illustrating a detailed configuration of an audio signal attenuation unit according to an embodiment.
FIGS. 4A and 4B are diagrams for explaining a procedure of smoothing an audio signal.
FIG. 5 is a diagram for explaining a procedure of maximum value suppression processing.
FIG. 6 is a diagram for explaining a procedure of image switching threshold control.
FIG. 7 is a block diagram showing a circuit configuration of a conventional multipoint video conference system.
[Explanation of symbols]
1 to 3 Conference rooms
10 Microphone
11 Amplifier
12 Video conference terminal
13 Multipoint video conference control device
14-1 to 14-3 Line interface units
15-1 to 15-3 MUX/DMUX
16 Control unit
17-1 to 17-3 Audio decoding units
18-1 to 18-3 Audio encoding units
19 Image switching unit
20-1 to 20-3 Audio signal attenuation units
21-1 to 21-3 Audio signal subtraction units
22 Audio adding unit
30-1 to 30-3 Audio level detection units
31 Level maximum value adjusting unit
32 Control circuit
33 Attenuation control unit
40 Delay control unit
41 Attenuation unit

Claims

A multipoint video conference system comprising a plurality of video conference terminal devices and a multipoint video conference control device that sends, to the other video conference terminal devices, the video output signal from a video conference terminal device whose audio output signal has a value exceeding a predetermined threshold, wherein
the multipoint video conference control device comprises: audio level detection means for detecting the maximum value of the audio output signal from each of the video conference terminal devices; and
audio level maximum value adjusting means for adjusting the gain of the audio output signal according to the difference between the threshold and the maximum value of the audio output signal from each of the video conference terminal devices detected by the audio level detection means, so that the difference falls within a certain range, and
the multipoint video conference control device determines, based on the audio output signal after its gain has been adjusted by the audio level maximum value adjusting means, which video conference terminal device's video output signal to send.

The multipoint video conference system according to claim 1, wherein the audio level detection means detects the maximum value of the audio output signal from each of the video conference terminal devices within a predetermined time range.

The multipoint video conference system according to claim 2, wherein the audio level maximum value adjusting means adjusts the gain of the audio output signal after giving the audio output signal from each of the video conference terminal devices a time delay of at least the predetermined time.

The multipoint video conference system according to any one of claims 1 to 3, wherein, when the maximum value of the audio output signal from any of the video conference terminal devices exceeds a predetermined audio maximum reference value, the audio level maximum value adjusting means adjusts the gain of the audio output signal so that the maximum value becomes equal to or less than the audio maximum reference value.

The multipoint video conference system according to any one of claims 1 to 4, wherein the multipoint video conference control device comprises threshold setting means for setting a new threshold by lowering the level of the threshold when the maximum value of the audio output signal from each of the video conference terminal devices detected by the audio level detection means is substantially equal to the threshold, and
the audio level maximum value adjusting means adjusts the gain of the audio output signal using the new threshold set by the threshold setting means.

A control method for a multipoint video conference system that comprises a plurality of video conference terminal devices and sends, to the other video conference terminal devices, the video output signal from a video conference terminal device whose audio output signal has a value exceeding a predetermined threshold, the method comprising:
detecting the maximum value of the audio output signal from each of the video conference terminal devices; adjusting the gain of the audio output signal according to the difference between the threshold and each detected maximum value so that the difference falls within a certain range; and determining, based on the audio output signal after the gain has been adjusted, which video conference terminal device's video output signal to send.