JP2017034645A

JP2017034645A - Imaging apparatus, program, and imaging method

Info

Publication number: JP2017034645A
Application number: JP2015214722A
Authority: JP
Inventors: Kiyoto Igarashi (清人五十嵐); Hiroaki Uchiyama (内山　裕章); Koji Kuwata (耕司桑田); Masahito Takahashi (高橋　仁人); Tomoyuki Goto (智幸後藤); Kazuki Kitazawa (和紀北澤); Nobumasa Gingawa (宣正銀川); Miku Hakamatani (未来袴谷)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-08-03
Filing date: 2015-10-30
Publication date: 2017-02-09
Anticipated expiration: 2035-10-30
Also published as: JP6631166B2

Abstract

PROBLEM TO BE SOLVED: To record voice more clearly during imaging. SOLUTION: An imaging apparatus for imaging a person comprises: a detection part which detects the direction of a person who has spoken and the sound volume of the speech; a control part which controls the imaging direction so as to face the detected direction; and a notification part which notifies the person when the sound volume is smaller than the minimum value of a predetermined range. SELECTED DRAWING: Figure 3

Description

The present invention relates to an imaging apparatus, a program, and an imaging method.

There are systems that use a camera to track and capture participants who speak in a meeting and distribute video of the meeting externally.

For example, a microphone array is mounted on a camera installed in a conference room, and the microphone array detects the direction of the participant who spoke. The camera is pointed in the detected direction, and the speaking participant is captured. When another participant speaks, the camera is turned toward that participant. The captured video (images) of the conference is distributed to viewers' terminals via a network.

For example, a technique has been disclosed that detects the direction of a speaker using a microphone array composed of a plurality of microphones and points the camera in the detected direction (see, for example, Patent Document 1).

However, the conventional technology has a problem in that clear audio cannot be recorded during shooting.

For example, when the speaker is seated far from the camera, the voice may be too quiet to be recorded clearly. Conversely, when the speaker is seated too close to the camera, the voice may be too loud and the recorded audio may be distorted.

The disclosed technique therefore aims to record clearer audio.

An embodiment discloses an imaging apparatus for imaging a person, the apparatus including: a detection unit that detects the direction of a person who has spoken and the volume of the speech; a control unit that controls the shooting direction toward the detected direction; and a notification unit that notifies the person when the volume is smaller than the minimum value of a predetermined range.

Clearer audio can be recorded.

FIG. 1 is a diagram showing an example of the overall configuration of the video distribution system.
FIG. 2 is a diagram showing the hardware configuration of the distribution terminal.
FIG. 3 is a diagram showing an example of the functional configuration of the distribution terminal according to Embodiment 1.
FIG. 4 is a diagram showing the relationship between the position coordinates of the camera and the angle.
FIG. 5 is a diagram showing the relationship between the position of a participant and the message notified.
FIG. 6 is a diagram showing the control flow of Embodiment 1.
FIG. 7 is a diagram showing an example of the functional configuration of the distribution terminal according to Embodiment 2.
FIG. 8 is a diagram showing an example of the data structure of the user table.
FIG. 9 is a diagram showing the control flow of Embodiment 2.
FIG. 10 is a diagram showing the flow of the authentication process.
FIG. 11 is a diagram showing a first hardware configuration of the distribution terminal of Embodiment 3.
FIG. 12 is a diagram showing a second hardware configuration of the distribution terminal of Embodiment 3.
FIG. 13 is a diagram showing an example of the external appearance of a stereo camera.
FIG. 14 is a diagram showing an example of the external appearance of a distance sensor.
FIG. 15 is a diagram showing the control flow of Embodiment 3.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and duplicate description is omitted.

(Embodiment 1)
FIG. 1 is a diagram illustrating an example of the overall configuration of the video distribution system 1. The video distribution system 1 includes a server 2, a distribution terminal 3, user terminals 4a to 4n, and a display device 6. The server 2 has a communication unit 21. The distribution terminal 3 includes a communication unit 31, a processing unit 32, a data acquisition unit 33, a data output unit 34, and a storage unit 35. The server 2, the distribution terminal 3, and the user terminals 4a to 4n are connected via a communication network 5. The display device 6 is connected to the data output unit 34. The display device 6 may instead be provided in the distribution terminal 3.

The data acquisition unit 33 acquires, for example, image data and audio data in the conference room. The communication unit 31 transmits the acquired image data and audio data to the server 2 via the communication network 5. The server 2 distributes the image data and audio data to the user terminals 4a to 4n via the communication network 5.

FIG. 2 is a diagram illustrating the hardware configuration of the distribution terminal 3. As shown in FIG. 2, the distribution terminal 3 includes: a CPU (Central Processing Unit) 101 that controls the overall operation of the distribution terminal 3; a ROM (Read Only Memory) 102 that stores programs used to drive the CPU 101, such as an IPL (Initial Program Loader); a RAM (Random Access Memory) 103 used as a work area for the CPU 101; a flash memory 104 that stores various data such as the terminal program, image data, and audio data; an SSD (Solid State Drive) 105 that controls reading and writing of various data from and to the flash memory 104 under the control of the CPU 101; a media drive 107 that controls reading and writing (storage) of data from and to a recording medium 106 such as a flash memory; operation buttons 108 that are operated, for example, when selecting a destination for the distribution terminal 3; a power switch 109 for switching the power of the distribution terminal 3 on and off; and a network I/F (Interface) 111 for transmitting data over the communication network 5.

The distribution terminal 3 also includes: a built-in camera 112 that captures a subject under the control of the CPU 101 to obtain image data; an image sensor I/F 113 that controls driving of the camera 112; a built-in microphone 114 that inputs audio; a built-in speaker 115 that outputs audio; an audio input/output I/F 116 that processes the input and output of audio signals to and from the microphone 114 and the speaker 115 under the control of the CPU 101; a display I/F 117 that transmits image data to an external display 120 under the control of the CPU 101; an external device connection I/F 118 for connecting various external devices; and a bus line 110, such as an address bus and a data bus, that electrically connects the above components as shown in FIG. 2.

The display 120 is a display unit composed of a liquid crystal or organic EL panel that displays images of the subject, operation icons, and the like. The display 120 is connected to the display I/F 117 by a cable 120c. The cable 120c may be a cable for analog RGB (VGA) signals, a cable for component video, or a cable for HDMI (registered trademark) (High-Definition Multimedia Interface) or DVI (Digital Visual Interface) signals.

The camera 112 includes a lens and a solid-state image sensor that converts light into electric charge to digitize an image (video) of the subject. A CMOS (Complementary Metal Oxide Semiconductor) image sensor, a CCD (Charge Coupled Device) image sensor, or the like is used as the solid-state image sensor.

External devices such as an external camera, an external microphone, and an external speaker can each be connected to the external device connection I/F 118 by a USB (Universal Serial Bus) cable or the like. When an external camera is connected, the external camera operates in preference to the built-in camera 112 under the control of the CPU 101. Similarly, when an external microphone or an external speaker is connected, the external microphone or external speaker is driven in preference to the built-in microphone 114 or the built-in speaker 115, respectively, under the control of the CPU 101.

The recording medium 106 is detachable from the distribution terminal 3. In addition, any non-volatile memory from which data is read or to which data is written under the control of the CPU 101 may be used in place of the flash memory 104, for example an EEPROM (Electrically Erasable and Programmable ROM).

Furthermore, the terminal program may be distributed as a file in an installable or executable format recorded on a computer-readable recording medium such as the recording medium 106. The terminal program may also be stored in the ROM 102 instead of the flash memory 104.

FIG. 3 is a diagram illustrating an example of the functional configuration of the distribution terminal 3 according to Embodiment 1. The data acquisition unit 33 includes an audio acquisition unit 33a and an imaging unit 33b. The audio acquisition unit 33a acquires audio data in the conference room.

The imaging unit 33b acquires image data of the speaker and stores the acquired image data in the storage unit 35. The communication unit 31 transmits the image data and audio data of the conference room to the server 2.

The processing unit 32 includes a detection unit 32a, a control unit 32b, a measurement unit 32c, and a notification unit 32d. The detection unit 32a uses the audio acquisition unit 33a to detect the direction of the participant who spoke in the conference room. Specifically, the detection unit 32a calculates, for example, the average level of the audio data acquired by the audio acquisition unit 33a, and determines that a speaker is present when the average level is equal to or greater than a predetermined value.
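The presence check described above can be sketched as follows. This is only an illustration under stated assumptions: the description says an "average value" of the audio is compared with a predetermined value but does not say how the average is computed, so the RMS level in decibels, the function names, and the -40 dB threshold are all hypothetical choices.

```python
import math


def mean_level_db(samples, reference=1.0):
    """Return the RMS level of a block of samples in dB relative to `reference`.

    Using RMS as the "average value" is an assumption; the description
    does not specify the averaging method.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms / reference)


def speaker_present(samples, threshold_db=-40.0):
    """Decide that a speaker is present when the average level reaches the threshold.

    `threshold_db` stands in for the "predetermined value"; -40 dB is hypothetical.
    """
    return mean_level_db(samples) >= threshold_db
```

A block of silence yields a level of negative infinity, so it never counts as speech regardless of the threshold chosen.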

Next, the detection unit 32a detects the direction of the speaker based on the audio data acquired from each microphone of the microphone array. For example, taking as 0 degrees the direction in which the reference point (coordinates (0, Y_0)) in the conference room is captured from the origin (0, 0), the detection unit 32a acquires the position coordinates (X_0, Y_0) of the imaging unit 33b and, based on the audio data, the shooting angle θ at which the participant is captured.

Hereinafter, data on the shooting direction corresponding to the position of the speaker is referred to as direction data. The direction data is expressed, for example, as (X_0, Y_0, θ), using the position coordinates (X_0, Y_0) of the imaging unit 33b and the shooting angle θ at which the participant is captured.

FIG. 4 is a diagram illustrating the relationship between the position coordinates and the angle of the imaging unit 33b. Direction data B (0, 0, 0) indicates the position coordinates (0, 0) of the imaging unit 33b and a shooting angle of 0 degrees when the reference point β (0, Y_0) is captured from the origin (0, 0).

Direction data A (X_1, Y_1, θ_1) indicates the position coordinates (X_1, Y_1) of the imaging unit 33b and the shooting angle θ_1 of the imaging unit 33b corresponding to the direction in which the speaker was detected. The shooting angle θ_1 is the angle at which the speaker is captured from the position coordinates (X_1, Y_1), where the shooting angle of the imaging unit 33b capturing the reference point β (0, Y_0) from the origin (0, 0) is taken as 0 degrees. When the imaging unit 33b is fixed at the origin (0, 0), the direction data may be expressed by the shooting angle θ_1 alone.
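To make the angle convention concrete, the direction data can be modelled as a small record. The sketch below is illustrative only: in the described apparatus the angle is obtained from the microphone array, whereas here it is derived from a known speaker position purely to show the geometry. The names `DirectionData` and `shooting_angle_deg`, the assumption that Y_0 is positive (so 0 degrees points along the +Y axis), and the choice of positive angles toward +X are all assumptions not stated in the description.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectionData:
    """Direction data (X, Y, theta): camera position plus shooting angle in degrees."""
    x: float
    y: float
    theta_deg: float


def shooting_angle_deg(camera_xy, speaker_xy):
    """Angle of the speaker as seen from the camera.

    0 degrees is the +Y direction (origin toward the reference point);
    positive angles turn toward +X. The sign convention is an assumption.
    """
    dx = speaker_xy[0] - camera_xy[0]
    dy = speaker_xy[1] - camera_xy[1]
    # atan2(dx, dy) measures the angle from the +Y axis rather than the +X axis.
    return math.degrees(math.atan2(dx, dy))


def direction_data(camera_xy, speaker_xy):
    """Bundle the camera position and the computed angle into one record."""
    return DirectionData(camera_xy[0], camera_xy[1],
                         shooting_angle_deg(camera_xy, speaker_xy))
```

With the camera fixed at the origin, only `theta_deg` varies, which matches the remark that the direction data may then be expressed by the angle alone.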

The detection unit 32a may also detect a speaker using a plurality of imaging units 33b. In that case, based on the identification number n of the imaging unit 33b that detected the speaker and the position and shooting angle data (X_0, Y_0, θ) of that imaging unit 33b, the detection unit 32a may acquire direction data (n, X_0, Y_0, θ) that includes the camera identification number n.

Returning to FIG. 3, the control unit 32b controls the shooting direction so that the imaging unit 33b faces the speaker. For example, the control unit 32b turns the imaging unit 33b based on the direction data so that the speaker is at the center of the image. The control unit 32b then acquires the image data of the speaker captured by the imaging unit 33b, extracts the image data of the face portion from it, and stores the result in the storage unit 35.

The image data of the face portion is extracted based on, for example, the shape of the face and skin-colored regions. For example, the control unit 32b converts the RGB values of the pixels of the image data into the YCC color system and extracts skin-colored pixels whose Cr and Cb values fall within predetermined ranges. The control unit 32b extracts a face image by identifying the region of the image data in which skin-colored pixels are clustered.
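The skin-color step can be sketched in a few lines. This is a minimal illustration, not the apparatus's actual method: the description only says that Cr and Cb must fall within "predetermined ranges", so the full-range BT.601 (JFIF) conversion coefficients and the Cb/Cr limits used below are assumptions, and `rgb_to_cbcr` and `skin_mask` are hypothetical names.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to the Cb and Cr components (full-range BT.601, as in JFIF)."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(image, cb_range=(77.0, 127.0), cr_range=(133.0, 173.0)):
    """Return a 2-D list of booleans marking pixels whose Cb and Cr are in range.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    The default ranges are illustrative stand-ins for the "predetermined ranges".
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            in_range = (cb_range[0] <= cb <= cb_range[1]
                        and cr_range[0] <= cr <= cr_range[1])
            mask_row.append(in_range)
        mask.append(mask_row)
    return mask
```

Working on chroma alone makes the test less sensitive to brightness than thresholding RGB directly, which is one common reason for converting to a luma/chroma color system first.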

The measurement unit 32c measures the size of the participant's face portion in the image data stored in the storage unit 35. The measurement unit 32c measures the face size (in inches) by identifying the left, right, top, and bottom edges of the region in which skin-colored pixels are clustered.
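The edge-finding step amounts to taking the bounding box of the skin-colored region. The sketch below assumes a boolean mask (rows of True/False values) as input. The description does not say how pixel extents are converted to inches, so the `inches_per_pixel` scale factor is a hypothetical parameter, as are the function names.

```python
def face_bounding_box(mask):
    """Return (left, top, right, bottom) pixel indices of the True region, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, flag in enumerate(row) if flag]
    if not rows:
        return None
    return min(cols), rows[0], max(cols), rows[-1]


def face_size_inches(mask, inches_per_pixel):
    """Return (width, height) of the bounding box in inches, or None if no face.

    `inches_per_pixel` is an assumed calibration constant, not from the description.
    """
    box = face_bounding_box(mask)
    if box is None:
        return None
    left, top, right, bottom = box
    width = (right - left + 1) * inches_per_pixel
    height = (bottom - top + 1) * inches_per_pixel
    return width, height
```

A bounding box treats every skin-colored pixel as part of the face, so a real implementation would likely isolate the largest connected region first; that refinement is omitted here.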

When the participant is seated near the imaging unit 33b, the face portion appears relatively large; when the participant is seated far from the imaging unit 33b, the face portion appears relatively small.

The notification unit 32d determines whether the volume of the participant's voice acquired by the audio acquisition unit 33a is larger than the maximum value of a predetermined range, and whether it is smaller than the minimum value of that range. For this comparison, the notification unit 32d uses, for example, the average volume of the speaker's audio data. The notification unit 32d likewise determines whether the size of the participant's face portion measured by the measurement unit 32c is larger than the maximum value of a predetermined range, and whether it is smaller than the minimum value of that range. The volume of the participant's voice is expressed in units such as decibels (dB), and the size of the participant's face is expressed in units such as inches (in).

When the participant's volume is larger than the maximum value of its predetermined range and the size of the participant's face portion is larger than the maximum value of its predetermined range, the notification unit 32d sends the participant a message prompting them to move away from the microphone. When the participant's volume is smaller than the minimum value of its predetermined range and the size of the participant's face portion is smaller than the minimum value of its predetermined range, the notification unit 32d sends the participant a message prompting them to move closer to the microphone.

In this way, when the volume of the speaker's audio data is larger than the maximum value or smaller than the minimum value of the predetermined range, the notification unit 32d further decides, based on the size of the speaker's face, whether to notify the speaker. For example, when the speaker is far from the microphone but speaks loudly enough for clear audio to be captured, the notification unit 32d does not send a message. This avoids notifying the speaker in cases where the recording is not affected.

In this embodiment, when either the speaker's volume or the size of the face portion is within its predetermined range, the notification unit 32d does not notify that speaker; however, the embodiment is not limited to this. For example, the notification unit 32d may notify the speaker when either the volume or the size of the face portion is outside its predetermined range.
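The two-condition rule of this embodiment can be written compactly as below. The sketch is illustrative: the enum and function names are hypothetical, the message wording is a translation, and the range values are supplied by the caller because the description gives no concrete thresholds.

```python
from enum import Enum


class Notice(Enum):
    NONE = "none"
    MOVE_CLOSER = "Please move closer to the microphone."
    MOVE_AWAY = "Please move away from the microphone."


def decide_notice(volume_db, face_in, volume_range, face_range):
    """Notify only when volume AND face size fall outside range on the same side.

    volume_range and face_range are (minimum, maximum) tuples; their values
    are assumptions supplied by the caller.
    """
    vol_min, vol_max = volume_range
    face_min, face_max = face_range
    if volume_db < vol_min and face_in < face_min:
        return Notice.MOVE_CLOSER  # quiet and far away
    if volume_db > vol_max and face_in > face_max:
        return Notice.MOVE_AWAY    # loud and very close
    return Notice.NONE             # at least one measure is acceptable
```

For instance, with a volume range of (-30, -10) dB and a face range of (4, 8) inches, a distant participant with a 3-inch face who speaks at -20 dB receives no message, which is the "far but loud enough" case described above.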

The notification unit 32d notifies the participant of the message by, for example, displaying a caption on the display device 6 via the data output unit 34. The notification unit 32d may also notify the participant by outputting audio data externally via the data output unit 34. The method of notifying the participant is not limited to these.

FIG. 5 is a diagram illustrating the relationship between the position of a participant and the message notified. In FIG. 5(a), participant β is seated across the table from the data acquisition unit 33 (audio acquisition unit 33a), farther from the data acquisition unit 33 than in (b) and (c). In this case, the participant's volume tends to be smaller than the minimum value of its predetermined range and the size of the participant's face portion tends to be smaller than the minimum value of its predetermined range, so the notification unit 32d displays the caption "Please move closer to the microphone." on the display device 6.

In FIG. 5(b), participant β is seated in the vicinity of the data acquisition unit 33, closer to it than in (a) and (c). In this case, the participant's volume tends to be larger than the maximum value of its predetermined range and the size of the participant's face portion tends to be larger than the maximum value of its predetermined range, so the notification unit 32d displays the caption "Please move away from the microphone." on the display device 6.

In FIG. 5(c), participant β is seated closer to the data acquisition unit 33 than in (a) and farther from it than in (b). In this case, the participant's volume is within the predetermined range, so the notification unit 32d does not display a caption or the like on the display device 6.

The content of the message notified to the participant is not limited to the above.

FIG. 6 is a diagram illustrating the control flow of Embodiment 1. When distribution of the video conference starts (step S10), the detection unit 32a determines whether a speaker is present (step S11). For example, the detection unit 32a determines that a speaker is present when the average volume of the audio data acquired by the audio acquisition unit 33a is equal to or greater than a predetermined value. When no speaker is present (No in step S11), the detection unit 32a repeats the determination (step S11) after a predetermined time has elapsed. When a speaker is present (Yes in step S11), the detection unit 32a detects the direction of the speaker using the audio acquisition unit 33a (step S12). Next, the control unit 32b points the imaging unit 33b toward the speaker (step S13). The measurement unit 32c then measures the size of the speaker's face portion based on the image data of the speaker captured by the imaging unit 33b (step S14).

The notification unit 32d determines whether the volume of the speaker's audio data acquired from the audio acquisition unit 33a is smaller than the minimum value of the predetermined range (step S15). When the volume is smaller than the minimum value (Yes in step S15), the notification unit 32d determines whether the size of the speaker's face measured by the measurement unit 32c is smaller than the minimum value of its predetermined range (step S16). When the size of the speaker's face is equal to or larger than the minimum value (No in step S16), the detection unit 32a waits for a predetermined time measured by a timer (step S21) and then again determines whether a speaker is present (step S11). When the size of the speaker's face is smaller than the minimum value (Yes in step S16), the notification unit 32d notifies the speaker to move closer to the microphone (step S17). The message is delivered by, for example, displaying a caption on the display device 6. The detection unit 32a then waits for the predetermined time (step S21) and again determines whether a speaker is present (step S11).

When the volume determination (step S15) finds that the volume of the speaker's audio data is equal to or larger than the minimum value of the predetermined range (No in step S15), the notification unit 32d proceeds to determine whether the volume is larger than the maximum value of the predetermined range (step S18). When the volume is equal to or smaller than the maximum value (No in step S18), the detection unit 32a waits for the predetermined time measured by the timer (step S21) and then again determines whether a speaker is present (step S11). When the volume is larger than the maximum value (Yes in step S18), the notification unit 32d determines whether the size of the speaker's face is larger than the maximum value of its predetermined range (step S19). When the size of the speaker's face is equal to or smaller than the maximum value (No in step S19), the detection unit 32a waits for the predetermined time (step S21) and then again determines whether a speaker is present (step S11). When the size of the speaker's face is larger than the maximum value (Yes in step S19), the notification unit 32d notifies the speaker to move away from the microphone (step S20). The detection unit 32a then waits for the predetermined time (step S21) and again determines whether a speaker is present (step S11).
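The branching of FIG. 6 can be traced with a short function that returns the step labels visited in one pass, from the presence check to the timer wait. This is a sketch for following the flow only: the function name and all threshold values are hypothetical, and hardware actions (turning the camera, showing the caption) are represented solely by their step labels.

```python
def trace_cycle(volume_db, face_in, presence_db, volume_range, face_range):
    """Return the step labels visited in one cycle of the FIG. 6 flow.

    presence_db stands in for the speaker-presence threshold of step S11;
    volume_range and face_range are (minimum, maximum) tuples. All values
    are supplied by the caller and are assumptions.
    """
    vol_min, vol_max = volume_range
    face_min, face_max = face_range
    steps = ["S11"]
    if volume_db < presence_db:
        return steps                      # S11 No: wait, then check again
    steps += ["S12", "S13", "S14", "S15"]
    if volume_db < vol_min:               # S15 Yes
        steps.append("S16")
        if face_in < face_min:            # S16 Yes
            steps.append("S17")           # notify: move closer
    else:                                 # S15 No
        steps.append("S18")
        if volume_db > vol_max:           # S18 Yes
            steps.append("S19")
            if face_in > face_max:        # S19 Yes
                steps.append("S20")       # notify: move away
    steps.append("S21")                   # timer wait before the next S11
    return steps
```

Note that step S16 is reachable only when the presence threshold is lower than the minimum of the volume range; otherwise any detected speaker would already be loud enough.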

(Embodiment 2)
FIG. 7 is a diagram illustrating an example of the functional configuration of the distribution terminal 7 according to Embodiment 2. The distribution terminal 7 includes a communication unit 31, a processing unit 32, a data acquisition unit 33, a data output unit 34, and a storage unit 35. The storage unit 35 includes a user table 35a.

The data acquisition unit 33 includes an audio acquisition unit 33a and an imaging unit 33b. The processing unit 32 includes a detection unit 32a, a control unit 32b, a measurement unit 32c, a notification unit 32d, and a storing unit 32e. The storing unit 32e is connected to the user table 35a. The notification unit 32d is connected to the data output unit 34.

FIG. 8 is a diagram illustrating an example of the data structure of the user table 35a. The user table 35a is a table for identifying participants who have already been notified, and it associates a user ID, a face image, and the presence or absence of notification. "User ID" is a number that uniquely identifies a user; the distribution terminal 7 identifies each participant by this ID. "Face image" is image data of the user's face, for example image data obtained by cropping the face from an image of the whole user, stored in an image format such as JPEG or GIF. "Presence/absence of notification" indicates whether the participant (user) has been prompted to change position. "Yes" indicates that a notification such as "Please move closer to the microphone." or "Please move away from the microphone." has already been given to the participant. "No" indicates that the participant has not yet been notified. For example, the user table 35a shows that the face image for user ID "0101" is "image a" and that no notification has been given yet, and that the face image for user ID "0103" is "image c" and that a notification has already been given.
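As a rough illustration of this table, the three columns can be modelled as a record keyed by user ID. The class name, field names, and placeholder byte strings below are assumptions made for the sketch and do not come from the publication.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class UserRecord:
    user_id: str       # "User ID": uniquely identifies a participant
    face_image: bytes  # "Face image": cropped face data, e.g. JPEG or GIF bytes
    notified: bool     # "Presence/absence of notification"


# Two rows following the example in the text; the byte strings are
# placeholders, not real image data.
user_table: Dict[str, UserRecord] = {
    "0101": UserRecord("0101", b"image a", notified=False),
    "0103": UserRecord("0103", b"image c", notified=True),
}
```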

Returning to FIG. 7, the detection unit 32a uses the audio acquisition unit 33a to detect the direction of a participant who has spoken in the conference room. For example, taking as 0 degrees the direction in which a reference point in the conference room (coordinates (X_0, Y_0)) is imaged from the origin at coordinates (0, 0), the detection unit 32a acquires the position coordinates (X_1, Y_1) of the imaging unit 33b and the shooting angle θ at which the participant is imaged.

The control unit 32b directs the shooting direction of the imaging unit 33b toward the speaker. For example, based on the direction data detected by the detection unit 32a, the control unit 32b rotates the imaging unit 33b so that the speaker is at the center of the captured image.

The measurement unit 32c measures the size of the participant's face in the image data stored in the storage unit 35.

The notification unit 32d stores the speaker's image data in built-in memory when the volume of the speaker's voice data is greater than the maximum value of the predetermined range and the speaker's face size is greater than the maximum value of its predetermined range, or when the volume is smaller than the minimum value of the predetermined range and the face size is smaller than the minimum value of its predetermined range. The speaker's image data may be an image of the whole speaker or an image cropped to the speaker's face. The notification unit 32d ends the process when the volume of the speaker's voice data is within the predetermined range, and when the speaker's face size is within its predetermined range.

Next, the notification unit 32d identifies the speaker's user ID by face authentication, comparing the stored image with the user images held in the user table 35a. For example, the notification unit 32d performs face authentication using a face recognition algorithm such as eigenfaces. The storage unit 32e refers to the "presence/absence of notification" entry for the identified user ID in the user table 35a and determines whether the speaker has been notified in the past. If no notification has been given, the notification unit 32d then notifies the speaker to change position.

For example, when the volume of the speaker's voice data is greater than the maximum value of the predetermined range and the speaker's face size is greater than the maximum value of its predetermined range, the notification unit 32d sends the speaker a message urging them to move away from the microphone. Conversely, when the volume is smaller than the minimum value of the predetermined range and the face size is smaller than the minimum value of its predetermined range, the notification unit 32d sends the speaker a message urging them to move closer to the microphone.

When the notification unit 32d has sent a message to the speaker, the storage unit 32e stores "Yes", indicating that notification has been given, in the "presence/absence of notification" entry for that user ID in the user table 35a.

FIG. 9 is a diagram illustrating the control flow of the second embodiment. When distribution of the video conference starts (step S30), the detection unit 32a determines whether a speaker is present (step S31). When no speaker is present (No in step S31), the detection unit 32a repeats the determination (step S31) after a predetermined time has elapsed. When a speaker is present (Yes in step S31), the detection unit 32a detects the direction of the speaker using the audio acquisition unit 33a (step S32). The control unit 32b then directs the shooting direction of the imaging unit 33b toward the speaker (step S33), and the measurement unit 32c measures the size of the speaker's face (step S34).

Next, the notification unit 32d determines whether the volume of the speaker's voice data acquired from the audio acquisition unit 33a is smaller than the minimum value of the predetermined range (step S35). When the volume is smaller than the minimum value (Yes in step S35), the notification unit 32d determines whether the speaker's face size measured by the measurement unit 32c is smaller than the minimum value of its predetermined range (step S36). When the face size is equal to or greater than the minimum value (No in step S36), the detection unit 32a waits until a predetermined time measured by a timer has elapsed (step S41) and then determines again whether a speaker is present (step S31). When the face size is smaller than the minimum value (Yes in step S36), the notification unit 32d executes the authentication process (step S37), which is described with reference to FIG. 10. The detection unit 32a then waits until the predetermined time has elapsed (step S41) and determines again whether a speaker is present (step S31).

In the volume determination (step S35), when the volume of the speaker's voice data is equal to or greater than the minimum value of the predetermined range (No in step S35), the notification unit 32d proceeds to determine whether the volume exceeds the maximum value of the predetermined range (step S38). When the volume is equal to or less than the maximum value (No in step S38), the detection unit 32a waits until the predetermined time has elapsed (step S41) and then determines again whether a speaker is present (step S31). When the volume is greater than the maximum value (Yes in step S38), the notification unit 32d determines whether the speaker's face size is greater than the maximum value of its predetermined range (step S39). When the face size is equal to or less than the maximum value (No in step S39), the detection unit 32a waits until the predetermined time has elapsed (step S41) and then determines again whether a speaker is present (step S31). When the face size is greater than the maximum value (Yes in step S39), the notification unit 32d executes the authentication process (step S40). The detection unit 32a then waits until the predetermined time has elapsed (step S41) and determines again whether a speaker is present (step S31).

FIG. 10 is a diagram illustrating the flow of the authentication process, which corresponds to steps S37 and S40 of FIG. 9. The notification unit 32d stores the speaker's image data in built-in memory (step S50). The notification unit 32d then compares it with the face image data of the users stored in the user table 35a and identifies the speaker's user ID by face authentication (step S51). Next, the storage unit 32e refers to the user table 35a and determines whether the speaker has previously been notified to change position (step S52). When the speaker has been notified in the past (No in step S52), the notification unit 32d ends the process. When the speaker has not yet been notified (Yes in step S52), the notification unit 32d notifies the speaker to change position (step S53).

For example, when the volume of the speaker's voice data is smaller than the minimum value of the predetermined range and the speaker's face size is smaller than its predetermined range (that is, when the process was entered from step S37), the notification unit 32d sends the speaker a message urging them to move closer to the microphone. When the volume is greater than the maximum value of the predetermined range and the face size is greater than the maximum value of its predetermined range (that is, when the process was entered from step S40), the notification unit 32d sends the speaker a message urging them to move away from the microphone.

The storage unit 32e then stores "Yes" in the "presence/absence of notification" entry of the user table 35a (step S54).
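The check, notify, and record sequence of steps S52 to S54 can be sketched as follows. This is a minimal sketch, not the publication's implementation: it assumes the user ID has already been identified by face authentication (steps S50 and S51 are omitted), and the function name, the dictionary of flags, and the `send` callback are hypothetical.

```python
from typing import Callable, Dict


def notify_once(user_id: str, message: str, notified: Dict[str, bool],
                send: Callable[[str], None]) -> bool:
    """Send `message` unless `user_id` was already notified; return True if sent."""
    # Step S52: look up whether this user has been notified before.
    if notified.get(user_id, False):
        return False
    # Step S53: prompt the speaker to change position.
    send(message)
    # Step S54: record that the notification has been given.
    notified[user_id] = True
    return True
```

Calling the function a second time for the same user returns False without sending anything, which is the behaviour that prevents repeated prompts.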

As described above, determining in advance whether the user has already been notified to change position avoids repeated notifications when the user cannot move, for example because the conference room is full.

(Embodiment 3)
The distance to the speaker may be measured by a method other than estimating it from the face size as in the first and second embodiments. For example, the measurement unit 32c may measure the distance to the speaker using a distance sensor. Distance sensors include, for example, a stereo camera, an ultrasonic sensor, and an infrared sensor. The stereo camera may acquire image data of the speaker in parallel with measuring the distance to the speaker.

FIG. 11A is a diagram illustrating a first hardware configuration of the distribution terminal 8 according to the third embodiment. In the first hardware configuration, the distance to the speaker is measured using a stereo camera 50. The distribution terminal 8 differs from the distribution terminal 3 of the first embodiment in that the stereo camera 50 is connected to the image sensor I/F 113; the rest of the hardware configuration is the same as in the first embodiment. Alternatively, both the camera 112 for imaging and the stereo camera 50 for distance measurement may be connected to the image sensor I/F 113.

FIG. 11B is a diagram illustrating a second hardware configuration of the distribution terminal 8 according to the third embodiment. In the second hardware configuration, the distance to the speaker is measured using an infrared sensor 51. The distribution terminal 8 differs from the distribution terminal 3 of the first embodiment in that the infrared sensor 51 or an ultrasonic sensor 52 is connected via a sensor I/F 122 attached to the bus 110; the rest of the hardware configuration is the same as in the first embodiment.

The distribution terminal 8 has the same functional configuration as that shown in FIG. 3, so a description of each component is omitted.

FIG. 12 is a diagram illustrating an example of the appearance of the stereo camera 50. The stereo camera 50 is a device that measures the distance to the speaker using a plurality of cameras installed side by side. The shooting direction of each camera is controlled independently. In the example of FIG. 12 the two cameras are installed close together, but the distance between them may be increased. The example uses two cameras, but three or more may be used.

A method for calculating the distance to the speaker using the stereo camera 50 is described next. Given the distance A [m] between the cameras, the focal length B [m] of the cameras, and the difference C [m] between the positions of the speaker as imaged by each camera, the measurement unit 32c calculates the distance D [m] to the speaker from the following equation.

(Equation 1)
D = A × B / C
A larger inter-camera distance A [m] gives higher distance-measurement accuracy, but it also lengthens the measurement time, because the time from the start of the search for the speaker until each camera captures the speaker increases. When A is made large, the search time can be reduced by narrowing the target distance range measured by the stereo camera 50.

The target distance range measured by the stereo camera 50 is set, for example, according to the size of the room being imaged. Setting this range in advance shortens the time needed to measure the distance to the speaker.

As for the positional difference C, if the position of the speaker as imaged by one camera and the position as imaged by the other camera are offset horizontally by, for example, 5 cm, then C [m] is 0.05.
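As a worked illustration of Equation 1, the following sketch computes D from A, B, and C. The function and parameter names are hypothetical, and the values of A and B in the usage note are chosen only to make the arithmetic easy; only C = 0.05 m comes from the text.

```python
def stereo_distance(baseline_m: float, focal_length_m: float,
                    position_diff_m: float) -> float:
    """Distance D = A * B / C (Equation 1), with every quantity in metres."""
    if position_diff_m <= 0:
        # A zero or negative difference gives no usable distance estimate.
        raise ValueError("position difference C must be positive")
    return baseline_m * focal_length_m / position_diff_m
```

With assumed values A = 0.5 m and B = 0.2 m and the C = 0.05 m from the example, `stereo_distance(0.5, 0.2, 0.05)` evaluates to about 2 m. Because C is in the denominator, a smaller positional difference corresponds to a more distant speaker.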

In addition to serving the measurement unit 32c, the stereo camera 50 may be used as the camera 112 of FIG. 2. For example, when used as the camera 112, the stereo camera 50 may generate image data by combining the images captured by its cameras, or it may transmit the image captured by one of its cameras to the server 2 as image data. Alternatively, the stereo camera 50 may be used exclusively for measuring the distance to the speaker, and the distribution terminal 3 may be provided with a separate camera 112 for acquiring image data of the speaker.

FIG. 13 is a diagram illustrating an example of the appearance of the camera 112 equipped with the infrared sensor 51. As shown in FIG. 13, for example, the infrared sensor 51 is arranged alongside the camera 112 so that it faces the same direction as the shooting direction of the camera 112. The infrared sensor 51 starts measuring the distance to the speaker after the shooting direction of the camera 112 has been directed toward the speaker detected by the detection unit 32a. When the ultrasonic sensor 52 is used as the distance sensor, it is likewise arranged alongside the camera 112 in the same way as the infrared sensor 51 of FIG. 13.

When the infrared sensor 51 is the distance sensor, it measures the distance to the speaker by triangulation, for example by irradiating the speaker with infrared light and using the position of the light-receiving element that detects the reflected light.

Specifically, the infrared sensor 51 irradiates the speaker with infrared light and receives the light reflected from the speaker with a position sensing device (PSD). Because the position on the position sensing device at which the reflected light is detected changes with the distance to the speaker, the measurement unit 32c can calculate the distance to the speaker by converting that position into a distance.

The element used in the infrared sensor 51 is not limited to a PSD; other types of element, such as an OES (Opto Elektronischer Schaltkreis), may be used.

When the ultrasonic sensor 52 is used, after the shooting direction has been directed toward the speaker detected by the detection unit 32a, the measurement unit 32c measures the distance to the speaker by emitting ultrasonic waves toward the detected speaker and measuring the reflected waves, or by irradiating the speaker with infrared light and measuring the reflected light.

For example, the measurement unit 32c measures the distance to the speaker by using the ultrasonic sensor to time the interval between emitting an ultrasonic wave toward the speaker and receiving the reflected wave. If this interval is t [s] and the speed of sound is c [m/s], the measurement unit 32c calculates the distance L to the speaker from the following equation.

(Equation 2)
L = c × t / 2
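Equation 2 can be illustrated with a short function. The function name, the parameter names, and the default of 343 m/s (an approximate speed of sound in air at about 20 °C) are assumptions for this sketch; the publication leaves c unspecified.

```python
def ultrasonic_distance(round_trip_s: float,
                        speed_of_sound_m_s: float = 343.0) -> float:
    """Distance L = c * t / 2 (Equation 2)."""
    if round_trip_s < 0:
        raise ValueError("round-trip time t must not be negative")
    # The wave travels to the speaker and back, so halve the path length.
    return speed_of_sound_m_s * round_trip_s / 2.0
```

For instance, a round-trip time of 0.01 s with the assumed default speed gives a distance of roughly 1.7 m.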
FIG. 14 is a diagram illustrating the control flow of the third embodiment. When distribution of the video conference starts (step S60), the detection unit 32a determines whether a speaker is present (step S61). When no speaker is present (No in step S61), the detection unit 32a repeats the determination (step S61) after a predetermined time has elapsed. When a speaker is present (Yes in step S61), the detection unit 32a detects the direction of the speaker using the audio acquisition unit 33a (step S62). The control unit 32b then directs the shooting direction of the imaging unit 33b toward the speaker (step S63). Next, the measurement unit 32c measures the distance to the speaker using the stereo camera 50, the infrared sensor 51, or the ultrasonic sensor 52 (step S64).

The notification unit 32d determines whether the volume of the speaker's voice data acquired from the audio acquisition unit 33a is smaller than the minimum value of the predetermined range (step S65). When the volume is smaller than the minimum value (Yes in step S65), the notification unit 32d determines whether the distance to the speaker measured by the measurement unit 32c is smaller than the minimum value of its predetermined range (step S66). When the distance is equal to or greater than the minimum value (No in step S66), the detection unit 32a waits until a predetermined time measured by a timer has elapsed (step S71) and then determines again whether a speaker is present (step S61). When the distance is smaller than the minimum value (Yes in step S66), the notification unit 32d notifies the speaker to move closer to the microphone (step S67). The detection unit 32a then waits until the predetermined time has elapsed (step S71) and determines again whether a speaker is present (step S61).

In the determination of whether the volume is smaller than the minimum value of the predetermined range (step S65), when the volume of the speaker's voice data is equal to or greater than the minimum value (No in step S65), the notification unit 32d proceeds to determine whether the volume exceeds the maximum value of the predetermined range (step S68). When the volume is equal to or less than the maximum value (No in step S68), the detection unit 32a waits until the predetermined time has elapsed (step S71) and then determines again whether a speaker is present (step S61). When the volume is greater than the maximum value (Yes in step S68), the notification unit 32d determines whether the distance to the speaker is greater than the maximum value of its predetermined range (step S69). When the distance is equal to or less than the maximum value (No in step S69), the detection unit 32a waits until the predetermined time has elapsed (step S71) and then determines again whether a speaker is present (step S61).

When the distance to the speaker is greater than the maximum value (Yes in step S69), the notification unit 32d notifies the speaker to move away from the microphone (step S70). The detection unit 32a then waits until the predetermined time has elapsed (step S71) and determines again whether a speaker is present (step S61).

The authentication process described in the second embodiment may also be performed in the third embodiment. In that case, the authentication process shown in FIG. 10 is performed in steps S67 and S70.

In the first to third embodiments, the functions of the distribution terminal 3, the distribution terminal 7, or the distribution terminal 8 can be realized by causing a computer to execute a program stored on a medium.

The distribution terminal 3 has been described in the first embodiment, the distribution terminal 7 in the second embodiment, and the distribution terminal 8 in the third embodiment. However, the present invention is not limited to these embodiments, and various modifications and improvements are possible within the scope of the present invention.

In the present embodiments, the distribution terminal 3 and the distribution terminal 7 are examples of the imaging apparatus. The storage unit 35 is an example of the storage unit. The detection unit 32a is an example of the detection unit. The control unit 32b is an example of the control unit. The measurement unit 32c is an example of the measurement unit. The notification unit 32d is an example of the notification unit. The storage unit 32e is an example of the storing unit. The user ID is an example of the identification information.

2 Server
3, 7, 8 Distribution terminal
4 User terminal
5 Communication network
6 Display device
31 Communication unit
32 Processing unit
32a Detection unit
32b Control unit
32c Measurement unit
32d Notification unit
33 Data acquisition unit
33a Audio acquisition unit
33b Imaging unit
34 Data output unit
35 Storage unit
100 Video distribution system

JP 2008-103824 A

Claims

An imaging apparatus for imaging a person, comprising:
a detection unit that detects the direction of a person who has spoken and the volume of the speech;
a control unit that directs the imaging direction toward the detected direction; and
a notification unit that notifies the person when the volume is smaller than the minimum value of a predetermined range.

The imaging apparatus according to claim 1, wherein the notification unit notifies the person when the volume is greater than the maximum value of the predetermined range.

The imaging apparatus according to claim 1, further comprising a measurement unit that measures the size, in the image, of the person in the direction detected by the detection unit,
wherein the notification unit notifies the person when the volume is smaller than the minimum value of a first range corresponding to the predetermined range and the size of the person in the image is smaller than the minimum value of a second range.

The imaging apparatus according to claim 1, further comprising a measurement unit that measures the distance to the person in the direction detected by the detection unit,
wherein the notification unit notifies the person when the volume is smaller than the minimum value of a first range corresponding to the predetermined range and the distance to the person is smaller than the minimum value of a second range.

The imaging apparatus according to claim 2, further comprising a measurement unit that measures the size, on the image, of the person in the direction detected by the detection unit,
wherein the notification unit notifies the person when the volume is larger than the maximum value of a first range related to the predetermined range and the size of the person on the image is larger than the maximum value of a second range.

The imaging apparatus according to claim 2, further comprising a measurement unit that measures the distance to the person in the direction detected by the detection unit,
wherein the notification unit notifies the person when the volume is larger than the maximum value of a first range related to the predetermined range and the distance to the person is larger than the maximum value of a second range.

The imaging apparatus according to any one of claims 1 to 6, further comprising a storing unit that stores, in a storage unit, identification information of the person corresponding to a face image extracted from the captured image in association with whether a notification has been made,
wherein the notification unit performs the notification when the imaged person has not yet been notified.

A program that causes an imaging apparatus for imaging a person to execute processing comprising:
detecting the direction of a person who has spoken and the volume of the speech;
controlling the imaging direction so that it faces the detected direction; and
notifying the person when the volume is smaller than the minimum value of a predetermined range.

An imaging method executed by an imaging apparatus for imaging a person, the method comprising:
detecting the direction of a person who has spoken and the volume of the speech;
controlling the imaging direction so that it faces the detected direction; and
notifying the person when the volume is smaller than the minimum value of a predetermined range.
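
For illustration only, the notification condition recited in the claims can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in this publication: the names `Range`, `Notifier`, `decide`, and `person_id`, the message strings, and all threshold values are hypothetical. The sketch covers the volume check against the predetermined range, the optional combination with the size of the person on the image, and the suppression of repeated notifications to a person who has already been notified.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Range:
    """A range with a minimum and a maximum (values are assumptions)."""
    minimum: float
    maximum: float


@dataclass
class Notifier:
    volume_range: Range                  # the predetermined (first) range
    size_range: Optional[Range] = None   # optional second range for image size
    notified: Set[str] = field(default_factory=set)  # persons already notified

    def decide(self, person_id: str, volume: float,
               size: Optional[float] = None) -> Optional[str]:
        """Return a hypothetical message, or None when no notification is due."""
        # Skip persons who have already been notified once.
        if person_id in self.notified:
            return None

        # The size condition applies only when a second range and a size exist.
        use_size = self.size_range is not None and size is not None
        message = None

        if volume < self.volume_range.minimum:
            # Volume below the minimum, optionally combined with a small image size.
            if not use_size or size < self.size_range.minimum:
                message = "volume too low"
        elif volume > self.volume_range.maximum:
            # Volume above the maximum, optionally combined with a large image size.
            if not use_size or size > self.size_range.maximum:
                message = "volume too high"

        if message is not None:
            self.notified.add(person_id)
        return message
```

In this sketch, `Notifier(Range(40.0, 80.0)).decide("a", 30.0)` yields a notification on the first call and `None` on a second call for the same `person_id`. The distance-based variants could be sketched the same way by passing a measured distance in place of `size`.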