JP2019022151A

JP2019022151A - Information processing apparatus, image processing system, control method, and program

Info

Publication number: JP2019022151A
Application number: JP2017141208A
Authority: JP
Inventors: Hajime Sato (佐藤 肇)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-07-20
Filing date: 2017-07-20
Publication date: 2019-02-07

Abstract

PROBLEM TO BE SOLVED: To make it possible to generate a high-quality virtual viewpoint image in a system that generates virtual viewpoint images using a plurality of cameras, even when the cameras vibrate.
SOLUTION: The information processing apparatus includes: image acquisition means for acquiring a captured image; vibration information acquisition means for acquiring vibration information for the imaging apparatus that captured the image; cutout means for cutting out, from the captured image, a predetermined area used for generating the virtual viewpoint image; and determination means for determining the predetermined area on the basis of the vibration information.
SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing system including a plurality of cameras for photographing a subject from a plurality of directions.

In recent years, attention has been focused on a technique in which a plurality of cameras are installed at different positions to perform synchronized shooting from multiple viewpoints, and the resulting multi-viewpoint images are used to generate virtual viewpoint content. With such a technique, a highlight scene of a soccer or basketball game, for example, can be viewed from various angles, giving the user a greater sense of realism than ordinary images.

Virtual viewpoint content based on multi-viewpoint images is produced by generating virtual viewpoint images. That is, the images captured by the plurality of cameras are collected in an image processing unit such as a server, which performs processing such as three-dimensional model generation and rendering to generate a virtual viewpoint image. The virtual viewpoint image is then transmitted to a user terminal, where the user can view the virtual viewpoint content.

In a system that performs this type of image processing, vibration applied to a camera appears as blur in the images it captures and degrades the image processing. One countermeasure is optical correction, in which the lens of the camera system is shifted during shooting based on the output of a sensor capable of detecting vibration, such as a gyro. Another is electronic correction, in which blur is removed by cutting out, for example, a 2K-resolution image from a 4K-resolution image.
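The electronic correction mentioned above can be illustrated with a minimal sketch. It assumes a 3840x2160 (4K UHD) source frame, a 1920x1080 ("2K") output window, and that the shake-induced image shift in pixels is already known, for example from a gyro. The window follows the shift so the scene stays fixed in the output, and it is clamped to the frame, which is why shake larger than the margin cannot be fully corrected. The names `Rect` and `stabilized_crop` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int       # left edge in source pixels
    y: int       # top edge in source pixels
    width: int
    height: int


# Assumed sizes: 4K UHD source frame and a Full HD ("2K") output window.
SRC_W, SRC_H = 3840, 2160
OUT_W, OUT_H = 1920, 1080


def stabilized_crop(shift_x: float, shift_y: float) -> Rect:
    """Return a crop window that follows the shake-induced image shift.

    shift_x, shift_y: displacement of the scene within the frame, in pixels.
    Moving the window by the same amount keeps the scene fixed in the output.
    """
    # Centered window when there is no shake (960 px / 540 px margins).
    cx = (SRC_W - OUT_W) // 2
    cy = (SRC_H - OUT_H) // 2
    # Follow the shift, but keep the window inside the source frame; shake
    # larger than the margin can only be partially corrected.
    x = min(max(cx + round(shift_x), 0), SRC_W - OUT_W)
    y = min(max(cy + round(shift_y), 0), SRC_H - OUT_H)
    return Rect(x, y, OUT_W, OUT_H)
```

For instance, `stabilized_crop(100, -50)` places the window at (1060, 490), while a 5000-pixel horizontal shift saturates at x = 1920, the edge of the 960-pixel margin.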

Patent Document 1 discloses a technique for performing camera-shake correction in a compound-eye camera that enables stereoscopic viewing and whose optical-axis centers are adjusted to coincide in the horizontal direction. If the individual cameras apply different correction amounts, the initial center positions of the lenses shift and stereoscopic viewing becomes impossible; the document therefore describes cutting out the viewpoint images with a common size and aspect ratio so that the optical-axis centers do not shift.

Patent Document 1: International Publication No. 2011/114572

In conventional camera systems that perform correction using either optical or electronic correction, shake of relatively small amplitude among the detected vibration signals could be corrected with a sufficient margin.

However, it is difficult to secure a sufficient correction margin for large-amplitude shake, such as that caused by spectators jumping in a stadium. Furthermore, in an image processing system with a plurality of cameras, each camera is subjected to vibration of a different amplitude and frequency. Consequently, if the blur in each camera's images is corrected with the same correction amount, as in the technique of Patent Document 1, it may be difficult to integrate the corrected images into a virtual viewpoint image.

The present invention has been made in view of the above problem, and its object is to enable a system that generates virtual viewpoint images using a plurality of cameras to generate high-quality virtual viewpoint images even when the cameras vibrate.

To solve the above problem, an information processing apparatus according to the present invention includes: an image acquisition unit that acquires a captured image; a vibration information acquisition unit that acquires vibration information for the imaging apparatus that captured the image; a cutout unit that cuts out, from the captured image, a predetermined area used for generating a virtual viewpoint image; and a determination unit that determines the predetermined area based on the vibration information.

According to the present invention, a system that generates virtual viewpoint images using a plurality of cameras can generate high-quality virtual viewpoint images even when the cameras vibrate.

FIG. 1 is a diagram for explaining the configuration of the image processing system 100.
FIG. 2 is a block diagram for explaining the functional configuration of the camera adapter 120.
FIG. 3 is a block diagram for explaining the configuration of the image processing unit 6130.
FIG. 4 is a diagram for explaining rotation processing based on vibration information.
FIG. 5 is a diagram for explaining image cutout processing based on vibration information.
FIG. 6 is a diagram for explaining the determination processing performed by the foreground determination unit 5011.
FIG. 7 is a flowchart showing the processing of the calibration control unit 6133 and the foreground/background separation unit 6131 in the image processing unit 6130.
FIG. 8 is a diagram showing an example of a background image.
FIG. 9 is a flowchart showing the processing of the background cutout unit 5004 in the image processing unit 6130.
FIG. 10 is a flowchart showing the processing of the three-dimensional model information generation unit 6132 in the image processing unit 6130.
FIG. 11 is a block diagram for explaining the functional configuration of the front-end server 230.
FIG. 12 is a block diagram for explaining the functional configuration of the database 250.
FIG. 13 is a block diagram for explaining the functional configuration of the back-end server 270.
FIG. 14 is a diagram for explaining the virtual camera 8001.
FIG. 15 is a block diagram for explaining the functional configuration of the virtual camera operation UI 330.
FIG. 16 is a diagram for explaining the connection configuration of the end user terminal 190.
FIG. 17 is a block diagram showing the hardware configuration of the camera adapter 120.

Embodiments according to the present invention will be described below in detail with reference to the drawings. The configurations shown in the following embodiments are merely examples, and the present invention is not limited to the illustrated configurations.

A system that captures images and collects sound using a plurality of cameras and microphones installed in a facility such as a stadium or a concert hall will be described with reference to FIG. 1.

<Description of Image Processing System 100>
FIG. 1 is a diagram for explaining the configuration of the image processing system 100. The image processing system 100 includes sensor systems 110a, ..., 110z, an image computing server 200, a controller 300, a switching hub 180, and an end user terminal 190.

The controller 300 includes a control station 310 and a virtual camera operation UI 330. The control station 310 manages the operating state of, and controls parameter settings for, each block constituting the image processing system 100 through the networks 310a, 310b, 310c, 180a, 180b, and 170a, ..., 170y.

<Description of Sensor System 110>
First, the operation of transmitting the 26 sets of images and sounds from the sensor systems 110a, ..., 110z to the image computing server 200 via the sensor system 110z will be described.

The image processing system 100 includes a plurality of sensor systems 110 connected in a daisy chain. In this embodiment, unless otherwise specified, the 26 sets of systems from the sensor system 110a to the sensor system 110z are referred to collectively as the sensor system 110 without distinction. Likewise, unless otherwise specified, the devices in each sensor system 110 are referred to as the microphone 111, the camera 112, the pan head 113, the external sensor 114, and the camera adapter 120. The number of sensor systems, 26 sets, is merely an example and is not limiting.

In this embodiment, unless otherwise noted, the term "image" covers both moving images and still images; that is, the image processing system 100 can process both. The description mainly covers an example in which the virtual viewpoint content provided by the image processing system 100 includes a virtual viewpoint image and virtual viewpoint sound, but the invention is not limited to this. For example, the virtual viewpoint content need not include sound, or the sound it includes may be the sound collected by the microphone closest to the virtual viewpoint. For simplicity, the description of sound is partly omitted in this embodiment, but images and sound are basically assumed to be processed together. The information processing apparatus is, for example, the camera adapter 120.

Each of the sensor systems 110a, ..., 110z includes one of the cameras 112a, ..., 112z. That is, the image processing system 100 has a plurality of cameras for photographing a subject from a plurality of directions.

The sensor system 110 includes the microphone 111, the camera 112, the pan head 113, the external sensor 114, and the camera adapter 120, but is not limited to this configuration.

The sound collected by the microphone 111a and the image captured by the camera 112a undergo the image processing described later in the camera adapter 120a and are then transmitted through the daisy chain 170a to the camera adapter 120b of the sensor system 110b. Similarly, the sensor system 110b transmits its collected sound and captured image, together with the image and sound acquired from the sensor system 110a, to the sensor system 110c.

By repeating this operation, the images and sounds acquired by the sensor systems 110a, ..., 110z are transmitted from the sensor system 110z to the switching hub 180 over the network 180b, and then to the image computing server 200.
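The relay along the daisy chain can be sketched as a simple simulation. This is a hypothetical model, not the adapter's actual protocol: each payload is reduced to a (sensor system id, frame number) pair, and each system appends its own payload to the bundle received from upstream before forwarding it. It also shows that the load on each link grows by one payload per hop.

```python
from typing import List, Tuple

# Hypothetical payload: (sensor system id, frame number). A real adapter would
# carry processed image and audio data plus meta information.
Payload = Tuple[str, int]


def relay_through_chain(system_ids: List[str], frame: int) -> Tuple[List[Payload], List[int]]:
    """Simulate one frame travelling down a daisy chain of sensor systems.

    Each system receives the bundle from its upstream neighbour, appends its
    own data for the frame, and forwards the bundle downstream. The last
    system hands the complete bundle to the switching hub.

    Returns the final bundle and, for each outgoing link, how many payloads
    that link carries.
    """
    bundle: List[Payload] = []
    link_loads: List[int] = []
    for sid in system_ids:
        bundle = bundle + [(sid, frame)]  # upstream data plus own data
        link_loads.append(len(bundle))    # load on the link leaving this system
    return bundle, link_loads
```

With the 26 systems 110a to 110z, the link leaving 110z carries all 26 payloads, whereas the link leaving 110a carries only one.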

In this embodiment, the cameras 112a, ..., 112z and the camera adapters 120a, ..., 120z are separate, but they may be integrated in the same housing. In that case, the microphones 111a, ..., 111z may be built into the integrated camera 112 or connected externally to the camera 112.

<Description of Image Computing Server 200>
Next, the configuration and operation of the image computing server 200 will be described. The image computing server 200 processes the data acquired from the sensor system 110z.

The image computing server 200 includes a front-end server 230, a database 250 (hereinafter sometimes referred to as DB), a back-end server 270, and a time server 290.

The time server 290 distributes time and a synchronization signal to the sensor systems 110a, ..., 110z via the switching hub 180. On receiving them, the camera adapters 120a, ..., 120z genlock the cameras 112a, ..., 112z based on the time and the synchronization signal to synchronize image frames. That is, the time server 290 synchronizes the shooting timing of the plurality of cameras 112.

The front-end server 230 reconstructs the segmented transmission packets of the images and sounds acquired from the sensor system 110z, converts the data format, and then writes the data to the database 250 according to the camera identifier, data type, and frame number.
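The reassemble-then-store step can be sketched as follows. This is a hypothetical model: it assumes each segment carries the camera identifier, data type, frame number, its index, and the total segment count, uses an in-memory dictionary in place of the database 250, and omits the data format conversion. `Packet` and `FrontEndSketch` are illustrative names only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Packet:
    camera_id: str
    data_type: str   # e.g. "foreground", "background", "audio" (hypothetical labels)
    frame: int
    index: int       # position of this segment within its message
    total: int       # number of segments in the message
    payload: bytes


# Storage key: (camera identifier, data type, frame number)
Key = Tuple[str, str, int]


class FrontEndSketch:
    """Hypothetical front end: reassembles segments, then stores them by key."""

    def __init__(self) -> None:
        self._pending: Dict[Key, Dict[int, bytes]] = defaultdict(dict)
        self.database: Dict[Key, bytes] = {}  # stand-in for database 250

    def receive(self, pkt: Packet) -> None:
        key = (pkt.camera_id, pkt.data_type, pkt.frame)
        parts = self._pending[key]
        parts[pkt.index] = pkt.payload  # segments may arrive out of order
        if len(parts) == pkt.total:
            # All segments present: join them in order and write under the key.
            self.database[key] = b"".join(parts[i] for i in range(pkt.total))
            del self._pending[key]
```

Keying by (camera, type, frame) lets the back-end server later fetch, for a given frame, exactly the per-camera data it needs for rendering.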

The back-end server 270 accepts a viewpoint designation from the virtual camera operation UI 330, reads the corresponding image and audio data from the database 250 based on the accepted viewpoint, and performs rendering to generate a virtual viewpoint image.

The configuration of the image computing server 200 is not limited to this. For example, at least two of the front-end server 230, the database 250, and the back-end server 270 may be integrated. There may also be a plurality of at least one of the front-end server 230, the database 250, and the back-end server 270. Devices other than those above may be included at any position in the image computing server 200. Furthermore, the end user terminal 190 or the virtual camera operation UI 330 may have at least some of the functions of the image computing server 200.

The rendered virtual viewpoint image is transmitted from the back-end server 270 to the end user terminal 190, so that the user operating the end user terminal 190 can view images and listen to sound according to the designated viewpoint. That is, the back-end server 270 generates virtual viewpoint content based on the images captured by the plurality of cameras 112 (multi-viewpoint images) and viewpoint information. More specifically, the back-end server 270 generates virtual viewpoint content based on, for example, image data of predetermined areas cut out by the plurality of camera adapters 120 from the images captured by the plurality of cameras 112, and on the viewpoint designated by a user operation. The back-end server 270 then provides the generated virtual viewpoint content to the end user terminal 190. The cutout of the predetermined area by the camera adapter 120 is described in detail later.

In this embodiment, virtual viewpoint content is content that includes a virtual viewpoint image, that is, the image that would be obtained if the subject were photographed from a virtual viewpoint. In other words, a virtual viewpoint image represents the appearance from a designated viewpoint. The virtual viewpoint may be designated by the user or designated automatically based on, for example, the result of image analysis. Virtual viewpoint images therefore include arbitrary viewpoint images (free viewpoint images) corresponding to viewpoints freely designated by the user, as well as images corresponding to a viewpoint the user selects from a plurality of candidates and images corresponding to a viewpoint designated automatically by the apparatus. This embodiment mainly describes an example in which the virtual viewpoint content includes audio data, but audio data need not necessarily be included.

The virtual camera operation UI 330 accesses the database 250 via the back-end server 270. The back-end server 270 performs the processing common to image generation, while the virtual camera operation UI 330 handles the parts of the application that differ depending on the operation UI.

As described above, in the image processing system 100, the back-end server 270 generates a virtual viewpoint image based on the images captured by the plurality of cameras 112, which photograph the subject from a plurality of directions. The image processing system 100 in this embodiment is not limited to the physical configuration described above and may be configured logically.

<Explanation of Functional Block Diagrams>
Next, the functional blocks of each node (camera adapter 120, front-end server 230, database 250, back-end server 270, virtual camera operation UI 330, and end user terminal 190) in the image processing system 100 will be described.

FIG. 2 is a block diagram for explaining the functional configuration of the camera adapter 120. The camera adapter 120 includes a network adapter 6110, a transmission unit 6120, an image processing unit 6130, and an external device control unit 6140. The network adapter 6110 includes a data transmission/reception unit 6111 and a time control unit 6112.

The data transmission/reception unit 6111 communicates with the other camera adapters 120, the front-end server 230, the time server 290, and the control station 310 via the daisy chain 170, the network 291, and the network 310a. For example, the data transmission/reception unit 6111 outputs the foreground image and background image, separated by the foreground/background separation unit 6131 from the image captured by the camera 112, to another camera adapter 120. The destination is the next camera adapter 120, among the camera adapters 120 of the image processing system 100, in an order predetermined by the processing of the data routing processing unit 6122. Because each camera adapter 120 outputs a foreground image and a background image, a virtual viewpoint image can be generated based on foreground and background images captured from a plurality of viewpoints. Some camera adapters 120 may output only the foreground image separated from the captured image and not the background image.

The time control unit 6112 conforms to, for example, the Ordinary Clock of the IEEE 1588 standard, and has a function of storing time stamps of data exchanged with the time server 290 and a function of synchronizing time with the time server 290. The invention is not limited to IEEE 1588; time synchronization with the time server may be achieved by another Ethernet AVB standard or a proprietary protocol.

The transmission unit 6120 includes a data compression/decompression unit 6121, a data routing processing unit 6122, a time synchronization control unit 6123, an image/audio transmission processing unit 6124, and a data routing information holding unit 6125.

The data compression/decompression unit 6121 has a function of compressing data transmitted and received via the data transmission/reception unit 6111 by applying a predetermined compression method, compression ratio, and frame rate, and a function of decompressing compressed data.

The data routing processing unit 6122 uses data held by the data routing information holding unit 6125, described later, to determine the routing destination of the data received by the data transmission/reception unit 6111 and of the data processed by the image processing unit 6130, and transmits the data to the determined destination. The routing destination is preferably the camera adapter 120 corresponding to a camera 112 focused on the same gazing point, because the image frames of such cameras 112 are highly correlated, which benefits image processing. The decisions made by the data routing processing units 6122 of the camera adapters 120 determine the order in which the camera adapters 120 of the image processing system 100 relay the foreground and background images.

The time synchronization control unit 6123 conforms to PTP (Precision Time Protocol) of the IEEE 1588 standard and has a function of performing processing related to time synchronization with the time server 290. Time synchronization is not limited to PTP and may use another similar protocol.
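The core calculation behind PTP synchronization can be sketched with the four timestamps of a delay request-response exchange: t1 (master sends Sync), t2 (slave receives it), t3 (slave sends Delay_Req), and t4 (master receives it). The sketch assumes a symmetric network path, as PTP's basic calculation does; servo filtering and message handling are omitted, and the function name is hypothetical.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> Tuple[float, float]:
    """Two-way time transfer as in IEEE 1588, assuming a symmetric path.

    t1, t4 are read from the master clock; t2, t3 from the slave clock.
    Returns (offset of the slave clock relative to the master, one-way delay).
    """
    forward = t2 - t1   # master -> slave: delay + offset
    backward = t4 - t3  # slave -> master: delay - offset
    offset = (forward - backward) / 2
    delay = (forward + backward) / 2
    return offset, delay
```

For example, with timestamps in microseconds, a slave running 50 us ahead over a 100 us link yields `ptp_offset_and_delay(1000, 1150, 2000, 2050) == (50.0, 100.0)`; subtracting the offset from the slave's clock aligns it with the time server.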

The image/audio transmission processing unit 6124 has a function of creating messages for transferring image data or audio data to another camera adapter 120 or the front-end server 230 via the data transmission/reception unit 6111. A message includes the image data or audio data and meta information for that data. In this embodiment, the meta information includes the time code or sequence number at which the image was captured or the audio was sampled, the data type, and an identifier of the individual camera 112 or microphone 111. The image data or audio data to be transmitted may be compressed by the data compression/decompression unit 6121. The image/audio transmission processing unit 6124 also receives messages from other camera adapters 120 via the data transmission/reception unit 6111 and, according to the data type included in each message, restores the data, which was fragmented into packets of the size specified by the transmission protocol, into image data or audio data. If the restored data is compressed, the data compression/decompression unit 6121 decompresses it.
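Fragmenting a message into protocol-sized packets and restoring it can be sketched as below. The fragment header layout (time code, data type, device identifier, index, total count) and the `max_payload` parameter are assumptions for illustration; the actual message format is not specified in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    timecode: str    # capture time code (a sequence number would also work)
    data_type: str   # e.g. "image" or "audio"
    device_id: str   # identifier of camera 112 or microphone 111
    index: int       # position of this fragment
    total: int       # number of fragments in the message
    chunk: bytes


def fragment_message(data: bytes, timecode: str, data_type: str,
                     device_id: str, max_payload: int) -> List[Fragment]:
    """Split data into chunks of at most max_payload bytes, each tagged with meta info."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)] or [b""]
    return [Fragment(timecode, data_type, device_id, i, len(chunks), c)
            for i, c in enumerate(chunks)]


def restore_message(fragments: List[Fragment]) -> bytes:
    """Reorder fragments by index and join them; reject incomplete sets."""
    if not fragments:
        raise ValueError("no fragments")
    ordered = sorted(fragments, key=lambda f: f.index)
    if [f.index for f in ordered] != list(range(ordered[0].total)):
        raise ValueError("missing or duplicate fragments")
    return b"".join(f.chunk for f in ordered)
```

Because every fragment carries the meta information, the receiver can route and reorder fragments even when they arrive out of order.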

The data routing information holding unit 6125 has a function of holding address information used to determine the destination of data transmitted and received by the data transmission/reception unit 6111.

The external device control unit 6140 has a function of controlling devices connected to the camera adapter 120, and includes a camera control unit 6141, a microphone control unit 6142, a pan head control unit 6143, and a sensor control unit 6144.

The camera control unit 6141 is connected to the camera 112 and has functions such as controlling the camera 112, acquiring captured images, providing a synchronization signal, and setting the time. Control of the camera 112 includes, for example, setting and referring to shooting parameters (number of pixels, color depth, frame rate, white balance, and the like), acquiring the state of the camera 112 (shooting, stopped, synchronizing, error, and the like), starting and stopping shooting, and adjusting focus. The synchronization signal is provided by supplying the camera 112 with the shooting timing (a control clock) based on the time that the time synchronization control unit 6123 has synchronized with the time server 290. The time is set by supplying that synchronized time as a time code conforming to, for example, the SMPTE 12M format. As a result, the supplied time code is attached to the image data received from the camera 112.
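Converting a frame count into an SMPTE 12M-style HH:MM:SS:FF time code can be sketched as follows. The sketch assumes an integer frame rate such as 30 fps, non-drop-frame counting, and a time code that wraps at 24 hours; drop-frame time code for 29.97 fps is not handled. The function name is hypothetical.

```python
def frames_to_timecode(frame_count: int, fps: int) -> str:
    """Non-drop-frame HH:MM:SS:FF time code for an integer frame rate."""
    if fps <= 0 or frame_count < 0:
        raise ValueError("fps must be positive and frame_count non-negative")
    frames = frame_count % fps              # frame number within the second
    total_seconds = frame_count // fps
    hours = (total_seconds // 3600) % 24    # time code wraps at 24 hours
    minutes = (total_seconds // 60) % 60
    seconds = total_seconds % 60
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"
```

At 30 fps, frame 109835 (3661 seconds plus 5 frames) maps to "01:01:01:05".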

The sensor control unit 6144 is connected to the external sensor 114 and has a function of acquiring the sensor information sensed by the external sensor 114. For example, when a gyro sensor is used as the external sensor 114, information representing vibration (hereinafter referred to as "vibration information") can be acquired. Using the vibration information acquired by the sensor control unit 6144, the image processing unit 6130 cuts out a predetermined area from the captured image. The sensor of the sensor system 110 is not limited to the external sensor 114; the same effect can be obtained with a sensor built into the camera adapter 120. The vibration information acquisition unit is, for example, the sensor control unit 6144.
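One common way to turn gyro output into an image shift usable for such a cutout is sketched below. It assumes the vibration information consists of angular-rate samples (rad/s) about the yaw and pitch axes, ignores translational vibration, and uses the pinhole relation shift = f * tan(theta) with the focal length expressed in pixels. The sign of the resulting shift depends on how the sensor is mounted; the function name and parameters are hypothetical.

```python
import math
from typing import Iterable, Tuple


def gyro_to_pixel_shift(rates: Iterable[Tuple[float, float]], dt: float,
                        focal_px: float) -> Tuple[float, float]:
    """Integrate gyro angular rates into an approximate image shift in pixels.

    rates: samples of (yaw_rate, pitch_rate) in rad/s
    dt: sampling interval in seconds
    focal_px: lens focal length expressed in pixels
    """
    yaw = pitch = 0.0
    for yaw_rate, pitch_rate in rates:
        yaw += yaw_rate * dt      # rectangle-rule integration to an angle
        pitch += pitch_rate * dt
    # A rotation by theta moves the image center by about focal_px * tan(theta).
    return focal_px * math.tan(yaw), focal_px * math.tan(pitch)
```

With a 2000-pixel focal length, a yaw of 0.1 mrad already shifts the image by about 0.2 pixels, so large stadium vibrations can quickly consume a fixed cutout margin.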

The vibration information is data obtained when the sensor control unit 6144 converts the physical vibration detected by the external sensor 114 from analog to digital form. The image processing unit 6130 then cuts out the image based on the vibration information (described in detail with reference to FIG. 3) and aligns it with the image of the adjacently installed camera 112. Alignment is performed, for example, by projectively transforming the adjacent camera's image using the camera installation information, extracting feature points from the luminance distribution of each image, and matching those feature points. The order in which alignment and the vibration-based cut-out are executed may be chosen as appropriate for the memory of the apparatus, the transmission speed, and so on.

The external sensors 114 capable of detecting vibration need not be installed on every camera; they may be thinned out at predetermined intervals. That is, there may be fewer external sensors 114 than cameras. In that case, the vibration information for a camera without an external sensor may be generated by interpolation, based on the mounting positions of the external sensors 114 and of the cameras, together with vibration data measured in advance for the vibration sources and for the structure on which the sensor systems are mounted.
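
The text does not specify how the missing vibration information is interpolated. As a minimal sketch, assume each sensor and camera can be placed on a single position axis (for example, distance along a stand) and that amplitude varies linearly between neighbouring sensors. A camera without a sensor then takes a linearly interpolated value. This ignores the pre-measured structural vibration data that the text also mentions, and the function and its inputs are hypothetical.

```python
from bisect import bisect_left


def interpolate_vibration(sensors: list[tuple[float, float]], camera_pos: float) -> float:
    """Estimate vibration amplitude at camera_pos from (position, amplitude) readings.

    Linear interpolation between the two neighbouring sensors; a camera outside the
    span of the sensors takes the value of the nearest end sensor.
    """
    if not sensors:
        raise ValueError("at least one sensor reading is required")
    sensors = sorted(sensors)
    positions = [p for p, _ in sensors]
    i = bisect_left(positions, camera_pos)  # first sensor at or beyond the camera
    if i == 0:
        return sensors[0][1]
    if i == len(sensors):
        return sensors[-1][1]
    (p0, a0), (p1, a1) = sensors[i - 1], sensors[i]  # p0 < camera_pos <= p1
    t = (camera_pos - p0) / (p1 - p0)
    return a0 + t * (a1 - a0)
```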

Next, the configuration of the image processing unit 6130 is described in detail with reference to FIG. 3, which is a block diagram of the image processing unit 6130. The image processing unit 6130 includes a foreground/background separation unit 6131, a 3D model information generation unit 6132, and a calibration control unit 6133.

The calibration control unit 6133 performs the cut-out and alignment processing described above on the input image. In the image processing system 100 of this embodiment, the shooting ranges of the cameras overlap, so the entire image captured by each camera does not always need to be acquired. The image processing unit 6130 therefore cuts out a predetermined area from each captured image and performs image processing on the cut-out image. So that the same region of real space is always cut out even when the camera vibrates, the calibration control unit 6133 determines the area to cut out based on the vibration information.

For example, based on the vibration information, the calibration control unit 6133 cuts out a region smaller than the original 8K frame from the image data of an 8K camera and aligns it with the image of the adjacently installed camera 112. This produces an electronically stabilized image and reduces the alignment processing load on the image computing server 200, which would otherwise scale with the number of cameras 112.

The cut-out processing based on vibration information is described with reference to FIGS. 4 and 5. As described above, the calibration control unit 6133 cuts out a predetermined region (resolution) from the captured image based on the vibration information. Specifically, it determines the cut-out direction and the shift from the origin based on the vibration amplitude in each direction and on the rotation direction contained in the vibration information.

FIG. 4 illustrates rotation processing based on the vibration information. The external sensor 114 outputs vibration information for two axes: the X axis (horizontal) and the Y axis (vertical). Based on this information, the calibration control unit 6133 rotates the captured image 1 to obtain the corrected image 2. When the external sensor 114 provides information for three axes (X, Y, and Z), correction can additionally be performed with the viewing (depth) direction in FIG. 4 taken as the Z axis. Any sensor, such as a gyro sensor or an acceleration sensor, may be used as the external sensor 114 as long as it can output vibration information for two or three axes.
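
The rotation correction of FIG. 4 can be sketched as an inverse-mapped, nearest-neighbour rotation about the image centre. This assumes the vibration information has already been converted into a roll angle in degrees (the text does not say how) and that the image is a list of rows. `rotate_image` is a hypothetical helper, not the apparatus's actual routine.

```python
import math


def rotate_image(image, angle_deg, fill=0):
    """Rotate a 2D image (list of rows) about its centre with nearest-neighbour sampling.

    Image coordinates are used (x to the right, y downward), so a positive angle
    turns the content clockwise on screen. Output pixels whose source position
    falls outside the input are set to `fill`.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: rotate the output position by -angle to find its source.
            dx, dy = x - cx, y - cy
            sx = round(c * dx + s * dy + cx)
            sy = round(-s * dx + c * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out
```

Assuming the sensor reports the camera's roll with the same sign convention, the corrected image would be `rotate_image(captured, -roll_deg)`; the sign has to be checked against the actual sensor mounting.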

FIG. 5 illustrates image cut-out processing based on vibration information. FIG. 5(a) shows the state with no vibration, in which the cut-out is performed on area 502 of the captured image 501. With no vibration, a cut-out area 502 of a predetermined resolution, that is, a predetermined size, is set at a predetermined position (for example, the center) of the captured image 501.

FIG. 5(b) shows a state with small vibration. It shows the cut-out area 503 used when there is no vibration and the cut-out area 504 shifted based on the vibration information. As shown, the resolution (size) of the area is unchanged, but the position of the cut-out area 504 is determined by the shift direction and shift amount derived from the vibration information.

FIG. 5(c) shows a state with relatively large vibration. It shows the cut-out area 503 used when there is no vibration and the cut-out area 506 shifted based on the vibration information. The hatched area 507 of the cut-out area 506 lies outside the captured image 501 and therefore has no image data. Such areas are padded so that the result has the predetermined resolution (an image of the predetermined size). The image cut out by the calibration control unit 6133 in this way is input to the foreground/background separation unit 6131.
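
The cut-out of FIG. 5, including the padding of an area such as 507, can be sketched as follows. The sketch assumes the vibration information has already been converted into a pixel shift (`shift_x`, `shift_y`) and that the window is moved by that shift to follow the same region of real space. Both the conversion and the helper name are assumptions.

```python
def shifted_crop(image, crop_w, crop_h, shift_x=0, shift_y=0, fill=0):
    """Cut a crop_w x crop_h window out of `image` (list of rows).

    The window starts centred in the frame (the no-vibration position of FIG. 5(a))
    and is moved by (shift_x, shift_y) pixels. Parts of the window that fall outside
    the captured frame are padded with `fill`, so the output size is always fixed.
    """
    h, w = len(image), len(image[0])
    x0 = (w - crop_w) // 2 + shift_x
    y0 = (h - crop_h) // 2 + shift_y
    out = []
    for y in range(y0, y0 + crop_h):
        row = []
        for x in range(x0, x0 + crop_w):
            inside = 0 <= x < w and 0 <= y < h
            row.append(image[y][x] if inside else fill)  # pad where there is no data
        out.append(row)
    return out
```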

Returning to FIG. 3, the foreground/background separation unit 6131 is described. It separates the cut-out image into a foreground image and a background image.

The foreground/background separation unit 6131 of the camera adapter 120 first extracts a foreground area from the image captured by the corresponding camera 112 (the cut-out image). Here, the foreground area is, for example, an area in which an object has been detected as a result of object detection on the captured image. The unit separates the captured image into a foreground image, which is the image within the extracted foreground area, and a background image, which is the image of the area outside the foreground area. The object is, for example, a person. It is not limited to this: the object may be a specific person (a player, manager, and/or referee, etc.), or an object with a predetermined image pattern, such as a ball. A moving body may also be detected as the object. Processing the foreground image, which contains important objects such as people, separately from the background image, which does not, improves the quality of the parts of the virtual viewpoint image generated by the image processing system 100 that correspond to those objects. In addition, because each of the camera adapters 120 separates its own foreground and background images, the load on the image processing system 100, which includes the plurality of cameras 112, can be distributed.

The foreground/background separation unit 6131 includes a foreground separation unit 5001, a background update unit 5003, a background cutout unit 5004, a vibration determination unit 5010, and a foreground determination unit 5011. The foreground separation unit 5001 performs object detection on the input image and extracts the foreground area, for example based on difference information obtained by comparing the input image with the background image 5002. It then connects the pixels of the foreground area to generate the foreground image.

The background update unit 5003 generates a new background image from the background image 5002 and the image on which the calibration control unit 6133 has performed alignment for the camera 112, and replaces the background image 5002 with the new background image.

The background cutout unit 5004 cuts out part of the background image 5002 for transmission. The vibration determination unit 5010 reads the vibration information output by the sensor control unit 6144, which processes the output values of the external sensor 114, and compares the amplitude of the vibration information with a predetermined threshold. The foreground determination unit 5011 determines whether the whole of the object represented by the foreground image separated by the foreground separation unit 5001 lies within the captured image, and sends the foreground image to the transmission unit 6120 according to the result.

The determination performed by the foreground determination unit 5011 is now described with reference to FIG. 6. FIG. 6 shows, for the captured image 601, the cut-out area 602 used when there is no vibration.

In FIG. 6(a), the rectangular area 603 is a cut-out area shifted from area 602 based on the vibration information. The area in which an object was detected within area 603 is shown as a white silhouette; the part of the cut-out image inside this silhouette is the foreground image 605. Area 610 of the cut-out area 603 lies outside the captured image 601 and has no image data.

In FIG. 6(b), the rectangular area 606 is a cut-out area shifted from area 602 based on the vibration information. The area in which an object was detected within area 606 is shown as a white silhouette. Area 611, outside the captured image 601, has no image data, and compared with the foreground image 605, the foreground image 607 has the legs of the silhouette cut off where they fall in area 611.

The foreground determination unit 5011 decides whether to transmit the foreground images 605 and 607 to the 3D model information generation unit 6132 according to whether each foreground image is usable for generating a 3D model in a subsequent step, that is, according to whether the whole object extracted as the foreground image fits within the captured image. For example, the unit estimates the overall size of the object from the detected part; if the estimated whole object fits within the captured image, it determines that the foreground image can be used (is valid) for generating the 3D model. The determination need not require the whole object: the foreground determination unit 5011 may instead determine that the object's foreground image is valid when at least a predetermined proportion of the object, such as 90% or 80%, fits within the captured image.

The determination is not limited to the above. For example, when an edge of the detected object overlaps an edge of the cut-out image (an edge of the captured image), the unit may determine that the estimated whole object does not fit within the captured image.
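
One way to express the two criteria above is with axis-aligned boxes. In this hypothetical sketch, `estimated` is the box of the whole object as estimated from its visible part (how that estimate is obtained is outside the sketch), and `frame` is the captured image. `min_ratio=1.0` corresponds to requiring the whole object, 0.9 or 0.8 to the relaxed variants, and `touches_frame_edge` is the alternative edge-overlap test. Using boxes rather than silhouettes is a simplification.

```python
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1) in half-open pixel coordinates


def area(box: Box) -> int:
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def inside_ratio(estimated: Box, frame: Box) -> float:
    """Fraction of the estimated object box that lies within the captured frame."""
    ix0, iy0 = max(estimated[0], frame[0]), max(estimated[1], frame[1])
    ix1, iy1 = min(estimated[2], frame[2]), min(estimated[3], frame[3])
    total = area(estimated)
    return area((ix0, iy0, ix1, iy1)) / total if total else 0.0


def foreground_is_valid(estimated: Box, frame: Box, min_ratio: float = 1.0) -> bool:
    """Valid for 3D model generation if at least min_ratio of the object is visible."""
    return inside_ratio(estimated, frame) >= min_ratio


def touches_frame_edge(detected: Box, frame: Box) -> bool:
    """Alternative test: an object whose box reaches a frame edge may be truncated."""
    return (detected[0] <= frame[0] or detected[1] <= frame[1]
            or detected[2] >= frame[2] or detected[3] >= frame[3])
```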

In FIG. 6(a), the lower end 604 of the foreground image 605 lies within the captured image 601, and the foreground image 605 represents the entire human-shaped object. In this case, the foreground determination unit 5011 determines that the foreground image 605 is valid for generating a 3D model in the subsequent step.

In FIG. 6(b), the size of the whole object is estimated, as a human-shaped silhouette, from the silhouette within area 606 excluding area 611. In this case, because the foreground image 607 does not represent the whole object, the foreground determination unit 5011 determines that it is not valid for generating a 3D model.

Although FIG. 6 illustrates a cut-out area shifted vertically, the foreground determination unit 5011 likewise determines whether the foreground image is valid when the cut-out area is shifted horizontally.

According to the determination result of the foreground determination unit 5011, the foreground image separated by the foreground separation unit 5001 is transmitted to the 3D model information generation unit 6132 via the transmission unit 6120.

Returning to FIG. 3, the 3D model information generation unit 6132 is described. It includes a 3D model processing unit 5005, an other-camera foreground reception unit 5006, and a camera parameter reception unit 5007.

The other-camera foreground reception unit 5006 receives the foreground images separated by the other camera adapters 120. The camera parameter reception unit 5007 receives the camera-specific internal parameters (focal length, image center, lens distortion parameters, etc.) and the external parameters representing the camera's position and orientation (rotation matrix, position vector, etc.).

The 3D model processing unit 5005 sequentially generates image information related to the 3D model, for example on the principle of a stereo camera, using the foreground image separated by the foreground separation unit 5001 and the foreground images of the other cameras 112 received via the transmission unit 6120.

Next, the processing flow of the image processing unit 6130 is described with reference to FIGS. 7 to 10. FIG. 7 is a flowchart of the processing performed by the calibration control unit 6133 and the foreground/background separation unit 6131 in the image processing unit 6130. In step S101, the vibration determination unit 5010 reads the vibration information from the sensor control unit 6144, which processes the output values of the external sensor 114.

In step S102, the vibration determination unit 5010 compares the amplitude of the vibration information with a predetermined threshold. The threshold is measured and set in advance at the location where the camera is mounted, as the vibration amplitude beyond which subsequent image processing becomes impossible. If the amplitude exceeds the threshold (NO in step S102), the process proceeds to step S103. If it is equal to or below the threshold (YES in step S102), the process proceeds to step S104.

In step S103, the vibration determination unit 5010 sets, in the meta information of the captured image, error information indicating that the vibration amplitude has exceeded the threshold. The meta information is sent to the transmission unit 6120. This allows the error state to be output promptly, without performing image processing such as alignment and foreground/background separation. The process then ends.

In step S104, the vibration determination unit 5010 reads, via the transmission unit 6120, the meta information of the captured images of the adjacent cameras in turn, starting with the nearest adjacent camera within a predetermined range and moving outward to the next adjacent camera, and checks each for errors. It then determines whether the camera that transmitted error-free meta information is within the predetermined range; that is, it determines whether the nearest error-free camera lies within the predetermined range. Here, the predetermined range is the range of cameras whose images can be used to carry out this camera's image processing, and it is set in advance according to the cameras' installation positions, installation angles, and so on. If an error-free camera within the predetermined range can be read (YES in step S104), the process proceeds to step S106. If the search goes beyond the predetermined range (NO in step S104), the process proceeds to step S105.
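
The outward search of step S104 can be sketched as below, under stated simplifications: cameras are numbered around a ring, the meta information is reduced to an error flag per camera, and the predetermined range is modelled as a maximum index distance rather than the installation positions and angles the text describes. All names are hypothetical.

```python
def nearest_error_free_camera(self_id, has_error, max_distance, num_cameras):
    """Search outward from self_id for the nearest neighbour without an error flag.

    Cameras are assumed to be numbered 0..num_cameras-1 around a ring; the
    predetermined range is modelled as at most max_distance positions away.
    Returns a camera id (YES in S104) or None (NO in S104, go to S105).
    """
    for d in range(1, max_distance + 1):
        for cam in ((self_id - d) % num_cameras, (self_id + d) % num_cameras):
            # Missing meta information is treated as an error.
            if cam != self_id and not has_error.get(cam, True):
                return cam
    return None
```

At equal distance the lower-numbered neighbour is tried first; this tie-break, like treating missing meta information as an error, is an arbitrary choice of the sketch.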

In step S105, the vibration determination unit 5010 sets, in the meta information of the captured image, error information indicating that the adjacent camera to be read is outside the predetermined range. The meta information is sent to the transmission unit 6120, and the process ends. In step S106, the calibration control unit 6133 reads the image of the error-free camera.

In step S107, the calibration control unit 6133 performs cut-out processing on the captured image. That is, based on vibration information from a sensor built into the camera, such as an acceleration sensor or gyro sensor, it shifts the image position and rotates the input image, and cuts out a predetermined area. This suppresses blur between frame images, although other stabilization methods may be used instead: for example, image processing that estimates and corrects the image motion by comparing several temporally consecutive frames, or in-camera methods such as lens shift and sensor shift. The calibration control unit 6133 also performs color correction, such as adding an offset value to the pixel values of the input image, based on parameters received from the front-end server 230.
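
The offset-based color correction at the end of step S107 amounts to per-channel addition with clamping. A minimal sketch, assuming 8-bit RGB pixels and an `offset` triple that stands in for the parameters received from the front-end server 230 (the real parameter format is not given):

```python
def apply_color_offset(image, offset):
    """Add a per-channel offset to every RGB pixel, clamping to the 8-bit range 0-255."""
    return [[tuple(min(255, max(0, v + o)) for v, o in zip(px, offset)) for px in row]
            for row in image]
```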

In step S108, the calibration control unit 6133 aligns the cut-out image with the image read in step S106. For example, it projectively transforms the adjacent camera's image using the camera installation information, extracts feature points from the luminance distribution of each image, and matches the feature points with each other.
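
Steps S108 and S109 can be sketched as projecting the neighbour's feature points through a homography and counting nearest-neighbour matches. The 3x3 matrix `h` stands in for the projective transform derived from the installation information. Feature extraction from the luminance distribution is not shown, and `tol` and `min_matches` are hypothetical parameters of the success test.

```python
import math


def project(h, x, y):
    """Apply a 3x3 homography (row-major nested lists) to the point (x, y)."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def align(h, neighbour_pts, own_pts, tol=2.0, min_matches=4):
    """Match projected neighbour feature points to this camera's feature points.

    Each projected point is paired with its nearest own point if that point lies
    within tol pixels. Returns (success, matches); success mirrors the S109 decision.
    """
    matches = []
    for i, (x, y) in enumerate(neighbour_pts):
        px, py = project(h, x, y)
        j, d = min(((j, math.hypot(px - qx, py - qy)) for j, (qx, qy) in enumerate(own_pts)),
                   key=lambda t: t[1], default=(None, math.inf))
        if d <= tol:
            matches.append((i, j))
    return len(matches) >= min_matches, matches
```

The matching is greedy and not one-to-one; a practical implementation would usually add a consistency check and estimate the transform robustly (for example with RANSAC).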

In step S109, the calibration control unit 6133 determines whether the alignment succeeded. For example, if feature points cannot be matched between the images because the captured images differ greatly, as when a large object such as a flag is in the shooting range, the alignment is judged to have failed. If the alignment failed (NO in step S109), the process proceeds to step S110. If it succeeded (YES in step S109), the process proceeds to step S111.

In step S110, the calibration control unit 6133 sets, in the meta information of the captured image, error information indicating that the alignment failed. The meta information is sent to the transmission unit 6120, and the process ends.

In step S111, the foreground separation unit 5001 separates the image input from the calibration control unit 6133 into a foreground image and a background image.

In step S112, the foreground determination unit 5011 determines whether the entire cut-out area, shifted based on the vibration information, is contained in the captured image, for example by comparing the coordinates of the rectangle of the captured image with those of the cut-out area. If the cut-out area is not entirely within the captured image (NO in step S112), the process proceeds to step S113. If it is (YES in step S112), the process proceeds to step S115.

In step S113, the foreground determination unit 5011 determines whether the whole object of the foreground image output by the foreground separation unit 5001 in step S111 is contained in the captured image. For example, it compares the coordinates of the rectangle of the captured image with those of the foreground image to check whether the whole object extracted as the foreground image, such as a person or a ball, is contained in the captured image. If it is not (NO in step S113), the process proceeds to step S114. If it is (YES in step S113), the process proceeds to step S115.

In step S114, the foreground determination unit 5011 sets, in the meta information of the captured image, error information indicating that the whole object of the foreground image is not contained in the captured image. In this case, the foreground image is not output.

In step S115, the foreground determination unit 5011 sends the foreground image output by the foreground separation unit 5001 to the transmission unit 6120. The error information set in the meta information in steps S103, S105, S110, and S114 is used by the other camera adapters 120 in their own step S104 determinations. The cut-out processing of step S107 may instead be performed after the alignment has succeeded.
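
The branching of FIG. 7 can be condensed into a single function. This is a sketch only: each earlier step is represented by a precomputed value, the error strings are hypothetical, and the actual separation, cut-out, and alignment work is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    error: Optional[str]   # hypothetical error code written into the meta information
    send_foreground: bool  # True only when step S115 is reached


def process_frame(amplitude, threshold, neighbour_in_range, alignment_ok,
                  window_inside_frame, object_inside_frame):
    """Condensed branching of FIG. 7 for one captured frame."""
    if amplitude > threshold:                          # S102 NO -> S103
        return StepResult("vibration_over_threshold", False)
    if not neighbour_in_range:                         # S104 NO -> S105
        return StepResult("no_neighbour_in_range", False)
    # S106-S108: read the neighbour's image, cut out, and align (not shown).
    if not alignment_ok:                               # S109 NO -> S110
        return StepResult("alignment_failed", False)
    # S111: foreground/background separation (not shown).
    # S112: only a window shifted partly outside the frame needs the S113 check.
    if not window_inside_frame and not object_inside_frame:
        return StepResult("object_truncated", False)   # S114: foreground not output
    return StepResult(None, True)                      # S115: send the foreground
```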

Compared with deciding whether to transmit based on a preset vibration magnitude, the processing described above can be expected to have the following effect. Because the decision to transmit is made only after checking, from the real-time scene image, whether the whole object of the foreground image lies within the captured image, the effect is substantially equivalent to enlarging the area available for vibration correction. In other words, in this embodiment, whether a camera's captured image should be used to generate the virtual viewpoint image is judged using information on the position of the object within that captured image. This makes it more likely that a high-quality virtual viewpoint image can be generated than a judgment based solely on the camera's vibration level would.

The foreground separation unit 5001 of this embodiment extracts the foreground area from the captured image based on the difference in pixel value between each pixel of the input image and the pixel at the corresponding position in the background image 5002. If an image 5102 showing a person, as in FIG. 8(B), is input against the background image 5002 shown in FIG. 8(A), the difference is large at each pixel in the area where the person appears. A pixel whose difference exceeds a threshold L is set as foreground. The foreground area is then extracted by connecting the foreground pixels, for example with a known region growing method. Various other foreground detection methods exist, such as those using feature amounts or machine learning. The foreground separation unit 5001 outputs the image of the foreground area (the foreground image) to the transmission unit 6120.
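
The difference-and-connect procedure can be sketched on grayscale images stored as lists of rows. Here `threshold_l` corresponds to the threshold L. Connected-component grouping by breadth-first search over the thresholded mask is used as a simple stand-in for the region growing method; the embodiment does not specify which variant it uses.

```python
from collections import deque


def foreground_regions(image, background, threshold_l):
    """Return connected foreground regions as lists of (x, y) pixels.

    A pixel is foreground when |image - background| > threshold_l; foreground
    pixels are then joined into 4-connected regions by breadth-first search.
    """
    h, w = len(image), len(image[0])
    mask = [[abs(image[y][x] - background[y][x]) > threshold_l for x in range(w)]
            for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            region, queue = [], deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            regions.append(region)
    return regions
```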

Next, the background update processing (background image generation) performed by the background update unit 5003 is described. The background update unit 5003 updates the background image 5002 using the input image and the background image stored in memory. The update is performed pixel by pixel: for each pixel of the input image, the unit computes the difference in pixel value from the pixel at the corresponding position in the background image. A pixel whose difference is smaller than a predetermined threshold K is judged to be background, and a pixel whose difference is larger than K is judged to show some object other than the background.

Next, for each pixel judged to be background, the background update unit 5003 updates the background image with a value obtained by blending the pixel value of the input image and that of the background image at a fixed ratio. The background image 5002 is generated in this way. Various other background update methods are also conceivable.
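
The per-pixel update described in the two paragraphs above is a running average restricted to background pixels. In this sketch `threshold_k` plays the role of K and `alpha` is a hypothetical mixing ratio. A difference exactly equal to K, which the text leaves open, is treated here as an object.

```python
def update_background(image, background, threshold_k, alpha=0.05):
    """Blend background pixels toward the input; keep pixels judged to show an object.

    A pixel with |input - background| < threshold_k is treated as background and set
    to alpha * input + (1 - alpha) * background; any other pixel keeps its old value.
    """
    return [[(alpha * p + (1 - alpha) * b) if abs(p - b) < threshold_k else b
             for p, b in zip(img_row, bg_row)]
            for img_row, bg_row in zip(image, background)]
```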

After the processing by the background update unit 5003, the background cutout unit 5004 reads out part of the background image 5002 and sends it to the transmission unit 6120. When a game such as soccer is shot in a stadium or similar venue with multiple cameras 112 arranged so that the entire field is covered without blind spots, most of the background information overlaps between the cameras 112. Because the background information is very large, the amount of data transmitted can be reduced, in view of transmission bandwidth constraints, by removing the overlapping portions before transmission. The processing of the background cutout unit 5004 will now be described with reference to FIG. 9.

FIG. 9 is a flowchart showing the processing of the background cutout unit 5004 in the image processing unit 6130. In step S401, the background cutout unit 5004 sets the central portion of the background image 5002 as the partial region, for example, as the partial region 3401 shown in FIG. 8C. That is, the partial region 3401 is the region of the background image that the own camera 112 is responsible for transmitting; the other cameras 112 are responsible for transmitting the remaining regions.

In step S402, the background cutout unit 5004 cuts out the set partial region 3401 from the background image.

In step S403, the background cutout unit 5004 outputs the partial background image to the transmission unit 6120 and ends the processing.
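Steps S401 to S403 amount to a fixed crop. The sketch below assumes the partial region is a centred rectangle whose size is given by a hypothetical `keep_fraction` parameter; in the embodiment the region is set from predetermined parameter values. It also returns the crop's offset, which a receiver would need to place the piece back into a full background texture (an assumption about how the pieces are reassembled).

```python
def cut_out_center(background, keep_fraction):
    """Return (top, left, crop) for the centred part of a 2-D image.

    keep_fraction: hypothetical parameter, fraction of height and width kept
    (0 < keep_fraction <= 1).
    """
    h, w = len(background), len(background[0])
    ch = max(1, round(h * keep_fraction))
    cw = max(1, round(w * keep_fraction))
    # Centre the rectangle; integer division keeps it inside the image.
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = [row[left:left + cw] for row in background[top:top + ch]]
    return top, left, crop
```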

The output background images are collected by the image computing server 200 and used as textures for the background model. The position at which each camera adapter 120 cuts out the background image 5002 is set according to predetermined parameter values so that the texture information for the background model does not become insufficient. Normally, to keep the amount of transmitted data small, the cut-out region is set to the necessary minimum. This reduces the transmission volume of the very large background information and makes it possible to build a system that can also handle higher resolutions.

After the processing of the foreground/background separation unit 6131, the 3D model information generation unit 6132 generates 3D model information using the foreground images. When the camera adapter receives a foreground image from the adjacent camera, that foreground image is input to the other-camera foreground receiving unit 5006 via the transmission unit 6120.

Next, the processing executed by the 3D model processing unit 5005 when a foreground image is input will be described with reference to FIG. 10. If the image computing server 200 collected the captured images of all the cameras 112 before starting image processing to generate a virtual viewpoint image, the amount of computation could be large and image generation could take a long time. In particular, the computation for 3D model generation may become significantly large. The processing shown in FIG. 10 therefore reduces the processing load on the image computing server 200 by generating 3D model information successively while data is being transmitted along the daisy chain that connects the camera adapters 120.

FIG. 10 is a flowchart showing the processing of the 3D model information generation unit 6132 in the image processing unit 6130. In step S501, the other-camera foreground receiving unit 5006 receives a foreground image captured by another camera 112.

In step S502, the 3D model processing unit 5005 checks whether the camera 112 that captured the received foreground image belongs to the same gazing-point group as its own camera 112 and is an adjacent camera. If both conditions hold (YES in step S502), the processing proceeds to step S503. Otherwise (NO in step S502), the 3D model processing unit 5005 determines that there is no correlation with the foreground image of the other camera 112, and the processing ends.
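The step S502 check can be expressed as a small predicate. The sketch assumes each camera is described by a hypothetical `CameraInfo` record holding its gazing-point group and its index along the daisy chain, and treats cameras with consecutive indices as adjacent; the source does not define adjacency this precisely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraInfo:
    camera_id: int   # position of the camera along the daisy chain (assumption)
    gaze_group: int  # identifier of the gazing-point group the camera belongs to


def is_correlated(own: CameraInfo, other: CameraInfo) -> bool:
    """Step S502: use the other camera's foreground only if it shares our
    gazing-point group and is a neighbour (here: a consecutive index)."""
    return (own.gaze_group == other.gaze_group
            and abs(own.camera_id - other.camera_id) == 1)
```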

In step S503, the 3D model processing unit 5005 derives depth information for the foreground image. Specifically, it first associates the foreground image received from the foreground separation unit 5001 with the foreground image of the other camera 112. It then derives the depth of each pixel in each foreground image based on the coordinates of the associated pixels and the camera parameters. A block matching method, for example, is used to associate the images; since block matching is well known, a detailed description is omitted. Various other association methods may also be used, such as methods that improve performance by combining feature point detection, feature value calculation, and matching processing.
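As an illustration of block matching, the sketch below assumes a rectified stereo pair (corresponding pixels lie on the same row), grayscale 2-D lists, and a sum-of-absolute-differences (SAD) cost. Depth is then recovered with the standard pinhole stereo relation Z = f * B / d, where f is the focal length in pixels, B the baseline, and d the disparity. This relation holds only for the rectified case and is an assumption about how the camera parameters are used; the function names are hypothetical.

```python
def block_match_disparity(left, right, y, x, half, max_disp):
    """Return the disparity d in [0, max_disp] minimising the SAD between the
    (2*half+1)^2 block around (y, x) in `left` and the block around
    (y, x - d) in `right`, or None if no block fits inside the images."""
    h, w = len(left), len(left[0])
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        cost, valid = 0, True
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                yy, xl, xr = y + dy, x + dx, x + dx - d
                if not (0 <= yy < h and 0 <= xl < w and 0 <= xr < w):
                    valid = False
                    break
                cost += abs(left[yy][xl] - right[yy][xr])
            if not valid:
                break
        if valid and cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(d, focal_px, baseline):
    """Rectified pinhole stereo: Z = f * B / d (infinite for zero disparity)."""
    return float("inf") if d == 0 else focal_px * baseline / d
```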

In step S504, the 3D model processing unit 5005 derives 3D model information for the foreground image. Specifically, for each pixel of the foreground image, it derives the world coordinates of the pixel based on the depth information derived in step S503 and the camera parameters stored in the camera parameter receiving unit 5007. The world coordinates and the pixel value are then paired to form one point of a 3D model represented as a point cloud. Through the above processing, partial point cloud information of the 3D model is obtained both from the foreground image received from the foreground separation unit 5001 and from the foreground image of the other camera 112.
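Step S504 back-projects each foreground pixel using its depth and the camera parameters. The sketch below assumes a pinhole model with intrinsics fx, fy, cx, cy, extrinsics defined by X_cam = R X_world + t, and depth measured along the camera's optical axis. The camera parameters held by unit 5007 may follow a different convention, and the function names are hypothetical.

```python
def pixel_to_world(u, v, depth, fx, fy, cx, cy, R, t):
    """Back-project pixel (u, v) with the given depth to world coordinates.

    Assumes X_cam = R @ X_world + t, so X_world = R^T @ (X_cam - t).
    R: 3x3 rotation as nested lists; t: 3-element translation.
    """
    # Point in the camera frame from the pinhole model.
    xc = (u - cx) / fx * depth
    yc = (v - cy) / fy * depth
    zc = depth
    d = (xc - t[0], yc - t[1], zc - t[2])
    # Multiply by R transposed: component c = sum_r R[r][c] * d[r].
    return tuple(sum(R[r][c] * d[r] for r in range(3)) for c in range(3))


def foreground_to_points(pixels, fx, fy, cx, cy, R, t):
    """pixels: iterable of (u, v, depth, value).
    Returns a point cloud as a list of (world_xyz, pixel_value) pairs."""
    return [(pixel_to_world(u, v, z, fx, fy, cx, cy, R, t), value)
            for u, v, z, value in pixels]
```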

In step S505, the 3D model processing unit 5005 adds a camera number and a frame number to the obtained 3D model information as meta information (the meta information may instead be, for example, a time code or an absolute time) and outputs the result to the transmission unit 6120.

In this way, even when the camera adapters 120 are connected in a daisy chain and multiple gazing points are set, image processing can be performed according to the correlation between the cameras 112 while data is transmitted along the daisy chain, and 3D model information can be generated successively. As a result, processing is sped up.

In this embodiment, each of the processes described above is executed by hardware such as an FPGA or ASIC mounted on the camera adapter 120, but it may instead be executed in software using, for example, a CPU, GPU, or DSP. Also, although the 3D model information is generated in the camera adapter 120 in this embodiment, it may instead be generated by the image computing server 200, which collects all the foreground images from the cameras 112.

<Functional blocks of front-end server 230>
FIG. 11 is a block diagram illustrating the functional configuration of the front-end server 230. The front-end server 230 includes a control unit 2110, a data input control unit 2120, a data synchronization unit 2130, a CAD data storage unit 2135, a calibration unit 2140, and an image processing unit 2150. The front-end server 230 further includes a 3D model combining unit 2160, an image combining unit 2170, a shooting data file generation unit 2180, a non-shooting data file generation unit 2185, and a DB access control unit 2190.

The control unit 2110 consists of hardware such as a CPU, DRAM, a storage medium such as an HDD or NAND memory that stores program data and various other data, and Ethernet (registered trademark). It controls each functional block of the front-end server 230 and the front-end server 230 system as a whole. It also performs mode control to switch between operation modes such as the calibration operation, the preparatory operation before shooting, and the operation during shooting. For example, it receives control instructions from the control station 310 through the network 310b (Ethernet (registered trademark)) and switches modes, inputs and outputs data, and so on. Likewise, it acquires stadium CAD data (stadium shape data) from the control station 310 through the network 310b and sends it to the CAD data storage unit 2135 and the shooting data file generation unit 2180. The stadium CAD data (stadium shape data) in this embodiment is three-dimensional data indicating the shape of the stadium; it may be a mesh model or any other data representing a three-dimensional shape and is not limited to the CAD format.

The data input control unit 2120 is network-connected to the camera adapters 120 via communication paths such as the networks 180a and 180b (Ethernet (registered trademark)) and the switching hub 180. Through the networks 180a and 180b and the switching hub 180, it acquires foreground images, background images, 3D models of subjects, audio data, and camera calibration captured image data from the camera adapters 120. It sends the acquired foreground and background images to the data synchronization unit 2130 and the camera calibration captured image data to the calibration unit 2140.

The data synchronization unit 2130 temporarily stores the data acquired from the camera adapters 120 in DRAM and buffers it until the foreground image, background image, audio data, and 3D model data are all available. The foreground image, background image, audio data, and 3D model data are hereinafter collectively called "shooting data". The shooting data carries meta information such as routing information, time code information (time information), a camera identifier, whether alignment is possible, and whether the foreground image is inside or outside the shooting range, and the data synchronization unit 2130 checks the attributes of the data based on this meta information. It thereby determines, for example, that the data belongs to the same time and confirms that the set is complete. This buffering is necessary because the reception order of the network packets carrying data from each camera adapter 120 is not guaranteed, so the data must be held until everything needed for file generation has arrived. Once the data is complete, the data synchronization unit 2130 sends the foreground and background images to the image processing unit 2150, the 3D model data to the 3D model combining unit 2160, and the audio data to the shooting data file generation unit 2180.
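The buffering behaviour can be sketched as a dictionary keyed by time code that releases a bundle only when every required kind of data has arrived, regardless of arrival order. This is a simplified assumption: it tracks one item per kind per time code, whereas the actual unit also checks meta information such as camera identifiers. The class and kind names are hypothetical.

```python
REQUIRED_KINDS = frozenset({"foreground", "background", "audio", "model3d"})


class SyncBuffer:
    """Hold out-of-order items until a complete set exists for one time code."""

    def __init__(self, required=REQUIRED_KINDS):
        self.required = frozenset(required)
        self.pending = {}  # time code -> {kind: payload}

    def push(self, timecode, kind, payload):
        """Store one item; return the complete bundle for `timecode` once all
        required kinds are present, otherwise None."""
        bucket = self.pending.setdefault(timecode, {})
        bucket[kind] = payload
        if self.required.issubset(bucket):
            # Complete: release the bundle and stop buffering this time code.
            return self.pending.pop(timecode)
        return None
```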

The CAD data storage unit 2135 stores the three-dimensional data indicating the stadium shape, received from the control unit 2110, in DRAM or in a storage medium such as an HDD or NAND memory. When it receives a request for stadium shape data from the image combining unit 2170, it sends the stored stadium shape data to that unit.

The calibration unit 2140 performs camera calibration and sends the camera parameters obtained by the calibration to the non-shooting data file generation unit 2185, described later. At the same time, it holds the camera parameters in its own storage area and provides the camera parameter information to the 3D model combining unit 2160, also described later.

The image processing unit 2150 performs processing on the foreground and background images such as matching colors and luminance values between cameras, development processing when RAW image data is input, and correction of camera lens distortion. It then sends the processed foreground images to the shooting data file generation unit 2180 and the processed background images to the image combining unit 2170.

The 3D model combining unit 2160 combines the 3D model data of the same time acquired from the camera adapters 120, using the camera parameters generated by the calibration unit 2140. Then, using a method such as Visual Hull, it generates 3D model data of the foreground images for the entire stadium. The generated 3D model is sent to the shooting data file generation unit 2180.
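One common way to compute a visual hull is silhouette-based voxel carving: a voxel is kept only if it projects inside the foreground silhouette in every camera. The sketch below is a minimal, hypothetical version that takes the projection for each camera as a function; it does not show how the embodiment combines the per-adapter model data.

```python
def visual_hull(voxels, views):
    """Keep voxels whose projection lies inside the silhouette in every view.

    voxels: iterable of (x, y, z) voxel centres.
    views: list of (project, silhouette) pairs, where project maps a 3-D point
    to an integer pixel (row, col) or None if it falls outside the image, and
    silhouette is a 2-D list of booleans (True = foreground).
    """
    hull = []
    for p in voxels:
        inside_all = True
        for project, silhouette in views:
            pix = project(p)
            if pix is None or not silhouette[pix[0]][pix[1]]:
                # Carved away: some camera sees background at this voxel.
                inside_all = False
                break
        if inside_all:
            hull.append(p)
    return hull
```

With two orthographic toy views (front: row = y, col = x; side: row = y, col = z), only voxels consistent with both silhouettes survive.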

The image combining unit 2170 acquires background images from the image processing unit 2150 and the three-dimensional shape data of the stadium (stadium shape data) from the CAD data storage unit 2135, and identifies the position of each background image relative to the coordinates of the acquired stadium shape data. Once the position of every background image relative to those coordinates has been identified, the background images are combined into a single background image. The three-dimensional shape data for this background image may instead be created by the back-end server 270.

The shooting data file generation unit 2180 acquires the audio data from the data synchronization unit 2130, the foreground images from the image processing unit 2150, the 3D model data from the 3D model combining unit 2160, and the background image combined onto the three-dimensional shape from the image combining unit 2170. It then outputs the acquired data to the DB access control unit 2190. In doing so, the shooting data file generation unit 2180 associates these data with one another based on their respective time information. Alternatively, only some of the data may be associated and output. For example, the shooting data file generation unit 2180 may associate the foreground image and the background image based on the time information of each and output them. As another example, it may associate the foreground image, the background image, and the 3D model data based on the time information of each and output them as shooting data.

The non-shooting data file generation unit 2185 acquires the camera parameters from the calibration unit 2140 and the three-dimensional shape data of the stadium from the control unit 2110, formats them according to the file format, and sends them to the DB access control unit 2190.

The DB access control unit 2190 is connected to the database 250 so as to allow high-speed communication, for example over InfiniBand. It sends the files received from the shooting data file generation unit 2180 and the non-shooting data file generation unit 2185 to the database 250. In this embodiment, the shooting data that the shooting data file generation unit 2180 has associated based on time information is output, via the DB access control unit 2190, to the database 250, a storage device connected to the front-end server 230 via a network.

In this embodiment, the front-end server 230 associates the foreground and background images, but this is not limiting; the database 250 may perform the association instead. For example, the database 250 acquires foreground and background images having time information from the front-end server 230, associates them based on the time information of each, and outputs them to a storage unit included in the database 250.

<Functional blocks of database 250>
FIG. 12 is a block diagram illustrating the functional configuration of the database 250. The database 250 includes a control unit 2410, a data input unit 2420, a data output unit 2430, a cache 2440, and a primary storage 2450. The primary storage 2450 is connected to a secondary storage 2460.

The control unit 2410 consists of hardware such as a CPU, DRAM, a storage medium such as an HDD or NAND memory that stores program data and various other data, and Ethernet (registered trademark). It controls each functional block of the database 250 and the database 250 system as a whole.

The data input unit 2420 receives shooting data and non-shooting data files from the front-end server 230 over high-speed communication such as InfiniBand. The received files are sent to the cache 2440. The data input unit 2420 also reads the meta information of the received shooting data and, based on the information recorded in it, such as time code information, routing information, and camera identifiers, creates a database table through which the acquired data can be accessed.

The data output unit 2430 determines in which of the cache 2440, the primary storage 2450, and the secondary storage 2460 (described later) the data requested by the back-end server 270 is stored. It then reads the data from that location and sends it to the back-end server 270 over high-speed communication such as InfiniBand.

The cache 2440 has a storage device, such as DRAM, capable of high input/output throughput, and stores the shooting data and non-shooting data acquired from the data input unit 2420 in that device. A fixed amount of data is held; when more data than that is input, the oldest data is written out to the primary storage 2450 as needed, and data that has been written out is overwritten by new data. The cached data is read by the data output unit 2430.

The primary storage 2450 is configured, for example, by connecting storage media such as SSDs in parallel, so that writing large amounts of data from the data input unit 2420 and reading data by the data output unit 2430 can take place simultaneously. Data stored in the cache 2440 is written to the primary storage 2450 in order, starting with the oldest.
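The behaviour of the cache 2440 and primary storage 2450, together with the tier lookup performed by the data output unit 2430, can be sketched as a bounded write-back FIFO. This sketch assumes an in-memory dictionary stands in for the SSD tier, omits the secondary storage 2460, and uses hypothetical names.

```python
from collections import OrderedDict


class TieredStore:
    """Bounded FIFO cache that spills its oldest entries to primary storage."""

    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()  # insertion order = age, oldest first
        self.primary = {}           # stands in for the SSD-based primary storage

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)  # a rewritten key counts as newest
        # On overflow, write the oldest entries out to primary storage.
        while len(self.cache) > self.cache_capacity:
            old_key, old_value = self.cache.popitem(last=False)
            self.primary[old_key] = old_value

    def get(self, key):
        """Look in the cache first, then in primary storage; report the tier."""
        if key in self.cache:
            return self.cache[key], "cache"
        if key in self.primary:
            return self.primary[key], "primary"
        raise KeyError(key)
```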

The secondary storage 2460 consists of HDDs, tape media, or the like. Large capacity is prioritized over speed, and the media should be cheaper than the primary storage 2450 and suitable for long-term storage. After shooting is complete, the data stored in the primary storage 2450 is written to the secondary storage 2460 as a backup.

<Description of back-end server 270>
FIG. 13 is a block diagram illustrating the functional configuration of the back-end server 270. The back-end server 270 includes a data receiving unit 3001, a background texture pasting unit 3002, a foreground texture determination unit 3003, a texture boundary color matching unit 3004, a virtual viewpoint foreground image generation unit 3005, and a rendering unit 3006. It further includes a virtual viewpoint audio generation unit 3007, a synthesis unit 3008, an image output unit 3009, a foreground object determination unit 3010, a request list generation unit 3011, a request data output unit 3012, and a rendering mode management unit 3014.

The data receiving unit 3001 receives data transmitted from the database 250 and the controller 300. From the database 250, it receives three-dimensional data indicating the shape of the stadium (stadium shape data), foreground images, background images, 3D models of the foreground images (hereinafter, foreground 3D models), and audio.

The data receiving unit 3001 also receives virtual camera parameters output from the controller 300, which designates the viewpoint used to generate the virtual viewpoint image. The virtual camera parameters are data representing the position, orientation, and so on of the virtual viewpoint; for example, an extrinsic parameter matrix and an intrinsic parameter matrix are used.
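For illustration, the sketch below projects a world point with an intrinsic matrix K and an extrinsic matrix [R | t], the common pinhole form such parameters take. The 3x3 and 3x4 layouts and the function name are assumptions; the source says only that extrinsic and intrinsic parameter matrices are used.

```python
def project(point_world, K, Rt):
    """Project a 3-D world point to pixel coordinates.

    K: 3x3 intrinsic matrix; Rt: 3x4 extrinsic matrix [R | t] (nested lists).
    Returns (u, v), or None if the point is not in front of the camera.
    """
    X = [*point_world, 1.0]  # homogeneous world point
    cam = [sum(Rt[r][c] * X[c] for c in range(4)) for r in range(3)]
    if cam[2] <= 0:
        return None  # behind (or on) the image plane of the virtual camera
    img = [sum(K[r][c] * cam[c] for c in range(3)) for r in range(3)]
    return img[0] / img[2], img[1] / img[2]
```

The same projection underlies the perspective transformation of foreground images to the virtual viewpoint described next.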

The background texture pasting unit 3002 pastes the background image as a texture onto the three-dimensional spatial shape represented by the background mesh model (stadium shape data) acquired from the background mesh model management unit 3013, thereby generating a textured background mesh model. A mesh model is data, such as CAD data, that represents a three-dimensional spatial shape as a set of faces. A texture is an image pasted onto the surface of an object to express the appearance of that surface.

The foreground texture determination unit 3003 determines texture information for the foreground 3D models from the foreground images and the foreground 3D model group. The foreground texture boundary color matching unit 3004 matches the colors at texture boundaries based on the texture information of each foreground 3D model and each 3D model group, and generates a colored foreground 3D model group for each foreground object.

Based on the virtual camera parameters, the virtual viewpoint foreground image generation unit 3005 applies a perspective transformation to the foreground image group so that it appears as seen from the virtual viewpoint. The rendering unit 3006 renders the background and foreground images to generate a virtual viewpoint image of the entire scene, using the virtual viewpoint image generation method determined by the rendering mode management unit 3014.

In this embodiment, two rendering modes are used as methods for generating the virtual viewpoint image: model-based rendering (MBR) and image-based rendering (IBR).

MBR is a method that generates a virtual viewpoint image using a three-dimensional model generated from multiple images of a subject captured from multiple directions. When the rendering mode is MBR, a model of the entire scene is generated by combining the background mesh model with the foreground 3D model group generated by the foreground texture boundary color matching unit 3004, and the virtual viewpoint image is generated from that whole-scene model.

IBR is a technique that generates a virtual viewpoint image reproducing the appearance from a virtual viewpoint by transforming and compositing a group of input images of the target scene captured from multiple viewpoints. In this embodiment, when IBR is used, the virtual viewpoint image is generated from one or more captured images, fewer than the multiple captured images used to generate a three-dimensional model with MBR. When the rendering mode is IBR, a background image seen from the virtual viewpoint is generated based on the background texture model, and the virtual viewpoint image is generated by compositing onto it the foreground image generated by the virtual viewpoint foreground image generation unit 3005. The rendering unit 3006 may also use rendering methods other than MBR and IBR.

The rendering mode management unit 3014 determines the rendering mode, that is, the method used to generate the virtual viewpoint image, and holds the result. The virtual viewpoint audio generation unit 3007 generates the audio (audio group) heard at the virtual viewpoint based on the virtual camera parameters. The synthesis unit 3008 combines the image group generated by the rendering unit 3006 with the audio generated by the virtual viewpoint audio generation unit 3007 to generate virtual viewpoint content.

The image output unit 3009 outputs the virtual viewpoint content to the controller 300 and the end user terminal 190 using Ethernet (registered trademark). However, the means of transmission to the outside is not limited to Ethernet (registered trademark); signal transmission means such as SDI, DisplayPort, or HDMI (registered trademark) may also be used. The back-end server 270 may also output the virtual viewpoint image generated by the rendering unit 3006 without audio.

The foreground object determination unit 3010 determines which group of foreground objects is to be displayed and outputs them as a foreground object list. It makes this determination from two inputs: the virtual camera parameters, and foreground object position information indicating the spatial position of each foreground object in the foreground 3D model. In other words, the foreground object determination unit 3010 maps the image information of the virtual viewpoint onto the physical cameras 112.
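As a rough illustration of selecting foreground objects from the virtual camera parameters and object positions, the sketch below treats each object as a single 3D point. It projects that point through a pinhole camera with rotation `R`, translation `t`, and focal length `f`, with the principal point assumed at the image center. It keeps the objects that land in front of the camera and inside the image. The point approximation and every function name are assumptions for illustration. A real system would test the object's 3D extent rather than one point.

```python
def project(point, R, t, f, cx, cy):
    """Pinhole projection; returns (u, v) or None if the point is behind the camera."""
    # Camera coordinates: Xc = R * X + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def visible_objects(objects, R, t, f, width, height):
    """Return ids of objects whose position projects inside the virtual camera image."""
    cx, cy = width / 2, height / 2
    result = []
    for obj_id, pos in objects.items():
        uv = project(pos, R, t, f, cx, cy)
        if uv is not None and 0 <= uv[0] < width and 0 <= uv[1] < height:
            result.append(obj_id)
    return result


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
objs = {"a": (0, 0, 10), "b": (0, 0, -5), "c": (100, 0, 10)}
print(visible_objects(objs, IDENTITY, [0, 0, 0], 1000, 1920, 1080))  # ['a']
```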

The request list generation unit 3011 generates a request list that asks the database 250 for the following data at the specified time: the foreground image group and foreground 3D model group corresponding to the foreground object list, plus the background image and audio data. For foreground objects, only the data selected with the virtual viewpoint taken into account is requested from the database 250. For the background image and audio data, all data for the frame is requested. After the back-end server 270 starts up, it keeps generating a request list for the background mesh model until the background mesh model has been acquired.
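The request rules described above can be sketched as follows. Foreground images and models are requested only for the objects selected for the virtual viewpoint. Background image and audio are requested in full for the frame. The background mesh is requested until it has been obtained. The tuple layout `(kind, frame, object_id)` and the kind names are hypothetical and are not the database 250's actual command format.

```python
def build_request_list(frame, visible_ids, have_background_mesh):
    """Hypothetical request list for one frame, as (kind, frame, object_id) tuples."""
    requests = []
    # Foreground data: only objects selected for the virtual viewpoint.
    requests += [("foreground_image", frame, oid) for oid in visible_ids]
    requests += [("foreground_model", frame, oid) for oid in visible_ids]
    # Background image and audio: everything for the frame.
    requests.append(("background_image", frame, None))
    requests.append(("audio", frame, None))
    # Keep asking for the background mesh until it has been received.
    if not have_background_mesh:
        requests.append(("background_mesh", None, None))
    return requests


print(len(build_request_list(5, ["a", "b"], have_background_mesh=False)))  # 7
```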

The request data output unit 3012 outputs data request commands to the database 250 based on the input request list. The background mesh model management unit 3013 stores the background mesh model received from the database 250.

The present embodiment is described mainly for the case where the back-end server 270 both determines the generation method of the virtual viewpoint image and generates the virtual viewpoint image. However, the present invention is not limited to this.

FIG. 14 is a diagram for explaining the virtual camera 8001. In FIG. 14(a), the virtual camera 8001 is a virtual camera that can capture images from a viewpoint different from that of any installed camera 112. That is, the virtual viewpoint image generated in the image processing system 100 is the image captured by the virtual camera 8001. Also in FIG. 14(a), each of the plurality of sensor systems 110 arranged around a circle has a camera 112. For example, generating a virtual viewpoint image can produce an image that looks as if it had been captured by a virtual camera 8001 near the soccer goal. The virtual viewpoint image, i.e. the image captured by the virtual camera 8001, is generated by image processing of the images from the plurality of installed cameras 112. By operating the position and other attributes of the virtual camera 8001, an operator (user) can obtain an image captured from any viewpoint. In FIG. 14(b), the virtual camera path 8002 is a sequence of information indicating the position and orientation of the virtual camera 8001 for each frame. Details are described later.

FIG. 15 is a block diagram for explaining the functional configuration of the virtual camera operation UI 330. The virtual camera operation UI 330 includes a virtual camera management unit 8130 and an operation UI unit 8120. These may be implemented on the same device, or separately on a server device and a client device.

The virtual camera operation unit 8101 receives and processes the operator's operations on the virtual camera 8001, that is, user instructions that designate the viewpoint for generating the virtual viewpoint image. Examples of such operations are changing the position (movement), changing the orientation (rotation), and changing the zoom magnification.

The virtual camera parameter derivation unit 8102 derives virtual camera parameters that represent the position, orientation, and other properties of the virtual camera 8001. The virtual camera parameters may be derived by computation, or for example by referring to a lookup table. A matrix representing the extrinsic parameters and a matrix representing the intrinsic parameters can serve as the virtual camera parameters. Here, the position and orientation of the virtual camera 8001 are contained in the extrinsic parameters, and the zoom value is contained in the intrinsic parameters.
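Splitting the parameters into an extrinsic matrix (position and orientation) and an intrinsic matrix (zoom) follows the standard pinhole camera model. The sketch below builds both matrices and combines them into a 3x4 projection matrix. It rests on two assumptions: zoom scales a base focal length linearly, and the principal point sits at the image center. The helper names are illustrative only.

```python
def intrinsic_matrix(base_focal_px, zoom, width, height):
    """3x3 intrinsic matrix K; zoom is assumed to scale the focal length linearly."""
    f = base_focal_px * zoom
    return [[f, 0.0, width / 2], [0.0, f, height / 2], [0.0, 0.0, 1.0]]


def extrinsic_matrix(R, t):
    """3x4 extrinsic matrix [R | t] encoding orientation and position."""
    return [list(R[i]) + [t[i]] for i in range(3)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def project(P, X):
    """Project world point X with the 3x4 matrix P = K [R | t]."""
    x = matmul(P, [[X[0]], [X[1]], [X[2]], [1.0]])
    return (x[0][0] / x[2][0], x[1][0] / x[2][0])


R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = intrinsic_matrix(1000.0, 2.0, 1920, 1080)        # 2x zoom -> f = 2000 px
P = matmul(K, extrinsic_matrix(R, [0.0, 0.0, 0.0]))
print(project(P, (1.0, 0.0, 10.0)))  # (1160.0, 540.0)
```

Because the zoom value lives only in `K`, changing zoom leaves `[R | t]` untouched, and the reverse holds for moving or rotating the camera.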

The virtual camera constraint management unit 8103 acquires and manages information that specifies a restricted region. Within that region, designating a viewpoint through instructions received by the virtual camera operation unit 8101 is restricted. This information consists, for example, of constraints on the position, orientation, and zoom value of the virtual camera 8001.

The collision determination unit 8104 determines whether the virtual camera parameters derived by the virtual camera parameter derivation unit 8102 satisfy the virtual camera constraints. If they do not, the unit may, for example, cancel the operator's input. It then either keeps the virtual camera 8001 at its current position, which satisfies the constraints, or returns the virtual camera 8001 to a position that satisfies them.
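The cancel-the-input behaviour described above can be illustrated with a simple constraint model. The sketch assumes an axis-aligned box of allowed positions and a zoom range. A requested state that violates either one is rejected, and the camera stays at its current valid state. The `Constraint` class and its fields are hypothetical. The real constraint information may also cover orientation and arbitrary restricted regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    min_pos: tuple  # (x, y, z) lower bounds of the allowed region
    max_pos: tuple  # (x, y, z) upper bounds of the allowed region
    min_zoom: float
    max_zoom: float


def satisfies(c, pos, zoom):
    inside = all(lo <= p <= hi for p, lo, hi in zip(pos, c.min_pos, c.max_pos))
    return inside and c.min_zoom <= zoom <= c.max_zoom


def apply_operation(c, current, requested):
    """States are (pos, zoom). Returns (accepted_state, ok)."""
    if satisfies(c, *requested):
        return requested, True
    # Constraint violated: cancel the input so the camera does not move.
    return current, False


c = Constraint((0, 0, 0), (100, 100, 50), 1.0, 4.0)
cur = ((10, 10, 10), 1.0)
print(apply_operation(c, cur, ((10, 10, 60), 1.0)))  # (((10, 10, 10), 1.0), False)
```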

The feedback output unit 8105 feeds the determination result of the collision determination unit 8104 back to the operator. For example, when an operator's operation would violate the virtual camera constraints, the operator is notified. Suppose the operator tries to move the virtual camera 8001 upward, but the destination does not satisfy the virtual camera constraints. In that case, the operator is notified that the virtual camera 8001 cannot be moved any further upward.
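To produce feedback such as "cannot be moved any further upward", the violated bound has to be mapped to a direction. The sketch below assumes the same axis-aligned box as a simple constraint model, with z pointing upward. The direction labels and message wording are hypothetical.

```python
# Hypothetical direction names per axis: (positive direction, negative direction).
# z is assumed to point upward.
LABELS = (("right", "left"), ("forward", "backward"), ("upward", "downward"))


def feedback_messages(requested_pos, min_pos, max_pos):
    """Return one message per axis on which the requested position leaves the allowed box."""
    msgs = []
    for p, lo, hi, (pos_dir, neg_dir) in zip(requested_pos, min_pos, max_pos, LABELS):
        if p > hi:
            msgs.append(f"The virtual camera cannot be moved any further {pos_dir}.")
        elif p < lo:
            msgs.append(f"The virtual camera cannot be moved any further {neg_dir}.")
    return msgs


print(feedback_messages((10, 10, 60), (0, 0, 0), (100, 100, 50)))
# ['The virtual camera cannot be moved any further upward.']
```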

The virtual camera path management unit 8106 manages the path of the virtual camera 8001 (virtual camera path 8002) in accordance with the operator's operations. As shown in FIG. 14(b), the virtual camera path 8002 is a sequence of information representing the position and orientation of the virtual camera 8001 for each frame. Virtual camera parameters, for example, can serve as this information. At a frame rate of 60 frames/second, one second of information is therefore a sequence of 60 sets of virtual camera parameters. The virtual camera path management unit 8106 sends the virtual camera parameters that the collision determination unit 8104 has already checked to the back-end server 270. The back-end server 270 uses the received virtual camera parameters to generate a virtual viewpoint image and virtual viewpoint sound.
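The arithmetic behind "60 sets of parameters per second at 60 frames/second" is illustrated below. The path is modelled as a list holding one entry per frame. For simplicity each entry is only a camera position, produced by a hypothetical linear interpolation. In the described system, the path entries come from the operator's operations and include orientation as well.

```python
def frame_count(duration_s, fps=60):
    """Number of per-frame parameter sets needed to cover duration_s seconds."""
    return round(duration_s * fps)


def linear_path(start, end, duration_s, fps=60):
    """Hypothetical helper: one interpolated camera position per frame."""
    n = frame_count(duration_s, fps)
    if n <= 0:
        return []
    if n == 1:
        return [tuple(float(s) for s in start)]
    return [
        tuple(s + (e - s) * i / (n - 1) for s, e in zip(start, end))
        for i in range(n)
    ]


path = linear_path((0.0, 0.0, 10.0), (59.0, 0.0, 10.0), duration_s=1.0)
print(len(path))  # 60
print(path[-1])   # (59.0, 0.0, 10.0)
```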

The authoring unit 8107 provides editing functions that the operator uses when creating a replay image. In response to a user operation, the authoring unit 8107 extracts part of the virtual camera path 8002 held by the virtual camera path management unit 8106. That part serves as the initial value of the virtual camera path 8002 for the replay image.
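Extracting part of the held path as the starting point for a replay amounts to slicing the per-frame sequence by time. The sketch below assumes that the path is a list with one entry per frame at a fixed frame rate. It returns a copy, so edits to the replay path do not alter the held path. The function name and time-based interface are hypothetical.

```python
def extract_segment(path, fps, start_s, end_s):
    """Copy the frames covering [start_s, end_s) as an initial replay path."""
    start = max(0, int(start_s * fps))
    end = min(len(path), int(end_s * fps))
    return list(path[start:end])  # copy: editing the replay must not alter the original


held_path = list(range(600))          # stand-in for 10 s of per-frame parameters at 60 fps
replay = extract_segment(held_path, 60, 2.0, 3.0)
print(len(replay), replay[0])         # 60 120
```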

The virtual camera image/sound output unit 8108 outputs the virtual camera image and sound received from the back-end server 270. The operator operates the virtual camera 8001 while checking the output image and sound.

FIG. 16 is a block diagram for explaining the functional configuration of the end user terminal 190. The end user terminal 190 on which the service application runs is, for example, a PC (Personal Computer). The end user terminal 190 is not limited to a PC, and may also be a smartphone, a tablet terminal, or a large high-definition display.

The application management unit 10001 converts user input information received from the basic software unit 10002 into back-end server commands for the back-end server 270, and outputs those commands to the basic software unit 10002. The application management unit 10001 also outputs image drawing instructions to the basic software unit 10002. These instructions draw images received from the basic software unit 10002 in a predetermined display area.

The basic software unit 10002 is, for example, an OS (Operating System). It outputs user input information received from the user input unit 10004 to the application management unit 10001. It also outputs images and sound received from the network communication unit 10003 to the application management unit 10001, and outputs back-end server commands received from the application management unit 10001 to the network communication unit 10003. Further, it outputs image drawing instructions received from the application management unit 10001 to the image output unit 10005.

The network communication unit 10003 converts back-end server commands received from the basic software unit 10002 into LAN communication signals that can be carried over a LAN cable, and outputs them to the back-end server 270. It also passes image and audio data received from the back-end server 270 to the basic software unit 10002 so that the data can be processed.

The user input unit 10004 acquires user input information from keyboard input (physical or software keyboard) or button input, as well as user input information from a user input device connected via a USB cable. It outputs this information to the basic software unit 10002.

The image output unit 10005 converts the image specified by the image display instruction from the basic software unit 10002 into an image signal, and outputs that signal to an external display, an integrated display, or the like.

The audio output unit 10006 outputs audio data, based on the audio output instruction from the basic software unit 10002, to an external or integrated speaker. The terminal attribute management unit 10007 manages the terminal's display resolution, its image encoding codec type, and its terminal type (for example, whether it is a smartphone or a large display).

The service attribute management unit 10008 manages information on the types of service provided to the end user terminal 190. Examples are the type of application installed on the end user terminal 190 and the image distribution services available. The charge management unit 10009 manages, among other things, the number of image distribution scenes the user can receive. That number depends on the user's registration and payment status for the image distribution service and on the amount charged.

<Hardware configuration>
Next, the hardware configuration of each device in the present embodiment is described. As noted above, the present embodiment has mainly been explained using an example in which the camera adapter 120 implements hardware such as an FPGA and/or ASIC, and that hardware executes each of the processes described above. The same applies to the various devices in the sensor system 110, the front-end server 230, the database 250, the back-end server 270, and the controller 300. However, at least one of these devices may instead execute the processing of the present embodiment in software, using, for example, a CPU, GPU, or DSP.

FIG. 17 is a block diagram showing the hardware configuration of the camera adapter 120. Devices such as the front-end server 230, the database 250, the back-end server 270, the control station 310, the virtual camera operation UI 330, and the end user terminal 190 can also have the hardware configuration shown in FIG. 17. The camera adapter 120 includes a CPU 1201, a ROM 1202, a RAM 1203, an auxiliary storage device 1204, a display unit 1205, an operation unit 1206, a communication unit 1207, and a bus 1208.

The CPU 1201 controls the camera adapter 120 as a whole using computer programs and data stored in the ROM 1202 and the RAM 1203. The ROM 1202 stores programs and parameters that do not need to be changed. The RAM 1203 temporarily stores programs and data supplied from the auxiliary storage device 1204, data supplied externally via the communication unit 1207, and the like. The auxiliary storage device 1204 is, for example, a hard disk drive, and stores content data such as still images and moving images.

The display unit 1205 is, for example, a liquid crystal display, and displays a GUI (Graphical User Interface) through which the user operates the camera adapter 120. The operation unit 1206 consists of, for example, a keyboard and mouse, and inputs various instructions to the CPU 1201 in response to user operations. The communication unit 1207 communicates with external devices such as the camera 112 and the front-end server 230. The bus 1208 connects the units of the camera adapter 120 and carries information between them.

The embodiment above has mainly been described using the example of an image processing system 100 installed in a facility such as a stadium or concert hall. Other examples of facilities include amusement parks, parks, racecourses, velodromes, casinos, swimming pools, skating rinks, ski resorts, and live music venues. Events held at these facilities may take place indoors or outdoors. Facilities in the present embodiment also include facilities built temporarily (for a limited period).

<Other embodiments>
The processing related to the vibration information may be performed by the front-end server 230 instead of the camera adapter 120. Specifically, the server image processing unit 6230 of the data input control unit 2120 can execute it.

An external sensor also need not be connected to every camera. A single external sensor may be provided for a plurality of cameras, with its information notified to each of those cameras.
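When fewer vibration sensors than cameras are installed, each camera's vibration has to be estimated from the nearby sensors. The claims also contemplate this, describing the estimate as obtained by interpolation. The source does not specify an interpolation method. The sketch below uses inverse-distance weighting, which is an assumed choice. Each sensor is represented by a position and a scalar vibration amplitude.

```python
import math


def interpolate_vibration(camera_pos, sensors, power=2.0):
    """Estimate a camera's vibration amplitude from sensors given as ((x, y, z), amplitude).

    Inverse-distance weighting is an assumed choice of interpolation.
    """
    if not sensors:
        raise ValueError("at least one sensor is required")
    weighted = []
    for pos, amp in sensors:
        d = math.dist(camera_pos, pos)
        if d == 0:
            return amp  # camera sits exactly at a sensor
        weighted.append((1.0 / d ** power, amp))
    total = sum(w for w, _ in weighted)
    return sum(w * a for w, a in weighted) / total


sensors = [((0, 0, 0), 1.0), ((2, 0, 0), 3.0)]
print(interpolate_vibration((1, 0, 0), sensors))  # 2.0 (equidistant -> plain average)
```

With a single shared sensor, every camera simply receives that sensor's reading. This matches the case in the paragraph above, where one external sensor serves several cameras.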

For example, when the vibration amplitude in the vibration information exceeds a predetermined value, the camera adapter 120 does not separate the image into a foreground image and a background image. Instead, it transmits all of the image data and the vibration information together with the meta information. The server image processing unit 6230 then uses the images and vibration information received from the camera adapter 120 to perform the alignment and foreground image processing described above.
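The per-frame decision on the camera adapter can be sketched as follows. If the vibration amplitude exceeds the threshold, no clipping is done and the whole frame goes to the server. Otherwise, the clipping region is shifted to follow the measured shake before the image is cut out. This sketch assumes the vibration information has already been converted into a pixel displacement `(dx, dy)` of the scene content. The `Rect` type and both functions are hypothetical. The source text in this section does not say how the region is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int
    h: int


def crop_region(nominal: Rect, dx: int, dy: int, img_w: int, img_h: int) -> Rect:
    """Shift the nominal clipping window by the measured image displacement.

    (dx, dy) is assumed to be how far the scene content moved in the image
    because of camera shake; moving the window by the same amount keeps the
    same scene area inside it. The result is clamped to the frame.
    """
    x = min(max(nominal.x + dx, 0), img_w - nominal.w)
    y = min(max(nominal.y + dy, 0), img_h - nominal.h)
    return Rect(x, y, nominal.w, nominal.h)


def plan_frame(amplitude: float, threshold: float, nominal: Rect,
               dx: int, dy: int, img_w: int, img_h: int) -> Optional[Rect]:
    """None means the amplitude is too large: send the whole frame plus
    vibration information to the server instead of clipping on the adapter."""
    if amplitude > threshold:
        return None
    return crop_region(nominal, dx, dy, img_w, img_h)


nominal = Rect(100, 100, 1700, 900)
print(plan_frame(0.5, 2.0, nominal, 20, -30, 1920, 1080))  # Rect(x=120, y=70, w=1700, h=900)
print(plan_frame(5.0, 2.0, nominal, 20, -30, 1920, 1080))  # None
```

Keeping a margin around the nominal window (here 220 px horizontally and 180 px vertically) is what leaves room to shift it. Shake larger than the margin gets clamped, which is one reason to fall back to full-frame transmission above a threshold.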

The present invention can also be realized as follows. A program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or storage medium. One or more processors in a computer of that system or apparatus then read and execute the program. The invention can also be realized by a circuit (for example, an ASIC) that implements one or more of those functions.

As described above, the embodiment makes it easy to generate a virtual viewpoint image regardless of the scale of the devices in the system (such as the number of cameras 112) and regardless of the output resolution and output frame rate of the captured images. Although embodiments of the present invention have been described in detail above, the present invention is not limited to these embodiments. Various modifications and changes are possible within the scope of the gist of the invention described in the claims.

110 sensor system, 111 microphone, 112 camera, 113 pan head, 120 camera adapter, 180 switching hub, 190 end user terminal, 230 front-end server, 250 database, 270 back-end server, 290 time server, 310 control station, 330 virtual camera operation UI

Claims

An information processing apparatus comprising:
image acquisition means for acquiring a captured image;
vibration information acquisition means for acquiring vibration information on the imaging apparatus that captured the captured image;
clipping means for cutting out, from the captured image, a predetermined region used for generating a virtual viewpoint image; and
deciding means for deciding the predetermined region based on the vibration information.

The information processing apparatus according to claim 1, further comprising:
separation means for separating, from the cut-out image, an object contained in the cut-out image as a foreground image; and
judging means for judging whether or not the foreground image is valid for generating the virtual viewpoint image.

The information processing apparatus according to claim 2, wherein the judging means judges the foreground image to be valid when the object in the foreground image represents the entire object.

The information processing apparatus according to claim 2 or 3, further comprising output means for outputting the foreground image when the judging means judges it to be valid.

The information processing apparatus according to claim 4, further comprising alignment means for aligning the cut-out image with an image acquired from an imaging apparatus other than the one that captured the acquired captured image,
wherein the separation means does not separate the foreground image when the alignment means fails in the alignment.

The information processing apparatus according to claim 4 or 5, wherein the output means outputs error information when the judging means judges that the foreground image is not valid.

The information processing apparatus according to claim 6, wherein the clipping means performs the clipping when, among the imaging apparatuses within a predetermined range of the imaging apparatus that captured the acquired captured image, there is a non-error imaging apparatus for which the error information has not been output.

The information processing apparatus according to claim 7, wherein the clipping means performs the clipping when alignment between the captured image and a captured image from the non-error imaging apparatus succeeds.

The information processing apparatus according to claim 7, wherein the output means outputs the foreground image when alignment between the cut-out image and a captured image from the non-error imaging apparatus succeeds.

The information processing apparatus according to any one of claims 1 to 9, wherein the clipping means does not perform the clipping when the vibration amplitude indicated by the vibration information exceeds a predetermined value.

An image processing system comprising:
a plurality of imaging means;
vibration information acquisition means for acquiring vibration information on the imaging means;
image generation means for generating a virtual viewpoint image based on images captured by the plurality of imaging means; and
judging means for judging, based on the vibration information on one or more of the plurality of imaging means, whether or not each image captured by those one or more imaging means is to be used for generating the virtual viewpoint image.

The image processing system according to claim 11, further comprising:
clipping means for cutting out, from the captured image, a predetermined region used for generating the virtual viewpoint image; and
deciding means for deciding the predetermined region based on the vibration information.

The image processing system according to claim 12, further comprising:
separation means for separating, from the cut-out image, an object contained in the cut-out image as a foreground image; and
judging means for judging whether or not the foreground image is valid for generating the virtual viewpoint image.

The image processing system according to claim 13, wherein the judging means judges the foreground image to be valid when the object in the foreground image represents the entire object.

The image processing system according to claim 13 or 14, further comprising transmission means for transmitting the foreground image judged valid by the judging means to the image generation means.

The image processing system according to claim 15, comprising a plurality of imaging apparatuses, each including the imaging means and the judging means.

The image processing system according to claim 16, wherein the imaging apparatus further includes the separation means and the transmission means.

The image processing system according to any one of claims 13 to 15, comprising an image processing apparatus that includes the image generation means and the judging means.

The image processing system according to claim 15, wherein
the separation means separates the cut-out image into the foreground image and a background image, which is the image other than the foreground image,
the transmission means transmits the background image to the image generation means, and
the image generation means updates the background image used for the virtual viewpoint image based on the transmitted background image when the vibration amplitude indicated by the vibration information on the plurality of imaging means is smaller than a predetermined value.

The image processing system according to any one of claims 11 to 19, further comprising a plurality of detection means for detecting the vibration of each of the plurality of imaging means.

The image processing system according to claim 20, wherein the number of detection means is smaller than the number of imaging means, and the vibration of each of the plurality of imaging means is obtained by interpolation from the vibration detected by one or more of the detection means.

A method for controlling an information processing apparatus, comprising:
an image acquisition step of acquiring a captured image;
a vibration information acquisition step of acquiring vibration information on the imaging apparatus that captured the captured image;
a clipping step of cutting out, from the captured image, a predetermined region used for generating a virtual viewpoint image; and
a deciding step of deciding the predetermined region based on the vibration information.

A method for controlling an image processing system, comprising:
a step of acquiring captured images from a plurality of imaging means;
a vibration information acquisition step of acquiring vibration information on the imaging means;
an image generation step of generating a virtual viewpoint image based on the captured images from the plurality of imaging means; and
a judging step of judging, based on the vibration information on each of the plurality of imaging means, whether or not each of the captured images is to be used for generating the virtual viewpoint image.

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 10.