JP6759375B2

JP6759375B2 - System, method, and program for generating a virtual viewpoint image

Info

Publication number: JP6759375B2
Application number: JP2019014184A
Authority: JP
Inventors: Naoki Umemura (梅村 直樹)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-01-30
Filing date: 2019-01-30
Publication date: 2020-09-23
Anticipated expiration: 2037-12-14
Also published as: JP2019114269A

Description

The present invention relates to a technique for generating an image from a virtual viewpoint based on a plurality of images captured from a plurality of viewpoint positions.

In recent years, virtual viewpoint image technology has attracted attention. It uses images captured by a plurality of real cameras to reproduce the image seen from a camera that does not actually exist (a virtual camera) placed virtually in three-dimensional space. With this technology, for example, highlight scenes from soccer or basketball games can be viewed from various angles, giving the user a greater sense of presence.

Generating a virtual viewpoint image may require aggregating the image data captured by the real cameras at an image processing server or the like, and generating and rendering a three-dimensional model (object shape data) on that server.

The visual hull (volume intersection) method is a known technique for estimating the three-dimensional shape of an object (Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2014-10805

With conventional three-dimensional shape estimation techniques, a three-dimensional model may not be generated for a structure, that is, a stationary object such as a soccer goal within the shooting range. This is because shape estimation targets only the foreground of the captured image, namely moving objects such as people. A stationary structure such as a soccer goal is therefore treated as background and is not a target of three-dimensional model generation. If a virtual viewpoint image is generated without a three-dimensional model of the structure, stationary structures are rendered two-dimensionally behind moving people, as if stuck to the ground, and the result is far from the actual scene. FIG. 1 shows an example: a virtual viewpoint image of a soccer scene in which the soccer goal (all of its elements: goalposts, crossbar, and goal net) appears stuck to the grass field. Similarly, FIG. 13B is a virtual viewpoint image of a sumo scene in which a wrestler who was pushed out and should be lying below the ring appears to be lying on top of it.

The present invention was made in view of the above problem, and its object is to make it possible to obtain a natural virtual viewpoint image.

A system according to the present invention comprises: first generation means for generating three-dimensional shape data corresponding to an object photographed from a plurality of directions; first acquisition means for acquiring three-dimensional shape data corresponding to a structure photographed from a plurality of directions; second acquisition means for acquiring background data corresponding to a background, photographed from a plurality of directions, that differs from at least the object and the structure; third acquisition means for acquiring information indicating a designated viewpoint; and second generation means for generating an image based on the three-dimensional shape data corresponding to the object generated by the first generation means, the three-dimensional shape data corresponding to the structure acquired by the first acquisition means, the background data acquired by the second acquisition means, and the information indicating the viewpoint acquired by the third acquisition means. The system is characterized in that the first generation means can generate the three-dimensional shape data corresponding to the object based on an image that includes the object region, the structure region, and the background region, and an image that includes the structure region and the background region but not the object region.

According to the present invention, a natural virtual viewpoint image can be obtained.

FIG. 1 is a diagram explaining the problem with the conventional method.
FIG. 2 is a diagram showing an example arrangement of camera systems according to Embodiment 1.
FIG. 3 is a diagram showing an example hardware configuration of the virtual viewpoint image generation system.
FIG. 4 is a diagram explaining the common imaging region of a plurality of cameras.
FIG. 5 is an explanatory diagram of volume data.
FIG. 6 is a sequence diagram showing the structure model generation process according to Embodiment 1.
FIG. 7A shows a captured image of the field without the soccer goal, and FIG. 7B shows a captured image of the field with the soccer goal.
FIG. 8 is a diagram showing the three-dimensional model of the soccer goal on the volume data.
FIG. 9 is a sequence diagram showing the virtual viewpoint image generation process according to Embodiment 1.
FIG. 10A shows an example captured image, FIG. 10B an example foreground image, and FIG. 10C an example virtual viewpoint image.
FIG. 11 is a diagram showing the three-dimensional model of a player on the volume data.
FIG. 12 is a diagram showing an example arrangement of camera systems according to a modification of Embodiment 1.
FIGS. 13A and 13B are diagrams explaining the problem with the conventional method.
FIG. 14 is a bird's-eye view of the sumo ring seen from directly above, showing its surroundings divided into four areas.
FIG. 15 is a flowchart showing the flow of processing, according to Embodiment 2, for thinning out and transmitting the image data of structure portions in a shooting scene.
FIG. 16 is a flowchart showing the flow of virtual viewpoint image generation according to Embodiment 2.

Embodiments of the present invention are described below with reference to the accompanying drawings. The configurations shown in each embodiment are merely examples, and the present invention is not limited to the configurations illustrated.

Embodiment 1

In recent years, improvements in camera image quality have raised the resolution of captured images, and the amount of data tends to grow accordingly. If multi-viewpoint image data captured by a plurality of cameras is sent unmodified over a network to a server or the like, it places a heavy load on the network. It also increases the computation needed for three-dimensional model generation and rendering on the server that receives the data. This embodiment therefore describes how to obtain a natural virtual viewpoint image, in which structures in the shooting scene are rendered in three dimensions close to reality, while suppressing the network load of transmitting multi-viewpoint image data. Specifically, a structure in the shooting scene that remains stationary or nearly stationary is separated out as an object with its own attribute, neither foreground nor background, and modeled in three dimensions in advance. The following description uses a soccer match as the shooting scene and takes as its example the case where a soccer goal, as a structure, is modeled in three dimensions in advance.

A virtual viewpoint image is an image generated by an end user and/or a dedicated operator freely manipulating the position and orientation of a virtual camera; it is also called a free viewpoint image or an arbitrary viewpoint image. Both the generated virtual viewpoint image and the multi-viewpoint images from which it is generated may be moving images or still images. The embodiments below mainly describe the case where both the input multi-viewpoint images and the output virtual viewpoint image are moving images. A structure in this embodiment may be any static object (stationary object) whose position shows no change when photographed in time series from the same angle. For example, when an indoor studio is the shooting scene, furniture and props can be treated as structures in the sense of this embodiment.

FIG. 2 shows, in a bird's-eye view of the field 200 seen from directly above, the arrangement of the ten camera systems 110a to 110j that make up the virtual viewpoint image generation system according to this embodiment. The camera systems 110a to 110j are installed at a fixed height above the ground so as to surround the field 200, and capture the area in front of one of the goals from various angles to obtain multi-viewpoint image data from different viewpoints. A soccer court 201 is marked on the grass field 200 (in practice, with white lines), and soccer goals are placed at its left and right ends. The × mark 203 in front of the left soccer goal 202 indicates the common gaze point of the camera systems 110a to 110j, and the dashed circle 204 indicates the area, centered on the gaze point 203, that the camera systems 110a to 110j can capture. In this embodiment, positions are expressed in a coordinate system whose origin is one corner of the field 200, with the x-axis along the long side, the y-axis along the short side, and the z-axis in the height direction.

FIG. 3 shows an example hardware configuration of the virtual viewpoint image generation system. The system of FIG. 3 consists of the camera systems 110a to 110j, a switching hub 120, a control device 130, a server 140, and a database 150.

Each camera system 110a to 110j includes an imaging unit 111a to 111j, consisting of a lens, an image sensor, and so on, and a camera adapter 112a to 112j that controls the imaging unit and performs predetermined image processing according to instructions from the control device 130. Each camera adapter includes a processor (CPU or ASIC) and memory (RAM and ROM) needed for control and image processing. Adjacent camera systems 110a to 110j are connected to one another in a daisy chain by network cables 160a to 160i, and the image data captured by the camera systems is transmitted over these cables. The switching hub 120 (hereinafter "HUB") routes data on the network. The HUB 120 is connected to the camera system 110a by a network cable 170a and to the camera system 110j by a network cable 170b.

The server 140 processes the multi-viewpoint image data sent from the camera systems 110a to 110j to generate virtual viewpoint image data. The server 140 also generates a time synchronization signal and thereby handles synchronization control of the entire system. The database 150 (hereinafter "DB") stores image data sent from the server 140 and provides the stored image data to the server 140 as needed. The HUB 120 and the server 140 are connected by a network cable 170c, the server 140 and the DB 150 by a network cable 170d, and the HUB 120 and the control device 130 by a network cable 113e. The control device 130 centrally controls the camera systems 110a to 110j and the server 140, and outputs the virtual viewpoint image generated by the server 140 from the multi-viewpoint images to, for example, a display device (not shown) or another information processing device on the network. Although the configuration in FIG. 3 connects the camera systems in a daisy chain, a star connection, in which the HUB 120 is connected directly to each of the camera systems 110a to 110j, may be used instead. The number of camera systems in the virtual viewpoint image generation system is also not limited to ten.

Next, the acquisition of multi-viewpoint image data in this embodiment is described. First, the server 140 sends a time synchronization signal to each camera system (time server function). In each camera system 110a to 110j, the imaging unit 111a to 111j captures images in accordance with the received time synchronization signal, under the control of the internal camera adapter 112a to 112j. This makes it possible to acquire multi-viewpoint moving images that are synchronized frame by frame. The image data captured by each camera system is then transmitted to the server 140 in sequence, as follows. First, in the camera system 110a, the camera adapter 112a applies the image processing described later to the image data captured by the imaging unit 111a and transmits the result to the camera system 110b over the network cable 160a. The camera system 110b performs the same processing and transmits its own captured image data, together with the captured image data received from the camera system 110a, to the camera system 110c. Every camera system does the same, so that the captured image data for all ten viewpoints, acquired by the ten camera systems 110a to 110j, reaches the HUB 120 over the network cable 170b and is finally sent to the server 140. Using the received image data for the ten viewpoints, the server 140 performs image processing such as structure model generation, object shape estimation, and rendering, as described later.
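Because each camera in the chain forwards everything it received plus its own frame, links closer to the HUB carry more data. The following minimal Python sketch illustrates this under stated assumptions: the per-camera frame sizes are hypothetical, every camera contributes one frame per time step, and protocol overhead is ignored.

```python
from typing import Dict, List, Tuple


def daisy_chain_link_loads(camera_ids: List[str],
                           frame_bytes: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (hop, payload) for each hop of a daisy chain ending at the hub.

    Each camera forwards everything received from upstream plus its own
    processed frame, so the payload grows by one frame per hop.
    """
    loads: List[Tuple[str, int]] = []
    carried = 0
    for i, cam in enumerate(camera_ids):
        carried += frame_bytes[cam]  # append this camera's own frame
        downstream = camera_ids[i + 1] if i + 1 < len(camera_ids) else "HUB"
        loads.append((f"{cam}->{downstream}", carried))
    return loads


if __name__ == "__main__":
    cams = [f"110{c}" for c in "abcdefghij"]  # the ten camera systems
    sizes = {cam: 100 for cam in cams}        # hypothetical 100 units per frame
    for hop, load in daisy_chain_link_loads(cams, sizes):
        print(hop, load)
```

With equal frame sizes, the last hop (110j to the HUB) carries ten times the load of the first, which is why reducing what each camera sends matters.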

FIG. 4 schematically shows, on the basis of FIG. 2, the shooting regions of the imaging units 111a to 111d of four of the ten camera systems, 110a to 110d. The triangular regions 411 to 414 extending from the camera systems 110a to 110d represent their respective shooting regions as view volumes. The polygonal region 415, where the four triangular shooting regions 411 to 414 overlap, is the common imaging region of the camera systems 110a to 110d. Although the common imaging region is explained here for four camera systems, the common imaging region of all ten camera systems can be derived in the same way; naturally, it is smaller than the polygonal region 415. In short, the common imaging region of a group of cameras capturing a common gaze point can be obtained by computing the region where the view volumes of all the cameras overlap. Likewise, the three-dimensional model of an object in the common imaging region can be derived from the overlapping regions of the multi-viewpoint images acquired by the camera systems.
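In the top-down view of FIG. 4, a point lies in the common imaging region exactly when it lies inside every camera's triangular view volume. The Python sketch below tests this with a standard point-in-triangle check; it is a 2D simplification under the assumption that each view volume is given as three hypothetical vertex coordinates, not the system's actual computation.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); its sign tells which side of o->a b lies on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p: Point, tri: Triangle) -> bool:
    """True if p is inside or on the edge of tri (vertex order does not matter)."""
    a, b, c = tri
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def in_common_region(p: Point, view_triangles: List[Triangle]) -> bool:
    """A point is in the common imaging region if every camera can see it."""
    return all(in_triangle(p, tri) for tri in view_triangles)
```

Adding a camera can only shrink this intersection, which matches the observation that the ten-camera region is smaller than region 415.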

Next, a method of modeling in three dimensions a structure lying within the common imaging region obtained above, which is one of the features of this embodiment, is described, taking as an example the generation of a three-dimensional model of the soccer goal 202. First, volume data (see FIG. 5) is prepared in which the three-dimensional space above the field 200 is filled with cubes of a fixed size (voxels). Each voxel in the volume data has the value 0 or 1, where "1" indicates a shape region and "0" a non-shape region. In FIG. 5, reference numeral 501 denotes a voxel (drawn larger than its actual size for ease of explanation). Next, using the camera parameters of the imaging units 111a to 111j of the camera systems 110a to 110j, the three-dimensional coordinates of the voxels are converted from the world coordinate system to each camera coordinate system. When the structure appears in a camera coordinate system (that is, within that camera's view), a model representing the three-dimensional shape of the structure with voxels (a structure model) is generated. Here, camera parameters means information such as the installation position and orientation (line-of-sight direction) of each imaging unit 111a to 111j and the focal length of its lens.
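The world-to-camera conversion can be sketched with a simple pinhole model. The Python below is illustrative only: it assumes the camera parameters take the form of a world-to-camera rotation matrix `R`, a camera position `C`, a focal length `f` in pixels, and a principal point `(cx, cy)`, and it ignores lens distortion. These names and conventions are assumptions, not taken from the patent.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def voxel_centers(origin: Vec3, size: float, dims: Tuple[int, int, int]) -> List[Vec3]:
    """Centres of a dims[0] x dims[1] x dims[2] grid of cubes with edge length `size`."""
    ox, oy, oz = origin
    return [(ox + (i + 0.5) * size, oy + (j + 0.5) * size, oz + (k + 0.5) * size)
            for i in range(dims[0]) for j in range(dims[1]) for k in range(dims[2])]


def world_to_camera(p: Vec3, R: Mat3, C: Vec3) -> Vec3:
    """X_cam = R (X_world - C); R rotates world axes into camera axes, C is the camera position."""
    d = (p[0] - C[0], p[1] - C[1], p[2] - C[2])
    x, y, z = (sum(R[r][c] * d[c] for c in range(3)) for r in range(3))
    return (x, y, z)


def project(p_cam: Vec3, f: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection to pixel (u, v); None if the point is behind the camera."""
    x, y, z = p_cam
    if z <= 0:
        return None
    return (f * x / z + cx, f * y / z + cy)
```

For example, with an identity rotation and the camera 10 units behind the origin, `project(world_to_camera((0, 0, 0), R, (0, 0, -10)), 1000, 960, 540)` lands on the principal point `(960.0, 540.0)`.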

FIG. 6 is a sequence diagram showing how a model of a structure in the shooting scene is generated. The series of processes in this sequence diagram is performed before shooting of the main multi-viewpoint images, which are the source data for the virtual viewpoint image, begins (for example, before the match starts), such as while the stadium is being set up. In FIG. 6, the set of ten camera systems 110a to 110j is labeled the "camera system group".

In step 601, each imaging unit 111a to 111j captures the target three-dimensional space (here, the field 200) without the structure (here, before the soccer goal 202 is installed). FIG. 7A shows the image of the field 200 without the soccer goal 202 captured by the imaging unit 111i of the camera system 110i. Each camera system acquires such a captured image from its own viewpoint.

In step 602, each imaging unit 111a to 111j captures the target three-dimensional space (the field 200) with the structure present (here, with the soccer goal 202 installed). FIG. 7B shows the image of the field 200 with the soccer goal 202 captured by the imaging unit 111i of the camera system 110i. As in step 601, each camera system acquires such an image from its own viewpoint. The captured image data obtained in steps 601 and 602 is held in the memory of each camera adapter 112a to 112j.

In step 603, each camera adapter 112a to 112j uses the difference between the image captured in step 601 and the image captured in step 602 to separate the image into a region showing the structure and a region showing the rest of the background. This yields image data corresponding to the structure (here, the soccer goal 202) and image data corresponding to the rest of the background (here, the field 200).
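The separation in step 603 can be pictured as per-pixel differencing. The patent does not specify the difference measure, so the Python sketch below is a minimal illustration that assumes grayscale images as nested lists and a hypothetical fixed threshold; a real camera adapter would likely work on color data and handle noise.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]             # grayscale, row-major
Layer = List[List[Optional[int]]]   # None where the pixel belongs to the other layer


def split_structure(empty: Image, with_structure: Image,
                    threshold: int = 30) -> Tuple[Layer, Layer]:
    """Split the step-602 image into structure and background layers.

    A pixel counts as structure when it differs from the step-601
    (structure-free) image by more than `threshold` (hypothetical value).
    """
    structure: Layer = []
    background: Layer = []
    for row_empty, row_full in zip(empty, with_structure):
        s_row: List[Optional[int]] = []
        b_row: List[Optional[int]] = []
        for p_empty, p_full in zip(row_empty, row_full):
            is_structure = abs(p_full - p_empty) > threshold
            s_row.append(p_full if is_structure else None)
            b_row.append(None if is_structure else p_full)
        structure.append(s_row)
        background.append(b_row)
    return structure, background
```

Using `None` rather than 0 for "not in this layer" keeps genuinely dark pixels distinguishable from missing ones.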

In step 604, each camera adapter 112a to 112j transmits the structure image data and the background image data obtained in step 603 to the server 140.

In step 605, the server 140 generates a three-dimensional model of the structure (here, the soccer goal 202) composed of the voxels described above, based on the structure image data received from each camera system and the camera parameters of each camera system. FIG. 8 shows the three-dimensional model of the soccer goal 202 on the volume data described above. The three-dimensional shape may instead be represented not by the voxels themselves but by the set of points at the voxel centers (a point cloud). The structure model generated in this way is stored in memory in the server 140 or in the DB 150. The background image data received along with the structure image data is stored as well.
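One common way to build such a voxel model from per-camera silhouettes is visual hull carving: a voxel keeps the value 1 only if it projects inside the structure silhouette in every view. The Python sketch below assumes this technique, along with hypothetical projector callables (3D point to pixel, standing in for the camera parameters) and boolean masks; it is not presented as the patent's exact procedure.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Projector = Callable[[Vec3], Optional[Tuple[int, int]]]  # 3D point -> (row, col) or None
Mask = List[List[bool]]                                  # True = structure silhouette


def carve(voxels: Sequence[Vec3],
          views: Sequence[Tuple[Projector, Mask]]) -> List[int]:
    """Visual hull carving: 1 if a voxel falls in the silhouette in every view, else 0."""
    occupancy: List[int] = []
    for v in voxels:
        value = 1
        for project, mask in views:
            px = project(v)
            if px is None:  # behind the camera: treat as not supported by this view
                value = 0
                break
            r, c = px
            if not (0 <= r < len(mask) and 0 <= c < len(mask[r])) or not mask[r][c]:
                value = 0  # outside the image or outside the silhouette
                break
        occupancy.append(value)
    return occupancy


def to_point_cloud(voxels: Sequence[Vec3], occupancy: Sequence[int]) -> List[Vec3]:
    """Alternative representation: the centres of the occupied voxels."""
    return [v for v, occ in zip(voxels, occupancy) if occ]
```

In a usage example, two orthographic projectors (a top view and a front view) over a 2×2×2 grid would carve away every voxel except the one that both silhouettes cover; real projectors would come from the camera parameters.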

This completes the flow of processing for generating a model of a structure in the shooting scene. Three-dimensional models of other structures, such as corner flags, may be generated in the same way. In this embodiment, the camera adapters separate the structure from the rest of the background, but the server 140 may do this instead.

Next, the generation of a virtual viewpoint image in which structures in the shooting scene are rendered naturally, using the structure model obtained above, is described. FIG. 9 is a sequence diagram showing the virtual viewpoint image generation process according to this embodiment. As in FIG. 6, the set of ten camera systems 110a to 110j is labeled the "camera system group".

In step 901, timed for example to the start of the soccer match, the control device 130 sends the server 140 an instruction (shooting start command) to capture the multi-viewpoint images from which the virtual viewpoint image will be generated. In step 902, on receiving the shooting instruction from the control device 130, the server 140 sends a time synchronization signal to each camera system 110a to 110j. In step 903, each camera system 110a to 110j starts capturing the target three-dimensional space (here, the space above the field 200). The camera system 110i, for example, thereby obtains an image of the match in progress such as that shown in FIG. 10A. Each camera system captures such images from its own viewpoint.

In step 904, each camera adapter 112a to 112j extracts, from the image captured in step 903, the data of the foreground consisting of moving objects (here, the players and the ball). In other words, this extraction compares the image captured in step 903 with the image containing the structure captured in step 602 (FIG. 7B) and separates foreground from background based on the difference. FIG. 10B shows the foreground-only image extracted from the captured (full-scene) image of FIG. 10A. In step 905, each camera adapter 112a to 112j transmits the extracted foreground image data to the server 140. The image regions corresponding to the field 200 and the soccer goal 202 (the background image data) are not transmitted to the server 140 at this point, which reduces the amount of data transmitted accordingly.
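Because the reference image from step 602 already contains the goal, differencing against it leaves only the moving players and ball, and only those pixels need to be sent. The Python sketch below illustrates this under assumptions not stated in the patent: grayscale frames as nested lists, a hypothetical fixed threshold, and a sparse (row, col, value) list as the transmitted form.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale, row-major


def extract_foreground(frame: Image, reference: Image,
                       threshold: int = 30) -> List[Tuple[int, int, int]]:
    """Return (row, col, value) for pixels that differ from the reference.

    The reference was captured with the structure in place, so the goal and
    the field match it and are not extracted.
    """
    return [(r, c, v)
            for r, (row_f, row_ref) in enumerate(zip(frame, reference))
            for c, (v, ref) in enumerate(zip(row_f, row_ref))
            if abs(v - ref) > threshold]


def transmitted_fraction(foreground: List[Tuple[int, int, int]], frame: Image) -> float:
    """Share of the frame's pixels that is actually sent."""
    total = sum(len(row) for row in frame)
    return len(foreground) / total
```

A real system would likely send compact image patches around each foreground region rather than individual pixels, but the saving comes from the same idea.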

In step 906, based on a user instruction, the control device 130 sends the server 140 an instruction to generate a virtual viewpoint image (generation start command), together with information about the virtual viewpoint and the gaze point. A user who wants to create and view a virtual viewpoint image enters the information required for its generation through a GUI (not shown) provided on the control device 130. Specifically, the user sets, on a predetermined UI screen, the information required to generate the virtual viewpoint image (hereinafter "virtual viewpoint information"), such as the position of the virtual viewpoint, its movement path, and where (at which object) to gaze.
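The virtual viewpoint information could be held in a small data structure. The Python sketch below is hypothetical: the class name, the per-frame position list used as the movement path, and the single fixed gaze target are all assumptions for illustration, not the system's actual format. It derives the virtual camera's viewing direction for a given frame.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class VirtualViewpointInfo:
    """Hypothetical container for the virtual viewpoint information of step 906."""
    path: List[Vec3]   # virtual camera position for each frame (movement path)
    gaze_target: Vec3  # point being watched, e.g. a player's position

    def view_direction(self, frame: int) -> Vec3:
        """Unit vector from the virtual camera to the gaze target at `frame`."""
        px, py, pz = self.path[frame]
        dx = self.gaze_target[0] - px
        dy = self.gaze_target[1] - py
        dz = self.gaze_target[2] - pz
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0:
            raise ValueError("camera position coincides with the gaze target")
        return (dx / norm, dy / norm, dz / norm)
```

If the watched object moves, the gaze target would itself need to vary per frame; a single point keeps the sketch short.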

In step 907, the server 140 generates three-dimensional models of the moving objects in the shooting scene (foreground models) using the foreground image data received from the camera group and the camera parameters described above. Here, three-dimensional models of the players and the ball are generated as foreground models. FIG. 11 shows, on the volume data as in FIG. 8, the three-dimensional model of one of the players generated in this step.

In step 908, the server 140 generates a virtual viewpoint image from four inputs:
- the virtual viewpoint information received from the control device 130;
- the foreground model obtained in step 907;
- the structure model, generated or acquired in advance;
- the background data, generated or acquired in advance.

Specifically, the shapes of the structure model and the foreground model as seen from the set virtual viewpoint (virtual camera) are estimated, for example by the Visual Hull method. This shape estimation yields volume data expressing the three-dimensional shapes of the objects present in the shooting scene. Once these shapes as seen from the virtual viewpoint have been obtained, they are combined into a single image. During this compositing, the model closer to the set virtual viewpoint is drawn on top. If the foreground model is closer than the structure model, the foreground model is mapped over the structure model. Conversely, if the structure model is closer than the foreground model, the structure model is mapped over the foreground model.

For example, suppose the virtual viewpoint is the viewpoint of the imaging unit 111i of the camera system 110i moved in the height direction (+z direction). The resulting virtual viewpoint image is as shown in FIG. 10(c). In that image, the players and the ball (the foreground model) and the soccer goal (the structure model) are all mapped onto the field 200 with natural three-dimensional shapes. Repeating this processing for a separately set number of time frames yields the desired virtual viewpoint image as a moving image.
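The compositing rule in step 908 is a per-pixel depth test: whichever of the foreground model and the structure model is nearer the virtual viewpoint is drawn on top. The minimal sketch below makes several assumptions:
- each model has already been rendered from the virtual viewpoint into a list of per-pixel samples;
- each sample is a (colour label, distance) pair, or `None` where the model does not cover the pixel;
- the background data fills pixels that neither model covers.

The sample format and the function name `composite` are hypothetical.

```python
from typing import List, Optional, Tuple

# (colour label, distance from the virtual viewpoint); None = model absent at this pixel
Sample = Optional[Tuple[str, float]]


def composite(structure: List[Sample], foreground: List[Sample],
              background: List[str]) -> List[str]:
    """Per pixel, show whichever model is closer to the virtual viewpoint."""
    out = []
    for s, f, b in zip(structure, foreground, background):
        hits = [sample for sample in (s, f) if sample is not None]
        if hits:
            # The nearer model is "mapped over" the farther one.
            out.append(min(hits, key=lambda sample: sample[1])[0])
        else:
            out.append(b)  # neither model covers this pixel: show the background
    return out


# Pixel 0: player in front of the goal; pixel 1: player behind the goal; pixel 2: empty.
row = composite(
    structure=[("goal", 5.0), ("goal", 5.0), None],
    foreground=[("player", 3.0), ("player", 8.0), None],
    background=["grass", "grass", "grass"],
)
# row == ["player", "goal", "grass"]
```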

In the present embodiment, the sequence of FIG. 9 reduces the total amount of data transmitted by not transmitting any background image data. This can cause a problem, for example, when an outdoor sports scene is shot as a moving image. Lighting conditions such as sunshine change over time, so the background in the resulting virtual viewpoint image may differ from the actual scene. Where this is a concern, the background image data obtained by the foreground/background separation in step 904 may be transmitted as appropriate between transmissions of the foreground image data.

Further, in the present embodiment, both the structure model and the foreground model are generated by the server 140, but the present invention is not limited to this. For example, the camera adapter may carry out processing up to generation of the structure model and transmit the model to the server 140. Alternatively, the server 140 may acquire structure model data generated by another information processing apparatus. What matters is that the structure model is available on the server 140 when the foreground model is generated from the foreground data extracted from the multi-viewpoint images.

<Modification example>
In the example above, the structure in the shooting scene is treated as an object with its own attribute, neither foreground nor background. The amount of transmitted data is reduced by generating and holding a three-dimensional model of the structure in advance. If reducing data transmission were the only goal, it could also be achieved by treating the structure's three-dimensional model as background. However, treating the structure model as background causes the following problem.

FIG. 12 shows the arrangement of the ten camera systems 110a to 110j that make up the virtual viewpoint image generation system of this modification, for a sumo shooting scene. The camera systems 110a to 110j are installed on the ceiling of the sumo venue so as to surround the ring. They capture the ring from various angles to acquire multi-viewpoint image data with differing viewpoints. In this case, a three-dimensional model is built from images of the ring alone (the structure), and the resulting three-dimensional shape of the ring is treated as background.

Suppose that, during a bout between two wrestlers, one wrestler falls off the ring, as shown in FIG. 13(a). Consider the case in which all ten camera systems 110a to 110j capture the state of FIG. 13(a) and transmit only the foreground image data to the server 140. On receiving the foreground image data, the server 140 maps the two wrestlers (the foreground) onto the three-dimensional model of the ring prepared in advance as background. As shown in FIG. 13(b), the wrestler who was pushed out and should be lying outside the ring then appears to be lying on the ring. In other words, when a three-dimensionally modeled structure is treated as background, the virtual viewpoint image may look unnatural depending on where the foreground is. It is therefore desirable, when the structure model is treated as background, to check in advance whether a natural virtual viewpoint image can be obtained. If the image is likely to be unnatural, the user should be warned.

FIG. 14 is a bird's-eye view of the ring from directly above, with the area around the ring divided into four areas A, B, C, and D. Each of these areas represents the part below the ring (outside the ring). The × mark at the center is the gazing point of the imaging units 111a to 111j in the camera systems 110a to 110j. In this modification, the position of the foreground is checked when an instruction to generate a virtual viewpoint image is given. In the example above, the system determines whether each wrestler is on the ring. The determination uses the distance from the specified virtual viewpoint (virtual camera), and may also use the image from a camera (not shown) that captures the entire ring from above. A warning is issued when both of the following hold:
- at least one wrestler is off the ring;
- that wrestler and the specified virtual viewpoint are not in the same one of areas A to D.

In that case, the system determines that the virtual viewpoint image cannot be generated. The reason is that when the viewpoint and the wrestler are in different areas (for example, the viewpoint in area A and the wrestler in area C), an unnatural image is likely. In such an image, the wrestler appears pasted somewhere other than where he actually is. Care is therefore needed when the structure model is treated as background.
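The warning rule above can be sketched as a small geometric check. FIG. 14 is not reproduced here, so this sketch assumes the following:
- the ring is a circle of radius `ring_radius` centred on the gazing point at (0, 0);
- areas A to D are the four quadrants around it, counter-clockwise from the +x axis;
- positions are horizontal (x, y) coordinates.

The actual partition of the figure may differ. The function names are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

Point = Tuple[float, float]  # horizontal (x, y); the gazing point is at (0, 0)


def area_of(p: Point, ring_radius: float) -> Optional[str]:
    """Return None if p is on the ring, else one of the hypothetical quadrant areas A-D."""
    x, y = p
    if math.hypot(x, y) <= ring_radius:
        return None
    angle = math.degrees(math.atan2(y, x)) % 360.0
    return "ABCD"[int(angle // 90.0) % 4]  # % 4 guards against rounding to 360.0


def needs_warning(viewpoint: Point, wrestlers: Iterable[Point], ring_radius: float) -> bool:
    """Warn if any wrestler is off the ring in a different area from the virtual viewpoint."""
    view_area = area_of(viewpoint, ring_radius)
    for w in wrestlers:
        w_area = area_of(w, ring_radius)
        if w_area is not None and w_area != view_area:
            return True
    return False


# Viewpoint in area A; one wrestler on the ring, the other off it in area C.
warn = needs_warning((5.0, 5.0), [(0.0, 0.0), (-4.0, -1.0)], ring_radius=3.0)
# warn is True
```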

According to this embodiment, a three-dimensional model of the structure is created in advance and handled differently from the other foreground models. This reduces the amount of multi-viewpoint image data that must be transmitted as the basis for the virtual viewpoint image. At the same time, the structure in the shooting scene is rendered naturally in the generated image.

Embodiment 2

Embodiment 1 reduces the amount of data transferred by separating the structure in the shooting scene as an object with its own attribute, neither foreground nor background. The structure is modeled in three dimensions in advance and the model is held on the server. Embodiment 2, described next, instead treats the structure in the shooting scene as foreground and thins out the structure data when transmitting it. Content shared with Embodiment 1, such as the system configuration, is omitted or simplified, and the description focuses on the differences.

As in Embodiment 1, this embodiment is described using a soccer match as the shooting scene. The description assumes that the camera systems are arranged as in FIG. 2 described above. In this case, the soccer goal, which is a structure, is treated as part of the foreground model while being distinguished from the players and the ball. FIG. 15 is a flowchart of the processing, according to this embodiment, for thinning out and transmitting the image data of the structure portion in the shooting scene. Each camera system starts the flow of FIG. 15 when the user instructs capture of the multi-viewpoint images via the UI of the control device 130. These multi-viewpoint images are the basis for the virtual viewpoint image. The flow is implemented by the CPU or the like in the camera adapter executing a predetermined program.

A preparatory process must be completed before the flow of FIG. 15 starts. In it, each of the camera systems 110a to 110j captures two full-view images of the field 200, one without and one with the structure (see FIGS. 7(a) and 7(b)). Each camera system holds these images in the memory of its camera adapter 112a to 112j. This preparatory process is performed, for example, while the stadium is being set up before the match. The images obtained in the preparatory process are also transmitted to the server 140. There they are held in memory for reference during the virtual viewpoint image generation process described later. Once this preparatory process is complete, the flow of FIG. 15 can be executed.

First, in step 1501, each of the camera adapters 112a to 112j initializes its internal counter (not shown) to "0". In the following step 1502, the imaging units 111a to 111j start shooting in accordance with the time synchronization signal transmitted from the server 140. Next, in step 1503, processing branches depending on whether the current counter value is "0". If it is "0", the process proceeds to step 1507; otherwise, it proceeds to step 1504.

In step 1504, the counter value is decremented by 1. In the following step 1505, each of the camera adapters 112a to 112j extracts the foreground region from the image (frame) captured by its imaging unit 111a to 111j. Specifically, it computes the difference between the captured image and the full-view image with the structure, one of the two full-view images held from the preparatory process (foreground/background separation). The full-view image with the structure shows the soccer goal 202 installed on the field 200 (FIG. 7(b)). The resulting foreground data is therefore an image cut out of the regions showing only dynamic objects, such as the players and the ball, and excludes the soccer goal. Then, in step 1506, each of the camera adapters 112a to 112j transmits this structure-free foreground data from step 1505 to the server 140. After transmission, the process proceeds to step 1510, where it is determined whether shooting has ended. If no instruction to end shooting has been received from the server 140, the process returns to step 1503.
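Steps 1505 and 1507 both reduce to background subtraction against one of the two full-view images from the preparatory process (FIGS. 7(a) and 7(b)). Only the reference image changes. The minimal sketch below makes two simplifying assumptions: single-channel images flattened to one row, and a fixed threshold. Both are hypothetical, and real separation would need more robust change detection. The name `extract_foreground` is also hypothetical.

```python
from typing import List, Optional

Row = List[int]  # one row of grey levels (hypothetical simplified image)


def extract_foreground(frame: Row, plate: Row, threshold: int = 30) -> List[Optional[int]]:
    """Keep pixels that differ from the reference full-view image; blank (None) the rest."""
    return [p if abs(p - q) > threshold else None for p, q in zip(frame, plate)]


plate_without = [10, 10, 10, 10, 10]   # field only (cf. FIG. 7(a))
plate_with = [10, 200, 200, 10, 10]    # field with the goal at pixels 1-2 (cf. FIG. 7(b))
frame = [10, 200, 200, 90, 10]         # goal plus a player at pixel 3

players_only = extract_foreground(frame, plate_with)       # as in step 1505
with_goal = extract_foreground(frame, plate_without)       # as in step 1507
# players_only == [None, None, None, 90, None]
# with_goal == [None, 200, 200, 90, None]
```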

In step 1507, each of the camera adapters 112a to 112j extracts the foreground region from the image (frame) captured by its imaging unit 111a to 111j. This time it computes the difference between the captured image and the full-view image without the structure, the other image held from the preparatory process (foreground/background separation). The full-view image without the structure shows only the field 200, with the soccer goal 202 not yet installed (FIG. 7(a)). The resulting foreground data therefore covers both the regions showing the players and the ball and the region showing the soccer goal. In other words, in this step the soccer goal, a structure, is also extracted as foreground. Then, in step 1508, each of the camera adapters 112a to 112j transmits this "foreground data including the structure" to the server 140. The data carries information indicating whether the structure is present, so that the receiving server 140 can tell that the structure region is included. For example, a binary flag is set to "1" when the structure is included and "0" when it is not. In the following step 1509, the counter is set to a predetermined value N (N > 1). For example, when the imaging units 111a to 111j shoot moving images at 60 fps, a value such as "60" is set. By choosing this value, the user can freely change how often foreground data including the structure is transmitted (once every N frames). After the counter has been set, the process proceeds to step 1510, where it is determined whether shooting has ended. If no instruction to end shooting has been received from the server 140, the process returns to step 1503.
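The counter logic of steps 1501 to 1510 can be simulated to see which frames carry the structure. This is a minimal sketch with a hypothetical function name. It returns the binary flag described above for each captured frame, and shooting ends after `num_frames`.

```python
from typing import List


def structure_flags(num_frames: int, n: int) -> List[int]:
    """Per-frame flag (1 = foreground sent with the structure) following FIG. 15."""
    if n <= 1:
        raise ValueError("the text requires N > 1")
    counter = 0                      # step 1501
    flags = []
    for _ in range(num_frames):      # one pass per captured frame (steps 1503-1510)
        if counter == 0:             # step 1503 -> step 1507
            flags.append(1)          # step 1508: foreground including the structure
            counter = n              # step 1509
        else:                        # step 1503 -> step 1504
            counter -= 1
            flags.append(0)          # step 1506: foreground without the structure
    return flags


flags = structure_flags(7, 2)
# flags == [1, 0, 0, 1, 0, 0, 1]
```

Traced literally, the counter is set to N and then decremented once per frame until it reaches 0. A structure-bearing frame therefore goes out every N + 1 frames. With N = 60 at 60 fps, that is roughly once per second, close to the "once every N" stated in the text.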

This completes the processing for thinning out and transmitting the image data of the structure portion in the shooting scene. Suppose, for example, that the counter's predetermined value equals the frame rate. Foreground image data including the structure (here, the soccer goal) is then transmitted to the server 140 only once in 60 frames. Dynamic objects such as the players and the ball are still transmitted in all 60 frames (every frame). The structure is a static object, so its image information can be transmitted at a lower frame rate than the dynamic objects. Transmission efficiency is therefore much higher than when foreground image data including the structure is sent every frame. In short, sending foreground images that include the structure less often than foreground images that exclude it reduces the amount of transmitted data.

Next, the processing by which the server 140 generates a virtual viewpoint image from the foreground image data sent as described above is explained. FIG. 16 is a flowchart of the virtual viewpoint image generation process in the server 140. The flow of FIG. 16 is executed frame by frame on the foreground images for a user-specified time frame (for example, 10 seconds). These frames are selected from all the foreground image data captured and transmitted by the camera systems 110a to 110j. This series of processes is implemented by the CPU in the server 140 executing a predetermined program based on instructions from the control device 130.

First, in step 1601, the foreground image (frame) of interest to be processed is selected from the foreground image data for the set time frame. In the following step 1602, the binary flag described above is used to determine whether the foreground image of interest includes the structure. If it does, the process proceeds to step 1603; if it does not, the process proceeds to step 1605.

Step 1603 is reached when the foreground image of interest includes the structure. The image region corresponding to the structure is extracted from that image to generate an image representing the structure (hereinafter a "structure image"). This is done as follows:
1. Compute the difference between the foreground image of interest and the full-view image with the structure, held from the preparatory process described above. This extracts the image region corresponding to the foreground.
2. Combine the extracted foreground region with the full-view image without the structure, also held in advance.
3. Compute the difference between the resulting composite image and the foreground image of interest. This yields a structure image representing only the region corresponding to the structure.

Next, in step 1604, the structure image data generated in step 1603 is held in memory in the server 140. If structure image data is already held, it is overwritten (updated) with the new data. After the structure image data has been stored, the process proceeds to step 1607.
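The three differencing steps of step 1603 can be sketched as follows. The sketch assumes the received foreground image is a full frame aligned with both full-view images, using single-channel one-row images and a hypothetical fixed threshold. The function names are hypothetical.

```python
from typing import List, Optional

Pixel = Optional[int]


def changed(a: int, b: int, threshold: int) -> bool:
    return abs(a - b) > threshold


def structure_image(frame: List[int], plate_with: List[int], plate_without: List[int],
                    threshold: int = 30) -> List[Pixel]:
    # 1) Foreground region: where the frame differs from the with-structure image.
    fg_mask = [changed(f, w, threshold) for f, w in zip(frame, plate_with)]
    # 2) Paste that foreground region onto the structure-free image.
    composite = [f if m else e for f, m, e in zip(frame, fg_mask, plate_without)]
    # 3) Structure region: where the frame still differs from the composite.
    return [f if changed(c, f, threshold) else None for f, c in zip(frame, composite)]


goal = structure_image(frame=[10, 200, 200, 90, 10],
                       plate_with=[10, 200, 200, 10, 10],
                       plate_without=[10, 10, 10, 10, 10])
# goal == [None, 200, 200, None, None]
```

One limitation of this simplified version: where a player occludes the goal, those pixels are classed as foreground in the first step. The structure image is left with a hole there.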

Step 1605 is reached when the foreground image of interest does not include the structure. In this step, the structure image data generated and held in the preceding steps 1603 and 1604 is read out. In the following step 1606, this structure image is combined with the foreground image of interest to generate a foreground image of interest that includes the structure.

In step 1607, a three-dimensional model (foreground model) of the objects in the shooting scene is generated, with the structure treated as part of the foreground. If the foreground image of interest originally included the structure (Yes in step 1602), it is used as is. If it did not (No in step 1602), the foreground image of interest with the structure composited in step 1606 is used. In either case, the foreground model includes the dynamic objects, such as the players and the ball, and also the soccer goal, a structure (static object).
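Steps 1601 to 1607 amount to caching the most recent structure image and reusing it for frames sent without the structure. The minimal sketch below uses one-row images with `None` for pixels outside the transmitted foreground. Its assumptions are:
- `extract_structure` is a stand-in for step 1603, supplied by the caller (hypothetical);
- `overlay` gives priority to the current frame's foreground pixels, which the text does not specify;
- an error is raised if no structure image has been held yet.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Pixel = Optional[int]
Image = List[Pixel]


def overlay(foreground: Image, structure: Image) -> Image:
    """Step 1606: fill pixels the foreground leaves blank with the held structure image."""
    return [f if f is not None else s for f, s in zip(foreground, structure)]


def images_for_modeling(frames: Iterable[Tuple[int, Image]],
                        extract_structure: Callable[[Image], Image]) -> List[Image]:
    cached: Optional[Image] = None
    out: List[Image] = []
    for flag, image in frames:                  # step 1601: next frame of interest
        if flag == 1:                           # step 1602: structure included
            cached = extract_structure(image)   # steps 1603-1604: build and overwrite
            out.append(image)                   # step 1607 uses the frame as is
        else:
            if cached is None:
                raise ValueError("no structure image has been held yet")
            out.append(overlay(image, cached))  # steps 1605-1606
    return out


def goal_only(img: Image) -> Image:
    """Hypothetical stand-in for step 1603: goal pixels have grey level 200."""
    return [p if p == 200 else None for p in img]


result = images_for_modeling(
    [(1, [None, 200, 200, 90, None]), (0, [None, None, None, None, 95])], goal_only)
# result == [[None, 200, 200, 90, None], [None, 200, 200, None, 95]]
```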

In step 1608, the virtual viewpoint image is generated. Using the position of the virtual viewpoint set separately by the user, the shape of the foreground model from step 1607 as seen from that viewpoint is estimated.

This completes the virtual viewpoint image generation process in the server 140 according to this embodiment. This embodiment treats the structure in the shooting scene as foreground while thinning out its image data for transmission. It achieves the same effect as Embodiment 1.

(Other Embodiments)
The present invention can also be realized by supplying a program that implements one or more functions of the embodiments described above to a system or apparatus via a network or a storage medium. One or more processors in a computer of that system or apparatus then read and execute the program. The invention can also be realized by a circuit (for example, an ASIC) that implements one or more of these functions.

110a to 110j Camera systems
112a to 112j Camera adapters
140 Server

Claims

1. A system comprising:
a first generation means for generating three-dimensional shape data corresponding to an object photographed from a plurality of directions;
a first acquisition means for acquiring three-dimensional shape data corresponding to a structure photographed from a plurality of directions;
a second acquisition means for acquiring background data corresponding to a background, photographed from a plurality of directions, that differs from at least the object and the structure;
a third acquisition means for acquiring information indicating a specified viewpoint; and
a second generation means for generating an image based on the three-dimensional shape data corresponding to the object generated by the first generation means, the three-dimensional shape data corresponding to the structure acquired by the first acquisition means, the background data acquired by the second acquisition means, and the information indicating the viewpoint acquired by the third acquisition means,
wherein the first generation means is capable of generating the three-dimensional shape data corresponding to the object based on an image including a region of the object, a region of the structure, and a region of the background, and an image not including the region of the object but including the region of the structure and the region of the background.

2. The system according to claim 1, wherein the first acquisition means acquires the three-dimensional shape data corresponding to the structure generated based on a captured image obtained by shooting before the start of an event.

3. The system according to claim 1 or 2, wherein the first acquisition means acquires the three-dimensional shape data corresponding to the structure generated based on a plurality of images, the plurality of images being based on a plurality of captured images acquired by shooting from a plurality of shooting directions before the start of an event and representing the region of the structure distinguished from other regions.

4. The system according to any one of claims 1 to 3, wherein the first acquisition means generates and acquires the three-dimensional shape data corresponding to the structure based on a captured image obtained by shooting before the start of an event.

5. The system according to any one of claims 1 to 4, wherein the first acquisition means generates and acquires the three-dimensional shape data corresponding to the structure based on a plurality of images, the plurality of images being based on a plurality of captured images acquired by shooting from a plurality of shooting directions before the start of an event and representing the region of the structure distinguished from other regions.

6. The system according to any one of claims 1 to 5, wherein the first acquisition means acquires the three-dimensional shape data corresponding to the structure before the start of an event.

7. The system according to claim 2, wherein the captured image acquired by shooting before the start of the event is an image that does not include the region of the object but includes the region of the structure and the region of the background.

8. The system according to any one of claims 1 to 7, wherein the first generation means is capable of generating the three-dimensional shape data corresponding to the object based on a plurality of captured images acquired by shooting from a plurality of shooting directions after the start of an event.

9. The system according to any one of claims 1 to 8, wherein the first generation means is capable of generating the three-dimensional shape data corresponding to the object based on a plurality of captured images acquired by shooting from a plurality of shooting directions after the start of an event and a plurality of captured images acquired by shooting from a plurality of shooting directions before the start of the event.

10. The system according to claim 8 or 9, wherein the plurality of captured images acquired by shooting from the plurality of shooting directions after the start of the event are images including the region of the object, the region of the structure, and the region of the background.

11. The system according to any one of claims 1 to 3, wherein the three-dimensional shape data of the structure, the background data, and the information indicating the viewpoint are acquired by a communication means included in the system.

A system used to generate an image based on a specified viewpoint, the system comprising:
a first generation means for generating, based on a captured image, an image that includes a region of a structure;
a second generation means for generating, based on a captured image, an image that includes a region of an object and does not include the region of the structure;
a transmission means for transmitting the image generated by the first generation means and the image generated by the second generation means; and
a third generation means for generating, based on the image generated by the first generation means and the image generated by the second generation means that are transmitted by the transmission means, an image that represents the region of the structure distinguished from other regions,
wherein the transmission means can transmit the image generated by the first generation means at a lower frequency than the image generated by the second generation means.
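The lower transmission frequency in the claim above fits the fact that a structure such as a soccer goal stays still: its image rarely needs refreshing, while the object image has to be sent every frame. The sketch below assumes a hypothetical `Sender` class and an illustrative interval of 60 frames. Neither comes from the patent; it only shows the scheduling idea.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Hypothetical per-camera sender (illustrative, not from the patent).

    Object images (second generation means) are sent every frame; structure
    images (first generation means) are sent once every `structure_interval`
    frames.
    """

    structure_interval: int = 60  # assumed value, e.g. once per second at 60 fps
    log: list[tuple[int, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.structure_interval < 1:
            raise ValueError("structure_interval must be at least 1")

    def on_frame(self, frame_index: int) -> list[str]:
        payload = ["object"]  # sent every frame
        if frame_index % self.structure_interval == 0:
            payload.append("structure")  # sent at the lower frequency
        self.log.extend((frame_index, kind) for kind in payload)
        return payload


if __name__ == "__main__":
    sender = Sender(structure_interval=3)
    for i in range(6):
        print(i, sender.on_frame(i))
```

With an interval of 3, six frames produce six object images but only two structure images (frames 0 and 3). This saves bandwidth in proportion to the interval.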

A system used to generate an image based on a specified viewpoint, the system comprising:
a first generation means for generating, based on a captured image, an image that includes a region of a structure;
a second generation means for generating, based on a captured image, an image that includes a region of an object and does not include the region of the structure;
a transmission means for transmitting the image generated by the first generation means and the image generated by the second generation means; and
a third generation means for acquiring, based on the image generated by the first generation means and the image generated by the second generation means that are transmitted by the transmission means, three-dimensional shape data corresponding to the object and three-dimensional shape data corresponding to the structure,
wherein the transmission means can transmit the image generated by the first generation means at a lower frequency than the image generated by the second generation means.
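The background of this patent names the visual hull (visual volume intersection) method for estimating 3D shape. The sketch below assumes that method is what the third generation means uses, and it carves a voxel grid separately from the object silhouettes and from the structure silhouettes. The three orthographic cameras and the 4×4×4 grid are simplifying assumptions; real cameras would need calibrated perspective projections.

```python
from itertools import product
from typing import Callable

Voxel = tuple[int, int, int]
Pixel = tuple[int, int]
Projection = Callable[[Voxel], Pixel]


def visual_hull(size: int, views: list[tuple[Projection, set[Pixel]]]) -> set[Voxel]:
    """Keep each voxel whose projection lies inside the silhouette in every view."""
    return {
        voxel
        for voxel in product(range(size), repeat=3)
        if all(project(voxel) in silhouette for project, silhouette in views)
    }


# Hypothetical orthographic cameras; y is the vertical axis.
def front(v: Voxel) -> Pixel:  # looks along z, sees (x, y)
    return (v[0], v[1])


def side(v: Voxel) -> Pixel:  # looks along x, sees (z, y)
    return (v[2], v[1])


def top(v: Voxel) -> Pixel:  # looks along y, sees (x, z)
    return (v[0], v[2])


if __name__ == "__main__":
    # Object silhouettes: a player two voxels tall standing at x=1, z=1.
    player = visual_hull(4, [
        (front, {(1, 0), (1, 1)}),
        (side, {(1, 0), (1, 1)}),
        (top, {(1, 1)}),
    ])
    # Structure silhouettes: a goal post four voxels tall at x=3, z=0.
    post = visual_hull(4, [
        (front, {(3, y) for y in range(4)}),
        (side, {(0, y) for y in range(4)}),
        (top, {(3, 0)}),
    ])
    print(sorted(player))
    print(sorted(post))
```

Carving the two sets of silhouettes separately yields one shape for the object and one for the structure, which is the pair of outputs the claim above describes.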

The system according to claim 12 or 13, further comprising a fourth generation means for generating the image based on the specified viewpoint, based on the image generated by the first generation means and the image generated by the second generation means that are transmitted by the transmission means.
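Rendering for the specified viewpoint requires mapping 3D points into a virtual camera. The pinhole-camera sketch below assumes a hypothetical `VirtualCamera` defined by a position, a single yaw angle and a focal length in pixels. A full implementation would use a complete 3D rotation and calibrated intrinsics.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirtualCamera:
    """Hypothetical specified viewpoint for illustration only.

    yaw (radians) turns the viewing direction from +z toward +x about the
    vertical y axis; focal is in pixels; (cx, cy) is the principal point.
    """

    x: float
    y: float
    z: float
    yaw: float
    focal: float
    cx: float
    cy: float

    def project(self, px: float, py: float, pz: float) -> Optional[tuple[float, float, float]]:
        """Return (u, v, depth) for a world point, or None if it is behind the camera."""
        dx, dy, dz = px - self.x, py - self.y, pz - self.z
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        # Camera axes: right = (c, 0, -s), forward = (s, 0, c), up = world y.
        xc = c * dx - s * dz
        zc = s * dx + c * dz
        if zc <= 0:
            return None
        # Pinhole projection; image v grows downward while world y grows upward.
        u = self.cx + self.focal * xc / zc
        v = self.cy - self.focal * dy / zc
        return (u, v, zc)


if __name__ == "__main__":
    cam = VirtualCamera(0.0, 0.0, 0.0, 0.0, 100.0, 320.0, 240.0)
    print(cam.project(1.0, 1.0, 10.0))
```

The returned depth is what a renderer would use to decide which surface is nearest to the virtual camera at each pixel.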

The system according to any one of claims 1 to 14, wherein the object is a moving object.

The system according to any one of claims 1 to 15, wherein at least one of a person and a ball is the object.

The system according to any one of claims 1 to 16, wherein the structure is an object that remains stationary.
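The claims distinguish the object, which moves, from the structure, which remains stationary. They do not say how that is decided, and they also allow the structure simply to be a designated object. If stationarity were instead detected automatically, one hypothetical test is to measure how much a region's mask changes between consecutive frames from a fixed camera. The function name `is_stationary`, the pixel-set mask format and the 2% tolerance are illustrative assumptions.

```python
Pixel = tuple[int, int]


def is_stationary(masks: list[set[Pixel]], max_change: float = 0.02) -> bool:
    """Hypothetical test: a region is stationary if, between every pair of
    consecutive frames, the fraction of its pixels that change is at most
    max_change (measured against the union of the two masks)."""
    for prev, cur in zip(masks, masks[1:]):
        union = prev | cur
        if not union:
            continue  # region absent in both frames: nothing changed
        if len(prev ^ cur) / len(union) > max_change:
            return False
    return True


if __name__ == "__main__":
    goal = {(0, 0), (0, 1), (0, 2)}
    print(is_stationary([goal, goal, goal]))        # goal never moves
    print(is_stationary([{(5, 5)}, {(5, 6)}]))      # player steps sideways
```

A sequence with fewer than two masks is treated as stationary, since there is no frame-to-frame change to measure.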

The system according to any one of claims 1 to 16, wherein at least one of a soccer goal and a corner flag used in a soccer match is the structure.

The system according to any one of claims 1 to 18, wherein the structure is an object installed at a predetermined position.

The system according to any one of claims 1 to 19, wherein at least a part of the structure is installed on a field on which a person serving as the object competes.

The system according to any one of claims 1 to 20, wherein the structure is a designated object.

A generation method for generating an image based on a specified viewpoint, the method comprising:
a first generation step of generating three-dimensional shape data corresponding to an object photographed from a plurality of directions;
a first acquisition step of acquiring three-dimensional shape data corresponding to a structure photographed from a plurality of directions;
a second acquisition step of acquiring background data corresponding to a background that is photographed from a plurality of directions and differs from at least the object and the structure;
a third acquisition step of acquiring information indicating the specified viewpoint; and
a second generation step of generating an image based on the three-dimensional shape data corresponding to the object generated in the first generation step, the three-dimensional shape data corresponding to the structure acquired in the first acquisition step, the background data acquired in the second acquisition step, and the information indicating the viewpoint acquired in the third acquisition step,
wherein, in the first generation step, the three-dimensional shape data corresponding to the object is generated based on an image that includes the region of the object, the region of the structure, and the region of the background, and an image that does not include the region of the object but includes the region of the structure and the region of the background.
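The second generation step combines the object shape, the structure shape and the background for the chosen viewpoint. When the structure has its own 3D shape, ordinary depth testing decides what occludes what, so the goal is no longer flattened onto the grass as in FIG. 1. The z-buffer sketch below assumes that fragments have already been projected to integer pixels with a depth, for example by a pinhole camera. The fragment tuple format and the string labels are hypothetical.

```python
import math

# (u, v, depth, label): an already-projected surface sample (hypothetical format).
Fragment = tuple[int, int, float, str]


def composite(background: list[list[str]], fragments: list[Fragment]) -> list[list[str]]:
    """Draw object and structure fragments over the background using a z-buffer."""
    height = len(background)
    width = len(background[0]) if height else 0
    image = [row[:] for row in background]  # copy, so the background is not modified
    depth = [[math.inf] * width for _ in range(height)]
    for u, v, d, label in fragments:
        # Ignore fragments outside the image; keep only the nearest one per pixel.
        if 0 <= u < width and 0 <= v < height and d < depth[v][u]:
            depth[v][u] = d
            image[v][u] = label
    return image


if __name__ == "__main__":
    grass = [["grass"] * 3 for _ in range(2)]
    frags = [
        (1, 0, 8.0, "player"),  # player partly behind the goal...
        (1, 0, 5.0, "goal"),    # ...so the nearer goal wins this pixel
        (2, 0, 3.0, "player"),  # the visible part of the player
    ]
    for row in composite(grass, frags):
        print(row)
```

Because each pixel keeps the nearest fragment regardless of drawing order, the player is correctly hidden behind the goal post where they overlap.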

The generation method according to claim 22, wherein, in the first acquisition step, the three-dimensional shape data corresponding to the structure is generated and acquired based on a plurality of images that represent the region of the structure distinguished from other regions, the plurality of images being based on images that are acquired by shooting from a plurality of shooting directions before the start of an event and that do not include the region of the object but include the region of the structure and the region of the background.

A generation method for generating an image based on a specified viewpoint, the method comprising:
a first generation step of generating, based on a captured image, an image that includes a region of a structure;
a second generation step of generating, based on a captured image, an image that includes a region of an object and does not include the region of the structure;
a transmission step of transmitting the image generated in the first generation step and the image generated in the second generation step; and
a third generation step of generating, based on the image generated in the first generation step and the image generated in the second generation step that are transmitted in the transmission step, an image that represents the region of the structure distinguished from other regions,
wherein, in the transmission step, the image generated in the first generation step is transmitted at a lower frequency than the image generated in the second generation step.

A generation method for generating an image based on a specified viewpoint, the method comprising:
a first generation step of generating, based on a captured image, an image that includes a region of a structure;
a second generation step of generating, based on a captured image, an image that includes a region of an object and does not include the region of the structure;
a transmission step of transmitting the image generated in the first generation step and the image generated in the second generation step; and
a third generation step of acquiring, based on the image generated in the first generation step and the image generated in the second generation step that are transmitted in the transmission step, three-dimensional shape data corresponding to the object and three-dimensional shape data corresponding to the structure,
wherein, in the transmission step, the image generated in the first generation step is transmitted at a lower frequency than the image generated in the second generation step.

A program for causing a computer to operate as each means of the system according to any one of claims 1 to 21.