JP2021182443A

JP2021182443A - Transmission device and transmission method, and program

Info

Publication number: JP2021182443A
Application number: JP2021132062A
Authority: JP
Inventors: 貴志花本; Takashi Hanamoto
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-08-07
Filing date: 2021-08-13
Publication date: 2021-11-25
Anticipated expiration: 2037-09-19
Also published as: JP7204843B2

Abstract

To make it possible to efficiently provide a client with material data for playing back virtual viewpoint video.
SOLUTION: A generation device generates material data used for creating a virtual viewpoint image according to the position and direction of a virtual viewpoint, and has: a generation unit that generates a plurality of pieces of material data based on a plurality of photographed images obtained by a plurality of cameras photographing a photographic area from different directions, the generation unit generating the plurality of pieces of material data including at least first material data and second material data with higher quality than that of the first material data; and a storage control unit that stores the material data generated by the generation unit in a storage unit. Of the plurality of pieces of material data stored by the storage control unit in the storage unit, the material data selected based on information obtained from a client is provided to the client.
SELECTED DRAWING: Figure 3

Description

The present invention relates to a transmission device and a transmission method for transmitting three-dimensional shape data for virtual viewpoint video.

Free viewpoint video (virtual viewpoint video) technology reproduces video from a virtual camera placed at an arbitrary position in three-dimensional space, using video from a plurality of real cameras. In virtual viewpoint video technology, video from an arbitrary virtual camera position is generated by estimating the three-dimensional shape of the subject. By transmitting the model data of the subject (three-dimensional shape and texture images) to a terminal owned by the user, virtual viewpoint video that responds to the user's interactive operations can be generated. However, because the amount of model data for a subject is enormous, transmitting the model data strains the communication bandwidth. As a method of reducing the amount of transmitted data, a configuration has been proposed in which the density of the three-dimensional shape is varied according to the amount of change in the shape (Patent Document 1).

Patent Document 1: Japanese Patent No. 5563545

However, because Patent Document 1 focuses only on the density of the shape, information that is important to the user may be lost. The data reduction method of Patent Document 1 is therefore not well suited to generating model data for virtual viewpoint video.

An object of the present invention is to make it possible to efficiently provide a client with material data for playing back virtual viewpoint video.

A transmission device according to one aspect of the present invention has the following configuration. That is, the transmission device comprises:
determination means for determining three-dimensional shape data to be used for generating a virtual viewpoint image from among a plurality of pieces of three-dimensional shape data generated based on a plurality of captured images obtained by image capturing performed by a plurality of image capturing devices, the plurality of pieces of three-dimensional shape data including three-dimensional shape data represented by a point cloud or voxels and three-dimensional shape data represented by a mesh; and
transmission means for transmitting the three-dimensional shape data determined by the determination means to another device.

According to the present invention, material data for playing back virtual viewpoint video can be efficiently provided to a client.

Block diagram showing the configuration of the image display system and the configuration of the image processing device.
Block diagram showing the configuration of the display device.
Schematic diagram showing the arrangement of cameras in the image display system.
Flowchart showing the process of transmitting virtual viewpoint video.
Flowchart showing the process of generating hierarchical model data.
Diagram showing an overview of the hierarchical model data.
Diagram showing an overview of the hierarchical model data.
Diagram showing an overview of the attribute data.
Diagram showing an overview of the attribute data.
Flowchart showing the process of generating attribute data.
Flowchart showing the model data transmission process.
Diagram showing the GUI of the display device.
Flowchart showing the process of generating virtual viewpoint video.
Diagram showing the data for transmission.
Diagram explaining the process of correcting attribute data.
Diagram explaining another example of the process of correcting attribute data.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The following embodiments do not limit the present invention, and not all combinations of the features described in the embodiments are essential to the solution provided by the present invention. Identical components are denoted by the same reference numerals.

<First Embodiment>
The first embodiment describes a method of transmitting the model data required for interactive playback of virtual viewpoint video on a user terminal. To simplify the description, a case is assumed in which there is a single server as the transmission source and a plurality of clients as receivers. The location is a stadium in which a soccer match is held. The image processing device serving as the server is located in the stadium, and users in the spectator seats operate display devices serving as clients (terminals such as smartphones and tablets) to view the virtual viewpoint video. In the present embodiment, virtual viewpoint video means video from a virtually set viewpoint. Similar terms include free viewpoint video and arbitrary viewpoint video.

FIG. 1A is a block diagram showing a configuration example of the image processing device and a configuration example of the image display system according to the first embodiment. The image processing device 100 includes a CPU 101, a main memory 102, a storage unit 103, an input unit 104, a display unit 105, an external I/F unit 106, and a bus 107. The CPU 101 executes arithmetic processing and various programs. The main memory 102 provides the CPU 101 with the programs, data, and work areas required for processing. The storage unit 103 stores an image processing program, various data required for GUI display, and the like. A non-volatile memory such as a hard disk or a silicon disk is used for the storage unit 103. The input unit 104 is a device such as a keyboard or a mouse, and receives operation input from the server administrator. The display unit 105 displays the GUI. The external I/F unit 106 connects to the camera device group and the display device group via the LAN 108, and transmits and receives video data, control signal data, and model data. The bus 107 connects the above units and transfers data between them.

The LAN 108 is wired and/or wireless, and is used for data transmission and reception among the image processing device, the camera device group, the display device group, and the analysis device. The camera device group consists of a plurality of cameras 120. Each camera 120 is connected to the image processing device 100 via the LAN 108 and, based on control signals from the image processing device 100, starts and stops shooting, changes camera settings (shutter speed, focal length, aperture value, etc.), and transfers the captured data. The display device group consists of a plurality of user terminals 130 (smartphones, tablets, etc.). Each user terminal 130 is connected to the image processing device 100 via the LAN 108 and receives from it the model data required for viewing virtual viewpoint video. The user terminal 130 generates and displays virtual viewpoint video using the received model data. Because the communication bandwidth of the LAN 108 is finite, the size of the model data that each user terminal 130 can receive depends on the number of users. The analysis device 140 analyzes the type of play performed by a subject, using the video from the cameras 120 and information from various sensors attached to the subject. The analysis device 140 is optional and is not an essential component. Various system configurations other than the above are also possible. For example, the devices may be connected via the Internet, a WAN, or the like instead of the LAN 108. As another example, the image processing device 100, the plurality of cameras 120, and the analysis device 140 may be connected via the LAN 108 to form an image processing system, with the image processing system and the user terminals 130 connected via the Internet or the like.

FIG. 1B is a block diagram showing the configuration of the user terminal 130 as a display device according to the first embodiment. The user terminal 130 includes a CPU 131, a main memory 132, a storage unit 133, an input unit 134, a display unit 135, an external I/F unit 136, and a bus 137. The CPU 131 executes arithmetic processing and various programs. The main memory 132 provides the CPU 131 with the programs, data, and work areas required for processing. The storage unit 133 stores a program for generating and displaying virtual viewpoint video, various data required for GUI display, and the like. A non-volatile memory such as a hard disk or a silicon disk is used for the storage unit 133. The input unit 134 is a device such as a keyboard, a mouse, or a touch panel, and receives operation input from the user viewing the virtual viewpoint video. The display unit 135 displays the virtual viewpoint video and the GUI. The external I/F unit 136 connects to the LAN 108 and receives, for example, model data for playing back virtual viewpoint video transmitted from the image processing device 100. The bus 137 connects the above units and transfers data between them.

FIG. 2 is a diagram showing the arrangement of the plurality of cameras 120. A plurality of subjects 202 are present on a field 201 on which a soccer match is played, and the plurality of cameras 120 are arranged so as to surround the field 201. The cameras 120 are placed mainly in the spectator seats, and the focal length and shooting direction of each are set so that the field 201 fits within its view.

FIG. 3 is a flowchart showing the series of processes performed by the image processing device 100 up to transmission. In S301, the image processing device 100 acquires the video captured by the cameras 120 and, for each subject in the video, generates model data in a plurality of layers with different data sizes (described in detail with reference to FIG. 5A). The processing of S301 is described in detail with reference to FIG. 4. In S302, the image processing device 100 accepts the designation of the sport to be captured. Here, a sport name such as "soccer", "rugby", or "figure skating" is accepted. In S303, based on the sport type accepted in S302 and the data from the analysis device 140, the image processing device 100 generates attribute data that describes the layer of model data required for generating virtual viewpoint video. As described later with reference to FIGS. 6A and 6B, the attribute data associates attributes of the content in the video with the required layer. The processing of S303 is described later with reference to FIG. 7. In S304, the image processing device 100 selects, for each subject, the model data of the layer required by the attribute data to compose the model data for transmission, and transmits it in response to a request from the user terminal 130, which is the display device. The usage state of the communication bandwidth of the LAN 108 and the like are also taken into account when composing the model data for transmission. The processing of S304 is described later with reference to FIG. 8.

FIG. 4 is a flowchart showing the process of generating, for a plurality of layers, the model data required for generating interactive virtual viewpoint video, and shows the details of S301. In S401, the image processing device 100 (CPU 101) generates background model data for the stadium, the spectator seats, and so on. The background model data consists of mesh data that forms the three-dimensional shape and texture data for reproducing color, and is generated using a 3D laser scanner, a multi-view stereo method, or the like.

In S402, the CPU 101 sends the cameras 120 a camera setting change so that the exposure during shooting is appropriate, together with a signal to start shooting. In response to the start signal, the cameras 120 begin shooting and transfer video data to the image processing device 100 via the LAN 108. The image processing device 100 receives the video data from the cameras 120 and loads it into the main memory 102. The image processing device 100 manages the video data as multi-view frames, each of which groups the video frames of all cameras that share the same time code. At this time, the image processing device 100 also calculates the position and orientation of each camera using a method such as Structure from Motion, and stores them.
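
The grouping of camera frames into multi-view frames can be sketched as follows. This is a minimal illustration only: the record layout `(camera_id, timecode, image)`, the function name `group_by_timecode`, and the sample time codes are assumptions, not part of the described device.

```python
from collections import defaultdict


def group_by_timecode(frames):
    """Group (camera_id, timecode, image) records into multi-view frames.

    Returns a dict that maps each time code to {camera_id: image}.
    """
    multi_view = defaultdict(dict)
    for camera_id, timecode, image in frames:
        multi_view[timecode][camera_id] = image
    return dict(multi_view)


# Hypothetical input: two cameras, two time codes, images as placeholders.
frames = [
    (0, "00:00:00:01", "cam0_f1"),
    (1, "00:00:00:01", "cam1_f1"),
    (0, "00:00:00:02", "cam0_f2"),
    (1, "00:00:00:02", "cam1_f2"),
]
multi_view_frames = group_by_timecode(frames)
```

Each entry of `multi_view_frames` then holds every camera's view of the same instant, which is the unit on which the later shape estimation steps operate.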

In S403, the CPU 101 extracts the contours of the subjects from the video data and generates the three-dimensional shapes and positions of the subjects using a method such as visual hull. The contours of the subjects can be extracted by applying a median filter across all frames of the video from a single camera. The three-dimensional shape is output as point cloud data or voxel data. This processing is performed on the multi-view frames of all time codes, and shape point cloud data (shape data consisting of a high-density point cloud) is generated for all subjects for each multi-view frame. The generated shape point cloud data is stored in the storage unit 103.
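
One plausible reading of the median-filter step is a per-pixel temporal median that estimates the static background, followed by a difference against that background. The sketch below assumes this reading. The thresholding step, the threshold value, the grayscale list-of-lists image format, and the function names are all assumptions for illustration.

```python
from statistics import median


def median_background(frames):
    """Per-pixel temporal median over all frames of one camera."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def silhouette(frame, background, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]


# A bright subject (200) moves left to right over a dark field (10).
frames = [
    [[200, 10, 10]],
    [[10, 200, 10]],
    [[10, 10, 200]],
]
background = median_background(frames)
mask = silhouette(frames[1], background)
```

Because a moving subject covers any given pixel in only a minority of frames, the median recovers the field color there, and the mask isolates the subject in each frame. A visual hull method would then intersect such masks from several cameras.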

In S404, the CPU 101 thins out the shape point cloud data generated in S403 and performs meshing, in which the remaining points are connected to form faces (triangular polygons), thereby generating a mesh that represents the subject. Well-known techniques can be applied to the meshing; for example, a method such as Ball Pivoting can be used. The CPU 101 performs the meshing on all shape point cloud data generated for each multi-view frame, and stores the resulting data (low-density mesh data) in the storage unit 103. In S405, the CPU 101 generates the texture data (subject texture) to be applied to the mesh generated in S404. Well-known techniques can be applied to generating the subject texture. The CPU 101 generates textures for all mesh data generated for each multi-view frame, and stores the resulting data in the storage unit 103.
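
The text does not say how the point cloud is thinned. As one common option, the sketch below keeps a single point per cubic grid cell (voxel-grid downsampling). The method choice, the function name `thin_point_cloud`, and the cell size are assumptions, and the subsequent meshing (for example Ball Pivoting) is not shown.

```python
def thin_point_cloud(points, cell_size):
    """Keep the first point that falls into each cubic cell of side cell_size."""
    kept = {}
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        kept.setdefault(key, (x, y, z))
    return list(kept.values())


# Hypothetical dense cloud: two tight clusters of two points each.
dense = [
    (0.00, 0.00, 0.00),
    (0.01, 0.02, 0.00),
    (0.50, 0.00, 0.00),
    (0.52, 0.01, 0.00),
]
sparse = thin_point_cloud(dense, cell_size=0.25)
```

A larger `cell_size` removes more points and so produces a coarser mesh with a smaller data size.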

In S406, the CPU 101 reads from the storage unit 103 the mesh and texture generated from the multi-view frame corresponding to the time code at the start of the video (the initial frame), and loads them into the main memory 102. In S407, the CPU 101 embeds bones in the mesh read in S406 in order to control the posture of the mesh. As shown in FIG. 5B(b), the bones have a structure like a human skeleton, with joints 502 and a framework 503 connecting the joints 502, and are placed inside the mesh 501. The bones are prepared in advance and can be fitted to various meshes by changing their size and initial joint positions to match the mesh. Because the mesh 501 and the bones deform together, moving the positions of the joints 502 makes the mesh 501 reproduce various postures and motions (sitting, running, kicking, etc.). In addition, because the movement of each joint 502 is constrained by the framework 503, more human-like motion can be reproduced.

In S408, the CPU 101 estimates the joint positions of all subjects using the camera video. Well-known techniques can be applied to the joint position estimation. For example, machine learning is used to obtain the two-dimensional joint positions (x(n,i,k,t), y(n,i,k,t)) in the image, where 0 ≤ x < image width, 0 ≤ y < image height, 0 ≤ n < number of cameras, 0 ≤ i < number of subjects, 0 ≤ k < number of joints, and 0 ≤ t < number of frames. After the two-dimensional joint positions have been obtained from the video of at least two cameras, the three-dimensional joint positions (X(i,k,t), Y(i,k,t), Z(i,k,t)) are obtained by triangulation based on the camera positions determined in S402. Here, X, Y, and Z are coordinate values in three-dimensional space, with 0 ≤ i < number of subjects, 0 ≤ k < number of joints, and 0 ≤ t < number of frames. This yields the movement trajectory of each joint of each subject, that is, the transition of the subject's posture.
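
The triangulation step can be illustrated with the midpoint method, which is one of several standard ways to triangulate and is not stated to be the one used here. The sketch assumes that each 2D joint position has already been back-projected into a viewing ray (camera center plus direction) using the camera pose; the function names and sample coordinates are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(p * q for p, q in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the joint cannot be triangulated")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * v for o, v in zip(c1, d1))
    p2 = tuple(o + t * v for o, v in zip(c2, d2))
    return tuple((u + v) / 2 for u, v in zip(p1, p2))


# Two cameras at (0, 0, 0) and (4, 0, 0) both observe a joint at (1, 1, 5).
joint = triangulate_midpoint((0, 0, 0), (1, 1, 5), (4, 0, 0), (-3, 1, 5))
```

With noisy 2D detections the two rays do not intersect exactly, and the midpoint gives a reasonable estimate of (X(i,k,t), Y(i,k,t), Z(i,k,t)). With more than two cameras, a least-squares formulation would normally be preferred.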

In S409, the CPU 101 associates the joint positions of the bones generated in S407 with the three-dimensional joint positions generated in S408 for each multi-view frame, and animates the postures of the mesh and bones. As a result, the mesh and texture need to be prepared only for the multi-view frame at the start of the video, and only a small amount of animation data representing the joint trajectories needs to be added. This effectively compresses the model data along the time axis and greatly reduces the amount of data.

In S410, the CPU 101 holds the model data generated in S403 to S409 as the hierarchical structure shown in Table 5a of FIG. 5A. In Table 5a, the hierarchy is divided into three layers, each consisting of a three-dimensional shape and a texture. Layer 3 is the highest layer. It contains the point cloud data generated in S403 as the three-dimensional shape and the camera video acquired in S402 as the texture; it has the largest data volume and yields the highest image quality in the generated virtual viewpoint video. Layer 2 contains the mesh data generated in S404 as the three-dimensional shape and the texture data generated in S405 as the texture; both its data volume and the image quality of the generated virtual viewpoint video are intermediate. Layer 1 is the lowest layer in the present embodiment. It contains the mesh data acquired in S406 and the animation data acquired in S409 as the three-dimensional shape, and the texture data acquired in S406 as the texture. The model data of layer 1 has the smallest data volume, but yields the lowest image quality in the virtual viewpoint video. The items that can be reproduced at each layer are summarized in Table 5b of FIG. 5B(a). Because the number of items that can be represented decreases as the layer goes down, an appropriate layer must be selected according to the content when transmitting data.
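
The three-layer structure described above could be held in a small lookup table like the following sketch. The class name `Layer`, the string descriptions, and the clamping behavior of `select_layer` are illustrative assumptions; Table 5a itself is not reproduced in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    level: int
    shape: str
    texture: str


# Contents follow the description of S410; wording is paraphrased.
LAYERS = {
    3: Layer(3, "point cloud (S403)", "camera video (S402)"),
    2: Layer(2, "mesh (S404)", "texture (S405)"),
    1: Layer(1, "initial-frame mesh (S406) + animation (S409)",
             "initial-frame texture (S406)"),
}


def select_layer(required_level):
    """Return the layer for a required level, clamped to the available range."""
    level = max(min(LAYERS), min(required_level, max(LAYERS)))
    return LAYERS[level]


highest = select_layer(3)
lowest = select_layer(0)  # below the range, so clamped to layer 1
```

Keeping the layers keyed by level makes the later per-subject selection a simple lookup once the required layer is known from the attribute data.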

In S411, if the analysis device 140 is present, the CPU 101 links the model data with the play information of the subjects (play content such as a shot, pass, or clearance), which is the analysis data acquired from the analysis device 140. This makes it possible, for example, to extract the three-dimensional shape and texture data of a desired layer at the time of a shot. With the above processing, generation of the hierarchically structured model data is complete.

FIGS. 6A and 6B are diagrams explaining the attribute data required for compressing the model data. In the present embodiment, there are three types of attribute data: the competition attribute, the area attribute, and the match attribute. The magnitude of the compression effect is in the order match attribute > area attribute > competition attribute. Each type of attribute data describes the layer of model data required for generating virtual viewpoint video (the required layer), and the required layer is classified in progressively finer detail in the order competition attribute, area attribute, match attribute.

In the competition attribute, as shown in Table 6a of FIG. 6A(a), the required layer is described for each type of sport, the sport type being the attribute of the content. For example, in American football, the players wear helmets that hide their faces, so the required layer for texture is low. In figure skating and soccer, on the other hand, viewers want to see the players' faces and expressions clearly, so the required layer for texture is high. As for three-dimensional shape, in American football and soccer the players' positions are what matter and there is little need for detailed shapes or smooth motion, so the required layer is low. In figure skating, the movement during the performance is important, so the required layer for three-dimensional shape is high. In this way, the required layer is determined separately for the three-dimensional shape and the texture of each sport, and the higher of the two is defined as the required layer for that sport.
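
The rule that the higher of the shape and texture requirements becomes the sport's required layer can be sketched as below. The numeric levels are hypothetical stand-ins chosen to match the qualitative "low" and "high" descriptions; the actual values of Table 6a are not given in this text.

```python
# Hypothetical per-sport requirements: (shape level, texture level).
SPORT_REQUIREMENTS = {
    "american_football": (1, 1),  # position matters; faces hidden by helmets
    "soccer": (1, 3),             # position matters; faces should be visible
    "figure_skating": (3, 3),     # motion and expression both matter
}


def required_layer_for_sport(sport):
    """The required layer is the higher of the shape and texture levels."""
    shape_level, texture_level = SPORT_REQUIREMENTS[sport]
    return max(shape_level, texture_level)


soccer_layer = required_layer_for_sport("soccer")
```

Taking the maximum means a sport is never served a layer that is too coarse for either its shape or its texture needs.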

In the area attribute, as shown in Table 6c of FIG. 6A(c), the required layer is described for each area (a part of the playing field) of the venue for each sport, the area being the attribute of the content. For example, in soccer, as shown in FIG. 6A(b), areas 0 and 1 in front of the goals have a high rate of noteworthy plays, so their required layer is the highest. Area 2, where corner kicks and the like occur, has the next highest required layer, and the regions other than areas 0, 1, and 2 have a low required layer. Although FIG. 6A(b) shows area 2 in only one place, area 2 is in fact set at all four corners of the field. Based on the above, the region information of each area and the corresponding required layer are described as in Table 6c of FIG. 6A(c), and this serves as the area attribute for soccer. On the other hand, for sports such as figure skating, in which the region where the subject performs noteworthy play cannot be narrowed down, no area attribute is defined.
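
An area attribute of this kind amounts to a list of rectangles with a required layer each, plus a default for everything else. The sketch below assumes a 105 m by 68 m field, hypothetical rectangle coordinates in metres, and hypothetical layer values; none of these numbers come from Table 6c.

```python
# (name, (x0, y0), (x1, y1), required layer); all coordinates are hypothetical.
AREAS = [
    ("area0", (0, 20), (16, 48), 3),      # in front of one goal
    ("area1", (89, 20), (105, 48), 3),    # in front of the other goal
    ("area2", (0, 0), (5, 5), 2),         # the four corners
    ("area2", (100, 0), (105, 5), 2),
    ("area2", (0, 63), (5, 68), 2),
    ("area2", (100, 63), (105, 68), 2),
]
DEFAULT_LAYER = 1  # everywhere else on the field


def required_layer_at(x, y):
    """Return the required layer for a subject standing at field position (x, y)."""
    for _name, (x0, y0), (x1, y1), layer in AREAS:
        if x0 <= x <= x1 and y0 <= y <= y1:
            return layer
    return DEFAULT_LAYER


goal_mouth = required_layer_at(10, 30)
midfield = required_layer_at(50, 34)
```

The subject's position obtained during shape estimation can be passed to `required_layer_at` to decide which layer of that subject's model data to send.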

In the match attribute, as shown in Table 6d of FIG. 6B, area and time code are used as the attributes of the content, and the required layer is described for each combination of area and time code. For example, in the case of soccer, the analysis device 140 can report what kind of play (a shot, etc.) occurred at which time code. Compression efficiency can therefore be improved by measures such as raising the required layer for the time codes (periods) in which a noteworthy play occurred and lowering it for the other time codes. Because the analysis device 140 is indispensable for generating the match attribute, the match attribute cannot be defined if the analysis device 140 is not connected.
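
A match attribute can be pictured as the area attribute extended with time code ranges. In the sketch below, time codes are simplified to frame numbers, and the highlight entries, area names, and layer values are hypothetical; Table 6d is not reproduced in this text.

```python
# (start_frame, end_frame, area_name, required layer); entries are hypothetical.
HIGHLIGHTS = [
    (1200, 1500, "area0", 3),  # e.g. a shot in front of the goal
    (4000, 4200, "area2", 2),  # e.g. a corner kick
]


def required_layer(frame, area_name, default=1):
    """Raise the required layer only during noteworthy plays in a given area."""
    for start, end, area, layer in HIGHLIGHTS:
        if start <= frame <= end and area == area_name:
            return layer
    return default


during_shot = required_layer(1300, "area0")
quiet_period = required_layer(2000, "area0")
```

Because the high layers apply only within the listed periods, most of the video is served at the default layer, which is where the larger compression effect of the match attribute comes from.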

FIG. 7 is a flowchart of the process of generating the attribute data and shows S303 in detail. In S701, the CPU 101 determines whether an area attribute is defined for the content. If YES, the process proceeds to S702; if NO, it proceeds to S704. In S702, the CPU 101 determines whether a match attribute is defined for the content. If YES, the process proceeds to S703; if NO, it proceeds to S705. When the match attribute exists (YES in both S701 and S702), the CPU 101 selects the match attribute as the attribute data in S703. When the area attribute does not exist (NO in S701), the CPU 101 selects the competition attribute as the attribute data in S704. When the area attribute exists but the match attribute does not (YES in S701, NO in S702), the CPU 101 selects the area attribute as the attribute data in S705. In S706, attribute data such as Tables 6a, 6c, and 6d shown in FIGS. 6A and 6B is generated based on the selected attribute. For example, when the content defines, as area attributes, coordinates indicating the ranges of areas 0, 1, and 2 (for example, (x0, y0) to (x1, y1)), the CPU 101 uses them to generate attribute data such as Table 6c. When the content includes, in addition to the area attribute, the time codes at which plays attracting high attention occurred, the CPU 101 generates attribute data such as Table 6d.
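The branching of S701 to S705 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name, the two boolean parameters, and the returned strings are hypothetical.

```python
def select_attribute_type(has_area_attribute: bool, has_match_attribute: bool) -> str:
    """Follow the branches S701 to S705 of FIG. 7 (all names are hypothetical)."""
    if not has_area_attribute:
        # S701: NO -> S704, fall back to the competition attribute.
        return "competition"
    if has_match_attribute:
        # S702: YES -> S703, the match attribute is selected.
        return "match"
    # S702: NO -> S705, only the area attribute is available.
    return "area"
```

Note that, as in the flowchart, the match attribute is only examined when an area attribute is defined.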

FIG. 8 is a flowchart of the process of transmitting model data to the display device and shows S304 in detail. In S801, the CPU 101 places the image processing device 100, which acts as the server, in a state of waiting for a request from the display device (user terminal 130), which acts as the client. In S802, the user terminal 130 (CPU 131) starts a video playback application in response to a predetermined operation performed on the user terminal 130 by the user who views the virtual viewpoint video. In S803, the user terminal 130 (CPU 131) selects the content that the user wants to view. The application is used for this selection. After the video playback application starts, the user terminal 130 (CPU 131) displays a content selection window 901 such as that shown in FIG. 9 on the display unit 135. The user selects the desired content by touching the corresponding icon 902. When the content is selected, the user terminal 130 (CPU 131) sends the image processing device 100 a request to download the model data. At that time, the user terminal 130 (CPU 131) also sends the image processing device 100 the display resolution of its display device (display unit 135) and the specification information of the CPU 131 and the GPU.

In order to display the content selection window 901, the user terminal 130 acquires in advance a list of selectable contents from the image processing device 100. Each content on the list corresponds to one group of temporally continuous multi-viewpoint frames. For example, based on the play content (the analysis result from the analyzer 140), one content may be generated from a series of multi-viewpoint frames that includes the time codes around the moment the play occurred. For example, as described for step S411, the model data associated with each piece of play information may be treated as one content. Alternatively, separate contents may be generated for the multi-viewpoint frames of the first half of a match and for those of the second half. Each content may also define a position and orientation (direction) of the virtual camera that are set automatically based on the play content and the position where the play occurred.

When the image processing device 100 receives, in S801, a request from the user terminal 130 for the content to be transmitted, it determines the layer to be transmitted through the processing from S804 onward and transmits the model data of the determined layer. First, in S804, the CPU 101 of the image processing device 100 acquires the availability of the communication line. In S805, the CPU 101 sets the spec layer of the model data from the specification information received from the user terminal 130. For example, if the CPU or GPU is low-end, the terminal cannot process the model data of layer 3 or layer 2, which impose a high processing load, so the spec layer is set to layer 1. If the display resolution is low, differences between layers are hard to perceive, so the spec layer is set to layer 2 or lower (that is, layer 1 or layer 2). In S806, the CPU 101 determines whether the spec layer set in S805 is layer 1. If it is layer 1, the process proceeds to S811; otherwise, the process proceeds to S807.
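The spec layer setting of S805 can be sketched as follows. This is an illustration only: the function name and the two boolean inputs are hypothetical, the text does not specify how "low-end" or "low resolution" is judged, and for a low-resolution display the sketch simply uses layer 2 as the upper bound.

```python
def set_spec_layer(processor_is_low_end: bool, display_is_low_resolution: bool) -> int:
    """Return the highest layer the client is assumed to handle (S805)."""
    if processor_is_low_end:
        # Layers 2 and 3 are too heavy for a low-end CPU or GPU.
        return 1
    if display_is_low_resolution:
        # Differences between layers are hard to see, so cap at layer 2.
        return 2
    # Otherwise every layer (1 to 3) may be sent.
    return 3
```

A result of 1 corresponds to the branch in S806 that skips directly to S811.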

In S807, the CPU 101 generates transmission model data using the attribute data generated in S303. As shown in FIG. 11, the transmission model data is generated for each time code. FIG. 11(a) shows the transmission data for time code 0. It consists of a header describing the data structure, background model data, and subject model data, and for each subject (each player) it holds the data of the layer corresponding to the requested layer. However, every subject always holds the model data of layer 1, the lowest layer, because that data is used when the virtual viewpoint video is generated, as described later. FIG. 11(b) shows the transmission data for time code 1. The background model data is omitted because it would duplicate the data already sent. The layer of each subject's model data is also changed in accordance with the attribute data. The transmission model data is obtained by concatenating these pieces of data for all time codes. At this time, if a requested layer is higher than the spec layer, the requested layer is lowered to the spec layer. In this way, the layers of the model data constituting the transmission model data are limited based on the capability of the display device (the display unit 135 of the user terminal 130).
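The assembly of the per-time-code transmission data can be sketched as follows. This is a minimal sketch, not the data format of FIG. 11: the function name and the dictionary layout are hypothetical, and it assumes that the background model is included only for time code 0, whereas the text only states that it is omitted for time code 1.

```python
def build_transmission_data(requested_layers_per_timecode, spec_layer):
    """requested_layers_per_timecode: list of {subject: requested layer} dicts."""
    frames = []
    for timecode, requested in enumerate(requested_layers_per_timecode):
        subjects = {}
        for subject, layer in requested.items():
            # A requested layer above the spec layer is lowered to the spec layer.
            capped = min(layer, spec_layer)
            # Layer 1 is always held; the capped layer is added when it is higher.
            subjects[subject] = sorted({1, capped})
        frames.append({
            "timecode": timecode,
            # Assumption: the background is sent once and omitted afterwards.
            "has_background": timecode == 0,
            "subjects": subjects,
        })
    return frames
```

For example, with a spec layer of 2, a subject whose requested layer is 3 would carry layers 1 and 2.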

In S808, the CPU 101 determines whether the transmission model data can be transmitted, based on the availability of the communication line acquired in S804 and the size of the transmission model data generated in S807. If transmission is judged possible (YES), the process proceeds to S814; if not (NO), the process proceeds to S809. In S809, the CPU 101 lowers each requested layer described in the attribute data by one level and generates the transmission model data again. For example, in Table 6c of FIG. 6A(c), the requested layer of area 0 is lowered from 3 to 2 and that of area 2 from 2 to 1. A requested layer of 1 is not lowered any further. In S810, the CPU 101 determines whether the transmission model data can be transmitted, based on the availability of the communication line acquired in S804 and the size of the transmission model data generated in S809. If transmission is possible, the process proceeds to S814; otherwise, the process proceeds to S811. In S811, the CPU 101 sets all requested layers to 1 and generates the transmission model data. In S812, the CPU 101 determines whether the transmission model data can be transmitted, based on the availability of the communication line acquired in S804 and the size of the transmission model data generated in S811. If YES, the process proceeds to S814; if NO, the process proceeds to S813. In S813, the CPU 101 waits until capacity becomes available on the communication line (until another user completes communication). In S814, the transmission model data is transmitted from the image processing device 100 to the display device (user terminal 130).
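The fallback sequence of S808 to S813 can be sketched as follows. This is an illustration under assumptions: `choose_requested_layers`, `size_of`, and `capacity` are hypothetical names, the way the data size is compared with the line availability is not specified in the text, and the shortcut from S806 to S811 for a spec layer of 1 is not modeled.

```python
def choose_requested_layers(requested, size_of, capacity):
    """Return the layers to send, or None when the server has to wait (S813).

    requested: {area: requested layer}
    size_of:   callable giving the data size for a set of requested layers
    capacity:  size that the communication line can currently carry
    """
    candidates = [
        dict(requested),                                     # S807 / S808
        {k: max(1, v - 1) for k, v in requested.items()},    # S809 / S810
        {k: 1 for k in requested},                           # S811 / S812
    ]
    for layers in candidates:
        if size_of(layers) <= capacity:
            return layers  # S814: this version can be transmitted
    return None            # S813: wait for free capacity
```

Trying the candidates in this order means that quality is reduced only as far as the line requires.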

In S815, the user terminal 130 (CPU 131) receives the model data. In S816, the CPU 131 generates a virtual viewpoint video using the received model data and plays it on the display unit 135. When the CPU 131, which is executing the application on the user terminal 130, receives the model data from the image processing device 100, it switches to a virtual viewpoint window 903 such as that shown in FIG. 9(b). The virtual viewpoint window 903 displays player model data 904 and background model data 905, and the video can be displayed from any camera position, direction, and angle of view in response to touch operations on the screen. A time code slider bar 906 can also be used to move to the video of any time code. The generation of the virtual viewpoint video is described below with reference to FIG. 10.

FIG. 10 is a flowchart of the virtual viewpoint video generation performed by the user terminal 130 and shows S816 in detail. In S1001, the CPU 131 sets the position, direction, and angle of view of the virtual camera in accordance with the user's touch operation. In S1002, the CPU 131 uses the model data of layer 1 to generate an image at the set position, direction, and angle of view of the virtual camera (image 1). The image can be generated using well-known computer graphics techniques.

In S1003, the CPU 131 determines whether the transmitted model data contains model data of layer 2. If YES, the process proceeds to S1004; if NO, the process proceeds to S1005. In S1004, the CPU 131 uses the model data of layer 2 to generate an image at the set position, direction, and angle of view of the virtual camera (image 2). In S1005, the CPU 131 determines whether the transmitted model data contains model data of layer 3. If YES, the process proceeds to S1006; if NO, the process proceeds to S1007. In S1006, the CPU 131 uses the model data of layer 3 to generate an image at the set position, direction, and angle of view of the virtual camera (image 3).

In S1007, the CPU 131 determines whether the layer of a player model differs between consecutive time codes (the previous time code and the current time code). An example is a case in which player 2 has layer 1 at time code 0 but layer 3 at time code 1. If there is a difference, the process proceeds to S1008; if not, the process proceeds to S1009. In S1008, the CPU 131 generates the subject image by combining (for example, alpha blending) image 1 with images 2 and 3. This prevents the image quality from changing abruptly when the layer differs between time codes. In S1009, on the other hand, the CPU 131 generates the subject image by replacing the subject region of image 1 with the higher-quality image 2 or image 3. In S1010, the CPU 131 renders the background model to generate a background image. In S1011, the CPU 131 combines the subject image and the background image to generate the virtual viewpoint video. Note that in S1007, a case in which player 2 has layer 2 at time code 0 but layer 1 at time code 1 (a case in which the layer drops between consecutive time codes) is judged as having no difference. In such a case the importance of the subject has decreased, so an abrupt change in image quality poses no problem. In the above description, images are combined when the image changes from layer 1 to layer 2 or 3 and are not combined in other cases, but the present invention is not limited to this. For example, whenever the layer of a subject's model data changes, the subject image of the layer before the change may be combined with the subject image of the layer after the change.
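The choice between blending (S1008) and replacement (S1009) can be sketched as follows. This is a simplified illustration, not the renderer of the embodiment: images are represented as flat lists of pixel values, the function name and the fixed blending weight `alpha` are hypothetical, and returning image 1 when no higher-layer image exists is an assumption.

```python
def compose_subject(image1, high_image, prev_layer, curr_layer, alpha=0.5):
    """Compose the subject image for one subject at the current time code.

    image1:     layer-1 rendering (always available)
    high_image: layer-2 or layer-3 rendering, or None if not transmitted
    """
    if high_image is None:
        # Only layer 1 was transmitted (assumed behaviour).
        return list(image1)
    if curr_layer > prev_layer:
        # S1008: the layer rose between time codes, so blend to soften the change.
        return [(1 - alpha) * a + alpha * b for a, b in zip(image1, high_image)]
    # S1009: no rise in layer, so use the higher-quality image as is.
    return list(high_image)
```

A drop in layer is deliberately treated like "no difference", matching the judgement described for S1007.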

As described above, according to the image processing device of the first embodiment, three-dimensional model data is generated for each subject in a plurality of layers, and the importance of each subject is determined based on characteristics of the content, such as the competition and the analysis results of the actual match. Because the layers of the model data constituting the transmission model data are set according to that importance, model data for a virtual viewpoint video that supports interactive operation can be generated appropriately and transmitted efficiently.

<Second Embodiment>
The second embodiment describes a configuration in which the attribute data is modified based on the subject's degree of involvement in the competition, the subject's degree of attention, and the user's preference. Descriptions of systems and processes that overlap with the first embodiment are omitted.

In the first embodiment, the importance of each subject's model data is judged from the type of competition, the area, and events such as important plays, and the layer to be used is determined accordingly. In the second embodiment, the requested layer is further changed using the degree of attention of the player who is the subject (for example, whether the player is famous), the user's preference (for example, whether the player is a favorite), and the degree of involvement in the competition (for example, the distance from the ball). In FIG. 12(a), Table 12a shows an example of changing layers based on the degree of attention, the preference, and the degree of involvement. The degree of involvement, which is the distance between the ball and the player, is acquired automatically by the CPU 101 analyzing the video data. The degree of attention and the preference are set by the user through a predetermined user interface on the user terminal 130. The user's settings are notified from the user terminal 130 to the image processing device 100 by communication. The requested layer column of Table 12a is an example of the layer of each subject's model data determined based on the attribute data. The layers listed in the S1201 and S1203 columns of Table 12a are the layers after the changes made in S1201 and S1203 of the flowchart of FIG. 12(b), respectively, based on the degree of attention, the preference, and the degree of involvement.

FIG. 12(b) is a flowchart of the layer change process performed for each subject (player) after the attribute data is generated in S303 of FIG. 3. In S1201, the CPU 101 of the image processing device 100 changes the layer of each player based on the degree of attention and the preference in Table 12a. The layers are changed according to preset rules: for example, the requested layer is lowered by one for a player whose degree of attention and preference are both low, and raised to the highest layer for a player whose degree of attention and preference are both high. In this example, player N had a requested layer of 1, but the layer is raised to 3 because the degree of attention and the preference are high. Player 1, on the other hand, had a requested layer of 2, but the layer is lowered to 1 because the degree of attention and the preference are low.

In S1202, the CPU 101 determines whether the size of the transmission model data has increased as a result of the change. If YES, the process proceeds to S1203; if NO, the process ends. In S1203, to reduce the size of the transmission model data, layers are lowered according to the degree of involvement (the distance from the ball) in Table 12a. For example, player 2 was at layer 3 after S1201, but because the player is far from the ball, the degree of involvement in the competition is judged to be low and the layer is lowered to 2.
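The flow of S1201 to S1203 can be sketched as follows. This is a minimal sketch under assumptions: the field names, the per-layer size table `layer_size`, and the threshold `far_distance` are hypothetical, and lowering a distant player by exactly one layer in S1203 is one possible reading of the rule.

```python
MAX_LAYER = 3  # the text refers to layers 1 to 3


def adjust_layers_by_player(players, layer_size, far_distance):
    """players: list of dicts with 'layer', 'attention_high',
    'preference_high', and 'ball_distance' keys."""
    original_size = sum(layer_size[p["layer"]] for p in players)
    adjusted = []
    for p in players:  # S1201: attention and preference
        layer = p["layer"]
        if p["attention_high"] and p["preference_high"]:
            layer = MAX_LAYER
        elif not p["attention_high"] and not p["preference_high"]:
            layer = max(1, layer - 1)
        adjusted.append(layer)
    new_size = sum(layer_size[layer] for layer in adjusted)
    if new_size > original_size:  # S1202: did the data size grow?
        # S1203: lower players with low involvement (far from the ball).
        adjusted = [
            max(1, layer - 1) if p["ball_distance"] > far_distance else layer
            for p, layer in zip(players, adjusted)
        ]
    return adjusted
```

With suitable inputs this reproduces the changes described for player 1, player 2, and player N.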

As described above, according to the second embodiment, attributes of the individual subjects, such as the subject's degree of attention, the user's preference, and the degree of involvement in the competition, are taken into account when the layer to be used is selected from the model data of the plurality of layers generated for each subject. As a result, more appropriate transmission model data can be generated, and model data for a virtual viewpoint video that supports interactive operation can be transmitted efficiently. The degree of involvement, the degree of attention, and the preference are given above as examples of attributes of individual subjects, but the attributes are not limited to these. In addition, although S1201 considers both the degree of attention and the preference, only one of them may be considered.

<Third Embodiment>
The second embodiment describes a configuration in which the requested layer is changed based on attributes set for individual subjects. The third embodiment describes a configuration in which, when three-dimensional model data is transmitted as a stream, the requested layer is changed according to the position, orientation, and angle of view of the virtual camera in order to optimize the transmission model data. Descriptions of systems and processes that overlap with the first and second embodiments are omitted.

In stream transmission, the image processing device 100 transmits the model data of time code M, and the display device receives and plays it. The user terminal 130 then feeds back the position and orientation of the virtual camera, and based on that feedback the image processing device 100 transmits the model data of time code M+1. By repeating this process in sequence, the display device can play an interactive virtual viewpoint video without waiting for all of the data to be received. In doing so, a more suitable layer can be selected by taking into account the virtual camera position and orientation of the previous time code.

FIG. 13(a) shows the position, orientation, and angle of view of the virtual camera at a certain time code M. Players inside the angle of view of the virtual camera, or close to it, are likely to appear in the video at the next time code as well. Players far outside the angle of view, on the other hand, are unlikely to appear in the video. In addition, when a player is far from the virtual camera, differences in image quality between layers are unlikely to be noticeable. These items are summarized in Table 13b of FIG. 13(b). Table 13b shows an example in which the requested layer set based on the attribute data is changed by the process of FIG. 13(c) (steps S1301 and S1303).
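The two quantities summarized in Table 13b, namely the relation of a player to the angle of view and the distance from the virtual camera, could be computed as in the sketch below. This is only an illustration under strong assumptions: it works in a two-dimensional top-down plane, and the function name, the `margin_deg` tolerance that defines "close to the angle of view", and the returned labels are hypothetical.

```python
import math


def classify_player(camera_pos, camera_dir_deg, fov_deg, player_pos, margin_deg=15.0):
    """Return ("inside" | "near" | "outside", distance) for one player."""
    dx = player_pos[0] - camera_pos[0]
    dy = player_pos[1] - camera_pos[1]
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest absolute angle between the camera direction and the player.
    offset = abs((bearing - camera_dir_deg + 180.0) % 360.0 - 180.0)
    if offset <= fov_deg / 2:
        status = "inside"
    elif offset <= fov_deg / 2 + margin_deg:
        status = "near"
    else:
        status = "outside"
    return status, distance
```

The three labels correspond to the marks used for the angle-of-view item in the layer adjustment of FIG. 13(c).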

FIG. 13(c) is a flowchart of the process of adjusting layers during streaming. In S1301, the CPU 101 changes the requested layer using two items: whether each player is inside the angle of view of the camera (○), outside it (×), or in a region close to it (△), and the distance between the virtual camera and the player. For example, player N had a requested layer of 1, but because the player is close to the angle of view and near the virtual camera, the layer is raised to 3. Player 2, on the other hand, had a requested layer of 3, but because the player is far from the virtual camera, the layer is lowered to 2.

In S1302, the CPU 101 determines whether the size of the transmission data has increased as a result of the change. If YES, the process proceeds to S1303; if NO, the process ends. In S1303, to reduce the size, the CPU 101 lowers layers according to the moving speed of the virtual camera and the distance between the virtual camera and each player. The moving speed of the virtual camera is calculated from the amount of change in position and orientation over the previous frame and the frames before it. Consider, for example, a case in which the virtual camera is moving fast. Player 2 was at layer 2 after S1301, but because the player is far from the virtual camera, the player moves across the video at a considerably high speed. The difference in image quality between layers is therefore judged to be negligible, and the layer is lowered to 1.
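The adjustment of S1301 to S1303 can be sketched as follows. This is a minimal sketch, not the rule set of Table 13b: the field names, the size table `layer_size`, the threshold `far_distance`, and the boolean `camera_is_fast` are hypothetical, and the specific raise and lower rules are one reading chosen to match the examples of player N and player 2.

```python
MAX_LAYER = 3  # the text refers to layers 1 to 3


def adjust_layers_for_camera(players, layer_size, far_distance, camera_is_fast):
    """players: list of dicts with 'layer', 'view' ("inside", "near",
    "outside"), and 'distance' (from the virtual camera) keys."""
    original_size = sum(layer_size[p["layer"]] for p in players)
    adjusted = []
    for p in players:  # S1301: angle of view and camera distance
        layer = p["layer"]
        if p["distance"] > far_distance:
            layer = max(1, layer - 1)
        elif p["view"] in ("inside", "near"):
            layer = MAX_LAYER
        adjusted.append(layer)
    new_size = sum(layer_size[layer] for layer in adjusted)
    if new_size > original_size and camera_is_fast:  # S1302, then S1303
        # Distant players sweep quickly across the image of a fast camera.
        adjusted = [
            max(1, layer - 1) if p["distance"] > far_distance else layer
            for p, layer in zip(players, adjusted)
        ]
    return adjusted
```

In practice `camera_is_fast` would be derived from the change in camera position and orientation over preceding frames, as the text describes.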

As described above, according to the third embodiment, the layer of each subject is changed based on the position, orientation, and angle of view of the virtual camera specified on the user terminal, so transmission model data appropriate to the state of the virtual camera can be generated. The above embodiments mainly describe examples in which model data of a plurality of layers with different data sizes is generated for each subject. However, model data of a plurality of layers may be generated for only one or more specific subjects among the plurality of subjects.

<Other Embodiments>
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

100: image processing device, 101: CPU, 102: main memory, 103: storage unit, 104: input unit, 105: display unit, 106: external I/F unit, 108: LAN, 120: camera, 130: user terminal, 140: analyzer

A transmission device according to one aspect of the present invention has the following configuration. That is, the transmission device includes:
transmission means for transmitting, to another device, three-dimensional shape data used for generating a virtual viewpoint image based on a plurality of captured images obtained by a plurality of photographing devices photographing a subject, the three-dimensional shape data comprising three-dimensional shape data of a subject, which represents the three-dimensional shape of a specific subject, and three-dimensional shape data of a background, which represents the three-dimensional shape of a background corresponding to an object different from the specific subject; and
control means for performing control so that the three-dimensional shape data of the background is transmitted by the transmission means less frequently than the three-dimensional shape data of the specific subject.

Claims

A transmission device comprising:
determination means for determining three-dimensional shape data to be used for generating a virtual viewpoint image from among a plurality of pieces of three-dimensional shape data that are generated based on a plurality of captured images obtained by photographing performed by a plurality of photographing devices and that include three-dimensional shape data represented by a point cloud or voxels and three-dimensional shape data represented by a mesh; and
transmission means for transmitting the three-dimensional shape data determined by the determination means to another device.

The transmission device according to claim 1, wherein the determination means determines the three-dimensional shape data to be used for generating the virtual viewpoint image based on a type of photographing target photographed by the plurality of photographing devices.

The transmission device according to claim 2, wherein the type of the photographing target includes a type of competition photographed by the plurality of photographing devices.

The transmission device according to claim 2 or 3, wherein the type of the photographing target includes a type of area photographed by the plurality of photographing devices.

The transmission device according to any one of claims 2 to 4, wherein the type of the photographing target includes a type of play photographed by the plurality of photographing devices.

The transmission device according to any one of claims 2 to 5, wherein the determination means:
determines, when a first photographing target is photographed by the plurality of photographing devices, three-dimensional shape data represented by a point cloud or voxels as the three-dimensional shape data to be used for generating a virtual viewpoint image of the first photographing target; and
determines, when a second photographing target different from the first photographing target is photographed by the plurality of photographing devices, three-dimensional shape data represented by a mesh as the three-dimensional shape data to be used for generating a virtual viewpoint image of the second photographing target.

The transmission device according to any one of claims 2 to 6, wherein the type of the photographing target is specified based on information acquired from the other device.

The transmission device according to any one of claims 2 to 7, wherein the transmission means transmits, to the other device, information for causing a display of the other device to show a display for specifying the type of the imaging target.

The transmission device according to any one of claims 1 to 8, wherein the determination means determines the three-dimensional shape data to be used for generating a virtual viewpoint image based on the time code of the virtual viewpoint image to be generated.

The transmission device according to any one of claims 1 to 9, wherein the determination means determines the three-dimensional shape data to be used for generating a virtual viewpoint image based on the available capacity of a communication line connecting the transmission device and the other device.

The transmission device according to any one of claims 1 to 10, wherein the determination means determines the three-dimensional shape data to be used for generating a virtual viewpoint image based on at least one of the capability of a processor and the capability of a display of the other device.
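
As one possible reading of a determination based on the available line capacity and the capability of the other device, the following Python sketch filters candidates by both constraints. Everything in it is an assumption made for illustration: the `Candidate` fields, the idea of a numeric processor score, and the policy of preferring the feasible candidate with the highest data rate are not stated in the claims.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One piece of three-dimensional shape data that could be transmitted."""
    name: str                 # e.g. "point_cloud", "voxel", "mesh"
    bits_per_second: float    # hypothetical data rate needed to stream it
    min_processor_score: int  # hypothetical capability needed to render it


def determine_by_link_and_client(
    candidates: Sequence[Candidate],
    available_bps: float,
    processor_score: int,
) -> Optional[Candidate]:
    """Pick the highest-rate candidate that fits the line and the client.

    Returns None when no candidate satisfies both constraints.
    """
    feasible = [
        c for c in candidates
        if c.bits_per_second <= available_bps
        and c.min_processor_score <= processor_score
    ]
    if not feasible:
        return None
    # Assumed policy: a higher data rate stands in for richer shape data.
    return max(feasible, key=lambda c: c.bits_per_second)
```

Returning `None` rather than raising leaves the fallback behavior (for example, waiting or notifying the other device) to the caller, since the claims do not specify it.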

The transmission device according to any one of claims 1 to 11, wherein the determination means determines the three-dimensional shape data to be used for generating a virtual viewpoint image based on the preferences of a user who uses the other device.

The transmission device according to any one of claims 1 to 12, wherein the determination means determines the three-dimensional shape data to be used for generating a virtual viewpoint image based on at least one of: a degree of attention to an object that is represented by three-dimensional shape data and is captured by the plurality of image capturing devices; and a degree of involvement of the object in a competition captured by the plurality of image capturing devices.

The transmission device according to any one of claims 1 to 13, wherein the plurality of pieces of three-dimensional shape data further include three-dimensional shape data represented based on joints and a skeleton.

The transmission device according to any one of claims 1 to 14, wherein the determination means is capable of generating the plurality of pieces of three-dimensional shape data based on the plurality of captured images.

The transmission device according to claim 15, wherein the determination means:
generates, based on the plurality of captured images, first three-dimensional shape data represented by a point cloud; and
generates, based on the generated first three-dimensional shape data, second three-dimensional shape data represented by a mesh.

The transmission device according to claim 15 or 16, wherein the determination means generates, based on the plurality of captured images, the three-dimensional shape data that is determined as the three-dimensional shape data to be used for generating a virtual viewpoint image.

A transmission method comprising:
a determination step of determining three-dimensional shape data to be used for generating a virtual viewpoint image from among a plurality of pieces of three-dimensional shape data that are generated based on a plurality of captured images obtained through image capture by a plurality of image capturing devices, the plurality of pieces of three-dimensional shape data including three-dimensional shape data represented by a point cloud or voxels and three-dimensional shape data represented by a mesh; and
a transmission step of transmitting the three-dimensional shape data determined in the determination step to another device.
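
The two steps of the method can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `run_transmission_method`, the representation of shape data as a mapping from format name to bytes, and the injected `determine` and `send` callables are all hypothetical, and the claim does not prescribe any particular transport or data layout.

```python
from typing import Callable, Mapping, Sequence


def run_transmission_method(
    shape_data: Mapping[str, bytes],
    determine: Callable[[Sequence[str]], str],
    send: Callable[[bytes], None],
) -> str:
    """Run a determination step followed by a transmission step.

    shape_data maps a format name (for example "point_cloud" or "mesh")
    to already generated shape data; this layout is an assumption.
    """
    # Determination step: choose one of the available representations.
    chosen = determine(tuple(shape_data))
    if chosen not in shape_data:
        raise KeyError(f"determined format {chosen!r} is not available")
    # Transmission step: send only the determined shape data to the other device.
    send(shape_data[chosen])
    return chosen
```

Passing `determine` and `send` as callables keeps the sketch independent of the determination criterion and of the network layer.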

A program for causing a computer to function as the transmission device according to any one of claims 1 to 17.