
JP2021152828A - Free viewpoint video generation method, device, and program

Info

Publication number: JP2021152828A
Application number: JP2020053507A
Authority: JP
Inventor: Ryosuke Watanabe (渡邊良亮)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-03-25
Filing date: 2020-03-25
Publication date: 2021-09-30
Anticipated expiration: 2040-03-25
Also published as: JP7319939B2

Abstract

PROBLEM TO BE SOLVED: To provide a free-viewpoint video generation method capable of generating a 3D model with no missing parts in occluded regions and of achieving appropriate texture mapping onto those regions.
SOLUTION: A subject silhouette image generation unit 102 generates a subject silhouette image for each camera. A shield silhouette image generation unit 103 generates a shield silhouette image for each camera. A silhouette integration unit 105 generates an integrated silhouette image for each camera by integrating the subject silhouette image and the shield silhouette image. A 3D model generation unit 106 generates an integrated 3D model by the visual volume intersection method using the integrated silhouette images. An occlusion information generation unit 108 generates occlusion information that records whether each part of the integrated 3D model is visible or invisible from the viewpoint of each camera. Based on the occlusion information, a free-viewpoint rendering unit 110 maps onto each part of the integrated 3D model that is invisible from some cameras the texture acquired by a camera from which that part is visible.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a method, device, and program for generating free-viewpoint video from multiple camera images captured from different viewpoints, and in particular to a free-viewpoint video generation method, device, and program that generate a 3D model with no missing parts in occluded regions and achieve appropriate texture mapping onto those regions.

Free-viewpoint video technology enables viewing from arbitrary viewpoints, including viewpoints where no camera exists, based on video from multiple cameras with different viewpoints. One way to realize free-viewpoint video is the 3D-model-based generation method built on the visual volume intersection method (visual hull) disclosed in Non-Patent Document 1.

As shown in FIG. 10, the visual volume intersection method uses, for each camera cam, a binary silhouette image obtained by extracting only the subject region from that camera's image. Each camera's silhouette image is projected into 3D space to obtain a visual volume, and only the region where all these volumes intersect is kept as the 3DCG model.

The visual volume intersection method serves as a foundational technique for both the full-model free-viewpoint approach disclosed in Non-Patent Document 2 (which faithfully represents the shape of the 3D model) and the billboard free-viewpoint approach disclosed in Non-Patent Document 3 (which represents each 3D model as a flat board, called a billboard, and maps onto it the texture from a nearby camera).

A well-known way to extract the silhouette images whose intersection the visual volume intersection method computes is background subtraction, as represented by Non-Patent Document 4. Background subtraction extracts the subject from the difference between the input image and a background model, that is, a model of the scene with no subject present.
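The background subtraction step can be sketched as follows. This is a minimal illustration, not the method of Non-Patent Document 4: it swaps the adaptive Gaussian mixture model for a single fixed background image and a hypothetical per-pixel difference threshold `threshold`, and it uses the 255/0 silhouette convention that appears later in this description.

```python
import numpy as np


def background_subtraction(frame, background, threshold=30):
    """Return a binary silhouette (255 = foreground, 0 = background).

    Simplified stand-in for a background model: a pixel is foreground when its
    absolute difference from a fixed background image exceeds `threshold`
    (the largest channel difference is used for color images).
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    if diff.ndim == 3:
        diff = diff.max(axis=2)
    return np.where(diff > threshold, 255, 0).astype(np.uint8)


# Toy example: a 4x4 gray background with a bright 2x2 "player".
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200
silhouette = background_subtraction(frame, background)
print(silhouette)
```

Because a stationary goal post is already part of `background`, its pixels never exceed the threshold and never become foreground, which is the failure mode discussed next.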

In sports scenes, however, stationary structures such as a soccer goal post or a volleyball net may appear on the field. When the visual volume intersection method is applied to silhouette images obtained by background-subtraction-based extraction, such structures can degrade the quality of the free-viewpoint video.

For example, when a structure such as a goal post lies in front of a subject such as an athlete, the structure is stationary and is therefore classified as background by background subtraction, so the part of the subject it covers cannot be extracted as silhouette.

In the visual volume intersection method, a voxel grid cell is modeled when the silhouette pixels it corresponds to are judged to be foreground in many cameras. Lowering the number of cameras required for this foreground decision makes it more likely that spurious regions are modeled, so in practice a voxel is often modeled only when it is judged to be foreground in all cameras. Consequently, if a structure causes a gap in a silhouette, the subject located behind that structure as seen from a particular camera may have a missing part, as shown in FIG. 11.
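The camera-count threshold can be illustrated with a toy vote table. In this sketch the per-camera foreground decisions for each voxel are assumed to be already computed (the projection step is omitted), and `carve_by_votes` and `min_cameras` are hypothetical names.

```python
import numpy as np


def carve_by_votes(foreground_votes, min_cameras):
    """Keep a voxel when at least `min_cameras` cameras see it as foreground.

    foreground_votes: bool array of shape (num_cameras, num_voxels), True where
    the voxel projects onto a foreground pixel of that camera's silhouette.
    """
    counts = foreground_votes.sum(axis=0)
    return counts >= min_cameras


votes = np.array([
    # voxel 0, voxel 1, voxel 2
    [True, True, False],   # camera 0
    [True, True, True],    # camera 1
    [True, False, True],   # camera 2: voxel 1 is hidden behind the goal post
    [True, True, True],    # camera 3
])
# Voxel 2 is empty space that happens to fall inside three silhouettes.
print(carve_by_votes(votes, min_cameras=4))  # [ True False False]
print(carve_by_votes(votes, min_cameras=3))  # [ True  True  True]
```

Requiring all four cameras drops voxel 1, which is real but hidden in one view; relaxing the threshold to three recovers it but also keeps voxel 2, a spurious region.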

This problem is most pronounced in silhouette extraction by background subtraction, but it is not limited to that method: with other silhouette extraction methods, such as the deep-learning-based methods disclosed in Non-Patent Documents 5 and 6, the part of the subject occluded by a structure may likewise fail to be extracted as silhouette.

To solve this problem, Patent Document 1 prepares, for each camera, a silhouette image of the structure that occludes the subject, such as a soccer goal post (hereinafter sometimes called a "shield silhouette image"). It then applies the visual volume intersection method to the integrated silhouette image obtained by adding the shield silhouette image to the subject silhouette image acquired by background subtraction, which makes it possible to generate a 3D model with no parts missing due to the shield.

However, the visual volume intersection method using integrated silhouette images also models the goal post itself. Once the goal post is modeled, when realizing the billboard free viewpoint of Non-Patent Document 3, for example, a person touching the goal post model is merged with it into a single huge billboard, which greatly increases the error in the subject's displayed position.

This is because the billboard free viewpoint represents each subject by a board (billboard) standing at the subject's position: the 3D objects produced by the visual volume intersection method are labeled per connected block of the model, and one billboard is formed for each block. When a subject touches a large structure, the subject and structure models are treated as one large block and combined into a single billboard.
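Labeling the model per connected block can be sketched as 3D connected-component labeling. This is an illustration under assumptions: it uses 6-connectivity on a boolean voxel grid and breadth-first search, whereas Non-Patent Document 3 may label blocks differently, and `label_blocks` is a hypothetical name.

```python
from collections import deque

import numpy as np

# 6-connectivity: neighbors that share a face.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_blocks(occupancy):
    """Label 6-connected blocks of occupied voxels in a boolean (Z, Y, X) grid.

    Returns (labels, count): labels is 0 for empty voxels and 1..count for the
    block each occupied voxel belongs to (one billboard per block).
    """
    shape = occupancy.shape
    labels = np.zeros(shape, dtype=np.int32)
    count = 0
    for start in zip(*np.nonzero(occupancy)):
        if labels[start]:
            continue  # already reached from an earlier seed
        count += 1
        labels[start] = count
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in OFFSETS:
                n = (z + dz, y + dy, x + dx)
                inside = all(0 <= n[k] < shape[k] for k in range(3))
                if inside and occupancy[n] and not labels[n]:
                    labels[n] = count
                    queue.append(n)
    return labels, count


scene = np.zeros((1, 1, 7), dtype=bool)
scene[0, 0, 0] = True    # player
scene[0, 0, 2:] = True   # goal post
_, separate = label_blocks(scene)
scene[0, 0, 1] = True    # the player now touches the post
_, merged = label_blocks(scene)
print(separate, merged)  # 2 1
```

A single voxel of contact is enough to merge the player and the post into one block, and therefore into one billboard.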

Because this billboard rotates about its center to face the viewpoint selected by the user, the structure and the person appear to rotate while stuck together, which looks unnatural. In addition, at the moment the block separates, the person's displayed position jumps abruptly, which is also unnatural.

In addition, the visual volume intersection method using integrated silhouette images generates a goal post model in every frame, which increases the data size of the 3D model.

To address this problem, Patent Document 1 also discloses a function that, after the 3D space is modeled by the visual volume intersection method, deletes the goal post model produced by that modeling. According to Patent Document 1, the 3D shape of a subject can be reconstructed without missing parts even when a shield covers the subject.

Deleting the 3D model of the structure removes a structure that should be present in the 3D space. When the free-viewpoint video is viewed, however, such a structure can be placed using a static general-purpose 3DCG model or the like, and this approach displays a 3D model with a more accurate shape than a structure model derived from the visual volume intersection method.

Patent Document 1: JP-A-2019-106170

Non-Patent Document 1: A. Laurentini, "The visual hull concept for silhouette based image understanding," IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, pp. 150-162, 1994.
Non-Patent Document 2: J. Kilner, J. Starck, A. Hilton, and O. Grau, "Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events," Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM 2007), Montreal, QC, pp. 177-184, 2007.
Non-Patent Document 3: H. Sankoh, S. Naito, K. Nonaka, H. Sabirin, and J. Chen, "Robust Billboard-based, Free-viewpoint Video Synthesis Algorithm to Overcome Occlusions under Challenging Outdoor Sport Scenes," Proceedings of the 26th ACM International Conference on Multimedia, pp. 1724-1732, 2018.
Non-Patent Document 4: C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 246-252, 1999.
Non-Patent Document 5: D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, "YOLACT: Real-Time Instance Segmentation," The IEEE International Conference on Computer Vision (ICCV), pp. 9157-9166, 2019.
Non-Patent Document 6: L. A. Lim and H. Y. Keles, "Learning multi-scale features for foreground segmentation," Pattern Analysis and Applications, pp. 1-12, 2019.
Non-Patent Document 7: Q. Yao, H. Sankoh, K. Nonaka, and S. Naito, "Automatic camera self-calibration for immersive navigation of free viewpoint sports video," 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP), pp. 1-6, 2016.
Non-Patent Document 8: J. Chen, R. Watanabe, K. Nonaka, T. Konno, H. Sankoh, and S. Naito, "A Fast Free-viewpoint Video Synthesis Algorithm for Sports Scenes," 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), WeAT17.2, 2019.

Patent Document 1, however, discloses only a mechanism for 3D model generation (the process of obtaining the shape of a 3D model); it does not disclose a texture mapping method that takes the shield into account.

Taking a soccer goal post as an example of a shield, the texture of the goal post must not appear on the model of a person standing behind it. When texture mapping is performed with the mechanism disclosed in Patent Document 1, however, the goal post's texture is mapped onto the person's 3D model.

In addition, Patent Document 1 does not disclose a mechanism for automatically generating shield silhouette images. When many cameras are used, creating shield silhouette images manually is costly in terms of human resources, so an automatic generation solution is essential.

An object of the present invention is to solve the above technical problems by providing a free-viewpoint video generation method, device, and program that generate a 3D model with no missing parts in occluded regions and achieve appropriate texture mapping onto those regions.

To achieve the above object, the present invention provides a free-viewpoint video generation device that generates free-viewpoint video from camera images of a subject and a shield captured synchronously by multiple cameras with different viewpoints, and is characterized by the following configurations.

(1) A first feature of the present invention is that it comprises: means for generating a subject silhouette image for each camera; means for generating a shield silhouette image for each camera; means for generating, for each camera, an integrated silhouette image by integrating the subject and shield silhouette images; means for generating an integrated 3D model by the visual volume intersection method using the integrated silhouette images; means for generating occlusion information that records whether each part of the integrated 3D model is visible or invisible from the viewpoint of each camera; and means for mapping, based on the occlusion information, onto each part of the integrated 3D model that is invisible from some cameras, the texture acquired by a camera from which that part is visible.

(2) A second feature of the present invention is that it further comprises means for subtracting a shield 3D model from the integrated 3D model, and the mapping means uses the occlusion information to map textures onto each part of the integrated 3D model from which the shield 3D model has been subtracted.

The present invention achieves the following effects.

(1) Owing to the first feature, a 3D model without missing parts can be generated with the shield taken into account, and texture mapping can also account for occlusion caused by the shield, so high-quality free-viewpoint video can be generated.

(2) Owing to the second feature, the data size of the 3D model can be expected to decrease, and when realizing the billboard free viewpoint, a huge billboard in which the shield and subject 3D models remain merged is prevented from rotating.

FIG. 1 is a functional block diagram showing the configuration of the main parts of a free-viewpoint video generation device according to the first embodiment of the present invention.
FIG. 2 is a diagram showing a method of generating a shield silhouette image.
FIG. 3 is a diagram showing an example of camera parameters.
FIG. 4 is a diagram showing a method of generating an integrated silhouette image.
FIG. 5 is a diagram schematically showing a rendering method.
FIG. 6 is a diagram comparing a rendering model generated by the present invention with a rendering model generated by the prior art.
FIG. 7 is a functional block diagram showing the configuration of the main parts of a free-viewpoint video generation device according to the second embodiment of the present invention.
FIG. 8 is a diagram showing a first example of application to a multi-terminal distribution system that distributes rendered images from different virtual viewpoints to multiple viewing terminals.
FIG. 9 is a diagram showing a second example of application to a multi-terminal distribution system that distributes rendered images from different virtual viewpoints to multiple viewing terminals.
FIG. 10 is a diagram for explaining the visual volume intersection method.
FIG. 11 is a diagram showing an example in which a shield causes a missing part in a subject silhouette image.

Embodiments of the present invention will now be described in detail with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of the main parts of the free-viewpoint video generation device 1 according to the first embodiment of the present invention. Taking soccer as the sports scene, the description uses the example of generating free-viewpoint video from footage of a soccer match captured synchronously by multiple cameras with different viewpoints.

The free-viewpoint video generation device 1 can be built by installing an application (program) that implements each of the functions described below on a general-purpose computer or mobile terminal equipped with a CPU, memory, interfaces, and a bus connecting them. Alternatively, it can be built as a dedicated or single-function machine in which part of the application is implemented in hardware or as a dedicated program.

The camera video acquisition unit 101 acquires video from the multiple cameras Cam that capture the playing field. In this embodiment, a full-model free viewpoint is produced, all cameras Cam are fixed, and the angle of view of each camera is assumed not to change during the match.

The subject silhouette image generation unit 102 generates, frame by frame for each camera image, a silhouette image of dynamic objects that move between frames (hereinafter, subjects), for example by background subtraction.

The shield silhouette image generation unit 103 automatically generates, for each camera, a silhouette image of static objects that do not move between frames (hereinafter, shields), using a predefined general-purpose shield 3D model and camera parameters. The camera parameters can be estimated by matching feature points of a known structure, typified by the shield, against the corresponding feature points of the shield extracted from the camera image. For example, in a soccer match the position of each goal post in the stadium's 3D space is known, and since goal post dimensions are fixed by regulation, the 3D positions of feature points such as the goal post corners are also known. By locating such feature points in the 2D image from each camera and matching them to their known 3D positions, the position and orientation of each camera can be determined (camera calibration).
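Camera calibration from known 3D feature points and their detected 2D positions can be sketched with the Direct Linear Transform (DLT), a standard technique that recovers a 3x4 projection matrix up to scale. This is not necessarily the calibration method of this description or of Non-Patent Document 7. The sketch assumes noise-free correspondences for at least six non-coplanar points (for example, goal post corners combined with pitch markings), because points that all lie in the goal plane are degenerate for DLT; `estimate_projection_dlt` is a hypothetical name.

```python
import numpy as np


def estimate_projection_dlt(points_3d, points_2d):
    """Estimate P (3x4, up to scale) from >= 6 non-coplanar 3D-2D correspondences.

    Each correspondence X <-> (u, v) gives two linear equations in the 12
    entries of P; the solution is the right singular vector of the stacked
    system with the smallest singular value.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        Xh = [X, Y, Z, 1.0]
        rows.append([0.0] * 4 + [-c for c in Xh] + [v * c for c in Xh])
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)
    return P / P[2, 3]  # fix the arbitrary scale (assumes P[2, 3] != 0)


# Synthetic check: known intrinsics K and extrinsics [R | t], random 3D points.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
Rt = np.hstack([np.eye(3), [[0.0], [0.0], [20.0]]])
P_true = K @ Rt
rng = np.random.default_rng(0)
pts3d = rng.uniform(-5, 5, size=(10, 3))
proj = (P_true @ np.hstack([pts3d, np.ones((10, 1))]).T).T
pts2d = proj[:, :2] / proj[:, 2:3]
P_est = estimate_projection_dlt(pts3d, pts2d)
print(np.allclose(P_est, P_true / P_true[2, 3], atol=1e-4))
```

With real, noisy detections one would normally normalize the coordinates first and refine the result nonlinearly; the synthetic check only confirms the linear algebra.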

In this embodiment the cameras are fixed, so the shield silhouette images need to be generated only once, at the start. The generated shield silhouette images are stored in the shield silhouette image DB 104.

The general-purpose shield 3D model can be prepared in a common 3D model format such as .obj or .fbx. In this embodiment, however, the shield is a goal post, whose shape is known from the competition rules. Instead of preparing a general-purpose 3D model, a shield 3D model imitating the goal post may therefore be built by combining several cuboid or cylinder 3D models.
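A goal post assembled from simple primitives can be sketched as a union of axis-aligned boxes. The coordinate frame and the 0.12 m post thickness are assumptions for illustration; the 7.32 m x 2.44 m inner opening is the regulation goal size. Cylinders, nets, and back supports are omitted, and `GOAL_BOXES` and `occupancy_from_boxes` are hypothetical names.

```python
import numpy as np

# Hypothetical pitch frame (meters): x along the goal line, centered on the
# goal; y pointing into the goal; z up. T (post/bar thickness) is assumed.
W, H, T = 7.32, 2.44, 0.12

GOAL_BOXES = [  # (min corner, max corner) of each axis-aligned box
    ((-W / 2 - T, 0.0, 0.0), (-W / 2, T, H + T)),   # left post
    ((W / 2, 0.0, 0.0), (W / 2 + T, T, H + T)),     # right post
    ((-W / 2 - T, 0.0, H), (W / 2 + T, T, H + T)),  # crossbar
]


def occupancy_from_boxes(points, boxes):
    """Return a boolean mask that is True for points inside any box."""
    inside = np.zeros(len(points), dtype=bool)
    for lo, hi in boxes:
        inside |= np.all((points >= lo) & (points <= hi), axis=1)
    return inside


pts = np.array([
    [-3.72, 0.06, 1.0],  # inside the left post
    [0.0, 0.06, 2.50],   # inside the crossbar
    [0.0, 0.06, 1.0],    # in the goal mouth (empty)
])
print(occupancy_from_boxes(pts, GOAL_BOXES))  # [ True  True False]
```

Evaluating this test at voxel centers gives a voxelized shield model; sampling points on the box faces instead gives input for projecting the shield into each camera.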

The shield silhouette image generation unit 103 places the shield 3D model at a predetermined position in a 3D space modeling the stadium and, as shown in FIG. 2, projects it onto each camera image using the camera parameters to generate the shield silhouette image. The camera parameters here are the camera matrix (intrinsic parameter matrix) and the extrinsic parameter matrix, given, for example, in the form shown in FIG. 3.
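Projecting the shield model into one camera can be sketched with the pinhole model x ~ K(RX + t), where K is the intrinsic matrix and [R | t] the extrinsic matrix. The sketch assumes the shield model is given as densely sampled 3D points and simply marks the pixel each point lands on (pixel (u, v) covers [u, u+1) x [v, v+1)); a production renderer would rasterize the model's polygons instead. The function name and toy camera are hypothetical.

```python
import numpy as np


def render_shield_silhouette(points_world, K, R, t, width, height):
    """Mark the pixels hit by projected shield points (255 = shield, 0 = other)."""
    cam = points_world @ R.T + t        # world -> camera coordinates
    cam = cam[cam[:, 2] > 0]            # keep points in front of the camera
    pix = cam @ K.T                     # homogeneous pixel coordinates
    u = np.floor(pix[:, 0] / pix[:, 2]).astype(int)
    v = np.floor(pix[:, 1] / pix[:, 2]).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[v[ok], u[ok]] = 255
    return mask


# Toy camera looking down +z from 5 units away; a thin vertical bar as the shield.
K = np.array([[10.0, 0, 10], [0, 10.0, 10], [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
ys = np.linspace(-1.0, 1.0, 41)
bar = np.stack([np.zeros_like(ys), ys, np.zeros_like(ys)], axis=1)
mask = render_shield_silhouette(bar, K, R, t, width=20, height=20)
print(np.flatnonzero(mask[:, 10]))  # rows covered by the bar in column 10
```

Point sampling can leave gaps between marked pixels when samples are sparse, which is one more reason contour dilation of the shield silhouette can help.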

The camera parameters may be obtained manually or by auto-calibration as disclosed in Non-Patent Document 7. Combining this unit with a method that performs auto-calibration from the shape of the court, as in Non-Patent Document 7, allows the entire process, including calibration, to run fully automatically.

Image processing such as dilating the contour may be applied to the goal post shield silhouette image that the shield silhouette image generation unit 103 outputs for each camera, by applying the invention of an earlier patent application by the present inventors (Japanese Patent Application No. 2019-231270).

For example, a silhouette image obtained by projecting a 3D model can be inaccurate, because the silhouette image can represent only discrete pixel positions and therefore introduces quantization errors. If such silhouettes are used to generate a 3D model again by the visual volume intersection method, the resulting post model may be smaller than the actual goal post. To reduce this error, the silhouette image may be processed, for example by dilating the contour of the obtained silhouette.
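Dilating the silhouette contour can be sketched as binary dilation with a square structuring element. The `radius` parameter is hypothetical, and the earlier application (Japanese Patent Application No. 2019-231270) may use a different processing method.

```python
import numpy as np


def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element.

    A pixel becomes foreground (255) if any pixel within `radius`
    (Chebyshev distance) is foreground in the input 0/255 mask.
    """
    fg = mask > 0
    h, w = fg.shape
    padded = np.pad(fg, radius, mode="constant", constant_values=False)
    out = np.zeros_like(fg)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]  # OR in each shifted copy
    return np.where(out, 255, 0).astype(np.uint8)


mask = np.zeros((5, 5), dtype=np.uint8)
mask[2, 2] = 255
dilated = dilate(mask, radius=1)
print(dilated)
```

Growing the shield silhouette slightly makes the re-carved post model err on the large side, so that subtracting it later is less likely to leave thin fragments of the post behind.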

As illustrated in FIG. 4, the silhouette integration unit 105 integrates the shield silhouette image and the subject silhouette image frame by frame for each camera to generate an integrated silhouette image. When silhouette foreground is represented by 255 and background by 0, for example, the integration is a logical OR: a pixel is foreground in the integrated image if it is 255 in either of the two input masks.
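The OR-based integration can be sketched in a few lines, assuming both masks are uint8 arrays of the same size that use the 255/0 convention. The toy masks are hypothetical: the bottom row of the subject silhouette is missing where the post covers it, and the shield silhouette fills it back in.

```python
import numpy as np


def integrate_silhouettes(subject_mask, shield_mask):
    """Pixel-wise logical OR of two 0/255 masks (one camera, one frame)."""
    either = (subject_mask == 255) | (shield_mask == 255)
    return np.where(either, 255, 0).astype(np.uint8)


subject = np.array([[255, 0, 0],
                    [255, 0, 0],
                    [0, 0, 0]], dtype=np.uint8)    # legs hidden behind the post
shield = np.array([[0, 0, 0],
                   [0, 0, 0],
                   [255, 255, 255]], dtype=np.uint8)  # crossbar-like shield
integrated = integrate_silhouettes(subject, shield)
print(integrated)
```

For strictly 0/255 masks, `np.maximum(subject, shield)` gives the same result; the explicit comparison also tolerates masks that use other nonzero values.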

The 3D model generation unit 106 generates a 3D voxel model of the subjects and shields (the integrated 3D model) by the visual volume intersection method using the N integrated silhouette images output by the silhouette integration unit 105. In this embodiment, voxel grids of unit voxel size M are arranged over the target range of 3D model generation (for sports video, for example, the field where the sport is played), and whether to form a 3D model at each voxel grid is decided by the visual volume intersection method.

The visual volume intersection method obtains, as the visual hull VH(I), the intersection of the visual cones formed when the N silhouette images are projected into 3D world coordinates, according to equation (1):

$$VH(I) = \bigcap_{i=1}^{N} V_i \qquad (1)$$

In equation (1), I is the set of silhouette images of the cameras, and V_i is the visual cone computed from the silhouette image of the i-th camera. Normally the region common to all N cameras is modeled, but the number of cameras required for modeling may be changed, for example by modeling regions common to N-1 cameras. Lowering this camera-count threshold allows the 3D model to be recovered even when the subject is missing from a few silhouette images, but it can cause side effects such as increased noise. The threshold is set manually.
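The voxel-based carving described by equation (1), together with the camera-count threshold, can be sketched as follows. This is a simplified illustration under stated assumptions: each camera is given as a hypothetical 3x4 projection matrix P_i, voxels are tested only at their centers, a pixel (u, v) is taken to cover [u, u+1) x [v, v+1), and `visual_hull` and `min_cameras` are hypothetical names.

```python
import numpy as np


def visual_hull(silhouettes, projections, bounds, voxel_size, min_cameras=None):
    """Voxel carving: keep voxels whose centers fall on foreground (255) pixels
    in at least `min_cameras` silhouettes (None = all N, i.e. equation (1))."""
    lo = np.asarray(bounds[0], dtype=float)
    hi = np.asarray(bounds[1], dtype=float)
    # Voxel centers on a regular grid of unit size M = voxel_size.
    axes = [np.arange(a + voxel_size / 2, b, voxel_size) for a, b in zip(lo, hi)]
    centers = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    homog = np.hstack([centers, np.ones((len(centers), 1))])
    votes = np.zeros(len(centers), dtype=int)
    for sil, P in zip(silhouettes, projections):
        p = homog @ P.T
        in_front = p[:, 2] > 0
        w = np.where(in_front, p[:, 2], 1.0)  # avoid dividing by <= 0
        u = np.floor(p[:, 0] / w).astype(int)
        v = np.floor(p[:, 1] / w).astype(int)
        h, wd = sil.shape
        ok = in_front & (u >= 0) & (u < wd) & (v >= 0) & (v < h)
        fg = np.zeros(len(centers), dtype=bool)
        fg[ok] = sil[v[ok], u[ok]] == 255
        votes += fg
    if min_cameras is None:
        min_cameras = len(silhouettes)
    return centers, votes >= min_cameras


# Two toy affine cameras: front view (u, v) = (x, y), side view (u, v) = (z, y).
P_front = np.array([[1.0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
P_side = np.array([[0.0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
sil_front = np.zeros((4, 4), dtype=np.uint8)
sil_front[1:3, 1:3] = 255
sil_side = np.zeros((4, 4), dtype=np.uint8)
sil_side[1:3, 1:3] = 255
cams = [P_front, P_side]
box = ((0, 0, 0), (4, 4, 4))
_, keep = visual_hull([sil_front, sil_side], cams, box, 1.0)
sil_side[1:3, 1] = 0  # part of the subject hidden behind a post in the side view
_, keep_all = visual_hull([sil_front, sil_side], cams, box, 1.0)
_, keep_one = visual_hull([sil_front, sil_side], cams, box, 1.0, min_cameras=1)
print(keep.sum(), keep_all.sum(), keep_one.sum())
```

With all cameras required, the hidden column is lost (8 voxels shrink to 4); with a one-camera threshold the missing voxels return, but the hull also grows into regions seen by only one view (20 voxels), which is the noise trade-off described above.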

In the integrated 3D model output by the 3D model generation unit 106, the goal post silhouettes have been integrated, so a 3D model without missing parts can be generated for objects hidden behind the shield.

The visual volume intersection processing may also be applied within the two-stage visual volume intersection method described in Non-Patent Document 8. In that case, both stages perform modeling with the integrated silhouette images generated by the silhouette integration unit.

A function may also be added that converts the voxel model into a polygon model using a voxel-to-polygon conversion method such as marching cubes, so that the 3D model is output as a polygon model. In this embodiment, after the 3D model generation unit 106 performs the visual volume intersection method, the voxel model is converted into a polygon model by the marching cubes method.

The shield 3D model generation unit 107 generates a shield 3D model by the visual volume intersection method using the per-camera shield silhouette images stored in the shield silhouette image DB 104. Because the cameras are fixed in this embodiment, the shield 3D model needs to be generated only once rather than for every frame.

The generated shield 3D model is input to the shield 3D model subtraction unit 109 described later. Alternatively, the shield 3D model generation unit 107 may be omitted, and the general-purpose shield 3D model that the shield silhouette image generation unit 103 used to generate the shield silhouette images may be input directly to the shield 3D model subtraction unit 109.

However, it is desirable for the model input to the shield 3D model subtraction unit 109 to be identical to the shield portion incorporated into the integrated 3D model. Therefore, in this embodiment, the shield 3D model is generated from the same shield silhouette images that were used when the shield portion of the integrated 3D model was generated by the visual volume intersection method.

The occlusion information generation unit 108 calculates occlusion information for the 3D model. The occlusion information records, for each part of the generated integrated 3D model, whether that part is visible from each camera or invisible because it is occluded. By referring to this information, the free viewpoint rendering unit 110 described later can texture-map a part that is invisible to some cameras using the images of cameras that can see it.

In this embodiment, because the 3D model generation unit 106 generates a 3D polygon model, the occlusion relationship of each vertex of the 3D polygon model is recorded as occlusion information. For example, in an environment with N cameras, N occlusion entries are recorded for each vertex of the 3D polygon model.

In this embodiment, the occlusion information is recorded in a form such as "1" if the vertex is visible and "0" if it is invisible, so the occlusion information of each vertex for each camera can be expressed with a single visible/invisible bit. The occlusion information takes every occlusion relationship into account: not only occlusion caused by the shield but also occlusion caused by other subjects.
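The source specifies the stored format (one visible/invisible bit per vertex per camera) but not how visibility is computed. One common approach, assumed here, is a depth-buffer test. A vertex counts as visible from camera c if it projects inside the image and is no farther than the nearest surface recorded at that pixel. The depth maps are taken as given and are assumed to be rendered from the whole integrated model, so occlusion by other subjects is captured along with occlusion by the shield. The function names and the toy camera lambdas are hypothetical. The N per-camera bits are packed into one integer per vertex.

```python
from typing import Callable, List, Sequence, Tuple

Vertex = Tuple[float, float, float]
# Hypothetical camera model: maps a vertex to (col, row, depth along the optical axis).
ProjectFn = Callable[[Vertex], Tuple[int, int, float]]


def occlusion_masks(vertices: Sequence[Vertex], cameras: Sequence[ProjectFn],
                    depth_maps: Sequence[List[List[float]]], eps: float = 1e-3) -> List[int]:
    """Return one integer per vertex; bit c is 1 if the vertex is visible from camera c.

    depth_maps[c][row][col] is assumed to hold the nearest depth of the whole
    integrated model (all subjects plus the shield) as seen from camera c.
    """
    masks = []
    for v in vertices:
        mask = 0
        for c, (project, depth) in enumerate(zip(cameras, depth_maps)):
            col, row, z = project(v)
            in_image = 0 <= row < len(depth) and 0 <= col < len(depth[0])
            if in_image and z <= depth[row][col] + eps:  # not behind a nearer surface
                mask |= 1 << c
        masks.append(mask)
    return masks


def visible(mask: int, camera: int) -> bool:
    """Read the visible/invisible bit for one camera."""
    return bool(mask >> camera & 1)


# Players A (z=1) and B (z=2) overlap at pixel (0, 0); camera 0 looks from -z, camera 1 from +z.
cams = [lambda v: (int(v[0]), int(v[1]), v[2]),
        lambda v: (int(v[0]), int(v[1]), 10.0 - v[2])]
depths = [[[1.0]], [[8.0]]]  # nearest surface: A for camera 0, B for camera 1
masks = occlusion_masks([(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)], cams, depths)
print(masks)  # [1, 2]
```

In the toy scene, player A's vertex is visible only from camera 0 and player B's only from camera 1, which matches the rule that occlusion by another subject is also recorded as "0".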

For example, occlusion occurs when two players A and B overlap as seen from a certain camera viewpoint. If player A hides player B, the texture must be mapped so that player A's texture does not appear on player B. In such a case, the vertices of player B that are hidden are also recorded with occlusion information "0" (invisible).

The shield 3D model subtraction unit 109 removes the portion corresponding to the shield 3D model from the integrated 3D model generated by the 3D model generation unit 106. In this embodiment, the unit refers to the position of the shield 3D model generated by the shield 3D model generation unit 107 and deletes the polygons located at that position from the integrated 3D model.

This subtraction is expected to reduce the data size of the 3D model. In addition, when free viewpoint rendering is implemented with billboards, it prevents the 3D model of the goal post from being joined to the 3D model of a subject, which would otherwise cause a huge billboard to rotate.
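The subtraction step can be sketched on an indexed triangle mesh. The source says only that the polygons "located at" the shield's position are deleted. This sketch assumes the shield position is given as an axis-aligned bounding box and that a face is deleted when all of its vertices lie inside that box. A thin structure such as a goal post would probably need a finer test, such as distance to the shield mesh. The names `subtract_shield` and `inside_box` and the `margin` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]  # indices into the vertex list


def inside_box(p: Vec3, lo: Vec3, hi: Vec3, margin: float) -> bool:
    """True if point p lies inside the box [lo, hi] enlarged by margin on every side."""
    return all(lo[i] - margin <= p[i] <= hi[i] + margin for i in range(3))


def subtract_shield(vertices: Sequence[Vec3], faces: Sequence[Face],
                    box_lo: Vec3, box_hi: Vec3, margin: float = 0.0) -> List[Face]:
    """Drop every face whose vertices all lie inside the shield's (assumed) bounding box."""
    return [f for f in faces
            if not all(inside_box(vertices[i], box_lo, box_hi, margin) for i in f)]


verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0),   # on the goal post
         (5, 5, 5), (6, 5, 5), (5, 6, 5)]   # on a player
faces = [(0, 1, 2), (3, 4, 5), (2, 3, 4)]
print(subtract_shield(verts, faces, (-0.5, -0.5, -0.5), (1.5, 1.5, 1.5)))
# [(3, 4, 5), (2, 3, 4)]
```

Requiring all vertices to be inside keeps faces that straddle the boundary. This is a conservative choice, so a subject touching the post is not cut away.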

The free viewpoint rendering unit 110 renders a composite image as seen from an arbitrary virtual viewpoint p_v. It uses the subject-only 3D model output by the shield 3D model subtraction unit 109, the occlusion information generated by the occlusion information generation unit 108, and the camera images (textures).

FIG. 5 schematically shows the rendering method used by the free viewpoint rendering unit 110. In this embodiment, the 3D model obtained by subtracting the shield 3D model from the integrated 3D model is essentially the subject's 3D model. For each part of this model (a polygon, in this embodiment), visibility from each camera is determined based on the occlusion information, and parts that are invisible in some camera images are texture-mapped using other camera images in which they are visible.

In this embodiment, the two cameras Cam_1 and Cam_2 nearest to the requested virtual viewpoint p_v are selected first, and their camera images Ic_1 and Ic_2 are mapped onto each polygon g of the 3D model M_j. As preprocessing, the visibility of polygon g is determined from the occlusion information of all the vertices that make up g. If g is a triangular polygon, visibility is determined from the occlusion information of its three vertices.

For example, let g_c1 denote the visibility flag of polygon g with respect to camera Cam_1. The flag g_c1 is visible if all three vertices of the triangular polygon g are visible, and invisible if any one of the three vertices is invisible. Once the visibility of each polygon has been determined in this way, texture mapping is performed case by case as follows.

Case 1. Flags g_c1 and g_c2 are both visible:
Mapping is performed by alpha blending according to the following equation (2).
$$\mathrm{texture}(g) = a \cdot \mathrm{texture}_{c1}(g) + (1 - a) \cdot \mathrm{texture}_{c2}(g) \qquad (2)$$

Here, texture_c1(g) and texture_c2(g) denote the image regions of cameras Cam_1 and Cam_2 that correspond to polygon g, and texture(g) denotes the texture mapped onto the polygon. The alpha blend ratio a is calculated from the ratio of the distances (angles) between the virtual viewpoint p_v and the camera viewpoints pc_1 and pc_2.

Case 2. Only one of flags g_c1 and g_c2 is visible:
Polygon g is rendered using only the texture of the visible camera; that is, in equation (2) above, the alpha blend ratio applied to texture_ci(g) of the visible camera is set to 1. Alternatively, the camera Cam_3 next closest to the virtual viewpoint p_v may be used in place of whichever of Cam_1 and Cam_2 is invisible. In that case, the textures are alpha-blended in the same way as in equation (2).

Case 3. Flags g_c1 and g_c2 are both invisible:
The polygon is rendered using the texture of Cam_3, the camera next closest to the virtual viewpoint p_v. If Cam_3 is also invisible, camera textures are referenced in order of increasing distance (the next closest camera Cam_4, and so on). Two or more of the cameras referenced in this order may also be blended according to equation (2) above.

In the above example, two neighboring cameras are referenced initially, but this number may be changed by a user setting. In that case, equation (2) is extended to a linear combination of the textures of the b initially referenced cameras, with weights that sum to 1. No texture is mapped onto polygons that are invisible from all cameras.
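Cases 1 to 3 and the extension to b cameras can be combined into one per-polygon routine. This is a minimal sketch under several assumptions that are not from the source. Each camera's texture for the polygon is reduced to a single RGB color. Each camera's distance to p_v is given as a number, such as an angle. Weights are taken inversely proportional to distance and normalized to sum to 1; for two cameras this gives a = d2 / (d1 + d2) in equation (2). Polygon visibility is the bitwise AND of per-vertex masks (bit c set means "visible from camera c"), which matches the all-vertices rule for triangles. The sketch implements the primary form of Case 2 (use only the visible camera), not the Cam_3 substitution alternative. `shade_polygon` and `fallback_count` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Color = Tuple[float, float, float]


def polygon_visibility(vertex_masks: Sequence[int], num_cameras: int) -> List[bool]:
    """A polygon is visible from camera c only if all of its vertices have bit c set."""
    mask = ~0
    for m in vertex_masks:
        mask &= m
    return [bool(mask >> c & 1) for c in range(num_cameras)]


def shade_polygon(vertex_masks: Sequence[int], dists: Sequence[float],
                  textures: Sequence[Color], b: int = 2,
                  fallback_count: int = 1) -> Optional[Color]:
    """Blend per-camera textures of one polygon following Cases 1 to 3.

    dists[c]: distance (e.g. angle) between the virtual viewpoint p_v and camera c.
    textures[c]: the polygon's texture in camera c, simplified to one RGB color.
    """
    visible = polygon_visibility(vertex_masks, len(dists))
    ranked = sorted(range(len(dists)), key=lambda c: dists[c])
    # Cases 1 and 2: use whichever of the b nearest cameras can see the polygon.
    chosen = [c for c in ranked[:b] if visible[c]]
    if not chosen:
        # Case 3: walk outward to the next-nearest cameras that can see it.
        chosen = [c for c in ranked[b:] if visible[c]][:fallback_count]
    if not chosen:
        return None  # invisible from every camera: no texture is mapped
    # Inverse-distance weights normalized to sum to 1 (assumed weighting rule).
    inv = [1.0 / (dists[c] + 1e-9) for c in chosen]
    total = sum(inv)
    weights = [w / total for w in inv]
    return tuple(sum(w * textures[c][k] for w, c in zip(weights, chosen)) for k in range(3))


dists = [10.0, 30.0, 50.0, 70.0]
textures = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
print(shade_polygon([0b1111] * 3, dists, textures))              # Case 1: about (0.75, 0.25, 0.0)
print(shade_polygon([0b1110, 0b1111, 0b1111], dists, textures))  # Case 2: (0.0, 1.0, 0.0)
print(shade_polygon([0b1000, 0b1011, 0b1100], dists, textures))  # Case 3: (1.0, 1.0, 1.0)
```

Setting `fallback_count` to 2 or more in Case 3 corresponds to blending several of the sequentially referenced cameras according to equation (2).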

The shield itself is displayed in the free viewpoint rendering unit 110 by taking a general-purpose 3D model or similar model prepared in advance as input and placing it in the scene. There are two reasons. First, the 3D model of an object such as a goal post generally does not change significantly over time. Second, a model produced by the visual volume intersection method is synthesized from only N cameras and is therefore likely to be inferior in quality to a model prepared in advance.

FIG. 6 compares a rendered image generated by this embodiment [FIG. 6(b)] with a rendered image generated by the prior art [FIG. 6(a)].

In the prior art, the subject's left leg, which is occluded by the goal post in the silhouette image, is missing. In the rendering generated by this embodiment, the texture is correctly mapped onto the left leg, showing that an accurate free viewpoint image without missing parts or visual artifacts is reproduced.

In the first embodiment described above, the shield 3D model subtraction unit 109 removes the shield 3D model from the integrated 3D model, so that rendering is performed essentially on the subject 3D model alone.

However, the present invention is not limited to this configuration. As in the second embodiment shown in FIG. 7, the shield 3D model generation unit 107 and the shield 3D model subtraction unit 109 may be omitted, and free viewpoint rendering may be performed on the integrated 3D model from which the shield 3D model has not been subtracted.

Even in this configuration, the result is unlikely to look unnatural as long as the shield portion of the model is hidden behind the 3DCG object, such as a general-purpose 3D model, that is input to the free viewpoint rendering unit 110.

FIGS. 8 and 9 show examples of applying the invention to a multi-terminal distribution system that delivers rendered images from different virtual viewpoints to multiple viewing terminals.

In general, the 3D model and its occlusion information need to be calculated only once per frame, so they can be computed quickly on a high-end PC or similar machine and stored. The 3D model and occlusion information can then be distributed to the viewing terminals that request free viewpoint viewing, with a rendering unit on each terminal. This configuration realizes multi-terminal distribution with a single high-end PC and multiple low-spec viewing terminals.

The free viewpoint rendering unit 110 could also recalculate the occlusion relationships from the 3D model input to it. However, storing them in advance as occlusion information lets the rendering unit determine the occlusion relationships simply by looking them up, which is expected to reduce the processing load of the free viewpoint rendering unit 110.

In the example of FIG. 8, multiple dedicated rendering PCs are provided. In response to viewing requests from the viewing terminals, they render free viewpoint images from different viewpoints and distribute them.

In the example of FIG. 9, a free viewpoint rendering unit 110 is implemented on each viewing terminal, so rendering is performed on each terminal.

Although the above embodiments assume fixed cameras, the present invention is not limited to this and can likewise be applied when moving cameras are used. The changes to the configuration of the first embodiment for the moving-camera case are described below.

In the first embodiment, the subject silhouette image generation unit 102 generates subject silhouette images by background subtraction. With a moving camera, however, the background changes, which makes background subtraction difficult. Therefore, a silhouette extraction method that processes each frame independently, such as the one disclosed in Non-Patent Document 5, can be used instead.

In the first embodiment, the shield silhouette image generation unit 103 was described as generating the shield silhouette images only once, at the start, from the shield 3D model and the camera parameters. With a moving camera, however, the camera parameters change from frame to frame. Therefore, for each frame, the 3D model can be placed in 3D space and back-projected onto each camera image using the latest camera parameters.
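Per-frame back-projection can be sketched by rebuilding the projection matrix from the current camera parameters each frame. This sketch assumes the common pinhole convention x ~ K (R X + t), with intrinsic matrix K, rotation R, and translation t. It splats sample points taken from the shield model into the image rather than rasterizing the model's polygons. `projection_matrix` and `shield_silhouette` are hypothetical names. The per-frame K, R, t are simply listed here; in practice they would come from calibration such as the automatic calibration mentioned below.

```python
from typing import List, Sequence, Tuple

Mat = Sequence[Sequence[float]]
Vec3 = Tuple[float, float, float]


def projection_matrix(K: Mat, R: Mat, t: Vec3) -> List[List[float]]:
    """P = K [R | t], a 3x4 pinhole projection matrix."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)] for i in range(3)]


def shield_silhouette(points: Sequence[Vec3], P: Mat, width: int, height: int) -> List[List[int]]:
    """Back-project sampled shield points into a binary silhouette (point splatting)."""
    sil = [[0] * width for _ in range(height)]
    for X in points:
        Xh = (X[0], X[1], X[2], 1.0)  # homogeneous world point
        u, v, w = (sum(P[i][j] * Xh[j] for j in range(4)) for i in range(3))
        if w <= 0:
            continue  # point is behind the camera
        col, row = round(u / w), round(v / w)
        if 0 <= col < width and 0 <= row < height:
            sil[row][col] = 1
    return sil


K = [[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shield_points = [(0.0, 0.0, 10.0), (0.1, 0.0, 10.0)]

# A moving camera has different parameters each frame, so P is rebuilt per frame.
for frame, t in enumerate([(0.0, 0.0, 0.0), (-0.1, 0.0, 0.0)]):
    sil = shield_silhouette(shield_points, projection_matrix(K, I3, t), 5, 5)
    print(frame, [c for c in range(5) if sil[2][c]])
# 0 [2, 3]
# 1 [1, 2]
```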

Because calculating the camera parameters manually for every frame is impractical, the camera matrix and the extrinsic parameter matrix may be computed for each frame by automatic calibration, as disclosed in Non-Patent Document 7, and a different shield silhouette image may be calculated for each frame.

Likewise, with a moving camera, the shield silhouette images generated by the shield silhouette image generation unit 103 change from frame to frame. The shield 3D model generation unit 107 may therefore also have a function of generating the shield 3D model for each frame.

1 ... free viewpoint video generation device, 101 ... camera video acquisition unit, 102 ... subject silhouette image generation unit, 103 ... shield silhouette image generation unit, 104 ... shield silhouette image DB, 105 ... silhouette integration unit, 106 ... 3D model generation unit, 107 ... shield 3D model generation unit, 108 ... occlusion information generation unit, 109 ... shield 3D model subtraction unit, 110 ... free viewpoint rendering unit

Claims

1. A free viewpoint video generation device that generates a free viewpoint video based on camera images of a subject and a shield captured synchronously by a plurality of cameras with different viewpoints, the device comprising:
means for generating a subject silhouette image for each camera;
means for generating a shield silhouette image for each camera;
means for integrating, for each camera, the subject silhouette image and the shield silhouette image to generate an integrated silhouette image;
means for generating an integrated 3D model by the visual volume intersection method using the integrated silhouette images;
means for generating occlusion information that registers whether each part of the integrated 3D model is visible or invisible from the viewpoint of each camera; and
means for mapping, for each part of the integrated 3D model and based on the occlusion information, a texture acquired by a camera from which the part is visible onto a part that is invisible from some of the cameras.

2. The free viewpoint video generation device according to claim 1, further comprising means for subtracting a shield 3D model from the integrated 3D model,
wherein the mapping means maps textures, using the occlusion information, onto each part of the integrated 3D model from which the shield 3D model has been subtracted.

3. The free viewpoint video generation device according to claim 2, wherein the shield 3D model is a general-purpose 3D model that imitates the shield.

4. The free viewpoint video generation device according to claim 2, further comprising means for generating a shield 3D model based on the shield silhouette images,
wherein the means for subtracting the shield 3D model subtracts the generated shield 3D model from the integrated 3D model.

5. The free viewpoint video generation device according to any one of claims 1 to 4, wherein the integrated 3D model is a polygon model, and
the occlusion information registers, for each vertex of each polygon, whether the vertex is visible or invisible from the viewpoint of each camera.

6. The free viewpoint video generation device according to any one of claims 1 to 5, wherein the means for generating the shield silhouette images places a separately prepared shield 3D model at the fixed position of the shield in three-dimensional space and back-projects the shield at that position onto each camera based on camera parameters, thereby generating each shield silhouette image.

7. The free viewpoint video generation device according to claim 6, wherein the camera parameters are estimated based on the result of matching feature points extracted from a known structure imitating the shield against feature points of the shield extracted from the camera images.

8. A free viewpoint video generation method in which a computer generates a free viewpoint video based on camera images of a subject and a shield captured synchronously by a plurality of cameras with different viewpoints, the method comprising:
generating a subject silhouette image for each camera;
generating a shield silhouette image for each camera;
integrating, for each camera, the subject silhouette image and the shield silhouette image to generate an integrated silhouette image;
generating an integrated 3D model by the visual volume intersection method using the integrated silhouette images;
generating occlusion information that registers whether each part of the integrated 3D model is visible or invisible from the viewpoint of each camera; and
mapping, for each part of the integrated 3D model and based on the occlusion information, a texture acquired by a camera from which the part is visible onto a part that is invisible from some of the cameras.

9. The free viewpoint video generation method according to claim 8, further comprising subtracting a shield 3D model from the integrated 3D model,
wherein textures are mapped, using the occlusion information, onto each part of the integrated 3D model from which the shield 3D model has been subtracted.

10. A free viewpoint video generation program for generating a free viewpoint video based on camera images of a subject and a shield captured synchronously by a plurality of cameras with different viewpoints, the program causing a computer to execute:
a procedure for generating a subject silhouette image for each camera;
a procedure for generating a shield silhouette image for each camera;
a procedure for integrating, for each camera, the subject silhouette image and the shield silhouette image to generate an integrated silhouette image;
a procedure for generating an integrated 3D model by the visual volume intersection method using the integrated silhouette images;
a procedure for generating occlusion information that registers whether each part of the integrated 3D model is visible or invisible from the viewpoint of each camera; and
a procedure for mapping, for each part of the integrated 3D model and based on the occlusion information, a texture acquired by a camera from which the part is visible onto a part that is invisible from some of the cameras.

11. The free viewpoint video generation program according to claim 10, further comprising a procedure for subtracting a shield 3D model from the integrated 3D model,
wherein the mapping procedure maps textures, using the occlusion information, onto each part of the integrated 3D model from which the shield 3D model has been subtracted.