JP6894873B2

JP6894873B2 - Image processing apparatus, method, and program

Info

Publication number: JP6894873B2
Application number: JP2018138177A
Authority: JP
Inventors: 軍陳; 敬介野中; 整内藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-07-24
Filing date: 2018-07-24
Publication date: 2021-06-30
Anticipated expiration: 2038-07-24
Also published as: JP2020016975A

Description

The present invention relates to an image processing apparatus, method, and program that can obtain, at high speed and with high accuracy, the region of a target on the image corresponding to a virtual viewpoint. This region is needed to generate a billboard, so the invention is applicable to the billboard method, which yields free-viewpoint images at high speed.

Free-viewpoint video technology lets a viewer control the viewpoint interactively and freely, so that video seen from an arbitrary viewpoint can be generated. Unlike ordinary video content, in which the viewpoint is fixed in advance by the content creator, the viewer sets the desired viewpoint and can watch video with a strong sense of presence. In general, a high-quality free-viewpoint video is generated from multi-view video captured by a plurality of cameras arranged to surround the target scene. Approaches to generating free-viewpoint video fall into two types: one is based on complete three-dimensional reconstruction, and the other does not necessarily use such reconstruction.

Non-Patent Document 1 introduces a free-viewpoint video technique that performs three-dimensional reconstruction and texture mapping automatically. It points out that the visual hull, generated as the intersection of the viewing volumes obtained from silhouette images, gives an approximate shape of the extraction target, and that view-dependent texture mapping enables high-quality rendering. The shape of the visual hull must be accurate enough that no noticeable error appears in the texture mapping.

Non-Patent Document 2 introduces a method for generating free-viewpoint video semi-automatically without complete three-dimensional reconstruction. Specifically, target detection, target tracking, and target separation are performed so that texture mapping can be applied to a billboard, which approximates the three-dimensional model of a target by a two-dimensional region. The document states that an application on a mobile terminal can thereby generate free-viewpoint video of a sports match without loss of video quality.

Non-Patent Document 3 introduces a method for obtaining the visual hull of a target as a polyhedral model. The method consists of three steps: (1) computing viewing edges, (2) computing cone intersection edges, and (3) identifying the faces of the polyhedron. The document states that the method, first, can verifiably generate a polyhedral model that handles fine and varied shapes and, second, generates that model more efficiently than existing state-of-the-art methods.

Non-Patent Document 1: Grau O, Hilton A, Kilner J, et al. A free-viewpoint video system for visualization of sport scenes. SMPTE Motion Imaging Journal, 2007, 116(5-6): 213-219.
Non-Patent Document 2: Sabirin H, et al. Semi-Automatic Generation of Free Viewpoint Video Contents for Sport Events: Toward Real-time Delivery of Immersive Experience. IEEE MultiMedia, 2018.
Non-Patent Document 3: Franco J S, Boyer E. Efficient polyhedral modeling from silhouettes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(3): 414-427.

However, the conventional techniques described above still leave room for improvement in obtaining free-viewpoint video efficiently and with high quality.

Specifically, in Non-Patent Document 1, obtaining a high-quality free-viewpoint video requires generating the visual hull as a highly accurate three-dimensional model, which markedly increases the amount of computation and therefore lowers efficiency. Non-Patent Document 2 avoids obtaining a complete three-dimensional model by using the billboard method for efficiency. However, because the billboard method approximates the three-dimensional model in two dimensions, the finally generated free-viewpoint video becomes unnatural when occlusion is present (that is, when targets overlap so that some parts are hidden from a camera).

Non-Patent Document 3 also leaves room for improvement. For example, although its polyhedral model is faster than the method of Non-Patent Document 1, computing the polyhedral model still involves a considerable load.

In view of the above problems of the prior art, an object of the present invention is to provide an image processing apparatus, method, and program that can obtain, at high speed and with high accuracy, the region of a target on the image corresponding to a virtual viewpoint. This region is needed to generate a billboard, so the invention is applicable to the billboard method, which yields free-viewpoint images at high speed.

To achieve the above object, the present invention is an image processing apparatus comprising: a point cloud generation unit that obtains point cloud data of targets from multi-view images, based on the viewpoint position of each image and on points on the boundary between foreground and background; a sorting unit that clusters the point cloud data into point cloud data for each individual target; a neighborhood acquisition unit that acquires, for each point in the point cloud data of each individual target, its neighboring points; a back projection unit that back-projects the point cloud data of each individual target onto the image plane of a designated virtual viewpoint; and an extraction unit that treats the acquired neighboring points of each point as the neighbors to which that point is connected on the image plane, and extracts the region of the individual target on the image plane as the region defined by these connections. The invention is also characterized as a method and a program corresponding to the image processing apparatus.

According to the present invention, the region that each individual target captured in the input multi-view images occupies in the image at a designated virtual viewpoint can be obtained at high speed and with high accuracy.

FIG. 1 is a functional block diagram of an image processing apparatus according to an embodiment.
FIG. 2 is a diagram for explaining the processing in the point cloud generation unit.
FIG. 3 is a supplementary explanatory diagram for FIG. 2.
FIG. 4 is a diagram for explaining the processing in the point cloud generation unit and the sorting unit.
FIG. 5 is a diagram for explaining the processing in the back projection unit and the extraction unit.

FIG. 1 is a functional block diagram of an image processing apparatus according to an embodiment. As illustrated, the image processing apparatus 10 includes a point cloud generation unit 1, a sorting unit 2, a neighborhood acquisition unit 3, a back projection unit 4, an extraction unit 5, and a drawing unit 6. As its overall operation, the image processing apparatus 10 takes the input multi-view images and a virtual viewpoint input by the user (viewer), and generates, based on the billboard method, a free-viewpoint image as seen from the designated virtual viewpoint.

As illustrated, the functions of units 1 to 6 of the image processing apparatus 10 are, in outline, as follows. The point cloud generation unit 1 generates, from the input multi-view images, point cloud data in three-dimensional space that reflects the three-dimensional arrangement of the targets captured in those images (in general, one or more targets), and outputs the point cloud data to the sorting unit 2. Each viewpoint image of the input multi-view images is given a foreground/background distinction in advance, and the point cloud generation unit 1 also uses this distinction to generate the point cloud data.

The sorting unit 2 clusters the point cloud data obtained from the point cloud generation unit 1 to obtain point cloud data separated by individual target, and outputs the point cloud data of each individual target to the neighborhood acquisition unit 3 and the back projection unit 4.

For each point belonging to the point cloud data of each individual target obtained from the sorting unit 2, the neighborhood acquisition unit 3 acquires information on which points in the point cloud data of the same individual target are its neighbors, and outputs this neighbor information to the extraction unit 5 for each individual target's point cloud data.

The back projection unit 4 back-projects the point cloud data of each individual target obtained from the sorting unit 2 onto the image plane corresponding to the virtual viewpoint set by the user, and outputs the result, namely the point cloud data of each individual target on that image plane, to the extraction unit 5. In other words, this back projection assumes a virtual camera placed at the user-set virtual viewpoint and determines where the point cloud data of each individual target would appear in the image captured by that virtual camera.
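The projection of a point cloud onto the image plane of a virtual camera can be sketched as follows. This is a minimal sketch that assumes a standard pinhole camera model; the intrinsic matrix `K`, rotation `R`, translation `t`, the function name `project_points`, and all numeric values are illustrative assumptions and do not come from the source.

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project (n, 3) world points to (n, 2) pixel coordinates (pinhole model)."""
    pts = np.asarray(points_3d, dtype=float)
    cam = pts @ R.T + t                 # world -> camera coordinates
    uvw = cam @ K.T                     # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide

# Hypothetical virtual camera at the world origin, looking along +Z.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

cloud = np.array([[0.0, 0.0, 4.0],
                  [1.0, -0.5, 4.0]])
pixels = project_points(cloud, K, R, t)
print(pixels)
```

In this sketch a point on the optical axis lands on the principal point (320, 240), which is a quick sanity check for the chosen conventions.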

The extraction unit 5 uses the neighbor information for each point of the point cloud data of each individual target (data in three-dimensional space), obtained from the neighborhood acquisition unit 3, to determine the region that each individual target occupies on the image plane, working on the point cloud data of that target as arranged on the image plane (data on the two-dimensional image plane) obtained from the back projection unit 4. It outputs this region information to the drawing unit 6. The extraction unit 5 can obtain the region by treating points that are neighbors of each other, according to the information from the neighborhood acquisition unit 3, as connected to each other within the region.

The drawing unit 6 selects, from the multi-view images input to the image processing apparatus 10, the images of one or more viewpoints judged to be close to the user-set virtual viewpoint, and uses them to render the region of each individual target (corresponding to the foreground occupied by that target) on the image plane obtained by the extraction unit 5 (corresponding to the plane of the camera image at the user-set virtual viewpoint). This generates the free-viewpoint image as seen from the user-set virtual viewpoint.

The processing of units 1 to 6 of the image processing apparatus 10 has been outlined above. This processing can be carried out in real time by executing it on the frame at each time of the video. That is, the image processing apparatus 10 receives as input the multi-view image MVP(t), which is the frame at each time t = 1, 2, 3, ... of multi-view video supplied at a predetermined frame rate, and the virtual viewpoint V(t) set by the user at each time t, and can output in real time the free-viewpoint image FVP(t) as seen from V(t) as the video at each time t. The high-speed processing of the image processing apparatus 10 makes this real-time operation possible and also yields high-quality free-viewpoint video even when occlusion occurs between targets in the multi-view video.

The processing of units 1 to 6 of the image processing apparatus 10 is described in detail below. FIGS. 2 to 5 show schematic examples of the data handled by each unit and are referred to as examples where appropriate. FIG. 2 is a diagram for explaining the processing in the point cloud generation unit 1, and FIG. 3 is a supplementary diagram for FIG. 2. FIG. 4, divided into panels [a] to [f], explains the processing in the point cloud generation unit 1 and the sorting unit 2. FIG. 5, divided into panels [g] to [k], explains the processing in the back projection unit 4 and the extraction unit 5.

＜Point cloud generation unit 1＞
The role of the point cloud generation unit 1 is to obtain at high speed, within the image processing apparatus 10 that generates free-viewpoint images based on the billboard method, something equivalent to the visual hull that complete three-dimensional reconstruction as in Non-Patent Document 1 provides in the form of a polygon mesh or the like. This gives the image processing apparatus 10 the robustness to occlusion that was difficult to achieve with the conventional billboard method. Specifically, the point cloud generation unit 1 obtains the equivalent of the visual hull as a point cloud.

The multi-view images input to the point cloud generation unit 1 are assumed to consist of N images P1, P2, ..., PN captured by N cameras C1, C2, ..., CN, and each image is assumed to have been given a foreground/background distinction in advance by any existing method such as background subtraction. The foreground region of each image P1, P2, ..., PN corresponds to the region of the captured target that is to be rendered in the free-viewpoint video generated by the image processing apparatus 10. The calibration information of each camera and between cameras is also assumed to be known.

From the multi-view images, in which each viewpoint image carries a foreground/background distinction as described above, the point cloud generation unit 1 generates a point cloud as an approximation of the visual hull. Specifically, it first obtains the viewing edges (line-of-sight line segments) of the polygon mesh that Non-Patent Document 3 generates as an approximation of the visual hull. Then, in one embodiment of the present invention in particular, it extracts only the two end points of each viewing edge, which is a line segment, and can thereby generate the point cloud at high speed.

The point cloud generation unit 1 obtains the viewing edges as follows. Viewing edges (of which there are generally many) are obtained for a pair of images: one of the N input multi-view images P1, P2, ..., PN is chosen as the reference image and another is chosen as the target image. Let VE(Pt, Pr) denote the set of viewing edges obtained from target image Pt and reference image Pr (where t = 1, 2, ..., N, r = 1, 2, ..., N, and t ≠ r). As given by equation (1), the point cloud generation unit 1 obtains all viewing edges as VE_all, the collection of the sets VE(Pt, Pr) over every way of choosing the target image Pt and the reference image Pr (for N multi-view images, there are N(N-1) such combinations). As stated above, the point cloud generation unit 1 then outputs, as the point cloud, the set of end points of every edge (line segment) belonging to VE_all.
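Equation (1) is not reproduced in this text. Based on the description above (a union over all N(N-1) ordered pairs of target and reference images), it can plausibly be written as the first line below. The second line is only an illustrative way of writing the end-point extraction; the symbol PG for the point cloud is borrowed from the description of the sorting unit 2, and the notation for an edge with end points a and b is an assumption.

```latex
\begin{align}
VE_{all} &= \bigcup_{\substack{t,\, r \in \{1, \dots, N\} \\ t \neq r}} VE(P_t, P_r) \tag{1} \\
PG &= \{\, a,\ b \;\mid\; \overline{ab} \in VE_{all} \,\} \notag
\end{align}
```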

FIG. 2 is a diagram for explaining the process of obtaining the viewing edge set VE(Pt, Pr) from target image Pt and reference image Pr. It shows the case Pt = P1 and Pr = P2 (t = 1 and r = 2) as an example, but the process is the same for any choice. In FIG. 2, the positions of cameras C1 and C2, which capture target image P1 and reference image P2 respectively, are shown as points (white circles) at their optical centers. (The description of FIG. 2 therefore refers to the cameras' optical centers as "optical center C1" and "optical center C2".) FIG. 2 depicts, among other things, lines passing through optical centers C1 and C2; the depicted arrangement is the same as that of the well-known epipolar geometry of stereo vision.

The viewing edge set VE(P1, P2) for P1 and P2 in FIG. 2 can be obtained by the following iterative procedure.
(Step 1) Select a new point pi on the boundary between the foreground region R1 of target image P1 (shown in gray within image P1 in FIG. 2) and the background region, and proceed to Step 2.
(Step 2) Consider the plane (epipolar plane) spanned by the viewing line VL1, which runs from the optical center C1 of target image P1 toward point pi, and the straight line through the optical centers C1 and C2 of images P1 and P2. Find the line segments (generally more than one) within the foreground region R2 of reference image P2 (shown in gray within image P2 in FIG. 2) where this plane intersects R2. Project these segments from optical center C2 onto the viewing line VL1 to obtain line segments belonging to VE(P1, P2), and proceed to Step 3.
(Step 3) If every point pi has been selected in the repetitions of Step 1 so far, output the viewing edge set VE(P1, P2) accumulated in the repetitions of Step 2 and end the procedure. Otherwise, return to Step 1 and select a new point pi that has not yet been selected.
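The effect of Step 2 for a single boundary point can be approximated with a simple discretized sketch. Note that this is not the procedure described above: instead of intersecting the epipolar plane with R2 and projecting from C2, the sketch samples points along the viewing line and tests each sample against the reference foreground mask, so the end points are only accurate to the sampling interval. The function name `viewing_edges_on_ray`, the `project_ref` callback, and the toy projection are all hypothetical.

```python
import numpy as np

def viewing_edges_on_ray(origin, direction, depths, project_ref, mask_ref):
    """Return a list of (start, end) 3D points for foreground runs along one viewing line.

    origin, direction : optical center of the target camera and ray direction through p_i
    depths            : increasing 1-D array of sample distances along the ray
    project_ref       : maps (n, 3) points to (n, 2) pixel coords (x, y) in the reference image
    mask_ref          : 2-D boolean foreground mask of the reference image (True = foreground)
    """
    direction = direction / np.linalg.norm(direction)
    samples = origin + depths[:, None] * direction         # (n, 3) points on the ray
    uv = np.rint(project_ref(samples)).astype(int)         # nearest pixel (x, y)
    h, w = mask_ref.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    fg = np.zeros(len(depths), dtype=bool)
    fg[inside] = mask_ref[uv[inside, 1], uv[inside, 0]]
    # Locate runs of consecutive foreground samples.
    padded = np.concatenate(([False], fg, [False]))
    change = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = change[0::2], change[1::2] - 1
    return [(samples[s], samples[e]) for s, e in zip(starts, ends)]

def toy_project(points):
    """Toy reference camera (assumption): drops z and uses (x, y) as pixel coordinates."""
    return points[:, :2]

mask = np.zeros((5, 10), dtype=bool)
mask[2, 3:6] = True      # first foreground run on row 2
mask[2, 8] = True        # second, single-pixel run

edges = viewing_edges_on_ray(np.array([0.0, 2.0, 0.0]),
                             np.array([1.0, 0.0, 0.0]),
                             np.arange(10.0), toy_project, mask)
for start, end in edges:
    print(start, end)
```

As in the description of FIG. 2, one viewing line can yield more than one segment; here the toy mask produces two.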

Regarding Step 2, in the example of FIG. 2, two line segments where the epipolar plane intersects the foreground region R2 are found on the epipolar line L2, which is the line of intersection between the epipolar plane and reference image P2. Projecting them from optical center C2 onto the viewing line VL1 yields two line segments belonging to VE(P1, P2): one with end points pi-1 and pi-2, and one with end points pi-3 and pi-4.

FIG. 3 shows, corresponding to the example of FIG. 2, an example of the points pi selected in Step 1 on the boundary between foreground region R1 and the background region. In FIG. 3, eight points p1, p2, ..., p8 are each selected as a point pi. In this example, Steps 1 to 3 are therefore repeated exactly eight times, after which all points pi have been processed and the procedure ends.

Selecting points pi on the boundary in Step 1 is also, in one embodiment of the present invention, a point of difference from Non-Patent Document 3. In Non-Patent Document 3, the boundary (contour) in the target image is approximated by a polygon, and no points on the boundary are specifically selected. In Step 1, by contrast, the points pi are selected discretely so that they lie on the boundary in the target image and are spaced at a constant interval from one another in pixel units, that is, in terms of distance on the target image. Selecting the points pi discretely in pixel units has the effect of making the distribution of the resulting viewing edges uniform.
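One way to pick roughly evenly spaced boundary pixels from a foreground mask is sketched below. The source does not say how the constant-interval selection is implemented; this sketch substitutes a greedy minimum-spacing subsampling of the boundary pixels, which only approximates a constant interval along the contour. The function names and the `spacing` parameter are hypothetical.

```python
import numpy as np

def boundary_pixels(mask):
    """Foreground pixels that have at least one 4-connected background neighbour."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    core = m[1:-1, 1:-1]
    all_neighbours_fg = m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    rows, cols = np.nonzero(core & ~all_neighbours_fg)
    return np.stack([cols, rows], axis=1)        # (x, y) pixel coordinates

def subsample_min_spacing(points, spacing):
    """Greedily keep points that are at least `spacing` pixels from every kept point."""
    kept = []
    for p in points:
        if all(np.hypot(*(p - q)) >= spacing for q in kept):
            kept.append(p)
    return np.array(kept)

# Toy foreground region: a 5x5 square inside a 7x7 image.
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True

contour = boundary_pixels(mask)
selected = subsample_min_spacing(contour, spacing=2.0)
print(len(contour), len(selected))
```

A contour-following tracer would give spacing measured along the boundary itself; the greedy version is used here only because it is short and easy to check.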

In Steps 1 and 2, the foreground region R1 (and background region) of image P1 and the foreground region R2 (and background region) of image P2 are determined, as stated above, by the foreground/background distinction given in advance to the input multi-view images.

FIGS. 2 and 3 described the process of obtaining VE(Pt, Pr) for the single choice t = 1, r = 2. As in equation (1), the viewing edge sets VE(Pt, Pr) are obtained in the same way for the other choices, and the whole of the sets obtained for all choices gives the viewing edge set VE_all. Panels [a] to [d] of FIG. 4 show an example of how the viewing edge sets accumulate as more choices are covered, that is, how the union in equation (1) grows. Panel [a] plots only VE(P1, P2) in three-dimensional space, with camera C1 as the target and camera C2 as the reference. Panel [b] plots VE(P1, P2) to VE(P1, P4) together, with camera C1 as the target and cameras C2 to C4 each as the reference. Panel [c] plots VE(P1, P2) to VE(P1, P8) together, with camera C1 as the target and cameras C2 to C8 each as the reference. Panel [d] plots VE(P1, P2) to VE(P1, P10) together, with camera C1 as the target and cameras C2 to C10 each as the reference. From [a] to [d], the union in equation (1) can be seen to grow, approaching the final VE_all.

Panel [e] of FIG. 4 is an example of the point cloud data in three-dimensional space that the point cloud generation unit 1 outputs as the end points of each edge (line segment) of the finally obtained viewing edge set VE_all. As illustrated, the point cloud consists of two individual targets OB1 and OB2 (for example, two soccer players on a soccer field), but this composition becomes known only through the processing of the sorting unit 2 described next.

＜Sorting unit 2＞
The sorting unit 2 clusters the point cloud data obtained by the point cloud generation unit 1 (coordinate data of a plurality of points in three-dimensional space) and obtains a clustering result in which each resulting cluster corresponds to an individual target captured in the multi-view images. In one embodiment, clustering is performed directly on the three-dimensional point cloud data PG∋{(x, y, z)} obtained by the point cloud generation unit 1. In another embodiment, clustering may be performed on two-dimensional data P_XY∋{(X, Y)} obtained by projecting the three-dimensional point cloud data PG onto a predetermined plane (taken to be the XY plane). (In this case, the clustering result obtained on the projected two-dimensional data is adopted as the clustering result for the original three-dimensional point cloud data.)

The predetermined plane XY for this projection can be a plane spanned along the directions in which the individual targets move (for example, when the individual targets are soccer players on a soccer field, the plane of the soccer field as the ground), or a plane used for calibrating the input multi-view images (for example, the plane on which planar markers for calibration were placed). Other planes may also be used.

Denoting the clusters of the clustering result by o_i (i = 1, 2, …, n), the processing of the sorting unit 2 can be expressed formally with a clustering function clustering, as in equation (2A) or (2B). Equation (2A) covers the case where the three-dimensional data is clustered directly, and equation (2B) the case where it is clustered after projection onto two-dimensional data. The number of clusters n may be given in advance as the known number of objects in the input multi-viewpoint images.

Any existing clustering method, such as the k-means method, may be used. The distance (similarity) between the points being clustered may be evaluated with the Euclidean distance or the like.
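The projected clustering variant described above can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes world coordinates in which the ground plane is z = 0 (so projecting means dropping z), uses a plain Lloyd's k-means with Euclidean distance, and picks initial centers by a farthest-point rule purely to keep the sketch deterministic. The names `sq_dist`, `kmeans_labels` and `cluster_point_cloud` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, n, max_iters=100):
    """Lloyd's k-means; returns a cluster label in 0..n-1 for each point."""
    # Farthest-point initialisation (an assumption; keeps the sketch deterministic).
    centers = [points[0]]
    while len(centers) < n:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(n), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(n):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def cluster_point_cloud(points3d, n):
    """Cluster on the XY projection and carry the labels back to the 3D points."""
    projected = [(x, y) for x, y, _ in points3d]  # assumed ground plane: z = 0
    labels = kmeans_labels(projected, n)
    return [[p for p, lab in zip(points3d, labels) if lab == i] for i in range(n)]


# Two well-separated groups of points standing in for two players.
cloud = [(0, 0, 0.0), (0, 1, 1.8), (1, 0, 0.9), (1, 1, 0.5),
         (10, 10, 0.0), (10, 11, 1.7), (11, 10, 1.0), (11, 11, 0.3)]
o_1, o_2 = cluster_point_cloud(cloud, 2)
```

Clustering on the ground-plane projection ignores height, which suits objects that stand upright and are separated mainly by their position on the field.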

In the example of FIG. 4, as shown in [f], applying the embodiment of equation (2B) with n = 2 yields clustering results o_1 and o_2 corresponding to the objects OB1 and OB2. Consequently, as shown in [e], a clustering result corresponding to the objects OB1 and OB2 is also obtained for the original three-dimensional point cloud data.

<Neighborhood acquisition unit 3>
The role of the neighborhood acquisition unit 3 is to obtain, for each point, information on which points are its neighbors, so that the downstream extraction unit 5 can extract the object regions appropriately. Its significance is explained further in the description of the extraction unit 5 below. Specifically, for each point t of the point cloud data (the three-dimensional position coordinates of each point) of each individual object o_i obtained by the sorting unit 2 (as noted above, cluster o_i corresponds to individual object o_i), the neighborhood acquisition unit 3 acquires a predetermined number k of nearest neighbors pt_1, pt_2, …, pt_k (∈ o_i) within the same cluster o_i. To do so, the distance from each point t to the other points in the same cluster o_i is computed, and the k points with the smallest distances are selected. In other words, the processing of the neighborhood acquisition unit 3 can be expressed formally, as in equation (3), by a function find_nearest_point that finds the k nearest neighbors of point t within cluster o_i. Any existing k-nearest-neighbor (KNN) method may be used.
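A brute-force version of this neighbor search might look like the sketch below. The function name follows the text, but its signature (a list of 3D points, the index of t, and k) is an assumption made for illustration; a real system would more likely use a spatial index such as a k-d tree.

```python
def find_nearest_point(cluster, t_index, k):
    """Return the indices of the k points of `cluster` closest to cluster[t_index].

    The point t itself is excluded; distances are Euclidean in 3D.
    """
    t = cluster[t_index]

    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, t))

    others = [j for j in range(len(cluster)) if j != t_index]
    others.sort(key=lambda j: sq_dist(cluster[j]))
    return others[:k]


# One cluster o_i with five points; neighbours[i] lists the 2 nearest points of point i.
cluster = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
           (5.0, 5.0, 5.0), (0.0, 0.0, 0.5)]
neighbours = [find_nearest_point(cluster, i, 2) for i in range(len(cluster))]
```

Because the search runs inside a single cluster, a point never receives a neighbor that belongs to a different individual object.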

In one embodiment of the present invention, the downstream extraction unit 5 extracts regions by triangulation on the premise that the set {t, pt_1, pt_2, …, pt_k}, consisting of each point t and its k neighbors obtained by the neighborhood acquisition unit 3, forms a convex hull. That is, although the set may (rarely) fail to form a convex hull, this premise makes region extraction by triangulation possible and, as a result, also secures the accuracy of the extracted region.

If the point cloud generation unit 1 generates the point cloud reasonably densely, the distances in three-dimensional space between each point t and its k neighbors obtained by the neighborhood acquisition unit 3 become small, and when the downstream back projection unit 4 back-projects them onto the image plane, some members of the set {t, pt_1, pt_2, …, pt_k} may be back-projected onto the same pixel. Therefore, to raise the likelihood that at least three pixels are obtained from the set {t, pt_1, pt_2, …, pt_k} by back projection, it is desirable to make k larger than 3 (k ≥ 4); for example, k = 6 or k = 7. The downstream extraction unit 5 needs at least three mutually distinct pixels, judged to be neighbors by the neighborhood acquisition unit 3, in order to define a region by triangulation.

In the example of FIG. 5, for the point cloud of an individual object OB1 shown in [g], the enlarged partial view in [h] shows a point pt1 (one example of point t) together with its six neighboring points pt2 to pt7 obtained with k = 6. In this way, the neighborhood acquisition unit 3 obtains the neighboring points for every point t belonging to the point cloud of the individual object OB1 shown in [g].

<Back projection unit 4>
The back projection unit 4 back-projects the point cloud data of each individual object o_i obtained from the sorting unit 2 onto the image plane corresponding to the virtual viewpoint set by the user. The result can be represented as a binary image on the image plane (u, v) corresponding to that virtual viewpoint. That is, the output of the back projection unit 4 can be obtained as a binary image in which a pixel position (u, v) on the image plane has the value 1 (white) if at least one point of the point cloud data is back-projected onto it, and the value 0 (black) if no point is back-projected onto it.

As is well known, the back projection can be realized, as given by equation (4), by multiplying by the perspective projection matrix T^i_34 (3 rows by 4 columns) corresponding to the virtual viewpoint set by the user. The superscript i of T^i_34 is the identifier of the virtual viewpoint camera. In equation (4), [x y z 1] and [ru rv r] (both transposed into column vectors) are the homogeneous coordinate representations of, respectively, each point (x, y, z) of the point cloud to be back-projected and the pixel position (u, v) obtained as its back projection onto the image plane.
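Read from the description, equation (4) presumably amounts to [ru rv r]^T = T^i_34 [x y z 1]^T, with the pixel position recovered as (u, v) = (ru / r, rv / r). The sketch below applies that relation and rasterizes the result into the binary image described for the back projection unit 4. It is an illustration under assumptions: the function names, the rounding to the nearest pixel, the rule that points with r ≤ 0 are skipped, and the example matrix are all hypothetical.

```python
def project_point(T, point):
    """Apply a 3x4 perspective projection matrix T to (x, y, z).

    Returns the pixel (u, v), or None when the homogeneous scale r is not
    positive (assumed here to mean the point is not in front of the camera).
    """
    x, y, z = point
    h = (x, y, z, 1.0)  # homogeneous coordinates [x y z 1]
    ru, rv, r = (sum(T[row][col] * h[col] for col in range(4)) for row in range(3))
    if r <= 0:
        return None
    return (round(ru / r), round(rv / r))


def render_binary_mask(T, points, width, height):
    """Binary image: 1 where at least one point lands, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for p in points:
        uv = project_point(T, p)
        if uv is None:
            continue
        u, v = uv
        if 0 <= u < width and 0 <= v < height:
            mask[v][u] = 1
    return mask


# Hypothetical camera: focal length 100 px, principal point (50, 50), no rotation.
T = [[100.0, 0.0, 50.0, 0.0],
     [0.0, 100.0, 50.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
points = [(0.0, 0.0, 2.0), (0.25, -0.25, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
mask = render_binary_mask(T, points, 100, 100)
```

In this example the first two points land inside the 100 x 100 image, the third is rejected by the r ≤ 0 rule, and the fourth falls outside the image bounds.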

In the example of [i] in FIG. 5, the result of back projection by the back projection unit 4 for the point cloud of an individual object OB1 is shown in the white-or-black binary image format described above.

<Extraction unit 5>
The point cloud data of each individual object o_i on the image corresponding to the user-set virtual viewpoint, obtained by the back projection unit 4, is generally distributed discretely and sparsely on that image, as in the example of [i] in FIG. 5. The extraction unit 5 extracts the region occupied by the individual object o_i by filling in, among the regions formed between these discrete and sparse points, those judged to belong to the region occupied by o_i. The filling refers to the information on the k points judged by the neighborhood acquisition unit 3 to be neighbors of each point t, that is, the set {t, pt_1, pt_2, …, pt_k}. On the premise that this set of k nearest neighbors (in three-dimensional space) still forms a convex hull contained in the region occupied by o_i after the back projection unit 4 has projected it onto the two-dimensional image, the region occupied by o_i is extracted by triangulation (a process of choosing three vertices of a triangle and filling its interior).

Specifically, let {I_t, I_1, I_2, …, I_k} be the set (of pixel positions on the image) obtained when the back projection unit 4 back-projects the set {t, pt_1, pt_2, …, pt_k} (in three-dimensional space); triangulation by any existing method may then be applied to the point set {I_t, I_1, I_2, …, I_k}. The triangulation can be expressed formally by a function triangulation, as in equation (5). Its output T is a matrix of m rows and 3 columns, each of whose m rows (a row vector of size 3) stores the indices of the vertices of one constructed triangle.
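The local filling step can be sketched as below. Note the substitution: instead of a triangulation routine that returns an m x 3 index matrix T, this sketch simply fills every triangle spanned by three distinct pixels of the set {I_t, I_1, …, I_k}. The union of those triangles is the convex hull of the set, which is the region a triangulation of the same points would cover, at the cost of redundant fills. All names, and the inclusive edge test used for rasterization, are assumptions.

```python
from itertools import combinations


def edge(a, b, p):
    """Signed area test: positive if p is on one side of line a->b, negative on the other."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def fill_triangle(mask, a, b, c):
    """Set to 1 every pixel that lies inside or on the border of triangle abc."""
    height, width = len(mask), len(mask[0])
    u_lo, u_hi = max(min(a[0], b[0], c[0]), 0), min(max(a[0], b[0], c[0]), width - 1)
    v_lo, v_hi = max(min(a[1], b[1], c[1]), 0), min(max(a[1], b[1], c[1]), height - 1)
    for v in range(v_lo, v_hi + 1):
        for u in range(u_lo, u_hi + 1):
            e = (edge(a, b, (u, v)), edge(b, c, (u, v)), edge(c, a, (u, v)))
            # Inside when all three tests agree in sign (either vertex order).
            if all(x >= 0 for x in e) or all(x <= 0 for x in e):
                mask[v][u] = 1


def extract_region(pixels, neighbours, width, height):
    """pixels[i]: projected pixel of point i; neighbours[i]: indices of its k neighbors."""
    mask = [[0] * width for _ in range(height)]
    for t, nbrs in enumerate(neighbours):
        # Points that collapse onto the same pixel count only once.
        distinct = sorted({pixels[t], *(pixels[j] for j in nbrs)})
        if len(distinct) < 3:
            continue  # fewer than three distinct pixels: no triangle can be formed
        for a, b, c in combinations(distinct, 3):
            fill_triangle(mask, a, b, c)
    return mask


# Four projected points forming a square, each with the other three as neighbors.
pixels = [(1, 1), (5, 1), (1, 5), (5, 5)]
neighbours = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
region = extract_region(pixels, neighbours, 8, 8)
```

Since only triangles among nearby points are filled, gaps between distant parts of the same object are left empty as long as k is small relative to the point density.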

[j] in FIG. 5 shows an example of this triangulation in a local region of the image, and [k] shows an example in which the region of the individual object OB1 (a person) is obtained as the union of the results of performing this local triangulation over all points of the individual object OB1. For the person OB1, the head can be seen at the top, the torso and one hand in the middle, and both legs at the bottom. Because the triangulation is performed on points judged to be neighbors, as in equation (5), the regions of the two legs at the bottom, for example, are obtained appropriately without the region between the legs being filled in.

<Drawing unit 6>
The drawing unit 6 generates a free-viewpoint image at the virtual viewpoint by texture-mapping the region (foreground) of each individual object in the image corresponding to the user-set virtual viewpoint, obtained by the extraction unit 5, using one or more of the input multi-viewpoint images judged to be close to the virtual viewpoint. For the region (foreground) of each individual object, depth information can be supplied either by attaching depth information to the input multi-viewpoint images in advance or by applying stereo matching to the individual objects in the multi-viewpoint images; a billboard is generated from the region with this depth information, and the free-viewpoint image is then generated by any existing method (billboard method). Background information may be given in advance as known. As existing billboard-based methods for generating free-viewpoint images, the aforementioned Non-Patent Document 2 or the following may be used, for example.
HAYASHI, Kunihiko; SAITO, Hideo. Synthesizing free-viewpoint images from multiple view videos in soccer stadium. In: Computer Graphics, Imaging and Visualisation, 2006 International Conference on. IEEE, 2006. p. 220-225.
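The description of the drawing unit 6 does not state how closeness to the virtual viewpoint is judged. One simple possibility, shown here purely as an assumption, is to rank the real cameras by the Euclidean distance between their optical centers and that of the virtual camera; the function name and the example coordinates are hypothetical.

```python
def nearest_cameras(virtual_center, camera_centers, count=1):
    """Indices of the `count` cameras whose optical centers are closest to the virtual viewpoint."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, virtual_center))

    order = sorted(range(len(camera_centers)), key=lambda i: sq_dist(camera_centers[i]))
    return order[:count]


# Four cameras at the corners of a field, all 5 units above the ground.
cameras = [(0, 0, 5), (10, 0, 5), (0, 10, 5), (10, 10, 5)]
selected = nearest_cameras((8, 1, 5), cameras, count=2)
```

Other criteria, such as the angle between viewing directions, would fit the description equally well.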

As described above, according to the present invention, the region of each individual object needed to obtain the billboards for generating a free-viewpoint image can be computed quickly and accurately, so the free-viewpoint image itself can also be generated quickly and accurately. The following points, all or any subset of them, make this fast and accurate computation possible.
(1) Instead of generating a visual hull as a complete three-dimensional model, the point cloud generation unit 1 obtains point cloud data that approximates it and can be computed quickly.
(2) Because the point cloud data is generated from points on the boundary between foreground and background in each of the multi-viewpoint images, it reflects the shape accurately even though it is only an approximation of the visual hull.
(3) Without using a polyhedral model as in Non-Patent Document 3, the sorting unit 2 can quickly separate the multiple captured individual objects from one another by clustering.
(4) Even when an individual object has concavities, they can be handled using the information on the neighboring points of each point obtained by the neighborhood acquisition unit 3.
(5) The extraction unit 5 can obtain the region of each individual object by the simple process of local triangulation.

The present invention can also be provided as a program that causes a computer to function as the image processing device 10. The computer may have a well-known hardware configuration including a CPU (central processing unit), memory, and various interfaces, with the CPU executing instructions corresponding to the functions of each unit of the image processing device 10. The computer may further include a GPU (graphics processing unit), which can perform parallel processing faster than the CPU, and the functions of all or any part of the image processing device 10 may be executed by loading the program on the GPU instead of the CPU.

10 … image processing device, 1 … point cloud generation unit, 2 … sorting unit, 3 … neighborhood acquisition unit, 4 … back projection unit, 5 … extraction unit, 6 … drawing unit

Claims

An image processing device comprising:
a point cloud generation unit that obtains point cloud data of objects from multi-viewpoint images, based on the viewpoint position of each image and points on the boundary between foreground and background;
a sorting unit that clusters the point cloud data into point cloud data for each individual object;
a neighborhood acquisition unit that acquires the neighboring points of each point in the point cloud data for each individual object;
a back projection unit that back-projects the point cloud data for each individual object onto an image plane at a designated virtual viewpoint; and
an extraction unit that extracts the region of each individual object in the image plane as a region defined by connection relationships, by taking the acquired neighboring points of each point as the neighboring points to which each point on the image plane connects.

The image processing device according to claim 1, wherein the point cloud generation unit takes one of the multi-viewpoint images as a target image and another as a reference image, and obtains the point cloud of the objects from the line segment obtained by projecting, onto the line of sight in the target image that passes through a second point and a first point, the line segment that an epipolar plane cuts out of the foreground region of the reference image, the epipolar plane being determined by the first point, which lies on the boundary between foreground and background in the target image, the second point, which is the optical center of the target image, and a third point, which is the optical center of the reference image.

The image processing device according to claim 2, wherein the point cloud generation unit obtains the point cloud of the objects so as to include the points at both ends of the projected line segment.

The image processing device according to claim 2 or 3, wherein the point cloud generation unit obtains the point cloud of the objects after selecting the first points on the boundary between foreground and background in the target image at regular intervals, measured as distance on the image, along the boundary.

The image processing device according to any one of claims 1 to 4, wherein the sorting unit clusters the point cloud data based on the positional relationships on a predetermined plane when the point cloud data is projected onto the predetermined plane.

The image processing device according to any one of claims 1 to 5, wherein the extraction unit extracts the region of each individual object in the image plane as the union of triangular regions each having, as a side, a line segment formed between a point and one of its acquired neighboring points.

The image processing device according to any one of claims 1 to 6, further comprising a drawing unit that generates a free-viewpoint image at the designated virtual viewpoint based on the multi-viewpoint images and the extracted regions of the individual objects.

An image processing method comprising:
a point cloud generation step of obtaining point cloud data of objects from multi-viewpoint images, based on the viewpoint position of each image and points on the boundary between foreground and background;
a sorting step of clustering the point cloud data into point cloud data for each individual object;
a neighborhood acquisition step of acquiring the neighboring points of each point in the point cloud data for each individual object;
a back projection step of back-projecting the point cloud data for each individual object onto an image plane at a designated virtual viewpoint; and
an extraction step of extracting the region of each individual object in the image plane as a region defined by connection relationships, by taking the acquired neighboring points of each point as the neighboring points to which each point on the image plane connects.

An image processing program that causes a computer to function as an image processing device comprising:
a point cloud generation unit that obtains point cloud data of objects from multi-viewpoint images, based on the viewpoint position of each image and points on the boundary between foreground and background;
a sorting unit that clusters the point cloud data into point cloud data for each individual object;
a neighborhood acquisition unit that acquires the neighboring points of each point in the point cloud data for each individual object;
a back projection unit that back-projects the point cloud data for each individual object onto an image plane at a designated virtual viewpoint; and
an extraction unit that extracts the region of each individual object in the image plane as a region defined by connection relationships, by taking the acquired neighboring points of each point as the neighboring points to which each point on the image plane connects.