JP2024010527A

JP2024010527A - Feature map generation apparatus, feature map generation method, and program

Info

Publication number: JP2024010527A
Application number: JP2022111908A
Authority: JP
Inventors: 修二酒井; Shuji Sakai
Original assignee: Toppan Holdings Inc
Current assignee: Toppan Holdings Inc
Priority date: 2022-07-12
Filing date: 2022-07-12
Publication date: 2024-01-24

Abstract

PROBLEM TO BE SOLVED: To provide a feature map generation apparatus which generates a feature map which can be restored in a three-dimensional manner with high accuracy.

SOLUTION: A feature map generation apparatus 1 includes: a kernel deformation method determination unit 100 which, for a first target view image included in multi-view images, determines a first deformation method corresponding to the first target view image, using three-dimensional coordinates, a normal direction, and a first camera parameter corresponding to the first target view image, so that specific coordinates in a kernel are converted into coordinates of a corresponding point in the first target view image; a kernel deformation unit 101 which generates a first deformed kernel for the first target view image by deforming a reference kernel using the first deformation method; and a convolution operation unit 102 which extracts feature quantities in the first target view image by performing a convolution operation on the first target view image using the first deformed kernel, and generates a first feature map corresponding to the first target view image using the extracted feature quantities.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a feature map generation device, a feature map generation method, and a program.

Multi-view stereo is a technique that generates a three-dimensional shape model of a target object from a plurality of viewpoint images of the object captured from different viewpoints (hereinafter sometimes referred to as "multi-view images"). Multi-view stereo is attracting attention not only in the computer vision research community but also in a wide range of fields, such as digital archiving of cultural assets and the entertainment industry.

For example, Non-Patent Document 1 discloses a multi-view stereo technique based on deep learning. In deep-learning-based multi-view stereo, a deep learning network is trained on a multi-view image dataset containing multi-view images of various target objects with different characteristics, and the resulting trained network is used to reconstruct the three-dimensional shape of a target from new multi-view images. By including in the training dataset multi-view images of low-texture objects and strongly reflective objects, which are considered difficult to reconstruct with multi-view stereo, it is possible to build a trained network that is highly robust to low-texture regions and reflections.

As shown in FIG. 9, the trained network NW used in deep-learning-based multi-view stereo is generally built as a combination of four networks: a feature map generation network NW1, a cost volume construction network NW2, a cost volume regularization network NW3, and a depth map generation network NW4.

The inputs to the trained network NW are a group of multi-view images, consisting of a reference viewpoint image and a group of neighboring viewpoint images, and the camera parameters of each multi-view image. The output of the trained network NW is a depth map for the reference viewpoint image.

The feature map generation network NW1 generates a feature map indicating the feature quantity of each pixel in each input multi-view image. In the process of generating the feature map, the feature quantity of each pixel is extracted by performing a convolution operation on the multi-view image. The feature quantities in the feature maps are expected to take similar values at corresponding points between the multi-view images.

The cost volume construction network NW2 uses the feature map of each of the multi-view images to construct a cost volume consisting of a plurality of planes that lie at discrete depths and directly face the reference viewpoint image. For example, the cost volume is represented by voxels, and the value of each voxel is the variance or correlation between the feature quantities at corresponding points in the feature maps of the multiple views.
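
As a rough illustration of the variance-based voxel value described above, the following sketch computes, for a single voxel, the mean per-channel variance of the feature vectors sampled at the corresponding points in each view. The function name `variance_cost` and the toy feature vectors are assumptions made for illustration; they are not taken from this publication or from Non-Patent Document 1.

```python
def variance_cost(features):
    """Mean per-channel variance of the feature vectors, one vector per view.

    `features` is a list of equal-length lists: features[v][c] is channel c
    of the feature sampled at the corresponding point in view v.
    """
    n_views = len(features)
    n_channels = len(features[0])
    total = 0.0
    for c in range(n_channels):
        values = [f[c] for f in features]
        mean = sum(values) / n_views
        total += sum((v - mean) ** 2 for v in values) / n_views
    return total / n_channels


# Hypothetical features for one voxel seen from three views.
consistent = [[0.5, 0.25], [0.5, 0.25], [0.5, 0.25]]    # views agree
inconsistent = [[0.5, 0.25], [0.0, 1.0], [1.0, 0.5]]    # views disagree

low_cost = variance_cost(consistent)
high_cost = variance_cost(inconsistent)
```

A low variance indicates that the views agree at the hypothesized depth, which is why the variance can serve directly as a matching cost.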

The cost volume regularization network NW3 regularizes the cost volume through three-dimensional convolution layers. The depth map generation network NW4 generates a depth map by selecting, for each pixel of the reference viewpoint image, the depth with the optimal cost from among the depths represented in the cost volume for the reference viewpoint image. For example, when a cost based on the variance between feature quantities is used, the depth at which the cost (the variance) is smallest is the depth with the optimal cost. When a cost based on the correlation between feature quantities is used, the depth at which the cost (the correlation) is largest is the depth with the optimal cost.
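
The selection rule above can be sketched for a single pixel as follows. This is a minimal illustration, assuming the per-depth costs for that pixel are already available as a list; the function name `select_depth` and the `cost_type` argument are hypothetical.

```python
def select_depth(costs, depths, cost_type="variance"):
    """Pick the depth whose cost is optimal for one pixel.

    A variance-based cost is minimised; a correlation-based cost is maximised.
    """
    if len(costs) != len(depths):
        raise ValueError("costs and depths must have the same length")
    indices = range(len(costs))
    if cost_type == "variance":
        best = min(indices, key=lambda i: costs[i])
    elif cost_type == "correlation":
        best = max(indices, key=lambda i: costs[i])
    else:
        raise ValueError("cost_type must be 'variance' or 'correlation'")
    return depths[best]


# Hypothetical depth hypotheses and costs for one pixel.
depth_hypotheses = [1.0, 2.0, 3.0]
depth_from_variance = select_depth([0.4, 0.1, 0.3], depth_hypotheses)
depth_from_correlation = select_depth([0.2, 0.9, 0.5], depth_hypotheses, "correlation")
```

Repeating this selection for every pixel of the reference viewpoint image yields the depth map.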

Non-Patent Document 1: Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, and Long Quan, "MVSNet: Depth Inference for Unstructured Multi-view Stereo", European Conference on Computer Vision (ECCV), 2018.

However, when a feature map is generated with deep-learning-based multi-view stereo, the result is strongly affected by image deformation between the multi-view images. This is because a convolution operation is performed in the process of generating the feature map. In a convolution operation, the pixel value of a target pixel and the pixel values of its surrounding pixels are each multiplied by a convolution coefficient, and the products are combined. Consequently, when an image undergoes a translation, the combination of the target pixel and its surrounding pixels is the same as before the translation, so the result of the convolution operation at each pixel does not change. In contrast, when an image undergoes a deformation other than translation, such as rotation, scaling, or projective transformation, the combination of the target pixel and its surrounding pixels differs from that before the deformation, so the result of the convolution operation at each pixel changes.
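
The following toy example illustrates this behavior with a fixed 3x3 kernel. The image, the kernel coefficients, and the helper names are assumptions chosen for illustration. The response at the corresponding pixel is the same after a one-pixel translation but differs after a 90-degree rotation.

```python
def conv_at(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            total += kernel[dr + 1][dc + 1] * image[row + dr][col + dc]
    return total


def shift_right(image, pixels):
    """Translate the image to the right by `pixels` columns, padding with zeros."""
    return [[0] * pixels + row[:len(row) - pixels] for row in image]


def rotate_90(image):
    """Rotate a square image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 3, 0],
    [0, 4, 5, 6, 0],
    [0, 7, 8, 9, 0],
    [0, 0, 0, 0, 0],
]
# A horizontal-gradient kernel, used here only as an example.
kernel = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

original = conv_at(image, kernel, 2, 2)
# After translation, the corresponding pixel moves from column 2 to column 3.
translated = conv_at(shift_right(image, 1), kernel, 2, 3)
# After rotation about the centre, the corresponding pixel stays at (2, 2).
rotated = conv_at(rotate_90(image), kernel, 2, 2)
```

Here `original` and `translated` are equal, while `rotated` takes a different value, because the rotated neighbourhood meets the kernel coefficients in a different arrangement.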

Multi-view images are images of the same target object captured from a plurality of mutually different viewpoints. Complex image deformation therefore arises between multi-view images, depending on the position and orientation of each viewpoint and on the shape of the target. In the case of parallel stereo with a short baseline, the image deformation between viewpoints is relatively small, so the feature quantities at corresponding points in the generated feature maps do not change greatly when the viewpoint changes. However, for a stereo pair with a long baseline, or when there is rotation between viewpoints, the deformation between images can no longer be approximated by translation alone, and the feature quantities at corresponding points in the generated feature maps vary from viewpoint to viewpoint.

Deep-learning-based multi-view stereo, on the other hand, assumes that when the true depth is hypothesized for a pixel of the reference viewpoint image, the feature quantities at the corresponding points across the multi-view images are close to one another. Therefore, when the input includes neighboring viewpoint images that are strongly deformed relative to the reference viewpoint image, such as a neighboring viewpoint image with a long baseline or one with rotation between the viewpoints, the three-dimensional reconstruction accuracy drops significantly. Consequently, with multi-view images that have large image deformation between images, it has been difficult to generate feature maps that enable highly accurate three-dimensional reconstruction.

The present invention has been made in view of the above problem, and aims to provide a feature map generation device, a feature map generation method, and a program capable of generating feature maps that enable highly accurate three-dimensional reconstruction even when multi-view images with large image deformation are used.

A feature map generation device of the present invention is a feature map generation device that generates feature maps for two or more multi-view images in which a target object is imaged from a plurality of mutually different viewpoints, the device comprising: a kernel deformation method determination unit that, for a first target viewpoint image included in the multi-view images, determines a first deformation method, which is a method of deforming a kernel corresponding to the first target viewpoint image, using three-dimensional coordinates, a normal direction, and a first camera parameter, which is the camera parameter corresponding to the first target viewpoint image, so that specific coordinates in the kernel are converted into the coordinates of a corresponding point in the first target viewpoint image; a kernel deformation unit that generates a first deformed kernel, which is the kernel used in a convolution operation on the first target viewpoint image, by deforming a reference kernel using the first deformation method; and a convolution operation unit that extracts feature quantities in the first target viewpoint image by performing a convolution operation on the first target viewpoint image using the first deformed kernel, and generates a first feature map corresponding to the first target viewpoint image using the extracted feature quantities.

In the feature map generation device of the present invention, the kernel deformation method determination unit acquires a second camera parameter, which is the camera parameter corresponding to a second target viewpoint image that is included in the multi-view images and differs from the first target viewpoint image, and determines the first deformation method using the three-dimensional coordinates, the normal direction, the first camera parameter, and the second camera parameter so that the coordinates of a corresponding point in the second target viewpoint image are converted into the coordinates of the corresponding point in the first target viewpoint image.

In the feature map generation device of the present invention, the kernel deformation method determination unit determines a kernel deformation method for each combination of three-dimensional coordinates, normal direction, and camera parameter, and the kernel deformation unit generates the first deformed kernel corresponding to each of the combinations by deforming the same, common reference kernel using the deformation method determined for that combination.

In the feature map generation device of the present invention, the kernel deformation method determination unit determines the first deformation method for each pixel of the first target viewpoint image using at least one of three-dimensional coordinates and a normal direction set for each pixel of the first target viewpoint image; the kernel deformation unit generates the first deformed kernel corresponding to each pixel of the first target viewpoint image by deforming the reference kernel, pixel by pixel, using the first deformation method determined for that pixel; and the convolution operation unit generates the first feature map by performing the convolution operation at each pixel of the first target viewpoint image using the first deformed kernel generated for that pixel.

In the feature map generation device of the present invention, the kernel deformation unit takes the kernel obtained by deforming the reference kernel using the first deformation method as a first provisionally deformed kernel, and generates the first deformed kernel by performing interpolation on the first provisionally deformed kernel using coordinates arranged in a square grid.

In the feature map generation device of the present invention, the kernel deformation method determination unit determines the first deformation method corresponding to each of a plurality of mutually different normal directions; the kernel deformation unit generates the first deformed kernel corresponding to each of the normal directions by deforming the reference kernel using the first deformation method determined for that normal direction; and the convolution operation unit generates the first feature map corresponding to each of the normal directions by performing a convolution operation on the first target viewpoint image using the first deformed kernel generated for that normal direction.

The feature map generation device of the present invention further comprises a cost value calculation unit that calculates a cost value from corresponding points in a plurality of the feature maps. The cost value calculation unit calculates, for each of the normal directions, a provisional cost value that is computed from the feature quantities of the corresponding points in the first feature maps for that normal direction and is based on the degree to which those feature quantities are similar, and, among the provisional cost values calculated for the respective normal directions, takes as the cost value the provisional cost value for which the feature quantities of the corresponding points are similar.

A feature map generation method of the present invention is a feature map generation method performed by a feature map generation device that generates feature maps for two or more multi-view images in which a target object is imaged from a plurality of mutually different viewpoints, wherein: a kernel deformation method determination unit, for a first target viewpoint image included in the multi-view images, determines a first deformation method, which is a method of deforming a kernel corresponding to the first target viewpoint image, using three-dimensional coordinates, a normal direction, and a first camera parameter, which is the camera parameter corresponding to the first target viewpoint image, so that specific coordinates in the kernel are converted into the coordinates of a corresponding point in the first target viewpoint image; a kernel deformation unit generates a first deformed kernel, which is the kernel used in a convolution operation on the first target viewpoint image, by deforming a reference kernel using the first deformation method; and a convolution operation unit extracts feature quantities in the first target viewpoint image by performing a convolution operation on the first target viewpoint image using the first deformed kernel, and generates a first feature map corresponding to the first target viewpoint image using the extracted feature quantities.

A program of the present invention is a program that causes a feature map generation device, which generates feature maps for two or more multi-view images in which a target object is imaged from a plurality of mutually different viewpoints, to generate a feature map, the program causing the device to: determine, for a first target viewpoint image included in the multi-view images, a first deformation method, which is a method of deforming a kernel corresponding to the first target viewpoint image, using three-dimensional coordinates, a normal direction, and a first camera parameter, which is the camera parameter corresponding to the first target viewpoint image, so that specific coordinates in the kernel are converted into the coordinates of a corresponding point in the first target viewpoint image; generate a first deformed kernel, which is the kernel used in a convolution operation on the first target viewpoint image, by deforming a reference kernel using the first deformation method; and extract feature quantities in the first target viewpoint image by performing a convolution operation on the first target viewpoint image using the first deformed kernel, and generate a first feature map corresponding to the first target viewpoint image using the extracted feature quantities.

According to the present invention, feature maps that enable highly accurate three-dimensional reconstruction can be generated even when multi-view images with large image deformation between images are used.

The drawings are, in order: a block diagram showing the configuration of the feature map generation device 1 of the embodiment (FIG. 1); three diagrams for explaining the processing performed by the feature map generation device 1 of the embodiment (including FIGS. 2 and 3); a diagram illustrating the process of generating a depth map using the feature map generation device 1 of the embodiment; a flowchart showing the flow of the processing performed by the feature map generation device 1 of the embodiment; three diagrams for explaining the effects of the embodiment; and a diagram for explaining a conventional example (FIG. 9).

Hereinafter, the feature map generation device 1 of the embodiment will be described with reference to the drawings.

FIG. 1 is a block diagram showing an example of the configuration of the feature map generation device 1 according to the present embodiment. As shown in FIG. 1, the feature map generation device 1 includes, for example, a kernel deformation method determination unit 100, a kernel deformation unit 101, a convolution operation unit 102, a cost value calculation unit 103, a cost volume regularization unit 104, a depth map generation unit 105, a three-dimensional point cloud generation unit 106, a multi-view image storage unit 107, a feature map storage unit 108, a three-dimensional information storage unit 109, a reference kernel storage unit 110, a deformed kernel storage unit 111, a cost volume storage unit 112, a depth map storage unit 113, and a three-dimensional point cloud storage unit 114.

The kernel deformation method determination unit 100 determines a deformation method for deforming a kernel. Here, a kernel is an array that associates convolution coefficients with two-dimensional image coordinates (see, for example, equation (5)). The kernel is used when performing a convolution operation with each pixel of an image.

The process by which the kernel deformation method determination unit 100 determines the kernel deformation method will now be described with reference to FIGS. 2 and 3. FIGS. 2 and 3 are diagrams for explaining the processing performed by the feature map generation device 1 of the embodiment.

FIG. 2 schematically shows an example in which a target object (Target Object) is imaged from mutually different viewpoints (View1 to View3). Let the camera parameters at viewpoint View1 be {K_1, R_1, t_1}, those at viewpoint View2 be {K_2, R_2, t_2}, and those at viewpoint View3 be {K_3, R_3, t_3}. Here, K denotes the intrinsic parameters, R the rotation matrix, and t the translation vector of the camera parameters.
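
To make the role of {K, R, t} concrete, the sketch below projects a three-dimensional point M into an image under the common pinhole convention x ~ K(RM + t). That convention (world-to-camera rotation and translation, 3x3 intrinsic matrix) is an assumption here, since this part of the description does not state it, and the function name `project` and the numeric values are hypothetical.

```python
def project(K, R, t, M):
    """Project world point M to pixel coordinates, assuming x ~ K (R M + t)."""
    # World coordinates -> camera coordinates.
    cam = [sum(R[i][j] * M[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Camera coordinates -> homogeneous pixel coordinates.
    hom = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return (hom[0] / hom[2], hom[1] / hom[2])


# Hypothetical camera: focal length 500 px, principal point (320, 240).
K_1 = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R_1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_1 = [0.0, 0.0, 0.0]

pixel = project(K_1, R_1, t_1, [0.5, -0.25, 2.0])
```

With a different {K, R, t} for each of View1 to View3, the same surface point M lands at a different pixel in each image; those pixels are the corresponding points.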

For the target object, a pair {M, n} of three-dimensional coordinates M on the surface of the target object and a normal direction n can be defined.

As shown in FIG. 2, the appearance of the target object changes with the imaging position and orientation of the camera for each multi-view image. For example, when the target object is imaged from viewpoint View1, an image of the target object seen from the front is captured. When it is imaged from viewpoint View2, the captured image is inverted vertically and horizontally, and enlarged, relative to View1. When it is imaged from viewpoint View3, an image of the target object seen from an oblique direction is captured.

Thus, in multi-view images of the same target object captured from different viewpoints, image deformations other than translation, such as rotation, scaling, and projective transformation, often arise between images. When such a deformation arises between images, the results of the convolution operation (feature quantities) extracted from each image with an ordinary convolution operation take different values at corresponding points in each image, so the three-dimensional reconstruction accuracy drops significantly.

In contrast, in the present embodiment, the kernel used for the convolution operation is deformed in accordance with the image deformation, and the convolution operation on the image is performed using the kernel after deformation (the deformed kernel described later). In this way, even when the image deformation between images is large, the results of the convolution operation (feature quantities) extracted from the images take similar values at corresponding points in each image, which suppresses the loss of three-dimensional reconstruction accuracy.

The kernel deformation method determination unit 100 determines how to deform the kernel in accordance with the image deformation. Specifically, the kernel deformation method determination unit 100 determines how to deform the kernel used for the convolution operation according to the relationship between the viewpoint of the image and the pair {M, n} of three-dimensional coordinates M on the surface of the target object and its normal direction n. The process by which the kernel deformation method determination unit 100 determines how to deform the kernel used for the convolution operation is described below.

FIG. 3 schematically shows an example of deforming a kernel. The kernel deformation method determination unit 100 determines a deformation method that deforms the reference kernel (Common Kernel) so that two-dimensional coordinates set in the reference kernel, for example on a square grid, become the two-dimensional coordinates of the corresponding points in each image.

For example, the kernel deformation method determination unit 100 determines the deformation method corresponding to viewpoint View1 so that the two-dimensional coordinates P00, P10, P20, and P30 in the reference kernel are converted into the two-dimensional coordinates P01, P11, P21, and P31 of the corresponding points in viewpoint View1, for example the positions of the right eye, the left eye, the right corner of the mouth, and the left corner of the mouth of the target object, respectively.

The kernel deformation method determination unit 100 determines the deformation method corresponding to viewpoint View2 so that the two-dimensional coordinates P00, P10, P20, and P30 in the reference kernel are converted into the two-dimensional coordinates P02, P12, P22, and P32 of the corresponding points in viewpoint View2, respectively.

The kernel deformation method determination unit 100 determines the deformation method corresponding to viewpoint View3 so that the two-dimensional coordinates P00, P10, P20, and P30 in the reference kernel are converted into the two-dimensional coordinates P03, P13, P23, and P33 of the corresponding points in viewpoint View3, respectively.
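
One way to picture the mapping from reference-kernel coordinates to image coordinates is sketched below. It is an illustrative interpretation, not the method defined by the equations of this publication: the reference kernel's square grid is laid on the tangent plane at M (perpendicular to n) and each grid point is projected into the view with the pinhole model x ~ K(RM + t). The tangent-plane construction, the grid spacing `step`, the choice of in-plane axes, and all function names are assumptions.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def tangent_basis(n):
    """Two orthogonal unit vectors spanning the plane perpendicular to n."""
    n = normalize(n)
    helper = [1.0, 0.0, 0.0] if abs(n[0]) < 0.9 else [0.0, 1.0, 0.0]
    u = normalize(cross(helper, n))
    v = cross(n, u)
    return u, v


def project(K, R, t, M):
    """Pinhole projection, assuming x ~ K (R M + t)."""
    cam = [sum(R[i][j] * M[j] for j in range(3)) + t[i] for i in range(3)]
    hom = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    return (hom[0] / hom[2], hom[1] / hom[2])


def deformed_kernel_coords(K, R, t, M, n, size=3, step=0.01):
    """Image coordinates of a size x size kernel grid laid on the plane {M, n}."""
    u, v = tangent_basis(n)
    half = size // 2
    coords = []
    for i in range(-half, half + 1):
        row = []
        for j in range(-half, half + 1):
            point = [M[k] + step * (j * u[k] + i * v[k]) for k in range(3)]
            row.append(project(K, R, t, point))
        coords.append(row)
    return coords


K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
M = [0.0, 0.0, 2.0]

frontal = deformed_kernel_coords(K, R, t, M, [0.0, 0.0, -1.0])   # surface faces the camera
oblique = deformed_kernel_coords(K, R, t, M, [1.0, 0.0, -1.0])   # surface tilted by 45 degrees
```

For the frontal surface the projected grid stays square, whereas for the tilted surface it is foreshortened along one axis. Sampling the image at these positions, rather than on a fixed square grid, is what lets the same reference coefficients meet the same surface points in every view.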

The kernel deformation method determination unit 100 may also determine the deformation method according to the image deformation between two images.

For example, the kernel deformation method determination unit 100 determines the deformation method such that coordinates in viewpoint View1 are transformed into the coordinates of the corresponding points in viewpoint View2. Specifically, it determines the deformation method for View2 such that the two-dimensional coordinates P01, P11, P21, and P31 in View1 are transformed into the corresponding points in View2, namely the two-dimensional coordinates P02, P12, P22, and P32. Likewise, it determines the deformation method for View3 such that the two-dimensional coordinates P01, P11, P21, and P31 in View1 are transformed into the corresponding points in View3, namely the two-dimensional coordinates P03, P13, P23, and P33.

Returning to FIG. 1, the kernel deformation method determination unit 100 acquires, from the multi-view image storage unit 107, the camera parameters corresponding to each viewpoint of the multi-view image. It acquires the coordinates (position coordinates) and the normal direction of a three-dimensional point from the three-dimensional information storage unit 109. From the camera parameters of each viewpoint, the coordinates of the three-dimensional point, and the normal direction, it determines how the kernel is to be deformed for the target viewpoint on which the convolution operation is performed.

When a projective transformation is used as the kernel deformation method, for example, the kernel deformation method determination unit 100 calculates a projective transformation matrix H. The projective transformation matrix H can be calculated using equation (1).

Here, the coordinates M of the three-dimensional point and the normal direction n in equation (1) are given in the camera coordinate system of the reference viewpoint image. The projective transformation matrix H of equation (1) is the coordinate transformation matrix that maps two-dimensional image coordinates in the reference viewpoint image to two-dimensional image coordinates in the target viewpoint image via the three-dimensional plane defined by the coordinates M and the normal direction n. The target viewpoint image is the image, among the multi-view images, on which the convolution operation is performed.
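Equation (1) itself is not reproduced in this text. As a hedged illustration only, the following sketch computes the standard plane-induced homography from textbook multi-view geometry, which matches the description above (a mapping from reference-view pixels to target-view pixels via the plane defined by M and n). The function names, the intrinsic matrices `K_ref` and `K_tgt`, and the pose convention `X_tgt = R @ X_ref + t` are assumptions for illustration and may differ from the form used in equation (1).

```python
import numpy as np

def plane_induced_homography(K_ref, K_tgt, R, t, M, n):
    """Homography from reference-view pixels to target-view pixels.

    Assumptions (not taken from the source): M and n are expressed in the
    reference camera frame, and a point moves between frames as
    X_tgt = R @ X_ref + t.
    """
    n = n / np.linalg.norm(n)
    d = float(n @ M)  # plane equation in the reference frame: n . X = d
    # For X on the plane, R X + t = (R + t n^T / d) X.
    return K_tgt @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_ref)

def apply_homography(H, xy):
    """Map one 2D point through H and dehomogenise."""
    q = H @ np.array([xy[0], xy[1], 1.0])
    return q[:2] / q[2]
```

The plane must not pass through the reference camera centre (d must be non-zero); a homography is defined only up to scale, so no normalisation is applied here.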

By performing the convolution operation with a kernel deformed by the deformation method determined in this way, when the coordinates and normal direction of the three-dimensional point are close to the three-dimensional coordinates and normal direction of the actual object surface, the change in feature quantities caused by image deformation between the multi-view images is reduced. Similar feature quantities can therefore be extracted at the corresponding points of each viewpoint even for multi-view images with large image deformation.

The kernel deformation method determination unit 100 determines the kernel deformation method using, for example, the camera parameters of two viewpoints: those of the reference viewpoint image and those of the target viewpoint image. Alternatively, the kernel deformation method determination unit 100 may determine the kernel deformation method using only the camera parameters of the target viewpoint image, that is, the camera parameters of a single viewpoint.

When only the camera parameters of the single viewpoint of the target viewpoint image are used, the kernel deformation method determination unit 100 uses the coordinates and normal direction of a three-dimensional point given in the camera coordinate system of the target viewpoint image.

When only the camera parameters of one viewpoint are used, the kernel deformation method determination unit 100 determines as the kernel deformation method, for example, a projective transformation matrix that rotates a plane directly facing the camera at the target viewpoint toward the given normal direction. Using only the camera parameters of one viewpoint has the advantage that feature quantities can be extracted from each viewpoint image of the multi-view image independently, that is, without regard to the reference viewpoint image.

On the other hand, using the camera parameters of two viewpoints, namely those of the reference viewpoint image and those of the target viewpoint image, has the advantage that feature quantities consistent across a plurality of viewpoints can be extracted.

The kernel deformation method determination unit 100, for example, reads one set of three-dimensional point coordinates and normal direction from the three-dimensional information storage unit 109 for one pair consisting of the camera parameters of a reference viewpoint image and the camera parameters of a target viewpoint image, and determines one projective transformation matrix.

Alternatively, the kernel deformation method determination unit 100 may read, from the three-dimensional information storage unit 109, three-dimensional point coordinates and normal directions that differ for each pixel of the target viewpoint image, and determine a different projective transformation matrix for each pixel of the target viewpoint image.

In general, the three-dimensional coordinates and normal direction of the surface of the target object differ for each pixel of the target viewpoint image. Determining a different projective transformation matrix for each pixel, using per-pixel three-dimensional point coordinates and normal directions, is therefore likely to yield a smaller final three-dimensional reconstruction error. On the other hand, deforming the kernel with a different deformation method for each pixel increases the computational cost, so using a single deformation method per viewpoint shortens the processing time.

The kernel deformation method determination unit 100, for example, reads one normal direction from the three-dimensional information storage unit 109 for one set consisting of the camera parameters of a reference viewpoint image, the camera parameters of a target viewpoint image, and the coordinates of a three-dimensional point, and determines one projective transformation matrix.

Alternatively, the kernel deformation method determination unit 100 may read two or more normal directions from the three-dimensional information storage unit 109 for one set consisting of the camera parameters of a reference viewpoint image, the camera parameters of a target viewpoint image, and the coordinates of a three-dimensional point, and determine a plurality of projective transformation matrices, one for each normal direction.

In general, the more normal directions are considered, the more likely it is that the feature maps computed by the convolution operation unit 102 (described later) for the respective normal directions include one that is consistent with the feature maps of the other viewpoints. Conversely, the fewer normal directions are considered, the lower the computational cost.

The kernel deformation unit 101 generates a deformed kernel. The kernel deformation unit 101 acquires the reference kernel, for example, by reading it from the reference kernel storage unit 110. It generates the deformed kernel by deforming the reference kernel according to the kernel deformation method determined by the kernel deformation method determination unit 100, and writes the generated deformed kernel to the deformed kernel storage unit 111.

For example, when the kernel deformation method is a projective transformation expressed by the projective transformation matrix H, the kernel deformation unit 101 generates the deformed kernel by deforming the reference kernel using equation (2).
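Equation (2) is not reproduced in this text. The following is a minimal sketch of one plausible reading, in which each tap position of a square reference kernel centred on a pixel is mapped through H and the deformed tap offsets are taken relative to the mapped centre. The function names and the choice to express the result as offsets are assumptions for illustration, not the formulation of equation (2).

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through H and dehomogenise."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    q = pts_h @ H.T
    return q[:, :2] / q[:, 2:3]

def deform_kernel_offsets(H, center, size=3):
    """Return the (size*size, 2) real-valued tap offsets of the deformed kernel.

    center: (2,) pixel position of the kernel centre in the source frame.
    Offsets are ordered row by row (y outer, x inner), as (dx, dy) pairs.
    """
    r = size // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    offsets = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    warped = apply_homography(H, center + offsets)
    warped_center = apply_homography(H, center[None, :])[0]
    return warped - warped_center
```

With an identity H the offsets are the usual integer grid; with any other H they are generally real-valued, which is the situation handled by the interpolation step described later.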

When a plurality of different kernel deformation methods are given, the kernel deformation unit 101 may deform the same reference kernel with each of them to generate a plurality of different deformed kernels.

In other words, the kernel deformation unit 101 generates different deformed kernels from the same reference kernel depending on the combination of target viewpoint image, three-dimensional point coordinates, and normal direction. For different target viewpoint images, different three-dimensional point coordinates, and different normal directions, it uses the same common reference kernel to generate a deformed kernel adapted to each target viewpoint image, three-dimensional point, and normal direction.

Even when the reference kernel has integer two-dimensional image coordinates arranged on a square grid, the two-dimensional image coordinates of the deformed kernel may be real-valued, depending on the deformation method. When the deformed kernel has real-valued coordinates, the kernel deformation unit 101 may interpolate it at integer two-dimensional image coordinates arranged on a square grid to generate a deformed kernel having integer two-dimensional image coordinates. Here, the deformed kernel having real-valued two-dimensional image coordinates is an example of a "provisional deformed kernel". Bilinear interpolation, for example, can be used as the interpolation method.
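As a hedged sketch of this interpolation step, the code below distributes the weight of each real-coordinate tap over its four neighbouring integer grid positions using bilinear weights. The function name, the `radius` parameter, and the decision to drop contributions that fall outside the integer grid are assumptions for illustration.

```python
import numpy as np

def splat_kernel_bilinear(offsets, weights, radius):
    """Resample a kernel with real-valued tap offsets onto an integer grid.

    offsets: iterable of (dx, dy) real-valued tap positions.
    weights: kernel weight of each tap.
    radius:  half-size of the output grid, which is (2*radius+1) square.
    Contributions landing outside the grid are dropped (an assumption).
    """
    size = 2 * radius + 1
    grid = np.zeros((size, size))
    for (dx, dy), w in zip(offsets, weights):
        x0, y0 = int(np.floor(dx)), int(np.floor(dy))
        fx, fy = dx - x0, dy - y0
        neighbours = (
            (x0,     y0,     (1 - fx) * (1 - fy)),
            (x0 + 1, y0,     fx * (1 - fy)),
            (x0,     y0 + 1, (1 - fx) * fy),
            (x0 + 1, y0 + 1, fx * fy),
        )
        for ix, iy, c in neighbours:
            gx, gy = ix + radius, iy + radius
            if 0 <= gx < size and 0 <= gy < size:
                grid[gy, gx] += w * c
    return grid
```

A tap that already sits on an integer position keeps its full weight at that position, so an undeformed kernel is reproduced unchanged.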

An example in which the kernel deformation unit 101 generates a deformed kernel having integer two-dimensional image coordinates will now be described with reference to FIG. 4. FIG. 4 is a diagram for explaining the processing performed by the feature map generation device 1 of the embodiment.

For each viewpoint (View1 to View3) in FIG. 4, the second panel from the left schematically shows the deformed kernel for that viewpoint mapped onto a 5×5 square grid. As this example shows, when the two-dimensional image coordinates of the deformed kernel are real-valued, they are mapped to arbitrary points on the square grid.

For each viewpoint (View1 to View3) in FIG. 4, the third panel from the left schematically shows a deformed kernel having integer two-dimensional image coordinates, generated by applying bilinear interpolation (Bilinear Interpolation) to the deformed kernel having real-valued coordinates.

As a result, the deformed kernel has integer two-dimensional image coordinates, like a kernel used in an ordinary convolution operation. An ordinary convolution routine can therefore be used as-is to perform the convolution operation with the deformed kernel.

When the deformed kernel has real-valued coordinates, the convolution operation unit 102 (described later) must perform interpolation at every pixel on which the convolution operation is carried out, which increases the computational cost. Giving the deformed kernel integer coordinates removes the need for interpolation in the per-pixel computation, so the time required for the processing performed by the convolution operation unit 102 can be shortened.

The convolution operation unit 102 performs the convolution operation. The convolution operation unit 102 acquires the deformed kernel, for example, by reading it from the deformed kernel storage unit 111, and acquires the target viewpoint image by reading it from the multi-view image storage unit 107. It performs the convolution operation on the target viewpoint image using the deformed kernel, generates a feature map in which the results of the convolution operation are associated with the target viewpoint image as feature quantities, and stores the generated feature map in the feature map storage unit 108.

Performing the convolution operation with the deformed kernel reduces the degree to which the feature quantities (the results of the convolution operation) change because of image deformation, even when the image deformation between the multi-view images is large, so that the feature quantities extracted at the corresponding points of each viewpoint take similar values.

Multi-view stereo based on deep learning presupposes that, in the feature maps generated from the images of the respective viewpoints, the feature quantities at the corresponding points of each viewpoint are equal or very close when the true three-dimensional coordinates are given. Therefore, if feature quantities can be extracted so that corresponding points have similar values even between images with large image deformation, a three-dimensional shape can be reconstructed with high accuracy.

When the deformed kernel has integer two-dimensional image coordinates arranged on a square grid, the convolution operation unit 102 performs an ordinary convolution operation on the target viewpoint image. When the deformed kernel has real-valued two-dimensional image coordinates, the convolution operation unit 102, for example, calculates the pixel values of the target viewpoint image at the real-valued two-dimensional image coordinates by interpolating the pixel values at the surrounding integer two-dimensional image coordinates, and performs the convolution operation on the calculated pixel values using the kernel having real-valued two-dimensional image coordinates.

Bilinear interpolation, for example, is used as the interpolation method. The following two procedures yield the same result: (a) performing the convolution operation with the kernel having real-valued two-dimensional image coordinates on pixel values obtained by interpolation at those real-valued coordinates, and (b) interpolating the kernel having real-valued two-dimensional image coordinates onto the integer coordinate system and performing the convolution operation with the interpolated kernel on the pixel values at integer two-dimensional image coordinates.
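The equivalence stated above follows from the linearity of bilinear interpolation and can be checked numerically. The sketch below computes the response at a single pixel both ways; all function names are hypothetical, the kernel is applied without flipping (as is customary in convolutional networks), and the taps are assumed to stay inside the image and inside the integer grid of half-size `radius`.

```python
import numpy as np

def bilinear_weights(x, y):
    """Four integer neighbours of (x, y) with their bilinear weights."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    fx, fy = x - x0, y - y0
    return ((x0, y0, (1 - fx) * (1 - fy)), (x0 + 1, y0, fx * (1 - fy)),
            (x0, y0 + 1, (1 - fx) * fy), (x0 + 1, y0 + 1, fx * fy))

def response_by_image_interpolation(img, px, py, offsets, weights):
    """Sample the image at real-valued tap positions, then take the weighted sum."""
    total = 0.0
    for (dx, dy), w in zip(offsets, weights):
        sample = sum(c * img[iy, ix] for ix, iy, c in bilinear_weights(px + dx, py + dy))
        total += w * sample
    return total

def response_by_kernel_interpolation(img, px, py, offsets, weights, radius):
    """Resample the kernel onto an integer grid, then apply it to integer pixels."""
    size = 2 * radius + 1
    grid = np.zeros((size, size))
    for (dx, dy), w in zip(offsets, weights):
        for ix, iy, c in bilinear_weights(dx, dy):
            grid[iy + radius, ix + radius] += w * c
    patch = img[py - radius:py + radius + 1, px - radius:px + radius + 1]
    return float((grid * patch).sum())

rng = np.random.default_rng(0)
image = rng.random((9, 9))
taps = rng.uniform(-1.5, 1.5, size=(9, 2))   # real-valued (dx, dy) offsets
tap_weights = rng.normal(size=9)
print(response_by_image_interpolation(image, 4, 4, taps, tap_weights))
print(response_by_kernel_interpolation(image, 4, 4, taps, tap_weights, radius=2))
```

The second form interpolates once per kernel rather than once per pixel, which is the source of the processing-time advantage described earlier.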

When the deformed kernel is a single kernel determined for one target viewpoint image, the convolution operation unit 102 performs the convolution operation using that same deformed kernel across the entire target viewpoint image.

On the other hand, when the deformed kernel is determined for each pixel of the target viewpoint image and thus differs from pixel to pixel, the convolution operation unit 102 performs the convolution operation on the target viewpoint image using a different deformed kernel for each pixel.

The convolution operation unit 102 may generate feature maps by performing the convolution operation on one target viewpoint image using a plurality of deformed kernels. For example, the convolution operation unit 102 reads a plurality of deformed kernels from the deformed kernel storage unit 111, reads one target viewpoint image from the multi-view image storage unit 107, and performs the convolution operation on the read target viewpoint image with each of the different deformed kernels in turn. In this case, a plurality of feature maps, one for each deformed kernel, is generated for the one target viewpoint image.

The cost value calculation unit 103 calculates cost values from a plurality of feature maps. A cost value expresses the degree to which the corresponding points in the feature maps are similar; it is, for example, a value indicating the variance or the correlation among the feature quantities at the corresponding points.

Consider the case where a pair {M_1, n_1} of three-dimensional point coordinates and a normal direction is given. In this case, the kernel deformation method determination unit 100, the kernel deformation unit 101, and the convolution operation unit 102 operate in cooperation on each of the target viewpoint images {I_0, I_1, ...} included in the multi-view image, generating the feature maps {f_{0,1}, f_{1,1}, ...} of the respective target viewpoint images under the pair {M_1, n_1}.

The cost value C_1 at the three-dimensional coordinates M = [X, Y, Z]^T for which the cost value is calculated is then a value indicating the variance among the feature quantities at the points corresponding to M in the feature maps {f_{0,1}, f_{1,1}, ...} of the respective target viewpoint images.

Similarly, consider the case where a pair {M_1, n_2} with a different normal direction is given for the same three-dimensional point coordinates. In this case, the feature maps {f_{0,2}, f_{1,2}, ...} of the target viewpoint images included in the multi-view image are generated. The cost value calculation unit 103 calculates, as the cost value C_2, a value indicating the variance among the feature quantities at the corresponding points of the feature maps {f_{0,2}, f_{1,2}, ...}.

The cost value calculation unit 103 takes, as the final cost value C, the minimum element of the set {C_1, C_2} of cost values obtained for the two normal directions at the same three-dimensional point coordinates.

The above description illustrates calculating the final cost value C from two normal directions for the same three-dimensional point coordinates. When three or more normal directions are given for the same three-dimensional point coordinates, a set of cost values {C_1, C_2, ...} can be generated in the same way, and the cost value calculation unit 103 takes the minimum element of that set as the final cost value.
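The variance-based cost and the minimum over normal-direction hypotheses can be sketched as follows. The array layout (one feature vector per view at the corresponding point) and the averaging of per-channel variances into a single scalar are assumptions for illustration; the source specifies only that the cost indicates the variance among feature quantities.

```python
import numpy as np

def variance_cost(features):
    """Cost for one normal hypothesis.

    features: (num_views, channels) array holding the feature vector found
    at the corresponding point in each view's feature map.
    Returns the per-channel variance across views, averaged over channels
    (the averaging is an assumption).
    """
    return float(np.var(features, axis=0).mean())

def final_cost(features_per_normal):
    """Minimum cost over normal hypotheses, plus the index of the best one."""
    costs = [variance_cost(f) for f in features_per_normal]
    best = int(np.argmin(costs))
    return costs[best], best
```

For a correlation-type score, where larger means more similar, the same structure would take the maximum instead of the minimum.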

When the assumed normal direction, that is, the given normal direction, is close to the true normal direction, the feature quantities at the corresponding points in the feature maps take similar values. In this case the cost value, that is, the variance among the feature quantities for that normal direction, is small. Calculating a cost value for each of a plurality of normal directions and adopting the minimum as the final cost value is therefore equivalent to selecting, from among the assumed normal directions, the cost value calculated with the normal direction most consistent with the true normal direction.

As the cost value, the cost value calculation unit 103 may calculate either the variance or the correlation of the feature quantities at the corresponding points of the feature maps of the plurality of target viewpoint images. It suffices that the cost value expresses the degree to which the feature quantities at the corresponding points in the feature maps of the plurality of target viewpoint images are similar.

When the correlation of the feature quantities is calculated as the cost value, the cost value calculation unit 103 takes the maximum element of the set of cost values for the normal directions as the final cost value. When the feature quantities at the corresponding points of the feature maps of the target viewpoint images are close, the correlation among them for that normal direction is large. Adopting the maximum of the per-normal-direction cost values as the final cost value therefore selects the cost value calculated with the normal direction most consistent with the true normal direction.

The cost value calculation unit 103 may calculate cost values with respect to either of the variables associated with the feature maps (the three-dimensional coordinates M and the normal direction n). For example, as described above, it may calculate a cost value for each of a plurality of normal directions n at the same three-dimensional coordinates, or it may calculate a cost value for each pair of three-dimensional coordinates and normal direction n. It may also calculate a cost value for each of a plurality of three-dimensional coordinates under the same normal direction.

The cost value calculation unit 103 further generates a cost volume. The cost volume holds cost values on a plurality of planes that directly face the reference viewpoint image and lie at discrete depths. The cost value calculation unit 103 generates the cost volume, for example, by calculating, for every voxel of a three-dimensional voxel grid set in advance in three-dimensional space, the cost value at the three-dimensional coordinates corresponding to that voxel. It stores the generated cost volume in the cost volume storage unit 112.

The cost value calculation unit 103 may set the three-dimensional voxel grid as a cube, or as a square frustum whose base directly faces the plane of the reference viewpoint image.

The cost volume regularization unit 104 regularizes the cost volume. The cost volume regularization unit 104 acquires the cost volume, for example, by reading it from the cost volume storage unit 112, regularizes it by performing a three-dimensional convolution operation on it, and stores the regularized cost volume in the cost volume storage unit 112. In the three-dimensional convolution operation, the cost volume regularization unit 104 may apply different kernels multiple times.
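The following is a minimal sketch of a single three-dimensional convolution pass over a cost volume. The source does not specify the kernels, which in a deep-learning setting would typically be learned; the fixed kernel passed in here, the edge-replicating padding, the odd kernel sizes, and the function name are all assumptions for illustration.

```python
import numpy as np

def regularize_cost_volume(cost_volume, kernel):
    """Apply one 3D convolution (no kernel flip) to a (D, H, W) cost volume.

    kernel: 3D array with odd size along every axis (an assumption).
    Borders are handled by replicating edge values (an assumption).
    """
    d, h, w = cost_volume.shape
    pad = [(k // 2, k // 2) for k in kernel.shape]
    padded = np.pad(cost_volume, pad, mode="edge")
    out = np.zeros((d, h, w), dtype=float)
    for dz in range(kernel.shape[0]):
        for dy in range(kernel.shape[1]):
            for dx in range(kernel.shape[2]):
                out += kernel[dz, dy, dx] * padded[dz:dz + d, dy:dy + h, dx:dx + w]
    return out
```

Applying different kernels multiple times, as the text allows, corresponds to calling this function repeatedly with a different `kernel` each time.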

The depth map generation unit 105 generates a depth map. The depth map generation unit 105 acquires the regularized cost volume, for example, by reading it from the cost volume storage unit 112, extracts the depth map for the reference viewpoint image from the acquired cost volume, and stores the extracted depth map in the depth map storage unit 113.

For each pixel of the reference viewpoint image, the depth map generation unit 105 extracts the cost values of the cost volume that lie on the line of sight of that pixel, and selects the depth of the most consistent cost value among them as the depth value of the depth map.

For example, when the three-dimensional voxel grid of the cost volume is set as a square frustum whose base directly faces the plane of the reference viewpoint image, the line of sight of each pixel coincides with a single column of voxels in the cost volume. When the cost value is the variance of the feature quantities, the depth map generation unit 105 selects, for each pixel of the reference viewpoint image, the depth with the smallest cost value along that pixel's line of sight as the depth value of the depth map. When the cost value is the correlation of the feature quantities, it selects, for each pixel of the reference viewpoint image, the depth with the largest cost value along that pixel's line of sight as the depth value of the depth map.
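For a frustum-shaped voxel grid, where each pixel's line of sight is one column of the volume, the depth selection reduces to an argmin or argmax along the depth axis. The sketch below assumes a cost volume stored as a `(D, H, W)` array and a vector of the D discrete depth values; these names and the layout are illustrative assumptions.

```python
import numpy as np

def depth_map_from_cost_volume(cost_volume, depths, cost_is_variance=True):
    """Pick, per pixel, the depth whose cost is most consistent.

    cost_volume: (D, H, W) costs, one slice per discrete depth plane.
    depths:      (D,) depth value of each plane.
    cost_is_variance: True selects the minimum (variance-type cost),
                      False selects the maximum (correlation-type cost).
    """
    if cost_is_variance:
        idx = np.argmin(cost_volume, axis=0)
    else:
        idx = np.argmax(cost_volume, axis=0)
    return depths[idx]
```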

The three-dimensional point cloud generation unit 106 generates a three-dimensional point cloud. The three-dimensional point cloud generation unit 106, for example, reads and acquires a depth map from the depth map storage unit 113. The three-dimensional point cloud generation unit 106 generates a three-dimensional point cloud by converting the acquired depth map into a three-dimensional point cloud. The three-dimensional point cloud generation unit 106 stores the generated three-dimensional point cloud in the three-dimensional point cloud storage unit 114.

For example, when a single depth map is read, the three-dimensional point cloud generation unit 106 calculates the three-dimensional coordinates of each pixel from the depth map and its camera parameters, and uses the result as the three-dimensional point cloud. On the other hand, when a plurality of depth maps corresponding to different viewpoints are read, the three-dimensional point cloud generation unit 106 first converts the depth map of each viewpoint into a three-dimensional point cloud, and then integrates the point clouds of the individual viewpoints into a single three-dimensional point cloud. When integrating the point clouds, the three-dimensional point cloud generation unit 106 converts, for example, the three-dimensional coordinates of each point cloud from the camera coordinate system to the world coordinate system according to the camera parameters of the corresponding viewpoint, so that the point clouds of the different viewpoints are expressed in the same coordinate system and can be merged into one.
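A minimal sketch of this conversion, under assumptions that the patent text does not state: a pinhole camera with intrinsic matrix `K`, extrinsics defined by X_cam = R X_world + t, and depth values equal to the camera-frame z coordinate. The function names are hypothetical.

```python
import numpy as np

def depth_to_world_points(depth, K, R, t):
    """Back-project a depth map (H, W) to world-frame 3D points (H*W, 3).

    Assumes X_cam = R @ X_world + t and that depth is the camera-frame z value.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T        # camera-frame rays with z = 1
    cam_pts = rays * depth.reshape(-1, 1)  # scale each ray by its depth
    return (cam_pts - t) @ R               # row-vector form of R.T @ (X_cam - t)

def merge_point_clouds(clouds):
    """Integrate per-view clouds that already share the world coordinate system."""
    return np.concatenate(clouds, axis=0)

K = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
depth = np.full((2, 2), 2.0)
cloud_a = depth_to_world_points(depth, K, np.eye(3), np.zeros(3))
cloud_b = depth_to_world_points(depth, K, np.eye(3), np.array([0.0, 0.0, 1.0]))
merged = merge_point_clouds([cloud_a, cloud_b])
```

A practical pipeline would typically also drop pixels with invalid depth and remove duplicate points after merging; those steps are omitted here.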

多視点画像記憶部１０７は、Ｎ枚からなる多視点画像と、その多視点画像の各視点に対応するカメラパラメータを記憶する。ここで、多視点画像を構成する視点のうち一つの視点は、参照視点画像の視点とする。ここでＮは２以上の整数である。 The multi-view image storage unit 107 stores N multi-view images and camera parameters corresponding to each viewpoint of the multi-view images. Here, one of the viewpoints forming the multi-view image is the viewpoint of the reference viewpoint image. Here, N is an integer of 2 or more.

The feature map storage unit 108 stores feature maps. Here, the feature map storage unit 108 may store each feature map in association with the viewpoint corresponding to the image from which the feature map was generated, and with the coordinates and normal direction of the three-dimensional point used when generating the feature map.

The three-dimensional information storage unit 109 stores the coordinates and normal directions of three-dimensional points used for determining a kernel deformation method. Here, the three-dimensional information storage unit 109 may store a set of multiple normal directions in association with the coordinates of one three-dimensional point. The three-dimensional information storage unit 109 may store one pair of three-dimensional point coordinates and normal direction for each viewpoint, or may store different three-dimensional point coordinates and normal directions for each pixel. When storing different three-dimensional point coordinates and normal directions for each pixel, the three-dimensional information storage unit 109 stores them as a three-dimensional coordinate map and a normal direction map.

The three-dimensional information storage unit 109 may store a coordinate group {M} of three-dimensional points and a normal group {n} that are prepared in advance, and a pair of three-dimensional point coordinates and a normal direction selected from the coordinate group and the normal group may be used as the pair of three-dimensional point coordinates and normal direction for determining the kernel deformation method.

ここで、あらかじめ用意される三次元点の座標群｛Ｍ｝は、例えば、（３）式で示される。 Here, the coordinate group {M} of three-dimensional points prepared in advance is expressed by, for example, equation (3).

また、あらかじめ用意される法線群｛ｎ｝は、例えば、（４）式で示される。 Further, the group of normals {n} prepared in advance is expressed by, for example, equation (4).

基準カーネル記憶部１１０は、基準カーネルを記憶する。基準カーネルは、変形後カーネルを生成する基準となるカーネルであり、例えば、二次元画像座標と重み係数の組の集合で表現される。基準カーネルは、例えば、（５）式で示される。基準カーネル記憶部１１０は、互いに異なる基準カーネルを複数記憶しても良い。 The reference kernel storage unit 110 stores reference kernels. The reference kernel is a kernel that serves as a reference for generating the transformed kernel, and is expressed, for example, as a set of pairs of two-dimensional image coordinates and weighting coefficients. The reference kernel is expressed, for example, by equation (5). The reference kernel storage unit 110 may store a plurality of different reference kernels.

変形後カーネル記憶部１１１は、変形後カーネルを記憶する。変形後カーネルは、カーネル変形部１０１によって、基準カーネルが、カーネル変形方法決定部１００により決定された方法を用いて変形されることによって生成されたカーネルである。変形後カーネルは、基準カーネルと同様に、例えば、二次元画像座標と重み係数の組の集合で表現される。変形後カーネルは、例えば、（６）式で示される。 The transformed kernel storage unit 111 stores the transformed kernel. The transformed kernel is a kernel generated by the kernel transformation unit 101 transforming the reference kernel using the method determined by the kernel transformation method determination unit 100. Like the reference kernel, the transformed kernel is expressed, for example, by a set of two-dimensional image coordinates and weighting coefficients. The transformed kernel is expressed, for example, by equation (6).

The two-dimensional image coordinates of the transformed kernel shown in equation (6) may be configured, for example, as integer coordinates arranged on a square lattice, or as real-valued coordinates. When the kernel transformation unit 101 transforms the reference kernel while interpolating onto integer coordinates arranged on a square lattice, the number of elements K' and the weighting coefficients a'_i of the transformed kernel take values different from the number of elements K and the weighting coefficients a_i of the reference kernel. On the other hand, when the kernel transformation unit 101 transforms the reference kernel without interpolation, the number of elements K' and the weighting coefficients a'_i of the transformed kernel match the number of elements K and the weighting coefficients a_i of the reference kernel. Here, i is an index over the elements constituting the kernel and satisfies 1≦i≦K'.
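To illustrate what a kernel expressed as a set of (two-dimensional coordinate, weighting coefficient) pairs implies for the convolution, the following sketch applies such a kernel to a single-channel image and handles real-valued coordinates by bilinear sampling. It is only an illustration: the function names are hypothetical, the kernel offsets are given directly rather than computed by the deformation method defined elsewhere in the description, pixels outside the image are treated as zero, and the kernel is applied without flipping (as is common in convolutional networks).

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img (H, W) at real-valued (x, y); outside the image counts as 0."""
    h, w = img.shape
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    fx, fy = x - x0, y - y0
    out = np.zeros_like(x, dtype=float)
    corners = ((0, 0, (1 - fx) * (1 - fy)), (0, 1, fx * (1 - fy)),
               (1, 0, (1 - fx) * fy), (1, 1, fx * fy))
    for dy, dx, wgt in corners:
        xi, yi = x0 + dx, y0 + dy
        ok = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        out[ok] += wgt[ok] * img[yi[ok], xi[ok]]
    return out

def apply_point_kernel(img, kernel):
    """kernel: iterable of ((dx, dy), a) pairs, i.e. offsets and weights."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    out = np.zeros((h, w))
    for (dx, dy), a in kernel:
        out += a * bilinear_sample(img, xs + dx, ys + dy)
    return out

img = np.arange(16.0).reshape(4, 4)
identity_kernel = [((0.0, 0.0), 1.0)]   # integer coordinates
shifted_kernel = [((0.5, 0.0), 1.0)]    # real-valued coordinates
same = apply_point_kernel(img, identity_kernel)
half = apply_point_kernel(img, shifted_kernel)
```

With real-valued coordinates the number of elements and the weights stay those of the original kernel, and the interpolation happens at sampling time, which corresponds to the "without interpolation" case in the paragraph above.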

The transformed kernel storage unit 111 may store each transformed kernel in association with the reference kernel from which it was generated. That is, when a plurality of reference kernels are prepared, the transformed kernel storage unit 111 stores, in association with each transformed kernel, information indicating which reference kernel it was generated from. The transformed kernel storage unit 111 may also store each transformed kernel in association with the coordinates and normal direction of the three-dimensional point used to generate it.

コストボリューム記憶部１１２は、コストボリュームを記憶する。コストボリュームは、三次元空間上に設定された三次元ボクセルの各ボクセルにコスト値が対応づけられた情報である。 The cost volume storage unit 112 stores cost volumes. The cost volume is information in which a cost value is associated with each voxel of three-dimensional voxels set on a three-dimensional space.

デプスマップ記憶部１１３は、多視点画像の各視点におけるデプスマップを記憶する。ここで、デプスマップの大きさは、対応する視点（画像）の画像サイズと一致する。また、デプスマップの各ピクセルの奥行値は、対応する視点の各ピクセルの二次元座標における対象物体までの奥行きを示す値である。 The depth map storage unit 113 stores depth maps at each viewpoint of the multi-view image. Here, the size of the depth map matches the image size of the corresponding viewpoint (image). Further, the depth value of each pixel of the depth map is a value indicating the depth to the target object in the two-dimensional coordinates of each pixel of the corresponding viewpoint.

The three-dimensional point cloud storage unit 114 stores the three-dimensional point cloud obtained by three-dimensionally reconstructing the target object. Here, a three-dimensional point cloud is defined as a set of three-dimensional coordinates. That is, the three-dimensional point cloud storage unit 114 stores the three-dimensional coordinates of each point of the point cloud. When the point cloud is a colored point cloud, the three-dimensional point cloud storage unit 114 stores, in addition to the three-dimensional coordinates of each point, the color of each point (for example, RGB values).

ここで、図５を用いてデプスマップを生成する処理の流れを説明する。図５は、実施形態の特徴マップ生成装置１を用いてデプスマップを生成する処理を説明する図である。 Here, the flow of processing for generating a depth map will be explained using FIG. 5. FIG. 5 is a diagram illustrating a process of generating a depth map using the feature map generation device 1 of the embodiment.

図５に示すように、本実施形態の特徴マップ生成装置１は、一般の深層学習に基づく多視点ステレオ技術に用いられる４つの学習済ネットワークＮＷを基に、デプスマップを生成する。 As shown in FIG. 5, the feature map generation device 1 of this embodiment generates a depth map based on four trained networks NW used in general deep learning-based multi-view stereo technology.

The feature map generation device 1 generates feature maps using a feature map generation network NW1# that is adapted to image deformation between images. Specifically, in the feature map generation device 1, the kernel deformation method determination unit 100, the kernel deformation unit 101, and the convolution calculation unit 102 cooperate to generate a feature map corresponding to each normal direction n_k in each image I_i. Here, i is an index over the images of the multi-view image and satisfies 1≦i≦N_I, and k is an index over the normal directions and satisfies 1≦k≦N_n.

More specifically, the kernel deformation method determination unit 100 determines, for each image I_i, the deformation method of the reference kernel corresponding to the normal direction n_k, based on the three-dimensional coordinates M on the target object, the normal direction n_k, and the camera parameters of the image I_i. The kernel deformation unit 101 generates a deformed kernel {P_Ii, n_k} by deforming the reference kernel using the deformation method determined by the kernel deformation method determination unit 100. The convolution calculation unit 102 performs a convolution operation on each image I_i using the deformed kernel {P_Ii, n_k} corresponding to the normal direction n_k in that image. The convolution calculation unit 102 extracts the result of the convolution operation as a per-pixel feature amount of each image I_i, and generates a feature map f_nk using the extracted feature amounts.

The feature map generation device 1 constructs a cost volume using a cost volume construction network NW2#. Specifically, in the feature map generation device 1, the cost value calculation unit 103 constructs the cost volume using each of the feature maps f_nk, generated by the feature map generation network NW1#, that correspond to the normal directions n_k in the images I_i.

More specifically, for the three-dimensional coordinates M on the target object, the cost value calculation unit 103 calculates, for each normal direction n_k, the variance of the feature amounts at the points corresponding to M in the feature maps f_nk, and takes the calculated variance as the cost value C_k for the normal direction n_k at the three-dimensional coordinates M. Among the cost values C_k for the normal directions n_k at the three-dimensional coordinates M, the cost value calculation unit 103 takes the one with small variance, that is, the one whose features are similar across the views, as the final cost value C.
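The cost computation for a single three-dimensional point can be sketched as follows. Two details are assumptions made for this illustration rather than statements from the patent: the per-channel variances are averaged into one scalar, and "small variance" is read as the minimum over the normal hypotheses. The function name and array layout are hypothetical.

```python
import numpy as np

def cost_for_point(features):
    """Cost for one 3D point M.

    features: array of shape (N_n, N_I, C) holding, for each normal
    hypothesis n_k and each view I_i, the C-dimensional feature sampled at
    the point corresponding to M in the feature map f_nk of that view.
    """
    # Provisional cost C_k: variance over the views, averaged over channels.
    provisional = features.var(axis=1).mean(axis=-1)   # shape (N_n,)
    k_best = int(np.argmin(provisional))                # most similar features
    return provisional[k_best], k_best, provisional

# Toy data: 2 normal hypotheses, 3 views, 2 channels.
feats = np.array([
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],   # identical across views
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],   # differs across views
])
final_cost, best_k, all_costs = cost_for_point(feats)
```

Repeating this for every voxel of the three-dimensional voxel grid would fill the cost volume described earlier.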

特徴マップ生成装置１は、コストボリューム正則化ネットワークＮＷ３、及びデプスマップ生成ネットワークＮＷ４を用いて、デプスマップを生成する。具体的に、特徴マップ生成装置１では、コストボリューム正則化部１０４、デプスマップ生成部１０５、及び三次元点群生成部１０６が連携することによって、従来技術（図７参照）と同様に、コストボリューム正則化ネットワークＮＷ３、及びデプスマップ生成ネットワークＮＷ４を用いることによってデプスマップを生成する。 The feature map generation device 1 generates a depth map using a cost volume regularization network NW3 and a depth map generation network NW4. Specifically, in the feature map generation device 1, the cost volume regularization unit 104, the depth map generation unit 105, and the three-dimensional point cloud generation unit 106 cooperate to reduce the cost as in the conventional technology (see FIG. 7). A depth map is generated by using a volume regularization network NW3 and a depth map generation network NW4.

ここで、図６を用いて特徴マップを生成する処理の流れを説明する。図６は、実施形態の特徴マップを生成する処理の流れを示すフローチャートである。 Here, the flow of processing for generating a feature map will be explained using FIG. 6. FIG. 6 is a flowchart showing the flow of processing for generating a feature map according to the embodiment.

Step S10: The feature map generation device 1 acquires the camera parameters of the target viewpoint image. The feature map generation device 1 obtains the camera parameters of the target viewpoint image by referring to the multi-view image storage unit 107.
Step S11: The feature map generation device 1 acquires a pair of three-dimensional coordinates and a normal direction on the target object. The feature map generation device 1 obtains the three-dimensional coordinates and normal direction on the target object by referring to the three-dimensional information storage unit 109. For example, the three-dimensional coordinates and normal direction on the target object are set based on the camera coordinate system of the reference viewpoint image, and the set three-dimensional coordinates and normal direction are stored in the three-dimensional information storage unit 109.
Step S12: The feature map generation device 1 acquires a reference kernel. The feature map generation device 1 obtains the reference kernel by referring to the reference kernel storage unit 110.
Step S13: The feature map generation device 1 determines a kernel deformation method. Based on the camera parameters of the target viewpoint image acquired in step S10 and the three-dimensional coordinates and normal direction on the target object acquired in step S11, the feature map generation device 1 determines the kernel deformation method that corresponds, in the target viewpoint image, to the three-dimensional coordinates and normal direction of the target object.

Step S14: The feature map generation device 1 generates a transformed kernel. The feature map generation device 1 generates the transformed kernel by transforming the reference kernel using the kernel deformation method determined in step S13.
Step S15: The feature map generation device 1 acquires the target viewpoint image. The feature map generation device 1 obtains the target viewpoint image by referring to the multi-view image storage unit 107.
Step S16: The feature map generation device 1 executes a convolution operation using the transformed kernel. The feature map generation device 1 performs the convolution operation on the target viewpoint image acquired in step S15, using the transformed kernel generated in step S14.
Step S17: The feature map generation device 1 generates a feature map. The feature map generation device 1 extracts the result of the convolution operation executed in step S16 as feature amounts, and generates the feature map by associating the extracted feature amounts with the pixels of the target viewpoint image.

Step S18: The feature map generation device 1 determines whether feature maps have been generated for all pairs of three-dimensional coordinates and normal direction. If feature maps have not yet been generated for all pairs, the feature map generation device 1 proceeds to step S19. If feature maps have been generated for all pairs, the feature map generation device 1 proceeds to step S20.
Step S19: The feature map generation device 1 changes at least one of the three-dimensional coordinates and the normal direction. The feature map generation device 1 refers to the three-dimensional information storage unit 109 and acquires a pair of three-dimensional coordinates and normal direction whose combination differs from the pairs for which feature maps have already been generated. The feature map generation device 1 then returns to step S11.
Step S20: The feature map generation device 1 determines whether feature maps have been generated for all of the target viewpoint images. If feature maps have not yet been generated for all of the target viewpoint images, the feature map generation device 1 proceeds to step S21. If feature maps have been generated for all of the target viewpoint images, the feature map generation device 1 proceeds to step S22.
Step S21: The feature map generation device 1 changes the target viewpoint image for which a feature map is to be generated. The feature map generation device 1 refers to the multi-view image storage unit 107 and acquires a target viewpoint image different from those for which feature maps have already been generated. The feature map generation device 1 then returns to step S10.

Step S22: The feature map generation device 1 calculates provisional cost values. For the three-dimensional coordinates, the feature map generation device 1 calculates, for each normal direction, the variance of the feature amounts at the points corresponding to the three-dimensional coordinates in the feature maps, and takes the calculated variance as the provisional cost value for that normal direction at the three-dimensional coordinates.
Step S23: The feature map generation device 1 determines the final cost value from the provisional cost values. Among the provisional cost values for the normal directions at the three-dimensional coordinates, the feature map generation device 1 takes the one with small variance, that is, the one whose features are similar, as the final cost value.

ここで、本実施形態の効果を確認する方法について説明する。 Here, a method for confirming the effects of this embodiment will be explained.

まず、実験用データセットとして、画像変形の大きさや種類が異なるデータセットを用意する。多視点画像間の画像変形に影響する要因として、対象の三次元形状、視点間の距離（基線長）、視点間の回転、対象からカメラまでの距離、各視点の内部パラメータが挙げられる。これらを変化させながら、多視点画像を撮影することによって、様々な画像変形を含む多視点画像データセットを作成することができる。 First, datasets with different sizes and types of image deformation are prepared as experimental datasets. Factors that affect image deformation between multi-view images include the three-dimensional shape of the target, the distance between viewpoints (baseline length), the rotation between viewpoints, the distance from the target to the camera, and the internal parameters of each viewpoint. By photographing multi-view images while changing these, it is possible to create a multi-view image data set including various image transformations.

しかしながら、視点間の距離（基線長）等を変化させながら多視点画像を撮影しようとすると撮影の負荷が大きい。この対策として、既存の多視点画像データセットに対して、画像処理を施すことによって、疑似的に画像変形が異なる多視点画像データセットを作成することを検討した。 However, when trying to capture multi-view images while changing the distance between viewpoints (baseline length), etc., the burden of capturing is large. As a countermeasure to this problem, we considered creating a multi-view image dataset with pseudo-different image deformations by applying image processing to the existing multi-view image dataset.

例えば、既存の多視点画像データセットを用いて、新しい多視点画像データセットを作成する方法として、画像に対する二次元回転を加える方法がある。このとき、あるひとつの対象物体を撮影した一連の多視点画像に対して、すべて同じ回転角で回転を加えても、画像間における画像変形は変わらない。そこで、多視点画像の各画像について、互いに異なる回転角により回転を加える必要がある。 For example, as a method of creating a new multi-view image data set using an existing multi-view image data set, there is a method of adding two-dimensional rotation to the image. At this time, even if a series of multi-view images of a single target object are all rotated by the same rotation angle, the image deformation between the images will not change. Therefore, it is necessary to rotate each image of the multi-view image using different rotation angles.

Specifically, for example, the images of the multi-view image may be numbered from 1 to N (where N is the number of images in the multi-view image), with only the even-numbered images rotated by 180 degrees and no image deformation applied to the odd-numbered images. Alternatively, for example, each image of the multi-view image may be rotated by a different random rotation angle.

機械学習用の多視点画像データセットには、多視点画像のほかに、各視点における真値のデプスマップと、各視点のカメラパラメータが含まれる。多視点画像における各視点の画像に画像変形を加えた場合、画像変形を加えた画像のデプスマップとカメラパラメータについても、整合性が取れるように、編集しなくてはならない。 A multi-view image dataset for machine learning includes, in addition to multi-view images, a true value depth map for each viewpoint and camera parameters for each viewpoint. When image deformation is applied to each viewpoint image in a multi-view image, the depth map and camera parameters of the image to which image deformation has been applied must also be edited to ensure consistency.

デプスマップについては、対応する視点の画像と全く同じ画像変形を加えればよい。例えば、多視点画像の各画像に二次元回転を加える場合、デプスマップも各画像と同じ回転角で同じ二次元回転を加える必要がある。 As for the depth map, it is sufficient to apply exactly the same image transformation as the image of the corresponding viewpoint. For example, when applying two-dimensional rotation to each image of a multi-view image, it is necessary to apply the same two-dimensional rotation to the depth map at the same rotation angle as each image.

The camera parameters of each viewpoint also need to be edited so that they are consistent with the image deformation applied to the image of that viewpoint in the multi-view image. When a two-dimensional rotation is applied to each image of the multi-view image, the three-dimensional rotation matrix about the camera's optical axis, with the rotation angle used for that viewpoint, must be applied to both the rotation matrix and the translation vector of that viewpoint's external parameters. If the center of the two-dimensional rotation of each image is the image center, the internal parameters of the camera do not need to be edited. If, on the other hand, the center of rotation is not the image center, the offset of the rotation center from the image center must be added to the image center in the internal parameters of that viewpoint to maintain consistency.
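The parameter edit for a two-dimensional rotation can be checked numerically with the sketch below. It rests on assumptions not stated in the patent: a pinhole model with X_cam = R X_world + t, square pixels with no skew (equal focal lengths), and the "image center" of the internal parameters identified with the principal point. For an off-center rotation it uses the exact relation p' = c + R2(θ)(p − c) for the new principal point, which is a derivation made for this sketch rather than a formula quoted from the text. All function names are hypothetical.

```python
import numpy as np

def project(K, R, t, X):
    """Project world point X to pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def rotate_pixel(uv, theta, center):
    """Rotate a pixel about `center`, as the 2D image rotation does."""
    c, s = np.cos(theta), np.sin(theta)
    return center + np.array([[c, -s], [s, c]]) @ (uv - center)

def rotate_camera_2d(K, R, t, theta, center=None):
    """Camera parameters consistent with a 2D image rotation by theta.

    center=None means the rotation is about the principal point.
    """
    c, s = np.cos(theta), np.sin(theta)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    K_new = K.astype(float).copy()
    if center is not None:
        p = K_new[:2, 2].copy()
        center = np.asarray(center, dtype=float)
        K_new[:2, 2] = center + Rz[:2, :2] @ (p - center)
    # Rotation about the optical axis applied to both R and t.
    return K_new, Rz @ R, Rz @ t

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.1, 0.0, 0.5])
X = np.array([0.2, -0.1, 2.0])
theta = np.deg2rad(30.0)
principal = K[:2, 2]
off_center = np.array([300.0, 200.0])

K_a, R_a, t_a = rotate_camera_2d(K, R, t, theta)
K_b, R_b, t_b = rotate_camera_2d(K, R, t, theta, center=off_center)
```

Projecting with the edited parameters gives the same pixel as rotating the original projection in the image, which is the consistency requirement the paragraph describes.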

既存の多視点画像データセットを用いて、新しい多視点画像データセットを作成する別の方法として、画像に対して拡大又は縮小（以下、拡大縮小という）を加える方法がある。このとき、あるひとつの対象物体を撮影した一連の多視点画像に対して、すべて同じ拡大縮小率で拡大縮小を加えても、画像間の画像変形は変わらない。そこで、多視点画像の各画像について、異なる拡大縮小率による拡大縮小を加える必要がある。例えば、多視点画像の各画像について、互いに異なるランダムな拡大縮小率によって、拡大縮小を加える方法がある。また、画像の垂直方向と水平方向で、拡大縮小率が異なっていてもよい。 Another method for creating a new multi-view image data set using an existing multi-view image data set is to apply enlargement or reduction (hereinafter referred to as scaling) to an image. At this time, even if a series of multi-view images of a single target object are all scaled at the same scaling ratio, the image deformation between the images will not change. Therefore, it is necessary to scale each image of the multi-view image using a different scaling ratio. For example, there is a method of scaling each image of a multi-view image using different random scaling ratios. Further, the scaling ratio may be different between the vertical direction and the horizontal direction of the image.

At this time, the depth maps and camera parameters in the multi-view image dataset for machine learning must also be edited so that they are consistent with the scaling applied to the image of each viewpoint in the multi-view image.

デプスマップについては、対応する視点の画像と全く同じ拡大縮小率で拡大縮小をさせる必要がある。また、各視点のカメラパラメータについては、外部パラメータは変化させず、内部パラメータのうち、焦点距離を拡大縮小率で乗算する必要がある。垂直方向の拡大縮小率と水平方向の拡大縮小率が異なる場合、内部パラメータの垂直方向の焦点距離と水平方向の焦点距離は、それぞれ異なる倍率で乗算する。また、各画像の拡大縮小の中心座標が画像中心である場合、カメラの内部パラメータを編集する必要はない。一方で、各画像の拡大縮小の中心座標が画像中心でない場合、画像中心から拡大縮小中心の座標のずれを、各視点の内部パラメータの画像中心に加算し、整合性をとる必要がある。 As for the depth map, it is necessary to scale it at exactly the same scale ratio as the image of the corresponding viewpoint. Furthermore, regarding the camera parameters of each viewpoint, it is necessary to multiply the focal length among the internal parameters by the scaling factor without changing the external parameters. When the vertical scaling factor and the horizontal scaling factor are different, the vertical focal length and horizontal focal length of the internal parameters are multiplied by different scaling factors. Further, if the center coordinates of scaling each image are the center of the image, there is no need to edit the internal parameters of the camera. On the other hand, if the center coordinates of the enlargement/reduction of each image are not the image center, it is necessary to add the deviation of the coordinates of the enlargement/reduction center from the image center to the image center of the internal parameters of each viewpoint to ensure consistency.
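A small sketch of the corresponding edit of the internal parameters, assuming a pinhole intrinsic matrix and an image scaled about a point c on a fixed canvas, so that a pixel x maps to c + S(x − c) with S = diag(sx, sy). The principal-point update p' = c + S(p − c) is the exact form derived for this sketch under those assumptions; the function names are hypothetical.

```python
import numpy as np

def project_cam(K, X_cam):
    """Project a camera-frame point to pixel coordinates."""
    x = K @ X_cam
    return x[:2] / x[2]

def scale_camera(K, sx, sy, center=None):
    """Intrinsics consistent with scaling the image by (sx, sy) about `center`.

    center=None means scaling about the principal point, in which case
    only the focal lengths change. Extrinsics are left untouched.
    """
    K_new = K.astype(float).copy()
    p = K_new[:2, 2].copy()
    c = p if center is None else np.asarray(center, dtype=float)
    K_new[0, 0] *= sx          # horizontal focal length
    K_new[0, 1] *= sx          # skew term scales with the horizontal axis
    K_new[1, 1] *= sy          # vertical focal length
    K_new[:2, 2] = c + np.array([sx, sy]) * (p - c)
    return K_new

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
X_cam = np.array([0.2, -0.1, 2.0])
uv = project_cam(K, X_cam)

K_center = scale_camera(K, 2.0, 0.5)
c_off = np.array([300.0, 200.0])
K_off = scale_camera(K, 2.0, 0.5, center=c_off)
```

When the scaling center coincides with the principal point, the principal point is unchanged and only the two focal lengths are multiplied by their respective factors.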

既存の多視点画像データセットを用いて、新しい多視点画像データセットを作成する別の方法として、視点の三次元回転と内部パラメータの変化に伴う射影変換を加える方法がある。 Another method for creating a new multi-view image dataset using an existing multi-view image dataset is to add three-dimensional rotation of viewpoints and projective transformation in accordance with changes in internal parameters.

For example, the projective transformation matrix from an image of the original dataset (the existing multi-view image dataset) to an image of the new multi-view image dataset can be calculated using equation (7).
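Equation (7) itself is not reproduced in this text. As a hedged illustration, the sketch below uses the standard homography induced by a pure camera rotation combined with a change of internal parameters, H = K_n R_o→n K_o^{-1}; that equation (7) takes this form is an assumption. K_o denotes the original internal parameters, while K_n and R_o→n are the new internal parameters and the rotation from the original to the new viewpoint; the function names are hypothetical.

```python
import numpy as np

def rotation_homography(K_o, K_n, R_o_to_n):
    """Homography mapping original-image pixels to new-image pixels.

    Valid when the camera center does not move (rotation only).
    """
    return K_n @ R_o_to_n @ np.linalg.inv(K_o)

def warp_pixel(H, uv):
    """Apply homography H to a pixel (u, v)."""
    x = H @ np.array([uv[0], uv[1], 1.0])
    return x[:2] / x[2]

def project_cam(K, X_cam):
    """Project a camera-frame point to pixel coordinates."""
    x = K @ X_cam
    return x[:2] / x[2]

K_o = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
K_n = np.array([[600.0, 0.0, 300.0], [0.0, 600.0, 250.0], [0.0, 0.0, 1.0]])
a = 0.1  # rotation about the camera y axis, in radians
R_on = np.array([[np.cos(a), 0.0, np.sin(a)],
                 [0.0, 1.0, 0.0],
                 [-np.sin(a), 0.0, np.cos(a)]])

H = rotation_homography(K_o, K_n, R_on)
X_cam_old = np.array([0.2, -0.1, 2.0])        # point in the original camera frame
uv_old = project_cam(K_o, X_cam_old)
uv_new = project_cam(K_n, R_on @ X_cam_old)   # same point seen by the new camera
```

Because the homography does not depend on scene depth, the images can be warped this way without knowing the geometry, whereas the depth maps need the separate treatment described later in the text.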

このとき、あるひとつの対象物体を撮影した一連の多視点画像に対して、すべて同じ射影変換行列で射影変換を行っても、画像間の画像変形は変わらない。 At this time, even if a series of multi-view images of a single target object are all subjected to projective transformation using the same projective transformation matrix, the image deformation between the images will not change.

そこで、多視点画像の各画像について、異なる射影変換行列で射影変換を行う必要がある。例えば、多視点画像の各画像について、すべて異なるランダムな内部パラメータと視点の回転行列の組合せを用いる方法がある。 Therefore, it is necessary to perform projective transformation on each image of the multi-view image using a different projective transformation matrix. For example, there is a method of using a combination of random internal parameters and viewpoint rotation matrices that are all different for each image of a multi-view image.

ここで、回転行列の回転角によっては、元の画像と新しい画像の共通領域が非常に小さくなる場合があり得る。このため、例えば、元の画像と新しい画像の共通領域について、画像全体に対する共通領域の面積比に閾値を設け、共通領域の画像全体に対する割合が閾値を超えるような回転行列と内部パラメータを選ぶ方法がある。 Here, depending on the rotation angle of the rotation matrix, the common area between the original image and the new image may become very small. For this reason, for example, for the common area between the original image and the new image, a threshold is set for the area ratio of the common area to the entire image, and a rotation matrix and internal parameters are selected such that the ratio of the common area to the entire image exceeds the threshold. There is.

視点の三次元回転が加わる場合、機械学習用の多視点画像データセットについて、デプスマップに対して画像変形を加えるだけでは整合性が取れなくなる。そのため、元のデータセットのデプスマップから、隣接ピクセル間をつないだ三次元メッシュモデルを生成し、そのメッシュモデルを新しいカメラパラメータで投影してデプスマップを再度作り直す必要がある。 When three-dimensional rotation of viewpoints is added, consistency cannot be achieved simply by adding image deformation to the depth map for a multi-view image dataset for machine learning. Therefore, it is necessary to generate a three-dimensional mesh model that connects adjacent pixels from the depth map of the original data set, project the mesh model using new camera parameters, and recreate the depth map again.

新しいデータセットのカメラパラメータについては、内部パラメータは、新しいデータセットの内部パラメータであるＫ_ｎを用いる。外部パラメータは、回転行列と並進ベクトルに新しいデータセットの回転行列であるＲ_o→ｎを加える。 For the camera parameters of the new dataset, K_n, the internal parameters of the new dataset, is used as the internal parameters. For the external parameters, R_o→n, the rotation matrix of the new dataset, is applied to the rotation matrix and the translation vector.

既存の多視点画像データセットから新しい多視点画像データセットを作成する方法について、上述した画像変形を組み合わせても良い。 Regarding the method of creating a new multi-view image data set from an existing multi-view image data set, the image transformations described above may be combined.

また、上述した画像変形による多視点画像データセットの作成について、異なる回転角や異なる拡大縮小率、および、これらの組み合わせに応じた画像変形を行うことによって、新しい多視点画像データセットを作成しても良い。例えば、二次元回転の回転角と拡大縮小率を、異なる乱数の組に応じて設定すれば、異なる画像変形を含む新しい多視点画像データセットを、多数生成することができる。 In addition, when creating a multi-view image dataset by the image deformation described above, a new multi-view image dataset may be created by performing image deformation according to different rotation angles, different scaling ratios, or combinations of these. For example, by setting the rotation angle of the two-dimensional rotation and the scaling ratio according to different sets of random numbers, a large number of new multi-view image datasets containing different image deformations can be generated.

ここで、多視点画像データセットにおいては、各参照視点画像に対する近傍視点画像群の選び方によって基線長が変化する。そのため、各参照視点画像に対する近傍視点画像群の選び方によって、視点間の画像変形の大きさも変化する。 Here, in a multi-view image data set, the baseline length changes depending on how a group of neighboring viewpoint images is selected for each reference viewpoint image. Therefore, the magnitude of image deformation between viewpoints also changes depending on how to select a group of neighboring viewpoint images for each reference viewpoint image.

例えば、参照視点画像に対する近傍視点画像群の選び方として、基線長に基づく方法がある。基線長に基づく手法では、参照視点画像と他の多視点画像との基線長を求め、基線長が短い順にＮ枚の多視点画像を選択し、選択した多視点画像を、参照視点画像に対する近傍視点画像群とする。 For example, one way of selecting the group of neighboring viewpoint images for a reference viewpoint image is based on the baseline length. In the baseline-length-based method, the baseline length between the reference viewpoint image and each of the other multi-view images is determined, N multi-view images are selected in ascending order of baseline length (shortest first), and the selected multi-view images are used as the group of neighboring viewpoint images for the reference viewpoint image.

また、基線長に対する適正値をあらかじめ設定しておき、参照視点画像との基線長が適正値に近い順に近傍視点画像群を選択しても良い。また、基線長に対する下限と上限のどちらか一方、または両方をあらかじめ設定しておき、基線長がその範囲内に含まれる多視点画像から近傍視点画像群を選択しても良い。また、基線長に対する適正値や、基線長に対する下限と上限を変えながら、複数の多視点画像データセットを作成しても良い。例えば、基線長に対する適正値と、基線長に対する下限と上限とを乱数によって設定すれば、互いに異なる近傍視点画像群を有する多視点画像データセットを、多数生成することができる。 Alternatively, an appropriate value for the baseline length may be set in advance, and a group of neighboring viewpoint images may be selected in the order in which the baseline length with respect to the reference viewpoint image is closest to the appropriate value. Alternatively, one or both of a lower limit and an upper limit for the baseline length may be set in advance, and a group of nearby viewpoint images may be selected from multi-view images whose baseline lengths are within the range. Further, a plurality of multi-view image data sets may be created while changing the appropriate value for the baseline length and the lower and upper limits for the baseline length. For example, by setting an appropriate value for the baseline length and a lower limit and an upper limit for the baseline length using random numbers, it is possible to generate a large number of multi-view image data sets having mutually different neighboring viewpoint image groups.
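The baseline-based selection variants above (shortest first, closest to an appropriate value, and restriction to a lower and upper limit) can be sketched in one function. This is an illustrative sketch only: it assumes each view is represented by its camera centre as an (x, y, z) tuple, and the function and parameter names are hypothetical.

```python
import math


def select_neighbors_by_baseline(centers, ref_index, n,
                                 target=None, lower=None, upper=None):
    """Return indices of up to n neighboring views for the reference view.

    centers: list of camera centres (x, y, z), one per view.
    target:  if given, rank by closeness of the baseline to this value;
             otherwise rank by baseline length, shortest first.
    lower, upper: optional limits; views outside the range are excluded.
    """
    ref = centers[ref_index]
    candidates = []
    for i, center in enumerate(centers):
        if i == ref_index:
            continue
        baseline = math.dist(ref, center)  # distance between camera centres
        if lower is not None and baseline < lower:
            continue
        if upper is not None and baseline > upper:
            continue
        key = baseline if target is None else abs(baseline - target)
        candidates.append((key, i))
    candidates.sort()
    return [i for _, i in candidates[:n]]
```

Drawing `target`, `lower`, and `upper` from random numbers for each run would yield datasets with mutually different groups of neighboring viewpoint images, as the text describes.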

近傍視点画像群の選び方として、上述の基線長に基づく手法の代わりに、光軸のなす角で近傍視点画像群を選択しても良い。光軸のなす角に基づく手法では、参照視点画像の光軸と他の多視点画像の光軸がなす角を求め、光軸のなす角が小さい順にＮ枚の多視点画像を選択し、選択した多視点画像を近傍視点画像群とする。 Instead of the baseline-length-based method described above, the group of neighboring viewpoint images may be selected based on the angle between optical axes. In this method, the angle between the optical axis of the reference viewpoint image and the optical axis of each of the other multi-view images is determined, N multi-view images are selected in ascending order of that angle (smallest first), and the selected multi-view images are used as the group of neighboring viewpoint images.

また、光軸のなす角に対する適正値をあらかじめ設定しておき、参照視点画像との光軸のなす角が、設定された適正値に近い順に近傍視点画像群を選択しても良い。また、光軸のなす角に対する下限と上限のどちらか一方または両方をあらかじめ設定しておき、光軸のなす角がその範囲内に含まれる多視点画像から、近傍視点画像群を選択しても良い。また、光軸のなす角に対する適正値や、光軸のなす角に対する下限と上限を変えながら、複数の多視点画像データセットを作成しても良い。例えば、光軸のなす角に対する適正値と、光軸のなす角に対する下限と上限とを乱数によって設定すれば、異なる近傍視点画像群を有する多視点画像データセットを、多数生成することができる。 Alternatively, an appropriate value for the angle between optical axes may be set in advance, and the group of neighboring viewpoint images may be selected in order of how close the angle between their optical axis and that of the reference viewpoint image is to the set appropriate value. Alternatively, one or both of a lower limit and an upper limit for the angle between optical axes may be set in advance, and the group of neighboring viewpoint images may be selected from the multi-view images whose angle falls within that range. Further, a plurality of multi-view image datasets may be created while changing the appropriate value for the angle between optical axes and the lower and upper limits for that angle. For example, by setting the appropriate value and the lower and upper limits for the angle between optical axes using random numbers, a large number of multi-view image datasets having different groups of neighboring viewpoint images can be generated.

近傍視点群の選び方として、上述の基線長に基づく手法、及び光軸のなす角に基づく手法の代わりに、視差角に基づく手法を用いて近傍視点画像群を選択しても良い。 As for how to select a group of nearby viewpoints, instead of the method based on the base line length and the method based on the angle formed by the optical axis described above, a method based on a parallax angle may be used to select the group of nearby viewpoint images.

視差角に基づく手法では、１点以上の三次元点において参照視点画像と他の多視点画像との視差角を求め、視差角の平均値が小さい順にＮ枚の多視点画像を選択し、選択した多視点画像を近傍視点群とする。 In the method based on the parallax angle, the parallax angle between the reference viewpoint image and each of the other multi-view images is determined at one or more three-dimensional points, N multi-view images are selected in ascending order of the average parallax angle (smallest first), and the selected multi-view images are used as the group of neighboring viewpoints.

また、視差角の平均値に対する適正値をあらかじめ設定しておき、参照視点画像との視差角の平均値が適正値に近い順に近傍視点画像群を選択しても良い。また、視差角の平均値に対する下限と上限のどちらか一方または両方をあらかじめ設定しておき、視差角の平均値がその範囲内に含まれる多視点画像から近傍視点画像群を選択しても良い。また、視差角の平均値に対する適正値や、視差角の平均値に対する下限と上限を変えながら、複数の多視点画像データセットを作成しても良い。例えば、視差角の平均値に対する適正値と、視差角の平均値に対する下限と上限とを乱数によって設定すれば、異なる近傍視点画像群を有する多視点画像データセットを、多数生成することができる。 Alternatively, an appropriate value for the average value of the parallax angle may be set in advance, and a group of neighboring viewpoint images may be selected in the order in which the average value of the parallax angle with respect to the reference viewpoint image is closer to the appropriate value. Alternatively, one or both of a lower limit and an upper limit for the average value of the parallax angle may be set in advance, and a group of nearby viewpoint images may be selected from multi-view images whose average value of the parallax angle is within the range. . Further, a plurality of multi-view image datasets may be created while changing the appropriate value for the average value of the parallax angle and the lower and upper limits for the average value of the parallax angle. For example, by setting an appropriate value for the average value of the parallax angle and a lower limit and an upper limit for the average value of the parallax angle using random numbers, it is possible to generate a large number of multi-view image data sets having different neighboring viewpoint image groups.
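The parallax-angle-based ranking can be sketched as follows. The text does not define the parallax angle, so the sketch assumes the common definition: the angle, measured at a three-dimensional point, between the rays from that point to the two camera centres. The function names and the data layout (camera centres and points as (x, y, z) tuples) are hypothetical.

```python
import math


def parallax_angle(point, center_a, center_b):
    """Angle (radians) at a 3D point between the rays towards two camera centres."""
    ray_a = [c - p for c, p in zip(center_a, point)]
    ray_b = [c - p for c, p in zip(center_b, point)]
    dot = sum(x * y for x, y in zip(ray_a, ray_b))
    norm = math.hypot(*ray_a) * math.hypot(*ray_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def mean_parallax(points, center_a, center_b):
    """Average parallax angle over one or more 3D points."""
    return sum(parallax_angle(p, center_a, center_b) for p in points) / len(points)


def select_neighbors_by_parallax(centers, points, ref_index, n):
    """Pick the n views with the smallest mean parallax angle to the reference view."""
    ref = centers[ref_index]
    ranked = sorted(
        (mean_parallax(points, ref, center), i)
        for i, center in enumerate(centers) if i != ref_index
    )
    return [i for _, i in ranked[:n]]
```

Ranking by closeness to an appropriate value, or filtering by a lower and upper limit, would only change the sort key or add a range check before ranking.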

上述した既存の多視点画像データセットから新しい多視点画像データセットを作成する方法について、機械学習に適用する場合、学習の事前に新しいデータセットを作成して準備しておいても良いし、学習における各繰り返しの中で新しいデータセットを作りながら学習させても良い。学習における各繰り返しの中で新しいデータセットを作りながら学習させる場合、事前に準備しておく場合に比べて、処理時間が長くなるが、メモリやハードディスクに記憶しておくデータ容量が小さくなるという利点がある。 When the above-described method of creating a new multi-view image dataset from an existing multi-view image dataset is applied to machine learning, the new dataset may be created and prepared before training, or it may be created during each training iteration. Creating the new dataset during each iteration takes more processing time than preparing it in advance, but has the advantage of reducing the amount of data that must be stored in memory or on a hard disk.

上述した既存の多視点画像データセットから新しい多視点画像データセットを作成する方法は、多視点ステレオアルゴリズムの画像変形に対するロバスト性の評価に利用することもできると共に、機械学習における学習用のデータセットの拡充にも利用することができる。 The above-described method of creating a new multi-view image dataset from an existing multi-view image dataset can be used to evaluate the robustness of a multi-view stereo algorithm against image deformation, and can also be used to expand training datasets for machine learning.

一般に、多視点画像においては、対象物体における三次元形状と、多視点画像における各視点の位置及び姿勢とに依存した様々な画像変形が含まれる。様々な対象物体や、様々な撮影状況に対応できるようにモデルを学習させるためには、多くの対象物体について、複数の撮影状況で撮像した多視点画像データセットを作成する必要がある。 Generally, a multi-view image includes various image deformations depending on the three-dimensional shape of the target object and the position and orientation of each viewpoint in the multi-view image. In order to train a model to handle various target objects and various shooting situations, it is necessary to create multi-view image data sets of many target objects captured under multiple shooting situations.

一方、多くの画像変形に対応させようとすると、必要なデータセットが膨大になり、作業負荷が増加する。これに対して、上述の多視点画像データセットを作成する方法を適用すれば、少数の多視点画像データセットから、より多様な画像変形を含む多視点画像データセットを、多数生成することができる。このようにして生成された、多様な画像変形を含む多視点画像データセットを用いて機械学習モデルを学習させることによって、多視点画像における画像変形に対するロバスト性が向上した学習済モデルを生成することが可能となる。 On the other hand, attempting to cover many image deformations makes the required dataset enormous and increases the workload. In contrast, by applying the above-described method of creating multi-view image datasets, a large number of multi-view image datasets containing more diverse image deformations can be generated from a small number of multi-view image datasets. Training a machine learning model on the multi-view image datasets generated in this way, which contain diverse image deformations, makes it possible to generate a trained model with improved robustness against image deformation in multi-view images.

ここで、実際の多視点画像データセットを使用して確認した本発明の効果について説明する。本実施形態の効果を確認するために用いる多視点画像データセットとして、多視点ステレオ評価用の公開データセットであるＤＴＵ－ＭＶＳデータセットを利用した。 Here, the effects of the present invention confirmed using an actual multi-view image data set will be described. As a multi-view image data set used to confirm the effects of this embodiment, the DTU-MVS data set, which is a public data set for multi-view stereo evaluation, was used.

図７（図７Ａ及び図７Ｂ）は、本実施形態の効果を説明するための図である。本実施形態の効果を確認するために用いた多視点画像データセットの例が示されている。図７Ａには、学習用データセットの例が示されている。図７Ｂには、評価用データセットの例が示されている。 FIG. 7 (FIGS. 7A and 7B) is a diagram for explaining the effects of this embodiment. An example of a multi-view image dataset used to confirm the effects of this embodiment is shown. FIG. 7A shows an example of a training data set. FIG. 7B shows an example of the evaluation data set.

図７Ａに示すように、学習用データセットには、非特許文献１に示されるような、画像変形を加えない多視点画像であって、参照視点画像との基線長が比較的短い多視点画像を、その近傍視点画像に対する近傍視点画像群として選択した。 As shown in FIG. 7A, for the training dataset, multi-view images to which no image deformation is applied and whose baseline length from the reference viewpoint image is relatively short, as in Non-Patent Document 1, were selected as the group of neighboring viewpoint images for that reference viewpoint image.

一方、図７Ｂに示すように、評価用データセットには、近傍視点画像群を選択する方法は、学習用データセットと同じであるが、多視点画像にランダムな二次元回転を加えたデータセットを利用した。 On the other hand, as shown in FIG. 7B, the evaluation dataset uses the same method of selecting the group of neighboring viewpoint images as the training dataset, but random two-dimensional rotation is applied to the multi-view images.

非特許文献１に示されるＭＶＳＮｅｔ（従来技術）と、本実施形態のそれぞれの方法を用いて、学習用データセットにて機械学習モデルを学習させたそれぞれの学習済モデルを用いて、評価用データセットに基づく三次元点群を生成する。 A machine learning model is trained on the training dataset using each of MVSNet (prior art), shown in Non-Patent Document 1, and the method of this embodiment, and each resulting trained model is used to generate a three-dimensional point cloud from the evaluation dataset.

それぞれの学習済モデルが、評価用データセットに基づいて生成した三次元点群、及びＤＴＵ－ＭＶＳデータセットに含まれる真値の三次元点群を、図８に示す。 FIG. 8 shows the three-dimensional point cloud generated by each trained model based on the evaluation data set and the three-dimensional point cloud of true values included in the DTU-MVS dataset.

図８は、本実施形態の効果を説明するための図である。図８には、ＤＴＵ－ＭＶＳデータセットに含まれる真値の三次元点群、及び従来技術と本実施形態のそれぞれの方法によって学習させ学習済モデルが評価用データセットに基づいて生成した三次元点群が示されている。図８ではモノクロ画像で示しているが、図８に示す各三次元点群は色付きの三次元点群である。 FIG. 8 is a diagram for explaining the effects of this embodiment. FIG. 8 shows a three-dimensional point cloud of true values included in the DTU-MVS dataset, and a three-dimensional point group generated by the trained model based on the evaluation dataset using the conventional technology and the method of this embodiment. A point cloud is shown. Although shown as a monochrome image in FIG. 8, each three-dimensional point group shown in FIG. 8 is a colored three-dimensional point group.

図８に示すように、ＭＶＳＮｅｔ（従来技術）では、多くの未復元領域が発生している。これは、ＭＶＳＮｅｔでは、多視点画像間の画像変形として平行移動しか対応することができないため、画像間における二次元回転を含む評価用データセットでは、正確な三次元点が求められないためであると考えられる。 As shown in FIG. 8, MVSNet (prior art) leaves many unrestored regions. This is thought to be because MVSNet can handle only translation as an image deformation between multi-view images, so accurate three-dimensional points cannot be obtained for an evaluation dataset that includes two-dimensional rotation between images.

これに対して、本実施形態では、画像間における二次元回転を含む評価用データセットを用いた場合であっても、正確な三次元点を求めることが可能である。このように、本実施形態を用いることで、多視点画像間における画像変形に対するロバスト性が向上することが確認できる。 In contrast, in this embodiment, even when using an evaluation data set that includes two-dimensional rotation between images, it is possible to obtain accurate three-dimensional points. In this way, it can be confirmed that by using this embodiment, the robustness against image deformation between multi-view images is improved.

以上説明したように、実施形態の特徴マップ生成装置１は、多視点画像における特徴マップを生成する特徴マップ生成装置である。多視点画像は、対象物体を互いに異なる複数の視点から撮像した２枚以上の画像である。特徴マップ生成装置１は、カーネル変形方法決定部１００と、カーネル変形部１０１と、畳込演算部１０２を備える。カーネル変形方法決定部１００は、多視点画像に含まれる対象視点画像（第１対象視点画像）に対して、三次元座標、法線方向、及び、その対象視点画像に対応するカメラパラメータ（第１カメラパラメータ）を用いて、カーネルにおける特定の座標が、その対象視点画像における対応点の座標に変換されるように、その対象視点画像に対応するカーネルの変形方法（第１変形方法）を決定する。カーネル変形部１０１は、基準カーネルを、カーネル変形方法決定部１００によって決定された変形方法を用いて変形することによって、その対象視点画像に対応する変形後カーネル（第１変形後カーネル）を生成する。畳込演算部１０２は、カーネル変形部１０１によって生成された変形後カーネルを用いて、その対象視点画像に対する畳込演算を行うことによって、その対象視点画像に対応する特徴マップ（第１特徴マップ）を生成する。 As described above, the feature map generation device 1 of the embodiment is a feature map generation device that generates feature maps for multi-view images. The multi-view images are two or more images of a target object captured from a plurality of mutually different viewpoints. The feature map generation device 1 includes a kernel deformation method determination unit 100, a kernel transformation unit 101, and a convolution operation unit 102. For a target viewpoint image (first target viewpoint image) included in the multi-view images, the kernel deformation method determination unit 100 uses three-dimensional coordinates, a normal direction, and the camera parameter corresponding to that target viewpoint image (first camera parameter) to determine a deformation method for the kernel corresponding to that target viewpoint image (first deformation method) so that specific coordinates in the kernel are converted into the coordinates of a corresponding point in that target viewpoint image. The kernel transformation unit 101 generates a transformed kernel (first transformed kernel) corresponding to the target viewpoint image by transforming the reference kernel using the deformation method determined by the kernel deformation method determination unit 100. The convolution operation unit 102 generates a feature map (first feature map) corresponding to the target viewpoint image by performing a convolution operation on that image using the transformed kernel generated by the kernel transformation unit 101.

これにより、実施形態の特徴マップ生成装置１では、画像における画像変形に応じて基準カーネルを変形させた変形後カーネルを用いて、その画像における特徴量を抽出することができる。したがって、画像変形が大きい多視点画像を用いた場合であっても、高精度な三次元復元が可能な特徴マップを生成することができる。 Thereby, the feature map generation device 1 of the embodiment can extract the feature amount in the image using a transformed kernel obtained by transforming the reference kernel in accordance with the image transformation in the image. Therefore, even when a multi-view image with large image deformation is used, a feature map that allows highly accurate three-dimensional reconstruction can be generated.

また、実施形態の特徴マップ生成装置１では、カーネル変形方法決定部１００は、多視点画像に含まれる対象視点画像であって、互いに異なる２つの画像に対応するカメラパラメータ（第１カメラパラメータ、及び第２カメラパラメータ）を取得する。カーネル変形方法決定部１００は、三次元座標、法線方向、及び、その２つの画像に対応するカメラパラメータ（第１カメラパラメータ、及び第２カメラパラメータ）を用いて、その２つの画像のうち、一方の画像における対応点の座標が、他方の画像における対応点の座標に変換されるように、他方の画像に対応するカーネルの変形方法を決定する。例えば、カーネル変形方法決定部１００は、第１対象視点画像の第１カメラパラメータ、及び第２対象視点画像の第２カメラパラメータをもちいて、第２対象視点画像における対応点の座標が、第１対象視点画像における対応点の座標に変換されるように、第１対象視点画像に対応するカーネルの変形方法、すなわち第１変形方法を決定する。これにより、実施形態の特徴マップ生成装置１では、２つの画像間における画像変形を考慮して変形方法を決定することができ、上述した効果と同様の効果を奏する。 Further, in the feature map generation device 1 of the embodiment, the kernel deformation method determination unit 100 acquires the camera parameters (first camera parameter and second camera parameter) corresponding to two mutually different target viewpoint images included in the multi-view images. Using the three-dimensional coordinates, the normal direction, and the camera parameters corresponding to the two images (first camera parameter and second camera parameter), the kernel deformation method determination unit 100 determines the deformation method for the kernel corresponding to the other image so that the coordinates of the corresponding point in one of the two images are converted into the coordinates of the corresponding point in the other image. For example, the kernel deformation method determination unit 100 uses the first camera parameter of the first target viewpoint image and the second camera parameter of the second target viewpoint image to determine the deformation method for the kernel corresponding to the first target viewpoint image, that is, the first deformation method, so that the coordinates of the corresponding point in the second target viewpoint image are converted into the coordinates of the corresponding point in the first target viewpoint image. As a result, the feature map generation device 1 of the embodiment can determine the deformation method in consideration of the image deformation between the two images, and achieves the same effects as described above.

上述した実施形態において、特徴マップ生成装置１では、カーネル変形部１０１は、三次元座標、法線方向及びカメラパラメータの組合せ毎にカーネルの変形方法を決定する。カーネル変形部１０１は、共通する同一の基準カーネルを、カーネル変形方法決定部１００によって決定された組合せの各々に対応する変形方法を用いて変形することによって、組合せの各々に対応する変形後カーネルを生成する。これにより、実施形態の特徴マップ生成装置１では、三次元座標、法線方向及びカメラパラメータの組合せ毎に変形方法を決定することができ、上述した効果と同様の効果を奏する。 In the embodiment described above, in the feature map generation device 1, the kernel transformation unit 101 determines a kernel transformation method for each combination of three-dimensional coordinates, normal direction, and camera parameters. The kernel transformation unit 101 transforms the same common reference kernel using the transformation method corresponding to each of the combinations determined by the kernel transformation method determination unit 100, thereby generating transformed kernels corresponding to each of the combinations. generate. As a result, the feature map generation device 1 of the embodiment can determine a deformation method for each combination of three-dimensional coordinates, normal direction, and camera parameters, producing the same effects as described above.

上述した実施形態において、特徴マップ生成装置１では、カーネル変形方法決定部１００は、対象視点画像のピクセル毎に設定した三次元座標又は法線方向の少なくとも一方を用いて、その対象視点画像のピクセル毎に、変形方法を決定する。カーネル変形部１０１は、基準カーネルを、その対象視点画像のピクセル毎に、その対象視点画像のピクセル毎に決定された変形方法を用いて変形することによって、その対象視点画像のピクセルの各々に対応する変形後カーネルをそれぞれ生成する。畳込演算部１０２は、その対象視点画像のピクセル毎に生成された変形後カーネルを用いて、その対象視点画像における各ピクセルにおける畳込演算を行うことによって、特徴マップを生成する。これにより、実施形態の特徴マップ生成装置１では、対象視点画像のピクセル毎に変形方法を決定することができ、上述した効果と同様の効果を奏する。 In the embodiment described above, in the feature map generation device 1, the kernel deformation method determination unit 100 determines a deformation method for each pixel of the target viewpoint image, using at least one of the three-dimensional coordinates and the normal direction set for each pixel of that image. The kernel transformation unit 101 generates a transformed kernel for each pixel of the target viewpoint image by transforming the reference kernel, pixel by pixel, using the deformation method determined for that pixel. The convolution operation unit 102 generates a feature map by performing the convolution operation at each pixel of the target viewpoint image using the transformed kernel generated for that pixel. As a result, the feature map generation device 1 of the embodiment can determine the deformation method for each pixel of the target viewpoint image, and achieves the same effects as described above.

上述した実施形態において、特徴マップ生成装置１では、カーネル変形部１０１は、基準カーネルを、カーネル変形方法決定部１００により決定された変形方法を用いて変形した、実数の二次元座標を持つカーネルを仮変形後カーネル（第１仮変形後カーネル）とする。カーネル変形部１０１は、その仮変形後カーネルに対して正方格子で配置された座標を用いた補間処理を行うことにより、整数の二次元座標を持つ変形後カーネル（第１変形後カーネル）を生成する。これにより、実施形態の特徴マップ生成装置１では、畳込演算の計算コストを増大させる実数の二次元座標を持つカーネルを、整数の二次元座標を持つ変形後カーネルに変換することができ、計算コストの増大を抑制することができる。 In the embodiment described above, in the feature map generation device 1, the kernel transformation unit 101 transforms the reference kernel using the deformation method determined by the kernel deformation method determination unit 100, and treats the resulting kernel, which has real-valued two-dimensional coordinates, as a provisionally transformed kernel (first provisionally transformed kernel). The kernel transformation unit 101 then generates a transformed kernel (first transformed kernel) having integer two-dimensional coordinates by performing interpolation on the provisionally transformed kernel using coordinates arranged on a square lattice. As a result, the feature map generation device 1 of the embodiment can convert a kernel with real-valued two-dimensional coordinates, which would increase the calculation cost of the convolution operation, into a transformed kernel with integer two-dimensional coordinates, and can thereby suppress the increase in calculation cost.
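One way to picture the conversion from real-valued to integer kernel coordinates is bilinear splatting, in which each kernel weight is distributed over the four surrounding lattice points. This is only an illustrative sketch: the text above does not specify which interpolation is used, so the bilinear scheme, the function name, and the dictionary representation of the kernel are all assumptions.

```python
import math


def resample_kernel(coords, weights):
    """Distribute kernel weights at real-valued (x, y) positions onto an integer grid.

    coords:  list of real-valued (x, y) positions of the provisionally
             transformed kernel elements.
    weights: list of the corresponding kernel weights.
    Returns a dict mapping integer (x, y) positions to accumulated weights.
    """
    grid = {}
    for (x, y), w in zip(coords, weights):
        x0, y0 = math.floor(x), math.floor(y)
        fx, fy = x - x0, y - y0  # fractional offsets inside the lattice cell
        shares = (
            (0, 0, (1.0 - fx) * (1.0 - fy)),
            (1, 0, fx * (1.0 - fy)),
            (0, 1, (1.0 - fx) * fy),
            (1, 1, fx * fy),
        )
        for dx, dy, share in shares:
            if share > 0.0:
                key = (x0 + dx, y0 + dy)
                grid[key] = grid.get(key, 0.0) + share * w
    return grid
```

Because the four shares sum to one, the total kernel weight is preserved, and the result can be applied as an ordinary convolution kernel defined on integer offsets.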

上述した実施形態において、特徴マップ生成装置１では、カーネル変形方法決定部１００は、互いに異なる複数の法線方向の各々に対応する変形方法をそれぞれ決定する。カーネル変形部１０１は、基準カーネルを、法線方向の各々に対して決定された変形方法を用いて変形することによって、法線方向の各々に対応する変形後カーネルを生成する。畳込演算部１０２は、法線方向の各々に対して生成された変形後カーネルを用いて、対象視点画像に対する畳込演算を行うことによって、法線方向の各々に対応する特徴マップをそれぞれ生成する。これにより、実施形態の特徴マップ生成装置１では、対象物体における三次元点について複数の法線方向を仮定した場合における、それぞれの特徴マップを生成することができ、上述した効果と同様の効果を奏する。 In the embodiment described above, in the feature map generation device 1, the kernel deformation method determination unit 100 determines a deformation method for each of a plurality of mutually different normal directions. The kernel transformation unit 101 generates a transformed kernel for each normal direction by transforming the reference kernel using the deformation method determined for that normal direction. The convolution operation unit 102 generates a feature map for each normal direction by performing a convolution operation on the target viewpoint image using the transformed kernel generated for that normal direction. As a result, the feature map generation device 1 of the embodiment can generate a feature map for each of a plurality of normal directions assumed for a three-dimensional point on the target object, and achieves the same effects as described above.

上述した実施形態において、特徴マップ生成装置１は、コスト値計算部１０３を更に備える。コスト値計算部１０３は、複数の特徴マップにおける対応点からコスト値を計算する。コスト値計算部１０３は、法線方向の各々に対応する特徴マップにおける対応点のそれぞれの特徴量から算出される、対応点の特徴量が類似する度合に基づく仮のコスト値（仮コスト値）を計算する。コスト値計算部１０３は、法線方向の各々に対応して計算された仮コスト値のうち、対応点の特徴量が類似する仮コスト値を、最終的なコスト値とする。これにより、実施形態の特徴マップ生成装置１では、対象物体における三次元点について複数の法線方向を仮定した場合における、それぞれの対応点のコスト値が類似する度合から、特徴量が最も類似する法線方向を真の法線方向に近い法線方向として特定することができる。 In the embodiment described above, the feature map generation device 1 further includes a cost value calculation unit 103. The cost value calculation unit 103 calculates cost values from corresponding points in a plurality of feature maps. For each normal direction, the cost value calculation unit 103 calculates a provisional cost value based on the degree of similarity between the feature amounts of the corresponding points in the feature maps for that normal direction. Among the provisional cost values calculated for the respective normal directions, the cost value calculation unit 103 takes the one for which the feature amounts of the corresponding points are most similar as the final cost value. As a result, when a plurality of normal directions are assumed for a three-dimensional point on the target object, the feature map generation device 1 of the embodiment can identify the normal direction for which the feature amounts are most similar as the one closest to the true normal direction.
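The selection of the final cost value across normal-direction hypotheses can be sketched as follows. The similarity measure is not specified in the text above, so the sum of squared differences used here is an assumption (a lower value meaning more similar features), and the function names and dictionary layout are hypothetical.

```python
def provisional_cost(feature_a, feature_b):
    """Assumed similarity measure: sum of squared differences (lower = more similar)."""
    return sum((a - b) ** 2 for a, b in zip(feature_a, feature_b))


def final_cost(features_ref, features_src):
    """Pick the normal-direction hypothesis with the most similar features.

    features_ref, features_src: dicts mapping a normal-direction label to the
    feature vector sampled at the corresponding point in each view's feature
    map for that normal direction.
    Returns (best_normal_label, final_cost_value).
    """
    costs = {
        normal: provisional_cost(features_ref[normal], features_src[normal])
        for normal in features_ref
    }
    best = min(costs, key=costs.get)
    return best, costs[best]
```

For example, with two hypothesised normals whose features differ by costs of 2.0 and 0.0625, the second hypothesis is kept and 0.0625 becomes the final cost value.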

上述した実施形態における特徴マップ生成装置１の全部または一部をコンピュータで実現するようにしてもよい。その場合、この機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することによって実現してもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、短時間の間、動的にプログラムを保持するもの、その場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含んでもよい。また上記プログラムは、前述した機能の一部を実現するためのものであってもよく、さらに前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるものであってもよく、ＦＰＧＡ等のプログラマブルロジックデバイスを用いて実現されるものであってもよい。 All or part of the feature map generation device 1 in the embodiment described above may be realized by a computer. In that case, a program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed. Note that the "computer system" herein includes hardware such as an OS and peripheral devices. Furthermore, the term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and storage devices such as hard disks built into computer systems. Furthermore, a "computer-readable recording medium" refers to a storage medium that dynamically stores a program for a short period of time, such as a communication line when transmitting a program via a network such as the Internet or a communication line such as a telephone line. It may also include a device that retains a program for a certain period of time, such as a volatile memory inside a computer system that is a server or client in that case. Further, the above-mentioned program may be one for realizing a part of the above-mentioned functions, or may be one that can realize the above-mentioned functions in combination with a program already recorded in the computer system. It may also be realized using a programmable logic device such as an FPGA.

上述したコンピュータは、量子コンピュータであってもよい。量子コンピュータは、例えば、量子力学的な重ね合わせの原理を用いた並列計算を行うコンピュータであり、従来型のコンピュータより指数関数的に高速な計算が可能なコンピュータである。量子コンピュータを用いることによって、畳込演算などを実行する際に高速な計算が可能となる。 The computer mentioned above may be a quantum computer. A quantum computer is, for example, a computer that performs parallel calculations using the quantum mechanical principle of superposition, and is a computer that can perform calculations exponentially faster than conventional computers. By using quantum computers, high-speed calculations such as convolution operations are possible.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計、装置構成等も含まれる。 Although the embodiments of this invention have been described above in detail with reference to the drawings, the specific configuration is not limited to these embodiments, and designs, device configurations, and the like that do not depart from the gist of this invention are also included.

なお、以下の各発明も本発明に含まれる。 Note that the following inventions are also included in the present invention.

（発明１）
対象物体が互いに異なる複数の視点から撮像された２以上の多視点画像における特徴マップを生成する特徴マップ生成装置であって、
前記多視点画像に含まれる第１対象視点画像に対して、三次元座標、法線方向及び前記第１対象視点画像に対応するカメラパラメータである第１カメラパラメータを用いて、カーネルにおける特定の座標が、前記第１対象視点画像における対応点の座標に変換されるように、前記第１対象視点画像に対応するカーネルの変形方法である第１変形方法を決定するカーネル変形方法決定部と、
基準となる基準カーネルを、前記第１変形方法を用いて変形することによって、前記第１対象視点画像に対する畳込演算に用いるカーネルである第１変形後カーネルを生成するカーネル変形部と、
前記第１変形後カーネルを用いて前記第１対象視点画像に対する畳込演算を行うことによって前記第１対象視点画像における特徴量を抽出し、抽出した特徴量を用いて前記第１対象視点画像に対応する第１特徴マップを生成する畳込演算部と、
を備える特徴マップ生成装置。

(Invention 1)
A feature map generation device that generates feature maps for two or more multi-view images in which a target object is imaged from a plurality of mutually different viewpoints, the device comprising:
a kernel deformation method determination unit that, for a first target viewpoint image included in the multi-view images, determines a first deformation method, which is a method of deforming the kernel corresponding to the first target viewpoint image, using three-dimensional coordinates, a normal direction, and a first camera parameter that is a camera parameter corresponding to the first target viewpoint image, so that specific coordinates in the kernel are converted into coordinates of a corresponding point in the first target viewpoint image;
a kernel transformation unit that generates a first transformed kernel, which is a kernel used in a convolution operation on the first target viewpoint image, by transforming a reference kernel using the first deformation method; and
a convolution operation unit that extracts feature amounts in the first target viewpoint image by performing a convolution operation on the first target viewpoint image using the first transformed kernel, and generates a first feature map corresponding to the first target viewpoint image using the extracted feature amounts.

(Invention 2)
The feature map generation device according to invention 1, wherein
the kernel deformation method determination unit acquires a second camera parameter, which is a camera parameter corresponding to a second target viewpoint image that is included in the multi-view images and differs from the first target viewpoint image, and determines the first deformation method, using the three-dimensional coordinates, the normal direction, the first camera parameter, and the second camera parameter, so that the coordinates of a corresponding point in the second target viewpoint image are converted into the coordinates of the corresponding point in the first target viewpoint image.
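The conversion from coordinates in the second target viewpoint image to coordinates in the first can be illustrated with the standard plane-induced mapping: back-project a pixel of the second view onto the plane given by the three-dimensional coordinates and the normal direction, then project the resulting point into the first view. The sketch below is a hedged illustration of that general technique, not the specification's procedure. The plane representation `plane_n . X = plane_d` in the second camera's frame, the relative pose `(R, t)`, and the function names are assumptions.

```python
def transfer_pixel(p2, intr2, intr1, R, t, plane_n, plane_d):
    """Map pixel p2 of view 2 to view 1 via the plane plane_n . X = plane_d.

    The plane is expressed in the camera-2 frame. (R, t) takes camera-2
    coordinates to camera-1 coordinates: X1 = R X2 + t.
    intr1 and intr2 are (fx, fy, cx, cy) tuples.
    """
    fx2, fy2, cx2, cy2 = intr2
    fx1, fy1, cx1, cy1 = intr1
    # Viewing ray of the pixel in the camera-2 frame.
    ray = ((p2[0] - cx2) / fx2, (p2[1] - cy2) / fy2, 1.0)
    denom = sum(plane_n[i] * ray[i] for i in range(3))
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    # Intersect the ray with the plane, then move the point into camera 1.
    s = plane_d / denom
    X2 = [s * r for r in ray]
    X1 = [sum(R[i][j] * X2[j] for j in range(3)) + t[i] for i in range(3)]
    return (fx1 * X1[0] / X1[2] + cx1, fy1 * X1[1] / X1[2] + cy1)

def deformed_kernel_positions(center2, size, intr2, intr1, R, t, plane_n, plane_d):
    """Positions in view 1 of a regular size x size kernel centred at center2 in view 2."""
    half = size // 2
    return [[transfer_pixel((center2[0] + u, center2[1] + v),
                            intr2, intr1, R, t, plane_n, plane_d)
             for u in range(-half, half + 1)]
            for v in range(-half, half + 1)]
```

With an identity pose and equal intrinsics the mapping leaves every pixel where it is, so the kernel stays undeformed; any other pose or a tilted plane yields a non-regular arrangement of taps in the first view.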

(Invention 3)
The feature map generation device according to invention 1 or invention 2, wherein
the kernel deformation method determination unit determines a kernel deformation method for each combination of three-dimensional coordinates, a normal direction, and a camera parameter, and
the kernel deformation unit generates the first deformed kernel corresponding to each of the combinations by deforming the same common reference kernel using the deformation method determined for that combination.

(Invention 4)
The feature map generation device according to any one of inventions 1 to 3, wherein
the kernel deformation method determination unit determines the first deformation method for each pixel of the first target viewpoint image using at least one of three-dimensional coordinates and a normal direction set for each pixel of the first target viewpoint image,
the kernel deformation unit generates the first deformed kernel corresponding to each pixel of the first target viewpoint image by deforming the reference kernel, for each pixel of the first target viewpoint image, using the first deformation method determined for that pixel, and
the convolution operation unit generates the first feature map by performing the convolution operation at each pixel of the first target viewpoint image using the first deformed kernel generated for that pixel.
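A per-pixel deformed convolution of this kind can be sketched by keeping the reference kernel's weights fixed and moving its sample positions according to a deformation chosen for each pixel. The plain-Python sketch below assumes, purely for illustration, that each pixel's deformation is a 2x2 matrix supplied by a hypothetical callback `deform_for_pixel`, and that the image is sampled bilinearly with zeros outside its bounds. None of these choices is stated in the specification.

```python
import math

def bilinear(img, x, y):
    """Bilinearly sample img (a list of rows) at (x, y); zero outside the image."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    ax, ay = x - x0, y - y0

    def at(r, c):
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - ax) * (1 - ay) * at(y0, x0) + ax * (1 - ay) * at(y0, x0 + 1)
            + (1 - ax) * ay * at(y0 + 1, x0) + ax * ay * at(y0 + 1, x0 + 1))

def per_pixel_deformed_conv(img, ref_kernel, deform_for_pixel):
    """Apply one shared reference kernel with a different deformation at every pixel.

    deform_for_pixel(x, y) returns a 2x2 matrix A that maps a kernel offset
    (u, v) to an image offset (dx, dy) for the pixel at (x, y).
    """
    h, w = len(img), len(img[0])
    k = len(ref_kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            A = deform_for_pixel(x, y)
            acc = 0.0
            for j in range(k):
                for i in range(k):
                    u, v = i - half, j - half
                    dx = A[0][0] * u + A[0][1] * v
                    dy = A[1][0] * u + A[1][1] * v
                    acc += ref_kernel[j][i] * bilinear(img, x + dx, y + dy)
            out[y][x] = acc
    return out
```

When `deform_for_pixel` returns the identity matrix everywhere, the result reduces to an ordinary zero-padded cross-correlation with the reference kernel, which gives a convenient sanity check.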

(Invention 5)
The feature map generation device according to any one of inventions 1 to 4, wherein
the kernel deformation unit takes the kernel obtained by deforming the reference kernel using the first deformation method as a first provisional deformed kernel, and generates the first deformed kernel by performing interpolation processing on the first provisional deformed kernel using coordinates arranged in a square lattice.
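The interpolation step can be pictured as resampling: after deformation the kernel's taps no longer sit on integer positions, so the weights are re-evaluated at the points of a square lattice. The sketch below assumes, for illustration only, an affine deformation given by an invertible 2x2 matrix `A` and bilinear interpolation. It maps each lattice point back into the reference kernel and interpolates the reference weights there. The function name `resample_to_lattice` and its parameters are hypothetical.

```python
import math

def resample_to_lattice(ref_kernel, A, out_size):
    """Resample a deformed kernel onto an out_size x out_size square lattice.

    A is an invertible 2x2 matrix mapping reference-kernel coordinates to
    deformed coordinates; both grids are centred on the origin.
    """
    k = len(ref_kernel)
    c = (k - 1) / 2  # centre of the reference kernel in index units
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("deformation matrix must be invertible")
    inv = ((A[1][1] / det, -A[0][1] / det),
           (-A[1][0] / det, A[0][0] / det))

    def at(r, col):
        return ref_kernel[r][col] if 0 <= r < k and 0 <= col < k else 0.0

    def sample(x, y):
        # Bilinear interpolation of the reference weights at centred (x, y).
        fx, fy = x + c, y + c
        x0, y0 = math.floor(fx), math.floor(fy)
        ax, ay = fx - x0, fy - y0
        return ((1 - ax) * (1 - ay) * at(y0, x0) + ax * (1 - ay) * at(y0, x0 + 1)
                + (1 - ax) * ay * at(y0 + 1, x0) + ax * ay * at(y0 + 1, x0 + 1))

    m = (out_size - 1) / 2
    out = []
    for r in range(out_size):
        row = []
        for col in range(out_size):
            qx, qy = col - m, r - m  # lattice point in centred coordinates
            px = inv[0][0] * qx + inv[0][1] * qy
            py = inv[1][0] * qx + inv[1][1] * qy
            row.append(sample(px, py))
        out.append(row)
    return out
```

The sketch does not rescale the weights for the change in area caused by the deformation; whether such normalisation is wanted depends on the application and is left open here.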

(Invention 6)
The feature map generation device according to any one of inventions 1 to 5, wherein
the kernel deformation method determination unit determines the first deformation method corresponding to each of a plurality of mutually different normal directions,
the kernel deformation unit generates the first deformed kernel corresponding to each of the normal directions by deforming the reference kernel using the first deformation method determined for that normal direction, and
the convolution operation unit generates the first feature map corresponding to each of the normal directions by performing the convolution operation on the first target viewpoint image using the first deformed kernel generated for that normal direction.

(Invention 7)
The feature map generation device according to invention 6, further comprising a cost value calculation unit that calculates a cost value from corresponding points in a plurality of the feature maps, wherein
the cost value calculation unit
calculates, for each of the normal directions, a provisional cost value that is based on the degree to which the feature amounts of the corresponding points are similar and is calculated from the feature amounts of the corresponding points in the first feature map corresponding to that normal direction, and
sets, as the cost value, the provisional cost value for which the feature amounts of the corresponding points are most similar among the provisional cost values calculated for the respective normal directions.
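The selection among normal directions can be sketched as follows: compute one provisional cost per candidate normal from the feature amounts at the corresponding points, then keep the provisional cost that indicates the greatest similarity. The similarity measure is not fixed by the text above, so this sketch assumes cosine similarity and a "lower cost means more similar" convention. The names `cosine_similarity`, `cost_value`, and the dictionary layout are likewise hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def cost_value(features_by_normal):
    """Pick the cost of the normal direction whose features agree best.

    features_by_normal maps a normal-direction id to a pair of feature
    vectors taken at the corresponding points of two feature maps.
    Returns (cost, best_normal_id, provisional_costs).
    """
    provisional = {
        normal_id: 1.0 - cosine_similarity(f1, f2)
        for normal_id, (f1, f2) in features_by_normal.items()
    }
    best = min(provisional, key=provisional.get)
    return provisional[best], best, provisional
```

A side effect of this selection is that the winning normal direction is itself an estimate of the local surface orientation at the hypothesised three-dimensional coordinates.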

1... Feature map generation device
100... Kernel deformation method determination unit
101... Kernel deformation unit
102... Convolution operation unit
103... Cost value calculation unit

Claims

A feature map generation device that generates feature maps for two or more multi-view images in which a target object is imaged from a plurality of mutually different viewpoints, the device comprising:
a kernel deformation method determination unit that, for a first target viewpoint image included in the multi-view images, determines a first deformation method, which is a kernel deformation method corresponding to the first target viewpoint image, using three-dimensional coordinates, a normal direction, and a first camera parameter, which is a camera parameter corresponding to the first target viewpoint image, so that specific coordinates in the kernel are converted into the coordinates of a corresponding point in the first target viewpoint image;
a kernel deformation unit that generates a first deformed kernel, which is a kernel used in a convolution operation on the first target viewpoint image, by deforming a reference kernel using the first deformation method; and
a convolution operation unit that extracts feature amounts in the first target viewpoint image by performing the convolution operation on the first target viewpoint image using the first deformed kernel, and generates a first feature map corresponding to the first target viewpoint image using the extracted feature amounts.

The feature map generation device according to claim 1, wherein
the kernel deformation method determination unit acquires a second camera parameter, which is a camera parameter corresponding to a second target viewpoint image that is included in the multi-view images and differs from the first target viewpoint image, and determines the first deformation method, using the three-dimensional coordinates, the normal direction, the first camera parameter, and the second camera parameter, so that the coordinates of a corresponding point in the second target viewpoint image are converted into the coordinates of the corresponding point in the first target viewpoint image.

The feature map generation device according to claim 1, wherein
the kernel deformation method determination unit determines a kernel deformation method for each combination of three-dimensional coordinates, a normal direction, and a camera parameter, and
the kernel deformation unit generates the first deformed kernel corresponding to each of the combinations by deforming the same common reference kernel using the deformation method determined for that combination.

The feature map generation device according to claim 1, wherein
the kernel deformation method determination unit determines the first deformation method for each pixel of the first target viewpoint image using at least one of three-dimensional coordinates and a normal direction set for each pixel of the first target viewpoint image,
the kernel deformation unit generates the first deformed kernel corresponding to each pixel of the first target viewpoint image by deforming the reference kernel, for each pixel of the first target viewpoint image, using the first deformation method determined for that pixel, and
the convolution operation unit generates the first feature map by performing the convolution operation at each pixel of the first target viewpoint image using the first deformed kernel generated for that pixel.

The feature map generation device according to claim 1, wherein
the kernel deformation unit takes the kernel obtained by deforming the reference kernel using the first deformation method as a first provisional deformed kernel, and generates the first deformed kernel by performing interpolation processing on the first provisional deformed kernel using coordinates arranged in a square lattice.

The feature map generation device according to claim 1, wherein
the kernel deformation method determination unit determines the first deformation method corresponding to each of a plurality of mutually different normal directions,
the kernel deformation unit generates the first deformed kernel corresponding to each of the normal directions by deforming the reference kernel using the first deformation method determined for that normal direction, and
the convolution operation unit generates the first feature map corresponding to each of the normal directions by performing the convolution operation on the first target viewpoint image using the first deformed kernel generated for that normal direction.

The feature map generation device according to claim 6, further comprising a cost value calculation unit that calculates a cost value from corresponding points in a plurality of the feature maps, wherein
the cost value calculation unit
calculates, for each of the normal directions, a provisional cost value that is based on the degree to which the feature amounts of the corresponding points are similar and is calculated from the feature amounts of the corresponding points in the first feature map corresponding to that normal direction, and
sets, as the cost value, a provisional cost value for which the feature amounts of the corresponding points are similar, from among the provisional cost values calculated for the respective normal directions.

A feature map generation method performed by a feature map generation device that generates feature maps for two or more multi-view images in which a target object is imaged from a plurality of mutually different viewpoints, the method comprising:
determining, by a kernel deformation method determination unit and for a first target viewpoint image included in the multi-view images, a first deformation method, which is a kernel deformation method corresponding to the first target viewpoint image, using three-dimensional coordinates, a normal direction, and a first camera parameter, which is a camera parameter corresponding to the first target viewpoint image, so that specific coordinates in the kernel are converted into the coordinates of a corresponding point in the first target viewpoint image;
generating, by a kernel deformation unit, a first deformed kernel, which is a kernel used in a convolution operation on the first target viewpoint image, by deforming a reference kernel using the first deformation method; and
extracting, by a convolution operation unit, feature amounts in the first target viewpoint image by performing the convolution operation on the first target viewpoint image using the first deformed kernel, and generating a first feature map corresponding to the first target viewpoint image using the extracted feature amounts.

A program that causes a feature map generation device, which generates feature maps for two or more multi-view images in which a target object is imaged from a plurality of mutually different viewpoints, to generate a feature map, the program causing the feature map generation device to:
determine, for a first target viewpoint image included in the multi-view images, a first deformation method, which is a kernel deformation method corresponding to the first target viewpoint image, using three-dimensional coordinates, a normal direction, and a first camera parameter, which is a camera parameter corresponding to the first target viewpoint image, so that specific coordinates in the kernel are converted into the coordinates of a corresponding point in the first target viewpoint image;
generate a first deformed kernel, which is a kernel used in a convolution operation on the first target viewpoint image, by deforming a reference kernel using the first deformation method; and
extract feature amounts in the first target viewpoint image by performing the convolution operation on the first target viewpoint image using the first deformed kernel, and generate a first feature map corresponding to the first target viewpoint image using the extracted feature amounts.