JP2023137841A

JP2023137841A - Three-dimensional shape restoration device, method, and program

Info

Publication number: JP2023137841A
Application number: JP2022044234A
Authority: JP
Inventors: 達也小林; Tatsuya Kobayashi
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2022-03-18
Filing date: 2022-03-18
Publication date: 2023-09-29

Abstract

To restore the three-dimensional shape of a target object with high accuracy using foreground images extracted from multi-viewpoint images captured against an arbitrary background. SOLUTION: A three-dimensional shape restoration device restores the three-dimensional shape of a target object using multi-viewpoint images captured by cameras at different viewpoints. A first foreground extraction unit 20a extracts first foreground images F¹k from a subset of the multi-viewpoint images. A first three-dimensional shape restoration unit 40a coarsely restores a first three-dimensional shape M1 of the target object using the first foreground images F¹k. A foreground extraction auxiliary image generation unit 30 uses the first three-dimensional shape M1 to generate a Trimap image for each of a larger number of multi-viewpoint images than the subset. A second foreground extraction unit 20b extracts a second foreground image F²n from each of these multi-viewpoint images using the corresponding Trimap image. A second three-dimensional shape restoration unit 40b restores a second three-dimensional shape M2 using the second foreground images F²n. SELECTED DRAWING: Figure 1

Description

The present invention relates to a three-dimensional shape restoration device, method, and program for restoring the three-dimensional shape of an object using foreground images extracted from multi-view images, and in particular to a device, method, and program that restore the three-dimensional shape using foreground images extracted from multi-view images captured against an arbitrary background.

There are two approaches to 3D reconstruction of an object from images: reconstruction from multi-view images captured by many fixed cameras, and reconstruction from multi-view images captured by a single moving camera.

For the fixed-camera approach, in which the relative poses between the cameras are known (for example, through calibration), a commonly known method extracts the silhouette (foreground) of the object from images taken from multiple viewpoints and obtains the three-dimensional shape by the Shape-from-Silhouette (SfS) method.

Patent Document 1 discloses a method that improves SfS so that a highly accurate three-dimensional shape can be obtained even when silhouette extraction is inaccurate, performing three-dimensional reconstruction by evaluating multiple silhouette images globally. Patent Document 2 discloses a method that, when an object in which some regions move at high speed is restored by SfS, improves accuracy by adjusting parameters according to the speed of each region.

For the approach that reconstructs from multi-view images captured by a moving camera (or by multiple fixed cameras whose relative poses are unknown), another known method is Multi-View Stereo (MVS): the camera pose of each image of the object is first estimated by the Structure-from-Motion (SfM) method (Non-Patent Document 1), disparity images are then generated by stereo matching between images, and the three-dimensional shape is obtained by fusing the disparity images. For example, Patent Document 3 discloses a three-dimensional reconstruction method using the PatchMatch Stereo method (Non-Patent Document 2), which performs MVS quickly by random search while taking the normal direction of the object surface into account.

Patent Document 4 discloses a three-dimensional reconstruction method that improves robustness against the blur that occurs when an object is photographed with a moving camera, by adjusting the stereo-matching score according to the amount of blur in each captured image. Patent Document 5 discloses a method that helps the user move the camera appropriately when photographing an object with a moving camera.

Patent Document 1: Japanese Patent Application Publication No. 2013-25458
Patent Document 2: Japanese Patent Application Publication No. 2020-35218
Patent Document 3: Japanese Patent Application Publication No. 2018-181047
Patent Document 4: Japanese Patent Application Publication No. 2020-9255
Patent Document 5: Japanese Patent Application Publication No. 2020-88646
Patent Document 6: Japanese Patent Application Publication No. 2005-078522

Non-Patent Document 1: J. L. Schonberger, et al., "Structure-from-Motion Revisited", Proc. of CVPR, 2016.
Non-Patent Document 2: M. Bleyer, C. Rhemann, C. Rother, "PatchMatch Stereo - Stereo Matching with Slanted Support Windows", Proc. of BMVC, 2011.
Non-Patent Document 3: N. Xu, et al., "Deep Image Matting", Proc. of CVPR, 2017.

With any of the prior art, it has been difficult to restore the three-dimensional shape of an object with high accuracy from multi-view images captured by a moving camera against an arbitrary background.

For example, the SfS method used in Patent Documents 1 and 2 requires the silhouette of the object to be extracted accurately from the multi-view images. However, it is generally difficult to extract an accurate silhouette from images captured by a moving camera against an arbitrary background. In particular, the background subtraction method assumed in Patent Documents 1 and 2 extracts the foreground on the premise that the background does not change, so silhouette extraction accuracy degrades when the camera moves.

There are also Alpha Matting methods (Non-Patent Document 3) that extract silhouettes with high accuracy semi-automatically from partial user input rather than fully automatically, but providing user input for every one of the multi-view images places a heavy burden on the user and is impractical.

One could also photograph in an environment whose background has no pattern at all, such as a green screen, so that the background does not change when the camera moves. However, it is generally difficult to determine camera poses by SfM from multi-view images captured in such an environment, so camera pose errors grow and the final three-dimensional reconstruction accuracy tends to degrade.

The MVS methods disclosed in Patent Documents 3 to 5, on the other hand, do not require silhouette information of the object, so the above problem does not arise. However, while these techniques generally reconstruct objects with complex patterns (textures) accurately, their accuracy tends to degrade for objects with little pattern.

For example, for an object with no pattern at all, such as furniture made of single-colored plastic, the object region cannot be stereo-matched accurately between stereo images, so the accuracy of the disparity images degrades and, with it, the final three-dimensional reconstruction accuracy.

An object of the present invention is to solve the above technical problems while keeping the burden on the user low. Building on the SfS method, which works regardless of the object's pattern, the invention restores a coarse three-dimensional shape of the object by SfS from a small number of silhouette images, automatically generates accurate silhouette images for the many remaining viewpoint images from that shape, and restores the final three-dimensional shape by applying SfS repeatedly.

To achieve the above object, the present invention provides a three-dimensional shape restoration device that restores the three-dimensional shape of an object using multi-view images captured by cameras at different viewpoints, characterized by the following configuration.

(1) The device comprises: first restoration means for restoring a first three-dimensional shape of the object using first foreground images extracted from a subset of the multi-view images; means for generating a Trimap image for each of a larger number of multi-view images than the subset, using the first three-dimensional shape and the camera parameters of those multi-view images; means for extracting a second foreground image from each of the larger number of multi-view images using the corresponding Trimap image; and second restoration means for restoring a second three-dimensional shape using the second foreground images.

(2) The second restoration means feeds the second three-dimensional shape back to the Trimap generation means until the second three-dimensional shape satisfies a predetermined convergence condition. The Trimap generation means then generates Trimap images using the fed-back second three-dimensional shape instead of the first three-dimensional shape, and the device repeats restoring the second three-dimensional shape from second foreground images extracted with those Trimap images.

(3) The first restoration means applies the first foreground images and the camera parameters corresponding to them to the SfS method, restoring the first three-dimensional shape on an SfS basis.

(4) The camera parameter estimation unit also obtains three-dimensional point cloud data of the object from the larger number of multi-view images, and the first restoration means uses this point cloud data to restore the first three-dimensional shape on a point-cloud basis.

(5) The first restoration means restores the first three-dimensional shape by combining the SfS-based approach of (3) with the point-cloud-based approach of (4).
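The feedback loop of configuration (2) can be sketched as a generic iteration in which each stage is passed in as a function. The function names, the convergence test (a caller-supplied change measure compared with a tolerance), and the iteration cap are assumptions for illustration; the source only says "a predetermined convergence condition".

```python
def iterative_restoration(initial_shape, make_trimaps, extract_foregrounds,
                          restore_shape, shape_change, tol=1e-3, max_iter=10):
    """Hypothetical driver for the M1 -> Trimap -> F2 -> M2 feedback loop.

    initial_shape: the coarse first shape M1
    make_trimaps(shape) -> Trimap images for the N views
    extract_foregrounds(trimaps) -> second foreground images F2n
    restore_shape(foregrounds) -> second shape M2
    shape_change(old, new) -> non-negative scalar (assumed convergence measure)
    Returns (final shape, number of iterations run).
    """
    shape = initial_shape
    for i in range(max_iter):
        trimaps = make_trimaps(shape)
        foregrounds = extract_foregrounds(trimaps)
        new_shape = restore_shape(foregrounds)
        if shape_change(shape, new_shape) < tol:
            return new_shape, i + 1
        shape = new_shape  # feed M2 back in place of M1
    return shape, max_iter
```

With toy scalar "shapes", restore_shape = lambda f: f / 2 + 1 converges towards 2 and the loop stops once successive results differ by less than tol.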

According to the present invention, a three-dimensional shape reflecting the contents of a large number of multi-view images captured against an arbitrary background can be restored simply by selecting a small number of accurate foreground images from them, so the three-dimensional shape of an object can be restored with high accuracy while limiting the burden on the user and on system resources.

FIG. 1 is a functional block diagram of a three-dimensional shape restoration device according to a first embodiment of the present invention.
FIG. 2 is a diagram schematically showing the procedure by which the three-dimensional shape restoration device restores the three-dimensional shape of an object from multi-view images with an arbitrary background.
FIG. 3 is a diagram showing the procedure for generating a foreground extraction auxiliary image.
FIG. 4 is a diagram showing the procedure for extracting a second foreground image.
FIG. 5 is a functional block diagram of a three-dimensional shape restoration device according to a second embodiment of the present invention.
FIG. 6 is a functional block diagram of a three-dimensional shape restoration device according to a third embodiment of the present invention.
FIG. 7 is a functional block diagram of a three-dimensional shape restoration device according to a fourth embodiment of the present invention.
FIG. 8 is a functional block diagram of a three-dimensional shape restoration device according to a fifth embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of a three-dimensional shape restoration device 1 according to the first embodiment of the present invention, and FIG. 2 schematically shows the procedure by which the device 1 ultimately restores a high-quality three-dimensional shape of the object from multi-view images.

The three-dimensional shape restoration device 1 consists mainly of a camera parameter estimation unit 10, two foreground extraction units 20 (20a, 20b), a foreground extraction auxiliary image generation unit 30, and two three-dimensional shape restoration units 40 (40a, 40b). Components not needed to explain the present invention are omitted from the figure.

The device 1 can be built by installing an application (program) that implements each function on at least one general-purpose computer or server having a CPU, ROM, RAM, bus, interfaces, and so on. Alternatively, it can be built as a dedicated or single-function machine in which part of the application is implemented in hardware or software.

The camera parameter estimation unit 10 acquires camera images from multiple viewpoints as multi-view images In (n = 1, ..., N). For the camera that captured each multi-view image In (hereinafter sometimes simply called an image), it simultaneously estimates the extrinsic parameters Wn (camera position and orientation) and the intrinsic parameters An (focal length, optical-axis center position, and so on), n = 1, ..., N. The estimated camera parameters Wn and An are provided to the first and second three-dimensional shape restoration units 40a and 40b and to the foreground extraction auxiliary image generation unit 30.

The extrinsic parameters Wn and intrinsic parameters An of a camera are the parameters that convert three-dimensional coordinates [X, Y, Z]^T in the global three-dimensional Euclidean space into the two-dimensional pixel coordinates [u, v]^T of the camera image, and they are typically expressed by the following Equation (1).
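Equation (1) itself does not survive in this text. Assuming the standard pinhole camera model, which matches the parameters defined in the next paragraph, it would take the following form, where s is a projective scale factor (the depth of the point in camera coordinates); s and this exact layout are assumptions rather than content recovered from the source:

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= A \, W \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
A = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},
\qquad
W = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}
```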

Here, A denotes the camera's intrinsic parameters, generally expressed by four parameters: the focal lengths fx, fy and the optical-axis center position cx, cy. W denotes the camera's extrinsic parameters, generally expressed by six parameters: a rotation vector rx to rz (three parameters), which is interconvertible with the nine-element rotation matrix r11 to r33, and a translation vector tx to tz (three parameters).
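The projection can be sketched with NumPy, assuming the pinhole form s[u, v, 1]^T = A W [X, Y, Z, 1]^T and the Rodrigues formula for converting the rotation vector (rx, ry, rz) into the matrix r11 to r33. The function names are illustrative and not part of the patent.

```python
import numpy as np


def rodrigues(rvec):
    """Rotation vector (rx, ry, rz) -> 3x3 rotation matrix via the Rodrigues formula."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def intrinsic_matrix(fx, fy, cx, cy):
    """Intrinsic matrix A built from focal lengths and optical-axis centre."""
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])


def extrinsic_matrix(rvec, tvec):
    """Extrinsic matrix W = [R | t] (3x4) from a rotation vector and a translation."""
    return np.hstack([rodrigues(rvec), np.asarray(tvec, dtype=float).reshape(3, 1)])


def project(points, A, W):
    """Project (M, 3) world points to (M, 2) pixel coordinates; also return depths."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous (M, 4)
    cam = W @ pts_h.T                # (3, M) camera coordinates
    uvw = A @ cam                    # (3, M); third row equals camera depth
    uv = (uvw[:2] / uvw[2]).T
    return uv, cam[2]
```

For a camera 5 units in front of the origin with fx = fy = 100 and cx = cy = 50, the world origin lands on the principal point (50, 50).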

A method that simultaneously recovers, from multi-view images, the extrinsic and intrinsic parameters of the cameras together with the three-dimensional coordinates of feature points in the images is disclosed, for example, as the SfM method in Non-Patent Document 1. The camera parameter estimation unit 10 uses any SfM method to estimate the extrinsic and intrinsic parameters of all cameras simultaneously. The three-dimensional coordinates of the feature points recovered at the same time are not used.

As shown in FIGS. 2(a) and 2(b), the first foreground extraction unit 20a extracts the silhouette of the object from a subset of images Ik selected from the many multi-view images (K images in total, K < N, k ∈ {1, ..., K}) and outputs it as first foreground images F¹k. In general, a foreground image is an image in which the pixels of the foreground region are specified, for example a binary image in which foreground pixels have a luminance of 255 (white) and background pixels a luminance of 0 (black).

The first foreground extraction unit 20a may extract the object's silhouette by any method. For example, the user may manually label every pixel of an image as foreground or background, producing the first foreground image entirely by hand. Creating foreground images entirely by hand for all N multi-view images would be unrealistic given the user's workload, but in this embodiment the manual work is limited to a small subset of K images, which greatly reduces the burden.

Alternatively, the user may create a Trimap image by manually labeling each pixel as foreground, background, or unknown, and the foreground image may then be generated automatically from the Trimap by the Alpha Matting of Non-Patent Document 3. A Trimap image specifies, for each pixel, whether it belongs to the foreground region, the background region, or an unknown region (where foreground and background cannot be distinguished); it is represented, for example, as a grayscale image in which the foreground is 255 (white), the background 0 (black), and the unknown region 128 (gray).

Applying Alpha Matting to a Trimap automatically classifies the pixels of the unknown region as foreground or background, converting the Trimap into a foreground image. Because the user only has to mark the pixels near the boundary roughly as unknown, the burden is much lower than when creating the foreground image entirely by hand.
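Deep Image Matting (Non-Patent Document 3) is a trained network and is not reproduced here. As a hedged illustration of what matting does with a Trimap, the sketch below assumes a single global foreground colour and a single global background colour, estimated from the pixels the Trimap marks 255 and 0, and solves the compositing model I = αF + (1 − α)B for each unknown pixel in the least-squares sense. It is a toy stand-in; the output is scaled to 0 to 255 to match the encoding used in this document.

```python
import numpy as np

FG, BG, UNKNOWN = 255, 0, 128  # Trimap encoding described above


def naive_alpha_matting(image, trimap):
    """Toy matting (NOT Deep Image Matting): global-colour least squares.

    image: (H, W, 3) colour image; trimap: (H, W) with values FG / BG / UNKNOWN.
    Assumes at least one FG and one BG pixel. Returns alpha scaled to [0, 255].
    """
    img = image.astype(float)
    f = img[trimap == FG].mean(axis=0)  # assumed global foreground colour
    b = img[trimap == BG].mean(axis=0)  # assumed global background colour
    d = f - b
    # Least-squares alpha for I = alpha*F + (1 - alpha)*B at every pixel
    alpha = ((img - b) @ d) / max(float(d @ d), 1e-12)
    alpha = np.clip(alpha, 0.0, 1.0)
    alpha[trimap == FG] = 1.0  # known pixels keep their Trimap label
    alpha[trimap == BG] = 0.0
    return 255.0 * alpha
```

Real matting methods estimate F and B locally rather than globally, which is what lets them handle textured backgrounds.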

Alternatively, foreground images may be generated for all multi-view images by a fully automatic Alpha Matting method that needs no Trimap (its accuracy is lower than that of Trimap-based methods; the background subtraction used in Patent Documents 1 and 2 is of this type), after which the user keeps only those foreground images judged to be accurate as the first foreground images F¹k. In this case the user does not need to create Trimap images, so the burden is lowest.

As shown in FIG. 2(c), the first three-dimensional shape restoration unit 40a restores the three-dimensional shape M1 of the object from the small number of first foreground images F¹k extracted by the first foreground extraction unit 20a and from the camera parameters of at least the multi-view images from which those images F¹k were extracted.
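SfS is commonly implemented by voxel carving: a voxel survives only if it projects onto the foreground in every silhouette. The sketch below assumes the pinhole model with 3×3 intrinsics A and 3×4 extrinsics W, a user-chosen bounding box, and that voxels projecting outside an image are carved away; the source does not specify these details, and the embodiment uses a mesh rather than a voxel grid (a mesh could be extracted from the grid afterwards, for example by marching cubes).

```python
import numpy as np


def shape_from_silhouette(silhouettes, cameras, grid_min, grid_max, resolution):
    """Voxel-carving sketch of SfS (visual hull).

    silhouettes: list of (H, W) foreground images, nonzero = foreground
    cameras: list of (A, W) pairs; A is 3x3 intrinsics, W is 3x4 extrinsics
    Returns a boolean occupancy grid of shape (resolution,) * 3.
    """
    axes = [np.linspace(lo, hi, resolution) for lo, hi in zip(grid_min, grid_max)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    centres = np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)
    pts_h = np.hstack([centres, np.ones((len(centres), 1))])  # homogeneous (M, 4)
    occupied = np.ones(len(centres), dtype=bool)
    for sil, (A, W) in zip(silhouettes, cameras):
        uvw = A @ (W @ pts_h.T)               # (3, M); uvw[2] is the camera depth
        z = uvw[2]
        safe_z = np.where(z > 0, z, 1.0)      # avoid dividing by non-positive depth
        u = np.round(uvw[0] / safe_z).astype(int)
        v = np.round(uvw[1] / safe_z).astype(int)
        height, width = sil.shape
        inside = (z > 0) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
        on_fg = np.zeros(len(centres), dtype=bool)
        on_fg[inside] = sil[v[inside], u[inside]] > 0
        occupied &= on_fg                     # carve voxels seen as background
    return occupied.reshape(gx.shape)
```

Because a voxel must pass every view, adding views can only remove voxels, which is why the hull from a few views is coarse.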

Here, a three-dimensional shape means a three-dimensional model of the object in a three-dimensional space containing it. Three-dimensional models generally include solid models, mesh (surface) models, and wireframe models. The present invention is not limited to any particular model, but this embodiment is described assuming that the three-dimensional shape is a mesh model (a model that represents the object surface by vertices, edges, and faces).

Because the first three-dimensional shape restoration unit 40a restores the shape from only a small number (about 20) of foreground images F¹k, its quality is insufficient. In this embodiment, however, the second foreground extraction unit 20b described later generates a larger number of second foreground images F²n and the shape is restored again from them, so that a highly accurate, high-quality shape is ultimately obtained.

The foreground extraction auxiliary image generation unit 30 includes a 3D rendering unit 301, a foreground image generation unit 302, and a region division unit 303 that generates Trimap images. As shown in FIG. 2(d), it generates foreground extraction auxiliary images S using the first three-dimensional shape M1 restored by the first three-dimensional shape restoration unit 40a and the camera parameters of a larger number of multi-view images than the subset. In this embodiment, N Trimap images are generated as the foreground extraction auxiliary images Sn (n = 1, ..., N) using the extrinsic parameters Wn and intrinsic parameters An of all N cameras.

FIG. 3 schematically shows how the foreground extraction auxiliary image generation unit 30 generates the auxiliary images Sn. First, the 3D rendering unit 301 performs 3D rendering with the first three-dimensional shape M1 and the camera parameters Wn, An of the N multi-view images, converting M1 into the two-dimensional image that would be obtained if it were photographed with each set of camera parameters.

Next, to obtain a foreground image of the object from each rendering result, the foreground image generation unit 302 reads the depth value of each pixel from the depth buffer and treats every pixel that has a depth value as a foreground pixel of M1, thereby generating a foreground image F^M1n based on the first three-dimensional shape M1.
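Rasterizing a mesh is beyond a short example, so the following sketch approximates the rendering step by splatting sample points of the shape (for instance, occupied voxel centres) into a depth buffer; any pixel that receives a depth becomes foreground (255), mirroring how the foreground image generation unit 302 reads the depth buffer. The point-splatting simplification and the function name are assumptions.

```python
import numpy as np


def render_foreground(points, A, W, height, width):
    """Point-splatting z-buffer: pixels that receive any depth become foreground.

    points: (M, 3) samples on the shape; A: 3x3 intrinsics; W: 3x4 extrinsics.
    Returns (foreground image with 255/0, depth buffer with inf where empty).
    """
    depth = np.full((height, width), np.inf)
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = W @ pts_h.T                 # (3, M) camera coordinates
    z = cam[2]
    uvw = A @ cam
    valid = z > 0                     # ignore points behind the camera
    u = np.round(uvw[0, valid] / z[valid]).astype(int)
    v = np.round(uvw[1, valid] / z[valid]).astype(int)
    zv = z[valid]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # Depth test: keep the nearest depth written to each pixel
    np.minimum.at(depth, (v[inside], u[inside]), zv[inside])
    foreground = np.where(np.isfinite(depth), 255, 0).astype(np.uint8)
    return foreground, depth
```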

Because this M1-based foreground image F^M1n is the silhouette of the coarse shape M1, it cannot be used as is for accurate shape restoration. The region division unit 303 therefore converts each F^M1n into a Trimap by dividing it into three regions: a region Z^Fn assumed to be definitely foreground, a region Z^Bn assumed to be definitely background, and a region Z^Un where the foreground/background decision is uncertain.

In this embodiment, the division is performed by applying erosion and dilation (morphological contraction and expansion) to each M1-based foreground image F^M1n. First, the foreground region obtained by eroding F^M1n is taken as Z^Fn. Next, the background region obtained by dilating F^M1n is taken as Z^Bn. Finally, the region that belongs to neither Z^Fn nor Z^Bn is taken as Z^Un.
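The three-way split can be sketched with NumPy alone, assuming a square structuring element of radius r (kernel size 2r + 1) and treating pixels outside the image as background. Erosion shrinks the foreground to Z^Fn, the complement of the dilated foreground gives Z^Bn, and everything else is Z^Un; the function names and defaults are illustrative.

```python
import numpy as np


def _shifted(mask, radius):
    """All shifts of a boolean mask within a (2r+1) x (2r+1) square window."""
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    h, w = mask.shape
    k = 2 * radius + 1
    return [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]


def erode(mask, radius=1, iterations=1):
    for _ in range(iterations):
        mask = np.logical_and.reduce(_shifted(mask, radius))
    return mask


def dilate(mask, radius=1, iterations=1):
    for _ in range(iterations):
        mask = np.logical_or.reduce(_shifted(mask, radius))
    return mask


def make_trimap(fg_m1, erode_r=1, erode_it=1, dilate_r=1, dilate_it=1):
    """F^M1n (255/0) -> Trimap: 255 = Z^F, 0 = Z^B, 128 = Z^U."""
    fg = fg_m1 > 0
    z_f = erode(fg, erode_r, erode_it)         # surely foreground
    z_b = ~dilate(fg, dilate_r, dilate_it)     # surely background
    trimap = np.full(fg.shape, 128, dtype=np.uint8)  # default: uncertain
    trimap[z_f] = 255
    trimap[z_b] = 0
    return trimap
```

Since erosion stays inside the foreground and dilation contains it, Z^Fn and Z^Bn can never overlap.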

Although the first three-dimensional shape M1 contains errors, it is expected to represent the object's shape reasonably well, so erosion and dilation sized to the expected error can identify the regions that are reliably foreground or background.

The foreground extraction auxiliary image generation unit 30 may apply more dilation iterations than erosion iterations, or use a larger dilation kernel. Because shapes restored by SfS generally tend to shrink relative to the actual shape, this adjustment can improve the accuracy of the auxiliary images Sn.

In addition, the unit 30 may increase the number of erosion iterations or the erosion kernel size when the number Nk of foreground images used for the first shape restoration is below a given threshold, and increase the number of dilation iterations or the dilation kernel size when Nk is above that threshold. The number of dilation iterations and the dilation kernel size may also be increased together.

Alternatively, within the range where Nk is below the threshold, the number of erosion iterations or the erosion kernel size may be increased the smaller Nk is, and within the range where Nk is above the threshold, the number of dilation iterations or the dilation kernel size may be increased the larger Nk is. Here too, the number of dilation iterations and the dilation kernel size may be increased together.

In general, when SfS uses few viewpoints (foreground images), the restored shape tends to bulge outward beyond the actual shape in places, whereas as the number of viewpoints increases it tends to shrink inward in places. Adjusting the morphology in this way can therefore further improve the accuracy of the foreground extraction auxiliary images (Trimap images).
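One hypothetical way to express the Nk-dependent rule is a small schedule function that returns erosion and dilation iteration counts. The threshold, base count, and step size below are placeholders, not values from the source; kernel size could be scaled in the same way.

```python
def morphology_schedule(n_k, threshold=20, base=1, views_per_step=5):
    """Hypothetical rule: return (erosion_iterations, dilation_iterations).

    Below the threshold, erode more the fewer views there are (few-view hulls
    tend to bulge outward); above it, dilate more the more views there are
    (many-view hulls tend to shrink inward).
    """
    erode_it, dilate_it = base, base
    if n_k < threshold:
        erode_it += (threshold - n_k) // views_per_step
    elif n_k > threshold:
        dilate_it += (n_k - threshold) // views_per_step
    return erode_it, dilate_it
```

The returned counts could be passed directly as the erosion and dilation iteration arguments of a Trimap generator.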

The second foreground extraction unit 20b includes an Alpha Matting unit 201. As shown in FIG. 2(f), it extracts a large number of second foreground images F²n (n = 1, ..., N) from the foreground extraction auxiliary images (Trimap images) Sn, which the unit 30 generates for each of the many (N in this embodiment) multi-view images, and from the multi-view images In themselves (FIG. 2(e)).

FIG. 4 schematically shows the procedure by which the second foreground extraction unit 20b extracts the second foreground images F²n. First, for each multi-view image In, the Alpha Matting unit 201 uses the corresponding Trimap image Sn to estimate, pixel by pixel, whether the pixel belongs to the foreground or the background. Because the estimate is a continuous value expressing the confidence that the pixel is foreground, it is converted into binary foreground/background data with a predetermined threshold T to obtain the second foreground image F²n.
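The binarization step can be illustrated with a short sketch. It assumes the Alpha Matting output is already available as an 8-bit matte (0 = background, 255 = foreground) and that pixels whose alpha is at or above T count as foreground. The matting itself is not shown, and `binarize_alpha` is a hypothetical name.

```python
def binarize_alpha(alpha, t):
    """Threshold an 8-bit alpha matte (0 = background, 255 = foreground) at T.

    Assumption: a pixel whose alpha is >= t is labelled foreground (255),
    otherwise background (0).
    """
    if not 0 <= t <= 255:
        raise ValueError("threshold T must lie in [0, 255]")
    return [[255 if a >= t else 0 for a in row] for row in alpha]
```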

The threshold T can take any value between background (0) and foreground (255). The user may specify it in advance, but setting it to an appropriate value improves the accuracy of the second shape restoration.

The threshold T may also be set automatically by comparing the first foreground images F¹k with the k second foreground images F²n whose camera parameters match them. Specifically, the similarity between each first foreground image F¹k and the corresponding second foreground image F²n is computed, and T is chosen so that the average or sum of these similarities is maximized. Because the threshold is tuned against the highly accurate first foreground images F¹k, the second foreground extraction can be expected to be equally accurate for camera images that have no first foreground image F¹k.
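A minimal sketch of the automatic threshold search follows. It assumes intersection-over-union (IoU) as the similarity measure and an exhaustive search over T = 1..255. The patent does not name a similarity measure, and all function names here are hypothetical. Masks and mattes are dictionaries keyed by a view id, so only views that have both F¹k and an alpha matte are compared.

```python
def binarize(alpha, t):
    """Foreground (255) where alpha >= t, else background (0)."""
    return [[255 if a >= t else 0 for a in row] for row in alpha]


def iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks (nonzero = foreground)."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            fa, fb = a > 0, b > 0
            inter += fa and fb
            union += fa or fb
    return inter / union if union else 1.0


def select_threshold(first_masks, alphas, candidates=range(1, 256)):
    """Pick T maximizing the mean IoU between F1k and the binarized second matte.

    first_masks: {view_id: binary mask F1k}; alphas: {view_id: alpha matte}.
    """
    shared = [v for v in first_masks if v in alphas]
    if not shared:
        raise ValueError("no view has both a first foreground image and an alpha matte")
    best_t, best_score = None, -1.0
    for t in candidates:
        score = sum(iou(first_masks[v], binarize(alphas[v], t)) for v in shared) / len(shared)
        if score > best_score:  # strict '>' keeps the smallest T among ties
            best_t, best_score = t, score
    return best_t, best_score
```

Using the sum instead of the mean, as the text also allows, selects the same T because the number of shared views is fixed.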

The second foreground extraction unit 20b may also extract second remaining foreground images F²n-k only from the multi-view images for which the first foreground extraction unit 20a has not extracted a first foreground image F¹k, that is, from all multi-view images other than the subset. It then combines the first foreground images F¹k with the second remaining foreground images F²n-k to form the second foreground images F²n.
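The reuse of F¹k can be sketched as below. The sketch assumes that foreground images are stored per view id and that `extract_fn` stands in for the Trimap-guided Alpha Matting extraction. Both the storage layout and the names are hypothetical.

```python
def build_second_foregrounds(view_ids, first_foregrounds, extract_fn):
    """Assemble F2n: reuse F1k where it exists, extract only the remaining N - k views.

    first_foregrounds: {view_id: F1k}; extract_fn(view_id) -> second foreground image.
    """
    result = {}
    for v in view_ids:
        if v in first_foregrounds:
            result[v] = first_foregrounds[v]  # already extracted for the first restoration
        else:
            result[v] = extract_fn(v)  # Trimap-guided extraction for the remaining views
    return result
```

Only the N - k views without a first foreground image reach `extract_fn`, which is where the reduction in processing load comes from.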

This reduces the processing load compared with having the second foreground extraction unit 20b extract foreground images from every multi-view image. It also further improves the accuracy of the second shape restoration, because the first foreground images F¹k, though few, are likely to be highly accurate.

Using the many second foreground images F²n and the corresponding camera parameters, the second three-dimensional shape restoration unit 40b restores a high-quality three-dimensional shape M2 of the object by the same method as the first three-dimensional shape restoration unit 40a, as shown in FIG. 2(g).

According to this embodiment, the three-dimensional shape of an object can be restored with high accuracy from multi-view images of the object captured against an arbitrary background.

FIG. 5 is a functional block diagram showing the configuration of a second embodiment of the present invention. Reference numerals identical to those above denote identical or equivalent parts, and their description is omitted.

This embodiment is characterized in that the relatively high-quality second three-dimensional shape M2, which the second three-dimensional shape restoration unit 40b restores from the many second foreground images F²n, is fed back to the foreground extraction auxiliary image generation unit 30.

When the second three-dimensional shape M2 is fed back, the foreground extraction auxiliary image generation unit 30 uses M2 instead of the first three-dimensional shape M1. It generates M2-based foreground images F^(M2)n by 3D rendering and divides them into regions to generate the foreground extraction auxiliary images Sn. The second foreground extraction unit 20b then uses these auxiliary images Sn to extract, once again, more accurate second foreground images F²n from each multi-view image.
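Region division into a Trimap can be sketched with binary erosion and dilation, following the rule stated in the claims: the eroded foreground becomes sure foreground, the background of the dilated mask becomes sure background, and everything in between is unknown. This sketch assumes square kernels, ignores pixels outside the image, and uses the label values 255/128/0. These choices and the function names are hypothetical.

```python
FG, UNKNOWN, BG = 255, 128, 0  # hypothetical Trimap label values


def _morph(mask, k, erode):
    """One pass of binary erosion (erode=True) or dilation with a k x k square kernel.

    Pixels outside the image are ignored, so they neither erode nor grow the mask.
    """
    h, w, r = len(mask), len(mask[0]), k // 2
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = all(window) if erode else any(window)
    return out


def make_trimap(mask, erode_kernel=3, dilate_kernel=3, erode_iters=1, dilate_iters=1):
    """Split a rendered binary foreground mask (nonzero = foreground) into FG / UNKNOWN / BG."""
    fg = [[v > 0 for v in row] for row in mask]
    sure_fg = fg
    for _ in range(erode_iters):
        sure_fg = _morph(sure_fg, erode_kernel, erode=True)
    grown = fg
    for _ in range(dilate_iters):
        grown = _morph(grown, dilate_kernel, erode=False)
    # Eroded foreground -> sure foreground; outside the dilated mask -> sure background.
    return [[FG if sure_fg[y][x] else (UNKNOWN if grown[y][x] else BG)
             for x in range(len(mask[0]))]
            for y in range(len(mask))]
```

With a 3x3 foreground block in a 7x7 image and 3x3 kernels, only the block's centre stays sure foreground, a one-pixel band on each side of the block's boundary becomes unknown, and the rest is sure background.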

The second three-dimensional shape restoration unit 40b re-restores the second three-dimensional shape M2 (M2_2) from these more accurate second foreground images F²n and the corresponding camera parameters. This cycle is repeated until the convergence determination unit 403 judges that the difference between the previous (n-th) second three-dimensional shape M2 (M2_n) and the current one (M2_(n+1)) has fallen below a predetermined threshold.
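The feedback loop of this embodiment can be sketched as a generic fixed-point iteration. The callables `make_trimaps`, `extract_foregrounds`, `reconstruct`, and `shape_diff` are hypothetical placeholders for units 30, 20b, and 40b and for the shape-difference measure used by the convergence determination unit 403. The `max_iters` safeguard is an added assumption, not part of the text.

```python
def refine_shape(m1, make_trimaps, extract_foregrounds, reconstruct, shape_diff,
                 tol, max_iters=10):
    """Iterate Trimap generation -> foreground extraction -> reconstruction.

    m1 seeds the first pass; afterwards each new M2 replaces the shape used to
    render the Trimaps, until two consecutive M2 differ by less than tol.
    """
    shape = reconstruct(extract_foregrounds(make_trimaps(m1)))  # M2_1 from M1
    for _ in range(max_iters):
        new_shape = reconstruct(extract_foregrounds(make_trimaps(shape)))
        if shape_diff(shape, new_shape) < tol:  # convergence check (unit 403)
            return new_shape
        shape = new_shape
    return shape  # safeguard: stop after max_iters refinements
```

In a toy run where the "shape" is a number and each reconstruction moves it halfway toward 10, the loop returns a value within 0.01 of the previous one, close to 10.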

Because the second three-dimensional shape M2 is more accurate than the first three-dimensional shape M1, repeating the generation of the foreground extraction auxiliary images and of the second three-dimensional shape M2 in this way further improves the quality of M2.

FIG. 6 shows the configuration of a third embodiment of the present invention. Reference numerals identical to those above denote identical or equivalent parts, and their description is omitted.

This embodiment is characterized in that the first three-dimensional shape restoration unit 40a includes an SfS-based restoration unit 401. That unit restores the first three-dimensional shape M1 by the SfS (Shape from Silhouette) method, using the first foreground images that the first foreground extraction unit 20a extracted from the subset of multi-view images and the camera parameters of the cameras that captured that subset.

In general, the more foreground images (viewpoints) the SfS method uses, the more accurately it can restore a three-dimensional shape. The first shape restoration, which uses only a subset of the images, therefore tends not to achieve sufficient accuracy.

The SfS-based restoration unit 401 back-projects the silhouettes of the first foreground images F¹k into a three-dimensional voxel space using the corresponding camera parameters, thereby computing voxel data that enclose the object (this process is also called the volume intersection method or Visual Hull).
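A minimal voxel-carving sketch of the Visual Hull step follows. It assumes each camera is given as a 3x4 projection matrix (for a pinhole camera, the product of its intrinsic and extrinsic matrices), silhouettes as boolean images indexed [row][col], and a voxel kept only if its centre projects inside every silhouette. Testing only voxel centres, and the function names, are simplifying assumptions.

```python
import itertools
import math


def project(P, X):
    """Project world point X with a 3x4 matrix P; return (u, v), or None if w <= 0."""
    x, y, z = X
    u, v, w = (P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3))
    if w <= 0:
        return None
    return u / w, v / w


def visual_hull(silhouettes, projections, grid_min, grid_max, n):
    """Return the set of voxel indices (i, j, k) whose centres fall inside every silhouette.

    silhouettes: list of boolean images indexed [row][col]
    projections: matching list of 3x4 camera matrices
    The box [grid_min, grid_max] is split into n x n x n voxels.
    """
    step = [(grid_max[a] - grid_min[a]) / n for a in range(3)]
    occupied = set()
    for idx in itertools.product(range(n), repeat=3):
        centre = tuple(grid_min[a] + (idx[a] + 0.5) * step[a] for a in range(3))
        keep = True
        for sil, P in zip(silhouettes, projections):
            uv = project(P, centre)
            if uv is None:
                keep = False
                break
            col, row = math.floor(uv[0]), math.floor(uv[1])
            # Carve the voxel if it projects outside the image or onto background.
            if not (0 <= row < len(sil) and 0 <= col < len(sil[0]) and sil[row][col]):
                keep = False
                break
        if keep:
            occupied.add(idx)
    return occupied
```

With two axis-aligned affine cameras, one seeing a two-column-wide silhouette and the other a one-column-wide silhouette, a 4x4x4 grid is carved down to the 2 x 4 x 1 block of voxels consistent with both views.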

The SfS-based restoration unit 401 further converts the voxel data into a mesh model with an algorithm such as the marching cubes method. Marching cubes treats each cube whose eight corners are adjacent voxels as one unit and converts it into one of 15 predefined polygon patterns according to the values of the voxels at its eight corners. Repeating this over all cubes turns the voxel data into a three-dimensional shape model.
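The per-cube classification at the heart of marching cubes can be sketched as follows. The occupancies of each cube's eight corners form an 8-bit case index (256 cases, which reduce by symmetry to the 15 patterns mentioned above). This sketch stops at identifying the cubes that straddle the surface. The lookup tables that turn each case index into polygons are omitted, and the corner order and function names are assumptions.

```python
# Assumed corner order (bit b of the case index <-> CORNERS[b]).
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]


def cube_index(occupied, i, j, k):
    """8-bit case index of the cube whose lowest corner is voxel (i, j, k)."""
    idx = 0
    for bit, (di, dj, dk) in enumerate(CORNERS):
        if (i + di, j + dj, k + dk) in occupied:
            idx |= 1 << bit
    return idx


def surface_cells(occupied, n):
    """Map each cube straddling the surface to its case index (0 and 255 emit no polygons).

    occupied: set of (i, j, k) voxel indices in 0..n-1; voxels outside count as empty,
    so cubes starting at -1 are included to close the surface at the grid boundary.
    """
    cells = {}
    for i in range(-1, n):
        for j in range(-1, n):
            for k in range(-1, n):
                idx = cube_index(occupied, i, j, k)
                if idx not in (0, 255):
                    cells[(i, j, k)] = idx
    return cells
```

A single occupied voxel touches eight cubes, each with exactly one corner set, so each of those cubes later contributes one small triangle around that corner.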

FIG. 7 shows the configuration of a fourth embodiment of the present invention. Reference numerals identical to those above denote identical or equivalent parts, and their description is omitted.

This embodiment is characterized in that the first three-dimensional shape restoration unit 40a includes a point cloud-based restoration unit 402 and restores the first three-dimensional shape M1 from point cloud data extracted from all (N) camera images.

The camera parameter estimation unit 10 simultaneously estimates the extrinsic parameters Wn (n = 1, ..., N) and intrinsic parameters An (n = 1, ..., N) of each camera at the time it captured its multi-view image. It also obtains the three-dimensional coordinates of feature points in the multi-view images estimated by the SfM (Structure from Motion) method (three-dimensional point cloud data P).
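For reference, the standard pinhole model in which such parameters are used maps a world point X to a pixel by x ~ An Wn [X; 1]. The sketch below assumes Wn is a 3x4 [R | t] matrix and An a 3x3 intrinsic matrix. Lens distortion is ignored, and `project_point` is a hypothetical name.

```python
def project_point(A, W, X):
    """Pinhole projection of world point X.

    A: 3x3 intrinsic matrix An; W: 3x4 extrinsic matrix Wn = [R | t].
    Returns pixel coordinates (u, v).
    """
    Xh = list(X) + [1.0]  # homogeneous world point
    cam = [sum(W[r][c] * Xh[c] for c in range(4)) for r in range(3)]  # camera frame
    img = [sum(A[r][c] * cam[c] for c in range(3)) for r in range(3)]  # image plane
    return img[0] / img[2], img[1] / img[2]  # perspective division
```

For instance, with focal length 100, principal point (50, 40), and the camera translated 5 units along its optical axis, the point (1, 2, 5) lands at pixel (60.0, 60.0).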

From the three-dimensional point cloud data P, the first three-dimensional shape restoration unit 40a restores the first three-dimensional shape M1 of the object with a general algorithm for reconstructing a three-dimensional model from point cloud data, such as Poisson Surface Reconstruction.

Because the first shape restoration follows the same approach as a general SfM pipeline, its accuracy tends to degrade for objects with little texture. In this embodiment, however, the second shape restoration, which uses foreground images for all viewpoints, can be expected to restore the shape with high accuracy.

In the third and fourth embodiments above, the first three-dimensional shape restoration unit 40a was described as restoring the first three-dimensional shape M1 on an SfS basis or a point cloud basis, but the present invention is not limited to these. As in the fifth embodiment shown in FIG. 8, the first three-dimensional shape restoration unit 40a may include both the SfS-based restoration unit 401 and the point cloud-based restoration unit 402 and combine the two restoration methods.

Specifically, the three-dimensional shape M1 restored from the first foreground images F¹k by the SfS method can first be converted into three-dimensional point cloud data P'. P' is then merged with the three-dimensional point cloud data P obtained by the camera parameter estimation unit 10, and the three-dimensional shape M1 is restored from the merged point cloud.
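The conversion and merge can be sketched as follows. The sketch assumes that the SfS result is held as a set of occupied voxel indices, that the centres of surface voxels (occupied voxels with at least one empty 6-neighbour) serve as P', and that "merging" means simple concatenation. The text does not specify any of these details, and the function names are hypothetical. Poisson Surface Reconstruction additionally needs oriented normals, which this sketch does not estimate.

```python
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def voxels_to_points(occupied, grid_min, voxel_size):
    """Return the centres of surface voxels (occupied voxels with an empty 6-neighbour)."""
    points = []
    for i, j, k in sorted(occupied):
        if any((i + a, j + b, k + c) not in occupied for a, b, c in NEIGHBOURS):
            points.append((grid_min[0] + (i + 0.5) * voxel_size,
                           grid_min[1] + (j + 0.5) * voxel_size,
                           grid_min[2] + (k + 0.5) * voxel_size))
    return points


def merge_point_clouds(p_sfm, p_sfs):
    """Hypothetical integration step: concatenate the SfM cloud P and the SfS-derived cloud P'."""
    return list(p_sfm) + list(p_sfs)
```

For a solid 3x3x3 block of voxels, every voxel except the single interior one lies on the surface, so P' has 26 points.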

According to each of the above embodiments, the three-dimensional shape of an object can be restored with high accuracy from multi-view images of the object captured against an arbitrary background, while limiting the burden on users and system resources. A wide variety of three-dimensional shapes can therefore be made available to many people regardless of geographic or economic disparities. As a result, the invention can contribute to Goal 9 of the United Nations-led Sustainable Development Goals (SDGs), "Build resilient infrastructure and promote inclusive and sustainable industrialization," and Goal 11, "Make cities inclusive, safe, resilient and sustainable."

1...three-dimensional shape restoration device, 10...camera parameter estimation unit, 20 (20a, 20b)...foreground image extraction unit, 30...foreground extraction auxiliary image generation unit, 40 (40a, 40b)...three-dimensional shape restoration unit, 201...Alpha Matting unit, 301...3D rendering unit, 302...foreground image generation unit, 303...region division unit, 401...SfS-based restoration unit, 402...point cloud-based restoration unit, 403...convergence determination unit

Claims

A three-dimensional shape restoration device that restores the three-dimensional shape of an object using multi-view images captured by cameras with different viewpoints, the device comprising:
means for estimating camera parameters of each multi-view image;
first restoration means for restoring a first three-dimensional shape of the object using first foreground images extracted from a subset of the multi-view images;
means for generating a Trimap image for each of a larger number of multi-view images than the subset, using the first three-dimensional shape and the camera parameters of those multi-view images;
means for extracting a second foreground image from each of the larger number of multi-view images using the corresponding Trimap image; and
second restoration means for restoring a second three-dimensional shape using the second foreground images.

The three-dimensional shape restoration device according to claim 1, wherein the first restoration means restores the first three-dimensional shape by applying the first foreground images and the camera parameters corresponding to the first foreground images to an SfS method.

The three-dimensional shape restoration device according to claim 1 or 2, wherein:
the means for estimating the camera parameters acquires three-dimensional point cloud data of the object from the larger number of multi-view images than the subset; and
the first restoration means restores the first three-dimensional shape using the three-dimensional point cloud data.

The three-dimensional shape restoration device according to any one of claims 1 to 3, wherein:
the second restoration means feeds the second three-dimensional shape back to the means for generating the Trimap images until the second three-dimensional shape satisfies a predetermined convergence condition;
the means for generating the Trimap images generates the Trimap images using the fed-back second three-dimensional shape instead of the first three-dimensional shape; and
the restoration of the second three-dimensional shape from second foreground images extracted with the Trimap images generated from the second three-dimensional shape is repeated.

The three-dimensional shape restoration device according to any one of claims 1 to 4, wherein the means for generating the Trimap images generates the Trimap images using the first three-dimensional shape and the camera parameters of each of the larger number of multi-view images than the subset.

The three-dimensional shape restoration device according to any one of claims 1 to 5, wherein:
the larger number of multi-view images than the subset includes the subset of multi-view images;
the means for generating the Trimap images generates a Trimap image for each of the remaining multi-view images, obtained by excluding the subset from the larger number of multi-view images;
the means for extracting the second foreground images extracts second remaining foreground images using the Trimap image corresponding to each of the remaining multi-view images; and
the second restoration means restores the second three-dimensional shape using the first foreground images and the second remaining foreground images.

The three-dimensional shape restoration device according to any one of claims 1 to 6, wherein the means for generating the Trimap images generates an image divided into a region identifiable as foreground pixels, a region identifiable as background pixels, and a region identifiable as neither foreground nor background.

The three-dimensional shape restoration device according to claim 7, wherein the means for generating the Trimap images performs the region division using dilation and erosion processing.

The three-dimensional shape restoration device according to claim 8, wherein the means for generating the Trimap images takes the foreground region resulting from erosion of the first foreground image as the region identifiable as foreground pixels, and the background region resulting from dilation of the first foreground image as the region identifiable as background pixels.

The three-dimensional shape restoration device according to claim 8 or 9, wherein the means for generating the Trimap images controls at least one of the number of dilation iterations and the dilation kernel size according to the number of first foreground images.

The three-dimensional shape restoration device according to claim 10, wherein the means for generating the Trimap images increases the number of erosion iterations, or enlarges the erosion kernel, when the number of first foreground images is below a predetermined threshold.

The three-dimensional shape restoration device according to claim 10, wherein the means for generating the Trimap images increases the number of dilation iterations, or enlarges the dilation kernel, when the number of first foreground images exceeds a predetermined threshold.

The three-dimensional shape restoration device according to claim 10, wherein, when the number of first foreground images is below a predetermined threshold, the means for generating the Trimap images increases the number of erosion iterations, or enlarges the erosion kernel, the smaller that number is.

The three-dimensional shape restoration device according to claim 10, wherein, when the number of first foreground images exceeds a predetermined threshold, the means for generating the Trimap images increases the number of dilation iterations, or enlarges the dilation kernel, the larger that number is.

The three-dimensional shape restoration device according to any one of claims 1 to 14, wherein the second foreground extraction means extracts second foreground images converted into binary foreground/background data by thresholding the per-pixel foreground or background confidence obtained by Alpha Matting using the Trimap images.

The three-dimensional shape restoration device according to claim 15, wherein the second foreground extraction means sets the threshold of the thresholding so as to increase the similarity between the first foreground images and the same number of second foreground images whose camera parameters match those of the first foreground images.

A three-dimensional shape restoration method in which a computer restores the three-dimensional shape of an object using multi-view images captured by cameras with different viewpoints, the method comprising:
restoring a first three-dimensional shape of the object using first foreground images extracted from a subset of the multi-view images;
generating a Trimap image for each of a larger number of multi-view images than the subset, using the first three-dimensional shape and the camera parameters of those multi-view images;
extracting a second foreground image from each of the larger number of multi-view images using the corresponding Trimap image; and
restoring a second three-dimensional shape using the second foreground images.

A three-dimensional shape restoration program for restoring the three-dimensional shape of an object using multi-view images captured by cameras with different viewpoints, the program causing a computer to execute:
a step of restoring a first three-dimensional shape of the object using first foreground images extracted from a subset of the multi-view images;
a step of generating a Trimap image for each of a larger number of multi-view images than the subset, using the first three-dimensional shape and the camera parameters of those multi-view images;
a step of extracting a second foreground image from each of the larger number of multi-view images using the corresponding Trimap image; and
a step of restoring a second three-dimensional shape using the second foreground images.