JP7057762B2

JP7057762B2 - Height estimation device and program

Info

Publication number: JP7057762B2
Application number: JP2019019458A
Authority: JP
Inventors: 建鋒徐; 和之田坂
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-02-06
Filing date: 2019-02-06
Publication date: 2022-04-20
Anticipated expiration: 2039-02-06
Also published as: JP2020126514A

Description

The present invention relates to a height estimation device and a program capable of estimating, in a simple manner, information about the height of a target from images.

Various techniques exist for obtaining information about targets arranged in space by using two or more images (including multi-viewpoint video) in which the same target is captured from different camera positions.

In Patent Document 1, of two images captured simultaneously by two cameras with different viewpoints (a left and right stereo pair), one is converted into an image as seen from the viewpoint of the other camera, and the viewpoint-converted image is matched against the image captured by the other camera to detect objects that have height. However, the method of Patent Document 1 imposes a camera installation condition: the two cameras must be installed at the same height above the floor, facing the same direction with parallel optical axes. In addition, it does not consider how to calculate the height of the object. Furthermore, because it requires vanishing point calculation, it is difficult to apply to images of natural objects such as people.

Triangulation is a method that does not impose camera installation conditions such as those of Patent Document 1, and it is available, for example, as the triangulation function (triangulatePoints) of Non-Patent Document 1. As is well known, triangulation obtains the three-dimensional coordinates P(x, y, z) of a target point P from the coordinates (x_L, y_L) and (x_R, y_R) of its corresponding points in the two images by the following equations, where l is the distance between the cameras and f is the focal length of the cameras.
x = x_L l / (x_L - x_R)
y = f l / (x_L - x_R)
z = y_L l / (x_L - x_R) or z = y_R l / (x_L - x_R)
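As a worked illustration of the three equations above, the following minimal Python sketch evaluates them for one pair of corresponding points. The function name and the numeric values are hypothetical and chosen only for this example; units are whatever the caller uses consistently.

```python
def triangulate_parallel(x_l, y_l, x_r, baseline, focal):
    """Evaluate the triangulation equations quoted in the text.

    (x_l, y_l): coordinates of the point in the left image
    x_r:        x coordinate of the same point in the right image
    baseline:   distance l between the cameras
    focal:      focal length f of the cameras
    """
    disparity = x_l - x_r
    if disparity == 0:
        # The equations divide by (x_L - x_R); zero disparity has no finite solution.
        raise ValueError("zero disparity: the point cannot be triangulated")
    x = x_l * baseline / disparity
    y = focal * baseline / disparity
    z = y_l * baseline / disparity
    return x, y, z


# Hypothetical numbers: disparity is 120 - 100 = 20.
print(triangulate_parallel(x_l=120.0, y_l=40.0, x_r=100.0, baseline=0.5, focal=800.0))
# -> (3.0, 20.0, 1.0)
```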

Here, Non-Patent Document 1 requires that the intrinsic and extrinsic parameters of the camera be calculated in advance. That is, because real camera lenses have mainly radial distortion and slight tangential distortion, the intrinsic and extrinsic parameters of the camera, including its focal length, are estimated from multiple views of a known calibration pattern (each view being described as a set of correspondences between 3D points and 2D points).

Further, Non-Patent Document 3 describes a ball detection method for volleyball sports video (multi-viewpoint video) that estimates the ball position in three-dimensional space using the three-dimensional positions (X_c1, Y_c1, Z_c1), (X_c2, Y_c2, Z_c2) of at least two cameras and the ball positions (X_b1, Y_b1, 0), (X_b2, Y_b2, 0) on a virtual plane set on the volleyball net. Here, the positions (X_b1, Y_b1, 0), (X_b2, Y_b2, 0) are computed from the images by projective transformation, using projective transformation parameters between the camera images and the virtual plane that are calculated in advance.

Patent Document 1: Japanese Unexamined Patent Publication No. 63-108220

Non-Patent Document 1: Open Source Computer Vision Library, https://github.com/opencv/opencv
Non-Patent Document 2: Image Application Technology Expert Committee, Japan Society for Precision Engineering, "Image Processing Application Systems: From Basics to Applications," Tokyo Denki University Press, 2000.
Non-Patent Document 3: M. Takahashi, K. Ikeya, M. Kano, H. Ookubo and T. Mishina, "Robust volleyball tracking system using multi-view cameras," 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, 2016, pp. 2740-2745.
Non-Patent Document 4: Q. Yao, A. Kubota, K. Kawakita, K. Nonaka, H. Sankoh and S. Naito, "Fast camera self-calibration for synthesizing Free Viewpoint soccer Video," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017, pp. 1612-1616.
Non-Patent Document 5: M. A. Fischler, R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Comm. of the ACM, Vol. 24, pp. 381-395, 1981.
Non-Patent Document 6: Joseph Redmon and Ali Farhadi, "YOLOv3: An incremental improvement," arXiv, 2018.
Non-Patent Document 7: Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh, "Realtime multi-person 2D pose estimation using part affinity fields," in CVPR 2017, 2017.

The conventional techniques described above have limitations when used to obtain information about the height of a target in space from at least two images.

That is, as already described, Non-Patent Document 1 requires camera calibration using a chessboard as the known calibration pattern in order to calculate the intrinsic and extrinsic parameters of the camera. Moreover, whenever a camera moves, calibration has to be performed again, which requires detecting corresponding points dynamically, for example as in Non-Patent Document 4. Non-Patent Document 3 likewise requires the three-dimensional positions of the cameras.

In view of the above problems of the prior art, an object of the present invention is to provide a height estimation device and a program capable of estimating, in a simple manner, information about the height of a target from images.

To achieve the above object, the present invention is a height estimation device comprising: a parameter calculation unit that, from a multi-viewpoint image including a first image and a second image as images from at least two viewpoints, obtains correspondences between the first image and the second image for identical points on a plane in space captured in the multi-viewpoint image, and obtains a planar projective transformation from those correspondences; a target detection unit that detects, in the first image and the second image, points belonging to the same target as a first position and a second position, respectively; a conversion unit that applies the planar projective transformation to the second position to obtain a converted position, which is the second position converted into coordinates of the first image; and an estimation unit that estimates information about the height of the same target above the plane in space based on the difference between the first position and the converted position. The present invention is also a program that causes a computer to function as the height estimation device.

According to the present invention, by relying mainly on the input multi-viewpoint image alone, information about the height of a target can be estimated simply, without necessarily requiring camera calibration with a chessboard or the like.

FIG. 1 is a functional block diagram of a height estimation device according to an embodiment.
FIG. 2 is a diagram showing a schematic example of a configuration for capturing a multi-viewpoint image as input.
FIG. 3 is a diagram showing examples of points on a predetermined plane used to obtain the matrix in the example of FIG. 2.
FIG. 4 is a diagram showing a schematic example of bounding regions in an image as detection results of an object detection technique.
FIG. 5 is a diagram showing a schematic example of a detected joint skeleton.
FIG. 6 is a diagram showing a schematic example of the points for which the conversion unit obtains the conversion error.
FIG. 7 is a schematic diagram explaining the principle by which the estimation unit estimates height from the error.
FIG. 8 is a diagram showing, as a table, a schematic example of part of the training data.
FIG. 9 is a diagram showing, for the image example of FIG. 6, an example in which the image regions corresponding to evenly divided regions of the plane in space are used as the unit regions for calculating model parameters on the image.

FIG. 1 is a functional block diagram of a height estimation device 10 according to an embodiment. As illustrated, the height estimation device 10 includes a parameter calculation unit 1, a target detection unit 2, a conversion unit 3, and an estimation unit 4. In overall operation, the height estimation device 10 receives a multi-viewpoint image as input at both the parameter calculation unit 1 and the target detection unit 2, and outputs, from the estimation unit 4, information about the height of targets in the multi-viewpoint image.

Here, the multi-viewpoint image input to the height estimation device 10 consists of at least two viewpoints and captures the same real space simultaneously from different camera viewpoints. The multi-viewpoint image used as input data may be the frame images at a single time in a multi-viewpoint video.

FIG. 2 is a diagram showing a schematic example of a configuration for capturing a multi-viewpoint image as input. The example of FIG. 2 schematically shows, as viewed from above the soccer field FL, a configuration in which the soccer field FL of a soccer stadium, as the real space, is captured by two cameras C1 and C2 at different positions and orientations to obtain a multi-viewpoint image with two viewpoints. A soccer match is in progress on the soccer field FL, and players, a ball, and the like are present as examples of the one or more targets whose height is estimated by the height estimation device 10, but they are omitted from FIG. 2.

As in the example of FIG. 2, targets such as players and the ball in the multi-viewpoint image input to the height estimation device 10 are assumed either to be in contact with the plane in space formed by the soccer field FL, that is, at zero height, or to be away from the plane, that is, at some height, because a player has jumped or the ball has been kicked up. The height estimation device 10 can estimate the height of targets captured in the multi-viewpoint image, such as soccer players and the ball, relative to a real-world plane in space captured in the multi-viewpoint image, such as the plane formed by the soccer field FL.

In the following, the details of units 1 to 4 are described mainly for the case where the input data is a multi-viewpoint image of the soccer field FL captured from two viewpoints as in FIG. 2. For the sake of explanation, the images from the two cameras C1 and C2 are referred to as images P1 and P2. That is, the multi-viewpoint image used as input data in this example is the image pair P1, P2.

[Parameter calculation unit 1]
The parameter calculation unit 1 finds, among points on a plane in the real space captured in the multi-viewpoint image, correspondences between the images of the respective viewpoints for points that are the same point. From these correspondences it calculates the parameters of the planar projective transformation that converts coordinates between the images of the viewpoints, in the form of a planar projective transformation matrix H, and outputs this parameter H to the conversion unit 3.

As is well known, the planar projective transformation matrix H works as follows. When an arbitrary point p = (X, Y, Z) on a predetermined plane in three-dimensional space (in the example of the multi-viewpoint images P1, P2 of FIG. 2, for instance the plane in space formed by the soccer field FL) appears as point A at position (x₁, y₁) in image P1 and as point B at position (x₂, y₂) in image P2, the following equation (1) enables coordinate conversion between the points A and B corresponding to this same point (X, Y, Z).
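The image of equation (1) is not reproduced in this text. Based on the description in the surrounding paragraphs (homogeneous coordinates of points A and B, with H converting B into A), it would take the standard form sketched below. The entry names h_11 to h_33 and the reading of the relation as equality up to a scale factor are assumptions of this reconstruction, not taken from the source.

```latex
\begin{pmatrix} x_1 \\ y_1 \\ 1 \end{pmatrix}
\simeq
H \begin{pmatrix} x_2 \\ y_2 \\ 1 \end{pmatrix},
\qquad
H =
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\tag{1}
```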

As is well known, the vectors (x₁, y₁, 1)^T and (x₂, y₂, 1)^T in equation (1) (T denotes transpose) are the homogeneous coordinate representations of points A and B, and equation (1) shows the planar projective transformation matrix H that converts the coordinates (x₂, y₂) of point B in image P2 into the coordinates (x₁, y₁) of point A in image P1. Likewise, the inverse matrix (planar projective transformation matrix H^-1) performs the inverse conversion from the coordinates (x₁, y₁) of point A to the coordinates (x₂, y₂) of point B.
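The forward and inverse conversions described here can be sketched in a few lines of plain Python. The matrix values below are hypothetical and serve only to show the mechanics: multiply the homogeneous vector by H, then divide by the third component.

```python
def apply_homography(H, point):
    """Map an image point through a 3x3 matrix H given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return (u / w, v / w)


def invert_3x3(H):
    """Inverse of a 3x3 matrix via the adjugate (enough for a sketch)."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[value / det for value in row] for row in adj]


# Hypothetical H converting image P2 coordinates into image P1 coordinates.
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [0.25, 0.0, 1.0]]

point_b = (4.0, 2.0)                                  # point B in image P2
point_a = apply_homography(H, point_b)                # point A in image P1
back_to_b = apply_homography(invert_3x3(H), point_a)  # inverse conversion
print(point_a, back_to_b)
```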

As for how to obtain the planar projective transformation matrix H, as is known in fields such as augmented reality display, at least four point correspondences representing the same points on a predetermined plane in the captured space are found between images P1 and P2, and the matrix H is obtained by numerical computation, such as the least squares method, using the image coordinates of these at least four points.
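One possible form of this numerical computation is sketched below in plain Python. It is an illustration under stated assumptions, not the method of the source: it fixes the bottom-right entry of H to 1, builds two linear equations per correspondence, and solves the normal equations of the resulting least-squares problem by Gaussian elimination. All function names and sample coordinates are hypothetical.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def estimate_homography(pairs):
    """pairs: list of ((x2, y2), (x1, y1)), at least four, mapping P2 -> P1."""
    if len(pairs) < 4:
        raise ValueError("at least four correspondences are required")
    rows, rhs = [], []
    for (x2, y2), (x1, y1) in pairs:
        # x1 * (h31*x2 + h32*y2 + 1) = h11*x2 + h12*y2 + h13, same for y1.
        rows.append([x2, y2, 1.0, 0.0, 0.0, 0.0, -x1 * x2, -x1 * y2])
        rhs.append(x1)
        rows.append([0.0, 0.0, 0.0, x2, y2, 1.0, -y1 * x2, -y1 * y2])
        rhs.append(y1)
    # Normal equations (A^T A) h = A^T b of the least-squares problem.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


# Hypothetical correspondences (point in P2, same plane point in P1).
pairs = [((0.0, 0.0), (2.0, 0.0)), ((4.0, 0.0), (3.0, 0.0)),
         ((4.0, 4.0), (3.0, 2.0)), ((0.0, 4.0), (2.0, 4.0))]
H_est = estimate_homography(pairs)
```

Solving the normal equations keeps the sketch dependency-free but is less stable numerically than an SVD-based solver, which is the more common choice with real image coordinates.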

FIG. 3, drawn as an addition to FIG. 2, shows examples of points for obtaining the matrix H as white circles (○). That is, when the soccer field FL is captured in images P1 and P2 by cameras C1 and C2 as in FIG. 2, the parameter calculation unit 1 can detect, in each of images P1 and P2, the ten predetermined corner points (mutually distinguishable feature points) formed by the line markings on the soccer field FL (that is, on the plane), shown as ten white circles (○) in FIG. 3, find their correspondences, and then calculate the matrix H.

Specifically, in a first embodiment, the parameter calculation unit 1 uses an existing technique capable of detecting characteristic corner points like those in FIG. 3, such as SIFT features. It detects feature points (coordinates) in each of images P1 and P2 and extracts feature descriptors (vectors) from the neighborhood of each feature point, determines feature point correspondences from the pairs whose descriptors are judged to match between images P1 and P2, and obtains the matrix H using the coordinates of the corresponding feature points in images P1 and P2.

Here, to obtain the correspondences robustly, RANSAC (random sample consensus), an existing method disclosed in Non-Patent Document 5, may be used. RANSAC repeatedly computes model parameters from the correspondences in a randomly drawn sample, classifies the data into inliers and outliers, and scores the validity of the model parameters by the number of inliers. With the matrix H as the model parameters, a correspondence is counted as an inlier when the point converted by the matrix H lands near its corresponding point, and the matrix H with the best score is output as the result.
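The loop described above can be sketched generically as follows. The model fit and the point transfer are passed in as functions; for the use described in the text, `fit` would estimate H from four sampled correspondences and `transfer` would convert a point with H. To keep the example short, the usage below substitutes a hypothetical translation-only model, and the threshold, iteration count, and data are likewise illustrative assumptions.

```python
import random


def ransac(pairs, fit, transfer, sample_size, threshold, iterations=100, seed=0):
    """Return the model with the most inliers, and those inliers.

    pairs:    list of (source_point, destination_point)
    fit:      function(sample) -> model, may raise ValueError if degenerate
    transfer: function(model, source_point) -> predicted destination point
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(pairs, sample_size)
        try:
            model = fit(sample)
        except ValueError:
            continue  # skip degenerate samples
        inliers = []
        for src, dst in pairs:
            px, py = transfer(model, src)
            distance = ((px - dst[0]) ** 2 + (py - dst[1]) ** 2) ** 0.5
            if distance <= threshold:
                inliers.append((src, dst))
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Hypothetical stand-in model: a pure translation fitted from one pair.
def fit_translation(sample):
    (sx, sy), (dx, dy) = sample[0]
    return (dx - sx, dy - sy)


def transfer_translation(model, src):
    return (src[0] + model[0], src[1] + model[1])


pairs = [((0, 0), (5, 2)), ((1, 3), (6, 5)), ((4, 1), (9, 3)),
         ((2, 2), (7, 4)), ((3, 3), (40, -7))]  # the last pair is a mismatch
model, inliers = ransac(pairs, fit_translation, transfer_translation,
                        sample_size=1, threshold=1.0)
```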

[Target detection unit 2]
The target detection unit 2 detects targets (soccer players, the ball, and so on) in each image of the multi-viewpoint image and outputs to the conversion unit 3, as the detection result, the position coordinates in each image at which each target was detected.

In a first embodiment, the target detection unit 2 detects the range occupied by the target in each image and outputs to the conversion unit 3 a predetermined position within this range on the side closest to the plane in space on which the target exists, as determined from preset conditions such as the capture conditions of the multi-viewpoint image.

That is, let the image coordinates of the multi-viewpoint image be (x, y), where the +x direction is horizontally rightward, the -x direction is horizontally leftward, the +y direction is vertically downward, and the -y direction is vertically upward. As the preset capture condition, the multi-viewpoint image is assumed to capture the real space at an ordinary angle. That is, a plane in space such as the ground of the real space (the reference plane from which the height estimation device 10 estimates height) is captured so that it lies roughly along the horizontal direction (±x direction) in the image, the direction vertically upward from the plane in space is roughly vertically upward (-y direction) in the image, and the direction vertically downward toward the plane in space is roughly vertically downward (+y direction) in the image.

Under these preset capture conditions, the target detection unit 2 outputs to the conversion unit 3, as the target position, the lowest position in the image within the detected target region, which corresponds to the direction most vertically downward toward the captured plane in space.

In a second embodiment, the target detection unit 2 may detect the range of the target in the image and output a predetermined position within this range (for example, its center) to the conversion unit 3 as the target position.

Further, in a third embodiment, the target detection unit 2 may detect the target in the image together with a distinction of its parts, and output the position of a predetermined part of the detected target to the conversion unit 3 as the target position. When the part is detected not as a point but as occupying a certain range, the part occupying this range may be treated as the target region in the first or second embodiment.

Existing methods may be used for the detection processing in the target detection unit 2.

For example, general object detection techniques may be used to detect targets such as the ball and players on the soccer field FL. As the object detection technique, YOLOv3 disclosed in Non-Patent Document 6 may be used, for example. In this case, target regions are obtained as rectangular bounding boxes B1, B2, B3 in an image P, as shown schematically in FIG. 4, together with target type information (object recognition results) for each target region B1, B2, B3. For example, target type information is obtained indicating that regions B1 and B2 are soccer players and region B3 is a ball.

FIG. 4 shows, for the target region B1 detected as a player, the midpoint bd of the bottom edge as an example of the position output by the first embodiment, that is, a predetermined position on the side of the rectangular region B1 closest to the ground (furthest in the +y direction), and the center bc of the rectangular region B1 as an example of the predetermined position output by the second embodiment.

When the target detection unit 2 detects target regions with an object detection technique such as YOLOv3, the first and second embodiments may be applied selectively according to the target type information obtained. For example, for a target expected to occupy a small part of the whole image, such as the ball in region B3, the second embodiment may be applied to output a predetermined position such as the center. For a target expected to occupy a large part of the whole image, such as the players in regions B1 and B2, the first embodiment may be applied to output the predetermined position at the bottom. Information distinguishing whether a target occupies a small or large part of the whole image may be associated with each target type in advance.
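The choice between the bottom-edge midpoint (first embodiment) and the box center (second embodiment) can be sketched as follows. The box layout (x_min, y_min, x_max, y_max), the type labels, and the table of small target types are assumptions made for this illustration; +y points downward as in the text.

```python
# Hypothetical table prepared in advance: target types expected to look small.
SMALL_TARGET_TYPES = {"ball"}


def bottom_midpoint(box):
    """First embodiment: midpoint of the bottom edge (largest y)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, y_max)


def box_center(box):
    """Second embodiment: center of the bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def target_position(box, target_type):
    """Pick the output position according to the detected target type."""
    if target_type in SMALL_TARGET_TYPES:
        return box_center(box)
    return bottom_midpoint(box)


print(target_position((100, 50, 140, 150), "player"))  # -> (120.0, 150)
print(target_position((300, 200, 310, 210), "ball"))   # -> (305.0, 205.0)
```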

As the detection processing in the target detection unit 2, when the detection target is set in advance to be a person such as a soccer player or a track and field athlete, the joint skeleton detection processing disclosed as OpenPose in Non-Patent Document 7 may be used. FIG. 5 is a schematic example of a joint skeleton detected in this way, showing the modeled skeleton as line segments and the joint positions as white circles (○) at the endpoints of the line segments.

When the target detection unit 2 detects a target as a joint skeleton, it may define a rectangle enclosing the whole skeleton and output the target position by the same method as for the bounding box of the object detection technique above (the first or second embodiment), or it may output the target position according to the third embodiment. In the third embodiment, the position of a predetermined joint of the joint skeleton may be output as the target position. As shown by the two white circles (○) larger than the others at the bottom (+y side) of FIG. 5, the predetermined joints may be the right foot and the left foot, which are expected to be closest, in a normal standing posture, to the plane in space, such as the ground, that serves as the height reference.
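For the third embodiment, selecting the foot joints from a detected skeleton might look like the sketch below. The dictionary layout and the joint names are hypothetical and do not reflect the output format of any particular pose estimation library.

```python
def foot_positions(joints, foot_names=("right_ankle", "left_ankle")):
    """Return the positions of the predetermined foot joints that were detected."""
    return [joints[name] for name in foot_names if name in joints]


def lowest_joint(joints):
    """Fallback: the joint furthest in the +y (downward) direction."""
    return max(joints.values(), key=lambda position: position[1])


# Hypothetical skeleton: joint name -> (x, y) in image coordinates.
skeleton = {
    "nose": (100, 20),
    "right_ankle": (95, 180),
    "left_ankle": (108, 178),
}
print(foot_positions(skeleton))  # -> [(95, 180), (108, 178)]
print(lowest_joint(skeleton))    # -> (95, 180)
```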

The conversion unit 3 multiplies the target position B = (x₂, y₂) in image P2 of the multi-viewpoint images P1, P2, obtained by the target detection unit 2, by the planar projective transformation matrix H obtained by the parameter calculation unit 1, as on the right-hand side of equation (1). This converts it into a position C = (x_2[converted], y_2[converted]) in image P1, whose viewpoint differs from that of image P2. The conversion unit 3 then obtains, by the following equations (2) and (3), the distance d(A, C) between the position A = (x₁, y₁) of the point in image P1 that corresponds to point B of image P2 and the converted position C = (x_2[converted], y_2[converted]). This distance represents the conversion error error(A, B), in the coordinates of image P1, between point A of image P1 and its corresponding point B of image P2, and the conversion unit 3 outputs the error error(A, B) to the estimation unit 4.
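The images of equations (2) and (3) are not reproduced in this text. From the description, equation (2) is the multiplication by H and equation (3) is the distance between A and C. A reconstruction consistent with that description is sketched below; taking d(A, C) to be the Euclidean distance and reading equation (2) as equality up to scale are assumptions.

```latex
\begin{pmatrix} x_{2[\mathrm{converted}]} \\ y_{2[\mathrm{converted}]} \\ 1 \end{pmatrix}
\simeq
H \begin{pmatrix} x_2 \\ y_2 \\ 1 \end{pmatrix}
\tag{2}

\mathrm{error}(A, B) = d(A, C)
= \sqrt{\left(x_1 - x_{2[\mathrm{converted}]}\right)^2 + \left(y_1 - y_{2[\mathrm{converted}]}\right)^2}
\tag{3}
```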

FIG. 6 shows a schematic example of the points for which the conversion unit 3 obtains the conversion error error(A, B) by equations (2) and (3). It shows point A in image P1 (white circle ○), point B in image P2 (black circle ●) as the point corresponding to point A, and point C (black circle ●), obtained by converting point B into the coordinates of image P1 through multiplication by the matrix H according to equation (2). In particular, the bottom row shows, together on image P1, point A and point C obtained by converting point B of image P2 with the matrix H.

When two or more such pairs of corresponding points A, B between images P1 and P2 are obtained from the target detection unit 2, the conversion unit 3 obtains the conversion error error(A, B) for each pair.
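Computing the conversion error for several pairs can be sketched as follows, assuming the Euclidean distance for d(A, C). The matrix H and the point coordinates are hypothetical values for illustration only.

```python
import math


def apply_homography(H, point):
    """Convert a point of image P2 into image P1 coordinates with H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def conversion_error(H, point_a, point_b):
    """error(A, B): distance in image P1 between A and C = H applied to B."""
    cx, cy = apply_homography(H, point_b)
    return math.hypot(point_a[0] - cx, point_a[1] - cy)


H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [0.25, 0.0, 1.0]]

# Hypothetical (A, B) pairs: the first agrees with H, the second does not.
pairs = [((3.0, 1.0), (4.0, 2.0)), ((3.0, -3.0), (4.0, 2.0))]
errors = [conversion_error(H, a, b) for a, b in pairs]
print(errors)  # -> [0.0, 4.0]
```

In this toy data the first pair behaves like a point lying on the plane used to compute H (zero error), while the second leaves a nonzero error.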

To obtain this error error(A, B), the conversion unit 3 needs to know the correspondence between point A in image P1 and point B in image P2 (as images of the same target in space). When the target detection unit 2 uses general object detection, points whose object type information (object recognition result) is the same may be judged to be the corresponding points A, B between the images P1, P2 of different viewpoints. When the target detection unit 2 uses skeleton joint detection, general object detection may additionally be applied to the rectangular region enclosing the detected skeleton joints, and the correspondence obtained in the same way.

If the object type information does not yield a one-to-one correspondence, tentative correspondences can be assigned and the error error(A,B) evaluated for each; the assignment with the smallest error is judged to be the correct one. For example, if two "players" are detected in image P1 as points A4, A5 and two "players" are detected in image P2 as points B4, B5, the decision is made as follows. The same applies when there are three or more candidates.

If "error(A4,B4) < error(A4,B5)" is true, point A4 corresponds to point B4 and point A5 corresponds to point B5. If it is false, point A4 corresponds to point B5 and point A5 corresponds to point B4.
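One way to extend this rule to three or more candidates is a greedy assignment, sketched below. This generalization is an assumption for illustration: the text only states that the same approach applies. The function name `match_by_error` and the input layout (a table `errors[i][j]` holding error(A_i, B_j)) are hypothetical, and ties are broken by list order, which may differ from the strict inequality of the two-candidate rule.

```python
def match_by_error(errors):
    """Greedily assign each point A_i to the unassigned B_j with the smallest error.

    errors[i][j] holds error(A_i, B_j); the table needs at least as many
    columns (B candidates) as rows (A candidates).
    Returns a list whose i-th entry is the index j matched to A_i.
    """
    unassigned = list(range(len(errors[0])))
    assignment = []
    for row in errors:
        best = min(unassigned, key=lambda j: row[j])
        assignment.append(best)
        unassigned.remove(best)
    return assignment

# Two "players" in each image: rows are A4, A5 and columns are B4, B5.
print(match_by_error([[1.0, 4.0], [3.5, 0.5]]))  # A4-B4, A5-B5
print(match_by_error([[4.0, 1.0], [0.5, 3.5]]))  # A4-B5, A5-B4
```

With two candidates this reduces to comparing error(A4,B4) with error(A4,B5), after which A5 receives the remaining point.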

Alternatively, a feature such as a color histogram of the region around each point may be computed, and points with similar features judged to correspond. That is, the color histogram hist(A4) around point A4 is compared with the color histogram hist(B4) around point B4 and the color histogram hist(B5) around point B5, and A4 is judged to correspond to whichever is more similar.
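The histogram comparison can be sketched as below. The text does not specify the histogram layout or the similarity measure, so the joint RGB histogram with 4 bins per channel and the histogram intersection score used here are assumptions, and the names `color_histogram`, `intersection`, and `most_similar` are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of a list of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        index = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[index] += 1.0
    return [count / len(pixels) for count in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def most_similar(hist_a, candidate_hists):
    """Index of the candidate histogram most similar to hist_a."""
    return max(range(len(candidate_hists)),
               key=lambda j: intersection(hist_a, candidate_hists[j]))

# Made-up patches: A4 and B4 are reddish, B5 is bluish.
hist_a4 = color_histogram([(250, 10, 10)] * 4)
hist_b4 = color_histogram([(240, 20, 5)] * 4)
hist_b5 = color_histogram([(10, 10, 250)] * 4)
print(most_similar(hist_a4, [hist_b4, hist_b5]))
```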

[Estimation unit 4]
Based on the considerations below, the estimation unit 4 applies a predetermined relationship to the error obtained by the conversion unit 3, and thereby estimates and outputs information about the height at each target position that was detected by the target detection unit 2 and for which the conversion unit 3 obtained that error.

FIG. 7 is a schematic diagram explaining the principle by which the estimation unit 4 estimates height from the error. The upper part U of FIG. 7 shows a flag F (omitted from FIG. 6) as an example of an object with height located in region R2 of image P2 of FIG. 6, together with region R1, which is the result of converting region R2 containing the flag F into the coordinates of image P1 by the matrix H of FIG. 6. (Only part of each of regions R1 and R2 is drawn, as a rectangle.) The flag F consists of a pole standing on the ground and a striped flag cloth supported by the pole. In region R2 of the captured image P2, the flag appears undistorted, seen from the front and standing in the -y axis direction (the vertical height direction). In region R1 after conversion by the matrix H, however, it appears as a state FS (the image FS of the flag under the mapping by H) that has fallen over roughly in the +x axis direction (the horizontal direction) and is distorted in shape.

In FIG. 7, region R1, as a region of image P1, shows not only the distorted state FS but also, superimposed, the flag F as it was actually captured in image P1: standing in the -y axis direction (the vertical height direction) and, unlike in region R2 of image P2, seen at an angle rather than from the front. (That is, in region R1 the flag F is what actually appears in the captured image P1, whereas the mapped image FS does not exist in image P1; for the purpose of explanation, the flag F and the mapped image FS are drawn together at their corresponding positions in region R1 of image P1.)

As shown schematically by the relationship between the actually captured flag F and the distorted image FS in region R1, the estimation unit 4 estimates height from the error by exploiting the following property of an object with height such as the flag F: for points closer to the ground (the in-space plane that served as the reference for computing the matrix H), the positional deviation caused by conversion with the matrix H is small, whereas the higher a point is above the ground, the larger that deviation becomes.

The lower part D of FIG. 7 illustrates this property. It shows the same regions R1 and R2, flag F, and distorted image FS as the upper part U, with additional points drawn to explain the property. The example is described below with reference to the lower part D of FIG. 7. On the pole of the flag F captured in region R2 of image P2, points B1, B2, B3 are marked in order from the side nearest the ground; the corresponding points on the flag F actually captured in region R1 of image P1 are points A1, A2, A3, respectively. The points obtained by converting B1, B2, B3 by the matrix H into region R1 (into the region of the distorted image FS of the flag F) are points C1, C2, C3, respectively.

Accordingly, for the points A1, A2, A3 on the pole of the flag F in image P1 and the points B1, B2, B3 on the pole of the flag F in image P2 that correspond to them as the same points in space, the conversion error of equation (3) can be written concretely as equations (3-1), (3-2), (3-3) below. FIG. 7 shows that the distances between points A1 and C1, A2 and C2, and A3 and C3 satisfy the relation of equation (4). The conversion errors of equations (3-1), (3-2), (3-3) therefore satisfy the relation of equation (5).
error(A1,B1)=d(A1,C1)≈0 …(3-1)
error(A2,B2)=d(A2,C2) …(3-2)
error(A3,B3)=d(A3,C3) …(3-3)
d(A1,C1)<d(A2,C2)<d(A3,C3) …(4)
error(A1,B1)<error(A2,B2)<error(A3,B3) …(5)

Meanwhile, the points A1, A2, A3 on the pole of the flag F are, in image P1 capturing the soccer field FL, three points at the same position on the plane of the soccer field FL (the position where the flag F stands) but at different heights above that plane (heights along the flag F). Specifically, point A1 touches the plane of the soccer field FL and has height h1≈0 (so point C1, obtained by converting the point B1 of image P2 corresponding to A1, almost coincides with A1); point A2 has a height h2 of about half the flag above the plane; and point A3 has the full height h3 of the flag. These satisfy the relation of equation (6) below. (The same relation holds for the corresponding points B1, B2, B3 of image P2.)
h1<h2<h3 …(6)

As equations (5) and (6) show, when only the height is varied at a fixed position on the plane of the soccer field FL (the position where the flag F stands, which is the position of point A1 in image P1 and of point B1 in image P2), the conversion errors corresponding to points A1, A2, A3 in image P1 increase in the order error(A1,B1)<error(A2,B2)<error(A3,B3) as the height increases in the order h1<h2<h3. This is because the planar projective transformation matrix H describes the conversion of points on the plane of the soccer field FL (points of height zero lying on that plane).

In other words, as illustrated by FIG. 7 and equations (5) and (6), at any position on the plane of the soccer field FL (specified by a position (x,y) in image P1), the height above the plane and the conversion error are positively correlated: the greater the height, the greater the conversion error. This correlation is determined for each position (x,y) of image P1, which specifies a position on the plane of the soccer field FL. The estimation unit 4 therefore uses this correlation in reverse, taking a larger conversion error to indicate a greater height, and obtains the height h(x,y), in pixel units of image P1, at the position A=(x,y) (the position detected by the target detection unit 2) by equation (7) below.
h(x,y)=a(x,y)*error(A,B)+b(x,y) …(7)
More precisely, the correlation also depends on the planar projective transformation matrix H, which is the conversion relationship between images P1 and P2 via the in-space plane, and can be written as equation (8) below. Unless otherwise noted, however, the following description assumes the matrix H is constant and is based on equation (7) rather than equation (8).
h(x,y,H)=a(x,y,H)*error(A,B)+b(x,y,H) …(8)

Equation (7) above is an example in which the correlation is modeled as a linear function of the error error(A,B) with constants a(x,y) and b(x,y) determined for each position A=(x,y) in image P1. The correlation may instead be modeled by some other function whose constants are determined for each position A=(x,y).

The position-dependent constants of the model can be obtained in advance, for example by the least squares method, as parameters fitted to training data. Such training data can be acquired by, for example, actually placing objects of various heights at each position on the plane of the soccer field FL and actually capturing images P1, P2 from the respective predetermined camera viewpoints. Alternatively, instead of a real scene such as the soccer field FL, a 3D CG model may be used to render objects of various heights at each position in images P1, P2 from the predetermined camera viewpoints; training data is then prepared from the renderings and the constants obtained as fitting parameters.

In the example of FIG. 7, the training data D(A1) for the position of point A1 of image P1, a point on the plane of the soccer field FL (and for its corresponding point B1 in image P2), can be the following, in which the coordinates (x1,y1) of A1 are associated with the conversion errors of equations (3-1), (3-2), (3-3) and the heights h1, h2, h3 appearing in equation (6), respectively.
D(A1)={(x1,y1,error(A1,B1),h1),(x1,y1,error(A2,B2),h2),(x1,y1,error(A3,B3),h3)}

FIG. 8 is a schematic example of part of such training data in table form. For each position (x,y) of image P1, training data can be prepared by obtaining the conversion error error between images P1 and P2 from equation (3) and also supplying the height h in image P1. In the example of FIG. 8, a model at position (3,3) and a model at position (100,100) can be obtained. The training data D(A1) above and the training data of FIG. 8 are schematic examples of only part of the training data; the number of training samples at each position (x,y) may be much larger.
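Fitting the linear model of equation (7) to such training data can be sketched as follows. This is a minimal ordinary least squares illustration under stated assumptions: the rows are tuples (x, y, error, h) in the layout of D(A1), the numeric values are made up and are not the values of FIG. 8, and the function names `fit_models` and `estimate_height` are hypothetical.

```python
from collections import defaultdict

def fit_models(rows):
    """Fit h = a * error + b by least squares separately for each position (x, y).

    rows: iterable of (x, y, error, h). Each position needs at least two
    samples with different error values. Returns {(x, y): (a, b)}.
    """
    samples = defaultdict(list)
    for x, y, error, h in rows:
        samples[(x, y)].append((error, h))

    models = {}
    for position, pairs in samples.items():
        n = len(pairs)
        mean_e = sum(e for e, _ in pairs) / n
        mean_h = sum(h for _, h in pairs) / n
        var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
        cov = sum((e - mean_e) * (h - mean_h) for e, h in pairs)
        a = cov / var_e
        b = mean_h - a * mean_e
        models[position] = (a, b)
    return models

def estimate_height(models, position, error):
    """Equation (7): h(x, y) = a(x, y) * error(A, B) + b(x, y)."""
    a, b = models[position]
    return a * error + b

# Made-up training rows for two positions.
rows = [
    (3, 3, 0.0, 0.0), (3, 3, 2.0, 10.0), (3, 3, 4.0, 20.0),
    (100, 100, 0.0, 1.0), (100, 100, 1.0, 3.0), (100, 100, 2.0, 5.0),
]
models = fit_models(rows)
print(models[(3, 3)], estimate_height(models, (3, 3), 3.0))
```

Keeping one (a, b) pair per position mirrors the position dependence of equation (7); a per-region variant would only change the dictionary key.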

As described above, the present invention, illustrated here by one embodiment, makes it possible to estimate simply, without requiring calibration using a chessboard or the like, information about the height of a target position in a multi-viewpoint image above the in-space plane. Other embodiments of the present invention are described below.

(1) It is desirable that the shooting poses of the images P1, P2 (training images) of an in-space plane such as the soccer field FL captured by the cameras C1, C2, which are used to acquire the training data for computing the model parameters that enable estimation by the estimation unit 4, do not differ greatly from the respective shooting poses of the images P1, P2 that are input to the height estimation device 10 as the actual targets of estimation. However, camera motion that does not change how the in-space plane is positioned in the image is acceptable. For example, among camera movements that change the position (translation component) of the shooting pose (the pose specified by a translation component and a rotation component), movement that leaves the orientation and distance of the cameras C1, C2 relative to the in-space plane unchanged, that is, movement whose translation is parallel to the in-space plane such as the soccer field FL, is acceptable. Likewise acceptable is camera motion that rotates the camera about the perpendicular dropped from the camera to the in-space plane, so that the direction from which the plane is viewed rotates while the distance between the plane and the camera and the tilt of the camera from the horizontal plane formed by the in-space plane are kept constant.

In this case, if at least one of the cameras C1, C2 moves relative to the in-space plane, the planar projective transformation matrix H between images P1 and P2 in equation (1) may also change. The model parameters, described with equation (8) for the case of a linear model, can then be taken from model parameters M_k (k=1,2,…,K) computed in advance from training data for each of a plurality (a predetermined number K) of planar projective transformation matrices H_k (k=1,2,…,K). That is, for the linear model of equation (8), the model parameters M_k=(a(x,y,H_k),b(x,y,H_k)) (k=1,2,…,K), defined as depending on the pixel position (x,y), are computed and prepared in advance for each of the predetermined matrices H_k (k=1,2,…,K). Then, for example, the norm |H-H_k| of the matrix difference is evaluated, and the model parameters M_k corresponding to the matrix H_k that minimizes this difference, and is therefore judged closest to the matrix H obtained by the parameter calculation unit 1, may be used; alternatively, model parameters interpolated by a method similar to that described in (3) below may be used.
If, on the other hand, both cameras C1, C2 are assumed to be stationary relative to the in-space plane, a single set of model parameters corresponding to the constant matrix H of that stationary state suffices.
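Selecting the prepared model parameters by the norm |H-H_k| can be sketched as below. The text does not name a particular norm, so the Frobenius norm is an assumption here, as is the premise that all matrices are normalized to a common scale (for example, bottom-right element equal to 1), since a homography is only defined up to scale. The names `frobenius_distance` and `select_model` are hypothetical.

```python
import math

def frobenius_distance(H1, H2):
    """Frobenius norm of the difference of two 3x3 matrices given as nested lists."""
    return math.sqrt(sum((a - b) ** 2
                         for row1, row2 in zip(H1, H2)
                         for a, b in zip(row1, row2)))

def select_model(H, candidates):
    """Return the model parameters M_k whose matrix H_k is closest to H.

    candidates: list of (H_k, M_k) pairs prepared in advance.
    """
    _, model = min(candidates, key=lambda pair: frobenius_distance(H, pair[0]))
    return model

# Toy example with placeholder model labels.
H_1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
H_2 = [[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
H = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(select_model(H, [(H_1, "M_1"), (H_2, "M_2")]))
```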

(2) Instead of obtaining model parameters for each position (x,y) of image P1, the estimation unit 4 may obtain model parameters for each region of image P1, so that the model parameters can be computed from a smaller amount of training data. In this case, the regions serving as units of parameter computation need not be obtained by dividing image P1 evenly in its coordinates (x,y) (for example, dividing it equally both vertically and horizontally into rectangular regions of equal size). Instead, they may be defined by dividing the in-space plane that is the target of height estimation, such as the soccer field FL, into equal regions and taking the regions in which those equal regions are captured in image P1. FIG. 9 shows an example corresponding to image P1 of FIG. 6, in which the regions R11, R12, R13, R14, R15, R16 on image P1, corresponding to an even division of the plane of the soccer field FL, are used as the unit regions for computing model parameters on image P1.

(3) The estimation unit 4 may obtain model parameters for only some of the positions (x,y) of image P1, or for only some of the regions into which image P1 is divided. For a position (x,y) or region without model parameters, parameters interpolated from the values at nearby positions (x,y) or regions that do have model parameters may be used. Any existing interpolation method used to resample pixel values when applying resolution conversion or geometric transformation to an image (such as nearest-neighbor or linear interpolation) may be used. In the example of FIG. 9, model parameters may be prepared only for regions R11, R15, R13, and the model parameters of region R14, for example, may be taken as the average of the model parameters of the adjacent regions R11 and R15.
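The averaging in the FIG. 9 example can be sketched as below. The parameter values are made up, the adjacency table contains only the R14 relationship stated in the text, and the function name `model_for_region` is hypothetical; other interpolation schemes would fit the description equally well.

```python
def model_for_region(region, models, neighbors):
    """Return (a, b) for a region, averaging neighboring models when none is stored.

    models: {region_name: (a, b)} for regions with prepared parameters.
    neighbors: {region_name: [adjacent region names]}.
    """
    if region in models:
        return models[region]
    known = [models[name] for name in neighbors[region] if name in models]
    a = sum(m[0] for m in known) / len(known)
    b = sum(m[1] for m in known) / len(known)
    return (a, b)

# Parameters exist only for R11, R15, R13; R14 is filled in from R11 and R15.
models = {"R11": (2.0, 1.0), "R15": (4.0, 3.0), "R13": (6.0, 0.0)}
neighbors = {"R14": ["R11", "R15"]}
print(model_for_region("R14", models, neighbors))
```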

(4) As one form of the output height information, the estimation unit 4 may determine whether the error error(A,B) obtained by equation (3) is below or not below a predetermined threshold, and output the result as a determination of whether the position detected by the target detection unit 2 (that is, point A of image P1) is in contact with the in-space plane such as the soccer field FL. Similarly, instead of thresholding the error, the estimation unit 4 may estimate the height of point A of image P1 as in the embodiments above, threshold that height, and output the resulting determination of whether the point is in contact with the in-space plane. In other words, the height information estimated by the estimation unit 4 may be binary: either the point is in contact with the in-space plane, or it is not in contact and is located above and away from the plane.
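Both forms of the binary determination can be sketched in a few lines. The threshold values are placeholders chosen for illustration, and the function names are hypothetical; the second function assumes the linear model of equation (7).

```python
def touches_plane_by_error(error, error_threshold):
    """True when the conversion error is below the threshold (point judged on the plane)."""
    return error < error_threshold

def touches_plane_by_height(error, a, b, height_threshold):
    """True when the height estimated as a * error + b is below the threshold."""
    return a * error + b < height_threshold

print(touches_plane_by_error(0.5, 2.0))              # small error: on the plane
print(touches_plane_by_height(4.0, 5.0, 0.0, 10.0))  # estimated height 20: above the plane
```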

(5) When the height estimation device 10 is used, the viewpoint whose image is the target of height estimation is set in advance. The examples above use the two-viewpoint images P1, P2 as the multi-viewpoint image with height information estimated in image P1, but the device can equally be set to estimate height information in image P2.

(6) When the multi-viewpoint image input to the height estimation device 10 has three or more viewpoints, the viewpoint targeted for height estimation is set as above, a height is estimated from the conversion error with respect to the image of each other viewpoint in the same way as in the embodiments above (the two-viewpoint case), and the final estimate may be obtained as the average of these heights.

For example, suppose four-viewpoint images P1, P2, P3, P4 are input and height is to be estimated in image P1. For a position A detected by the target detection unit 2 as the target position on image P1, the conversion unit 3 and estimation unit 4 estimate a height H2 from the conversion error between images P1 and P2, a height H3 from the conversion error between images P1 and P3, and a height H4 from the conversion error between images P1 and P4. The estimation unit 4 can then obtain the final output height as the average of the three, (H2+H3+H4)/3. To make this possible, the parameter calculation unit 1 obtains a planar projective transformation matrix between images P1 and P2, between P1 and P3, and between P1 and P4, which the conversion unit 3 uses when the respective heights are estimated.
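The averaging over view pairs can be sketched as follows, assuming the linear model of equation (7) with a separate (a, b) pair for each view pair at the detected position. The numeric values are made up and the function name `average_height` is hypothetical.

```python
def average_height(errors, models):
    """Average the per-pair height estimates a * error + b.

    errors: conversion errors for the pairs (P1,P2), (P1,P3), (P1,P4), ...
    models: matching list of (a, b) parameters, one per view pair.
    """
    heights = [a * error + b for error, (a, b) in zip(errors, models)]
    return sum(heights) / len(heights)

# Made-up values giving H2 = 2, H3 = 3, H4 = 4, so the result is (2+3+4)/3.
print(average_height([2.0, 4.0, 6.0], [(1.0, 0.0), (0.5, 1.0), (0.5, 1.0)]))
```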

Alternatively, when a multi-viewpoint image with three or more viewpoints is input and the viewpoint targeted for height estimation has been set, the height estimated from the conversion error with respect to the viewpoint farthest from the set viewpoint may be output as the final result. In the four-viewpoint example above, with height estimated in image P1, if the camera viewpoint of image P4 is the farthest from that of image P1, the estimation unit 4 may output the height H4 estimated from the conversion error between images P1 and P4 as the final result.

The viewpoint farthest from a given viewpoint of the multi-viewpoint image can be determined from the cameras' extrinsic parameters, which are well known in fields such as computer graphics and express the relative positional relationship between two camera viewpoints (extrinsic parameters given as a 4×4 matrix that contains a 3×3 rotation component and a 1×3 translation component and transforms a size 1×4 column vector expressing a spatial point in homogeneous coordinates). These camera parameters are supplied in advance in association with the multi-viewpoint image.
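One possible way to pick the farthest viewpoint from extrinsic parameters is sketched below. It rests on assumptions not stated in the text: the extrinsics follow the common convention x_cam = R x_world + t, so the camera center in world coordinates is -Rᵀt, and "farthest" is measured as the Euclidean distance between camera centers. The names `camera_center` and `farthest_view` are hypothetical.

```python
import math

def camera_center(R, t):
    """Camera center in world coordinates, C = -R^T t, for x_cam = R x_world + t."""
    return tuple(-sum(R[r][c] * t[r] for r in range(3)) for c in range(3))

def farthest_view(reference, others):
    """Name of the view whose camera center is farthest from the reference camera.

    reference: (R, t) of the view where height is estimated.
    others: {view_name: (R, t)} for the remaining views.
    """
    ref_center = camera_center(*reference)

    def distance_to_reference(name):
        center = camera_center(*others[name])
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(ref_center, center)))

    return max(others, key=distance_to_reference)

# Toy setup: all cameras share the same orientation and lie along the x axis.
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
views = {
    "P2": (I, (-1.0, 0.0, 0.0)),
    "P3": (I, (-3.0, 0.0, 0.0)),
    "P4": (I, (-10.0, 0.0, 0.0)),
}
print(farthest_view((I, (0.0, 0.0, 0.0)), views))
```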

(7) In the description above, when multi-viewpoint images P1, P2 are input, the estimation unit 4 estimates the height of the target point A on image P1 (the height along the y direction of the image coordinates, which is the height direction in image P1) by using model parameters obtained beforehand from training data such as that of FIG. 8. In another embodiment, the heights in the training data are given as heights in the real space in which the multi-viewpoint image is captured (actual lengths in units such as meters, rather than heights in pixel units, that is, pixel spacing on the image), so that the height of the target point A on image P1 is estimated as a height in that real space.

(8) As application examples of the height estimation device 10, the right and left feet of a track-and-field athlete captured in multi-viewpoint video can each be detected as target positions to analyze the athlete's running, and the ball captured in multi-viewpoint video of a ball sport can be detected as the target position to analyze its trajectory.

(9) The present invention can also be provided as a program that causes a computer to function as the height estimation device 10. The computer may have a well-known hardware configuration including a CPU (central processing unit), memory (RAM and ROM), and various interfaces, with the CPU executing the program instructions corresponding to the functions of each unit of the height estimation device 10. The computer may further include a dedicated processor such as a GPU (graphics processing unit) capable of faster parallel processing than the CPU, and the dedicated processor, instead of the CPU, may load and execute the program for all or any part of the functions of the height estimation device 10. Furthermore, all or any part of the functions of the height estimation device 10 may be implemented on another computer (such as a server) reached via a network, with two or more computers exchanging their processing results over the network in the form shown in FIG. 1, so that the height estimation device 10 is realized as a system.

10…height estimation device, 1…parameter calculation unit, 2…target detection unit, 3…conversion unit, 4…estimation unit

Claims

A height estimation device comprising:
a parameter calculation unit that, from multi-viewpoint images including a first image and a second image as images from at least two viewpoints, obtains, between the first image and the second image, correspondences of identical points on an in-space plane captured in the multi-viewpoint images, and obtains a planar projective transformation from those correspondences;
a target detection unit that detects, from the first image and the second image, points corresponding to the same target as a first position and a second position, respectively;
a conversion unit that applies the planar projective transformation to the second position to obtain a converted position, which is the second position converted into coordinates in the first image; and
an estimation unit that estimates information on the height of the same target from the in-space plane based on the difference between the first position and the converted position,
wherein the estimation unit estimates, based on the distance between the first position and the converted position, that the height from the in-space plane is greater as the distance is greater,
wherein the estimation unit performs this estimation, in which the height from the in-space plane is greater as the distance is greater, using a model determined according to the first position, and
wherein the model is given for only some of the pixel positions in the first image that are candidates for the first position, and, when the first position is a position for which the model is not given, the estimation unit obtains a model for the first position by complementing it with the model at a nearby position for which the model is given.
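As a non-limiting illustration, the conversion and estimation steps recited above can be sketched as follows. The linear form `height = gain * distance`, the nearest-neighbour choice of the "nearby position", and all function and variable names are assumptions made for this sketch; the claim only requires that the estimated height increase with the distance and that a missing model be complemented from a nearby one.

```python
import math

def apply_homography(H, point):
    """Map an (x, y) point with a 3x3 planar projective transformation H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def model_for(first_pos, models):
    """Return the gain stored for first_pos, or the gain stored at the
    nearest pixel position that has one (assumed complementing rule)."""
    if first_pos in models:
        return models[first_pos]
    nearest = min(models, key=lambda p: math.dist(p, first_pos))
    return models[nearest]

def estimate_height(first_pos, second_pos, H, models):
    """Estimate height from the distance between the first position and
    the second position converted into first-image coordinates.

    first_pos, second_pos: (x, y) tuples in the first and second image.
    H: transformation from second-image to first-image coordinates.
    models: dict mapping some first-image pixel positions to a gain
            (hypothetical linear model: height = gain * distance).
    """
    converted = apply_homography(H, second_pos)
    distance = math.dist(first_pos, converted)
    return model_for(first_pos, models) * distance
```

A point lying on the in-space plane is mapped by `H` onto its own first-image position, so its distance, and hence its estimated height, is zero.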

A height estimation device comprising:
a parameter calculation unit that, from multi-viewpoint images including a first image and a second image as images from at least two viewpoints, obtains, between the first image and the second image, correspondences of identical points on an in-space plane captured in the multi-viewpoint images, and obtains a planar projective transformation from those correspondences;
a target detection unit that detects, from the first image and the second image, points corresponding to the same target as a first position and a second position, respectively;
a conversion unit that applies the planar projective transformation to the second position to obtain a converted position, which is the second position converted into coordinates in the first image; and
an estimation unit that estimates information on the height of the same target from the in-space plane based on the difference between the first position and the converted position,
wherein the estimation unit estimates, based on the distance between the first position and the converted position, that the height from the in-space plane is greater as the distance is greater,
wherein the estimation unit performs this estimation, in which the height from the in-space plane is greater as the distance is greater, using a model determined according to the first position, and
wherein the model is given in advance, for each region of the first image in which a region obtained by evenly dividing the in-space plane is captured, with respect to all or some of those regions.
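One possible way to look up such a per-region model is sketched below. It assumes a hypothetical image-to-floor homography `H_image_to_floor` (not recited in the claim) so that the evenly divided cell of the in-space plane containing the first position can be found by simple division; `cell_size` and the dictionary `cell_models` are likewise illustrative names.

```python
import math

def apply_homography(H, point):
    """Map an (x, y) point with a 3x3 planar projective transformation H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def floor_cell(first_pos, H_image_to_floor, cell_size):
    """Index of the evenly sized floor cell whose image region contains first_pos."""
    X, Y = apply_homography(H_image_to_floor, first_pos)
    return (math.floor(X / cell_size), math.floor(Y / cell_size))

def model_for_cell(first_pos, H_image_to_floor, cell_size, cell_models):
    """Return the model given in advance for the cell, or None when that
    cell is one of the regions for which no model was prepared."""
    return cell_models.get(floor_cell(first_pos, H_image_to_floor, cell_size))
```

Because the cells are equal on the in-space plane rather than in the image, regions far from the camera cover fewer pixels than regions close to it.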

The height estimation device according to claim 1 or 2, wherein the parameter calculation unit extracts feature points and feature quantities from the first image and the second image, and obtains the planar projective transformation by determining the correspondences between the extracted feature points and feature quantities by the RANSAC method.
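A minimal sketch of RANSAC-based estimation of the planar projective transformation is given below. It assumes that feature extraction and tentative matching have already produced candidate point pairs; the four-point solver with the last matrix element fixed to 1, the reprojection threshold, the iteration count, and all names are assumptions of this sketch rather than details taken from the specification.

```python
import math
import random

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination; return None if singular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def homography_from_4(pairs):
    """Homography mapping (x, y) to (u, v) from four ((x, y), (u, v)) pairs."""
    A, b = [], []
    for (x, y), (u, v) in pairs:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    if h is None:
        return None
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def reprojection_error(H, pair):
    """Distance between H applied to the source point and the target point."""
    (x, y), (u, v) = pair
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return math.inf
    pu = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    pv = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return math.hypot(pu - u, pv - v)

def ransac_homography(pairs, threshold=2.0, iterations=200, seed=0):
    """Keep the four-point hypothesis supported by the most candidate pairs."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iterations):
        H = homography_from_4(rng.sample(pairs, 4))
        if H is None:
            continue  # degenerate sample, e.g. three collinear points
        inliers = [p for p in pairs if reprojection_error(H, p) < threshold]
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

Candidate pairs that do not lie on the in-space plane, or that are mismatched, fail the reprojection test and are excluded from the consensus set, which is why the resulting transformation describes the plane rather than objects standing on it.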

The height estimation device according to any one of claims 1 to 3, wherein the estimation unit, by using the model, estimates the information on the height of the same target from the in-space plane as a height defined in the image coordinates of the first image.

The height estimation device according to any one of claims 1 to 3, wherein the estimation unit, by using the model, estimates the information on the height of the same target from the in-space plane as a height defined in the space in which the multi-viewpoint images are captured.

The height estimation device according to any one of claims 1 to 5, wherein the estimation unit, by applying a threshold determination to the difference between the first position and the converted position, estimates, as part of the information on the height of the same target from the in-space plane, information on whether or not the same target is in contact with the in-space plane.
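The threshold determination can be illustrated with the short sketch below. The Euclidean distance as the measure of the difference, the default threshold of 3 pixels, and the function name are assumptions made for illustration only.

```python
import math

def touches_plane(first_pos, converted_pos, threshold_px=3.0):
    """Judge the target to be in contact with the in-space plane when the
    first position and the converted position nearly coincide."""
    return math.dist(first_pos, converted_pos) <= threshold_px
```

A point on the in-space plane is reproduced by the planar projective transformation up to detection and estimation error, so the threshold absorbs that error.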

A height estimation device comprising:
a parameter calculation unit that, from multi-viewpoint images including a first image and a second image as images from at least two viewpoints, obtains, between the first image and the second image, correspondences of identical points on an in-space plane captured in the multi-viewpoint images, and obtains a planar projective transformation from those correspondences;
a target detection unit that detects, from the first image and the second image, points corresponding to the same target as a first position and a second position, respectively;
a conversion unit that applies the planar projective transformation to the second position to obtain a converted position, which is the second position converted into coordinates in the first image; and
an estimation unit that estimates information on the height of the same target from the in-space plane based on the difference between the first position and the converted position,
wherein the multi-viewpoint images further include one or more third images as images from one or more viewpoints,
wherein the parameter calculation unit further obtains the correspondences of identical points between the first image and each third image, and obtains a planar projective transformation from those correspondences,
wherein the target detection unit further detects, from each third image, a point corresponding to the same target as a third position,
wherein the conversion unit further applies, to the third position, the planar projective transformation corresponding to that third image to obtain a converted position, which is the third position converted into coordinates in the first image, and
wherein the estimation unit obtains the distance between the first image and the converted position relating to the second image and the respective distances between the first image and the converted positions relating to the one or more third images, estimates a height for each of the obtained distances such that the height from the in-space plane is greater as the distance is greater, and estimates the height of the same target from the in-space plane as the average of the respectively estimated heights.
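The averaging over the second image and the one or more third images can be sketched as follows. As a simplifying assumption, a single hypothetical gain is applied to every distance (height = gain * distance); the claim itself only requires that each estimated height increase with its distance and that the results be averaged.

```python
import math

def apply_homography(H, point):
    """Map an (x, y) point with a 3x3 planar projective transformation H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def estimate_height_multi(first_pos, other_views, gain):
    """Average the per-view height estimates.

    other_views: list of (position, H) pairs, one for the second image and
    one for each third image, where H converts that image's coordinates
    into first-image coordinates.
    """
    heights = []
    for position, H in other_views:
        converted = apply_homography(H, position)
        heights.append(gain * math.dist(first_pos, converted))
    return sum(heights) / len(heights)
```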

A program that causes a computer to function as the height estimation device according to any one of claims 1 to 7.