JP7500333B2

JP7500333B2 - GENERATION DEVICE, GENERATION METHOD, AND PROGRAM

Info

Publication number: JP7500333B2
Application number: JP2020133272A
Authority: JP
Inventors: 康文 ▲高▼間
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2024-06-17
Anticipated expiration: 2040-08-05
Also published as: JP2022029777A

Description

本発明は、人物の姿勢モデル生成技術に関する。 The present invention relates to a technology for generating a human pose model.

複数の異なる位置に設定された複数の撮像装置（多視点カメラ）により得られた画像（多視点カメラ画像）を用いて、被写体（人体）の姿勢モデルを推定する技術（姿勢推定技術）が注目されている。姿勢モデルは、被写体を構成する関節の位置、関節同士の接続関係、被写体の部位間の角度などを表す情報である。姿勢推定技術は、多視点カメラ画像に写る被写体の関節位置を推定することで、肘や膝などの角度などを推定できる。さらに、推定された姿勢モデルを用いることで、スポーツ選手の運動量や疲労度の評価、新旧のフォーム比較のような運動解析が可能になる。特許文献１では、多視点カメラを用いて被写体を撮像し、得られた多視点カメラ画像から被写体領域の画像（被写体画像）を抽出し、該被写体画像から該被写体の３次元関節位置を持つ姿勢モデルを推定している。 Attention has been focused on a technology (posture estimation technology) that estimates a posture model of a subject (human body) using images (multi-view camera images) obtained by multiple imaging devices (multi-view cameras) set at multiple different positions. The posture model is information that represents the positions of the joints that make up the subject, the connection between the joints, and the angles between parts of the subject. Posture estimation technology can estimate the angles of elbows, knees, etc. by estimating the joint positions of the subject captured in the multi-view camera images. Furthermore, by using the estimated posture model, it is possible to perform motion analysis such as evaluating the amount of exercise and fatigue level of an athlete and comparing old and new forms. In Patent Document 1, a subject is imaged using a multi-view camera, an image of the subject area (subject image) is extracted from the obtained multi-view camera images, and a posture model with the three-dimensional joint positions of the subject is estimated from the subject image.

特開２０１６－１２６４２５号公報 JP 2016-126425 A

特許文献１では、被写体一人が写る被写体画像における特徴点を用いて該被写体の形状モデルを推定し、該形状モデルから該被写体の姿勢モデルを推定している。しかしながら、複数の被写体が多視点カメラに写るシーンにおいて各被写体の姿勢モデルを推定する場合、被写体毎に、該被写体を撮影したカメラを判別し、該判別されたカメラから得られるカメラ画像から被写体画像を取得する必要がある。さらに、多数のカメラに同じ被写体が写る場合、これらのカメラからの多視点カメラ画像の全てを用いて姿勢モデルを推定すると処理時間の増加を招く。また、異なる被写体同士が重なっている多視点カメラ画像を取得してしまうと、特定の被写体については全身が写らず、該被写体の姿勢モデルの推定誤差が大きくなる。 In Patent Document 1, a shape model of a single subject is estimated using feature points in a subject image, and a posture model of the subject is estimated from the shape model. However, when estimating a posture model for each subject in a scene in which multiple subjects are captured by multi-view cameras, it is necessary to identify the camera that captured the subject for each subject, and to obtain the subject image from the camera image obtained from the identified camera. Furthermore, when the same subject is captured by multiple cameras, estimating a posture model using all of the multi-view camera images from these cameras leads to an increase in processing time. Also, when multi-view camera images in which different subjects overlap are acquired, the entire body of a particular subject is not captured, resulting in a large estimation error in the posture model of the subject.

本発明は、上記の課題に鑑みてなされたものであり、複数の被写体の姿勢モデルを適切に生成することを目的とする。 The present invention was made in consideration of the above problems, and aims to appropriately generate posture models for multiple subjects.

上記目的を達成するための一手段として、本発明の生成装置は以下の構成を有する。すなわち、複数の撮像装置が複数の被写体を異なる方向から撮像することに基づいて得られた複数の被写体を示す画像を取得する画像取得手段と、前記複数の画像に基づいて、前記複数の被写体の各被写体に対して生成された、被写体の３次元形状を示す形状モデルを取得するモデル取得手段と、前記複数の被写体のうちの、３次元姿勢を示す姿勢モデルの生成の対象の被写体に対して生成された前記形状モデルに基づいて、前記複数の画像から前記対象の被写体の前記姿勢モデルの生成に用いる画像を特定する特定手段と、前記特定された画像に基づいて、前記対象の被写体の前記姿勢モデルを生成する生成手段と、を有する。 As one means for achieving the above object, the generating device of the present invention has the following configuration. That is, it has an image acquiring means for acquiring images showing multiple subjects obtained based on multiple imaging devices capturing images of multiple subjects from different directions, a model acquiring means for acquiring a shape model showing a three-dimensional shape of each of the multiple subjects generated for each of the multiple subjects based on the multiple images, a specifying means for identifying an image to be used for generating the posture model of the target subject from the multiple images based on the shape model generated for a target subject of the multiple subjects for which a posture model showing a three-dimensional posture is to be generated, and a generating means for generating the posture model of the target subject based on the identified image.

本発明によれば、複数の被写体の姿勢モデルを適切に生成することができる。 The present invention makes it possible to appropriately generate posture models for multiple subjects.

実施形態１における画像処理システムの構成例を示す図である。FIG. 1 is a diagram illustrating an example of the configuration of an image processing system according to a first embodiment. 実施形態１における標準形状モデルと姿勢モデルを示す模式図である。FIG. 2 is a schematic diagram showing a standard shape model and a posture model according to the first embodiment. 実施形態１における座標系を説明するための模式図である。FIG. 3 is a schematic diagram for explaining a coordinate system in the first embodiment. 実施形態１における形状推定装置のハードウェア構成例を示す図である。FIG. 4 is a diagram illustrating an example of a hardware configuration of the shape estimating device according to the first embodiment. 実施形態１における姿勢推定装置により実行される処理のフローチャートである。FIG. 5 is a flowchart of a process executed by the posture estimation device according to the first embodiment. 実施形態２における画像処理システムの構成例を示す図である。FIG. 6 is a diagram illustrating an example of the configuration of an image processing system according to a second embodiment. 実施形態２における姿勢推定装置により実行される処理のフローチャートである。FIG. 7 is a flowchart of a process executed by a posture estimation device according to the second embodiment. 実施形態２に関する被写体の重なりを示す模式図である。FIG. 8 is a schematic diagram showing overlapping of subjects according to the second embodiment.

以下、添付図面を参照して実施形態を詳しく説明する。なお、以下の実施形態は特許請求の範囲に係る発明を限定するものではない。実施形態には複数の特徴が記載されているが、これらの複数の特徴の全てが発明に必須のものとは限らず、また、複数の特徴は任意に組み合わせられてもよい。さらに、添付図面においては、同一若しくは同様の構成に同一の参照番号を付し、重複した説明は省略する。 The following embodiments are described in detail with reference to the attached drawings. Note that the following embodiments do not limit the invention according to the claims. Although the embodiments describe multiple features, not all of these multiple features are necessarily essential to the invention, and multiple features may be combined in any manner. Furthermore, in the attached drawings, the same reference numbers are used for the same or similar configurations, and duplicate explanations are omitted.

（実施形態１）
本実施形態では、被写体の形状モデル（被写体の３次元形状）を推定した結果を用いて姿勢推定に用いる画像を特定し、特定された画像を用いて被写体の姿勢モデル（被写体の３次元姿勢）を推定して生成する方法について述べる。
形状モデルは、例えば被写体が人物である場合、被写体のシルエット、輪郭を示し、点群や複数のボクセルで表現されてもよい。また、形状モデルは、複数のポリゴンを含むポリゴンデータとして表現されてもよい。
姿勢モデルは、例えば被写体が人物である場合、その人物の関節位置を表す点と、骨格を表す線とで表現されてもよい。つまり、この場合、姿勢モデルは、複数の点と、２点間を結ぶ線と、を含んでもよい。姿勢モデルは、これに限定されず、それ以外の表現であってもよく、関節位置を示す点のみで表現されていてもよい。また、すべての関節位置が表現されていなくてもよく、一部の関節位置が表現されていてもよい。また、姿勢モデルが、被写体の３次元姿勢を表すものであれば、必ずしも関節位置を点で表現しなくても、いくつか又はすべての関節位置に代えて特徴的な部位を点で表現してもよい。また、人物の顔などの輪郭については、複数の点と直線あるいは曲線で表現してもよいし、球や楕円球で表現してもよい。
(Embodiment 1)
This embodiment describes a method in which the result of estimating a shape model of a subject (the three-dimensional shape of the subject) is used to identify the images to be used for posture estimation, and the identified images are then used to estimate and generate a posture model of the subject (the three-dimensional posture of the subject).
When the subject is a person, for example, the shape model represents the silhouette or contour of the subject and may be expressed as a point cloud or as a plurality of voxels. The shape model may also be expressed as polygon data including a plurality of polygons.
When the subject is a person, for example, the posture model may be expressed by points representing the person's joint positions and lines representing the skeleton. That is, in this case the posture model may include a plurality of points and lines each connecting two of those points. The posture model is not limited to this representation; it may be expressed in other ways, for example only by points indicating joint positions. Not all joint positions need to be expressed; only some of them may be. Moreover, as long as the posture model represents the three-dimensional posture of the subject, joint positions need not be expressed by points, and characteristic body parts may be expressed by points in place of some or all of the joint positions. The contour of a person's face or the like may be expressed by a plurality of points and straight lines or curves, or by a sphere or an ellipsoid.
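The point-and-line representation described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `PostureModel`, the joint names, and the helper `angle_at` are hypothetical and do not appear in the specification. It stores joints as named 3D points, bones as pairs of joint names, and derives the angle at a joint (for example an elbow) from three joint positions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PostureModel:
    """Hypothetical posture model: joints are named 3D points, bones join two joints."""
    joints: dict[str, tuple[float, float, float]]
    bones: list[tuple[str, str]] = field(default_factory=list)

    def angle_at(self, a: str, b: str, c: str) -> float:
        """Return the angle in degrees at joint b between segments b->a and b->c."""
        pa, pb, pc = self.joints[a], self.joints[b], self.joints[c]
        u = [pa[i] - pb[i] for i in range(3)]
        v = [pc[i] - pb[i] for i in range(3)]
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.hypot(*u) * math.hypot(*v)
        # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


# Illustrative arm: upper arm along X, forearm pointing up along Z.
arm = PostureModel(
    joints={
        "shoulder": (0.0, 0.0, 1.5),
        "elbow": (0.3, 0.0, 1.5),
        "wrist": (0.3, 0.0, 1.8),
    },
    bones=[("shoulder", "elbow"), ("elbow", "wrist")],
)
elbow_angle = arm.angle_at("shoulder", "elbow", "wrist")
```

A model holding only `joints` with an empty `bones` list corresponds to the points-only variant mentioned in the text.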

［画像処理システムの構成］
本実施形態における画像処理システムの構成例を図１に示す。本実施形態における画像処理システム１０は、撮像装置１００と姿勢推定装置１１０とを含む。なお、図１には１台の撮像装置１００を示すが、同様の構成の複数の撮像装置１００が、無線または有線の接続で姿勢推定装置１１０に接続されているものとする。また、以下の説明において、「撮像装置」は、「カメラ」と同義に用いられるものとする。
[Image Processing System Configuration]
An example of the configuration of the image processing system according to this embodiment is shown in FIG. 1. The image processing system 10 according to this embodiment includes an imaging device 100 and a posture estimation device 110. Although FIG. 1 shows one imaging device 100, it is assumed that a plurality of imaging devices 100 having the same configuration are connected to the posture estimation device 110 by wireless or wired connections. In the following description, "imaging device" is used synonymously with "camera."

複数の撮像装置１００は、多数の異なる方向から撮像領域を撮像する、多視点カメラを構成し、異なる方向から撮像した複数の画像（多視点カメラ画像）を生成・取得する。多視点カメラを構成する各撮像装置１００は、撮像装置を識別するための識別番号を持つ。撮像装置１００は、撮像した画像から前景画像を抽出する機能など、他の機能やその機能を実現するハードウェア（回路や装置など）も含んでもよい。なお、前景画像とは、カメラにより撮像されて取得された撮像画像から、被写体領域（前景領域）を抽出した画像であり、被写体画像とも称することができる。撮像領域は、例えば、スポーツが行われる競技場の平面と任意の高さで囲まれた領域である。各撮像装置１００は、撮像領域を取り囲むようにそれぞれ異なる位置・異なる方向に設置され、同期して撮像を行う。なお、各撮像装置１００は撮像領域の全周にわたって設置されなくてもよく、設置場所の制限等によっては撮像領域の一部の方向にのみ設置されてもよい。多視点カメラを構成する撮像装置１００の数は限定されず、例えば撮像領域をサッカーやラグビーの競技場とする場合、競技場の周囲に数十～数百台程度の撮像装置１００が設置されてもよい。また、望遠カメラと広角カメラなど画角が異なるカメラが撮像装置１００として設置されてもよい。 The multiple imaging devices 100 constitute a multi-view camera that captures an imaging area from many different directions, and generate and acquire multiple images (multi-view camera images) captured from different directions. Each imaging device 100 constituting the multi-view camera has an identification number for identifying the imaging device. The imaging device 100 may also include other functions, such as a function to extract a foreground image from a captured image, and hardware (such as a circuit or device) that realizes that function. Note that a foreground image is an image in which a subject area (foreground area) is extracted from an image captured and acquired by a camera, and can also be called a subject image. The imaging area is, for example, an area surrounded by the plane of a stadium where sports are played and an arbitrary height. Each imaging device 100 is installed at a different position and in a different direction so as to surround the imaging area, and captures images synchronously. Note that each imaging device 100 does not have to be installed around the entire circumference of the imaging area, and may be installed only in a partial direction of the imaging area depending on restrictions on the installation location, etc. There is no limit to the number of imaging devices 100 that make up the multi-view camera. 
For example, if the imaging area is a soccer or rugby stadium, several tens to several hundreds of imaging devices 100 may be installed around the stadium. In addition, cameras with different angles of view, such as a telephoto camera and a wide-angle camera, may be installed as imaging devices 100.

撮像装置１００は、現実世界の１つの時刻情報で同期され、撮像した画像（映像）には毎フレームの画像に撮像時刻情報が付与される。さらに、撮像装置１００は、自装置の異常を検知する手段を併せ持ち、異常の有無を知らせる異常情報を、姿勢推定装置１１０へ送ることができる。撮像装置１００の異常は、例えば、撮像装置１００に備えられる、熱や振動など一般的なセンサの値を評価することで検知できる。 The imaging device 100 is synchronized with a single piece of real-world time information, and imaging time information is added to each frame of the captured image (video). Furthermore, the imaging device 100 also has a means for detecting abnormalities in its own device, and can send abnormality information indicating the presence or absence of an abnormality to the posture estimation device 110. An abnormality in the imaging device 100 can be detected, for example, by evaluating the values of general sensors such as heat and vibration provided in the imaging device 100.

さらに、撮像装置１００は、自装置の位置、姿勢（向き、撮像方向）、焦点距離、光学中心、歪みなどの状態情報を取得し、管理する。撮像装置１００の位置、姿勢（向き、撮像方向）は、撮像装置１００自身によって制御されてもよいし、撮像装置１００の位置や姿勢を制御する雲台によって制御されてもよい。以下では、撮像装置１００の状態情報をカメラパラメータとして説明を行うが、そのパラメータには、雲台等の別の装置により制御されるパラメータ（各種情報）が含まれていてもよい。撮像装置１００の位置、姿勢（向き、撮像方向）に関するカメラパラメータは、いわゆる外部パラメータであり、撮像装置１００の焦点距離、画像中心、歪みに関するパラメータは、いわゆる内部パラメータである。 Furthermore, the imaging device 100 acquires and manages status information such as its own position, attitude (orientation, imaging direction), focal length, optical center, and distortion. The position and attitude (orientation, imaging direction) of the imaging device 100 may be controlled by the imaging device 100 itself, or by a camera head that controls the position and attitude of the imaging device 100. In the following, the status information of the imaging device 100 is described as camera parameters, but these parameters may include parameters (various information) controlled by another device such as a camera head. The camera parameters related to the position and attitude (orientation, imaging direction) of the imaging device 100 are so-called external parameters, and the parameters related to the focal length, image center, and distortion of the imaging device 100 are so-called internal parameters.

図３は、本実施形態で用いる座標系を説明するための模式図である。図３（ａ）は世界座標系を示し、図３（ｂ）は図３（ａ）におけるカメラ３１０によるカメラ画像座標系を示す。例えば、図３（ａ）に示すように、被写体３０５を写すカメラ３１０から３４０がある場合、各カメラの位置や姿勢の情報（外部パラメータ）は、原点３００、Ｘｗ軸３０１、Ｙｗ軸３０２、Ｚｗ軸３０３で表現された１つの世界座標系で表現される。また、図３（ｂ）に示すように、カメラ３１０によるカメラ画像座標系において、カメラ画像座標系の原点３６０、Ｘｉ軸３６１、Ｙｉ軸３６２が設定され、カメラ３１０の撮像画像を画像３５０、座標（０，０）における画素を画素３７０とする。なお、他のカメラ３２０～３４０のカメラ画像座標系も図３（ｂ）と同様に定義することができる。 Figure 3 is a schematic diagram for explaining the coordinate system used in this embodiment. Figure 3(a) shows a world coordinate system, and Figure 3(b) shows a camera image coordinate system by camera 310 in Figure 3(a). For example, as shown in Figure 3(a), when there are cameras 310 to 340 capturing an object 305, the position and orientation information (external parameters) of each camera is expressed in one world coordinate system expressed by an origin 300, an Xw axis 301, a Yw axis 302, and a Zw axis 303. Also, as shown in Figure 3(b), in the camera image coordinate system by camera 310, an origin 360, an Xi axis 361, and a Yi axis 362 of the camera image coordinate system are set, and the captured image of camera 310 is image 350, and the pixel at coordinates (0,0) is pixel 370. The camera image coordinate systems of the other cameras 320 to 340 can also be defined in the same way as in Figure 3(b).
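The relationship between the world coordinate system and a camera image coordinate system can be illustrated with a standard pinhole projection. The sketch below is an assumption for illustration, not a formula given in the specification: it ignores lens distortion, and the function name `world_to_pixel` and all parameter names are hypothetical. The rotation and translation play the role of the external parameters; the focal lengths and image center play the role of the internal parameters.

```python
def world_to_pixel(point_w, rotation, translation, fx, fy, cx, cy):
    """Project a world point into pixel coordinates with a distortion-free pinhole model.

    rotation is a 3x3 row-major matrix and translation a 3-vector (external parameters);
    fx, fy are focal lengths in pixels and (cx, cy) is the image center (internal parameters).
    Returns None when the point is not in front of the camera.
    """
    # World -> camera coordinates: X_c = R * X_w + t
    x_c, y_c, z_c = (
        sum(rotation[i][j] * point_w[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z_c <= 0:
        return None
    # Camera -> image coordinates: perspective division, then the internal parameters.
    return (fx * x_c / z_c + cx, fy * y_c / z_c + cy)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Illustrative values: a camera looking along +Z with the world origin 5 units ahead.
pixel = world_to_pixel((1.0, 0.5, 0.0), IDENTITY, (0.0, 0.0, 5.0),
                       1000.0, 1000.0, 960.0, 540.0)
behind = world_to_pixel((0.0, 0.0, -10.0), IDENTITY, (0.0, 0.0, 5.0),
                        1000.0, 1000.0, 960.0, 540.0)
```

With these illustrative values the point lands at pixel (1160.0, 640.0), and the second point, which lies behind the camera, yields `None`.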

姿勢推定装置１１０は、複数の撮像装置１００から得られた画像から、被写体の姿勢モデルを推定して生成する生成装置として機能する。姿勢推定装置１１０は、被写体の姿勢モデルを、例えば、次のような方法で推定して生成する。まず、姿勢推定装置１１０は、複数の撮像装置１００から、複数の撮像装置１００が異なる方向から撮像することにより得られた複数の画像（多視点カメラ画像）を取得する。次に、姿勢推定装置１１０は、多視点カメラ画像から、人物などの被写体に対応する前景領域を抽出した前景画像（被写体画像）を取得する。姿勢推定装置１１０は、撮像装置１００から前景画像を取得してもよい。上述したように、前景画像とは、撮像装置１００により撮像されて取得された撮像画像から、被写体領域（前景領域）を抽出した画像である。前景領域として抽出される被写体とは、一般的に、時系列で同じ方向から撮像を行った場合において動きのある（その位置や形が変化し得る）動的被写体（動体）を指す。被写体は、例えば、ある競技において、それが行われるフィールド内にいる選手や審判などの人物、球技であれば人物に加えボールなども含む。また、コンサートやエンタテイメントにおいては、歌手、演奏者、パフォーマー、司会者などが被写体である。 The posture estimation device 110 functions as a generating device that estimates and generates a posture model of a subject from images obtained from multiple imaging devices 100. The posture estimation device 110 estimates and generates a posture model of a subject, for example, by the following method. First, the posture estimation device 110 acquires multiple images (multiple-view camera images) obtained by the multiple imaging devices 100 capturing images from different directions. Next, the posture estimation device 110 acquires a foreground image (subject image) in which a foreground area corresponding to a subject such as a person is extracted from the multi-view camera image. The posture estimation device 110 may acquire a foreground image from the imaging device 100. As described above, a foreground image is an image in which a subject area (foreground area) is extracted from an image captured by the imaging device 100. A subject extracted as a foreground area generally refers to a dynamic subject (moving object) that moves (whose position or shape may change) when images are captured from the same direction in a time series. For example, in a sport, the subject may be people such as players or referees on the field where the sport is being played, or in the case of a ball game, the subject may be the ball as well as people. In concerts and entertainment, the subject may be singers, musicians, performers, presenters, etc.

本実施形態では、姿勢モデルを推定するために、標準的な人の形を模した３次元の標準形状モデルとその姿勢モデル（初期姿勢モデル）が、あらかじめ姿勢推定装置１１０に入力されるものとする。図２に、入力される標準形状モデル２００と初期姿勢モデル２１０の一例を示す。これらのモデルはＣＧ（ＣｏｍｐｕｔｅｒＧｒａｐｈｉｃ）でも用いられる一般的なモデルで良く、ファイル形式（フォーマット）はＦＢＸ形式など一般的な形式で良い。標準形状モデル２００は、例えば、３次元のメッシュモデルで表現され、頂点座標と三角形もしくは四角形の面を構成する頂点ＩＤの情報が含まれる。初期姿勢モデル２１０は、頭部や首、臍、肩、肘、手首、足の付け根、膝、足首のような人体の主要部位や関節部位の位置を表す情報２１１とそれらの接続関係を示す情報２１２、隣接部位間の角度情報が含まれる。姿勢モデルの部位とメッシュモデルの部位とを対応付けておくことで、姿勢モデルの右腕を回転すれば、対応するメッシュモデルの部位も回転できる。姿勢推定では、このような姿勢モデルに対応したメッシュモデルを、各カメラ画像（各カメラの撮像画像）に射影した画像と各カメラの前景領域（被写体画像）とが一致するように変形させ、最も一致した際の姿勢を被写体の姿勢モデルとして推定する。ただし、３次元の姿勢モデルを推定する方法はこれに限定されない。例えば、２次元画像上で２次元の姿勢を推定し、各カメラとの対応に基づいて、３次元の姿勢モデルを推定する方法など、種々の方法を用いてもよい。 In this embodiment, in order to estimate a posture model, a three-dimensional standard shape model imitating a standard human shape and its posture model (initial posture model) are input to the posture estimation device 110 in advance. FIG. 2 shows an example of the input standard shape model 200 and initial posture model 210. These models may be general models used in CG (Computer Graphics), and the file format (format) may be a general format such as FBX format. The standard shape model 200 is expressed, for example, as a three-dimensional mesh model, and includes information on vertex coordinates and vertex IDs constituting triangular or quadrangular faces. The initial posture model 210 includes information 211 indicating the positions of major parts and joints of the human body such as the head, neck, navel, shoulders, elbows, wrists, groin, knees, and ankles, information 212 indicating the connection relationship between them, and angle information between adjacent parts. By associating the parts of the posture model with the parts of the mesh model, if the right arm of the posture model is rotated, the corresponding part of the mesh model can also be rotated. 
In pose estimation, a mesh model corresponding to such a pose model is deformed so that the image projected onto each camera image (image captured by each camera) matches the foreground region (subject image) of each camera, and the pose that best matches is estimated as the pose model of the subject. However, the method of estimating a three-dimensional pose model is not limited to this. For example, various methods may be used, such as a method of estimating a two-dimensional pose on a two-dimensional image and estimating a three-dimensional pose model based on the correspondence with each camera.
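The matching step described above, in which the projected model is compared with the foreground region of each camera, can be sketched with a simple overlap score. This sketch makes several assumptions that are not in the specification: masks are plain nested lists in which nonzero means "subject", the overlap measure is intersection-over-union, and the rendering of the mesh into a mask is taken as already done. The names `silhouette_iou` and `best_pose` are hypothetical.

```python
def silhouette_iou(rendered, observed):
    """Intersection-over-union of two binary masks given as equal-sized nested lists."""
    inter = union = 0
    for row_r, row_o in zip(rendered, observed):
        for r, o in zip(row_r, row_o):
            inter += 1 if (r and o) else 0
            union += 1 if (r or o) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def best_pose(candidates, observed_per_camera):
    """Pick the candidate pose whose rendered masks best match the observed silhouettes.

    candidates maps a pose label to a list of rendered masks, one per camera,
    in the same camera order as observed_per_camera.
    """
    def score(masks):
        return sum(silhouette_iou(m, o) for m, o in zip(masks, observed_per_camera))

    return max(candidates, key=lambda name: score(candidates[name]))


# One camera, 2x2 masks; the observed silhouette occupies the right column.
observed = [[[0, 255], [0, 255]]]
candidates = {
    "pose_a": [[[0, 1], [0, 1]]],   # covers exactly the right column
    "pose_b": [[[1, 1], [0, 0]]],   # covers the top row instead
}
chosen = best_pose(candidates, observed)
```

A real system would search over joint angles with an optimizer rather than enumerate a fixed candidate set; the enumeration here only keeps the example short.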

［姿勢推定装置の構成］
次に、姿勢推定装置１１０の構成について説明する。まず、姿勢推定装置１１０の内部構成について、図１を参照して説明する。姿勢推定装置１１０は、カメラ情報取得部１１１、形状推定部１１２、画像候補生成部１１３、画像選択部１１４、姿勢推定部１１５を有する。
[Configuration of Posture Estimation Device]
Next, the configuration of the posture estimation device 110 will be described. First, the internal configuration of the posture estimation device 110 will be described with reference to FIG. 1. The posture estimation device 110 has a camera information acquisition unit 111, a shape estimation unit 112, an image candidate generation unit 113, an image selection unit 114, and a posture estimation unit 115.

カメラ情報取得部１１１は、画像取得機能を有し、複数の撮像装置１００から、異なる方向から撮像された画像（多視点カメラ画像）を取得する。また、カメラ情報取得部１１１は、多視点カメラ画像から前景画像を取得（生成）してもよい。カメラ情報取得部１１１は、撮像装置１００から前景画像を取得してもよい。さらに、カメラ情報取得部１１１は、撮像装置１００のカメラパラメータを取得する。また、カメラ情報取得部１１１が、撮像装置１００のカメラパラメータを算出するようにしてもよい。例えば、カメラ情報取得部１１１は、各撮像装置１００の撮像画像から対応点を算出し、対応点を各撮像装置１００に投影した時の誤差が最小になるように最適化しながら、各撮像装置１００を校正することでカメラパラメータを算出してもよい。なお、校正方法は既存のいかなる方法であってもよい。なお、カメラパラメータは、撮像画像に同期して取得されてもよいし、事前準備の段階で取得されてもよい。また、カメラパラメータは、必要に応じて撮像画像に非同期で取得されてもよい。さらに、カメラ情報取得部１１１は、撮像装置１００から、異常情報を取得することができる。カメラ情報取得部１１１は、撮像画像を形状推定部１１２に出力する。また、カメラ情報取得部１１１は、カメラパラメータを、形状推定部１１２、画像候補生成部１１３、姿勢推定部１１５に出力する。また、カメラ情報取得部１１１は、前景画像を取得した場合に、形状推定部１１２、画像候補生成部１１３、画像選択部１１４に出力することができる。また、カメラ情報取得部１１１は、いずれかの撮像装置１００の異常情報を取得している場合は、該情報を形状推定部１１２に出力することができる。なお、後述する図５に示す処理のように、カメラ情報取得部１１１はシルエット画像を生成または取得し、該シルエット画像を形状推定部１１２に出力してもよい。 The camera information acquisition unit 111 has an image acquisition function and acquires, from the multiple imaging devices 100, images captured from different directions (multi-view camera images). The camera information acquisition unit 111 may also acquire (generate) foreground images from the multi-view camera images, or may acquire foreground images from the imaging devices 100. Furthermore, the camera information acquisition unit 111 acquires the camera parameters of the imaging devices 100. Alternatively, the camera information acquisition unit 111 may calculate the camera parameters itself. For example, it may calculate corresponding points from the captured images of the imaging devices 100 and calibrate each imaging device 100 while optimizing so that the error when the corresponding points are projected onto each imaging device 100 is minimized, thereby calculating the camera parameters. Any existing calibration method may be used. The camera parameters may be acquired in synchronization with the captured images or at the advance preparation stage. The camera parameters may also be acquired asynchronously with the captured images as necessary. Furthermore, the camera information acquisition unit 111 can acquire abnormality information from the imaging devices 100. The camera information acquisition unit 111 outputs the captured images to the shape estimation unit 112. It also outputs the camera parameters to the shape estimation unit 112, the image candidate generation unit 113, and the posture estimation unit 115. When the camera information acquisition unit 111 has acquired foreground images, it can output them to the shape estimation unit 112, the image candidate generation unit 113, and the image selection unit 114. When it has acquired abnormality information for any of the imaging devices 100, it can output that information to the shape estimation unit 112. Note that, as in the process shown in FIG. 5 described later, the camera information acquisition unit 111 may generate or acquire silhouette images and output them to the shape estimation unit 112.

形状推定部１１２は、形状モデル取得機能を有し、カメラ情報取得部１１１により取得された、カメラの撮像画像とカメラパラメータに基づいて、被写体の３次元形状である形状モデルを生成して取得する。あるいは、形状推定部１１２は、カメラ情報取得部１１１から前景画像を取得する場合、撮像画像に代えて前景画像に基づいて３次元形状を推定して取得してもよい。なお。後述する図５に示す処理のように、形状推定部１１２は、シルエット画像に基づいて３次元形状を推定してもよい。また、形状推定部１１２は、カメラ情報取得部１１１から撮像装置の異常情報を取得し、いずれかの撮像装置１００の異常を検知した場合、該撮像装置１００から撮像画像／前景画像を形状推定のために使用しないように制御する。形状推定部１１２は、形状推定結果である各被写体の形状モデルと、異常が検知された撮像装置１００の情報を画像候補生成部１１３に出力する。なお、姿勢推定装置１１０は、別の装置において撮像画像又は前景画像に基づいて生成された形状モデルを取得する構成であってもよい。 The shape estimation unit 112 has a shape model acquisition function, and generates and acquires a shape model, which is the three-dimensional shape of a subject, based on the captured images and camera parameters acquired by the camera information acquisition unit 111. Alternatively, when the shape estimation unit 112 acquires foreground images from the camera information acquisition unit 111, it may estimate and acquire the three-dimensional shape based on the foreground images instead of the captured images. Note that, as in the process shown in FIG. 5 described later, the shape estimation unit 112 may estimate the three-dimensional shape based on silhouette images. In addition, the shape estimation unit 112 acquires abnormality information of the imaging devices from the camera information acquisition unit 111 and, when an abnormality is detected in any of the imaging devices 100, performs control so that the captured image/foreground image from that imaging device 100 is not used for shape estimation. The shape estimation unit 112 outputs the shape model of each subject, which is the shape estimation result, and information on the imaging device 100 in which the abnormality was detected, to the image candidate generation unit 113. Note that the posture estimation device 110 may instead be configured to acquire a shape model generated by another device based on captured images or foreground images.

画像候補生成部１１３と画像選択部１１４は、姿勢モデルの推定に用いる画像を特定する手段として機能する。画像候補生成部１１３は、形状推定部１１２による形状推定結果である各被写体の形状モデルとカメラパラメータを用いて、被写体が写る撮像装置１００を導出し、姿勢推定に用いる１つ以上の前景画像を、画像候補として被写体毎に生成する。また、画像候補生成部１１３は、異常が検知された撮像装置１００からの前景画像については、画像候補にあげないように制御する。画像候補生成部１１３は、被写体毎の、姿勢推定で用いる前景画像の候補（画像候補）を画像選択部１１４へ出力する。 The image candidate generation unit 113 and the image selection unit 114 function as means for identifying images to be used in estimating a posture model. The image candidate generation unit 113 uses the shape model of each subject, which is the result of shape estimation by the shape estimation unit 112, and camera parameters to derive the imaging device 100 in which the subject appears, and generates one or more foreground images to be used in posture estimation as image candidates for each subject. The image candidate generation unit 113 also performs control so that foreground images from an imaging device 100 in which an abnormality has been detected are not included as image candidates. The image candidate generation unit 113 outputs foreground image candidates (image candidates) to be used in posture estimation for each subject to the image selection unit 114.
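One way to picture how the imaging devices in which a subject appears might be derived from its shape model and the camera parameters is the sketch below. It is a hedged illustration only: the passage above does not state the exact visibility test, so the rule used here (a camera qualifies when a sufficient fraction of the shape model's points project inside its image) is an assumption, as are the names `project`, `candidate_cameras`, `min_fraction` and the dictionary layout of a camera. Cameras reported as abnormal are skipped, as the text describes.

```python
def project(point_w, cam):
    """Distortion-free pinhole projection; returns None if the point is behind the camera."""
    r, t = cam["rotation"], cam["translation"]
    x, y, z = (sum(r[i][j] * point_w[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return None
    return (cam["fx"] * x / z + cam["cx"], cam["fy"] * y / z + cam["cy"])


def candidate_cameras(shape_points, cameras, abnormal_ids=(), min_fraction=1.0):
    """Return IDs of cameras whose image contains at least min_fraction of the shape points."""
    result = []
    for cam in cameras:
        if cam["id"] in abnormal_ids:
            continue  # images from a camera with a detected abnormality are never candidates
        inside = 0
        for p in shape_points:
            uv = project(p, cam)
            if uv is not None and 0 <= uv[0] < cam["width"] and 0 <= uv[1] < cam["height"]:
                inside += 1
        if inside / len(shape_points) >= min_fraction:
            result.append(cam["id"])
    return result


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def make_camera(cam_id, translation):
    """Build an illustrative 1920x1080 camera with hypothetical internal parameters."""
    return {"id": cam_id, "rotation": IDENTITY, "translation": translation,
            "fx": 1000.0, "fy": 1000.0, "cx": 960.0, "cy": 540.0,
            "width": 1920, "height": 1080}


cameras = [
    make_camera(1, (0.0, 0.0, 5.0)),    # subject centered in view
    make_camera(2, (10.0, 0.0, 5.0)),   # subject projects far outside the image
    make_camera(3, (0.0, 0.0, 5.0)),    # sees the subject but is flagged abnormal
]
shape_points = [(0.0, 0.0, 0.0), (0.2, 0.5, 0.0), (-0.2, -0.5, 0.0)]
candidates = candidate_cameras(shape_points, cameras, abnormal_ids={3})
```

Here only camera 1 remains as a source of image candidates for this subject.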

画像選択部１１４は、被写体が一定台数以上の撮像装置１００に写る場合、撮像装置１００の台数や撮像装置１００間の角度に基づいて、姿勢推定に用いる前景画像を、画像候補から選択（決定／特定）する。姿勢推定に用いる前景画像の数（カメラ台数と等しくなりうる）は、予め決められていてもよい。例えば、操作部４１６や通信Ｉ／Ｆ（図４）を介して、選択する前景画像の数が姿勢推定装置１１０に予め入力され得る。選択される前景画像の数は、固定の数であってもよいし、状況に応じて変更されてもよい。また、姿勢推定に用いる前景画像の数は、設定された下限数以上で設定された上限数以下であっても良い。この場合、姿勢推定の精度を向上させ、さらに処理負荷を軽減することができる。姿勢推定部１１５は、被写体毎に選択（決定）された前景画像と被写体が写る撮像装置１００のカメラパラメータを用いて被写体の姿勢モデルを推定して生成する。 When the subject is captured by a certain number or more of imaging devices 100, the image selection unit 114 selects (determines/specifies) a foreground image to be used for posture estimation from image candidates based on the number of imaging devices 100 and the angle between the imaging devices 100. The number of foreground images to be used for posture estimation (which may be equal to the number of cameras) may be determined in advance. For example, the number of foreground images to be selected may be input in advance to the posture estimation device 110 via the operation unit 416 or the communication I/F (FIG. 4). The number of foreground images to be selected may be a fixed number, or may be changed according to the situation. In addition, the number of foreground images to be used for posture estimation may be equal to or greater than a set lower limit number and equal to or less than a set upper limit number. In this case, the accuracy of posture estimation can be improved and the processing load can be further reduced. The posture estimation unit 115 estimates and generates a posture model of the subject using the foreground image selected (determined) for each subject and the camera parameters of the imaging device 100 in which the subject is captured.
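Selecting a bounded number of foreground images based on the angles between imaging devices could look like the following sketch. The selection rule is an assumption made for illustration: a greedy choice that repeatedly adds the camera whose azimuth around the subject is farthest from those already chosen. The passage above states only that the number of devices and the angles between them are used. The names `angular_gap`, `select_cameras` and `max_count` are hypothetical.

```python
def angular_gap(a, b):
    """Smallest absolute difference between two azimuths in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_cameras(azimuths, max_count):
    """Greedily pick up to max_count cameras that are spread out in viewing angle.

    azimuths maps a camera ID to its azimuth in degrees as seen from the subject.
    If there are no more candidates than max_count, all of them are kept.
    """
    ids = sorted(azimuths)
    if len(ids) <= max_count:
        return ids
    selected = [ids[0]]
    while len(selected) < max_count:
        # Add the camera whose nearest already-selected neighbor is farthest away.
        best = max(
            (i for i in ids if i not in selected),
            key=lambda i: min(angular_gap(azimuths[i], azimuths[s]) for s in selected),
        )
        selected.append(best)
    return sorted(selected)


# Six candidate cameras; cameras 2 and 6 nearly duplicate camera 1's viewpoint.
azimuths = {1: 0.0, 2: 10.0, 3: 90.0, 4: 180.0, 5: 270.0, 6: 350.0}
chosen = select_cameras(azimuths, max_count=4)
```

With these illustrative azimuths the four cameras at 0, 90, 180 and 270 degrees are kept and the two near-duplicate viewpoints are dropped, which reflects the stated aim of limiting processing load while keeping diverse views.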

次に、姿勢推定装置１１０のハードウェア構成について説明する。図４に、姿勢推定装置１１０のハードウェア構成例を示す。姿勢推定装置１１０は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）４１１、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）４１２、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）４１３、補助記憶装置４１４、表示部４１５、操作部４１６、通信Ｉ／Ｆ（インタフェース）４１７、及びバス４１８を有する。ＣＰＵ４１１は、ＲＯＭ４１２やＲＡＭ４１３に格納されているコンピュータプログラムやデータを用いて姿勢推定装置１１０の全体を制御することで、図１に示す姿勢推定装置１１０の各機能を実現する。なお、姿勢推定装置１１０がＣＰＵ４１１とは異なる１又は複数の専用のハードウェアを有し、ＣＰＵ４１１による処理の少なくとも一部を専用のハードウェアが実行してもよい。専用のハードウェアの例としては、ＡＳＩＣ（特定用途向け集積回路）、ＦＰＧＡ（フィールドプログラマブルゲートアレイ）、およびＤＳＰ（デジタルシグナルプロセッサ）などがある。ＲＯＭ４１２は、変更を必要としないプログラムなどを格納する。ＲＡＭ４１３は、補助記憶装置４１４から供給されるプログラムやデータ、及び通信Ｉ／Ｆ４１７を介して外部から供給されるデータなどを一時記憶する。補助記憶装置４１４は、例えばハードディスクドライブ等で構成され、画像データや音声データなどの種々のデータを記憶する。 Next, the hardware configuration of the posture estimation device 110 will be described. FIG. 4 shows an example of the hardware configuration of the posture estimation device 110. The posture estimation device 110 has a CPU (Central Processing Unit) 411, a ROM (Read Only Memory) 412, a RAM (Random Access Memory) 413, an auxiliary storage device 414, a display unit 415, an operation unit 416, a communication I/F (interface) 417, and a bus 418. The CPU 411 realizes each function of the posture estimation device 110 shown in FIG. 1 by controlling the entire posture estimation device 110 using computer programs and data stored in the ROM 412 and the RAM 413. Note that the posture estimation device 110 may have one or more dedicated hardware different from the CPU 411, and at least a part of the processing by the CPU 411 may be executed by the dedicated hardware. Examples of dedicated hardware include ASICs (application-specific integrated circuits), FPGAs (field programmable gate arrays), and DSPs (digital signal processors). ROM 412 stores programs that do not require modification. RAM 413 temporarily stores programs and data supplied from auxiliary storage device 414, and data supplied from the outside via communication I/F 417. 
Auxiliary storage device 414 is composed of, for example, a hard disk drive, and stores various data such as image data and audio data.

表示部４１５は、例えば液晶ディスプレイやＬＥＤ（ＬｉｇｈｔＥｍｉｔｔｉｎｇＤｉｏｄｅ）等で構成され、ユーザが姿勢推定理装置１を操作するためのＧＵＩ（ＧｒａｐｈｉｃａｌＵｓｅｒＩｎｔｅｒｆａｃｅ）などを表示する。操作部４１６は、例えばキーボードやマウス、ジョイスティック、タッチパネル等で構成され、ユーザによる操作を受けて各種の指示をＣＰＵ４１１に入力する。ＣＰＵ４１１は、表示部４１５を制御する表示制御部、及び操作部４１６を制御する操作制御部として動作する。 The display unit 415 is composed of, for example, a liquid crystal display or LEDs (Light Emitting Diodes), and displays a GUI (Graphical User Interface) and the like for the user to operate the posture estimation device 110. The operation unit 416 is composed of, for example, a keyboard, a mouse, a joystick, or a touch panel; it receives operations by the user and inputs various instructions to the CPU 411. The CPU 411 operates as a display control unit that controls the display unit 415 and as an operation control unit that controls the operation unit 416.

通信Ｉ／Ｆ４１７は、姿勢推定装置１１０の外部の装置との通信に用いられる。例えば、姿勢推定装置１１０が外部の装置と有線で接続される場合には、通信用のケーブルが通信Ｉ／Ｆ４１７に接続される。姿勢推定装置１１０が外部の装置と無線通信する機能を有する場合には、通信Ｉ／Ｆ４１７はアンテナ（不図示）を備える。バス４１８は、姿勢推定装置１１０の各部をつないで情報を伝達する。 The communication I/F 417 is used for communication with devices external to the posture estimation device 110. For example, when the posture estimation device 110 is connected to an external device via a wired connection, a communication cable is connected to the communication I/F 417. When the posture estimation device 110 has a function for wireless communication with an external device, the communication I/F 417 includes an antenna (not shown). The bus 418 connects each part of the posture estimation device 110 to transmit information.

本実施形態では、表示部４１５と操作部４１６が姿勢推定装置１１０の内部に存在するものとするが、表示部４１５と操作部４１６との少なくとも一方が姿勢推定装置１１０の外部に別の装置として存在していてもよい。 In this embodiment, the display unit 415 and the operation unit 416 are assumed to exist inside the posture estimation device 110, but at least one of the display unit 415 and the operation unit 416 may exist as a separate device outside the posture estimation device 110.

［動作フロー］
続いて、姿勢推定装置１１０の動作について説明する。図５は、姿勢推定装置１１０により実行される処理のフローチャートである。図５に示すフローチャートは、姿勢推定装置１１０のＣＰＵ４１１がＲＯＭ４１２やＲＡＭ４１３に格納されている制御プログラムを実行し、情報の演算および加工並びに各ハードウェアの制御を実行することにより実現されうる。なお、Ｓ５５０の姿勢推定に用いる前景画像の数（姿勢推定に用いる撮像装置１００の台数）は、予め決めておくことができる。当該姿勢推定に用いる前景画像の数は、例えば、姿勢推定装置１１０に入力（設定）されていてもよく、または、Ｓ５１０においてカメラ情報取得部１１１が撮像装置１００から１つのパラメータとして当該数を取得してもよい。また、姿勢推定に用いる前景画像の数は、設定された下限数以上で設定された上限数以下であっても良い。 [Operation flow]
Next, the operation of the posture estimation device 110 will be described. FIG. 5 is a flowchart of the process executed by the posture estimation device 110. The flowchart shown in FIG. 5 can be realized by the CPU 411 of the posture estimation device 110 executing a control program stored in the ROM 412 or the RAM 413, thereby performing calculation and processing of information and controlling each piece of hardware. The number of foreground images used for posture estimation in S550 (the number of imaging devices 100 used for posture estimation) can be determined in advance. This number may, for example, be input (set) to the posture estimation device 110, or the camera information acquisition unit 111 may acquire it as one parameter from the imaging devices 100 in S510. The number of foreground images used for posture estimation may also be any number that is at least a set lower limit and at most a set upper limit.

Ｓ５００において、カメラ情報取得部１１１は、複数の撮像装置１００からカメラパラメータを取得する。なお、カメラ情報取得部１１１が、該カメラパラメータを算出するようにしてもよい。また、カメラ情報取得部１１１が該カメラパラメータを算出する場合、撮像画像を取得する度に算出する必要はなく、姿勢推定する前に少なくとも１度算出すればよい。取得したカメラパラメータは、形状推定部１１２、画像候補生成部１１３、姿勢推定部１１５に出力される。 In S500, the camera information acquisition unit 111 acquires camera parameters from multiple image capture devices 100. The camera information acquisition unit 111 may calculate the camera parameters. When the camera information acquisition unit 111 calculates the camera parameters, it is not necessary to calculate them every time a captured image is acquired, and it is sufficient to calculate them at least once before pose estimation. The acquired camera parameters are output to the shape estimation unit 112, the image candidate generation unit 113, and the pose estimation unit 115.

Ｓ５１０において、カメラ情報取得部１１１は、複数の撮像装置１００から、複数の撮像画像を取得し、前景画像を取得（生成）する。もしくは、カメラ情報取得部１１１に各撮像装置１００により抽出された前景画像が入力されてもよい。本実施形態では、Ｓ５２０で行われる被写体の形状モデルの推定に、視体積交差法（ｓｈａｐｅｆｒｏｍｓｉｌｈｏｕｅｔｔｅ）を用いる。そのために、カメラ情報取得部１１１は、撮像画像を取得した場合は、該撮像画像から被写体のシルエット画像を生成する。シルエット画像は、被写体を撮像した撮像画像から、試合開始前などに被写体が存在しない時に予め撮像した背景画像との差分を算出する背景差分法などの一般的な手法を用いて生成されてもよい。ただし、シルエット画像を生成する方法は、これに限定されない。例えば、被写体（人体）を認識するなどの方法を用いて、被写体の領域を抽出するようにしてもよい。また、カメラ情報取得部１１１が前景画像を取得した場合、該前景画像からテクスチャ情報を消すことでシルエット画像を生成してもよい。具体的には、被写体が存在する領域の画素値を２５５、それ以外の領域の画素値を０にすればよい。また、カメラ情報取得部１１１は撮像装置１００からシルエット画像を取得してもよく、その場合は、カメラ情報取得部１１１において被写体のシルエット画像を生成する処理は省略することができる。取得されたシルエット画像は、一種の前景画像として、形状推定部１１２に出力される。合わせて、前景画像（前景画像のテクスチャデータ）は、画像候補生成部１１３と画像選択部１１４に出力されうる。 In S510, the camera information acquisition unit 111 acquires a plurality of captured images from a plurality of imaging devices 100, and acquires (generates) a foreground image. Alternatively, a foreground image extracted by each imaging device 100 may be input to the camera information acquisition unit 111. In this embodiment, a shape from silhouette method is used to estimate a shape model of the subject performed in S520. For this purpose, when the camera information acquisition unit 111 acquires a captured image, it generates a silhouette image of the subject from the captured image. The silhouette image may be generated using a general method such as a background difference method that calculates the difference between a captured image of the subject and a background image captured in advance when the subject does not exist, such as before the start of a match. However, the method of generating the silhouette image is not limited to this. For example, the area of the subject may be extracted using a method such as recognizing the subject (human body). In addition, when the camera information acquisition unit 111 acquires a foreground image, the silhouette image may be generated by erasing texture information from the foreground image. 
Specifically, the pixel value of the area where the subject exists may be set to 255, and the pixel value of the other areas may be set to 0. Furthermore, the camera information acquisition unit 111 may acquire a silhouette image from the imaging device 100, in which case the process of generating a silhouette image of the subject in the camera information acquisition unit 111 may be omitted. The acquired silhouette image is output to the shape estimation unit 112 as a type of foreground image. Additionally, the foreground image (texture data of the foreground image) may be output to the image candidate generation unit 113 and the image selection unit 114.
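The background difference step described for S510 can be sketched as follows. This is only a minimal illustration under stated assumptions: images are grayscale lists of rows, and the function name and the `threshold` value are hypothetical, not taken from the embodiment.

```python
def silhouette_by_background_difference(captured, background, threshold=30):
    """Return a binary silhouette: 255 where the subject is, 0 elsewhere.

    `captured` and `background` are grayscale images given as equally sized
    lists of rows. `threshold` is an assumed tuning value.
    """
    return [
        [255 if abs(c - b) > threshold else 0 for c, b in zip(row_c, row_b)]
        for row_c, row_b in zip(captured, background)
    ]


# Toy 2x3 images: the middle column differs strongly from the background.
background = [[10, 10, 10], [10, 10, 10]]
captured = [[10, 200, 10], [12, 180, 10]]
silhouette = silhouette_by_background_difference(captured, background)
```

A practical system would typically add noise filtering and shadow handling, which the description leaves open.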

Ｓ５２０において、形状推定部１１２は、各被写体の形状モデルを推定する。本実施形態では、視体積交差法を用い、推定した結果（形状モデル）は、例えば、３次元座標の集合であるボクセルで表現される。形状推定方法はこれ以外の一般的な方法を用いることもできる。さらに、形状推定部１１２は、得られたボクセル集合を、隣接するボクセルの有無に基づく一般的な３次元ラベリング処理することで、被写体毎のボクセル集合に分割する。各ボクセルにはラベリング結果である被写体ＩＤが付与され、該ＩＤを指定することにより、被写体毎のボクセルを得ることができる（被写体ＩＤ＝０～最大被写体ＩＤ）。推定した形状モデルは、画像候補生成部１１３へ出力される。 In S520, the shape estimation unit 112 estimates a shape model of each subject. In this embodiment, the shape-from-silhouette (visual hull) method is used, and the estimated result (shape model) is expressed, for example, as voxels, that is, a set of three-dimensional coordinates. Other general shape estimation methods can also be used. Furthermore, the shape estimation unit 112 divides the obtained voxel set into per-subject voxel sets by applying a general three-dimensional labeling process based on the presence or absence of adjacent voxels. Each voxel is assigned a subject ID as the labeling result, and specifying an ID yields the voxels of that subject (subject ID = 0 to maximum subject ID). The estimated shape model is output to the image candidate generation unit 113.
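The three-dimensional labeling in S520 can be illustrated with a breadth-first flood fill over occupied voxels. The sketch assumes integer grid coordinates and 6-connectivity (face-adjacent voxels only); the embodiment says only that labeling is based on the presence or absence of adjacent voxels, so the connectivity rule and the function name are assumptions.

```python
from collections import deque


def label_voxels(voxels):
    """Assign a subject ID (0, 1, 2, ...) to each occupied voxel.

    `voxels` is an iterable of integer (x, y, z) grid coordinates.
    Voxels that touch face-to-face receive the same ID.
    """
    occupied = set(voxels)
    labels = {}
    next_id = 0
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for seed in sorted(occupied):
        if seed in labels:
            continue
        labels[seed] = next_id
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in occupied and neighbor not in labels:
                    labels[neighbor] = next_id
                    queue.append(neighbor)
        next_id += 1
    return labels


# Two separate clusters of voxels yield two subject IDs.
labels = label_voxels([(0, 0, 0), (1, 0, 0), (5, 5, 5), (5, 5, 6)])
```

Selecting the voxels of one subject then amounts to filtering `labels` by ID.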

Ｓ５３０～Ｓ５５０は、Ｓ５５０で姿勢モデルを推定する対象の被写体に対する処理である。Ｓ５３０において、画像候補生成部１１３は、Ｓ５５０の姿勢推定に用いる前景画像の候補を対象の被写体に対して生成し、画像選択部１１４へ出力する。具体的には、まず、画像候補生成部１１３は、対象の被写体ＩＤを指定し、該ＩＤのボクセル集合（該ＩＤの被写体の形状モデルに対応）を取得する。画像候補生成部１１３は、その後、該ボクセル集合の各ボクセルを走査し、世界座標系で表現されたボクセルの座標（ｘ、ｙ、ｚ）（図３（ａ）を参照）の各要素の最小値、最大値、および重心座標を算出する。そして、画像候補生成部１１３は、最小値と最大値を基に、ボクセル集合に外接するバウンディングボックスを構成する８点（ボックスの全頂点）を定義する。さらに、画像候補生成部１１３は、これらの点を各撮像装置１００に射影することで画像候補であるか否かを判定する。判定処理は例えば次のような方法が考えられる。まず、前記の８点の座標を各撮像装置１００の外部パラメータと内部パラメータを用いて、カメラ画像座標（図３（ｂ）を参照）に変換し、対象の被写体が各撮像装置１００に写るか否かを判定する。８つの点の少なくともいずれか一つがカメラ画像座標の内側（カメラ３１０の場合の図３（ｂ）の撮像画像３５０の内側に対応）であれば、対象の被写体が撮像装置１００に写ると判定し、重心座標をカメラ画像座標系の座標に変換する。そして、該重心座標が各撮像装置１００において前景画像の矩形の内側に射影された場合、該前景画像を該対象の被写体の姿勢推定に用いる画像候補であると判定する。 S530 to S550 are processes performed on the target subject whose posture model is estimated in S550. In S530, the image candidate generation unit 113 generates, for the target subject, candidates for the foreground images to be used for posture estimation in S550, and outputs them to the image selection unit 114. Specifically, the image candidate generation unit 113 first specifies the ID of the target subject and obtains the voxel set of that ID (corresponding to the shape model of the subject with that ID). The image candidate generation unit 113 then scans each voxel of the voxel set and calculates the minimum value, the maximum value, and the centroid coordinate for each element of the voxel coordinates (x, y, z) expressed in the world coordinate system (see FIG. 3A). Then, based on the minimum and maximum values, the image candidate generation unit 113 defines the eight points (all vertices of the box) that constitute a bounding box circumscribing the voxel set. Furthermore, the image candidate generation unit 113 projects these points onto each imaging device 100 to determine whether the foreground image of that imaging device is an image candidate. The determination may be performed, for example, as follows. First, the coordinates of the eight points are converted into camera image coordinates (see FIG. 3B) using the external and internal parameters of each imaging device 100, and it is determined whether the target subject is captured by each imaging device 100. If at least one of the eight points falls inside the camera image (corresponding to the inside of the captured image 350 in FIG. 3B in the case of the camera 310), it is determined that the target subject is captured by that imaging device 100, and the centroid coordinates are converted into coordinates in the camera image coordinate system. Then, if the centroid coordinates are projected inside the rectangle of a foreground image in that imaging device 100, that foreground image is determined to be an image candidate to be used for estimating the posture of the target subject.

Ｓ５４０において、画像選択部１１４は、Ｓ５３０で取得した画像候補に優先順位を設ける。そして、画像選択部１１４は、設定した該優先順位に基づいて、画像候補の中から姿勢推定で用いる画像を選択（決定）し、選択した画像を姿勢推定部１１５へ出力する。本実施形態では、一例として、画像選択部１１４は、優先順位を、（１）被写体領域の画像解像度、（２）被写体領域サイズ、（３）被写体部位の写り（全身が写っているか、一部だけが写っているか）に基づいて設定する。
（１）画像解像度
画像解像度は例えば、重心座標に位置するボクセルを、ボクセルに外接する球で近似し、カメラ画像上に射影した際の円の直径（ピクセル単位）とする。ボクセルがカメラから近い場合、大きな円となり解像度が高く、遠い場合、円が小さくなり解像度が低くなる。また、カメラから最も近いボクセルを射影した際の円の直径（最大直径）を算出し、重心座標のボクセルで算出した直径を正規化する。優先順位は、正規化された直径に応じて、直径が長いほど高く、短いほど低く設定する。
（２）被写体領域サイズ
被写体領域サイズは、バウンディングボックスの８点をカメラ画像に射影して得られる領域内に含まれる画素数をカウントすることで算出できる。被写体領域サイズも被写体がカメラから近い場合、面積が大きく、遠い場合、面積が小さくなる。最大面積で算出した面積を正規化し、正規化された面積に応じて、面積が大きいほど優先順位は高く、小さいほど低く設定する。
（３）被写体部位の写り（被写体領域の写りの大きさ）
被写体の写りは、バウンディングボックスの８点の中でカメラ画像内に射影された点の数とする。８点全てがカメラ画像内に射影された場合、被写体の全身が入っている、８点未満の場合、被写体のいずれかの部位がカメラ画像外である可能性が高いことを意味する。被写体の写りは、８点を１．０、０点を０．０として正規化し、優先順位は８点全てが入った場合が一番高く、カメラ画像外の点が多いほど優先順位は低く設定する。 In S540, image selection unit 114 sets a priority order for the image candidates acquired in S530. Then, image selection unit 114 selects (determines) an image to be used for posture estimation from among the image candidates based on the set priority order, and outputs the selected image to posture estimation unit 115. In the present embodiment, as an example, image selection unit 114 sets the priority order based on (1) image resolution of the subject region, (2) subject region size, and (3) subject part appearance (whether the whole body is shown or only a part is shown).
(1) Image resolution
The image resolution is, for example, the diameter (in pixels) of the circle obtained when the voxel located at the centroid coordinates is approximated by its circumscribed sphere and projected onto the camera image. A voxel close to the camera projects to a large circle (high resolution), and a distant voxel projects to a small circle (low resolution). The diameter of the circle obtained by projecting the voxel closest to the camera (the maximum diameter) is also calculated and is used to normalize the diameter calculated for the centroid voxel. The larger the normalized diameter, the higher the priority; the smaller the normalized diameter, the lower the priority.
(2) Subject area size
The subject area size can be calculated by counting the number of pixels contained in the area obtained by projecting the eight points of the bounding box onto the camera image. Like the resolution, the area is large when the subject is close to the camera and small when it is far away. The calculated area is normalized by the maximum area, and the larger the normalized area, the higher the priority; the smaller the normalized area, the lower the priority.
(3) Appearance of the subject's body parts (how much of the subject area is captured)
The appearance of the subject is defined as the number of the eight bounding box points that are projected inside the camera image. If all eight points are projected inside the camera image, the entire body of the subject is in the image; if fewer than eight are, it is likely that some part of the subject lies outside the camera image. The appearance is normalized so that eight points correspond to 1.0 and zero points to 0.0. The priority is highest when all eight points are inside the image and becomes lower as more points fall outside the camera image.
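The three normalized criteria can be written as small scoring functions. This sketch assumes a pinhole model in which a sphere of radius r at depth d projects to a circle of diameter roughly f * 2r / d pixels. The function names, and the choice of which maximum area to normalize by, are assumptions not fixed by the description.

```python
import math


def projected_diameter(focal_px, voxel_size, depth):
    """Approximate pixel diameter of the sphere circumscribing a cubic voxel."""
    radius = voxel_size * math.sqrt(3) / 2  # half of the cube's space diagonal
    return focal_px * 2 * radius / depth


def resolution_score(focal_px, voxel_size, centroid_depth, nearest_depth):
    """(1) Centroid-voxel diameter normalized by the nearest-voxel (maximum) diameter."""
    return (projected_diameter(focal_px, voxel_size, centroid_depth)
            / projected_diameter(focal_px, voxel_size, nearest_depth))


def area_score(area_px, max_area_px):
    """(2) Projected bounding-box area normalized by a maximum area."""
    return area_px / max_area_px


def appearance_score(corners_in_image):
    """(3) Number of bounding-box corners inside the image, 8 -> 1.0 and 0 -> 0.0."""
    return corners_in_image / 8
```

For example, a centroid voxel twice as far from the camera as the nearest voxel gets a resolution score of about 0.5, and six visible corners give an appearance score of 0.75.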

画像選択部１１４は、優先順位を、（１）被写体の画像解像度、（２）被写体領域サイズ、（３）被写体部位の写りのいずれか一つ以上によって決定しても良い。もしくは、撮影に使用される各撮像装置１００から被写体までの距離にばらつきがあるような撮影環境であれば、（１）被写体の画像解像度と（３）被写体部位の写りの両方を用いても良い。この場合、（１）と（３）のそれぞれの方法で算出された値（順位）を正規化した値を掛け合わせ、大きい順に優先順位を設定してもよい。 The image selection unit 114 may determine the priority order based on one or more of (1) the image resolution of the subject, (2) the subject area size, and (3) the appearance of the subject's body parts. Alternatively, in a shooting environment where the distance from each imaging device 100 to the subject varies, both (1) the image resolution of the subject and (3) the appearance of the subject's body parts may be used. In this case, the normalized values (ranks) calculated by methods (1) and (3) may be multiplied together, and the priority order may be set in descending order of the product.
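Combining criteria (1) and (3) by multiplication can be sketched as follows. The dictionary layout and the function name are hypothetical, and the scores are assumed to be already normalized to the range 0.0 to 1.0.

```python
def rank_by_product(scores):
    """Return camera IDs ordered from highest to lowest combined priority.

    `scores` maps a camera ID to (resolution_score, appearance_score).
    """
    return sorted(scores, key=lambda cam: scores[cam][0] * scores[cam][1], reverse=True)


# Products: A = 0.45, B = 0.60, C = 0.40
ranking = rank_by_product({"A": (0.9, 0.5), "B": (0.6, 1.0), "C": (0.4, 1.0)})
```

Here camera B ranks first even though camera A has the higher resolution, because half of A's bounding box corners fall outside its image.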

画像選択部１１４は、前述のように設定した優先順位に基づき、Ｓ５５０の姿勢推定に用いる前景画像を選択（決定）する。本実施形態では、前述のように、姿勢推定に用いる前景画像の数は予め決められており、当該数の情報は姿勢推定装置１１０に入力されている。画像選択部１１４は、Ｓ５３０で生成した画像候補から、優先順位が高い方から当該数の前景画像を選択する。姿勢推定に用いる前景画像の数は、姿勢推定に必要な撮像装置１００の数に対応する。撮影シーンによっては、前記（１）～（３）の条件に基づいて、画像候補の全ての前景画像に対して同じような優先順位が設定されうる。すなわち、いずれの撮像装置１００の優先順位（優先度）も同じように評価されるケースも考えられる。このような場合、画像選択部１１４は、撮像装置それぞれの位置に基づいて、さらなる優先順位を設定することができる。例えば、撮像装置１００間の角度が可能な限り等間隔になるような複数の撮像装置１００による前景画像を選択する。具体的には、１６台の撮像装置１００が対象の被写体の方に向き、各撮像装置１００の該被写体の前景画像が画像候補として生成され、８つの前景画像を選択する場合を想定する。このとき、任意の撮像装置１００を基準装置として、該基準装置から１台おきに撮像装置１００を選択し、該選択した撮像装置１００の前景画像を、姿勢推定に用いる。このようにすることで、全周囲で見えが異なる前景画像を選択できる。 The image selection unit 114 selects (determines) the foreground images to be used for posture estimation in S550 based on the priority order set as described above. In this embodiment, as described above, the number of foreground images to be used for posture estimation is predetermined, and information on the number is input to the posture estimation device 110. The image selection unit 114 selects the number of foreground images from the image candidates generated in S530 in order of priority order. The number of foreground images to be used for posture estimation corresponds to the number of imaging devices 100 required for posture estimation. Depending on the shooting scene, the same priority order may be set for all foreground images of the image candidates based on the conditions (1) to (3) above. That is, it is also possible that the priority order (priority) of each imaging device 100 is evaluated in the same way. In such a case, the image selection unit 114 can set further priority order based on the position of each imaging device. For example, foreground images from multiple imaging devices 100 are selected such that the angles between the imaging devices 100 are as equal as possible. 
Specifically, assume that 16 imaging devices 100 face the target subject, that the foreground image of the subject from each imaging device 100 is generated as an image candidate, and that eight foreground images are to be selected. In this case, an arbitrary imaging device 100 is taken as a reference device, every other imaging device 100 is selected starting from the reference device, and the foreground images of the selected imaging devices 100 are used for posture estimation. In this way, foreground images that view the subject differently and together cover its entire circumference can be selected.
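The tie-breaking selection by camera position can be sketched as follows, assuming the camera IDs are already sorted by their angle around the subject. That ordering, the function name, and the parameter names are assumptions made for illustration.

```python
def select_evenly(camera_ids, count, reference_index=0):
    """Pick `count` cameras at (near) equal spacing, starting at the reference device."""
    n = len(camera_ids)
    return [camera_ids[(reference_index + (k * n) // count) % n] for k in range(count)]


# 16 cameras, 8 needed: every other camera starting from the reference device.
selected = select_evenly(list(range(16)), 8)
```

Starting from a different reference device (for example `reference_index=3`) wraps around the ring and still yields every other camera.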

なお、画像候補の数が予め決められた数に達しない場合、被写体の姿勢推定を中断するようにしてもよい。また、前景画像の数は、状況に応じて変えられてもよい。また、選択される前景画像の数は、下限の数と上限の数の間に収まるように選択されてもよい。この場合、画像候補の数が、下限の数を超える場合、上限の数を超えない範囲で、前景画像が優先順位に従って上位から選択されるようにしてもよい。選択される前景画像の数は、上限の数以下であればよい。また、画像候補の数が下限の数に達しない場合、被写体の姿勢推定を中断するようにしてもよい。 Note that if the number of image candidates does not reach a predetermined number, estimation of the subject's posture may be interrupted. The number of foreground images may be changed depending on the situation. The number of foreground images selected may be selected to be between a lower limit and an upper limit. In this case, if the number of image candidates exceeds the lower limit, foreground images may be selected from the top according to priority, so long as the number does not exceed the upper limit. The number of foreground images selected may be equal to or less than the upper limit. Note that if the number of image candidates does not reach the lower limit, estimation of the subject's posture may be interrupted.
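The lower and upper limits on the number of selected foreground images can be sketched as follows. The function name is hypothetical, and returning `None` to signal that posture estimation is interrupted is a convention assumed only for this example.

```python
def select_with_limits(ranked_candidates, lower, upper):
    """Select foreground images from a list sorted by descending priority.

    Returns None when fewer than `lower` candidates exist (estimation is
    interrupted); otherwise returns at most `upper` top-ranked candidates.
    """
    if len(ranked_candidates) < lower:
        return None
    return ranked_candidates[:upper]
```

For instance, with a lower limit of 3 and an upper limit of 4, five candidates yield the top four, while two candidates yield `None`.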

Ｓ５５０において、姿勢推定部１１５は、選択（決定）された前景画像を用いて対象の被写体の姿勢（姿勢モデル）を推定する。姿勢推定の方法は多視点カメラ情報と初期姿勢モデルを用いる既知の方法で良い。初期姿勢モデルは図２に示す標準形状モデル２００の初期の関節位置等の情報２１１と関節同士の接続関係の情報２１２および、隣接する部位の角度の情報が記載されたファイルを、姿勢推定装置１１０が起動時に少なくとも一度読み込むことで取得することができる。姿勢推定の方法は、例えば、次のような方法がある。まず、初期姿勢モデルを変形させて、一時的な形状モデルを取得し、選択された前景画像に射影する。そして、射影された領域と前景画像との類似度を評価する。これらの処理を類似度が一定の閾値を満たすまで姿勢モデルを変えながら繰り返し、閾値を満たしたときの姿勢モデルを最適な姿勢モデルと推定できる。また、連続的に運動する被写体を撮影した場合、初期姿勢モデルは１フレーム前の姿勢モデルでも良い。 In S550, the posture estimation unit 115 estimates the posture (posture model) of the target subject using the selected (determined) foreground image. The posture estimation method may be a known method using multi-view camera information and an initial posture model. The initial posture model can be obtained by the posture estimation device 110 reading at least once at startup a file that describes information 211 such as the initial joint positions of the standard shape model 200 shown in FIG. 2, information 212 on the connection relationship between the joints, and information on the angles of adjacent parts. For example, the posture estimation method may be as follows. First, the initial posture model is deformed to obtain a temporary shape model, and projected onto the selected foreground image. Then, the similarity between the projected area and the foreground image is evaluated. These processes are repeated while changing the posture model until the similarity meets a certain threshold, and the posture model when the threshold is met can be estimated as the optimal posture model. In addition, when a continuously moving subject is photographed, the initial posture model may be the posture model of the previous frame.
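The iterative fitting outlined for S550 can be illustrated with a toy loop. Everything here is an assumption for the sake of a runnable sketch: the similarity measure (intersection over union of pixel sets), the greedy neighbor search, and the caller-supplied `render` and `neighbors` functions. The embodiment requires only that some known method be used.

```python
def iou(a, b):
    """Intersection over union of two pixel sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_similarity(pose, render, silhouettes):
    """Average similarity between the projected model and each selected image."""
    return sum(iou(render(pose, i), s) for i, s in enumerate(silhouettes)) / len(silhouettes)


def fit_pose(initial_pose, neighbors, render, silhouettes, threshold, max_iters=100):
    """Change the pose step by step until the similarity meets the threshold."""
    pose = initial_pose
    score = mean_similarity(pose, render, silhouettes)
    for _ in range(max_iters):
        if score >= threshold:
            break
        best = max(neighbors(pose), key=lambda p: mean_similarity(p, render, silhouettes))
        best_score = mean_similarity(best, render, silhouettes)
        if best_score <= score:
            break  # no neighbor improves the fit: stop at a local optimum
        pose, score = best, best_score
    return pose, score


# Toy problem: the "pose" is one integer offset; the model covers 3 pixels.
toy_render = lambda pose, cam_index: {pose, pose + 1, pose + 2}
toy_neighbors = lambda pose: [pose - 1, pose + 1]
toy_silhouettes = [{5, 6, 7}, {5, 6, 7}]
result = fit_pose(2, toy_neighbors, toy_render, toy_silhouettes, threshold=0.99)
```

For a continuously moving subject, the previous frame's result would be passed as `initial_pose`, matching the remark in the description.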

Ｓ５６０において、姿勢推定部１１５は、全被写体を選択したか（全被写体の姿勢モデルを推定したか）を、Ｓ５２０で付与したボクセル集合の被写体ＩＤ（０～最大被写体ＩＤ）に基づいて判定する。全被写体を選択した場合は、処理を終了し、そうでない場合は、処理はＳ５３０へ戻る。 In S560, the posture estimation unit 115 determines whether all subjects have been selected (whether posture models have been estimated for all subjects) based on the subject ID (0 to maximum subject ID) of the voxel set assigned in S520. If all subjects have been selected, the process ends; if not, the process returns to S530.

本実施形態により、広大な撮影領域を写したカメラ集合の中から被写体毎に姿勢推定に用いる前景画像の候補を取得し、その中から最適な前景画像を選択した上で、被写体毎の姿勢モデルを推定することが可能となる。 This embodiment makes it possible to obtain candidate foreground images to be used for pose estimation for each subject from a set of cameras capturing a vast shooting area, select the most suitable foreground image from among them, and then estimate a pose model for each subject.

（実施形態２）
本実施形態では、撮像装置に写る複数の被写体が重なる場合、被写体の形状モデルを推定した結果を用いて距離画像を生成し、該距離画像を基に被写体の前後関係を考慮した上で、姿勢推定に用いる画像を選択する方法について述べる。さらに、静止物の形状モデルを用いて、静止物と被写体の前後関係も考慮した上で画像を選択する方法についても述べる。本実施形態では、撮像領域がスポーツ競技場である場合を想定する。 (Embodiment 2)
This embodiment describes a method for the case where multiple subjects captured by an imaging device overlap: a distance image is generated from the estimated shape models of the subjects, and the image to be used for posture estimation is selected with the front-to-back relationship between the subjects taken into account on the basis of that distance image. A method of selecting an image that also takes into account the front-to-back relationship between stationary objects and subjects, using shape models of the stationary objects, is also described. In this embodiment, the imaging area is assumed to be a sports stadium.

［画像処理システムと姿勢推定装置の構成］
本実施形態における画像処理システムの構成例を図６に示す。本実施形態における画像処理システム６０は、撮像装置１００と姿勢推定装置６００とを含む。撮像装置１００の構成については、実施形態１と同様のため、説明を省略する。また、姿勢推定装置６００のハードウェア構成についても、実施形態１と同様のため、説明を省略する。姿勢推定装置１１０の内部構成について、図６を参照して説明する。姿勢推定装置６００は、実施形態１で説明した姿勢推定装置１１０の構成に、静止物情報取得部６０１と距離画像生成部６０２が追加された構成となっている。カメラ情報取得部１１１、形状推定部１１２、画像候補生成部１１３、姿勢推定部１１５については実施形態１と同様のため、説明を省略する。 [Configuration of image processing system and posture estimation device]
An example of the configuration of the image processing system in this embodiment is shown in FIG. 6. The image processing system 60 in this embodiment includes the imaging devices 100 and a posture estimation device 600. The configuration of the imaging devices 100 is the same as in the first embodiment, so its description is omitted. The hardware configuration of the posture estimation device 600 is also the same as in the first embodiment, so its description is omitted. The internal configuration of the posture estimation device 600 will be described with reference to FIG. 6. The posture estimation device 600 has the configuration of the posture estimation device 110 described in the first embodiment, with a stationary object information acquisition unit 601 and a distance image generation unit 602 added. The camera information acquisition unit 111, the shape estimation unit 112, the image candidate generation unit 113, and the posture estimation unit 115 are the same as in the first embodiment, so their descriptions are omitted.

静止物情報取得部６０１は、静止物情報を取得する。静止物情報は、例えば、静止物の３次元モデル（静止物モデル）情報が記載されたファイルである。静止物の３次元モデルは、静止物を撮影した画像を基に一般的な３次元モデリングツールであらかじめ生成されうる。生成されたモデルの情報は、ＯＢＪファイル形式など、３次元モデルの標準ファイル形式を有する。静止物情報取得部６０１は、例えば、姿勢推定装置１１０の起動時に、操作部４１６や通信Ｉ／Ｆ（図４）を介して静止物情報を取得することができる。本実施形態における静止物とは、撮像領域であるスポーツ競技場で行われているスポーツがサッカーやラグビーのようなスポーツの場合、ゴールポストや看板など、試合中に同じ場所で静止し続ける物体を指す。 The stationary object information acquisition unit 601 acquires stationary object information. The stationary object information is, for example, a file in which 3D model (stationary object model) information of a stationary object is described. The 3D model of the stationary object can be generated in advance using a general 3D modeling tool based on an image of the stationary object. The information of the generated model has a standard file format for 3D models, such as an OBJ file format. The stationary object information acquisition unit 601 can acquire stationary object information via the operation unit 416 or the communication I/F (FIG. 4), for example, when the posture estimation device 110 is started. In this embodiment, a stationary object refers to an object that remains stationary in the same place during a game, such as a goal post or a signboard, when the sport being played in the sports stadium, which is the imaging area, is a sport such as soccer or rugby.

距離画像生成部６０２は、形状推定部１１２により推定された被写体の形状モデル（被写体モデル）を用いて、距離画像を生成する。本実施形態における距離画像は、撮像画像と同じ画素数を持ち、各画素には撮像装置１００から最も近い被写体の表面までの距離情報が格納された画像である。距離画像生成部６０２は、静止物情報取得部６０１から静止物情報が入力された場合、世界座標系で静止物モデルと被写体モデルを統合し、静止物と被写体の前後関係を考慮した距離画像を生成する。すなわち、距離画像生成部６０２は、撮像装置１００から最も近い被写体および／または静止物の表面までの距離情報が格納された画像を生成する。 The distance image generation unit 602 generates a distance image using the shape model of the subject (subject model) estimated by the shape estimation unit 112. The distance image in this embodiment has the same number of pixels as the captured image, and each pixel stores the distance from the imaging device 100 to the surface of the nearest subject. When stationary object information is input from the stationary object information acquisition unit 601, the distance image generation unit 602 integrates the stationary object model and the subject model in the world coordinate system and generates a distance image that takes into account the front-to-back relationship between the stationary object and the subject. In other words, the distance image generation unit 602 generates an image in which each pixel stores the distance from the imaging device 100 to the nearest surface, whether that surface belongs to a subject or to a stationary object.

画像選択部１１４は、距離画像生成部６０２により生成された距離画像を参照しながら、姿勢推定に用いる前景画像を画像候補から選択する。 The image selection unit 114 selects a foreground image to be used for pose estimation from the image candidates while referring to the distance image generated by the distance image generation unit 602.

［動作フロー］
続いて、姿勢推定装置６００の動作について説明する。図７は、姿勢推定装置６００により実行される処理のフローチャートである。図７に示すフローチャートは、姿勢推定装置６００のＣＰＵ４１１がＲＯＭ４１２やＲＡＭ４１３に格納されている制御プログラムを実行し、情報の演算および加工並びに各ハードウェアの制御を実行することにより実現されうる。なお、Ｓ７０５、Ｓ７２５、Ｓ７４０以外の各ステップは実施形態１で説明した図５の各ステップと同様であるため、説明を省略する。 [Operation flow]
Next, the operation of posture estimation device 600 will be described. Fig. 7 is a flowchart of the process executed by posture estimation device 600. The flowchart shown in Fig. 7 can be realized by CPU 411 of posture estimation device 600 executing a control program stored in ROM 412 or RAM 413, and performing calculation and processing of information and control of each piece of hardware. Note that each step other than S705, S725, and S740 is the same as each step in Fig. 5 described in embodiment 1, and therefore description thereof will be omitted.

Ｓ５００においてカメラパラメータが取得された後のＳ７０５において、静止物情報取得部６０１は、取得した静止物情報を読み込み、該静止物情報を距離画像生成部６０２へ出力する。その後、Ｓ５１０とＳ５２０において前景画像が取得され各被写体の形状モデルが推定された後、Ｓ７２５において、距離画像生成部６０２は、同じ世界座標系で表現された被写体モデルと静止物モデルを統合した上で、各撮像装置１００についての距離画像を生成する。具体的には、距離画像生成部６０２は、各カメラパラメータを用いて、被写体モデルと静止物モデルをレンダリングし、その際に距離情報を計算する。そして、距離画像生成部６０２は該距離情報が各画素に格納された距離画像を生成する。レンダリングする際、被写体モデルが点群で表現された場合、距離画像に穴が生じる。そのため、点群を構成する各点を半径ｒの球や一辺ｎの立方体など、体積を持った立体物の集合として表現することで距離画像の穴の発生を防ぐことができる。また、ＭａｒｃｈｉｎｇＣｕｂｅｓ法などにより、点群からメッシュモデルを生成した上でレンダリングしても穴の発生を防ぐことができる。レンダリングの詳細は、ＯｐｅｎＧＬなど一般的なグラフィックスライブラリで実現できるので省略する。距離画像生成部６０２は、生成した距離画像を画像選択部１１４へ出力する。 In S705, which follows the acquisition of the camera parameters in S500, the stationary object information acquisition unit 601 reads the acquired stationary object information and outputs it to the distance image generation unit 602. Then, after the foreground images are acquired and the shape model of each subject is estimated in S510 and S520, in S725 the distance image generation unit 602 integrates the subject models and the stationary object model, which are expressed in the same world coordinate system, and generates a distance image for each imaging device 100. Specifically, the distance image generation unit 602 renders the subject models and the stationary object model using the camera parameters of each imaging device and calculates distance information during rendering. The distance image generation unit 602 then generates a distance image in which this distance information is stored in each pixel. If a subject model is expressed as a point cloud, holes appear in the rendered distance image. Such holes can be prevented by expressing each point of the point cloud as a solid with volume, such as a sphere of radius r or a cube of side length n. Holes can also be prevented by generating a mesh model from the point cloud using the Marching Cubes method or the like and rendering the mesh. Details of the rendering are omitted because it can be implemented with a general graphics library such as OpenGL. The distance image generation unit 602 outputs the generated distance images to the image selection unit 114.
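As a rough illustration of the distance image, the following sketch splats each point into a small square of pixels and keeps the smallest distance per pixel. This is a software simplification, not the OpenGL rendering mentioned above: the fixed pixel radius stands in for the sphere or cube used to fill holes, points are assumed to be already transformed into camera coordinates, and the camera dictionary keys are hypothetical.

```python
import math


def render_distance_image(points, cam, splat_radius_px=1):
    """Return a height x width image of nearest-surface distances (0.0 = empty)."""
    w, h = cam["width"], cam["height"]
    depth = [[0.0] * w for _ in range(h)]
    for x, y, z in points:  # camera coordinates, z pointing forward
        if z <= 0:
            continue  # behind the camera
        u = int(round(cam["fx"] * x / z + cam["cx"]))
        v = int(round(cam["fy"] * y / z + cam["cy"]))
        dist = math.sqrt(x * x + y * y + z * z)
        for dv in range(-splat_radius_px, splat_radius_px + 1):
            for du in range(-splat_radius_px, splat_radius_px + 1):
                px, py = u + du, v + dv
                if 0 <= px < w and 0 <= py < h:
                    if depth[py][px] == 0.0 or dist < depth[py][px]:
                        depth[py][px] = dist  # keep the nearest surface
    return depth


cam = {"fx": 10, "fy": 10, "cx": 2, "cy": 2, "width": 5, "height": 5}
# Two points on the optical axis: the nearer one (3.0) must win at the center pixel.
image = render_distance_image([(0, 0, 5), (0, 0, 3)], cam, splat_radius_px=0)
```

Rendering the subject models and the stationary object model together into one such image gives the integrated distance image, while rendering one subject alone gives a per-subject distance image.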

距離画像が生成された後、Ｓ５３０において対象の被写体についての画像候補が生成される。続くＳ７４０において、画像選択部１１４は、Ｓ７２５で生成された距離画像に基づいて、対象の被写体が別の被写体や静止物によって遮蔽されているか否かを判定しながら、対象の被写体についての画像候補（前景画像）の中から姿勢推定に用いる前景画像を選択（決定）する。すなわち、画像選択部１１４は、画像候補における対象の被写体の少なくとも一部が、他の被写体の領域および／または静止物の領域により遮蔽されているかを判定し、当該判定の結果に基づいて、該対象の被写体について姿勢推定に用いる前景画像を選択（決定）する。 After the distance image is generated, in S530, image candidate images for the target subject are generated. In the following S740, the image selection unit 114 selects (determines) a foreground image to be used for pose estimation from among the image candidate images (foreground images) for the target subject while determining whether the target subject is occluded by another subject or a stationary object based on the distance image generated in S725. That is, the image selection unit 114 determines whether at least a portion of the target subject in the image candidate is occluded by an area of another subject and/or an area of a stationary object, and selects (determines) a foreground image to be used for pose estimation for the target subject based on the result of the determination.

前景画像を選択するために、実施形態１で説明した図５のＳ５４０と同様に、画像選択部１１４は、優先順位を設定する。距離画像を参照しながら優先順位を付ける方法について、図８を参照しながら説明する。 To select a foreground image, the image selection unit 114 sets priorities in the same manner as S540 in FIG. 5 described in the first embodiment. A method of setting priorities while referring to a distance image will be described with reference to FIG. 8.

図８は、被写体の重なりを示す模式図である。図８の（ａ）～（ｅ）は、異なる撮像装置１００が複数の被写体を撮影した場合の、各撮像装置１００における前景領域（白色）を示す前景画像を示している。本実施形態では該前景領域は、被写体領域と静止物領域を含みうる。図８（ａ）は、３つの被写体８１０、８２０、８３０が離れている場合の前景画像を示す。図８（ｂ）は、被写体８２０が画角外、被写体８３０が被写体８１０を遮蔽している場合の前景画像を示す。図８（ｃ）は、被写体８２０が画角外、被写体８１０が被写体８３０を遮蔽している場合の前景画像を示す。図８（ｄ）は、被写体８２０は被写体８３０から離れているが、被写体８１０が画角外、被写体８３０は静止物（白棒領域）に遮蔽している場合の前景画像を示す。図８（ｅ）は、被写体８３０が画角外、被写体８１０と被写体８２０が離れている場合の前景画像を示す。 Figure 8 is a schematic diagram showing overlapping of subjects. (a) to (e) of Figure 8 show foreground images showing the foreground area (white) in each imaging device 100 when different imaging devices 100 capture multiple subjects. In this embodiment, the foreground area may include a subject area and a stationary object area. Figure 8 (a) shows a foreground image when three subjects 810, 820, and 830 are separated. Figure 8 (b) shows a foreground image when subject 820 is outside the angle of view and subject 830 is blocking subject 810. Figure 8 (c) shows a foreground image when subject 820 is outside the angle of view and subject 810 is blocking subject 830. Figure 8 (d) shows a foreground image when subject 820 is away from subject 830, but subject 810 is outside the angle of view and subject 830 is blocked by a stationary object (white bar area). FIG. 8(e) shows a foreground image when subject 830 is outside the angle of view and subjects 810 and 820 are far apart.

ここで、対象の被写体を被写体８３０とする。Ｓ５３０で画像候補生成部１１３が、対象の被写体８３０に対する画像候補を生成する場合、実施形態１で説明したＳ５３０の処理により、図８（ｅ）の前景画像では被写体８３０が存在しないため、当該前景画像は画像候補に含めない。画像候補生成部１１３は、図８（ａ）～（ｄ）の前景画像を、画像候補として生成する。 Here, the target subject is assumed to be subject 830. When the image candidate generation unit 113 generates image candidates for the target subject 830 in S530, due to the processing of S530 described in embodiment 1, subject 830 does not exist in the foreground image of FIG. 8(e), and therefore the foreground image is not included in the image candidates. The image candidate generation unit 113 generates the foreground images of FIGS. 8(a) to (d) as image candidates.

続いて画像選択部１１４は、図８（ａ）～（ｄ）の前景画像に対して優先順位を設定する。本実施形態では優先度を設定し、優先順位は該優先度が高い順から設定されるものとする。まず、該前景画像を取得した１つ以上の撮像装置１００について、各被写体の形状モデルをカメラ画像上に射影し、被写体毎の距離画像を生成する。そして、該距離画像における各被写体領域（画素値が０以外の領域）を囲む各矩形情報を生成する。該矩形情報の生成処理は、一度生成すれば次の被写体ＩＤのループにおいて生成されなくてもよい。画像選択部１１４は、対象の被写体領域の矩形が、他の被写体領域の矩形と交差しなければ、対象の被写体の前景領域は図８（ａ）のように分離されていると判断し、優先度を１と設定する。 Next, the image selection unit 114 sets a priority order for the foreground images of FIGS. 8(a) to 8(d). In this embodiment, a priority value is set for each foreground image, and the priority order is assigned in descending order of that value. First, for each of the one or more imaging devices 100 that acquired these foreground images, the shape model of each subject is projected onto the camera image to generate a distance image for each subject. Then, rectangle information enclosing each subject area (the area whose pixel values are other than 0) in the distance image is generated. Once generated, the rectangle information does not need to be regenerated in the loops for subsequent subject IDs. If the rectangle of the target subject area does not intersect the rectangle of any other subject area, the image selection unit 114 determines that the foreground area of the target subject is separated as in FIG. 8(a) and sets the priority value to 1.
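The rectangle test can be sketched as follows. It assumes per-subject distance images stored as lists of rows with 0 for pixels outside the subject, and inclusive rectangle coordinates `(x_min, y_min, x_max, y_max)`; both function names are hypothetical.

```python
def bounding_rect(distance_image):
    """Rectangle enclosing all non-zero pixels, or None if the image is empty."""
    coords = [(x, y) for y, row in enumerate(distance_image)
              for x, d in enumerate(row) if d != 0]
    if not coords:
        return None
    xs, ys = zip(*coords)
    return (min(xs), min(ys), max(xs), max(ys))


def rects_intersect(a, b):
    """True if two inclusive rectangles share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


target = [[0, 0, 0, 0], [0, 2.0, 2.0, 0], [0, 0, 0, 0]]
far_away = [[0, 0, 0, 3.0], [0, 0, 0, 3.0], [0, 0, 0, 0]]
overlapping = [[0, 0, 4.0, 0], [0, 0, 4.0, 0], [0, 0, 0, 0]]
```

When the target rectangle intersects no other rectangle, the priority value is simply 1; otherwise the per-pixel distance comparison is needed.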

一方、矩形同士（対象の被写体領域の矩形と他の被写体領域の矩形）が交差する場合、対象の被写体が他の被写体に遮蔽されている可能性があるため、被写体同士の前後関係を判定する必要がある。判定するために、画像選択部１１４は、対象の被写体の距離画像を取得した撮像装置１００について、該対象の被写体の距離画像と、Ｓ７２５で生成した統合距離画像とを比較する。具体的には、画像選択部１１４は、対象の被写体の距離画像における対象被写体領域の各画素に格納された距離と、該統合距離画像の各画素に格納された距離を比較し、その差が一定の閾値未満であれば、該画素は該撮像装置１００から最も近い対象被写体の表面にあると推定し、カウントアップする。反対に、該差が閾値以上であれば、他の被写体に対象の被写体は遮蔽されていると推定する。被写体領域における全画素について距離を比較した結果、該撮像装置１００から最も近い対象被写体の表面にある画素としてカウントされた画素の数を被写体領域の全画素数で割った値を優先度とする。つまり、図８の被写体８３０の優先度を考えると、図８（ｂ）のように被写体８３０が被写体８１０より前に位置すると優先度は高く、図８（ｃ）のように被写体８３０が被写体８１０の奥に位置すると優先度は低くなるように、被写体８３０に対する優先度が算出される。 On the other hand, when the rectangles (the rectangle of the target subject area and the rectangle of another subject area) intersect, the target subject may be occluded by the other subject, so the front-back relationship between the subjects must be determined. To do so, for the imaging device 100 for which the distance image of the target subject was acquired, the image selection unit 114 compares the distance image of the target subject with the integrated distance image generated in S725. Specifically, the image selection unit 114 compares the distance stored in each pixel of the target subject area in the distance image of the target subject with the distance stored in the corresponding pixel of the integrated distance image. If the difference is less than a certain threshold, the pixel is estimated to lie on the surface of the target subject nearest to the imaging device 100, and a counter is incremented. Conversely, if the difference is equal to or greater than the threshold, the target subject is estimated to be occluded by another subject at that pixel. After the distances have been compared for all pixels in the subject area, the number of pixels counted as lying on the surface of the target subject nearest to the imaging device 100, divided by the total number of pixels in the subject area, is used as the priority. In other words, for subject 830 in FIG. 8, the priority is calculated so that it is high when subject 830 is located in front of subject 810 as in FIG. 8(b), and low when subject 830 is located behind subject 810 as in FIG. 8(c).
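
The ratio described above can be sketched in Python as follows. This is a minimal illustration, assuming both distance images are same-sized 2D lists, that 0 marks pixels outside the target subject area, and that the integrated distance image holds the nearest distance at each pixel; the function name `visibility_priority` and the threshold value used in the example are hypothetical.

```python
from typing import List


def visibility_priority(
    target_depth: List[List[float]],
    integrated_depth: List[List[float]],
    threshold: float,
) -> float:
    """Fraction of target-subject pixels whose distance matches the integrated
    (nearest-surface) distance to within `threshold`."""
    total = 0    # pixels belonging to the target subject area
    visible = 0  # pixels estimated to lie on the nearest surface
    for row_t, row_i in zip(target_depth, integrated_depth):
        for d_t, d_i in zip(row_t, row_i):
            if d_t == 0:  # not part of the target subject area
                continue
            total += 1
            if abs(d_t - d_i) < threshold:
                visible += 1
    return visible / total if total else 0.0


# Example: the right-hand column of the target is hidden by something
# closer to the camera (3.0 instead of 5.0), so half the area is visible.
target = [[0, 5.0, 5.0], [0, 5.0, 5.0]]
integrated = [[0, 5.0, 3.0], [0, 5.0, 3.0]]
priority = visibility_priority(target, integrated, threshold=0.1)
```

A subject fully in front of the others yields a value near 1, and a heavily occluded subject yields a value near 0, matching the FIG. 8(b) and FIG. 8(c) cases.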

また、図８（ｄ）のように、被写体８３０を撮像装置１００に射影した結果、静止物領域に重なる可能性もある。この場合でも、被写体８３０の被写体領域内の画素の距離と統合距離画像における距離を比較すると、静止物領域の画素により差が大きくなり、該撮像装置１００から最も近い対象被写体の表面にある画素としてカウントされた画素の数が少なくなる。よって、優先度が低くなる。さらに、静止物と重なった場合、優先度が低くなる重みをかけることで、図８（ｄ）のような前景画像の優先度は極小にできる。 Also, as shown in FIG. 8(d), when the subject 830 is projected onto the imaging device 100, it may overlap with a stationary object area. Even in this case, when comparing the distance of the pixels in the subject area of the subject 830 with the distance in the integrated distance image, the difference is large due to the pixels in the stationary object area, and the number of pixels counted as pixels on the surface of the target subject closest to the imaging device 100 is reduced. Therefore, the priority is lowered. Furthermore, when there is overlap with a stationary object, the priority of the foreground image as shown in FIG. 8(d) can be minimized by applying a weight that lowers the priority.
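
The additional weighting for stationary-object overlap might look like the sketch below. The patent does not specify how the overlap is detected or what weight is used; here a boolean mask of stationary-object pixels and a multiplicative weight of 0.1 are assumed purely for illustration, and `penalized_priority` is a hypothetical name.

```python
from typing import List


def penalized_priority(
    priority: float,
    target_depth: List[List[float]],
    static_mask: List[List[bool]],
    weight: float = 0.1,  # assumed value; smaller means a stronger penalty
) -> float:
    """Scale the priority down when any target-subject pixel falls on a
    stationary-object pixel in the same camera image."""
    overlaps = any(
        d != 0 and s
        for row_d, row_s in zip(target_depth, static_mask)
        for d, s in zip(row_d, row_s)
    )
    return priority * weight if overlaps else priority
```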

このように処理することで、図８における被写体８３０の優先順位は、図８（ａ）から図８（ｄ）の順に低くなる。具体的には、図８（ａ）の前景画像の優先度を１、図８（ｄ）の優先度を極小に、図８（ｂ）は図８（ａ）に近い優先度を、図８（ｃ）は図８（ｂ）よりも低い優先度を設定できる。画像選択部１１４は該優先度に応じて優先順位を設定し、上位の前景画像を選択する。最後に、選択された前景画像を姿勢推定部１１５へ送ることで、姿勢推定部１１５は被写体の姿勢モデルを推定することができる。 With this processing, the priority order of subject 830 in FIG. 8 decreases from FIG. 8(a) to FIG. 8(d). Specifically, the foreground image in FIG. 8(a) can be given a priority of 1, FIG. 8(b) a priority close to that of FIG. 8(a), FIG. 8(c) a priority lower than that of FIG. 8(b), and FIG. 8(d) a minimal priority. The image selection unit 114 sets the priority order according to these priority values and selects the top-ranked foreground images. Finally, the selected foreground images are sent to the posture estimation unit 115, which can then estimate the posture model of the subject.
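
Ranking by priority value and keeping the top-ranked images can be sketched as below. The camera identifiers, the priority values, and the number of images to keep are assumptions made for the example; `select_top_images` is a hypothetical name.

```python
from typing import Dict, List


def select_top_images(priorities: Dict[str, float], count: int) -> List[str]:
    """Return the IDs of the `count` images with the highest priority values,
    in descending order of priority."""
    ranked = sorted(priorities.items(), key=lambda item: item[1], reverse=True)
    return [image_id for image_id, _ in ranked[:count]]


# Illustrative values only, loosely following the FIG. 8(a) to 8(d) ordering.
example = {"fig8a": 1.0, "fig8b": 0.9, "fig8c": 0.3, "fig8d": 0.01}
selected = select_top_images(example, count=2)
```

Only the selected images would then be passed on to posture estimation, which is how the embodiment limits processing time.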

本実施形態により、各撮像装置と被写体や静止物との距離を考慮しながら、姿勢推定に用いる前景画像を選択できるようになる。また、姿勢推定を行う対象の被写体が遮蔽された前景画像を選択することを防げることができるため、姿勢推定処理の処理時間と誤差を低減することが可能となる。 This embodiment makes it possible to select a foreground image to be used for pose estimation while taking into account the distance between each imaging device and the subject or stationary object. In addition, it is possible to prevent the selection of a foreground image in which the subject for which pose estimation is to be performed is occluded, thereby making it possible to reduce the processing time and error of the pose estimation process.

このように、以上に説明した実施形態によれば、多視点カメラに写る複数被写体の姿勢モデルを推定する場合、被写体の位置や被写体同士の重なりなどを考慮して、限定された台数の撮像装置と、当該撮像装置による画像を選択できる。その結果、姿勢推定の処理時間や誤差を低減できる。 Thus, according to the embodiments described above, when estimating posture models of multiple subjects captured by multi-view cameras, a limited number of imaging devices and the images captured by those imaging devices can be selected, taking into account the positions of the subjects, overlaps between the subjects, and the like. As a result, the processing time and errors in posture estimation can be reduced.

（その他の実施形態） (Other Embodiments)
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. The present invention can also be realized by a circuit (e.g., an ASIC) that implements one or more of the functions.

発明は上記実施形態に制限されるものではなく、発明の精神及び範囲から離脱することなく、様々な変更及び変形が可能である。従って、発明の範囲を公にするために請求項を添付する The invention is not limited to the above-described embodiments, and various changes and modifications are possible without departing from the spirit and scope of the invention. Accordingly, the following claims are appended to make the scope of the invention public.

１００撮像装置、１１０；６００姿勢推定装置、１１１カメラ情報取得部、１１２形状推定部、１１３画像候補生成部、１１４画像選択部、１１５姿勢推定部、６０１静止物情報取得部、６０２距離画像生成部 100 Imaging device, 110; 600 Pose estimation device, 111 Camera information acquisition unit, 112 Shape estimation unit, 113 Image candidate generation unit, 114 Image selection unit, 115 Pose estimation unit, 601 Stationary object information acquisition unit, 602 Distance image generation unit

Claims

複数の撮像装置が複数の被写体を異なる方向から撮像することに基づいて得られた複数の被写体を示す複数の画像を取得する画像取得手段と、
前記複数の画像に基づいて、前記複数の被写体の各被写体に対して生成された、被写体の３次元形状を示す形状モデルを取得するモデル取得手段と、
前記複数の被写体のうちの、３次元姿勢を示す姿勢モデルの生成の対象の被写体に対して生成された前記形状モデルに基づいて、前記複数の画像から前記対象の被写体の前記姿勢モデルの生成に用いる画像を特定する特定手段と、
前記特定された画像に基づいて、前記対象の被写体の前記姿勢モデルを生成する生成手段と、
を有することを特徴とする生成装置。 A generating device comprising:
an image acquisition means for acquiring a plurality of images showing a plurality of subjects, the images being obtained by a plurality of imaging devices capturing the plurality of subjects from different directions;
a model acquisition means for acquiring a shape model representing a three-dimensional shape of a subject, the shape model being generated for each of the plurality of subjects based on the plurality of images;
a specifying means for specifying, from the plurality of images, an image to be used for generating a posture model of a target subject, the posture model indicating a three-dimensional posture, based on the shape model generated for the target subject among the plurality of subjects; and
a generating means for generating the posture model of the target subject based on the specified image.

前記特定手段は、前記対象の被写体に対して生成された前記形状モデルと、前記複数の画像における前記対象の被写体の領域に基づいて、前記対象の被写体の前記姿勢モデルの生成に用いる画像を特定することを特徴とする請求項１に記載の生成装置。 The generating device according to claim 1, wherein the specifying means specifies an image to be used in generating the posture model of the target subject based on the shape model generated for the target subject and the area of the target subject in the plurality of images.

前記特定手段は、前記対象の被写体に対して生成された前記形状モデルと、前記複数の画像における前記対象の被写体の領域に基づいて、前記複数の画像に対する優先順位を設定し、当該優先順位に従って、前記対象の被写体の前記姿勢モデルの生成に用いる画像を特定することを特徴とする請求項２に記載の生成装置。 The generating device according to claim 2, wherein the specifying means sets a priority order for the plurality of images based on the shape model generated for the target subject and the area of the target subject in the plurality of images, and specifies an image to be used in generating the posture model of the target subject according to the priority order.

前記特定手段は、前記複数の画像における前記対象の被写体の領域の画像解像度に基づいて、前記優先順位を設定することを特徴とする請求項３に記載の生成装置。 The generating device according to claim 3, wherein the specifying means sets the priority order based on the image resolution of the area of the target subject in the plurality of images.

前記特定手段は、前記複数の画像における前記対象の被写体の領域のサイズに基づいて、前記優先順位を設定することを特徴とする請求項３または４に記載の生成装置。 The generating device according to claim 3 or 4, wherein the specifying means sets the priority order based on the size of the area of the target subject in the plurality of images.

前記特定手段は、前記複数の画像における前記対象の被写体の領域の写りの大きさに基づいて、前記優先順位を設定することを特徴とする請求項３から５のいずれか１項に記載の生成装置。 The generating device according to any one of claims 3 to 5, wherein the specifying means sets the priority order based on how large the area of the target subject appears in the plurality of images.

前記特定手段は、前記優先順位と前記複数の撮像装置それぞれの位置に基づいて、前記対象の被写体の前記姿勢モデルの生成に用いる画像を特定することを特徴とする請求項３から６のいずれか１項に記載の生成装置。 The generating device according to any one of claims 3 to 6, wherein the specifying means specifies an image to be used in generating the posture model of the target subject based on the priority order and the respective positions of the plurality of imaging devices.

前記複数の画像において、前記対象の被写体の領域の少なくとも一部が、前記複数の被写体のうちの他の被写体の領域および／または静止物の領域により遮蔽されているかを判定する判定手段を更に有し、
前記特定手段は、前記判定手段による前記判定の結果に基づいて、前記対象の被写体の前記姿勢モデルの生成に用いる画像を特定することを特徴とする請求項３に記載の生成装置。 The generating device according to claim 3, further comprising a determining means for determining whether at least a part of the area of the target subject in the plurality of images is occluded by an area of another subject among the plurality of subjects and/or an area of a stationary object,
wherein the specifying means specifies an image to be used in generating the posture model of the target subject based on a result of the determination by the determining means.

前記判定手段は、前記複数の撮像装置のうち前記対象の被写体を撮像する各撮像装置と前記対象の被写体との距離と、前記各撮像装置と前記他の被写体および／または前記静止物との距離に基づいて、前記対象の被写体の領域の少なくとも一部が前記他の被写体の領域および／または前記静止物の領域により遮蔽されているかを判定することを特徴とする請求項８に記載の生成装置。 The generating device according to claim 8, characterized in that the determining means determines whether at least a part of the area of the target subject is occluded by the area of the other subject and/or the area of the stationary object based on the distance between each of the imaging devices that captures the target subject and the target subject, and the distance between each of the imaging devices and the other subject and/or the stationary object.

前記画像取得手段は、前記複数の撮像装置のうち、異常が発生した撮像装置から異常情報を取得し、
前記モデル取得手段は、前記異常情報を取得した撮像装置により得られた画像に基づいて形状モデルを取得しないことを特徴とする請求項１から９のいずれか１項に記載の生成装置。 The generating device according to any one of claims 1 to 9, wherein the image acquisition means acquires abnormality information from an imaging device, among the plurality of imaging devices, in which an abnormality has occurred, and
the model acquisition means does not acquire the shape model based on an image obtained by the imaging device from which the abnormality information was acquired.

前記特定手段は、前記対象の被写体の前記姿勢モデルの生成に用いる画像として複数の画像を特定することを特徴とする請求項１から１０のいずれか１項に記載の生成装置。 The generating device according to any one of claims 1 to 10, wherein the specifying means specifies a plurality of images as the images to be used in generating the posture model of the target subject.

複数の撮像装置が複数の被写体を異なる方向から撮像することに基づいて得られた複数の被写体を示す複数の画像を取得する画像取得工程と、
前記複数の画像に基づいて、前記複数の被写体の各被写体に対して生成された、被写体の３次元形状を示す形状モデルを取得するモデル取得工程と、
前記複数の被写体のうちの、３次元姿勢を示す姿勢モデルの生成の対象の被写体に対して生成された前記形状モデルに基づいて、前記複数の画像から前記対象の被写体の前記姿勢モデルの生成に用いる画像を特定する特定工程と、
前記特定された画像に基づいて、前記対象の被写体の前記姿勢モデルを生成する生成工程と、
を有することを特徴とする生成方法。 A generation method comprising:
an image acquisition step of acquiring a plurality of images showing a plurality of subjects, the images being obtained by a plurality of imaging devices capturing the plurality of subjects from different directions;
a model acquisition step of acquiring a shape model representing a three-dimensional shape of a subject, the shape model being generated for each of the plurality of subjects based on the plurality of images;
a specifying step of specifying, from the plurality of images, an image to be used for generating a posture model of a target subject, the posture model indicating a three-dimensional posture, based on the shape model generated for the target subject among the plurality of subjects; and
a generating step of generating the posture model of the target subject based on the specified image.

コンピュータを、請求項１から１１のいずれか１項に記載の生成装置として機能させるためのプログラム。 A program for causing a computer to function as a generating device according to any one of claims 1 to 11.